Computational Methods for Word Sense Disambiguation in English Text
Location: Adelaide
Supervisor: Prof. Daniel McMichael
Value: $525 p.w. + expenses

Project description
The main purpose of computational linguistics is the extraction of meaning from text in order to provide useful services, such as question-answering and translation. This project builds on CSIRO’s extensive work on combinatory categorial grammars for the automatic parsing of English text. That work is building towards algorithms for extracting deep semantics from text (see below). A key element of such algorithms, and the subject of this project, is the automatic estimation of word senses in previously unseen text. The project’s outcome will be an investigation of selected existing algorithms for word sense disambiguation. Such algorithms aim to assign the correct sense to each word in a sentence such as “The plane banked sharply.”, where “plane” means an aircraft rather than a flat surface, and “banked” means tilted in a turn rather than deposited money.
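As a rough illustration of what such an algorithm does, the Python sketch below applies a gloss-overlap heuristic in the spirit of Lesk (1986): for every combination of candidate senses of the ambiguous words, it counts the content words shared by their dictionary definitions (glosses) and keeps the combination with the largest overlap. Everything in it is an assumption made for illustration: the three-sense inventories for “plane” and “bank”, their glosses, the lemma table, the stopword list and the function names `content_words` and `lesk_pair` are hypothetical. A real system would draw its glosses from a machine-readable dictionary and use a proper lemmatiser.

```python
import itertools
import re

# Hypothetical sense inventory with invented glosses (not from any real dictionary).
# The intended senses are listed last so that the overlap score, not the order, decides.
SENSES = {
    "plane": {
        "plane.surface": "a flat level surface in geometry",
        "plane.tool": "a carpenter tool for smoothing wood",
        "plane.aircraft": "an aircraft with fixed wings that flies through the air and can tilt or turn",
    },
    "bank": {
        "bank.finance": "to deposit money in a financial institution",
        "bank.river": "the sloping land beside a river",
        "bank.tilt": "of an aircraft: to tilt sideways while turning in the air",
    },
}

# Hypothetical lemma table standing in for a real lemmatiser.
LEMMAS = {"banked": "bank", "banking": "bank", "planes": "plane"}

STOPWORDS = frozenset(
    "a an the of to in and or with for that while can through beside".split()
)


def content_words(text: str) -> set[str]:
    """Lower-case the text, split it into alphabetic tokens and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def lesk_pair(sentence: str) -> dict[str, str]:
    """Choose one sense per ambiguous word, maximising pairwise gloss overlap."""
    lemmas = [LEMMAS.get(w, w) for w in re.findall(r"[a-z]+", sentence.lower())]
    # Ambiguous words in order of first appearance, without duplicates.
    ambiguous = [w for w in dict.fromkeys(lemmas) if w in SENSES]
    candidates = [list(SENSES[w].items()) for w in ambiguous]

    best_combo, best_score = (), -1
    for combo in itertools.product(*candidates):
        glosses = [content_words(gloss) for _, gloss in combo]
        # Score = total number of shared content words over every pair of glosses.
        score = sum(len(a & b) for a, b in itertools.combinations(glosses, 2))
        if score > best_score:
            best_combo, best_score = combo, score
    return {w: sense_id for w, (sense_id, _) in zip(ambiguous, best_combo)}


print(lesk_pair("The plane banked sharply."))
# {'plane': 'plane.aircraft', 'bank': 'bank.tilt'}
```

Here the aircraft gloss of “plane” and the tilting gloss of “bank” share “aircraft”, “air” and “tilt”, while every other pairing shares nothing. Trying every combination of senses grows exponentially with the number of ambiguous words, which is one reason the commonly used “simplified Lesk” variant instead compares each word’s glosses with the words of its surrounding context, one word at a time.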

Background
Traditionally, grammars have been used to extract the syntactic structure of sentences and text fragments. For example, a very simple context-free grammar can be used to extract the syntactic structure of the sentence "I ate doughnuts for tea" and present it as a parse tree.
The grammar we have used has the following syntactical
categories (some of which are parts of speech):
A set of production rules that define the possible branchings in a parse tree was used to construct the tree
above. While the parse tree identifies
such items as verbs, verb phrases, subjects and objects, it does not identify
the semantics of the sentence.
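The kind of analysis described above can be sketched with a toy context-free grammar and a small top-down, backtracking parser in Python. The category names (S, NP, VP, PP, Pro, N, V, P), the production rules, the lexicon and the helper names (`parse`, `parse_sentence`, `bracket`) are all assumptions chosen for illustration; they are not necessarily the grammar used to construct the tree above. In particular, this toy grammar attaches “for tea” to the verb phrase.

```python
from typing import Iterator, Optional

# Hypothetical toy grammar: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP", "PP"], ["V", "NP"]],
    "PP": [["P", "NP"]],
    "NP": [["Pro"], ["N"]],
}

# Hypothetical lexicon: each (lower-cased) word has a single part of speech.
LEXICON = {"i": "Pro", "ate": "V", "doughnuts": "N", "for": "P", "tea": "N"}

Tree = tuple  # (category, child, child, ...) or (part_of_speech, word)


def parse(symbol: str, words: list[str], i: int) -> Iterator[tuple[Tree, int]]:
    """Yield (tree, next_position) for every way `symbol` can start at words[i]."""
    if symbol not in GRAMMAR:  # part of speech: must match exactly one word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from _expand(symbol, rhs, words, i, [])


def _expand(symbol, rhs, words, i, children):
    """Match the right-hand side `rhs` left to right, backtracking on failure."""
    if not rhs:
        yield (symbol, *children), i
        return
    for child, nxt in parse(rhs[0], words, i):
        yield from _expand(symbol, rhs[1:], words, nxt, children + [child])


def parse_sentence(sentence: str) -> Optional[Tree]:
    """Return the first parse that covers the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None


def bracket(tree: Tree) -> str:
    """Render a tree in labelled-bracket notation."""
    label, *rest = tree
    if len(rest) == 1 and isinstance(rest[0], str):
        return f"({label} {rest[0]})"
    return f"({label} " + " ".join(bracket(c) for c in rest) + ")"


print(bracket(parse_sentence("I ate doughnuts for tea")))
# (S (NP (Pro i)) (VP (V ate) (NP (N doughnuts)) (PP (P for) (NP (N tea)))))
```

The resulting tree identifies the verb and the noun and prepositional phrases around it, but nothing in it says what “ate” or “tea” mean.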
By semantics, we mean a set of objects (people and things) and a set of predicates (attributions and processes) that summarise a sentence's meaning. For this sentence, the semantics are:
In such a statement of semantics, each object and predicate has significance external to the text; for example, the predicate “Eat(.)” corresponds to the action, carried out by animate objects, of ingesting food, and the object referred to by “I” is a person. The extraction of deep semantics requires the estimation of word senses, and to do this we need to create an extensive corpus of word-sense-annotated text; hence the need for this project. For more background information see Grammars and Syntactic Methods.

The Project
The project will take place over about 10 weeks and will involve the implementation of context-sensitive word sense disambiguation algorithms, including, for example, those of Lesk (1986), Yarowsky (1992) and Dorow and Widdows (2003) (see also [1], [2], [3], [4], [5]). The student will work closely with the Information Inference for Decision Support team (previously the Business Intelligence Group).

Skills Required

Bibliography
[1] Agirre, E. and
[2] Dorow,
[3]
[4] Resnik, P. 1997. Selectional preference and sense disambiguation. In Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How?,
[5] Yarowsky, D. 1992. Word-sense disambiguation using statistical models of

For further information about this project, e-mail:

Instructions for Applying for a Vacation Scholarship
For further human resources information, contact Danielle McNicol (telephone: (03) 9545 8036) or Yvonne Craig (telephone: (03) 9545 8009).

There are no selection criteria to address. Instead, you must provide evidence of why you would be the most appropriate applicant to be granted a scholarship. Attach this where the online application process asks you to attach your Selection Criteria.

There are 6 scholarships being offered at a number of locations. Please clearly state which project(s) you wish to be considered for (up to three preferences will be accepted). You will need to provide course transcripts at interview (you must have a credit average or better). Ensure you include a Curriculum Vitae. If you are unable to lodge your application online, you can fax your application (quoting reference number: 03/M28) to (02) 6276 6707 or post it to: CSIRO Careers Online

No applications received after the closing date of