CSIRO Australia

 

 

Computational Methods for Word Sense Disambiguation in English Text


Location: Adelaide

Supervisor: Prof. Daniel McMichael

Value: $525 p.w. + expenses

Project description

The main purpose of computational linguistics is to extract meaning from text in order to provide useful services, such as question answering and translation.  This project builds on CSIRO’s extensive work on combinatory categorial grammars for the automatic parsing of English text, which is building towards algorithms for extracting deep semantics from text (see below).  A key element of that work, examined by this project, is the automatic estimation of word senses in previously unseen text.  The outcome will be an investigation of selected existing algorithms for word sense disambiguation.  Such algorithms can extract the correct senses for the words in a sentence such as “The plane banked sharply.”, for example:

Word      Correct word sense                                            Incorrect word sense
The       determiner                                                    determiner
plane     aircraft                                                      plane as in geometry
banked    turned and rotated the aircraft on its axis towards the turn  paid in (as with a cheque)
sharply.  involving high rates of change of angular acceleration        as in “sharply delineated shape”

Background

Traditionally, grammars have been used to extract the syntactic structure of sentences and text fragments.  For example, using a very simple context-free grammar, the syntactic structure of the sentence "I ate doughnuts for tea" can be shown as a parse tree:

[Figure: parse tree of "I ate doughnuts for tea"]

The grammar we have used has the following syntactical categories (some of which are parts of speech):

V = verb

VP = verb phrase

N = noun

NP = noun phrase

S = sentence

PP = prepositional phrase

P = preposition

ART = article
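
As an illustration only, the sketch below shows how a small context-free grammar over these categories can produce a parse tree for the example sentence. The production rules, the lexicon, and the function name `parse` are assumptions made for this example (including the choice to tag "I" as N); they are not the rules used for the tree in the original figure.

```python
from functools import lru_cache

# Hypothetical production rules, assumed for illustration only.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("ART", "N")],
    "VP": [("V", "NP", "PP"), ("V", "NP")],
    "PP": [("P", "NP")],
}

# Hypothetical lexicon mapping each word to a single category.
LEXICON = {"I": "N", "ate": "V", "doughnuts": "N", "for": "P", "tea": "N", "the": "ART"}


def parse(tokens):
    """Return a nested-tuple parse tree rooted at S, or None if no parse exists."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(symbol, start, end):
        # A single token matches a category directly through the lexicon.
        if end - start == 1 and LEXICON.get(tokens[start]) == symbol:
            return (symbol, tokens[start])
        # Otherwise try each production rule for this symbol.
        for rhs in RULES.get(symbol, []):
            children = expand(rhs, start, end)
            if children is not None:
                return (symbol,) + children
        return None

    def expand(rhs, start, end):
        # Divide tokens[start:end] among the symbols on the right-hand side.
        if not rhs:
            return () if start == end else None
        first, rest = rhs[0], rhs[1:]
        # Leave at least one token for every remaining symbol.
        for mid in range(start + 1, end - len(rest) + 1):
            left = derive(first, start, mid)
            if left is None:
                continue
            right = expand(rest, mid, end)
            if right is not None:
                return (left,) + right
        return None

    return derive("S", 0, len(tokens))


tree = parse("I ate doughnuts for tea".split())
print(tree)
```

Each tuple in the result is a node: the first element is the category and the remaining elements are its children (or the word itself at a leaf). Under these assumed rules the prepositional phrase "for tea" attaches to the verb phrase.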

A set of production rules that define the possible branchings in a parse tree was used to construct the tree above.   While the parse tree identifies such items as verbs, verb phrases, subjects and objects, it does not identify the semantics of the sentence.  By semantics, we mean a set of objects (people and things) and a set of predicates (attributions and processes) that summarise its meaning.  In this sentence the semantics are:

Objects:

O1 = (ref = “I”, sing.)

O2 = (ref = “the doughnuts”, plur.)

O3 = (ref = “tea”, sing.)

Predicates:

P1 = Eat(subject = O1, object = O2, tense = past, attachment = P2)

P2 = For(argument = O3)
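
The objects and predicates listed above could be held in a simple data structure such as the following. This is a minimal sketch: the class names `Obj` and `Predicate` and their field names are assumptions chosen to mirror the notation above, not a representation prescribed by the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obj:
    """An object (person or thing) referred to in the text."""
    ref: str     # the referring expression
    number: str  # "sing." or "plur."


@dataclass(frozen=True)
class Predicate:
    """An attribution or process relating objects."""
    name: str
    subject: Optional[Obj] = None
    obj: Optional[Obj] = None
    argument: Optional[Obj] = None
    tense: Optional[str] = None
    attachment: Optional["Predicate"] = None


# The semantics of the example sentence, transcribed from the listing above.
o1 = Obj(ref="I", number="sing.")
o2 = Obj(ref="the doughnuts", number="plur.")
o3 = Obj(ref="tea", number="sing.")

p2 = Predicate(name="For", argument=o3)
p1 = Predicate(name="Eat", subject=o1, obj=o2, tense="past", attachment=p2)

print(p1)
```

Because P1 holds a reference to P2, the attachment of "for tea" to the eating event is recorded directly in the structure.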

In such a statement of semantics, each object and predicate has significance external to the text; for example, the predicate “Eat(.)” corresponds to the action of ingesting food that animate objects carry out, and the object referred to by “I” is a person.  The extraction of deep semantics requires the estimation of word senses, and to do this we need to create an extensive corpus of word-sense annotated text, hence the need for this project.

For more background information see Grammars and Syntactic Methods.

The Project

The project will run for about 10 weeks and will involve implementing context-sensitive word sense disambiguation algorithms including, for example, those of Lesk (1986), Yarowsky (1992), and Dorow and Widdows (2003); see [1], [2], [3], [4], [5].  The student will work closely with the Information Inference for Decision Support team (previously the Business Intelligence Group) in Adelaide, and will be responsible for writing a report on the findings.  There will be the opportunity to experiment and look for improvements in the algorithms investigated.
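
To give a flavour of the kind of algorithm involved, the following is a minimal gloss-overlap sketch in the spirit of Lesk (1986), applied to the sentence “The plane banked sharply.” The sense inventory `SENSES`, its glosses, the stopword list, and the function names are all assumptions invented for this example; a real implementation would draw its glosses from a machine-readable dictionary.

```python
import string

# Small stopword list, assumed for this example.
STOPWORDS = {"a", "an", "the", "in", "on", "of", "or", "with", "into",
             "while", "through", "which", "and"}

# Toy sense inventory with hypothetical glosses (not from a real dictionary).
SENSES = {
    "plane": {
        "aircraft": "a powered flying vehicle with fixed wings",
        "geometry": "a flat surface in geometry on which a straight line "
                    "joining two points lies",
    },
    "banked": {
        "turned": "tilted an aircraft sideways while flying through a turn",
        "deposited": "paid money or a cheque into an account",
    },
}


def content_words(text):
    """Lower-case the text, strip punctuation, and drop stopwords."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def lesk(target, sentence):
    """Choose the sense of `target` whose gloss overlaps most with the context."""
    context = [w for w in content_words(sentence) if w != target]
    # Pool the context words together with every gloss of every context word.
    pool = set(context)
    for word in context:
        for gloss in SENSES.get(word, {}).values():
            pool |= content_words(gloss)
    # Score each candidate sense by the number of words its gloss shares with the pool.
    scores = {sense: len(content_words(gloss) & pool)
              for sense, gloss in SENSES[target].items()}
    return max(scores, key=scores.get), scores


sentence = "The plane banked sharply."
print(lesk("plane", sentence))
print(lesk("banked", sentence))
```

With these toy glosses, the shared word "flying" is what tips each choice towards the aviation sense. Real dictionaries give much sparser overlaps, which is one reason the later algorithms cited above bring in statistical evidence from corpora.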

Skills Required

  • Strong mathematical background
  • Some knowledge of programming algorithms
  • Good written communication skills
  • Solid IT skills (on either Windows or UNIX platforms)

Bibliography

[1]    Agirre, E. and Martinez, D. 2001. Learning class-to-class selectional preferences. In Proceedings of the ACL CoNLL Workshop, Toulouse, France.

[2]    Dorow, B. and Widdows, D. 2003. Discovering corpus-specific word senses. In EACL 2003 Conference Companion (research notes and demos), Budapest, Hungary, pages 79-82.

[3]    Lesk, M.E. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the SIGDOC Conference.

[4]    Resnik, P. 1997. Selectional preference and sense disambiguation. In Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How?, Washington, D.C., USA.

[5]    Yarowsky, D. 1992. Word-sense disambiguation using statistical models of Roget’s categories. In Proceedings of COLING-92, Nantes, France.

 

For further information about this project, e-mail:
Daniel.McMichael@csiro.au

Instructions for Applying for a Vacation Scholarship

For further human resources information, telephone Danielle McNicol on (03) 9545 8036 or Yvonne Craig on (03) 9545 8009.

There are no selection criteria to address.  Instead, you must provide evidence as to why you would be the most appropriate applicant to be granted a scholarship.  Attach this evidence when the online application process asks you to attach your Selection Criteria.

There are 6 scholarships being offered at a number of locations.

Please clearly state which project/s you wish to be considered for (up to three preferences will be accepted).

You will need to provide course transcripts at interview (you must have a credit average or better).

Ensure you include a Curriculum Vitae.

If you are unable to lodge your application online, you can fax it (quoting reference number: 03/M28) to (02) 6276 6707 or alternatively post it to:

CSIRO Careers Online
PO Box 225
DICKSON  ACT  2602       

Applications received after the closing date of 10 October 2003 will not be considered.

Apply On-line Now!

 

© Copyright 1997-2003, CSIRO Australia
Use of this web site and information available from
it is subject to our
Legal Notice and Disclaimer and Privacy Statement
