|
Business Intelligence - Research
|
Combinatory Categorial Grammars
|

Combinatory, categorial
grammars
A combinatory, categorial
grammar (CCG) is essentially a set of logical rules which indicate how
pairs of functional elements may be combined. The grammar consists of
categories, which specify the type of each element, and combinators, which
dictate how categories may be combined to form new categories.
The categories,
e.g. X, X/Y, X\Y, (X\Y)/Z, etc., stipulate whether each element is a
function or an argument. An argument will have an atomic category, e.g. X,
which contains no internal structure (i.e. no forward or backward
slashes). For functional categories, the number, types and location of the
arguments of the function are specified explicitly by the directions of
the slashes and the positions of the sub-categories. Thus, the category
(X\Y)/Z indicates that an element with category Z should appear to the
right of the functional element, and that an element with category Y
should appear to the left of the functional element. Given these two
arguments, the result will be a composite element with category X. An
example of such categories is given for the sentence:
He sat down.
N (S\N)/P P
The final atomic category, S,
in (S\N)/P indicates that the result will be a complete sentence.
The combinators,
e.g. >, <, >B, etc., stipulate whether or not two categories may be
combined, and also help determine the result of such a combination. Some
typical rules are:
Forward application: X/Y
Y => X
using >
Backward application: Y
X\Y => X
using <
Forward composition: X/Y Y/Z
=> X/Z using >B
The example sentence above
obeys this simple grammar:
He sat down.
N
(S\N)/P P
------------->
S\N
-----------------<
S
The above structure is one
representation of a parse of the likely syntax of
the sentence.
Lexicalisation
The CCG parse of a sequence of
elements (e.g. a sentence) provides structure, but it does not provide much
towards an interpretation of the sequence. Some of the meaning of the sequence
lies in the underlying nature of the elements themselves. For instance, in the
example sentence above the verb sat provides most of the meaning of the
sentence, and one possible interpretation might be given by the predicate sat
(he, down).
This syntactical approach to
semantics relies on a lexicalisation of the CCG, in which the rules are
extended to make allowances for the forms of the elements themselves. In such an
LCCG, the above rules would become:
Forward application: X/Y:
f(.)
Y: a => X:
f(a) using >
Backward application: Y: a X\Y:
f(.) => X: f(a)
using <
Forward composition: X/Y: f(.) Y/Z:
g(.) => X/Z: f(g(.)) using >B
where the symbol a indicates
an argument, and f(.) and g(.) are functions. The values of a,
f and g are usually derived from the lexical names of the
elements. In the sentence above, the lexical names might just be the words
themselves (or simplifications of the words). Possible LCCG parses of the
sentence might then be:
He sat
down.
N: he
(S\N)/P: sat P: down
------------------------->
S\N:
sat (down)
---------------------------------<
S: (sat (down)) (he)
or
He sat down.
N: he
(S\N)/P: sat(., .) P: down
------------------------------->
S\N: sat (., down)
------------------------------------<
S: sat (he, down)
Observe that the syntactico-semantics
of LCCG parses (e.g. the two interpretations above) are not uniquely defined by
the rules of the grammar alone.
Context sensitivity
**** I'm here ****
ACME builds and its subsidiary markets the widgets.
Difficulties
arise in automatically identifying the referent of its and the product
that ACME builds.
In view of the problems caused by the grammatical
structure, identification of the syntax should be an aid to the information
extraction. The current leading edge parsers use context free grammars. The
analysis produced by such a system is illustrated below. It’s clear that the
link between ACME Inc. and the widgets it manufactures has not been identified.

A context
free parse
Combinatory Categorial Grammars (CCG) are a mildly
context-sensitive lexicalised grammar formalism that offers transparent
semantics and simple accounts of some of the more complicated grammatical
structures (relative clauses, coordination and subject and object
extraction. A CCG derivation of the same sentence at the
top of this page shows how the more structured tags and sophisticated
rules used ensure that both relationships expressed in the sentence are
reflected in the analysis.
Parsers for context-sensitive grammars have higher
computational complexity than those for context free grammars; O(n6)
compared O(n3) for a sentence containing n words.
However, the use of a probabilistic model of the grammar enables the selective
use of the context-sensitive parts of the grammar and therefore efficient
parsers. At the highest level the joint probability of a n-word,
sentence, S, and its n-level derivation,,
, can be written:
Using such a model the Business Intelligence Group has
developed, implemented and patented and patented a Probabilistic Lexicalised
parser for a Combinatory Categorial Grammar, which we will use both as a
research tool and in appplications.

To
top
last updated September 29, 2003 12:44 AM
Geoff.Jarrad@csiro.au
|