Go to CSIRO Corporate website
Mathematical and Information Sciences Smarter Information Use
null
null HOME Research Media Careers Contacts Events Site Map Productsnavigation null null
null

 

 

Business Intelligence
Projects
Expertise
Papers
Reports
Talks
Seminars
Interactions
Resources
Publicity
People
Location
Contact
Business Intelligence - Research

Combinatory Categorial Grammars

Combinatory, categorial grammars

A combinatory, categorial grammar (CCG) is essentially a set of logical rules which indicate how pairs of functional elements may be combined. The grammar consists of categories, which specify the type of each element, and combinators, which dictate how categories may be combined to form new categories.

The categories, e.g. X, X/Y, X\Y, (X\Y)/Z, etc., stipulate whether each element is a function or an argument. An argument will have an atomic category, e.g. X, which contains no internal structure (i.e. no forward or backward slashes). For functional categories, the number, types and location of the arguments of the function are specified explicitly by the directions of the slashes and the positions of the sub-categories. Thus, the category (X\Y)/Z indicates that an element with category Z should appear to the right of the functional element, and that an element with category Y should appear to the left of the functional element. Given these two arguments, the result will be a composite element with category X. An example of such categories is given for the sentence:

                            He    sat    down.

                            N    (S\N)/P    P

The final atomic category, S, in (S\N)/P indicates that the result will be a complete sentence.

The combinators, e.g. >, <, >B, etc., stipulate whether or not two categories may be combined, and also help determine the result of such a combination. Some typical rules are:

                            Forward application:      X/Y     Y       =>    X       using    >

                            Backward application:    Y        X\Y    =>    X       using    <

                            Forward composition:     X/Y    Y/Z    =>    X/Z    using    >B

The example sentence above obeys this simple grammar:

                            He     sat      down.

                            N    (S\N)/P    P
                                   ------------->
                                           S\N
                             -----------------<
                                        S

The above structure is one representation of a parse of the likely syntax of the sentence.

Lexicalisation

The CCG parse of a sequence of elements (e.g. a sentence) provides structure, but it does not provide much towards an interpretation of the sequence. Some of the meaning of the sequence lies in the underlying nature of the elements themselves. For instance, in the example sentence above the verb sat provides most of the meaning of the sentence, and one possible interpretation might be given by the predicate sat (he, down).

This syntactical approach to semantics relies on a lexicalisation of the CCG, in which the rules are extended to make allowances for the forms of the elements themselves. In such an LCCG, the above rules would become:

                            Forward application:      X/Y: f(.)     Y: a         =>    X: f(a)           using    >

                            Backward application:    Y: a           X\Y: f(.)   =>    X: f(a)           using    <

                            Forward composition:     X/Y: f(.)    Y/Z: g(.)   =>    X/Z: f(g(.))    using    >B

where the symbol a indicates an argument, and f(.) and g(.) are functions. The values of a, f and g are usually derived from the lexical names of the elements. In the sentence above, the lexical names might just be the words themselves (or simplifications of the words). Possible LCCG parses of the sentence might then be:

                            He          sat                 down.

                            N: he    (S\N)/P: sat   P: down
                                        ------------------------->
                                                S\N: sat (down)
                             ---------------------------------<
                                        S: (sat (down)) (he)

or

                            He              sat                 down.

                            N: he    (S\N)/P: sat(., .)    P: down
                                        ------------------------------->
                                                S\N: sat (., down)
                             ------------------------------------<
                                        S: sat (he, down)

Observe that the syntactico-semantics of LCCG parses (e.g. the two interpretations above) are not uniquely defined by the rules of the grammar alone.

Context sensitivity

**** I'm here ****

ACME builds and its subsidiary markets the widgets.

Difficulties arise in automatically identifying the referent of its and the product that ACME builds.

In view of the problems caused by the grammatical structure, identification of the syntax should be an aid to the information extraction. The current leading edge parsers use context free grammars. The analysis produced by such a system is illustrated below. It’s clear that the link between ACME Inc. and the widgets it manufactures has not been identified.

Context-free derivation of "ACME builds and its subsidiary markets the widgets"

A context free parse

 

Combinatory Categorial Grammars (CCG) are a mildly context-sensitive lexicalised grammar formalism that offers transparent semantics and simple accounts of some of the more complicated grammatical structures (relative clauses, coordination and subject and object extraction. A CCG derivation of the same sentence at the top of this page shows how the more structured tags and sophisticated rules used ensure that both relationships expressed in the sentence are reflected in the analysis.

Parsers for context-sensitive grammars have higher computational complexity than those for context free grammars; O(n6) compared O(n3) for a sentence containing n words. However, the use of a probabilistic model of the grammar enables the selective use of the context-sensitive parts of the grammar and therefore efficient parsers. At the highest level the joint probability of a n-word, sentence, S, and its n-level derivation,, , can be written:

Using such a model the Business Intelligence Group has developed, implemented and patented and patented a Probabilistic Lexicalised parser for a Combinatory Categorial Grammar, which we will use both as a research tool and in appplications.

 


Up arrowTo top

last updated September 29, 2003 12:44 AM
Geoff.Jarrad@csiro.au

© Copyright 2010, CSIRO Australia
Use of this web site and information available from
it is subject to our
Legal Notice and Disclaimer and Privacy Statement