Lexicon - Manual of the XDG Development Kit

4.6 Lexicon

In this section, we explain how to write the lexicon of an XDG grammar. The XDG lexicon is a mapping from words to sets of lexical entries for that word. Lexical entries can be constructed using a hierarchy of parametrized lexical classes.

4.6.1 Disjunction

The grammar file compiler supports a kind of disjunction to express lexical generalizations. The idea stems from Marie-Helene Candito's work on Metagrammar for LTAG (Generating an LTAG out of a Principle-based Hierarchical Representation, References). Disjunction corresponds to Candito's notion of crossings. The Metagrammar approach is pursued further by Benoit Crabbe and Denys Duchier (A Metagrammatical Formalism for Lexicalized TAGs, Lexical Classes for Structuring the Lexicon of a TAG, Metagrammar Redux), and it was Denys Duchier who had the idea of incorporating crossings, in the guise of disjunction, into the XDK grammar file compiler (References).

4.6.2 Defining lexical entries

A lexical entry is divided into entry dimensions, one for each dimension used by the grammar. The type of the entry dimension for a dimension d is the entry type for d. In the following, we call the value of the entry dimension for a dimension d the d entry. Each lexical entry must define the word feature on the lex dimension; this feature is the key of the lexical entry.

In the UL, a lexical entry is written as follows:

     defentry {
       dim d_1 <term_1>
       ...
       dim d_n <term_n> }

where <term_i> is the value of the entry dimension for dimension d_i, i.e. the d_i entry (1<=i<=n).
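The role of the word feature as key can be illustrated outside the UL. The following Python sketch is not part of the XDK; it models a lexical entry as a nested dictionary (dimension name to record) and the lexicon as a mapping from words to lists of entries. The names entry_key and build_lexicon, and the contents of the two sample entries, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def entry_key(entry):
    """Return the key of an entry: the word feature on the lex dimension."""
    try:
        return entry["lex"]["word"]
    except KeyError:
        raise ValueError("a lexical entry must define 'word' on the 'lex' dimension")


def build_lexicon(entries):
    """Group entries by key, giving a mapping from words to lists of entries."""
    lexicon = defaultdict(list)
    for entry in entries:
        lexicon[entry_key(entry)].append(entry)
    return dict(lexicon)


# Two hypothetical, heavily abbreviated entries that share the key "der".
entries = [
    {"lp": {"on": {"d"}}, "lex": {"word": "der"}},
    {"lp": {"on": {"n"}}, "lex": {"word": "der"}},
]
lexicon = build_lexicon(entries)
print(len(lexicon["der"]))  # both entries are filed under the same word
```

An entry without a lex dimension, or without the word feature on it, is rejected with a ValueError in this sketch; how the XDK compiler itself reports such an entry is not covered here.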

4.6.2.1 Example (lexical entry)

Below, we show an example lexical entry. It follows the type definitions of our example grammar file Grammars/Acl01.ul:

     defentry {
       dim id {in: {det}
               out: {}
               agrs: ($ fem & (dat|gen) & sg & def)
               agree: {}
               govern: {det: $ ()
                        subj: $ ()
                        obj: $ ()
                        vbse: $ ()
                        vprt: $ ()
                        vinf: $ ()
                        prt: $ ()}}
       dim lp {in: {df}
               out: {}
               on: {d}}
       dim idlp {blocks: {}
                 end: {d: {}
                       df: {}
                       n: {}
                       mf: {}
                       vcf: {}
                       p: {}
                       pf: {}
                       v: {}
                       vxf: {}}}
       dim lex {word: "der"}}

The id entry sets the in field to the value {det}, i.e. a singleton set containing the constant det. It sets the out field to {}, i.e. the empty set. The agrs field is set to ($ fem & (dat|gen) & sg & def), which is a set generator expression. We explain set generator expressions in Types reference. Suffice it to say here that set generator expressions describe sets of tuples of a certain type, using set generator conjunction & and set generator disjunction |. Here, the set generator expression describes all tuples with feminine gender (fem), either dative or genitive case ((dat|gen)), singular number (sg), and definiteness (def). The agree field is set to the empty set, and the govern field to a record which maps each edge label on the id dimension to an empty set generator expression. The empty set generator expression denotes all possible tuples of the corresponding type.

The lp entry sets the in field to value {df}, out to {}, and on to {d}.

The idlp entry sets the blocks field to {}. The end field is set to a record which maps each edge label on the lp dimension to the empty set.

The value of word on the lex dimension is “der”, i.e. the dimension record for the lex dimension sets the key for the entire lexical entry to “der”.
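To make the meaning of the set generator expression in the id entry more concrete, here is a small Python sketch (not XDK code) that enumerates the tuples such an expression describes. It assumes a tuple type with four components (gender, case, number, definiteness) and the domains listed in DOMAINS; these domains are an illustrative assumption and are not taken from Grammars/Acl01.ul. A generator is modelled as a list of conjuncts, each conjunct being a set of alternative constants.

```python
from itertools import product

# Assumed component domains (illustrative only, not the Acl01 type definitions).
DOMAINS = [
    ("masc", "fem", "neut"),       # gender
    ("nom", "gen", "dat", "acc"),  # case
    ("sg", "pl"),                  # number
    ("def", "indef"),              # definiteness
]


def expand(generator):
    """Return all tuples over DOMAINS that satisfy every conjunct.

    A conjunct is a set of constants and is satisfied by a tuple that
    contains at least one of them (set generator disjunction).
    """
    return {
        t for t in product(*DOMAINS)
        if all(alternatives & set(t) for alternatives in generator)
    }


# ($ fem & (dat|gen) & sg & def)
der_fem = [{"fem"}, {"dat", "gen"}, {"sg"}, {"def"}]
print(sorted(expand(der_fem)))

# $ (): no conjuncts, so every tuple of the type qualifies.
print(len(expand([])))
```

Under these assumptions the first expression denotes exactly two tuples (feminine dative and feminine genitive, both singular and definite), while the empty generator denotes all 48 tuples of the assumed type.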

4.6.3 Defining lexical classes

Lexical entries can be built more conveniently using lexical classes. A lexical class is like a lexical entry, except that the value of the feature word on the lex dimension does not have to be defined. Instead, each lexical class has a unique class identifier. In the UL, a lexical class is written as follows:

     defclass <constant> {
       dim d_1 <term_1>
       ...
       dim d_n <term_n> }

where the constant is the class identifier, and <term_i> is the d_i entry (1<=i<=n).

4.6.3.1 Example (lexical class)

Here is an example lexical class:

     defclass "det" {
       dim id {in: {det}
               out: {}
               agrs: ($ fem & (dat|gen) & sg & def)
               agree: {}
               govern: {det: $ ()
                        subj: $ ()
                        obj: $ ()
                        vbse: $ ()
                        vprt: $ ()
                        vinf: $ ()
                        prt: $ ()}}
       dim lp {in: {df}
               out: {}
               on: {d}}
       dim idlp {blocks: {}
                 end: {d: {}
                       df: {}
                       n: {}
                       mf: {}
                       vcf: {}
                       p: {}
                       pf: {}
                       v: {}
                       vxf: {}}}
       dim lex {word: "der"}}

The only difference from the lexical entry above is that the class has the identifier “det” in addition to its key “der”.

4.6.3.2 Class parameters

Classes can introduce an arbitrary number of variables called class parameters.

In the UL, class parameters are introduced after the class identifier and must begin with an upper case letter:

     defclass <constant> <variable_1> ... <variable_m> {
       dim d_1 <term_1>
       ...
       dim d_n <term_n> }

where each <variable_j> (1<=j<=m) is a class parameter.

4.6.3.3 Example (lexical class with parameters)

Here is an example of a class with class parameters:

     defclass "det" Word Agrs {
       dim id {in: {det}
               out: {}
               agrs: Agrs
               agree: {}
               govern: {det: $ ()
                        subj: $ ()
                        obj: $ ()
                        vbse: $ ()
                        vprt: $ ()
                        vinf: $ ()
                        prt: $ ()}}
       dim lp {in: {df}
               out: {}
               on: {d}}
       dim idlp {blocks: {}
                 link: {d: {}
                        df: {}
                        n: {}
                        mf: {}
                        vcf: {}
                        p: {}
                        pf: {}
                        v: {}
                        vxf: {}}}
       dim lex {word: Word}}

The lexical class has two parameters, Word and Agrs. Word is the value of the word feature on the lex dimension, and Agrs is the value of the agrs feature on the id dimension.

4.6.4 Using lexical classes

Lexical classes can be used to construct other lexical classes or to construct lexical entries. All parameters of a class must be instantiated upon use.

In the UL, a class use is written as follows:

     useclass <constant> {
       <variable_1> : <term_1>
       ...
       <variable_m> : <term_m> }

where the constant is the class identifier, and parameter <variable_j> is bound to <term_j> (1<=j<=m).

Notice that you can omit the useclass keyword for convenience.

4.6.4.1 Example (class use)

In the example below, we construct a lexical entry for the word “der” using the lexical class det defined above (note that we omit the useclass keyword here):

     defentry {
       "det" {Word: "der"
              Agrs: ($ fem & (dat|gen) & sg & def)}}

The resulting lexical entry is identical to the lexical entry in the example given above.
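Class use can be thought of as substituting terms for parameters in the class body. The Python sketch below is a rough model of this idea and not the XDK implementation: Param, substitute and use_class are hypothetical names, the class body is abbreviated (the govern field and the idlp dimension are left out), and the agreement value is kept as an opaque string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    """Placeholder for a class parameter such as Word or Agrs."""
    name: str


def substitute(term, bindings):
    """Replace every Param in a nested record by the term bound to it."""
    if isinstance(term, Param):
        return bindings[term.name]
    if isinstance(term, dict):
        return {feature: substitute(value, bindings) for feature, value in term.items()}
    return term


def use_class(lex_class, bindings):
    """Instantiate a class; every parameter must be bound upon use."""
    params, body = lex_class
    missing = set(params) - set(bindings)
    if missing:
        raise ValueError(f"uninstantiated class parameters: {sorted(missing)}")
    return substitute(body, bindings)


# Abbreviated model of the class "det" with parameters Word and Agrs.
DET = (
    ("Word", "Agrs"),
    {
        "id": {"in": {"det"}, "out": set(), "agrs": Param("Agrs"), "agree": set()},
        "lp": {"in": {"df"}, "out": set(), "on": {"d"}},
        "lex": {"word": Param("Word")},
    },
)

entry = use_class(DET, {"Word": "der", "Agrs": "($ fem & (dat|gen) & sg & def)"})
print(entry["lex"]["word"])  # der
```

Leaving out a binding, for example calling use_class with only Word, raises a ValueError in this model, mirroring the rule that all parameters of a class must be instantiated upon use.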

4.6.5 Disjunction

The XDK grammar file compiler supports disjunction as a powerful tool to model lexical generalizations. If a value can be either A or B, you write exactly that: A or B. When building the lexicon, the XDK grammar file compiler compiles out all possibilities into separate lexical entries.

In the UL, disjunction is written using the | operator.

4.6.5.1 Example (disjunction of set generator expressions)

In the example below, we use disjunction to express that the determiner “der” in German can have three different agreement values:

     defentry {
       "det" {Word: "der"
              Agrs: (($ masc & nom & sg & def) |
                     ($ fem & (dat|gen) & sg & def) |
                     ($ gen & pl & def))}}

That is, the agreement value is either ($ masc & nom & sg & def), ($ fem & (dat|gen) & sg & def), or ($ gen & pl & def). In the resulting lexicon, the expression above yields three lexical entries, differing only in the value of their agreement (i.e. the value of the feature agrs on the id dimension). Notice that the | operator within the second set generator expression, ($ fem & (dat|gen) & sg & def), stands for set generator disjunction, which is a different form of disjunction that applies inside set generators. Set generator disjunction does not yield additional lexical entries.
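The compiling out of disjunctions can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the algorithm used by the XDK grammar file compiler: the class Or and the function compile_out are hypothetical, entries are nested dictionaries, and set generator expressions are kept as opaque strings so that the | inside them is never expanded.

```python
from itertools import product


class Or:
    """A disjunction of alternative terms (the UL | operator between terms)."""

    def __init__(self, *alternatives):
        self.alternatives = alternatives


def compile_out(term):
    """Return the list of disjunction-free terms described by term."""
    if isinstance(term, Or):
        # Each alternative contributes its own compiled-out variants.
        return [variant for alt in term.alternatives for variant in compile_out(alt)]
    if isinstance(term, dict):
        # Combine the variants of all fields in every possible way.
        features = list(term)
        choices = [compile_out(term[feature]) for feature in features]
        return [dict(zip(features, combo)) for combo in product(*choices)]
    return [term]


# Abbreviated model of the entry for "der" with three agreement alternatives.
entry = {
    "id": {"agrs": Or("($ masc & nom & sg & def)",
                      "($ fem & (dat|gen) & sg & def)",
                      "($ gen & pl & def)")},
    "lex": {"word": "der"},
}
entries = compile_out(entry)
print(len(entries))  # 3
```

The three resulting entries share the key “der” and differ only in the agrs value. Because the second alternative stays a single string, its inner (dat|gen) does not multiply the number of entries, in line with the distinction drawn above.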