In this section, we explain how to write the lexicon of an XDG grammar. The XDG lexicon maps each word to a set of lexical entries for that word. Lexical entries can be constructed using a hierarchy of parametrized lexical classes.
The grammar file compiler supports a kind of disjunction to express lexical generalizations. The idea stems from Marie-Helene Candito's work on Metagrammar for LTAG (Generating an LTAG out of a Principle-based Hierarchical Representation; see References). Disjunction corresponds to Candito's notion of crossings. The Metagrammar approach has been pursued further by Benoit Crabbe and Denys Duchier (A Metagrammatical Formalism for Lexicalized TAGs, Lexical Classes for Structuring the Lexicon of a TAG, Metagrammar Redux), and it was Denys Duchier who had the idea of incorporating crossings, in the guise of disjunction, into the XDK grammar file compiler (see References).
A lexical entry is divided into entry dimensions, one for each dimension used in the grammar. The type of the entry dimension for dimension d equals the entry type for d. In the following, we call the value of the entry dimension for a dimension d the d entry.
Each lexical entry must define the word feature on the lex dimension. This feature is the key of the lexical entry.
In the UL, a lexical entry is written as follows:
defentry { dim d_1 <term_1> ... dim d_n <term_n> }
where <term_i> is the value of the entry dimension for dimension d_i, i.e. the d_i entry (1<=i<=n).
Below, we show an example lexical entry. It follows the type definitions of our example grammar file Grammars/Acl01.ul:
defentry {
  dim id {in: {det}
          out: {}
          agrs: ($ fem & (dat|gen) & sg & def)
          agree: {}
          govern: {det: $ () subj: $ () obj: $ () vbse: $ () vprt: $ () vinf: $ () prt: $ ()}}
  dim lp {in: {df}
          out: {}
          on: {d}}
  dim idlp {blocks: {}
            link: {d: {} df: {} n: {} mf: {} vcf: {} p: {} pf: {} v: {} vxf: {}}}
  dim lex {word: "der"}}
The id entry sets the in field to the value {det}, i.e. a singleton set containing the constant det. It sets the out field to the value {}, i.e. the empty set. The agrs field is set to ($ fem & (dat|gen) & sg & def), which is a set generator expression.
We explain set generator expressions in Types reference. Suffice it to say here that set generator expressions describe sets of tuples of a certain type, using set generator conjunction & and set generator disjunction |. Here, the set generator expression describes all tuples with feminine gender (fem), either dative or genitive case ((dat|gen)), singular number (sg), and definiteness (def). The agree field is set to the empty set, and the govern field to a record which maps each edge label on the id dimension to the empty set generator expression $ (). The empty set generator expression denotes all possible tuples of the corresponding type.
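To make the semantics of set generator expressions concrete, here is a small Python model. It is a sketch, not the XDK's implementation, and it assumes a hypothetical agreement type with four axes (gender, case, number, definiteness) and made-up value lists; a real grammar such as Acl01.ul declares its own axes and values. A constant denotes all tuples carrying that value, & intersects, | unions, and the empty expression $ () denotes every tuple.

```python
from itertools import product

# Hypothetical agreement type: the axes and values are assumptions for
# illustration; a real grammar defines them in its type declarations.
GENDER = ("masc", "fem", "neut")
CASE = ("nom", "gen", "dat", "acc")
NUMBER = ("sg", "pl")
DEFINITENESS = ("def", "indef", "undef")
ALL_TUPLES = frozenset(product(GENDER, CASE, NUMBER, DEFINITENESS))


def const(value):
    """A constant such as `fem` denotes every tuple that carries that value.

    This works because no value occurs on two different axes in this toy type.
    """
    return frozenset(t for t in ALL_TUPLES if value in t)


def conj(*exprs):
    """Set generator conjunction `&`: intersection; no operands models `$ ()`."""
    result = ALL_TUPLES
    for e in exprs:
        result = result & e
    return result


def disj(*exprs):
    """Set generator disjunction `|`: union of the denoted sets."""
    return frozenset().union(*exprs)


# ($ fem & (dat|gen) & sg & def)
der_fem = conj(const("fem"), disj(const("dat"), const("gen")),
               const("sg"), const("def"))
for t in sorted(der_fem):
    print(t)
# ('fem', 'dat', 'sg', 'def')
# ('fem', 'gen', 'sg', 'def')

# $ () : the empty set generator expression denotes all tuples of the type
print(len(conj()))  # 72 = 3 * 4 * 2 * 3
```

Under these assumed axes, ($ fem & (dat|gen) & sg & def) stands for exactly two tuples, while $ () in the govern record places no restriction at all.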
The lp entry sets the in field to {df}, the out field to {}, and the on field to {d}.
The idlp entry sets the blocks field to {}. The link field is set to a record which maps each edge label on the lp dimension to the empty set.
The value of word on the lex dimension is “der”; i.e. the record for the lex dimension sets the key of the entire lexical entry to “der”.
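The role of the word feature as the key can be sketched in Python. This is an illustrative model only: the dict layout and the function name make_lexicon are hypothetical. Each entry is a dict from dimension names to dimension records, and the lexicon groups entries under the value of word on the lex dimension, rejecting any entry that lacks it. Lists stand in for the sets of entries because Python dicts are not hashable.

```python
from collections import defaultdict


def make_lexicon(entries):
    """Group lexical entries by their key, the `word` feature on the lex dimension.

    Hypothetical model: an entry is a dict mapping dimension names to records.
    """
    lexicon = defaultdict(list)
    for entry in entries:
        word = entry.get("lex", {}).get("word")
        if word is None:
            raise ValueError("lexical entry does not define word on the lex dimension")
        lexicon[word].append(entry)
    return dict(lexicon)


# Abbreviated version of the entry for "der" (several fields omitted).
der = {
    "id": {"in": {"det"}, "out": set(), "agree": set()},
    "lp": {"in": {"df"}, "out": set(), "on": {"d"}},
    "idlp": {"blocks": set()},
    "lex": {"word": "der"},
}

lexicon = make_lexicon([der])
print(list(lexicon))  # ['der']
```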
Lexical entries can be built more conveniently using lexical classes.
A lexical class is like a lexical entry, except that the value of the feature word on the lex dimension need not be defined. Instead, each lexical class has a unique class identifier.
In the UL, a lexical class is written as follows:
defclass <constant> { dim d_1 <term_1> ... dim d_n <term_n> }
where the constant is the class identifier, and <term_i> is the d_i entry (1<=i<=n).
Here is an example lexical class:
defclass "det" {
  dim id {in: {det}
          out: {}
          agrs: ($ fem & (dat|gen) & sg & def)
          agree: {}
          govern: {det: $ () subj: $ () obj: $ () vbse: $ () vprt: $ () vinf: $ () prt: $ ()}}
  dim lp {in: {df}
          out: {}
          on: {d}}
  dim idlp {blocks: {}
            link: {d: {} df: {} n: {} mf: {} vcf: {} p: {} pf: {} v: {} vxf: {}}}
  dim lex {word: "der"}}
The only difference from the lexical entry above is that the class has the identifier “det” in addition to its key “der”.
Classes can introduce an arbitrary number of variables called class parameters.
In the UL, class parameters are introduced after the class identifier and must begin with an upper case letter:
defclass <constant> <variable_1> ... <variable_m> { dim d_1 <term_1> ... dim d_n <term_n> }
where <variable_j> (1<=j<=m) are the class parameters.
Here is an example of a class with class parameters:
defclass "det" Word Agrs {
  dim id {in: {det}
          out: {}
          agrs: Agrs
          agree: {}
          govern: {det: $ () subj: $ () obj: $ () vbse: $ () vprt: $ () vinf: $ () prt: $ ()}}
  dim lp {in: {df}
          out: {}
          on: {d}}
  dim idlp {blocks: {}
            link: {d: {} df: {} n: {} mf: {} vcf: {} p: {} pf: {} v: {} vxf: {}}}
  dim lex {word: Word}}
The lexical class has two parameters, Word and Agrs. Word is the value of the word feature on the lex dimension, and Agrs is the value of the agrs feature on the id dimension.
Lexical classes can be used to construct other lexical classes or lexical entries. All parameters of a class must be instantiated whenever the class is used.
In the UL, a class use is written as follows:
useclass <constant> { <variable_1> : <term_1> ... <variable_m> : <term_m> }
where the constant is the class identifier, and parameter <variable_j> is bound to <term_j> (1<=j<=m).
Notice that the useclass keyword can be omitted for convenience.
In the example below, we construct a lexical entry for the word “der” using the lexical class “det” defined above (note that we omit the useclass keyword here):
defentry { "det" {Word: "der" Agrs: ($ fem & (dat|gen) & sg & def)}}
The resulting lexical entry is identical to the lexical entry in the example given above.
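Parametrized classes behave like templates. The Python sketch below models that behaviour; the names Var, substitute and use_class are hypothetical and this is not the grammar file compiler's code. A class body is a nested dict containing parameter placeholders, parameter names must start with an upper-case letter, and a class use that leaves any parameter unbound is rejected. The set generator expression is kept as an opaque string, since only substitution matters here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Placeholder for a class parameter (hypothetical model of UL variables)."""
    name: str

    def __post_init__(self):
        # The UL requires class parameters to begin with an upper-case letter.
        if not self.name[:1].isupper():
            raise ValueError(f"class parameter {self.name!r} must start upper-case")


def substitute(term, bindings):
    """Replace every Var inside nested dicts by the value bound to it."""
    if isinstance(term, Var):
        return bindings[term.name]
    if isinstance(term, dict):
        return {key: substitute(value, bindings) for key, value in term.items()}
    return term  # constants and sets of constants are returned unchanged


def use_class(params, body, bindings):
    """Model of a class use: every parameter must be instantiated."""
    missing = [p for p in params if p not in bindings]
    if missing:
        raise ValueError(f"unbound class parameters: {missing}")
    return substitute(body, bindings)


# Abbreviated model of: defclass "det" Word Agrs { ... }
DET_PARAMS = ("Word", "Agrs")
DET_BODY = {
    "id": {"in": {"det"}, "out": set(), "agrs": Var("Agrs"), "agree": set()},
    "lp": {"in": {"df"}, "out": set(), "on": {"d"}},
    "lex": {"word": Var("Word")},
}

AGRS = "($ fem & (dat|gen) & sg & def)"  # opaque string in this sketch
entry = use_class(DET_PARAMS, DET_BODY, {"Word": "der", "Agrs": AGRS})
print(entry["lex"]["word"], entry["id"]["agrs"])
```

Because substitution builds new dicts, the class body itself stays reusable for further class uses with different bindings.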
The XDK grammar file compiler supports disjunction as a powerful tool for modeling lexical generalizations. If a value can be either A or B, you can state exactly that: A or B. When building the lexicon, the XDK grammar file compiler compiles out all alternatives into separate lexical entries.
In the UL, disjunction is written using the | operator.
In the example below, we use disjunction to express that the determiner “der” in German can have three different agreement values:
defentry { "det" {Word: "der" Agrs: (($ masc & nom & sg & def) | ($ fem & (dat|gen) & sg & def) | ($ gen & pl & def))}}
I.e., the agreement value is either ($ masc & nom & sg & def), ($ fem & (dat|gen) & sg & def), or ($ gen & pl & def). In the resulting lexicon, the expression above yields three lexical entries, differing only in their agreement value (i.e. the value of the feature agrs on the id dimension). Notice that the | operator within the second set generator expression, ($ fem & (dat|gen) & sg & def), stands for set generator disjunction, which is a different form of disjunction that operates inside set generators. Set generator disjunction does not yield additional lexical entries.
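How lexical disjunction is compiled out can be modelled as a cross product over all alternatives. In this Python sketch (hypothetical names Alt and expand; not the compiler's actual algorithm), Alt marks a lexical disjunction, while set generator expressions are opaque strings, so the | inside them is never expanded, matching the distinction drawn above.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Alt:
    """Lexical disjunction `A | B | ...` over whole values (hypothetical model)."""
    options: tuple


def expand(term):
    """Compile out lexical disjunction: return all disjunction-free variants.

    Alternatives in different fields are crossed (cartesian product). Strings
    are opaque, so a `|` inside a set generator expression is never expanded.
    """
    if isinstance(term, Alt):
        return [variant for option in term.options for variant in expand(option)]
    if isinstance(term, dict):
        keys = list(term)
        choices = [expand(term[key]) for key in keys]
        return [dict(zip(keys, combo)) for combo in product(*choices)]
    return [term]


der = {
    "id": {"agrs": Alt((
        "($ masc & nom & sg & def)",
        "($ fem & (dat|gen) & sg & def)",  # set generator |: no extra entry
        "($ gen & pl & def)",
    ))},
    "lex": {"word": "der"},
}

for e in expand(der):
    print(e["id"]["agrs"])
```

With two independent disjunctions of two and three alternatives, the same function yields 2 * 3 = 6 entries, which mirrors the crossings mentioned at the start of this section.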