5.2 Formal Presentation

In this section, we briefly outline the formal presentation of dependency grammar presented in [Duc99a]. The basic idea is that each node of the dependency tree must be assigned a lexical entry from the lexicon, and that certain principles of well-formedness must be verified. For example, the lexical entry stipulates a valency, and the node must have precisely the outgoing dependency edges that realize this valency.

A dependency grammar is a 7-tuple

\TUP{\setof{Words},\setof{Cats},\setof{Args},\setof{Comps},\setof{Mods},\setof{Lexicon},\setof{Rules}}

\setof{Words}

a finite set of strings notating fully inflected forms of words.

\setof{Cats}

a finite set of categories such as n (noun), det (determiner), or vfin (finite verb).

\setof{Args}

a finite set of (what we call) agreement tuples such as <masc sing 3 nom>.

\setof{Comps}

a finite set of complement roles, such as subject or zu_infinitive.

\setof{Mods}

a finite set of modifier roles, such as adj (adjectives), disjoint from \setof{Comps}. We write \setof{Roles}=\setof{Comps}\uplus\setof{Mods} for the set of all types of roles.

\setof{Lexicon}

a finite set of lexical entries (see below).

\setof{Rules}

a finite family of binary predicates, indexed by role labels, expressing local grammatical principles: for each \rho\in\setof{Roles}, there is \Gamma_\rho\in\setof{Roles} such that \Gamma_\rho(w_1,w_2) characterizes the grammatical admissibility of a dependency edge labeled \rho from mother w_1 to daughter w_2.

A lexical entry is an attribute value matrix (AVM) with signature:

\avm{%
{string}{\setof{Words}}%
{cat}{\setof{Cats}}%
{agr}{\setof{Agrs}}%
{comps}{\mbox{$2^{\text{\setof{Comps}}}$}}}

and we write attribute access in functional notation. If e is a lexical entry, then \Feature{string}(e) is the full form of the corresponding word, \Feature{cat}(e) is the category, \Feature{agr}(e) is the agreement tuple, and \Feature{comps}(e) the valency expressed as a set of complement roles.

5.2.1 Dependency Tree

We assume given a set of nodes \VV which for simplicity we identify with the set of integers \{1,\ldots,n\}, and a set \EE\subseteq\VV\times\VV\times\setof{Roles} of labeled directed edges between these node. (\VV,\EE) is a directed graph, in the classical sense, with labeled edges. We restrict our attention to finite graphs that are also trees.

A dependency tree is then defined as a pair \TT\equiv((\VV,\EE),\Feature{entry}) of a tree as stipulated above and a function \Feature{entry} mapping each node in \VV to a lexical entry in \setof{Lexicon}.

5.2.2 Well-Formedness Principles

Not all dependency trees as described above are grammatically admissible. We now describe the conditions of admissibility, aka well-formedness.

While we have identified nodes with integers, we still prefer to write w_i (and often just w or w') instead of i to remind the reader that they correspond to words and to distinguish them from other occurrences of integers that have no such interpretation. We also write w\EDGE\rho w' to represent an edge labeled \rho from mother w to daughter w'.

First, any complement required by a node's valency must be realized precisely once:

\forall w\in\VV,\
\forall\rho\in\Feature{entry}(w),\
\exists! w'\in\VV,\ w\EDGE\rho w'\in\EE

Second, if there is an outgoing edge from w, then it must be labeled by a modifier role or by a complement role in w's valency:

\forall w\EDGE\rho w'\in\EE,\
\rho\in\setof{Mods}\UNION
\Feature{comps}(\Feature{entry}(w))

Third, whenever there is an edge w\EDGE\rho w', then the grammatical condition \Gamma_\rho(w,w') for \Gamma_\rho\in\setof{Rules} must be satisfied in \TT:

\forall w\EDGE\rho w'\in\EE,\
\TT\models\Gamma_\rho(w,w')


Denys Duchier
Version 1.2.0 (20010221)