In this section, we describe the syntax of User Language (UL) grammar files, using the Extended Backus Naur Form (EBNF) as defined in the XML specification of the W3C (see http://www.w3.org/TR/REC-xml#sec-notation).
In this section, we lay out the lexical syntax of the UL.
Here are the keywords of the UL:
<keyword> ::= args | attrs |
bool | bot |
card |
defattrstype | defclass | defdim | defentry |
defentrytype | defgrammar | deflabeltype | deftype |
dim | dims |
entry |
infty | int | ints | iset |
label | list |
vec |
output |
ref |
set | string |
top | tuple | tv |
useclass | usedim | useoutput | useprinciple |
valency
Here are the operators of the UL:
<operator> ::= { | } | ( | ) | * | & | '|' | @ | [ | ] | < | > |
$ | . | :: | _ | ^ | ! | ? | + | # | :
Identifiers consist of letters and the underscore:
<id> ::= [a-zA-Z_]+
Integers consist of one or more digits:
<int> ::= [0-9]+
Strings can be quoted using single quotes (<sstring>), double
quotes (<dstring>), or guillemet quotes (<gstring>). You
can freely choose between the different kinds of quotes. Inside the
quotes, you can write strings using any characters from the ISO 8859-1
character set. We write . for “any character from the ISO
8859-1 character set”:
<sstring> ::= '.+'
<dstring> ::= ".+"
<gstring> ::= «.+»
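The lexical classes above can be checked with a few regular expressions. The following Python sketch is only an illustration: the names TOKEN_RE and classify are hypothetical and not part of any UL tool, and it assumes that a quoted string cannot contain its own closing quote character (the rules above do not mention escapes). Keywords such as defdim also match <id> here; a real lexer would separate them in a later step.

```python
import re

# Token patterns following the lexical rules above.
# Assumption: a quoted string may not contain its own closing quote.
TOKEN_RE = re.compile(
    r"""
    (?P<id>[a-zA-Z_]+)        # <id>: letters and the underscore
  | (?P<int>[0-9]+)           # <int>: one or more digits
  | (?P<sstring>'[^']+')      # <sstring>: single quotes
  | (?P<dstring>"[^"]+")      # <dstring>: double quotes
  | (?P<gstring>«[^»]+»)      # <gstring>: guillemets
    """,
    re.VERBOSE,
)


def classify(text):
    """Return the lexical class of text, or None if it is not one token."""
    match = TOKEN_RE.fullmatch(text)
    return match.lastgroup if match else None


if __name__ == "__main__":
    for sample in ["lex_id", "42", "'a b'", '"x"', "«x»", "a1"]:
        print(sample, "->", classify(sample))
```

Note that a1 is rejected: according to the <id> rule, identifiers contain no digits, so such a name would have to be written as a quoted string.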
End of line comments are written using the percent symbol %.
Balanced comments start with /* and end with */.
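As a rough illustration of the two comment forms, the following Python sketch removes them from a source text. The helper name strip_comments is hypothetical. It assumes that balanced comments do not nest and that % and /* never occur inside string literals; neither point is settled by the description above.

```python
import re

# Assumptions: /* ... */ comments do not nest, and neither '%' nor '/*'
# appears inside a quoted string.
COMMENT_RE = re.compile(r"/\*.*?\*/|%[^\n]*", re.DOTALL)


def strip_comments(source):
    """Remove /* ... */ and % ... end-of-line comments from UL source text."""
    return COMMENT_RE.sub("", source)


if __name__ == "__main__":
    print(strip_comments("usedim lp % use the lp dimension\nusedim id"))
```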
Files can be included using the \input directive. For example,
to include the file Chorus_header.ul, you write:
\input "Chorus_header.ul"
In this section, we lay out the context-free syntax of the UL. We
write all keywords in lower case, and all non-terminals in upper case
letters. We use single quotes to escape the meta characters (,
), [, ], ?, *, +, #,
|, and the period (.).
The start symbol of our context-free grammar is S:
S ::= Defgrammar*
Here is the UL Syntax for grammar definitions:
Defgrammar ::= defdim Constant { Defdim* }
| defclass Constant Constant* { Class }
| defentry { Class* }
| usedim Constant
defdim Constant { Defdim* } defines a dimension with
identifier Constant, and dimension definitions Defdim*.
defclass Constant Constant* { Class } defines a lexical class
with identifier Constant, class variables Constant*, and
class body Class.
defentry { Class* } defines a lexical entry defined by
class bodies Class*.
usedim Constant uses the dimension with identifier
Constant.
Here is the UL syntax for dimension definitions:
Defdim ::= defattrstype Type
| defentrytype Type
| deflabeltype Type
| deftype Constant Type
| useprinciple Constant { Useprinciple* }
| output Constant
| useoutput Constant
defattrstype Type defines the attributes type Type.
defentrytype Type defines the entry type Type.
deflabeltype Type defines the label type Type.
deftype Constant Type defines the type Type with
identifier Constant.
useprinciple Constant { Useprinciple* } uses the principle
with identifier Constant and dimension and argument mappings
Useprinciple*.
output Constant chooses output Constant.
useoutput Constant uses output Constant.
Here is the UL syntax for principle use instructions:
Useprinciple ::= dims { VarTermFeat* }
| args { VarTermFeat* }
dims { VarTermFeat* } is the dimension mapping
VarTermFeat*.
args { VarTermFeat* } is the argument mapping
VarTermFeat*.
This is the UL syntax of types:
Type ::= { Constant* }
| set '(' Type ')'
| iset '(' Type ')'
| tuple '(' Type* ')'
| list '(' Type ')'
| valency '(' Type ')'
| { TypeFeat+ }
| { : }
| vec '(' Type_1 Type_2 ')'
| card
| int
| ints
| string
| bool
| ref '(' Constant ')'
| Constant
| label '(' Constant ')'
| tv '(' Constant ')'
| '(' Type ')'
{ Constant* } is a finite domain consisting of the
constants Constant*.
set '(' Type ')' is an accumulative set with domain Type.
iset '(' Type ')' is an intersective set with domain Type.
tuple '(' Type* ')' is a tuple with projections Type*.
list '(' Type ')' is a list with domain Type.
valency '(' Type ')' is a valency with domain Type.
{ TypeFeat+ } is a record with features TypeFeat+.
{ : } is the empty record.
vec '(' Type_1 Type_2 ')' is a vector with fields Type_1
and values of type Type_2.
card is a cardinality set.
int is an integer.
ints is a set of integers.
string is a string.
bool is a boolean.
ref '(' Constant ')' is a type reference to the type with
identifier Constant.
Constant is a shortcut for ref '(' Constant ')'.
label '(' Constant ')' is a label reference to the label type
on the dimension referred to by dimension variable Constant.
tv '(' Constant ')' is a type variable.
'(' Type ')' encapsulates type Type.
Here is the UL syntax of a lexical class body:
Class ::= dim Constant Term
| useclass Constant
| useclass Constant { VarTermFeat* }
| Constant
| Constant { VarTermFeat* }
| Class_1 & Class_2
| Class_1 '|' Class_2
| '(' Class ')'
dim Constant Term defines the entry Term for the
dimension with identifier Constant.
useclass Constant uses the lexical class with identifier
Constant.
Constant is a shortcut for useclass Constant.
useclass Constant { VarTermFeat* } uses the lexical class
with identifier Constant and class parameters
VarTermFeat*.
Constant { VarTermFeat* } is a shortcut for useclass
Constant { VarTermFeat* }.
Class_1 & Class_2 is the conjunction of Class_1 and
Class_2.
Class_1 '|' Class_2 is the disjunction of Class_1 and
Class_2.
'(' Class ')' brackets class Class.
Here is the UL syntax of terms:
Term ::= Constant
| Integer
| top
| bot
| Featurepath
| CardFeat
| { Term* }
| '[' Term* ']'
| { Recspec+ }
| { : }
| $ Setgen
| $ '(' ')'
| Term :: Type
| Term_1 & Term_2
| Term_1 '|' Term_2
| Term_1 @ Term_2
| '<' Term* '>'
| '(' Term ')'
Constant is a constant.
Integer is an integer.
top is lattice top.
bot is lattice bottom.
Featurepath is a feature path.
CardFeat is a cardinality specification.
{ Term* } is a set of the elements Term*.
'[' Term* ']' is a list of the elements Term* (in this
order).
{ Recspec+ } is a record with specification Recspec+.
{ : } is the empty record.
$ Setgen introduces a set generator expression with set
generator expression body Setgen.
$ '(' ')' is the empty set generator expression.
Term :: Type is a type annotation of term Term with type
Type.
Term_1 & Term_2 is the conjunction of Term_1 and
Term_2.
Term_1 '|' Term_2 is the disjunction of Term_1 and
Term_2.
Term_1 @ Term_2 is the concatenation
of Term_1 and Term_2. Concatenation is restricted to
strings.
'<' Term* '>' is an order generator specification of a list of
elements Term*.
'(' Term ')' brackets term Term.
Here is the UL syntax of feature paths:
Featurepath ::= Root '.' Constant '.' Aspect ('.' Constant)+
Root ::= _|^
Aspect ::= attrs|entry
Root '.' Constant '.' Aspect ('.' Constant)+ is a feature path
with root variable Root, dimension variable Constant,
aspect Aspect, and the list of fields ('.' Constant)+.
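To make the shape of a feature path concrete, here is a small Python sketch that splits one into its four parts. The names FeaturePath and parse_featurepath are hypothetical. For brevity the sketch assumes that every Constant is a plain identifier (<id>) rather than a quoted string, and that no whitespace appears around the dots.

```python
import re
from typing import NamedTuple, Tuple


class FeaturePath(NamedTuple):
    root: str                 # "_" or "^"
    dimension: str            # dimension variable
    aspect: str               # "attrs" or "entry"
    fields: Tuple[str, ...]   # one or more field names


# Root '.' Constant '.' Aspect ('.' Constant)+
# Assumption: constants are plain identifiers, no whitespace around dots.
FEATUREPATH_RE = re.compile(
    r"(?P<root>[_^])\.(?P<dim>[a-zA-Z_]+)\.(?P<aspect>attrs|entry)"
    r"(?P<fields>(?:\.[a-zA-Z_]+)+)"
)


def parse_featurepath(text):
    """Split a feature path into root, dimension, aspect, and fields."""
    match = FEATUREPATH_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a feature path: {text!r}")
    fields = tuple(match.group("fields").split(".")[1:])
    return FeaturePath(
        match.group("root"), match.group("dim"), match.group("aspect"), fields
    )


if __name__ == "__main__":
    print(parse_featurepath("_.D.entry.out"))
    print(parse_featurepath("^.lp.attrs.agr.num"))
```

A path such as _.D.entry is rejected because the grammar requires at least one field after the aspect.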
Here is the UL syntax of record specifications:
Recspec ::= TermFeat
| Recspec_1 & Recspec_2
| Recspec_1 '|' Recspec_2
| '(' Recspec ')'
TermFeat is a feature.
Recspec_1 & Recspec_2 is the conjunction of Recspec_1
and Recspec_2.
Recspec_1 '|' Recspec_2 is the disjunction of Recspec_1
and Recspec_2.
'(' Recspec ')' brackets record specification
Recspec.
Here is the UL syntax of set generator expression bodies:
Setgen ::= Constant
| Setgen_1 & Setgen_2
| Setgen_1 '|' Setgen_2
| '(' Setgen ')'
Constant is a constant.
Setgen_1 & Setgen_2 is the conjunction of Setgen_1
and Setgen_2.
Setgen_1 '|' Setgen_2 is the disjunction of Setgen_1
and Setgen_2.
'(' Setgen ')' brackets set generator expression body
Setgen.
Here is the UL syntax of constants:
Constant ::= <id> | <sstring> | <dstring> | <gstring>
I.e. a constant is either an identifier (<id>), a single quoted
string (<sstring>), a double quoted string (<dstring>),
or a guillemet quoted string (<gstring>).
Here is the UL syntax of integers:
Integer ::= <int> | infty
I.e. an integer is either an integer (<int>) or the keyword for
“infinity” (infty).
Here is the UL syntax of features:
ConstantFeat ::= Constant_1 : Constant_2
TermFeat ::= Constant : Term
VarTermFeat ::= Constant : Term
TypeFeat ::= Constant : Type
CardFeat ::= Constant Card
ConstantFeat is a feature with field Constant_1 and
value Constant_2.
TermFeat and VarTermFeat are features with field
Constant and value Term.
TypeFeat is a feature with field Constant and value
Type.
CardFeat is a cardinality specification with field
Constant and cardinality set Card.
Here is the UL syntax of cardinality sets:
Card ::= !
| '?'
| '*'
| '+'
| '#' { Integer* }
| '#' '[' Integer_1 Integer_2 ']'
! is the cardinality set {1}.
'?' is the cardinality set {0,1}.
'*' is the cardinality set {0,...,infty} where
infty means “infinity”.
'+' is the cardinality set {1,...,infty}.
'#' { Integer* } is the cardinality set including the
integers Integer*.
'#' '[' Integer_1 Integer_2 ']' is the cardinality set
including the closed interval between Integer_1 and
Integer_2.
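The meaning of the cardinality sets can be sketched as a membership test. The Python below is an illustration under stated assumptions, not the compiler's implementation: the names parse_integer and card_contains are hypothetical, ! is taken to mean "exactly one" (the set {1}), integers inside # { ... } and # [ ... ] are assumed to be separated by whitespace, and infty is modelled as math.inf.

```python
import math
import re


def parse_integer(token):
    """Integer ::= <int> | infty (infty is modelled as math.inf)."""
    return math.inf if token == "infty" else int(token)


def card_contains(card, n):
    """Return True if the count n belongs to the cardinality set card."""
    card = card.strip()
    if card == "!":          # assumption: exactly one, i.e. {1}
        return n == 1
    if card == "?":          # {0,1}
        return n in (0, 1)
    if card == "*":          # {0,...,infty}
        return n >= 0
    if card == "+":          # {1,...,infty}
        return n >= 1
    # '#' { Integer* }: an explicit set of integers
    match = re.fullmatch(r"#\s*\{([^{}]*)\}", card)
    if match:
        return n in {parse_integer(tok) for tok in match.group(1).split()}
    # '#' '[' Integer_1 Integer_2 ']': a closed interval
    match = re.fullmatch(r"#\s*\[\s*(\S+)\s+(\S+)\s*\]", card)
    if match:
        low = parse_integer(match.group(1))
        high = parse_integer(match.group(2))
        return low <= n <= high
    raise ValueError(f"not a cardinality set: {card!r}")


if __name__ == "__main__":
    for card in ["!", "?", "*", "+", "#{0 2}", "#[1 infty]"]:
        print(card, [n for n in range(4) if card_contains(card, n)])
```

Representing the sets as a predicate rather than as Python sets avoids having to enumerate the unbounded cases * and +.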