In this section, we describe the syntax of User Language (UL) grammar files, using the Extended Backus Naur Form (EBNF) as defined in the XML specification of the W3C (see http://www.w3.org/TR/REC-xml#sec-notation).
In this section, we lay out the lexical syntax of the UL.
Here are the keywords of the UL:
<keyword> ::= args | attrs | bool | bot | card | defattrstype | defclass | defdim | defentry | defentrytype | defgrammar | deflabeltype | deftype | dim | dims | entry | infty | int | ints | iset | label | list | vec | output | ref | set | string | top | tuple | tv | useclass | usedim | useoutput | useprinciple | valency
Here are the operators of the UL:
<operator> ::= { | } | ( | ) | * | & | '|' | @ | [ | ] | < | > | $ | . | :: | _ | ^ | ! | ? | + | # | :
Identifiers consist of letters and the underscore:
<id> ::= [a-zA-Z_]+
Integers consist of digits:
<int> ::= [0-9]+
Strings can be quoted using single quotes (<sstring>), double quotes (<dstring>), or guillemet quotes (<gstring>). You can freely choose between the different kinds of quotes. Inside the quotes, you can write strings using any characters from the ISO 8859-1 character set. We write . for “any character from the ISO 8859-1 character set”:
<sstring> ::= '.+'
<dstring> ::= ".+"
<gstring> ::= «.+»
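The lexical rules above can be tried out with a small classifier. The following Python sketch is only an illustration, not part of the UL compiler: the function name `classify` is hypothetical, and it assumes that a quoted string ends at the first matching closing quote, which the `.+` notation leaves open. Keywords are checked before identifiers because every keyword also matches `<id>`.

```python
import re

# Keyword list copied from the <keyword> production above.
KEYWORDS = frozenset("""
args attrs bool bot card defattrstype defclass defdim defentry defentrytype
defgrammar deflabeltype deftype dim dims entry infty int ints iset label list
vec output ref set string top tuple tv useclass usedim useoutput useprinciple
valency
""".split())

ID_RE = re.compile(r"[a-zA-Z_]+")   # <id>  ::= [a-zA-Z_]+
INT_RE = re.compile(r"[0-9]+")      # <int> ::= [0-9]+

# Assumption: the closing quote character cannot occur inside the string.
STRING_RES = {
    "sstring": re.compile(r"'[^']+'"),
    "dstring": re.compile(r'"[^"]+"'),
    "gstring": re.compile(r"«[^»]+»"),
}


def classify(token: str) -> str:
    """Return the lexical class of a single, already isolated token."""
    if token in KEYWORDS:
        return "keyword"
    if ID_RE.fullmatch(token):
        return "id"
    if INT_RE.fullmatch(token):
        return "int"
    for name, pattern in STRING_RES.items():
        if pattern.fullmatch(token):
            return name
    return "unknown"
```

Note that, under the `<id>` rule as stated, a token such as `a1` is not an identifier, because digits are not allowed; it would have to be written as a quoted string to serve as a constant.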
End-of-line comments are written using the percent symbol %.
Balanced comments start with /* and end with */.
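As an illustration of the two comment forms, here is a minimal Python sketch that removes them from a piece of UL source text. The function name `strip_comments` is hypothetical. The sketch assumes that balanced comments do not nest and it does not special-case a % or /* that occurs inside a quoted string or inside an end-of-line comment; the text above does not settle these points.

```python
import re

# Assumption: /* ... */ comments do not nest.
BLOCK_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
# An end-of-line comment runs from % up to (not including) the newline.
LINE_COMMENT_RE = re.compile(r"%[^\n]*")


def strip_comments(source: str) -> str:
    """Remove balanced comments first, then end-of-line comments."""
    without_blocks = BLOCK_COMMENT_RE.sub("", source)
    return LINE_COMMENT_RE.sub("", without_blocks)
```

For example, `strip_comments("usedim id % note\n/* a\nb */usedim lp")` leaves only the two `usedim` instructions, separated by the original newline.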
Files can be included using the \input directive. For example, to include the file Chorus_header.ul, you write:
\input "Chorus_header.ul"
In this section, we lay out the context-free syntax of the UL. We write all keywords in lower case, and all non-terminals with an initial upper-case letter. We use single quotes to escape the following meta characters: ( ) [ ] ? * + # | .
The start symbol of our context-free grammar is S:
S ::= Defgrammar*
Here is the UL Syntax for grammar definitions:
Defgrammar ::= defdim Constant { Defdim* }
             | defclass Constant Constant* { Class }
             | defentry { Class* }
             | usedim Constant
defdim Constant { Defdim* } defines a dimension with identifier Constant and dimension definitions Defdim*.
defclass Constant Constant* { Class } defines a lexical class with identifier Constant, class variables Constant*, and class body Class.
defentry { Class* } defines a lexical entry defined by class bodies Class*.
usedim Constant uses the dimension with identifier Constant.
Here is the UL syntax for dimension definitions:
Defdim ::= defattrstype Type
         | defentrytype Type
         | deflabeltype Type
         | deftype Constant Type
         | useprinciple Constant { Useprinciple* }
         | output Constant
         | useoutput Constant
defattrstype Type defines the attributes type Type.
defentrytype Type defines the entry type Type.
deflabeltype Type defines the label type Type.
deftype Constant Type defines the type Type with identifier Constant.
useprinciple Constant { Useprinciple* } uses the principle with identifier Constant and dimension and argument mappings Useprinciple*.
output Constant chooses output Constant.
useoutput Constant uses output Constant.
Here is the UL syntax for principle use instructions:
Useprinciple ::= dims { VarTermFeat* }
               | args { VarTermFeat* }
dims { VarTermFeat* } is the dimension mapping VarTermFeat*.
args { VarTermFeat* } is the argument mapping VarTermFeat*.
This is the UL syntax of types:
Type ::= { Constant* }
       | set '(' Type ')'
       | iset '(' Type ')'
       | tuple '(' Type* ')'
       | list '(' Type ')'
       | valency '(' Type ')'
       | { TypeFeat+ }
       | { : }
       | vec '(' Type_1 Type_2 ')'
       | card
       | int
       | ints
       | string
       | bool
       | ref '(' Constant ')'
       | Constant
       | label '(' Constant ')'
       | tv '(' Constant ')'
       | '(' Type ')'
{ Constant* } is a finite domain consisting of the constants Constant*.
set '(' Type ')' is an accumulative set with domain Type.
iset '(' Type ')' is an intersective set with domain Type.
tuple '(' Type* ')' is a tuple with projections Type*.
list '(' Type ')' is a list with domain Type.
valency '(' Type ')' is a valency with domain Type.
{ TypeFeat+ } is a record with features TypeFeat+.
{ : } is the empty record.
vec '(' Type_1 Type_2 ')' is a vector with fields Type_1 and values of type Type_2.
card is a cardinality set.
int is an integer.
ints is a set of integers.
string is a string.
bool is a boolean.
ref '(' Constant ')' is a type reference to the type with identifier Constant.
Constant is a shortcut for ref '(' Constant ')'.
label '(' Constant ')' is a label reference to the label type on the dimension referred to by dimension variable Constant.
tv '(' Constant ')' is a type variable.
'(' Type ')' encapsulates type Type.
Here is the UL syntax of a lexical class body:
Class ::= dim Constant Term
        | useclass Constant
        | useclass Constant { VarTermFeat* }
        | Constant
        | Constant { VarTermFeat* }
        | Class_1 & Class_2
        | Class_1 '|' Class_2
        | '(' Class ')'
dim Constant Term defines the entry Term for the dimension with identifier Constant.
useclass Constant uses the lexical class with identifier Constant.
Constant is a shortcut for useclass Constant.
useclass Constant { VarTermFeat* } uses the lexical class with identifier Constant and class parameters VarTermFeat*.
Constant { VarTermFeat* } is a shortcut for useclass Constant { VarTermFeat* }.
Class_1 & Class_2 is the conjunction of Class_1 and Class_2.
Class_1 '|' Class_2 is the disjunction of Class_1 and Class_2.
'(' Class ')' brackets class Class.
Here is the UL syntax of terms:
Term ::= Constant
       | Integer
       | top
       | bot
       | Featurepath
       | CardFeat
       | { Term* }
       | '[' Term* ']'
       | { Recspec+ }
       | { : }
       | $ Setgen
       | $ '(' ')'
       | Term :: Type
       | Term_1 & Term_2
       | Term_1 '|' Term_2
       | Term_1 @ Term_2
       | '<' Term* '>'
       | '(' Term ')'
Constant is a constant.
Integer is an integer.
top is lattice top.
bot is lattice bottom.
Featurepath is a feature path.
CardFeat is a cardinality specification.
{ Term* } is a set of the elements Term*.
'[' Term* ']' is a list of the elements Term* (in this order).
{ Recspec+ } is a record with specification Recspec+.
{ : } is the empty record.
$ Setgen introduces a set generator expression with set generator expression body Setgen.
$ '(' ')' is the empty set generator expression.
Term :: Type is a type annotation of term Term with type Type.
Term_1 & Term_2 is the conjunction of Term_1 and Term_2.
Term_1 '|' Term_2 is the disjunction of Term_1 and Term_2.
Term_1 @ Term_2 is the concatenation of Term_1 and Term_2. Concatenation is restricted to strings.
'<' Term* '>' is an order generator specification of a list of elements Term*.
'(' Term ')' brackets term Term.
Here is the UL syntax of feature paths:
Featurepath ::= Root '.' Constant '.' Aspect ('.' Constant)+
Root ::= _ | ^
Aspect ::= attrs | entry
Root '.' Constant '.' Aspect ('.' Constant)+ is a feature path with root variable Root, dimension variable Constant, aspect Aspect, and the list of fields ('.' Constant)+.
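To make the shape of a feature path concrete, here is a hedged Python sketch that splits one into its four parts. The names `FeaturePath` and `parse_featurepath` are hypothetical. For brevity the sketch only accepts constants that are plain identifiers (`<id>`), not quoted strings, and it does not allow whitespace around the dots.

```python
import re
from typing import NamedTuple, Tuple


class FeaturePath(NamedTuple):
    root: str                 # "_" or "^"
    dimension: str            # dimension variable
    aspect: str               # "attrs" or "entry"
    fields: Tuple[str, ...]   # one or more field names


ID = r"[a-zA-Z_]+"  # simplification: constants restricted to <id>
FEATUREPATH_RE = re.compile(
    rf"(?P<root>[_^])\.(?P<dim>{ID})\.(?P<aspect>attrs|entry)(?P<fields>(?:\.{ID})+)"
)


def parse_featurepath(text: str) -> FeaturePath:
    """Split Root '.' Constant '.' Aspect ('.' Constant)+ into its parts."""
    match = FEATUREPATH_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a feature path: {text!r}")
    # The fields group looks like ".a.b"; drop the empty piece before the first dot.
    fields = tuple(match.group("fields").split(".")[1:])
    return FeaturePath(match.group("root"), match.group("dim"),
                       match.group("aspect"), fields)
```

A path without any field after the aspect, such as `_.D.entry`, is rejected, which mirrors the `+` in the production.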
Here is the UL syntax of record specifications:
Recspec ::= TermFeat
          | Recspec_1 & Recspec_2
          | Recspec_1 '|' Recspec_2
          | '(' Recspec ')'
TermFeat is a feature.
Recspec_1 & Recspec_2 is the conjunction of Recspec_1 and Recspec_2.
Recspec_1 '|' Recspec_2 is the disjunction of Recspec_1 and Recspec_2.
'(' Recspec ')' brackets record specification Recspec.
Here is the UL syntax of set generator expression bodies:
Setgen ::= Constant
         | Setgen_1 & Setgen_2
         | Setgen_1 '|' Setgen_2
         | '(' Setgen ')'
Constant is a constant.
Setgen_1 & Setgen_2 is the conjunction of Setgen_1 and Setgen_2.
Setgen_1 '|' Setgen_2 is the disjunction of Setgen_1 and Setgen_2.
'(' Setgen ')' brackets set generator expression body Setgen.
Here is the UL syntax of constants:
Constant ::= <id> | <sstring> | <dstring> | <gstring>
That is, a constant is either an identifier (<id>), a single-quoted string (<sstring>), a double-quoted string (<dstring>), or a guillemet-quoted string (<gstring>).
Here is the UL syntax of integers:
Integer ::= <int> | infty
That is, an integer is either an integer literal (<int>) or the keyword for “infinity” (infty).
Here is the UL syntax of features:
ConstantFeat ::= Constant_1 : Constant_2
TermFeat ::= Constant : Term
VarTermFeat ::= Constant : Term
TypeFeat ::= Constant : Type
CardFeat ::= Constant Card
ConstantFeat is a feature with field Constant_1 and value Constant_2.
TermFeat and VarTermFeat are features with field Constant and value Term.
TypeFeat is a feature with field Constant and value Type.
CardFeat is a cardinality specification with field Constant and cardinality set Card.
Here is the UL syntax of cardinality sets:
Card ::= !
       | '?'
       | '*'
       | '+'
       | '#' { Integer* }
       | '#' '[' Integer_1 Integer_2 ']'
! is the cardinality set {1}.
'?' is the cardinality set {0,1}.
'*' is the cardinality set {0,...,infty} where infty means “infinity”.
'+' is the cardinality set {1,...,infty}.
'#' { Integer* } is the cardinality set including the integers Integer*.
'#' '[' Integer_1 Integer_2 ']' is the cardinality set including the closed interval between Integer_1 and Integer_2.
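The cardinality sets can be read as membership tests on non-negative integers. The following Python sketch shows one way to interpret a Card expression written as a string. It is an illustration only: the helper names `parse_integer` and `card_contains` are hypothetical, integers inside `#{...}` and `#[...]` are assumed to be separated by whitespace, and `!` is read as "exactly one", that is {1}, which is an assumption of this sketch.

```python
import math
import re


def parse_integer(token: str) -> float:
    """Integer ::= <int> | infty (infty is mapped to math.inf)."""
    if token == "infty":
        return math.inf
    if re.fullmatch(r"[0-9]+", token):
        return int(token)
    raise ValueError(f"not an integer: {token!r}")


def card_contains(card: str, n: int) -> bool:
    """Return True if n is a member of the cardinality set denoted by card."""
    card = card.strip()
    # Assumption: "!" denotes {1}; the other three follow the descriptions above.
    simple = {"!": (1, 1), "?": (0, 1), "*": (0, math.inf), "+": (1, math.inf)}
    if card in simple:
        low, high = simple[card]
        return low <= n <= high
    enumeration = re.fullmatch(r"#\s*\{([^}]*)\}", card)
    if enumeration:
        return n in {parse_integer(t) for t in enumeration.group(1).split()}
    interval = re.fullmatch(r"#\s*\[\s*(\S+)\s+(\S+)\s*\]", card)
    if interval:
        low, high = (parse_integer(t) for t in interval.groups())
        return low <= n <= high
    raise ValueError(f"not a cardinality set: {card!r}")
```

With these definitions, `card_contains("#[2 infty]", 7)` holds, while `card_contains("#{1 3}", 2)` does not.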