3 Compiler Implementation

3.1 Structure


Figure 3.1: Compiler Structure


3.2 Lexical/Syntax Analysis

Front-end Generator

As front-end generator the Gump tool described in [Kor98a] is used.

Files

scanner.ozg, parser.ozg

Implementation notes

Because the equality sign = and the asterisk * can either be an identifier or a keyword, it is differentiated between normal identifiers, the asterisk, and the equality sign to avoid shift/reduce conflicts.

Furthermore, infix operators and nonfix identifiers are distinguished. This is done during syntax analysis via a dictionary containing all infix identifiers. Updating of the dictionary is done by the parser. With this information the scanner can decide if it should append the token identifier or infix_operator to the stream. Note that there must be more than one dictionary because fixity declarations can be locally nested.

The infix operators are completely elaborated in the parser.

3.3 Compilation

Files

compiler.oz

Implementation Notes

The SML to Oz Compiler translates the output of the parser into the intermediate representation of the base language of Oz described in [Kor98b].

3.3.1 Translation Schemes

The Oz base language would be to difficult to read for people not being familiar with. Therefore, the schemes are given in normal Oz syntax.

Identifiers

Identifiers are translated with several prefixes because there are two name spaces in the dynamic semantics of Standard ML: one for constructors and value identifiers and another one for structure identifiers.

SML identifier

value identifier

constructor identifier

structure identifier

x

vc@x

vc@x and cn@x

st@x

Constructors

datatype TREE =
    Leaf   
  | Branch of TREE*TREE;

It must be distinguished between nullary and unary constructors. In this case Leaf has arity zero and Branch has arity one.

For every constructor an Oz name is needed for pattern matching. It is not possible to use atoms for this purpose because constructors in SML are generative; that means that the redefinition of a constructor is different to the old one. So the translation of Leaf looks as follows:

declare `cn@Leaf` = {NewName}
declare `vc@Leaf` = `cn@Leaf` 

Unary constructors are translated as functions like in SML:

declare `cn@Branch` = {NewName}
declare 
fun {`vc@Branch` `arg_X`}
   `cn@Branch`(`arg_X`)
end 

Exceptions

Exceptions are a kind of constructors and therefore are handled in the same way.

Valbindings

Because of fundamental differences in the binding rules of Oz and SML some temporary variables for the translation of value bindings are introduced. An alternative solution would have been the renaming of identifiers. But this introduces many variables in the top level environment of the Oz interpreter which can not be garbage collected.

val x = 2  
and y = x + 2
and rec g = fn x => x+1
and f = fn x => f x;
 

declare 
`tmp1` 
declare 
`tmp1` = 'VAL'(2 {`vc@+` 'R'(`vc@x` 2)})
declare 
`tmp2` 
declare 
'VAL'(`vc@g` `vc@f`) = `tmp2` 
declare 
!`tmp2` = 'VAL'(fun {$ `arg_X`}
                   case `arg_X` of `vc@x` then 
                      {`vc@+` 'R'(`vc@x` 1)}
                   end 
                end fun {$ `arg_X`}
                       case `arg_X` of `vc@x` then 
                          {`vc@f` `vc@x`}
                       end 
                    end)
declare 
'VAL'(`vc@x` `vc@y`) = `tmp1` 

Locals

Though the sematics of local in Oz and SML are the same the translation is more complicated because of the binding rule differences mentioned above.

local  
    val x = 2;
in 
    val y = x + 1;
end;

declare 
% val x = 2;
local 
   `tmp1` 
in 
   local 
      `tmp1` = 'VAL'(2)
   in 
      local 
         `tmp2` 
      in 
         local 
            'VAL'() = `tmp2` 
         in 
            local 
               !`tmp2` = 'VAL'()
            in 
               local 
                  'VAL'(`vc@x`) = `tmp1` 
               in 
                  % val y = x + 1;
                  local 
                     `tmp1` 
                  in 
                     local 
                        `tmp1` = 'VAL'({`vc@+` 'R'(`vc@x` 1)})
                     in 
                        local 
                           `tmp2` 
                        in 
                           local 
                              'VAL'() = `tmp2` 
                           in 
                              local 
                                 !`tmp2` = 'VAL'()
                              in 
                                 local 
                                    'VAL'(`vc@y`) = `tmp1` 
                                 in 
                                    % Now the result is assigned to a temporary
                                    % variable which is not introduced in the
                                    % upper local declarations.
                                    `tmp_v0@y` = `vc@y` 
                                 end 
                              end 
                           end 
                        end 
                     end 
                  end 
               end 
            end 
         end 
      end 
   end 
end 
declare 
% Assigning the temporary variable to the proper variable
`vc@y` = `tmp_v0@y` 
 

Because local...in...end can be nested into each other a unique number is appended to each temporary variable. This ensures that the names of temporary variables do not clash.

3.3.2 Compiler Internals

``Meta-Lists''

SML Programs are mainly lists of declarations. Either local or global. Therefore declarations are first translated into what we call ``meta-lists'' as follows:

[meta(Dec1 Pos1) meta(Dec2 Pos2) ... meta(DecN PosN)]
 

where each Deci is an already translated declaration.

After that the methods

makeLocalDec(L Exp ?O)

and

makeTopDec(L ?O)

generate local resp. global Oz declarations from the meta lists. The result of the global version can be seen at Valbindings and the result of the local one at Locals.


Andreas Simon and Gerhard Schneider
Generated on Thursday, November 26th 1998, 23:35:31