Section H1 Adding a statement to -core

Let's have a look the -core grammar. The current -core language (and compiler) feature a few more top level declarations and statements, but we keep it simple for the presentation:

core:
  | top*
top:
  | "Task" ID ( C-params ) { stmt* }
  | "Reset" { stmt* }
  | ...   
              
stmt:
  | #> C-code <#                                  
  | "async" after? before? ID ( C-params ) ;             
  | "sync" ID ( C-params ) ;
  | ...

  
after:
  | "after" time                              

before:
  | "before" time          

time:
  | INTVAL tunit?

tuint:
  | "us" | "ms" | "s"                                                                                                   

- * matches zero or many occurrences 
- ? matches zero or one occurrence
- An ID is a string denoting an identifier.
- C-params uses the C notation for specifying parameters.
- #> C-code <# in-lines C-code in the core program


Assume we want to extend the language (and compiler with a halt statement), having the following grammar.

stmt:           
  | "halt"  ;


The intended behaviour of halt, is just to "halt" the execution of the task at this position. (Later we will discuss how to achieve this behaviour.) Of course, since we allow C-code embedding we could implement halt directly in the C-code, but that is not the point, we want to show how the -core language can be altered and implemented in the compiler!

Lexing and parsing.

In order for the compiler to identify this new statement we need first to add the "halt" to the set of symbols detected by the compiler. This is done at the stage of lexing, where the text input (a program in -core) is translated into a stream of symbols. We refer to the string "halt" as a lexeme while the symbol is referred to as a token.

The -core compiler is implemented in OCaml using Menhir for parser generation. Let us now have a closer look at the two files, Lexer.mll (defining the lexer) and Parser.mly (defining the parser).

Tokens are defined in the Parser.mly. (There are other ways to do this, so this is a design choice we took.)

...

(* RTFM-core/Parser.mly *)

%token <string> ID
%token <int>    INTVAL
%token <string> STRINGVAL
%token <string> CCODE
%token <string> PARAMS
%token MODULE INCLUDE ISR TASK FUNC RESET IDLE SYNC ASYNC PEND AFTER BEFORE (* PRIO *) CLAIM USEC MSEC SEC SC LCP RCP EOF
...

As seen above, tokens can be associated with additional information (e.g., <int> INTVAL).

We now introduce a new HALT token (without any associated information).
%token HALT


So we have the token, let us now add a lexing rule for the HALT token in the Lexer.mll.

...
rule lex = parse
  | "module"             { MODULE }                          (* module system related *)
  | "include"            { INCLUDE }
   
  | "pend"               { PEND }                            (* statements *)
  | "sync"               { SYNC }
  | "async"              { ASYNC }
  | "halt"               { HALT }
...


We can now compile the -core compiler and give it a sample program:
// Halt.core
// Per Lindgren (C) 2014
//
// Test of halt

Reset {
    halt ;
}


The compiler will give a parsing error. We do not have the grammar rule yet, but let us fix that in Parser.mly.

...
stmt:
  | CCODE                                   { ClaimC ($1) }
  | CLAIM ID LCP stmt* RCP                  { Claim ($2, $4) }
...
  | HALT SC                                 { ???? }
...


Now we come to the problem on how the HALT should be represented internally in the compiler. This is done by means of an Abstract Syntax Tree (AST), defined in the file AST.ml. A program is represented as a list of top level declarations, which in turn (may) hold a list of statements, e.g. the Rest and Idle tasks.

We add a representation for our new halt statement as follows:

type stmt =
  | Claim     of string * stmt list                 (* res_id, stmts                    *)
...
  | Halt                                            (* halt takes no arguments *)
...

As we do not associate any additional information with the halt statement, the constructor Halt takes no arguments.

So we can go back to the Parser.mly and use the Halt constructor.
...
stmt:
...
  | HALT SC                                 { Halt }
...


Now we should be able to run the -core compiler, and parse the example program. However, we will get a run-time error (we have not defined how the Halt constructor in the AST should be handled.)

We are actually warned that we do not cover Halt at the time of running ocamlbuild.
First thing we fix is the pretty printing defined in the AST.ml.

let string_of_tops tl = 
  let rec stmts t sl = myconcat nl (mymap (stmt (t ^ tab)) sl)          
  and top = function
    | TopC (c)                 -> c (* "#> CCODE <#" *)
...
    | Idle (s)                 -> "Idle " ^ op ^ stmts "" s ^ cl ^ nl
  and stmt t = function 
    | Claim (id, s)            -> t ^ "Claim " ^ id ^ op ^ stmts t s ^ t ^ cl
... 
    | Halt                     -> t ^ "--- our new halt ---"
  in
  myconcat nl (mymap top tl)


The function mymap is defined in Common.ml. In contrast to the standard map (List library),
mymap f l  

allows f to raise the UnMatched exception, and still create a list output. This allows us to more succinctly operate on the AST. (However it has the drawback that warnings are not emitted in case f raises the UnMatched exception.)

The function myconcat follows the List.concat idea (contacting a list of strings delimited by s), but puts an additional s at the end (which is useful to he handling of statement lists, etc...). (Alternate ways of achieving this behaviour is of course possible.)
mymap s f l  


A statement is handled as follows:
  and stmt t = function 
    | Claim (id, s)            -> t ^ "Claim " ^ id ^ op ^ stmts t s ^ t ^ cl
... 

where and makes it available for mutual recursion (w.r.t., to top and stmts) and function makes it match an implicit argument. It is a shorthand for writing:
  and stmt t s = match s with
    | Claim (id, s)            -> t ^ "Claim " ^ id ^ op ^ stmts t s ^ t ^ cl
... 


Pretty indentation is achieved by prepending each statement with an initial string (strings are concerted by the infix ^ operator). The tab is a tab string "\t” defined in Common.ml, while t holds the indenting level (initially set to "" when stmt is called from the the top level, e.g. in
 | Idle (s)                   -> "Idle " ^ op ^ stmts "" s ^ cl ^ nl

where stmts is defined as:
  let rec stmts t sl = myconcat nl (mymap (stmt (t ^ tab)) sl)

Notice, that stmt is given the indenting level (t ^ tab), and then mapped to each element of the statement list sl, the resulting list (of strings) is concerted with the nl, where nl is "\n" (in Common.ml).

If you are new to functional programming, it may be wort spending some time understanding the pretty printing function, since similar patterns occur over and over in the -core compiler.

You can now try the compiler again, giving the option -d_ast which invokes the pretty printing directly after parsing the input. You should see output similar to:
Module:
Use :
Prog:
Reset  {
	My HALT0
}
...


before the compiler gets stuck at later stages: You will need to make changes to the following files:
  • Gv.ml You will need to extend the internal data structure with DotHalt int, the int is to give a line number (tag) in the gv record structure, so that we can later render interconnections.
  • ICGenRT.ml Generate C code for the halt, e.g., "RT_halt(" ^ s ^ ");"). (We will later discuss how to implement this functionality in the run-time system.)
  • SpecAST.ml Another pretty printing of AST, this time for the SpecAST








Last edited Sep 6, 2014 at 10:13 AM by RTFMPerLindgren, version 7