next up previous
Next: 4.5 Pretty-printing Up: 4 Example Previous: 4.3 Example Translation

4.4 Parsing and AST Construction

The AST is constructed using a scanner and parser generator of the implementor's choice with calls to the KHEPERA library AST construction routines. At the level of the scanner, KHEPERA provides support for source code line number and token offset tracking. This support is optional, but is very helpful for debugging. If the implementor desires line number and token offset tracking, the scanner must interact with KHEPERA in three ways: first, each line of source code must be registered. In versions of lex that support states, providing this information is trivial (although inefficient), as show in Figure 5. For other scanner generators, or if scanning efficiency is of great concern, other techniques can be used. The routine src_line stores a copy of the line using low-level string-handling support. While the routines used in these examples are tailored for lex semantics, the routines are generally wrapper routines for lower-level KHEPERA functions and would, therefore, be easy to implement for other front-end tools.


  
Figure 5: Storing Lines While Scanning
\begin{figure}
 \begin{center}
 \begin{verbatim}
NL \n
...
%%
<INITIAL\gt{
 .*{N...
 ... BEGIN(OTHER);
}
...
{NL} BEGIN(INITIAL);\end{verbatim} \end{center}\end{figure}

KHEPERA also handles interpretation of line number information generated by the C preprocessor. This requires a simple lex action:

  ^#\ .*      src_cpp_line(yytext, yyleng);

Finally, every scanner action must advance a pointer to the current position on the current line. This is accomplished by having every action make a call to src_get(yyleng), a minor inconvenience that can be encapsulated in a macro.

The productions in the parser need only call KHEPERA tree-building routines--all other work can be reserved for later tree walking. This tends to simplify the parser description file, and allows the implementor to concentrate on parsing issues during this phase of development. A few example yacc productions are shown in Figure 6. The second argument to tre_mk is a pointer to the (optional) source position information obtained during scanning. The abstract representation of the constructed AST is that of an n-ary tree, and routines are available to walk the tree using this viewpoint.[*]


  
Figure 6: Building the AST While Parsing
\begin{figure}
 \begin{center}
\begin{verbatim}
Statement: Identifier '=' Expres...
 ...tement
 {
 $$ = tre_append($1, $2);
 }
 ;\end{verbatim} \end{center}\end{figure}

Immediately after the parsing phase, the AST is available for printing. Without any pretty-printer description, the AST is printed as a nested S-expression, as shown in Figure 7.


  
Figure 7: Example Input and Initial AST
\begin{figure}
 \begin{center}
 Original Program:
 \begin{verbatim}
A = range(de...
 ...ifier/''i''))))
 (N_Identifier/''i'')))))\end{verbatim} \end{center}\end{figure}


next up previous
Next: 4.5 Pretty-printing Up: 4 Example Previous: 4.3 Example Translation
Rickard Faith
8/31/1997