COMP 240 Compiler Project
Revised: Tue Oct 8, 2002 by prins@cs.unc.edu
I. The Project
The basic project is the construction of a compiler to translate
either the Tiger or the miniJava programming language
into the MIPS-I instruction set.
The two languages are defined in the next section.
The MIPS-I instruction set and how MIPS-I programs can be executed are
described in section III.
The schedule and deliverables for the project are in
section IV.
The basic project can be extended in scope by increasing the number and
complexity of the language constructs that can be translated, or by
adding advanced analysis or optimization methods.
II. Source Language Specifications
II.1 Tiger
We will use the informal syntax and semantics of Tiger as described
in Appendix A of Appel's text.
That section is purposefully vague and in some cases incomplete in order to
give you some things to think about -- bring your questions to class!
II.2 miniJava
miniJava is a subset of Java: miniJava programs are legal
Java programs, and their semantics are determined by Java
semantics.
The primitive types are limited to int, boolean and
void. The array type constructor is limited to int
[], and String is the only predefined type.
The compilation unit is a single anonymous package, without any
imports, and with any number of classes. The first of these classes
is the mainclass consisting only of a static main() method.
This method declares but cannot access its single String []
parameter (because String [] is not a miniJava type; feel free to fix
that).
With the exception of the mainclass, classes contain only instance
variables and methods. The only modifiers in such declarations are
public and private. An instance variable declaration
may not include an initializing assignment. Methods may not be overloaded
explicitly in a class or through inheritance. Inheritance may override
methods defined in a superclass.
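The overloading and overriding rule above can be sketched as a small semantic check. This is only an illustration under stated assumptions: the Sig class, the check method, and the error messages are hypothetical names, and the sketch assumes that the analyzer has already collected the methods declared so far in the class and the methods visible from its superclasses.

```java
import java.util.Arrays;
import java.util.List;

class MethodRuleSketch {
    /** Hypothetical record of a method signature as a checker might store it. */
    static final class Sig {
        final String name;
        final String returnType;
        final List<String> paramTypes;

        Sig(String name, String returnType, String... paramTypes) {
            this.name = name;
            this.returnType = returnType;
            this.paramTypes = Arrays.asList(paramTypes);
        }
    }

    /**
     * Returns null if method m may be added to the current class,
     * otherwise a short error message.
     */
    static String check(Sig m, List<Sig> declaredHere, List<Sig> inherited) {
        // A second method with the same name in one class would be an overload.
        for (Sig d : declaredHere) {
            if (d.name.equals(m.name)) {
                return "method " + m.name + " is already declared in this class";
            }
        }
        // Reusing an inherited name is only allowed as an exact override.
        for (Sig s : inherited) {
            boolean sameSignature = s.paramTypes.equals(m.paramTypes)
                    && s.returnType.equals(m.returnType);
            if (s.name.equals(m.name) && !sameSignature) {
                return "method " + m.name + " does not match the inherited signature";
            }
        }
        return null;
    }
}
```

With an inherited method int area(), a subclass declaration int area() passes as an override, while int area(int) is rejected because it would overload the inherited name.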
Statements are limited to the block statement, declaration statement,
expression statement, conditional (if) statement, repetitive
(while) statement, counting loop (for) statement, and
the return statement. The return statement can only
appear as the last statement in a method, and only in methods returning
a non-void value.
Expressions include variable reference (including array access),
method invocation, assignment, creation of new object
instances, and integer or String literals. The operators are limited
to
> < !
== <= >= != && ||
+ - * /
All of these operators are infix binary operators with the exception of
logical negation !, which is a unary operator, and arithmetic
negation -, which is both a unary and a binary operator.
Precedence is according to Java rules. There are no operations on
strings.
The only predefined methods are System.out.print(),
System.out.println(), and System.in.read(). The
print methods are overloaded for int and String (but
not boolean) arguments. The String type provides the
length field.
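As a rough illustration of the restrictions described above, the following small program is intended to stay within the miniJava subset while remaining a legal Java program. The class and method names (Demo, Summer, sumTo) are made up for this example and are not part of the assignment.

```java
class Demo {
    public static void main(String[] args) {
        // The main class contains only the static main method;
        // args is declared but never accessed.
        Summer s = new Summer();
        System.out.println(s.sumTo(10));   // prints 55
    }
}

class Summer {
    // Instance variable without an initializing assignment.
    private int calls;

    public int sumTo(int n) {
        // Local declarations carry an initializer.
        int sum = 0;
        int i = 1;
        while (i <= n) {
            sum = sum + i;
            i = i + 1;
        }
        calls = calls + 1;
        // The return statement is the last statement of the method.
        return sum;
    }
}
```

The example uses only int values, a while loop, assignment, object creation, method invocation, and the predefined System.out.println() for an int argument.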
miniJava grammar
In the following, all-caps names such as INTEGERLITERAL
stand for the corresponding lexical
units.
Prog ::= MainClass ( ClassDeclaration )* EOF
MainClass ::= "class" Identifier "{"
                  "public" "static" "void" "main" "(" "String" "[" "]" Identifier ")" "{"
                      Statement*
                  "}"
              "}"
ClassDeclaration ::= "class" Identifier ( "extends" Identifier )? "{"
                         ( VarDeclaration | MethodDeclaration )*
                     "}"
VarDeclaration ::= Modifier Type Identifier ";"
MethodDeclaration ::=
        Modifier Type Identifier "(" ( Type Identifier ( "," Type Identifier )* )? ")"
        "{" ( Statement )* ("return" Expression ";" )? "}"
Modifier ::= ("public" | "private")?
Type ::= Identifier ("[" "]" )?
Statement ::= "{" ( Statement )* "}"
            | LocalVarDecl
            | Expression ";"
            | "if" "(" Expression ")" Statement ("else" Statement)?
            | "while" "(" Expression ")" Statement
            | "for" "(" LocalVarDecl ";" Expression ";" Assignment ")" Statement
LocalVarDecl ::= Type Identifier "=" Expression ";"
Expression ::= Reference
             | Reference "=" Expression
             | Expression BINOP Expression
             | UNOP Expression
             | "new" Identifier ("(" ")" | "[" Expression "]")
             | Literal
             | "(" Expression ")"
Reference ::= ("this" ".")? Target ("." Target )*
Target ::= Identifier
         | Identifier "[" Expression "]"
         | Identifier "(" ( Expression ( "," Expression )* )? ")"
Literal ::= INTEGERLITERAL
          | STRINGLITERAL
          | BOOLEANLITERAL
Identifier ::= IDENTIFIER
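The rule Expression ::= Expression BINOP Expression does not by itself say how operators group; the text above leaves that to Java precedence. One possible way to handle this in a hand-written parser is precedence climbing, sketched below. The class PrecedenceSketch and its methods are hypothetical, the tokenizer handles only identifiers, integer literals, parentheses, and the miniJava operators, and there is no error handling. Instead of building an AST, the sketch returns a fully parenthesized string so the grouping is visible.

```java
import java.util.ArrayList;
import java.util.List;

class PrecedenceSketch {
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    // Minimal tokenizer: names/numbers, two-character operators, single characters.
    private PrecedenceSketch(String src) {
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int j = i + 1;
            if (Character.isLetterOrDigit(c)) {
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
            } else if (j < src.length()) {
                String two = src.substring(i, j + 1);
                if (two.equals("==") || two.equals("!=") || two.equals("<=")
                        || two.equals(">=") || two.equals("&&") || two.equals("||")) j++;
            }
            toks.add(src.substring(i, j));
            i = j;
        }
    }

    // Binding strength of the binary operators, following Java's ordering.
    static int prec(String op) {
        switch (op) {
            case "||": return 1;
            case "&&": return 2;
            case "==": case "!=": return 3;
            case "<": case ">": case "<=": case ">=": return 4;
            case "+": case "-": return 5;
            case "*": case "/": return 6;
            default: return -1;   // not a binary operator
        }
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    // Parses a run of binary operators whose precedence is at least minPrec.
    private String expr(int minPrec) {
        String left = unary();
        while (prec(peek()) >= minPrec) {
            String op = toks.get(pos++);
            String right = expr(prec(op) + 1);   // +1 makes operators left-associative
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // Unary operators bind tighter than any binary operator.
    private String unary() {
        String t = peek();
        if (t.equals("!") || t.equals("-")) { pos++; return "(" + t + unary() + ")"; }
        if (t.equals("(")) { pos++; String e = expr(1); pos++; return e; }
        pos++;
        return t;
    }

    static String parenthesize(String src) {
        return new PrecedenceSketch(src).expr(1);
    }
}
```

For example, PrecedenceSketch.parenthesize("a + b * c") yields "(a + (b * c))", and "a - b - c" groups to the left as "((a - b) - c)".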
III. MIPS-I Instruction Set and Run-time Support
The MIPS-I instruction set and a simple ABI are described in
Appendix A
of Patterson and Hennessy,
Computer Organization and Design: The Hardware/Software Interface.
The SPIM simulator
is available to execute MIPS assembler code on a variety of platforms
and operating systems, including Windows, with a simple GUI for inspecting
the simulated machine state.
Execution of Tiger or miniJava
programs requires the implementation of a few functions
in the standard library to provide some basic functionality including
input/output.
These functions are available in a form suited for use with SPIM
as described
here.
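The course library itself is not reproduced here. As a rough idea of what an output routine amounts to under SPIM, the sketch below shows a code generator helper that emits MIPS assembly for printing an integer followed by a newline, using SPIM's built-in system calls (service 1 is print_int and service 4 is print_string, selected through $v0 with the argument in $a0). The class SpimPrintSketch, its methods, and the newline label are hypothetical, and the provided library may expose a different interface.

```java
class SpimPrintSketch {
    /** Data-segment declaration for the newline string used below (hypothetical label). */
    static String dataSection() {
        return "    .data\n"
             + "newline: .asciiz \"\\n\"\n"
             + "    .text\n";
    }

    /** Emits code that prints the integer held in srcReg, then a newline. */
    static String printlnInt(String srcReg) {
        StringBuilder sb = new StringBuilder();
        sb.append("    move $a0, ").append(srcReg).append('\n');
        sb.append("    li   $v0, 1\n");        // SPIM service 1: print_int
        sb.append("    syscall\n");
        sb.append("    la   $a0, newline\n");
        sb.append("    li   $v0, 4\n");        // SPIM service 4: print_string
        sb.append("    syscall\n");
        return sb.toString();
    }
}
```

A call such as SpimPrintSketch.printlnInt("$t0") returns six lines of assembly that can be appended to the text segment, with dataSection() emitted once per program.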
An alternative strategy is to compile a version of
these functions written in C
for the MIPS-I instruction set using a C compiler on an SGI, and to link
the result with the object code obtained by assembling (using the SGI
assembler) the code generated by your compiler.
Note that the calling conventions described in the first paragraph
correspond to those used by the GCC C compiler, which by default
generates instructions for the MIPS-I instruction set that can be
assembled by the SGI assembler.
This is the more challenging route, but gives the satisfaction of
running your programs natively on SGI hardware.
IV. Schedule and Deliverables
The complete compiler is due Wednesday, December 4. A schedule
of intermediate milestones follows.
Date   | Milestone
Sep 10 | Project and team selection
Oct 11 | Scanner, parser, and AST construction
Oct 25 | Full semantic checking and Intermediate Representation
Nov 15 | Code generation
Dec 4  | Complete compiler submission
Deliverables
Please submit a short written guide to your compiler containing the
following sections:
- Scope of your project. Please make clear
which basic and optional parts of the project
you have implemented. List known limitations of your implementation
(it is better for you to identify these than for me to find them).
For those submitting a non-standard project, this section will
require more work.
- Guide to your compiler. Give a short overview of the organization
of your compiler. This overview, coupled with comments
in the source code, should allow me to locate components of your
compiler and understand your implementation.
Please provide more detailed information on optional components
and special algorithms/implementations you developed.
- Runtime environment and sample generated code. Explain your
procedure linkage convention and how you interface to the
runtime support code. Explain your memory management strategy (if any).
Include the complete code generated
by the compiler (minus the runtime support code) for a small
representative program. For Tiger compilers, the representative
program is here.
If necessary, add some annotations to the generated code for
clarification.
- Testing of the compiler. You don't need to show the
results of all your tests, but you should explain what testing you
performed.
For a Tiger compiler,
you should exercise your compiler on the
tests provided by Appel,
although most of these check detection of errors by the compiler, and
only merge and queens print a result when executed.
Here is a Tiger test program of intermediate
complexity to try out.
You should also develop your own suite of programs to test the
compiler, particularly to demonstrate any optional components
you implement.
An easy way to document all of this is to describe the
kinds of testing you did and to provide access to the tests,
annotating each test with a short comment to explain what
it is testing.
- Accessing the Compiler. Please tell me how I can access
your compiler sources, test sources, your executable compiler,
and an electronic copy of this document. Please give me a way to run
your compiler on test programs. To the extent that it is simple
to do, provide all of this in the form of something I can download
in toto.
V. Project Evaluation
For the standard project, you are expected to construct a complete
compiler for the Tiger or miniJava language.
The compiler should, at a minimum, compile
correct Tiger or miniJava programs into
MIPS-I assembler code that can be executed using the SPIM simulator, while
incorrect programs should generate an accurate error report
from the lexical, syntactic, or semantic analyzer.
Additional credit can be earned by implementation of additional language
constructs or advanced optimizations.
In all cases, implementation should include all syntactic and semantic
validity checking (as appropriate), code generation, any necessary
run-time support, and a test suite to establish correct operation.
If you work on the project as a two-person team, each member of the team will
receive 80% of the score of the complete project so that solo efforts are not
overly disadvantaged. It is expected that a two-person team will complete some
options beyond the basic project requirements.
The point values represent the maximum score for a feature. The
actual score depends on the clarity of the approach, integration into the
compiler, completeness, correctness and quality of generated code.
Basic Project Requirements (200 points total)
- Basic expressions (30 points)
    - Basic types and type checking: int, string
    - Basic expressions involving integer and string constants
      and standard library functions.
    - Output of integers and strings via standard library
- Block structure and basic statements (40 points)
    - Block structure: let, (non-recursive) type declarations and
      variable declarations
    - Basic statements: assignment, conditional if,
      and repetitive statements (while and for)
      with break statement
    - Input of characters
- Procedures and functions (50 points)
    - Top-level functions
    - Nested functions
    - Functions with parameters
    - Recursive functions
- Array and Record types (35 points)
    - Recursive types and type checking
    - Record and array expressions and assignment
- Code generation (45 points)
    - MIPS-I instruction selection
    - Intraprocedural register allocation
Optional Language Extensions
- Add an 8-byte floating point type real to Tiger.
Extend the arithmetic operations to real values, and adjust the type checking
rules. Real values may be passed to or returned from functions. (40 points)
- Determine under what circumstances it is safe and possible to allocate
array values in the activation record rather than on the heap. Implement
an analysis to locate array allocations that may be stack allocated and
adjust your compiler to use this strategy when possible (45 points).
- Add a function type to Tiger and implement Fun-Tiger (25 points), or
PureFun-Tiger with some optimizations (closure conversion,
tail recursion, inlining) (25 points + 20 points per optimization), or
Lazy-Tiger (35 points). See Appel, Chapter 15.
- Add classes and methods to Tiger and implement Object-Tiger with
type checking, single inheritance, and dynamic method invocation (40 points).
See Appel, Chapter 14.
Optional Program Optimization Extensions
- Register allocation with full coalescing and register spilling (40 points)
- Common subexpression elimination within basic blocks (25 points) or
intraprocedural (50 points)
- Constant folding within basic blocks (15 points) or
intraprocedural (30 points)
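As an illustration of the simplest item in the list above, the following sketch folds constant subexpressions in a tiny expression tree. It is not tied to any particular intermediate representation: the Expr class and the fold method are hypothetical, only the four integer arithmetic operators are handled, and arithmetic uses Java int semantics (wrapping on overflow), which may or may not match the semantics chosen for the source language.

```java
class ConstFoldSketch {
    /** Hypothetical expression node: a literal, a variable, or a binary operation. */
    static final class Expr {
        final String op;      // "lit", "var", or one of + - * /
        final int value;      // meaningful only when op is "lit"
        final String name;    // meaningful only when op is "var"
        final Expr left, right;

        private Expr(String op, int value, String name, Expr left, Expr right) {
            this.op = op; this.value = value; this.name = name;
            this.left = left; this.right = right;
        }
        static Expr lit(int v) { return new Expr("lit", v, null, null, null); }
        static Expr var(String n) { return new Expr("var", 0, n, null, null); }
        static Expr bin(String op, Expr l, Expr r) { return new Expr(op, 0, null, l, r); }

        boolean isLit() { return op.equals("lit"); }

        @Override public String toString() {
            if (isLit()) return Integer.toString(value);
            if (op.equals("var")) return name;
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** Returns an equivalent tree in which constant subtrees are evaluated. */
    static Expr fold(Expr e) {
        if (e.left == null) return e;            // literal or variable: nothing to do
        Expr l = fold(e.left);
        Expr r = fold(e.right);
        if (l.isLit() && r.isLit()) {
            switch (e.op) {
                case "+": return Expr.lit(l.value + r.value);
                case "-": return Expr.lit(l.value - r.value);
                case "*": return Expr.lit(l.value * r.value);
                case "/":
                    if (r.value != 0) return Expr.lit(l.value / r.value);
                    break;                       // keep division by zero for run time
            }
        }
        return Expr.bin(e.op, l, r);
    }
}
```

For instance, folding x + 2 * 3 produces a tree that prints as (x + 6), while 1 / 0 is left unchanged so that the error still occurs when the program runs.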
Optional Compiler Enhancements
- Error recovery. Enable the compiler to continue analysis
in a reasonable fashion in
the presence of lexical, syntactic or semantic errors (40 points).
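One common approach to syntactic error recovery is panic mode: after reporting an error, the parser discards tokens until it reaches a synchronizing token and then resumes. The sketch below assumes tokens are plain strings and that ";" and "}" serve as synchronizing tokens. The class PanicModeSketch and the synchronize method are hypothetical, and a real parser would usually choose synchronizing sets per grammar rule.

```java
import java.util.List;

class PanicModeSketch {
    /**
     * Starting at pos, skips tokens until a statement boundary is found.
     * Returns the index just past the next ";", or the index of the next "}",
     * or tokens.size() if neither occurs.
     */
    static int synchronize(List<String> tokens, int pos) {
        while (pos < tokens.size()) {
            String t = tokens.get(pos);
            if (t.equals(";")) return pos + 1;   // resume after the broken statement
            if (t.equals("}")) return pos;       // let the enclosing block see its "}"
            pos++;
        }
        return pos;
    }
}
```

Given the tokens of int x = + ; y = 1 ; with the error detected at the "+", the parser would resume at y and could still report later errors in the same run.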