COMP 240 Compiler Project


Revised: Tue Oct 8, 2002 by prins@cs.unc.edu

I. The Project

The basic project is the construction of a compiler to translate either the Tiger or MiniJava programming language into the MIPS-I instruction set. The two languages are defined in the next section. The MIPS-I instruction set and how MIPS-I programs can be executed are described in section III. The schedule and deliverables for the project are in section IV.

The basic project can be extended in scope by increasing the number and complexity of the language constructs that can be translated, or by adding advanced analysis or optimization methods.


II. Source Language Specifications

II.1 Tiger

We will use the informal syntax and semantics of Tiger as described in Appendix A of Appel's text. That section is purposefully vague and in some cases incomplete in order to give you some things to think about -- bring your questions to class!

II.2 miniJava

miniJava is a subset of Java. miniJava programs are legal Java programs, and their semantics are determined by Java semantics.

The primitive types are limited to int, boolean and void. The array type constructor is limited to int [], and String is the only predefined type.

The compilation unit is a single anonymous package, without any imports, and with any number of classes. The first of these classes is the mainclass, consisting only of a static main() method. This method declares but cannot access the single String [] parameter (because it is not a miniJava type -- feel free to fix that).

With the exception of the mainclass, classes contain only instance variables and methods. The only modifiers in such declarations are public and private. An instance variable declaration may not include an initializing assignment. Methods may not be overloaded, either explicitly within a class or through inheritance. A subclass may override methods defined in a superclass.

Statements are limited to the block statement, declaration statement, expression statement, conditional (if) statement, repetitive (while) statement, counting loop (for) statement, and the return statement. The return statement can only appear as the last statement in a method, and only in methods returning a non-void value.

Expressions include variable reference (including array access), method invocation, assignment, creation of new object instances, and integer or String literals. The operators are limited to

        >       <       !
        ==      <=      >=      !=      &&      ||
        +       -       *       /

All of these operators are infix binary operators, with the exception of logical negation !, which is a unary operator, and arithmetic negation -, which is both a unary and a binary operator. Precedence is according to Java rules. There are no operations on strings.

The only predefined methods are System.out.print(), System.out.println(), and System.in.read(). The print methods are overloaded for int and String (but not boolean) arguments. The String provides the length field.
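As an illustration of the restrictions above, here is a small program that is intended to stay within miniJava as described in this section while remaining legal Java. It is only a sketch: the class and method names (SumDemo, Summer, DoubleSummer, weight, sum, callCount) are made up for this example and are not part of the project materials.

```java
// The mainclass: it contains only the static main() method.
class SumDemo {
    public static void main(String[] args) {
        int[] data = new int[3];
        data[0] = 3;
        data[1] = 4;
        data[2] = 5;
        Summer plain = new Summer();
        Summer doubled = new DoubleSummer();
        System.out.print("plain sum: ");
        System.out.println(plain.sum(data, 3));      // prints 12
        System.out.print("doubled sum: ");
        System.out.println(doubled.sum(data, 3));    // prints 24
    }
}

class Summer {
    // Instance variables may not have initializers; Java defaults an int field to 0.
    private int calls;

    public int weight(int x) {
        return x;
    }

    public int sum(int[] a, int n) {
        int total = 0;
        int i = 0;
        while (i < n) {
            // weight() is dispatched dynamically, so a subclass can change the result.
            total = total + this.weight(a[i]);
            i = i + 1;
        }
        calls = calls + 1;
        return total;        // return is the last statement of a non-void method
    }

    public int callCount() {
        return calls;
    }
}

class DoubleSummer extends Summer {
    // Overrides (does not overload) the superclass method.
    public int weight(int x) {
        return 2 * x;
    }
}
```

Because the program is ordinary Java, compiling and running it with a Java compiler gives a reference result to compare against the output of the MIPS code your compiler generates.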

miniJava grammar

In the following grammar, all-caps names such as INTEGERLITERAL stand for the corresponding lexical units.

Prog      ::= MainClass ( ClassDeclaration )* EOF 

MainClass ::= "class" Identifier "{" 
                 "public" "static" "void" "main" "(" "String" "[" "]" Identifier ")" "{" 
                    Statement* 
                 "}" 
              "}"
 
ClassDeclaration ::= "class" Identifier ( "extends" Identifier )? "{" 
                         ( VarDeclaration | MethodDeclaration )* 
                     "}" 

VarDeclaration   ::= Modifier Type Identifier ";" 

MethodDeclaration ::= 
       Modifier Type Identifier "(" ( Type Identifier ( "," Type Identifier )* )? ")" 
                           "{" ( Statement )* ("return" Expression ";" )? "}" 

Modifier  ::= ("public" | "private")?

Type      ::=  Identifier ("[" "]" )?

Statement ::=   "{" ( Statement )* "}" 
              | LocalVarDecl
              | Expression ";"
              | "if" "(" Expression ")" Statement ("else" Statement)? 
              | "while" "(" Expression ")" Statement 
              | "for" "(" LocalVarDecl Expression ";" Reference "=" Expression ")" Statement

LocalVarDecl ::= Type Identifier "=" Expression ";"

Expression ::=  Reference
              | Reference "=" Expression
              | Expression BINOP Expression
              | UNOP Expression
              | "new" Identifier ("(" ")" | "[" Expression "]")
              | Literal
              | "(" Expression ")"

Reference  ::= ("this" ".")? Target ("." Target )* 
 
Target     ::=  Identifier 
              | Identifier "[" Expression "]"
              | Identifier "(" ( Expression ( "," Expression )* )? ")" 


Literal    ::=  INTEGERLITERAL
              | STRINGLITERAL
              | BOOLEANLITERAL

Identifier ::= IDENTIFIER 
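The production Expression ::= Expression BINOP Expression is ambiguous as written, so a parser has to apply the Java precedence rules mentioned earlier. One common way to do this is precedence climbing. The following is a minimal sketch of that technique, not a prescribed design: the class name PrecedenceSketch and its methods are hypothetical, tokens are assumed to be separated by whitespace, and the result is a fully parenthesized string rather than an AST. Assignment and method calls are left out.

```java
import java.util.Arrays;
import java.util.List;

class PrecedenceSketch {
    private final List<String> toks;
    private int pos = 0;

    PrecedenceSketch(String src) {
        // Assumption: tokens are whitespace-separated, so no real scanner is needed.
        toks = Arrays.asList(src.trim().split("\\s+"));
    }

    // Binding strength of the miniJava binary operators, following Java's ordering.
    // Returns -1 for anything that is not a binary operator.
    static int prec(String op) {
        switch (op) {
            case "||": return 1;
            case "&&": return 2;
            case "==": case "!=": return 3;
            case "<": case ">": case "<=": case ">=": return 4;
            case "+": case "-": return 5;
            case "*": case "/": return 6;
            default: return -1;
        }
    }

    // Parses a sequence of operands joined by operators of precedence >= minPrec.
    String parseExpr(int minPrec) {
        String left = parseUnary();
        while (pos < toks.size() && prec(toks.get(pos)) >= minPrec) {
            String op = toks.get(pos++);
            // The right operand may only contain tighter-binding operators,
            // which makes every binary operator left-associative.
            String right = parseExpr(prec(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // Unary operators bind tighter than any binary operator.
    String parseUnary() {
        String t = toks.get(pos++);
        if (t.equals("!") || t.equals("-")) {
            return "(" + t + parseUnary() + ")";
        }
        if (t.equals("(")) {
            String inner = parseExpr(1);
            pos++; // skip the closing ")"
            return inner;
        }
        return t; // identifier or literal
    }

    static String group(String src) {
        return new PrecedenceSketch(src).parseExpr(1);
    }

    public static void main(String[] args) {
        // prints (((a + (b * c)) < d) && (!e))
        System.out.println(group("a + b * c < d && ! e"));
    }
}
```

An alternative with the same effect is to rewrite the grammar into one nonterminal per precedence level; precedence climbing simply keeps the table of operator strengths in one place.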


III. MIPS-I Instruction Set and Run-time Support

The MIPS-I instruction set and a simple ABI are described in Appendix A of Patterson and Hennessy, Computer Organization and Design: The Hardware/Software Interface. The SPIM simulator executes MIPS assembler code on a variety of platforms and operating systems, including Windows, and provides a simple GUI for inspecting the simulated machine state.

Execution of Tiger or miniJava programs requires the implementation of a few functions in the standard library to provide some basic functionality including input/output. These functions are available in a form suited for use with SPIM as described here.

An alternative strategy is to compile a version of these functions written in C for the MIPS-I instruction set using a C compiler on an SGI, and to link the result with the object code obtained by assembling (with the SGI assembler) the code generated by your compiler. Note that the calling conventions described in the first paragraph correspond to those used by the GCC C compiler, which by default generates instructions for the MIPS-I instruction set that can be assembled by the SGI assembler. This is the more challenging route, but gives the satisfaction of running your programs natively on SGI hardware.
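For the SPIM route, the following sketch shows the general shape of assembler text a code generator might emit. It is an illustration under stated assumptions, not the required design: the class SpimEmitSketch and its methods are hypothetical, the input is a whitespace-separated postfix expression over integer constants, and intermediate values are kept on the run-time stack rather than in allocated registers. The output relies on SPIM's print_int (code 1) and exit (code 10) system calls and on the assembler pseudo-instructions li and mul.

```java
class SpimEmitSketch {
    // Appends one indented assembler line.
    private static void line(StringBuilder out, String instr) {
        out.append("        ").append(instr).append("\n");
    }

    // Pushes $t0 onto the run-time stack.
    private static void push(StringBuilder out) {
        line(out, "addiu $sp, $sp, -4");
        line(out, "sw    $t0, 0($sp)");
    }

    // Pops the right operand into $t1 and the left into $t0, applies op, pushes the result.
    private static void binary(StringBuilder out, String op) {
        line(out, "lw    $t1, 0($sp)");
        line(out, "lw    $t0, 4($sp)");
        line(out, "addiu $sp, $sp, 8");
        line(out, op);
        push(out);
    }

    // Emits a complete SPIM program that evaluates the postfix expression and prints it.
    static String emit(String postfix) {
        StringBuilder out = new StringBuilder();
        line(out, ".text");
        line(out, ".globl main");
        out.append("main:\n");
        for (String t : postfix.trim().split("\\s+")) {
            switch (t) {
                // addu/subu do not trap on overflow, matching Java's wrap-around ints.
                case "+": binary(out, "addu  $t0, $t0, $t1"); break;
                case "-": binary(out, "subu  $t0, $t0, $t1"); break;
                case "*": binary(out, "mul   $t0, $t0, $t1"); break;
                default:
                    line(out, "li    $t0, " + Integer.parseInt(t));
                    push(out);
            }
        }
        line(out, "lw    $a0, 0($sp)");   // result becomes the print_int argument
        line(out, "addiu $sp, $sp, 4");
        line(out, "li    $v0, 1");        // SPIM system call 1: print_int
        line(out, "syscall");
        line(out, "li    $v0, 10");       // SPIM system call 10: exit
        line(out, "syscall");
        return out.toString();
    }

    public static void main(String[] args) {
        // Emits code for 2 + 3 * 4; the text can be saved to a file and loaded into SPIM.
        System.out.print(emit("2 3 4 * +"));
    }
}
```

Stack-based evaluation like this is easy to get right first; register allocation can then be layered on as one of the optional extensions.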


IV. Schedule and Deliverables

The complete compiler is due Wednesday December 4. A schedule of intermediate milestones follows.

Date      Milestone
Sep 10    Project and team selection
Oct 11    Scanner, parser, and AST construction
Oct 25    Full semantic checking and Intermediate Representation
Nov 15    Code generation
Dec 4     Complete compiler submission

Deliverables

Please submit a short written guide to your compiler containing the following sections:
  1. Scope of your project. Please make clear which basic and optional parts of the project you have implemented. List known limitations of your implementation (it is better for you to identify these than for me to find them). For those submitting a non-standard project, this section will require more work.
  2. Guide to your compiler. Give a short overview of the organization of your compiler. This overview, coupled with comments in the source code, should allow me to locate components of your compiler and understand your implementation. Please provide more detailed information on optional components and special algorithms/implementations you developed.
  3. Runtime environment and sample generated code. Explain your procedure linkage convention and how you interface to the runtime support code. Explain your memory management strategy (if any). Include the complete code generated by the compiler (minus the runtime support code) for a small representative program. For Tiger compilers, the representative program is here. If necessary, add some annotations to the generated code for clarification.
  4. Testing of the compiler. You don't need to show the result of all your tests, but you should explain what testing you performed. For a Tiger compiler, you should exercise your compiler on the tests provided by Appel, although most of these check detection of errors by the compiler, and only merge and queens print a result when executed. Here is a Tiger test program of intermediate complexity to try out. You should also develop your own suite of programs to test the compiler, particularly to demonstrate any optional components you implement. An easy way to document all of this is to describe the kinds of testing you did and to provide access to the tests, annotating each test with a short comment to explain what it is testing.
  5. Accessing the Compiler. Please tell me how I can access your compiler sources, test sources, your executable compiler, and an electronic copy of this document. Please give me a way to run your compiler on test programs. To the extent that it is simple to do, provide all of this in the form of something I can download in toto.

V. Project Evaluation

For the standard project, you are expected to construct a complete compiler for the Tiger or miniJava language. The compiler should, at a minimum, compile correct Tiger or miniJava programs into MIPS-I assembler code that can be executed using the SPIM simulator, while incorrect programs should generate an accurate error report from the lexical, syntax or semantic analyzers.

Additional credit can be earned by implementation of additional language constructs or advanced optimizations. In all cases, implementation should include all syntactic and semantic validity checking (as appropriate), code generation, any necessary run-time support, and a test suite to establish correct operation.

If you work on the project as a two-person team, each member of the team will receive 80% of the score of the complete project so that solo efforts are not overly disadvantaged. It is expected that a two-person team will complete some options beyond the basic project requirements.

The point values represent the maximum score for a feature. The actual score depends on the clarity of the approach, integration into the compiler, completeness, correctness and quality of generated code.

Basic Project Requirements (200 points total)

  1. Basic expressions (30 points)
  2. Block structure and basic statements (40 points)
  3. Procedures and functions (50 points)
  4. Array and Record types (35 points)
  5. Code generation (45 points)

Optional Language Extensions

  1. Add an 8-byte floating point type real to Tiger. Extend the arithmetic operations to real values, and adjust the type checking rules. Real values may be passed to or returned from functions. (40 points)
  2. Determine under what circumstances it is safe and possible to allocate array values in the activation record rather than on the heap. Implement an analysis to locate array allocations that may be stack allocated and adjust your compiler to use this strategy when possible (45 points).
  3. Add a function type to Tiger and implement Fun-Tiger (25 points), or PureFun-Tiger with some optimizations (closure conversion, tail recursion, inlining) (25 points + 20 points per optimization), or Lazy-Tiger (35 points). See Appel, Chapter 15.
  4. Add classes and methods to Tiger and implement Object-Tiger with type checking, single inheritance, and dynamic method invocation (40 points). See Appel, Chapter 14.

Optional Program Optimization Extensions

  1. Register allocation with full coalescing and register spilling (40 points)
  2. Common subexpression elimination within basic blocks (25 points) or intraprocedural (50 points)
  3. Constant folding within basic blocks (15 points) or intraprocedural (30 points)
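
As a point of reference for the constant folding option, the simplest form of the transformation works on a single expression tree. The sketch below shows only that form; it does not propagate constants across statements, which the basic-block and intraprocedural variants would need. The AST classes (Expr, Num, Var, Bin) and ConstantFolder are hypothetical names chosen for this example, not part of any provided framework.

```java
abstract class Expr { }

class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    public String toString() { return Integer.toString(value); }
}

class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    public String toString() { return name; }
}

class Bin extends Expr {
    final char op;
    final Expr left;
    final Expr right;
    Bin(char op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
    public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

class ConstantFolder {
    // Returns an equivalent expression in which every operator whose operands
    // are both integer constants has been evaluated at compile time.
    static Expr fold(Expr e) {
        if (!(e instanceof Bin)) {
            return e; // constants and variables are already as simple as possible
        }
        Bin b = (Bin) e;
        Expr l = fold(b.left);
        Expr r = fold(b.right);
        if (l instanceof Num && r instanceof Num) {
            int x = ((Num) l).value;
            int y = ((Num) r).value;
            switch (b.op) {
                case '+': return new Num(x + y);
                case '-': return new Num(x - y);
                case '*': return new Num(x * y);
                case '/':
                    // Division by zero is left in place so the error occurs at run time.
                    if (y != 0) return new Num(x / y);
                    break;
            }
        }
        return new Bin(b.op, l, r);
    }

    public static void main(String[] args) {
        Expr e = new Bin('+', new Var("x"),
                 new Bin('*', new Num(2), new Bin('+', new Num(3), new Num(4))));
        System.out.println(fold(e)); // prints (x + 14)
    }
}
```

Folding uses the host language's int arithmetic here, which is only valid if it matches the arithmetic of the generated MIPS code (32-bit wrap-around in this sketch).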

Optional Compiler Enhancements

  1. Error recovery. Enable the compiler to continue analysis in a reasonable fashion in the presence of lexical, syntactic or semantic errors (40 points).