Workshop on Compiler Construction with Introduction to GCC

Uday Khedker and Amitabha Sanyal
Department of Computer Science & Engg., IIT Bombay


About the workshop
Pedagogy Contents Pre-requisites Philosophy behind the proposed pedagogy


About the Workshop

This is a one-week workshop primarily aimed at teachers in various colleges. This workshop is organized under the aegis of the GCC Resource Center of the Department of Computer Science and Engineering of IIT Bombay.

We can conduct this course at any local venue provided there is a local organizer who is willing to make the arrangements. Basically, we need a class room and associated lab with machines running a contemporary version of Linux supporting the following packages. Other than these, the arrangements include the usual lodging and boarding required for us, our teaching assistants, and the participants. If you are interested in organizing this workshop, please contact us.

Packages required for conducting the workshop are:


Proposed Pedagogy

Modern  day  compilers  are,   to  a  large  extent,  automatically generated  rather  than  hand-crafted.   This workshop  will  provide hands-on  experience  in  building  compilers through  the  use  of  compiler generation tools.  While the tools for generating compile front-ends, lex  and yacc, are now standard  textbook material, the workshop  will also  include  the use  of  a tools  called iburg  for generating back-ends.

The main objective of this workshop is to teach how to build a compiler incrementally such that:
Please see the philosophy behind this pedagogy for a justification.  The basic motivation behind this approach is to be able to strike a good balance between theory and practice and make the theory follow the practice instead of the other way round which is what happens in traditional courses.
 

Our choice of source language is loosely based on C while the target machine is MIPS as defined by the spim simulator (some sample spim programs collected from various resources available on the net). I have used spim7.3 version.

The emphasis of the workshop will be on laboratory exercises in which participants will be required to build a compiler in a series of steps. Starting with a very simple language, each step will end in a working compiler for a language, and the subsequent step will start by augmenting the previous language by adding a few more features. Lectures in the workshop will be used to provide  the conceptual background regarding different aspects of compiler construction.

In  general, the  delivery plan  for  the workshop  will consists  of several rounds of:


Contents of the Workshop

Theory
Practice
  1. Scanning. Expressing tokens through regular expressions, converting regular expressions to DFA, generating scanners from specifications, structure of generated scanners, efficient representation of scanning tables.
  2. Parsing. Expressing syntactic structure through grammars. LR parsing theory, generating parsers from specifications.
  3. Static Semantics. Using attributes for (i) type checking (ii)  calculating variable offsets (iii) Generating abstract syntax trees.
  4. Runtime Environments. Activation records, function prologues and epilogues for allocation and deallocation of activation records.
  5. Code Generation. Optimal code generation for expression trees. Generating code generators from specifications. Simple register allocation.
  6. Introduction to GCC. The compiler generation framework. Retargetability model. Inspecting intermediate represenations of program.
Using  the  compiler generation  tools lex, yacc , and iburg for generating MIPS code accepted by the spim simulator for the following language features:
  1. Constant expressions
  2. Expressions involving variables
  3. Assignment statements
  4. Array references
  5. Declarations
  6. Function calls
These experiments will be preceded by lab exercises on writing simple interpreters.


Tentative Schedule


Here is a tentative list of topics. The schedule will be based on the proposed coverage but may change dynamically to suit the needs of the workshop and depends on the way the workshop progresses. The lectures and lab work will be interleaved and this interleaving may be decided dynamically based on the specific needs of the audience.


Day
Compilation Concepts Covered
Lab exercises
1
  1. Structure of compilers and interpreters
  2. Introduction to scanning and parsing though lex and yacc
  1. Assembly language programming using spim.
  2. Evaluation (i.e. interpretation) of constant expressions involving arithmetic operations.
2
  1. Scanning theory
  2. Introduction to attribute evaluation and symbol tables
  3. Introduction to name and scope analysis, intermediate code generation, abstract syntax trees, three address codes, instruction selection through iburg, local register allocation, activation records and stack allocation for local variables, emitting assembly code for spim.

Evaluation (i.e. interpretation) of
  1. Expressions with variables and assignments to variables.
  2. Conditional expressions involving relational and boolean operations.
3
  1. Parsing theory.
  2. Introduction to declaration processing and type analysis.
Compilation of
  1. Constant expressions involving arithmetic operations.
  2. Expressions with variables and assignments to variables.
  3. Boolean expressions involving relational operations on integer expressions.
4 Run-time environment support provided by a compiler.
Declarations of integer and boolean data types.
5
Theory of instruction selection
Compilation of
  1. Control flow constructs if-then, if-then-else, while-do etc.
  2. Conditional expressions using control flow constructs.
6
Theory of instruction selection
Compilation of arrays
7
Introduction to gcc
Compilation of function declarations and calls, parameter passing.


Pre-requisites

Good knowledge of C and familiarity with Unix/Linux.

The philosophy behind this approach

This approach is very different from the traditional approach of teaching compiler construction phase-wise. The traditional phase-wise approach first covers scanning in full details,  then parsing, then semantic analysis etc. The problem with such an approach is that a student does not get a feel of the complete compiler or the overall process of compilation until very late in the course. While there are many small compilers (eg. in the "dragon back" by Aho, Sethi, and Ullman) can be used to expose the students to overall compilation, they are not scalable. So students end up seeing a lot of theory piece-wise but do not develop a feel of  the overall compilation process. (In principle, the phases have very well defined simple interfaces, but to get its feel through an implementation which allows experimentation is a different matter  altogether.) By  time the students are ready to put the tools based on this theory to non-trivial use, it is already too late in the semester. Further, instruction selection phase remains more of a theory and very little practice gets across to the students. The proposed approach eliminates these drawbacks and bring the tools very early in the course. This is possible because the tools like lex, yacc, and iburg can be used without knowing the theory tool. An added advantage of this approach is that it serves as a motivation for the students to learn the theory by raising their curiosity.

Note that the basic motivation behind this approach is to be able to strike a good balance between theory and practice. The hope is that by bringing the practice upfront as a driving force a better balance can be achieved compared to the current offerings which are perceived as being biased towards theory by students.