Workshop on Compiler Construction with
Introduction to GCC
Khedker and Amitabha Sanyal
Department of Computer Science
Engg., IIT Bombay
This is a one-week workshop primarily
aimed at teachers in various
colleges. This workshop is organized under the aegis of the GCC Resource Center of
the Department of Computer
Science and Engineering of IIT
We can conduct this course at any local venue provided there is a local
organizer who is willing to make the arrangements. Basically, we need a
class room and associated lab with machines running a contemporary
version of Linux supporting the following packages. Other than these,
the arrangements include the usual
lodging and boarding required for us, our teaching assistants, and the
participants. If you are interested in organizing this workshop, please
Packages required for conducting the workshop are:
- The usual packages/commands available on a contemporary
distribution of Linux
gcc, lex/flex, yacc/bison,
and postscipt viewer
- Extra packages
Modern day compilers are, to
a large extent, automatically generated
rather than hand-crafted. This workshop
will provide hands-on experience in
building compilers through the use of
compiler generation tools. While the tools for
generating compile front-ends,
yacc, are now
standard textbook material, the workshop will also
use of a tools called
iburg for generating
The main objective of this workshop is to teach how to build a compiler
incrementally such that:
Please see the philosophy
behind this pedagogy for a justification. The basic
motivation behind this approach is to be able to
strike a good balance between theory and practice and make the theory
follow the practice instead of the other way round which is what
in traditional courses.
- Each steps supports new source
language features rather than a new phase in the compiler. In
other words, each steps results in a complete working compiler which
generates code which can be executed. We begin with a very small language consisting
merely of expression.
- Tools like
are used before the theory behind scanning, parsing, and instruction
selection is covered.
Our choice of source language is loosely based on C while the target
machine is MIPS as defined by the
spim programs collected from various resources available on the
The emphasis of the workshop will be on laboratory exercises in
participants will be required to build a compiler in a series of steps.
Starting with a very simple language, each step will end in a working
compiler for a language, and the subsequent step will start by
augmenting the previous language by adding a few more features.
Lectures in the workshop will be used to provide the conceptual
background regarding different aspects of compiler construction.
In general, the delivery plan for the
workshop will consists of several rounds of:
- Lectures to introduce (certain aspects) of the tools.
- Laboratory exercises to use the tools to implement a
language with a given set of features.
- Lectures that will utilize the experience gained in the
laboratory exercises to provide the theoretical basis of compiler
Expressing tokens through regular expressions, converting regular
expressions to DFA, generating scanners from specifications, structure
of generated scanners, efficient representation of scanning tables.
Expressing syntactic structure through grammars. LR parsing theory,
generating parsers from specifications.
Semantics. Using attributes for (i) type checking (ii)
calculating variable offsets (iii) Generating abstract syntax trees.
Environments. Activation records, function prologues and
epilogues for allocation and deallocation of activation records.
Generation. Optimal code generation for expression trees.
Generating code generators from specifications. Simple register
to GCC. The compiler generation framework. Retargetability
model. Inspecting intermediate represenations of program.
|Using the compiler
for generating MIPS code accepted by the
spim simulator for
the following language features:
These experiments will be preceded by lab exercises on writing simple
- Constant expressions
- Expressions involving variables
- Assignment statements
- Array references
- Function calls
Here is a tentative list of topics. The schedule will be based on the
proposed coverage but may change dynamically to suit the needs of the
workshop and depends on the way the workshop progresses. The lectures
and lab work will be
interleaved and this interleaving may be decided dynamically based on
the specific needs of the audience.
- Structure of compilers and interpreters
- Introduction to scanning and parsing though lex and yacc
- Assembly language programming using spim.
- Evaluation (i.e. interpretation) of constant expressions
involving arithmetic operations.
- Scanning theory
- Introduction to attribute evaluation and symbol tables
- Introduction to name and scope analysis, intermediate code
generation, abstract syntax trees, three address codes, instruction
selection through iburg, local register allocation, activation records
and stack allocation for local variables, emitting assembly code for
Evaluation (i.e. interpretation) of
- Expressions with variables and assignments to variables.
- Conditional expressions involving relational and
- Parsing theory.
- Introduction to declaration processing and type analysis.
- Constant expressions involving arithmetic operations.
- Expressions with variables and assignments to variables.
- Boolean expressions involving relational operations on
environment support provided by a compiler.
|Declarations of integer and
boolean data types.
- Control flow constructs if-then, if-then-else, while-do etc.
- Conditional expressions using control flow constructs.
|Compilation of arrays
|Compilation of function
declarations and calls, parameter passing.
Good knowledge of C and familiarity with Unix/Linux.
philosophy behind this approach
This approach is very different from the
approach of teaching compiler construction phase-wise. The traditional
phase-wise approach first
covers scanning in full details, then parsing, then semantic
analysis etc. The problem with such an approach is that
does not get a feel of the complete compiler or the overall process of
until very late in the course. While there are many small compilers
(eg. in the "dragon back" by Aho, Sethi, and Ullman) can be used to
expose the students to overall compilation, they are not
scalable. So students end up seeing a lot of theory piece-wise but do
not develop a feel
of the overall
compilation process. (In principle, the phases
have very well defined simple interfaces, but to get its feel through
which allows experimentation is a different matter altogether.)
By time the students are ready to put the tools based on this
non-trivial use, it is already too late in the semester. Further,
instruction selection phase
remains more of a theory and very little practice gets across to the
students. The proposed approach eliminates these drawbacks and
bring the tools very early in the course. This is possible because the
Note that the basic motivation behind this approach is to be able to
strike a good balance between theory and practice. The hope is that by
bringing the practice upfront as a driving force a better balance can
be achieved compared to the current offerings which are perceived as
being biased towards theory by students.
can be used without
knowing the theory tool. An added advantage of this approach is that it
serves as a motivation for the students to learn the theory by raising