Name of the course: Design and Implementation of the Gnu Compiler Generation Framework
Level of the course: This is a course to be offered to M. Tech. students.
Prerequisite: UG level compilers and architecture course
Structure of the course: Lectures 0, Tutorials 1.5, Practicals 3, Credits 6.
Motivation :
Due to rapid advancements in the nature and number of programming languages, operating systems and architectures the gap between the theory of compiler construction taught in standard courses and the practice of development of industry strength compilers has been increasing steadily. This course is aimed at demonstrating the application of various compilation techniques in an industry strength production quality compiler which is in a wide-spread use. GCC is a natural choice for this course as it is one of the most widely use compiler generation framework that has been used to build compilers for several dozen architectures and many operating systems. The main takeaways of the course are:
A student gets to see a practical application of the theory in a compiler that is used by the student often.
A student acquires the skill of understanding and modifying a large code base systematically.
A student will be able to contribute to open source movement by contributing to GCC.
Objective of the course:
The course will study GCC from the view point of the impact of retargetability on the design decisions related to the architecture of the compiler. The main learning is through a series of controlled experiments to be conducted on the GCC sources. Each experiment is designed to bring out specific features of different phases of the compiler.
Course Contents:
A review of compilation phases, overview of GCC architecture, the impact of retargetability.
AST and Gimple IRs in GCC, AST to Gimple translation, machine independent optimizations on Gimple IR
RTL IR, Gimple to RTL Translation, Retargetability Issues, GCC target representation architecture details (C preprocessor macros, instruction set semantics representation, MD-RTL language to capture target instruction semantics at compiler construction time, target specific support system, incorporating target into the IR-RTL language to express target instruction semantics at compilation time), build time conversion of MD-RTL to IR-RTL.
Instruction emission (RTL to target assembly translation)
Experiments:
The experiments involve building GCC, experiments on AST, Gimple, RTL, and machine descriptions.
Texts/Reference:
Brian J. Gough. An Introduction to GCC. Network Theory Ltd, 2005.
Course material of GCC workshop conducted by IIT Bombay.
Instructors who could be interested in offering this course: Uday Khedker, S. Biswas, A. Sanyal
Additional Information about the course, Sample Experiments:
These are for internal use and are not required to be sent to the academic office.
A) Building GCC, and initial experiments
A.1) Download GCC version
4.0.x from IITB FTP server and build and install the compiler in your
home directory. Ensure that the compiler binary does not clash with
the system compiler and that you can use it from anywhere in your
file system/home directory.
A.2) Find out
the switches of the compiler that can be used to request GCC dump
AST, Gimple, and RTL representations. Study the
dumps for compilation of following sample C programs. Relate the
dumps to the C programs.
a) An empty main().
b) Global integer variables used in simple arithmetic expressions.
c) Local variables instead of globals in the program in (b).
d) Other primitive data types, along with qualifiers.
e) structs of simple integer variables.
f) Branching constructs in C: "if-then-else", "switch-case".
g) Looping constructs: "while-do", "for".
h) Function calls of various number of arguments.
A.3) From the RTL
dumps, obtain the layout of the activation record.
B) AST
B.1) From the
raw tree dumps, find the subtrees of various constructs (as specified
by the instructor) in the input program (e.g. subtree of "int
a;",or "a_function_call();" etc.) and report the
information contained in the root nodes of these subtrees.
C) Gimple
Unless otherwise stated, we are concerned with the following optimizations in the Gimple pass: alias analysis, useless statements removal, partial redundancy elimination, loop unrolling, and dead code elimination
C.1) From the source code of GCC, list all the possible optimizations that the compiler performs on a program. Compare this with the actual optimizations performed for a set of given programs. Explain any differences observed.
C.2) Change the order of pass execution in GCC and
determine the impact on compilation of a few programs.
C.3)
Add a "null" optimization pass that merely announces the
entry into and exit from the pass.
C.4) To the "null" pass above, add code that would traverse the Gimple (CFG) data structure and generate some simple statistics, like the number ofdata types used in the program,number of global variables (total and per data type), of the program being compiled.
D) RTL
Unless
otherwise stated, we are concerned with the following optimizations
in the RTL pass: Global CSE, instruction combiner and instruction
scheduling.
D.1) List all the optimizations that GCC can
perform on the RTL representation.
D.2) Experiment with
changing the order of optimization until the generated code is
different. Is there a minimal change with maximal impact on the
generated code? Is the generated code "correct"?
D.3)
Write an interpreter for SPIM/i386 assembly for simple arithmetic
expressions in C. How have you represented the semantics of the
assembly instructions internally?
E) GCC Machine Description
E.1) Build a
cross compiler for sparc or mips processors.
E.2) Add a
minimal machine description of a new back end.
E.3)
Given a minimal machine description for SPIM, add support for
compiling integer arithmetic expressions with basic operators in
C.
E.4) For three architectures: i386,
Sparc and Mips:
a) Find the differences in the architecture details for the register file, instruction set etc.
b) List the impact of these differences on compilation issues like activation record construction, register allocation,
and instruction scheduling.
c) Study the implementation of the macros for these issues in the respective machine descriptions.
d) Use the insights to dedicate register "r0" of SPIM for use in data move operations that have a "0" as an argument.
E.5) Implement the data movement
operations in SPIM.
E.6) Implement the SPIM data movement
operations in more than one way.