Name of the course: Design and Implementation of the Gnu Compiler Generation Framework

Level of the course: This is a course to be offered to M. Tech. students.

Prerequisite: UG level compilers and architecture course

Structure of the course: Lectures 0, Tutorials 1.5, Practicals 3, Credits 6.

Motivation :

Due to rapid advancements in the nature and number of programming languages, operating systems and architectures the gap between the theory of compiler construction taught in standard courses and the practice of development of industry strength compilers has been increasing steadily. This course is aimed at demonstrating the application of various compilation techniques in an industry strength production quality compiler which is in a wide-spread use. GCC is a natural choice for this course as it is one of the most widely use compiler generation framework that has been used to build compilers for several dozen architectures and many operating systems. The main takeaways of the course are:

Objective of the course:

The course will study GCC from the view point of the impact of retargetability on the design decisions related to the architecture of the compiler. The main learning is through a series of controlled experiments to be conducted on the GCC sources. Each experiment is designed to bring out specific features of different phases of the compiler.

Course Contents:

  1. A review of compilation phases, overview of GCC architecture, the impact of retargetability.

  2. AST and Gimple IRs in GCC, AST to Gimple translation, machine independent optimizations on Gimple IR

  3. RTL IR, Gimple to RTL Translation, Retargetability Issues, GCC target representation architecture details (C preprocessor macros, instruction set semantics representation, MD-RTL language to capture target instruction semantics at compiler construction time, target specific support system, incorporating target into the IR-RTL language to express target instruction semantics at compilation time), build time conversion of MD-RTL to IR-RTL.

  4. Instruction emission (RTL to target assembly translation)

Experiments:

The experiments involve building GCC, experiments on AST, Gimple, RTL, and machine descriptions.

Texts/Reference:

Instructors who could be interested in offering this course: Uday Khedker, S. Biswas, A. Sanyal




Additional Information about the course, Sample Experiments:

These are for internal use and are not required to be sent to the academic office.

A) Building GCC, and initial experiments

A.1) Download GCC version 4.0.x from IITB FTP server and build and install the compiler in your home directory. Ensure that the compiler binary does not clash with the system compiler and that you can use it from anywhere in your file system/home directory.

A.2) Find out the switches of the compiler that can be used to request GCC dump AST, Gimple, and RTL
representations. Study the dumps for compilation of following sample C programs. Relate the dumps to the C programs.

a) An empty main().

b) Global integer variables used in simple arithmetic expressions.

c) Local variables instead of globals in the program in (b).

d) Other primitive data types, along with qualifiers.

e) structs of simple integer variables.

f) Branching constructs in C: "if-then-else", "switch-case".

g) Looping constructs: "while-do", "for".

h) Function calls of various number of arguments.


A.3) From the RTL dumps, obtain the layout of the activation record.

B) AST

B.1) From the raw tree dumps, find the subtrees of various constructs (as specified by the instructor) in the input program (e.g. subtree of "int a;",or "a_function_call();" etc.) and report the information contained in the root nodes of these subtrees.


C) Gimple

Unless otherwise stated, we are concerned with the following optimizations in the Gimple pass: alias analysis, useless statements removal, partial redundancy elimination, loop unrolling, and dead code elimination

    C.1) From the source code of GCC, list all the possible optimizations that the compiler performs on a program. Compare this with the actual optimizations performed for a set of given programs. Explain any differences observed.

    C.2) Change the order of pass execution in GCC and determine the impact on compilation of a few programs.

    C.3) Add a "null" optimization pass that merely announces the entry into and exit from the pass.

    C.4) To the "null" pass above, add code that would traverse the Gimple (CFG) data structure and generate some simple statistics, like the number ofdata types used in the program,number of global variables (total and per data type), of the program being compiled.


D) RTL

Unless otherwise stated, we are concerned with the following optimizations in the RTL pass: Global CSE, instruction combiner and instruction scheduling.

D.1) List all the optimizations that GCC can perform on the RTL representation.

D.2) Experiment with changing the order of optimization until the generated code is different. Is there a minimal change with maximal impact on the generated code? Is the generated code "correct"?

D.3) Write an interpreter for SPIM/i386 assembly for simple arithmetic expressions in C. How have you represented the semantics of the assembly instructions internally?


E) GCC Machine Description

E.1) Build a cross compiler for sparc or mips processors.

E.2) Add a minimal machine description of a new back end.

E.3) Given a minimal machine description for SPIM, add support for compiling integer arithmetic expressions with basic operators in C.

E.4) For three architectures: i386, Sparc and Mips:

a) Find the differences in the architecture details for the register file, instruction  set etc.

b) List the impact of these differences on compilation issues like activation record construction, register allocation,

and instruction scheduling.

c) Study the implementation of the macros for these issues in the respective machine descriptions.

d) Use the insights to dedicate register "r0" of SPIM for use in data move operations that have a "0" as an argument.


E.5) Implement the data movement operations in SPIM.

E.6) Implement the SPIM data movement operations in more than one way.