GCC Code Carnival

GCC Code Carnival is a programming contest targeted at GCC users. We have floated various projects based on our research and development. These have been divided into open projects with high level specifications and projects with concrete and detailed specifications. This will give you hands on experience with GCC and will open your doors to many opportunities.  We have provided preliminary descriptions of the projects here.  We will provide more detailed descriptions in the workshop (and also update these pages). We will create a process of registration for the contest, will decide on the mode of submission and associated timelines and announce them during the workshop. The submissions will be judged for the efficiency and clarity of code. Good quality submissions are likely to be rewarded by cash awards.

We may add more projects and update project specifications. It is in general not possible to create rules to foresee all possible situations and we reserve the right to take decisions based on the situations and alter the rules in the best interest of the spirit of the contest.

    Open Projects with High Level Specification

  • Simplifying Machine Descriptions by Eliminating C Code
    This project aims to simplify machine description of GCC back-end. A machine description file (*.md file) for a target machine contains patterns describing the instruction of the target machine. Some of these patterns contain large portions of C code to customize the instructions. To reduce the size and improve the readability of the machine description files, we propose to specify the customizations by adding patterns that customize instructions using predicates and constraints, instead of specifying in terms of C code. This conversion can be made more effective if we are able to handle constraints at the time of expansion.

    We will provide the changes to GCC source code (if any) and sample changes in i386 and spim machine descriptions. You are expected to rewrite some machine descriptions; our choice is i386 but you are free to choose some other machine descriptions too. The rewritten machine descriptions should be validated by comparing the generating assembly programs for the original and re-written machine descriptions.

  • Dynamic Loading of Functions for LTO
    Link Time Optimization (LTO) gives GCC the capability of dumping its internal representation (GIMPLE) to disk, so that all the different compilation units that make up a single executable can be optimized as a single module. This expands the scope of inter-procedural optimizations to encompass the whole program (or, rather, everything that is visible at link time).

    A full functional LTO was included in GCC 4.6.0. When using LTO, gcc writes the intermediate representation into the object file. At link time, gcc reads it in and uses it to implement cross-module optimizations. Currently, GCC supports LTO mode for inlining. It is possible for a developer to write an interprocedural pass that uses LTO facility of reading a large call graph consisting of all functions whose GIMPLE IR is available at link time. It is possible to read the call graph with summary information for the functions and avoid loading the entire function bodies. However, if an analysis requires examining function bodies then LTO framework loads bodies of ALL functions. This increases the memory consumption significantly for large programs.

    This project aims at providing the facility of dynamic loading and unloading of function bodies. Then it would be possible for an analysis to avoid reading the entire program all at the same time and bring the function bodies in memory on demand. This would have to be done efficiently.

    We will provide a report describing our study and explain the requirements. You are expected to implement APIs for dynamic loading and unloading of the control flow graph of a given function. You must show that this reduces the memory foot print compared to the situation when all function bodies are loaded together.

  • Improving Liveness-based Pointer Analysis
    Precise flow- and context- sensitive pointer analysis is generally considered prohibitively expensive for large programs; most tools relax one or both of the requirements for scalability. We have implemented liveness-based flow- and context- sensitive pointer analysis (L-FCPA) interprocedural points-to analysis in GCC-4.6.0. It computes points-to information for only live pointers and limits the propagation of pointer information to their live ranges. Interprocedural context-sensitivity is achieved through a value-based termination of call strings.

    We are in the process of changing the data structures by hiding them behind well defined APIs (see the corresponding project with detailed specifications). In this project we expect you to study the code and find other sources of improvement. Some of these could be towards improving the reach of the analysis by enriching it (eg. better handling of field sensitivity, pointers to heap, function pointers and pointer arithmetic) while some other could improve the time or space efficiency of the method.
    Projects with Concrete and Detailed Specifications

  • Improving Data Structures of Liveness-based Pointer Analysis
    Our liveness based pointer analysis currently uses linked lists for storing pointer information. Searching over large linked lists is restricted to linear search. This proves to be a bottleneck while compiling large benchmarks. We are in the process of changing the data structures by hiding them behind well defined APIs. We have a couple of naive implementations of these APIs.

    In this project we expect you to provide efficient implementation of the APIs (eg. using HashMaps or some other efficient data structures). We will provide the refactored code with APIs. 

  • Rewriting Machine Descriptions using SpecRTL
    The current machine descriptions used by GCC for target specifications are very complex, non-composable and repetitive in nature. We have devised a specification language for RTL templates (SpecRTL) occurring in machine descriptions. specRTL  provides a mechanism for combining RTL patterns to form a new pattern; thereby reducing the repetition. Machine descriptions written using specRTL are simpler and hence easy to read, construct, and maintain. For integrating specRTL with GCC, we have a specRTL compiler to convert .md files written in specRTL to generate conventional .md files. The most convenient feature of this conversion is that it is incremental and non-disruptive. It is possible to re-write a small group of patterns in specRTL retaining the rest of the .md file in the conventional form. This can be compiled using specRTL and gcc can be built and tested. Once the results seem satisfactory, one can rewrite another group of patterns.

    This project aims at converting existing machine description files to SpecRTL language. We have already done this for a significant part of i386 machine description file. You are required to rewrite the remaining part. You are also free to choose other machine descriptions. We will provide the specRTL compiler.