A History of Sclp


The current version of sclp is preceded by many earlier versions. Its immediate predecessor, cfglp, was implemented by Tanu Kanvar and Uday Khedker (http://www.cse.iitb.ac.in/~uday) for the courses cs302+cs316 Language Processors (theory and lab) at IIT Bombay in 2012. Its input was the control flow graph (cfg) representation of GIMPLE produced by GCC. Its front end used flexc++ and bisonc++. It was tweaked in many ways and updated by a host of students, but none of those updates brought a significant change. The most significant changes began in 2019 and are described below.

  1. In 2019, N Venkatesh made many changes that prompted changing the name from cfglp to sclp.
    1. Its input language was changed from cfg representation to a subset of C.
    2. The front end was revised to replace flexc++ and bisonc++ with flex and bison. Flexc++ and bisonc++ generate a C++ scanner/parser. However, the interaction between the parser, the scanner, and the rest of the code was non-trivial, and migrating the implementation from Ubuntu 12.04 to later versions was a pain. By contrast, flex and bison generate a C scanner/parser that can use C++ code in its actions. Hence we started using flex and bison. The resulting code compiles easily with g++, and its migration to later versions of Ubuntu has not been an issue.
    3. Since the input language was no longer constrained by the dump of GCC, we added a print operation for printing the values of variables.
  2. In 2020, there were many updates to sclp.
    1. Nisha Biju and Saari Rajan helped Uday Khedker with the following:
      1. Instead of generating RTL (earlier called icode) directly from the AST, a Three Address Code (TAC) IR was introduced and RTL code was generated from TAC.
      2. String data type with constant strings was introduced.
      3. A read operation was added for integer and floating point variables, and the print operation was extended to all variables (including string variables) and string constants.
      4. Significant changes were made in command line switches for printing each representation (tokens, AST, TAC, RTL, and assembly) as well as stopping after each phase (scanning, parsing, AST construction (i.e. semantic analysis), TAC generation and RTL generation).
      5. The source was engineered to produce a compiler for a given combination of language level (L1, L2, L3, L4, L5) and compilation phase (scan, parser, ast, tac, rtl, asm) by taking the combination as the input. The earlier version of sclp maintained a separate code base for each of a fixed set of combinations of language level and compilation phase. This was problematic because a bug fix or enhancement in the code base of one combination had to be manually implemented in the code bases of the other combinations.
    2. Nicky Nagdev, Manali Wani, and Samruddhi Hardas introduced the generation of a control flow graph over the TAC IR. They also introduced the construction of the SSA form of the IR.
    3. Rasesh Tongia and Mehul Jain implemented data flow analysis passes for available expressions analysis and live variables analysis. They also implemented these analyses at the interprocedural level using value contexts.
    4. The following changes, made on the then existing code, have not yet been ported to the new version because of continuous changes in the way the code is engineered.
      1. Saari Rajan re-implemented an interpreter for the AST because the change in the input language had altered the ASTs substantially.
      2. Mansi Shinde, Simran Moondra, Aditi Sathiyapalan, and Sejal Shroff extended the front end to support a restricted form of classes.
  3. In 2021, sclp underwent the following updates:
    1. Uday Khedker extended the support for pointers and arrays. AST support for structures was also created. The parser still needs to be enhanced to support structures, along with declaration processing and type checking.
    2. Michelle Thalakottur, Kinjal Parikh, Pranathi Bora, and Ayushi Jain rewrote the assembly generation pass by creating an explicit data structure to store the assembly IR and cleaned up some design decisions in the implementation.
    3. Aditya Pradhan made the following important contributions:
      1. A major limitation of sclp was that each phase depended on the earlier phases, so if a student had trouble in one assignment, all subsequent assignments were severely limited. Hence sclp was extended to support import and export of the AST, TAC, and RTL IRs as JSON objects. This allows a standalone implementation of a phase by importing the JSON object of the previous IR. Students can obtain these JSON objects by running the reference implementation, whose executable form is made available to them.
      2. Since the number of passes was growing, a pass manager, inspired by GCC's pass manager, was introduced. The pass manager allows passes to be categorized as mandatory or optional, and allows their preconditions and prerequisite passes to be defined. This makes it much easier to introduce new passes.
      3. The specification of command line options was rationalized and re-implemented using the argp.h library.
This is how we have reached the current version.