The current version of sclp was preceded by many earlier versions.
Its immediate predecessor, cfglp, was implemented by Tanu Kanvar and
Uday Khedker (http://www.cse.iitb.ac.in/~uday) for the courses
cs302+cs316 Language Processors (theory and lab) at IIT Bombay in
2012. Its input was the control flow graph (cfg) representation of
GIMPLE produced by GCC. Its front end used flexc++ and
bisonc++. It was tweaked in many ways and updated by a host of
students. However, none of the updates brought a significant change.
The most significant changes began in 2019 and are
described below.
- In 2019, N Venkatesh made many changes that prompted changing
the name from cfglp to sclp.
- Its input language was changed from the cfg
representation to a subset of C.
- The front end was revised to replace flexc++ and
bisonc++ by flex and bison. Flexc++ and bisonc++ generate a
C++ scanner/parser. However, the interaction between the
parser and the scanner and the rest of the code was
non-trivial and migrating the implementation from Ubuntu 12.04
to later versions was a pain. By contrast, flex and bison
generate a C scanner/parser that uses C++ code in
actions. Hence we started using flex and bison. The
resulting code easily compiled with g++ and its migration to
later versions of Ubuntu has not been an issue.
- Since the input language was no longer constrained
by the dump of GCC, we added a print operation for printing
the values of variables.
- In 2020, there were many updates to sclp.
- Nisha Biju and Saari Rajan helped Uday Khedker with
the following:
- Instead of generating RTL (earlier called icode)
directly from AST, a Three Address Code (TAC) IR was
introduced and RTL code was generated from TAC.
- String data type with constant strings was
introduced.
- A read operation was supported for integer and
floating point variables, and the print operation was extended
to all variables (including string variables) and string
constants.
- Significant changes were made in command line
switches for printing each representation (tokens, AST, TAC,
RTL, and assembly) as well as stopping after each phase
(scanning, parsing, AST construction (i.e. semantic
analysis), TAC generation and RTL generation).
- The source was engineered to produce a compiler for
a given combination of language level (L1, L2, L3, L4, L5)
and compilation phase (scan, parser, ast, tac, rtl, asm) by
taking the combination as the input. The earlier version of
sclp maintained a separate code base for each of a fixed set of
combinations of language level and compilation phase. This was
problematic because a bug fix or enhancement in the code
base of one combination had to be manually implemented in
the code bases of the other combinations.
- Nicky Nagdev, Manali Wani, and Samruddhi Hardas
introduced generation of control flow graph over the TAC IR.
They also introduced the construction of SSA form of the IR.
- Rasesh Tongia and Mehul Jain implemented data flow
analysis passes for available expressions analysis and live
variables analysis. They also implemented these analyses at
the interprocedural level using value contexts.
- The following changes, made on the then existing
code, have not yet been ported to the new version because of
continuous changes in the way the code is engineered:
- Saari Rajan re-implemented an interpreter for ASTs
because of substantial changes in the ASTs caused by the
change in the input language.
- Mansi Shinde, Simran Moondra, Aditi Sathiyapalan,
and Sejal Shroff extended the front end to support a
restricted form of classes.
- In 2021, sclp underwent the following updates:
- Uday Khedker extended the support for pointers and
arrays. AST support for structures was also created. The
parser still needs to be enhanced to support structures, along
with declaration processing and type checking.
- Michelle Thalakottur, Kinjal Parikh, Pranathi Bora,
and Ayushi Jain rewrote the assembly generation pass by
creating an explicit data structure to store the assembly IR
and cleaned up some design decisions in the implementation.
- Aditya Pradhan made the following important
contributions:
- A major limitation of sclp was that each
phase was dependent on the earlier phases, so if a student
had trouble in one assignment, all subsequent assignments
were severely limited. Hence sclp was extended to support
import and export of AST, TAC, and RTL IRs as JSON objects.
This allows a standalone implementation of a phase by
importing the JSON object of the previous IR. These JSON
objects can be obtained by the students by running the
reference implementation whose executable form is made
available to the students.
- Since the number of passes was growing, a pass
manager, inspired by GCC's pass manager, was introduced. The
pass manager allows passes to be categorized as mandatory
or optional, and their preconditions and prerequisite passes
to be defined. This makes it much easier to introduce new
passes.
- The specification of command line options was
rationalized and re-implemented using the argp.h library.
This is how we have reached the current version.