Language supported by Sclp

The language supported by sclp is based on C with several simplifications and some enrichments. We describe it in terms of data types, values, operations on them, and data and control constructs. This description is based on the current design. Both the language and sclp are works in progress and will be updated later.

Data types, values, and operations on values

sclp supports integers, floating point numbers, boolean values, and strings:

Integer (int) and floating point (float) numbers have the usual semantics in C (4 byte values for integers and 8 byte values for floating point numbers). They appear as constants in programs and new values can be computed using the following arithmetic operators: ternary operator ?: (conditional expression); binary operators +, -, *, /; and unary operator -. They can be compared using the comparison operators <, <=, >, >=, !=, and ==. These values can be read from standard input and can be printed on standard output. The floating point numbers appearing in the program are accepted only in decimal point format. However, the spim input-output support accepts a number without a decimal point as the input value for a floating point variable.

Unlike C, integers are not interpreted as boolean values. Instead there is a type called bool, and boolean values are computed using the comparison operators. They can be assigned to variables and can be operated on by the boolean operators &&, ||, and !. Note that boolean values do not have a concrete syntax in programs, and thus we cannot use true or false as constants in a program. (This design choice has been made to keep the number of classes in the sclp implementation small, primarily for rolling out a reference implementation on time; it may change later.) If true and false values are required, they can be computed by assigning comparison expressions such as 3>5 (for the false value) and 3<5 (for the true value) to variables declared as bool. These values cannot be read from standard input or printed on standard output.

The strings (string) supported by sclp appear as sequences of characters in a pair of double quotes ("). There are no operators on strings (nor any library functions) to compute strings or compare them. They can be assigned to variables and can be passed around from a variable (declared as string) to another through assignments. They cannot be read from standard input but can be printed on standard output. Since there are no operations on strings, they are not null terminated. Besides, there is no escape character and hence a double quote cannot appear within a string.

Language Constructs

At the moment, the language supports scalar values (int, float, bool, string), pointers and multi-dimensional arrays. Pointers can point only to scalars and arrays can contain only scalars. There are no user defined types. Pointers and arrays are written in the usual C syntax.

The computations in a program are organized using the following constructs:

A program is contained in a single file (i.e. separate compilation is not supported).

Comments can be introduced in C++ style. Everything from // to the end of the line containing it is ignored.

There are three kinds of expressions: arithmetic, comparison (or relational), and boolean.

The arithmetic operators have higher precedence than the relational operators which in turn have a higher precedence than the boolean operators. Within a group of operators (arithmetic, relational, and boolean), the operators have the usual precedences. The relational operators are non-associative; other operators have the usual associativity as in C.
Arithmetic expressions can be used in comparison expressions (but not the other way round), which in turn can be used in logical expressions (but not the other way round). Also, no operator accepts values of different types. All expressions are free of side effects, i.e. the pre- and post-increment and decrement operators (++ and --) are not supported. Note that as a consequence of this decision, --x is -(-x) in our language and so is --5 (whose value is 5), which is an error in C (side-effect operators can work only on variables).
An arithmetic expression (or a comparison expression) takes arguments of either int type or float type, but the two cannot be mixed. For example, the expression a+b<c+d is a valid relational expression with the implied (syntactic) grouping (a+b)<(c+d); the compiler does not treat the grouping as a+(b<c)+d because that is invalid as an arithmetic expression. As a consequence, the expression a<b+c<d is invalid under sclp semantics with any grouping. Similarly, the expression a<b&&c<d is a valid logical expression with the implied grouping (a<b)&&(c<d), but a possible grouping a<(b&&c)<d is invalid as a relational expression. Note that some of these distinctions cannot be made purely syntactically, and semantic actions are used to reject these errors. Hence some of these errors are not detected when the -sa-parse switch, which disables the semantic actions, is used. In summary, relational operators do not take boolean or string operands. Boolean operators take only boolean values as operands.
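The remark that --5 means -(-5) can be illustrated with a small C++ sketch. It assumes that the scanner emits every '-' as a separate token (there is no -- token), so the unary minus rule of the grammar ('-' expression, rule 74) simply applies twice. The function name eval_signed_literal and the demo function are hypothetical and are not taken from the sclp sources.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Evaluates zero or more unary '-' signs followed by an unsigned integer
// literal. Hypothetical helper for illustration only; not part of sclp.
long eval_signed_literal(const std::string& text, std::size_t pos = 0) {
    if (pos < text.size() && text[pos] == '-') {
        // Each '-' is its own token, so negate whatever follows it.
        return -eval_signed_literal(text, pos + 1);
    }
    long value = 0;
    while (pos < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[pos]))) {
        value = value * 10 + (text[pos] - '0');
        ++pos;
    }
    return value;
}

// Usage: the cases mentioned in the text.
void demo_unary_minus() {
    assert(eval_signed_literal("--5") == 5);    // -(-5)
    assert(eval_signed_literal("-5") == -5);
    assert(eval_signed_literal("---7") == -7);  // -(-(-7))
}
```

In C the same input would be scanned as the decrement operator applied to a constant, which is why it is rejected there.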

The result of a ternary expression can be a boolean or a string value too.

The expression (a<b)&&(c<d) can also be written as a sequence of statements x = a<b; y = c<d; z = x&&y; where x, y, and z have been declared as bool variables.

Note that an assignment is not an expression. Hence unlike C, the assignment operator = cannot appear within an expression and a=b=c is disallowed.

Operators have the following precedences and associativities. Higher numbers indicate higher precedence. In the type signatures below, $B$ represents the bool type. For the arithmetic and relational operators, $\alpha$ can be either the int or the float type; for the ternary operator, $\alpha$ can be any one of the int, float, bool, or string types.

| Precedence Level | Operator Group | Operators | Type Signature | Associativity |
|---|---|---|---|---|
| 1 (lowest) | Ternary Expression | `?, :` | $B \times \alpha \times \alpha \to \alpha$ | Right |
| 2 | Boolean | `\|\|` | $B \times B \to B$ | Left |
| 3 | Boolean | `&&` | $B \times B \to B$ | Left |
| 4 | Boolean | `!` | $B \to B$ | Right |
| 5 | Relational | `!=, ==, <, <=, >, >=` | $\alpha \times \alpha \to B$ | Non-associative |
| 6 | Arithmetic | `+, -` | $\alpha \times \alpha \to \alpha$ | Left |
| 7 | Arithmetic | `*, /` | $\alpha \times \alpha \to \alpha$ | Left |
| 8 (highest) | Arithmetic | `-` (unary) | $\alpha \to \alpha$ | Right |
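The type signatures in the table can be read as small checking functions. The following C++ sketch is only an illustration of those signatures; the names Type, arith_result, rel_result, bool_result, and ternary_result are hypothetical and do not come from the sclp implementation, and the Error value is an assumed way of reporting a type error.

```cpp
#include <cassert>

// Hypothetical names for illustration; not taken from the sclp sources.
enum class Type { Int, Float, Bool, String, Error };

bool is_numeric(Type t) { return t == Type::Int || t == Type::Float; }

// +, -, *, / : alpha x alpha -> alpha, with alpha being int or float.
Type arith_result(Type lhs, Type rhs) {
    return (is_numeric(lhs) && lhs == rhs) ? lhs : Type::Error;
}

// <, <=, >, >=, ==, != : alpha x alpha -> B, with alpha being int or float.
Type rel_result(Type lhs, Type rhs) {
    return (is_numeric(lhs) && lhs == rhs) ? Type::Bool : Type::Error;
}

// &&, || : B x B -> B.
Type bool_result(Type lhs, Type rhs) {
    return (lhs == Type::Bool && rhs == Type::Bool) ? Type::Bool : Type::Error;
}

// ?: : B x t x t -> t, where t may be int, float, bool, or string.
Type ternary_result(Type cond, Type a, Type b) {
    return (cond == Type::Bool && a == b && a != Type::Error) ? a : Type::Error;
}

// Usage: the groupings discussed in the text, with a, b, c, d of type int.
void check_examples() {
    const Type I = Type::Int;
    // (a+b) < (c+d) is a valid relational expression.
    assert(rel_result(arith_result(I, I), arith_result(I, I)) == Type::Bool);
    // a + (b<c) + d is rejected: a bool cannot be an arithmetic operand.
    assert(arith_result(I, rel_result(I, I)) == Type::Error);
    // (a < (b+c)) < d is rejected: a bool cannot be a relational operand.
    assert(rel_result(rel_result(I, arith_result(I, I)), I) == Type::Error);
    // (a<b) && (c<d) is a valid boolean expression.
    assert(bool_result(rel_result(I, I), rel_result(I, I)) == Type::Bool);
}
```

Because every function requires both operands to have the same type, the rule that int and float cannot be mixed falls out of the same check.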

Statements represent the executable instructions and include the following.

Assignment statements are terminated by a semicolon. A variable of a given type can only be assigned the values of the same type.
Selection statements are if and if-else statements in the usual C syntax.
Iteration statements are while and do-while statements in the usual C syntax. Other statements such as for, switch, and goto are not supported as of now.
A compound statement contains a sequence of statements in a pair of braces ({ and }). Unlike C, compound statements cannot include declarations. Compound statements can be empty (i.e., { } is a valid compound statement).
Unlike C, expressions terminated by a semicolon do not become statements.
Function calls are in C syntax. Unlike C, function calls do not appear as operands of expressions. A function call may appear in the right hand side of an assignment or may be terminated by a semicolon to become an independent statement. If the result of a function is required in an expression, it must be assigned to a variable and the variable used in the expression. A function may be defined anywhere but its prototype declaration must appear as a global declaration.
Print statement prints the value of a variable of type int, float, or string (but not bool), or a constant of type int, float, or string. A bool value has no concrete syntax, hence there is no question of printing it. Unlike C, there is no format string for printing.
Read statement reads the value of an int or a float variable. Strings cannot be read from the input. They can only appear as constants in the program. Unlike C, there is no format string for reading.
Return statement returns a value from a value-returning function.
All statements can appear only within a function body. Thus there are no static initializations, unlike C.

Functions have optional parameters and an optional return value. Function headers are specified in C syntax and are followed by a compound statement. A void function does not return a value and hence does not contain a return statement. Besides, a function need not have parameters. Similar to C, functions are not nested. They may have local variables and may also access global variables. Function prototypes must precede function definitions. Function names cannot be used as variables.

Declarations are in C syntax (a type specifier followed by a comma separated list of variables terminated by a semicolon). All local declarations must appear at the beginning, before any executable statement (and hence declarations may not contain initializers). Global declarations of variables and functions may be interleaved, but every function call or occurrence of a variable must be preceded by its declaration. As usual, local declarations shadow global declarations: a local and a global variable may have the same name, in which case the global declaration becomes invisible within the function, i.e. a use of the variable corresponds to its local declaration.
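The shadowing rule can be sketched as a two-level lookup that consults the local declarations first and falls back to the global ones. This is a minimal C++ illustration under the assumption that a function has exactly one local scope (compound statements cannot declare variables); the Scopes structure and its members are hypothetical and do not describe the actual sclp symbol table.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical two-level symbol table: one global scope, one local scope.
struct Scopes {
    std::map<std::string, std::string> globals;  // name -> type name
    std::map<std::string, std::string> locals;   // name -> type name

    // Returns the type of the visible declaration, or "" if undeclared.
    std::string lookup(const std::string& name) const {
        auto it = locals.find(name);
        if (it != locals.end()) return it->second;   // local shadows global
        it = globals.find(name);
        if (it != globals.end()) return it->second;  // fall back to global
        return "";
    }
};

// Usage: x is declared both globally (int) and locally (float).
void demo_shadowing() {
    Scopes s;
    s.globals["x"] = "int";
    s.globals["y"] = "bool";
    s.locals["x"] = "float";
    assert(s.lookup("x") == "float");  // the local declaration wins
    assert(s.lookup("y") == "bool");   // only a global declaration exists
    assert(s.lookup("z").empty());     // undeclared name
}
```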

As usual, a variable name (or a function name) is a sequence of letters (underscore "_" inclusive) or digits but must begin with a letter. All keywords (for introducing types and statements) are reserved and cannot be used as names.
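The naming rule can be written as a short check. The C++ sketch below treats the underscore as a letter, as the text states, so a name may begin with an underscore. The keyword list is an assumption: it contains only words suggested by the types and statements described in this document and is probably incomplete (for instance, the keywords for the print and read statements are not spelled out here). The names is_letter, kAssumedKeywords, and is_valid_name are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Underscore counts as a letter, as stated in the text.
bool is_letter(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}

// Assumed and probably incomplete keyword list (not from the sclp sources).
const std::set<std::string> kAssumedKeywords = {
    "int", "float", "bool", "string", "void",
    "if", "else", "while", "do", "return"};

// A name is a sequence of letters or digits that begins with a letter
// and is not a reserved keyword.
bool is_valid_name(const std::string& name) {
    if (name.empty() || !is_letter(name[0])) return false;
    for (char c : name) {
        if (!is_letter(c) && !std::isdigit(static_cast<unsigned char>(c)))
            return false;
    }
    return kAssumedKeywords.count(name) == 0;
}

// Usage: a few accepted and rejected names.
void demo_names() {
    assert(is_valid_name("count1"));
    assert(is_valid_name("_tmp"));    // underscore is treated as a letter
    assert(!is_valid_name("1abc"));   // must begin with a letter
    assert(!is_valid_name("a-b"));    // '-' is neither letter nor digit
    assert(!is_valid_name("while"));  // reserved keyword
}
```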

Several examples of valid programs have been provided below. The finer details of the language can be discovered by creating examples and running them through the reference implementation.

Phases of Sclp Compiler

Sclp compiles a program by constructing a series of intermediate representations (IRs) such that each IR is computed from the earlier IR. Sclp embodies object oriented design and implementation. The implementation language is C++. Parser and scanner are implemented using lex and yacc (lex/flex manual, yacc/bison manual, lex & yacc book). The overall compilation flow is the classical sequence of various phases as illustrated below.

program
    : global_decl_statement_list func_def_list    (rule 1)
    | func_def_list    (rule 2)
    ;
global_decl_statement_list
    : global_decl_statement_list func_decl    (rule 3)
    | global_decl_statement_list var_decl_stmt    (rule 4)
    | var_decl_stmt    (rule 5)
    | func_decl    (rule 6)
    ;
func_decl
    : func_header '(' formal_param_list ')' ';'    (rule 7)
    | func_header '(' ')' ';'    (rule 8)
    ;
func_def_list
    : func_def_list func_def    (rule 9)
    | func_def    (rule 10)
    ;
func_header
    : named_type NAME    (rule 11)
    ;
func_def
    : func_header '(' formal_param_list ')' '{' optional_local_var_decl_stmt_list statement_list '}'    (rule 12)
    | func_header '(' ')' '{' optional_local_var_decl_stmt_list statement_list '}'    (rule 13)
    ;
formal_param_list
    : formal_param_list ',' formal_param    (rule 14)
    | formal_param    (rule 15)
    ;
formal_param
    : param_type NAME    (rule 16)
    ;
param_type
    : INTEGER    (rule 17)
    | FLOAT    (rule 18)
    | BOOL    (rule 19)
    | STRING    (rule 20)
    ;
statement_list
    : statement_list statement    (rule 21)
    | %empty    (rule 22)
    ;
statement
    : assignment_statement    (rule 23)
    | if_statement    (rule 24)
    | do_while_statement    (rule 25)
    | while_statement    (rule 26)
    | compound_statement    (rule 27)
    | print_statement    (rule 28)
    | read_statement    (rule 29)
    | call_statement    (rule 30)
    | return_statement    (rule 31)
    ;
call_statement
    : func_call ';'    (rule 32)
    ;
func_call
    : NAME '(' actual_arg_list ')'    (rule 33)
    ;
actual_arg_list
    : non_empty_arg_list    (rule 34)
    | %empty    (rule 35)
    ;
non_empty_arg_list
    : non_empty_arg_list ',' actual_arg    (rule 36)
    | actual_arg    (rule 37)
    ;
actual_arg
    : expression    (rule 38)
    ;
return_statement
    : RETURN expression ';'    (rule 39)
    ;
optional_local_var_decl_stmt_list
    : %empty    (rule 40)
    | var_decl_stmt_list    (rule 41)
    ;
var_decl_stmt_list
    : var_decl_stmt    (rule 42)
    | var_decl_stmt_list var_decl_stmt    (rule 43)
    ;
var_decl_stmt
    : named_type var_decl_item_list ';'    (rule 44)
    ;
var_decl_item_list
    : var_decl_item_list ',' var_decl_item    (rule 45)
    | var_decl_item    (rule 46)
    ;
var_decl_item
    : NAME    (rule 47)
    | NAME array_decl    (rule 48)
    | pointer_decl NAME    (rule 49)
    ;
pointer_decl
    : '*'    (rule 50)
    | '*' pointer_decl    (rule 51)
    ;
array_decl
    : '[' INTEGER_NUMBER ']'    (rule 52)
    | '[' INTEGER_NUMBER ']' array_decl    (rule 53)
    ;
named_type
    : INTEGER    (rule 54)
    | FLOAT    (rule 55)
    | VOID    (rule 56)
    | STRING    (rule 57)
    | BOOL    (rule 58)
    ;
assignment_statement
    : variable_as_operand ASSIGN expression ';'    (rule 59)
    | variable_as_operand ASSIGN func_call ';'    (rule 60)
    | variable_as_operand ASSIGN ADDRESSOF variable_name ';'    (rule 61)
    ;
if_condition
    : '(' expression ')'    (rule 62)
    ;
if_statement
    : IF if_condition statement ELSE statement    (rule 63)
    | IF if_condition statement    (rule 64)
    ;
do_while_statement
    : DO statement WHILE '(' expression ')' ';'    (rule 65)
    ;
while_statement
    : WHILE '(' expression ')' statement    (rule 66)
    ;
compound_statement
    : '{' statement_list '}'    (rule 67)
    ;
print_statement
    : WRITE expression ';'    (rule 68)
    ;
read_statement
    : READ variable_name ';'    (rule 69)
    ;
expression
    : expression '+' expression    (rule 70)
    | expression '-' expression    (rule 71)
    | expression '*' expression    (rule 72)
    | expression '/' expression    (rule 73)
    | '-' expression    (rule 74)
    | '(' expression ')'    (rule 75)
    | expression '?' expression ':' expression    (rule 76)
    | expression AND expression    (rule 77)
    | expression OR expression    (rule 78)
    | NOT expression    (rule 79)
    | rel_expression    (rule 80)
    | variable_as_operand    (rule 81)
    | constant_as_operand    (rule 82)
    ;
rel_expression
    : expression LT expression    (rule 83)
    | expression LE expression    (rule 84)
    | expression GT expression    (rule 85)
    | expression GE expression    (rule 86)
    | expression NE expression    (rule 87)
    | expression EQ expression    (rule 88)
    ;
variable_as_operand
    : variable_name    (rule 89)
    | array_access    (rule 90)
    | pointer_access    (rule 91)
    ;
variable_name
    : NAME    (rule 92)
    ;
array_access
    : variable_name array_dimensions    (rule 93)
    ;
pointer_access
    : '*' variable_name    (rule 94)
    | '*' pointer_access    (rule 95)
    ;
array_dimensions
    : '[' expression ']'    (rule 96)
    | array_dimensions '[' expression ']'    (rule 97)
    ;
constant_as_operand
    : INTEGER_NUMBER    (rule 98)
    | DOUBLE_NUMBER    (rule 99)
    | STRING_CONSTANT    (rule 100)
    ;

SCLP Data Structures