Coding Standards
Introduction
To be good, according to the vulgar standard of goodness, is
obviously quite easy. It merely requires a certain amount of sordid
terror, a certain lack of imaginative thought, and a certain low
passion for middle-class respectability.
-- Oscar Wilde (1854 - 1900)
The purpose of this document is to define a standard coding style to
be used by the Compiler Group at the CSE department, IIT-Bombay, for the
Garbage Collection project (and other future projects).
The effort started during the first week of June 2003. Standardizing
will take time, so this page will keep changing.
Why is a Standard Important?
Though following a standard seems like a burden in the initial stages,
its advantages become visible once the software grows to a few thousand
lines spanning a few hundred files. Some of the advantages are:
- A programmer feels comfortable with code written by others,
as it is similar to what he himself would have written.
- A person joining the group at a later stage can pick up the code
easily (once he is familiar with the standards).
- If care is taken to define the standard in such a way that it
avoids problematic C++ idioms, then silly mistakes can be
avoided.
The problem with having a standard is that it takes time to get
acquainted with it. If care is not taken during this transition period,
the resulting code will be a mix of the standard and the programmer's
natural style. This can be avoided by having regular code review
sessions.
Naming Conventions
My name is Bond. James Bond.
-- Ian Fleming
It is very important to give meaningful names to all your constructs. A
name like getAverageHeight() or
get_avg_height() gives us much more information than
calculate().
NOTE: Though the naming convention covers
global variables, it does so for the sake of completeness only. Globals
should be avoided as much as possible. An unnecessary global
remains unnecessary, even if it is given a very good name. The same
holds true for macros and #define-s too.
First some general rules for naming:
Here are specific rules for giving names to various constructs:
Constants, Enums and #define-s
#define constants should be avoided in favor
of const. If unavoidable, the same naming scheme as
for Constants and Enums should be used.
- Names of constants and enumerated types (enums) should be
distinguishable from variables. There are a few conventions for this:
- All uppercase name, with _ as word separator:
MAX_ERRORS
- suffix C in name: maxErrorsC
We shall follow the first convention ( all uppercase with _ ).
Classes
- Name of a class should communicate its purpose. It is always
beneficial to identify and name all the major classes in the
program at design stage itself.
- A class name should start with an uppercase letter.
class BasicBlock;
Variables
- Major variables, the ones which are shared by multiple
functions and/or modules, should be identified and named at
the design stage itself.
- A variable name should start with a lowercase letter.
Quad* firstQuad;
Functions
Functions should be named similarly to variables. We can always
distinguish between the two because of the parentheses associated with
functions.
- Prefixes should be used in function names to make their meaning
clear. This is especially useful for boolean functions. Some common
prefixes are:
- is : isPrinterOn()
- has: hasPages()
- can: canOpenBottle()
- get: getMaxLimit()
- set: setPath()
Macros
Like global variables, macros should be avoided in favor of inline
functions. But sometimes they become unavoidable (e.g.
assert). Special care should be taken while defining
macros which themselves declare variables.
- Macro names should be distinguishable from function
names. Again, there are a few conventions for this:
- All uppercase name, with _ as word separator:
GET_DATA()
- suffix _m in name: getData_m()
- Variables defined inside a macro body should NEVER clash with
names in the scope calling the macro, or havoc will result. It
is a good idea to have an entirely different convention for
naming variables defined inside a macro body.
We shall follow the first convention ( all uppercase with _ ).
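The naming rules above can be seen together in a short sketch. All names here are made up for illustration, and applying the uppercase rule to the enumerators (rather than only to the enum type) is an assumption about how the rule is meant to be read.

```cpp
#include <cassert>

// Constants and enums: all uppercase, '_' as word separator.
const int MAX_ERRORS = 10;

// Assumption: the uppercase rule is applied to the enumerators.
enum QuadKind { QUAD_ASSIGN, QUAD_JUMP };

// Macros (to be avoided where possible): all uppercase as well.
#define IS_ODD(value) (((value) % 2) != 0)

// Classes: name starts with an uppercase letter.
class BasicBlock
{
public:
    BasicBlock() : quadCount(0) {}

    // Functions: named like variables, with a clarifying prefix.
    bool isEmpty() const { return quadCount == 0; }
    int getQuadCount() const { return quadCount; }
    void setQuadCount(int newCount) { quadCount = newCount; }

private:
    // Variables: name starts with a lowercase letter.
    int quadCount;
};
```

A reader can tell from the name alone that MAX_ERRORS is a constant, BasicBlock is a class, and isEmpty() answers a yes/no question.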
File Naming and Organization
It is not so important to know everything as to know the exact
value of everything, to appreciate what we learn, and to arrange what
we know.
-- Hannah More
Files should be organized into directories module-wise, instead of
having a monolithic structure where all source code files
and all header files are in a single directory. This should be part
of the design process.
File suffixes should be used to distinguish file types:
- Yacc source files should have .y suffix
- Lex source files should have .l suffix
- C code files should have .c suffix
- C++ code files should have .cpp suffix
- Header files should have .h suffix
- Object files should have .o suffix
Though it is possible to have alternative suffixes for some of the
file types (for example .C, .cc etc. for C++), there is no
reason for preferring one over the other, as long as everyone uses the
same suffix.
Compilation should be done using make, through rules
defined in a Makefile.
Header files are for declarations. They should NOT contain any definitions
of variables, functions, and methods.
A source file should NEVER be #include-d in another file. Source files
are meant to be compiled separately. The make utility depends on this
for saving recompilation time.
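The split between declarations and definitions can be sketched as follows. The file names quad_limits.h and quad_limits.cpp, the function getMaxLimit(), and the include guard are assumptions made for this illustration; the two files are shown in one listing only to keep the example short.

```cpp
// ---- quad_limits.h (hypothetical header): declarations only ----
#ifndef QUAD_LIMITS_H
#define QUAD_LIMITS_H

int getMaxLimit(const int limits[], int count);

#endif // QUAD_LIMITS_H

// ---- quad_limits.cpp (hypothetical source): definitions ----
// A real source file would begin with: #include "quad_limits.h"
#include <cassert>

int getMaxLimit(const int limits[], int count)
{
    assert(count > 0);   // assumption: callers pass a non-empty array
    int maxLimit = limits[0];
    for (int i = 1; i < count; i++)
    {
        if (limits[i] > maxLimit)
        {
            maxLimit = limits[i];
        }
    }
    return maxLimit;
}
```

Other source files include only the header, so a change to the body of getMaxLimit() forces make to recompile just quad_limits.cpp.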
Formatting and Indentation
When I'm working on a problem, I never think about beauty. I think
only how to solve the problem. But when I have finished, if the
solution is not beautiful, I know it is wrong.
-- R. Buckminster Fuller (1895 - 1983)
- The start brace "{" and end brace "}" for a function should be
in the first column (and preferably alone on that line). Vi(m)
depends on this for easy navigation to the start and end of a
function.
- Indentation should be 4 spaces.
- A line should NEVER be more than 80 characters, because that is
the default width of xterm, printers etc. Granted, you can adjust
the width while reading on screen, but you cannot adjust the
width of your printout. And it is a pain to read lines like
this:
.
.
.
if (cond)
{
// do something
if (more conditions)
{
// handle the complex
nesting
average_cost = cost/num
ber_of_people;
}
}
.
.
.
It helps to keep the width of xterm (or any other application
used for editing) at 80 columns, so you know as soon as you
violate this guideline.
- Nesting should be less than 3 or 4 levels deep, else think
about factoring out code.
- We shall follow the policy of placing the opening brace and the
closing brace each on its own line, in line with the keyword:
    if (condition)        while (condition)
    {                     {
        ...                   ...
    }                     }
- if-then-else formatting should be like the following example:
    if (...)              if (...)
    {                     {
    }                     }
    else                  else if (...)
    {                     {
    }                     }
                          else
                          {
                          }
This list is in no way complete; things will be added as and when it
seems necessary. If something is not described here, then there is
one final rule for it: MAKE IT READABLE.
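As an illustration of the brace, indentation and nesting rules, here is a minimal sketch in which an inner loop has been factored out to keep nesting shallow. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Helper that handles one row, so that the caller stays shallow.
int countPositiveInRow(const std::vector<int>& row)
{
    int count = 0;
    for (std::size_t i = 0; i < row.size(); i++)
    {
        if (row[i] > 0)
        {
            count++;
        }
    }
    return count;
}

// Nesting here is only one level deep, because the inner loop and
// the test were factored out into countPositiveInRow().
int countPositive(const std::vector<std::vector<int> >& rows)
{
    int total = 0;
    for (std::size_t r = 0; r < rows.size(); r++)
    {
        total += countPositiveInRow(rows[r]);
    }
    return total;
}
```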
Comments and Documentation
Comments are free but facts are sacred.
-- Charles Prestwich Scott
(Note: Here I am not talking about documentation as in a
design document or user guide, but about documents which explain the
working of some part of the code and are passed among generations of
developers to help them understand the source code.)
Comments and documentation are aids to understanding the code. They
help us follow the program flow and skip the parts whose details do
not interest us.
If one is not careful, they become a rewriting of what
is already written in the code. They then have the same problems as
duplicated source code: consistency, maintenance etc. Such comments
should be minimized by making the code self-explanatory.
Of course, there are cases when comments/documentation are
needed. No one is expected to know each and every detail of all
modules of a complex system. In such cases comments/documentation
should be used to provide an overview of the system. They can also be
used to record other useful information - e.g. bug-id, kludge,
reference etc. - and to keep track of the sequence of events which led
to a particular design decision.
Some situations where comments are required are explained below.
- Each major function should have a header, and the following should
be documented:
- Promise
- Contract
- Requirement
Note:
All of this should be written very briefly, concisely and
precisely. If you are not able to do so, it
means that you have not designed the module/function
correctly.
- A global variable declaration should be accompanied by a comment
describing its purpose, valid usage, and other
nitty-gritties.
- Comments should also be used to describe uncommon scenarios,
kludges and workarounds. These should have a specific format
for easy identification.
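A function header in this spirit might look like the sketch below. This document does not define the three headings, so their reading here is an assumption: Requirement is what the caller must guarantee, Promise is what the function guarantees in return, and Contract is what happens when the Requirement is not met. The KLUDGE tag is likewise a hypothetical format for marking workarounds.

```cpp
#include <cassert>

/*
 * getAverageHeight
 *
 * Requirement: heights holds count values, and count > 0.
 * Promise:     returns the sum of the values divided by count
 *              (integer division); heights is not modified.
 * Contract:    if the Requirement is violated, the assertion below
 *              fails; no other error reporting is done.
 */
int getAverageHeight(const int heights[], int count)
{
    assert(count > 0);

    // KLUDGE: the sum is kept in a long to make overflow less
    // likely for large inputs.
    long sum = 0;
    for (int i = 0; i < count; i++)
    {
        sum += heights[i];
    }
    return static_cast<int>(sum / count);
}
```

A fixed tag such as KLUDGE lets every workaround in the source tree be found with a simple text search.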
Classes
The loftier the building, the deeper must the foundation be laid.
-- Thomas Kempis
- Ensure that all the classes in your application have:
- default constructor
- copy constructor
- overloaded = operator
Also ensure that all the class data items are appropriately
initialized in the constructor and that = assigns to each member of
the class.
- Don't ever directly modify a data structure. Abstract the
modification in a separate routine. This will make your core
application code transparent to any changes that may occur to the
underlying data structure.
- An extension of the above point: don't make your modifier
routines public, thereby allowing every class to modify the data
and creating potential for havoc. Instead:
- Make the modifiers private.
- Identify the classes that may need to modify the data of this
class A and make those classes friends of
class A.
- Some routines/interfaces are inherently used by everyone -
e.g. the push/pop routines of a generic Stack class. Only
such routines must be made public.
- Implement the core algorithm/strategy in a class separate
from the class that holds the data and the
data management/bookkeeping routines.
- Ensure that your classes are not bloated. If you have a
class which has a huge amount of data and methods, it means that
you have not designed at the correct level of granularity and
need to break this bloated class into more classes. The final
required object can be constructed by object composition of these
smaller classes. Note that a new class you identify may have only
methods and no data items (see previous point).
- Ensure that all derivable classes have a virtual destructor.
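The checklist above can be sketched in one small class. BasicBlock's members and the Optimizer friend class are hypothetical and serve only to show the default constructor, copy constructor, assignment operator, virtual destructor, and a private modifier opened up to a single friend.

```cpp
#include <cassert>
#include <string>

class BasicBlock
{
public:
    BasicBlock() : label("unnamed"), quadCount(0) {}
    BasicBlock(const BasicBlock& other)
        : label(other.label), quadCount(other.quadCount) {}
    BasicBlock& operator=(const BasicBlock& other)
    {
        if (this != &other)
        {
            // Assign to each member of the class.
            label = other.label;
            quadCount = other.quadCount;
        }
        return *this;
    }
    virtual ~BasicBlock() {}   // the class may be derived from

    const std::string& getLabel() const { return label; }
    int getQuadCount() const { return quadCount; }

private:
    // The modifier is private; only the friend below may call it.
    friend class Optimizer;
    void setQuadCount(int newCount) { quadCount = newCount; }

    std::string label;
    int quadCount;
};

class Optimizer
{
public:
    void appendQuad(BasicBlock& block)
    {
        block.setQuadCount(block.getQuadCount() + 1);
    }
};
```

Code outside Optimizer can read a BasicBlock through its getters but cannot change its quad count.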
Functions
Success is more a function of consistent common sense than it is of
genius.
-- An Wang
The points mentioned here also apply to class methods and macros,
unless explicitly mentioned otherwise.
- A function should normally do only one job and do it well. Avoid
generic functions with lots of conditional branches to do
everything. If a function is supposed to do multiple jobs
(e.g. a driver function to add, delete, modify etc.), then
create helper functions and delegate responsibilities
to them.
- Make functions simple and small. The ideal size of a function is
around 35 - 40 lines, each at most 80 characters (see Formatting and
Indentation). The basic idea is that it should fit on a single screen.
An example of clever coding to avoid is:
    for (int i = 1; i <= n; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            A[i][j] = (i/j) * (j/i);
        }
    }
This example makes array "A" an identity matrix: with integer
division, (i/j) * (j/i) is 1 only when i equals j and 0 otherwise.
It can be rewritten in a much more readable form as:
    for (int i = 1; i <= n; i++)
    {
        for (int j = 1; j <= n; j++)
        {
            A[i][j] = 0;
        }
        A[i][i] = 1;
    }
Incidentally, the simpler version is also more efficient, as it
avoids two divisions and one multiplication in each of the n*n
iterations.
- A typical macro should be no more than 2 lines.
- The number of parameters to a procedure should be no more than 5-6.
- The names of a function and its parameters should have logical
significance.
- Side-effects should be avoided. If unavoidable, they should
be properly documented.
- Don't pass a huge object/array etc. to a function by value; as a
rule of thumb, pass it by constant reference. This should lead to
performance gains. This also applies to local variables that you
may create inside a function. Example:
    AliasAnalyzer AA = CoreAnalyzer;
If AA serves only as a temporary variable which
has no meaning beyond the scope of the function, then use a
reference. The way it is written above, the copy constructor of
AliasAnalyzer would be invoked and would copy
all the contents of CoreAnalyzer to
AA. Also, the destructor of
AA would be called at the end of the
scope. Instead we could make do with:
    const AliasAnalyzer& AA = CoreAnalyzer;
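The cost difference can be made visible with a small sketch. The AliasAnalyzer class below is a stand-in invented for this example; it counts calls to its copy constructor so that passing by value and passing by constant reference can be compared.

```cpp
#include <cassert>

// Stand-in class that counts how often it is copied.
class AliasAnalyzer
{
public:
    AliasAnalyzer() : aliasCount(0) {}
    AliasAnalyzer(const AliasAnalyzer& other)
        : aliasCount(other.aliasCount)
    {
        copyCount++;
    }
    int getAliasCount() const { return aliasCount; }

    static int copyCount;   // number of copy-constructor calls so far

private:
    int aliasCount;
};

int AliasAnalyzer::copyCount = 0;

// Pass by value: the argument is copied on every call.
int getCountByValue(AliasAnalyzer analyzer)
{
    return analyzer.getAliasCount();
}

// Pass by constant reference: no copy is made.
int getCountByReference(const AliasAnalyzer& analyzer)
{
    return analyzer.getAliasCount();
}
```

Calling getCountByValue() with an existing object increases copyCount by one, while getCountByReference() leaves it unchanged.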
Using STL
Libraries are not made; they grow.
-- Augustine Birrell
This section summarizes points to be taken care of while using the
Standard Template Library (STL).
- Use STL instead of creating your own container data
structures.
- Don't use hash_maps in STL; they are not portable across
platforms (MSVC on Windows does not support hash_maps). In
case you need to use a hash_map, get approval from the
appropriate person.
- When using maps in STL, make sure you have defined the
LessThan function object. Maps need this function object
for ordering the elements that are inserted in the map. You don't
need to write this function object in case the key element in
your map is an integer (or any other type that already supports
the < operator).
- Use typedef to create iterator types, or else the code becomes
unnecessarily lengthy.
- For big data types (classes), use pointers to objects instead of
the objects themselves to create STL data types (vector, set etc.).
The reason for this is that STL data types may move their data
around many times. For big data, this means a lot of calls to the
copy constructor, which incurs a run time penalty.
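The map-related points can be sketched as follows. The Register key type, the RegisterLessThan function object and the typedef names are hypothetical; the sketch assumes a key type that has no ordering of its own.

```cpp
#include <cassert>
#include <map>
#include <string>

// Key type with no built-in ordering.
struct Register
{
    int bank;
    int index;
};

// Function object telling std::map how to order Register keys.
struct RegisterLessThan
{
    bool operator()(const Register& lhs, const Register& rhs) const
    {
        if (lhs.bank != rhs.bank)
        {
            return lhs.bank < rhs.bank;
        }
        return lhs.index < rhs.index;
    }
};

// typedefs keep declarations of maps and iterators short.
typedef std::map<Register, std::string, RegisterLessThan> RegisterNames;
typedef RegisterNames::const_iterator RegisterNamesIter;

// Returns the name stored under the smallest key.
std::string getFirstName(const RegisterNames& names)
{
    assert(!names.empty());
    RegisterNamesIter first = names.begin();
    return first->second;
}
```

Without the typedefs, every iterator declaration would have to spell out the full std::map<Register, std::string, RegisterLessThan>::const_iterator type.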
Pointers vs References
You will find it a very good practice always to verify your
references sir.
-- Martin Routh
If coding in C++, encourage the use of references instead of
pointers. In fact, a pointer should typically be passed to a function
only when the function needs to do something specific in the case
where the pointer is null.
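A minimal sketch of the distinction, using a hypothetical Quad structure: the first function always needs its argument and takes a reference, while the second has meaningful behaviour for a missing argument and therefore takes a pointer.

```cpp
#include <cassert>
#include <cstddef>

struct Quad
{
    int opcode;
};

// The argument is always required, so take it by reference.
int getOpcode(const Quad& quad)
{
    return quad.opcode;
}

// The argument may be absent, and the function acts on that case,
// so a pointer is appropriate here.
int getOpcodeOrDefault(const Quad* quad, int defaultOpcode)
{
    if (quad == NULL)
    {
        return defaultOpcode;
    }
    return quad->opcode;
}
```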
Minimizing Bugs while Coding
If debugging is the art of removing bugs, then programming must be the
art of inserting them.
-- Unknown
Being a little careful while coding makes it easy to minimize bugs in the
code. Though there are no fixed rules to achieve it, some tips are given
below.
- Follow the guidelines!
- Initialize your variables.
- Limit the scope of variables. In a language like C++, local
variables should be declared near their first use. This helps in
keeping track of the initialization of local variables.
- Build a simple memory profiler for your application by overloading
the new and delete operators for all the classes. Keep a static
count in each of the classes. Increment this count in the
overloaded new operator to indicate an object allocation. Similarly,
decrement the count in the overloaded delete operator. At the end
of the application run, the count should be zero, indicating
that the number of allocations equals the number of deallocations.
Any mismatch indicates a problem (memory leak) and hence should be a
cause for concern.
- Keep optional compiler warnings on. Most compilers can warn
about legal language idioms that are easily used by mistake,
and about non-portable code, e.g.:
    if (a = '\n') { /* should it be if (a == '\n') ? */
        lineNum++;
    }
The following flags given to gcc/g++ catch almost all
important warnings (see the man page of gcc/g++ for details):
    -Wall -Wextra -O -pedantic -Werror
-Werror forces gcc/g++ to treat warnings as errors.
lint and its variations on various platforms
(splint, pclint etc.) can also be used instead.
- Use compiler directives and/or command line flags to generate
controlled diagnostics. A very simple example is:
    #include <stdio.h>

    #ifdef DBG_ON
    #  define dbg_print(stmt) stmt
    #else
    #  define dbg_print(stmt) (void)0
    #endif

    int main()
    {
        printf("Compile with -DDBG_ON flag to "
               "see one more line of output.\n");
        dbg_print(printf("-DDBG_ON flag given\n"));
        return 0;
    }
- Compiler directives can also be used to write driver code for
testing parts of the code independently. Use conditional
compilation (or conditional function calls) to perform unit
testing while a function/module is being developed. [TODO: Example
To Add]
- Makefile rules should be used to control the compilation process
by creating rules/variables to pass different flags on the compiler
command line.
- Assert Yourself. Any time you make an assumption, make
sure you put in an assertion for that assumption. All C/C++
compilers provide a default implementation of assert (see
man assert). In case the assertion is violated, it
prints the file name and line number of the failing assert. This is
sufficient for small projects. For large projects, however, you
may want to write your own version of assert to provide more
diagnostics. An example implementation is:
    /* ---------------------- my_assert.h ---------------------- */
    /* file to define customized assertion.
     * Use model:
     *
     *   assert((x > 0), info("unexpected negative value %d", x));
     */
    #include <stdlib.h>   /* abort() */

    /* helper functions */
    extern void assert_msg(const char *expr, const int line,
                           const char *filename);
    extern void info(const char *fmt, ...);

    #ifdef NO_DBG
    #  define assert(_cond_, _info_) ((void)0)
    #else
    /* do { ... } while (0) makes the macro act as one statement */
    #  define assert(_cond_, _info_)                           \
           do {                                                \
               if (!(_cond_)) {                                \
                   assert_msg(#_cond_, __LINE__, __FILE__);    \
                   _info_;                                     \
                   abort();                                    \
               }                                               \
           } while (0)
    #endif /* NO_DBG */
    /* --------------------------------------------------------- */
    /* ---------------------- my_assert.c ---------------------- */
    /* file to define helper functions used by customized assert.
     */
    #include <stdarg.h>   /* va_list, va_start, va_end */
    #include <stdio.h>    /* fprintf, vfprintf, fflush */
    #include "my_assert.h"

    void info(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        (void) vfprintf(stderr, fmt, ap);
        va_end(ap);
        (void) fprintf(stderr, "\n");
    }

    void assert_msg(const char *expr, const int line,
                    const char *filename)
    {
        /* Cleanup stuff, e.g. closing file handles, releasing
         * memory etc., should be put here.
         */
        fflush(NULL);
        (void) fprintf(stderr, "\nASSERTION (%s) FAILED: %s %d\n",
                       expr, filename, line);
    }
    /* --------------------------------------------------------- */
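The simple memory profiler suggested in the list above can be sketched for a single class as follows. The Quad class and the names liveCount and getLiveCount() are hypothetical; the sketch assumes that malloc/free are acceptable as the underlying allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Class instrumented with a count of live (not yet deleted) objects.
class Quad
{
public:
    Quad() : opcode(0) {}
    int getOpcode() const { return opcode; }

    static void* operator new(std::size_t size)
    {
        void* memory = std::malloc(size);
        if (memory == NULL)
        {
            throw std::bad_alloc();
        }
        liveCount++;          // one more allocation
        return memory;
    }

    static void operator delete(void* memory)
    {
        if (memory != NULL)
        {
            liveCount--;      // one more deallocation
            std::free(memory);
        }
    }

    static int getLiveCount() { return liveCount; }

private:
    int opcode;
    static int liveCount;     // allocations minus deallocations
};

int Quad::liveCount = 0;
```

Checking that Quad::getLiveCount() is zero just before the program exits reveals whether any Quad allocated with new was never deleted.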
Minimizing Bugs by Testing
What we have to do is to be forever curiously testing new opinions
and courting new impressions.
-- Walter Pater
Testing is an integral part of software development. Tests help us not
only in making sure that what we have written is correct, but also
in finding out if someone breaks the code later.
Tests are written either by the programmer himself, or by someone in
the QA (Quality Assurance) team - one who knows only about the working
of the software, but does not have any knowledge about the
implementation. The second type of testing is called black-box
testing.
Both types of testing have their own advantages and disadvantages. A
person who has knowledge about the program can read the code and
construct a test-case for the part which is not handled properly. But
at the same time he gets a little biased by what is written
in the code. A QA person writes random tests hoping to catch a
missing/incorrectly implemented feature. He has nothing but the
functional specifications and his own experience to guide him. But this
results in test-cases which are not biased by the implementation. E.g.,
programmers tend to write test-cases about valid inputs only, because
they are more interested in the working of their algorithm, while QA
people try to write test-cases about error conditions to see if the
errors are handled in a satisfactory way.
Here we are more interested in the testing done by the programmer
himself, as there is no separate QA team. Some important points to keep
in mind are:
- No part of your code should remain un-exercised. There should be
separate tests to exercise all branches in the program. A simple
way to ensure this is that whenever you write a branch in the program,
you write test-cases to test each path at the branch (this means
writing at least n test-cases for an n-way branch).
- Use code profiling tools (see Miscellaneous)
to make sure code is getting exercised.
- Looking at the final output is not enough. Use a debugger to step
through the code to make sure that the test executes the correct
part of the code.
- Automate the comparison of final output with some golden log.
- There is never a stupid test-case. So, never delete a test-case.
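The rule of at least n test-cases for an n-way branch can be sketched as follows. The function classifyErrorCount(), its thresholds and its return strings are invented for this illustration.

```cpp
#include <cassert>
#include <string>

// Function with a 3-way branch.
std::string classifyErrorCount(int errorCount)
{
    assert(errorCount >= 0);
    if (errorCount == 0)
    {
        return "clean";
    }
    else if (errorCount <= 10)
    {
        return "warnings";
    }
    else
    {
        return "failed";
    }
}

// One check per path: 3 branches, so at least 3 test-cases. The
// values 10 and 11 also probe the boundary between two branches.
bool testClassifyErrorCount()
{
    return classifyErrorCount(0) == "clean"
        && classifyErrorCount(10) == "warnings"
        && classifyErrorCount(11) == "failed";
}
```

A test function of this kind can be called from a driver built under conditional compilation, so that it runs on every build during development.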
Miscellaneous
Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe trying to
produce bigger and better idiots. So far, the Universe is winning.
-- Rich Cook
Some of the processes which help in developing good software are:
- Having functional specifications, design document, and other
helper documents ready before the start of coding.
- Peer code review.
- Some of the important tools used for code tuning in large
software development are:
- Lint tools (lint, splint) - To catch portability problems
- Debuggers (gdb, dbx) - To catch logical errors
- Memory Profilers (purify, mempetrol, valgrind) - To detect
memory leaks, uninitialized memory use etc.
- Code coverage tools (purecov, tcov, gcov) - To look for
the part of code which did not get exercised
- Code Profilers (quantify, gprof, prof, vprof) - Used for
optimizing the performance
Look for the tools available for your platform, learn them and
use them extensively.
References
The man who doesn't read good books has no advantage over the man who
can't read them.
-- Mark Twain (1835 - 1910)
- The Elements of Programming Style,
Brian W. Kernighan and P. J. Plauger.
- The Practice of Programming,
Brian W. Kernighan and Rob Pike.
- Effective C++, Scott Meyers.
- More Effective C++, Scott Meyers.
- Code Complete, Steve McConnell.
- Writing Solid Code, Steve Maguire.
- C++ Coding Standard
Last updated by $Author: karkare $ on $Date: 2005/05/29 17:48:00 $