The genius of Stallman
======================

In my attempt to write an RTL-based binary retranslator (essentially an asm->RTL decompiler), I've had to read and understand some of the GCC generator programs in some detail. This is a project for the "Design and implementation of the GNU Compiler collection" course, a course specifically meant for understanding some of the complexities of GCC.

The pièce de résistance of GCC is of course the code-generator programs. These programs are invoked during the compiler build process, and they output C programs. The input to the generator programs is the "machine description" files, an elegant declarative way of specifying the instruction patterns of a machine. The generated programs are thus the 'machine dependent' part of the code-generation phase. The whole set-up is extraordinarily convoluted, complicated, and elegant at the same time.

I've been mainly focussed on the generator for the instruction-pattern recognizer: genrecog. Genrecog is the generator program that generates the RTL->asm_insn recognizer program. It basically takes the declarative predicates present in the MD files and converts them to C tests; the generated recognizer then returns the code of the instruction to be emitted.

It was by chance that I stumbled onto the GCC-1.27 source on my hard disk. "This will be fun", I said to myself, "It'll probably be just a simple, conventional C->asm compiler, easy to understand". In utter disbelief I stared at the screen as it showed me the same source file (genrecog.c) I had been reading for the past 2 weeks. At the top it says "Copyright (C) 1987,1988 Free Software Foundation, Inc." So yes, the code is 23 years old, and not much has changed! Even way back then, before I was even born, GCC had machine descriptions, RTL, generator programs, and all the infamous passes: reload, combine, final, recog, expand. So the code itself has certainly survived both the test of time and open source.

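
To make the genrecog idea from earlier a little more concrete (declarative patterns go in, C tests come out), here is a minimal sketch of a toy generator program in Python. Everything in it is an assumption made for illustration: the `PATTERNS` table, the `toy_recog` function and the `toy_rtx` struct are hypothetical, the table is not real machine-description syntax, and the output is a flat chain of `if` tests rather than whatever the real genrecog emits.

```python
# Toy "generator program": declarative patterns in, C recognizer source out.
# The pattern table and every name below are hypothetical illustrations,
# not real GCC machine-description syntax and not real genrecog output.

PATTERNS = [
    # (insn name, RTL operator code, one predicate per operand)
    ("addsi3", "PLUS", ["register_operand", "general_operand"]),
    ("negsi2", "NEG", ["register_operand"]),
]


def emit_recognizer(patterns):
    """Return C source for a function mapping an expression to an insn code."""
    lines = ["int toy_recog(struct toy_rtx *x)", "{"]
    for code, (name, rtx_code, predicates) in enumerate(patterns):
        # The operator check plus each declarative predicate becomes a C test.
        tests = [f"x->code == {rtx_code}"]
        tests += [f"{pred}(x->op[{i}])" for i, pred in enumerate(predicates)]
        lines.append(f"  if ({' && '.join(tests)})")
        # The position of the pattern in the table serves as its insn code.
        lines.append(f"    return {code};  /* {name} */")
    lines.append("  return -1;  /* no pattern matched */")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_recognizer(PATTERNS))
```

Running the script prints a small C function. The point is only that the machine-specific knowledge lives in the table, while the generator itself stays machine-independent, so adding an instruction means adding a row and not writing any C by hand.
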
Let's not forget that GCC was one of the first Free Software projects undertaken, before the days of the internet (well, almost), when there was no Slashdot or Reddit to tell the world about your Great New Compiler. Look at a file in the Linux kernel again after 2 years and you won't have a clue what happened. Entire subsystems get eliminated, refactored, butchered, combined, merged, renamed... you name it. Developers devour your code like piranhas, constantly making changes. But what happened to GCC? It is indeed the Cathedral, standing tall after all these years.

Coming back to the machine descriptions. What was most surprising after browsing the 1.21 source (the earliest version of GCC available, unfortunately) was that such an amazingly advanced concept was in use even then, when the compiler was in its infancy. Even in those early stages, GCC had its own (quite powerful) extensions to the C language (variable-length arrays etc.), supported a bunch of architectures, and had superb documentation. In fact, most of the current documentation of GCC (the GCC internals document) is exactly the same as the one present in 1.21, in internals.dvi (this was back in the days when PDF was yet to be invented).

And here's the really good part: all those 125,000 lines of magnificence were written by one man. Read the changelog to find out who. What amazing foresight and incomprehensible genius RMS must have had to write a compiler framework using machine descriptions and generator programs! Even today, 25 years after he wrote the first version, and after hundreds of the most talented hackers have worked on the compiler, his design, his code, his documentation, and his intuition behind the retargetability mechanism all stand intact.

GCC, then, is the oldest "open source" software still in active use. Sure, a lot of Linux code can be traced back to early BSD/UNIX/Sun, but only in spirit. But no, there is something older than GCC. It's the editor I am writing this on. Emacs.
For more than 35 years now it has been the most advanced text editor on the planet, and that is again due to the genius of Stallman's design and programming skills. Sure, vi users might argue, but the only reason vi has survived so long is that lots of users have managed to contribute extensions to it. And it has more users than Emacs because it appears to be easier to use. Emacs, on the other hand, is still growing in abilities at a blinding pace, mainly due to its brilliant design. So RMS, it seems, was not only the last of the true hackers, he was also the greatest.