Cross Compilation and GCC

We use a variant of the compact T diagram notation to express a conventional tool chain clearly and concisely, and apply it to demonstrate the construction of a cross tool chain. This cross chain is then used to construct a complete tool chain native to a new target system. The description clarifies the issues to be dealt with in a Canadian cross build.

1 Introduction

In a chapter devoted to the practical aspects of writing a compiler, Aho, Sethi and Ullman [aho-sethi-ullman] use a T diagram proposed by Bratman [bratman]. The T diagram concisely represents the three main components of compiler writing: $S$ – the source language accepted by the compiler, $T$ – the target language to which programs in $S$ are compiled, and $I$ – the implementation language used to write the compiler itself. The notation is used effectively to discuss compiler development, including bootstrapping. In this note we present a variation of the notation and employ it to describe the construction of a cross tool chain. The symbolic nature of the notation clearly brings out the consistency of the chaining requirements. This is demonstrated by employing the notation to describe the generation of a native tool chain for a new target using the generated cross tool chain.

2 Generalizing the Notation

The T diagram captures a translation from a source language $S$ to a target language $T$. The implementation language, $I$, is also the language whose interpreter can execute the compiler. We will refer to $I$ as the interpretation, i.e. the execution, language. This permits us to concisely express a translator $L$ that executes on $I$ and translates from a source language $S$ to a target language $T$ as $^SL^T_I$. The symbols $S$, $T$ and $I$ occupy the same positions as in the T diagram. However, the ordinarily empty T diagram now has the name of the translator $L$ at its center, and the rest of the T diagram is discarded. The name of the translator is used to denote the individual components of a conventional tool chain. To use the notation, we substitute $S$, $T$ and $I$ appropriately.
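As an illustration of the notation only, a translator can be modelled as a small record holding its name and the three languages. This is a minimal sketch; the class name `Translator`, its field names, and the `notation` method are hypothetical and do not come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translator:
    """A translator in the notation ^S L ^T _I (illustrative names only)."""
    name: str     # L: the name of the translator
    source: str   # S: the language it accepts
    target: str   # T: the language it emits
    runs_on: str  # I: the language in which the translator itself executes

    def notation(self) -> str:
        # Place S, T and I in the same positions as in the T diagram.
        return f"^{self.source}{self.name}^{self.target}_{self.runs_on}"


# A compiler C from source language c to assembly s, executing as x.
compiler = Translator(name="C", source="c", target="s", runs_on="x")
print(compiler.notation())  # prints: ^cC^s_x
```

Substituting different values for `source`, `target` and `runs_on` gives the other components of a tool chain.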

3 The Tool Chain

The conventional tool chain, $T$, that generates an executable from a set of source language files is made up of a compiler $C$, an assembler $A$, and a linker $L$. The linker also uses any necessary library $LIB$ along with the input received from the previous components of the tool chain. Denote the high-level language used to express a program by $c$, the corresponding assembly language by $s$, the object language by $o$, and the final executable language by $x$.

Ignoring the library for a moment, the native tool chain is
\begin{equation}
\label{eq:basic:tool:chain:schematic}
{}^{c}T^{x}_{x} : {}^{c}C^{s}_{x} \rightarrow {}^{s}A^{o}_{x} \rightarrow {}^{o}L^{x}_{x}
\end{equation}
where $\rightarrow$ is an infix sequencing operator that sequences the operation on its right-hand side after the operation on its left-hand side.

The library is not truly a translation system. It is a repository of object files that the linker draws on to complete its job. To include it we cannot use the sequence operation. Instead, we introduce a higher precedence “parallel” operation, denoted by an infix $\vert\vert$, that expresses the subsidiary, but necessary, role of the library to the linker. Because the library is not a translation system, it need not strictly follow our variant of the T diagram. In fact, it simply undergoes translation, as any other program would, when passed through the tools. We denote the library by $LIB$, and the “execution” language that it is in by a right subscript, so that overall notational consistency is preserved when expressing the operation of the translation tools on it. Thus, the library source code in source language $c$ is denoted by $LIB_c$. When the library has been compiled it is in assembly language and is denoted by $LIB_s$. Finally, after being assembled, the library is denoted by $LIB_o$. Including the library in Eq.(eq:basic:tool:chain:schematic), the tool chain is
\begin{equation}
\label{eq:basic:tool:chain}
{}^{c}T^{x}_{x} : {}^{c}C^{s}_{x} \rightarrow {}^{s}A^{o}_{x} \rightarrow LIB_{o} \vert\vert {}^{o}L^{x}_{x}
\end{equation}
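The chaining requirements expressed by the two operators can be checked mechanically. The following is a minimal sketch, not part of the original text: `Translator`, `then` and `with_library` are hypothetical names, and the check that all stages run on the same system is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translator:
    name: str     # name of the tool
    source: str   # language accepted
    target: str   # language produced
    runs_on: str  # language in which the tool itself executes


def then(first: Translator, second: Translator) -> Translator:
    """Sequencing (->): feed the output of `first` to `second`."""
    if first.target != second.source:
        raise ValueError(
            f"{first.name} emits {first.target!r} but "
            f"{second.name} expects {second.source!r}"
        )
    # Assumption of this sketch: every stage of one chain runs on the same system.
    if first.runs_on != second.runs_on:
        raise ValueError(f"{first.name} and {second.name} run on different systems")
    return Translator(
        f"{first.name}->{second.name}", first.source, second.target, first.runs_on
    )


def with_library(library_language: str, linker: Translator) -> Translator:
    """Parallel (||): the library must be in the language the linker consumes."""
    if library_language != linker.source:
        raise ValueError(
            f"library is in {library_language!r} but "
            f"{linker.name} links {linker.source!r}"
        )
    return linker


compiler = Translator("C", "c", "s", "x")
assembler = Translator("A", "s", "o", "x")
linker = Translator("L", "o", "x", "x")

# || binds tighter than ->, so the library LIB_o is attached to the linker first.
tool_chain = then(then(compiler, assembler), with_library("o", linker))
print(tool_chain.source, tool_chain.target, tool_chain.runs_on)  # prints: c x x
```

The composite has source `c`, target `x` and runs on `x`, matching $^cT^x_x$. Passing a library that is still in assembly form, `with_library("s", linker)`, raises an error because the linker only accepts object language.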

4 The Cross Tool Chain Problem

Figure 4.1: The cross compilation problem and notation.

A cross tool chain generates executables for a different target than the one on which it itself runs. Denote the cross target by a primed right superscript. The cross tool chain problem can be stated as:

Given the tool chain on the host, Eq.(eq:basic:tool:chain), and the sources,
\begin{eqnarray}
{}^{c}C^{s}_{c} & ~ & \mathrm{The\ Compiler\ Source\ Code} \\
{}^{s}A^{o}_{c} & ~ & \mathrm{The\ Assembler\ Source\ Code} \\
{}^{o}L^{x}_{c} & ~ & \mathrm{The\ Linker\ Source\ Code} \\
LIB_{c} & ~ & \mathrm{The\ Library\ Source\ Code}
\end{eqnarray}
we need to generate
\begin{equation}
\label{eq:basic:cross:tool:chain}
{}^{c}T^{x^\prime}_{x} : {}^{c}C^{s^\prime}_{x} \rightarrow {}^{s^\prime}A^{o^\prime}_{x} \rightarrow LIB_{o^\prime} \vert\vert {}^{o^\prime}L^{x^\prime}_{x}.
\end{equation}

The compiler $^cC^s_c$ in Eq.(eq:basic:cross:tool:chain) is complete in the sense that it has information about the signatures of the names provided by the library $LIB$. Without this information, the compiler would not be able to perform certain checks on a few objects that may appear in the source code. The tool chain would fail in its final phase when the linker resolves references to any library objects referred to in the original input source. If the source did not refer to any objects in the library, then the basic tool chain in Eq.(eq:basic:tool:chain) would work. Figure 4.1 states the problem and details the notation used.
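The role of the library in the cross chain can be illustrated with a small consistency check. This is a sketch under stated assumptions: `Tool` and `chain_signature` are hypothetical names, primed languages are written as strings such as `"s'"`, and the check covers only the language matching that the notation expresses, not the contents of the library.

```python
from typing import List, NamedTuple, Tuple


class Tool(NamedTuple):
    name: str
    source: str  # language accepted
    target: str  # language produced
    host: str    # language in which the tool itself executes


def chain_signature(tools: List[Tool], library: str) -> Tuple[str, str, str]:
    """Return (source, target, host) of a chain, or raise if the stages do not fit.

    `library` is the language the library is in; the last tool is the linker.
    """
    for first, second in zip(tools, tools[1:]):
        if first.target != second.source:
            raise ValueError(
                f"{first.name} emits {first.target!r} but "
                f"{second.name} expects {second.source!r}"
            )
    if len({tool.host for tool in tools}) != 1:
        raise ValueError("all tools of one chain must run on the same system")
    linker = tools[-1]
    if library != linker.source:
        raise ValueError(
            f"library is in {library!r} but {linker.name} links {linker.source!r}"
        )
    return tools[0].source, linker.target, tools[0].host


# The cross chain: every tool runs on x but produces primed output.
cross = [
    Tool("C", "c", "s'", "x"),
    Tool("A", "s'", "o'", "x"),
    Tool("L", "o'", "x'", "x"),
]
print(chain_signature(cross, library="o'"))
```

Calling `chain_signature(cross, library="o")`, that is, offering the host's own library $LIB_o$ to the cross linker, raises an error. This reflects why the library must itself be translated for the primed target before the cross chain is complete.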

5 Cross Compiler Generation

Figure 5.1: Steps in generating the cross compiler hosted on the unprimed system.