Compiler design

One of the most prominent aspects of the study of Computer Science (CS) is the art and science of making compilers. It may seem surprising that compilers are such an important part of Computer Science, since a compiler does little except act as a converter. All a compiler really does is convert a high-level language (HLL) program into machine language so the computer can run it. (Some compilers do not produce machine language, but instead convert between other languages, such as from Pascal to C, but the basic paradigm of a compiler is a program that converts an HLL into machine language.) It seems like a simple conversion tool, and that's all it really is. But the practice of making one is surprisingly complex. This, combined with the fact that compilers are so important to developers today (very few people program commercially in machine language anymore, but computers still need programs to be stated in machine language in order to run them), explains why compilers are such a crucial part of Computer Science, since CS really is the science of program development more than anything else.

Compilers come in all sizes. Some are tiny and do little more than ultra-simple conversions; others are huge, full-blown productivity applications that fill a whole CD-ROM because they're packed with library functions. Regardless of their size or scope, compilers have a common set of basic functions they perform. The essential framework of a compiler looks something like this:

1. Get the source code

Before a compiler can do anything, it needs the source code that it's meant to compile, so the first step is simply to retrieve that source code for processing. How this is done depends on the compiler. Some compilers require the source to be saved in a separate file, which the compiler opens and loads into memory for processing. Other compilers have a built-in typing interface where you can type the source code directly into the program. If a separate source file is used, the first thing the compiler must do is open that file; how this is done depends entirely on the platform the compiler is running on and how that platform handles file access. If the compiler itself takes user input, some kind of user interface has to be created. At the bare minimum, a space needs to be made on the screen for the user to type in the code, so the compiler can store and process it.
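As a minimal sketch of this step, the Python function below covers both cases described above: reading the source from a separate file, or taking it directly from the user. The name load_source and its path parameter are assumptions made for illustration, and standard input stands in for a real typing interface.

```python
import sys


def load_source(path=None):
    """Return the program text to compile.

    If a path is given, the source is read from that file.
    Otherwise it is read from standard input, which stands in
    here for a built-in typing interface (an assumption of this sketch).
    """
    if path is not None:
        # File access details are handled by the platform; Python's
        # open() hides them behind one call.
        with open(path, "r", encoding="utf-8") as source_file:
            return source_file.read()
    # No file given: take whatever the user types until end of input.
    return sys.stdin.read()
```

Calling load_source("program.bas") would return the whole file as one string, ready for the next step.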

2. Parse the commands

This is a big one. This is where text parsing comes into play, which is a whole field of Computer Science unto itself. Entire books could be (and have been) written on how to process text in a computer to re-format or analyze it. The basic idea when parsing program code, though, is clear: first, understand the basic function of each command in the program. Is it a PRINT command? An INPUT command? A GOTO command? This is BASIC terminology, but the same principles apply in virtually any programming language; almost every programming language in the world has facilities to display information on the screen, receive user input, and jump to a different function or area of the program. Identify the basic command first, and then factor in the arguments or parameters for that command. Understanding what the programmer is trying to do with the code is the goal of this step.
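The idea of identifying the command first and its arguments second can be sketched in a few lines of Python. This is only an illustration: the function name parse_line, the KNOWN_COMMANDS set, and the assumption that each line holds exactly one command followed by its arguments are all hypothetical simplifications, and a real parser handles far more.

```python
# The three BASIC commands used as examples in the text.
KNOWN_COMMANDS = {"PRINT", "INPUT", "GOTO"}


def parse_line(line):
    """Split one BASIC-style line into (command, argument_text).

    Returns None for a blank line and raises ValueError for a
    command this sketch does not recognize.
    """
    stripped = line.strip()
    if not stripped:
        return None
    # The first word is the command; everything after it is the argument text.
    parts = stripped.split(None, 1)
    command = parts[0].upper()
    argument = parts[1].strip() if len(parts) > 1 else ""
    if command not in KNOWN_COMMANDS:
        raise ValueError("unknown command: " + command)
    return command, argument
```

For example, parse_line('PRINT "HELLO"') gives the pair ("PRINT", '"HELLO"'), which tells the compiler both what the programmer wants done and what to do it with.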

3. Generate the same commands in the target language

This is obviously harder than it sounds, but the basic principle is still a simple substitution of commands. If you're converting from BASIC to C, then PRINT becomes printf. A rule like IF INPUT$ = "PRINT" THEN OUTPUT$ = "printf" is simple. Similarly, accept all the parameters for the command and re-format them as appropriate. If you're converting into machine language, however, you've got to turn everything into numbers, so you have to keep track of (for example) the memory offsets of the program functions and what numbers to substitute for them in the final code, which is why the job is much more complicated than it might sound at first. But it's certainly possible, and when all the work is done, the result is a functional compiler, an essential tool for modern software development.
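The substitution idea for the BASIC-to-C case might look like the Python sketch below. The function name translate and the line_N label scheme for GOTO targets are assumptions chosen for illustration, and the PRINT rule assumes its argument is a plain double-quoted string literal. Generating machine language would need much more than this, including the offset bookkeeping described above.

```python
def translate(command, argument):
    """Return one line of C for a parsed (command, argument) pair."""
    if command == "PRINT":
        # Assumes the argument is a double-quoted string literal,
        # which can be passed to printf unchanged.
        return f'printf("%s\\n", {argument});'
    if command == "GOTO":
        # Assumes each BASIC line number N has a C label named line_N
        # (a hypothetical naming scheme for this sketch).
        return f"goto line_{int(argument)};"
    raise ValueError("no translation rule for: " + command)
```

With these rules, translate("PRINT", '"HELLO"') produces a printf call that prints the string followed by a newline, and translate("GOTO", "10") produces goto line_10;.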
