Blog.

Code Generation- CuriousX

Cover Image for Code Generation- CuriousX
Jennifer
Jennifer

code generation is the process of translating high-level programming language code into machine code that can be executed by a computer. The generated code can be in the form of Assembly language, binary code, or any other low-level representation that a computer can understand.

The back-end of a compiler is responsible for producing efficient machine code from the intermediate representation generated by the front-end. This process involves several steps, including instruction selection, register allocation, and control flow analysis.

Instruction Selection

The first step in code generation is selecting the instructions that will be used to implement the intermediate representation. This step is known as instruction selection. The goal of instruction selection is to generate code that is both correct and efficient.

For example, let's consider the following abstract assembly:

add r1, r2, r3

This represents the addition of the values in registers r2 and r3, with the result stored in register r1. The instruction selection phase would select the appropriate machine instruction to implement this operation, depending on the target architecture. On an x86 architecture, the corresponding instruction would be add r1, r2, r3.

Register Allocation

The next step in code generation is register allocation. This step involves assigning variables to registers or memory locations. Registers are small, fast storage locations inside the CPU that are used to hold frequently accessed data. Assigning variables to registers can improve performance by reducing memory accesses.

For example, consider the following code:

x = a + b

The back-end would need to allocate registers for the variables a, b, and x. Depending on the target architecture and the number of available registers, the back-end may need to spill some of the variables to memory.

Control Flow Analysis

The final step in code generation is control flow analysis. This step involves analyzing the program's control flow and generating code that implements it correctly. Control flow refers to the order in which instructions are executed in a program.

For example, consider the following code:

if (x > 0) {
    y = 1;
} else {
    y = 0;
}

The back-end would need to generate code that implements the correct control flow for this statement. This might involve generating conditional jump instructions or branching instructions, depending on the target architecture.

CuriousX code generator takes an AST as input and outputs ARMv8 assembly code, it traverses the AST in postfix order and generates code for each node in the AST.

Supported AST Nodes The code generator supports the following AST nodes:

  • AssignNode: assigns a value to a variable
  • IntNode: represents an integer value
  • VarNode: represents a variable
  • PlusNode: adds two expressions
  • MinusNode: subtracts two expressions
  • MultiplyNode: multiplies two expressions
  • DivideNode: divides two expressions
  • PrintNode: prints an expression

During code generation, the register allocator manages the allocation and de-allocation of registers. The allocator uses a stack-based allocation strategy and spills registers with the lowest priority to memory when no registers are available. When a register is freed, the allocator marks it as available for reuse

Let's use the following example code to illustrate the code generation process:

a = 6
x = 5 + 2 * a

The AST for this code is:

             =
            / \
           a   6
               
                 
             +
            / \
           5   *
              / \
             2   a

The code generator generates the following ARMv8 assembly code:

mov r0, #6         ; Move the value 6 into register r0
str r0, [sp, #-4]! ; Store the value of register r0 onto the stack and update the stack pointer
mov r0, #5         ; Move the value 5 into register r0
mov r1, #2         ; Move the value 2 into register r1
ldr r2, [sp, #0]   ; Load the value of variable a from the stack into register r2
mul r1, r1, r2     ; Multiply the values in r1 and r2 and store the result in r1
add r0, r0, r1     ; Add the values in r0 and r1 and store the result in r0
str r0, [sp, #-4]! ; Store the value of r0 onto the stack and update the stack pointer
bx lr             ; Return from the function

The first two lines initialize variable a to the value of 6. The value 6 is loaded into register r0, then it is stored onto the stack at the top of the stack. The stack pointer is then decremented by 4 to allocate space for the stored value.

The next lines compute the value of x by adding 5 to 2 times the value of a. The values 5 and 2 are loaded into registers r0 and r1, respectively. The value of a is loaded from the stack into register r2. The values in r1 and r2 are multiplied, and the result is stored in r1. The values in r0 and r1 are added, and the result is stored in r0. The value of x is then stored onto the stack at the top of the stack.

The last line is a return statement, indicating the end of the code generation function. The program flow returns to the calling function using the link register (lr).

You can check out the code on github and run it against you source codes. To ensure the arm assembly generated you can use miniarm an arm emulator to validate it.