22 January 2004

A tiny compiler, part 3

I'm building a crude compiler, piece by piece. In my last entry, I wrote and tested the command-line portion of the compiler: the code that waits for keystrokes and echoes them to the screen. Now I'm writing and testing c_comma, the code that moves a byte from the data stack to the end of the code space.

Let's review. The compiler works by storing bytes of machine code into a buffer called the code space. The colorForth programmer uses something called a data stack (or simply the stack) both to pass data to a word being called and to return data to a calling word. A word is a procedure or routine whose name and address are stored in a dictionary. The address in the dictionary is an address within the code space: the address of the machine code to be executed when the word is called.

My compiler, when finished, will come with a dictionary containing a single word: c, (pronounced "C-comma"). When this word is executed, it removes the top byte from the stack and stores it at the end of the code space. Because a program in memory is just a string of such bytes, my compiler will represent the minimum amount of code needed to compose and run programs.

Code

The complete boot sector sets up the segment registers and the call stack, then it sets up the data stack. In this case, the data stack is just a string of 48 arbitrary test byte values stored in the boot sector itself. (Note that the items on the stack here are byte [8-bit] values, not 16-bit values. My compiler, when finished, will have a 16-bit data stack.) The AL register contains the top item on the data stack, and the SI register points to the second item.

Traditionally, stacks are "upside-down": the top of the stack has a lower memory address than the bottom, so that whenever you push an item onto the stack, the items already on the stack remain at the same memory addresses as before, while the new item has an address lower than those of the other items.
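The upside-down layout, with the top item cached in a register and a pointer to the second item in memory, can be modeled in a few lines. This is a sketch in Python, not code from the compiler (which is 16-bit assembly): the class ByteStack and its push method are hypothetical, since the boot sector never pushes (its stack is pre-filled). The lodsb method mimics the LODSB instruction with the direction flag clear: load the byte at SI into AL, then increment SI.

```python
# Hypothetical Python model of the byte-wide "upside-down" data stack.
# `al` stands in for the AL register (cached top item) and `si` for the SI
# register (index of the second item). The stack grows toward lower indices.


class ByteStack:
    def __init__(self, memory: bytearray, stack_top: int) -> None:
        self.mem = memory
        self.si = stack_top  # like: mov si, stack_top
        self.al = 0
        self.lodsb()  # AL = top item, SI -> second item

    def lodsb(self) -> None:
        """Load mem[si] into al, then advance si (LODSB with DF clear)."""
        self.al = self.mem[self.si]
        self.si += 1

    def push(self, value: int) -> None:
        """Spill the cached top just below the second item, then cache value.

        Items already in memory keep their addresses; only si moves down.
        """
        if self.si == 0:
            raise OverflowError("no room below the stack")
        self.si -= 1
        self.mem[self.si] = self.al
        self.al = value & 0xFF

    def drop(self) -> None:
        """Discard the top item; the second item becomes the new top."""
        self.lodsb()


# Pre-filled stack: three items at the high end of an 8-byte buffer,
# with the top item (0x05) at the lowest address.
mem = bytearray(8)
mem[5:8] = bytes([0x05, 0x16, 0x27])
s = ByteStack(mem, stack_top=5)
print(hex(s.al), s.si)  # 0x5 6
s.push(0x99)
s.push(0xAA)
print(hex(s.al), s.si)  # 0xaa 4
print(mem[4:8].hex())   # 99051627
```

After the two pushes, 0x16 and 0x27 are still at indices 6 and 7; only the new items occupy lower addresses, which is the point of the upside-down convention.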
Here the stack is pre-filled, so it isn't necessary to push the test values onto it; they are ready to be pulled off and used. To initialize the stack, I simply point SI at the top item...

    mov si, stack_top

...and then move the top item into AL, while incrementing SI so that it points at the second item. The LODSB instruction does both of these things at once: it loads the byte at DS:SI into AL and then, provided the direction flag is clear, increments SI.

    lodsb               ; AL = top item, SI -> second item

c_comma

The c_comma routine works like this. The code space takes up the 32KB from offsets 0x8000 through 0xFFFF. The variable here holds the pointer to the end of the code space, that is, the address of the next free byte.

    mov bx, [here]

The top item on the data stack, as I've said, is already in AL, so I just store it.

    mov [bx], al

Now I have to adjust the code-space pointer by the size of the item I've just stored there (one byte). Because the code space grows upward, the pointer has to be incremented.

    inc bx
    mov [here], bx

Now that the item has been copied into the code space, it needs to be dropped from the stack. Another LODSB does exactly that: it overwrites AL with the second item, which becomes the new top, and advances SI to the item after it.

    lodsb               ; drop: the next item becomes the new top
    ret

Test code

The test code simply calls c_comma forty-eight times (once for each item on the stack) and then dumps the first forty-eight bytes of the code space to the screen.

    mov cx, 48          ; one call per stack item
    .1: call c_comma
    dec cx
    jnz .1
    mov bx, 0x8000      ; start of the code space
    call dump_16        ; each call prints one 16-byte row
    call dump_16
    call dump_16

Results

The code works. The dump shows exactly what I expected:

    0000:8000: 05 16 27 38 49 5A 6B 7C 8D 9E AF B0 C1 D2 E3 F4 | ..'8IZk|........
    0000:8010: D0 EF FE 0D 1C 2B 3A 49 58 67 76 85 94 A3 B2 C1 | .....+:IXgv.....
    0000:8020: 12 34 56 78 9A BC DE F0 23 45 67 89 AB CD EF 01 | .4Vx....#Eg.....
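To make the data flow concrete, here is a small Python model of c_comma and the test loop. It is an illustration, not the compiler's code: the Machine class and its method names are hypothetical, and the 48 test values are copied from the dump in the results. The formatting of dump_16 is also an assumption. In particular, the rule that bytes 0x20 through 0x7E print as characters and everything else prints as a dot is a guess that reproduces the output shown.

```python
# Hypothetical Python model of c_comma, the test loop, and a dump routine.
CODE_START = 0x8000  # code space: offsets 0x8000 through 0xFFFF (32KB)
CODE_END = 0x10000

# The 48 test bytes, top item first, as they appear in the dump above.
TEST_BYTES = bytes.fromhex(
    "05 16 27 38 49 5A 6B 7C 8D 9E AF B0 C1 D2 E3 F4"
    " D0 EF FE 0D 1C 2B 3A 49 58 67 76 85 94 A3 B2 C1"
    " 12 34 56 78 9A BC DE F0 23 45 67 89 AB CD EF 01"
)


class Machine:
    def __init__(self, stack_bytes: bytes) -> None:
        self.memory = bytearray(CODE_END)  # one 64KB segment
        self.here = CODE_START             # next free byte in the code space
        self.stack = stack_bytes           # top item at the lowest index
        self.si = 0
        self.al = 0
        self.lodsb()                       # AL = top item, SI -> second item

    def lodsb(self) -> None:
        # Past the end of the test data this model loads 0; the real LODSB
        # would just read whatever byte follows the stack in memory.
        self.al = self.stack[self.si] if self.si < len(self.stack) else 0
        self.si += 1

    def c_comma(self) -> None:
        bx = self.here                     # mov bx, [here]
        if bx >= CODE_END:                 # model-only guard; real BX would wrap
            raise OverflowError("code space full")
        self.memory[bx] = self.al          # mov [bx], al
        bx += 1                            # inc bx
        self.here = bx                     # mov [here], bx
        self.lodsb()                       # lodsb (drop the stored item)


def dump_16(memory: bytes, bx: int, segment: int = 0) -> str:
    """Format 16 bytes at bx as one row: address, hex bytes, printable ASCII."""
    row = memory[bx:bx + 16]
    hex_part = " ".join(f"{b:02X}" for b in row)
    text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in row)
    return f"{segment:04X}:{bx:04X}: {hex_part} | {text}"


m = Machine(TEST_BYTES)
for _ in range(48):  # mov cx, 48 / .1: call c_comma / dec cx / jnz .1
    m.c_comma()
for bx in range(CODE_START, CODE_START + 48, 16):
    print(dump_16(m.memory, bx))
```

Running it prints the same three rows as the results. That is a quick check that "store at here, increment here, reload AL" copies the stack into the code space in order, from the top item at 0x8000 onward.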