21 January 2004

A tiny compiler, part 2

In my last entry, I discussed building a crude compiler small enough to fit into a boot sector. Now I'll start building the compiler, piece by piece.

The first piece I'm going to build is the command line. This compiler won't work like most compilers. A conventional compiler (gcc, for example, or for that matter a conventional assembler such as NASM) requires that the user pass it the name of a file, which the compiler opens and reads. My compiler will not read from any files. It will simply wait for the user to type in words, which the compiler will compile or execute, depending on the current color. In other words, my compiler will actually be a very crude IDE.

So I need to write an endless loop that waits for keystrokes and (for now) simply echoes them to the screen.

How the code works

The code I have here simply waits for a key. After the boot sector loads and runs, you will see a blank screen with the cursor blinking in the upper-left corner, and you just type.

You type in words. Each word is exactly two characters long; after each word, there are two spaces to separate the word from the next. To change the color of the word you are typing, hit a function key: F1 turns the word red; F2, green; F3, yellow; F4, magenta. If you type one character and then press a function key, this changes the color in mid-word; the color of the word will be the color of the second character in the word.

The special characters (Backspace, Enter, Delete, Tab, and even the space bar) don't work; you can type only visible characters.

In future entries, each word will either represent a chunk of machine code to be called or executed, or be a number in hexadecimal that will be converted to a byte value and pushed onto the data stack. Each color will tell the compiler to do something specific with the word.

BIOS routines

I will need the following BIOS routines (these are the ones called by the code below):

    INT 10h, AH=00h    set the video mode
    INT 10h, AH=09h    write a character and attribute at the cursor position
    INT 10h, AH=0Eh    write a character and advance the cursor (teletype output)
    INT 16h, AH=00h    wait for a keystroke and read it

Code

As usual, you can download the complete boot-sector code and try it out.

I'm going to do something a little different. I will list all of the code in the file, and I will keep track of how many bytes each chunk of code takes up. I'm doing this because I want the complete compiler and programming environment to be small enough to fit inside a boot sector, and I don't know exactly when I'll stop adding features.

Setup

As usual, the boot sector must set up the segment registers. This takes 15 bytes of machine code.

        jmp word 0:segzero      ; far jump sets CS to 0
    segzero:
        mov ax, cs
        mov ds, ax
        mov es, ax
        mov fs, ax
        mov gs, ax

The boot sector must also set up the stack. (I'm just setting up the call stack here; I'll leave out the data stack required by colorForth for now.) This takes 10 bytes (for a total of 25 bytes).

        cli
        mov ax, 0x1000
        mov ss, ax
        mov sp, 0xFFFE
        sti

I now force text mode (mode 3: 80 columns x 25 rows) and clear the screen. This takes 5 bytes (for a total of 30 bytes).

        mov ax, 3               ; AH=00h (set video mode), AL=3
        int 0x10

Changing character colors

The first thing I do is change the attribute of the character cell in the upper-left corner of the screen, so the first character you type will be one of the four colors listed in the attribs table below.

To change the color of the character about to be typed, I pass a space character and the byte in attrib to INT 10h, AH=09h. Characters typed afterward will be in this new color until the user presses a function key. These lines are executed immediately after the screen is cleared, and thereafter only if the user has just either pressed a function key or typed the second character in a word. (11 bytes, total 41 bytes.)

    .4:
        mov ax, (9 * 0x100) + ' '   ; AH=09h, AL=space
        mov bx, [attrib]            ; BL=attribute, BH=page 0
        mov cx, 1                   ; write one character
        int 0x10

Getting keystrokes

This is where the loop begins. I wait for a key. This takes 4 bytes (for a total of 45 bytes).

    .1:
        xor ah, ah              ; AH=00h (read keystroke)
        int 0x16

The BIOS function returns with an ASCII character code in AL and the scan code of the key in AH. I check for the scan code of a function key.
The function keys F1, F2, F3, and F4 have the scan codes 0x3B, 0x3C, 0x3D, and 0x3E respectively. This makes testing for them simple: I just subtract 0x3B from the scan code. If the result is not 0, 1, 2, or 3, then the key pressed was not one of these four function keys, so I jump ahead. (8 bytes, total 53 bytes.)

        sub ah, 0x3B
        cmp ah, 4
        jnb .2

If the key was a function key, then AH contains 0, 1, 2, or 3, which I use as an offset into the attribs table so I can retrieve the appropriate byte to use as the screen attribute for the current word. (12 bytes, total 65 bytes.)

        xor bh, bh
        mov bl, ah
        mov ah, [attribs+bx]
        mov [attrib], ah

After this, I jump back to the code that recolors the current character according to the attribute byte. (2 bytes, total 67 bytes.)

        jmp short .4

(Note how this is set up. If you type the first character of a word and then hit a function key, only the second character is shown in the new color, and the word as a whole is considered to have the color of the second character. This is cheesy, of course, but it's OK for a compiler/IDE intended to be as tiny as this one is.)

If the key was not one of the four function keys, I check the ASCII code. If this is the code for a control character (such as Tab or Enter) or for a space, that is, anything below 33, I go back and wait for another keystroke; otherwise I continue. Thus the user can't move around on the screen and can type only printable characters. (If the user can't hit Enter, then I don't have to write code to deal with the cursor suddenly jumping to another location on the screen.) (4 bytes, total 71 bytes.)

    .2:
        cmp al, 33
        jb .1

Echoing characters

Now I "echo" the ASCII code (in AL) to the screen and move the cursor forward. (6 bytes, total 77 bytes.)

        xor bh, bh
        mov ah, 0xE
        int 0x10

Putting spaces between words

Each word is two characters long, so after every second character, I want to print two spaces.
(Putting two spaces after each two-character word means that the length of a row [80] is divisible by the number of columns a word takes up [4], so that I never need to watch out for the end of a row.)

To tell whether the character just typed was the second one in the current word, I keep a byte variable named c, which I increment after each character typed. If this variable is even (divisible by two), then the second character has been typed, and it's time to print the two spaces; if it is odd, I go back and wait for the next key. (11 bytes, total 88 bytes.)

        inc byte [c]
        mov al, [c]
        test al, 1
        jnz .1

After the second character has been typed and displayed, I call on a BIOS function to print a space to the screen, twice. (7 bytes per call, 14 bytes for both calls, total 102 bytes. Note that making a subroutine out of this and then calling the subroutine twice would not have saved any bytes.)

        mov ax, (0xE * 0x100) + ' '
        xor bh, bh
        int 0x10
        mov ax, (0xE * 0x100) + ' '
        xor bh, bh
        int 0x10

I still need to change the attribute of the character cell where the next word starts, so I jump back to the recoloring code at .4, just above the top of the loop. (2 bytes, total 104 bytes.)

        jmp short .4

Data

The data section contains two variables and one table (7 bytes). The attrib variable is actually two bytes, so that I can save a byte by using mov bx, [attrib] to load both BL and BH for a BIOS function call.

    c:       db 0
    attrib:  db 0x04, 0
    attribs: db 0x04            ; red on black (define)
             db 0x02            ; green on black (compile)
             db 0x0E            ; yellow on black (execute)
             db 0x05            ; magenta on black (hex)

The rest of the boot sector, of course, is padding.

        times 510 - ($-$$) db 0x90  ; nop
        db 0x55, 0xAA

Size

The combined size of the code and data sections I've presented here is 111 bytes. The boot sector provides 510 bytes of space, so I still have 399 bytes of space left over.
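
The note under "Putting spaces between words" that a subroutine would not have saved any bytes can be checked by counting. The fragment below is only a sketch for that comparison: the label putspace is a hypothetical name, not part of the original code, and the byte counts assume NASM's usual 16-bit encodings (a near call is one opcode byte plus a 16-bit offset).

```nasm
        bits 16

; Hypothetical variant, shown only to count bytes; not in the original code.
        call putspace                   ; 3 bytes
        call putspace                   ; 3 bytes
        ; In the real loop, the existing "jmp short .4" would sit here,
        ; so execution would never fall through into putspace.

putspace:
        mov ax, (0xE * 0x100) + ' '     ; 3 bytes
        xor bh, bh                      ; 2 bytes
        int 0x10                        ; 2 bytes
        ret                             ; 1 byte

; Subroutine version: 3 + 3 + (3 + 2 + 2 + 1) = 14 bytes.
; Inline version:     7 + 7                   = 14 bytes.
```

The subroutine would start to pay off only if a third call site appeared, since each additional call costs 3 bytes against 7 for another inline copy.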

Entries to come

To get this compiler to work, I'll need to write and test the c-comma code (the routine that stores bytes of machine code into the code space) and the code to convert two hexadecimal characters (a magenta word) into a byte on the data stack. After that, I'll need to write the code to set up the dictionary in the space beyond the end of the boot sector (at 0000:7E00), then the code to define new words (red words) in the dictionary, then the code to execute yellow words immediately, and finally the code to compile calls to green words.
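
To make the hex-conversion step concrete, here is one possible shape for it. This is a rough sketch and not the code from any later entry: the names hexbyte and hexdigit, the choice of DH and DL for the two characters, and the use of the carry flag to report errors are all assumptions made for illustration. Pushing the result onto the data stack is left out, because the data stack has not been defined yet.

```nasm
        bits 16

; Hypothetical helpers, not from the original post.

; hexbyte: convert a two-character word to a byte.
; In:  DH = first character (high nibble), DL = second character (low nibble)
; Out: AL = byte value with CF clear, or CF set if a character is not a hex digit
;      (DH is overwritten)
hexbyte:
        mov al, dh
        call hexdigit
        jc .done
        shl al, 4               ; first digit goes in the high nibble (needs a 186 or later)
        mov dh, al
        mov al, dl
        call hexdigit
        jc .done
        or al, dh               ; combine the nibbles; OR also clears CF
.done:
        ret

; hexdigit: convert one ASCII hex digit.
; In:  AL = '0'-'9', 'A'-'F', or 'a'-'f'
; Out: AL = 0..15 with CF clear, or CF set if AL was not a hex digit
hexdigit:
        cmp al, '0'
        jb .bad
        cmp al, '9'
        jbe .digit
        or al, 0x20             ; fold 'A'-'F' onto 'a'-'f'
        cmp al, 'a'
        jb .bad
        cmp al, 'f'
        ja .bad
        sub al, 'a' - 10        ; 'a' -> 10, ..., 'f' -> 15
        clc
        ret
.digit:
        sub al, '0'             ; '0' -> 0, ..., '9' -> 9
        clc
        ret
.bad:
        stc
        ret
```

With DH = '3' and DL = 'B', hexbyte returns AL = 0x3B with the carry flag clear. Accepting both upper and lower case costs only the single OR instruction; a tighter version could drop the error checks entirely if the byte budget demands it.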

Who knows? Maybe I'll get ambitious and include code to read other sectors off the floppy, namely sectors containing precode, which is compiled into the dictionary as soon as it's loaded. That would make the compiler almost useful. But this project is really a test to see how much functionality I can squeeze into a boot sector. Eventually I'll have to quit this and get working on a real 32-bit compiler that runs in protected mode.

Check the index for other entries.