
21 January 2004

A tiny compiler, part 2

In my last entry, I discussed building a crude compiler small enough to fit into a boot sector. Now I'll start building the compiler, piece by piece. The first piece I'm going to build is the command line.

This compiler won't work like most compilers. The conventional compiler (gcc, for example — or for that matter a conventional assembler such as NASM) requires that the user pass it the name of a file, which the compiler opens and reads. My compiler will not read from any files. It will simply wait for the user to type in words, which the compiler will compile or execute, depending on the current color. In other words, my compiler will actually be a very crude IDE.

So I need to write an endless loop that waits for keystrokes and (for now) just echoes them to the screen.

How the code works

The code I have here simply waits for a key. After the boot sector loads and runs, you will see a blank screen with the cursor blinking in the upper-left corner, and you just type. You type in words. Each word is exactly two characters long; after each word, there are two spaces to separate the word from the next. To change the color of the word you are typing, hit a function key: F1 turns the word red; F2, green; F3, yellow; F4, magenta. If you type one character and then press a function key, this changes the color in mid-word; the color of the word will be the color of the second character in the word.

The special characters — Backspace, Enter, Delete, Tab, and even the space bar — don't work; you can type only visible characters.

In future entries, each word will either represent a chunk of machine code to be called or executed, or be a number in hexadecimal that will be converted to a byte value and pushed onto the data stack. Each color will tell the compiler to do something specific with the word.

BIOS routines

I will need the following BIOS routines:

  • INT 10h, AH=00h: set video mode and clear screen. I'll use this one to ensure that the computer screen is in text mode (80 columns by 25 rows) and to clear the screen.

  • INT 16h, AH=00h: get key. This one waits until the user presses a key, then returns the ASCII character code of the key in AL and the key's scan code in AH. (If the ASCII code is nonzero, I can just print it to the screen.)

  • INT 10h, AH=0Eh: print character and move cursor. This one prints a given character, moves the cursor forward, and scrolls the screen when the bottommost row overflows.

  • INT 10h, AH=09h: print character without moving cursor. This one prints a given character with a given attribute (color) to the screen at the cursor position. I need this because the previous service does not provide a way to set the attribute of the character being printed.
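To see why the last two services are used as a pair, here is a rough Python model of a text screen. It is only a sketch: the class and method names are made up for illustration, scrolling is left out, and it assumes the usual text-mode behavior in which AH=0Eh replaces the character in a cell but leaves the cell's attribute alone.

```python
# Simplified model of an 80x25 text screen. Each cell holds a character and
# an attribute byte. Illustration only, not real BIOS code.
COLS, ROWS = 80, 25


class TextScreen:
    def __init__(self):
        # Model assumption: after a mode set, every cell is a space with
        # the default attribute 0x07 (light gray on black).
        self.cells = [[" ", 0x07] for _ in range(COLS * ROWS)]
        self.cursor = 0

    def write_char_attr(self, ch, attr, count=1):
        """Like INT 10h, AH=09h: set character and attribute; cursor stays."""
        for i in range(count):
            self.cells[self.cursor + i] = [ch, attr]

    def teletype(self, ch):
        """Like INT 10h, AH=0Eh in text mode: set the character only, keep
        the cell's attribute, then advance the cursor (no scrolling here)."""
        self.cells[self.cursor][0] = ch
        self.cursor += 1


screen = TextScreen()
screen.write_char_attr(" ", 0x04)      # pre-color the cell under the cursor red
screen.teletype("A")                   # the typed character inherits that color
print(screen.cells[0], screen.cursor)  # ['A', 4] 1
```

The cell is colored first with an invisible space, and the character typed on top of it picks up that color.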

Code

As usual, you can download the complete boot-sector code and try it out.

I'm going to do something a little different. I will list all of the code in the file, and I will keep track of how many bytes each chunk of code takes up. I'm doing this because I want the complete compiler and programming environment to be small enough to fit inside a boot sector, and I don't know exactly when I'll stop adding features.

Setup

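As usual, the boot sector must set up the segment registers. The far jump forces CS to zero, since a BIOS may enter the boot sector with a different segment:offset pair for the same physical address; the other segment registers are then copied from CS. This takes 15 bytes of machine code.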

		jmp	word 0:segzero

segzero:
		mov	ax, cs
		mov	ds, ax
		mov	es, ax
		mov	fs, ax
		mov	gs, ax

The boot sector must also set up the stack. (I'm just setting up the call stack here; I'll leave out the data stack required by colorForth for now.) This takes 10 bytes (for a total of 25 bytes).

		cli
		mov	ax, 0x1000
		mov	ss, ax
		mov	sp, 0xFFFE
		sti
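As a quick check on where that stack lives, the real-mode rule is that a physical address is the segment times 16 plus the offset. The sketch below is Python used as a calculator; the function name is made up, and 0x7C00 is the conventional address at which a BIOS loads a boot sector.

```python
def linear_address(segment, offset):
    """Real-mode address translation: segment * 16 + offset."""
    return (segment << 4) + offset


stack_top = linear_address(0x1000, 0xFFFE)    # SS:SP as set up above
boot_sector = linear_address(0x0000, 0x7C00)  # conventional boot-sector load address
print(hex(stack_top))    # 0x1fffe
print(hex(boot_sector))  # 0x7c00
```

The stack therefore starts just under the 128 KB mark and grows downward, well away from the boot sector's own 512 bytes.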

I now force text mode (mode 3 — 80 columns x 25 rows) and clear the screen. This takes 5 bytes (for a total of 30 bytes).

		mov	ax, 3
		int	0x10

Changing character colors

The first thing I do is change the attribute of the characters in the upper-left corner of the screen, so the first character you type will be one of the four colors listed in the attribs table below.

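To change the color of the character about to be typed, I pass a space character and the byte in attrib to INT 10h, AH=09h, which writes the space with that attribute at the cursor position. In text mode, INT 10h, AH=0Eh replaces only the character in a cell and leaves the cell's attribute alone, so the character typed there next appears in the new color. The same attribute is reused for later words until the user presses a function key.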

These lines are executed immediately after the screen is cleared, and thereafter only if the user has just either pressed a function key or typed the second character in a word. (11 bytes — total 41 bytes.)

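	.4:	mov	ax, (9 * 0x100) + ' '	; AH = 09h (write char + attribute), AL = space
		mov	bx, [attrib]		; BL = attribute, BH = 0 (display page)
		mov	cx, 2			; color both cells of the word (AH=0Eh keeps attributes)
		int	0x10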
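The constants in this chunk pack two 8-bit values into one 16-bit register. The following Python sketch spells that out, with hypothetical helper names. It also shows what a 16-bit load from the two bytes stored at attrib produces, given that x86 stores words low byte first.

```python
def pack_word(high, low):
    """Combine two 8-bit halves into one 16-bit register value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


def split_word(word):
    """Return the (high, low) halves of a 16-bit value."""
    return (word >> 8) & 0xFF, word & 0xFF


ax = (9 * 0x100) + ord(" ")            # same expression as in the assembly
assert ax == pack_word(0x09, 0x20)     # AH = 09h (service number), AL = space

attrib = bytes([0x04, 0x00])           # the two bytes stored at attrib
bx = int.from_bytes(attrib, "little")  # what a 16-bit load from memory sees
bh, bl = split_word(bx)
print(hex(ax), bh, bl)                 # 0x920 0 4
```

One 16-bit load thus sets BL to the attribute and BH to display page 0 at the same time.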

Getting keystrokes

This is where the loop begins.

I wait for a key. This takes 4 bytes (for a total of 45 bytes).

	.1:	xor	ah, ah
		int	0x16

The BIOS function returns with an ASCII character code in AL and the scan code of the key in AH. I check for the scan code of a function key.

The function keys F1, F2, F3, and F4 have the scan codes 0x3B, 0x3C, 0x3D, and 0x3E respectively. This makes testing for them simple: I just subtract 0x3B from the scan code. If the result is not 0, 1, 2, or 3, then the key pressed was not one of these four function keys, so I jump ahead. (8 bytes — total 53 bytes.)

		sub	ah, 0x3B
		cmp	ah, 4
		jnb	.2
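The subtract-then-compare trick relies on 8-bit wraparound: any scan code below 0x3B turns into a large unsigned value, so a single unsigned comparison rejects keys on both sides of the F1 to F4 range. Here is a minimal Python model of it (the function name is hypothetical):

```python
def fkey_index(scan_code):
    """Return 0..3 for F1..F4, or None for any other key.

    Mirrors sub ah, 0x3B / cmp ah, 4 / jnb: the subtraction wraps around
    in 8 bits, so scan codes below 0x3B become large unsigned values.
    """
    diff = (scan_code - 0x3B) & 0xFF   # 8-bit wraparound, like the AH register
    return diff if diff < 4 else None  # unsigned "below 4" test


print(fkey_index(0x3B))  # 0     (F1)
print(fkey_index(0x3E))  # 3     (F4)
print(fkey_index(0x3F))  # None  (F5)
print(fkey_index(0x1E))  # None  (the A key: 0x1E - 0x3B wraps to 0xE3)
```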

If the key was a function key, then AH contains 0, 1, 2, or 3 — which I use as an offset into the attribs table so I can retrieve the appropriate byte to use as the screen attribute for the current word. (12 bytes — total 65 bytes.)

		xor	bh, bh
		mov	bl, ah
		mov	ah, [attribs+bx]
		mov	[attrib], ah

After this, I jump back to the code that recolors the current character according to the attribute byte. (2 bytes — total 67 bytes.)

		jmp	short .4

(Note how this is set up. If you type the first character of a word and then hit a function key, only the second character is shown in the new color, and the word as a whole is considered to have the color of the second character. This is cheesy, of course, but it's OK for a compiler/IDE intended to be as tiny as this one is.)

If the key was not one of the four function keys, I check the ASCII code. If the code is below 33, meaning a control character (such as Tab or Enter), a space, or the zero returned by keys such as F5 that have no ASCII code, I go back and wait for another keystroke; otherwise I continue. Thus the user can't move around on the screen and can only type printable characters. (If the user can't hit Enter, then I don't have to write code to deal with the cursor suddenly jumping to another location on the screen.) (4 bytes, for a total of 71 bytes.)

	.2:	cmp	al, 33
		jb	.1

Echoing characters

Now I "echo" the ASCII code (in AL) to the screen and move the cursor forward. (6 bytes — total 77 bytes.)

		xor	bh, bh
		mov	ah, 0xE
		int	0x10

Putting spaces between words

Each word is two characters long, so after every second character, I want to print two spaces. (Putting two spaces after each two-character word means that the length of a row [80] is divisible by the number of columns a word takes up [4], so that I never need to watch out for the end of a row.)

To tell whether the character just typed was the second one in the current word, I keep a byte-variable named c, which I increment after each character typed. If this variable is even (divisible by two), then the second character has been typed, and it's time to print the two spaces. (11 bytes — total 88 bytes.)

		inc	byte [c]
		mov	al, [c]
		test	al, 1
		jnz	.1

After the second character has been typed and displayed, I call on a BIOS function to print a space to the screen — twice. (7 bytes per call — 14 bytes for both calls — total 102 bytes. Note that making a subroutine out of this and then calling the subroutine twice would not have saved any bytes.)

		mov	ax, (0xE * 0x100) + ' '
		xor	bh, bh
		int	0x10
		mov	ax, (0xE * 0x100) + ' '
		xor	bh, bh
		int	0x10

The next word still needs its attribute set, so I jump back to the recoloring code at .4, which then falls through to the top of the loop. (2 bytes, for a total of 104 bytes.)

		jmp	short .4
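The spacing logic is easy to try out away from the BIOS. This Python sketch (function and constant names are illustrative) mimics the parity test on the counter c and checks the claim that a four-column word never straddles two rows.

```python
COLUMNS = 80
WORD_WIDTH = 2 + 2           # two characters plus two spaces


def echo(keys):
    """Return the text that would appear on screen for the typed characters."""
    out = []
    c = 0                    # counts characters typed (a byte in the original)
    for ch in keys:
        out.append(ch)
        c = (c + 1) & 0xFF   # inc byte [c], wrapping at 256
        if c & 1 == 0:       # even count: the word is complete
            out.append("  ")
    return "".join(out)


print(repr(echo("abcde")))    # 'ab  cd  e'
print(COLUMNS % WORD_WIDTH)   # 0: a word never straddles two rows
print(COLUMNS // WORD_WIDTH)  # 20 words per row
```

Because 256 is even, the counter wrapping around does not disturb the parity test.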

Data

The data section contains two variables and one table (7 bytes). The attrib variable is actually two bytes, so that I can save a byte by using mov bx, [attrib] to load both BL and BH for a BIOS function call: the extra zero byte of data costs one byte, while the xor bh, bh it replaces would cost two.

c:
		db	0
attrib:
		db	0x04, 0
attribs:
		db	0x04 ; red on black (define)
		db	0x02 ; green on black (compile)
		db	0x0E ; yellow on black (execute)
		db	0x05 ; magenta on black (hex)
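For readers unfamiliar with text-mode attribute bytes: the low four bits select the foreground color and bits 4 to 6 select the background. The Python sketch below decodes the four table entries using the standard 16-color palette names; the helper name is made up for illustration.

```python
# Standard 16-color text-mode palette, indexed by a 4-bit color number.
PALETTE = ["black", "blue", "green", "cyan", "red", "magenta", "brown",
           "light gray", "dark gray", "light blue", "light green",
           "light cyan", "light red", "light magenta", "yellow", "white"]


def describe(attr):
    """Split an attribute byte into foreground and background color names."""
    foreground = attr & 0x0F         # low nibble
    background = (attr >> 4) & 0x07  # bits 4-6 (bit 7 is blink or intensity)
    return f"{PALETTE[foreground]} on {PALETTE[background]}"


for attr in (0x04, 0x02, 0x0E, 0x05):  # the attribs table
    print(hex(attr), describe(attr))
```

This prints the same four descriptions as the comments in the table.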

The rest of the boot sector, of course, is padding.

		times	510 - ($-$$) db 0x90 ; nop
		db	0x55, 0xAA
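The same layout can be reproduced outside the assembler. This is a minimal Python sketch, not part of the build; the function name and the 100-byte placeholder payload are arbitrary choices for illustration.

```python
SECTOR_SIZE = 512
SIGNATURE = bytes([0x55, 0xAA])


def make_boot_sector(payload):
    """Pad code and data with NOPs (0x90) and append the boot signature."""
    room = SECTOR_SIZE - len(SIGNATURE)  # 510 usable bytes
    if len(payload) > room:
        raise ValueError("payload does not fit in a boot sector")
    padding = bytes([0x90]) * (room - len(payload))
    return payload + padding + SIGNATURE


sector = make_boot_sector(bytes(100))  # arbitrary 100-byte placeholder payload
print(len(sector))                     # 512
print(sector.count(0x90))              # 410 padding bytes
print(sector[-2:].hex())               # 55aa
```

The NASM expression 510 - ($-$$) computes the same padding length, with $-$$ standing for the number of bytes assembled so far.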

Size

The combined size of the code and data sections I've presented here is 111 bytes. The boot sector provides 510 bytes of space, so I still have 399 bytes of space left over.

Entries to come

To get this compiler to work, I'll need to write and test the c-comma code (the routine that stores bytes of machine code into the code space) and the code to convert two hexadecimal characters (a magenta word) into a byte on the data stack. After that, I'll need to write the code to set up the dictionary in the space beyond the end of the boot sector (at 0000:7E00), then the code to define new words (red words) in the dictionary, then the code to execute yellow words immediately, and finally the code to compile calls to green words.

Who knows? Maybe I'll get ambitious and include code to read other sectors off the floppy — sectors containing precode, which is compiled into the dictionary as soon as it's loaded. That would make the compiler almost useful. :-)

But this project is really a test to see how much functionality I can squeeze into a boot sector. Eventually I'll have to quit this and get working on a real 32-bit compiler that runs in protected mode.
