7 February 2004

Code that moves itself

I recently bought a small laptop on eBay for a hundred bucks (a Soyo PW-9800). I've decided to remove the Windows partition from the Soyo's hard disk and devote the whole hard disk to Karig. I want to be able to take the laptop to a café somewhere, plug it in, turn it on, and have it boot up into Karig, so I can start writing or coding while I have my coffee. So I'm dropping the tiny-compiler miniproject for now and concentrating on writing something I can store on the Soyo's hard disk.

So I'm working on a boot sector that will move itself down to address 0:0x0800 and then load 62KB from the hard disk (filling memory up to address 0:0xFFFF). This 62KB block would contain Karig's setup code, kernel, and compiler.

I just had to figure out how to get NASM to produce the kind of code I wanted.

The problem

First of all, NASM complains when you include more than one ORG directive in a source-code file (it prints "error: program origin redefined" and produces no output file), so this code won't work:

[ORG 0x7C00]
	; copy code to 0x0800
	; jump to code at 0x0800

[ORG 0x0800]
	; code continues

So I had to come up with a source-code file with one ORG directive, but which contained code to be executed from two locations in memory: The block of code that sets up and executes the copying of the boot-sector code to the new location had to be executed from the old location (0x7C00 to 0x7DFF), while everything else had to be executed from the new location (0x0800 to 0x09FF).

NASM uses the ORG directive to figure out how to translate a JMP statement into the correct machine code. If I use [ORG 0x7C00] and enter "JMP 0x0800", NASM produces three bytes of machine code. However, what NDISASM (the disassembler that comes with NASM) produces is not

00000000  E90008            jmp 0x0800

as a newbie might expect, but rather

00000000  E9FD8B            jmp 0x8c00

This requires a little study.

First, the '0x8c00' adds up, because NDISASM assumes "ORG 0", but I wanted "ORG 0x7C00", and if you add the word 0x7C00 to the word 0x8C00 (discarding the carry out of 16 bits), you get the word 0x0800, which is the JMP destination I wanted. But second, look at the second and third bytes of machine code: "FD8B", which, being stored low byte first, make up the word 0x8BFD. Where did this come from?

The answer is that, in near jumps, the address in a JMP instruction is really a signed number that is added to the processor's instruction pointer. In this case, 0x8C00 is really minus 0x7400, so the processor is subtracting 0x7400 from 0x7C00 and getting 0x0800. Also, 0x8BFD is really minus 0x7403. The processor adds this number only after it has fetched the whole three-byte JMP instruction and advanced the instruction pointer past it, so the address in the instruction pointer at that moment isn't 0x7C00, but 0x7C03, the address of the next instruction. Subtract 0x7403, and again you get 0x0800. So everything adds up here.
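One way to check this arithmetic is to assemble the single instruction and disassemble it again. The sketch below assumes a hypothetical file name, jmptest.asm, and uses NDISASM's -o option to supply the origin that NDISASM otherwise takes to be zero.

```nasm
; jmptest.asm (hypothetical file name, for illustration only)
;
; Assemble:     nasm -f bin jmptest.asm -o jmptest.bin
; Disassemble:  ndisasm -b 16 -o 0x7C00 jmptest.bin
;
; With "-o 0x7C00", NDISASM adds the displacement to 0x7C03 instead of
; 0x0003, so the target it prints should be 0x0800 rather than 0x8C00.

[ORG 0x7C00]
[BITS 16]

		jmp	0x0800		; E9 FD 8B: 0x8BFD is 0x0800 - 0x7C03 in 16-bit arithmetic
```

The machine code is the same three bytes either way; only the address that the disassembler prints changes.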

But the bulk of the code in the boot sector, and in the sectors that the boot sector is supposed to load, requires "[ORG 0x0800]" because the bulk of this code will be executed within the block of memory that begins at 0x0800, not the block of memory beginning at 0x7C00 where the BIOS loads the boot sector.

The catch is that the code that actually jumps to the new block at 0x0800 must be executed from the old block at 0x7C00 and therefore requires an "[ORG 0x7C00]". If I try to do a "JMP 0x0800" after setting an "[ORG 0x0800]", I get

00000000  E9FDFF            jmp 0x0

which is equivalent to a "JMP NEAR $" (a near jump to the current instruction — a three-byte endless loop), which is not what I want at all.
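The equivalence is easy to confirm by assembling both forms under the same origin. In this sketch the NEAR keyword is spelled out on both jumps as a precaution, on the assumption that a newer NASM might otherwise pick a shorter encoding.

```nasm
[ORG 0x0800]
[BITS 16]

		jmp	near 0x0800	; at 0x0800: E9 FD FF (0x0800 - 0x0803 = -3)
		jmp	near $		; at 0x0803: E9 FD FF (0x0803 - 0x0806 = -3)
```

Because the displacement is relative, the first instruction loops on itself wherever it is loaded, including at 0x7C00.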

Solution: Far jump

The solution is to use a far jump: "JMP 0:0x0800". A near jump, when executed, adds a number to the instruction pointer; a far jump, when executed, stores a number into the instruction pointer (and another into CS), so its machine code doesn't depend on the address from which it is executed.

However, a far jump is five bytes long, and I wanted to save bytes wherever I could. (I don't know how much I can cram into 62KB, but I'd like to find out.) Then I realized that every boot sector I've written since my first one started off with a far jump, which I used to straighten out the CS register (to ensure that it contained a zero, and not 0x07C0, which is what some BIOSes put in there before calling the boot sector, according to something I read somewhere). Suppose I began the boot sector with code that didn't depend on the exact content of CS — that is, it made no subroutine calls — and put the far jump at the end of this code instead of at the beginning? Before this, I had planned on using a far jump to straighten out CS and a near jump to the 0x0800 block; I'd replace these with a single far jump to the 0x0800 block that at the same time straightened out CS. I would thus save three bytes (the length of a near-jump instruction).
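For comparison, here is a sketch of the conventional arrangement described above, with the far jump as the very first instruction. The label name main is an assumption made for this example, not taken from the actual boot sector.

```nasm
[ORG 0x7C00]
[BITS 16]

		jmp	0:main		; EA 05 7C 00 00: loads IP with 0x7C05 and CS with 0
main:
		xor	ax, ax		; from here on CS is known to be zero
		mov	ds, ax
```

The five bytes hold the destination itself (offset first, then segment) rather than a distance, which is why the same instruction can be pointed at the 0x0800 block instead.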

The code

There are two parts to this section. The first covers the new boot-sector statements needed to implement (1) copying the boot sector to 0x0800 and (2) jumping to the boot-sector copy without crashing. The second part covers changing the destination of the jump, so as to make room for some variables or other data in the boot sector, so that they will be copied along with the rest of the boot sector and will still be available after the jump has been made.

The complete boot-sector code is here.

A simple jump

What I have here is code that jumps from address 0x7C00+N to address 0x0800+N+5 (five being the length in bytes of a far jump). The code is in two "stages," with the first stage being the copying and the far jump, and the second being whatever the code in the 0x0800 block does.

The code is 16-bit, and the origin is 0x0800. The Stage-One code should be able to run anywhere in memory, so that changing the origin won't change the machine code produced from these instructions (i.e., the fact that the ORG directive here is technically wrong for these instructions doesn't matter, because NASM will produce the correct machine code no matter what the ORG value is).

[ORG 0x0800]
[BITS 16]

First, I straighten out the four data-segment registers — all of these should point at segment zero. (Note that I've removed the far jump I've been using to straighten out the code-segment register.)

stage_1:
		xor	ax, ax
		mov	ds, ax
		mov	es, ax
		mov	fs, ax
		mov	gs, ax

Then I set up a stack, beginning at 0x0800 and growing downwards (so that the first word pushed will be written to 0x07FE).

		cli
		mov	ss, ax
		mov	sp, 0x0800

Having stored in the stack pointer the address to which the boot sector is to be copied, I start preparing for the move by copying the address into DI before re-enabling interrupts.

		mov	di, sp
		sti

Now I finish setting up the move, which requires the following registers to be set up:

  • DS:SI must contain the address of the data being copied.
  • ES:DI must contain the address of the block of memory into which the data is being copied.
  • CX must contain the number of bytes (if you're using MOVSB) or words (if MOVSW) to be copied.
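  • The direction flag must be cleared to zero if you want the copy to proceed upward through memory from the addresses in DS:SI and ES:DI (SI and DI are incremented after each element), or set to one if you want it to proceed downward (SI and DI are decremented).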

So far, I've already set DS, ES, and DI. I still need to clear the direction flag and set SI and CX.

		cld
		mov	cx, 256
		mov	si, 0x7C00

Finally I can move the boot sector...

		rep	movsw

...and make my far jump. Note that I have to put in "(stage_2 - stage_1)" because I want to skip over that part of the boot sector that has already been executed. I wrote that the far jump goes from 0x7C00+N to 0x0800+N+5; the "(stage_2 - stage_1)" is the "N+5" here.

		jmp	0:0x0800 + (stage_2 - stage_1)

stage_2:
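Since the origin is 0x0800 and stage_1 is the first label in the file, the offset expression works out to the address of stage_2 itself. A minimal sketch, in which a NOP stands in for the real Stage-One code, shows two spellings that should assemble to the same five bytes:

```nasm
[ORG 0x0800]
[BITS 16]

stage_1:
		nop					; 0x0800: placeholder for Stage One
		jmp	0:0x0800 + (stage_2 - stage_1)	; 0x0801: EA 0B 08 00 00
		jmp	0:stage_2			; 0x0806: EA 0B 08 00 00
stage_2:						; 0x080B
		jmp	$				; placeholder for Stage Two
```

The longer spelling has the advantage of stating outright where the copy lives and how far into it the jump lands.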

Verifying that the jump occurred

To ensure that the copy and jump worked, I wrote some Stage-Two test code. First, I wiped out the original boot-sector by filling it with zeroes.

		cld
		xor	al, al		; AH is still zero from stage_1, so AX = 0 for STOSW
		mov	di, 0x7C00
		mov	cx, 256
		rep	stosw

Then I dumped 64 bytes from the boot-sector copy, and 64 bytes from the old boot-sector block. As I had hoped, the memory dump occurred, the first four lines displayed machine-code bytes, and the last four lines displayed null bytes.

		call	clear_screen
		mov	bx, 0x0800 ; test code only
		call	dump_16
		call	dump_16
		call	dump_16
		call	dump_16
		mov	bx, 0x7C00 ; test code only
		call	dump_16
		call	dump_16
		call	dump_16
		call	dump_16

Jumping over a block of variables

I wanted to be able to specify variables or constants right in the source code, and to have such data right in the boot sector so that I don't need to write code to set them up outside of the boot sector, and so they'd be copied too. I'm thinking of using this option when writing the code to load that 62KB of sectors from the disk.

You usually place such data between subroutines — after JMP or RET statements. That far jump I used to go to Stage Two didn't have to land at the instruction immediately following the copy of that far-jump instruction; it could land a little further up in memory, thus leaving a space between the far-jump instruction copy and the first instruction executed afterward. This space would be perfect for storing variables or tables if you have no other place for them.

So my first modification here will be to the far jump. The modified far jump will jump from 0x7C00+N to 0x0800+N+5+X, where X is the size of whatever non-code stuff I want to put into the boot sector.

My second modification here will be to add test code to prove that the far jump is indeed landing at exactly the correct address, whether X is equal to zero or anything else.

Here follows the above source code, with the modifications pointed out as they come up.

The first part of the boot sector is unchanged. Stage One still needs to straighten out the segment registers, set up the call stack, and copy the boot sector to address 0x0800.

[ORG 0x0800]
[BITS 16]

stage_1:
		xor	ax, ax
		mov	ds, ax
		mov	es, ax
		mov	fs, ax
		mov	gs, ax

		cli
		mov	ss, ax
		mov	sp, 0x0800
		mov	di, sp
		sti

		cld
		mov	cx, 256
		mov	si, 0x7C00
		rep	movsw

But here I add a line of test code. I want to use a counter to prove that my far jump lands in the right place. The counter is in BX, which I set to zero. (I have to put the instruction in Stage One, before the jump is made, to ensure that it is executed.)

		xor	bx, bx

Then I jump. I don't really have to modify the jump instruction at all. All I have to do is store my data between the jump instruction and the stage_2 label.

		jmp	0:0x0800 + (stage_2 - stage_1)

		dd	0x12345678, 0x9ABCDEF0
		inc	bx
		inc	bx
		inc	bx

stage_2:

Note that if the jump lands a byte or two too low in memory, the counter will be incremented one or two too many times. The test will succeed only if the counter contains a one at the end of this.

The counter should still be zero immediately after the jump. I increment it here, at the beginning of Stage Two. If the jump lands a byte or two too high in memory, BX will not be incremented.

		inc	bx

I need a way to display the value of the counter on the screen. So I wipe out the original boot-sector code, as before, but I store BX in the first word there (so its low byte lands at 0x7C00).

		cld
		xor	al, al		; AH is still zero from stage_1, so AX = 0 for STOSW
		mov	di, 0x7C00
		mov	cx, 256
		rep	stosw
		mov	[0x7C00], bx

The last dozen lines of source code are the same, but the effect has one difference: The first byte on the fifth line printed on the screen (beginning with 0000:7C00:) should be 01, not 00, because BX was just stored there, and BX should contain a one.

		call	clear_screen
		mov	bx, 0x0800
		call	dump_16
		call	dump_16
		call	dump_16
		call	dump_16

		mov	bx, 0x7C00
		call	dump_16
		call	dump_16
		call	dump_16
		call	dump_16
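The two doublewords in the listing above are anonymous test data. As a sketch of how a named variable might be stored in the same gap and read back in Stage Two, consider the following; the label sector_count and its value (124 sectors of 512 bytes, or 62KB) are assumptions made for illustration and are not part of the actual boot sector.

```nasm
[ORG 0x0800]
[BITS 16]

stage_1:
		xor	ax, ax
		mov	ds, ax
		mov	es, ax
		cli
		mov	ss, ax
		mov	sp, 0x0800
		mov	di, sp
		sti
		cld
		mov	cx, 256
		mov	si, 0x7C00
		rep	movsw
		jmp	0:0x0800 + (stage_2 - stage_1)

sector_count:	dw	124			; hypothetical variable, copied with the rest

stage_2:
		mov	cx, [sector_count]	; DS is 0, so this reads the copy in the 0x0800 block
hang:
		jmp	hang			; nothing more to do in this sketch

		times	510 - ($ - $$) db 0	; pad the sector to 510 bytes
		dw	0xAA55			; boot signature expected by the BIOS
```

Because of the [ORG 0x0800] directive, NASM assigns sector_count an address inside the 0x0800 block, which is where the variable lives once the jump has been made.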

Results

So the results need to be as follows:

  • The first four lines on the screen should be a dump of the first 64 bytes of the boot-sector code at 0x0800.
  • The next four lines on the screen should be a dump of the first 64 bytes of the original boot-sector area at 0x7C00. No code should be there anymore, so this block should be all zeroes — except for the first byte.
  • The first byte in the fifth line should be 01, not 00, because the counter value in BX was stored in memory at 0x7C00.

So my code works. The lines that appear on my laptop look like this:

0000:0800: EA 05 7C 00 00 8C C8 8E D8 8E C0 8E E0 8E E8 FA | ..|  ...........
0000:0810: 8E D0 BC 00 08 89 E7 FB B9 00 01 BE 00 7C F3 A5 | ... ..... .. |..
0000:0820: 31 DB E9 09 8C 78 56 34 12 F0 DE BC 9A F4 43 FC | 1....xV4......C.
0000:0830: 30 C0 BF 00 7C B9 00 01 F3 AB 89 1E 00 7C E8 20 | 0.. |. ..... |. 
0000:7C00: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | 
0000:7C10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | 
0000:7C20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | 
0000:7C30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 | 

Now I can get busy writing the code to read 62KB from the disk. It'll probably just be a rewrite of code already presented.
