
ASSEMBLER PROGRAMMING USING DEBUG - ADD, SUB, MUL and DIV

So, what's going to happen? First we will have a look at the debug command and see what it can do for us. Debug comes free with DOS and Windows 95/98 and is used to access the registers and RAM directly. It is a hacker's dream! (Most hackers use more advanced tools such as hex editors and disassemblers, but debug is still very powerful.) Debug is a command line interface, so we only use the keyboard, and it is important to understand some of the basic commands. We will investigate how to look at registers, do some hexadecimal arithmetic and write some data into the registers so that we can add up and play with two numbers. Once we have the basics in place we will worry about how to actually write a program.

DEBUG
First, start debug by opening a DOS window in Windows 95/98/NT (Start / Programs / MS-DOS Prompt). You should be in the Windows directory (c:\windows). If you are not, change to that folder now. Start the debug program by typing debug <enter>. The prompt should change to a simple "-". You can switch back and forth between this tutorial and the DOS window from the taskbar or by using <alt> + <tab>. That way you'll be able to follow the steps as we go along. Okay, let's have a look at the registers: type "r" <enter>.

you should see a display similar to the one above. What does it all mean ? Well....

The AX, BX, CX and DX symbols refer to the four general purpose registers. At the moment they all have 0000 in them. SP means the stack pointer; it is an address in memory we can use to stack up variables. IP refers to the instruction pointer; it normally starts at 0100 since that is where the code part of a program usually starts. The instruction pointer points to the next instruction that needs to be loaded and executed. 0100 is not enough to identify a place in memory, so it is coupled with the CS register (0F70), which tells us the segment or 64k block that is being addressed. Don't stress, it is a bit like saying the next page to read is from volume 3952 (the segment, 0F70h), page number 256 (the offset, 0100h) from a massive set of instruction manuals. The addresses refer to positions in the RAM.
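How a segment and an offset combine into one position in RAM can be sketched in a few lines. This is a minimal sketch, assuming the real-mode 8086 rule (segment times 16, plus offset) that applies to programs run under debug. Python is used here only as a calculator, and the function name linear_address is made up for this example.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode 8086 address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# The example values from the register dump discussed above (yours will differ).
cs, ip = 0x0F70, 0x0100
print(f"{cs:04X}:{ip:04X} -> {linear_address(cs, ip):05X}h")
```

Because the segment is only shifted by one hex digit, neighbouring segments overlap; two different segment:offset pairs can name the same byte.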

The code 0F70:0100 is the address of the next instruction in full, and the actual data at that address currently says 03F1, which debug has thoughtfully translated for us into semi-English and told us it means to add together the SI and CX registers. This means nothing to us at the moment though. In fact it is just garbage. Your display will undoubtedly have different values for CS and for the instruction at offset 0100h.

All of the other symbols we can safely ignore at the moment.

Hexadecimal notation
The numbers being used here are all in hexadecimal, or hex for short. Hex digits start at 0 and go to 15 before counting over again. We are used to counting to 9 before adding a tens column; hex gets to 15. It is a very logical way of counting: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, 1C, 1D, 1E, 1F, 20 etc.

Hex is beautiful because
one hex digit represents 4 bits
two hex digits represent one byte or 8 bits
four hex digits represent two bytes or one word

So FF is the hex number for 255 decimal (15 * 16 + 15). To avoid confusion, hex numbers are sometimes written with an "h" after them: 3A7h is really the hexadecimal for our familiar decimal number 935 (3 * 256 + 10 * 16 + 7). Further, the registers contain 4 hex digits, so they obviously hold 2 bytes, or one word of data.

The addresses you see refer to the position of the data from the start of memory, in hex. Due to an historical decision, Intel requires that you specify the segment and the offset when specifying an address. When you start the debug program, it sets up a segment that is empty for you to use. When I was working it turned out to be 0F70, but your segment will be different. In this first tutorial we will only have to worry about the offset (0100h) and not the segment.
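If you want to check hex conversions without counting on your fingers, any language with base conversion will do. The following is a small sketch in Python (not part of the debug toolchain); the helper name to_hex is an assumption made for this example.

```python
def to_hex(value: int, digits: int = 4) -> str:
    """Format a number the way this tutorial writes it: upper case, 'h' suffix."""
    return f"{value:0{digits}X}h"


print(int("FF", 16))    # hex FF as decimal
print(int("1F", 16))    # hex 1F as decimal
print(to_hex(10, 1))    # decimal 10 as one hex digit
print(to_hex(255))      # decimal 255 padded to a 4 digit word
```

The four digit padding mirrors how debug always shows a register as a full word.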

Registers
Let's put a bit of data into the registers. Type:

r AX <enter>
8 <enter>
r <enter>

You have just told debug to replace the contents of the AX register with 8h. Typing r the second time did another register dump so you could see the change. Now place 2h in the BX register, using some logical skills to work out the command (r BX).

Do a register dump to check it works. We have written data directly to the registers in the CPU.

Adding the AX and BX registers


This is a bit trickier. We need to get the command for adding the numbers into memory, tell the IP (instruction pointer) to point to that instruction, load the instruction into the control unit of the CPU and have the ALU carry out the right instruction from its set of instructions. We can then have a look at the registers to see if it has worked. The add instruction will be placed at offset 0100h in whatever segment you are using. Debug's command for entering data into memory is "e". The text after each apostrophe is my comment to tell you what is happening. Type in the following:

e 100 <enter>    'enter address 0100h
01 <enter>       'write 01h at that address
e 101 <enter>    'enter address 0101h
D8 <enter>       'write D8h at that address

The extra hex numbers coming up after the prompt tell us the existing contents of those addresses. Since we are using the current segment, we don't have to tell debug about the CS and can use the offset (0100h) alone. We can also leave out leading zeros, and debug doesn't mind if we write in upper or lower case for the a, b, c, d, e, f parts of hex numbers. We edited the bytes with two edit commands; we could have used a space after the first edit and done a sort of continuous edit. We will use this method later.

That was the machine code for the instruction ADD AX,BX. Do a register dump to confirm it. You should see:

0F70:0100 01D8 ADD AX,BX

Congratulations, you have just written your first line of code and you have even written it in machine code. You have just set the 16 bits of data at the addresses 0100h and 0101h to 0000000111011000b or 01D8h.
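For the curious, the two bytes can be pulled apart to see why debug reads them as ADD AX,BX. This is a rough sketch based on the standard x86 encoding, where opcode 01h means "add a 16 bit register to a register or memory operand" and the second byte names the operands. The names REG16 and decode_modrm are invented for this example, and Python is only standing in as a calculator.

```python
# 16 bit register numbers used in x86 instruction encodings.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_modrm(byte: int) -> tuple[int, int, int]:
    """Split the second instruction byte into its mod, reg and r/m fields."""
    mod = byte >> 6          # top two bits: 3 means "both operands are registers"
    reg = (byte >> 3) & 0b111  # middle three bits: source register for opcode 01h
    rm = byte & 0b111          # low three bits: destination register for opcode 01h
    return mod, reg, rm


opcode, modrm = 0x01, 0xD8
mod, reg, rm = decode_modrm(modrm)
print(f"{opcode:08b} {modrm:08b}")
print(f"mod={mod}  ADD {REG16[rm]},{REG16[reg]}")
```

The same split applied to the byte F1h gives registers SI and CX, which matches the garbage instruction 03F1 that debug decoded earlier.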

Machine Code
On ALL Intel machines, from the earliest 8088 to the fastest Pentium IIIs, the machine code 01D8 means to add the AX and BX registers. On a Motorola processor used by Macs, the machine code would be very different. So when we build a program to run on an Intel based PC, the assembler codes up the instructions knowing what the machine code for each command is. It is easy to remember ADD AX,BX, but a bit harder to remember that 01D8h is the machine code instruction for it. The assembler does the conversion for us.

Each processor has a certain number of instructions it can use. A simple set of instructions is called a RISC (Reduced Instruction Set Computing) while a larger, more complex one is called a CISC (Complex Instruction Set Computing). Roughly speaking, a RISC aims to run each simple instruction very quickly, while a CISC has slower, more complex instructions that each do more work. An example might be that a RISC has no instruction for multiplication, since multiplying is really just repeated adding, while the CISC can do it in one go. This is an extreme example, since most RISCs can do multiplication. The difference between a Pentium and a Pentium MMX is that the MMX has a few more instructions so that the processor can do a few more things with data, particularly concerning graphics for games, so the CISC is even more CISC.

Trace
So now the instruction is in place, let's execute it. Since the IP currently points to offset 0100h, the code is ready to run (that's why we selected address offset 0100 when we entered the data, eh?). The command for running a single line of code is t, which stands for trace. After each line is executed, debug gives us a register dump to show us what has happened. Tracing a program is to execute it line by line and examine all of the variables.

Type t once to execute the command.

You should see that the AX register now has Ah in it. You might want to check that 8h + 2h = Ah... it does. (You probably want to see 10, but 10 decimal is written as A in hexadecimal.) Note that the result of the addition is stored in the AX register. Sort of logical really.

Note also that the IP now points to 0102h. Since we haven't put any instructions in that location in RAM it would be unwise to execute it; the CPU might head off into limbo. That would be a bad thing. We should set the IP back to 0100h. Try using "r IP", which will let us place 0100h in the IP. Now type "t" again. The instruction to ADD AX,BX is done again. That time it added AX (Ah) to BX (2h) to get....

How about checking some other numbers for AX and BX? Try placing 1 in AX, 1 in BX, resetting the IP to 100h and doing a trace.
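To predict what a trace should show before you type t, the addition can be modelled outside debug. This is only a sketch in Python, assuming a 16 bit register that wraps around when it overflows; add16 is a made-up helper name, and the carry it returns corresponds to the carry flag, which this tutorial otherwise ignores.

```python
MASK16 = 0xFFFF  # a register holds one word: four hex digits


def add16(ax: int, bx: int) -> tuple[int, bool]:
    """Model ADD AX,BX: return the new AX and whether a carry fell off the top."""
    total = ax + bx
    return total & MASK16, total > MASK16


ax, carry = add16(0x8, 0x2)
print(f"first trace:  AX={ax:04X}")
ax, carry = add16(ax, 0x2)       # IP reset to 0100h and traced again
print(f"second trace: AX={ax:04X}")
```

Try add16(0xFFFF, 0x1) to see what happens when the answer no longer fits in one word.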

SUBTRACTION
The machine code for SUB AX,BX is 29D8h.

Using the e command, edit the instruction at 0100 to subtract the BX from the AX register. As a tip, you can enter it in one go by using a space after the first byte has been entered and continuing. Note that the old bytes for ADD AX,BX (01 D8) come up as we enter the new code. Try running the code, and remember to set the IP to 0100h each time, otherwise... it could be bad. Try placing 0 in AX and 1h in BX. Mmmm, AX now shows FFFF. Intels have a funny way of showing negative numbers: FFFFh is how -1 is written in a 16 bit register (this is called two's complement). It all works out in the end though.
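The funny negative numbers are easier to believe once you can flip between the two views of the same 16 bits. A minimal sketch in Python, assuming two's complement as used by Intel processors; sub16 and as_signed16 are names invented for this example.

```python
def sub16(ax: int, bx: int) -> int:
    """Model SUB AX,BX on a 16 bit register: the result wraps around."""
    return (ax - bx) & 0xFFFF


def as_signed16(value: int) -> int:
    """Read a 16 bit pattern as a signed number (top bit set means negative)."""
    return value - 0x10000 if value & 0x8000 else value


ax = sub16(0x0, 0x1)
print(f"AX={ax:04X} which reads as {as_signed16(ax)} when treated as signed")
```

Adding 1 back to FFFFh wraps round to 0000h, which is exactly why the pattern works as -1.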

MULTIPLICATION...a trick
The machine code for MUL AX,BX is F7E3h. Try it (make sure about the IP, otherwise...).

Put 5h in AX and 3h in BX and edit the instruction at 0100 to F7E3h. Before you do the trace, do a register dump. The instruction is shown as MUL BX. No reference is made to AX. The processor assumes you are going to multiply the AX register. Now do the trace. You should see the value F stored in AX, which is the hex value for 15 (5 * 3 = F).

Put 3A7h in AX and 92Ah in BX, and reset the IP to 0100. We are multiplying together 935 and 2346 decimal, which should give an answer of 2193510 decimal, or 217866h. Do a trace. The DX register is now not 0000; it should be 21h. This is because multiplying together registers, which are 2 byte words, gives some big answers. To prevent overflow, the answer is stored in a pair of registers, the AX and the DX. The high word is stored in DX, the low word in AX. AX should contain 7866h.

Set AX to FFFF and BX to 0010, then try the MUL AX,BX instruction again. Can you explain what has happened? AX is full, so it carries into DX when multiplied by BX. BX is 10h, so it shifts the AX one place to the left (like multiplying by 10, eh?). F is the carry and FFF0 is left in AX. Easy peasy. Multiplying by 16 in the hex world is a breeze.

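When the registers are combined like this they hold a double word (4 bytes), usually written DX:AX, with the high word in DX and the low word in AX. When a double word is stored in memory, tradition states that the low word always comes first. This is an important lesson for novice hackers.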
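The way a multiplication result is shared between DX and AX can be checked with a short calculation. This is a sketch in Python rather than anything debug itself does; the helper name mul16 is an assumption, and it models only the unsigned MUL BX used above.

```python
def mul16(ax: int, bx: int) -> tuple[int, int]:
    """Model MUL BX: multiply AX by BX, return (DX, AX) = (high word, low word)."""
    product = ax * bx
    return product >> 16, product & 0xFFFF


for ax, bx in [(0x5, 0x3), (0x3A7, 0x92A), (0xFFFF, 0x10)]:
    dx, new_ax = mul16(ax, bx)
    print(f"{ax:04X} * {bx:04X} -> DX={dx:04X} AX={new_ax:04X}")
```

The three pairs are the ones used in the multiplication exercise, so the printed DX and AX values can be compared against your own traces.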

DIVISION
The machine code for DIV AX,BX is F7F3h. Set this at address 0100h. The machine wants to find a double word in the DX and AX registers (high word in DX, low word in AX), which it divides by BX.

Set DX = 007Ch, AX = 4B12h and BX = 0100h. We are dividing 007C4B12h by 0100h. Now set the IP back to 0100 (or it could be bad). Do a register dump to check all is ready and correct, then trace. The answer should be 7C4Bh with a remainder of 12h. Where did you see the answers? The quotient is in AX, the remainder in DX. Neat, eh?

How to Quit
To exit from debug you type the command "q". This is an example of useful information buried at the bottom of the page.
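The division exercise earlier in this section can be checked the same way as the multiplication. The following Python sketch assumes the unsigned DIV BX behaviour (double word in DX and AX divided by BX, quotient to AX, remainder to DX); div16 is a hypothetical helper name. On the real processor a quotient that does not fit in AX, or a zero divisor, triggers a divide error instead of an answer, which the sketch imitates by raising an exception.

```python
def div16(dx: int, ax: int, bx: int) -> tuple[int, int]:
    """Model DIV BX: divide the double word DX:AX by BX, return (AX, DX)."""
    dividend = (dx << 16) | ax
    quotient, remainder = divmod(dividend, bx)  # bx == 0 raises ZeroDivisionError
    if quotient > 0xFFFF:
        raise OverflowError("quotient does not fit in AX")
    return quotient, remainder


ax, dx = div16(0x007C, 0x4B12, 0x0100)
print(f"quotient AX={ax:04X}, remainder DX={dx:04X}")
```

Dividing by 100h simply chops the last two hex digits off the dividend, just as dividing by 100 does in decimal.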

Summary
Congratulations, you have learnt the machine code and assembler instructions for addition, subtraction, multiplication and division. Along the way we learnt about hex numbers and the debug commands

r - register
e - enter
t - trace
q - quit

As you have gathered, we are not writing in assembler, we are actually writing in machine code. We are placing into the registers and RAM the machine code for simple arithmetic. Later, we will use ADD AX,BX directly instead of 01D8h, but for now we will continue to use machine code for some more simple examples. The next step is to write a two line program that does something. Wooh Hooo...

ASSEMBLER PROGRAMMING USING DEBUG - INTS, MOV, Dumps, Unassemble and Assemble

This tutorial will take us a few steps further. We will still be writing in machine code, but we will investigate running a small, two line program and exiting from it gracefully instead of tracing through it. Along the way we will discuss the wonderful world of DOS/BIOS interrupts and learn about moving data into the registers. We will also perform the miracle of writing to the display. Finally, we will learn how to write in assembler instead of machine code.

The Registers - Hi and Lo Bytes
There are four general purpose registers used in an Intel CPU: AX, BX, CX and DX. Each register is 2 bytes long and is divided up into a high byte and a low byte. AX is made up from AH and AL, BX is BH and BL and so on. Processors from the 386 and up widen these into four extended registers called EAX, EBX, ECX and EDX, each of which can hold a double word or 4 bytes (AX is the low word of EAX). This makes the processor a 32 bit processor. Unfortunately debug is quite old and can't access the extended registers.

Terminating Normally - The Technical stuff
In procedural languages such as VB or C, a program has to end somewhere (an "End" statement in VB, returning from main in C) so that the code can stop running. If we want to run a program using assembler we must also tell the computer when to stop executing the code, otherwise the IP will keep clicking over into instructions we never intended to execute. What actually happens during a termination is that the current process (our assembler program) is booted out of the CPU and control is returned to the operating system. This is hard to visualise, even with a DOS program, since the actual code is only running for a fraction of the time the CPU is ticking over; it has more important things to do, like update the video display, run the clock, check the keyboard and so on. With Windows 95/98/NT and other time sharing, multitasking operating systems such as Unix, our humble assembler program is only one of many programs sharing the CPU.
In any case, we are running the program from within another program. The debug program. It is our mother and we are aptly called the daughter. Debug is looking after us and we really want to terminate and let debug carry on as before. To do this the operating system needs to be told something fundamental, rather than just add something up using registers. The way this is handled is through interrupts.

Interrupts
There are many hundreds of interrupts. A complete list of interrupts can be downloaded from the internet and consumes a small forest if printed. Each interrupt has many variations. If we want to tell a video card to change modes we use an interrupt. If we want to write to a floppy, we use an interrupt. If we want to make a sound we use an interrupt. An interrupt is a signal to the computer to stop doing what it is currently doing and do something with the hardware (or software). The CPU actually spends part of its time watching for interrupts. It watches the mouse, keyboard, clock and many other devices.

It is important we use the correct interrupt, and it is here that the hardware to operating system relationship is strained. How does the operating system know that the particular video card installed uses interrupt 10h with AH=0h and AL=29h to set a graphics screen with 256 colours at 800*600? What if it sets AL=30h instead and the video card thinks this is the instruction to start a small fire in the power supply unit? As you can see, using the wrong interrupt is bad. This is where the drivers come in. Drivers let the operating system know what interrupts the particular piece of hardware uses. Some drivers are standard; the mouse driver, for example, is pretty straightforward. Others, such as the latest driver for the Voodoo III 3Dfx card, use interrupts designed to give programmers nightmares.

Well, this was a bit of a digression. After all, I really just needed to say that int 20 stops program execution.

Int 20 - The easy way to stop a program
Something to do...
Set the IP to 0100h (use r IP <enter> 100 <enter>).
Place the code CDh 20h at offset address 0100 (use e 0100 <enter> CD <space> 20 <enter>).
Do a register dump to check ("r"); you should see the instruction decoded as INT 20.

To run this we can't really use the t command, since the actual interrupt is itself a small section of code and we would need to trace through several hundred lines of code to get the thing to work. This is further complicated by the fact that the interrupt is located in a different segment of RAM and doing a trace means we must reset the CS register afterwards. Instead we need to actually run the code. Whoa! To run the program (all one line of it, which says to stop!) we use the "g" command. We tell debug where to stop executing by placing an address after "g". We will stop execution at (which means before) 102 (and after 100). Execution starts at the current IP, so make sure it is set to 0100h.

g 102

Why did we tell debug to stop at 102? For safety, my dear hacker, for safety! If we had the wrong interrupt and had inadvertently told the computer to start trying to spin the hard drive off its spindle, the process would at least stop after 0100 before going into limbo with the garbage instructions after that point. It stops before instruction 102 and we have a chance to stop our runaway train. In this case (hopefully), you got a message saying "program terminated normally". Nice message to get, that one. Now we are confident it works, give it full throttle: reset the IP to 0100 (I'll have to keep reminding you, otherwise it will be bad) and let the interrupt stop the program by itself...

g

So now we know how to stop. Use Int 20.

Writing to the Display
DOS is a program that is sitting in a part of RAM. DOS provides a whole pile of interrupts with which you can do useful things. DOS routines all use the same interrupt, int 21h. You tell the computer which particular DOS routine you actually want by placing a value in the AH register. If we set AH to 02h, this tells DOS to print the character which is located in the low byte of the DX register (DL). Let's do it...

Set the IP to 0100.
Place CDh 21h at address 0100h (using the e command, remember?).
Place CDh 20h at address 0102h.
Do a register dump (use the r command, dummy!). You should see INT 21 as the command.

Now we need to tell DOS what to do when it gets called: place 02h into AH (the print command). Whoa! Not so straightforward. Debug only lets us set the whole AX register, so we need to know how AH and AL fit together. AX is written high byte first, AH then AL, so with 02h in AH and 00h in AL, AX looks like 0200h, not 0002h. The actual command is:

r AX <enter>
0200 <enter>

Setting AX to 0200h places 02h into AH and 00h into AL. The same thing happened with the double word result of MUL, where DX held the high part and AX the low part. This is worth remembering: a register is displayed high byte first, although when a word is stored in memory the low byte comes first. Finally, place the character we want to print in DL. Again we need to remember that DL is the low byte of DX, so it is written second.

r DX <enter>
004C <enter>    'low byte is second....
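Because debug only accepts whole 16 bit registers, it helps to see how a word splits into its high and low halves. This is a small Python sketch, not a debug feature; split_word is an invented name. It relies on the documented DOS convention that int 21h takes its function number in AH, and that function 02h prints the character held in DL.

```python
def split_word(word: int) -> tuple[int, int]:
    """Split a 16 bit register value into (high byte, low byte)."""
    return (word >> 8) & 0xFF, word & 0xFF


ah, al = split_word(0x0200)   # value typed after "r AX"
dh, dl = split_word(0x004C)   # value typed after "r DX"
print(f"AH={ah:02X} AL={al:02X}  -> DOS function {ah:02X}h")
print(f"DH={dh:02X} DL={dl:02X}  -> character {chr(dl)!r}")
```

Swapping the bytes (typing 0002 for AX) would leave AH at 00h, which asks DOS for a different function altogether.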

Now we are ready to go, (do a register dump to make sure) g You should see the letter "L" followed by the words "program terminated normally" The whole thing is shown below

You can see here where I entered the code, changed the AX and DX registers, set the IP and then, on a wing and a prayer, typed g. Fun isn't it ?

The code we placed in DL was the ASCII code for the letter L. Try placing different ASCII codes in DL to see that it works. Always remember to reset the IP before typing g. If you forget and bad things happen, the simplest way out of the mess is to stop debug and start again. Windows 95/98 is quite good at stopping programs that are heading off to woop woop... usually. Remember, quit debug using the q command.

Unassemble
So far we have used the register dump to examine a single line of code and the values of the registers. Debug has a command available that will take the contents of RAM and unassemble the machine code into assembler instructions. The command is "u". Let's try it on our code....

u 100    'Unassemble the code beginning at offset 0100h

As you should see, the first two instructions are

int 21
int 20

The rest are garbage instructions. Debug just unassembles machine code; it does not try and pretend the code makes sense. The instructions after our two lines are what debug deciphers into assembler from the random junk that was sitting in RAM when debug was started.

Assemble
Now we are ready to use the power of debug to convert assembler into machine code. This way we don't have to mess around with hex machine codes; we can write in somewhat meaningful mnemonics. Let's assemble some code starting at address 0100:

a 100          'start assembly at offset address 0100h
mov AH,02
mov DL,4C
int 21
mov DL,55
int 21
mov DL,43
int 21
mov DL,49
int 21
mov DL,4E
int 21
mov DL,44
int 21
mov DL,41
int 21
mov DL,4C
int 21
mov DL,45
int 21
int 20
<enter>        'enter on a blank line causes debug to stop assembling

Do an unassemble to make sure you have it right:

u 100

It should look like this....

Now, set the IP to 100 and type g. Interesting, huh? We have used a new instruction, MOV. The machine code for MOV AH is B4h, and the second byte specifies what to move; in this case the line at 0100 says B402, move 02. The code for MOV DL is B2h.
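To see how little the assembler is really doing, the byte sequence for this kind of program can be built by hand. This is a rough sketch in Python that uses only the four encodings met so far (B4 for MOV AH, B2 for MOV DL, CD 21 for INT 21 and CD 20 for INT 20); the function name assemble_print_program is made up for the example.

```python
def assemble_print_program(text: str) -> bytes:
    """Build machine code that prints text one character at a time via int 21h."""
    code = bytearray([0xB4, 0x02])                  # MOV AH,02  (print character)
    for ch in text.encode("ascii"):
        code += bytes([0xB2, ch, 0xCD, 0x21])       # MOV DL,ch ; INT 21
    code += bytes([0xCD, 0x20])                     # INT 20     (terminate)
    return bytes(code)


program = assemble_print_program("LUCINDALE")
print(len(program), "bytes")
print(program.hex(" ").upper())
```

The printed bytes should line up with what u 100 shows in the left hand column after assembling the listing by hand.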

Much easier to remember MOV DL than B2h. This code moves a succession of ASCII characters into DL, each time calling interrupt 21 to print them on the display. Finally it calls int 20 to terminate.

Writing Text
This was quite cumbersome. Why can't we write a whole string without having to move the characters into DL each time? DOS, in its generosity, provides us with another interrupt routine which lets us write out a string to the display. Int 21 with AH=09h does the job. How does DOS know the string is finished? We must place a special character at the end of the string. In this next example we are going to write the string into memory starting at offset 200 (so that it doesn't interfere with our code, which starts at 100). Yippee, we are going to have a data section!

e 200 48 65 6C 6C 6F 2C 20 44 4F 53 20 68 65 72 65 2E 24 <enter>

Use spaces between the bytes. The last number (24h) is the ASCII code for $, which is the end of string character recognised by DOS. DOS will print out the characters until it gets to 24h. We can now assemble the code to print out this marvellous string:

a 100
MOV AH,09      'the DOS interrupt routine for printing a string
MOV DX,0200    'this is where the string is located in memory that we wish to print
INT 21         'call the interrupt
INT 20         'terminate
<enter>
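Before running it, you can confirm what those bytes spell and where DOS will stop. This is a Python sketch of the "print until $" rule described above, not the actual DOS routine; dos_string is an assumed name.

```python
# The bytes entered at offset 0200h.
data = bytes.fromhex("48 65 6C 6C 6F 2C 20 44 4F 53 20 68 65 72 65 2E 24")


def dos_string(memory: bytes, start: int = 0) -> str:
    """Return the text from start up to (not including) the '$' terminator."""
    end = memory.index(0x24, start)   # 24h is ASCII '$'
    return memory[start:end].decode("ascii")


print(dos_string(data))
```

If the 24h byte is left off, index raises an error here; the real DOS routine would simply keep printing whatever junk follows in RAM until it happens to meet a $.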

Make sure the IP is set to 100h, then type g.

DUMP
When we unassemble, we ask debug to take machine code and translate it into assembler. When we assemble, we ask debug to make up the machine code for the instructions we give it. When we dump, we ask debug to give us a raw dump of memory. Debug doesn't do anything to it except give us a display of the bytes (in rows of 16) and the ASCII character for each byte, since it might be useful. The command for dump is... you guessed it... "d".

if you worked through the example above then... Try dumping 0200
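The layout of a dump row (address, sixteen bytes in hex, then the same bytes as characters) can be imitated as follows. This is a simplified sketch in Python: real debug output has slightly different spacing, the segment 0F70 is just the example value used in this tutorial, and dump_row is a made-up name. Only the bytes we typed in are shown, since the rest of the row would be whatever happened to be in RAM.

```python
def dump_row(segment: int, offset: int, row: bytes) -> str:
    """Format up to 16 bytes as one dump line: address, hex bytes, ASCII."""
    hex_part = " ".join(f"{b:02X}" for b in row)
    # Non-printable bytes are shown as dots, as debug does.
    text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in row)
    return f"{segment:04X}:{offset:04X}  {hex_part:<47}  {text}"


data = bytes.fromhex("48656C6C6F2C20444F5320686572652E24")
for start in range(0, len(data), 16):
    print(dump_row(0x0F70, 0x0200 + start, data[start:start + 16]))
```

The first row holds the message itself and the second row starts at 0210 with the $ terminator.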

There, in the mess of characters on the right, is our message. On the left are the bytes we typed in. The rest is garbage. There should be a lesson in the garbage though. It tells us that although we haven't played with RAM at offset 0230, there is in fact something there apart from 00h. Never assume a zero value is in memory; always initialise your data if you want to use it.

Well Done
If you have got this far and have understood what is going on, and have worked through the examples, you have done very well. As you might have gathered, assembler programming is a bit of a black art. There are lots of little secrets, lots of reading and a large potential for stuffing up. Nothing is easy in assembler. We now have a few debug commands up our sleeve; you should write them down and what they do:

e
r
u
d
a
r XX
t
g
q

We also have a few assembler commands that are useful:

ADD
SUB
MUL
DIV
MOV
INT (20 and 21)

We have also learnt heaps about bytes, words, hex and ASCII. If you are game, you might investigate how we write a program in assembler and actually save it to a file so that we can run it as a standalone program independently of debug. The assignment will give you a good test of your understanding of this topic.
