LAB NO. 4

THE ELEMENTS OF THE ASSEMBLY LANGUAGE AND THE FORMAT OF THE EXECUTABLE PROGRAMS

1. Object of lab
The purpose of this lab is to present the instruction format in assembly language and the most important pseudo-instructions for defining segments and for reserving and defining data. The structure of .COM and .EXE executable programs is also shown.

2. Theoretical considerations
2.1. The elements of the assembly language (TASM)

2.1.1. The format of the instructions
An instruction is always written on a single line of at most 128 characters. The general format is:

[<label>:] [<opcode> [<operands>]] [;<comments>]

where:
<label> is a name of at most 31 characters (letters, digits or the special characters _, ?, @, ...); the first character must be a letter or one of the special characters. Each label has a value attached: its offset, i.e. its address relative to the start of the segment it belongs to, or the full Segment:Offset pair;
<opcode> is the mnemonic of the instruction;
<operands> is the operand (or operands) of the instruction, written according to the syntax that instruction requires. An operand may be a constant, a symbol, or an expression containing these;
<comments> is any text following the ; character.
Blank lines and any number of spaces or TABs may be inserted; they make the program easier to read.
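The line format above can be illustrated with a small Python model that splits one instruction line into its four fields. This is only a sketch, not how TASM itself parses source: the function name split_line is hypothetical, it ignores quoted text when looking for the comment, and it does not handle data definitions (a variable name such as dat2 in "dat2 db ..." has no colon and would be reported as the opcode).

```python
def split_line(line):
    """Split one instruction line into (label, opcode, operands, comment).

    Simplified model of [<label>:] [<opcode> [<operands>]] [;<comments>].
    Missing fields are returned as None.
    """
    if len(line) > 128:
        raise ValueError("an instruction line may have at most 128 characters")

    # The comment starts at the first ';' that is not inside a quoted string.
    quote = None
    cut = len(line)
    for i, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None          # closing quote
        elif ch in "'\"":
            quote = ch                # opening quote
        elif ch == ";":
            cut = i
            break
    code = line[:cut].strip()
    comment = line[cut + 1:].strip() if cut < len(line) else None

    # A label is a single name followed by ':' at the start of the line.
    # "MOV AX, ES:[BX]" has no label: the text before ':' contains blanks.
    label = None
    head, sep, rest = code.partition(":")
    if sep and head and not any(c.isspace() or c == "," for c in head):
        if len(head) > 31 or head[0].isdigit():
            raise ValueError("invalid label: " + head)
        label, code = head, rest.strip()

    parts = code.split(None, 1)
    opcode = parts[0] if parts else None
    operands = parts[1].strip() if len(parts) > 1 else None
    return label, opcode, operands, comment
```

For example, split_line("START: JMP ENTRY ;skip data") gives ("START", "JMP", "ENTRY", "skip data"), while a segment override such as ES:[BX] inside the operands is not mistaken for a label.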

ASSEMBLY LANGUAGE PROGRAMMING

2.1.2. Specification of constants
Numerical constants are written as a string of digits whose first character must be a decimal digit from 0 to 9 (for example, a hexadecimal number that begins with a letter must be preceded by a 0). The base of the number is given by a letter at the end of the number: B for binary, Q for octal, D for decimal, H for hexadecimal; without an explicit suffix, the number is considered decimal. Examples: 010010100B, 26157Q (octal), 7362D (or 7362), 0AB3H.
To specify a real number, use the decimal point, e.g. 1.0. The comma is a list separator, so 1,0 means two elements!
Character constants and character strings are written between quotation marks (" ") or apostrophes (' '). Examples: "string of chars", 'string of characters'.

2.1.3. Symbols
Symbols are defined by the user and represent memory locations. They can be labels or variables. Every symbol has the following attributes:
- the segment where it is defined (this attribute may be missing)
- the offset (the address relative to the start of the segment)
- the type of the symbol (given by its definition)
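The suffix rules for numeric constants can be checked with a short Python model. It is a sketch covering only the four suffixes listed above (B, Q, D, H); the names BASES and parse_constant are hypothetical.

```python
# Suffix letter -> base, as listed in 2.1.2 (no suffix means decimal).
BASES = {"B": 2, "Q": 8, "D": 10, "H": 16}

def parse_constant(text):
    """Return the value of a numeric constant such as 0AB3H or 26157Q."""
    text = text.strip().upper()
    if not text or not text[0].isdigit():
        # AB3H would be read as a symbol name, so it must be written 0AB3H
        raise ValueError("a numeric constant must start with a digit 0-9: " + text)
    if text[-1] in BASES:
        return int(text[:-1], BASES[text[-1]])
    return int(text, 10)
```

With this model the examples above evaluate to 148 (010010100B), 11375 (26157Q), 7362 (7362D and 7362) and 2739 (0AB3H).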

2.1.4. Labels
Labels may be defined only in code segments, and they can be operands of CALL or JMP instructions. The attributes of a label are:
- the segment (generally held in CS): the address of the paragraph where the segment containing the label begins. When the label is referenced, this value is taken from CS (the actual value is known only at execution time)
- the offset: the distance in bytes of the label from the beginning of the segment in which it is defined
- the type, which determines how the label is referenced. There are two types, NEAR and FAR. A NEAR reference can be used only within the segment where the label is defined, so only the offset is present. A FAR reference also specifies the segment (segment:offset).
Labels are defined at the beginning of the source line. If the label is followed by a : character, it is of NEAR type.
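Because the segment value is a paragraph address, the physical address behind a Segment:Offset pair is segment * 16 + offset. The sketch below models this for 8086 real mode; the function name is hypothetical and the final mask models the 8086's wraparound at 1 MB.

```python
def physical_address(segment, offset):
    """8086 real mode: Segment:Offset -> 20-bit physical address."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    # The segment is a paragraph number, so it is multiplied by 16 (shift by 4).
    # On the 8086 the sum wraps around at 1 MB (20 address lines).
    return ((segment << 4) + offset) & 0xFFFFF
```

For example, 1234H:0010H and 1235H:0000H name the same byte, 12350H. A NEAR reference stores only the 16-bit offset, while a FAR reference stores both the offset and the segment.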

2.1.5. Variables
Variables (incorrectly called data labels) are defined with memory reservation pseudo-instructions. The attributes of variables are:
- segment and offset, as for labels
- the type, a constant that gives the length in bytes of the reserved memory area: BYTE (1), WORD (2), DWORD (4), QWORD (8), TBYTE (10), STRUC (defined by the user), RECORD (2).
Examples:
DATW LABEL WORD      ; label for type conversion (same offset as DAT)
DAT  DB 0FH, 07H     ; one byte each, 2 bytes in total
     MOV AL,DAT      ; AL <- 0FH
     MOV AX,DATW     ; AL <- 0FH, AH <- 07H
     MOV AX,DAT      ; type error!

2.1.6. Expressions
Expressions are built from constants, symbols, pseudo-operators and operators. Variables are taken into account only through their address, not their content, because at assembly time only the address (segment:offset) is known.

2.1.7. Operators (in order of priority)
1. Brackets () []
   . (dot) - structure_name.variable binds the name of a structure to one of its fields
   LENGTH - the number of elements of a data area
   SIZE - the length of a data area in bytes
   WIDTH - the width of a field of a RECORD
   Example: EXP DW 100 DUP (1)
   Then: LENGTH EXP has the value 100, TYPE EXP has the value 2, SIZE EXP has the value 200
2. segment register: - overrides the implicit segment

   Example: MOV AX, ES:[BX]
3. PTR - overrides the type of a variable
   Example: DAT DB 03
            MOV AX, WORD PTR DAT
   OFFSET - returns the offset of a symbol
   SEG - returns the segment of a symbol
   TYPE - the type of a data area (the length of its elements)
   THIS - creates an operand of the given type whose address is the current value of the CLC (Current Location Counter)
   Example: SIRO EQU THIS BYTE
            SIRC DW 100 DUP(?)
   SIRC is defined as an array of 100 words; the variable SIRO has the same segment and offset as SIRC, but it is of BYTE type.
4. HIGH - the high byte of a word
   LOW - the low byte of a word
   Example: DAT DW 2345H
            MOV AH, HIGH DAT   ; AH <- 23H
5. * / MOD
   Example: MOV CX, (TYPE EXP)*(LENGTH EXP)
6. + -
7. EQ, NE, LE, LT, GE, GT
8. NOT (logical negation)
9. AND
10. OR, XOR
11. SHORT - forces a short jump
   Example: JMP label         ; direct jump (near, 16-bit displacement)
            JMP SHORT label   ; short jump (8-bit displacement relative to IP)
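The values the assembler computes in the variable and operator examples can be reproduced with a small Python model. It is a sketch, not TASM behaviour in every detail: it assumes the 8086 little-endian byte order, and, for the type-conversion example, that DATW LABEL WORD is placed immediately before DAT DB 0FH, 07H so both names share one offset. The helper names high and low are hypothetical.

```python
import struct

# EXP DW 100 DUP (1): 100 elements of TYPE WORD (2 bytes each)
EXP_TYPE, EXP_LENGTH = 2, 100
EXP_SIZE = EXP_TYPE * EXP_LENGTH   # what MOV CX,(TYPE EXP)*(LENGTH EXP) loads

def high(word):
    return (word >> 8) & 0xFF      # HIGH: upper byte of a 16-bit value

def low(word):
    return word & 0xFF             # LOW: lower byte of a 16-bit value

# Words are stored little-endian: DAT DW 2345H occupies the bytes 45H, 23H.
dat_bytes = struct.pack("<H", 0x2345)

# Reading the two bytes 0FH, 07H as one WORD (MOV AX,DATW):
# the first byte goes to AL, the second to AH, so AX = 070FH.
ax = struct.unpack("<H", bytes([0x0F, 0x07]))[0]
```

This also shows why SIZE EXP equals TYPE EXP * LENGTH EXP (200 bytes here) and why HIGH 2345H is 23H.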

2.1.8. Pseudo-instructions
Pseudo-instructions are commands for the assembler, needed for the correct translation of the program and for making the programmer's job easier. Only the pseudo-instructions indispensable for writing the first programs are presented here. They are called pseudo-instructions because they do not correspond to machine instructions; they only tell the assembler how to process the program.

2.1.9. Pseudo-instructions to define segments
Every segment is identified by a name and a class, both chosen by the user. When it is defined, a segment receives a series of attributes that tell the assembler and the link-editor how it relates to other segments. A segment is defined with:

segment_name SEGMENT [align_type] [combine_type] ['class']
     ;...
segment_name ENDS

where:
segment_name is the name of the segment, chosen by the user (the name is associated with a value corresponding to the segment's position in memory);
align_type is the alignment type of the segment. Possible values are:
PARA (paragraph alignment, multiple of 16 bytes)
BYTE (byte alignment)
WORD (word alignment)
PAGE (page alignment, multiple of 256 bytes)

combine_type is the segment's combine type and tells the link-editor how segments with the same name are joined. It may be:
PUBLIC - the segments are concatenated
COMMON - the segments overlap
AT expression - the segment is placed at the paragraph given by expression, i.e. at physical address expression*16
STACK - the current segment is the stack segment
MEMORY - the segment is placed as the last segment of the program, highest in physical memory
class is the segment's class; the link-editor places segments with the same class next to each other, in the order in which they appear. The recommended classes are 'code', 'data', 'constant', 'memory' and 'stack'.
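The effect of align_type can be sketched in Python as rounding an address up to the next allowed boundary. This is a simplified, hypothetical helper: it only computes where a segment could start after a previous one ends and ignores combine types and classes, which the real link-editor also takes into account.

```python
# Alignment of each align_type, in bytes (from the list of align_type values).
ALIGN = {"BYTE": 1, "WORD": 2, "PARA": 16, "PAGE": 256}

def segment_start(prev_end, align_type):
    """First address >= prev_end that satisfies align_type (hypothetical helper)."""
    a = ALIGN[align_type]
    return (prev_end + a - 1) // a * a   # round up to a multiple of a
```

For instance, after a segment that ends at 1234H, a PARA segment starts at 1240H and a PAGE segment at 1300H. A PARA-aligned segment always starts at offset 0 of a paragraph, so its segment register value is simply its address divided by 16.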

2.1.10. The designation of the active segment
A program may define several segments. The assembler checks whether the data or instructions being addressed can be reached through a segment register with a given content. For this to work, the assembler must be told which segments are active, i.e. which segment register contains the address of which segment:

ASSUME <seg_reg>:<seg_name>, <seg_reg>:<seg_name>...

where:
seg_reg is the segment register;
seg_name is the segment that will be active.
Example: ASSUME CS:prg, DS:date1, ES:date2
Remarks:
- the pseudo-instruction does not load the segment register; it only tells the assembler where the symbols are defined
- DS should be loaded at the beginning of the program with a typical sequence:
  ASSUME DS:data_seg_name
  MOV AX, data_seg_name
  MOV DS, AX
- CS does not need to be initialized, but it must be declared with ASSUME before the first label
- instead of seg_name, the NOTHING identifier may be used in ASSUME when no segment should be associated with the segment register.

2.1.11. Memory location reservation
Data is usually defined in a data segment. The pseudo-instruction that defines data and allocates memory for it is:

<name> <type> [expression list] [<factor> DUP (<expression list>)]

where:
name is the symbol's name;
type is the symbol's type:
DB for one byte
DW for one word (2 bytes)
DD for a double word (4 bytes)
DQ for a quadruple word (8 bytes)
DT for 10 bytes

expression list is an expression (or list of expressions) whose result initializes the memory location; the ? character reserves memory without initializing it;
factor is a constant giving how many times the list after DUP is repeated.
Examples:
DAT     db 45
dat1    db 45h, 'a', 'A', 85h
dat2    db 'abcdefghi'     ; the text is generated
lg_dat2 db $-dat2          ; the length of the string dat2 ($ is the value of the CLC, Current Location Counter)
Aa      db 100 dup(56h)    ; 100 bytes initialized with the value 56h
bb      db 20 dup (?)      ; 20 bytes that are not initialized
neared  dw dat1            ; contains the offset of the variable dat1
farAdr  dd dat1            ; contains the full address of the variable dat1 (offset + segment)
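How the Current Location Counter advances through these definitions, and why lg_dat2 receives the value 9, can be sketched in Python. The model assumes the definitions start at offset 0 of their segment with no ORG in between; DEFS and assign_offsets are hypothetical names.

```python
# (name, bytes per element, number of elements) for the definitions above;
# DUP multiplies the count, e.g. Aa db 100 dup(56h) -> (1, 100).
DEFS = [
    ("DAT", 1, 1),      # DAT db 45
    ("dat1", 1, 4),     # dat1 db 45h, 'a', 'A', 85h
    ("dat2", 1, 9),     # dat2 db 'abcdefghi'
    ("lg_dat2", 1, 1),  # lg_dat2 db $-dat2
    ("Aa", 1, 100),     # Aa db 100 dup(56h)
    ("bb", 1, 20),      # bb db 20 dup (?)
    ("neared", 2, 1),   # neared dw dat1   (offset only)
    ("farAdr", 4, 1),   # farAdr dd dat1   (offset + segment)
]

def assign_offsets(defs, start=0):
    """Walk the definitions the way the assembler advances the CLC."""
    clc = start                    # Current Location Counter ($)
    offsets = {}
    for name, elem_size, count in defs:
        offsets[name] = clc        # the name receives the current CLC value
        clc += elem_size * count   # then the CLC moves past the reserved bytes
    return offsets, clc

offsets, end = assign_offsets(DEFS)
# While lg_dat2 is being assembled, $ equals offsets["lg_dat2"], so $-dat2 = 9.
lg_dat2_value = offsets["lg_dat2"] - offsets["dat2"]
```

Under these assumptions dat2 starts at offset 5, lg_dat2 at offset 14, and the whole block occupies 141 bytes.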

2.1.12. Other possibilities for defining symbols
- the definition of constants:
  <name> EQU <expression>
  Every occurrence of the symbol name is replaced by the value of the expression.
- the declaration of labels:
  <name> LABEL <type>
  name gets the segment where it is defined, an offset equal to the offset of the data definition or instruction that follows it, and the type given by <type>, which may be BYTE, WORD, DWORD, QWORD, TBYTE, the name of a structure, NEAR or FAR. If : is used after the name, the label is of NEAR type.
  Example: with the definitions
  ENTRY  LABEL FAR
  ENTRY1:
  the instructions
  JMP ENTRY     ; is a FAR jump
  JMP ENTRY1    ; is a NEAR jump

2.1.13. Current Location Counter (CLC) modification
ORG <expression>   ; the CLC takes the value of the expression
Example:
ORG 100h           ; CLC at 100h
ORG $+2            ; skip 2 bytes ($ is the current value of the CLC)

2.1.14. Procedure definition
A procedure is a sequence of instructions that ends with a RET instruction and is called with CALL. It is defined with the sequence:

<procedure_name> PROC [NEAR | FAR]     ; NEAR is the default
     ; the procedure's instructions
<procedure_name> ENDP

Example:
; DBADD procedure: adds (CX:BX) to (DX:AX), with the result in (DX:AX)
DBADD PROC NEAR
      ADD AX,BX    ; add the LOW words
      ADC DX,CX    ; add the HIGH words plus CARRY
      RET
DBADD ENDP

The call is made with CALL DBADD from the same segment. From other segments the procedure is visible but cannot be reached, because a NEAR call transfers only the offset.
Remarks:
- the procedure declaration does not generate any machine code, and neither does ENDP; the programmer must ensure the return with RET.
- the same procedure cannot be called both as FAR and as NEAR. This attribute must be chosen carefully during program design (declaring every procedure FAR looks simple but is wasteful, since each FAR CALL and RET also transfers the segment).
- procedures may be nested.

2.2. The structure of programs in assembly language
2.2.1. .COM programs

The program contains only one segment, so code and data together may total at most 64 KB; all references are relative to the start of the segment. The first executable instruction of the source program must be at address 100H and is usually a JMP. The ORG 100H pseudo-instruction reserves space for the PSP (Program Segment Prefix, 100H = 256 bytes), which DOS places at the start of the segment. The data may be placed anywhere in the program, but it is recommended to put it at the beginning. Care must be taken not to execute the data area by mistake: skip over it with a jump instruction, otherwise the data bytes are interpreted as instructions and the result is not what one would expect. The segment registers do not need to be initialized; they are all loaded with the same value as CS. The program ends with RET or with a call to the DOS function that returns to the operating system: INT 21H with 4CH in AH.

2.2.2. Sample for .COM style programs
;-------------------------copy from here
COMMENT *
     short presentation of the program
*
CODE SEGMENT PARA PUBLIC 'CODE'
     ASSUME CS:CODE, DS:CODE, ES:CODE
     ORG 100H
START:  JMP ENTRY       ; this will skip the data area
        ; define and place your data here
ENTRY:
        ; put your program's instructions here
        MOV AH,4CH
        INT 21H         ; exit to operating system
CODE ENDS
     END START
;--------------------------copy until here

2.2.3. .EXE style programs
.EXE programs may be as large as the available memory. For correct execution, the user must explicitly initialize DS and ES, which at program start point to the PSP and not to the program's data. SS and SP are loaded by DOS from the .EXE header when a segment with the STACK combine type is declared; otherwise they must also be set explicitly. It is recommended to write .EXE programs as FAR procedures so that control can be returned correctly: if the program was called from another program, it must be able to return control to the caller. For this, use the sequence:
push ds
mov ax,0
push ax
This saves on the stack a FAR address (segment = DS, offset = 0) pointing to the start of the PSP (Program Segment Prefix), a DOS data structure that begins with an INT 20H instruction, which terminates the program. The .EXE program can then exit with a RET in a FAR context.
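The order in which the two pushes are undone by a FAR RET can be shown with a small Python stack model. It is a sketch only: psp_segment is a hypothetical value, and the model assumes the program body leaves the stack balanced, so the RET finds exactly the two words pushed at the start.

```python
def far_return_target(psp_segment):
    """Model PUSH DS / MOV AX,0 / PUSH AX, followed later by a FAR RET."""
    stack = []
    ds = psp_segment          # at .EXE entry, DS holds the PSP segment
    stack.append(ds)          # PUSH DS
    ax = 0                    # MOV AX,0
    stack.append(ax)          # PUSH AX
    # ... program body, assumed to leave the stack balanced ...
    ip = stack.pop()          # a FAR RET pops IP first
    cs = stack.pop()          # ... and then CS
    return cs, ip             # execution continues at PSP:0000 (INT 20H)
```

Because DS is pushed first, it is popped last and becomes CS, while the 0 pushed second becomes IP.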

2.2.4. Sample for .EXE style program
COMMENT *
     identification information for the program: author, date,
     program's function, usage
*
;------------------------------------------------
; EXTERN section
; the declaration of external variables
;------------------------------------------------
;------------------------------------------------
; PUBLIC section
; the list of GLOBAL variables defined in this file
;------------------------------------------------
;------------------------------------------------
; CONSTANT section
; the definitions of constants, including INCLUDE
; instructions which read constant definitions
STACK_SIZE EQU 256
;------------------------------------------------
;------------------------------------------------
; MACRO section
; macro definitions, structures, records and/or
; INCLUDE instructions which read such definitions
;------------------------------------------------
;------------------------------------------------
; DATA section
; data definitions
;------------------------------------------------
DATA SEGMENT PARA PUBLIC 'DATA'
;... ... define data area
DATA ENDS
;... ... more data segments if needed
;------------------------------------------------
; STACK section
;------------------------------------------------
MySTACK SEGMENT PARA STACK 'STACK'
        DW STACK_SIZE DUP (?)   ; the stack will have 256 words
STACK_START LABEL WORD          ; the top of the stack

MySTACK ENDS
;------------------------------------------------
; CODE section
;------------------------------------------------
CODE SEGMENT PARA PUBLIC 'CODE'
START PROC FAR
     ASSUME CS:CODE, DS:DATA
     PUSH DS           ; prepare to return to DOS
     XOR AX,AX
     PUSH AX
     MOV AX,DATA       ; initialize DS
     MOV DS,AX
;------------------------------------------------
;... ... the main program's instructions
;------------------------------------------------
     RET               ; FAR return to DOS
START ENDP
;------------------------------------------------
; PROCEDURES
; other procedures of the main program
;------------------------------------------------
CODE ENDS
;... ... more code segments if needed
;------------------------------------------------
; the memory segment section
;------------------------------------------------
MEMORY SEGMENT PARA MEMORY 'MEMORY'
;... ... programs at high addresses
;... the definition of the program's memory limits
MEMORY ENDS
     END START

2.3. Program written in assembly language
The program calculates the sum of an array of unsigned numbers stored at address SIR, whose length is given by the LGSIR variable; the result is stored in the SUM location. The first source program is of .COM type.

CODE SEGMENT PARA PUBLIC 'CODE'
     ASSUME CS:CODE, DS:CODE
     ORG 100H
START:  JMP ENTRY         ; first executable instruction
SIR     DB 1,2,3,4
LGSIR   DB $-SIR          ; length of array
SUM     DB ?              ; place for sum

ENTRY:  MOV CH,0          ; program actually starts here
        MOV CL,LGSIR      ; in CX the number of elements
        MOV AL,0          ; initialize sum with zero
        MOV SI,0          ; first element
NEXT:   ADD AL,SIR[SI]    ; add the current element
        INC SI            ; next element
        LOOP NEXT         ; decrement CX and jump to NEXT
                          ; if CX differs from 0
        MOV SUM,AL        ; store result in SUM
        MOV AH,4Ch        ; end of program
        INT 21H           ; DOS system function call
CODE ENDS                 ; end of segment
        END START
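The behaviour of this program can be modelled step by step in Python, which helps when predicting what SUM should contain while tracing. The sketch is hypothetical and assumes a non-empty array: with CX = 0 the real LOOP instruction would first decrement CX to 0FFFFH and repeat 65536 times. It also reports whether a carry out of AL was lost, since AL holds only 8 bits.

```python
def sum_bytes(sir):
    """Model the 2.3 program: AL accumulates, CX counts down with LOOP."""
    cx = len(sir)               # MOV CH,0 / MOV CL,LGSIR
    al = 0                      # MOV AL,0
    si = 0                      # MOV SI,0
    carry_lost = False
    while True:
        total = al + sir[si]    # NEXT: ADD AL,SIR[SI]
        carry_lost = carry_lost or total > 0xFF
        al = total & 0xFF       # AL is 8 bits wide: the carry is discarded
        si += 1                 # INC SI
        cx = (cx - 1) & 0xFFFF  # LOOP NEXT: decrement CX ...
        if cx == 0:             # ... and jump back while CX != 0
            break
    return al, carry_lost
```

For SIR DB 1,2,3,4 the model gives SUM = 10 with no lost carry; for the bytes 200 and 100 it gives 44, because 300 does not fit in a byte.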

3. Lab tasks
1. Understand the presented examples.
2. Assemble, link-edit and trace the above program. Watch the registers and the memory contents, especially the SUM location.
3. Rewrite the same program in .EXE format and trace it.
4. Change the program to compute the sum of an array of words (DW) instead of bytes (DB). Make further changes so that larger numbers can be added. Hint: the result may not fit in the same space as the elements being added.
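The hint for task 4 can be explored with a Python model of a 32-bit accumulation in DX:AX, in the spirit of the DBADD procedure (ADD on the low word, then ADC on the high word). This is only a sketch of the arithmetic, not the assembly solution; the function name is hypothetical. In the assembly version, SI must advance by 2 for each word element.

```python
def sum_words(sir):
    """Add 16-bit words, keeping the low word in AX and carries in DX."""
    ax = dx = 0
    for word in sir:
        assert 0 <= word <= 0xFFFF
        total = ax + word            # ADD AX,SIR[SI]
        carry = total >> 16          # CF after the ADD (0 or 1)
        ax = total & 0xFFFF
        dx = (dx + carry) & 0xFFFF   # ADC DX,0
    return dx, ax
```

For example, the words 0FFFFH, 0FFFFH and 2 produce DX = 2, AX = 0, i.e. 20000H, a value that would not fit in a single word.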
