Lecture #6: Assembler: From Nand To Tetris

Lecture #6:
Assembler
From Nand to Tetris
www.nand2tetris.org
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 6: Assembler
Where we are at:
[Figure: the abstraction hierarchy of the course. From the bottom up: physics and electrical engineering; the hardware platform built from chips and logic gates (Chapters 1-3); the computer and its machine-language architecture (Chapters 4-5); assembly language and the assembler (Chapter 6, this lecture); the virtual machine and VM translator (Chapters 7-8); the high-level language and operating system (Chapters 9, 12), with the compiler (Chapters 10-11). Each layer is separated from the next by an abstract interface.]
Why care about assemblers?
Because …
•  Assemblers employ nifty programming tricks
•  Assemblers are the first rung up the software hierarchy ladder
•  An assembler is a translator of a simple language
•  Writing an assembler = low-impact practice for writing compilers.
   –  Written in Java
   –  Will come up with useful methods along the way
Assembly example (for now, ignore all details!)

Source code (example):
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto WRITE
D=M
@R0
D=D-M
@WRITE
D;JGT
... // Etc.

Source code → assemble → Target code → execute

Target code:
0000000000010000
1110111111001000
0000000000010001
1110101010001000
0000000000010000
1111110000010000
0000000000000000
1111010011010000
0000000000010010
1110001100000001
0000000000010000
1111110000010000
0000000000010001
...
The program translation challenge

•  Extract the program’s semantics from the source program, using the syntax rules of the source language
•  Re-express the program’s semantics in the target language, using the syntax rules of the target language
Assembler = simple translator
•  Translates each assembly command into one or more binary machine instructions
•  Handles symbols (e.g. i, sum, LOOP, …)
Revisiting Hack low-level programming: an example
Assembly program (sum.asm):
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1].
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto WRITE
D=M
@0
D=D-M
@WRITE
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(WRITE)
@sum
D=M
@1
M=D // RAM[1] = the sum
(END)
@END
0;JMP

[Figure: CPU emulator screen shot after running this program, showing the user-supplied input in RAM[0] and the program-generated output in RAM[1].]
The CPU emulator allows loading and executing symbolic Hack code. It resolves all the symbols to memory locations and executes the code.
The assembler’s view of an assembly program
Assembly program = a stream of text lines (e.g. sum.asm above), each being one of the following:
•  A-instruction
•  C-instruction
•  Symbol declaration: (SYMBOL)
•  Comment or white space: // comment
Helper methods:
•  cleanLine(raw : String)
•  parseCommandType(clean : String)
•  Can you write a parse method for each instruction?
The challenge: translate the program into a sequence of 16-bit instructions that can be executed by the target hardware platform.
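A minimal Java sketch of the two helper methods named above. The slide gives only their names and parameters. The return types, the CommandType enum, and the class name LineHelpers are assumptions made for illustration. Because the Hack language ignores space characters, the sketch removes any // comment first and then strips all remaining whitespace.

```java
// Hypothetical sketch of the slide's helper methods; CommandType is an assumed enum.
enum CommandType { A_COMMAND, C_COMMAND, L_COMMAND, NONE }

class LineHelpers {
    // Remove a trailing "//" comment, then all whitespace
    // (the Hack language ignores space characters).
    static String cleanLine(String raw) {
        int comment = raw.indexOf("//");
        String code = (comment >= 0) ? raw.substring(0, comment) : raw;
        return code.replaceAll("\\s+", "");
    }

    // Classify a line that has already been cleaned.
    static CommandType parseCommandType(String clean) {
        if (clean.isEmpty()) {
            return CommandType.NONE;          // blank line or comment-only line
        } else if (clean.startsWith("@")) {
            return CommandType.A_COMMAND;     // @value
        } else if (clean.startsWith("(") && clean.endsWith(")")) {
            return CommandType.L_COMMAND;     // (SYMBOL) declaration
        } else {
            return CommandType.C_COMMAND;     // dest=comp;jump
        }
    }
}
```

Using a separate NONE result lets the main loop skip comment-only and blank lines without special-casing them elsewhere.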
Translating / assembling A-instructions
Symbolic: @value   // where value is either a non-negative decimal number
                   // or a symbol referring to such a number
Binary:   0 v v v v v v v v v v v v v v v   (each v = 0 or 1; together the v bits hold the value)
Translation to binary:
•  If value is a non-negative decimal number: simple
•  If value is a symbol: later
Helper methods:
•  isValidSymbol(symbol : String) : boolean
•  decimalToBinary(toConvert : int) : String
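One possible Java sketch of the two A-instruction helpers. It assumes the symbol rule from the Hack language specification: letters, digits, underscore, dot, dollar sign and colon, with no leading digit. It also assumes that decimalToBinary returns all 16 bits of the instruction. The class name AHelpers is hypothetical. Values above 32767 do not fit in the 15 value bits, so the sketch rejects them.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the A-instruction helpers.
class AHelpers {
    // Assumed rule: letters, digits, '_', '.', '$', ':'; must not start with a digit.
    private static final Pattern SYMBOL = Pattern.compile("[A-Za-z_.$:][A-Za-z0-9_.$:]*");

    static boolean isValidSymbol(String symbol) {
        return symbol != null && SYMBOL.matcher(symbol).matches();
    }

    // Returns the full 16-bit A-instruction: a 0 opcode bit followed by 15 value bits.
    static String decimalToBinary(int toConvert) {
        if (toConvert < 0 || toConvert > 32767) {      // must fit in 15 bits
            throw new IllegalArgumentException("value out of range: " + toConvert);
        }
        String bits = Integer.toBinaryString(toConvert);
        // Left-pad with zeros to 16 characters.
        return String.format("%16s", bits).replace(' ', '0');
    }
}
```

For example, @16 becomes 0000000000010000, the first line of the target code for the sum program.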
Translating / assembling C-instructions
Symbolic: dest=comp;jump   // Either the dest or jump fields may be empty.
                           // If dest is empty, the "=" is omitted;
                           // if jump is empty, the ";" is omitted.
Binary:   1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
          (comp = a c1..c6, dest = d1 d2 d3, jump = j1 j2 j3)
Translation to binary: simple!
Translating / assembling C-instructions
•  Approach:
   –  Parse the C-instruction into dest, comp, jump
      •  Already know the structure for this! Great!
   –  For each part, look for M/D/A/etc. symbols
      •  Add the appropriate bit/flag for each symbol in the appropriate place
      •  Return a String of bits
   –  Doable, but complicated!
•  Approach #2:
   –  Realize that there is a small set of possibilities (see previous slide)
   –  Create a lookup table of all possibilities
      •  Tedious, but the investment pays off!
      •  Lookups are EASY
      •  If the string is present in the lookup table, return the matching string of bits
      •  Otherwise, return null (great for error checking)
•  What’s a lookup table?
   –  HashMap
      •  Think dictionaries
   –  "parsec" : "unit of length for astronomically large distances"
   –  Key : Value
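Both approaches start by splitting a C-instruction into its dest, comp and jump fields. In Java, that step might look like the sketch below. The class CFields and its method parse are hypothetical names. An omitted field is represented by the empty string, and the input is assumed to be already cleaned of comments and whitespace.

```java
// Hypothetical sketch: split "dest=comp;jump" into its three fields.
class CFields {
    final String dest, comp, jump;

    CFields(String dest, String comp, String jump) {
        this.dest = dest;
        this.comp = comp;
        this.jump = jump;
    }

    static CFields parse(String clean) {
        String rest = clean;
        String dest = "";
        int eq = rest.indexOf('=');
        if (eq >= 0) {                     // "=" is present only when dest is given
            dest = rest.substring(0, eq);
            rest = rest.substring(eq + 1);
        }
        String jump = "";
        int semi = rest.indexOf(';');
        if (semi >= 0) {                   // ";" is present only when jump is given
            jump = rest.substring(semi + 1);
            rest = rest.substring(0, semi);
        }
        return new CFields(dest, rest, jump);  // whatever remains is comp
    }
}
```

For example, parse("MD=M-1") gives dest "MD", comp "M-1" and an empty jump. parse("D;JGT") gives an empty dest, comp "D" and jump "JGT".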
HashMap
•  So a dictionary is just a set of key/value pairs
•  Using HashMap:
   import java.util.HashMap;
•  Create a HashMap:
   // String key (mnemonic), String value (bits)
   HashMap<String, String> compCodes = new HashMap<String, String>();
•  Add a key/value pair to the HashMap:
   compCodes.put("A+1", "0110111");
   // a c1 c2 c3 c4 c5 c6 values for A+1. Why is a here?
•  Check whether a key exists in the HashMap:
   compCodes.containsKey("A+1"); // returns boolean
•  Look up the value for a key in the HashMap:
   compCodes.get("A+1");
   // returns null if key not present, else the value
•  Remove a key/value pair from the HashMap (useful later):
   compCodes.remove("A+1");
   // returns null if key not present, else the removed value
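Putting approach #2 together, here is a sketch of lookup tables for all three C-instruction fields. The bit strings come from the Hack C-instruction tables in the reading. The class name CodeTables and the method assembleC are hypothetical, and the empty string stands for an omitted dest or jump. As on the slide, a failed lookup yields null, which the caller can treat as a syntax error.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lookup-table approach: one HashMap per field.
class CodeTables {
    static final Map<String, String> COMP = new HashMap<>();
    static final Map<String, String> DEST = new HashMap<>();
    static final Map<String, String> JUMP = new HashMap<>();

    static {
        // comp: the a bit followed by c1..c6 (a=0 selects A, a=1 selects M)
        String[][] comp = {
            {"0", "0101010"}, {"1", "0111111"}, {"-1", "0111010"},
            {"D", "0001100"}, {"A", "0110000"}, {"M", "1110000"},
            {"!D", "0001101"}, {"!A", "0110001"}, {"!M", "1110001"},
            {"-D", "0001111"}, {"-A", "0110011"}, {"-M", "1110011"},
            {"D+1", "0011111"}, {"A+1", "0110111"}, {"M+1", "1110111"},
            {"D-1", "0001110"}, {"A-1", "0110010"}, {"M-1", "1110010"},
            {"D+A", "0000010"}, {"D+M", "1000010"},
            {"D-A", "0010011"}, {"D-M", "1010011"},
            {"A-D", "0000111"}, {"M-D", "1000111"},
            {"D&A", "0000000"}, {"D&M", "1000000"},
            {"D|A", "0010101"}, {"D|M", "1010101"}
        };
        for (String[] entry : comp) {
            COMP.put(entry[0], entry[1]);
        }

        // d1d2d3 and j1j2j3 are the 3-bit binary form of each mnemonic's index
        String[] destKeys = {"", "M", "D", "MD", "A", "AM", "AD", "AMD"};
        String[] jumpKeys = {"", "JGT", "JEQ", "JGE", "JLT", "JNE", "JLE", "JMP"};
        for (int i = 0; i < 8; i++) {
            String bits = String.format("%3s", Integer.toBinaryString(i)).replace(' ', '0');
            DEST.put(destKeys[i], bits);
            JUMP.put(jumpKeys[i], bits);
        }
    }

    // Returns the 16-bit C-instruction, or null if any field is not in its table.
    static String assembleC(String dest, String comp, String jump) {
        String c = COMP.get(comp), d = DEST.get(dest), j = JUMP.get(jump);
        if (c == null || d == null || j == null) {
            return null;
        }
        return "111" + c + d + j;
    }
}
```

For example, assembleC("M", "1", "") yields 1110111111001000, the second line of the target code for the sum program.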
The overall assembly logic
For each (real) command in the assembly program (e.g. sum.asm above):
•  Parse the command, i.e. break it into its underlying fields, using the parse() method
•  A-instruction: replace the symbolic reference (if any) with the corresponding memory address, which is a number (…more later)
•  C-instruction: for each field in the instruction, generate the corresponding binary code using the parse*() methods
•  Assemble the translated binary codes into a complete 16-bit machine instruction, using Code class methods and the parts of the instruction held in Parser variables
•  Write the 16-bit instruction to the output file.
Handling symbols (aka symbol resolution)
Assembly programs typically have many symbols:
•  Labels that mark destinations of goto commands
•  Labels that mark special memory locations
•  Variables
These symbols fall into two categories:
•  User-defined symbols (created by programmers)
•  Pre-defined symbols (used by the Hack platform)

Typical symbolic Hack assembly code:
@R0
D=M
@END
D;JLE
@counter
M=D
@SCREEN
D=A
@x
M=D
(LOOP)
@x
A=M
M=-1
@x
D=M
@32
D=D+A
@x
M=D
@counter
MD=M-1
@LOOP
D;JGT
(END)
@END
0;JMP
Handling symbols: user-defined symbols
(Assembly code: the typical symbolic Hack code above.)
Label symbols: used to label destinations of goto commands. They are declared by the pseudo-command (XXX). This directive defines the symbol XXX to refer to the instruction memory location holding the next command in the program.
Variable symbols: any user-defined symbol xxx appearing in an assembly program that is not defined elsewhere using the (xxx) directive is treated as a variable. It is automatically assigned a unique RAM address, starting at RAM address 16 (why start at 16? Later...).
By convention, Hack programmers use lower-case names for variables and upper-case names for labels.
See the reading/appendix for the characters allowed in symbols.
Q: Who does all the "automatic" assignments of symbols to RAM addresses?
A: As part of the program translation process, the assembler resolves all the symbols into memory addresses. Labels resolve to ROM (instruction) addresses; variables and pre-defined symbols resolve to RAM addresses.
Handling symbols: pre-defined symbols
(Assembly code: the typical symbolic Hack code above.)
Virtual registers: the symbols R0, …, R15 are automatically predefined to refer to RAM addresses 0, …, 15.
I/O pointers: the symbols SCREEN and KBD are automatically predefined to refer to RAM addresses 16384 and 24576, respectively. These are the base addresses of the screen and keyboard memory maps.
VM control pointers: the symbols SP, LCL, ARG, THIS, and THAT (which don't appear in the code example) are automatically predefined to refer to RAM addresses 0 to 4, respectively. (The VM control pointers overlap R0, …, R4. They will come into play in the virtual machine implementation, covered in the next lecture.)
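Here is a sketch of the initialization step in Java. It fills a HashMap with every pre-defined symbol listed above. The class and method names are assumptions made for illustration; the SymbolTable module proposed later may expose a different interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a symbol table pre-loaded with the Hack pre-defined symbols.
class PredefinedSymbols {
    static Map<String, Integer> initialTable() {
        Map<String, Integer> table = new HashMap<>();
        for (int i = 0; i <= 15; i++) {
            table.put("R" + i, i);                // virtual registers R0..R15
        }
        table.put("SCREEN", 16384);               // base of the screen memory map
        table.put("KBD", 24576);                  // keyboard memory map
        String[] vmPointers = {"SP", "LCL", "ARG", "THIS", "THAT"};
        for (int i = 0; i < vmPointers.length; i++) {
            table.put(vmPointers[i], i);          // overlap R0..R4
        }
        return table;
    }
}
```

The resulting table has 23 entries, and every program starts from this table before any labels or variables are added.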
Handling symbols: symbol table
Source code (example):
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1]
@i
M=1 // i = 1
@sum
M=0 // sum = 0
(LOOP)
@i // if i>RAM[0] goto WRITE
D=M
@R0
D=D-M
@WRITE
D;JGT
@i // sum += i
D=M
@sum
M=D+M
@i // i++
M=M+1
@LOOP // goto LOOP
0;JMP
(WRITE)
@sum
D=M
@R1
M=D // RAM[1] = the sum
(END)
@END
0;JMP

Symbol table:
Symbol    Address
R0        0
R1        1
R2        2
...       ...
R15       15
SCREEN    16384
KBD       24576
SP        0
LCL       1
ARG       2
THIS      3
THAT      4
LOOP      4
WRITE     18
END       22
i         16
sum       17

Every program has the pre-defined entries (R0 through THAT).
This symbol table is generated by the assembler and used to translate the symbolic code into binary code.
Handling symbols: constructing the symbol table
(Source code and resulting symbol table: the sum example above.)
Initialization: create an empty symbol table and populate it with all the pre-defined symbols.
First pass: go through the entire source code, and add all the user-defined label symbols to the symbol table (without generating any code).
Second pass: go through the source code again, and use the symbol table to translate all the commands. In the process, handle all the user-defined variable symbols.
The assembly process (detailed algorithm)
•  Initialization: create the symbol table and initialize it with the pre-defined symbols
•  First pass: march through the source code without generating any code, counting the A- and C-instructions seen so far (label declarations and comments are not counted).
   For each label declaration (LABEL) that appears in the source code, add the pair <LABEL, n> to the symbol table, where n is the current count, i.e. the ROM address of the instruction that follows the label
•  Second pass: march again through each line of source code, and process each line:
   –  If the line is a C-instruction: simple
   –  If the line is @xxx where xxx is a number: simple
   –  If the line is @xxx and xxx is a symbol, look it up in the symbol table and proceed as follows:
      •  If the symbol is found, replace it with its numeric value and complete the command’s translation
      •  If the symbol is not found, then it must represent a new variable: add the pair <xxx, n> to the symbol table, where n is the next available RAM address, and complete the command’s translation.
•  (Platform design decision: variables are allocated consecutive RAM addresses, starting at address 16. The assembler must keep track of the next available RAM address too!)
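Here is a sketch of how the second pass might translate an @xxx line. It assumes the symbol table already holds the pre-defined symbols and the labels from the first pass. The field nextVariableAddress is the running RAM counter from the platform design decision above. The class and method names are hypothetical, and input validation (for example, range checks on numeric values) is omitted.

```java
import java.util.Map;

// Hypothetical sketch of second-pass symbol resolution for A-instructions.
class SecondPass {
    private final Map<String, Integer> table;
    private int nextVariableAddress = 16;          // next free RAM address for variables

    SecondPass(Map<String, Integer> table) {
        this.table = table;
    }

    // Translates a cleaned A-instruction such as "@i" or "@17" into 16 bits.
    String translateA(String clean) {
        String value = clean.substring(1);         // drop the leading '@'
        int address;
        if (!value.isEmpty() && Character.isDigit(value.charAt(0))) {
            address = Integer.parseInt(value);     // @xxx where xxx is a number
        } else if (table.containsKey(value)) {
            address = table.get(value);            // label, pre-defined symbol, or known variable
        } else {
            address = nextVariableAddress++;       // new variable: allocate and record it
            table.put(value, address);
        }
        return String.format("%16s", Integer.toBinaryString(address)).replace(' ', '0');
    }
}
```

With a table that holds no variables yet, @i and then @sum are assigned 16 and 17, as in the sum example. A later @i reuses address 16.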
The result ...
Source code (example)           Target code
// Computes 1+...+RAM[0]
// And stores the sum in RAM[1]
@i                              0000000000010000
M=1 // i = 1                    1110111111001000
@sum                            0000000000010001
M=0 // sum = 0                  1110101010001000
(LOOP)
@i // if i>RAM[0] goto WRITE    0000000000010000
D=M                             1111110000010000
@R0                             0000000000000000
D=D-M                           1111010011010000
@WRITE                          0000000000010010
D;JGT                           1110001100000001
@i // sum += i                  0000000000010000
D=M                             1111110000010000
@sum                            0000000000010001
M=D+M                           1111000010001000
@i // i++                       0000000000010000
M=M+1                           1111110111001000
@LOOP // goto LOOP              0000000000000100
0;JMP                           1110101010000111
(WRITE)
@sum                            0000000000010001
D=M                             1111110000010000
@R1                             0000000000000001
M=D // RAM[1] = the sum         1110001100001000
(END)
@END                            0000000000010110
0;JMP                           1110101010000111

Note that comment lines and pseudo-commands (label declarations) generate no code and do not contribute to the final line count.
Proposed assembler implementation
An assembler program can be written in any high-level language. We propose a language-independent design, as follows.
Software modules:
•  Parser: unpacks each command into its underlying fields
•  Code: translates each field into its corresponding binary value, and assembles the resulting values
•  SymbolTable: manages the symbol table
•  Main: initializes I/O files and drives the show.
•  We will call this Assembler, and it also does error handling (exceptions or booleans)!
Proposed implementation stages
•  Stage I: Build a basic assembler for programs with no symbols
•  Stage II: Extend the basic assembler with symbol-handling capabilities.
Parser (a software module in the assembler program)
Parser (a software module in the assembler program) / continued
Code (a software module in the assembler program)
SymbolTable (a software module in the assembler program)
Perspective
•  Simple machine language, simple assembler
•  Most assemblers are not stand-alone, but rather encapsulated in a translator of a higher order
•  C programmers who understand the code generated by a C compiler can improve their code considerably
•  C programming (e.g. for real-time systems) may involve rewriting critical segments in assembly, for optimization
•  Writing an assembler is excellent practice for writing more challenging translators, e.g. a VM translator and a compiler, as we will do in the next lectures.
