
TODAY’S LECTURE

▪ Intro to Embedded Systems
▪ Microcontroller Architecture
▪ Cortex-M Microcontroller
▪ Assembly Programming
Embedded Systems
▪ An embedded system is any device that includes
a programmable computer but is not itself a
general-purpose computer.
▪ Since the system is usually dedicated to a
specific task, the design engineer can optimize:
▪ Performance
▪ Cost/size
▪ Real time requirements
▪ Power consumption
▪ Reliability
▪ etc.
Components of Embedded Systems
▪ Analog Components
▪ Sensors, Actuators, Controllers, …
▪ Digital Components
▪ Processor, Coprocessors
▪ Memories
▪ Controllers, Buses
▪ Application Specific Integrated Circuits (ASIC)
▪ Converters – A2D, D2A, …
▪ Software
▪ Application Programs
▪ Exception Handlers
Structure of an Embedded System
Basic Comparison of PC and Embedded Structure

Embedded System:
▪ Normally the primary memory, central
processing unit and many peripheral
components including ADCs are housed on a
single chip.
▪ These chips are also called micro-controllers.
▪ All the units typically operate on the same supply
voltage, e.g. 5 V.
PC vs. Microcontrollers
▪ PC: multiple peripherals can be replaced and communicate with the CPU.
▪ Microcontroller: all peripherals on a single chip.
Memory
▪ Memory serves as short-term and long-term
data storage space for the processor.
▪ Both the program and the data are stored in the
memory.
▪ Read-Only-Memory (ROM): Data can only be
read during execution. Non-Volatile in nature.
Can be descriptively called program memory.
▪ Random-Access-Memory (RAM): data can be
read or written during code execution. Volatile
in nature. Can be descriptively called data
memory.
Read Only Memory (ROM)
▪ The memory that allows the processor to only read its contents is
read only memory (ROM).
▪ Another attribute of ROM is that they can store information
permanently even when no power is applied. (Non-Volatile
memory).
▪ Instructions or a program code are normally stored in a ROM.
▪ The boot sequence is a piece of code, which does not vary over
time and is stored in a ROM.
▪ Types include:
▪ Programmable ROMs (PROMs)
▪ Programmed once only
▪ Erasable Programmable ROMS (EPROMs)
▪ Reprogrammed using UV light
▪ Electrically Erasable PROMS (EEPROMs)
▪ Reprogrammed using electric signals
▪ Flash memory
▪ Similar to EEPROMs, but erased and written in multi-byte blocks rather than byte by byte ~ faster
Random Access Memory (RAM)
▪ The memory which allows the processor to read from
and write to its locations is named random access
memory (RAM)
▪ One limitation of RAM is that information stored in it
is lost as soon as the power applied to it is removed
(volatile memory).
▪ RAM is not limited by the number of read and write
cycles and is more suitable for storing data that is
updated frequently (scratch pad memory).
▪ Types include:
▪ Dynamic RAM (DRAM) ~ 1 transistor + 1 capacitor per bit ~ small size
▪ Static RAM (SRAM) ~ flip-flop of 4-6 transistors per bit ~ large size
A typical Microcontroller
Microprocessor Classification
▪ A microprocessor can be classified based on various
aspects.
Instruction Set Architecture (ISA)
▪ Any CPU has a set of instructions that it recognizes
and responds to.
▪ All programs are built up in one way or another
from this instruction set.
▪ One classification can be done based on the
complexity of instructions, while the other
possibility is based on the instruction operands.

RISC vs. CISC


▪ RISC: Reduced instruction set computer
▪ CISC: Complex instruction set computer.
Complex Instruction Set
▪ One approach is to build sophisticated CPUs with vast
instruction sets, with an instruction ready for every
foreseeable operation.
▪ In early computers the processors were much faster than the
available memories, so fetching an instruction from
memory became the performance bottleneck.
▪ A single complex instruction can perform many
operations. However, complex instructions require many
processor clock cycles to complete and most of the
instructions can access memory.
▪ A program running on a CISC architecture-based machine
involves a relatively small number of complex
instructions, which can provide high code density.
Reduced Instruction Set
▪ Each instruction is of fixed length.
▪ Most instructions take the same amount of time
to execute.
▪ The simplified instructions make it possible to
execute an instruction in a single processor clock
cycle. However, some RISC instructions
may require more than one clock cycle to
complete.
▪ A task when run on a RISC computer requires a
relatively larger number of simplified instructions
and results in low code density.
Instruction Operand Based ISA Classification
▪ Instructions in an assembly language program in
general have multiple operands. For example, the addition 2 + 2 = 4
has two source operands and one result.
▪ The operands for an instruction can be specified
either using memory or registers or combination
of both.
▪ The ISA classification based on how the instruction
operands are specified can be categorized in the
following groups.
▪ Memory-memory
▪ Register-memory
▪ Register-register
Instruction Operand Based ISA Classification
▪ Memory-memory:
▪ This type of ISA allows more than one operand of most
instructions to be specified in memory.
▪ Register-memory:
▪ These architectures allow one operand of an instruction to be
specified in memory, while the other operand is in CPU register.
▪ In this ISA the individual instructions execute faster than in a
memory-memory based ISA, due to fewer memory accesses.
▪ However, this case may require more instructions to complete the
same task.
▪ Register-register:
▪ This ISA classification is also called load-store architecture.
▪ Most instructions in this ISA are not allowed to access memory directly.
Instead, specific instructions, named load and store instructions, are
responsible for any data movement between registers and memory.
▪ All instructions other than load and store instructions get their operands
from and store their results to registers.
▪ The execution of most of the instructions in load-store ISA is very fast, in
many cases single clock cycle.
Von-Neumann Architecture
▪ One address bus and one data bus, and the
same address and data buses serve both
program and data memories.
▪ The input/output may also be interconnected in
this way and made to behave like memory as far
as the CPU is concerned.
▪ Simple and logical, and gives a certain type of
flexibility.
▪ The addressable memory area can be divided up
in any way between program memory and data
memory.
Harvard Architecture
▪ Every memory area gets its own address bus and
its own data bus.
▪ Greater flexibility in bus size, but we pay for it
with a little more complexity.
▪ With program memory and data memory each
having their own address and data buses, each
can be a different size, appropriate to their
needs, and data and program can be accessed
simultaneously.
Harvard vs. Von Neumann
Performance Comparison
▪ Possible ways to speed up execution are listed
below:
1. Use fewer instructions for a given program. In
other words, it is possible to improve the
execution speed by efficient programming.
2. Reduce the number of cycles for the instructions.
This is mainly dependent on the instruction set
architecture of the microprocessor.
3. Speed up the clock frequency of the
microprocessor or equivalently reduce the cycle
time. This refers to the maximum clock frequency
of the processor.
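The three options above map onto the three factors of the usual execution-time relation: time = instruction count × cycles per instruction × clock period. A minimal Python sketch of that relation follows; the function name `execution_time` and all the numbers are assumptions chosen only for illustration, not values from the lecture.

```python
def execution_time(instruction_count, cycles_per_instruction, clock_hz):
    """Return execution time in seconds: N * CPI / f."""
    return instruction_count * cycles_per_instruction / clock_hz

# Hypothetical baseline: 1 million instructions, 2 cycles each, 50 MHz clock.
baseline = execution_time(1_000_000, 2.0, 50e6)

# Each option applied on its own halves the run time in this example.
fewer_instructions = execution_time(500_000, 2.0, 50e6)   # option 1
fewer_cycles = execution_time(1_000_000, 1.0, 50e6)       # option 2
faster_clock = execution_time(1_000_000, 2.0, 100e6)      # option 3

print(baseline, fewer_instructions, fewer_cycles, faster_clock)
```

In this made-up case the three options are interchangeable; in practice they interact, since a simpler instruction set may lower the cycle count while raising the instruction count.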
Increasing Clock Frequencies
▪ Can we increase the clock frequency indefinitely?
▪ No: physical and technical limitations bound the
achievable clock frequency.
▪ Electric signals travel at approximately the speed of light.
Let's assume the processor is running at a clock frequency
of 3 GHz.
▪ One clock cycle will be completed in 1/3 ns.
▪ In 1/3 ns an electrical signal can travel approximately 10 cm.
▪ We need to ensure that the clock signal is available at
different parts of the circuit (almost) simultaneously.
▪ This can be approximately ensured by requiring the
propagation delay to be much less than the clock cycle time
(e.g. a factor of safety of about 10).
Increasing Clock Frequencies
▪ Thus a factor of safety of 10 will require the size of
the circuit to be not more than 1 cm.
▪ The CPU core die size of Intel Core2 Duo Processor
[Intel(2014)] is approximately 1 cm.
▪ Now if we want to double the speed of this
processor, the size of the processor core should
become one half.
▪ If this size limitation is violated, then what happens
is that some parts of the processor circuit are
operating in the current clock cycle, while some
other parts of the circuit are still not done with the
previous clock cycle. This type of distributed system
is very hard to deal with in practice.
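The size estimate on these slides can be reproduced in a few lines. This is a rough sketch under the same assumptions as the slides (signals travel at about the speed of light, factor of safety 10); the function name `max_circuit_size_cm` is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 3.0e8  # signal speed assumed in the slides

def max_circuit_size_cm(clock_hz, safety_factor=10):
    """Largest circuit dimension (cm) such that the propagation delay
    stays below one clock period divided by safety_factor."""
    period_s = 1.0 / clock_hz
    distance_m = SPEED_OF_LIGHT_M_PER_S * period_s  # distance covered in one cycle
    return distance_m * 100.0 / safety_factor

print(max_circuit_size_cm(3e9))  # about 1 cm at 3 GHz
print(max_circuit_size_cm(6e9))  # about 0.5 cm at 6 GHz
```

Doubling the clock frequency halves the allowed size, which is the scaling argument made above.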
Increasing Clock Frequencies
▪ Heat dissipation is another critical issue that
limits clock frequencies.
▪ Processors are made up of transistors, and these
transistors can be switched ON and OFF at the clock
frequency.
▪ Each transistor dissipates power when switched from one
state to another, so switching at higher speeds leads to more
power dissipation.
▪ Heat sinks and fans are the first line of cooling for
processors; liquid cooling is sometimes also used in
high-end systems.
▪ One natural solution to this limitation is to use multiple
processors of moderate speed rather than to have one
processor of too high speed, leading to multi-core
processor architecture
Number Systems….
• We can view a number as represented by:

• $d_2 d_1 d_0 . d_{-1} d_{-2} d_{-3} = d_2 a^2 + d_1 a^1 + d_0 a^0 + d_{-1} a^{-1} + d_{-2} a^{-2} + d_{-3} a^{-3}$

• where $a$ is the number base we use for this representation
and $d_i$ is a digit in this number base: $0 \le d_i \le a-1$
For example, with BINARY:
$b_2 b_1 b_0 . b_{-1} b_{-2} = b_2 2^2 + b_1 2^1 + b_0 2^0 + b_{-1} 2^{-1} + b_{-2} 2^{-2}$

• and $b_i$ is 0 or 1
This is an excellent representation for digital systems, but poor for us to
use. Why?
• Hint: Quick, what is $100101111010_2$? Bigger than 100? 1,000? 10,000?
Better choice for people:
OCTAL ($2^3$) or HEXADECIMAL (or HEX) ($2^4$)
Just a grouping of binary bits into groups of 3 or 4 bits.
Straightforward for people to deal with. Why?
One-to-one representation of what's happening inside the circuit.
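The grouping idea can be sketched in Python. The helper name `binary_to_grouped` is an assumption made for this example; it converts each group of 3 or 4 bits into one octal or hex digit, without any arithmetic on the number as a whole.

```python
def binary_to_grouped(bits, group_size):
    """Convert a binary string to octal (group_size=3) or hex (group_size=4)
    by grouping bits from the right, one output digit per group."""
    digits = "0123456789ABCDEF"
    # Pad on the left so the length is a multiple of group_size.
    padded_len = -(-len(bits) // group_size) * group_size
    bits = bits.zfill(padded_len)
    groups = [bits[i:i + group_size] for i in range(0, len(bits), group_size)]
    return "".join(digits[int(group, 2)] for group in groups)

value = "100101111010"                 # the number from the hint
print(binary_to_grouped(value, 3))     # octal digits: 4572
print(binary_to_grouped(value, 4))     # hex digits: 97A
print(int(value, 2))                   # decimal value: 2426
```

Each output digit depends only on its own group of bits, which is why octal and hex map one-to-one onto the bits in the circuit.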
Number Systems….
$100101111010_2 = 4572_8 = \mathrm{97A}_{16}$
ARM Instruction Set Architecture
▪ Historically, the ARM processors have supported
two different instruction sets:
▪ ARM instructions, which are 32 bits wide
▪ Thumb instructions, which are 16 bits wide
▪ The size of an instruction (i.e., assembly instruction)
signifies the number of bits required to store the
machine code or opcode of that instruction.
▪ Using Thumb instructions can provide higher code
density.
▪ On the other hand, using ARM instructions can
improve the processor execution performance.
ARM Instruction Set Architecture
▪ Consider a simple user program that involves six
operations to be performed of which three are
simple operations, while the other three are
complex operations.
▪ The three simple operations are supported by both
Thumb as well as ARM instruction set architectures
using three assembly instructions.
▪ On the other hand, the three complex operations
are supported by ARM instruction set and there are
three corresponding assembly instructions.
▪ However, the Thumb instruction set supports these
complex operations by using two assembly
instructions for each of these complex operations.
ARM Instruction Set Architecture
▪ We assume that each of the assembly instructions
used by the program requires one cycle for
execution. Now for the same clock speed, the ARM
ISA based processor will require six clock cycles for
execution, while it will require nine clock cycles for
a Thumb based processor.
▪ On the other hand, it will require 24 bytes (4*6) of
memory space to store the program for an ARM ISA
based processor, while 18 bytes (2*(3+6)) will be
required by the Thumb ISA based processor.
▪ This simple example illustrates why the ARM ISA has, in
general, better execution performance, while the
Thumb ISA has higher code density.
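The arithmetic of this example can be checked with a short Python sketch. The variable names are chosen for this illustration only; the numbers are the ones given in the slides.

```python
ARM_BYTES, THUMB_BYTES = 4, 2   # instruction widths: 32 bits and 16 bits

simple_ops, complex_ops = 3, 3

# ARM: one instruction per operation.
arm_instructions = simple_ops + complex_ops
# Thumb: one instruction per simple operation, two per complex operation.
thumb_instructions = simple_ops + 2 * complex_ops

# One cycle per instruction, as assumed in the example.
arm_cycles, thumb_cycles = arm_instructions, thumb_instructions
arm_size = arm_instructions * ARM_BYTES
thumb_size = thumb_instructions * THUMB_BYTES

print(arm_cycles, arm_size)      # 6 cycles, 24 bytes
print(thumb_cycles, thumb_size)  # 9 cycles, 18 bytes
```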
ARM and Thumb Instruction
• Thumb2 is a superset of the Thumb instruction set. Thumb2
introduces 32-bit instructions that are intermixed with the 16-bit
instructions. The Thumb2 instruction set covers all the functionality
of the Thumb instruction set.
• Thumb2 has the execution performance close to that of the ARM
instruction set and has the code density performance close to the
original Thumb Instruction Set Architecture (ISA).
• The Thumb2 technology extends the Thumb ISA into a powerful
instruction set with high code density that delivers
significant benefits in terms of ease of use, code size, and
performance.
Register Set
▪ Processor registers are one of the most important
components of a microprocessor core. The
registers can be differentiated based on their
functionality.
✓General-purpose registers, R0-R12
✓Stack Pointer (SP):
✓Link Register (LR):
✓Program Counter (PC):
✓Program Status Registers (PSRs):
✓Control register (CONTROL):
Register Set
▪ General-purpose registers, R0-R12
1. Registers R0-R7 are called low
registers and are accessible by all
instructions that specify a
general-purpose register.
2. Registers R8-R12 are called high
registers and are accessible by all
32-bit instructions that specify a
general-purpose register.
Most 16-bit Thumb instructions
cannot access R8-R12.
Register Set
▪ Stack Pointer (SP):
▪ Register R13 is used as the Stack Pointer (SP). The stack pointer
holds the address of the item most recently pushed onto the
stack (the top of the stack). A stack is a last-in, first-out buffer;
on Cortex-M it grows downward, from higher to lower addresses.
▪ In addition, it is important to remember that stack pointer is
a banked register with two copies, namely Main Stack
Pointer (MSP) and Process Stack Pointer (PSP).
▪ The Main Stack Pointer (MSP) is the default Stack Pointer
after reset, and is used when running exception handlers.
The Process Stack Pointer (PSP) can only be used in Thread
mode (when not handling exceptions)
▪ Only one copy of the stack pointer (R13) is visible and active
at a given time. This means that stack pointer logically has
one copy at any arbitrary time instant, while physically it has
always two copies.
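A toy Python model can make the role of the stack pointer concrete. It assumes a descending stack of 32-bit words in which SP always points at the last pushed word; the class name `DescendingStack` and the start address 0x20001000 are hypothetical, and the MSP/PSP banking is not modelled.

```python
class DescendingStack:
    """Toy model of a descending stack of 32-bit words."""

    def __init__(self, initial_sp):
        # SP points at the most recently pushed word
        # (initially just above the stack region).
        self.sp = initial_sp
        self.memory = {}          # word address -> 32-bit value

    def push(self, value):
        self.sp -= 4                               # move down first ...
        self.memory[self.sp] = value & 0xFFFFFFFF  # ... then store

    def pop(self):
        value = self.memory.pop(self.sp)  # read the word at SP ...
        self.sp += 4                      # ... then move back up
        return value

stack = DescendingStack(initial_sp=0x20001000)  # hypothetical start address
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.sp))     # 0x20000ff8
print(hex(stack.pop()))  # 0x22222222 (last in, first out)
print(hex(stack.pop()))  # 0x11111111
```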
Register Set
▪ Program Counter (PC):
▪ Register R15 is called the program counter register.
▪ PC contains the address of the current program instruction that is to be
executed. This register can be modified by the program itself to control the
flow of the program.
▪ Bit 0 of this register is always 0, which ensures that instructions are always
aligned to either word or halfword boundaries in the code memory.
▪ The usage and allocation of general-purpose registers in the execution of a
specific task can be performed either automatically by the compiler or manually
by writing an assembly program.
▪ Link Register (LR):
▪ Register R14 is the subroutine Link Register (LR). The LR receives the return
address from the program counter register.
▪ A link register is a special-purpose register which holds the address to return
to when a function call completes.
▪ The link register contains the return address to be used by the processor,
when returning from a function or service routine.
▪ When the link register is not used for holding a return address, it can be
treated as a general-purpose register.
Memory Organization
▪ When we refer to memory locations by
address, we only do so in units of bytes,
halfwords or words.
▪ Words
▪ 32 bits = 4 bytes = 1 word = 2 halfwords
▪ Example: two consecutive words sit
at addresses 0x20000000 and
0x20000004.
▪ Can you store a word/halfword
anywhere?
▪ NO.
▪ A word can only be stored at an address
that's divisible by 4. A halfword can only be stored
at an address that's divisible by 2.
▪ The memory address of a word/halfword is
the lowest address of all four/two bytes
in that word/halfword.
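The alignment rule above is a simple divisibility check. A minimal Python sketch, with the helper names `is_aligned` and `bytes_of` invented for this example:

```python
def is_aligned(address, size_bytes):
    """True if address is a multiple of the access size (2 = halfword, 4 = word)."""
    return address % size_bytes == 0

def bytes_of(address, size_bytes):
    """Addresses of all bytes occupied by an item stored at `address`."""
    return [address + i for i in range(size_bytes)]

print(is_aligned(0x20000004, 4))  # True: valid word address
print(is_aligned(0x20000002, 4))  # False: not valid for a word ...
print(is_aligned(0x20000002, 2))  # True: ... but valid for a halfword
print([hex(a) for a in bytes_of(0x20000004, 4)])  # lowest address names the word
```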
Endianness
▪ Big Endian:
▪ address of most significant byte = word
address
▪ Little Endian:
▪ address of least significant byte = word
address
▪ ARM is little-endian by default
(mnemonic: little address, little byte).
▪ One hex digit takes how many bits to
store? 4
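Byte order is easy to see by splitting one word into its four bytes. The sketch below uses Python's built-in `int.to_bytes`; the value 0x12345678 is just an example word.

```python
value = 0x12345678

# Byte at index 0 is the one stored at the word address itself.
little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")

print([hex(b) for b in little])  # ['0x78', '0x56', '0x34', '0x12']
print([hex(b) for b in big])     # ['0x12', '0x34', '0x56', '0x78']

# Reading the little-endian bytes back gives the original word.
assert int.from_bytes(little, byteorder="little") == value
```

In the little-endian case the least significant byte 0x78 sits at the lowest address, which matches the definition above.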
ARM Instructions, Major Groups
▪ In Cortex-M assembly programming, the assembly instructions can
be divided into the following groups:
▪ Data Movement
▪ Load
▪ Store
▪ Move
▪ Arithmetic and Logic
▪ Add and Subtract
▪ Multiply and Divide
▪ Shift and Rotate
▪ Compare and Branch
▪ Compare, Test
▪ Branch
▪ Miscellaneous
▪ Wait for events
▪ Interrupts
▪ Many others
ARM Assembly Language

▪ General format: label mnemonic operand1, operand2, operand3 ; comment
▪ Label is a symbolic reference to this instruction's address in
memory.
▪ Mnemonic represents the operation to be performed.
▪ The number of operands varies, depending on each specific
instruction. Some instructions have no operands at all.
▪ operand1 is typically the destination register, and operand2 and
operand3 are source operands.
▪ operand2 is usually a register.
▪ operand3 can represent many different things, depending on the
instruction.
▪ Everything after the semicolon is a comment, which is ignored
by the assembler. We will discuss this in more detail later. This is
just to get us going.
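To make the fields concrete, here is a small Python sketch that splits one source line into label, mnemonic, operands and comment. It is not a real assembler: the function name `parse_line` is invented, it assumes that a label starts in the first column while unlabelled instructions are indented, and it splits operands at every comma, so bracketed operands that contain a comma are not handled.

```python
def parse_line(line):
    """Split 'label mnemonic op1, op2, op3 ; comment' into its fields.
    Assumption for this sketch: labels start in column 0 and
    instructions without a label are indented."""
    code, _, comment = line.partition(";")
    has_label = bool(code) and not code[0].isspace()
    tokens = code.split(None, 2 if has_label else 1)
    label = tokens.pop(0) if has_label and tokens else None
    mnemonic = tokens.pop(0) if tokens else None
    operands = [op.strip() for op in tokens[0].split(",")] if tokens else []
    return label, mnemonic, operands, comment.strip()

print(parse_line("loop    ADDS r0, r0, r1   ; r0 = r0 + r1"))
print(parse_line("        NOP"))
```

The second call shows an instruction with no label and no operands.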
ADD and MOV Data Processing Instructions
▪ ADD → (addition) and MOV → (move)
▪ The optional suffix S, as in the ADDS instruction, makes the
instruction update the condition code flags in the application
program status register.
▪ The MOV instruction is used for simple data transfers within
the processor.
▪ The MOV instruction cannot be used for data transfers
between processor registers and memory.
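The effect of the S suffix can be sketched by modelling a 32-bit addition together with the four condition code flags N (negative), Z (zero), C (carry) and V (overflow). This is a simplified Python model, not processor documentation; the function name `adds` is chosen for this example.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """Model a 32-bit ADDS: return (result, flags) with flags N, Z, C, V."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "N": result >> 31,           # bit 31 of the result
        "Z": int(result == 0),       # result is zero
        "C": int(full > MASK32),     # unsigned carry out of bit 31
        # Signed overflow: both operands differ in sign from the result.
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags

print(adds(1, 2))           # no flags set
print(adds(0xFFFFFFFF, 1))  # result 0: Z and C set
print(adds(0x7FFFFFFF, 1))  # result 0x80000000: N and V set
```

A plain ADD would compute the same result but leave the flags unchanged.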
Load/Store Instructions
▪ Load Register: LDR rt, [rs]
▪ The load register instruction, LDR, is used to transfer data from
memory to a processor register.
▪ Fetch word from memory and put into register
▪ The store register instruction, STR, is used to transfer data from a
processor register to memory.
▪ The maximum permissible immediate value for the MOV instruction
is either 8-bit for 16-bit encoding or 12-bit for 32-bit encoding. On
the other hand, the LDR pseudo-instruction (LDR rt, =value) can load
any arbitrary 32-bit immediate value into the specified register.
Data Movement Instructions
▪ Memory Access:
▪ To access memory, first establish a register pointing to
the object. This pointer (called an index) is then used in
a Load (LDR) or Store (STR) instruction (or both).
▪ Load word from memory into a register
▪ E.g. first set up r5 so that [r5] = 0x82000020
▪ LDR r2, [r5] ; will place the 4 bytes starting at address
0x82000020 into register 2: [0x82000020:0x82000023] → r2
▪ Store word from register into memory
▪ STR r3, [r5] ; will place register 3 contents (4 bytes) into
addresses 0x82000020:0x82000023 (see previous example)
Indexed Load Example
▪ The address of the data in memory is held in a register. Assume:
▪ [R1] = 0x20000004,
▪ [PC] = 0x144, and [0x144:0x145] = 0x6808 (the machine code of the instruction)
▪ [0x20000004:0x20000007] = 0x12345678
▪ Brackets in the instruction below denote use of the register as an index to
reference memory
▪ LDR R0, [R1] ; R0 = value pointed to by R1, i.e. 0x12345678
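The load example can be replayed on a toy byte-addressable memory in Python. The helper names (`str_word`, `ldr_word`, `decode_ldr_imm_t1`) are made up for this sketch, memory is assumed to be little-endian, and the decoder assumes that 0x6808 uses the 16-bit Thumb LDR (immediate) layout `01101 imm5 Rn Rt`.

```python
memory = {}  # byte address -> byte value (toy byte-addressable memory)

def str_word(value, address):
    """STR: store a 32-bit register value as 4 little-endian bytes."""
    for i, byte in enumerate((value & 0xFFFFFFFF).to_bytes(4, "little")):
        memory[address + i] = byte

def ldr_word(address):
    """LDR: load the 4 bytes starting at `address` into a 32-bit value."""
    return int.from_bytes(bytes(memory[address + i] for i in range(4)), "little")

def decode_ldr_imm_t1(halfword):
    """Decode a 16-bit Thumb 'LDR Rt, [Rn, #imm]': bits 01101 imm5 Rn Rt."""
    assert halfword >> 11 == 0b01101, "not a 16-bit LDR (immediate) encoding"
    imm5 = (halfword >> 6) & 0x1F
    rn = (halfword >> 3) & 0x7
    rt = halfword & 0x7
    return rt, rn, imm5 * 4  # the offset field counts words

registers = {"R0": 0, "R1": 0x20000004}
str_word(0x12345678, 0x20000004)             # set up [0x20000004:0x20000007]
registers["R0"] = ldr_word(registers["R1"])  # LDR R0, [R1]

print(hex(registers["R0"]))       # 0x12345678
print(decode_ldr_imm_t1(0x6808))  # (0, 1, 0): Rt = R0, Rn = R1, offset 0
```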
Data Processing Instructions
▪ Specifically, the data processing instructions
have been divided into the following subgroups.
▪ Shift, rotate, and logical instructions
▪ Basic math instructions
▪ Data movement instructions
▪ Bitfield instructions
▪ Test and compare instructions
▪ Saturating instructions
Some Other Instructions

• Stack Memory Access
– Push
– Pop
• Branch Instructions
– For Loop
– While Loop
– Switch Cases
– If-then conditions
ANY QUESTIONS?
