Efficient Coding of Embedded Software

Version 1.2, 19.06.2006
DS/EES
Author: Dieter Röder, Telephone: Fe-46043
Content
Abstract
1. Standard data types
2. Order of variables
2.1. Examples: Order of variables
3. Initialization of variables
4. Economical use of measurement points
5. Use of bool & _bit
5.1. Example: Use of static bool
5.2. Example: Waste of RAM with static bool
6. Locating rules
7. Static keyword & inline functions
7.1. Example: Inline functions & macros
8. Passing return value
8.1. Example: Passing return values
9. Passing function parameter
9.1. Examples: Passing Function Parameters
10. Loops
10.1. Example: Size of loop counter variable
10.2. Example: Local copy of variables and constants
10.3. Example: Break conditions of loops
10.4. Example: Pointer arithmetic in loops
11. AND / OR query in IF statement
11.1. Examples: AND OR query in IF statement
12. Switch case
12.1. Example: Switch case
13. Redundant code or copy & p(w)aste
13.1. Example: Copy and p(w)aste
14. Use of Mem functions -copy, -set, -fill
15. Local copy of data
15.1. Examples: Local copy of variables and constants
16. Function calls
17. DSP functionality
18. Programming examples
18.1. Order of variables
18.2. Use of measurement points
18.3. Passing return values
18.4. Passing Function Parameters
18.5. Size of loop counter variable
18.6. Break conditions of loops
18.7. Pointer arithmetic in loops
18.8. Local copy of variables and constants
18.9. AND OR query in If statement
18.10. Switch case
18.11. Copy and p(w)aste
18.12. Waste of RAM with static bool
18.13. Not allowed use of static bool
18.14. Inline functions & macros
19. Overview of all rules
20. Tricks and tools
20.1. How to generate an assembler dump
20.2. WinRtm
21. Useful Links

DS/EES1-Roe: Efficient Coding of Embedded Software
© All rights rest with Robert Bosch GmbH, including patent rights. All rights of use of reproduction and publication rest with R. B. GmbH
Abstract
On most embedded systems the resources, i.e. memory and runtime, are strictly limited. There are several reasons for this, but one of the most important is cost, which must be kept low for an ECU to stay competitive. It is therefore mandatory to implement a maximum of functionality with a minimum of resources. Besides RAM and code memory, runtime is also an important resource: a real-time system has a fixed schedule in which each operation has to be completed within a specified time frame.
ECU resources are like a spider's web: if you pull on one corner, the other corners move as well. What does this mean? For example, saving runtime usually costs memory and vice versa, although in most cases smaller code also leads to less runtime.
This manual gives software developers hints and tips on how to write resource-optimized code. Nevertheless, the developer has the deepest knowledge of the requirements and therefore the responsibility to program as resource-optimized as possible.
The chapters are divided into 4 parts: explanation, common rules, EDC17-specific rules and a link to one or more examples.
Most of the examples are taken from the EDC software; some were built specially for a better overview. The specially created examples use the EDC17 software build environment with GNU compiler version 3.3.6.
Legend of the signs:
• Runtime sign: the optimization gives a benefit for runtime.
• Code memory sign: the optimization gives a benefit for code memory.
• RAM sign: the optimization gives a benefit for RAM memory.
• EDC17 sign: the following chapter discusses EDC17 specifics and rules.

1. Standard data types
Standard types:
Type                  Description
bool                  Boolean type (represented as uint8!)
_bit                  Single bit (only global or static variables)
bit8, bit16, bit32    Bit field types
uint8, sint8          8-bit types
real32                Float type
uint                  Generic type (size of register)
sint                  Generic type (size of register)
RULES:
• Use global/static bool only if really necessary, e.g. for very frequently accessed variables.
• Variables declared in the generic type (register size) give optimal access performance. Inside a function use the generic type wherever possible; cast only when really necessary, e.g. when handing over to a function or at the end of a function for the return value.
• Temporary variables should be declared in the generic type; for global and static variables the size has to be considered too.
EDC17 Specifics & Rules
Info: The register size of the TriCore is 32 bit.
RULES:
• Since the TriCore does not directly support real64, do not use this type; avoiding it saves memory and runtime.
• Use _bit instead of bool for RAM variables.

2. Order of variables
Explanation:
Variables and constants in structures are arranged according to the following rules:
• 32-bit variables are located at an address divisible by 4.
• 16-bit variables are located at an address divisible by 2.
It is recommended to sort by size, e.g. first all pointers, then all 32-bit variables, then all 16-bit variables and then all 8-bit variables.
Note: Pointers are 32 bits in size, including e.g. uint8*.
RULE:
• Sort global and static variables by size from large to small.
2.1. Examples: Order of variables
3. Initialization of variables
Explanation:
Static variables are initialized at start-up with 0 or, in the case of a pointer, with NULL. This is ANSI C standard behaviour and is also described in ISO/IEC 9899:1999.
RULES:
• Do not initialize static variables and pointers when the initial value is 0.
• Do not initialize local variables if the first access is a write access.
Every variable that has to be initialized costs 12 bytes of code and also runtime in the init phase.
RULES:
• Initialize static/global variables whose initial values are known at start-up at the location where they are defined,
e.g. static uint16 my_staticExampleVar = 10;
• For global/static variables that are initialized via a define or a constant value, use the macro initValueRAM.
• Hint: Functionality of this macro: if the initial value is != 0 the code will be expanded, otherwise no code will be generated.
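The initialization rules above can be illustrated with a minimal sketch. The names Demo_ctCalls_u16, Demo_ctLimit_u16 and Demo_Step are hypothetical, and the uint16 typedef via <stdint.h> is an assumption standing in for the project's own type header.

```c
#include <stdint.h>

typedef uint16_t uint16;   /* assumption: stand-in for the project type header */

/* Static storage is zero-initialized at start-up: no "= 0" needed. */
static uint16 Demo_ctCalls_u16;

/* Non-zero start value known at build time: initialize at the definition. */
static uint16 Demo_ctLimit_u16 = 10;

/* Counts the calls and saturates at the limit. */
uint16 Demo_Step(void)
{
    uint16 ctNew_u16;   /* not initialized: the first access is a write */

    ctNew_u16 = (uint16)(Demo_ctCalls_u16 + 1u);
    if (ctNew_u16 > Demo_ctLimit_u16)
    {
        ctNew_u16 = Demo_ctLimit_u16;
    }
    Demo_ctCalls_u16 = ctNew_u16;
    return ctNew_u16;
}
```

The first call returns 1 because the counter starts at 0 without any explicit initialization code.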
4. Economical use of measurement points

Since measurement points are only needed in the development phase, they should be used economically. Measurement points do not contribute anything to the functionality of the software. Every measurement point costs 12 bytes of code plus the runtime needed to write it and, when there is no overlay mechanism, its size in RAM (e.g. on the TC1766 an overlay is not possible because of a bug in the µC).
RULES:
• Due to the overlay mechanism it is not allowed to read from measurement points. The location could be in a non-physical memory area, so the result is undefined.
• Use measurement points as economically as possible.
• Put as much information as possible into one measurement point (use bit strings).
5. Use of bool & _bit

Explanation:
From the ANSI C point of view, a boolean variable is handled as an integer. A variable defined as bool must only have the value TRUE or FALSE, no other values. No arithmetic operations should be executed on it either. Since the type _Bool is only specified in the C99 standard, most compilers handle the type bool as an integer and give no warning if values other than TRUE/FALSE are used or if arithmetic operations are performed on type bool.
For good programming style and readability of the code, the developer has to take care of that.
RULES:
• Boolean variables must only have the values TRUE or FALSE.
• No arithmetic operations allowed with boolean type.
• No pointer operations on/with bit variables.
• Avoid the use of static bool; try to use a bit string that holds several pieces of information in one variable.
In EDC the type bool is defined as unsigned char (uint8).
Defined values: FALSE = 0; TRUE = !FALSE.
RULES:
• Single bit variables can be used to store binary states efficiently.
• Bit variables can only be located in normal RAM and protected RAM, as they require absolutely addressable memory.
• It is allowed to use bits as messages.
• Bit variables must not be used as constants, calibration values, local variables or inside a structure.
• Use _bit for:
o conditional jumps, if/else (2 instructions including load)
o immediately written 0/1, e.g.: Bit = 0; (1 instruction including store)
o saving RAM
• Do NOT use _bit for:
o copying bit to bit, e.g.: Bit1 = Bit2; (5 instructions, jump required!)
o writing a local variable (4 instructions, jump required!)
o read into a local variable (2 instructions)
o local variables -> use bool
5.1. Example: Use of static bool:
5.2. Example: Waste of RAM with static bool
Hint: Due to the instruction set, an optimal usage is limited. This has to be
considered carefully, when converting binary state variables to bit data type.

6. Locating rules
Explanation:
Most microcontrollers have different memory sections with different access times. This leads to variable runtime, depending on the memory area in which the code or the variables are located. Frequently called functions, e.g. libraries, OS, hardware encapsulation, must be located in the fastest flash memory. This also applies to the variables used in these functions: they must be located in the fastest RAM.
Definition of "frequently called" in EDC17:
• General: Processes and variables with a call frequency ≥ 500 Hz (tasks of 2 ms or faster) and n-sync.
Memory areas:
Code: There are 2 or 3 memory areas: internal flash, Scratch Pad RAM (SPRAM) and, if available, external flash.
RAM: DMI, DMU incl. overlay RAM, SPRAM, depending on the derivative.
SPRAM: Location for frequently called functions with little code and high runtime, and also for the interrupt vector table and the OS. A list of possible functions is available on the intranet on the Resource team page.
External flash: Only for less frequently called functions, e.g. 100 ms task processes, or for processes with little runtime but high code usage.
RULES:
• Code: Locate in internal flash:
- Service library e.g. SrVB_XXX
- Operating system
- Hardware encapsulation e.g. Adc_Get
- n-sync, 1ms, 2ms called processes
- Interrupt called processes e.g. Uart
• Code: Locate in SPRAM:
- OS (small, runtime critical)
- Interrupt vector table
- frequently called functions with little code and high runtime (list available)
• Code: Locate in ext. flash:
- nothing! …but if necessary:
- Init processes
- less frequently called processes and processes with little runtime and high code usage
• Variables: Locate in fastest RAM (DMI):
- Variables used by the processes mentioned above under "internal flash" and "SPRAM"
- Variables with a call frequency ≥ 500 Hz
- in special cases with very critical runtime also maps and curves, e.g. CSC-P
- Stack and CSA
• Variables in low speed RAM:
- less frequently called, e.g. 100 ms processes
- measurement points
• Variables in SPRAM:
- Only measurement points with uint16 and uint32 size. uint8 is NOT possible.
Hint: In EDC17, ~2.7% of the code generates ~45% of the load!
7. Static keyword & inline functions

Explanation:
There is sometimes some uncertainty about the keyword static, hence some explanations and examples.
The keyword static has two different meanings. Applied at file level in a C file, it makes the created object static, i.e. visible only to functions inside the same C file. The same applies to data objects:
A variable declared inside a function is only visible inside this function.
A variable declared inside the module (C file) is visible to all functions inside the module but not to any other module.
A function that is called only once must be inlined. This can be done via an explicit declaration as inline or by declaring it as static. A static declaration gives the compiler the possibility to inline the function when it is called only once.
RULE:
• Declare functions which are only used in the same C file as static.
In EDC17 the keyword static also has an important impact on the definition of inline functions.
If the attribute "static" is used on declaration and definition (static inline functionname), no callable instance will be generated, as long as the size of the function is below the compiler inline limit.
static inline ret_type function_name (parameter) {code}
If an inline function is declared with "__attribute__((always_inline))", the function will always be inlined, independent of its size.
static inline ret_type function_name (parameter) __attribute__ ((always_inline)) {code}
Examples for inline declaration:
Declaration                                                    Callable instance   Use in EDC17   Inlined until
static inline functionname()                                   no                  yes            inline limit (compiler)
static inline functionname() __attribute__ ((always_inline))   no                  yes            always
inline functionname()                                          yes                 no             inline limit (compiler)
extern inline functionname()                                   no                  no             inline limit (compiler)
Hint: It makes sense to implement a functionality as an inline function when the code executed for this functionality is smaller than or equal to the overhead needed for the call and return of a sub function. In that case runtime and RAM can be saved. An inline function should not be used when the function is inlined very often so that the code size increases. In that case the runtime benefit may be secondary. In other cases, e.g. functions in a fast raster (1 ms, 2 ms or n-synchronous raster), the runtime advantage may be more important and the disadvantage in code size acceptable. In some cases it also makes sense to do variant handling via inlined code. These rules are valid for macros too.
It is better (safer) to use inline functions instead of macros, because inline functions are checked by the compiler for types and C rules, whereas macros are only expanded by the pre-processor.
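The safety difference between macros and inline functions can be shown with a small sketch. All names (DEMO_SQUARE, Demo_Square, Demo_ReadValue and so on) are hypothetical, and the sint32 typedef is an assumption standing in for the project type header. The macro evaluates its argument twice, the inline function only once.

```c
#include <stdint.h>

typedef int32_t sint32;   /* assumption: stand-in for the project type header */

/* Macro: pure text replacement, the argument appears twice in the expansion. */
#define DEMO_SQUARE(x) ((x) * (x))

/* Inline function: type-checked, the argument is evaluated exactly once. */
static inline sint32 Demo_Square(sint32 x)
{
    return x * x;
}

static sint32 Demo_ctReads_s32;   /* counts how often the value is read */

/* Hypothetical read function with a side effect (the counter). */
static sint32 Demo_ReadValue(void)
{
    Demo_ctReads_s32++;
    return 3;
}

sint32 Demo_SquareViaMacro(void)
{
    /* Expands to ((Demo_ReadValue()) * (Demo_ReadValue())): two reads. */
    return DEMO_SQUARE(Demo_ReadValue());
}

sint32 Demo_SquareViaInline(void)
{
    /* One read, then the multiplication. */
    return Demo_Square(Demo_ReadValue());
}

sint32 Demo_GetReadCount(void)
{
    return Demo_ctReads_s32;
}
```

Both variants return the same result here, but the macro variant calls the read function twice, which costs runtime and gives wrong results as soon as the argument changes between the two evaluations.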
7.1. Example: Inline functions & macros
8. Passing return value

Explanation:
Simple mathematical calculations can be executed directly in the return expression and do not have to be cast into another type first.
RULES:
• Do simple mathematical calculations directly in the return expression.
• Readability has a higher importance than the optimization! Very complex operations must be done without this optimization.
8.1. Example: Passing return values

9. Passing function parameter

Explanation:
In general, functions have no limitation on the number of arguments (only the stack size).
From the point of view of embedded software, however, there is a difference. All microcontrollers have a limited set of registers which is used for parameter passing. Once these registers are filled with parameters, the remaining parameters are copied to the stack. Access to parameters stored in registers is the most efficient.
There are 2 possibilities for passing parameters to functions: "call by value" or "call by reference".
RULES:
• From the resource point of view keep the number of parameters as small as possible. In a lot of cases it is better to use an array or structure. This also has to be considered carefully when a function interface is designed.
• For local variables "call by value" is most efficient, unless more than 1 return value is needed.
• For global variables "call by value" is most efficient, unless the variable should also be written.
• For global structures "call by reference" is most efficient.
• Huge local structures should be avoided. Structures with 2 elements or fewer can be handled like one normal variable.
The Infineon TriCore has 8 registers for parameter passing: 4 for values (D4-D7) and 4 for addresses (pointers, A4-A7). Further arguments are copied to the stack, as mentioned above.
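As a minimal sketch of the first rule, several related parameters can be grouped into one structure that is passed by reference. The type Demo_Par_t, the function Demo_ScaleLim and the member names are hypothetical, and the sint32 typedef is an assumption standing in for the project type header.

```c
#include <stdint.h>

typedef int32_t sint32;   /* assumption: stand-in for the project type header */

/* Hypothetical parameter set, sorted by size (all members are 32 bit here). */
typedef struct
{
    sint32 xGain_s32;
    sint32 xOffset_s32;
    sint32 xMin_s32;
    sint32 xMax_s32;
} Demo_Par_t;

/* One address parameter instead of four value parameters:
   every member is reached via one base address plus a constant offset. */
sint32 Demo_ScaleLim(sint32 xIn_s32, const Demo_Par_t *par)
{
    sint32 xOut_s32 = (xIn_s32 * par->xGain_s32) + par->xOffset_s32;

    if (xOut_s32 < par->xMin_s32)
    {
        xOut_s32 = par->xMin_s32;
    }
    else if (xOut_s32 > par->xMax_s32)
    {
        xOut_s32 = par->xMax_s32;
    }
    return xOut_s32;
}
```

A caller would fill one Demo_Par_t (for example gain 2, offset 1, limits 0 and 10) and hand over its address together with the input value.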
9.1. Examples: Passing Function Parameters
10. Loops
Explanation:
Since loops are mostly executed multiple times, runtime-optimized coding is necessary here. This means keeping the code as small as possible, which also saves code memory. Most modern CPUs have a special instruction or a pipeline (see chapter HW) for loop instructions. It has to be checked whether this instruction is used. The main criterion is to declare the loop counter variable in the generic type.
RULES:
• The count / loop instruction should be used wherever possible.
• Computations that do not depend on the loop should be moved outside the loop body.
Avoid something like this:
CSam_loopCt_u16[stMskSelect_C*2+1] = …
• Declare the loop counter variable in the generic type.
• Do not access data stored in flash inside the loop; use a temporary variable set up in front of the loop. For copying constant arrays (flash), memcpy functionality can also be used. Take care that the size of the local array does not exceed 30 bytes, so that the stack size stays within its limit.
• Keep the break condition as simple as possible.
• Use a constant (number) or define for the break condition: no external variable or calibration constant, no function calls.
• Use pointer arithmetic for fast loops.
• Do not always try to solve the requirement with the smallest C code. The smallest C code does not always lead to the optimal result. This has to be considered carefully.
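The loop rules above can be combined in a minimal sketch. The names Demo_Scale, Demo_facGain_C and DEMO_NUM_VAL are hypothetical; Demo_facGain_C merely stands in for a calibration constant located in flash, and the typedefs are assumptions replacing the project type header. Whether a compiler actually emits a loop instruction for this code has to be checked in the assembler dump.

```c
#include <stdint.h>

typedef uint16_t     uint16;   /* assumption: stand-in for the project type header */
typedef unsigned int uint;     /* assumption: generic register-size type */

#define DEMO_NUM_VAL 6u        /* break condition is a define, not a variable */

/* Stands in for a calibration constant located in flash. */
static const uint16 Demo_facGain_C = 3u;

void Demo_Scale(uint16 *dst_pu16, const uint16 *src_pu16)
{
    const uint facGain = Demo_facGain_C;   /* read once, in front of the loop */
    uint ctLoop;                           /* loop counter in generic type */

    /* Count down to zero: simple break condition, no function call. */
    for (ctLoop = DEMO_NUM_VAL; ctLoop != 0u; ctLoop--)
    {
        /* Pointer arithmetic instead of indexed access. */
        *dst_pu16++ = (uint16)(*src_pu16++ * facGain);
    }
}
```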
10.1. Example: Size of loop counter variable
10.2. Example: Local copy of variables and constants
10.3. Example: Break conditions of loops
10.4. Example: Pointer arithmetic in loops
Loop counter type and size, and whether the loop instruction is used:
Type    Size (bit)   Instruction
uint    8            jmp
sint    8            jmp
uint    16           jmp
sint    16           loop
uint    32           loop
sint    32           loop
With sint16 the loop instruction is also used, but caution: the extr instruction is necessary anyway. -> Use uint/sint.
11. AND / OR query in IF statement

Explanation:
An AND/OR query should be kept as short as possible. This saves code and runtime.
RULES:
• Merge AND and OR queries with a mask constant wherever possible.
• Define constants bitwise (1, 2, 4, ...).
• In case of an OR query: put the most probable case in the first statement.
• In case of an AND query: put the most improbable case in the first statement.
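A minimal sketch of merging queries with a mask constant is given here. The status bits and function names are hypothetical and only illustrate the technique; the uint typedef is an assumption for the generic register-size type.

```c
typedef unsigned int uint;   /* assumption: generic register-size type */

/* Constants defined bitwise (1, 2, 4, ...). */
#define DEMO_ST_OVRCURR   0x01u
#define DEMO_ST_OVRTEMP   0x02u
#define DEMO_ST_UNDRVLTG  0x04u

/* One mask replaces three separate comparisons. */
#define DEMO_ST_ERR_MSK   (DEMO_ST_OVRCURR | DEMO_ST_OVRTEMP | DEMO_ST_UNDRVLTG)

/* OR query: at least one of the error bits is set. */
uint Demo_IsAnyErr(uint st)
{
    return ((st & DEMO_ST_ERR_MSK) != 0u) ? 1u : 0u;
}

/* AND query: all of the error bits are set. */
uint Demo_IsAllErr(uint st)
{
    return ((st & DEMO_ST_ERR_MSK) == DEMO_ST_ERR_MSK) ? 1u : 0u;
}
```

Each function needs one AND operation and one comparison, independent of the number of bits in the mask.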
11.1. Examples: AND OR query in IF statement
12. Switch case

Explanation:
Switch case is very useful for the programmer to avoid deeply nested statements. Different cases can be handled very easily with a switch case statement, and it is also easy to read. Switch case is the preferred solution for a state machine.
Nevertheless a switch is not always the best solution. Most compilers create a pointer table for fast access. This is effective for runtime but not for code consumption when there are gaps between the cases.
RULES:
• The initial (first) case should be case 0.
• A switch case should only be used if the cases follow each other, e.g. case 0, case 1.
• The minimum number of cases is 5; otherwise an optimized if/else statement should be used.
Overview:
Function type / number of cases                                    If statement   Switch statement   Constant table
few cases                                                          "++"           -                  -
many cases, following each other, assignment to the same variable  -              -                  "++"
many cases, following each other                                   -              "++"               -
many cases, gaps between the cases (e.g. case 2, case 9, ...)      "++"           "+"                -
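The second row of the overview (many consecutive cases that all assign to the same variable) can be sketched as follows. The functions, the table Demo_tiDelay_CA and its values are hypothetical, and the typedefs are assumptions replacing the project type header.

```c
#include <stdint.h>

typedef uint8_t      uint8;   /* assumption: stand-in for the project type header */
typedef unsigned int uint;    /* assumption: generic register-size type */

#define DEMO_NUM_STATES 5u

/* Switch variant: every case only assigns a value to the same variable. */
uint8 Demo_GetDelaySwitch(uint st)
{
    uint8 tiDelay_u8;

    switch (st)
    {
        case 0:  tiDelay_u8 = 10u;  break;
        case 1:  tiDelay_u8 = 20u;  break;
        case 2:  tiDelay_u8 = 40u;  break;
        case 3:  tiDelay_u8 = 80u;  break;
        case 4:  tiDelay_u8 = 160u; break;
        default: tiDelay_u8 = 0u;   break;
    }
    return tiDelay_u8;
}

/* Constant table variant: same result with one range check and one load. */
static const uint8 Demo_tiDelay_CA[DEMO_NUM_STATES] = { 10u, 20u, 40u, 80u, 160u };

uint8 Demo_GetDelayTable(uint st)
{
    return (st < DEMO_NUM_STATES) ? Demo_tiDelay_CA[st] : (uint8)0u;
}
```

Both functions return identical values for every input; the table variant only works because the cases start at 0 and have no gaps.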
12.1. Example: Switch case
13. Redundant code or copy & p(w)aste

Explanation:
As already mentioned, the memory of an embedded system is limited. Therefore the developer has to take care to implement maximum functionality with a minimum of code. A frequently seen bad habit is to create functionality that is needed several times via copy and paste. This is not very effective in terms of code consumption. The programmer always has to consider whether the code could be merged into a function or a loop.
RULES:
• Do not create functionality that is needed several times via copy and paste.
• For functionality that is needed several times use sub functions or loops. Sometimes loops inside the sub functions are also helpful.
• Consider whether all of the code has to be executed/needed in every configuration or whether the code could be executed/generated conditionally.
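A minimal sketch of the second rule: instead of pasting the same limitation block once per channel, one sub function is called from a loop. The names Demo_Limit, Demo_LimitAll and the limit values are hypothetical, and the typedefs are assumptions replacing the project type header.

```c
#include <stdint.h>

typedef int32_t      sint32;   /* assumption: stand-in for the project type header */
typedef unsigned int uint;     /* assumption: generic register-size type */

#define DEMO_X_MIN   0
#define DEMO_X_MAX 100

/* The functionality that would otherwise be pasted once per channel. */
static sint32 Demo_Limit(sint32 x, sint32 xMin, sint32 xMax)
{
    if (x < xMin)
    {
        x = xMin;
    }
    else if (x > xMax)
    {
        x = xMax;
    }
    return x;
}

/* One loop over all channels instead of num copies of the same code. */
void Demo_LimitAll(sint32 *x_ps32, uint num)
{
    uint ctLoop;

    for (ctLoop = num; ctLoop != 0u; ctLoop--)
    {
        *x_ps32 = Demo_Limit(*x_ps32, DEMO_X_MIN, DEMO_X_MAX);
        x_ps32++;
    }
}
```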
13.1. Example: Copy and p(w)aste

14. Use of Mem functions -copy, -set, -fill

Explanation:
For effective copy actions it can be helpful to use fast functions like memcpy().
void *memcpy(void *dest, const void *source, size_t number_of_bytes);
The memcpy() function operates as efficiently as possible on memory areas. It does not check for overflow of the receiving memory area. Specifically, memcpy() copies number_of_bytes bytes from the memory area source to the memory area dest and returns a pointer to dest. The source and destination areas must not overlap. To get the number of bytes to be copied, the operator sizeof is helpful: sizeof returns the number of bytes reserved for a variable or data type.
These rules are also valid for the MemFill and MemSet functions:
void *MemFill(void *dest, uint8 pattern, sint32 n)
void MemSet(uint32* xDest_pu32, uint32 xPattern_u32, uint32 numBytes_u32)
In EDC17 there are assembler-optimized functions:
SrvB_MemCopy (void *dest, const void *source, size_t number_of_bytes)
The above also applies to the MemFill and MemSet functions:
void *SrvB_MemFill(void *dest, uint8 pattern, sint32 n)
void SrvB_MemSet(uint32* xDest_pu32, uint32 xPattern_u32, uint32 numBytes_u32)
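The use of memcpy() together with sizeof can be sketched as follows. The sketch uses the standard library memcpy(), not the EDC17 service functions; the structure Demo_Cfg_t, its default values and Demo_LoadDefault are hypothetical, and the typedefs are assumptions replacing the project type header.

```c
#include <stdint.h>
#include <string.h>

typedef uint8_t  uint8;    /* assumption: stand-ins for the project type header */
typedef uint16_t uint16;
typedef uint32_t uint32;

/* Hypothetical configuration structure, members sorted by size. */
typedef struct
{
    uint32 tiPeriod_u32;
    uint16 numPulse_u16;
    uint8  stMode_u8;
    uint8  stOpt_u8;
} Demo_Cfg_t;

/* Stands in for constant data located in flash. */
static const Demo_Cfg_t Demo_CfgDflt = { 1000u, 4u, 1u, 0u };

/* Copies the whole structure in one call; sizeof supplies the byte count,
   so the call stays correct when members are added later. */
void Demo_LoadDefault(Demo_Cfg_t *dst)
{
    (void)memcpy(dst, &Demo_CfgDflt, sizeof(*dst));
}
```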
15. Local copy of data

Explanation:
On most microcontrollers data access to ROM is slow compared with access to RAM. Therefore it is necessary to reduce the number of accesses to the slower memory to a minimum. For data which is used more than once, it makes sense to create a local copy, as access to RAM is much faster. This also applies to global variables, where multiple store instructions can be avoided.
RULES:
• Make a local copy when a constant or global variable is used more than once in the function. This is also necessary to get data consistency inside the function.
In EDC17 there are many accesses to calibration data stored in flash: constants (_C) and constant arrays (_CA). For calibration data every access initiates a read from flash memory, i.e. the value cannot be held in a register.
In most cases the access to constants is random, which causes initial wait cycles (up to 14 cycles) on every read. In some cases it also makes sense to copy constant arrays to a local array via a fast memcpy() function.
RULES:
• Use local copies also for messages if they are used more than once in the same function.
• Make local copies also for constant arrays (_CA) if the values are used multiple times.
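A minimal sketch of the local copy rule is shown here. The names Demo_Filter, Demo_xFlt_s32 and Demo_facFlt_C are hypothetical; Demo_facFlt_C only stands in for a calibration constant, and the sint32 typedef is an assumption replacing the project type header.

```c
#include <stdint.h>

typedef int32_t sint32;   /* assumption: stand-in for the project type header */

static sint32 Demo_xFlt_s32;                /* global filter state, starts at 0 */
static const sint32 Demo_facFlt_C = 4;      /* stands in for a calibration constant */

/* Simple low-pass step: the global state and the constant are each read once,
   all intermediate work is done on local copies, the state is stored once. */
sint32 Demo_Filter(sint32 xIn_s32)
{
    const sint32 facFlt = Demo_facFlt_C;    /* one read of the constant */
    sint32 xFlt = Demo_xFlt_s32;            /* one load of the global */

    xFlt = xFlt + ((xIn_s32 - xFlt) / facFlt);

    Demo_xFlt_s32 = xFlt;                   /* one store of the global */
    return xFlt;
}
```

Because the function works on the local copy only, every expression inside it sees the same value of the state, which gives the data consistency mentioned in the rule.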
15.1. Examples: Local copy of variables and constants
16. Function calls

Explanation:
(Sub)functions are invoked via a call. The CPU has to save the registers and load the address of the called function. Controllers implement this in different ways: some save the registers to the stack, others have a dedicated context save area which can be accessed very quickly. The loading of the program code also differs. If no cache memory is implemented, the code has to be loaded directly from flash. With a cache the code can be loaded from the cache (in case of a cache hit) or is loaded from flash into the cache. Unfortunately the CPU loads the whole cache line, which takes the initial access time plus the time to load the line, which in turn depends on the line size and the bus width. This is an advantage when the code of the cache line can be executed, but not if one call or jump follows the other.
Bad example:
uint func_D (uint x);
uint func_C (uint x) { return func_D (x); }
uint func_B (uint x) { return func_C (x); }
uint func_A (uint x) { return func_B (x); }
RULES:
• Avoid deeply nested function calls.
• No 1 to 1 connection to a sub function (it should be inlined as static).
The TriCore CPUs Metis and Leda have a context save area and a code cache of 16 kB and 8 kB respectively. The following shows the consequence of a deeply nested call in EDC17, assuming that the code is not yet in the cache (cache miss).
These are the actions a TriCore performs. There are 46 cycles and 4 context saves until the first effective instruction is executed:
Function       Initial access   Load cache line   Execute first instruction
func_A (...)   6 cycles         4 cycles          2 cycles (call func_B)
func_B (...)   6 cycles         4 cycles          2 cycles (call func_C)
func_C (...)   6 cycles         4 cycles          2 cycles (call func_D)
func_D (...)   6 cycles         4 cycles          ?
17. DSP functionality
The TriCore family has very high-performance digital signal processor functionality on board. It is very effective for programming fast routines with a maximum of performance. These routines have to be implemented in assembler language.
Some examples of what the DSP functionality can be used for:
Generator:
e.g. Complex wave generator
Scalars:
e.g. 16 bit signed multiplication
e.g. Complex multiplication…
Vectors:
e.g. Vector multiplication
e.g. Vector square difference…
Filters:
e.g. Magnitude square
e.g. FIR…
Examples of the assembler code and a detailed description can be found in the Infineon DSP optimisation guide.
\\si9346\carpu$\extern\Info\user manuals\Application Notes\Application_Notes\General
18. Programming examples
18.1. Order of variables
bad:
uint8 XYZ_xTestval1_u8;
/* Gap 3Byte */
uint16* XYZ_adrData_pu16;
uint8 XYZ_xTestval2_u8;
/* Gap 1Byte */
uint16 XYZ_stMachine1_u16;
correct:
uint16* XYZ_adrData_pu16;
uint16 XYZ_stMachine1_u16;
uint8 XYZ_xTestval1_u8;
uint8 XYZ_xTestval2_u8;
This is mainly important for structures, which are often instantiated multiple times!
wrong:
typedef struct
{
uint8 xTestval1_u8;
/* Gap 3Byte */
uint16* adrData_pu16;
uint8 xTestval2_u8;
/* Gap 1Byte */
uint16 stMachine1_u16;
} XYZ_Header_t;
Memory Layout:
Word 0: 8-bit | gap (3 bytes)
Word 1: 32-bit
Word 2: 8-bit | gap (1 byte) | 16-bit
A declaration like this:
XYZ_Header_t struct_1;
wastes 4 bytes (gaps of 3 bytes and 1 byte).
correct:
typedef struct
{
uint16* adrData_pu16;
uint16 stMachine1_u16;
uint8 xTestval1_u8;
uint8 xTestval2_u8;
} XYZ_Header_t;
Memory Layout:
Word 0: 32-bit
Word 1: 16-bit | 8-bit | 8-bit
This is also an advantage when you have to initialize the struct via a memcpy.
Hint: The EDC17 GNU compiler/linker can handle the alignment for normal variables but not for structures. For future compatibility and to avoid gaps in structures it is recommended to sort all variables.
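The effect of the member order can be checked with sizeof and offsetof, as in the following sketch. The type names are hypothetical. As an assumption, a uint32 member stands in for the 32-bit pointer so that the numbers are the same on a host PC with 64-bit pointers; the sizes in the comments hold for the usual case that 32-bit members are aligned to 4 bytes and 16-bit members to 2 bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  uint8;    /* assumption: stand-ins for the project type header */
typedef uint16_t uint16;
typedef uint32_t uint32;

/* Unsorted: the compiler inserts gaps to align the 32-bit and 16-bit members. */
typedef struct
{
    uint8  xTestval1_u8;
    uint32 adrData_u32;      /* stands in for the 32-bit pointer */
    uint8  xTestval2_u8;
    uint16 stMachine1_u16;
} Demo_HeaderUnsorted_t;     /* typically 12 bytes */

/* Sorted from large to small: no gaps needed. */
typedef struct
{
    uint32 adrData_u32;      /* stands in for the 32-bit pointer */
    uint16 stMachine1_u16;
    uint8  xTestval1_u8;
    uint8  xTestval2_u8;
} Demo_HeaderSorted_t;       /* typically 8 bytes */

/* Number of bytes lost per instance through the unsorted order. */
size_t Demo_PaddingBytes(void)
{
    return sizeof(Demo_HeaderUnsorted_t) - sizeof(Demo_HeaderSorted_t);
}

/* Offset of the 32-bit member in the unsorted layout (shows the first gap). */
size_t Demo_OffsetOfAdr(void)
{
    return offsetof(Demo_HeaderUnsorted_t, adrData_u32);
}
```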
18.2. Use of measurement points

static bool stRng1;
...
static bool stRng11;
XXX_stRng1_mp = stRng1;
...
XXX_stRngSpo_mp = stRng11;
What is the problem:
• 11 variables of boolean size are written as measurement points. This generates the code and runtime for writing 11 measurement points.
Solution:
• By defining stRng as uint16 and handling it like a bit string, and likewise XXX_stRng_mp, the measurement point can be written with 1 instruction. In this case this saves 120 bytes of code, 9 bytes of RAM, and runtime depending on the location of the measurement points.
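The solution can be sketched as follows. The names Demo_stRng_u16, Demo_stRng_mp, Demo_SetRng and Demo_WriteMp are hypothetical and only mirror the naming pattern of the example; the typedefs are assumptions replacing the project type header.

```c
#include <stdint.h>

typedef uint16_t     uint16;   /* assumption: stand-in for the project type header */
typedef unsigned int uint;     /* assumption: generic register-size type */

/* One bit string replaces 11 static bool variables: bit n holds range n+1. */
static uint16 Demo_stRng_u16;

/* Hypothetical measurement point of the same size. */
uint16 Demo_stRng_mp;

/* Sets or clears one range state; idx must be smaller than 16. */
void Demo_SetRng(uint idx, uint stOn)
{
    const uint16 msk_u16 = (uint16)(1u << idx);

    if (stOn != 0u)
    {
        Demo_stRng_u16 |= msk_u16;
    }
    else
    {
        Demo_stRng_u16 &= (uint16)~msk_u16;
    }
}

/* All states are written to the measurement point with one assignment. */
void Demo_WriteMp(void)
{
    Demo_stRng_mp = Demo_stRng_u16;
}
```

The measurement point is only written, never read, inside the software, in line with the rule in chapter 4.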
18.3. Passing return values

A simple example for illustration, shown here in mixed code representation. A sint8 value is handed over, processed as sint16 and returned as sint32.
sint32 CSam_passing_Return(sint8 value_s8)
{
sint16 variable1_s16=2;
/* mathematical operation */
variable1_s16 = variable1_s16 + (sint16)value_s8;
59e: 8b 24 00 20 add %d2,%d4,2
5a2: 37 02 50 20 extr %d2,%d2,0,16
5a6: 00 90 ret
return ((sint32)variable1_s16);
}
Direct calculation in the return expression:
sint32 CSam_passing_Return2(sint8 value_s8)
{
sint16 variable1_s16=2;
/* mathematical operations */
return(variable1_s16 + (sint16)value_s8);
5a8: 8b 24 00 20 add %d2,%d4,2
5ac: 00 90 ret
}
Here the extr operation is saved.
RULES:
• Simple mathematical calculations can be executed directly at the handover of the return value.
18.4. Passing Function Parameters
Here is an example of a function with call by value. The first mathematical operations can be done directly on the registers (see registers d4 and d5). For the later additions a load from the stack is necessary (ld.w %d15,[%sp]0), which costs additional code and runtime.
sint32 CSam_cbv ( sint32 variable1_s32, sint32 variable2_s32,
sint32 variable3_s32, sint32 variable4_s32,
sint32 variable5_s32, sint32 variable6_s32,
sint32 variable7_s32, sint32 variable8_s32)
{
sint32 var1_s32;
var1_s32 = variable1_s32;
var1_s32 = var1_s32+variable2_s32;
5c: 0b 54 00 20 add %d2,%d4,%d5
60: 58 00 ld.w %d15,[%sp]0
62: 42 62 add %d2,%d6
64: 42 72 add %d2,%d7
66: 42 f2 add %d2,%d15
68: 58 01 ld.w %d15,[%sp]4
6a: 42 f2 add %d2,%d15
...
return var1_s32;
}
An example of call by reference:
sint32 CSam_cbr(sint32 *variable1_s32,sint32*variable2_s32,

sint32 *variable3_s32, sint32 *variable4_s32,
sint32 *variable5_s32, sint32 *variable6_s32,
sint32 *variable7_s32, sint32 *variable8_s32)
{
sint32 var1_s32;
var1_s32 = *variable1_s32;
var1_s32 = var1_s32+*variable2_s32;
c6: 4c 50 ld.w %d15,[%a5]0
c8: 54 42 ld.w %d2,[%a4]
ca: 42 f2 add %d2,%d15
...
The values have to be loaded before they can be added. The behavior is the
same if the parameter is the address of a structure or union. The advantage of
handing over a structure is that the access happens via one base register and an offset.
Up to 4 addresses can be held in the registers a4-a7.
sint32 CSam_struct_ptr(const data *version)

{
sint32 var1_s32;
var1_s32 = version->variable1_s32;
var1_s32 = var1_s32+version->variable2_s32;
13a: 4c 41 ld.w %d15,[%a4]4//<- baseregister+offset
13c: 54 42 ld.w %d2,[%a4]
13e: 42 f2 add %d2,%d15
140: 4c 42 ld.w %d15,[%a4]8

An example with far too many parameters:
uint8 XXX_HBrgErrHndlr(
const uint numIC ,
const uint8 swtSelOvrCurrErr,
const uint8 stPsDiaDisbl,
sint16 *rPs,
const XXX_HBrgPar_t *HBrgParStruct,
XXX_HBrgStat_t *HBrgStatStruct,
XXX_HBrgLoc_t *HBrgLocStruct,
DSM_DFCType DFC_OvrCurr,
DSM_DFCType DFC_TempOvrCurr,
DSM_DFCType DFC_OvrTemp,
DSM_DFCType DFC_UndrVltg,
DSM_DFCType DFC_ShCirOvrLd,
DSM_DFCType DFC_ShCirBatt1,
DSM_DFCType DFC_ShCirBatt2,
DSM_DFCType DFC_ShCirGnd1,
DSM_DFCType DFC_ShCirGnd2,
DSM_DFCType DFC_OpnLd,
sint32 dT
)
RULES:
also be considered carefully when a function interface gets designed.
also be written.
• Only values which are changed in the function should be handed over by
reference.
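As a sketch of the structure handover described above: the eight by-value parameters of CSam_cbv can be bundled into one structure that is passed by const pointer, so that every member is reached via one base register plus offset. The type name CSam_Args_t and the function name CSam_sum_struct are hypothetical and only serve the illustration.

```c
#include <stdint.h>

typedef int32_t sint32;  /* assumed mapping of the platform type */

/* Hypothetical bundle replacing eight separate sint32 parameters. */
typedef struct {
    sint32 variable1_s32;
    sint32 variable2_s32;
    sint32 variable3_s32;
    sint32 variable4_s32;
    sint32 variable5_s32;
    sint32 variable6_s32;
    sint32 variable7_s32;
    sint32 variable8_s32;
} CSam_Args_t;

/* Read-only access: the structure is not modified, so it is const. */
sint32 CSam_sum_struct(const CSam_Args_t *args)
{
    sint32 var1_s32 = args->variable1_s32;

    var1_s32 = var1_s32 + args->variable2_s32;
    var1_s32 = var1_s32 + args->variable3_s32;
    var1_s32 = var1_s32 + args->variable4_s32;
    var1_s32 = var1_s32 + args->variable5_s32;
    var1_s32 = var1_s32 + args->variable6_s32;
    var1_s32 = var1_s32 + args->variable7_s32;
    var1_s32 = var1_s32 + args->variable8_s32;
    return var1_s32;
}
```

Whether this is cheaper than call by value depends on where the data already lives; for values that are already in registers, call by value remains the better choice.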
18.5. Size of loop counter variable
Use of uint8 for loop counter:
uint16 CSam_loops(uint32 cnt2_u32)

{
uint8 cnt_u8;
for(cnt_u8=0;cnt_u8 < 50;cnt_u8++)

0: 82 01 mov %d1,0
{
CSam_loopCt_u16[cnt_u8] = 1;
02: 8f 21 20 00 sha %d0,%d1,2
06: 91 00 00 20 movh.a %a2,0

0a: 9a 11 add %d15,%d1,1

0c: d9 22 00 00 lea %a2,[%a2]0 <0 <CSam_loops>>
10: 37 0f 68 10 extr.u %d1,%d15,0,8
14: 01 20 00 f6 addsc.a %a15,%a2,%d0,0
18: 82 1f mov %d15,1
1a: 68 0f st.w [%a15]0,%d15
1c: da 31 mov %d15,49
1e: 7f 1f f2 ff jge.u %d15,%d1,2 <CSam_loops+0x2>
}
}
Here the loop optimization cannot be used. There are 2 additional cycles for the
"jge.u", 1 cycle for the extract instruction "extr.u" and 1 cycle for the mov %d15,49.
This means over 30% more runtime for this loop compared to the optimised version.
Optimised: Use of uint for loop counter:

uint cnt_uint;
for(cnt_uint=0;cnt_uint < 50;cnt_uint++)
22: 82 01 mov %d1,0
24: c5 02 31 00 lea %a2,31 <CSam_loops+0x31>
{
CSam_loopCt_u16[cnt_uint] = 1;
28: 8f 21 20 f0 sha %d15,%d1,2 <- no extr. instruction
2c: 91 00 00 30 movh.a %a3,0
30: c2 11 add %d1,1
32: d9 33 00 00 lea %a3,[%a3]0 <0 <CSam_loops>>
36: 10 3f addsc.a %a15,%a3,%d15,0
38: 82 1f mov %d15,1
3a: 68 0f st.w [%a15]0,%d15
3c: fc 26 loop %a2,28 <CSam_loops+0x28>
}
The loop instruction has the advantage that it executes with 1 cycle latency on each
repetition.
Inside the loop there should be no access to data stored in flash or to global variables.
Access times to flash are much higher than to a temporary variable held in a register or
on the stack. Calibration constants are loaded again on each access, which causes
additional code and runtime.
uint cnt_uint;
50: 82 00 mov %d0,0
{
CSam_loopCt_u16[cnt_uint] = sum;
56: 91 00 00 30 movh.a %a3,0
5a: 8f 20 20 f0 sha %d15,%d0,2
5e: c2 10 add %d0,1
60: 91 00 00 40 movh.a %a4,0
64: d9 33 00 00 lea %a3,[%a3]0
68: 19 41 00 00 ld.w %d1,[%a4]0 <0 <CSam_UseStaticVar>>
6c: 10 3f addsc.a %a15,%a3,%d15,0
6e: 68 01 st.w [%a15]0,%d1
70: fc 23 loop %a2,56 <CSam_loops+0x46>

18.6. Break conditions of loops
Bad example:
while (numErr < NUMERR_C || numErr < STNUM_C || state < STATE_C)
{ ....
What are the problems:
• The long OR query has to be evaluated in every loop cycle
• The constants are read in every cycle
Solution:
• Try to merge the query: numErr is tested twice
• Make local copies of the _C constants
Bad Example with global variable

for(cnt_uint=0;cnt_uint < Max_value_uint32;cnt_uint++)
Bad Example with Calibration data constant

for(cnt_uint=0;cnt_uint < Max_value_C;cnt_uint++)
Rule:
• Do not use global variables or calibration constants in the break condition; make a
local copy first
• Values defined with "#define Max_value 50" can be used without
difficulty
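A minimal sketch of this rule, assuming a calibration constant Max_value_C that is declared volatile (as calibration data are in this document). The function name CSam_FillArray and the size define are hypothetical.

```c
#include <stdint.h>

typedef unsigned int uint;    /* generic register-size type (assumed) */
typedef uint16_t     uint16;  /* assumed mapping of the platform type */

#define CSAM_ARRAY_SIZE 50u   /* hypothetical array size */

volatile uint Max_value_C = 50u;          /* stands in for a calibration constant */
uint16 CSam_loopCt_u16[CSAM_ARRAY_SIZE];

/* Fills the array up to the calibrated limit and returns the element count. */
uint CSam_FillArray(void)
{
    uint cnt_uint;
    uint max_uint = Max_value_C;          /* the constant is read only once */

    if (max_uint > CSAM_ARRAY_SIZE)
    {
        max_uint = CSAM_ARRAY_SIZE;       /* guard against an oversized limit */
    }
    for (cnt_uint = 0u; cnt_uint < max_uint; cnt_uint++)
    {
        CSam_loopCt_u16[cnt_uint] = 1u;   /* break condition uses the local copy */
    }
    return cnt_uint;
}
```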
Bad Example:
while (a > b || c > d || e < f)
23e: 91 00 00 20 movh.a %a2,0
242: 91 00 00 30 movh.a %a3,0
246: 19 20 00 00 ld.w %d0,[%a2]0
24a: 19 3f 00 00 ld.w %d15,[%a3]0 <0CSam_UseStaticVar>
24e: 3f 0f 16 80 jlt.u %d15,%d0,27a <CSam_loops+0x1f0>
252: 91 00 00 f0 movh.a %a15,0
256: 19 f0 00 00 ld.w %d0,[%a15]0<0<CSam_UseStaticVar>
25a: 91 00 00 f0 movh.a %a15,0
25e: 19 ff 00 00 ld.w%d15,[%a15]0<0<CSam_UseStaticVar>
262: 3f 0f 0c 80 jlt.u %d15,%d0,27a <CSam_loops+0x1f0>
266: 91 00 00 f0 movh.a %a15,0
26a: 19 f0 00 00 ld.w %d0,[%a15]0<0<CSam_UseStaticVar>
26e: 91 00 00 f0 movh.a %a15,0
272: 19 ff 00 00 ld.w d15,[%a15]0<0<CSam_UseStaticVar>
276: 7f f0 26 80 jge.u %d0,%d15,2c2 <CSam_loops+0x238>
27a: 19 23 00 00 ld.w %d3,[%a2]0
27e: 19 32 00 00 ld.w %d2,[%a3]0
{
CSam_loopCt_u16[cnt_uint++] = 1;
282: 8f 21 20 f0 sha %d15,%d1,2
286: 91 00 00 20 movh.a %a2,0

28a: c2 11 add %d1,1

28c: d9 22 00 00 lea %a2,[%a2]0<0 <CSam_UseStaticVar>>
290: 10 2f addsc.a %a15,%a2,%d15,0
292: 82 1f mov %d15,1
294: 68 0f st.w [%a15]0,%d15
296: 3f 32 f6 ff jlt.u %d2,%d3,282 <CSam_loops+0x1f8>
29a: 91 00 00 f0 movh.a %a15,0
29e: 19 f0 00 00 ld.w %d0,[%a15]0<0<CSam_UseStaticVar>
2a2: 91 00 00 f0 movh.a %a15,0
2a6: 19 ff 00 00 ld.w 15,[%a15]0<0<CSam_UseStaticVar>>
2aa: 3f 0f ec ff jlt.u %d15,%d0,282 <CSam_loops+0x1f8>
2ae: 91 00 00 f0 movh.a %a15,0
2b2: 19 f0 00 00 ld.w %d0,[%a15]0<0<CSam_UseStaticVar>
2b6: 91 00 00 f0 movh.a %a15,0
2ba: 19 ff 00 00 ld.w d15,[%a15]0<0<CSam_UseStaticVar>
2be: 3f f0 e2 ff jlt.u %d0,%d15,282 <CSam_loops+0x1f8>
}
! This is quite a lot of code before the loop as well as in the loop body. The
next code example does the same with fewer break conditions.
while (cnt_uint <50 )

218: da 31 mov %d15,49
21a: 82 01 mov %d1,0
21c: 3f 2f 11 80 jlt.u %d15,%d2,23e <CSam_loops+0x1b4>
220: c5 02 31 00 lea %a2,31 <CSam_InitArray+0x21>
{
CSam_loopCt_u16[cnt_uint++] = 1;
224: 8f 21 20 f0 sha %d15,%d1,2
228: 91 00 00 30 movh.a %a3,0
22c: c2 11 add %d1,1
22e: d9 33 00 00 lea %a3,[%a3]0 <0 <CSam_UseStaticVar>
232: 10 3f addsc.a %a15,%a3,%d15,0
234: 82 1f mov %d15,1
236: 68 0f st.w [%a15]0,%d15
238: fc 26 loop %a2,224 <CSam_loops+0x19a>
23a: 3b 20 03 10 mov %d1,50
}
RULES:
• Keep the break condition as easy and short as possible!
• Make local copies of global variables and constants

18.7. Pointer arithmetic in loops
This is a short example of the benefit of pointer arithmetic. Initializing an array
element by element via assignment is very common, but inefficient. Below, XXCoef1 is
initialized via assignment and XXCoef2 via pointer arithmetic.
uint16 CSam_loops(uint8 cnt2_u8)

{
/* assuming XXX_NUMCYL03_XXX is a global constant or variable */
/* common example of initializing an array */
XXCoef1[0] = XXX_NUMCYL03_AA0;
...
leads to this code:
8a: 7b 00 f8 13 movh %d1,16256
8e: 59 01 00 00 st.w [%a0]0,%d1
92: 7b 00 04 14 movh %d1,16448
96: 59 01 00 00 st.w [%a0]0,%d1
9a: 7b 00 00 f4 movh %d15,16384
9e: 59 0f 00 00 st.w [%a0]0,%d15
a2: 7b 00 0a 14 movh %d1,16544
a6: 59 01 00 00 st.w [%a0]0,%d1
aa: 7b 00 08 f4 movh %d15,16512
ae: 59 0f 00 00 st.w [%a0]0,%d15
b2: 7b 00 0e 14 movh %d1,16608
b6: 59 01 00 00 st.w [%a0]0,%d1
ba: 7b 00 0c f4 movh %d15,16576
be: 59 0f 00 00 st.w [%a0]0,%d15
c2: 7b 00 11 14 movh %d1,16656
c6: 59 01 00 00 st.w [%a0]0,%d1
ca: 7b 00 10 f4 movh %d15,16640
ce: 59 0f 00 00 st.w [%a0]0,%d15

18 instructions with 32 bit width are used = 72 bytes of code.
Using pointer arithmetic leads to this code:
void CSam_ForLoopPointer()
{
uint cnt_uint;
/* declare 2 pointer */
real32 *konst;
real32 *ram;
/* get address of the arrays */

ram = &XXCoef2[0];
78: d9 04 00 00 lea %a4,[%a0]0
konst = &XXX_NUMCYL01_AA[0];
7c: d9 03 00 00 lea %a3,[%a0]0
/* loop over */
for (cnt_uint = 0; cnt_uint <=8; cnt_uint++)
80: a0 82 mov.a %a2,8
{
*ram++ = *konst++;
82: 44 3f ld.w %d15,[%a3+]
84: 64 4f st.w [%a4+],%d15
86: fc 2e loop %a2,82
}
}
Now 6 instructions are used, 2 with 32 bit width and 4 with 16 bit = 16 bytes of code.
! The code is smaller by a factor of ~4. The functionality is the same as a
memcpy().
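For comparison, the same copy written with the standard memcpy. This is a sketch only: the array length of 9 matches the loop above (cnt_uint <= 8), the coefficient values are placeholders, and CSam_CopyCoef is a hypothetical name. In the project, the platform copy services may be preferred.

```c
#include <string.h>

typedef float real32;       /* assumed mapping of the platform type */

#define CSAM_NUM_COEF 9u    /* hypothetical: number of coefficients */

/* Constant source data (placeholder values). */
static const real32 XXX_NUMCYL01_AA[CSAM_NUM_COEF] =
    { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f };

/* RAM destination. */
static real32 XXCoef2[CSAM_NUM_COEF];

/* Copies the complete constant array into RAM with one call. */
void CSam_CopyCoef(void)
{
    memcpy(XXCoef2, XXX_NUMCYL01_AA, sizeof(XXCoef2));
}
```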
18.8. Local copy of variables and constants
Example: Local copy before loop:
tmp_sum = sum;
72: 82 00 mov %d0,0
{
CSam_loopCt_u16[cnt_uint] = tmp_sum;
78: 8f 20 20 f0 sha %d15,%d0,2
7c: 91 00 00 50 movh.a %a5,0
80: c2 10 add %d0,1
82: d9 55 00 00 lea %a5,[%a5]0 <0 <CSam_UseStaticVar>>
86: 10 5f addsc.a %a15,%a5,%d15,0
88: 68 01 st.w [%a15]0,%d1

8a: fc 27 loop %a2,78 <CSam_loops+0x68>

}
By defining a temporary variable, 2 instructions are saved. More important is
the runtime that is saved. The runtime benefit also depends on the location of
the variable and ranges from 1 cycle for DMI RAM and 5 cycles for DMU up to 14 cycles
for a constant from flash.
Avoid direct access to a constant inside a loop.
One solution for an array is to copy the constant or constant array to a local variable or
struct via a fast copy loop like memcpy. If the same constant or array is used several
times, the runtime benefit multiplies accordingly.
Example: Make a local copy of a message:
void Comp_FuncZ_proc(void)
{
mylocalVar = Value1;
if (mylocalVar == Value2) ...
/* at the end of function */

Comp_MsgA = mylocalVar;
}
Example: Make a local copy of a constant:
if ( XXXX_stMskSelect_C != 0 ) <- 1. read 14 cycles

{
numErrMax = ctErr;
numDFC = (uint8) SrvB_GetBitField (XXXX_stErrDFC_CA[ctErr],
DFC_IN_ERR_PATH_START,
DFC_IN_ERR_PATH_LENGTH);
....
}
... 10 rows later
if ( XXXX_stMskSelect_C != 0 ) <- 2. read 14 cycles

...
_________________________________________________________________
Better: use a temporary variable of generic size.
tmp_XXXX_stMskSelect_C = (cast)XXXX_stMskSelect_C; <- 1.read 14 cycles
if ( tmp_XXXX_stMskSelect_C != 0 )/* <- 2. read 1 or 2 cycles */

{ /* (dep. var in register or stack)*/
... 10 rows later
if ( tmp_XXXX_stMskSelect_C != 0 ) /*<- 3. read 1 or 2 cycles */

{ ...

The local copy saves 10 to 12 cycles here, which is nearly half of the runtime!
Typical use of a constant array:
for ( numErrPtt = XXXX_CYL_SHCIR_HSLS; numErrPtt < XXXX_NUM_ERR;

numErrPtt++ )
{
if ( XXXX_stErrMskSelect_CA[numErrPtt] != 0 )
{
numErrMax = ctErr;
numDFC = (uint8) SrvB_GetBitField (XXXX_stErrDFC_CA[ctErr],
DFC_IN_ERR_PATH_START,
DFC_IN_ERR_PATH_LENGTH);
....
}
many more instructions ...
Optimisation:
//or typeof(XXXX_stErrMskSelect_CA)
uint8 tmp_stErrMskSelect_CA[sizeof(XXXX_stErrMskSelect_CA)];
SrvB_MemCopy8( &tmp_stErrMskSelect_CA,
XXXX_stErrMskSelect_CA, sizeof(XXXX_stErrMskSelect_CA));
for ( numErrPtt = XXXX_CYL_SHCIR_HSLS; numErrPtt < XXXX_NUM_ERR;

numErrPtt++ )
{ // or typeof(XXXX_stErrMskSelect_CA)
if ((cast to type) tmp_stErrMskSelect_CA[numErrPtt] != 0 )
{
Hint: There are also a Cpu_MemCopy16 and a 32-bit variant, which copy int16/int32 sizes
very fast.
For, while, do while: Which loop should be used depends on the task
to be solved. From a resource point of view any loop can be used; the
compiler generates nearly the same code for all of them.
18.9. AND OR query in If statement
One bad example:

if ((stNxt == SYNC_TIMEOUT) ||
(stNxt == SYNC_WAIT_INC) ||
(stNxt == SYNC_RESYNC_OFFSET) ||
(stNxt == SYNC_PHASE_PLAUS_CHK) ||
(stNxt == SYNC_PHASE_PLAUS_SECOND_CHK))
{…
}
The CPU has to evaluate the comparisons one after the other until one is true or the
last one is reached.

One example of how it can work better if the constants are defined bitwise (1, 2, 4, ...):
Build a mask:
#define SYNC_ALL_BITS (SYNC_TIMEOUT | SYNC_WAIT_INC | \
SYNC_RESYNC_OFFSET | \
SYNC_PHASE_PLAUS_CHK | \
SYNC_PHASE_PLAUS_SECOND_CHK)
/* parentheses are required: != binds tighter than & */
if ((stNxt & SYNC_ALL_BITS) != 0)
{.....
Another example of how it should NOT be done:
if ( (stPhSig == 1) ||
(stPhSig == 2) ||
(stPhSig == 3) ||
(stPhSig == TIO_TO_MANY_EDGES) ||
(XXX_numPhEdgLstIntNone == 1) ||
(XXX_numPhEdgLstIntNone == TIO_TO_MANY_EDGES)
)
{.....
…also !!NOT!!
if ((((stOvrTemp & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stOvrTemp & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirOvrLd & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirOvrLd & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirBatt1 & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirBatt1 & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirBatt2 & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirBatt2 & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirGnd1 & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirGnd1 & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirGnd2 & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirGnd2 & DSM_ST_DEB_PRELIM_HEAL_MSK))) != FALSE)
These are 12 OR-linked checks just to find out whether one of the bits is set!
Solution 1: Merge Mask
#define tmp_DSM_MSK ( DSM_ST_DEB_PRELIM_DEF_MSK | \
DSM_ST_DEB_PRELIM_HEAL_MSK)
if ((stOvrTemp & tmp_DSM_MSK) ||(stShCirOvrLd & tmp_DSM_MSK)||

(stShCirBatt1 & tmp_DSM_MSK)||(stShCirBatt2 & tmp_DSM_MSK)||
(stShCirGnd1 & tmp_DSM_MSK) || (stShCirGnd2 & tmp_DSM_MSK))
{...

Solution 2: Save status in 1 variable
uint stAll = 0;
stAll |= DSM_DebRepCheck (DFC_OvrTemp,....);
stAll |= (DSM_DebRepCheck ( DFC_ShCirOvrLd...) << 1);
stAll |= (DSM_DebRepCheck ( DFC_ShCirBatt1...) << 2);
if (stAll)
{...
RULES:
• Try to build a mask to reduce the number of queries.
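The mask check can be sketched as a complete function. The bit values assigned to the SYNC_ states and the extra state SYNC_RUNNING are assumptions for illustration; the original only states that the constants must be defined bitwise.

```c
typedef unsigned int uint;   /* generic register-size type (assumed) */

/* Hypothetical one-bit-per-state encoding. */
#define SYNC_TIMEOUT                 0x01u
#define SYNC_WAIT_INC                0x02u
#define SYNC_RESYNC_OFFSET           0x04u
#define SYNC_PHASE_PLAUS_CHK         0x08u
#define SYNC_PHASE_PLAUS_SECOND_CHK  0x10u
#define SYNC_RUNNING                 0x20u  /* hypothetical state outside the mask */

#define SYNC_ALL_BITS (SYNC_TIMEOUT | SYNC_WAIT_INC |      \
                       SYNC_RESYNC_OFFSET |                \
                       SYNC_PHASE_PLAUS_CHK |              \
                       SYNC_PHASE_PLAUS_SECOND_CHK)

/* Returns 1 if stNxt is one of the five masked states, else 0.
   One AND plus one compare replaces five separate comparisons. */
uint CSam_IsSyncState(uint stNxt)
{
    return ((stNxt & SYNC_ALL_BITS) != 0u) ? 1u : 0u;
}
```

The inner parentheses matter: without them, the != comparison is evaluated before the & operator.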
18.10. Switch case
A simple switch case with huge gaps between the cases:
sint32 CSam_smal_switch_40 ( uint32 variable_u32 )

{
sint32 returnvariable_s32;
switch ( variable_u32 )
{
case 0:
returnvariable_s32 = 4;
break;
case 6:
break;
case 23:
break;
case 39:
break;
default:
break;
}
return returnvariable_s32;
}

The result of the compiler is a huge pointer table:
sint32 Csam_smal_switch_40 ( uint32 variable_u32 )

{
sint32 returnvariable_s32;
switch ( variable_u32 )
610: da 27 mov %d15,39
612: 3f 4f 15 80 jlt.u %d15,%d4,63c
;CSam_smal_switch_40+0x2c;
616: 91 00 00 f0 movh.a %a15,0
61a: 01 f4 02 f6 addsc.a %a15,%a15,%d4,2
61e: 99 ff 00 00 ld.a %a15,[%a15]0x626
622: 00 00 nop
624: dc 0f ji %a15
626: 00 00 06 c6 case 0:
62a: 00 00 06 dc default:
62e: 00 00 06 dc default:
632: 00 00 06 dc default:
636: 00 00 06 dc default:
63e: 00 00 06 ca case 6:
642: 00 00 06 dc default:
646: 00 00 06 dc default:
652: 00 00 06 dc default:
656: 00 00 06 dc default:
662: 00 00 06 dc default:
666: 00 00 06 dc default:
672: 00 00 06 dc default:
676: 00 00 06 dc default:
682: 00 00 06 d0 case 23:
686: 00 00 06 dc default:
692: 00 00 06 dc default:
696: 00 00 06 dc default:
6a2: 00 00 06 dc default:
6a6: 00 00 06 dc default:
6aa: 00 00 06 dc default:
6ae: 00 00 06 dc default:
6b2: 00 00 06 dc default:
6b6: 00 00 06 dc default:
6ba: 00 00 06 dc default:
6be: 00 00 06 dc default:

6c2: 00 00 06 d6 case 39:

{
case 0: returnvariable_s32 = 4;
6c6: 82 42 mov %d2,4
break;
6c8: 3c 0c j 640 CSam_smal_switch_40+0x30;
6ca: 3b b0 06 20 mov %d2,107
break;
6ce: 3c 09 j 640 CSam_smal_switch_40+0x30;
6d0: 3b 60 0e 20 mov %d2,230
break;
6d4: 3c 06 j 640 ;CSam_smal_switch_40+0x30;
6d6: 3b 10 13 20 mov %d2,305
break;
6da: 3c 03 j 640 ;CSam_smal_switch_40+0x30;
default: returnvariable_s32 = 560;
6dc: 3b 00 23 20 mov %d2,560
}
return returnvariable_s32;
6e0: 00 90 ret
In this case a table of 40 pointers has been generated, but only 4 are effectively used.
This is a waste of 36 x 4 = 144 bytes of code.
Another example: Normally this construct is predestined for a switch case:
uint32 CSam_SwitchCase(uint32 switchstatement_u32)

{
uint32 returnvalue_u32;
switch (switchstatement_u32)
{
case 0:
returnvalue_u32 = 4;
break;
case 1:
break;
case 2:
break;
case 3:
break;
case 4:
break;
case 5:

break;
case 6:
break;
case 7:
break;
default:
break;
}
return (returnvalue_u32);
}
But the compiler does not treat the writes to the same variable in any special way.
By using a constant array in which every entry corresponds to one return value, a lot
of code can be saved:
const uint8 sc_map[] = {4,17,23,35,42,56,64,76};
uint32 CSam_SwitchCase_opt(uint32 switchstatement_u32)

{
uint32 returnvalue_u32;
if(switchstatement_u32>7)
{
}
else
{
returnvalue_u32 = sc_map[switchstatement_u32];
}
return (returnvalue_u32);
}
The switch case version generates 34 instructions with a size of 96 bytes, the
if/else version only 7 instructions with a size of 20 bytes plus 8 bytes of constants.
This is a factor of ~3 in code size.
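For the sparse switch shown at the beginning of this section (cases 0, 6, 23 and 39), an if/else chain avoids the pointer table. The following sketch uses the return values visible in the disassembly above (4, 107, 230, 305, default 560); the function name is hypothetical, and the resulting code size has to be checked with the target compiler.

```c
#include <stdint.h>

typedef int32_t  sint32;  /* assumed mapping of the platform types */
typedef uint32_t uint32;

/* Sparse case values handled by comparisons instead of a jump table. */
sint32 CSam_sparse_ifelse(uint32 variable_u32)
{
    sint32 returnvariable_s32;

    if (variable_u32 == 0u)
    {
        returnvariable_s32 = 4;
    }
    else if (variable_u32 == 6u)
    {
        returnvariable_s32 = 107;
    }
    else if (variable_u32 == 23u)
    {
        returnvariable_s32 = 230;
    }
    else if (variable_u32 == 39u)
    {
        returnvariable_s32 = 305;
    }
    else
    {
        returnvariable_s32 = 560;   /* default */
    }
    return returnvariable_s32;
}
```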
RULES:
case 1.
• Test the different possibilities in the software.

18.11. Copy and p(w)aste
Here is an example where a lot of code with nearly the same functionality is
written/copied several times.
stBal[ST_ENG1]=(bool)((SrvX_IpoGroupCurveS16(dSrchRslt,
XXX_qSetMax1_GCUR)>= SetFlt)&&
(SrvX_IpoGroupCurveS16(dSrchRslt,
XXX_qSetMin1_GCUR) <= SetFlt));
... 10 nearly equal blocks later
stBal[ST_ENG11]=(bool)((SrvX_IpoGroupCurveS16(dSrchRslt,
XXX_qSetMax11_GCUR) >= SetFlt)&&
(SrvX_IpoGroupCurveS16(dSrchRsltQnt,
XXX_qSetMin11_GCUR) <= SetFlt));
This generates the same code 11 times.
One solution could be:

Declare a constant pointer array…
const SrvX_ValS16_t* const ptoMaxCurve[n]={&XXX_qSetMax1_GCUR,
&XXX_qSetMax2_GCUR...
const SrvX_ValS16_t* const ptoMinCurve[n]={&XXX_qSetMin1_GCUR,
&XXX_qSetMin2_GCUR...
...use a loop.
for (x=0;x<loopCnt;x++)
{
stQntUnBal[x]=(bool)((SrvX_IpoGroupCurveS16(dSrchRsltQnt,
ptoMaxCurve[x]) >= SetFlt)&&
(SrvX_IpoGroupCurveS16(dSrchRsltQnt,
ptoMinCurve[x]) <= SetFlt));
}
Another example: calling a local function 10 times in a row.
stCR1 = XXX_CheckRange( stClthDeb ,

stBrkDeb ,
stCR1 ,
nEngFlt ,
tEngFlt ,
tAdapTUsFlt ,
vFlt ,
tAdapTUsFlt ,
(sint16)tAir ,
rAPPFlt ,
numGearDeb ,
uBattFlt ,

&uBattCDHystStateRng1 ,
(bool)SrvB_GetBit(stEng, BT_PRJ_RNG_CHK1),
&stRng1 ,
stBal ,
RANGE1
);
{
.......
}
stCR2 = XXX_CheckRange(…
stCR10 = XXX_CheckRange(
This local function is called 10 times consecutively, and only 4 of the handover
parameters differ. This causes overhead in code and runtime for the additional
load/store instructions.
This code is generated 10 times:

8012e772: 82 10 mov %d0,1
8012e774: 91 00 00 2d movh.a %a2,53248
8012e778: 59 a0 20 00 st.w [%sp]32,%d0
8012e77c: 82 30 mov %d0,3
8012e77e: 19 a2 18 20 ld.w %d2,[%sp]152
8012e782: 02 d4 mov %d4,%d13
8012e784: d9 22 96 86 lea %a2,[%a2]27158
8012e788: 02 e5 mov %d5,%d14
8012e78a: 02 b7 mov %d7,%d11
8012e78c: 91 10 00 4d movh.a %a4,53249
8012e790: d9 a5 32 20 lea %a5,[%sp]178
8012e794: 8c 20 ld.h %d15,[%a2]0
8012e796: 91 10 00 2d movh.a %a2,53249
8012e79a: 59 a0 28 00 st.w [%sp]40,%d0
8012e79e: 19 a0 1c 20 ld.w %d0,[%sp]156
8012e7a2: d9 22 75 08 lea %a2,[%a2]-31691
8012e7a6: d9 44 40 18 lea %a4,[%a4]-31680
8012e7aa: 14 26 ld.bu %d6,[%a2]
8012e7ac: 78 07 st.w [%sp]28,%d15
8012e7ae: 74 a0 st.w [%sp],%d0
8012e7b0: 58 25 ld.w %d15,[%sp]148
8012e7b2: 19 a0 10 20 ld.w %d0,[%sp]144
8012e7b6: 59 a2 04 00 st.w [%sp]4
8012e7ba: 19 a2 0c 20 ld.w %d2,[%sp]140
8012e7be: 59 a9 24 00 st.w [%sp]36
8012e7d6: 6d 00 7d 03 call 8012eed0
This is 84 bytes * 10 = 840 bytes.
By using a loop, possibly inside the function, 840 bytes minus a little overhead,
i.e. ~750 bytes of code, can be saved.
RULES:
loops inside the sub functions are helpful.

18.12. Waste of RAM with static bool

static bool stCR1;
static bool stCR2;
static bool stCR3;
...
static bool stCR11;
static bool uBattCDHystStateRng1;

...
stCR1 = XXX_CheckRange( stClthDeb ,

stBrkDeb ,
stCR1 ,
nEngFlt ,
tEngFlt ,
tAdapTUsFlt ,
vFlt ,
tAdapTUsFlt ,
(sint16)tAir ,
rAPPFlt ,
numGearDeb ,
uBattFlt ,
&uBattCDHystStateRng1 ,
(bool)SrvB_GetBit(stEng, BT_PRJ_RNG_CHK1),
&stRng1 ,
stQntUnBal ,
RANGE1
);
{
.......
What are the problems:
• static bool is a waste of RAM and should be avoided; only one bit out of 8 is
used.
• This local function is called 11 times consecutively, and only 4 of the handover
parameters differ. This causes overhead in code and runtime for the
additional load/store instructions.
• 17 parameters are a lot, as the CPU can hand over only 4 values and 4
pointers in registers; the rest goes via the stack.
• Inside the function XXX_CheckRange the static bool stCRx variable is used as a
bit string. Only TRUE or FALSE is allowed for boolean variables.
Solution:
• Check whether all ranges have to be executed every time!
• Define uBattCDHystStateRng and stRng as uint16 bit strings,
or, if not possible, as a struct, or use type _bit.
• Define the return value as a uint16 bit string with all states inside.
• Call XXX_CheckRange only once and loop over all necessary ranges
inside the function.
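The last two points can be sketched as one function that loops over a range table and returns all states in a uint16 bit string. Everything here is a simplified assumption: the real XXX_CheckRange has many more inputs, and the table type, limits and function name are hypothetical.

```c
#include <stdint.h>

typedef int16_t      sint16;  /* assumed mapping of the platform types */
typedef uint16_t     uint16;
typedef unsigned int uint;

/* Hypothetical per-range limits; only the differing data is tabulated. */
typedef struct {
    sint16 min_s16;
    sint16 max_s16;
} CSam_Range_t;

static const CSam_Range_t CSam_Ranges[] = {
    {   0, 100 },   /* range 1 */
    { -50,  50 },   /* range 2 */
    { 200, 300 }    /* range 3 */
};

#define CSAM_NUM_RANGES (sizeof(CSam_Ranges) / sizeof(CSam_Ranges[0]))

/* One call checks all ranges; bit i of the result is set if the
   value lies inside range i+1. */
uint16 CSam_CheckAllRanges(sint16 value_s16)
{
    uint16 stAll = 0u;
    uint   idx;

    for (idx = 0u; idx < CSAM_NUM_RANGES; idx++)
    {
        if ((value_s16 >= CSam_Ranges[idx].min_s16) &&
            (value_s16 <= CSam_Ranges[idx].max_s16))
        {
            stAll |= (uint16)(1u << idx);
        }
    }
    return stAll;
}
```

The call code exists only once, and the result replaces the individual static bool variables.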
18.13. Not allowed use of static bool

static bool stNhi; <- Use unique name for static var
static bool ctSegBPRedNum; <- Wrong naming conventions:

ct=counter; -> for bool: st or b
ctSegBPRedNum = (Epm_numCyl << 2); <- No arithmetical operations!!
if (ctSegBPRedNum <= 0) <- Query only on TRUE/FALSE

{
ctSegBPRedNum-- ; <- No arithmetical operations!!
}
Rules:
• Boolean variables must only have the values TRUE or FALSE
• No pointer operations on/with boolean variables.
18.14. Inline functions & macros
It always has to be considered how often an inline function (or conditionally compiled
code) is used in the code.
Here is a bad example:
void XXX_cylinder_proc(void)
{
#if (NUM_ZYL >= 1)
ignition_time [0] = (ignition_time[0] * correction_factor[0]) >> 16;
ignition_delay [0] = (ignition_time[0]) + ((ignition_delay [0] *
correction_factor[0]) >> 16);
injection_time [0] = (injection_time [0] * correction_factor[0]) >> 16;
injection_delay [0] = (injection_time[0]) + ((injection_delay [0] *
correction_factor[0]) >> 16);
#endif
#if (NUM_ZYL >= 2)

ignition_time [1] = (ignition_time [1] * correction_factor[1]) >> 16;
injection_time [1] = (injection_time [1] * correction_factor[1]) >> 16;
injection_delay [1] = (injection_time[1]) + ((injection_delay [1] *
correction_factor[1]) >> 16);
#endif
... until 6
#if (NUM_ZYL >= 6)
ignition_time [5] = (ignition_time [5] * correction_factor[5]) >> 16;
#endif
}

In the worst case this code is expanded 6 times. With a loop the code is needed only once.
for (x = 0; x < NUM_ZYL;x++)

{
ignition_time [x] = (ignition_time[x] * correction_factor[x]) >> 16;
ignition_delay [x] = (ignition_time[x]) + ((ignition_delay [x] *
correction_factor[x]) >> 16);
injection_time [x] = (injection_time [x] * correction_factor[x]) >> 16;
injection_delay [x] = (injection_time[x]) + ((injection_delay [x] *
correction_factor[x]) >> 16);
}
In macros it is also very important to pay attention to data consistency.
Here is an example of a library function with and without consistency.
extern volatile uint32 x;

extern volatile uint32 y;
uint32 z;
...
z = SrvB_Min (x,y);
The 'volatile' keyword forces the compiler to read the variables each time they are
used in the C code. Calibration data are declared as volatile.
This is equivalent to:
if (y > x)
{
z = x;
}
else
{
z = y;
}
And the processor will read the x and y variables twice, since they are 'volatile':
Assumption: y=50; x=25

if (y > x)
{
<-interrupt occurs:
x=5000
z = x;
}
else
{
z = y;
}
Now z is equal to 5000.
You would expect z to be always limited to 50, the max value of y.
By making a local copy this is avoided.

#define SrvB_Min(x, y) \
({ \
typeof(x) _res; \
typeof(x) _x=(x); \
typeof(y) _y=(y); \
_res = (((_y) > (_x)) ? (_x) : (_y)); \
})
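The single-read behaviour of such a macro can be checked with a small sketch. It relies on the GNU C extensions "statement expression" and __typeof__, as the macro above does, so it only builds with GCC-compatible compilers. The names CSAM_MIN, CSam_NextValue and CSam_LimitedValue are hypothetical.

```c
#include <stdint.h>

/* Each argument is evaluated exactly once and copied into a local;
   the comparison and the result use only the local copies. */
#define CSAM_MIN(x, y)                  \
    __extension__ ({                    \
        __typeof__(x) _x = (x);         \
        __typeof__(y) _y = (y);         \
        (_y > _x) ? _x : _y;            \
    })

volatile uint32_t CSam_x = 25u;     /* may change in an interrupt */
volatile uint32_t CSam_y = 50u;

static unsigned int CSam_calls;     /* counts evaluations of the argument */

/* Helper with a visible side effect, to show single evaluation. */
uint32_t CSam_NextValue(void)
{
    CSam_calls++;
    return 7u;
}

/* x and y are each read once, so the result can never exceed y. */
uint32_t CSam_LimitedValue(void)
{
    return CSAM_MIN(CSam_x, CSam_y);
}
```

Passing CSam_NextValue() as an argument calls the function once; a plain ((x) < (y) ? (x) : (y)) macro would call it twice.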
Examples for macro definitions:
Macros for symbolic numeric constants:

#define COMP_PI 3.1415926F
#define EEEPDD_PAGESIZE 128ul
Function like macros:

#define COMP_MUL(A,B) ((A) * (B))
#define SrvB_SetBitMask(base,mask) ((base) |= \
(typeof(base))(mask))
#define EEP_GET_RAM_ADR(BLKIDX)(Eep_adEepRam_cu32 + \
(uint32)EEP_GET_RAM_OFS(BLKIDX))
Other examples:
#define EEEBD_GET_BLOCK_TYPE ((EEEBD_GET_BLKFLAGS & \
EEEBD_BLKTYP_MSK) >> \
EEEBD_BLKTYP_BP)
#define EEEPDD_INACTIVE_SECTOR \
((uint8)((Eeepdd_GlobVars_s.xActRdSec_u8 == 0) ? 1 : 0))
Not allowed are:

#define MY_UINT32 uint32
#define MY_IF if(
#define MY_ELSE else
19. Overview of all rules

Standard data types
• Use global/static bool only if really necessary, e.g. for very frequently accessed
variables.
• Variables defined in the generic type (register size) give optimal access
performance. Inside a function use the generic type/register size wherever possible;
cast only when really necessary, e.g. at handover to a function or at the end of a
function for the return value.
• Temporary variables should be declared in the generic type; for global and static
variables the size has to be considered too.
• Use bit instead of bool

EDC17:
• Since TriCore does not directly support real64, this type should not be used,
in order to save memory and runtime.
Order of variables
• Sort global and static variables by size from huge to small.
Initialization of variables
• Do not initialize static variables and pointers when the init value is 0.
• Do not initialize local variables if the first access is a write access.
EDC17:
• Initialize static/global variables whose initial values are known at start-up at
the location where they are defined.
e.g. static uint16 my_staticExampleVar = 10;
• For global/static variables that are initialized from a calibration value, use
the macro initValueRAM.
Economical use of measurement points
• Due to the overlay mechanism it is not allowed to read from measurement points.
The location may be in a non-physical memory area, so the result is
undefined.
• Use measurement points as economically as possible.
• Put as much information as possible into one measurement point (use bit
strings).
• Try to use a mechanism, e.g. conditional compilation, to switch them off after testing.
Use of bool & _bit
• Boolean variables must only have the values TRUE or FALSE.
• No pointer operations on/with boolean variables.
• Avoid the use of static bool; try to use a bit string holding several pieces of
information in one variable.

EDC17:
• Single bit variables can be used to store binary states efficiently.
• Bit variables can only be located in normal RAM and protected RAM, as they
require absolute addressable memory.
• It is allowed to use bits as messages.
• Bit variables must not be used as constants, calibration values, local variables
or inside a structure.
• Use _bit for:

o conditional jumps, if/else (2 instructions including load)
o immediate written 0/1 e.g.: Bit = 0; (1 instruction including store)
o saving RAM
• Do NOT use _bit for:

o copying Bit to Bit e.g.: Bit1 = Bit2; (5 instructions, jump required!)
o writing a local variable (4 instructions, jump required!)
o read into a local variable (2 instructions)
o local variables -> use bool
Locating rules:
EDC17:
• Code: Locate in internal flash:
- Service library e.g. SrVB_XXX
- Operating system
- Hardware encapsulation e.g. Adc_Get
- n-sync, 1ms, 2ms called processes
- Interrupt called processes e.g. Uart
• Code: Locate in SPRAM

- OS
- Interrupt vector table
- frequently called functions with little code but high runtime (list available)
• Code: Locate in ext. flash:

- nothing! …but if necessary
- less frequently called processes with low runtime and high code usage.
• Variables: Locate in fastest RAM (DMI)

- Variables used by the processes mentioned above under "Internal flash" and "SPRAM"
- Variables with call frequency ≥ 500 Hz
- in special cases with very critical runtime also maps and curves e.g. CSC-P
- Stack and CSA
• Variables in low speed RAM:

- less frequently called e.g. 100 ms processes

- measurement points
• Variables in SPRAM:
- Only measurement points with uint16 and uint32 size, uint8 is NOT possible.
Passing return values:
• Simple mathematical calculations can be executed directly in the return
expression.
Passing function parameters
also be considered carefully when a function interface gets designed.
• For local variables "Call by value" is most efficient, unless there is more than 1
return value needed.
also be written.
• For global structures "Call by reference" is most efficient.
• Huge local structures should be avoided. Structures with 2 elements or less can
be handled like one normal variable.
Loops
• The count / loop instruction should be used wherever possible.
• Loop-invariant computations should be moved outside the loop body.
Avoid something like this:
CSam_loopCt_u16[stMskSelect_C*2+1] = …
• Declare the loop counter variable in the generic type.
• Inside the loop there should be no access to data stored in flash; use a temporary
variable set up before the loop. For copying constant arrays (flash), memcpy
functionality can also be used. Take care that the size of the local array does not
exceed 30 bytes, so that the stack size stays within its limit.
• Keep the break condition as simple as possible.
• Use a constant (number) or a #define for the break condition, no external variable or
calibration constant.
• Use pointer arithmetic for fast loops.
• Do not always try to solve the requirement with the smallest C code. The
smallest C code does not always lead to the optimal result. This has to be
considered carefully.
AND / OR query in If statement
• Merge AND and OR queries with a mask constant wherever possible
• In case of OR query: Put the most probable case in the first statement
• In case of AND query: Put the most improbable case in the first statement
Switch Case
case 1.
• The minimum of cases is 5, otherwise an optimized if/else statement should be

used.
Redundant code or copy & p(w)aste
loops inside the sub functions are helpful.
• Consider whether all of the code has to be executed/needed in every configuration or
whether it can be conditionally executed/generated.
Local copy of data
• Make a local copy when a constant or global variable is used more than once in
the function. This is also necessary to get data consistency inside the function.
EDC17:
• Use local copies also for messages.
• Also make local copies of _CA arrays if the values are used several times.
Function calls
• Avoid deep nested function calls.

• Avoid functions that only forward 1:1 to a sub function (such a function should be inline).
20. Tricks and tools
20.1. How to generate an assembler dump.

EDC17 Specific Rules
It is helpful to see which assembler code the compiler generates. This can be done
with the software build environment.
How to compile a single file:
• Open cmd window
• start TBCON in a view in the medc17 folder.
example: M:\reference_c01_rdd2fe\medc17>
• type command: swb build.c_to_o --file=dir\filename.c --dest=tmp
How to get assembler & C-code from object file:

• This will be done with tricore-objdump delivered with toolbase on every PC in
the toolbase\hightec\VersionXXX\bin folder.
• type command: C:\toolbase\hightec\CD_v3.3.6.7_1\bin\tricore-objdump.exe -dS
dir\filename.o > samples.txt
For more information about the software build, type swb --help >swbhelp.txt (because the
help is too big to read on screen).
20.2. WinRtm
WinRtm is a Windows-based tool for measuring and analysing
the runtimes of functions and the accesses to variables. It is not "only" a runtime
measurement tool; it also analyses (for each module!):
Static view:
• Flash: intern, extern, code size, calibration data, constant.
• RAM: Variable, SPRAM, RAM DMI, RAM DMU, stack
Dynamic view
• Runtime: Runtime min/max/average, call distance min/max, interrupts,
reentrant calls.
There is a separate documentation available for WinRtm.

21. Useful Links

Link to WinRtm Delivery page:
http://www.intranet.bosch.com/ds/topics/edc/111_resources/072_winrtm/index.html
General Coding standards:
http://www.intranet.bosch.com/ds//esq/100_topics/202_processportal/300_engineering/303_coding//001_Guidelines/EDC502-CUR-D-General_Coding_Standards.pdf
Coding styleguide:
http://www.intranet.bosch.com/ds/esq/100_topics/202_processportal/300_engineering/303_coding/001_Guidelines/styleguide_en.pdf
Naming conventions:
http://www.intranet.bosch.com/ds/esq/100_topics/202_ProcessPortal/300_Engineering/311_ComponentDesign/001_Guidelines/PMT_SE_NamingConventions.pdf
Resource team homepage with EDC specific optimization possibilities:
http://www.intranet.bosch.com/ds/me_d-edc-17/platform/dsresmgmt/06_documentation/index.html
DSP optimization guide:
\\si9346\carpu$\extern\Info\user manuals\Application Notes\Application_Notes\General\dsp_opt_guide_part_1_v164.pdf
\\si9346\carpu$\extern\Info\user manuals\Application Notes\Application_Notes\General\dsp_opt_guide_part_2_v164.pdf


