Efficient Coding of Embedded Software
Version 1.2, 19.06.2006
DS/EES
Author: Dieter Röder
Telephone: Fe-46043
Content
Abstract
1. Standard data types
2. Order of variables
2.1. Examples: Order of variables
3. Initialization of variables
4. Economical use of measurement points
5. Use of bool & _bit
5.1. Example: Use of static bool
5.2. Example: Waste of RAM with static bool
6. Locating rules
7. Static keyword & inline functions
7.1. Example: Inline functions & macros
8. Passing return value
8.1. Example: Passing return values
9. Passing function parameter
9.1. Examples: Passing Function Parameters
10. Loops
10.1. Example: Size of loop counter variable
10.2. Example: Local copy of variables and constants
10.3. Example: Break conditions of loops
10.4. Example: Pointer arithmetic in loops
11. AND / OR query in IF statement
11.1. Examples: AND OR query in IF statement
12. Switch case
12.1. Example: Switch case
13. Redundant code or copy & p(w)aste
13.1. Example: Copy and p(w)aste
14. Use of Mem functions -copy, -set, fill
15. Local copy of data
15.1. Examples: Local copy of variables and constants
16. Function calls
17. DSP functionality
18. Programming examples
18.1. Order of variables
18.2. Use of measurement points
18.3. Passing return values
18.4. Passing Function Parameters
18.5. Size of loop counter variable
18.6. Break conditions of loops
18.7. Pointer arithmetic in loops
18.8. Local copy of variables and constants
18.9. AND OR query in If statement
18.10. Switch case
18.11. Copy and p(w)aste
18.12. Waste of RAM with static bool
Abstract
On most embedded systems the resources, meaning memory and runtime, are strictly
limited. There are several reasons for this, but one of the most important is cost,
which must be kept low for an ECU to stay competitive. It is therefore mandatory to
implement a maximum of functionality with a minimum of resources. Besides RAM and
code memory, runtime is an important resource: a real-time system has a fixed
schedule in which each operation has to be completed within a specified time frame.
ECU resources are like a spider's web. If you pull on one corner, the other corners
move too. For example, saving runtime usually costs memory and vice versa, although
in most cases smaller code also leads to shorter runtime.
This manual gives software developers hints and tips on how to write
resource-optimized code. Nevertheless, the developer has the deepest knowledge of
the requirements and therefore carries the responsibility to program as
resource-optimized as possible.
The chapters are divided into 4 parts: explanation, common rules, EDC17-specific
rules and a link to one or more examples.
Most of the examples are taken from the EDC software; some were built specially to
give a better overview. The specially created examples use the EDC17 software build
environment with GNU compiler version 3.3.6.
Signs in the margin mark the benefit of an optimization:
• Runtime: the optimization gives a benefit for runtime.
• Code memory: the optimization gives a benefit for code memory.
• RAM: the optimization gives a benefit for RAM memory.
• EDC17: the following chapter discusses EDC17 specifics and rules.
1. Standard data types
RULES:
• Use global/static bool only if really necessary, e.g. for frequently accessed
variables.
• Variables defined in the generic type (register size) give optimal access
performance. Inside a function use the generic type/register size wherever
possible; cast only when really necessary, e.g. at handover to a function or at
the end of a function for the return value.
• Temporary variables should be declared in the generic type; for global and static
variables the size has to be considered too.
EDC17 RULES:
• Since TriCore does not directly support real64, avoid this type in order to save
memory and runtime.
2. Order of variables
Explanation:
It is recommended to sort variables by size, e.g. first all pointers, then all
32-bit variables, then all 16-bit variables and then all 8-bit variables.
RULE:
3. Initialization of variables
Explanation:
Static variables are initialized at start-up with 0 or, in the case of a pointer, with
NULL. This is ANSI C standard behaviour and is also described in ISO/IEC 9899:1999.
RULES:
• Do not initialize static variables and pointers when the init value is 0.
• Do not initialize local variables if the first access is a write access.
Every variable that has to be initialized needs 12 bytes of code and also runtime in
the init phase.
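The two rules above can be illustrated with a minimal sketch. All names are hypothetical, and `uint16_t` from `<stdint.h>` stands in for the project type `uint16`.

```c
#include <stddef.h>
#include <stdint.h>

/* Static objects without an initializer are set to 0 / NULL before main() runs. */
static uint16_t Demo_ctr_u16;        /* no "= 0" needed */
static uint16_t *Demo_adrData_pu16;  /* no "= NULL" needed */

/* A local variable whose first access is a write needs no initializer either. */
static uint16_t Demo_Count(uint16_t step)
{
    uint16_t result;                 /* first access below is a write */

    Demo_ctr_u16 = (uint16_t)(Demo_ctr_u16 + step);
    result = Demo_ctr_u16;
    return result;
}
```

Leaving out the redundant initializers changes nothing in behaviour; it only avoids the init code mentioned above.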
EDC17 RULES:
• Initialize static/global variables whose initial values are known at start-up at
the location where they are defined,
e.g. static uint16 my_staticExampleVar = 10;
• For global/static variables that are initialized via a define or constant
value, use the macro initValueRAM.
5. Use of bool & _bit
RULES:
• Avoid the use of static bool; try to use a bit string that holds several pieces of
information in one variable.
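A bit string of this kind could look like the following sketch. The flag positions, the variable name and the helper functions are assumptions made for illustration; they are not the EDC17 service functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag positions inside one 16-bit status word. */
#define DEMO_BP_RNG1  0u
#define DEMO_BP_RNG2  1u
#define DEMO_BP_RNG3  2u

static uint16_t Demo_stRng_u16;   /* replaces several static bool variables */

static void Demo_SetBit(uint32_t bitPos, bool value)
{
    const uint16_t mask = (uint16_t)(1u << bitPos);

    if (value)
    {
        Demo_stRng_u16 = (uint16_t)(Demo_stRng_u16 | mask);
    }
    else
    {
        Demo_stRng_u16 = (uint16_t)(Demo_stRng_u16 & (uint16_t)~mask);
    }
}

static bool Demo_GetBit(uint32_t bitPos)
{
    return ((Demo_stRng_u16 >> bitPos) & 1u) != 0u;
}
```

Up to 16 states share 2 bytes of RAM here, where 16 separate static bool variables would need at least one byte each.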
RULES:
• Bit variables can only be located in normal RAM and protected RAM, as they
require absolutely addressable memory.
• Bit variables must not be used as constants, calibration values, local variables
or inside a structure.
Hint: Due to the instruction set, the optimal usage of bit variables is limited. This
has to be considered carefully when converting binary state variables to the bit data
type.
6. Locating rules
Explanation:
Most microcontrollers have different memory sections with different access times.
The runtime therefore varies depending on the memory area in which the code or the
variables are located. Frequently called functions, e.g. libraries, OS and hardware
encapsulation, must be located in the fastest flash memory. This also applies to the
variables used in these functions: they must be located in the fastest RAM.
Memory areas:
Code: There are 2 or 3 memory areas: internal flash, Scratch Pad RAM (SPRAM)
and, if available, external flash.
RAM: DMI, DMU incl. Overlay RAM, SPRAM, depending on the derivative.
SPRAM: Location for frequently called functions with little code and high
runtime, and also for the interrupt vector table and the OS. A list of possible
functions is available on the intranet at the Resource team page.
External flash: Only for less frequently called functions, e.g. 100 ms task
processes, or for processes with little runtime but high code usage.
RULES:
• Variables in SPRAM:
- Only measurement points of uint16 and uint32 size. uint8 is NOT possible.
7. Static keyword & inline functions
There is sometimes some uncertainty about the keyword static, hence some
explanations and examples.
The keyword static has two different meanings. At file level in a C file it makes
the object local to that file, i.e. a static function is only visible to functions
inside the same C file. The same applies to data objects. Inside a function, static
gives a variable static storage duration, so it keeps its value between calls.
A variable declared inside a function is only visible inside this function.
A variable declared static at module (C file) level is visible to all functions
inside the module but not to any other module.
A function that is called only once should be inlined. This can be done via an
explicit declaration as inline or by declaring it as static. A static declaration
gives the compiler the possibility to inline the function when it is called only once.
RULE:
• Declare functions which are only used in the same C file as static.
In EDC17 the keyword static also has an important impact on the definition of inline
functions.
If the attribute "static" is used on declaration and definition (static inline
functionname), no callable instance will be generated, as long as the size of the
function is under the compiler inline limit.
static inline ret_type function_name (parameter) {code}

declaration                                                  | callable instance | use in EDC17 | inline until
static inline functionname()                                 | no                | yes          | inline limit (compiler)
static inline functionname() __attribute__ ((always_inline)) | no                | yes          | always
inline functionname()                                        | yes               | no           | inline limit (compiler)
extern inline functionname()                                 | no                | no           | inline limit (compiler)
It is safer to use inline functions instead of macros, because inline functions are
checked by the compiler for types and C rules, whereas macros are only expanded by
the preprocessor.
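A small sketch of the difference, using hypothetical names. Besides the missing type check, a function-like macro pastes its arguments in textually, so an argument with a side effect may be evaluated more than once, while an inline function evaluates each argument exactly once.

```c
#include <stdint.h>

/* Macro version: each argument is pasted in textually and may be evaluated twice. */
#define DEMO_MIN_MACRO(x, y)  (((y) > (x)) ? (x) : (y))

/* Inline version: arguments are evaluated exactly once and are type checked. */
static inline int32_t Demo_MinInline(int32_t x, int32_t y)
{
    return (y > x) ? x : y;
}

static int32_t Demo_numReads_s32;

/* Hypothetical input routine that counts how often it is called. */
static int32_t Demo_ReadInput(void)
{
    Demo_numReads_s32++;
    return 5;
}
```

Calling `DEMO_MIN_MACRO(Demo_ReadInput(), 10)` invokes `Demo_ReadInput()` twice, whereas `Demo_MinInline(Demo_ReadInput(), 10)` invokes it once.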
8. Passing return value
Simple mathematical calculations can be executed directly in the return expression
and do not have to be cast into another type first.
RULES:
• Do simple mathematical calculations directly in the return expression.
• Readability is more important than optimization! Very complex operations must be
written without this optimization.
9. Passing function parameter
In general, functions have no limitation on the number of arguments (other than the
stack size).
From the point of view of embedded software, however, there is a difference. Every
microcontroller has a limited set of registers that is used for parameter passing.
Once these registers are filled with the parameters handed over, the remaining
parameters are copied to the stack. Access to parameters stored in registers is the
most efficient. This is discussed later.
There are 2 possibilities for passing parameters to functions: "call by value" or
"call by reference".
RULES:
• From the resource point of view, keep the number of parameters as small as
possible. In a lot of cases it is better to use an array or structure. This also
has to be considered carefully when a function interface is designed.
• For local variables "call by value" is most efficient, unless more than 1
return value is needed.
• For global variables "call by value" is most efficient, unless the variable should
also be written.
• Huge local structures should be avoided. Structures with 2 elements or less can
be handled like one normal variable.
The Infineon TriCore has 8 registers for parameter passing: 4 for values (D4-D7) and
4 for addresses (pointers, A4-A7). Further arguments are copied to the stack, as
mentioned above.
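As a sketch of the first rule, several read-only parameters can be grouped into one structure that is handed over by address. The structure, its members and the function are hypothetical.

```c
#include <stdint.h>

/* Hypothetical parameter set that rarely changes between calls. */
typedef struct
{
    int32_t gain_s32;
    int32_t offset_s32;
    int32_t min_s32;
    int32_t max_s32;
} Demo_Par_t;

/* Two arguments instead of five: one address and one value, both fit in registers. */
static int32_t Demo_Scale(const Demo_Par_t *par, int32_t in_s32)
{
    int32_t out_s32 = (in_s32 * par->gain_s32) + par->offset_s32;

    if (out_s32 < par->min_s32) { out_s32 = par->min_s32; }
    if (out_s32 > par->max_s32) { out_s32 = par->max_s32; }
    return out_s32;
}
```

Whether this is cheaper than separate value parameters depends on how often the members are read inside the function, so the interface should be checked case by case.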
10. Loops
Explanation:
RULES:
• Computations that do not depend on the loop should be moved outside the loop
body. Avoid something like this inside a loop:
CSam_loopCt_u16[stMskSelect_C*2+1] = …
• Declare the loop counter variable in the generic type.
• Do not access data stored in flash inside the loop; use a temporary variable set
up in front of the loop. For copying constant arrays (flash), memcpy functionality
could also be used. Take care that the size of the local array is not greater than
30 bytes, so that the stack size stays within its limit.
• Do not always try to solve the requirement with the smallest C code. The
smallest C code does not always lead to the optimal result. This has to be
considered carefully.
Variable size for loop counter where the loop instruction will be used:
With sint16 the loop instruction will also be used, but caution: the extr instruction
is necessary anyway. -> Use uint/sint.
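The first two rules can be combined in a short sketch. The names and the array size are assumptions; `volatile const` is used only to imitate a calibration value that is re-read on every access.

```c
#include <stdint.h>

#define DEMO_NUM_ELEM  8u

/* Hypothetical calibration value; volatile forces a real read on every access. */
static volatile const uint32_t Demo_stMskSelect_C = 3u;
static uint16_t Demo_loopCt_u16[DEMO_NUM_ELEM];

static void Demo_FillFrom(uint16_t value)
{
    /* Loop-invariant work is done once, in front of the loop. */
    const uint32_t first = (Demo_stMskSelect_C * 2u) + 1u;
    uint32_t cnt;                       /* loop counter in register size */

    for (cnt = first; cnt < DEMO_NUM_ELEM; cnt++)
    {
        Demo_loopCt_u16[cnt] = value;
    }
}
```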
11. AND / OR query in IF statement
An AND/OR query should be kept as short as possible. This saves code and runtime.
RULES:
• In case of an OR query: put the most probable case in the first condition.
• In case of an AND query: put the most improbable case in the first condition.
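The reason behind both rules is that C evaluates `||` and `&&` from left to right and stops as soon as the result is known. The sketch below makes this visible with a hypothetical helper that counts how many conditions were actually evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t Demo_numChecks_u32;   /* counts evaluated conditions */

/* Hypothetical condition helper that records that it was evaluated. */
static bool Demo_Check(bool result)
{
    Demo_numChecks_u32++;
    return result;
}

/* OR query: the likely-true condition comes first and ends the evaluation. */
static bool Demo_OrQuery(bool probable, bool rare)
{
    return Demo_Check(probable) || Demo_Check(rare);
}

/* AND query: the likely-false condition comes first and ends the evaluation. */
static bool Demo_AndQuery(bool improbable, bool probable)
{
    return Demo_Check(improbable) && Demo_Check(probable);
}
```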
12. Switch case
Switch case is very useful for the programmer to avoid deeply nested statements.
Different cases can be handled very easily with a switch case statement, and it is
easy to read. Switch case is the preferred solution for a state machine.
Nevertheless, a switch is not always the best solution. Most compilers create a
pointer table for fast access. This is effective for runtime but not for code
consumption when there are gaps between the cases.
RULES:
• A switch case should only be used if the cases follow each other, e.g. case 0,
case 1.
13. Redundant code or copy & p(w)aste
Overview:
RULES:
• For functionality that is needed multiple times, use sub-functions or loops.
Sometimes loops inside the sub-functions are also helpful.
• Consider whether all of the code has to be executed/needed in every configuration,
or whether the code could be conditionally executed/generated.
14. Use of Mem functions -copy, -set, fill
For effective copy actions it can be helpful to use fast functions like memcpy().
void *memcpy(void *dest, const void *source, size_t number_of_bytes);
The memcpy() function operates as efficiently as possible on memory areas. It does
not check for overflow of the receiving memory area. Specifically, memcpy() copies
number_of_bytes bytes from memory area source to memory area dest and returns a
pointer to dest. The source and destination areas must not overlap. To get the number
of bytes to be copied, the sizeof operator is helpful: sizeof returns the number of
bytes reserved for a variable or data type.
These rules are also valid for the MemFill and MemSet functions:
void *MemFill(void *dest, uint8 pattern, sint32 n)
void MemSet(uint32* xDest_pu32, uint32 xPattern_u32, uint32 numBytes_u32)
void *SrvB_MemFill(void *dest, uint8 pattern, sint32 n)
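A minimal sketch of the memcpy()/sizeof combination with the standard library function. The table name and its contents are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical constant table, e.g. located in flash on the target. */
static const uint16_t Demo_xTable_CA[4] = { 10u, 20u, 30u, 40u };

static uint32_t Demo_SumTable(void)
{
    uint16_t tmp[4];
    uint32_t sum = 0u;
    uint32_t idx;

    /* sizeof(tmp) is the size of the receiving array, so it cannot overflow. */
    memcpy(tmp, Demo_xTable_CA, sizeof(tmp));

    for (idx = 0u; idx < (sizeof(tmp) / sizeof(tmp[0])); idx++)
    {
        sum += tmp[idx];
    }
    return sum;
}
```

Taking the byte count from the destination array keeps the copy within bounds even if the source table grows later.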
15. Local copy of data
On most microcontrollers, data access to ROM is slow compared with access to RAM.
It is therefore necessary to reduce the number of accesses to slower memory to a
minimum. For data that is used more than once, it makes sense to create a local
copy, as access to RAM is much faster. This also applies to global variables, where
multiple store instructions can be avoided.
RULES:
• Make a local copy when a constant or global variable is used more than once in
the function. This is also necessary to get data consistency inside the function.
In EDC17 there are a lot of accesses to calibration data stored in flash: constants
(_C) and constant arrays (_CA). For calibration data every access initiates a read
from flash memory, which means the value cannot be held in a register.
In most cases the access to constants is random, which causes initial wait cycles
(up to 14 cycles) on every read. In some cases it also makes sense to copy constant
arrays to a local array via a fast memcpy() function.
RULES:
• Use local copies also for messages if they are used more than once in the same
function.
• Make local copies also for constant arrays (_CA) if the values are used multiple
times.
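The local-copy rule might be applied as in the following sketch. The names are hypothetical; `volatile` imitates calibration data and globals for which every use in the C code is a real memory access.

```c
#include <stdint.h>

/* Hypothetical calibration constant and global variable. */
static volatile const uint32_t Demo_xLimit_C = 100u;
static volatile uint32_t Demo_xInput_u32 = 150u;

static uint32_t Demo_Limit(void)
{
    /* Read each object once; all further uses work on the local copies. */
    const uint32_t limit = Demo_xLimit_C;
    const uint32_t input = Demo_xInput_u32;
    uint32_t result;

    if (input > limit)
    {
        result = limit;
    }
    else
    {
        result = input;
    }
    return result;
}
```

Because both values are read exactly once, the comparison and the assignment are guaranteed to use the same data, which is the consistency aspect mentioned in the rule.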
16. Function calls
As is known, (sub)functions are invoked via a call. The CPU has to save the registers
and load the address of the called function. Controllers implement this in different
ways. Some save the registers to the stack; others have a dedicated context save
area which can be accessed very fast. The loading of the program code also differs.
If no cache memory is implemented, the code has to be loaded directly from flash.
With a cache, the code is loaded from the cache (in case of a cache hit) or first from
flash into the cache. Unfortunately the CPU loads the whole cache line, which takes
the initial access time plus the time to load the line, which depends on the line size
and the bus width. This is an advantage when the code in the cache line is executed,
but not if one call or jump follows the other.
This is what should be explained here.
Bad example:
RULES:
The TriCore Metis and Leda CPUs have a context save area and a code cache of
16 kB and 8 kB respectively. The following shows the consequence of a deeply nested
call in EDC17, assuming that the code is not yet in the cache (cache miss).
call func_D
func_A (...) func_B (...) func_C (...) func_D (...)
17. DSP functionality
The TriCore family has a very high-performance digital signal processor on board. It
is very effective for programming fast routines with a maximum of performance. These
routines have to be implemented in assembler language.
Some examples of what the DSP functionality can be used for:
Generator:
e.g. Complex wave generator
Scalars:
e.g. 16 bit signed multiplication
e.g. Complex multiplication…..
Vectors:
e.g. Vector multiplication
e.g. Vector square difference…
Filters:
e.g. Magnitude square
e.g. FIR …
Examples of the assembler code and a detailed description can be found in the
Infineon DSP optimisation guide.
\\si9346\carpu$\extern\Info\user manuals\Application Notes\Application_Notes\General
18. Programming examples
18.1. Order of variables
bad:
uint8 XYZ_xTestval1_u8;
/* Gap 3Byte */
uint16* XYZ_adrData_pu16;
uint8 XYZ_xTestval2_u8;
/* Gap 1Byte */
uint16 XYZ_stMachine1_u16;
correct:
uint16* XYZ_adrData_pu16;
uint16 XYZ_stMachine1_u16;
uint8 XYZ_xTestval1_u8;
uint8 XYZ_xTestval2_u8;
This is mainly important for structures, which are often used multiple times!
wrong:
typedef struct
{
uint8 xTestval1_u8;
/* Gap 3Byte */
uint16* adrData_pu16;
uint8 xTestval2_u8;
/* Gap 1Byte */
uint16 stMachine1_u16;
} XYZ_Header_t;
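Memory Layout (one row per 32-bit word):
| 8-bit | gap (3 bytes)         |
| 32-bit                        |
| 8-bit | gap (1 byte) | 16-bit |
XYZ_Header_t struct_1;
XYZ_Header_t struct_2;
XYZ_Header_t struct_3;
XYZ_Header_t struct_4;
Four instances with 4 gap bytes each are a waste of 16 bytes.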
correct:
typedef struct
{
uint16* adrData_pu16;
uint16 stMachine1_u16;
uint8 xTestval1_u8;
uint8 xTestval2_u8;
} XYZ_Header_t;
Memory Layout (one row per 32-bit word):
| 32-bit                 |
| 16-bit | 8-bit | 8-bit |
This is also an advantage when you have to initialize the struct via a memcpy.
Hint: The EDC17 GNU compiler/linker can handle the alignment for
normal variables but not for structures. For future compatibility and to
avoid gaps in structures, it is recommended to sort all variables.
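The effect of the member order can be checked with sizeof. The sketch below repeats the two layouts under hypothetical type names and uses the `<stdint.h>` types in place of the project types; the resulting sizes depend on the pointer size and alignment rules of the platform the code is compiled for.

```c
#include <stddef.h>
#include <stdint.h>

/* Members in arbitrary order: the compiler inserts padding before aligned members. */
typedef struct
{
    uint8_t   xTestval1_u8;
    uint16_t *adrData_pu16;
    uint8_t   xTestval2_u8;
    uint16_t  stMachine1_u16;
} Demo_Unsorted_t;

/* Members sorted by size, largest first. */
typedef struct
{
    uint16_t *adrData_pu16;
    uint16_t  stMachine1_u16;
    uint8_t   xTestval1_u8;
    uint8_t   xTestval2_u8;
} Demo_Sorted_t;

/* Bytes saved per instance on the platform the code is compiled for. */
static size_t Demo_BytesSaved(void)
{
    return sizeof(Demo_Unsorted_t) - sizeof(Demo_Sorted_t);
}
```

With 32-bit pointers and natural alignment the unsorted structure occupies 12 bytes and the sorted one 8 bytes, which matches the 4 gap bytes per instance shown in the layouts above.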
18.2. Use of measurement points
XXX_stRng1_mp = stRng1;
XXX_stRng2_mp = stRng2;
XXX_stRng3_mp = stRng3;
...
XXX_stRngSpo_mp = stRng11;
What is the problem:
• 11 variables of boolean size are written as measurement points. This
generates the code and runtime for writing 11 measurement points.
Solution:
• By defining stRng as uint16 and handling it like a bit string, and likewise
XXX_stRng_mp, the measurement point can be written with 1 instruction. In this
case this saves 120 bytes of code, 9 bytes of RAM and runtime, depending on the
location of the measurement points.
18.3. Passing return values
RULES:
• Simple mathematical calculations can be executed directly at the handover of the
return value.
18.4. Passing Function Parameters
Here is an example of a function with call by value. The first mathematical operations
can be done directly on registers (see registers d4 and d5). For the next calculation
a load from the stack is necessary (ld.w %d15,[%sp]0), which costs additional code
and runtime.
var1_s32 = variable1_s32;
var1_s32 = var1_s32+variable2_s32;
5c: 0b 54 00 20 add %d2,%d4,%d5
var1_s32 = var1_s32+variable3_s32;
var1_s32 = var1_s32+variable4_s32;
var1_s32 = var1_s32+variable5_s32;
60: 58 00 ld.w %d15,[%sp]0
62: 42 62 add %d2,%d6
64: 42 72 add %d2,%d7
66: 42 f2 add %d2,%d15
var1_s32 = var1_s32+variable6_s32;
68: 58 01 ld.w %d15,[%sp]4
6a: 42 f2 add %d2,%d15
... return var1_s32;
}
An example of call by reference:
uint8 XXX_HBrgErrHndlr(
const uint numIC ,
const uint8 swtSelOvrCurrErr,
const uint8 stPsDiaDisbl,
sint16 *rPs,
const XXX_HBrgPar_t *HBrgParStruct,
XXX_HBrgStat_t *HBrgStatStruct,
XXX_HBrgLoc_t *HBrgLocStruct,
DSM_DFCType DFC_OvrCurr,
DSM_DFCType DFC_TempOvrCurr,
DSM_DFCType DFC_OvrTemp,
DSM_DFCType DFC_UndrVltg,
DSM_DFCType DFC_ShCirOvrLd,
DSM_DFCType DFC_ShCirBatt1,
DSM_DFCType DFC_ShCirBatt2,
DSM_DFCType DFC_ShCirGnd1,
DSM_DFCType DFC_ShCirGnd2,
DSM_DFCType DFC_OpnLd,
sint32 dT
)
RULES:
• From the resource point of view, keep the number of parameters as small as
possible. In a lot of cases it is better to use an array or structure. This also
has to be considered carefully when a function interface is designed.
• For global variables "call by value" is most efficient, unless the variable should
also be written.
• Only values which are changed in the function should be handed over via
reference.
18.5. Size of loop counter variable
The loop instruction has the advantage that it can be executed with 1 cycle latency
on repeat.
Inside the loop there should be no access to data stored in flash or to global
variables. Access times to flash are much higher than to temporary variables stored
in a register or on the stack. Calibration constants are loaded on each access, which
causes additional code and runtime.
uint cnt_uint;
for(cnt_uint=0;cnt_uint < 50;cnt_uint++)
50: 82 00 mov %d0,0
52: c5 02 31 00 lea %a2,31 <CSam_loops+0x21>
{
CSam_loopCt_u16[cnt_uint] = sum;
56: 91 00 00 30 movh.a %a3,0
5a: 8f 20 20 f0 sha %d15,%d0,2
5e: c2 10 add %d0,1
60: 91 00 00 40 movh.a %a4,0
64: d9 33 00 00 lea %a3,[%a3]0
68: 19 41 00 00 ld.w %d1,[%a4]0 <0 <CSam_UseStaticVar>>
6c: 10 3f addsc.a %a15,%a3,%d15,0
6e: 68 01 st.w [%a15]0,%d1
70: fc 23 loop %a2,56 <CSam_loops+0x46>
18.6. Break conditions of loops
Bad example:
while (numErr < NUMERR_C || numErr < STNUM_C || state < STATE_C)
{ ....
Solution:
• Try to merge the query: numErr is tested twice.
• Make local copies of the _C constants.
RULES:
• Do not use global variables or constants as break conditions; make a local copy
first.
• Values defined with "#define Max_value 50" can be used without
difficulty.
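One possible way to apply the solution to the bad example is sketched below. The two tests on numErr are equivalent to a single test against the larger of the two limits, and all constants are copied once in front of the loop. The constant values, the `Demo_` names and the loop body are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical calibration constants; volatile forces a memory read on every access. */
static volatile const uint32_t Demo_NUMERR_C = 5u;
static volatile const uint32_t Demo_STNUM_C  = 8u;
static volatile const uint32_t Demo_STATE_C  = 3u;

static uint32_t Demo_CountIterations(void)
{
    /* Local copies: each constant is read exactly once. */
    const uint32_t numErrLim  = Demo_NUMERR_C;
    const uint32_t stNumLim   = Demo_STNUM_C;
    const uint32_t stateLimit = Demo_STATE_C;
    /* (numErr < A) || (numErr < B) is the same as numErr < max(A, B). */
    const uint32_t errLimit   = (numErrLim > stNumLim) ? numErrLim : stNumLim;
    uint32_t numErr = 0u;
    uint32_t state = 0u;
    uint32_t iterations = 0u;

    while ((numErr < errLimit) || (state < stateLimit))
    {
        numErr++;
        state++;
        iterations++;
    }
    return iterations;
}
```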
Bad Example:
while (a > b || c > d || e < f)
23e: 91 00 00 20 movh.a %a2,0
242: 91 00 00 30 movh.a %a3,0
246: 19 20 00 00 ld.w %d0,[%a2]0
24a: 19 3f 00 00 ld.w %d15,[%a3]0 <0 <CSam_UseStaticVar>>
24e: 3f 0f 16 80 jlt.u %d15,%d0,27a <CSam_loops+0x1f0>
252: 91 00 00 f0 movh.a %a15,0
256: 19 f0 00 00 ld.w %d0,[%a15]0 <0 <CSam_UseStaticVar>>
25a: 91 00 00 f0 movh.a %a15,0
25e: 19 ff 00 00 ld.w %d15,[%a15]0 <0 <CSam_UseStaticVar>>
262: 3f 0f 0c 80 jlt.u %d15,%d0,27a <CSam_loops+0x1f0>
266: 91 00 00 f0 movh.a %a15,0
26a: 19 f0 00 00 ld.w %d0,[%a15]0 <0 <CSam_UseStaticVar>>
26e: 91 00 00 f0 movh.a %a15,0
272: 19 ff 00 00 ld.w %d15,[%a15]0 <0 <CSam_UseStaticVar>>
276: 7f f0 26 80 jge.u %d0,%d15,2c2 <CSam_loops+0x238>
27a: 19 23 00 00 ld.w %d3,[%a2]0
27e: 19 32 00 00 ld.w %d2,[%a3]0
{
CSam_loopCt_u16[cnt_uint++] = 1;
282: 8f 21 20 f0 sha %d15,%d1,2
286: 91 00 00 20 movh.a %a2,0
! This is quite a lot of code before the loop as well as in the loop body. The
next code example does the same with fewer break conditions.
RULES:
• Keep the break condition as simple and short as possible!
• Make local copies of global variables and constants.
18.7. Pointer arithmetic in loops
This is a short example of the benefit of using pointer arithmetic. It starts with a
very common, but inefficient, way of initializing an array: XXCoef1 is first
initialized via individual assignments and then via pointer arithmetic.
...
XXCoef1[0] = XXX_NUMCYL03_AA0;
8a: 7b 00 f8 13 movh %d1,16256
8e: 59 01 00 00 st.w [%a0]0,%d1
XXCoef1[1] = XXX_NUMCYL03_AA1;
XXCoef1[2] = XXX_NUMCYL03_AA2;
92: 7b 00 04 14 movh %d1,16448
96: 59 01 00 00 st.w [%a0]0,%d1
9a: 7b 00 00 f4 movh %d15,16384
9e: 59 0f 00 00 st.w [%a0]0,%d15
XXCoef1[3] = XXX_NUMCYL03_AA3;
XXCoef1[4] = XXX_NUMCYL03_AA4;
a2: 7b 00 0a 14 movh %d1,16544
a6: 59 01 00 00 st.w [%a0]0,%d1
aa: 7b 00 08 f4 movh %d15,16512
ae: 59 0f 00 00 st.w [%a0]0,%d15
XXCoef1[5] = XXX_NUMCYL03_AA5;
XXCoef1[6] = XXX_NUMCYL03_AA6;
b2: 7b 00 0e 14 movh %d1,16608
b6: 59 01 00 00 st.w [%a0]0,%d1
ba: 7b 00 0c f4 movh %d15,16576
be: 59 0f 00 00 st.w [%a0]0,%d15
XXCoef1[7] = XXX_NUMCYL03_AA7;
XXCoef1[8] = XXX_NUMCYL03_AA8;
c2: 7b 00 11 14 movh %d1,16656
c6: 59 01 00 00 st.w [%a0]0,%d1
ca: 7b 00 10 f4 movh %d15,16640
ce: 59 0f 00 00 st.w [%a0]0,%d15
void CSam_ForLoopPointer()
{
uint cnt_uint;
/* declare 2 pointer */
real32 *konst;
real32 *ram;
/* loop over */
for (cnt_uint = 0; cnt_uint <=8; cnt_uint++)
80: a0 82 mov.a %a2,8
{
*ram++ = *konst++;
82: 44 3f ld.w %d15,[%a3+]
84: 64 4f st.w [%a4+],%d15
86: fc 2e loop %a2,82
}
}
Now only 6 instructions are used, 2 of 32-bit width and 4 of 16-bit width = 16 bytes
of code.
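The listing above leaves out where the two pointers are set up. A self-contained sketch of the same technique follows; the array names and contents are hypothetical, and `float` stands in for the project type `real32`.

```c
#include <stdint.h>

#define DEMO_NUM_COEF  9u

/* Hypothetical coefficient table (flash on the target) and its RAM destination. */
static const float Demo_Coef_CA[DEMO_NUM_COEF] =
    { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f, 9.0f };
static float Demo_Coef[DEMO_NUM_COEF];

static void Demo_CopyCoef(void)
{
    const float *konst = Demo_Coef_CA;  /* read pointer */
    float *ram = Demo_Coef;             /* write pointer */
    uint32_t cnt;

    /* Post-increment addressing: one load and one store per element. */
    for (cnt = 0u; cnt < DEMO_NUM_COEF; cnt++)
    {
        *ram++ = *konst++;
    }
}
```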
18.8. Local copy of variables and constants
tmp_sum = sum;
for(cnt_uint=0;cnt_uint < 50;cnt_uint++)
72: 82 00 mov %d0,0
74: c5 02 31 00 lea %a2,31 <CSam_loops+0x21>
{
CSam_loopCt_u16[cnt_uint] = tmp_sum;
78: 8f 20 20 f0 sha %d15,%d0,2
7c: 91 00 00 50 movh.a %a5,0
80: c2 10 add %d0,1
82: d9 55 00 00 lea %a5,[%a5]0 <0 <CSam_UseStaticVar>>
86: 10 5f addsc.a %a15,%a5,%d15,0
88: 68 01 st.w [%a15]0,%d1
One solution for an array is to copy the constant/constant array to a local variable
or struct via a fast copy loop like memcpy. If the same constant or array is used
multiple times, the runtime benefit multiplies accordingly.
void Comp_FuncZ_proc(void)
{
mylocalVar = Value1;
better:
A temporary variable in generic size.
The local copy saves 10 to 12 cycles here, which is nearly half the runtime!
Optimisation:
//or typeof(XXXX_stErrMskSelect_CA)
uint8 tmp_stErrMskSelect_CA[sizeof(XXXX_stErrMskSelect_CA)];
SrvB_MemCopy8( tmp_stErrMskSelect_CA,
XXXX_stErrMskSelect_CA, sizeof(XXXX_stErrMskSelect_CA));
Hint: There is also a Cpu_MemCopy16 and 32 which copies int16/int32 sizes very fast.
For, while, do while: which loop should be used depends on the programming task
to be solved. From the resource point of view any loop can be used; the
compiler generates nearly the same code for all of them.
18.9. AND OR query in If statement
An example of how it could work better if the constants were defined bitwise
(1, 2, 4, ...) so that a mask can be built:
if ( (stPhSig == 1) ||
(stPhSig == 2) ||
(stPhSig == 3) ||
(stPhSig == TIO_TO_MANY_EDGES) ||
(XXX_numPhEdgLstIntNone == 1) ||
(XXX_numPhEdgLstIntNone == 2) ||
(XXX_numPhEdgLstIntNone == 3) ||
(XXX_numPhEdgLstIntNone == TIO_TO_MANY_EDGES)
)
{.....
...and also NOT like this:
if ((((stOvrTemp & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stOvrTemp & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirOvrLd & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirOvrLd & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirBatt1 & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirBatt1 & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirBatt2 & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirBatt2 & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirGnd1 & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirGnd1 & DSM_ST_DEB_PRELIM_HEAL_MSK)) ||
((stShCirGnd2 & DSM_ST_DEB_PRELIM_DEF_MSK) ||
(stShCirGnd2 & DSM_ST_DEB_PRELIM_HEAL_MSK))) != FALSE)
uint stAll = 0;
if (stAll)
{...
RULES:
• Try to build a mask so that fewer queries are needed.
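The step between `uint stAll = 0;` and `if (stAll)` is not shown above. One way to build such a mask is sketched here; the mask values and the function are hypothetical, and only four of the status words are used to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real masks come from the DSM headers. */
#define DEMO_PRELIM_DEF_MSK   0x01u
#define DEMO_PRELIM_HEAL_MSK  0x02u

static bool Demo_AnyPrelim(uint32_t stOvrTemp, uint32_t stShCirOvrLd,
                           uint32_t stShCirBatt1, uint32_t stShCirGnd1)
{
    /* Collect all status words first, then test the combined mask once. */
    const uint32_t stAll = stOvrTemp | stShCirOvrLd | stShCirBatt1 | stShCirGnd1;

    return (stAll & (DEMO_PRELIM_DEF_MSK | DEMO_PRELIM_HEAL_MSK)) != 0u;
}
```

The bitwise OR replaces the chain of logical ORs, so only one comparison and one branch remain.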
18.10. Switch case
case 6:
returnvariable_s32 = 107;
break;
case 23:
returnvariable_s32 = 230;
break;
case 39:
returnvariable_s32 = 305;
break;
default:
returnvariable_s32 = 560;
break;
}
return returnvariable_s32;
}
In this case a table of 40 pointers has been generated, but only 4 are effectively
used. This is a waste of 36 x 4 = 144 bytes of code.
switch (switchstatement_u32)
{
case 0:
returnvalue_u32 = 4;
break;
case 1:
returnvalue_u32 = 17;
break;
case 2:
returnvalue_u32 = 23;
break;
case 3:
returnvalue_u32 = 35;
break;
case 4:
returnvalue_u32 = 42;
break;
case 5:
returnvalue_u32 = 56;
break;
case 6:
returnvalue_u32 = 64;
break;
case 7:
returnvalue_u32 = 76;
break;
default:
returnvalue_u32 = 100;
break;
}
return (returnvalue_u32);
}
However, the compiler does not treat the repeated write to the same variable
specially. By using a constant array in which every constant corresponds to one
return value, a lot of code can be saved:
return (returnvalue_u32);
}
The code of the first switch case generates 34 instructions with a size of 96 bytes;
the code of the if/else only 7 instructions with a size of 20 bytes plus 8 bytes of
constants. This is a factor of ~3 in code size.
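The body of the array-based variant is not shown above. A sketch of how it could look, reusing the return values of the switch with cases 0 to 7 and its default value, is given here; the array and function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_NUM_CASES  8u

/* One constant per case 0..7, same values as in the switch above. */
static const uint8_t Demo_xRet_CA[DEMO_NUM_CASES] =
    { 4u, 17u, 23u, 35u, 42u, 56u, 64u, 76u };

static uint32_t Demo_Lookup(uint32_t switchstatement_u32)
{
    uint32_t returnvalue_u32;

    if (switchstatement_u32 < DEMO_NUM_CASES)
    {
        returnvalue_u32 = Demo_xRet_CA[switchstatement_u32];
    }
    else
    {
        returnvalue_u32 = 100u;   /* former default branch */
    }
    return (returnvalue_u32);
}
```

The 8 uint8 constants correspond to the 8 bytes of constants mentioned above; the range check takes over the role of the default branch.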
RULES:
• A switch case should only be used if the cases follow each other e.g. case 0,
case 1.
18.11. Copy and p(w)aste
Here is an example where a lot of code with nearly the same functionality is
written/copied multiple times.
stBal[ST_ENG1]=(bool)((SrvX_IpoGroupCurveS16(dSrchRslt,
XXX_qSetMax1_GCUR)>= SetFlt)&&
(SrvX_IpoGroupCurveS16(dSrchRslt,
XXX_qSetMin1_GCUR) <= SetFlt));
stBal[ST_ENG11]=(bool)((SrvX_IpoGroupCurveS16(dSrchRslt,
XXX_qSetMax11_GCUR) >= SetFlt)&&
(SrvX_IpoGroupCurveS16(dSrchRsltQnt,
XXX_qSetMin11_GCUR) <= SetFlt));
...use a loop.
for (x=0;x<loopCnt;x++)
{
stQntUnBal[x]=(bool)((SrvX_IpoGroupCurveS16(dSrchRsltQnt,
ptoMaxCurve[x]) >= SetFlt)&&
(SrvX_IpoGroupCurveS16(dSrchRsltQnt,
ptoMinCurve[x]) <= SetFlt));
}
Another example: a local function is called 10 times in succession.
&uBattCDHystStateRng1 ,
(bool)SrvB_GetBit(stEng, BT_PRJ_RNG_CHK1),
&stRng1 ,
stBal ,
RANGE1
);
{
.......
}
stCR2 = XXX_CheckRange(…
stCR10 = XXX_CheckRange(
This local function is called 10 times consecutively, and only 4 of the handover
parameters differ. This causes overhead in code and runtime for the additional
load/store instructions.
RULES:
• Do not generate functionality that is needed multiple times via copy and paste.
• For functionality that is needed multiple times, use sub-functions or loops.
Sometimes loops inside the sub-functions are also helpful.
Solution:
• Call XXX_CheckRange only once and loop over all necessary ranges
inside the function.
}
Rules:
• No arithmetic operations allowed with boolean type.
• Boolean variables must only have the values TRUE or FALSE
• No pointer operations on/with boolean variables.
void XXX_cylinder_proc(void)
{
#if (NUM_ZYL >= 1)
ignition_time [0] = (ignition_time[0] * correction_factor[0]) >> 16;
ignition_delay [0] = (ignition_time[0]) + ((ignition_delay [0] *
correction_factor[0]) >> 16);
injection_time [0] = (injection_time [0] * correction_factor[0]) >> 16;
injection_delay [0] = (injection_time[0]) + ((injection_delay [0] *
correction_factor[0]) >> 16);
#endif
... until 6
#if (NUM_ZYL >= 6)
ignition_time [5] = (ignition_time [5] * correction_factor[5]) >> 16;
ignition_delay [5] = (ignition_time[5]) + ((ignition_delay [5] *
correction_factor[5]) >> 16);
#endif
}
In the worst case this code is expanded 6 times. With a loop the code is needed only once.
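A possible loop version is sketched here. The array types, the value of NUM_ZYL and the local variable factor are assumptions, since the original does not show the declarations. The sketch also assumes that all four values are computed for every cylinder.

```c
#include <stdint.h>

#define NUM_ZYL 6u   /* assumed number of cylinders (compile-time constant) */

/* Array types and sizes are assumptions; the original does not show them. */
uint32_t ignition_time[NUM_ZYL];
uint32_t ignition_delay[NUM_ZYL];
uint32_t injection_time[NUM_ZYL];
uint32_t injection_delay[NUM_ZYL];
uint32_t correction_factor[NUM_ZYL];

void XXX_cylinder_proc(void)
{
    uint32_t i;

    for (i = 0u; i < NUM_ZYL; i++)
    {
        /* Local copy: the factor is used four times per cylinder. */
        const uint32_t factor = correction_factor[i];

        ignition_time[i]   = (ignition_time[i] * factor) >> 16;
        ignition_delay[i]  = ignition_time[i] + ((ignition_delay[i] * factor) >> 16);
        injection_time[i]  = (injection_time[i] * factor) >> 16;
        injection_delay[i] = injection_time[i] + ((injection_delay[i] * factor) >> 16);
    }
}
```

The loop body exists once in the code, independent of NUM_ZYL, and the #if blocks are no longer needed.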
The 'volatile' keyword forces the compiler to read a variable each time it is used
in the C code. Calibration data are declared as volatile. Consider the following code,
where x and y are volatile:
if (y > x)
{
z = x;
}
else
{
z = y;
}
The processor will read the x and y variables twice, since they are 'volatile'. A macro
that first copies its arguments into temporaries reads each of them only once:
#define SrvB_Min(x, y) \
({ \
typeof(x) _res; \
typeof(x) _x=(x); \
typeof(y) _y=(y); \
_res = (((_y) > (_x)) ? (_x) : (_y)); \
})
Other examples:
#define EEEBD_GET_BLOCK_TYPE ((EEEBD_GET_BLKFLAGS & \
EEEBD_BLKTYP_MSK) >> \
EEEBD_BLKTYP_BP)
#define EEEPDD_INACTIVE_SECTOR \
((uint8)((Eeepdd_GlobVars_s.xActRdSec_u8 == 0) ? 1 : 0))
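The double read of volatile operands described above can be made visible with a small sketch. This is not the service library implementation: NAIVE_MIN, Min_s16, ReadValue and readCount are hypothetical names, and the inline function is shown as a portable alternative to a macro with temporaries. ReadValue stands in for a volatile read and counts how often it is executed.

```c
#include <stdint.h>

/* Naive macro: each argument appears twice in the expansion, so an
 * operand can be read twice (once for the comparison, once as result). */
#define NAIVE_MIN(x, y) (((y) > (x)) ? (x) : (y))

/* Portable alternative: the arguments are evaluated once at the call. */
static inline int16_t Min_s16(int16_t x, int16_t y)
{
    return (y > x) ? x : y;
}

static uint8_t readCount;   /* counts simulated reads */

/* Stands in for a volatile read (hypothetical helper). */
static int16_t ReadValue(int16_t value)
{
    readCount++;
    return value;
}
```

With NAIVE_MIN(ReadValue(3), ReadValue(7)) the helper runs three times; with Min_s16(ReadValue(3), ReadValue(7)) it runs twice, once per operand.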
• Use global/static bool only if really necessary, e.g. for frequently accessed
variables.
• Variables defined in the generic type (register size) give optimal access
performance. Inside a function use the generic type/register size wherever
possible; cast only when really necessary, e.g. on handover to a function or at
the end of a function for the return value.
• Temporary variables should be declared in the generic type; for global and
static variables the size has to be considered too.
EDC17:
• Since TriCore does not directly support real64, this type should be avoided
in order to save memory and runtime.
Order of variables
Initialization of variables
EDC17:
• Initialize static/global variables whose initial values are known at start-up at
the location where they are defined.
e.g. static uint16 my_staticExampleVar = 10;
• For global/static variables that are initialized from a calibration value, use
the macro initValueRAM.
• Try to use a mechanism, e.g. conditional compilation, to switch measurement points
off after testing.
• Avoid the use of static bool; try to use a bit string that holds several pieces
of information in one variable.
EDC17:
• Single bit variables can be used to store binary states efficiently.
• Bit variables can only be located in normal RAM and protected RAM, as they
require absolutely addressable memory.
• Bit variables must not be used as constants, calibration values, local variables
or inside a structure.
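The bit string rule can be sketched in portable C as follows. The flag names and the helper functions are hypothetical; in a real project the bit access functions of the service library (SrvB_GetBit appears earlier in this document) would probably be used instead.

```c
#include <stdint.h>

/* Hypothetical bit positions inside one status byte. */
#define ST_FLAG_READY   0u
#define ST_FLAG_ERROR   1u
#define ST_FLAG_LIMITED 2u

static uint8_t stFlags;   /* one byte holds up to eight binary states */

/* Set one bit in the status byte. */
static void SetFlag(uint8_t *status, uint8_t bitPos)
{
    *status |= (uint8_t)(1u << bitPos);
}

/* Clear one bit in the status byte. */
static void ClearFlag(uint8_t *status, uint8_t bitPos)
{
    *status &= (uint8_t)~(1u << bitPos);
}

/* Return 1 if the bit is set, otherwise 0. */
static uint8_t GetFlag(uint8_t status, uint8_t bitPos)
{
    return (uint8_t)((status >> bitPos) & 1u);
}
```

Three separate static bool variables would occupy at least three bytes of RAM; here the same information fits into one byte.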
Locating rules:
EDC17:
• Code: Locate in internal flash:
- Service library e.g. SrVB_XXX
- Operating system
- Hardware encapsulation e.g. Adc_Get
- n-sync, 1ms, 2ms called processes
- Interrupt called processes e.g. Uart
- measurement points
• Variables in SPRAM:
- Only measurement points with uint16 and uint32 size, uint8 is NOT possible.
• From the resource point of view, keep the number of parameters as small as
possible. In many cases it is better to use an array or structure. This also
has to be considered carefully when a function interface is designed.
• For local variables "Call by value" is most efficient, unless there is more than 1
return value needed.
• For global variables "Call by value" is most efficient, unless the variable should
also be written.
• Huge local structures should be avoided. Structures with 2 elements or less can
be handled like one normal variable.
Loops
• Computations that do not depend on the loop should be moved outside the loop body.
Avoid something like this inside a loop:
CSam_loopCt_u16[stMskSelect_C*2+1] = …
• Do not access data stored in flash inside the loop; use a temporary variable in
front of the loop. For copying constant arrays (flash), memcpy functionality could
also be used. Take care that the local array is not larger than 30 bytes, so that
the stack size stays within its limit.
• Do not always try to solve the requirement with the smallest C code. The
smallest C code does not always lead to the optimal result. This has to be
considered carefully.
• In case of OR query: Put the most probable case in the first statement
• In case of AND query: Put the most improbable case in the first statement
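The first two loop rules can be sketched as follows. The function AccumulateSamples, the constant SAMPLE_COUNT and the array size are hypothetical, and stMskSelect_C only stands in for a calibration value in flash; the right-hand side of the original assignment is not shown, so a simple sum is assumed.

```c
#include <stdint.h>

#define SAMPLE_COUNT 8u   /* hypothetical loop length */

/* Stands in for a calibration value located in flash (assumption). */
static const uint8_t stMskSelect_C = 1u;
static uint16_t CSam_loopCt_u16[4];

void AccumulateSamples(const uint16_t *samples)
{
    /* The index does not change in the loop: compute it once, in front. */
    const uint8_t idx = (uint8_t)((stMskSelect_C * 2u) + 1u);
    uint16_t sum = 0u;   /* temporary instead of repeated array access */
    uint8_t i;

    for (i = 0u; i < SAMPLE_COUNT; i++)
    {
        sum = (uint16_t)(sum + samples[i]);
    }

    CSam_loopCt_u16[idx] = sum;   /* single write after the loop */
}
```

The flash value is read once and the index is computed once, instead of once per iteration.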
Switch Case
• A switch case should only be used if the case values are consecutive, e.g.
case 0, case 1.
• For functionality that is needed several times use sub-functions or loops.
Sometimes loops inside the sub-functions are also helpful.
• Make a local copy when a constant or global variable is used more than once in
the function. This is also necessary for data consistency inside the function.
EDC17:
• Use local copies also for messages.
• Also make local copies of _CA values if they are used several times.
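A minimal sketch of the local copy rule, with hypothetical names (nEng_global, ScaleSpeed) and an assumed threshold of 1000: the global is read once, so the comparison and the calculation are guaranteed to use the same value even if another task changes the global in between.

```c
#include <stdint.h>

/* Hypothetical global that may be changed by another task. */
volatile uint16_t nEng_global = 0u;

uint16_t ScaleSpeed(void)
{
    /* One read; every later use sees the same value. */
    const uint16_t nEng = nEng_global;
    uint16_t result;

    if (nEng > 1000u)
    {
        result = (uint16_t)(nEng - 1000u);
    }
    else
    {
        result = 0u;
    }

    return result;
}
```

Without the local copy the value could change between the comparison and the subtraction, and the global would be loaded twice.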
Function calls
It is helpful to see which assembler code the compiler generates. This can be done
with the software build environment.
How to compile a single file:
• Open a cmd window.
• Start TBCON in a view in the medc17 folder.
example: M:\reference_c01_rdd2fe\medc17>
• Type the command: swb build.c_to_o --file=dir\filename.c --dest=tmp
For more information about the software build, type swb --help >swbhelp.txt (the help
is too big to read on the monitor).
20.2. WinRtm
WinRtm is a Windows-based tool for measuring and analysing the runtimes of functions
and the accesses to variables. It is not "only" a runtime measurement tool; it also
analyses the following (for each module!):
Static view:
• Flash: intern, extern, code size, calibration data, constant.
• RAM: Variable, SPRAM, RAM DMI, RAM DMU, stack
Dynamic view:
• Runtime: Runtime min/max/average, call distance min/max, interrupts,
reentrant calls.
Coding styleguide:
http://www.intranet.bosch.com/ds/esq/100_topics/202_processportal/300_engineering/
303_coding/001_Guidelines/styleguide_en.pdf
Naming conventions:
http://www.intranet.bosch.com/ds/esq/100_topics/202_ProcessPortal/300_Engineering
/311_ComponentDesign/001_Guidelines/PMT_SE_NamingConventions.pdf
\\si9346\carpu$\extern\Info\user manuals\Application
Notes\Application_Notes\General\dsp_opt_guide_part_2_v164.pdf