

AIAA SciTech Forum 10.2514/6.2018-0398
8–12 January 2018, Kissimmee, Florida
2018 AIAA Information Systems-AIAA Infotech @ Aerospace

Differential Algebra software library with automatic code generation for space embedded applications

Mauro Massari∗, Pierluigi Di Lizia†, and Francesco Cavenago‡


Politecnico di Milano, Milan, Italy, 20156

Alexander Wittig§
University of Southampton, Southampton, United Kingdom, SO17 1BJ.

Differential Algebra (DA) techniques have become increasingly popular in various aerospace engineering applications over the past 5-10 years. They allow computing polynomial expansions of functions representing a dynamical system in terms of initial conditions or parameters. The calculation of these polynomials is computationally expensive, but can often replace many iterations of a pointwise computation or provide valuable higher-order information otherwise not readily available. DA allows reducing the computational burden associated with the onboard implementation of the high-order Kalman filters needed to increase the level of autonomy in active debris removal (ADR) missions. In this paper we describe the implementation of the DA Core Engine 2.0 (DACE 2.0), which is entirely developed in C11 with a powerful modern C++ interface. Current space processors developed in Europe (LEON-3, LEON-4) run at speeds of hundreds of MHz, providing limited computational power on board current and near-future spacecraft. Software targeting embedded hardware onboard spacecraft is subject to strong limitations in both coding and resource utilization, chiefly the need to use C only. In order to partly retain the advantages of operator overloading and object-oriented programming for writing mathematical expressions, an automatic translation of DACE 2.0 C++ code into pure C11 code has been implemented. The resulting implementation is tested in a processor-in-the-loop (PIL) test bench using simple problems representative of the computational resources needed by a high-order filter.

I. Introduction
Differential Algebra (DA) techniques have become increasingly popular in various aerospace engineering applications
over the past 5-10 years. They allow computing polynomial expansions of functions representing a dynamical system
in terms of initial conditions or parameters. The calculation of these polynomials is computationally expensive, but
can often replace many iterations of a pointwise computation or provide valuable higher order information otherwise
not readily available. In particular, in ground based uncertainty propagation and sensitivity analysis this has been
shown to reduce the overall computational cost significantly when compared to alternative techniques in a wide variety
of fields ranging from particle accelerator physics to astrodynamics [1–6]. Another interesting application of DA in
engineering is high-order DA Kalman filters [7]. DA allows reducing the computational burden associated with the
onboard implementation of such high-order Kalman filters, which are needed to increase the level of autonomy in active
debris removal (ADR) missions. A particularly challenging problem is the estimation of the relative pose and the
prediction of the target attitude, which are crucial for safe proximity operations [8]. The problem involves non-linear
propagation of the full 6DOF pose (i.e., both the translational and rotational state), which benefits greatly from an
efficient implementation of DA, allowing high-order expansions to be obtained with limited computational time. In this
paper we describe the implementation of the DA Core Engine 2.0 (DACE 2.0), which is entirely developed in C11 with
a powerful modern C++ interface.
Current space processors developed in Europe (LEON-3, LEON-4) run at speeds of hundreds of MHz, providing
limited computational power on board current and near-future spacecraft. This constrains the level of spacecraft
∗ Assistant Professor, Department of Aerospace Science and Technology, Via La Masa 34, mauro.massari@polimi.it
† Assistant Professor, Department of Aerospace Science and Technology, Via La Masa 34, pierluigi.dilizia@polimi.it
‡ Ph.D Candidate, Department of Aerospace Science and Technology, Via La Masa 34, francesco.cavenago@polimi.it
§ Lecturer in Astronautics, Astronautics Group, Faculty of Engineering and the Environment, Building 13.

Copyright © 2018 by Mauro Massari, Pierluigi Di Lizia, Francesco Cavenago, Alexander Wittig.
Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.
Fig. 1 Evaluation of the expression 1/(1 + x) in C r (0) and DA arithmetic.

autonomy because even relatively simple autonomous operations require complex computations to be performed in near
real time. This paper shows the implementation of DA algorithms on board using hardware mimicking the limited
computational power available in space hardware. While ESA’s LEON-4 is currently the most relevant processor for this
application, our assessment is based on an ARM platform, the BeagleBone Black. This platform is still representative
of the limited computational power available, but is easier to develop for thanks to wider community support.
Any software targeting embedded hardware onboard spacecraft is subject to strong limitations in
both coding and resource utilization. In particular, the coding language is constrained by the availability of validated
compilers for the target CPU architecture and Real-Time Operating System (RTOS), while the resource utilization is
mainly constrained by guidelines intended to guarantee predictable memory and computational power use. Given those
constraints, it is clear that the C++ language, and partly also the C11 language, have features that conflict with those
requirements. This leads to the need to write the entire embedded code that uses the DACE 2.0 library with the low-level
C11 API, thus losing the advantages of the object-oriented approach made available by the C++ interface of the
DACE 2.0 library.
In order to partly maintain the advantages given by operator overloading and object-oriented programming for
writing mathematical expressions, an automatic translation of DACE 2.0 C++ code into pure C11 code has been
implemented.
The resulting implementation is tested in a processor-in-the-loop (PIL) test bench using simple problems which
are representative of the computational resources needed by a high-order filter. Comparisons with an already available
implementation of DA on desktop computers are shown.

II. Differential Algebra


Differential Algebra techniques allow solving analytical problems through an algebraic approach [2]. Similar to the
computer representation of real numbers as Floating Point (FP) numbers, DA allows the representation and manipulation
of functions on a computer. Each sufficiently often differentiable function f is represented by its Taylor expansion
around an expansion point truncated at an arbitrary finite order. Without loss of generality, we choose 0 as the expansion
point. Algebraic operations on the space of truncated Taylor polynomials are defined such that they approximate the
operations on the function space C r (0) of r times differentiable functions at 0. More specifically, each operation is
defined to result in the truncated Taylor expansion of the correct result computed on the function space C r (0). This
yields the so-called Truncated Power Series Algebra (TPSA) [1].
To illustrate the process, consider Figure 1. The expression 1/(x + 1) is evaluated once in C r (0) (top) and then
in DA with truncation order 3. Starting with the identity function x, we add one to arrive at the function x + 1,
the representation of which is fully accurate in DA as it is a polynomial of order 1. Continuing the evaluation the
multiplicative inversion is performed, resulting in the function 1/(1 + x) in C r (0). As this function is not a polynomial
any more, it is automatically approximated in DA arithmetic by its truncated Taylor expansion around 0, given by
1 − x + x^2 − x^3. Note that, by definition of the DA operations, the diagram for each single operation commutes. That is
to say the same result is reached by first Taylor expanding a C r (0) function (moving from the top to the bottom of the
diagram) and then performing the DA operation (moving from left to right), or by first performing the C r (0) operation
and then Taylor expanding the result.
In addition to algebraic operations, the DA framework can be endowed with natural differentiation and integration
operators, completing the structure of a differential algebra. Intrinsic functions, such as trigonometric and exponential
functions, are built from elementary algebraic operations [2]. This way, Taylor expansions of arbitrary sufficiently
smooth functions given by some closed-form expression can be computed fully algebraically in a computer environment.

An important application of DA in engineering applications is the expansion of the flow ϕ(t; x0 ) of an Ordinary
Differential Equation (ODE) to arbitrary order with respect to initial conditions, integration times and system parameters.
The following is a short summary of the underlying concept. For a more complete introduction to DA, as well as a fully
worked out illustrative example of a DA based ODE integrator using a simple Euler step, see [7].
Consider the initial value problem

    ẋ = f (x, t),   x(t0) = x0,   (1)
and its associated flow ϕ(t; x0 ). By means of classical numerical integration schemes, such as Runge-Kutta or multi-step
methods, it is possible to compute the orbit of a single initial condition x0 using floating point arithmetic on a computer.
Starting instead from the DA representation of an initial condition x0 , and performing all operations in the numerical
integration scheme in DA arithmetic, DA allows propagating the Taylor expansion of the flow around x0 forward in time,
up to the desired final time tf, yielding a polynomial expansion of ϕ(tf; x0 + δx0) up to arbitrary order.
The conversion of standard explicit integration schemes to their DA counterparts is rather straightforward. One
simply replaces all operations performed during the execution of the scheme by the corresponding DA operations. Step
size control and error estimates are performed only on the constant part of the polynomial, i.e. the reference trajectory
of the expansion point. The result is an automatic Taylor expansion of the result of the numerical method (i.e. the
numerical approximation to the flow) with respect to any quantity that was initially set to a DA value.
The main advantage of the DA-based approach is that there is no need to derive, implement and integrate variational
equations in order to obtain high-order expansions of the flow. As this is achieved by merely replacing algebraic
operations on floating-point numbers by DA operations, the method is inherently ODE independent. Furthermore, an
efficient implementation of DA allows us to obtain high-order expansions with limited computational time. In the
following section we describe our implementation, the DA Core Engine 2.0 (DACE 2.0).

III. DA Core Engine 2


The DA Core Engine 2.0 (DACE 2.0) is the fundamental software layer providing an implementation of basic
DA operations. As these algorithms form the basis of all higher level algorithms implemented on top of them, these
operations are critical to performance and must be implemented efficiently. On the other hand, the interface to these
operations must be kept as intuitive as possible to facilitate fast and easy use in programs making use of DA operations.
In the following we will first present the specific requirements and resulting design decisions for the DACE 2.0. This
is followed by a short description of the DACE 2.0 functionality and user interface, as well as results of a performance
comparison on various computation platforms, both desktop and embedded. Lastly we briefly describe an autocoding
method that can generate pure C code with static memory management suitable for onboard execution.

A. Requirements
The following are the requirements set for the DACE 2.0 in order to guarantee its versatility and applicability in a
variety of application scenarios:
Speed: the DACE 2.0 core routines form the basis of all DA algorithms implemented on top. It is therefore
important to optimize these routines for speed across a variety of typical use cases.
Usability: whether developing new algorithms or modifying existing ones, the interface to the DACE 2.0 routines
must be easy and intuitive to use for the developer to allow rapid development of DA enabled applications.
Portability: the DACE 2.0 library must be employable in embedded, desktop and high performance computing
(HPC) environments alike. It must at a minimum run on current versions of Linux, Windows and Mac OS X on
both Intel x86 and ARM hardware using both 32 and 64 bit instruction sets.
Memory management: dynamic allocation of memory is desired in desktop and HPC environments, but a static
option must be available since many embedded systems require static memory allocation.
Parallelizability: code written with the DACE 2.0 library must be fully thread-safe and not restrict the use of
third-party parallelization techniques such as OpenMP or native threads. Facilities for efficient interprocess
communication (IPC) must be provided to efficiently send DA objects across e.g. network streams.
Maintainability: the library must be easy to build and install for users as well as easy to maintain and extend for
developers.
For each of these requirements a trade-off of various possible options has been performed and we selected the
following final design parameters for the library:

Programming language: C11 for the core routines, C++11 for the interface
Memory management: three options available at compile time (static, dynamic, legacy)
Build system: CMake
Documentation: Doxygen inline comments
License: Apache License 2.0 [9]
Version Control System: Git hosted on GitHub [10]
The programming language was chosen to allow maximum portability of the code while maintaining high execution
speed. While interpreted languages provide a high level of platform independence, they are also slower than compiled
languages. Other compiled languages, such as Fortran, were ruled out due to lack of available compilers on a wide
range of platforms as well as difficulties of interfacing with higher level languages. C compilers, on the other hand, are
available for virtually any environment and typically very stable. The C11 standard is well documented and includes
several advanced features regarding e.g. parallelization (such as thread local storage modifiers). It is now well supported
across a wide range of compilers. Furthermore, the choice of C meets the requirement for maintainability as it has a
large base of capable developers.
The memory management provides three different options which can be transparently switched by a compile time
option of the library. The interface to all three alternatives is transparent to the user. The first option is a purely statically
allocated memory model, which at compile time reserves a fixed amount of memory for a fixed number of DA objects.
This model is particularly suitable for onboard implementations, where dynamic memory management is not allowed. The
statically allocated memory is then managed internally by the DACE 2.0 memory management routines. The second
model instead uses the more common approach of dynamically allocating each DA object as needed using calls to the C
library malloc function. It allows for an unlimited number of DA objects to be allocated, subject only to the constraints
of the available memory. This is the most flexible option which also usually leads to the smallest memory footprint as it
only allocates as many objects as really needed for the execution. The third option is a legacy mode, in which a static
chunk of memory is allocated at first, from which new DA objects are then internally reserved as in the static case.
Should this initial allocation run out, a new larger chunk of memory is reserved dynamically to allow for further DA
allocations. While very marginally faster than fully dynamic memory management, this model is not recommended and
will probably be removed in the future.
While the use of a build system, such as CMake, is optional, it greatly simplifies the maintenance and portability of
the code. CMake is available for a wide variety of platforms and it can generate project files for many backends such as
Makefiles or Visual Studio Projects. It also allows the selection of build time options such as the memory subsystem to
use.
The API documentation of the DACE 2.0 is automatically generated based on formatted comments in the source
code. The system used is the Doxygen code documentation system [11]. This provides both increased maintainability as
well as usability for users and developers alike. The documentation is generated either in LaTeX report form or directly
as HTML suitable for inclusion on websites.
We decided to release the entire project as Open Source under the Apache 2.0 license. This license allows free use
of the DACE 2.0, in both commercial and non-commercial projects without forcing the users to release their own code.
Furthermore, it allows easy contributions to the project by third party developers. This simplifies the maintenance and
future development of the software both for the users as well as developers. The goal is to grow a community of users
and contributors to the DACE 2.0 ecosystem.
To that end, we published the complete public code of the DACE 2.0 on GitHub [10]. GitHub is the most common
platform in open source software development, providing not just a source versioning system but also an ecosystem of
related tools such as wiki pages and bug trackers. The large number of developers familiar with the system lowers the
threshold for contributing to the project.

B. DACE C Core and C++ Interface


The DACE 2.0 C core implements all of the fundamental DA objects and routines. We will not present the algorithms
used here, instead we refer to [12] for the basic algorithm of storing and multiplying polynomials and [2] for the
algorithms for the high order expansion of the intrinsic functions such as sin and exp.
The DACE 2.0 core routines include the basic operations in DA arithmetic. The routines provided by this component
fall into these categories: DACE initialization and state, Memory management, Error handling, DA creation, DA
coefficient access, Input/Output, Arithmetic operations, Intrinsic functions, Norms and estimations, DA evaluation.
A complete overview of all routines is shown in Figure 2. Each of these functions, along with its calling convention,
is described in the DACE 2.0 API reference documentation [13], which is generated directly from the commented source
code.

Fig. 2 Overview of all functions implemented in the DACE 2.0 C library (public routines currently exported by the DACE core library):

Initialization and State: Query DACE version, Initialization, Set cutoff, Get cutoff, Set truncation order, Get truncation order, Get machine epsilon, Get max order, Get max variables
Memory Management: Allocate DA object, Deallocate DA object
Error Handling: Query error state, Reset error state
DA creation: Create constant, Create variable, Create monomial, Create random DA, Create a copy of DA
DA access: Extract constant part, Extract linear part, Extract coefficient, Set coefficient, List all coefficients
Input/Output: Read from strings, Write to strings, Export as BLOB*, Import from BLOB*
Arithmetic Operations: Addition (DA, FP), Subtraction (DA, FP), Multiplication (DA, FP), Division (DA, FP), Integration, Derivation, Truncate, Round, Modulo
Norms & Estimation: Absolute Value, Norm, Order sorted norm, Order estimation, Bounding
Evaluation: Partial plug, Evaluation tree
Intrinsic Functions: Square, (n-th) Power, Square Root, Inverse square root, (n-th) Root, Exponential, Natural logarithm, Arbitrary logarithm, Sine, Cosine, Tangent, Arcsine, Arccosine, Arctangent, Arctangent2, Hyperbolic Sine, Hyperbolic Cosine, Hyperbolic Tangent, Hyperbolic Arcsine, Hyperbolic Arccosine, Hyperbolic Arctangent

* BLOB = Binary large object (lossless binary representation of the data in a DA)
We forgo a detailed discussion of each of the functions, referring the reader to the API documentation [13]. Instead,
we provide a simple example of a complete C and C++ program using simple DA operations. This program computes
the Taylor expansion of the expression f (x) = sin(2.5x) around the point x0 = 1.5.
 1  #include <DA/dacecore.h>
 2  #include <stdlib.h>
 3  #include <stdio.h>
 4
 5  int main()
 6  {
 7      daceInitialize(10, 1);            // 10th order, 1 variable
 8
 9      DACEDA x0, fx;
10      daceAllocateDA(&x0, 0);           // 0 = automatic memory size
11      daceAllocateDA(&fx, 0);
12
13      // Create DA variable x0 = 1.0*dx1 + 1.5
14      daceCreateVariable(&x0, 1, 1.0);
15      daceAddDouble(&x0, 1.5, &x0);
16
17      // Evaluate sin(2.5*x0)
18      daceMultiplyDouble(&x0, 2.5, &x0);
19      daceSine(&x0, &fx);
20
21      dacePrint(&fx);                   // print result
22
23      daceFreeDA(&x0);                  // free memory
24      daceFreeDA(&fx);
25
26      return 0;
27  }

Line 7 initializes the DACE 2.0 library with a maximum computation order and number of variables. This call
should always be the first call of a DA enabled program to ensure the library and its internal structures are correctly
initialized. In normal operation this routine should be called only once; upon subsequent calls, all previously
allocated DA objects are automatically invalidated.
Line 9 declares the basic DA objects of type DACEDA, which are then allocated by a call to daceAllocateDA. This
routine will allocate the memory for the DA object according to the memory model selected at compile time of the
DACE 2.0 library. The second argument passed to this function indicates the amount of memory to allocate; in the case
of 0, automatic sizing is used, large enough for any DA object at the current order and number of variables.
Lines 14 and 15 create a new independent DA variable with index 1 (i.e. dx1 ∗ ) in the variable x0, followed by
adding 1.5 to it. This places the complete expression 1.5 + dx1 into x0.
Lines 18 and 19 then evaluate the expression sin(2.5x) with x0 into fx, which is then printed to the screen in line 21.
Finally lines 23 and 24 free the allocated memory of the two DA variables.
While relatively straightforward to read, it is obvious that these routines are not intended to be used directly by a
user. Coding more complex expressions is tedious using functional notation and requires the careful allocation (and
deallocation) of temporary variables. To provide more convenient access to the DA routines, interfaces to other,
higher-level programming languages can be implemented. The default interface included in the DACE 2.0
library is a C++ interface. This interface implements a DA arithmetic data type that encapsulates DA routines and
handles all aspects of the DACE 2.0 core routines. This C++ interface allows the user to write code in a natural way as
for any of the built in arithmetic data types of the C++ language.
The following example shows a listing of C++ code performing the same computation as the C code shown
previously.

 1  #include <DA/DA.h>
 2  #include <iostream>
 3
 4  using namespace DACE;
 5
 6  int main()
 7  {
 8      DA::init(10, 1);                  // 10th order, 1 variable
 9
10      DA x0 = 1.5 + DA(1),              // immediately assign x0 = 1.5 + dx1
11         fx = sin(2.5*x0);              // and evaluate sin(2.5*x0)
12
13      std::cout << fx << std::endl;     // print result
14
15      return 0;
16  }

Clearly, this interface is much more intuitive. While line 8 is still the same first command to initialize the DACE
2.0 library to a given order and number of variables, the following lines are much simpler. First, all of the memory
management is gone, as it is performed automatically within the C++ class: memory is allocated and deallocated along
with the DA objects, which now encapsulate DACEDA library objects. Furthermore, the familiar mathematical notation in
lines 10 and 11 can be used just as with built-in arithmetic types such as double. The constructor DA(int) called in
line 10 instantiates a new DA object representing the n-th identity variable, i.e. dx1 for DA(1)† . All that remains is to
print the result to the standard output using the normal C++ stream I/O framework.
It is recommended that for normal programming tasks the C++ interface be used exclusively. It is much simpler,
prevents many common user errors, and encapsulates all functionality available within the DACE 2.0 library plus
additional higher level functionality not found in the DACE 2.0 C core functions.

C. C++ to C Autocoder
As stated in Sect. I, any software targeting embedded hardware onboard spacecraft is subject to
strong limitations in both coding and resource utilization. In particular, the coding language is constrained by the
availability of validated compilers for the target CPU architecture and Real-Time Operating System (RTOS), while the
resource utilization is mainly constrained by guidelines intended to guarantee predictable memory and computational
power use. The latter is particularly important when execution is scheduled in a hard real-time environment, where
each task has limited, well-determined resources. This second constraint also affects the choice of the coding
language, as the requirement on the predictability of resource utilization demands that the compiler does not perform
code optimizations that significantly alter the original code structure. Such optimizations can be tolerated as long as
they can be fine-tuned and the constraint on predictability is maintained (e.g., loop unrolling, function
inlining). Moreover, the requirements on predictability pose a very stringent constraint when dealing with memory
allocation. As the memory footprint of the executed code must always be predictable and limited to the memory
resources allocated by the RTOS scheduler, dynamic memory allocation is usually forbidden.
∗ Note that DA variables in the DACE 2.0 library historically use one based indexing, i.e. dx1 is the first independent variable
† Note also here the one based indexing, i.e. dx1 or DA(1) is the first independent variable, while DA(0) represents the constant function 1.0.


Fig. 3 C++ Autocoder scheme

Given those constraints, it is clear that the C++ language, and partly also the C11 language, have features that conflict
with those requirements. In particular, the C++ language is heavily based on the idea of dynamic memory allocation,
hidden behind the framework of object creation and destruction, making the memory footprint very difficult to predict.
Moreover, the Standard Template Library (STL) is very powerful, but its memory management is very implementation
dependent, making it very hard to guarantee a constant memory footprint across different compilers or architectures.
For this reason it is quite difficult to find validated C++ compilers for typical embedded architectures, which makes it
impossible to use the C++ language for DA code on embedded hardware. In the C language, dynamic memory allocation
is available as a feature and is often suggested to improve performance, but its use is not enforced. Moreover, even
when used, it is usually well exposed in the code and never hidden behind the scenes. This makes it possible to write
more predictable code, both in terms of memory usage and computational resource utilization.
Following those considerations, the use of C++ is not permitted for the development of embedded code, while the C
language constrained to static memory allocation is. This leads to the need to write all code that uses the
DACE 2.0 library using the low-level C API, thus losing the advantages of the object-oriented approach
made available by the C++ interface of the DACE 2.0 library.
In order to partly maintain the advantages given by operator overloading and object-oriented programming for
writing mathematical expressions, an automatic translation of DACE 2.0 C++ code into pure C code has been
implemented. As the implementation of a full C++ to C translator is beyond the scope of this work, the translation
problem is limited to the DACE 2.0 C++ interface classes. No other user-defined C++ classes and
objects are addressed by the developed C++ to C autocoder. This assumption allows us to simplify the problem and
to use a more tailored approach.
The most widespread approach to automatic code translation is essentially based on the idea of developing a code parser
capable of translating any atomic operation of the original code into the equivalent atomic operation in the target
code. It should be clear even to the novice C++ programmer that this cannot be done easily when translating an
object-oriented language into a procedural one: there are constructs of the original object-oriented language that have
no equivalent in the target procedural language. Therefore, a different approach should be considered.
The proposed approach takes advantage of the fact that the autocoder is limited to the DACE 2.0 C++ interface classes, which allows the problem to be tackled in an unusual way. The main idea is to let the code autocode itself instead of actually performing DA operations. The concept is illustrated in Fig. 3, which shows the interfaces between the user-developed C++ code and the modules of the DACE 2.0 library: the user C++ code uses the high-level API provided by the DACE 2.0 C++ interface, which in turn is interfaced to the C11 core.
The autocoder relies on the fact that the C++ code has no direct link to the C11 core, as the C++ interface takes care of translating high-level operations into the low-level DA operations implemented in the core. The C++ interface can therefore be replaced by a custom implementation that keeps the same high-level API but, instead of issuing actual calls to the C11 core, generates the equivalent C11 code.
In this way, the autocoder is essentially a reimplementation of the C++ interface that generates the C11 code associated with the DACE C++ classes during the execution of a dedicated executable, compiled against the autocoder interface instead of the DACE C++ interface.
This approach is highly tailored to the C++ interface of the DACE library and requires constant maintenance to keep it in sync with the actual C++ interface. However, it automatically translates all the user C++ code, including functions and STL objects. This is possible because the autocoder actually runs the original code sequentially while generating the equivalent C11 code, thus automatically inlining all function calls, regardless of whether they are plain functions or constructors and members of STL templates.
The automatic inlining of function calls allows functions and templates to be handled automatically, but it limits the code that can be successfully autocoded. In particular, since the original code is executed sequentially, only the branch actually taken is autocoded when a branch condition is encountered, resulting in incomplete C11 code. The same problem affects variable-size loops, because the autocoder generates the number of iterations associated with the particular instance of the running code. To solve these problems, the original code must be slightly modified to include instructions to the autocoder for the correct translation of if branches and variable-size loops.
With these slight modifications, the C++ DA code can be successfully translated into C11 code. However, this holds only for DA class objects, while other user-defined data types and standard types are simply evaluated. This is not a real limitation, as the purpose of the autocoder is to allow the easy implementation of mathematical expressions.
Many operations in the DA class are tightly related to double and integer variables, which must therefore be autocoded as well. To this end, custom implementations of double and integer objects, DAdouble and DAint, have been added both to the C++ interface and to the autocoder. In the C++ interface these new data types are simply defined as double and int, while in the autocoder they contain the proper instructions for generating the equivalent C11 code.
As reported above, one of the main issues to be addressed by the autocoder is the generation of statically allocated memory. The chosen approach also solves this problem easily: the code implemented in the constructors and destructors of the autocoder automatically keeps track of the number of memory locations needed for DA objects, DAdouble, and DAint. At the same time, proper indexing is maintained during code generation, so that the actual memory location of each corresponding object is always known. At the end of the autocoding process, the static memory declarations are added at the beginning of the generated code.
Considering the C++ code example shown in section B, the code needs to be slightly modified to include autocoder instructions, as reported here:
#include <DA/DA.h>
#include <iostream>

#ifdef AUTOCODE
#include "DA/coder.h"
#endif

using namespace DACE;

int main()
{
#ifdef AUTOCODE
    coder::currentCoder.maxOrder = 10;
    coder::currentCoder.maxVariables = 1;
#endif

    DA::init(10, 1);                    // 10th order, 1 variable

    DA x0 = 1.5 + DA(1),                // immediately assign x0 = 1.5 + dx1
       fx = sin(2.5*x0);                // and evaluate sin(2.5*x0)

#ifndef AUTOCODE                        // printing not implemented on purpose
    std::cout << fx << std::endl;       // print result
#endif

#ifdef AUTOCODE
    coder::createMain();
    coder::push_front("#include <dacecore.h>\n");
    coder::printCode();
#endif

    return 0;
}

As the example shows, the autocoder instructions are just included at the beginning and at the very end of the code, and can conveniently be wrapped in preprocessor directives, allowing the same source file to be used for both the autocoder version and the original C++ code. Note that I/O operations have intentionally not been implemented in the autocoder yet, so all I/O operations need to be tailored directly. Once compiled, this autocoder version, instead of actually running, just produces the following C11 version of the code:
#include <dacecore.h>

int main()
{
    static double da_double[0];
    static int da_int[2];
    static DACEDA da_da[3];

    daceInitialize(10, 1);

    for( unsigned int i = 0; i < sizeof(da_da)/sizeof(da_da[0]); i++ )
        daceAllocateDA( &da_da[i], 0 );

    da_int[0] = 1;
    da_int[1] = 10;

    daceCreateVariable( &da_da[0], 1, 1.0e+00 );
    daceAddDouble( &da_da[0], 1.5e+00, &da_da[1] );
    daceMultiplyDouble( &da_da[1], 2.5e+00, &da_da[0] );
    daceSine( &da_da[0], &da_da[2] );

    for( unsigned int i = 0; i < sizeof(da_da)/sizeof(da_da[0]); i++ )
        daceFreeDA( &da_da[i] );
}
The code is very similar to the C11 code reported in section B; the only difference is the handling of DA, double, and int objects, which are all allocated in arrays at the beginning. The generated code has been carefully compared with the original code, both in terms of accuracy and computational time.

IV. Performance
This section assesses the computational efficiency of the DACE 2.0 library. The test is performed by comparing DACE 2.0 against the DACE 1.0 Fortran-based library, whose efficiency has already been proved [5]. In the computational efficiency test, both DACE 1.0 and DACE 2.0 are compiled with the same compiler suite, flags, and optimizations, and run on the same platform (same operating system and hardware). Subsequently, the same tests are performed on the selected embedded platform.
Considering the general dynamical system

$$\dot{x} = f(x) \qquad (2)$$

the test is based on multiple evaluations of the Picard–Lindelöf operator applied to the right-hand side of Eq. (2), i.e.,

$$x(t) = x_0 + \int_{t_0}^{t} f\big(x(\tau)\big)\,\mathrm{d}\tau \qquad (3)$$
where $x_0$ is the initial condition. More specifically, the test program performs the following steps. For each order $k$:
1) Initialize the first $i$ components of the initial vector $x_0$ as DA variables
2) Evaluate the operator (3) $10^4$ times in its recursive form:

$$x_{j+1} = x_0 + \int_{t_0}^{t} f\big(x_j(\tau)\big)\,\mathrm{d}\tau, \qquad j = 0, 1, \ldots, 10^4. \qquad (4)$$
The test is executed at orders $k = 1, \ldots, 10$ and numbers of variables $i = 1, \ldots, 6$, considering the following dynamical systems:
• Two-body dynamics:

$$\ddot{\mathbf{r}} = -\frac{\mu}{r^3}\,\mathbf{r} \qquad (5)$$

• Restricted three-body problem (RTBP):

$$\ddot{x} = 2\dot{y} + \Omega_x, \qquad \ddot{y} = -2\dot{x} + \Omega_y, \qquad \ddot{z} = \Omega_z \qquad (6)$$

Fig. 4 DACE 2.0/DACE 1.0 execution time ratio on Core i7 - RTBP

with

$$\Omega = \frac{1}{2}\left(x^2 + y^2\right) + \frac{1-\mu}{r_1} + \frac{\mu}{r_2} \qquad (7)$$

where $r_1$ and $r_2$ are the distances of the spacecraft from the two primaries and $\mu = m_2/(m_1 + m_2)$, with $m_1$ and $m_2$ the masses of the primaries.
The test has been performed by comparing DACE 1.0 and DACE 2.0 on a desktop workstation equipped with an Intel Core i7-4820K @ 3.7 GHz with 32 GB of RAM, and has been repeated 100 times for each order to avoid inconsistencies due to possible execution delays.
The execution times in seconds of both DACE 1.0 and DACE 2.0 on the Intel Core i7 are reported in Table 1, while Fig. 4 reports the ratios in the case of the RTBP, clearly showing that the DACE 2.0 implementation consistently gains in terms of performance.
To evaluate the performance of DACE 2.0 on embedded hardware, the above tests have also been performed on an embedded platform considered representative of flight hardware in terms of performance. The selected platform is the BeagleBone Black (BBB) Single Board Computer (SBC), based on an ARMv7 processor (Cortex-A8) @ 1 GHz with 512 MB of RAM. The results are reported in Table 2 for the case of the RTBP, while Fig. 5 reports the ratio with respect to the execution on the Intel Core i7. The execution time on the BBB is almost constantly 23 times slower, with the exception of very high orders and numbers of variables. In those cases, the in-memory representation of DA objects is characterized by a high number of coefficients, thus reaching the bottleneck caused by the limited memory transfer speed of the BBB.

Fig. 5 DACE 2.0 execution time ratio on BBB/Core i7 - RTBP

Table 1 DACE 1.0 and DACE 2.0 execution times on Core i7-4820K for the two-body problem and the RTBP

DACE 1.0 [s] Two Body DACE 1.0 [s] RTBP


Ord.\Var. 1 2 3 4 5 6 1 2 3 4 5 6
1 0.0178 0.0182 0.0191 0.0193 0.0195 0.0200 0.0408 0.0421 0.0450 0.0450 0.0457 0.0462
2 0.0232 0.0264 0.0295 0.0320 0.0337 0.0351 0.0533 0.0618 0.0700 0.0758 0.0802 0.0849
3 0.0305 0.0386 0.0454 0.0527 0.0602 0.0664 0.0706 0.0867 0.1193 0.1368 0.1549 0.1726
4 0.0416 0.0620 0.0806 0.1020 0.1248 0.1474 0.0960 0.1415 0.2240 0.2903 0.3337 0.4165
5 0.0571 0.0969 0.1395 0.1888 0.2513 0.3187 0.1388 0.2491 0.4475 0.5817 0.7168 0.9128
6 0.0773 0.1572 0.2500 0.3615 0.5079 0.7059 0.1812 0.3795 0.7888 1.1296 1.5561 2.0720
7 0.1041 0.2500 0.4231 0.6425 0.9824 1.4999 0.2449 0.5647 1.4186 2.1292 3.0874 4.4797
8 0.1388 0.4008 0.7340 1.1733 1.8851 2.9501 0.3305 0.8811 2.5087 3.9484 5.9790 9.0047
9 0.1811 0.5662 1.1326 1.9738 3.3750 5.6008 0.4358 1.3136 4.1430 6.9933 11.0536 17.7365
10 0.2365 0.8297 1.7891 3.3179 5.9740 10.4119 0.5720 2.0207 6.7544 11.9440 19.9880 33.8035
DACE 2.0 [s] Two Body DACE 2.0 [s] RTBP


Ord.\Var. 1 2 3 4 5 6 1 2 3 4 5 6
1 0.0166 0.0157 0.0160 0.0161 0.0164 0.0164 0.0325 0.0334 0.0356 0.0352 0.0354 0.0352
2 0.0187 0.0205 0.0217 0.0232 0.0243 0.0258 0.0399 0.0439 0.0495 0.0520 0.0544 0.0576
3 0.0227 0.0273 0.0313 0.0361 0.0420 0.0496 0.0492 0.0593 0.0785 0.0895 0.1017 0.1190
4 0.0295 0.0443 0.0594 0.0689 0.0867 0.1097 0.0636 0.0903 0.1432 0.1765 0.2168 0.2794
5 0.0379 0.0616 0.0904 0.1295 0.1819 0.2457 0.0841 0.1382 0.2622 0.3541 0.4711 0.6501
6 0.0486 0.0952 0.1578 0.2500 0.3879 0.5444 0.1099 0.2166 0.4684 0.6982 0.9995 1.4943
7 0.0631 0.1440 0.2706 0.4654 0.7595 1.1820 0.1442 0.3267 0.8424 1.3608 2.0723 3.2555
8 0.0828 0.2231 0.4505 0.8694 1.4812 2.3809 0.1927 0.5512 1.5220 2.6052 4.1434 6.7589
9 0.1061 0.3310 0.7333 1.4801 2.6957 4.6089 0.2458 0.7470 2.4847 4.6252 7.7690 13.3410
10 0.1350 0.4843 1.1721 2.5442 4.7965 8.6870 0.3146 1.1163 4.0582 8.1050 14.2496 25.5805
DACE 2.0/DACE 1.0 [-] Two Body DACE 2.0/DACE 1.0 [-] RTBP
Ord.\Var. 1 2 3 4 5 6 1 2 3 4 5 6
1 0.9306 0.8619 0.8388 0.8363 0.8370 0.8179 0.7977 0.7934 0.7911 0.7813 0.7743 0.7618
2 0.8080 0.7773 0.7376 0.7259 0.7195 0.7367 0.7483 0.7106 0.7066 0.6861 0.6782 0.6789
3 0.7425 0.7074 0.6898 0.6859 0.6977 0.7465 0.6968 0.6842 0.6578 0.6542 0.6566 0.6897
4 0.7084 0.7148 0.7379 0.6757 0.6946 0.7444 0.6632 0.6384 0.6391 0.6081 0.6496 0.6708
5 0.6627 0.6363 0.6480 0.6860 0.7239 0.7710 0.6059 0.5551 0.5858 0.6087 0.6572 0.7123
6 0.6291 0.6057 0.6312 0.6915 0.7638 0.7713 0.6063 0.5708 0.5938 0.6181 0.6423 0.7212
7 0.6064 0.5761 0.6396 0.7243 0.7731 0.7881 0.5888 0.5785 0.5939 0.6391 0.6712 0.7267
8 0.5970 0.5565 0.6137 0.7410 0.7857 0.8071 0.5829 0.6256 0.6067 0.6598 0.6930 0.7506
9 0.5860 0.5846 0.6474 0.7498 0.7987 0.8229 0.5641 0.5687 0.5997 0.6614 0.7029 0.7522
10 0.5710 0.5837 0.6551 0.7668 0.8029 0.8343 0.5500 0.5525 0.6008 0.6786 0.7129 0.7567

Table 2 DACE 2.0 execution time on BeagleBone Black for the RTBP

DACE 2.0 [s] on BBB RTBP DACE 2.0 RTBP BBB/Core i7 [-]
Ord.\Var. 1 2 3 4 5 6 1 2 3 4 5 6
1 0.6334 0.7306 0.7768 0.8132 0.8236 0.8232 19.4695 21.8735 21.8085 23.1055 23.2726 23.3896
2 0.8758 0.9997 1.1588 1.2311 1.3270 1.4523 21.9668 22.7759 23.4162 23.6800 24.3925 25.2084
3 1.0662 1.4062 1.9015 2.2367 2.7370 3.3142 21.6809 23.6939 24.2217 24.9975 26.9041 27.8456
4 1.3755 2.0819 3.3329 4.7674 6.1359 8.5701 21.6110 23.0486 23.2771 27.0117 28.3074 30.6757
5 1.7621 3.0311 5.8619 8.8551 13.1926 19.6565 20.9587 21.9250 22.3604 25.0081 28.0030 30.2349
6 2.4253 4.9748 11.2660 19.3777 31.1600 51.9222 22.0726 22.9628 24.0521 27.7534 31.1754 34.7463
7 2.9575 7.1722 19.6038 33.8612 61.7352 109.7210 20.5106 21.9532 23.2701 24.8834 29.7908 33.7037
8 3.6807 10.5890 32.0123 63.6502 122.8068 236.9549 19.1079 19.2100 21.0339 24.4343 29.6397 35.0583
9 4.5696 15.6793 50.4966 109.6337 228.7901 489.2236 18.5892 20.9883 20.3233 23.7036 29.4490 36.6707
10 5.9314 21.5240 82.6450 187.4629 425.2064 989.7722 18.8522 19.2809 20.3650 23.1293 29.8398 38.6924

In any case, considering that the desktop computer used for the comparison is quite powerful, the performance obtained on the BBB shows that a real-time implementation of DA-based algorithms is feasible, especially for high-order filtering, which is typically limited to orders up to the 4th.

V. Conclusions
A full C/C++ implementation of differential algebra has been realized in the second version of the DACE software, which will be made available as open source. DACE 2.0 has been installed on a BeagleBone Black Single Board Computer, based on an ARMv7 processor (Cortex-A8) @ 1 GHz with 512 MB of RAM, which is deemed representative of the limited computational performance available on current onboard processors. In addition, an autocoder has been implemented to produce embeddable versions of DACE-based C++ codes. A processor-in-the-loop testbench has then been set up to assess the performance of DACE 2.0. Overall, the execution time on the BBB is almost constantly 23 times slower, with the exception of very high orders and numbers of variables; in those cases the in-memory representation of DA objects contains a high number of coefficients, reaching the bottleneck caused by the limited memory transfer speed of the BBB. Nevertheless, the results show that the performance achievable on embedded hardware is reasonable for real-time application of DA algorithms on space hardware.

References
[1] Berz, M., “The method of power series tracking for the mathematical description of beam dynamics,” Nuclear Instruments and
Methods A258, 1987.

[2] Berz, M., Modern Map Methods in Particle Beam Physics, Academic Press, 1999.

[3] Armellin, R., Di Lizia, P., Bernelli-Zazzera, F., and Berz, M., “Asteroid close encounters characterization using differential
algebra: the case of Apophis,” Celestial Mechanics and Dynamical Astronomy, Vol. 107, 2010, pp. 451–470.

[4] Morselli, A., Armellin, R., Di Lizia, P., and Bernelli-Zazzera, F., “A high order method for orbital conjunctions analysis:
sensitivity to initial uncertainties,” Advances in Space Research, Vol. 55, 2015, pp. 311–333.

[5] Rasotto, M., Morselli, A., Wittig, A., Massari, M., Di Lizia, P., Armellin, R., Valles, C., and Ortega, G., “Differential algebra
space toolbox for nonlinear uncertainty propagation in space dynamics,” Proceedings of the 6th International Conference on
Astrodynamics Tools and Techniques, 2016.

[6] Massari, M., Di Lizia, P., and Rasotto, M., “Nonlinear Uncertainty Propagation in Astrodynamics Using Differential Algebra
and Graphics Processing Units,” Journal of Aerospace Information Systems, Vol. 14, No. 9, 2017, pp. 493–503.

[7] Valli, M., Armellin, R., Di Lizia, P., and Lavagna, M., “Nonlinear mapping of uncertainties in celestial mechanics,” Journal of
Guidance Control and Dynamics, Vol. 36, 2013, pp. 48–63.

[8] Wertz, J., and Bell, R., “Autonomous rendezvous and docking technologies: status and prospects,” Proceedings of the SPIE,
Vol. 5088, 2003, pp. 20–30.

[9] The Apache Software Foundation, “Apache License 2.0,” 2004. URL https://www.apache.org/licenses/.

[10] DACE Development Group, “DA Core Engine 2.0,” 2017. URL https://github.com/dacelib/dace.

[11] van Heesch, D., “Doxygen,” 2016. URL https://www.doxygen.org/.

[12] Berz, M., “Algorithms for Higher Order Automatic Differentiation in many Variables with Applications to Beam Physics,”
Automatic Differentiation of Algorithms: Theory, Implementation, and Application, edited by A. Griewank and G. F. Corliss,
SIAM, Philadelphia, PA, 1991, pp. 147–156.

[13] DACE Development Group, DA Core Engine 2.0 API Reference Manual, 2017.
