$$x[N_i] = \begin{cases} 1, & N_i \text{ is instrumented} \\ 0, & N_i \text{ is uninstrumented} \end{cases} \qquad (1)$$
In a reducible CFG, an inner-most loop L has exactly one back edge that turns it into a cyclic graph, and the loop has a unique entry node and a unique exit node, denoted by N_entry and N_exit, respectively. The WCET of the exit node N_exit is given by

$$w(N_{exit}) = C(N_{exit}) + x[N_{exit}] \cdot overhead_{instru} \qquad (2)$$
where C(N_i) is a constant denoting the execution time of a BB N_i without instrumentation, and overhead_instru denotes the timing overhead imposed by the instrumentation. Since the instrumentation for each BB is identical, we can use a constant to represent the overhead here; if some other CFC algorithm incurs different overheads for different BBs, the formulation can easily be extended to take them into consideration.
Using V_loop to represent the set of BBs in the loop, the WCET of the program path leading from a node N_i to N_exit must be greater than or equal to the WCET of any path leading from any successor of N_i in V_loop, i.e.,

$$\forall N_i \in V_{loop} \setminus \{N_{exit}\},\; (N_i, N_{succ}) \in E \qquad (3)$$
$$w(N_i) \ge w(N_{succ}) + C(N_i) + x[N_i] \cdot overhead_{instru} \qquad (4)$$

where E represents the set of edges in the CFG.
Additional constraints can be added iteratively until we reach the loop entry node N_entry, to obtain w(N_entry) as the WCET of a single loop iteration. Due to the properties of a reducible CFG, we can always reduce any loop L into a super node, whose WCET is computed as w(N_entry) multiplied by the loop bound, which is part of the WCET flow facts obtained from the SWEET analysis tool, i.e.,

$$w_{loop} = bound_{loop} \cdot w(N_{entry}). \qquad (5)$$
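For a fixed instrumentation vector x, the recurrence of (2)–(4) and the loop-bound multiplication of (5) can be evaluated bottom-up over the loop body with the back edge removed. The following Python sketch illustrates this; the toy CFG, costs, and function names are illustrative assumptions, not the paper's implementation:

```python
# Sketch: evaluate the per-iteration WCET recurrence (2)-(4) bottom-up,
# then apply the loop-bound multiplication of (5).
# succ: successor map of the loop-body DAG (back edge removed);
# C[n]: execution time of BB n; x[n]: 1 if instrumented; O: overhead_instru.
def loop_wcet(succ, C, x, O, entry, exit_, bound):
    w = {exit_: C[exit_] + x[exit_] * O}  # equation (2)
    def compute(n):
        if n not in w:
            # equation (4): w(n) >= w(s) + C(n) + x[n]*O for every successor s;
            # the tightest value satisfying all constraints is the maximum
            w[n] = max(compute(s) for s in succ[n]) + C[n] + x[n] * O
        return w[n]
    return bound * compute(entry)  # equation (5)

# Toy loop body: entry -> {a, b} -> exit, loop bound 10
succ = {"entry": ["a", "b"], "a": ["exit"], "b": ["exit"]}
C = {"entry": 2, "a": 5, "b": 3, "exit": 1}
x = {"entry": 1, "a": 0, "b": 1, "exit": 1}
print(loop_wcet(succ, C, x, O=2, entry="entry", exit_="exit", bound=10))  # 120
```

The longer of the two branch paths (here both reach 8) determines w(N_entry) = 12, and the loop bound scales it to 120.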
Iterative applications of the reduction operation will eventually collapse the entire program into a single node. Since main() is the entry point of the entire program, the following constraint specifies an upper bound on the program WCET:

$$w_{main} \le WCET_{ub} \qquad (6)$$

where WCET_ub is a tolerable WCET upper bound specified by the designer based on application requirements.
The optimization objective is to maximize the average total execution time of the instrumented BBs, normalized to the average total program execution time, i.e.,

$$OptObj = \frac{\sum_i x[N_i] \cdot f_i \cdot C_i}{\sum_i f_i \cdot C_i} \qquad (7)$$
where f_i is the average-case execution frequency of BB N_i, obtained via profiling the program with random inputs, and C_i is the execution time of BB N_i. The numerator in (7) is the total execution time of the instrumented BBs, and the denominator is the total execution time of the entire program; both are obtained via profiling program executions with random inputs. For a given program, the denominator can be viewed as a constant; its purpose is to make OptObj a relative percentage for easy comparison.
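The objective of (7) can be computed directly from profiling data. A minimal Python sketch, with all per-BB numbers assumed for illustration:

```python
# Sketch: OptObj from (7). f[i] is the profiled average execution frequency
# of BB i, C[i] its execution time, x[i] the 0/1 instrumentation decision.
def opt_obj(x, f, C):
    total = sum(f[i] * C[i] for i in f)                # denominator of (7)
    return sum(x[i] * f[i] * C[i] for i in f) / total  # instrumented share

x = {"n1": 1, "n2": 0, "n3": 1}
f = {"n1": 100, "n2": 50, "n3": 10}
C = {"n1": 2, "n2": 4, "n3": 6}
print(opt_obj(x, f, C))  # (200 + 60) / 460
```

Here instrumented BBs n1 and n3 account for 260 of the 460 profiled time units, so OptObj is about 56.5%.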
Equations (1)–(7) form a complete ILP optimization model that can be fed to the CPLEX solver.
As mentioned in [11], this style of ILP formulation does not take into account any infeasible-path information for WCET estimation, i.e., all paths in the CFG are considered feasible; hence, it is pessimistic for programs with infeasible paths. This is necessary to make it possible to collapse a loop into a super node during the procedure for obtaining w_main described in this section. For example, consider an if-then-else conditional statement within a while-loop body, where the then branch is much shorter than the else branch. The ILP formulation will always count the execution time of the longer else branch when estimating the execution time of the overall while loop. This is obviously safe; however, it may overestimate the WCET by a large amount if the else branch is rarely executed. To be more accurate and less pessimistic, a sophisticated WCET analysis tool, such as SWEET, is needed. In that case, however, it is not possible to build an ILP formulation, since the WCET analysis tool must be used as a black box and cannot be expressed as a set of linear constraints for the ILP. Therefore, heuristic algorithms must be used instead of ILP whenever a WCET analysis tool is employed.

GU et al.: WCET-AWARE PCFC FOR RESOURCE-CONSTRAINED REAL-TIME EMBEDDED SYSTEMS 5657
C. Greedy Heuristic Algorithm
Here, we present heuristic algorithms for solving the optimization problem. Compared to existing techniques for compiler-based WCET reduction, where selected BBs on the WCEP are placed into scratchpad memory to reduce the WCET, we approach the PCFC problem from the reverse direction: first fully instrument all BBs in the program CFG according to CEDA, referred to as full CFC, then remove instrumentation from selected BBs, preferably those on the current WCEP, in order to reduce the WCET to below WCET_ub. (Unlike the ILP formulation, where the program WCEP is implicitly encoded, the program WCEP is explicitly represented in the greedy heuristic algorithm.)
Algorithm 1 shows the Basic Greedy Heuristic Algorithm. We start with the configuration of full CFC, with every BB instrumented based on CEDA, by initializing the set InstBB (the set of instrumented BBs) to the set of all BBs in the program CFG and the set UninstBB (the set of uninstrumented BBs) to the empty set (line 1). As we select BBs to be uninstrumented, we gradually move more and more BBs from InstBB to UninstBB. We run the WCET analyzer SWEET [9] to generate the current WCET estimate WCET_cur and the corresponding WCEP, WCEP_cur (line 2). If WCET_cur exceeds the given upper bound WCET_ub (line 3), we select the BB N_i with minimum f_i·C_i among all instrumented BBs on WCEP_cur and remove the instrumentation from it (lines 4-5). At this point, we should not simply decrement WCET_cur by the overhead reduction due to uninstrumentation of BB N_i, which equals Iter(N_i)·O_inst, where Iter(N_i) is the number of iterations of BB N_i on the path WCEP_cur and O_inst is the constant runtime overhead introduced by instrumentation of each BB: there may be a WCEP switch, so WCEP_cur may no longer be the program WCEP after uninstrumenting N_i. Instead, we must rerun SWEET to obtain the new WCEP_cur and WCET_cur (line 6).
Algorithm 1: Basic Greedy Heuristic Algorithm
1. InstBB = {set of all BBs}; UninstBB = ∅;  /* start with full CFC, where all BBs are instrumented */
2. Run SWEET to obtain WCEP_cur and WCET_cur
3. while (WCET_cur > WCET_ub) {
4.   Select N_i ∈ InstBB with minimum f_i·C_i among all BBs on WCEP_cur, and remove instrumentation from it.
5.   InstBB = InstBB \ {N_i}; UninstBB = UninstBB ∪ {N_i};  /* move N_i from InstBB to UninstBB */
6.   Run SWEET to obtain the updated WCEP_cur and WCET_cur
   }
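The loop of Algorithm 1 can be sketched in Python with SWEET replaced by a caller-supplied analysis callback; the two-path cost model below is a made-up stand-in for a real WCET analyzer, and all names and numbers are illustrative:

```python
# Sketch of Algorithm 1. run_sweet stands in for the SWEET analyzer: given
# the current set of instrumented BBs, it returns (WCET_cur, WCEP_cur).
def basic_greedy(all_bbs, f, C, wcet_ub, run_sweet):
    inst, uninst = set(all_bbs), set()        # line 1: full CFC
    wcet, wcep = run_sweet(inst)              # line 2
    while wcet > wcet_ub:                     # line 3
        # line 4: cheapest instrumented BB on the current WCEP by f_i * C_i
        n = min((b for b in wcep if b in inst), key=lambda b: f[b] * C[b])
        inst.remove(n); uninst.add(n)         # line 5
        wcet, wcep = run_sweet(inst)          # line 6: WCEP may have switched
    return inst, uninst

# Toy stand-in for SWEET: two paths a-b and a-c; each instrumented BB on a
# path adds a fixed overhead of 5, and the longer path is the WCEP.
def toy_sweet(inst):
    paths = [["a", "b"], ["a", "c"]]
    C = {"a": 10, "b": 8, "c": 6}
    times = [(sum(C[b] for b in p) + 5 * sum(b in inst for b in p), p)
             for p in paths]
    return max(times, key=lambda t: t[0])

f = {"a": 1, "b": 1, "c": 1}
C = {"a": 10, "b": 8, "c": 6}
inst, uninst = basic_greedy(["a", "b", "c"], f, C, wcet_ub=25,
                            run_sweet=toy_sweet)
print(inst, uninst)  # instrumentation kept only on "a"
```

In this toy run, removing b's instrumentation switches the WCEP from a-b to a-c, which is why the analyzer must be rerun after every removal.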
Running SWEET to obtain WCEP_cur and WCET_cur can be time consuming for large programs. Algorithm 1 has a very long running time, since it must rerun SWEET after every single BB that is uninstrumented. To improve algorithm efficiency, we propose the Enhanced Greedy Heuristic Algorithm, shown in Algorithm 2, where the parts that differ from Algorithm 1 are shown in bold font. It improves efficiency by successively removing instrumentation from multiple BBs on the WCEP, in order of increasing average execution time (f_i·C_i), until the execution time of the current execution path, ET_wcep, has been reduced to below WCET_ub; the goal is to preferentially keep instrumentation on BBs with large average execution times. During this process, there may have been multiple WCEP switches, so the SWEET-computed WCEP_cur may no longer be the real WCEP, and the execution time ET_wcep may no longer be the real WCET. However, since the real WCET must be greater than or equal to ET_wcep, reducing ET_wcep to below WCET_ub is a necessary but not sufficient condition for reducing the real WCET to below WCET_ub.
Algorithm 2: Enhanced Greedy Heuristic Algorithm
1. InstBB = {set of all BBs}; UninstBB = ∅;  /* start with full CFC, where all BBs are instrumented */
2. Run SWEET to obtain WCEP_cur and WCET_cur
3. while (WCET_cur > WCET_ub) {
4.   ET_wcep = WCET_cur;
5.   while (ET_wcep > WCET_ub) {
6.     Select N_i ∈ InstBB with minimum f_i·C_i among all BBs on WCEP_cur, and remove instrumentation from it.
7.     InstBB = InstBB \ {N_i}; UninstBB = UninstBB ∪ {N_i};  /* move N_i from InstBB to UninstBB */
8.     ET_wcep = ET_wcep − Iter(N_i)·O_inst
     }
9.   Run SWEET to obtain the updated WCEP_cur and WCET_cur
   }
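The key difference from the basic version is the optimistic ET_wcep update between SWEET runs. A Python sketch under the same assumptions as before (the iteration counts, overhead constant, and analysis callback are all illustrative stand-ins):

```python
# Sketch of Algorithm 2. Several BBs are uninstrumented per SWEET run by
# decrementing the cached path time ET_wcep; SWEET is rerun only afterwards
# to check whether the WCEP has switched.
def enhanced_greedy(all_bbs, f, C, iters, o_inst, wcet_ub, run_sweet):
    inst = set(all_bbs)                       # line 1: full CFC
    wcet, wcep = run_sweet(inst)              # line 2
    while wcet > wcet_ub:                     # line 3
        et = wcet                             # line 4
        while et > wcet_ub:                   # line 5
            # line 6: cheapest instrumented BB on the cached WCEP
            n = min((b for b in wcep if b in inst), key=lambda b: f[b] * C[b])
            inst.remove(n)                    # line 7
            et -= iters[n] * o_inst           # line 8: no re-analysis yet
        wcet, wcep = run_sweet(inst)          # line 9
    return inst

# Same toy two-path cost model as a stand-in for SWEET.
def toy_sweet(inst):
    paths = [["a", "b"], ["a", "c"]]
    C = {"a": 10, "b": 8, "c": 6}
    times = [(sum(C[b] for b in p) + 5 * sum(b in inst for b in p), p)
             for p in paths]
    return max(times, key=lambda t: t[0])

f = {"a": 1, "b": 1, "c": 1}
C = {"a": 10, "b": 8, "c": 6}
iters = {"a": 1, "b": 1, "c": 1}
inst = enhanced_greedy(["a", "b", "c"], f, C, iters, o_inst=5, wcet_ub=25,
                       run_sweet=toy_sweet)
print(inst)
```

The design choice is a standard trade-off: the local `et` decrement may be optimistic when the WCEP switches mid-batch, so the outer SWEET run re-validates before termination.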
Algorithm 1 is similar to the greedy heuristic algorithm in [11] for allocation of program data to scratchpad memory, but our objective is to reduce the WCET to below a given upper bound WCET_ub, whereas the objective in [11] is to minimize the WCET; hence, we are able to develop the enhanced Algorithm 2, which improves efficiency by exploiting the WCET_ub constraint. Since the heuristic Algorithm 1 is many times slower than Algorithm 2 and their performance results are similar, we used Algorithm 2 in our experiments; it is referred to as the greedy heuristic.
5658 IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, VOL. 61, NO. 10, OCTOBER 2014
Fig. 6. Example for illustrating nonoptimality of the greedy heuristic algorithm.
Fig. 7. Detection of CFEs with PCFC. (a) CFE is detected. (b) CFE is undetected.
It is obvious that the greedy algorithm is not optimal, since it only considers the current WCEP and lacks a global view. Consider the CFG fragment in Fig. 6. The heuristic algorithm may select one BB each to be uninstrumented on path b and path c in order to reduce the WCET, which is determined by the execution times of both combined paths (a followed by b, and a followed by c), by a certain amount; but it may be possible to uninstrument a single BB on path a to reduce the WCET by the same amount, which could lead to a better solution.
We emphasize that neither the ILP formulation nor the
heuristic algorithm is optimal: the ILP formulation is not
optimal since it does not take into account infeasible path
information for WCET estimation, and the heuristic algorithm
is not optimal due to its inherent greedy nature.
D. Fault Detection Anomalies
We present some discussion of fault-detection anomalies, i.e., the fault-detection coverage is not always higher for a larger number of instrumented BBs in a given program. Fig. 7 shows how the fault-detection coverage depends on the patterns of instrumented BBs, not just their number. In Fig. 7(a), an erroneous jump between BBs lying on two different paths is detectable, while in Fig. 7(b), an erroneous jump between two consecutive uninstrumented BBs is undetectable.
Figs. 8 and 9 show that the fault-detection ratio may be higher with fewer instrumented BBs. Consider the original CFG shown in Fig. 8(a). If N_1 and N_4 are chosen to be instrumented, the resulting PCFG is shown in Fig. 8(b); if only N_1 is chosen to be instrumented, the resulting PCFG is shown in Fig. 8(c). Fig. 9 shows a close-up of the PCFGs in Fig. 8(b) and (c). In Fig. 8(b), N_1 is type A, since it has (at least) two predecessors, N_1 and N_4, and one of its predecessors (N_1) has multiple (two) successors (N_1 and N_4). In Fig. 8(c), N_1 is type X, assuming that the BB connected to the top edge entering N_1 does not have any other successors. According to CEDA, the signature-update instruction for N_1 in Fig. 8(b) is S = S AND d1(N_1) (S = S AND 11000), whereas the signature-update function for N_1 in Fig. 8(c) is S = S XOR d1(N_1) (S = S XOR 00100). (Note that d1(N_1) may differ for node N_1 between the PCFGs of Fig. 8(b) and (c), since its value depends on the PCFG.) Consider the scenario in which a CFE occurs in N_1, causing an erroneous jump from N_1 to N_2 or N_3. For the case in Fig. 8(b), when the control flow comes back to N_1 from N_3, the global signature before entering N_1 is S = 11000, since the update at the end of N_1 was skipped. The AND operation (S = S AND 11000 = 11000 AND 11000) at the beginning of N_1 in Fig. 8(b) yields the global signature value 11000, which equals the expected value; thus, the CFE escapes detection. For the case in Fig. 8(c), the XOR operation (S = S XOR 00100 = 11000 XOR 00100) at the beginning of N_1 yields the global signature value 11100, which differs from the expected value of 11000; thus, the CFE will be detected. However, we cannot always conclude that Fig. 8(b) has a lower fault-detection ratio than Fig. 8(c): since Fig. 8(b) has N_4 instrumented in addition to N_1, it can help detect additional faults that are undetectable in Fig. 8(c), where only N_1 is instrumented.

Fig. 8. Fault-detection anomaly. (a) Original CFG. (b) PCFG if N_1 and N_4 are instrumented; N_1 is a type-A node. (c) PCFG if only N_1 is instrumented; N_1 is a type-X node.

Fig. 9. Close-up of the PCFGs in (left) Fig. 8(b) and (right) Fig. 8(c).
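The escape/detect behavior in this worked example reduces to bitwise arithmetic on the 5-bit signature and can be checked directly; the bit patterns below are the ones quoted in the text:

```python
# Signature arriving at N1 after the erroneous jump: the update at the end
# of N1 was skipped, so S is still 11000.
S = 0b11000
expected = 0b11000            # value the checker expects at the start of N1

# Case of Fig. 8(b): type-A node, AND with d1(N1) = 11000
assert (S & 0b11000) == expected   # result matches -> the CFE escapes detection

# Case of Fig. 8(c): type-X node, XOR with d1(N1) = 00100
assert (S ^ 0b00100) == 0b11100    # result is 11100
assert (S ^ 0b00100) != expected   # mismatch -> the CFE is detected
print("AND escapes detection; XOR detects the CFE")
```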
IV. PERFORMANCE EVALUATION
We use a set of programs from the Mälardalen WCET benchmark [13] in our experiments, as shown in Table II. Since our target is low-cost resource-constrained real-time embedded systems, it is not appropriate to use large benchmark programs for general-purpose computing, such as those from the Standard Performance Evaluation Corporation, since they are not real-time programs running in an embedded environment. Since our heuristic algorithm is very efficient, we expect it to work well on larger programs without surprises, although the ILP method may run into scalability issues. We implement the CFC algorithms at the intermediate representation (IR) level of the Low-Level Virtual Machine (LLVM) compiler framework, and we implement fault injection by processing the LLVM IR, randomly injecting all three types of CFEs: branch insertion, branch deletion, and branch target modification.

TABLE II
BENCHMARK PROGRAMS USED IN THE EXPERIMENTS

Fig. 10. Normalized WCET for full CFC (WCET_FCFC/WCET_orig).
Fig. 10 shows that full CFC results in large increases in program WCET for most programs, which may not be acceptable for real-time embedded systems. This justifies our objective of imposing a user-specified upper bound on the program WCET with PCFC.
As mentioned earlier, neither ILP nor the greedy heuristic algorithm is optimal. Fig. 11 shows OptObj, as defined in (7), i.e., the average percentage of execution time of the instrumented BBs, further averaged across the diverse set of programs in Table II. It indicates a clear tradeoff between the allowed WCET_ub and OptObj, using either the greedy heuristic algorithm or ILP: the larger WCET_ub, the larger the value of OptObj. Starting with WCET_ub = WCET_orig, OptObj (with either ILP or the heuristic) increases rapidly with increasing WCET_ub, but we see diminishing returns when WCET_ub is increased further, particularly above 1.4 × WCET_orig. This indicates that our PCFC approach can achieve significant coverage of instrumented BBs by removing instrumentation from a small number of BBs that lie on the long paths responsible for the program WCET.

Fig. 11. OptObj increases with increasing WCET_ub.

Fig. 12. Average fault-detection ratio increases with increasing WCET_ub.
In general, ILP outperforms the greedy heuristic, except when the WCET upper bound is 1.0 × WCET_orig, i.e., when no WCET increase is permitted beyond the original program WCET. In this case, the ILP approach cannot instrument any BBs at all (OptObj = 0): it inherently overestimates the WCET due to the lack of infeasible-path information, so its WCET estimate exceeds the WCET_orig obtained with SWEET, leaving no timing margin for instrumenting even a single BB. However, the heuristic algorithm can instrument some BBs, since it uses SWEET for accurate WCET analysis and can add instrumentation to BBs that are not on the WCEP. With increasing WCET_ub, more CFC instrumentation can be added to the program until we reach full CFC, where all BBs are instrumented.
We use random fault injection to inject the three types of CFEs mentioned earlier. For each benchmark program and each WCET upper bound WCET_ub, 2000 random CFE-injection runs are performed, with one CFE randomly activated per run, simulating common SEU faults. The fault-detection ratio is the percentage of injected faults that are detected by the CFC instrumentation. Fig. 12 shows the average fault-detection ratio, which increases with increasing WCET_ub. (Note that the fault-detection ratio is less than 100% even for full CFC, since CEDA only detects internode CFEs, not intranode CFEs.)

Fig. 13. Fault-detection ratio and OptObj [as defined in (7)] for four programs: (a) cover; (b) fir; (c) cnt; and (d) nsichneu. The x-axis is WCET_ub/WCET_orig. Solid lines denote the fault-detection ratio; dot-dash lines denote OptObj; both are normalized to percentages between 0% and 100%.

Fig. 14. Distribution of fault-injection results. The x-axis is WCET_ub/WCET_orig. Numbers in parentheses denote the function return error code.
Fig. 13 shows the fault-detection ratio and OptObj for four programs; both metrics increase with increasing WCET_ub, which is consistent with the average results in Figs. 11 and 12, but the plots for the two metrics do not have identical shapes. This is expected, since the fault-detection ratio depends on the specific patterns of instrumented BBs in the CFG, not just on the average total execution time (OptObj) or the number of instrumented BBs. That is, OptObj is a reasonable coarse-grained metric for guiding the placement of CFC instrumentation, but it is not the sole factor determining the fault-detection ratio.
Fig. 14 shows the detailed distribution of the consequences of fault injection for the heuristic algorithm (the results for ILP are similar and omitted). The vertical bars denote silent errors (exit from the end of main), program crashes (abort, bus error, and segmentation fault), and faults detected by CFC (exit from the error handler).
V. CONCLUSION
We have presented algorithms for PCFC to enable tradeoffs between the fault-detection ratio and the increase in program WCET for SW-implemented CFC. Experimental results demonstrate that PCFC enables significant reductions of the program WCET compared to full CFC, at the cost of a reduced fault-detection ratio. Our techniques are useful for designing safety-critical fault-tolerant systems on resource-constrained HW platforms, when the designer cannot afford to use full CFC but still expects a certain level of fault tolerance.
APPENDIX
TABLE III
SOME NOTATIONS AND ABBREVIATIONS USED IN THIS PAPER
REFERENCES
[1] J. Muñoz-Castañer, R. Asorey-Cacheda, F. J. Gil-Castiñeira, F. J. González-Castaño, and P. S. Rodríguez-Hernández, "A review of aeronautical electronics and its parallelism with automotive electronics," IEEE Trans. Ind. Electron., vol. 58, no. 7, pp. 3090-3100, Jul. 2011.
[2] K. Erwinski, M. Paprocki, L. M. Grzesiak, K. Karwowski, and A. Wawrzak, "Application of Ethernet Powerlink for communication in a Linux RTAI open CNC system," IEEE Trans. Ind. Electron., vol. 60, no. 2, pp. 628-636, Feb. 2013.
[3] M. Idirin, X. Aizpurua, A. Villaro, J. Legarda, and J. Meléndez, "Implementation details and safety analysis of a microcontroller-based SIL-4 software voter," IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 822-829, Mar. 2011.
[4] A. Monot, N. Navet, B. Bavoux, and F. Simonot-Lion, "Multisource software on multicore automotive ECUs: Combining runnable sequencing with task scheduling," IEEE Trans. Ind. Electron., vol. 59, no. 10, pp. 3934-3942, Oct. 2012.
[5] Q. Zhao, Z. Gu, and H. Zeng, "PT-AMC: Integrating preemption thresholds into mixed-criticality scheduling," in Proc. DATE, 2013, pp. 141-146.
[6] Q. Zhao, Z. Gu, and H. Zeng, "HLC-PCP: A resource synchronization protocol for certifiable mixed criticality scheduling," IEEE Embedded Syst. Lett., vol. 6, no. 1, pp. 8-11, Mar. 2014.
[7] A. S. R. Oliveira, L. Almeida, and A. de Brito Ferrari, "The ARPA-MT embedded SMT processor and its RTOS hardware accelerator," IEEE Trans. Ind. Electron., vol. 58, no. 3, pp. 890-904, Mar. 2011.
[8] R. Vemu and J. A. Abraham, "CEDA: Control-flow error detection using assertions," IEEE Trans. Comput., vol. 60, no. 9, pp. 1233-1245, Sep. 2011.
[9] N. Oh, P. P. Shirvani, and E. J. McCluskey, "Control-flow checking by software signatures," IEEE Trans. Rel., vol. 51, no. 1, pp. 111-122, Mar. 2002.
[10] B. Lisper, C. Sandberg, and N. Bermudo, "A tool for automatic flow analysis of C-programs for WCET calculation," in Proc. WORDS, 2003, pp. 106-112.
[11] V. Suhendra, T. Mitra, A. Roychoudhury, and T. Chen, "WCET centric data allocation to scratchpad memory," in Proc. RTSS, 2005, pp. 223-232.
[12] S. Fischmeister and P. Lam, "Time-aware instrumentation of real-time programs," IEEE Trans. Ind. Informat., vol. 6, no. 4, pp. 652-663, Nov. 2010.
[13] J. Gustafsson, A. Betts, A. Ermedahl, and B. Lisper, "The Mälardalen WCET benchmarks: Past, present and future," in Proc. WCET, 2010, pp. 136-146.
Zonghua Gu received the Ph.D. degree in com-
puter science and engineering from the University of
Michigan, Ann Arbor, MI, USA, in 2004.
He is currently an Associate Professor with
Zhejiang University, Hangzhou, China. His research
interests include real-time embedded systems.
Chao Wang received the B.Sc. degree in com-
puter science in 2011 from Zhejiang University,
Hangzhou, China, where he is currently working
toward the Ph.D. degree.
Ming Zhang received the B.Sc. degree in soft-
ware engineering in 2010 from Zhejiang University,
Hangzhou, China, where he is currently working
toward the Ph.D. degree.
Zhaohui Wu (SM'05) received the B.Sc. and Ph.D. degrees in computer science from Zhejiang University, Hangzhou, China, in 1988 and 1993, respectively.
He is currently a Professor with the Department of Computer Science, Zhejiang University. His research interests include distributed artificial intelligence, semantic grid, and pervasive computing.
Dr. Wu is a Standing Council Member of the China Computer Federation.