Developing Software in a
Multicore & Multiprocessor World
Tool-based approach for finding complex concurrency issues
and endian incompatibilities
To keep pace with customer demands for more functionality and speed,
software teams are moving away from single processor architectures at a rapid
rate. In particular, embedded devices that used to have one chip to perform
a constrained set of tasks are now working in heterogeneous processor
environments where processors are used for network connectivity, multi-media,
and a whole variety of requirements. According to new data from VDC Research,
this trend is only expected to accelerate: engineers expect that in two years' time,
the number of single processor projects will drop by half.
The business impact of this growing complexity is stark: multicore and multiprocessor software
projects are 4.5X more expensive, have 25% longer schedules, and require almost 3X as many
software engineers.1
One area where this growing complexity can have a dramatic impact on cost and schedule overruns is
software testing and code inspection. A multicore or multiprocessor environment can add exponential
complexity to the task of identifying errors in software. Two classes of defect in particular can drag
the productivity of a software team through the floor: concurrency errors and endian incompatibilities.
This whitepaper discusses these types of issues in detail, explains how Klocwork's source code
analysis engine, Klocwork Truepath, can be used to address them, and walks through two examples
of these problems in prominent open source projects.
Figure 1 | Processing Architecture Used in the Current Project and Expected in Next Two Years (Percent of Respondents)

Architecture                    Current Project   Expected in 2 Years
Single processor                61.8%             30.1%
Multicore                       9.3%              21.4%
Multiprocessor                  20.8%             20.6%
Multicore and multiprocessor    5.2%              19.4%
Don't know                      2.9%              8.5%
VDC Research, Next Generation Embedded Hardware Architectures: Driving Onset of Project Delays, Costs Overruns, and
Figure 2 | The Klocwork Truepath tool chain provides a concurrency analysis engine after control flow graph analysis and build emulation.
Stages shown in the figure:
Compile: emulate native build; build control flow graph
Symbolic logic: analyze control flow graph; perform dataflow analysis
Concurrency: analyze lock dependencies
In this figure you can see that data relating to lock lifecycles is gathered by the normal analysis engine, and once this has been
produced for all modules in the system, the whole program space is then analyzed by the new concurrency analysis engine so
that loops in the lifecycle graph can be found, which equate to deadlocks.
Consider a function that operates as follows:
lock_t Lock1, Lock2;

void foo(int x) {
    if( x & 1 ) {
        lock(Lock1);
        lock(Lock2);
    }
    else
        lock(Lock1);
}
You can easily see by inspection that when passed an odd number as its parameter, this function defines a dependency of
Lock2 upon Lock1. Failing an odd parameter, Lock1 is still reserved, but this time there is no dependency of Lock2 upon
Lock1 at the local scope, although there may still remain that dependency (or another) at an inter-procedural scope.
Therefore, we have two discrete types of questions to ask when performing the analysis:
1. Symbolic logic questions:
a. Is there a valid control flow that gets us to call function foo() with an odd parameter?
b. Is there a valid control flow that results in foo() being called with an even parameter followed by a call to
another function that results in another lock (e.g. Lock2) being reserved before Lock1 is released?
2. Lock dependency questions:
a. If either of these is so, is there any other situation in the program's natural control flow whereby a counter-dependency of Lock1 upon Lock2 can be reached, potentially resulting in a deadlock?
The first type of question is answered by Klocwork Truepath's symbolic logic engine during the normal course of program
analysis, just as any other type of defect is analyzed for inter-procedural data flows that can or cannot occur.
The second type of question is then answered by the concurrency analysis engine, fed by the collection of all possible
dependencies within the program space. The result is what tends to be a small set of incredibly difficult to find (manually), and
insanely difficult to understand (without a tool) deadlock scenarios that developers can triage and fix very quickly within the
natural course of their implementation tasks.
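The lock dependency question amounts to looking for a loop in a directed graph whose nodes are locks and whose edges mean "reserved while holding". The following is a minimal sketch of that idea only, not Klocwork Truepath's implementation; the names `lock_graph`, `add_dependency`, and `has_deadlock_cycle`, and the fixed `MAX_LOCKS` limit, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16  /* illustrative limit, not taken from the paper */

/* edge[a][b] is true when some path reserves lock b while holding lock a. */
typedef struct {
    bool edge[MAX_LOCKS][MAX_LOCKS];
    int  count;
} lock_graph;

void graph_init(lock_graph *g, int count)
{
    assert(count > 0 && count <= MAX_LOCKS);
    memset(g, 0, sizeof *g);
    g->count = count;
}

/* Record that lock `inner` is reserved while lock `outer` is held. */
void add_dependency(lock_graph *g, int outer, int inner)
{
    assert(outer >= 0 && outer < g->count);
    assert(inner >= 0 && inner < g->count);
    g->edge[outer][inner] = true;
}

/* Depth-first search. state: 0 = unvisited, 1 = on current path, 2 = done. */
bool visit(const lock_graph *g, int node, int *state)
{
    state[node] = 1;
    for (int next = 0; next < g->count; next++) {
        if (!g->edge[node][next])
            continue;
        if (state[next] == 1)
            return true;                /* back edge: a dependency loop */
        if (state[next] == 0 && visit(g, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

/* True when the dependency graph contains a loop, i.e. a potential deadlock. */
bool has_deadlock_cycle(const lock_graph *g)
{
    int state[MAX_LOCKS] = {0};
    for (int n = 0; n < g->count; n++)
        if (state[n] == 0 && visit(g, n, state))
            return true;
    return false;
}
```

With Lock1 as node 0 and Lock2 as node 1, calling foo() with an odd parameter contributes the edge 0 -> 1. A counter-dependency found elsewhere in the program contributes 1 -> 0, at which point `has_deadlock_cycle()` returns true.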
Endian Incompatibilities
Whilst it may be true that there are 10 kinds of people in the world, a switch from a little endian platform to a big endian platform
will muddy that impression considerably. An advisor of ours recently informed me with glee that he'd finally set his MSB (having
passed his 64th birthday), but store that in nibble representation on an unexpected endian architecture and he'd be regressing
to the nursery once more.
In short, endian representations affect how the host processor stores integral types in memory. Considering 32-bit integers,
each of which consists of four bytes of memory, the processor can choose to read and write those four bytes in a variety of
orders, although traditionally only two are used:
This picture becomes slightly muddied if the processor actually writes words at a time (this is mostly a fairly historical
representation now, but we mention it for completeness), and applies its endian assumptions to each word:
However the processor stores and reads such types is entirely at its own discretion and the business of nobody else. Until,
that is, the developer directs the processor to write such data into a medium for transmission, as opposed to storage in
memory.
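The layout a given host uses can be observed by copying an integer's bytes out of memory. The sketch below is an illustration under assumptions: the helper names `memory_bytes` and `layout_name` are hypothetical, and the two "mixed" patterns it recognizes are one plausible reading of the per-word layouts mentioned above. The probe value 0x03020100 is chosen so that byte n, counting from the least significant byte, holds the value n.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit value into out[0..3],
 * lowest address first. */
void memory_bytes(uint32_t value, unsigned char out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Classify the in-memory bytes of the probe value 0x03020100. */
const char *layout_name(const unsigned char b[4])
{
    static const unsigned char little[4] = {0, 1, 2, 3};
    static const unsigned char big[4]    = {3, 2, 1, 0};
    /* Assumed per-word variants: 16-bit words in one order,
     * bytes within each word in the other. */
    static const unsigned char mixed_a[4] = {2, 3, 0, 1};
    static const unsigned char mixed_b[4] = {1, 0, 3, 2};

    if (memcmp(b, little, 4) == 0)
        return "little endian";
    if (memcmp(b, big, 4) == 0)
        return "big endian";
    if (memcmp(b, mixed_a, 4) == 0 || memcmp(b, mixed_b, 4) == 0)
        return "mixed";
    return "other";
}
```

Calling `memory_bytes(0x03020100u, b)` followed by `layout_name(b)` reports the host's layout. The bytes in `b` read 0, 1, 2, 3 on a little endian host and 3, 2, 1, 0 on a big endian host.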
Transmission media, which could be sockets, files, pipes, or any other inter-processor vector (e.g. interrupts that cause data
to be written to the PCI-Express interface, or to the serial bus, or ...), are addressed by the processor in exactly the same way
as memory unless specifically told to do otherwise.
Thus, a big endian processor will write a 32-bit integer onto a socket in byte order 3, 2, 1, 0. If the CPU on the other end
of the socket uses a little endian architecture, then a value written onto the socket will be interpreted completely
differently when read. For example, a 32-bit value of 29 (0x0000001D), written by a big endian processor and read by a little
endian processor, will be interpreted as 486,539,264 (0x1D000000), not a small correction by any means.
Preparing a program for use with heterogeneous processor architectures therefore involves finding every integral type that
ever hits a transmission vector that could legitimately target another processor and ensuring that the read/write operation
involved transforms the data into / from a neutral representation that both sides agree on. In a program of any size at all,
obviously this is a non-trivial task.
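One way to obtain a neutral representation is to build the transmitted bytes with shifts, which gives the same result on any host. This is a minimal sketch, assuming big endian is the agreed format; the function names `put_u32_be`, `get_u32_be`, and `get_u32_le` are hypothetical.

```c
#include <stdint.h>

/* Write v into buf as four bytes, most significant first (big endian),
 * regardless of how the host stores integers in memory. */
void put_u32_be(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Rebuild the value from four big endian bytes. */
uint32_t get_u32_be(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* What a reader obtains if it wrongly treats the same bytes
 * as little endian. */
uint32_t get_u32_le(const unsigned char buf[4])
{
    return ((uint32_t)buf[3] << 24) | ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[1] << 8)  |  (uint32_t)buf[0];
}
```

Encoding the value 1 with `put_u32_be()` and decoding it with `get_u32_be()` returns 1 on every host, whereas decoding the same bytes with `get_u32_le()` returns 16,777,216 (0x01000000).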
Klocwork Truepath can help developers in this task as it now includes the ability to validate type representation usage
symmetrically as those types cross transmission vector boundaries. That is, the data flow engine within Klocwork Truepath
automatically validates that types that are written directly to a transmission vector are subject to host-to-neutral format
transformation before the write operation takes place. Likewise, integral types read from a transmission vector are tracked to
ensure that they are appropriately transformed prior to the first attempted usage on the host.
For example, consider the following function:
void foo(int sock)
{
    int x;
This simple function makes the basic assumption that the reader on the other end of its socket has the same processor
architecture as the sender. This might be true, or more accurately it might be true today, but what designer can ever look far
enough into the future to know that it will always be true, regardless of market shifts, great ideas that marketing interns have,
etc.?
Klocwork Truepath, upon analysis of this function, will point out:
Value x is used in host byte order, but should be used in environment/network byte order.
A developer versed in inter-architectural development will naturally modify this function to transform the value of the variable
x prior to transmission:
void foo(int sock)
{
    int x, xt;
Likewise, when it comes to reading information across a transmission vector, Klocwork Truepath traces the data flow of any
received integral types to ensure, in exactly the opposite way to sending, that any such values are transformed to host format
prior to their first usage.
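The send and receive pattern described above might be sketched as follows. This is an illustration rather than the paper's listing: it assumes a POSIX environment providing `htonl()` and `ntohl()` in `<arpa/inet.h>`, uses `uint32_t` in place of `int`, and the function names `send_value` and `recv_value` are hypothetical. It also assumes a single `write()` or `read()` call transfers all four bytes; code for real sockets would loop on short transfers.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <unistd.h>     /* read, write */

/* Send x in network byte order. Returns 0 on success, -1 on failure. */
int send_value(int fd, uint32_t x)
{
    uint32_t xt = htonl(x);   /* host to network order before the write */
    return write(fd, &xt, sizeof xt) == (ssize_t)sizeof xt ? 0 : -1;
}

/* Receive a value and convert it to host order before its first use.
 * Returns 0 on success, -1 on failure. */
int recv_value(int fd, uint32_t *out)
{
    uint32_t xt;
    if (read(fd, &xt, sizeof xt) != (ssize_t)sizeof xt)
        return -1;
    *out = ntohl(xt);         /* network to host order after the read */
    return 0;
}
```

Because both ends convert through the same neutral format, the value received equals the value sent whether or not the two processors share a byte order.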
Now I can call enter() multiple times, simulating some of the capabilities of a true recursive lock, and as long as I remember to
call leave() an equal number of times the lifecycle of the underlying non-recursive lock is managed correctly:
void foo()
{
    // real lock is reserved
    enter();
    if( i_really_want_to )   // placeholder condition
    {
        // only the reference count is affected
        enter();
        leave();
    }
    // now the real lock is released
    leave();
}
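A reference-counted wrapper of this kind might look like the sketch below. It is a reconstruction for illustration, not the database engine's code: `sim_lock`, `reserve()`, and `release()` are hypothetical stand-ins that only track state, so the sketch runs single-threaded, and the placement of the `refCount` update after `lock1` is released is an assumption chosen to match the race described later in this section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock. It records state only;
 * a real implementation would block instead of asserting. */
typedef struct { bool held; } sim_lock;

void reserve(sim_lock *l) { assert(!l->held); l->held = true; }
void release(sim_lock *l) { assert(l->held);  l->held = false; }

sim_lock lock1;        /* meant to guard refCount */
sim_lock lock2;        /* the underlying non-recursive lock */
int refCount = 0;

void enter(void)
{
    reserve(&lock1);
    if (refCount == 0)
        reserve(&lock2);   /* first entry takes the real lock */
    release(&lock1);
    refCount++;            /* assumed flaw: updated outside the lock1 guard */
}

void leave(void)
{
    reserve(&lock1);
    if (--refCount == 0)
        release(&lock2);   /* last exit releases the real lock */
    release(&lock1);
}
```

In a single thread, nested calls behave as described: the second `enter()` only raises the count, and `lock2` is released by the final `leave()`.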
Now consider the requirement to implement an abstraction over thread-specific data storage. To ensure safety when
allocating such a structure, the database engine uses the singleton recursive lock described above to protect its activities with
an implementation that simplifies as follows:
int tlsCreated = 0;

data_t* create_data()
{
    static data_t* tls;
    enter();
    if( tlsCreated == 0 )
        tls = create_thread_data();
    tlsCreated = 1;
    leave();
    init_data(tls);
    return tls;
}
To simple inspection, this appears quite correct as it calls leave() the same number of times as enter() and thus should be
considered well behaved. Unfortunately life in the parallel world is rarely simple to analyze, and this case is certainly more
complicated than it first appears.
Consider a two-core CPU executing two threads, both calling create_data() at very slight offsets in time.
The first thread (let's call our threads Thread 1 and Thread 2) begins executing create_data() and successfully calls the
enter() function. This results in the underlying lock, lock 2, being reserved to Thread 1:

Thread 1
create_data()
  enter()
    refCount = 0
    reserve(lock1)
    reserve(lock2)
    release(lock1)
    refCount = 1
Now let's assume that Thread 2 begins its execution of create_data() during the time that Thread 1 is active, and before it
releases lock 1:

Thread 1                  Thread 2
create_data()
  enter()
    refCount = 0
    reserve(lock1)
    reserve(lock2)
                          create_data()
                            enter()
    release(lock1)            reserve(lock1)
One further assumption makes the scenario whole: Thread 1 at this moment is interrupted by the operating system, losing its
time on chip. Crucially, this happens before the reference count is updated. (Check the implementation of enter() and you'll
see that the author unfortunately left the reference count update outside of what is supposed to guard access to it.) As the
reference count will therefore still read zero for Thread 2, it will attempt to reserve lock 2, resulting in Thread 2 blocking (as
lock 2 is already owned by Thread 1):

Thread 1                  Thread 2
create_data()
  enter()
    refCount = 0
    reserve(lock1)
    reserve(lock2)
                          create_data()
                            enter()
    release(lock1)            reserve(lock1)
    interrupted
                              refCount = 0
                              reserve(lock2)
                              blocked
Upon return from interrupt, Thread 1 is released and resumes execution where it left off, incrementing the reference count
and returning from the enter() function. Its execution of create_data() continues, leading to a call to the leave() function, which
unfortunately attempts to reserve lock 1 before doing anything else:
Thread 1                  Thread 2
create_data()
  enter()
    refCount = 0
    reserve(lock1)
    reserve(lock2)
                          create_data()
                            enter()
    release(lock1)            reserve(lock1)
    interrupted
                              refCount = 0
                              reserve(lock2)
                              blocked
    refCount = 1
    return
  leave()
    reserve(lock1)
    blocked

Because Thread 2 is currently blocked waiting on lock 2 while still owning lock 1, Thread 1 will now block on its
own attempt to reserve lock 1.
In short, this is a classic lock-order inversion caused by a poorly guarded data item, which when subject to a race
condition (being read by one thread whilst in the process of being updated by another) causes one thread to reserve locks in
order while the other thread attempts to reserve them out of order, resulting in a deadlock.
With the race condition fixed, this singleton will operate correctly, although as previously described the author actually chose
to completely rewrite this module, providing a more useful re-entrant mutual exclusion capability for multiple threads, i.e.
removing the singleton semantic.
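The interleaving above can be replayed deterministically without real threads. The sketch below is a simplified model, not the original module: each thread is a program counter over an assumed eight-step version of enter() followed by leave(), locks are plain owner fields, and the `guarded` flag (an assumption for illustration) moves the reference count update inside the lock 1 guard. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { FREE = -1, STEPS = 8 };

typedef struct {
    int  owner[2];    /* owner[0]: lock 1, owner[1]: lock 2; FREE or thread id */
    int  ref_count;
    int  pc[2];       /* next step of each thread */
    bool first[2];    /* did the thread read ref_count == 0? */
    bool guarded;     /* true: update ref_count while lock 1 is still held */
} sim;

void sim_init(sim *s, bool guarded)
{
    s->owner[0] = s->owner[1] = FREE;
    s->ref_count = 0;
    s->pc[0] = s->pc[1] = 0;
    s->first[0] = s->first[1] = false;
    s->guarded = guarded;
}

/* Take a lock for thread t if it is free; false means t would block. */
bool try_reserve(sim *s, int lock, int t)
{
    if (s->owner[lock] != FREE)
        return false;
    s->owner[lock] = t;
    return true;
}

/* Advance thread t by one step of enter() followed by leave().
 * Returns false, changing nothing, if t is blocked or has finished. */
bool sim_step(sim *s, int t)
{
    assert(t == 0 || t == 1);
    switch (s->pc[t]) {
    case 0: if (!try_reserve(s, 0, t)) return false; break; /* reserve(lock1) */
    case 1: s->first[t] = (s->ref_count == 0); break;       /* read refCount */
    case 2:                                                 /* reserve(lock2) */
        if (s->first[t] && !try_reserve(s, 1, t)) return false;
        if (s->guarded) s->ref_count++;                     /* guarded update */
        break;
    case 3: s->owner[0] = FREE; break;                      /* release(lock1) */
    case 4: if (!s->guarded) s->ref_count++; break;         /* late update */
    case 5: if (!try_reserve(s, 0, t)) return false; break; /* leave() begins */
    case 6: if (--s->ref_count == 0) s->owner[1] = FREE; break;
    case 7: s->owner[0] = FREE; break;                      /* release(lock1) */
    default: return false;                                  /* finished */
    }
    s->pc[t]++;
    return true;
}

/* True when both threads are unfinished and neither can take a step. */
bool deadlocked(const sim *s)
{
    for (int t = 0; t < 2; t++) {
        sim probe = *s;               /* probe a copy so s is untouched */
        if (s->pc[t] >= STEPS || sim_step(&probe, t))
            return false;
    }
    return true;
}

/* Replay the interleaving: Thread 1 (id 0) runs through release(lock1),
 * Thread 2 (id 1) runs until it blocks or is descheduled, Thread 1 resumes. */
bool replay(bool guarded)
{
    static const int schedule[] = {0, 0, 0, 0, 1, 1, 1, 0, 0};
    sim s;
    sim_init(&s, guarded);
    for (unsigned i = 0; i < sizeof schedule / sizeof schedule[0]; i++)
        sim_step(&s, schedule[i]);    /* a blocked step is a no-op */
    return deadlocked(&s);
}
```

Under this model, `replay(false)` ends with both threads blocked, each waiting on a lock the other owns, while `replay(true)` does not deadlock for the same schedule because Thread 2 no longer reads a stale count.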
Figure 5 | Data representation analysis in action
In this example, it's simple to see the assumption in all its glory, as the data member msg.msg_hdr.m_size is read and used
directly off the wire, in what could be, but isn't in this case, network order.
Now let's assume that a new generation of designers revisit this decision and instead place emphasis on scale and flexibility
over ease of implementation. Now they decide to place the statistics collector process on an arbitrary node in the hardware
design, rather than on the same node as the kernel process.
With this decision in place, the assumption that network byte order and host byte order are the same can no longer be made
in general. Porting to this new assumption set could take significant time, both for developers and for the test crew, faced with
putting together a matrix of CPUs / hosts that embody the plethora of representations we can expect to support in the field.
Using a tool-driven approach, however, this entire effort can be collapsed to a single analysis pass, taking minutes in total, to see
a report of what's involved. In this case, the designers would be faced with the following endian vulnerabilities that would need to
be addressed (along with the obvious logistical issues around how to place the process on the right host/CPU, of course):
pgstats.c: line 1988: function pgstat_recvbuffer()
Value msg.msg_hdr.m_size is used in network order.
pgstats.c: line 1443: function pgstat_send()
Value *msg is used in host byte order.
These two simple issues might be thought of as the whole problem domain. However, looking further into what this module is
capable of, certain information can be persisted across sessions using a statistics file. If we further our decision to allow the
process to be spawned on heterogeneous hardware, we might well continue that spread by allowing different instantiations of
said process to occur on heterogeneous hardware, thus requiring persistent data to be endian safe:
pgstats.c: line 2556: function pgstat_read_statsfile()
Value format_id is used in environment byte order.
Similar errors can be found on line(s): 2610, 2684, 2717, 2740.
pgstats.c: line 2312: function pgstat_write_statsfile()
Value format_id is used in host byte order.
Similar errors can be found on line(s): 2351, 2384, 2411, 2412.
Armed with this information, the designer can make all required updates to remove endian vulnerability from their code in
one pass.
Conclusion
The complexity of this problem domain is vast, so there's no one solution, tool, or approach that will address all your problems.
Development teams need to equip themselves with good tools, smart design assumptions, and even smarter developers to
reconcile the feature race being demanded by the market and the underlying platform complexity that implies. When it comes
to selecting a tool, source code analysis should be on your shortlist as it offers a compelling mix of scalability, flexibility and the
ability to address a broad set of issues that will help you to ensure the overall security and reliability of your code.
About Klocwork
Klocwork helps developers create more secure and reliable software. Our tools analyze source code on-the-fly, simplify
peer code reviews, and extend the life of complex software. Over 900 customers, including the biggest brands in the mobile
device, consumer electronics, medical technologies, telecom, military and aerospace sectors, have made Klocwork part of
their software development process. Thousands of software developers, architects, and development managers rely on our
tools every day to improve their productivity while creating better software.
IN CANADA:
30 Edgewater Street, Suite 114
Ottawa, ON K2L 1V8
t: 1.866.556.2967
f: 613.836.9088
www.klocwork.com
Klocwork Inc. All rights reserved. Klocwork and Klocwork Truepath are registered trademarks of Klocwork Inc.