In a software build system, starting later simply means re-using as much output as possible from previous runs. This, of course, is precisely what tools like the classic Unix make facility and its most popular modern descendant, GNU Make 3.81 [1], are designed to do: by comparing modification times on input and output files, they re-run only the commands needed to update targets that are out of date with respect to their inputs. More generally, incremental builds are an instance of a larger strategy usually called build avoidance: by making sophisticated use of a pre-built object cache, systems can avoid compilation entirely.

Incremental builds and avoidance strategies are critical to individual developer workflows, but they have not seen widespread adoption within production teams tasked with building the full product. First, the speed of modern compilers often exceeds the lookup-and-retrieve mechanics of complicated cache systems; in these cases, paradoxically, it is faster to rebuild than to re-use. Second, avoidance systems are inherently unpredictable: depending on the nature of the change and the state of the object cache, build times can vary widely. Finally, and most importantly, release teams are held strictly accountable for being able to rebuild any version they ship; a nondeterministic cache means they cannot guarantee their ability to recreate, bit-for-bit, the contents of a release.

Finishing sooner is a less useful concept for our purposes. It manifests most prominently in large systems with a long-running nightly or continuous build that creates a large set of components, each with distinct owners. While the entire product cannot be considered built until the very last link step is complete, any individual library or subcomponent may be ready hours earlier. Designing a system that makes these intermediate outputs readily available to their owners can dramatically reduce effective cycle time.

By far the most important strategy for addressing long build times is to make the build process itself run faster. To some extent, this can be accomplished simply by using faster hardware: newer processors, disks, and networks will yield correspondingly faster builds. In practice, however, large organizations quickly extract as much as they realistically can from hardware upgrades. What is really required to make substantial gains in performance, and what we will spend the rest of this paper examining, is making effective, scalable use of parallel builds.

Builds are an excellent candidate for parallel (and distributed) computation, because they comprise a large number of processes (typically, compiles and links) that are logically distinct. Engineers who first launch their builds against a multiprocessor system (or, more ambitiously, a distributed compute cluster) are universally disappointed to discover one or both of the following:

1. The build fails unexpectedly
2. The performance improvement is far less than expected
Underlying both of these conditions is a problem with dependencies. Implicit or missing dependencies mean the parallel build has insufficient information to order build steps accurately. For example, a link consuming a compiled object may inadvertently run before the compile producing that object is complete. Conversely (and more evident in builds that are functional but slow), explicit or implicit over-serialization can force the system to use only one processing node even when more are available. The problems are analogous to those in multithreaded programming: without effective synchronization, the system is vulnerable to races and deadlocks. Most build tools lack a good facility for completely guaranteeing (or even specifying) safe parallel execution. Dependencies must be listed explicitly; for example, in Make syntax, prerequisites and targets are enumerated on a single line:
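A minimal sketch of the form, with hypothetical file names:

```make
# The target (app) and its prerequisites (main.o, util.o) share a
# single rule line; Make runs the recipe only after every listed
# prerequisite is up to date.
app: main.o util.o
	$(CC) -o app main.o util.o
```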
Moving the main iteration out of Make is problematic for two reasons. First, we have lost the implicit parallelism the Make rules afforded us. The Perl script's foreach loop is necessarily serial, synchronously forking each component's subprocess in turn. If we attempt to make it multithreaded, we must assume all the burden of synchronization, which can make maintenance much more difficult. Second, the Perl script gives us a tempting but dangerous facility to do component-specific pre- and post-processing outside of Make's control. Using it will lead to serialization and inefficient job packing, as this build plot from a Windows Mobile build [4] illustrates:
Fortunately, this is readily resolved by using the Perl script to do one-time setup and tear-down, then letting a top-level Makefile recursively invoke submakes for each component:
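A sketch of that structure, with hypothetical component names:

```make
# The wrapper script performs one-time setup, invokes this Makefile,
# and then performs tear-down; the per-component iteration lives here.
COMPONENTS = kernel drivers ui

.PHONY: all $(COMPONENTS)
all: $(COMPONENTS)

# One submake per component; under make -j these run in parallel.
$(COMPONENTS):
	$(MAKE) -C $@

# Explicit inter-component dependencies in ordinary Make syntax:
ui: drivers
```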
By leveraging Make's pattern rule syntax, it is easy to preserve the data-driven brevity of the Perl script and maintain the list of subcomponents in a single location. We can also annotate explicit dependencies between components using normal Make syntax, and we are absolved of dealing with any parallelization or synchronization issues.
This kind of usage is a side-effect in a declarative language, because the variable (here, VERSION) is global system state that is mutated as a consequence of updating a Make target. There are good reasons to avoid side-effects in general, as they make declarative systems much harder to understand and maintain [5]. For our purposes, the side-effect is a particularly dangerous kind of implicit serialization, because Make has no visibility into its existence. Running a build with this construct in parallel will fail unpredictably. The simplest way to address the situation is to introduce an explicit dependency between the target that creates the side-effect and those that consume it (or, in very simple cases like the one above, to merge the commands into a single target). While this ensures a correct build, for a large number of targets it will again introduce crippling serializations. In that case, the side-effect should be factored out of as many targets as possible, so that downstream consumer targets can safely execute in parallel.
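The shape of the problem and of the explicit-dependency fix can be sketched as follows; the target names are hypothetical, and for brevity the global state is carried in a file rather than a Make variable:

```make
# Problematic: "stamp" mutates state that "package" silently
# consumes; Make sees no ordering between the two targets, so a
# parallel build may package before the version stamp exists.
stamp:
	echo 1.2.3 > VERSION

package:
	tar czf app-$$(cat VERSION).tar.gz bin/

# Fix: declare the dependency explicitly so the race disappears.
package: stamp
```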
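In Visual C++ builds, the pattern typically arises from a rule like the following sketch, where the /Zi switch routes every compile's debug records into a shared program database:

```make
# Each invocation of cl updates the same default vc80.pdb,
# implicitly serializing otherwise independent compiles.
%.obj: %.cpp
	cl /nologo /c /Zi /Fo$@ $<
```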
Here, all compiles are implicitly serialized against writes to a single vc80.pdb file. The same pattern is seen in Makefile rules that update a single archive (for example, a ZIP or JAR file) as outputs are produced. As this build plot shows, running such rules in parallel can lead to serious problems as multiple processes attempt to update the same file simultaneously:
Here, the red bricks represent any job identified as attempting to read or modify a file before a serially earlier job had finished writing it. This build was run with ElectricAccelerator, which detected this condition and remedied it by re-running the affected targets serially. While correct, this build lost any benefit of parallelization because of the multiply-updated files. The best solution to this problem is to leverage the fact that almost all tools that incrementally update a common file (including the compilers and archive tools noted above) are capable of merging partially built files. We can use this property to restructure the invocation into two phases: first, let individual targets update unique, per-target archive or database files in parallel; then, as a serialized final step, merge the individual files into the single destination:
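A sketch of the two-phase restructuring, assuming GNU tar (whose -A flag concatenates archives) and hypothetical module names:

```make
MODULES = core net ui
PARTS   = $(addsuffix .part.tar,$(MODULES))

# Phase 1: every module archives its own outputs into a unique
# file; these targets touch distinct files and can run in parallel.
%.part.tar: %
	tar -cf $@ $<

# Phase 2: a single serialized step merges the partial archives
# into the one destination file.
release.tar: $(PARTS)
	cp $(firstword $(PARTS)) $@
	for p in $(filter-out $(firstword $(PARTS)),$(PARTS)); do \
	  tar -Af $@ $$p; \
	done
```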
Using this construct as a guide, we can re-write our Makefile rule as:
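A sketch of such a rule, assuming MSVC's cl and its /Fd switch for naming the program database:

```make
# Each compile writes its debug records into a private foo.pdb
# instead of the shared vc80.pdb, so compiles no longer serialize
# against a single file; the object records the path for the linker.
%.obj: %.cpp
	cl /nologo /c /Zi /Fd$*.pdb /Fo$@ $<
```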
foo.obj's symbols will now be written into foo.pdb; when linking the final executable, the linker automatically knows where to retrieve the partial debug information.
all: compile libs executables

compile:
	$(MAKE) -C srcs compile

libs:
	$(MAKE) -C srcs libs

executables:
	$(MAKE) -C srcs executables

compile-component-foo:
	$(MAKE) -C srcs/foo compile
This structure can lead to a tremendous amount of wasted work: thousands of small do-nothing jobs consume both processing power and network bandwidth just to determine that nothing needs to be done for targets Make has already updated. On a build plot, the jobs are so short and so closely packed that they appear as black bands at the start of each make instance:
The solution here is to avoid writing Makefile rules that recurse into the same directory multiple times during a single build. Rather than looping over the source tree repeatedly with a single directive on each iteration, restructure the build to loop once, building multiple targets in each directory.

The need to support cyclical dependencies is interesting, because all Make-based systems easily detect and reject Makefile constructs that explicitly contain cycles. The actual situation underlying what is usually called a cyclical dependency is a component A that is only partially built on a first pass. A second component, B, consumes the built part of A, and itself produces a new output that the remainder of A requires.
There is no cycle: rather, there are three components, A1, B, and A2 that have a simple serialization relationship between them. Modeling the relationship this way is both more accurate and more efficient than resorting to multiple passes over A.
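Sketched in Make, with hypothetical directory and phase names:

```make
# Three sequential pieces instead of a "cycle": A's first phase,
# then B (which consumes it), then the remainder of A.
.PHONY: a-phase1 b a-phase2
a-phase1:
	$(MAKE) -C A phase1
b: a-phase1
	$(MAKE) -C B all
a-phase2: b
	$(MAKE) -C A phase2
```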
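The construct in question has the following general shape (variant names are hypothetical): a single top-level target runs the debug and optimized builds back-to-back, with both writing their outputs into the source tree:

```make
# The optimized pass cannot begin until every step of the debug
# pass has finished, and both passes write objects into the same
# source directories.
all:
	$(MAKE) VARIANT=debug build
	$(MAKE) VARIANT=opt build
```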
This construct introduces a build-level serialization, with a build plot similar to that of the "Make on the bottom" construct: all of the compilations in the optimization build are artificially serialized against the entire debug build. More generally, letting outputs build into the source directory necessarily means that the system can only support one build at a time. Once the project matures and requires independent builds along any number of axes (variants, versions, architectures), this becomes a severe obstacle to efficient parallel builds. The solution is to avoid Make's defaults and instead define rules that instruct the compiler to write outputs into architecture- or build-specific directories. By using GNU Make's filename manipulation functions, we can write a macro that transforms a list of sources in the current directory into targets in the output directory:
objname = $(addprefix $(OUTDIR)/,$(notdir $(1:.cpp=.o)))

SRCS = main.cpp foo.cpp bar.cpp
OBJS = $(call objname,$(SRCS))

$(OUTDIR)/%.o: %.cpp
	$(COMPILE.cpp) -o $@ $<
Now, simply by redefining the OUTDIR macro, multiple builds over the same source tree can proceed simultaneously. This method has the added benefit that cleaning the build tree is as simple as deleting the single OUTDIR directory.
6. Monoliths
A monolith is a single build step or Makefile target that takes a disproportionately long time to complete:
It is easy to see how a monolith forces a serialization: until it completes, all other dependent jobs must wait. There are several reasons for a single long-running step: the build may be configured to recurse over the entire source tree updating dependencies, it may be required to do a large file copy or checkout [8], or it may need to run a test or analysis program that requires a lot of processing time. By far the most common monolith in an embedded software build is a large link, usually responsible for building the final image.
Sometimes monoliths can be partitioned effectively for parallelization. Aggregating objects into intermediate libraries that are then combined into the final target can help distribute the work. In other cases, the monolith exists simply because a portion of the build was designed with a different (typically homegrown) tool or language that does not lend itself to parallelization; a port to Make would both standardize maintenance and accelerate performance. In many cases, however, monoliths cannot be partitioned. Here, the best practice is not to directly accelerate the long-running step, but rather to be aware of its impact and to challenge its existence in the first place. Is the long-running step actually required in every build, or can it be made conditional and executed only when necessary? Can its output be cached and re-used? If it must run, can it be pushed to the end of the build, where it will serialize fewer jobs? Undertaking experiments to measure monoliths usually yields strategies to mitigate their impact.
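One way to make a monolith conditional and push it late, sketched with hypothetical target and tool names:

```make
# The long-running analysis runs only when explicitly requested
# (make WITH_ANALYSIS=1), and only after the image is complete,
# so it serializes as little of the build as possible.
all: image
ifdef WITH_ANALYSIS
all: analysis
analysis: image
	run-analyzer image.bin
endif
```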
7. Bad Dependencies
The last and most difficult problem to address in parallel builds is missing or inaccurate dependencies. The simplest real-world example usually looks like this:
all: foo.o myprogram

foo.o: foo.c
	gcc -c foo.c -o foo.o

myprogram:
	gcc foo.o -o myprogram
This will always build correctly in serial: Make processes prerequisites left-to-right, and so schedules the foo.o target before myprogram. In parallel, however, the lack of a dependency between myprogram and foo.o means that the link and the compile may execute simultaneously. Depending on the timing and the starting conditions, this build may build myprogram correctly; it may fail unexpectedly, with the linker complaining that the object does not exist; or, most insidiously, if the object already exists (as it would on an incremental run), the build may appear to succeed while the executable no longer matches the sources. This last condition is so frustrating and dangerous that many developers are wary of parallel builds in general, fearing incorrect builds. Unfortunately, in very large builds careful inspection can only solve part of this problem. Looking for and fixing missing dependencies when build steps fail unexpectedly is certainly best practice. A more rigorous approach is to centralize all rules and macros, to ensure, for example, that it is impossible to invoke the linker without listing all of its arguments as dependencies. Good examples of this kind of technique can be found in articles that address automatic dependency generation [9]. Finally, ElectricAccelerator from Electric Cloud was designed explicitly to solve the problem of missing dependencies in parallel builds efficiently, by introducing a custom filesystem that detects and corrects dependencies automatically.
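The fix for the example above is the one-line explicit dependency described earlier:

```make
# With the prerequisite declared, Make cannot start the link
# until foo.o is up to date, even under -j.
myprogram: foo.o
	gcc foo.o -o myprogram
```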
Conclusion
Slow software builds can have a serious impact on the productivity of an organization. Distributed parallel builds are the only way to fully address the problem of ever-growing code bases, but their effectiveness is undermined by Makefile and build constructs that introduce serializations. This paper covered seven common problem patterns that naturally occur when engineers reach for simple solutions to complexity but neglect the impact on parallelization. In almost all cases, there are alternative constructs that are functionally equivalent but allow far greater parallelism. Looking for and correcting these problems in embedded software build systems can dramatically reduce software production cycle time, which in turn has a tangible impact on business productivity.
References
1. GNU Make documentation: http://www.gnu.org/software/make/manual/make.html
2. ElectricInsight build analyzer: http://www.electriccloud.com/products/electricinsight.php
3. Notes on declarative programming: http://en.wikipedia.org/wiki/Declarative_programming
4. Windows Mobile build phases: http://msdn.microsoft.com/en-us/library/aa448367.aspx
5. Side Effects in Declarative Languages: http://en.wikipedia.org/wiki/Side_effect_%28computer_science%29
6. Sun CC Template Repositories: http://docs.sun.com/app/docs/doc/819-5267/bkagr?a=view
7. Microsoft Visual Studio PDB files: http://msdn.microsoft.com/en-us/library/yd4f8bd1%28VS.71%29.aspx
8. Strictly speaking, we do not include source configuration checkout as part of the build time; some systems, however, are designed to pull or query additional files from the SCM system on build start, which can lead to a startup monolith.
9. Automatic Dependency Generation: http://make.paulandlesley.org/autodep.html
© 2003-2010 Electric Cloud, Inc. All rights reserved. Electric Cloud, ElectricCommander, ElectricInsight, ElectricAccelerator and Electric Make are registered trademarks of Electric Cloud. Other company and product names may be trademarks of their respective owners.