
A Case for Continuous Automated Testing Blended with Continuous Refactoring, Instead of Test Driven Development

A Research Paper by Robert Holzhauser


Regis University College of Professional Studies

MSSE 600 - Object Oriented Software Engineering

8-24-2014

ABSTRACT
Over roughly the last 20 years, spanning 1994 to 2014, two related practice disciplines have emerged in the software engineering literature as being valuable for delivering quality software on schedule and with very low defect rates. These two practices are Test Driven Development (TDD) and refactoring. As these practices mature, even the creators of TDD and refactoring now say that a test first methodology is optional; what they consider essential are the practices of continuous automated testing and refactoring.

Introduction
Although a form of the practice was originally created at NASA in the late 1960s (McClure, 1968), with roots going back even earlier, Test Driven Development is generally credited to Kent Beck, who re-created and popularized it starting in the mid-1990s. In a striking internal study in 2003, IBM found that it was able to reduce its defect rate by 50% with minimal impact on developer productivity (Maximilien, 2003).

Early results like this got the attention of both the academic and business communities. However, most studies have not adequately measured developer compliance with the TDD steps, and have not taken into account the complexity of refactoring. In this paper I will argue that improvements to code quality previously attributed to TDD may actually have come from the combination of continuous automated testing and refactoring, rather than from the practice of writing tests before code.

Analysis
Test Driven Development is a way of approaching code construction that consists of three primary steps:
1) Write a test for a piece of new code that you are about to write.
2) Write the code, and refine as needed until the test from step 1) passes.
3) Refactor, improving the code without changing the functionality.
My personal experience is that practicing Test Driven Development (TDD) seems counterintuitive, but it is sometimes useful in clarifying what I am doing as I code, at least at first. My natural inclination as a programmer is to write some code and then check whether it is right. TDD flips that idea on its side and says, "Write the test before you write the code."

According to Kent Beck in his book Test Driven Development by Example (Beck, 2003), the two most important rules for a developer to follow when writing code are: first, always write an automated test that fails before writing any code; and second, once that has been done, refactor to remove any duplication. Beck sometimes describes this as "red, green, refactor," referring to the way testing tools such as JUnit display failed tests as red and passed tests as green.
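For illustration, a single red, green, refactor cycle might look like the following minimal sketch in Java with JUnit 4; the Account class and its deposit method are hypothetical examples invented for this paper, not code from Beck's book.

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // Red: this test is written first and fails because Account does not exist yet.
    public class AccountTest {
        @Test
        public void depositIncreasesBalance() {
            Account account = new Account();
            account.deposit(50);
            assertEquals(50, account.getBalance());
        }
    }

    // Green: the simplest code that makes the test pass.
    class Account {
        private int balance = 0;

        public void deposit(int amount) {
            balance = balance + amount;
        }

        public int getBalance() {
            return balance;
        }
    }

    // Refactor: with the test still green, duplication and unclear names can now
    // be cleaned up without changing the observable behavior that the test checks.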
Beck gives a number of reasons for the test first approach. Several of them concern the developer's stress level and state of mind. His belief is that TDD provides a positive feedback loop that increases confidence and task enjoyment, simultaneously decreasing stress and keeping the developer in a peak performance zone. The other compelling reason Beck gives for testing first is that devising a test for a piece of code requires a good deal of clarity about how that code will be implemented (Beck, 2003).
Several studies have compared TDD, sometimes called Test First Development, with another style of unit testing called Test Last Development, or TLD. In TLD, the developer:
1) Writes a piece of new code.
2) Writes and runs a test for the new code, modifying the code as needed until it passes the test.
3) Refactors (Munir et al., 2014).
To gain familiarity with TDD for this paper, I undertook a very small project of building
an app in Java for creating a sample of random numbers. My personal experience was that I
tended to fall into a rhythm of coding, testing, starting to refactor, and realizing that I had done
things out of sequence. Then, I adjusted the next iteration back to test first, but by the
subsequent iteration, I had slipped back into a TLD rhythm. This was probably just a matter of my training and my needing to build the habit of TDD. It was actually much more difficult for me to follow the TDD steps than I had ever expected.
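To give a concrete sense of the kind of test I was iterating on, a representative sketch follows; the RandomSample class name and its interface are illustrative stand-ins rather than my actual project code.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import org.junit.Test;

    public class RandomSampleTest {
        // Written before the sampling code in the TDD iterations,
        // and after it in the iterations where I slipped back into TLD.
        @Test
        public void producesRequestedNumberOfValuesWithinRange() {
            RandomSample sample = new RandomSample(10, 1, 6); // size, min, max
            assertEquals(10, sample.values().size());
            for (int value : sample.values()) {
                assertTrue(value >= 1 && value <= 6);
            }
        }
    }

    // The simplest implementation that satisfies the test above.
    class RandomSample {
        private final List<Integer> values = new ArrayList<>();

        RandomSample(int size, int min, int max) {
            Random random = new Random();
            for (int i = 0; i < size; i++) {
                values.add(min + random.nextInt(max - min + 1));
            }
        }

        List<Integer> values() {
            return values;
        }
    }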
However, I noticed that as soon as I had clarity about how I would test a piece of code, I had also envisioned what that code would be, and I therefore wanted to move forward with implementing the code rather than the test. This raised the question for me of whether studies that used developers who were not accustomed to, and not in the habit of, the Test, Code, Refactor sequence of TDD have really evaluated a pure TDD sequence at all.
After using pure TDD, or a hybrid of TDD and TLD, I did feel the increase in confidence and enjoyment that Beck described. As far as it goes, however, this boost in confidence and enjoyment was a nice-to-have, not a must-have that noticeably improved the code. I also believed that my code was solid, and it seemed that I completed it at least a bit faster than I would have without some form of continuous testing and refactoring. As interesting as my little experiment was, it was only a start. A sample size of one, without much scientific rigor, does not make good science.
The Hammond and Umphress 2012 review of the literature on TDD studies confirms my experience: "Alternately, since TDD is not well defined, it is possible that some respondents may be incorrectly claiming to use TDD" (Hammond, 2012). This literature review also found that many developers perceive TDD as too difficult, too different, or requiring too much discipline compared to what they normally do. These developers had difficulty adopting a TDD mindset, or they frequently made mistakes in following the TDD protocol. Furthermore, the review reports that some developers feel TDD can lead to overall architecture mistakes, even when the process is followed rigorously.

Hammond and Umphress went on to say that the studies on TDD, while finding some benefit, had found only marginal overall benefit in productivity, internal code structure, or external code structure (Hammond, 2012). They ventured that a plausible reason for the variance between developers is that the studies may have had invisible conformance issues (Hammond, 2012).
Seemingly in response to this, Finnish researchers Fucci, Turhan, and Oivo conducted a study similar to previous ones, but this time they introduced a measure of conformance to the TDD methodology (Fucci, 2014). They concluded that there is no significant difference between TDD and TLD (Fucci, 2014). However, there was a high degree of variability in the results obtained by those with a high degree of conformance. The researchers speculated that other factors, possibly skill, accounted for the variance (Fucci, 2014).
Reading this study makes me wonder which skills they might be referring to. The sub-group with high conformance must be adequately skilled at Test Driven Development in order to stick to the process, unless there is a problem with how conformance was measured. If we assume that their method of measurement is acceptable and that their conjecture about skill as the missing covariate is correct, we are still left with the question of which skill or skills caused the variance.
Beck's original work on TDD explicitly devotes around one third of its pages to refactoring and treats it as a background theme for most of the book (Beck, 2003). Of the steps Red, Green, and Refactor, the third is the most elusive. While sometimes difficult to implement, the first two steps, write a failing test and write code to make it pass, are in my opinion conceptually straightforward. Refactoring involves one or more maneuvers known as refactorings, which are specific ways of moving the code toward a better design, but these maneuvers are not necessarily intuitive or obvious.
The abundance of literature on refactoring alone, even though it is only a sub-step of both
TDD and TLD, supports my suggestion that it is the most complicated piece of TDD. One such
book is the classic Refactoring by Martin Fowler. Fowler has the distinction, along with Beck
and others, of being instrumental in bringing the Agile Manifesto into being and of belonging to
the project team from which many of the practices of eXtreme Programming emerged.
Refactoring is defined by Fowler as "a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior" (Fowler, 2002). Refactoring.com, a site maintained by Fowler, has a catalog of over 90 refactoring patterns. So we start to see a rather different picture. While the three steps of TDD listed at the beginning of this paper seem simple and straightforward, the inclusion of refactoring actually makes the TDD process far more complex.
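To give a sense of what a single catalog entry involves, here is a small sketch of Extract Method, one of the better-known refactorings; the invoice-printing code is invented for this paper rather than taken from Fowler's catalog.

    // Before: one method mixes a calculation with printing.
    class Invoice {
        double price;
        double taxRate;

        void print() {
            double total = price + (price * taxRate);
            System.out.println("Total due: " + total);
        }
    }

    // After Extract Method: the calculation gets its own intention-revealing
    // method, making the code easier to understand without changing its
    // observable behavior.
    class InvoiceRefactored {
        double price;
        double taxRate;

        void print() {
            System.out.println("Total due: " + totalWithTax());
        }

        private double totalWithTax() {
            return price + (price * taxRate);
        }
    }

Even this simplest of maneuvers requires recognizing when it applies, which is exactly the judgment the catalog by itself cannot supply.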
Perhaps we could restate TDD as: Red, Green, and then appropriately select the applicable refactoring patterns from a list of over 90. In this light, TDD is suddenly not the simple creature it appears to be at first glance. It is not so simple a task as iterating through steps 1, 2, 3 over and over again. Even Beck concedes this point in Fowler's book, suggesting that refactoring is not easy to learn and implying that there are implicit meta-patterns for knowing which of the refactoring patterns to apply. Developers come to apply these patterns effectively only with substantial experience in using them (Fowler, 2002).
Beck, in his chapter 15 contribution to Fowler's Refactoring, has this to say on the topic: "The list of techniques is only the beginning. It is the gate you must pass through. Without the techniques, you can't manipulate the design of running programs. With them, you still can't, but at least you can start." Beck continues, "Why are all these wonderful techniques really only the beginning? Because you don't yet know when to use them and when not to, when to start and when to stop, when to go and when to wait. It is the rhythm that makes for refactoring, not the individual notes" (Fowler, 2002).
With a catalog of over 90 refactoring maneuvers to understand and apply appropriately, there is without question a great deal of refactoring know-how with which to be conversant. Fortunately, we now also have IDEs and other automated tools that support a number of these operations.
Based on the lack of compliance measures in most studies, I do not believe that studies to date have validated the exact sequence of TDD: write a test, write code, and refactor. Rather, what has probably been shown is the effect of combining two components, 1) writing tests for each piece of code and 2) writing the code itself, without those components necessarily being performed in that sequence.
Measuring the full three steps, including refactoring, will require a very different methodology: assessing or teaching programmers the refactoring patterns and somehow tracking the refactoring that was done or attempted. To do less than this is a disservice to TDD; we would be thinking that we have studied an apple when we have actually studied an orange. Admittedly, explicitly including refactoring in an experiment is a much more difficult undertaking. On the other hand, doing so would help to validate the exact definition of TDD.
Modern IDEs such as NetBeans and Eclipse are able to perform some refactorings. Judging from the refactoring menu in NetBeans 7.4, it is capable of performing approximately 15 refactorings. Fifteen out of ninety is certainly a good start. A 2008 review of refactoring tools showed that some can perform as many as 24 refactorings (Huiqing, 2008). So I anticipate continual improvement in the refactoring support available from automated tools, largely because of the number of papers and conference proceedings on the topic of automated refactoring. For now, though, we cannot assume that we can leave refactoring entirely up to the IDE or any other specialized refactoring software.


The general consensus in the software engineering community appears to be that Fowler, Beck, and others are right about the importance of refactoring. With IDE support, specialized refactoring tools, and dozens of papers investigating ways to automate it, refactoring is easier than ever to accomplish. Yet, even though the tooling is helpful, developers should refactor with awareness rather than blind faith in the tools. As Abadi et al. found in an attempt to recode a Java servlet into the Model View Controller pattern, "the whole conversion could be described as a series of refactorings," yet most of these were inadequately supported by the IDE, and some were not supported at all (Abadi, 2008).
As described above, refactoring is inherently a complex activity (Abadi, 2008). Doing it well therefore requires a degree of knowledge, skill, and experience, which can also be augmented by automation.
There are two times when developers refactor: 1) during the development process as design problems are discovered, and 2) when the software has become unhealthy. According to Eclipse usage data, the second scenario is extremely rare (Murphy-Hill & Black). The first scenario, of course, is the one advocated by TDD. This kind of refactoring goes beyond fixing bugs and cleaning up code; fixing bugs and cleaning up code can often be done without changing the design. Refactoring is inherently about design improvement.
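To illustrate the distinction, consider a sketch of a design-level refactoring, Replace Conditional with Polymorphism; the shipping example below is invented for this paper rather than drawn from any of the cited sources.

    // Before: every new shipping option forces another branch into this method.
    class ShippingCalculator {
        double cost(String method, double weight) {
            if (method.equals("ground")) {
                return 1.0 * weight;
            } else if (method.equals("air")) {
                return 2.5 * weight;
            }
            throw new IllegalArgumentException("Unknown method: " + method);
        }
    }

    // After: the design improves because each option becomes its own class,
    // and adding a new option no longer requires editing existing code.
    interface ShippingMethod {
        double cost(double weight);
    }

    class GroundShipping implements ShippingMethod {
        public double cost(double weight) { return 1.0 * weight; }
    }

    class AirShipping implements ShippingMethod {
        public double cost(double weight) { return 2.5 * weight; }
    }

Neither version fixes a bug, and both behave identically; the change is purely an improvement to the design.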
My hypothesis is that while some developers may see some gains from going test first, the most important thing to be gleaned from TDD is to constantly write automated unit tests for every piece of code and to continually look for opportunities to refactor. Whether the test is built before the code is constructed is more a matter of personal preference. The crucial discipline is to write automated tests for nearly every piece of code that you intend to put into production. That includes writing tests for every class and every method. As you go along building your program, stop at each step and look for ways to refactor it.
To phrase this more succinctly: Always Test and Refactor Everything as you go. Can this discipline be effective if it is only "Always Test Everything"?
Most of the experiments that have been done with TDD, due to a lack of tight controls around process, may essentially have been measuring the benefits of some variation of the more general practice of unit testing. Recent reviews have questioned how closely earlier studies actually measured TDD. The discipline of testing to verify that code works is something that can be both taught and measured fairly easily. Some of the initial excitement about TDD came from positive results in studies which, in retrospect, were at least measuring some kind of continuous automated testing, if not strictly TDD. It is likely that this initial excitement was misattributed.
To restate, the three practices are: 1) creating a library of automated regression tests that covers every piece of code of significance, written immediately before, during, or after its development; 2) running the library of tests after implementing any new code, and not moving forward until all tests pass; and 3) continuously looking for refactoring opportunities, both with tool assistance and manually.
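As one way to picture the second practice, a JUnit 4 test suite can serve as a single entry point that runs the whole regression library after every change; the test class names below are placeholders taken from the earlier sketches in this paper.

    import org.junit.runner.RunWith;
    import org.junit.runners.Suite;

    // One entry point that runs the entire regression library.
    // A change is only finished when every class listed here passes.
    @RunWith(Suite.class)
    @Suite.SuiteClasses({
        AccountTest.class,
        RandomSampleTest.class
    })
    public class RegressionSuite {
    }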
Based on recent internet conversations that Beck, Fowler, and others have engaged in, it seems that they would now agree that the above three steps are what is crucial to take from TDD (Fowler, 2014). In this discussion, Beck states that he has no problem mixing the styles of TDD and TLD and that he has done so on at least one recent project.
Some have taken TDD to mean using a lot of mock objects in the code, something that can potentially damage code quality (Fowler, 2014). This was a technique Beck demonstrated at length in his book (Beck, 2003). In the recent conversations, Beck said not only that mocks are not necessary, but that he rarely uses them (Fowler, 2014).
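For readers unfamiliar with the term, the following is a sketch of a mock-heavy test of the style the conversation cautions against; the PaymentGateway interface and the hand-rolled mock are invented here purely for illustration.

    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    // A collaborator that would normally talk to an external system.
    interface PaymentGateway {
        boolean charge(double amount);
    }

    // A hand-rolled mock that records how it was called instead of doing real work.
    class MockPaymentGateway implements PaymentGateway {
        boolean chargeWasCalled = false;

        public boolean charge(double amount) {
            chargeWasCalled = true;
            return true;
        }
    }

    class Checkout {
        private final PaymentGateway gateway;

        Checkout(PaymentGateway gateway) {
            this.gateway = gateway;
        }

        void completeOrder(double amount) {
            gateway.charge(amount);
        }
    }

    public class CheckoutTest {
        @Test
        public void checkoutChargesTheGateway() {
            MockPaymentGateway gateway = new MockPaymentGateway();
            new Checkout(gateway).completeOrder(25.0);
            // The assertion is about the interaction, not a real outcome, which is
            // why heavy use of mocks can couple tests to implementation details.
            assertTrue(gateway.chargeWasCalled);
        }
    }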
They further say that when less experienced developers do TDD, they often do not refactor enough, which leads to sub-optimal designs. They also point out that it is not accurate to compare an inexperienced developer's work and productivity to that of an experienced developer (Fowler, 2014).
One postulate they explored was TDD as the gateway to self-testing code. This implies that what the creators of the TDD movement now value most about TDD is that it gives developers an automated set of regression tests. They go on to say that there are types of coding for which TDD is not the best choice.

Conclusion
The many studies that have been done on TDD have generally shown it to have a positive impact on code quality. However, some of those studies now appear to have been measuring the benefit of pairing each new piece of code with an automated test and of running the library of automated tests. Studies have also shown that some developers have a difficult time adjusting to the test first sequence of TDD. Furthermore, the same studies have mostly glossed over the refactoring step, leaving open the possibility of wide variation in how, and even whether, developers implemented it.
Therefore, I theorize that the benefit of TDD can be achieved through the more flexible approach of combining continuous automated regression testing with continuous refactoring.

References
Abadi, A., Ettinger, R., & Feldman, Y. (2008). Reapproaching the Refactoring Rubicon. Nashville, TN: ACM.
Beck, K. (2003). Test Driven Development by Example. Boston, MA: Addison-Wesley.
Beck, K., Fowler, M., & Heinemeier Hansson, D. (2014). Is TDD Dead? A series of conversations between Kent Beck, David Heinemeier Hansson, and Martin Fowler on the topic of Test-Driven Development (TDD) and its impact upon software design. Retrieved July 2014 from http://martinfowler.com/articles/is-tdd-dead/
Dig, D. (2008). Refactoring.info. Retrieved from http://refactoring.info
Fontana, F., & Spinelli, S. (2011). Impact of Refactoring on Quality Code Evaluation. Honolulu, HI: ACM.
Fowler, M. (2000). Refactoring: Improving the Design of Existing Code. Addison-Wesley.
Fucci, D., Turhan, B., & Oivo, M. (2014). Conformance Factor in Test-Driven Development: Initial Results from an Enhanced Replication. London, United Kingdom: ACM.
Huiqing, L., & Simon, T. (2008). Tool Support for Refactoring Functional Programs. Nashville, TN: ACM.
Jeffries, R., & Melnik, G. (2007). TDD: The Art of Fearless Programming. IEEE Software.
Kerievsky, J. (2004). Refactoring to Patterns Catalog. Retrieved July 2014 from http://www.industriallogic.com/xp/refactoring/catalog.html
Maximilien, E. M., & Williams, L. (2003). Assessing Test-Driven Development at IBM. Proceedings of the 25th International Conference on Software Engineering. IEEE.
McClure, R., Bauer, F. L., Bolliet, L., Helms, H. J., Naur, P., & Randell, B. (1968). Software Engineering: Report on a conference sponsored by the NATO Science Committee, Garmisch, Germany.
Murphy-Hill, E., & Black, A. (n.d.). Why Don't People Use Refactoring Tools? Portland State University. Retrieved July 2014 from http://people.engr.ncsu.edu/ermurph3/papers/wrt07.pdf
Percival, J., & Harrison, N. (2013). Developer Perceptions of Process Desirability: Test Driven Development and Cleanroom Compared. 2013 46th Hawaii International Conference on System Sciences.
Sierra, K., & Bates, B. (2012). Head First Java: A Learner's Guide (2nd ed.). O'Reilly Media, Inc.
Umphress, D., & Hammond, S. (2012). Test Driven Development: The State of the Practice. Tuscaloosa, AL: ACM.
Wnuk, K., Munir, H., Petersen, K., & Moayyed, M. (2014). An Experimental Evaluation of Test Driven Development vs. Test-Last Development with Industry Professionals. London, United Kingdom: ACM.
