
Proceeding of the 3rd International Conference on Informatics and Technology, 2009

AN AUTOMATED TEST DATA GENERATION USING GENETIC ALGORITHM


Mawarny binti Md. Rejab¹, Rohaida Romli², Nooraini Yusof³
College of Arts and Sciences, Universiti Utara Malaysia, Kedah, Malaysia
e-mail: ¹mawarny@uum.edu.my, ²aida@uum.edu.my, ³nooraini@uum.edu.my

ABSTRACT

Software testing is an activity that aims to evaluate quality attributes or determine the capability of a program to
produce the required results. It is costly and demands a large share of the budget in software development, because
it involves several processes, including error debugging, program restructuring and validation of output against
different sets of test data. Much effort and time are spent selecting test data, which sometimes has to be done
manually. The cost can be reduced if these processes are automated. Automated test data generation has therefore
been introduced to automate the creation of program inputs that fulfil the testing criteria. Over the years, a
number of different approaches have been proposed for generating test data. The efficiency and effectiveness of data
generation can be improved by using a genetic algorithm rather than conventional search algorithms. Because a
genetic algorithm can search for an optimal solution under given constraints and requirements, genetic
algorithm-based automatic data generation has gained interest from many researchers. This paper presents an
automated test data generation approach that uses a genetic algorithm to obtain an optimum set of test data
automatically. The approach focuses on providing a set of test data for evaluating the correctness of Java
programming assignments. The initial set of data is obtained by a random method, and an equivalence partitioning
technique is used to generate a set of possible test data from the randomly obtained set. A combinatorial approach
is then used to create the possible combinations of test data. Finally, the genetic algorithm is applied as an
optimization approach to select the optimum set of test data.

Keywords: Test Data, Test Data Generation, Genetic Algorithm.

1.0 INTRODUCTION

Software testing is an activity that aims to evaluate quality attributes or determine the capability of a program to
produce the required results. It is conducted to detect the presence of defects or errors and provides more
assurance of software quality. Software testing is very labor-intensive and expensive; it accounts for
approximately 50% of the cost of software development [14]. To reduce this high cost, various automated testing
tools have been proposed by researchers and practitioners. IBM Rational Robot, Mercury WinRunner, Ranorex, QARun
and TestPartner are some examples of the available automated tools that are becoming accepted in the software
industry. Software testing is a broad term encompassing a wide spectrum of different activities, from the testing of
a small piece of code by the developer (unit testing), to the customer validation of a large information system
(acceptance testing), to the monitoring at run-time of a network-centric service-oriented application [2]. This
study focuses on testing a small piece of code to ensure the correctness of a program by executing it with different
test data. Test data is a subset of elements chosen for use in the software testing process and consists of a
representative set of inputs [4]. It contains a sample of every category of valid and possibly invalid data
conditions. The correctness of the program's behavior is evaluated by running it against several test data.

However, one of the most difficult tasks in software testing is test data generation. Much effort and time are
spent selecting test data, which sometimes has to be done manually. Test data generation is defined as the process
of preparing test data for testing the validity and quality of software output, and the data must also satisfy the
pre-defined testing criteria [16]. This is a crucial and costly part of software testing [4]. Test data generation
requires a relevant and optimum set of data to clear the tested software of possible errors and any unexpected
mistakes that may exist, while taking into account the time and the number of data required. Generating test data
manually is extremely time consuming because the process is tedious, especially for complex programs. Besides, how
well the test data are generated determines the reliability of the testing process. Therefore, numerous attempts
have been made to automate test data generation. The rest of this paper is organized as follows. Section 2 reviews
automated test data generation. Section 3 focuses on automated test data generation using genetic algorithm.
Section 4 gives an overview of the prototype, and Section 5 describes the development of the prototype using
genetic algorithm. Section 6 concludes the paper.

©Informatics '09, UM 2009 RDT1 - 7



2.0 AUTOMATED TEST DATA GENERATION

Over the years, a number of different approaches have been proposed for generating test data. Test data
generation can be categorized into three methods, namely random test data generation, path-oriented test data
generation and goal-oriented test data generation [5]. Random test data generation is the simplest method and the
easiest to apply. Input values are generated randomly until the selected statement is reached. However, this
method does not perform well in terms of coverage. Because it relies on probability, it has a low chance of
finding semantically small faults, and it exercises only a small percentage of the program input. Thus, it is
considered to have the lowest acceptance rate.
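
The random method described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `program_under_test`, the target value 4242, the input range and the try budget are all hypothetical and do not come from any of the cited tools. The sketch draws random inputs until the selected statement is reached or the budget runs out.

```python
import random


def program_under_test(x: int) -> str:
    """Hypothetical program: the selected statement is the 'rare' branch."""
    if x == 4242:
        return "rare"
    return "common"


def random_search(target: str, low: int, high: int, max_tries: int, seed: int = 0):
    """Draw random inputs until the target branch is reached or the budget is spent.

    Returns (input, attempts) on success and (None, max_tries) on failure.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    for attempt in range(1, max_tries + 1):
        candidate = rng.randint(low, high)
        if program_under_test(candidate) == target:
            return candidate, attempt
    return None, max_tries


if __name__ == "__main__":
    # One input in a million reaches the branch, so a small budget will usually fail.
    print(random_search("rare", 0, 999_999, max_tries=1_000))
```

A branch guarded by a single value out of a large input range is reached only by luck, which is the coverage weakness noted above.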

The path-oriented approach is a process of selecting a set of paths that covers all the statements satisfying a
given criterion and then generating input data for each selected path [11][19]. However, this approach does not
guarantee that every path in the program will be exercised, even if the selected paths satisfy the criterion.
Thus, errors in the program's flow of control may remain undetected, and a stronger criterion is required in order
to cover all branches [19]. The path-oriented approach is probably best suited to programs with a small number of
paths to the selected node [5]. Two methods have been proposed to find input data that execute the selected path,
namely symbolic execution and execution-oriented test data generation.

In contrast, the goal-oriented approach to test data generation does not work on whole paths. Its main objective
is to focus on the part of the program that affects the execution of the selected statements and to ignore any
part of the program that does not influence their execution [11]. Thus, this approach leads to the identification
of input values that cause the selected branches in a program to be executed. Two methods have been identified as
extensions of the goal-oriented approach, namely the chaining approach and the assertion-oriented approach. These
approaches have been automated by integrating them with other methods or techniques. The goal-oriented technique
can be automated by using genetic algorithm to search for test data that satisfy the test requirements [17].

Several tools and system prototypes have been proposed by researchers and practitioners. Casegen is a test data
generation system that was implemented as part of the Fortran Automated Code Evaluation System [19]. It was
designed and built to generate test data automatically for testing Fortran programs by using the symbolic
execution technique. TESTGEN was introduced as a test data generation system for Pascal programs [10]. This system
was developed using the chaining approach as an extension of execution-oriented methods of test data generation.
The SELECT system was developed to assist the formal systematic debugging of programs [3]. SELECT systematically
handled the paths of programs written in a LISP subset by using symbolic execution, and it proved to be a
successful tool for automatically finding useful test data. It constructs input data constraints to cover
selected program paths and automatically determines actual input data to drive the test program through the
selected paths. This system is similar to a system called Effigy, which was developed by King and his colleagues
at IBM Research. Effigy is an interactive symbolic execution system for testing and debugging programs written in
a simple PL/I-style programming language [9]. This system used the symbolic execution approach to generate test
data, which were determined by the dependency of the program's control flow on its input. A new automated and
combinatorial software testing tool called Jtst has been developed by several researchers from Universiti Sains
Malaysia (USM). Jtst is a tool for customizing test data generation based on a combinatorial approach. The tool
focuses on performing automated black-box testing without considering the behavioral specification [7].

In program testing, preparing data that satisfy the testing condition is not an easy task. It is a time-consuming
process, especially when dealing with complex programs [12]. With the available methods and tools, automated data
generation makes the task easier. However, the best techniques for obtaining optimum test data are still of
research interest. There is a need for a technique that is able to produce a set of data that fulfils the testing
criteria more accurately. Thus, genetic algorithm is used in automating test data generation because of its
ability to obtain an optimum set of test data.

3.0 AUTOMATED TEST DATA GENERATION USING GENETIC ALGORITHM

The efficiency and effectiveness of data generation can be improved by using genetic algorithm rather than other
conventional search algorithms [18][12][17]. Because Genetic Algorithm can search for an optimal solution under
given constraints and requirements, Genetic Algorithm-based automatic data generation has gained interest from
many researchers. One of the early works on applying Genetic Algorithm to data generation integrated the algorithm
into an optimization process and used the real program as the evaluation function in the search [18]. This method
produced input data that drove the program under test towards the optimum output. The authors concluded that the
Genetic Algorithm-based technique is able to speed up the search by eliminating some weak branches that would
still be traced by a sequential traversal method.


In addition, a Genetic Algorithm-based generator was applied to a fuzzy logic-based temperature control program
comprising 210 lines of C code with 35 conditional rules or constraints [12]. The study successfully generated test
cases that fulfilled those condition decisions. The findings also indicated that Genetic Algorithm-based data
generation outperformed the random method, in that 33 percent of the data generated by the random method failed to
satisfy the program requirements. A program known as GenerateData was developed to generate test data automatically
for some identified programming tasks using genetic algorithm [17]. The experimental results support the findings
of others that Genetic Algorithm-based algorithms outperform the random methods. Furthermore, the study found that
GenerateData was able to produce relevant test data even when the program under test was modified.

As an improvement on the use of Genetic Algorithm alone for search optimization, research then shifted to
combining Genetic Algorithm with other techniques [13]. The findings revealed that the combined Genetic Algorithm
method achieved the highest performance in general by producing test data with wider coverage of program
requirements. Instead of focusing on the use of Genetic Algorithm in data generation, and in parallel with the
growth of machine learning and software engineering procedures, research in the related fields then concentrated
on improving Genetic Algorithm and on combining it with other machine learning techniques.

For instance, a novel work combined Genetic Algorithm and formal concept analysis for automatic test data
generation [8]. The authors developed a generator known as genet that produces test data for branch coverage. It
takes a simpler approach than previous Genetic Algorithm-based automatic test generators but exhibits similar
behavior in general, and it is programming language independent. In genet, Genetic Algorithm is used to search for
tests, and formal concept analysis is used to organize the relationships between tests and their execution traces.
genet learns relationships between branches that provide useful insights for test selection and maintenance,
finding a minimal test set, analysis of test failures and understanding of a program's dynamic control flow.

Meanwhile, combining the parallel search ability of the adaptive Genetic Algorithm (aGA) with the controllable
jumping property of Simulated Annealing (SA) led to a hybrid meta-heuristic algorithm, SAaGA, for automatic
software test data generation [6]. Experimental results on some benchmark programs showed that SAaGA is quite
flexible, gives satisfactory results and requires less running time than aGA and SA. With path coverage as the test
adequacy criterion, a Genetic Algorithm can also be used to automate the generation of test data for white-box
testing [1]. The main aim of that work is to overcome an inefficiency problem encountered in covering multiple
target paths. The authors designed a Genetic Algorithm-based test data generator that is able, in one run, to
synthesize multiple test data to cover multiple target paths. Experimental results from a set of variations of the
generator showed that the developed test data generator is more efficient and more effective than others.

4.0 PROTOTYPE OVERVIEW

JTestGen was developed to provide a formal mechanism for automating the process of test data generation for testing
the numeric data of Java programs. The prototype can serve as a useful tool that assists lecturers in marking
students' programming assignments. It requires little user involvement and little understanding of the program
written by the student. In addition, it provides the lecturer with an optimum set of test data that fulfils the
defined testing criterion, without the lecturer having to be an expert in, or fully understand, the technique of
designing test cases. JTestGen was also developed as a tool to improve the process of evaluating the correctness of
Java programming assignments that was proposed in earlier work [20]. Figure 1 depicts the main interface of
JTestGen. JTestGen enables the lecturer, who is its main user, to set the input specifications by indicating the
number of input variables, the data type and the category of data. After the “Generate Data” button is pressed, the
sequence of new generations of the population is displayed in the text area on the interface. Based on the input
specifications, JTestGen obtains an initial set of data by a random method and uses equivalence partitioning to
generate a set of possible individual test data from the randomly obtained set. The combinatorial approach is then
used to create the possible combinations of test data. Finally, Genetic Algorithm is applied as an optimization
approach to select the optimum set of test data.
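
The combinatorial step can be illustrated with a short Python sketch. It assumes that the step enumerates every combination of the per-variable values (a Cartesian product); the paper does not state whether JTestGen enumerates all combinations or only a subset, and the function name and sample values below are hypothetical.

```python
from itertools import product


def combine(per_variable_values):
    """Return every combination that takes one value from each input variable.

    per_variable_values is a list with one inner list of candidate values
    per input variable of the program under test.
    """
    return [tuple(combo) for combo in product(*per_variable_values)]


if __name__ == "__main__":
    # Two hypothetical numeric inputs, e.g. a mark and a count.
    marks = [0, 50, 100]
    counts = [1, 10]
    for case in combine([marks, counts]):
        print(case)
```

The number of combinations is the product of the list lengths (3 x 2 = 6 here), so it grows quickly with the number of input variables. This growth is one motivation for selecting a smaller set afterwards.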


Fig. 1: JTestGen Interface

5.0 DEVELOPMENT OF PROTOTYPE USING GENETIC ALGORITHM

The initial set of test data is designed using a black-box testing technique known as equivalence partitioning. In
equivalence partitioning, the input domain is divided into classes of data that represent sets of valid and
invalid states. The set of test data to be generated is based on the input specifications. Generally, the test
data is divided into the following categories:
a) Valid test data
b) Invalid test data
c) Illegal test data
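
A minimal Python sketch of the three categories follows. The paper does not define the boundaries between them, so the sketch assumes a numeric input with an allowed range [lo, hi]: values inside the range are valid, numeric values outside it are invalid, and non-numeric values are illegal. The function names and sample values are hypothetical.

```python
import random


def partition_numeric_input(lo: int, hi: int, seed: int = 0) -> dict:
    """Pick a few representatives from each assumed equivalence class."""
    rng = random.Random(seed)
    span = hi - lo + 1
    return {
        # Inside the allowed range, including both boundaries.
        "valid": [lo, rng.randint(lo, hi), hi],
        # Numeric but outside the allowed range, one on each side.
        "invalid": [lo - rng.randint(1, span), hi + rng.randint(1, span)],
        # Not numeric at all (assumed meaning of "illegal").
        "illegal": ["abc", ""],
    }


def classify(value, lo: int, hi: int) -> str:
    """Assign a single value to one of the three assumed classes."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "illegal"
    return "valid" if lo <= value <= hi else "invalid"


if __name__ == "__main__":
    for name, values in partition_numeric_input(0, 100).items():
        print(name, values)
```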

The overall process of test data generation is depicted in Figure 2. Genetic Algorithm is used in this study to
control the permutation by selecting the optimum data set [15]. Genetic Algorithm, commonly known as GA, is an
Artificial Intelligence technique grouped under evolutionary computation. It simulates the process of natural
evolution (e.g. biological chromosomes, X and Y), which includes selection, mutation and reproduction. As a branch
of evolutionary computing, Genetic Algorithm solves problems by optimizing a combination of variables under a
given set of constraints. It uses natural selection and genetics-inspired techniques known as crossover and
mutation. Because Genetic Algorithm is used to select an optimum set of test data, each test data in a cluster
is assigned an appropriate fitness value. The fitness value is chosen based on the cluster of the test data, and
each cluster represents the critical level of its data in the testing process. Two types of testing have been
conducted, namely positive and negative testing. Positive testing emphasizes tests that fulfil the program
requirements, whilst negative testing focuses on tests that might produce unexpected results or errors. To make
the testing process more effective, the design of test cases should reflect both positive and negative testing so
that all possible circumstances are covered.
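
The cluster-based fitness assignment can be sketched as a simple lookup in Python. The paper does not give the actual fitness values, so the numbers in `CLUSTER_FITNESS` are placeholders chosen only for illustration, and the cluster names reuse the three test data categories listed earlier.

```python
# Hypothetical fitness per cluster: a higher value marks a more critical cluster.
CLUSTER_FITNESS = {"valid": 1, "invalid": 2, "illegal": 3}


def total_fitness(test_set) -> int:
    """Sum the cluster fitness over a set of (value, cluster) test data."""
    return sum(CLUSTER_FITNESS[cluster] for _value, cluster in test_set)


if __name__ == "__main__":
    # One positive test (valid) and two negative tests (invalid, illegal).
    sample = [(50, "valid"), (-7, "invalid"), ("abc", "illegal")]
    print(total_fitness(sample))  # prints 6
```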


[Figure 2 (flow diagram): Program Specification -> Design scheme of test case -> Random generation of individual
test data -> Initial set of test data -> Generation of the combination of test data (combinatorial approach) ->
Random selection of test data (initial population) -> Selection of data using GA -> Optimum set of test data]

Fig. 2: The Overall Process of Test Data Generation

Individual test data are generated using a random selection technique. The number of test data to be generated
depends on the number of input variables obtained from the program specification. The second level of test data is
then generated as a set of combinations of test data using a combinatorial approach. The number of combinations
depends on the generated individual test data. This approach is applied repeatedly according to the number of input
variables. After the combinations have been generated, Genetic Algorithm is applied to control the variation of
data by selecting the optimum set of test data that fulfils the defined testing criterion or program specification.
Before Genetic Algorithm can be applied, a random technique is used once more to select some of the test data from
the collection of combinations as the initial population, which comprises chromosomes of size N. Then, two
chromosomes (parents) are selected from the population for crossover. A mutation process is performed randomly on
some of the chromosomes in the population. Mutation, which represents a change in a gene, occurs rarely in nature
[14]. The mutation process flips a randomly selected gene in a chromosome. Mutation can occur at any gene in a
chromosome with some probability, typically in the range of 0.001 to 0.01. The Genetic Algorithm process continues
by producing new generations of the population until the termination criterion is met. Typically, when all
chromosomes of the population produce the same total fitness value, the termination criterion (that is, the end of
the optimization process) can be assumed to be met.
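
The loop described above can be sketched in Python as follows. This is not the JTestGen implementation; it is a minimal illustration under several assumptions that the paper does not specify. A chromosome is a list of N indices into the pool of combined test data, parents are chosen by fitness-proportional selection, crossover is single-point, and mutation replaces a gene with a random index (the analogue of a bit flip for this encoding). The sketch also keeps the best chromosome of each generation and caps the number of generations. All names and parameter values are hypothetical.

```python
import random


def fitness(chromosome, weights):
    """Total fitness of the distinct test data that a chromosome selects."""
    return sum(weights[i] for i in set(chromosome))


def crossover(a, b, rng):
    """Single-point crossover producing two children (needs at least 2 genes)."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chromosome, pool_size, rate, rng):
    """Replace each gene by a random pool index with probability `rate`."""
    return [rng.randrange(pool_size) if rng.random() < rate else gene
            for gene in chromosome]


def evolve(weights, n_genes=4, pop_size=20, rate=0.01, max_generations=200, seed=0):
    """Evolve chromosomes of n_genes indices into a pool with positive weights."""
    rng = random.Random(seed)
    pool_size = len(weights)
    # Initial population: random picks from the pool of combined test data.
    population = [[rng.randrange(pool_size) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _generation in range(max_generations):
        scores = [fitness(c, weights) for c in population]
        # Termination: every chromosome has the same total fitness.
        if len(set(scores)) == 1:
            break
        # Keep the current best chromosome unchanged (an added assumption).
        next_gen = [max(population, key=lambda c: fitness(c, weights))]
        while len(next_gen) < pop_size:
            parent1, parent2 = rng.choices(population, weights=scores, k=2)
            child1, child2 = crossover(parent1, parent2, rng)
            next_gen.append(mutate(child1, pool_size, rate, rng))
            next_gen.append(mutate(child2, pool_size, rate, rng))
        population = next_gen[:pop_size]
    best = max(population, key=lambda c: fitness(c, weights))
    return best, fitness(best, weights)


if __name__ == "__main__":
    # Hypothetical cluster fitness values for a pool of eight combined test data.
    pool_weights = [1, 2, 3, 1, 2, 3, 1, 2]
    print(evolve(pool_weights))
```

With a mutation rate as low as 0.01, most children differ from their parents only through crossover, so the population tends to converge on one fitness value. The generation cap guards against runs in which it does not converge.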

6.0 CONCLUSION

The prototype was developed to provide a formal mechanism for automating the process of test data generation for
testing the correctness of basic Java programming assignments. Integrating Genetic Algorithm as a control mechanism
for selecting an optimum set of relevant data not only reduces the time required but also enhances the efficiency
of selecting data that fulfil the defined testing criterion. Therefore, this may improve on existing test data
generation techniques, which sometimes do not meet all testing requirements.

REFERENCES

[1] Ahmed, M.A. & Hermadi, I., “GA-based Multiple Paths Test Data Generator”, Computers & Operations
Research, pp. 3107-3124, 2008.

[2] Bertolino, A., “Software Testing Research: Achievements, Challenges and Dreams”, Future of Software
Engineering, IEEE, pp. 85-103, 2007.

[3] Boyer, R.S., Elspas, B. & Levitt, K.N., “SELECT - A Formal System for Testing and Debugging Programs by
Symbolic Execution”, ACM SIGPLAN Notices, 1975.


[4] Chu, H.D., Dodson, J.E. & Liu, I.C., “FAST - A Framework for Automating Statistic-based Testing”. Available
at: http://citeseer.nj.nec.com/73306.html, 1997.

[5] Ferguson, R. & Korel, B., “The Chaining Approach for Software Test Data Generation”, ACM
Transactions on Software Engineering and Methodology, 5(1), pp. 63-86, 1996.

[6] Gao, H., Feng, B. & Zhu, L., “A Kind of SAaGA Hybrid Meta-heuristic Algorithm for the Automatic Test
Data Generation”, IEEE, Vol. 1, pp. 111-114, 2005.

[7] Kamal, Z.Z., Norashidi, M.I., Mahamed Fadel, J.K. & Siti Norbaya, A., “A Tool for Automated Test Data
Generation (and Execution) Based on Combinatorial Approach”, 1(1), pp. 19-36, 2007.

[8] Khor, S. & Grogono, P., “Using a Genetic Algorithm and Formal Concept Analysis to Generate Branch
Coverage Test Data Automatically”, Proceedings of the 19th International Conference on Automated
Software Engineering (ASE’04), IEEE, 2004.

[9] King, J.C., “A New Approach to Program Testing”, ACM SIGPLAN Notices, 10(6), pp. 228-233, 1975.

[10] Korel, B., “Automated Test Data Generation for Programs with Procedures”, ACM SIGSOFT Software
Engineering, 21, pp. 209-215, 1996.
[11] Korel, B. & Ali, M.A., “Assertion-Oriented Automated Test Data Generation”, Proceedings of the 18th
International Conference on Software Engineering, pp. 71-80, 1996.

[12] Michael, C.C., McGraw, G.E., Schatz, M.A. & Walton, C.C., “Genetic Algorithms for Dynamic Test Data
Generation”, IEEE, pp. 307-308, 1997.

[13] McGraw, G., Michael, C. & Schatz, M., “Generating Software Test Data by Evolution”, Technical
Report RSTR-018-97-01, RST Corporation, Sterling, 1998.

[14] Myers, G.J., The Art of Software Testing. New York: John Wiley and Sons, 1979.

[15] Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems. Addison-Wesley, Pearson
Education Limited, Essex, England, 2002.

[16] Offutt, A.J., Clark, J., Zhang, T. & Tewary, “Experiments with Data Flow and Mutation Testing”.
Retrieved April 12, 2007, from http://citeseer.nj.nec.com/offutt94experiments.html, 1997.

[17] Pargas, R.P., Harrold, M.J. & Peck, R.R., “Test Data Generation Using Genetic Algorithms”, Journal
of Software Testing, Verification and Reliability, 9, pp. 263-282, 1999.

[18] Pei, M., Goodman, E.D., Gao, Z. & Zhong, K., “Automated Software Test Data Generation Using a
Genetic Algorithm”, Michigan State University, 1994.

[19] Ramamoorthy, C.V., “The Automated Generation of Program Test Data”, IEEE Transactions on
Software Engineering, 2(4), pp. 293-300, 1976.

[20] Rohaida, R., Cik Fazilah, H. & Mazni, O., “Correctness Assessment of Java Programming
Assignment”, Laporan Akhir Geran Penyelidikan Fakulti, Universiti Utara Malaysia, 2004.

BIOGRAPHY

Mawarny binti Md. Rejab obtained her Master of Computer Science from Universiti Teknologi Malaysia in 2003.
Currently, she is a lecturer at the College of Arts & Sciences (Information Technology), Universiti Utara
Malaysia. Her research areas include program analysis, software metrics, and software testing. She has
published a number of papers related to these areas.

Rohaida binti Romli is a lecturer at the College of Arts & Sciences (Information Technology), Universiti Utara
Malaysia. Her research areas include software testing, program analysis, software metrics and software quality.
Currently, she is pursuing her PhD at Universiti Sains Malaysia, focusing on test data generation.

Nooraini binti Yusof is a lecturer at the College of Arts & Sciences (Information Technology), Universiti Utara
Malaysia. Her research focuses on Artificial Intelligence, including neural networks, agents and genetic
algorithms. She has published a number of papers related to these areas.
