by Harry M. Sneed, ANECON GmbH, Vienna, Harry.Sneed@T-Online.de

Abstract: This contribution is an experience report on system testing, in particular on the testing of a datawarehouse system. Datawarehouses are large databases used solely for querying and reporting purposes. The datawarehouse in question here was dedicated to fulfilling the reporting requirements of the BASEL-II agreement on the provision of auditing data by banks, the European equivalent of the Sarbanes-Oxley Act. The purpose of the testing project was to prove that the contents of the datawarehouse are correct in accordance with the rules specified to fill them. In the end, the only way to achieve this was to rewrite the rule specifications in a machine-readable form and to transform them into post assertions, which could be interpreted by a data verification tool for comparison of the actual data contents with the expected data contents. The testing project was never fully completed, since not all of the rules could be properly transformed.

Keywords: System Testing, Datawarehouse Testing, Data Transformation Rules, Post-Condition Assertions, Formal Verification.

expect. They might even scan selected database contents to see how they are affected by the test transactions. If any reports are generated, they will trigger the jobs to produce them and check their contents. If there are any doubts about the results, they will consult with the analysts or with the end users. The checking of the output is done on a spot-check basis. Through intuition, or by means of expert domain knowledge, the tester is able to interpret what is correct and what is not. At the center of conventional system testing is the concept of a use case. A system is considered to be functionally tested when all of its use cases, with all of their usage variants, have been tested. The use case is also the basis for defining test cases [1].
One use case may have several test cases, one for each alternate path through the use case. Seldom are all of the possible results checked; to do so would require too much time and effort. This conventional system testing approach has been well defined in the pertinent testing literature [2]. Sometimes it is recommended to automate the filling and checking of the user interface with some kind of capture/replay tool to expedite the test [3], and sometimes it is recommended not to, since automation only clouds the issue and diverts the tester from his responsibility for the accuracy of the test [4]. ANECON had always used the conventional test approach before, and it was believed that a similar approach would work for a datawarehouse project; the problem was only one of finding enough testers to run the jobs and check the results. This belief turned out to be false.
Proceedings of the Testing: Academic & Industrial Conference Practice And Research Techniques (TAIC PART'06) 0-7695-2672-1/06 $20.00 2006
IEEE
instance of the entity being described. Each instance of an entity, i.e. each line of a table, must be uniquely identifiable and distinguishable from all other lines of that table. For that purpose, one or more columns serve as a unique identifier. In the datawarehouse in question there were 266 tables with more than 11,000 attributes, giving an average of some 40 attributes per table. Each attribute was to have a rule describing how that attribute is derived. Attributes can be taken from operational data, they can be computed, or they can be set to a constant value. How this is to be accomplished is described in the attribute rule. For this datawarehouse, the input data was delivered from the various bank subsidiaries scattered throughout Eastern Europe. Since the local bank analysts were the only ones familiar with their data, it was up to them to provide a data model of their operational data together with an English-language description of the individual data elements. These models were merged by the central bank analysts to map the operational data onto the attributes of the datawarehouse. This led to the so-called mapping rules, of which there were, in the end, some 5,317. The goal of datawarehouse testing is to demonstrate that the contents of the datawarehouse are correct. To achieve this means checking the attributes against their rules, i.e. comparing actual values with expected values based on the mapping rules. This could be done with a random sample of all attributes, with a subset of critical attributes, or with the complete set of attributes [5]. As was discovered, the manual effort of checking even a small subset of attributes is so great that even statistical testing becomes very expensive. On the other hand, if the rule verification is done automatically, then it might as well be done for all the attributes, since there is no additional price to pay.
they were. The remainder had to be reformulated in a semi-formal syntax. The syntax was as follows:

assign <Filename>.<Attribute>
   for a 1:1 assignment of an input value
assign <constant>
   for a 1:1 assignment of a constant value
assign <constant>!<constant>
   for a set of alternate values
assign Table.Attribute | <constant> | Table.Attribute_n
   for a concatenation assignment
assign join Table_A.Attribute_A1 | Table_A.Attribute_A2 with Table_B.Attribute_B1 | Table_B.Attribute_B2
   for concatenating values from different data sources
assign Table_A.Attribute_A1 + Table_B.Attribute_B1 * 2;
   for arithmetic expressions. There was no nesting of clauses and there were no precedence rules, so the arithmetic expression was resolved in a simple left-to-right sequence
assign Function (Param_1, Param_2, Param_3)
   whereby the parameters could be attributes in source input files, e.g. Table_C.Attribute_C1, or constant values, e.g. 10.5

With these assignment expressions, enhanced by an if expression of the form

if (Table_A.Attribute_A1 <oper> <operand>)

whereby <oper> := =, <, >, <=, >=, <> etc. and <operand> := <Table>.<Attribute> or <constant>, more than 4,700 rules could be resolved automatically and converted into post-condition assertions, which could then be tested. Of these rules, some 750 were manually adjusted, requiring more than 150 hours of effort. That not more mapping rules were adjusted was due not only to the limited time available for the test, but also to the informal nature of the rules. Some rules were formulated in a manner so confusing that it defied reformulation; there was simply no way to formalize them. It was a fault of the project that the rules were not properly defined to begin with. Had they been formulated at least in some semi-formal form, it would have been possible to process them automatically from the start, without having to spend valuable tester time in rewriting them. (see sample 2)
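To illustrate how such a semi-formal syntax can be parsed mechanically, the following sketch handles only the three simplest rule forms above: a 1:1 attribute assignment, a set of alternate constants, and a constant value. This is a hypothetical illustration in Python, not the project's actual converter, and all names in it are invented.

```python
import re

# Illustrative sketch only: a parser for the three simplest rule forms.
# The real converter also handled concatenation, joins, arithmetic,
# functions and if clauses.

def parse_rule(rule: str):
    body = rule.strip().removeprefix("assign").strip()
    # assign <Table>.<Attribute>  -> 1:1 assignment of an input value
    m = re.fullmatch(r"([A-Za-z]\w*)\.(\w+)", body)
    if m:
        return ("copy", m.group(1), m.group(2))
    # assign <constant>!<constant>!...  -> set of alternate values
    if "!" in body:
        return ("one_of", [v.strip().strip("'") for v in body.split("!")])
    # assign <constant>  -> 1:1 assignment of a constant value
    return ("constant", body.strip("'"))
```

A full converter would add clauses for the concatenation, join, arithmetic and function forms, each mapped to the corresponding post-condition assertion.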
output data. It has been used in previous projects for selective regression testing [7]. Basically, it compares the values of attributes in a new database or new outputs with the values which existed for the same attributes in a previous database or output. For every entity, i.e. database table, file or report, a set of assertions is written, associating the new attributes with the old ones or with manipulations on the old ones. The assertions are of the form

assert new.Attr_1 = old.Attr_2;
assert new.Attr_2 > old.Attr_3;
assert new.Attr_4 < old.Attr_4;

Instead of comparing a new value with an old value, it was also possible to compare a new value with a constant, a set of constants, or a computed value, as depicted below:

assert new.Attr_2 = 100.50;
assert new.Attr_2 = A ! B ! C ! D;
assert new.Attr_2 = old.Attr_2 + 100 / 2;

For this project, the assert statement was extended to include concatenations:

assert new.Attr_3 = old.Attr_3|-|old.Attr_4;

Any assertion could become conditional by qualifying it with an if clause of the form

assert new.Attr_4 = old.Attr_5 if (old.Attr_5 > 100 & old.Attr_5 < 200);

There could be any number of and conditions. Logical or conditions were not allowed; they are expressed in another form, namely by assigning different assertions to the same attribute:

assert new.Attr_5 = old.Attr_6;
assert new.Attr_5 < 100;
assert new.Attr_5 > 200;

If the value of Attr_5 fulfills any of these assertions, it is considered to be correct. The assertions are grouped together into assertion procedures, one per entity, and qualified by a key condition. The key condition matches the keys of the new or output entity with those of the old or input entity:

if (new.key_1 = old.key_1 & new.key_2 = old.key_2 ...);
   assert new.Attr_1 = old.Attr_2;
   assert new.Attr_2 = 0;
   assert new.Attr_3 = old.Attr_3 + 5;
end;

For XML and WSDL, there can be several entity types included in any one output report or response.
Therefore, there must be a separate set of assertions for every entity type. These assertions are qualified by the object name:

if (object = This_object & new.key_1 = old.key_1)
   assert new.Tab.Attr = old.Tab.Attr;
end_object;
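The "or by repetition" semantics described above, where several assertions on the same attribute are alternatives and a value counts as correct if any one of them holds, can be sketched as follows. This is an illustration of the checking logic in Python, not the actual tool; the predicate representation is an assumption.

```python
# Each assertion on an attribute becomes a predicate over the new value
# and the matching old record; the value is correct if ANY predicate holds.

def check_attr(new_val, old_row, predicates) -> bool:
    return any(p(new_val, old_row) for p in predicates)

# assert new.Attr_5 = old.Attr_6;
# assert new.Attr_5 < 100;
# assert new.Attr_5 > 200;
attr_5_asserts = [
    lambda v, old: v == old["Attr_6"],
    lambda v, old: v < 100,
    lambda v, old: v > 200,
]
```

With this representation, a value of 150 passes only if the old record's Attr_6 is also 150; values below 100 or above 200 pass unconditionally.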
A comparison job is started after every test, which compiles the assertions into internal symbol tables and then processes the database tables or output files one by one, comparing the content of each asserted attribute against the result of its assertion. Attributes which do not comply with their assertions are reported as incorrect. In this way, the results of a test can be validated automatically, without having to scan through and check them manually. This method is both faster and more reliable. A typical assertion procedure is depicted among the samples at the end. (see sample 3)
a key condition table with the names and types of the keys to be matched;
a comparison table assigning which new attributes are to be compared with which old attributes and/or constants;
a constant table containing an entry for each constant value used as an operand in the assertions;
a condition table containing all of the conditions to be fulfilled for the assertions to be executed. A pointer links the conditions to the assertions to which they apply.
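These four tables can be pictured roughly as the following data structures. This is a hypothetical Python reconstruction; only the roles of the tables come from the text, while all field names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical reconstruction of the compiler's internal symbol tables.

@dataclass
class KeyCondition:           # which new key is matched against which old key
    new_key: str
    old_key: str
    key_type: str             # e.g. "numeric" or "alpha"

@dataclass
class Condition:              # a single if-clause condition
    operand: str              # e.g. "old.Attr_5"
    operator: str             # one of =, <, >, <=, >=, <>
    value: str                # attribute name or constant

@dataclass
class Comparison:             # one assertion: new attribute vs expected value
    new_attr: str
    expected: str             # old attribute, constant or expression
    condition_ids: List[int] = field(default_factory=list)  # pointer into the condition table
```

The constant table would simply map constant identifiers to their values; it is omitted here.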
DataVal is the final tool in the set. After reading the symbol tables, it first processes the old CSV file, i.e. the inputs, and stores the values in a temporary database with their keys as an index. It takes the attribute tags from the first line of the CSV file and subsequently counts the columns to associate the values with the tags. It then processes the new CSV file, i.e. the outputs, and matches each new record by key to an existing old record. If a match is found, the contents of the new record are compared with the values of the old record, or with the constant in the symbol table, or with computed values, or with concatenated values, or with a set of alternate constant values, or with the lower and upper bounds of a range, depending on the conditions. Thus, there are many ways to verify the correctness of an output value. If no match is found, the old record is considered to be missing. After processing all new records, a second search is made of all the old records in the temporary database to see whether they were compared or not. If not, they are considered to be missing in the new file. A protocol lists all of the incorrect data values, i.e. those that do not comply with their assertions, as well as all missing records. A set of test statistics summarizes the percentages of missing records and incorrect attributes. (see sample 7)
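The key-matching core of this process can be sketched in a few lines. The following is a simplified Python illustration, not the actual DataVal tool; the function names and the shape of the assertions parameter are invented, and only plain equality assertions against values derived from the old record are covered.

```python
import csv

def load_keyed(path, key_cols):
    """Read a CSV file and index its rows by the given key columns."""
    with open(path, newline="") as f:
        return {tuple(row[k] for k in key_cols): row
                for row in csv.DictReader(f)}

def verify(old_path, new_path, key_cols, assertions):
    """assertions maps a new attribute name to a function that computes
    the expected value from the matching old record."""
    old = load_keyed(old_path, key_cols)
    errors, missing, matched = [], [], set()
    with open(new_path, newline="") as f:
        for new_row in csv.DictReader(f):
            key = tuple(new_row[k] for k in key_cols)
            old_row = old.get(key)
            if old_row is None:
                missing.append(("old", key))   # no input record for this output
                continue
            matched.add(key)
            for attr, expected in assertions.items():
                if new_row[attr] != expected(old_row):
                    errors.append((key, attr, new_row[attr]))
    # old records never matched are missing in the new file
    missing += [("new", k) for k in old if k not in matched]
    return errors, missing
```

A protocol and the test statistics described above would then be produced by iterating over the errors and missing lists and computing the percentages.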
Step 3: The SQL procedures were automatically generated from the assertions by the GenSQL tool.
Step 4: The input attributes for each target datawarehouse table were selected from the input tables, joined and downloaded into a CSV file.
Step 5: The output attributes for each target datawarehouse table were downloaded into a CSV file.
Step 6: The assertion procedures were compiled by the tool AsrtComp.
Step 7: The input and output CSV files were matched and the contents of the output file verified against the post assertions by the tool DataVal.
Step 8: The testers examined the data validation results and reported any errors.
contents have to be verified against the specifications. The challenge here is to provide a specification language which will accommodate both goals. Once the mapping rules have been specified, it is the job of the tools to run the tests and verify the database contents. The role of the tester can be compared to that of an engineer on a robot assembly line, monitoring the work of the robots and only intervening when something goes wrong. For this, he should understand the function of the robots without having to do the work himself. Such is the case in datawarehouse testing. In the datawarehouse described here, only 88% of the attributes were actually tested, because 12% of the rules were not verifiable. Nevertheless, those rules that could be verified were verified, and more than 200 incorrect values were identified. This project is a good example of improvising to make the best of a bad situation. It is always difficult to assess the success of a test project. The only objective way of doing it is to compare the errors found in testing with the errors which come up later in production. Since this datawarehouse system has yet to go into production, it is impossible to know how many errors might come up there. If they do, it will be because of incomplete and inconsistent rules. The specification language problem remains the source of most software system errors, and in the case of datawarehouse systems particularly so. It is here where academia could make a significant contribution.
[Figure: Data validation process. Source data and target data are compared by an assertion procedure of the form: if (new.key = old.key) assert new.Attribut = old.Attribut if (<condition>); assert new.Attribut = old.Attribut + wert*wert; The tester reports the errors into error reports.]
References:
[01] Bach, J.: "Reframing Requirements Analysis", IEEE Computer, Vol. 32, No. 6, 2000, p. 113
[02] Hutcheson, M.: Software Testing Fundamentals, John Wiley & Sons, Indianapolis, 2003, p. 12
[03] Fewster, M. / Graham, D.: Software Test Automation, Addison-Wesley, New York, 1999, p. 248
[04] Kaner, C. / Bach, J. / Pettichord, B.: Lessons Learned in Software Testing, John Wiley & Sons, New York, 2002, p. 111
[05] Dyer, M.: "Statistical Testing", in The Cleanroom Approach to Quality Software Development, John Wiley & Sons, New York, 1992, p. 123
[06] Taylor, R.: "Assertions in Programming Languages", in Proc. of NCC, Chicago, 1978, p. 105
[07] Sneed, H.: "Selective Regression Testing of a Host to DotNet Migration", submitted to ICSM 2006, IEEE Computer Society, Philadelphia, Sept. 2006
[08] Koomen, T. / Pol, M.: Improving the Test Process, John Wiley & Sons, London, 1999, p. 7
In the end, a report came out with the data errors, which were then fed into the error tracking system by the testers. Once the rules had been reformulated, the whole testing process could be repeated within a day. Normally such a test cycle would require at least 10 days, so the automation led in this case to a significant improvement in test productivity.
Samples:
Sample 1: An extract from the rule table before the rule has been converted
DR_INTEREST_ID;"Link to TB0_ACCOUNT_INTEREST. Debit interest conditions applicable to the account.";
If REIACD in field DICT (debit) has a value other than 0, the account is linked to an interest group. The following then applies:
REINTD / KEY (Position 3-4) (Interest Type) 2 Alpha
REINTD / KEY (Position 5-9) (Interest Subtype) 5 Alpha
REINTD / KEY (Position 10-12) (Currency) 3 Alpha
The above key fields are concatenated in ID.
If in REIACD the DICT values are zeroes, the account interest condition has to be extracted from the ACCNTAB:
ACCNTAB / DRIB (Debit Base Rate Code)
ACCNTAB / DRIS (Debit Spread Rate)
If both of <> 0 value, extract ACCOUNT_ID
If ACCNTAB / DRIS is available (<> 0), extract the same as for ACCOUNT_ID
If only DRIB of <> value, extract DRIB
Sample 2: An extract from the rule table after the rule has been converted
DR_INTEREST_ID;"Link to TB0_ACCOUNT_INTEREST. Debit interest conditions applicable to the account.";
" ? assign REIACD/DICT | REIACD/DCST | ACCNTAB/CCY | 'D' if REIACD/DICT (debit) <> '0',
assign ACCNTAB/CONCAT if ACCNTAB/DRIS <> '0',
assign ACCNTAB/CCY|ACCNTAB/DRIB if ACCNTAB/DRIB <> '0',
assign '*nomap*' if REIACD/DICT = '00' and ACCNTAB/DRIS = '0' and ACCNTAB/DRIB = '00',
assign ACCNTAB/CNUM|CCY|ACNO|BRCA if other.
(18-digit account Id made up of CNUM length 6, leading zeros length 4, leading zeros + ACSQ length 2, leading zeros + BRCA concatenated).";