Sei sulla pagina 1di 30

A Framework for Testing Database Applications

David Chays Polytechnic University Brooklyn, NY Joint work with Phyllis G. Frankl (Polytechnic) Saikat Dan (Polytechnic) Filippos Vokolos (Lucent Technologies) Elaine J. Weyuker (AT&T Labs - Research)

Motivation
Database systems play an important role in virtually every modern organization Faults can be very costly Programmers/testers may lack experience and/or time Little attention has been paid to DB application program correctness

Outline of Talk
Background Aspects of DB system correctness Issues in testing DB application programs Architecture of tool set Tool for generating database states Additional issues and approaches

DBMS and DB application


Database Management System DB schema, eg., Emp(ssn, name, addr, sal) Dept(id, dept-name)

DB

DB application, eg., /* C program with embedded SQL*/

Relational databases
Data is viewed as a collection of relations
relation schema relation (relation state)

Tables, tuples, attributes, constraints for example, create table S (ssn char(11) primary key,
name char(25) not null)

Table S

ssn 001-00-0356 012-34-5678 036-54-5555 051-88-9911

name Johnson Smith Jones Blake

Aspects of Correctness
Does the DBMS perform all operations correctly? Is concurrent access handled correctly? Is the system fault-tolerant? ... Does the application program behave as intended?

Traditional vs. DB programs


input input DB state

output function imperative nature

output

DB state

function declarative nature

Example of an Informal Specification


Customer-feature table:
customerID address features ...

Billing table
customerID billing plan ...

Input customer ID and name of feature to which the customer wishes to subscribe. Invalid ID: return 0 feature unavailable in that area: return code 2 feature available but incompatible with existing features: return code 3 else update customers feature record, update billing table, return code 1

What are the Input/Output Spaces?


Nave approach
I = {customer-IDs} X {feature-names} 0 = {0,1,2,3}

More suitable approach:


I = {customer-IDs} X {feature-names} X {database-states} 0 = {0,1,2,3} X {database-states}

Problem:
must control and observe the DB state

DB Application Testing Goal


Select interesting DB states along with user inputs that exercise interesting behavior Cover wide variety of situations that could arise in practice Do so in a way that facilitates checking of output to user and resulting DB state

Situations to Explore
Customer already subscribes to that feature Feature not available in customers area Feature available, but incompatible with other features customer already has Feature available and compatible with existing features Customer doesnt yet subscribe to any features ...

May involve interplay between several tables


Table 1: incompatible features feature incompatible_feature F1 F2 ... ... feature area F1 11235 F2 11235 ... ... area F1 F2 ... FN 11235 ...

Table 2: features available in various areas Table 3: customers and features

ID 011 ...

Will Live Data Suffice?


May not reflect sufficiently wide variety of situations May be difficult to find the situations of interest May violate privacy or security constraints

Generating Synthetic Data


DB state is a collection of relation states, each of which is a subset of the Cartesian product of some domains Generating domain elements and gluing them together isnt enough, since constraints must be honored We attempt to generate interesting data that obey integrity constraints Use schema and user supplied info

DB schema App source

Suggestions from tester DB state State Checker

State Generator

Input Generator User input App exec Output Checker Output

Results

DB state generator
Inputs DB schema (in SQL) Parses schema to derive info about
attributes tables constraints : uniqueness, not-NULL, referential integrity inputs additional info from user suggested attribute values, divided into groups, similar to Category-Partition Testing [OstrandBalcer] additional annotations

Example Schema
create table s (sno char(5), sname char(20), status decimal(3), city char(15), primary key(sno)); create table p (pno char(6) primary key, pname char(20), color char(6), weight decimal(3), city char(15)); create table sp (sno char(5), pno char(6), qty decimal(5), primary key(sno,pno), foreign key(sno) references s, foreign key(pno) references p);

Stmt
Create Stmt Nodetag type = T_CreateStmt relname = s

Create table s( sno char(5), primary key(sno) );


Column Definition Nodetag type = T_ColumnDef colname = sno type name = bpchar Constraints = NIL Table Constraint Nodetag type = T_Constraint contype = CONSTR_PRIMARY keys

T_IDENT name = sno

Stmt
Create Stmt Nodetag type = T_CreateStmt relname = s

Create table s( sno char(5) primary key );


Column Definition Nodetag type = T_ColumnDef colname = sno type name = bpchar Constraints contype = CONSTR_PRIMARY

0
globalTablePointer

1 2 3

sno | F| F| char |F| F| |F| | ~nn | S | F| F| pr un sname | F| F|char | F| F| |F| | ~nn | S | F| F| ~pr ~un status | |F| F|decF|| ~pr | F| | ~nn S | F| F| F| ~un city | F| F|char | F| F| |F| | ~nn City | S | F| F| ~pr ~un pno || F| |F| F| F| F| F|| F| | ~nn P char | pr un pname | F| |F| F| F|~prF| ~un | ~nn P char | F| | F| color || F| |F| F| F|~prF| ~un | ~nn color P char | F| | F| weight | F| |F| F| F|~prF| ~un | ~nn weight| P dec | F| | F| city | P | char | ~pr | ~un | ~nn

cp cp cp cp cp cp cp cp cp

S |4|
0 1

P |5|

2 3 4

SP | 3 |
0 1 Null 2

sno |SP | char | pr | un | ~nn | foreign cp pno |SP | char | pr | un | ~nn | foreign cp qty |SP | dec | ~pr | ~un | ~nn cp

Selecting Attribute Values


Initial prototype queries tester for suggested values and guidance on how to use those values Values may be partitioned into data groups (choices) Tester may specify probabilities for data groups

--choice_name: low 10 20 30 -----choice_name: medium 300 400 -----choice_name: high 5000 6000

Each category (column) can have a list of choices pointed to by cp.

cp

low
10 20 30

medium
300 400

high
5000 6000

DB table generation
Tester specifies table sizes Tool generates tuples for insertion
select data group or NULL, guided by annotations select value from data group, obeying constraints keep track of values used

Outputs sequence of SQL insert statements

Input files for Parts-Supplier database


sno: --choice_name: sno S1 S2 S3 S4 S5 sname: --choice_name: sname Smith Jones Blake Clark Adams pname: --choice_name: interior seats airbags dashboard -----choice_name: exterior doors wheels bumper city: --choice_name: domestic --choice_prob: 90 Brooklyn Florham-Park Middletown -----choice_name: foreign --choice_prob: 10 London Bombay weight: --choice_name: weight 100 300 500

pno: --choice_name: pno P1 P2 P3 P4 P5

status: --choice_name: status --null_prob: 50 0 1 2 3

color: --choice_name: color blue green yellow

city: --choice_name: domestic --choice_prob: 90 Brooklyn Florham-Park Middletown -----choice_name: foreign --choice_prob: 10 London Bombay

status: --choice_name: status --null_prob: 50 0 1 2 3

A database state produced by the tool


Table s sno S1 S2 S3 S4 sname NULL Smith Jones Blake status 0 1 NULL NULL city Brooklyn Florham-Park London Middletown sno S1 S1 S1 S2 S2 S2 S3 S3 S3 S4 Table sp pno P1 P2 P3 P1 P2 P3 P1 P2 P3 P1 qty 5000 300 10 6000 400 5000 20 300 30 6000

Table p pno P1 P2 P3 pname NULL Seats airbags color blue green yellow weight 100 300 500 city Brooklyn Florham-Park Middletown

Related work
Lyons-77, DB-Fill, TestBase Like our approach, rely on user to supply attribute values Do not handle integrity constraints as completely Require tester to describe tables in specialpurpose language (rather than SQL)

Testing Techniques in DB literature


Focus on DB system performance, rather than DB application correctness Benchmarks Performance of SQL processor
Generation of large number of DML statements [Slutz]

Generation of huge tables with given statistical properties [Grey et al]

Summary
Issues Framework Prototype

Future Work
Refinement based on feedback from DB application developers / testers Other DB state generation heuristics
boundary values missing constraints difficult SQL features

Interplay between DB state and user inputs Checking DB state after test execution Checking application outputs