
Bj Rollison

Test Architect
Microsoft

http://www.TestingMentor.com
http://blogs.msdn.com/imtesty
The test data dilemma

Benefits and drawbacks of static test data

Trips and traps of random data generation

Effective data decomposition

Effective sampling techniques


Large number of input variables
Virtually infinite permutations of variables
Impractical to test all permutations
Example: NetBIOS computer name
15 alphanumeric characters maximum
ASCII-only, 82 allowable characters
(0x20 \ * + = | : ; " ? < > , are invalid)
Total number of possible input tests equals
82^15 + 82^14 + 82^13 + … + 82^1 =
51,586,566,049,662,994,687,009,994,574
That's a RBN (really big number)!
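The count above can be checked directly; a quick Python sketch:

```python
# Names of length 1..15 over an 82-character allowable alphabet:
# 82^15 + 82^14 + ... + 82^1 possible inputs.
total = sum(82 ** k for k in range(1, 16))
print(f"{total:,}")  # 51,586,566,049,662,994,687,009,994,574
```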
Static test data
Customer data
Domain expertise, system knowledge
Test data (tribal knowledge)
Historical failure indicators
Disadvantages: limited in scope, diminishing
effectiveness, outdated, misused

Random test data
Tester generated: experience, limited input
population, specialized knowledge
Computer generated: increases breadth,
eliminates human bias
Disadvantages: not representative, violates
constraints, not reproducible
Static data
Customer data
Domain/business expertise
Test data
Library of historical failure indicators
Generally limited in scope
Loses effectiveness for multiple iterations
Random data
Tester generated data
Experience, intuition
Limited input population, keyboard mapping
Computer generated data
Not representative – does not "look" real
→ data decomposition (equivalence class subsets)
Violates constraints
→ deterministic algorithms
Not reproducible
→ seeded random generation

If the data is representative of the total
population, then any permutation of the
elements is allowable.
Probabilistic
Representative of the total population of
possible data elements
Stochastic
Unbiased random sample of elements from a
probability distribution
Variability of sampled elements
Increases breadth of data coverage
Increases breadth of permutations
May produce unexpected variations
Eliminates/minimizes human bias
Pseudo-random number generators
Provides a sequence of numbers that meet
certain statistical requirements for randomness
Elements chosen with equal probability from a
finite set
Most use a date/time seed by default
But must accept a seed value through a
parameterized constructor for repeatability
Not perfect, but reasonably random for practical
purposes…let's see!
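A minimal sketch of seeded repeatability, using Python's `random` module as a stand-in for any PRNG that exposes a seed parameter:

```python
import random

# Same seed passed to the parameterized constructor yields an
# identical sequence, so a failing test run can be replayed exactly.
rng1 = random.Random(12345)
rng2 = random.Random(12345)
seq1 = [rng1.randint(0, 9) for _ in range(10)]
seq2 = [rng2.randint(0, 9) for _ in range(10)]
assert seq1 == seq2  # repeatable: log the seed, replay the run
```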
Define representative data sets (valid and invalid)
Example – Credit card numbers

Checksum – Luhn (Mod 10) algorithm

341846580149320
Bank Identification Number (BIN) – between 1
and 4 digits depending on card type
Card length – (BIN + digits) between 14 and 19
depending on card type
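The Luhn (Mod 10) check can be sketched in a few lines of Python; `luhn_valid` is an illustrative helper name, not from the slides:

```python
def luhn_valid(number: str) -> bool:
    """Double every second digit from the right; a doubled digit
    above 9 has 9 subtracted; valid when the total is divisible by 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

print(luhn_valid("341846580149320"))  # True – the slide's sample number
```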
Equivalence class partitioning decomposes data
into discrete valid and invalid class subsets

Card type: American Express
Valid class subsets: BIN – 34, 37;
Length – 15 digits; Checksum – Mod 10
Invalid class subsets: unassigned BINs;
Length >= 16 digits; Length <= 14 digits;
Fails checksum

Card type: Maestro
Valid class subsets: BIN – 5020, 5038, 6034,
6759; Length – 16, 18 digits; Checksum – Mod 10
Invalid class subsets: unassigned BINs;
Length <= 15 digits; Length >= 19 digits;
Length == 17 digits; Fails checksum
A seed value drives a pseudo-random generator,
which picks a random BIN, a random card length,
and random digits to produce a random credit
card number, e.g. 348702004783719

One random generator and seed per test run!
Dynamic seed
Seed variable must be preserved in test log
for repeatability!
User seed
Tester provides seed value for repeatability

private int seedVal = 0;

public int SeedValue
{
    get
    {
        if (seedVal == 0)
            seedVal = GenerateSeed();  // dynamic seed by default
        return seedVal;
    }
    set { seedVal = value; }           // tester-supplied seed
}
Deterministic algorithm to generate a valid
random credit card number:

GetCardNumber
    Get BIN
    Get CardLength
    Assign BIN to cardNumber
    Generate a new random object
    for (cardNumberLength < CardLength)
        Generate a random number 0–9
        Append it to cardNumber
    if Not_Valid_Card_Number
        while Not_Valid_Card_Number
            increment last number by 1
    return cardNumber
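A runnable Python sketch of this algorithm, assuming the American Express row of the earlier table (BINs 34 and 37, 15 digits); the helper names are my own:

```python
import random

def luhn_valid(number: str) -> bool:
    # Mod 10 check: double every second digit from the right,
    # subtracting 9 from any doubled digit above 9.
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def get_card_number(seed, bins=("34", "37"), card_length=15):
    rng = random.Random(seed)          # one generator and seed per run
    card = rng.choice(bins)            # assign BIN to cardNumber
    while len(card) < card_length:     # append random digits 0-9
        card += str(rng.randint(0, 9))
    while not luhn_valid(card):        # bump last digit until valid
        card = card[:-1] + str((int(card[-1]) + 1) % 10)
    return card
```

With a fixed seed the same card number comes back on every run, so a failure can be replayed from the logged seed; only one of the ten possible last digits satisfies the checksum, so the fix-up loop always terminates.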

Assigned BINs ensures the data looks real


The Mod10 check ensures the data feels real
Result is representative of real data!
JCB Type 1: BIN = 35; Len = 16
JCB Type 2: BIN = 1800, 2131; Len = 15
Model test data → Generate test data →
Apply test data → Verify test results

Model: decompose the data set for each
parameter using equivalence class partitioning
Generate: generate valid and invalid test data
adhering to parameter properties, business
rules, and the test hypothesis
Apply: apply the test data to the application
under test
Verify: verify the actual results against the
expected results – oracle!
Robust testing
Multi-language input testing
Unicode language families
Reserved characters
Unicode surrogate pairs
String length: fixed or variable
Seed value
Custom range for greater control
Assigned code points
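One way to sketch such a generator in Python; the code-point ranges below (Basic Latin capitals, Cyrillic, a slice of CJK ideographs) are illustrative "custom ranges" of my choosing, not the tool's actual configuration:

```python
import random

# Illustrative assigned code-point ranges for three language families.
RANGES = [(0x0041, 0x005A), (0x0410, 0x044F), (0x4E00, 0x4FFF)]

def random_unicode_string(length: int, seed: int) -> str:
    rng = random.Random(seed)        # seed value for repeatability
    chars = []
    for _ in range(length):
        lo, hi = rng.choice(RANGES)  # pick a language family
        chars.append(chr(rng.randint(lo, hi)))
    return "".join(chars)

sample = random_unicode_string(1000, seed=99)
```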
1000 Unicode characters drawn from the sample
population
Character corruption and data loss:
135 characters (bytes) remained – obvious data
loss
Static test data wears out!
Recklessly generated random test data that is not
repeatable or not representative may find defects,
or may throw a lot of false negatives
Probabilistic stochastic test data
Modeled representation of the population
Statistically unbiased
Tests robustness
Increases breadth of data coverage
Increased value in using both!
http://www.TestingMentor.com
Bj.Rollison@TestingMentor.com
http://hwtams.com
Practice .NET Testing with IR Data
Bj Rollison
http://www.stpmag.com/issues/stp-2007-06.pdf
Automatic test data generation for path testing
using a new stochastic algorithm
Bruno T. de Abreu, Eliane Martins, Fabiano L. de Sousa
http://www.sbbd-sbes2005.ufu.br/arquivos/16-%209523.pdf
Data Generation Techniques for Automated
Software Robustness Testing
Matthew Schmid & Frank Hill
http://www.cigital.com/papers/download/ictcsfinal.pdf
Tools
http://www.TestingMentor.com