
Bj Rollison

Test Architect
Microsoft

http://www.TestingMentor.com
http://blogs.msdn.com/imtesty
The test data dilemma

Benefits and drawbacks of static test data

Trips and traps of random data generation

Effective data decomposition

Effective sampling techniques


Large number of input variables
Virtually infinite permutations of variables
Impractical to test all permutations
Example: NetBIOS computer name
15 alphanumeric characters maximum
ASCII-only, 82 allowable characters
(0x20 \ * + = | : ; " ? < > , are invalid)
Total number of possible input tests equals
82^15 + 82^14 + 82^13 + … + 82^1 =
51,586,566,049,662,994,687,009,994,574
That's a RBN (really big number)!
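The count above can be checked directly; a quick Python sketch:

```python
# Names of length 1..15 over an 82-character allowable alphabet:
# 82^15 + 82^14 + ... + 82^1 possible inputs.
total = sum(82 ** k for k in range(1, 16))
print(f"{total:,}")  # 51,586,566,049,662,994,687,009,994,574
```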
Static test data
Customer data
Domain expertise, system knowledge
Test data (tribal knowledge)
Historical failure indicators
Disadvantages: limited in scope, diminishing
effectiveness, outdated, misused

Random test data
Tester generated: experience, limited input
population, specialized knowledge
Computer generated: increases breadth,
eliminates human bias
Disadvantages: not representative, violates
constraints, not reproducible
Static data
Customer data
Domain/business expertise
Test data
Library of historical failure indicators
Generally limited in scope
Loses effectiveness for multiple iterations
Random data
Tester generated data
Experience, intuition
Limited input population, keyboard mapping
Computer generated data
Not representative – does not "look" real
→ data decomposition (equivalence class subsets)
Violates constraints
→ deterministic algorithms
Not reproducible
→ seeded random generation

If the data is representative of the total
population, then any permutation of the
elements is allowable.
Probabilistic
Representative of the total population of
possible data elements
Stochastic
Unbiased random sample of elements from a
probability distribution
Variability of sampled elements
Increases breadth of data coverage
Increases breadth of permutations
May produce unexpected variations
Eliminates/minimizes human bias
Pseudo-random number generators
Provides a sequence of numbers that meet
certain statistical requirements for randomness
Elements chosen with equal probability from a
finite set
Most use a date/time seed by default
But must accept a seed value through a
parameterized constructor for repeatability
Not perfect, but reasonably random for practical
purposes…let's see!
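A minimal sketch of seeded repeatability, using Python's `random` module as a stand-in for any PRNG that exposes a seed parameter:

```python
import random

# Same seed passed to the parameterized constructor yields an
# identical sequence, so a failing test run can be replayed exactly.
rng1 = random.Random(12345)
rng2 = random.Random(12345)
seq1 = [rng1.randint(0, 9) for _ in range(10)]
seq2 = [rng2.randint(0, 9) for _ in range(10)]
assert seq1 == seq2  # repeatable: log the seed, replay the run
```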
Define representative data sets (valid and invalid)
Example – Credit card numbers

Checksum – Luhn (Mod 10) algorithm

341846580149320
Bank Identification Number (BIN) – between 1
and 4 digits depending on card type
Card length – (BIN + digits) between 14 and 19
depending on card type
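The Luhn (Mod 10) check can be sketched in a few lines of Python; `luhn_valid` is an illustrative helper name, not from the slides:

```python
def luhn_valid(number: str) -> bool:
    """Double every second digit from the right; a doubled digit
    above 9 has 9 subtracted; valid when the total is divisible by 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

print(luhn_valid("341846580149320"))  # True – the slide's sample number
```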
Equivalence class partitioning decomposes data
into discrete valid and invalid class subsets

Card type: American Express
Valid class subsets: BIN – 34, 37;
Length – 15 digits; Checksum – Mod 10
Invalid class subsets: unassigned BINs;
Length >= 16 digits; Length <= 14 digits;
Fails checksum

Card type: Maestro
Valid class subsets: BIN – 5020, 5038, 6034,
6759; Length – 16, 18 digits; Checksum – Mod 10
Invalid class subsets: unassigned BINs;
Length <= 15 digits; Length >= 19 digits;
Length == 17 digits; Fails checksum
A seed value drives a pseudo-random generator,
which picks a random BIN, a random card length,
and random digits to produce a random credit
card number, e.g. 348702004783719

One random generator and seed per test run!
Dynamic seed
Seed variable must be preserved in test log
for repeatability!
User seed
Tester provides seed value for repeatability

private int seedVal = 0;

public int SeedValue
{
    get
    {
        if (seedVal == 0)
            seedVal = GenerateSeed();  // dynamic seed by default
        return seedVal;
    }
    set { seedVal = value; }           // tester-supplied seed
}
Deterministic algorithm to generate a valid
random credit card number:

GetCardNumber
    Get BIN
    Get CardLength
    Assign BIN to cardNumber
    Generate a new random object
    for (cardNumberLength < CardLength)
        Generate a random number 0–9
        Append it to cardNumber
    if Not_Valid_Card_Number
        while Not_Valid_Card_Number
            increment last number by 1
    return cardNumber
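A runnable Python sketch of this algorithm, assuming the American Express row of the earlier table (BINs 34 and 37, 15 digits); the helper names are my own:

```python
import random

def luhn_valid(number: str) -> bool:
    # Mod 10 check: double every second digit from the right,
    # subtracting 9 from any doubled digit above 9.
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def get_card_number(seed, bins=("34", "37"), card_length=15):
    rng = random.Random(seed)          # one generator and seed per run
    card = rng.choice(bins)            # assign BIN to cardNumber
    while len(card) < card_length:     # append random digits 0-9
        card += str(rng.randint(0, 9))
    while not luhn_valid(card):        # bump last digit until valid
        card = card[:-1] + str((int(card[-1]) + 1) % 10)
    return card
```

With a fixed seed the same card number comes back on every run, so a failure can be replayed from the logged seed; only one of the ten possible last digits satisfies the checksum, so the fix-up loop always terminates.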

Assigned BINs ensures the data looks real


The Mod10 check ensures the data feels real
Result is representative of real data!
JCB Type 1: BIN = 35; Len = 16
JCB Type 2: BIN = 1800, 2131; Len = 15
Model test data → Generate test data →
Apply test data → Verify test results

Model: decompose the data set for each
parameter using equivalence class partitioning
Generate: generate valid and invalid test data
adhering to parameter properties, business
rules, and the test hypothesis
Apply: apply the test data to the application
under test
Verify: verify the actual results against the
expected results – oracle!
Robust testing
Multi-language input testing
Unicode language families
Reserved characters
Unicode surrogate pairs
String length: fixed or variable
Seed value
Custom range for greater control
Assigned code points
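One way to sketch such a generator in Python; the code-point ranges below (Basic Latin capitals, Cyrillic, a slice of CJK ideographs) are illustrative "custom ranges" of my choosing, not the tool's actual configuration:

```python
import random

# Illustrative assigned code-point ranges for three language families.
RANGES = [(0x0041, 0x005A), (0x0410, 0x044F), (0x4E00, 0x4FFF)]

def random_unicode_string(length: int, seed: int) -> str:
    rng = random.Random(seed)        # seed value for repeatability
    chars = []
    for _ in range(length):
        lo, hi = rng.choice(RANGES)  # pick a language family
        chars.append(chr(rng.randint(lo, hi)))
    return "".join(chars)

sample = random_unicode_string(1000, seed=99)
```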
1000 Unicode characters drawn from the sample
population
Character corruption and data loss:
135 characters (bytes) remained – obvious data
loss
Static test data wears out!
Recklessly generated random test data that is not
repeatable or not representative may find defects,
or may throw a lot of false negatives
Probabilistic stochastic test data
Modeled representation of the population
Statistically unbiased
Tests robustness
Increases breadth of data coverage
Increased value in using both!
http://www.TestingMentor.com
Bj.Rollison@TestingMentor.com
http://hwtams.com
Practice .NET Testing with IR Data
Bj Rollison
http://www.stpmag.com/issues/stp-2007-06.pdf
Automatic test data generation for path testing
using a new stochastic algorithm
Bruno T. de Abreu, Eliane Martins, Fabiano L. de Sousa
http://www.sbbd-sbes2005.ufu.br/arquivos/16-%209523.pdf
Data Generation Techniques for Automated
Software Robustness Testing
Matthew Schmid & Frank Hill
http://www.cigital.com/papers/download/ictcsfinal.pdf
Tools
http://www.TestingMentor.com