
PIBWin HELP FILE CONVERTED TO WORD

This tutorial was written by Dr. Trevor Bryant and covers PIBWin's abilities in far more depth
than the Schnf Ashex Tutorial.
Introduction
PROBABILISTIC IDENTIFICATION OF BACTERIA for Windows (PIBWin) is a Windows
version of the DOS program PIB (also called Bacterial Identifier).
The programme has three major functions:
the identification of an unknown isolate
the selection of additional tests to distinguish between possible strains if
identification is not achieved
the storage and retrieval of results
It also has some utility functions for assessing the usefulness of identification matrices and for
converting matrices into different formats.
The program makes use of Excel files to store identification matrices and archived results to
achieve this, although other file formats are supported to allow backwards compatibility with
the DOS version of the programme.
Up to date information on the programme can be found on the PIBWin web site
www.som.soton.ac.uk/staff/tnb/pib.htm which can also be accessed from the Help menu.
The program is designed to use probabilistic identification matrices that have either been published
in the literature or created by the user. The matrices that are provided with PIB have been
taken from the literature. These matrices have been typed in from the publication describing
them and users should refer to these publications for full details of the methods used when
testing isolates.
Identification Matrix
The identification matrix is displayed when the Matrix tab is selected.

The matrix may be displayed as integer numbers (ranging from 1 to 99) representing the
percentage probability of obtaining a positive result, or as +/v/- depending on the value
selected. This option is set in Options.
The view can be changed by clicking the right mouse button and checking or unchecking
Display Matrix as +/v/- on the pop up menu.
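The +/v/- view can be sketched in a few lines of Python. This is only an illustration of the rule described under Options (positive at or above the chosen percentage, default 85; negative derived as 100 minus that value). The function name display_symbol is hypothetical, and treating entries at or below the negative cut-off as "-" is an assumption, not something the help text states.

```python
def display_symbol(percent: int, positive_threshold: int = 85) -> str:
    """Map a matrix entry (1-99) to '+', 'v' or '-'.

    Assumption: entries at or below (100 - positive_threshold) are shown
    as '-'; everything in between is shown as 'v' (variable).
    """
    negative_threshold = 100 - positive_threshold
    if percent >= positive_threshold:
        return "+"
    if percent <= negative_threshold:
        return "-"
    return "v"


# Illustrative row of percentage probabilities for one taxon.
row = [1, 20, 99, 90]
print("".join(display_symbol(p) for p in row))  # prints "-v++"
```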
To view the full name of a test or taxon, move the cursor over the item; a pop-up box will
display the item in full.
Sorting the identification matrix
The matrix can be sorted by double clicking on the name at the top of each column. The first
double click performs an ascending sort (negative results first), successive double clicks
perform descending and ascending sorts.
Note the underlying identification matrix is not affected by sorting as the Matrix tab displays a
view of it. To return to the original order, either click the right mouse button and select Revert
to original order, or select another tab and then return to the Matrix tab.
Results
The Results tab is where the results for an unknown strain are entered.
There are four aspects to the Results screen
Details Bar
Results Grid
Entering Results
Buttons
Details Bar
The details bar is where a personal key, the source of the isolate and details about the isolate
can be entered.

Key can be a maximum of 15 characters. A key must be entered if the results are to be saved
to an Archive file for recall at a later time.
Source is a drop-down list box which allows text up to a maximum of 50 characters to be
entered. To achieve consistent entry of source text, existing values from the Archive file are
displayed in the drop-down list, so the list will grow in length over time.
Details provides for a maximum of 255 characters.
The Save button is enabled when one result has been entered and there is an entry in the Key
box; it is only shown on the Identification and Additional Tests tabs.
Note: If an isolate is recalled from the Archive file and the key is changed, Save will create a
new, additional record in the Archive file.

Results Grid
Results can be entered in a grid or list format. This is controlled by the status of the Use List
Format for Results check box.
Grid format enables a 96 well microtitre plate format to be accommodated. The full name of
each test is shown in a pop up box when the cursor is placed over the test name.

List Format is a scrolling list

Entry of Results
Results can be entered using the keyboard or the mouse. There are 4 possible states for a
result:
positive + , negative -, indeterminate ? and not done.

The indeterminate state is to allow for tests that have been carried out, but the interpretation of
the result is difficult and you are undecided about the result. The indeterminate state allows
you to record that the test has been done, rather than the result is missing.

Result          Key                       Function Key   Mouse Action
Positive        + or =                    F2             Left click
Negative        - or _                    F3             Right click
Indeterminate   ? or /                    F4
Missing         <space bar> or <Enter>    F5             Repeat click

The programme has been written so that the shift character does not have to be pressed to
obtain the + or ? symbol, although some keyboard layouts may differ.
To change a result press the key for the new value.
To remove a result using the mouse, click a second time.
Note: because of the way the mouse works, the first left click sometimes acts as a select
object so an additional click is needed.
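As a rough illustration of the key handling described above, the four result states and their keys can be modelled as a lookup table. This is a sketch only: the Result enum, KEY_TO_RESULT and apply_key are hypothetical names, and the key labels "F2" to "F5", "Enter" and " " are simply stand-ins for however a real program would report key presses. Pairing "=" with "+", "_" with "-" and "/" with "?" is what lets a result be entered without the shift key on common layouts.

```python
from enum import Enum


class Result(Enum):
    POSITIVE = "+"
    NEGATIVE = "-"
    INDETERMINATE = "?"
    MISSING = " "  # test not done


# Each state is reachable from a symbol key, its unshifted twin, or a function key.
KEY_TO_RESULT = {
    "+": Result.POSITIVE, "=": Result.POSITIVE, "F2": Result.POSITIVE,
    "-": Result.NEGATIVE, "_": Result.NEGATIVE, "F3": Result.NEGATIVE,
    "?": Result.INDETERMINATE, "/": Result.INDETERMINATE, "F4": Result.INDETERMINATE,
    " ": Result.MISSING, "Enter": Result.MISSING, "F5": Result.MISSING,
}


def apply_key(current: Result, key: str) -> Result:
    """Return the new state after a key press; unknown keys leave it unchanged."""
    return KEY_TO_RESULT.get(key, current)


state = Result.MISSING
state = apply_key(state, "=")   # positive, no shift key needed
state = apply_key(state, "F3")  # changed to negative
print(state.name)
```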
Buttons
Reset    Clears the results of the current isolate and resets them all to missing. The
         details are left unchanged.

New      Clears the results and the details of the current isolate and resets them all to
         missing.

Recall   Recalls the results of a previous isolate from an Archive file.

Archived Results

The Archive Results screen displays details and identification of previously entered isolates. If
an Archive file is not already open then an Open window is displayed when the Recall button
is pressed in the Results window.

To recall the results of a previous isolate Double Click on the row of the isolate.
Sorting the Archived Results
Each column of information can be sorted. Click on the column heading to sort the archived
isolates into ascending order, a second click reverses the sort into descending order.
Searching the Archived Results
The Find button activates a search of the archived results. Searching is case insensitive and
does not support wild cards or complex searching. Once a hit has been obtained, the Find
Next button is enabled to permit further searching.
Searching is performed across all rows and columns excluding the first column.
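The search behaviour described above (case-insensitive, plain substring, every column except the first) might be sketched as follows. The function name find_next, the row layout and the sample records are all hypothetical; PIBWin's own implementation is not documented here.

```python
def find_next(rows, text, start_row=0):
    """Return (row, column) of the first cell containing text, or None.

    Matching is a case-insensitive substring test with no wild cards.
    The first column of every row is skipped.
    """
    needle = text.casefold()
    for r in range(start_row, len(rows)):
        for c in range(1, len(rows[r])):  # column 0 is excluded from the search
            if needle in str(rows[r][c]).casefold():
                return r, c
    return None


# Hypothetical archive rows: first column, key, source, identification.
rows = [
    ["rec-1", "A17", "Soil sample", "Streptomyces sp."],
    ["rec-2", "B02", "River water", "no identification"],
    ["rec-3", "B03", "soil core", "Streptomyces sp."],
]

hit = find_next(rows, "SOIL")                       # Find
again = find_next(rows, "SOIL", start_row=hit[0] + 1)  # Find Next
print(hit, again)
```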
Technical details
The software can support two types of Archive Files, Excel and DOS Archive.
The DOS Archive format is for backwards compatibility with the previous DOS version of this
software. It is not recommended that this format is used. It contains less information about
isolates and is less flexible. The Excel format is recommended.
The Excel Archive file can be opened and manipulated in Microsoft Excel. This enables the
data to be used by other software packages and unwanted isolate information to be deleted.
DO NOT CHANGE the order of the columns in the Archive file. This would make the file
unusable with the identification matrix. The software performs some internal checks to detect
discrepancies between the Identification matrix file and the Archive file, but these are not
foolproof. It is a case of user beware, so if you wish to experiment, make sure that you have
taken backups of your files before they are modified.
Identification
The identification tab is shown once a test result has been entered in the Results window.

Additional Tests
This tab is available when Identification is not successful and more than one taxon is a
possible candidate for the unknown isolate.
Tests may be chosen in two ways:
they may be selected so that the most likely taxon can be distinguished from other
likely taxa.
they can be selected to distinguish likely taxa from each other.

Use the radio buttons to select which method of test selection you wish to use, then use
the spin edit box to choose the number of taxa to be considered.

Use Select Tests to obtain the list of tests to be used.

Move the cursor over the strains and tests to obtain the name in full in a pop up window.
The Exclude Tests button allows you to specifically omit certain tests before test selection is
carried out.
See Also Test Selection Algorithm
Exclude Tests
The Exclude Tests window is used by the Additional Tests and Select Best Tests for Matrix
procedures.
A list of tests in the current matrix is displayed. Those tests that will be omitted from the test
selection procedure are shown with an asterisk * in the Excluded column.
Tests can be included or excluded by clicking on the Excluded column.

Include All Tests is used to include all tests in the Test Selection procedure.
Exclude All Tests is used to exclude all tests from the Test Selection procedure, then those
tests that are required can be selected by clicking in the Exclude column.
Tools
The Tools menu options provide functions for manipulating matrix files and investigating the
properties of an identification matrix
Convert Matrix
The Identification matrix file can be written in one of three formats:
Excel [*.xls]
Comma separated values [*.csv]
Fixed format [*.mat]
The recommended format is the Excel format because this
contains more information than the other two formats.
The fixed format is for backwards compatibility with the original
DOS version of this software and its use is not recommended.

Convert DOS archive

This allows the Archive file created by the original DOS version of
this software to be rewritten in the Excel archive format. It is
strongly recommended that you convert old Archive files.
Note: a new Archive file is created and the original Archive file is
left untouched.

Select Best Tests

This allows investigation of the current matrix to determine which
are the most important tests in the matrix. See Select Best Tests
for Matrix for further details.

Calculate Matrix ID scores

This allows investigation of the current matrix to determine if there
is an overlap between strains in the matrix. See Matrix ID Scores
for further details.

Select Best Tests for Matrix


This procedure is called from the Tools Menu. The procedure can be used to select the
minimum of tests to distinguish taxa in an identification matrix.
Tests may be chosen in two ways:
they may be selected so that one taxon can be distinguished from other strains
(taxa).
they can be selected to distinguish all strains (taxa) from each other.

Use Select Tests to obtain the list of tests to be used.


Move the cursor over the strains and tests to obtain the name in full in a pop up window.
The Exclude Tests button allows you to specifically omit certain tests before test selection is
carried out.
See Also Test Selection Algorithm
Matrix ID Scores
The Matrix ID scores procedure is called from the Tools Menu. It is used to assess whether
the identification matrix is capable of identifying each taxon (strain) that is contained in it. The
procedure considers each taxon in turn and converts each percentage probability for that taxon
into a positive or negative result, creating a Hypothetical Median Organism (HMO). It then uses
this HMO to calculate an Identification Score using the Willcox probability. If any probabilities
of 50 are encountered (typically missing data is coded as 50), the identification score is
calculated in three ways. Tests where a value of 50 is found for the taxon are:
excluded
all treated as positive results
all treated as negative results
These results are shown as ID Score, Missing Positive and Missing Negative.
If the ID score does not exceed the Identification Threshold then the strain with the second
highest identification score is listed in the Next Strain column.

Ideally the ID Score and Missing Positive and Missing Negative columns should display values
of 1.00000.
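The Matrix ID scores procedure can be sketched roughly as below. Several things here are assumptions rather than documented behaviour: entries above 50 are taken as positive and entries below 50 as negative when building the HMO, the matrix values are illustrative, and the names hmo, likelihood and id_score are hypothetical. The three ways of handling entries of 50 correspond to the ID Score, Missing Positive and Missing Negative columns.

```python
from math import prod


def hmo(percentages, fifty_as=None):
    """Build a Hypothetical Median Organism for one taxon.

    Assumption: >50 counts as '+', <50 as '-'. Entries of exactly 50 are
    excluded (None), or forced to '+' or '-' according to fifty_as.
    """
    return ["+" if p > 50 else "-" if p < 50 else fifty_as for p in percentages]


def likelihood(percentages, results):
    """Probability that a taxon gives this pattern, skipping excluded tests."""
    return prod(
        p / 100 if r == "+" else 1 - p / 100
        for p, r in zip(percentages, results)
        if r is not None
    )


def id_score(matrix, taxon, fifty_as=None):
    """Normalised likelihood of the taxon's own HMO against every taxon."""
    results = hmo(matrix[taxon], fifty_as)
    likes = {name: likelihood(row, results) for name, row in matrix.items()}
    return likes[taxon] / sum(likes.values())


# Illustrative matrix (percentages); taxon c has one entry coded as 50.
matrix = {"a": [1, 20, 99, 90], "b": [95, 1, 99, 1], "c": [99, 50, 85, 99]}

for name in matrix:
    print(
        name,
        round(id_score(matrix, name), 5),       # ID Score (50s excluded)
        round(id_score(matrix, name, "+"), 5),  # Missing Positive
        round(id_score(matrix, name, "-"), 5),  # Missing Negative
    )
```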
If identification is not achieved then the most likely taxa are listed in descending order of their
identification scores. The Additional Tests tab is shown when the Identification tab is selected.

Differences between the unknown isolate and the likely taxa are listed in a second grid.
What is displayed is controlled by the threshold values set in Options.
Options
This calls the Options window which has two tabbed Options: General and Identification.
The Use default values button resets the defaults for values on the Identification tab.

Open Last Identification Matrix

The current (last) identification matrix used by the programme is automatically opened when
PIBWin is started. The name of the file is displayed when this option is selected. The Open
window that is normally displayed at the start of the programme is not displayed when this
option is selected.

Open Last Archive File

The current (last) archive file used by the programme is automatically opened when PIBWin is
started. The name of the file is displayed when this option is selected.

Display Matrix as +/v/-

The identification matrix values can either be displayed as integer numbers (ranging from 1 to
99) representing the percentage probability of obtaining a positive result, or they can be
displayed as +/v/- depending on the criterion used for "Tests are displayed as positive if the
percentage is equal to or greater than" on the Identification tabbed option.

Record identification in Output Window

The identification of any unknown isolate, atypical tests and additional tests to separate
possible strains are recorded in an Output window when this option is selected.

Identification achieved when the ID score is greater than or equal to [default value 0.95]

An unknown is identified when the ID score, also known as the Willcox probability, is equal to
or greater than the specified value. A value within the range 0.00001 to 0.99999 can be
entered, though the accepted range for this value is 0.95 to 0.999 depending on the
identification matrix.

and the Modal Likelihood is greater than or equal to [default value 0.01]

A second criterion, the modal likelihood, is also applied to the identification. This avoids
identification when one taxon gives a high ID score, but also has several test results that
differ from the unknown. A value within the range 0.00001 to 0.99999 can be entered.

List atypical results for taxa with ID scores equal to or greater than [default value 0.05]

A value within the range 0.00001 to 0.99999 can be entered.

When no identification, list taxa with ID scores equal to or greater than [default value 0.001]

This controls how many possible taxa are listed when identification is not achieved. A value
within the range 0.00001 to 0.99999 can be entered.

Taxa are distinguished by at least [default value 2]

If identification is not achieved, further tests may be selected. The minimum number of tests
to distinguish pairs of taxa can be varied, though traditionally 2 tests is the norm.

A test separates a pair of taxa if their percentage difference is at least [default value 70]

A pair of taxa are separated by a test if the absolute difference between their matrix entries
is at least the value specified. This value can range from 51 to 98.

Tests are displayed as positive if the percentage is equal to or greater than [default value 85]

The Identification matrix values can either be displayed as integer numbers (ranging from 1 to
99) representing the percentage probability of obtaining a positive result, or they can be
displayed as +/v/- depending on the value selected. This value can range from 51 to 99.
Negative results are calculated as 100 minus the chosen value.
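The two separation options above (a minimum percentage difference, default 70, and a minimum number of separating tests, default 2) can be illustrated with a short Python sketch. It only shows the pairwise check described in the option text; it is not the Test Selection Algorithm itself, the function names are hypothetical, and the matrix values are illustrative.

```python
from itertools import combinations


def separating_tests(row_a, row_b, min_difference=70):
    """Indices of tests whose matrix entries differ by at least min_difference."""
    return [
        i for i, (pa, pb) in enumerate(zip(row_a, row_b))
        if abs(pa - pb) >= min_difference
    ]


def unseparated_pairs(matrix, min_tests=2, min_difference=70):
    """Pairs of taxa separated by fewer than min_tests tests."""
    return [
        (s, t)
        for s, t in combinations(matrix, 2)
        if len(separating_tests(matrix[s], matrix[t], min_difference)) < min_tests
    ]


# Illustrative matrix: percentage probabilities of a positive result per test.
matrix = {"a": [1, 20, 99, 90], "b": [95, 1, 99, 1], "c": [99, 10, 85, 99]}

print(separating_tests(matrix["a"], matrix["b"]))  # tests 0 and 3 separate a from b
print(unseparated_pairs(matrix))                   # pairs needing further tests
```

With these values, a and b are separated by two tests, while a/c and b/c are each separated by only one, so those pairs would need additional tests under the default settings.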

Theory
Most computer assisted identification systems are based on Willcox's implementation of Bayes'
theorem:

$$P(t_i \mid R) = \frac{P(R \mid t_i)}{\sum_j P(R \mid t_j)}$$

where $P(t_i \mid R)$ is the probability that an unknown isolate, giving a pattern of test
results $R$, is a member of taxon (group of bacteria) $t_i$, and $P(R \mid t_i)$ is the
probability that the unknown has a pattern $R$ given that it is a member of taxon $t_i$. Bayes'
theorem incorporates prior probabilities; these are the expected prevalence of strains included
in the identification matrix. For bacterial identification most authors give all taxa an equal
chance of being isolated and therefore the prior probabilities for all taxa are set to 1.0 and
omitted from the equation. The above equation can therefore be re-expressed as:

$$L_i^* = \frac{L_i}{\sum_j L_j}$$

where $L_i$ is the likelihood $P(R \mid t_i)$ and the probabilities $L_i^*$ are now referred to
as Identification Scores, or Willcox Scores. The identification scores for each taxon are
normalized values and $L_i^*$ for all taxa sums to one. Identification of an unknown isolate is
achieved when $L_i^*$ for one taxon exceeds a specified threshold value.
An example is shown below with an identification matrix consisting of three taxa for which we
have the probabilities for four tests.
Identification matrix with results of unknown
                      Tests
Taxa                  1       2       3       4
a                     0.01    0.20    0.99    0.90
b                     0.95    0.01    0.99    0.01
c                     0.99    0.10    0.85    0.99
Results of unknown    +       -       +       missing

An unknown has been isolated whose results for the first three tests are positive, negative and
positive respectively. The likelihoods that the taxa a, b and c will give the pattern of results
observed for the unknown are calculated by multiplying the probability of obtaining a positive
result for test 1 by the probability of obtaining a negative result for test 2 by the probability of
obtaining a positive result for test 3 for each taxon in turn.
Calculation of likelihood of unknown
        Tests
Taxa    1       2           3       Likelihood
a       0.01    (1-0.20)    0.99    0.00792
b       0.95    (1-0.01)    0.99    0.93110
c       0.99    (1-0.10)    0.85    0.75735
Sum                                 1.69637

The original identification matrix only gives the probabilities for positive results; to obtain
the probability of a negative result we must subtract the matrix entries for test 2 from 1.


The Identification Scores are expressed as normalized likelihoods.

Willcox probabilities (normalised likelihoods)
Taxa    Likelihood / Sum       Identification Score
a       0.00792 / 1.69637      0.004669
b       0.93110 / 1.69637      0.548877
c       0.75735 / 1.69637      0.446455
Sum                            1.000000

In this example the unknown is not identified because a single taxon does not reach the
identification threshold value. Taxa b and c are still both candidates for the identity of the
unknown. Threshold values of 0.999 are typically used, for example with the
Enterobacteriaceae, but with other groups of bacteria, such as the streptomycetes, values as
low as 0.95 have been used. In practical terms, a value of 0.999 means that the taxon which
the unknown identifies with will have at least two test differences from all other taxa in the
matrix.
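The worked example can be reproduced with a short Python sketch. It uses the three-taxon matrix and the unknown's results from the example; the function names and the decision helper identify are illustrative and are not taken from PIBWin itself.

```python
from math import prod

# Probabilities of a positive result for tests 1-4, per taxon (from the example).
MATRIX = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}
# Results of the unknown: positive, negative, positive, missing.
UNKNOWN = ["+", "-", "+", None]


def likelihood(probs, results):
    """Multiply p for positive results and (1 - p) for negative ones; skip missing."""
    return prod(
        p if r == "+" else 1 - p
        for p, r in zip(probs, results)
        if r is not None
    )


def willcox_scores(matrix, results):
    """Normalise the likelihoods so that the scores for all taxa sum to one."""
    likes = {taxon: likelihood(probs, results) for taxon, probs in matrix.items()}
    total = sum(likes.values())
    return {taxon: value / total for taxon, value in likes.items()}


def identify(scores, threshold=0.999):
    """Return the best taxon if its score reaches the threshold, else None."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


scores = willcox_scores(MATRIX, UNKNOWN)
for taxon, score in scores.items():
    print(taxon, f"{score:.6f}")
print("identified as:", identify(scores))
```

Taxon b has the highest score but stays well below either 0.999 or 0.95, so the sketch reports no identification, matching the conclusion above.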

Whatever type of identification system is used, there are four possible outcomes:
The unknown is identified with the correct taxon.
The unknown is misidentified, i.e. incorrectly attributed to wrong taxon.
The unknown is not identified at all, and correctly so because the taxon to which it
belongs is not present in the matrix.
The unknown is not identified, but should have been identified with a taxon that is
present in the matrix.
It is important that any system deals with these possibilities, although the last one is difficult to
resolve. One problem with the identification score is that if an unknown is not represented in
the matrix, but one strain within the matrix is closer to it (in a-space) than all others, the
unknown may be identified as this strain. This is where additional criteria should be used to
assist the identification process. These include listing the differences in test results between
the unknown and the strain it has been identified as, as well as the use of other numeric
criteria such as taxonomic distance, the standard error of taxonomic distance measures or
maximum likelihoods. Taxonomic distance is the distance of an unknown from the centroid of
any taxon with which it is being compared; a low score, ideally less than 1.5, indicates
relatedness. The standard error of taxonomic distance assumes that the taxa are in
hyperspherical normal clusters. An acceptable score is less than 2.0 to 3.0, and about half the
members of a taxon will have negative scores, because they are closer to the centroid than
average. The maximum, or best likelihood, is the maximum probability for a taxon calculated
using those tests carried out on the unknown. The calculation uses the maximum of the
probabilities of a negative and positive result of a test.

Maximum possible likelihoods
        Tests
Taxa    1           2           3       Best Likelihood
a       (1-0.01)    (1-0.20)    0.99    0.78408
b       0.95        (1-0.01)    0.99    0.93110
c       0.99        (1-0.10)    0.85    0.75735

This allows for taxa with several entries of 0.50 in a matrix. Some authors calculate the
likelihood/maximum likelihood ratio, termed the modal likelihood fraction, or its inverse and
use it to decide whether to accept the identification offered by a Willcox score that has
exceeded the identification threshold.

Modal likelihood fraction
Taxa    Likelihood / Best Likelihood    Modal likelihood
a       0.00792 / 0.78408               0.010101
b       0.93110 / 0.93110               1.000000
c       0.75735 / 0.75735               1.000000
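A minimal Python sketch of the modal likelihood fraction, using the same example matrix and unknown, is given below. The helper accept is a hypothetical illustration of applying both criteria together; its default thresholds (0.95 and 0.01) are the defaults listed under Options, and the function names are not PIBWin's own.

```python
from math import prod

MATRIX = {
    "a": [0.01, 0.20, 0.99, 0.90],
    "b": [0.95, 0.01, 0.99, 0.01],
    "c": [0.99, 0.10, 0.85, 0.99],
}
UNKNOWN = ["+", "-", "+", None]  # test 4 was not done


def likelihood(probs, results):
    """Likelihood of the observed pattern for one taxon, skipping missing tests."""
    return prod(
        p if r == "+" else 1 - p
        for p, r in zip(probs, results)
        if r is not None
    )


def best_likelihood(probs, results):
    """Maximum possible likelihood: the larger of p and (1 - p) for each test done."""
    return prod(max(p, 1 - p) for p, r in zip(probs, results) if r is not None)


def modal_fraction(probs, results):
    """Likelihood divided by the best likelihood for the same taxon."""
    return likelihood(probs, results) / best_likelihood(probs, results)


def accept(id_score, modal, id_threshold=0.95, modal_threshold=0.01):
    """Hypothetical decision rule: both criteria must be met."""
    return id_score >= id_threshold and modal >= modal_threshold


for taxon, probs in MATRIX.items():
    print(taxon, f"{modal_fraction(probs, UNKNOWN):.6f}")
```

Taxa b and c match their own most likely pattern exactly, so their fraction is 1, whereas taxon a scores only about 0.0101 because the unknown is positive for test 1, where taxon a is almost always negative.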