Objectives
Page 2
Our objectives for this session are to look at the steps involved in designing
effective QC, to look at how Total Allowable Error can be effectively used in the
process and to review some of the key points to keep in mind when using QC
rules to enhance effectiveness.
Designing Effective QC
1. Set quality requirement
2. Select QC rules
3. Set QC frequency
2011 Siemens Healthcare Diagnostics Inc.
Page 3
There are three steps to designing effective QC. First, set the quality requirement.
Next, select the QC rules that will be most efficient at meeting that requirement.
Finally, determine how often we need to test QC samples to be
efficient and effective.
Let's start by discussing the quality requirement.
Tool
Quality Requirement
Westgard et al, A Multi-Rule Shewhart Chart for Quality Control in Clinical Chemistry, Clin Chem, 27, 493, 1981
Page 4
A key concept to think about when looking at QC results is what constitutes a real
performance problem. How do we know when there is a real, meaningful
problem with the method?
Is it always meaningful when the QC results fail a statistical QC rule? Or is the
true criterion that method performance has changed enough to impact medical
care? I think we all agree that the latter is our real concern. However, historically
most of us have assumed that these two things were identical: that
failure of a statistical QC rule always meant that there was a medically significant
change in method performance. But is that always the case? Actually, we know
it is not. This is reflected in the fact that we often report results even though the
QC results have failed a rule. We say that if only one level is out, it's OK to report,
or similar evaluations. Here we have an excerpt from the original Westgard Rules
paper in 1981 that indicates that it is acceptable and necessary to sometimes
recognize that just because QC results have failed a statistical rule, even the
Westgard Rules, it may still be reasonable and necessary to release results while
we are investigating the rule failure.
Statistical QC rules are tools we use. Tools that help us to know that some degree
of change has occurred in the method. Then that change needs to be put in
perspective relative to our quality requirement for method performance. We have
been doing this intuitively and informally for years.
Let's look at formally establishing our quality requirement and see how QC rules
really relate to it. To do this we want to introduce two very useful concepts:
Total Analytical Error and Total Allowable Error.
[Figure: distribution of results relative to the true value. Labels: imprecision, 1.65 SD, 5%]
Total analytical error is defined as the error that encompasses 95% of the results
for a given method. As we know, error is made up of two components. Constant
error, often called bias, is the average consistent error seen over time. We would
like to reduce or eliminate bias, but cannot always do so. The other component
of total error is random error or imprecision. This is an inherent characteristic of
the method. To estimate the total error for the method we want to capture the
error that covers 95% of the results of the method. Since random variation is
symmetrical around the mean, it sometimes adds to total error and sometimes
reduces total error. Since we are only interested in the maximum error, we only
look at the random error that increases total error. Using the SD as the measure
of random error, the combined bias and imprecision that covers the error for 95%
of results is bias plus 1.65 times the SD. That becomes our formula for estimating
total analytical error.
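The formula just described can be sketched in a few lines of Python. This is only an illustration: the function name and the example glucose numbers are hypothetical, and taking the absolute value of the bias (so a negative bias still adds to the error) is an assumption rather than something stated in the talk.

```python
def total_analytical_error(bias: float, sd: float, z: float = 1.65) -> float:
    """Estimate total analytical error as |bias| + z * SD.

    bias and sd must be in the same units (for example mg/dl or percent).
    z = 1.65 is the one-sided multiplier that covers 95% of results.
    """
    if sd < 0:
        raise ValueError("SD must be non-negative")
    # Assumption: a negative bias contributes the same amount as a positive one.
    return abs(bias) + z * sd


# Hypothetical glucose method: bias of +2.0 mg/dl and SD of 3.0 mg/dl.
tae = total_analytical_error(bias=2.0, sd=3.0)
print(f"Estimated total analytical error: {tae:.2f} mg/dl")
```

The same function works in percent if bias and CV are both expressed as percentages.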
Page 6
So total analytical error is the actual error we have. We next want to look at what
is the maximum error that can be tolerated before we impact patient care. There
have been a number of ways to describe this, but the one that is currently most
widely used is Total Allowable Error. Using these two concepts together can help
us design effective and efficient QC. Let's look at Total Allowable Error in more
detail.
Not method performance based
Based on how results are used, not generated
Determined by change in outcome
Established at decision points
Outcomes: Failed PT; Altered medical decision
Examples: Na+ - 115 mmol/L; PSA - 4.0 ng/ml; TSH - 4.0 µIU/ml
Page 7
Total Allowable Error is the maximum error we can tolerate for an assay before
some outcome like medical decision making or patient care is impacted.
Total allowable error is NOT based on current method performance. It is
determined by how the results are used medically, not how the results are
determined analytically. So it is independent of the method used. Since Total
Allowable Error is dependent on the clinical use of the test result and the inherent
biologic variability of the analyte, it is not the same for all analytes. Therefore it
has to be established for each analyte and for each medically important
concentration for the analyte. The total allowable error for calcium is the same
regardless of what instrument or method is used to measure calcium.
The idea of total allowable error is that if we exceed it, then some outcome will be
affected: we may fail proficiency testing, a medical decision may be altered, or
patient care may be changed.
Since the concept of total allowable error revolves around medical decision
making, we typically estimate the allowable error at concentrations where
medical decisions are made. To understand how this concept may be used, let's try
defining Total Allowable Error for a few specific analytes as examples. We'll use
Glucose, Sodium, PSA and TSH. The first step for each analyte is to define a
medically important concentration. For Glucose, 120 mg/dl is a decision point for
the diagnosis of diabetes; for Sodium, 115 mmol/L is the decision point for
hyponatremia and severe electrolyte imbalance; for PSA, a result above 4 ng/ml is
suggestive of increased risk for cancer and should be followed up; and for TSH, a
result above 4.0 µIU/ml indicates possible hypothyroidism.
So, how do we decide what Total Allowable Error should be for a method? Many
authorities discussed this for a number of years, and in 1999 a
conference was held in Stockholm to develop a consensus approach.
Biologic variation
Professional recommendations
Regulatory requirements
At the conference it was recognized that there is not one simple approach that
will work to define Total Allowable Error for all methods. So a hierarchy was
developed to start with the most medically sound approaches and move to other
approaches if the optimal was not possible. Here is that hierarchy.
Let's start by looking at the use of outcome studies and clinical expert opinion.
Expert Opinion:
Review institutional standardized care protocols
Consult with physicians for expert opinion
Page 9
We are trying to establish how much change in a result will alter medical decision
making and patient care. That amount of change then becomes our allowable
error since any change less than that will not cause a physician to make a
different decision.
Clinical outcome studies are the optimal source for this information. They are
focused on the decision making in specific medical scenarios, like diagnosis and
management of heart disease or diabetes. These studies are prospective, long
term studies that objectively assess how treatment decisions affect the outcome.
Often lab results are used to make the treatment decisions. This is the most
specific data we can use.
Using medical expert opinion may seem an obvious choice for setting Total
Allowable Error. Essentially we want to know how much change in the result for
a test will cause physicians to change their decision and that becomes the limit.
To assess this we can look at the consensus derived standard treatment protocols
that are used in many healthcare facilities today or we can consult with
physicians. This will give us the benefit of their collective experience on how to
best use lab results.
HbA1c
Clinical outcome study:
Expert opinion:
Page 10
10
Page 11
The challenge is that outcome studies and clinical protocols don't exist for most
analytes. So, while they may be useful guidance for some analytes, for most
there is no standard of this type. Also, when soliciting expert opinion, how do
you decide how much change is critical?
It may seem straightforward to just consult with physicians about how much a
Glucose result has to change before they would consider it significant, but there's
a problem. Physicians' intuitive sense of how much change is significant is in
large part based on their experience with how variable lab results are compared
to changes noted in the patient's status. This intuitive sense is shaped by the
variability in lab results seen in the past and doesn't necessarily reflect current test
performance.
So these approaches are very valuable, but may not be practical for all analytes.
11
Biologic variation
Professional recommendations
Regulatory requirements
12
Page 13
13
Page 14
The current consensus on using biologic data to set analytical performance goals
sets the limits of analytical error based on the biologic data. The desirable goal
for bias is no more than 25% of total biologic variability (within-individual and
between-individual variation combined). The desirable goal for
imprecision is no more than 50% of within-individual variability. Using these
proposed limits, we can set a goal for Total Allowable Error that will encompass
95% of results for a given analyte. The estimated Total Allowable Error is the bias goal
plus 1.65 times the imprecision goal. This is the current working model for estimating
total allowable error from biologic data.
14
Page 15
            Biologic Variation     Desirable Specification
Analyte     CVI (%)   CVB (%)      CV (%)   Bias (%)   TEa (%)
Glucose       5.7       6.9          2.9       2.2        6.9
Na+           0.7       1.0          0.4       0.3        0.9
PSA          18.1      72.4          9.1      18.7       33.6
TSH          19.3      19.7          9.7       6.9       22.8
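As a rough check, the desirable specifications tabulated above can be recomputed from the two biologic variation columns. The sketch below assumes that "total biologic variability" means the within-individual (CVI) and between-individual (CVB) components combined as a root sum of squares; the function and variable names are illustrative only.

```python
import math


def desirable_specs(cvi: float, cvb: float) -> tuple[float, float, float]:
    """Return (imprecision goal, bias goal, TEa goal), all in percent.

    cvi: within-individual biologic variation (%)
    cvb: between-individual biologic variation (%)
    """
    cv_goal = 0.5 * cvi                              # 50% of within-individual variation
    bias_goal = 0.25 * math.sqrt(cvi**2 + cvb**2)    # 25% of combined variation (assumed)
    tea_goal = bias_goal + 1.65 * cv_goal            # bias goal + 1.65 x imprecision goal
    return cv_goal, bias_goal, tea_goal


# Biologic variation values (CVI, CVB) taken from the table on this slide.
biologic_variation = {
    "Glucose": (5.7, 6.9),
    "Na+": (0.7, 1.0),
    "PSA": (18.1, 72.4),
    "TSH": (19.3, 19.7),
}

for analyte, (cvi, cvb) in biologic_variation.items():
    cv, bias, tea = desirable_specs(cvi, cvb)
    print(f"{analyte:8s} CV {cv:5.2f}%  bias {bias:5.2f}%  TEa {tea:5.2f}%")
```

Under that assumption the computed goals agree with the tabulated CV, Bias and TEa columns to within rounding.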
15
Challenges:
No complete agreement on biologically based goals
Variability data for some analytes not robust
Performance of some current methods cannot meet biologic goals
Page 16
There are challenges. First this is a consensus model, which implies some degree
of disagreement on how the goals should be set. Second, the data used to
determine biologic variability is not robust for all analytes. We have excellent
data for many analytes, but the data is not as solid for many others. Finally, some
methods in current use cannot achieve the level of performance necessary to
meet goals set using this model. Current technology is not capable. Example
analytes where this is an issue are Sodium and often Calcium.
16
Biologic variation
Professional recommendations
Regulatory requirements
17
Page 18
There have also been a number of published studies and reports by professional
groups that also establish specific medical decision points for some analytes. In
these studies and reports, tolerable error limits are often also defined. These
reports can be very useful in establishing Total Allowable Error for those analytes.
Since the data used to establish the recommended performance criteria are not
always outcome based, the recommendations in these reports are not as solid as
those from outcome studies. These reports are based on outcome data whenever
possible, but, as we already indicated, that data does not exist for many analytes.
18
Page 19
19
Challenges:
Published guidelines only cover limited number of analytes
Standardized guidelines require consistency across methods / labs
Some current methods cannot meet desired performance goals
Page 20
As with the other approaches discussed, one limitation is that these reports and
recommendations do not exist for all analytes. Virtually none of these protocols,
studies or reports make any allowances for lab to lab or method to method
differences in results. None suggest interpretation of results using lab specific
reference intervals. This means that there is increasing pressure on
manufacturers and laboratories to minimize or eliminate these differences. As we
all know, this is not a simple task for a number of analytes, but progress is being
made and will continue to be made. Finally, these recommendations are clinically
based and focus on what is desirable clinically. There have been a couple of cases
where the performance recommendation cannot be met by any method in
current use. Technology has not caught up with the perceived medical need.
20
Biologic variation
Professional recommendations
Regulatory requirements
21
Page 22
Agencies in many countries and even state agencies here in the US manage
External Quality Assessment (EQA) or Proficiency Testing (PT) programs and have
established acceptable performance limits for these inter-laboratory testing
programs. If these limits are used to establish Total Allowable Error, we can then
set as a goal detecting any change in method performance that would cause a
failure with an EQA or PT result.
This approach to establishing Total Allowable Error has been very popular in most
of the literature articles about Total Allowable Error and these limits are often
listed in tables in these articles and in some software as recommended values for
TEa. Let's look at some examples.
22
Example:
CLIA '88 performance goals for proficiency testing
Often used as examples in literature and software for Total Allowable Error limits
Less than half the typical laboratory menu of analytes has CLIA PT goals
Goals created by committee consensus based on 1980s technology
Useful resource, not a gold standard
CLIA mandated PT acceptable limits and Total Allowable Error based on CLIA PT limits

Analyte   CLIA PT acceptable limit                    Decision point   TEa based on CLIA
Glucose   Target value ± 6 mg/dl or 10% (greater)     120 mg/dl        10%
Sodium    Target value ± 4 mmol/L                     115 mmol/L       3.47%
PSA       None                                        -                None Established
TSH       Target value ± 3 SD                         4.0 µIU/ml       21%
Page 23
In the US the CLIA regulations have established performance criteria for a number
of analytes.
Here are the CLIA goals for our example analytes. We can find
CLIA goals for Glucose, Sodium, and TSH, and we can use the goals to set Total
Allowable Error specifications at our chosen medical decision points. However,
there is no CLIA performance goal for PSA, as is the case for many analytes and
most immunoassays.
These performance goals can be used as the Total Allowable Error goal. However,
these CLIA performance goals were established prior to 1992 using a consensus
process and are based on the expected performance of analytical systems in use
at that time. They don't necessarily reflect current performance or medical
needs and, most importantly, the goals are only set for about 40 analytes. These
goals can be a good resource when establishing Total Allowable Error, but they
are not a gold standard.
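To show how a PT limit turns into a percentage goal at a decision point, here is a small sketch for the glucose and sodium limits quoted on the slide. The function name and its parameters are illustrative; the only inputs taken from the slide are the limits and decision points themselves.

```python
def tea_percent_from_pt_limit(decision_point: float,
                              abs_limit: float = 0.0,
                              pct_limit: float = 0.0) -> float:
    """Express a PT acceptance limit as a percentage at one concentration.

    When a limit is stated as "absolute or percent, whichever is greater",
    both are converted to percent and the larger one is returned.
    """
    if decision_point <= 0:
        raise ValueError("decision point must be positive")
    abs_as_pct = 100.0 * abs_limit / decision_point
    return max(abs_as_pct, pct_limit)


# Glucose: target +/- 6 mg/dl or 10% (greater), evaluated at 120 mg/dl.
glucose = tea_percent_from_pt_limit(120.0, abs_limit=6.0, pct_limit=10.0)
# Sodium: target +/- 4 mmol/L, evaluated at 115 mmol/L.
sodium = tea_percent_from_pt_limit(115.0, abs_limit=4.0)

print(f"Glucose TEa at 120 mg/dl:  {glucose:.2f}%")
print(f"Sodium TEa at 115 mmol/L: {sodium:.2f}%")
```

At 120 mg/dl the 6 mg/dl limit is only 5%, so the 10% limit governs. For sodium, 4 / 115 is about 3.478%, which the slide shows as 3.47%.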
23
Challenges:
Acceptable limits not defined for all analytes
While limits may be based on clinical requirements, may be altered to
meet practical needs of PT/EQA programs
Limits must incorporate allowances for factors such as sample stability,
capabilities of older technology, matrix interactions
Page 24
There are some challenges to using EQA or PT limits for Total Allowable Error.
Especially in the US, many analytes commonly part of the lab's menu do not have
CLIA defined limits. Further, while these limits are often based on medical
usefulness criteria, the actual limits are modified to meet the needs of the EQA /
PT program. Things like sample stability, possible matrix interactions, the need to
cover a wide range of analytical technology, etc. often drive adjustment of the
medically derived limits to meet the practical constraints of an EQA / PT program.
24
Biologic variation
Professional recommendations
Regulatory requirements
25
Determining TEa
is not simple
Page 26
Setting Total Allowable Error goals for all analytes is the hardest part of
developing an efficient and effective QC protocol. There is no simple, one right
way to estimate Total Allowable Error. The example tables from literature articles
are just that, examples. They are not standards or necessarily the best approach
for us to use. We need to keep in mind that the goals are primarily clinical in
nature.
Determining TEa
is not simple
Page 28
Need to use
reasoned
judgement
TEa can be
used with
different
methods
Establishing total allowable error goals for all analytes in the lab takes a fair
amount of time and effort and is never easy. However, once it is done, it is
essentially done for all time. Since the Total Allowable Error goals are not based
on how current methods perform, but rather on how results are used, once the
goals are agreed on they can be used for a long time with different instrument
systems. So, in the long run, the effort to set these goals is worth it.
Once we have established our Total Allowable Error, we can use it to help select
the optimal QC rules for our analytes.
28
Designing Effective QC
1. Set quality requirement
2. Select QC rules
3. Set QC frequency
Page 29
Once we have established our quality requirement, we can use this to select our
QC rules. However, before we discuss how to use the quality requirement in rule
selection, we need to review some key concepts behind the function of QC rules
so the selection process makes sense.
29
Types of Rules
Failure Rule:
If QC results fail this rule, method is considered out of control
Warning Rule:
If QC results fail this rule, considered an early warning alert.
Method is still in control and results are still reported without delay.
Failure of a warning rule suggests there may be something worth
investigating while results are still reported
Challenge:
Warning rule concept can be confusing
Warning rules are often treated as failures: testing is halted and
results are not reported. This negates the point of a warning rule
Page 30
An initial step in selecting rules is to decide which type of rules we want to use.
There are two basic types. Failure rules are designed so that, if the QC data fails
the rule, we say the method is out of control and we stop reporting results until
further action is taken. Clearly this is the most common type of QC rule and all
QC protocols need to be based on one or more failure rules.
We also have warning rules. These are rules that typically have too high a false
positive rate to be effective failure rules, but they can function very well to give us
an early indication that some change in performance may be occurring and allow
time to investigate before we trip the failure rule. With a warning rule, if the QC
results fail the rule, we do NOT stop releasing results. Instead, we recognize that
the method is still acceptable, but something may be happening. So we start to
investigate to see if there really is an issue without interrupting the work flow.
The usual problem with using warning rules is that pretty soon everyone starts
treating them as failure rules and stops reporting when the warning rule trips.
This negates the whole idea of a warning rule and makes the QC process very
inefficient.
30
Options include:
Single rule
Multi-rule
Page 31
So what rules are available to us? There are actually quite a lot of
options. Single rule protocols and multi-rule protocols are the most
commonly used and we will discuss them in some detail.
31
Page 32
These other options: Mean & Range, cumulative sum, weighted moving
averages, using patient results and others like multi-variate approaches are
all well documented in the literature and can be very effective and
efficient. They have not been widely used because they pretty much all
require that calculations be made each time a QC result is evaluated. In
the past this was not practical for many labs. Now, however, all the
instruments have powerful computers, many labs use middleware
products that run on powerful computers, and almost all labs are connected to
LIS systems that can perform the calculations. Yet if we look at the
QC support software on our instruments, our middleware, and our LIS
systems, we don't find these options available. So we are still left with
single rule and multi-rule protocols as the most practical because they are
well supported.
32
Single rule
Multi-rule
Page 33
For now we will focus on single rule protocols and multi-rule protocols
since these are the most readily available protocols and the only ones
generally supported by the software we use to manage QC.
33
Single rule
Multi-rule
+/- 2 SD
Historically, very commonly used
Very inefficient rule
False reject rate too high
Can only be effectively used as a
warning rule
Page 34
A single rule protocol is just what it sounds like. A single QC rule is applied
to each QC result as it is generated. If the result fails the rule, the method
is deemed out of control.
Historically, the single rule +/- 2 SD has been the most commonly used.
This is a very inefficient rule due to its very high false positive failure rate,
especially when used with multi-level QC material for many methods
concurrently. It can be an effective warning rule, but most of the time
the warning rule is actually used as a failure rule and we have gained
nothing.
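The effect of multi-level QC on the +/- 2 SD rule can be illustrated numerically. The sketch below assumes in-control results are Gaussian and independent of one another, which is a simplification; the function names are illustrative.

```python
import math


def false_reject_rate(k: float) -> float:
    """Two-sided probability that an in-control result falls outside +/- k SD."""
    return math.erfc(k / math.sqrt(2.0))


def any_false_reject(k: float, n_results: int) -> float:
    """Chance that at least one of n independent in-control results fails."""
    return 1.0 - (1.0 - false_reject_rate(k)) ** n_results


# One, two and three control levels, then 20 results (e.g. many methods at once).
for n in (1, 2, 3, 20):
    print(f"{n:2d} results at +/-2 SD: {any_false_reject(2.0, n):.1%} false rejection")
```

With a single control the false rejection rate is roughly 4.6%; with two levels it is close to 9% and with three about 13%, even though nothing is wrong with the method.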
34
Single rule
Multi-rule
Page 35
However, +/- 2 SD is not the only possible single rule. SD multipliers like
2.5, 2.58, 3 and even 3.5 can be effectively used to control the false
positive rate and reliably detect change in performance. Notice that it is
not required by statistics or science that the multiplier of the SD used for a
single rule needs to be a whole number. The only reason most rules used
historically have been whole numbers is that those rules were developed
when we were doing the math in our heads, and whole numbers are
easier to work with. Today, with computers doing the math, the multiplier
can be any value we want in order to get the detection or false positive
rate we desire. A multiplier of 2.58 SD gives us a false positive rate of
approximately 1% per method per control. What is critical is to match the choice
of rule to the quality requirement and the usual performance of the
method. We will look at this in detail in a bit.
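The relationship between an SD multiplier and its false positive rate can be worked in both directions with Python's standard library. This is a sketch under the assumption that in-control QC results follow a normal distribution; the helper names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, SD 1


def false_positive_rate(multiplier: float) -> float:
    """Two-sided false positive rate of a single +/- multiplier SD rule."""
    return 2.0 * (1.0 - _STD_NORMAL.cdf(multiplier))


def multiplier_for_rate(rate: float) -> float:
    """SD multiplier that yields the requested two-sided false positive rate."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be between 0 and 1")
    return _STD_NORMAL.inv_cdf(1.0 - rate / 2.0)


for k in (2.0, 2.5, 2.58, 3.0, 3.5):
    print(f"+/-{k:.2f} SD -> false positive rate {false_positive_rate(k):.3%}")

print(f"A 1% false positive rate needs +/-{multiplier_for_rate(0.01):.3f} SD")
```

Working backwards from a target rate is what makes non-integer multipliers practical: the software can pick the limit that delivers the false positive rate the lab has decided it can live with.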
35
Single rule
Multi-rule
Page 36
36
Single rule
Multi-rule
Page 37
The best example of a multi-rule protocol is also the most widely known, the so
called Westgard Rules. Dr. Westgard and three other authors published the
paper introducing these rules 30 years ago. The rules used were selected from
statistical control rules used in other industries. Dr. Westgard selected rules that
would best fit the way a clinical laboratory operates and which would have a very
low false positive rate. The rules as described in the original article and in all
subsequent writings are not a fixed set of required rules, but rather a tool box of
rules that can be used. There are other multi-rule protocols available, but they all
essentially work the same way. Let's look at Dr. Westgard's proposed rules in a
little more detail.
37
Scatter
1-3s, R-4s
Bias
N = 2 or 4: 2-2s, 4-1s, 8x
N = 3 or 6: 2 of 3-2s, 3-1s, 6x
7t
N = number of QC samples per run
2-2s means 2 consecutive QC results that both exceed 2 SD in the same direction
Page 38
Here is Dr. Westgard's rule tool box. As you can see, some rules are
designed to detect increased scatter or imprecision. Other rules are
designed to detect changes in bias or shifts. We can also see that which
rules you should use depends in part on how many QC results are being
evaluated together. We have one set of rules for when we use two levels of
controls and a somewhat different set if we use three levels of controls.
The notation may seem strange at first, but it is easy to understand. 2-2s
means two consecutive QC results that both exceed 2 SD on the same side
of the mean. Similarly, 4-1s would mean 4 consecutive QC results all
exceeding 1 SD on the same side of the mean. Let's look at these other
rules.
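The notation maps directly onto a small check: a count of consecutive results and an SD limit on one side of the mean. The following is a minimal sketch, not any vendor's implementation; it assumes results have already been converted to z-scores and are listed oldest first, and the function names are made up for illustration.

```python
from typing import Sequence


def consecutive_same_side(z_scores: Sequence[float], count: int, limit: float) -> bool:
    """True if the most recent `count` z-scores all exceed `limit` SD
    on the same side of the mean."""
    if len(z_scores) < count:
        return False  # not enough data yet to evaluate the rule
    recent = z_scores[-count:]
    return all(z > limit for z in recent) or all(z < -limit for z in recent)


def rule_2_2s(z_scores: Sequence[float]) -> bool:
    return consecutive_same_side(z_scores, count=2, limit=2.0)


def rule_4_1s(z_scores: Sequence[float]) -> bool:
    return consecutive_same_side(z_scores, count=4, limit=1.0)


# Hypothetical z-scores, oldest first.
print(rule_2_2s([0.4, 2.41, 2.13]))       # both of the last two above +2 SD
print(rule_2_2s([2.3, -2.2]))             # beyond 2 SD but on opposite sides
print(rule_4_1s([1.2, 1.5, 1.1, 1.8]))    # four in a row above +1 SD
```

The same helper covers 3-1s by passing count=3 and limit=1.0.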
38
R-4s Examples
Example 1: QC results within one run span > 4 SD
[Figure: Control 1 and Control 2 results in the same run, with the range between them marked]
Page 39
The R-4s rule looks at the range spanned by two controls within the same
run. If the span exceeds 4 SD, then the rule fails. Note that this applies only to
controls run together in a single run and that they do not have to be
consecutive. If we are using three levels of control and any two of the three
results show a span exceeding 4 SD, then the rule fails.
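As described here, the range rule reduces to comparing the highest and lowest z-score within one run. A minimal sketch follows, assuming the controls' results are expressed as z-scores so that different levels can be compared on one scale; the function name is illustrative.

```python
from typing import Sequence


def rule_r_4s(run_z_scores: Sequence[float], span_limit: float = 4.0) -> bool:
    """True if any two controls in the same run are more than span_limit SD apart.

    Checking max - min covers every pair, so it works for two or three levels.
    """
    if len(run_z_scores) < 2:
        return False  # a range needs at least two control results
    return max(run_z_scores) - min(run_z_scores) > span_limit


# Hypothetical runs.
print(rule_r_4s([2.3, -1.9]))        # span of 4.2 SD
print(rule_r_4s([2.3, 0.1, -1.5]))   # widest pair spans 3.8 SD
```

Because only results from a single run are passed in, the rule never compares controls across runs.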
39
Page 40
40
Trend rule: 7t
7 consecutive results, each one greater (less) than the preceding result
Popular in Europe
[Figure: control chart of Z score (-3 to 3) against run number]
Page 41
The last rule we will look at is the 7T. This is a trend rule that has been
popular in Europe. It requires that 7 consecutive QC results each be
greater than (or less than) the result immediately before. This is not the
same as the N x rules since they only require that the results be on the
same side of the mean. Here each result must be numerically greater
than, or less than the one before it.
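The difference between the trend rule and the N x rules is easy to see side by side. In this sketch the trend rule is read as seven results forming a strictly rising or strictly falling sequence (six step-to-step comparisons), which is one interpretation of the slide wording; the function names are illustrative and results are listed oldest first.

```python
from typing import Sequence


def rule_7t(values: Sequence[float], count: int = 7) -> bool:
    """True if the last `count` results are strictly rising or strictly falling."""
    if len(values) < count:
        return False
    recent = values[-count:]
    steps = list(zip(recent, recent[1:]))  # (previous, current) pairs
    return all(cur > prev for prev, cur in steps) or all(cur < prev for prev, cur in steps)


def rule_nx(z_scores: Sequence[float], count: int = 8) -> bool:
    """True if the last `count` z-scores are all on the same side of the mean."""
    if len(z_scores) < count:
        return False
    recent = z_scores[-count:]
    return all(z > 0 for z in recent) or all(z < 0 for z in recent)


# Hypothetical z-scores.
trending = [-0.9, -0.6, -0.2, 0.1, 0.3, 0.8, 1.2]        # rises every time, crosses the mean
shifted = [0.5, 0.2, 0.9, 0.3, 0.7, 0.1, 0.6, 0.4]       # always above the mean, no trend

print(rule_7t(trending), rule_nx(trending, count=7))
print(rule_7t(shifted), rule_nx(shifted, count=8))
```

The first series trips only the trend rule and the second trips only the 8x rule, which is why the two are not interchangeable.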
41
Within
Comparison of control results within the same control sample (level)
across multiple runs
Example: last 3 results for level 2 for Glucose
Results will be from different runs and can be from different days
Across
Comparison of control results across different control samples
within the same run
Example: current results for levels 1,2 & 3 for TSH
Can be different control samples (levels)
Page 42
42
Date      Level 1            Level 2            Level 3
          Result     Z       Result     Z       Result     Z
21-Aug    101.6    2.41      231.3    1.37      345.9   -0.33
20-Aug     86.0    0.12      202.6   -0.55      387.6    1.43
13-Aug     78.3   -1.01      204.3   -0.44      342.8   -0.46
9-Aug      81.9   -0.49      227.7    1.13      360.9    0.30
8-Aug      87.8    0.39      216.8    0.40      384.6    1.30
7-Aug      87.9    0.41      220.6    0.65      359.5    0.24
6-Aug      74.6   -1.55      256.2    3.04      394.4    1.72
Across controls: within a run
Within a control: across runs
Page 43
Here is another way to visualize the concepts of within and across. Most
QC rules are designed to be applied both ways. The idea behind looking
back to previous days is to gain sensitivity to detect changes early on by
using more data. This is really what we instinctively do when we look at
the QC graph and review the data from previous days. Applying the rules
this way just makes that look back part of the QC rules.
43
Westgard et al, A Multi-Rule Shewhart Chart for Quality Control in Clinical Chemistry, Clin Chem, 27, 493, 1981
Page 44
Now let's look at some guidelines for the effective use of the Westgard
Rules.
First, select the rules used based on method performance. We will
discuss how to do this in detail in a few moments, but right now I want to
make the point that you are not required to use all the rules all the time.
Even in the original paper that so many have referred to, Dr. Westgard
selected which subset of the rules to use based on the number of QC
samples tested in each run. Today, the selection is driven by method
performance. The key is that random combinations do not work. The rules
have been validated to work in some very specific groupings. The specific
groupings can be readily found on Dr. Westgard's website and even in the
original paper as shown here.
44
Level 1 result
Level 2 result
Level 3 result
45
Page 46
Once we have a rule failure, the data used to evaluate the rules cannot
come from prior to the rule failure. Let's see how this works.
46
Date      Level 1            Level 2            Level 3
          Result     Z       Result     Z       Result     Z
28-Aug     91.2    0.88      196.2   -0.98      346.4   -0.31
28-Aug     81.2   -0.59      224.2    0.89      357.0    0.14
27-Aug     96.2    1.61      229.3    1.24      342.4   -0.48
24-Aug     83.7   -0.22      216.2    0.36      350.9   -0.12
23-Aug     81.1   -0.59      207.5   -0.23      340.2   -0.57
22-Aug     78.3   -1.00      221.3    0.70      376.2    0.95
22-Aug     99.8    2.13      212.4    0.10      357.9    0.18   STOP
21-Aug    101.6    2.41      231.3    1.37      345.9   -0.33
20-Aug     86.0    0.12      202.6   -0.55      387.6    1.43
13-Aug     78.3   -1.01      204.3   -0.44      342.8   -0.46
9-Aug      81.9   -0.49      227.7    1.13      360.9    0.30
8-Aug      87.8    0.39      216.8    0.40      384.6    1.30
7-Aug      87.9    0.41      220.6    0.65      359.5    0.24
6-Aug      74.6   -1.55      256.2    3.04      394.4    1.72
Page 47
2 of 3-2s
Once we have a failed run, we start over with the data used for rules going
forward. So it will be 4 runs into the future before we can apply the 4 1s rule
within a single control. However, this only applies to the QC rules. When we use
this data to calculate a mean or SD, we use all the data except from the specific
run that had the problem.
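The two different treatments of the data after a failed run can be sketched as follows. The Level 1 results are taken from the table on this slide, listed oldest first, with the 22-Aug run marked STOP as the failed run; the function names are illustrative and this is not how any particular QC package implements the look-back.

```python
from statistics import mean, stdev

# Level 1 results from the slide, oldest first. Index 7 (99.8) is the failed run.
level_1 = [74.6, 87.9, 87.8, 81.9, 78.3, 86.0, 101.6, 99.8,
           78.3, 81.1, 83.7, 96.2, 81.2, 91.2]
FAILED_RUN = 7


def rule_window(results: list[float], failed_index: int) -> list[float]:
    """Results eligible for QC rule evaluation: only those after the failed run."""
    return results[failed_index + 1:]


def stats_data(results: list[float], failed_index: int) -> list[float]:
    """Results used for the mean and SD: everything except the failed run itself."""
    return results[:failed_index] + results[failed_index + 1:]


window = rule_window(level_1, FAILED_RUN)
usable = stats_data(level_1, FAILED_RUN)

print(f"Results available to the rules: {len(window)}")
print(f"Results used for mean/SD:       {len(usable)}")
print(f"Mean {mean(usable):.1f}, SD {stdev(usable):.1f}")
```

A look-back rule such as 4-1s simply cannot fire until the window has filled again, while the running mean and SD keep the benefit of the longer history.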
47
[Table repeated from the previous slide, with the rows for 21-Aug and earlier marked: Prior QC data not used to apply rules going forward. STOP remains on the 22-Aug run; rule shown: 2 of 3-2s]
Page 48
Page 49
Recognize that the rules were developed in the 1970s and were designed
to be relatively simple for people to use manually.
49
Page 50
50
Page 51
However, now most folks use some sort of computerized implementation of the
rules, and there's the challenge. Most computer implementations of the Westgard
Rules do not use the rules the way Dr. Westgard originally intended. Frequently
not all the rules are available, especially those for three levels of control. The
rules are often not applied within and across, and finally the rules are often
applied to each individual QC result as it is generated rather than collectively to
the run.
These differences do not mean that these implementations of the rules are not
good and do not work. They can be effective and do the job, but it is important
that we know exactly how they work and not assume that just because they are
called Westgard Rules, they are exactly as described in the original paper.
51
TEa and QC
If typical method error is close to total allowable error, it will be very difficult to
control assay performance to prevent exceeding the TEa
If typical method error is much less than total allowable error, it will be relatively
easy to detect change in the assay's performance before exceeding the TEa.
The ratio of the Total Allowable Error goal to the method's typical variability
has been called the Sigma Metric
Page 52
Now, finally, let's bring it all together and use our Total Allowable Error based
quality requirement and our understanding of the QC rules to see how we can
select effective and efficient QC rules for our methods.
To do this we compare our TEa goals to the actual performance of our methods
on the instrument we are using. This is where we make the connection between
TEa goals and actual method performance.
So, if Total Allowable Error is close to the actual performance of the assay, it may
be difficult to monitor the assay and control it to prevent change in assay
performance from impacting assay interpretation. However, if the actual method
variability is small compared to the performance goal it will be easy to detect
change in performance before it has an impact on patient care.
Recently, folks have begun taking the ratio of TEa to the method's variability as a
guide to selecting QC rules. This ratio is called the Sigma Metric.
52
imprecision
Page 53
The Sigma Metric is a measure of the margin between the actual method error
and the Total Allowable Error. Here we see the performance of an assay relative
to the true value and the Total Allowable Error. The Sigma Metric is calculated
by subtracting the assay's bias from the Total Allowable Error goal and then
dividing that difference by the assay CV, with all three expressed in the same
units. This gives the distance between current assay performance and the error
goal as multiples of the CV or SD.
As you might expect, higher is better, and the ideal is for the Sigma Metric to be
6 or higher. Let's see how we can use this value to determine what QC rules will
be effective.
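Written out as a formula, the calculation described above looks roughly like this. Taking the absolute value of the bias is an assumption added here so that a negative bias also reduces the margin, and the numbers in the worked line are hypothetical, not taken from the slides.

```latex
\[
\text{Sigma Metric} = \frac{\mathrm{TEa} - \lvert \mathrm{bias} \rvert}{\mathrm{CV}}
\]
% Hypothetical example: TEa = 10%, bias = 1%, CV = 1.5% (all in percent)
\[
\text{Sigma Metric} = \frac{10 - 1}{1.5} = 6.0
\]
```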
[Figure: high sigma method, 1-3s control limits shown relative to the true value]
Page 54
With high sigma methods, the difference between typical performance and the
total allowable error limit is sufficiently large that a simple single rule protocol like
+/- 3 SD can readily catch any significant change in method performance before
we exceed the allowable limit and still have a very low false positive rate.
[Figure: low sigma method, 1-3s control limits shown relative to the true value]
Page 55
On the other hand, a low sigma method doesn't have the same cushion to work
with. In this case using +/- 3 SD will not be effective because we will have
exceeded the error limit well before a 3 SD limit will consistently indicate the
change in performance. Here a multi-rule protocol will be more effective,
and which rules to use will depend on the sigma.
Selecting QC rules
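[Chart: QC rules plotted against the Sigma Metric (axis ticks at 3 and 5). Rule sets shown, with the number of control measurements n:]
1-3s/R-4s/2-2s/4-1s/8-x, n=6
1-3s/R-4s/2-2s/4-1s, n=4
1-2.5s, n=4
1-2.5s, n=2
1-3s, n=2
1-3.5s, n=2
JO Westgard, Six Sigma Quality Design & Control, 2nd Ed., Westgard QC Inc., 2006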
Page 56
When we use the sigma metric to help select QC rules we find there is a
continuum of which QC rules work best at which sigma metric.
If the assay's sigma metric is 5 or greater, it becomes fairly easy to detect change
in performance before the analytical performance can impact decision making,
and the QC protocol used can be very simple.
If the assay's sigma metric is between 4 and 5, it's still fairly easy to catch change,
but slightly more powerful QC rules are needed.
If the assay's sigma metric is between 3 and 4, it is more difficult to catch
performance changes before they impact decision making, but it is still practical
with reasonable QC protocols. The closer we are to 3 sigma, the more complex
the rule set.
If the sigma metric is less than 3, we need all the QC rule support we can get, and
even that may not be enough: statistical QC protocols alone may not monitor
changes in the assay's performance well enough to prevent any impact on
decision making.
Fortunately, most current assays fall into the 4 or better sigma range and so are
OK. However, in the menu of almost every system are a few that do not. If that's
the case, and an alternate better method is not practical, then we have to use
maximum statistical QC and know that even that may not detect all significant
changes.
Since these choices are based on sigma, it seems to suggest we could have
multiple QC protocols in the lab.
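The sigma bands described above can be sketched as a simple lookup. This is only an illustration: the exact cutoffs (6, 5 and 4) are assumptions chosen for the example, the protocol labels are borrowed from the slide that follows, and the function names are hypothetical.

```python
def select_qc_protocol(sigma: float) -> str:
    """Return a QC protocol label for a given sigma metric.

    The cutoffs (6, 5, 4) are illustrative assumptions, not values
    stated in the presentation.
    """
    if sigma >= 6.0:
        return "+/- 3 SD, n=2"
    if sigma >= 5.0:
        return "+/- 2.5 SD, n=2"
    if sigma >= 4.0:
        return "+/- 2.5 SD, n=4"
    return "Multi-rule, n=6"


def statistical_qc_may_fall_short(sigma: float) -> bool:
    """Below 3 sigma, even maximum statistical QC may miss significant changes."""
    return sigma < 3.0


# Hypothetical assays and sigma values, for illustration only.
example_sigmas = {"Assay A": 6.5, "Assay B": 5.2, "Assay C": 4.1, "Assay D": 2.8}
for name, sigma in example_sigmas.items():
    note = " (statistical QC alone may not be enough)" if statistical_qc_may_fall_short(sigma) else ""
    print(f"{name}: sigma {sigma} -> {select_qc_protocol(sigma)}{note}")
```

A lookup like this is the kind of thing QC software can apply method by method, so nobody has to remember which protocol goes with which assay.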
Method          Sigma
Glucose         4.8
Creatinine      7.5
BUN             3.3
K+              5.0
Na+             2.9
Calcium         4.5
LD              6.2
CK              9.5
CEA             4.0
Cortisol        6.2
Estradiol       3.4
Folate          6.9
Microalbumin    9.2
PSA             6.1
QC protocol columns shown on the slide: +/- 3 SD, n=2 | +/- 2.5 SD, n=2 | +/- 2.5 SD, n=4 | Multi-rule, n=6
Page 57
When you do a sigma analysis and look at the results, it's easy to see that we will
certainly not use the same QC protocol for everything in the lab, and probably not
even for all the methods on a single instrument. How are we supposed to
manage that?
[Slide: methods grouped by QC protocol. Protocol headings shown: +/- 2.5 SD, n=2; +/- 2.5 SD, n=4; Multi-rule, n=6. Methods shown: Creatinine, Glucose, K+, Calcium, BUN, Na+, LD, CK, CEA, Estradiol, Cortisol, Folate, Microalbumin, PSA]
As we work it through we can see that methods get grouped into one of three or
four different QC protocols based on their sigma value. So we only have a small
number of different QC protocols. Still, who can remember this? No one can, and
no one needs to. The QC software on most instruments today allows QC rules to be
assigned on a method-by-method basis. A number of Siemens systems have
supported this for more than 10 years. So we don't have to remember; the
computer does. We configure the QC software one time and it remembers from
that point on. Then we can use QC panels to easily schedule the number of QC
samples appropriate to each method. So, looking at QC on a daily basis,
nothing changes: the QC software flags results that fail the rules and we follow
up, regardless of the QC protocol.
Practical Challenges
To calculate need:
Page 59
As is often the case when we try to take a good idea and use it in the real world,
there are some practical challenges. To estimate the Sigma metric we need three
values: Total Allowable Error, bias and the CV.
CV is relatively straightforward if we have QC samples that are targeted near the
decision points of interest. We can use the CV from the QC material. We have to
make sure that we are calculating the CV using enough data. 10 values is
nowhere near enough, and even 20 values will not give a robust estimate of CV. It is
best to use data from several months of QC testing if possible.
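As a minimal sketch of the CV calculation from QC data, assuming the results for one control level are available as a plain list of numbers (the function name and the sample values below are hypothetical):

```python
from statistics import mean, stdev


def cv_percent(qc_results):
    """Coefficient of variation (%) = 100 * SD / mean, using the sample SD."""
    if len(qc_results) < 2:
        raise ValueError("need at least two QC results")
    average = mean(qc_results)
    if average == 0:
        raise ValueError("mean is zero; CV is undefined")
    return 100.0 * stdev(qc_results) / average


# Made-up QC results for one control level. Eight values are far too few
# for a robust estimate; several months of data would be used in practice.
qc_results = [101.2, 98.7, 100.4, 99.1, 102.3, 97.8, 100.9, 99.6]
print(f"n = {len(qc_results)}, CV = {cv_percent(qc_results):.2f}%")
```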
We have already discussed the challenges with determining total allowable error,
so we won't go over that again. However, we recognize there is effort involved in
choosing the best value to use.
Practical Challenges
To calculate need:
Page 60
Finally there is bias. This can be a difficult challenge. Bias represents how much our results differ
from the true result on average. But what is truth? How do we know what the true value is?
In articles about using the sigma metric, it is often suggested that we use the QC or PT peer group
mean as our measure of truth. But is that really the best choice? The peer group mean is not
necessarily the most accurate value, only the most popular one. It is entirely possible that the peer
group is generally more biased than we are.
In the past, folks have used the all-method mean from PT results as truth, and at one time it may
have given a reasonable estimate. However, today for many, many methods there is a
predominant market leader that most labs are using, and the all-method mean is really nothing
more than the peer group mean for that method. If that method is unbiased, then it is fine, but
how do we know that method is unbiased?
We can send samples to a reference or commercial lab to get comparative results. However, often
these labs use the same methods we do. Sometimes, however, these large labs do have reference
methods, or something very close, available. If that's the case, then those results could give us a
good estimate of bias. What we really want is comparative results for fresh patient samples from
a real reference method. Unfortunately, that is almost impossible to find. Reference methods are
very manual and are usually not practical for routine use. So we cannot afford to set them up and
often cannot find a lab that can. In recent times some PT programs have begun assigning target
values using reference-type methods, with grading against the reference result rather than the
peer group. If that's the case, those PT targets may be useful.
One pragmatic way to get started using the concept of sigma metric, even if we cannot find a
good way to estimate bias, is to assume bias is zero. If we do this, we can estimate a sigma metric
and use it to help set up our QC, and generally we will get close to the ideal. Most methods do not
have large biases, so this can work at a very basic level to help us get started. Then once we find
an estimate of bias that we feel accurately represents method bias with patient samples, we can
revise our estimate of the sigma metric and adjust accordingly.
Analyte   Bias   CV     Medical TEa   CLIA TEa   Biologic TEa
Glucose   1%     2.3%                 10%        6.9%
Na+       0%     1.0%                 3.47%      0.9%
PSA       N/A    5.0%                 None       33.6%
TSH       1%     4.9%   20%           21%        22.8%
Page 61
Looking at our example assays, there is only one, TSH, for which we have
documented error goals based on all three approaches: medical use, CLIA limits
and biologic data. Let's follow TSH through the process.
For the goal based on medical use we get a sigma metric of 3.9. Using the CLIA
based goal we get a sigma metric of 4.1, and using the biologic goal we get a
sigma metric of 4.4. All pretty much the same, and all indicate that we can
monitor and control TSH to meet these goals using standard statistical QC
protocols.
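The TSH numbers can be checked with a few lines of code. This sketch uses the TSH row from the example table (bias 1%, CV 4.9%, TEa goals of 20%, 21% and 22.8%); the function and variable names are hypothetical.

```python
def sigma_metric(tea_pct, bias_pct, cv_pct):
    """Sigma metric = (TEa - bias) / CV, with all three in percent."""
    return (tea_pct - bias_pct) / cv_pct


# TSH values as read from the example table.
TSH_BIAS_PCT = 1.0
TSH_CV_PCT = 4.9
tsh_tea_goals_pct = {"medical": 20.0, "CLIA": 21.0, "biologic": 22.8}

tsh_sigmas = {
    goal: round(sigma_metric(tea, TSH_BIAS_PCT, TSH_CV_PCT), 1)
    for goal, tea in tsh_tea_goals_pct.items()
}
print(tsh_sigmas)  # {'medical': 3.9, 'CLIA': 4.1, 'biologic': 4.4}
```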
However, that is not the case for all assays.
Analyte   Bias   CV     CLIA TEa   Biologic TEa   Sigma (CLIA)   Sigma (Biologic)
Glucose   1%     2.3%   10%        6.9%           3.9            2.6
Na+       0%     1.0%   3.47%      0.9%           3.5            0.8
PSA       N/A    5.0%   None       33.6%          None           6.7
TSH       1%     4.9%   21%        22.8%          4.1            4.4
Challenges:
For some, no method in routine use has the performance to meet the biologically based goal
For others, no medical or CLIA based performance goals are available
There is no simple uniform way to set goals
Page 62
When we look at our four example assays we see some of the challenges we face.
For some assays the biologically based goals may not be achievable with current
methods and technology. For other analytes, there may not be defined goals
using criteria other than the biologic criteria. So we find that there is no simple
uniform way to set Total Allowable Error goals and estimate the sigma metric. It
becomes a decision based on available information and judgment.
However, it is worth the effort because it is so useful in helping us set up the most
efficient and effective QC protocols.
[Diagram: two uses of Total Allowable Error: select optimal QC rules; guide actions when QC results fail rules]
Page 63
There is another way that Total Allowable Error can help us in looking at QC
results and that is to guide our actions when we have a QC rule failure.
Page 64
Our QC rules are statistically based and are designed to detect any change in
method performance. If the apparent change in performance puts assay results
near the limit of the Total Allowable Error, then all results should be held until the
investigation is complete and the issue resolved.
However, if the shift in performance causes a QC rule failure, but the results are
still comfortably within the Total Allowable Error limit, then results can still be
reported while the investigation is being done. This is because, in spite of the
change in method performance, the error in the results still is not large enough to
affect medical decisions.
One point to note: this does NOT mean that we should use Total Allowable
Error limits as the acceptable limits for our QC rules. That would not work very
well at all. We want our QC rules to work for us to detect any change in method
performance. Then we can use Total Allowable Error to put this change in context
relative to medical decision making. Once we have put the method performance
in context, we can make the appropriate decisions about how to proceed and
whether patient results can be released. In this regard Total Allowable Error can
function like a warning rule.
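One way to picture this decision is a small helper that compares the apparent error after a QC rule failure with the TEa. This is only a sketch: the presentation does not define how close is "near the limit", so the `hold_fraction` threshold of 0.8 is a purely hypothetical choice, as are the function name and the example numbers.

```python
def action_after_qc_failure(apparent_error_pct, tea_pct, hold_fraction=0.8):
    """Suggest an action once a QC rule has already failed.

    apparent_error_pct: apparent error of the QC result relative to its
        target, in percent (how this is estimated is up to the lab).
    tea_pct: Total Allowable Error goal, in percent.
    hold_fraction: hypothetical cutoff; errors at or beyond this fraction
        of TEa are treated as "near the limit".
    """
    if tea_pct <= 0:
        raise ValueError("TEa must be positive")
    if abs(apparent_error_pct) >= hold_fraction * tea_pct:
        return "hold results until the investigation is complete"
    return "report results while investigating"


# Hypothetical examples with a TEa of 10%.
print(action_after_qc_failure(3.0, 10.0))  # comfortably within TEa
print(action_after_qc_failure(9.0, 10.0))  # near the TEa limit
```

Note that the QC rules themselves still decide whether there is a failure; the TEa comparison only comes into play afterwards.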
Designing Effective QC
1 Set quality requirement
2 Select QC rules
3 Set QC frequency
Page 65
So we have set our quality requirement and used it to help select the optimal QC
rules. Now we need to establish when to test QC samples in order to finalize our
QC protocol.
Event based:
Calibration
Service
Routine Monitoring:
Regulatory:
US: two concentrations once each day of testing unless you can use EQC; then it's effectively No QC
Germany: twice within 24 hours, no more than 16 hours between events
Page 66
When do we test QC samples? Generally there are two triggers for QC testing.
One is event based. We test QC samples every time we do something that may
have altered the performance of the system: things like calibration, maintenance,
new reagent lots, etc. The second trigger is based on routine monitoring to
detect random error. We know any analytical system can fail. We know these
failures are random in nature and infrequent, so we cannot predict when they
will occur. As a consequence we periodically test QC samples as a spot check for
this random error. But how often is "periodically"?
Even regulatory agencies cannot agree. In the US, CLIA says the MINIMUM is once
every 24 hours of testing. In Germany, the requirement is twice in 24 hours with
no more than 16 hours between events. So how do we decide?
Failure rate is very low
Occurrence cannot be predicted
Our goal is to detect failure before any incorrect patient results are reported. So
our goal is really risk based. We are more concerned about how many patient
results might be incorrect than we are about how often the system might fail.
We know the failure rate is low. We cannot predict when the failure will occur.
We know that testing QC samples can only tell us how the system is performing at
the moment the QC sample is tested. So the obvious conclusion is to test a QC
sample with every patient sample just to be sure! WRONG! Clearly this
conclusion is not workable. It is completely impractical because of the realities of
workflow and the associated costs. So what do we do? We have to balance our
need to reduce patient risk with the practical realities and costs. Let's look at cost
in more detail.
Cost of QC
Direct cost (easy to assess):
QC material
Reagents
Disposables
Labor time
Indirect cost (harder to quantify):
Delay in reporting
Failure cost
Look back
Phone calls
Corrected reports
Incorrect treatment
Liability
Page 68
The direct costs of QC are fairly easy to understand and estimate. They include
the cost of the QC material, the reagents and disposables used and the labor cost.
But there is another cost to QC: the indirect costs. These include the costs resulting from
delayed reporting of patient results because we are running QC samples on the
instrument and investigating all the false positive QC rule failures before we
report results. There's also the failure cost. These are the costs associated with the
occurrence of a QC rule failure that is then determined to be a true failure: the
costs of any look backs at patient results, the direct costs of any repeat testing of
patient samples, the cost of phone calls and corrected reports, the costs of
incorrect treatment decisions because of incorrect lab results and the potential
liability costs of the incorrect results. Fortunately these last two are not often a
big concern because few treatment decisions are made solely on the basis of a
single lab result. However, it can happen.
To understand the true cost of whatever QC protocol we use, we have to
estimate the indirect costs and factor that into the total cost. Direct costs are easy
to assess and generally, the less QC we do, the lower the direct cost. Indirect
costs are tougher to estimate and generally, the less often we test QC samples, the
greater the potential indirect costs.
Cost of QC
Indirect cost (harder to quantify):
Delay in reporting
Failure cost
Look back
Phone calls
Corrected reports
Incorrect treatment
Liability
Estimating Failure Cost:
3. Est. Costs:
Page 69
Let's look at estimating failure cost. First we need to estimate how often a real
failure of the system is likely to occur. This will be fairly infrequent. Remember,
the common reasons for changes to system performance are all event based and
we are addressing them with our event based QC. Our concern here is the
random failure. Then we need to estimate how many patient results are at risk if
a failure occurs. Generally the average number of patient results at risk is half the
number of results that would likely be reported between any two routine QC
events. Now we look at the costs of following up on those at-risk patient results.
Based on the lab's protocol, what is the look back process? How many patient
samples are retested, if any? What is the likelihood of phone calls and corrected
reports, and what would they cost? Then we have to factor in some cost for the
possibility of incorrect treatment or liability. While an event like this may have
huge costs, it will be a rare occurrence, so the cost we factor in can be modest.
Now our true cost is the sum of the direct costs plus the indirect costs, and we
can play "what if" by varying the frequency of routine QC testing to
see what happens to the overall true cost. Lower QC frequency lowers direct costs
and increases indirect costs. So with a little experimenting using our own testing volumes
and protocols we can get an idea how to minimize the true cost.
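The "what if" exercise can be sketched in a few lines. Everything here is a simplifying assumption for illustration: the function name, the cost figures, the testing volume and the failure rate are all hypothetical, and the indirect cost only covers follow-up of at-risk results, not reporting delays.

```python
def daily_qc_costs(qc_events_per_day, results_per_day, direct_cost_per_qc_event,
                   failures_per_year, followup_cost_per_result):
    """Return (direct, indirect, total) estimated cost per day.

    Indirect cost is modelled only as expected failure follow-up cost:
    failures per day x average results at risk x follow-up cost per result.
    """
    direct = qc_events_per_day * direct_cost_per_qc_event
    results_between_qc = results_per_day / qc_events_per_day
    # On average, half the results between two QC events are at risk.
    average_results_at_risk = results_between_qc / 2
    failures_per_day = failures_per_year / 365
    indirect = failures_per_day * average_results_at_risk * followup_cost_per_result
    return direct, indirect, direct + indirect


# Hypothetical lab: 400 results/day, 10.00 per QC event,
# 2 true failures per year, 50.00 follow-up cost per at-risk result.
for qc_events in (1, 2, 3, 4, 6, 8, 12):
    direct, indirect, total = daily_qc_costs(qc_events, 400, 10.0, 2, 50.0)
    print(f"{qc_events:2d} QC events/day: direct {direct:6.2f}, "
          f"indirect {indirect:6.2f}, total {total:6.2f}")
```

Running the loop with a lab's own volumes and cost estimates shows the trade-off: direct cost rises with QC frequency while the expected failure cost falls.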
[Figure: QC frequency, balancing True Cost against Patient Risk]
Page 70
In the end we try to balance the true cost of our QC protocol with patient risk. If
we increase the frequency of QC testing, we lower patient risk, but our costs go
up.
[Figure: QC frequency: optimized, balancing True Cost against Patient Risk]
Page 72
We also recognize that we can never eliminate patient risk no matter how often
we test QC samples. So the optimal protocol balances risk and cost and tries to
get the most benefit in risk reduction for a true cost that can be sustained.
This optimum will be different for each laboratory. There is no single universally
correct answer. We each have to figure it out. Also note that in this discussion,
the expected frequency of system failure was not a factor used. That is because
once the expected failure rate drops below a threshold, the risk management
aspect of the QC protocol becomes more important than the expected frequency
of failure.
Once per Day
US regulatory minimum
Probably not effective for risk management
Every XX Samples
Every X hours
X hours set based on risk
So we have decided, based on true cost and risk management, how often we may
want to test QC samples. Now how do we implement that? First, we test QC
samples after every event that may alter system performance. Then for the
periodic testing, what are the options? The CLIA minimum of once per day is
probably not adequate to effectively manage indirect costs and patient risk for
most laboratories. Remember, just because we are doing something that is the
legal minimum, that doesn't mean we are doing it the best way possible.
Another way we can schedule QC samples is every XX patient samples. This
makes it very easy to estimate how many patient samples may be at risk if we
have a true failure, but it can be an awkward way to schedule QC. Since testing
volume varies widely between analytes, this approach can have us testing QC
samples for small groups of different methods quite often. This has a negative
impact on workflow and drives up direct costs. This approach is also difficult to
use unless QC testing can be auto-scheduled by the instrument, middleware or
LIS. Folks working on the instrument cannot possibly keep track of how many
samples have been tested for a given method. This approach is the foundation of
QC bracketing, which is used in some labs and is mandated for some testing.
Finally there is the way most of us schedule routine QC: every X hours. Using
the approach we have discussed, we would use our estimates balancing total cost
and risk to decide how long a time we should have between each QC event. This
is probably the most practical approach because we can keep track of the time
interval manually and, increasingly, instruments, middleware, etc. can
auto-schedule QC based on time. If we use the approach we have discussed to
determine the optimum time interval, this can be an effective way to do QC.
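To turn a risk target into an "every X hours" interval, one rough approach is to work backwards from the rule of thumb that the average number of results at risk is half the number reported between QC events. The sketch below assumes a steady testing rate; the function name and the numbers are hypothetical.

```python
def qc_interval_hours(results_per_hour, max_average_results_at_risk):
    """Longest QC interval (hours) that keeps the average number of
    at-risk results at or below the chosen limit, assuming a steady
    testing rate and that on average half the results between two QC
    events are at risk when a failure occurs."""
    if results_per_hour <= 0:
        raise ValueError("results_per_hour must be positive")
    results_between_qc = 2 * max_average_results_at_risk
    return results_between_qc / results_per_hour


# Hypothetical method: 50 results per hour, and the lab is willing to
# accept an average of 100 results at risk.
print(qc_interval_hours(50, 100))  # 4.0
```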
Steps to Optimized QC
1. Decide on the quality goal
What's the Total Allowable Error?
Page 74