
Statistical Data Treatment and

Evaluation
Lecture 3

Statistical tests

Consider the most common applications of statistical tests to the
treatment of analytical results:
Defining a numerical interval around the mean of a set of replicate
analytical results within which the population mean can be expected to
lie with a certain probability. This interval is called the confidence
interval (CI).
Determining the number of replicate measurements required to ensure
that an experimental mean falls within a certain range with a given level
of probability.
Estimating the probability that (a) an experimental mean and a true
value or (b) two experimental means are different.
Determining at a given probability level whether the precision of two
sets of measurements differs.
Comparing the means of more than two samples to determine whether
differences in the means are real or the result of random error. This
process is known as analysis of variance.
Deciding with a certain probability whether an apparent outlier in a set of
replicate measurements is the result of a gross error and can thus be
rejected or whether it is a legitimate part of the population that must be
retained in calculating the mean of the set.

Confidence Intervals
With statistics we can establish an
interval surrounding an experimentally
determined mean x̄ within which the population
mean μ is expected to lie with a certain degree
of probability.
This interval is known as the confidence interval and
the boundaries are called confidence limits.
Ex: it is 99% probable that the true population mean for a
set of potassium measurements lies in the interval
7.25% ± 0.15% K. Thus, the mean should lie in the
interval from 7.10% to 7.40% K with 99% probability.

Finding the Confidence Interval


- When σ Is Known or s Is a Good Estimate of σ
The shaded areas are the percentage of the total
area under the curve that is included within
these values of z.
The quantity z is the deviation from the mean divided
by the population standard deviation σ.
The true mean is likely to lie in a certain interval with a certain probability,
provided we have a reasonable estimate of σ.
Ex: if we have a result x from a data set with a
standard deviation of σ, we may assume that 90
times out of 100, the true mean will fall in the
interval x ± 1.64σ.
The probability is called the confidence level (CL).

The probability that a result is outside the confidence
interval is often called the significance level.
If we make a single measurement x from a
distribution of known σ, we can say that the true
mean should lie in the interval x ± zσ with a
probability dependent on z.
This probability is 90% for z = 1.64, 95% for z = 1.96,
and 99% for z = 2.58.

The confidence interval for the true mean μ based on
measuring a single value x:
CI for μ = x ± zσ
We use the experimental mean x̄ of N
measurements as a better estimate of μ:
CI for μ = x̄ ± zσ/√N
Apply only in the absence of bias and only if we can assume that s is
a good approximation of σ.

Areas under a Gaussian curve for various values of z: (a) z = 0.67, (b) z = 1.29, (c) z = 1.64, (d) z = 1.96, (e) z = 2.58.

EXAMPLE 7-1
Determine the 80% and 95% confidence intervals for (a)
the first entry (1108 mg/L glucose) in Example 6-2 (page
124) and (b) the mean value (1100.3 mg/L) for month 1 in
the example. Assume that in each part, s = 19 is a good
estimate of σ.
Ans:
(a) From Table 7-1, z = 1.28 and 1.96 for the 80% and
95% confidence levels:
80% CI = 1108 ± 1.28 × 19 = 1108 ± 24.3 mg/L
95% CI = 1108 ± 1.96 × 19 = 1108 ± 37.2 mg/L

(b) For the 7 measurements:
80% CI = 1100.3 ± 1.28 × 19/√7 = 1100.3 ± 9.2 mg/L
95% CI = 1100.3 ± 1.96 × 19/√7 = 1100.3 ± 14.1 mg/L
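The arithmetic of Example 7-1 can be checked with a short Python sketch (the helper function name is ours; the numbers are those of the example):

```python
import math

def confidence_interval(mean, sigma, n, z):
    """z-based confidence interval: mean +/- z * sigma / sqrt(N)."""
    half = z * sigma / math.sqrt(n)
    return mean - half, mean + half

# (a) a single measurement (N = 1), with s = 19 taken as a good estimate of sigma
lo, hi = confidence_interval(1108, 19, 1, 1.96)
print(f"95% CI: {lo:.1f} to {hi:.1f} mg/L")    # half-width ~37.2 mg/L

# (b) the mean of 7 measurements
lo7, hi7 = confidence_interval(1100.3, 19, 7, 1.96)
print(f"95% CI: {lo7:.1f} to {hi7:.1f} mg/L")  # half-width ~14.1 mg/L
```

Note how the interval shrinks by a factor of √7 when the mean of seven replicates is used instead of a single value.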

EXAMPLE 7-2
How many replicate measurements in month 1 in
Example 6-2 are needed to decrease the 95%
confidence interval to 1100.3 ± 10.0 mg/L of glucose?

Here we require zσ/√N = 1.96 × 19/√N ≤ 10.0, so
N ≥ (1.96 × 19/10.0)² = 13.9.
Thus, 14 measurements are needed to provide a slightly better than 95% chance
that the population mean will lie within ±10.0 mg/L of the experimental mean.
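The required N follows from setting zσ/√N equal to the desired half-width and rounding up; a minimal sketch (the function name is ours):

```python
import math

def replicates_needed(sigma, z, half_width):
    """Smallest N satisfying z * sigma / sqrt(N) <= half_width,
    i.e. N >= (z * sigma / half_width)**2."""
    return math.ceil((z * sigma / half_width) ** 2)

# Example 7-2: s = 19 mg/L as sigma, 95% level, desired half-width 10.0 mg/L
print(replicates_needed(19, 1.96, 10.0))  # 14
```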

Finding the Confidence Interval

- When σ Is Unknown
Often, limitations in time or in the amount of available
sample prevent us from assuming that s is a good estimate of σ.
I.e., a single set of replicate measurements must provide not
only a mean but also an estimate of precision.

To account for the variability of s, we use the important
statistical parameter t, which is defined in exactly the
same way as z except that s is substituted for σ.
For a single measurement with result x, we can define t as
t = (x − μ)/s
For the mean of N measurements,
t = (x̄ − μ)/(s/√N)

t Statistic
Statistical treatment of small sets of data
Often called Student's t, t depends on
the desired confidence level, and
the number of degrees of freedom in the calculation
of s.

The confidence interval for the mean of N
replicate measurements:
CI for μ = x̄ ± ts/√N

EXAMPLE 7-3
A chemist obtained the following data for the
alcohol content of a sample of blood: % C2H5OH:
0.084, 0.089, and 0.079. Calculate the 95%
confidence interval for the mean assuming (a)
the three results obtained are the only indication
of the precision of the method and (b) from
previous experience on hundreds of samples,
we know that the standard deviation of the
method s = 0.005% C2H5OH and is a good
estimate of σ.
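A sketch of both parts of Example 7-3 using the stdlib statistics module; tcrit = 4.30 for 2 degrees of freedom at the 95% level is taken from Table 7-3:

```python
import math
import statistics

data = [0.084, 0.089, 0.079]     # % C2H5OH
xbar = statistics.mean(data)     # 0.084
s = statistics.stdev(data)       # 0.005

# (a) s comes from the data alone: use t with N - 1 = 2 degrees of freedom
half_t = 4.30 * s / math.sqrt(len(data))
# (b) sigma = 0.005 is known to be reliable: use z = 1.96
half_z = 1.96 * 0.005 / math.sqrt(len(data))

print(f"(a) {xbar:.3f} ± {half_t:.3f} % C2H5OH")  # ± 0.012
print(f"(b) {xbar:.3f} ± {half_z:.3f} % C2H5OH")  # ± 0.006
```

The t-based interval is roughly twice as wide: with only three results, the uncertainty in s itself must be paid for.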

Statistical Aids to Hypothesis Testing


Experimental results seldom agree exactly with
those predicted from a theoretical model.
Scientists and engineers frequently must judge
whether a numerical difference is a result of the
random errors inevitable in all measurements or a
result of systematic errors. Certain statistical tests are
useful in sharpening these judgments.
Tests of this kind make use of a null hypothesis,
which assumes that the numerical quantities being
compared are the same.

We then use a probability distribution to


calculate the probability that the observed
differences are a result of random error.
If the observed difference is greater than or equal to
the difference that would occur 5 times in 100 by
random chance (a significance level of 0.05), the null
hypothesis is considered questionable, and the
difference is judged to be significant
Other significance levels, such as 0.01 (1%) or 0.001
(0.1%), may also be adopted.
The significance level is often given the symbol α.
The confidence level (CL) is then:
CL = (1 − α) × 100%

Hypothesis Tests Examples


Comparing:
the mean of an experimental data set
with what is believed to be the true
value;
the mean to a predicted or cutoff
(threshold) value;
the means or the standard deviations
from two or more sets of data.

Comparing an Experimental Mean with a Known Value
We use a statistical hypothesis test to draw
conclusions about the population mean μ and
its nearness to the known value μ0.
A known value (μ0) can be:
The true or accepted value based on prior knowledge
or experience
Predicted from theory
A threshold value for making decisions about the
presence or absence of a constituent.

Comparing an Experimental Mean with a Known Value (cont.)
Two contradictory outcomes:
The null hypothesis H0 states that μ = μ0.
The alternative hypothesis Ha can be stated in
several ways:
We might reject the null hypothesis in favor of Ha if μ is
different from μ0 (μ ≠ μ0).
Other alternative hypotheses are μ > μ0 or μ < μ0.

Ex: determining whether the concentration of lead in
an industrial wastewater discharge exceeds the
maximum permissible amount of 0.05 ppm:
H0: μ = 0.05 ppm
Ha: μ > 0.05 ppm

A statistical test procedure:
Formulated from the data for the decision to accept or
reject H0
Involves the formation of an appropriate test statistic and the
identification of a rejection region.

Rejection region:
The rejection region consists of all the values of the
test statistic for which H0 will be rejected.
The null hypothesis is rejected if the test statistic lies
within the rejection region.

Test statistics:
Large sample: z statistic test
Small sample: t statistic test

Large Sample z Test

Applied when
a large number of results are available;
s is a good estimate of σ.

The procedure:
State the null hypothesis: H0: μ = μ0
Form the test statistic:
z = (x̄ − μ0)/(σ/√N)
State the alternative hypothesis, Ha, and determine
the rejection region:
For Ha: μ ≠ μ0, reject H0 if z ≥ zcrit or if z ≤ −zcrit
For Ha: μ > μ0, reject H0 if z ≥ zcrit
For Ha: μ < μ0, reject H0 if z ≤ −zcrit

A two-tailed test, 95% confidence level:
For Ha: μ ≠ μ0, we can reject for either a positive
value of z or for a negative value of z that exceeds
the critical value.
The probability that z exceeds zcrit is 0.025 in each tail
or 0.05 total.
The critical value of z is 1.96 for this two-tailed test.

Fig 7.2 (a)

A one-tailed test, 95% confidence level:
For Ha: μ > μ0, we can reject only when z ≥ zcrit.
Now for the 95% confidence level, we want the
probability that z exceeds zcrit to be 5% or the total
probability in both tails to be 10%.
The overall significance level would be α = 0.10.
The critical value of z is 1.64 for this one-tailed test.

Fig 7.2 (b)

EXAMPLE 7-4
A class of 30 students determined the
activation energy of a chemical reaction to
be 27.7 kcal/mol (mean value) with a
standard deviation of 5.2 kcal/mol. Are the
data in agreement with the literature value
of 30.8 kcal/mol at (1) the 95% confidence
level and (2) the 99% confidence level?
Estimate the probability of obtaining a
mean equal to the literature value.

We have enough values here so that s should be a good
estimate of σ. Our null hypothesis is that μ = 30.8
kcal/mol, and the alternative hypothesis is μ ≠ 30.8 kcal/mol.
Here:
z = (27.7 − 30.8)/(5.2/√30) = −3.26
This is a two-tailed test:
zcrit = 1.96 for the 95% confidence level.
zcrit = 2.58 for the 99% confidence level.
Since z ≤ −1.96, we reject the null hypothesis at the 95%
confidence level. Note also that since z ≤ −2.58, we reject H0 at
the 99% confidence level.

The probability of obtaining a z value of −3.26 because of
random error is only about 0.2%! (next slide, Table 7-1)
Conclusion: the student mean is actually different from
the literature value and not just the result of random error.
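The z test of Example 7-4 in a few lines of Python (the function name is ours; the unrounded z is −3.265, which the text quotes as −3.26):

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    """Large-sample test statistic z = (xbar - mu0) / (sigma / sqrt(N))."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

z = z_statistic(27.7, 30.8, 5.2, 30)
print(f"z = {z:.3f}")
# two-tailed test: reject H0 when |z| >= zcrit
for cl, zcrit in [(95, 1.96), (99, 2.58)]:
    print(f"reject H0 at {cl}%: {abs(z) >= zcrit}")  # True at both levels
```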

Small Sample t Test

Similar to the z test except that the test statistic
is the t statistic.
The procedure is as follows:
State the null hypothesis: H0: μ = μ0
Form the test statistic:
t = (x̄ − μ0)/(s/√N)
State the alternative hypothesis, Ha, and determine
the rejection region:
For Ha: μ ≠ μ0, reject H0 if t ≥ tcrit or if t ≤ −tcrit (two-tailed test)
For Ha: μ > μ0, reject H0 if t ≥ tcrit
For Ha: μ < μ0, reject H0 if t ≤ −tcrit

Figure 7-3
Illustration of systematic error
in an analytical method. Curve
A is the frequency distribution
for the accepted value by a
method without bias. Curve B
illustrates the frequency
distribution of results by a
method that could have a
significant bias.

In testing for bias, we do not know initially whether the


difference between the experimental mean and the accepted
value is due to random error or to an actual systematic error.
The t test is used to determine the significance of the
difference.

EXAMPLE 7-5
A new procedure for the rapid determination of the
percentage of sulfur in kerosenes was tested on a
sample known from its method of preparation to contain
0.123% S (μ0 = 0.123% S). The results were % S = 0.112,
0.118, 0.115, and 0.119. Do the data indicate that there
is a bias in the method at the 95% confidence level?

The null hypothesis is H0: μ = 0.123% S, and the alternative
hypothesis is Ha: μ ≠ 0.123% S.

The t test statistic gives:
x̄ = 0.1160% S, s = 0.0032% S
t = (0.1160 − 0.123)/(0.0032/√4) = −4.375

Excel functions:
TDIST(x, deg_freedom, tails), where x is the test value of t:
TDIST(4.375, 3, 2) = 0.022
It is only 2.2% probable to get a t value this large
because of random error.
TINV(probability, deg_freedom):
TINV(0.05, 3) = 3.1825
The critical value of t for the 95%
confidence level.

Since |t| ≥ 3.18, we conclude that there is a significant difference at the
95% confidence level and thus bias in the method.
We would accept the null hypothesis at the 99% confidence level and
conclude that there is no difference between the experimental and the
accepted values.
Choice of the confidence level depends on our willingness to accept an
error in the outcome.
The significance level (0.05 or 0.01) is the probability of making an error
by rejecting the null hypothesis.
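Example 7-5 can be reproduced without Excel; tcrit = 3.18 for 3 degrees of freedom is from Table 7-3 (the text's −4.375 comes from rounding s to 0.0032 before dividing):

```python
import math
import statistics

data = [0.112, 0.118, 0.115, 0.119]   # % S
mu0 = 0.123
xbar = statistics.mean(data)          # 0.1160
s = statistics.stdev(data)            # 0.00316
t = (xbar - mu0) / (s / math.sqrt(len(data)))
print(f"t = {t:.3f}")                 # about -4.4 (text rounds to -4.375)
# tcrit = 3.18 for N - 1 = 3 degrees of freedom at the 95% level (Table 7-3)
print("bias at 95%:", abs(t) >= 3.18)  # True
```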

Comparison of Two Experimental Means
Frequently, chemists must judge whether a difference
in the means of two sets of data is real or the result of
random error.
Assume that N1 replicate analyses by analyst 1 yielded a
mean value of x̄1 with standard deviation s1, and
that N2 analyses by analyst 2 obtained by the same
method gave x̄2 and s2.
The t Test for Differences in Means
The null hypothesis states that the two means are identical and
that any difference is the result of random errors: H0: μ1 = μ2.
Alternative hypothesis: Ha: μ1 ≠ μ2,
and the test is a two-tailed test.

The standard deviation of the mean of analyst 1:
sm1 = s1/√N1
The variance of the mean of analyst 1:
s²m1 = s1²/N1
Likewise, the variance of the mean of analyst 2:
s²m2 = s2²/N2
In the t test, we are interested in the difference between
the means:
x̄1 − x̄2
The variance of the difference between the means:
s²d = s²m1 + s²m2
The standard deviation of the difference between the
means is:
sd = √(s1²/N1 + s2²/N2)
Assuming that the pooled standard deviation spooled is a
better estimate of σ than s1 or s2:
sd = spooled √(1/N1 + 1/N2)
The test statistic t is then:
t = (x̄1 − x̄2) / (spooled √(1/N1 + 1/N2))

The test statistic is then compared with the critical value of
t obtained from the table for the particular confidence
level desired.
The number of degrees of freedom for finding the critical value of
t in Table 7-3 is N1 + N2 − 2.

EXAMPLE 7-6
Two barrels of wine were analyzed for their
alcohol content to determine whether they were
from different sources. On the basis of six
analyses, the average content of the first barrel
was established to be 12.61% ethanol. Four
analyses of the second barrel gave a mean of
12.53% alcohol. The 10 analyses yielded a
pooled standard deviation spooled of 0.070%. Do
the data indicate a difference between the wines?

The null hypothesis is H0: μ1 = μ2, and the
alternative hypothesis is Ha: μ1 ≠ μ2.
The test statistic t:
t = (12.61 − 12.53) / (0.070 √(1/6 + 1/4)) = 1.771
The critical value of t at the 95% confidence level for
10 − 2 = 8 degrees of freedom is 2.31. Since 1.771 <
2.31, we accept the null hypothesis at the 95%
confidence level and conclude that there is no
difference in the alcohol content of the wines.
Excel: TDIST(1.771, 8, 2) = 0.11. There is an 11%
chance that a difference this large could arise from random error.
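A sketch of the pooled two-sample t statistic of Example 7-6 (the function name is ours):

```python
import math

def pooled_t(x1, x2, s_pooled, n1, n2):
    """t = (x1 - x2) / (s_pooled * sqrt(1/N1 + 1/N2))."""
    return (x1 - x2) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))

t = pooled_t(12.61, 12.53, 0.070, 6, 4)
print(f"t = {t:.3f}")                             # 1.771
# tcrit = 2.31 for N1 + N2 - 2 = 8 degrees of freedom at the 95% level
print("difference significant:", abs(t) >= 2.31)  # False
```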

Paired Data
Scientists and engineers often make use
of pairs of measurements on the same
sample to minimize sources of variability
that are not of interest.
Ex: using two different methods to evaluate
two different samples.
There would be variability from the different samples.
A better way would be to use both methods on the
same samples and to focus on the differences.

Paired Data
The paired t test uses the same type of
procedure as the normal t test except that we
analyze pairs of data.
Our null hypothesis is H0: μd = Δ0, where Δ0 is a
specific value of the difference to be tested, often zero.
The alternative hypothesis could be μd ≠ Δ0, μd > Δ0, or μd < Δ0.

The test statistic value is
t = (d̄ − Δ0) / (sd/√N)
where d̄ is the average difference, equal to Σdi / N.

EXAMPLE 7-7
A new automated procedure for determining
glucose in serum (Method A) is to be compared
with the established method (Method B). Both
methods are performed on serum from the same
six patients to eliminate patient-to-patient
variability. Do the following results confirm a
difference in the two methods at the 95%
confidence level?

Hypotheses:
If μd is the true average difference between the methods, the null
hypothesis is H0: μd = 0 and the alternative hypothesis is Ha: μd ≠ 0.

The test statistic is
t = d̄ / (sd/√N)
where
d̄ = Σdi / N = (16 + 9 + 25 + 5 + 22 + 11)/6 = 14.67
and the standard deviation of the difference sd = 7.76, so that
t = 14.67 / (7.76/√6) = 4.628
From Table 7-3, the critical value of t is 2.57 for the 95% confidence
level and 5 degrees of freedom.
Since t ≥ tcrit, we reject the null hypothesis and conclude that the two
methods give different results.

NOTE:
If we merely average the results of Method A
(836.0 mg/L) and the results of Method B
(821.3 mg/L), the large patient-to-patient
variation in glucose level gives us large
values for sA (146.5) and sB (142.7).
A comparison of means gives us a t value of
0.176, and we would accept the null
hypothesis!
Hence, the large patient-to-patient variability
masks the method differences that are of
interest. Pairing allows us to focus on the
differences.
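The paired t test of Example 7-7, computed from the six differences quoted above (tcrit = 2.57 from Table 7-3):

```python
import math
import statistics

# differences d_i between Method A and Method B for the six patients
d = [16, 9, 25, 5, 22, 11]
dbar = statistics.mean(d)             # 14.67
sd = statistics.stdev(d)              # about 7.76
t = dbar / (sd / math.sqrt(len(d)))   # about 4.63
print(f"dbar = {dbar:.2f}, sd = {sd:.2f}, t = {t:.2f}")
# tcrit = 2.57 for N - 1 = 5 degrees of freedom at the 95% level
print("methods differ:", abs(t) >= 2.57)  # True
```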

Errors in Hypothesis Testing


The choice of a rejection region for the null
hypothesis is made so that we can readily
understand the errors involved.
Type I error:
The error that results from rejecting H0 when it is true.
An unusual result occurred that put our test statistic
z or t into the rejection region.
Ex: at the 95% confidence level, there is a 5% chance that
we will reject the null hypothesis even though it is true.

The significance level α gives the frequency of
rejecting H0 when it is true.

Type II error:
We accept H0 when it is false.
The probability of a type II error is given the symbol β.

Making α smaller (0.01 instead of 0.05) would
appear to minimize the type I error rate.
Decreasing the type I error rate, however,
increases the type II error rate because they are
inversely related.
If a type I error is much more likely to have serious
consequences than a type II error, it is reasonable to
choose a small value of α.
On the other hand, in some situations a type II error
would be quite serious, and so a larger value of α is
employed to keep the type II error rate under control.

As a general rule of thumb, the largest α that is
tolerable for the situation should be used.
This ensures the smallest type II error while
keeping the type I error within acceptable limits.

Comparison of Precision
When comparing the variances (or
standard deviations) of two populations:
The F test can be used to test this assumption
under the provision that the populations follow
the normal (Gaussian) distribution.
The F test is also used in comparing more
than two means and in linear regression
analysis.

F test
Defined as the ratio of the two sample
variances:
F = s1²/s2²
Calculated and compared with the critical
value of F at the desired significance level.
The null hypothesis is that the two population
variances under consideration are equal,
H0: σ1² = σ2².
The null hypothesis is rejected if the test
statistic differs too much from 1.
Critical values of F at the 0.05 significance
level are shown in Table 7-4.

Two degrees of freedom:


One associated with the numerator and the other with
the denominator.

Can be used in either a one-tailed mode or a two-tailed mode.

EXAMPLE 7-8
A standard method for the determination of the
carbon monoxide (CO) level in gaseous
mixtures is known from many hundreds of
measurements to have a standard deviation of
0.21 ppm CO. A modification of the method
yields a value for s of 0.15 ppm CO for a pooled
data set with 12 degrees of freedom. A second
modification, also based on 12 degrees of
freedom, has a standard deviation of 0.12 ppm
CO. Is either modification significantly more
precise than the original?

Here we test the null hypothesis H0: σstd² = σ1²,
where σstd² is the variance of the standard method
and σ1² is the variance of the modified method.
The alternative hypothesis is one-tailed, Ha: σstd² > σ1².
The variances of the modifications are placed in
the denominator:
For the first modification: F1 = sstd²/s1² = (0.21)²/(0.15)² = 1.96
For the second modification: F2 = sstd²/s2² = (0.21)²/(0.12)² = 3.06
sstd is a good estimate of σ, and the number of
degrees of freedom for the numerator can be taken
as infinite; hence the critical value of F at the 95%
confidence level is Fcrit = 2.30.

F1 < Fcrit :
We cannot reject the null hypothesis.
There is no improvement in precision

F2 > Fcrit :
We reject the null hypothesis.
The second method does appear to give better precision at
the 95% confidence level.

When comparing the two modified methods:
F = s1²/s2² = (0.15)²/(0.12)² = 1.56
With Fcrit = 2.69. Since F < 2.69, we must accept H0
and conclude that the two methods give equivalent
precision.
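All three F comparisons of Example 7-8 in one sketch (the function name is ours):

```python
def f_ratio(s_num, s_den):
    """F as the ratio of two sample variances."""
    return s_num ** 2 / s_den ** 2

# standard method (s = 0.21) vs. the two modifications (s = 0.15, 0.12)
F1 = f_ratio(0.21, 0.15)   # 1.96
F2 = f_ratio(0.21, 0.12)   # 3.06
print(f"F1 = {F1:.2f}, F2 = {F2:.2f}")
# Fcrit = 2.30 (numerator df taken as infinite, denominator df = 12)
print("mod 1 more precise:", F1 > 2.30)   # False
print("mod 2 more precise:", F2 > 2.30)   # True
# comparing the two modifications against each other, Fcrit = 2.69
F12 = f_ratio(0.15, 0.12)  # 1.56
print("mods differ:", F12 > 2.69)         # False
```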

Analysis of Variance (ANOVA)


Used to test whether a difference exists in
the means of more than two populations.
Using a single test to determine whether
there is or is not a difference among the
population means rather than pair-wise
comparisons, as are done with the t test.
Experimental design methods take
advantage of ANOVA in planning and
performing experiments.

ANOVA Concepts
Detect differences in several population
means by comparing the variances:
For comparing I population means, μ1,
μ2, μ3, …, μI,
the null hypothesis H0 is
H0: μ1 = μ2 = μ3 = … = μI
and the alternative hypothesis Ha is
Ha: at least two of the μi's are different

ANOVA Concepts
Typical applications of ANOVA:
Is there a difference in the results of five analysts
determining calcium by a volumetric method?
Will four different solvent compositions have
differing influences on the yield of a chemical
synthesis?
Are the results of manganese determinations by
three different analytical methods different?
Are there any differences in the fluorescence of a
complex ion at six different values of pH?

For the 4 examples given, the factors,


levels and the responses are:

The factors are considered the independent


variables, whereas the response is the
dependent variable.

ANOVA data for the five analysts determining Ca in
triplicate:
A single-factor, or one-way, ANOVA type.

The basic principle of ANOVA is to compare the


between-groups variation with the within-groups
variation.
In ANOVA, the factor levels are often called groups.
When H0 is true, the variation between the group
means is close to the variation within groups. When
H0 is false, the variation between group means is
large compared with the variation within groups.
In the previous figure, the groups are the different
analysts, and ANOVA compares the between-analyst
variation to the within-analyst variation.

The basic statistical test used for ANOVA is the


F test.
A large value of F compared with Fcrit from the tables
may give us reason to reject H0 in favor of the
alternative hypothesis.

Single-Factor ANOVA
For I populations:
the sample means are x̄1, x̄2, …, x̄I;
the sample variances are s1², s2², …, sI².
The grand average (i.e., the average of all the data) is the
weighted average:
x̿ = (N1x̄1 + N2x̄2 + … + NIx̄I)/N
where N1 is the number of measurements in group 1, N2 is
the number in group 2, and so on, and N = N1 + N2 + … + NI
is the total number of measurements.

Sums of squares are used to obtain the between-groups
variation and the within-groups variation:
The sum of the squares due to the factor (SSF):
SSF = N1(x̄1 − x̿)² + N2(x̄2 − x̿)² + … + NI(x̄I − x̿)²
The sum of the squares due to error (SSE):
SSE = Σj Σi (xij − x̄j)²
or, related to the individual group variances:
SSE = (N1 − 1)s1² + (N2 − 1)s2² + … + (NI − 1)sI²

The total sum of squares (SST) is:
SST = SSF + SSE
or obtained from (N − 1)s², where s² is the sample
variance of all the data points.

F test for one-way ANOVA


ANOVA assumptions:
The variances of the I populations are assumed to be
identical.
The largest s should not be much more than twice the
smallest s for the equal-variance assumption to hold.

Each of the I populations is assumed to follow a


Gaussian distribution.

The number of degrees of freedom for
each of the sums of squares:
SST has N − 1 degrees of freedom.
SSF has I − 1 degrees of freedom.
SSE has N − I degrees of freedom.

F test for one-way ANOVA

Mean square values:
Quantities for the estimation of the between-groups and
within-groups variations, including:
Mean square error (MSE):
MSE = SSE/(N − I)
an estimate of the variance due to error
Mean square due to factor levels (MSF):
MSF = SSF/(I − 1)
an estimate of the error variance plus the between-groups
variance

F test for one-way ANOVA

The test statistic is the F value, calculated as
F = MSF/MSE
If the factor has little effect, the between-groups
variance should be small compared with the error
variance. Thus, the two mean squares should be
nearly identical under these circumstances.
If the factor effect is significant, MSF is greater than
MSE.
We reject H0 if F exceeds the critical value.
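A minimal one-way ANOVA sketch computing SSF, SSE, the mean squares, and F from the formulas above; the triplicate data are invented for illustration and are not from the text:

```python
import statistics

def one_way_anova(groups):
    """Return (SSF, SSE, MSF, MSE, F) for a list of groups of measurements."""
    all_data = [x for g in groups for x in g]
    n_total = len(all_data)
    grand = statistics.mean(all_data)                 # grand average
    means = [statistics.mean(g) for g in groups]
    # between-groups sum of squares
    ssf = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # within-groups (error) sum of squares
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    n_groups = len(groups)
    msf = ssf / (n_groups - 1)          # between-groups mean square
    mse = sse / (n_total - n_groups)    # error mean square
    return ssf, sse, msf, mse, msf / mse

# hypothetical triplicate results for three analysts (illustrative only)
groups = [[10.3, 10.4, 10.2], [10.6, 10.7, 10.8], [10.1, 10.0, 10.2]]
ssf, sse, msf, mse, F = one_way_anova(groups)
print(f"SSF = {ssf:.2f}, SSE = {sse:.2f}, F = {F:.1f}")
# compare F with Fcrit for (I - 1, N - I) = (2, 6) degrees of freedom
```

Note that SSF + SSE equals (N − 1)s², the identity stated above, which is a handy check on the arithmetic.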

ANOVA Table

EXAMPLE 7-9

Determining Which Results Differ


There are several methods to determine
which means are significantly different.
One of the simplest is the least
significant difference (LSD) method:
A difference is calculated that is judged to be
the smallest difference that is significant.
The difference between each pair of means is
then compared with the least significant
difference to determine which means are
different.

Least Significant Difference (LSD)

For an equal number of replicates Ng in
each group:
LSD = t √(2 × MSE / Ng)
where MSE is the mean square for error, and
the value of t should have N − I degrees of
freedom.
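A sketch of the LSD computation; the MSE, Ng, and t values here are hypothetical placeholders, not from the text:

```python
import math

def least_significant_difference(t_crit, mse, n_g):
    """LSD = t * sqrt(2 * MSE / Ng) for equal group sizes Ng."""
    return t_crit * math.sqrt(2 * mse / n_g)

# hypothetical: MSE = 0.0086 from an ANOVA table, Ng = 3 replicates,
# t = 2.179 for N - I = 12 degrees of freedom at the 95% level
lsd = least_significant_difference(2.179, 0.0086, 3)
print(f"LSD = {lsd:.3f}")
# any pair of group means differing by more than LSD is judged different
```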

EXAMPLE 7-10

Detection of Gross Errors


There are times when a set of data
contains an outlying result that appears to
be outside the range that random
errors in the procedure would give.
Develop a criterion to decide whether to
retain or reject the outlying data point
the outlier could be the result of an
undetected gross error.
Although there is no universal rule to settle
the question of retention or rejection, the Q
test is generally acknowledged to be an
appropriate method for making the decision.

The Q Test
A simple, widely used statistical test for deciding whether
a suspected result should be retained or rejected.
The absolute value of the difference between the
questionable result xq and its nearest neighbor xn is
divided by the spread w of the entire set:
Q = |xq − xn| / w

Q is compared with critical values Qcrit in Table 7-5.
If Q is greater than Qcrit, the questionable result can be
rejected with the indicated degree of confidence.

EXAMPLE 7-11
The analysis of a calcite sample yielded CaO
percentages of 55.95, 56.00, 56.04, 56.08, and 56.23.
The last value appears anomalous; should it be retained
or rejected at the 95% confidence level?
The difference between 56.23 and 56.08 is 0.15%. The
spread (56.23 − 55.95) is 0.28%. Thus,
Q = 0.15/0.28 = 0.54
For five measurements, Qcrit at the 95% confidence level
is 0.71. Because 0.54 < 0.71, we must retain the outlier
at the 95% confidence level.
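The Q test as a small function, checked against Example 7-11; testing the more extreme of the two end gaps is our generalization of the single-suspect test described above:

```python
def q_statistic(data):
    """Q = |xq - xn| / w for the most extreme value in the data set."""
    d = sorted(data)
    w = d[-1] - d[0]           # spread of the entire set
    gap_low = d[1] - d[0]      # gap if the low value is the suspect
    gap_high = d[-1] - d[-2]   # gap if the high value is the suspect
    return max(gap_low, gap_high) / w

data = [55.95, 56.00, 56.04, 56.08, 56.23]   # % CaO, Example 7-11
Q = q_statistic(data)
print(f"Q = {Q:.2f}")                         # 0.54
# Qcrit = 0.71 for five measurements at the 95% level (Table 7-5)
print("reject outlier:", Q > 0.71)            # False: retain the value
```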

The Q test for outliers

Recommendations for Treating Outliers

Reexamine carefully all data relating to the


outlying result to see if a gross error could have
affected its value.
If possible, estimate the precision that can be
reasonably expected from the procedure to be
sure that the outlying result actually is
questionable.
Repeat the analysis if sufficient sample and time
are available.
If more data cannot be secured, apply the Q test
to the existing set to see if the doubtful result
should be retained or rejected on statistical
grounds.
If the Q test indicates retention, consider reporting
the median of the set rather than the mean.
