Fall 2011
7.1 Basic Properties of Confidence Intervals
7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion
7.3 Intervals Based on a Normal Population Distribution
7.4 Confidence Intervals for the Variance and Standard Deviation of a Normal Population
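Consider a random sample $X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$, and let $x_1, \ldots, x_n$ be the actual observations of the random sample. The sample mean satisfies $\bar{X} \sim N(\mu, \sigma^2/n)$, so

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

and

$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95,$$

which is equivalent to

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

Thus,

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) \qquad (1)$$

is a random interval that includes or covers the true value of $\mu$ with probability 0.95.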
Definition
If, after observing $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into (1) in place of $\bar{X}$, the resulting fixed interval

$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

is called a 95% confidence interval for $\mu$.
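The 95% refers to the long-run behavior of the random interval (1), not to any single fixed interval. A small simulation can make this concrete. The sketch below is only an illustration: the values of `mu`, `sigma`, `n`, the number of trials, and the seed are arbitrary assumptions, not taken from the course material.

```python
import random
from math import sqrt

def coverage(mu=10.0, sigma=2.0, n=25, trials=5000, seed=0):
    """Fraction of simulated intervals xbar +/- 1.96*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)             # fixed seed so the run is repeatable
    half_width = 1.96 * sigma / sqrt(n)   # sigma is treated as known
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

print(coverage())  # should be close to 0.95
```

Each simulated sample yields one fixed interval that either contains `mu` or does not; across many samples, roughly 95% of them do.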
Example
Exercises 1: Consider a normal population with the value of $\sigma$ known.
1 What is the confidence level for the interval $\bar{x} \pm 2.81\sigma/\sqrt{n}$?
2 What is the confidence level for the interval $\bar{x} \pm 1.44\sigma/\sqrt{n}$?
3 What value of $z_{\alpha/2}$ will result in a confidence level of 99.7%?
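Questions of this kind come down to converting between a multiplier $z$ and the central area $2\Phi(z) - 1$ under the standard normal curve. A minimal sketch using Python's standard library follows; the helper names `confidence_level` and `z_critical` are made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution, mean 0 and sd 1

def confidence_level(z):
    """Central area between -z and z under the standard normal curve."""
    return 2 * Z.cdf(z) - 1

def z_critical(level):
    """Value z_{alpha/2} such that the central area equals `level`."""
    alpha = 1 - level
    return Z.inv_cdf(1 - alpha / 2)

print(confidence_level(1.96))  # about 0.95, matching the interval derived above
print(z_critical(0.95))        # about 1.96
```

Calling `confidence_level(2.81)`, `confidence_level(1.44)` and `z_critical(0.997)` is one way to check hand-computed answers to the three parts.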
Proposition
If $n$ is sufficiently large, the standardized variable

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has approximately a standard normal distribution. This implies that

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$. This formula is valid regardless of the shape of the population distribution.
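The large-sample interval can be computed directly from a data set. The sketch below assumes a helper name `large_sample_ci` and uses synthetic, deliberately skewed (exponential) data generated with a fixed seed; neither comes from the course material.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def large_sample_ci(data, level=0.95):
    """Return (lower, upper) for xbar +/- z_{alpha/2} * s / sqrt(n)."""
    n = len(data)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * stdev(data) / sqrt(n)   # stdev uses the n - 1 divisor
    xbar = mean(data)
    return xbar - half_width, xbar + half_width

# Synthetic skewed sample (exponential with mean 1), for illustration only.
rng = random.Random(42)
sample = [rng.expovariate(1.0) for _ in range(200)]
print(large_sample_ci(sample))
```

No normality assumption is used here; the only requirement is that `n` be large enough for the approximation to be reasonable.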
STAT355 () - Probability & Statistics
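Furthermore, if both $np \ge 10$ and $n(1-p) \ge 10$, then $X$, the number of successes in the sample, has approximately a normal distribution, and so does the sample proportion $\hat{p} = X/n$.
The standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$ involves the unknown parameter $p$. Standardizing $\hat{p}$ by subtracting $p$ and dividing by $\sigma_{\hat{p}}$ then implies that

$$P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) \approx 1 - \alpha.$$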
Proposition
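Let

$$\tilde{p} = \frac{\hat{p} + z_{\alpha/2}^2/2n}{1 + z_{\alpha/2}^2/n}.$$

Then a confidence interval for a population proportion $p$ with confidence level approximately $100(1-\alpha)\%$ is

$$\tilde{p} \pm z_{\alpha/2}\,\frac{\sqrt{\hat{p}(1-\hat{p})/n + z_{\alpha/2}^2/4n^2}}{1 + z_{\alpha/2}^2/n}.$$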
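The interval centered at $\tilde{p}$ is often called the score (or Wilson) interval. A minimal sketch of its computation is given below; the function name `score_interval` and the counts in the usage line (15 successes out of 50) are hypothetical and chosen only to show the call.

```python
from math import sqrt
from statistics import NormalDist

def score_interval(successes, n, level=0.95):
    """Score (Wilson) confidence interval for a population proportion p."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    p_tilde = (p_hat + z**2 / (2 * n)) / denom          # interval midpoint
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_tilde - half_width, p_tilde + half_width

print(score_interval(15, 50))  # hypothetical counts, for illustration only
```

The midpoint $\tilde{p}$ is pulled slightly toward 0.5 relative to $\hat{p}$, and the endpoints always stay inside [0, 1].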
Exercise (7.2) 21
In a sample of 1000 randomly selected consumers who had opportunities to send in a rebate claim form after purchasing a product, 250 of these people said they never did so. Calculate an upper confidence bound at the 95% confidence level for the true proportion of such consumers who never apply for a rebate. Based on this bound, is there compelling evidence that the true proportion of such consumers is smaller than 1/3?
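One way to check a hand computation for this exercise is sketched below. It assumes the simpler large-sample form of the bound, $\hat{p} + z_{\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$, with the one-sided critical value $z_{\alpha}$ in place of $z_{\alpha/2}$; whether the exercise intends this form or the score-based one is not stated in the slides, although with $n = 1000$ the two should be close. The function name `upper_bound` is made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def upper_bound(successes, n, level=0.95):
    """One-sided large-sample upper confidence bound for a proportion."""
    p_hat = successes / n
    z_alpha = NormalDist().inv_cdf(level)   # one-sided, so alpha is not halved
    return p_hat + z_alpha * sqrt(p_hat * (1 - p_hat) / n)

bound = upper_bound(250, 1000)   # counts taken from the exercise
print(round(bound, 4), bound < 1 / 3)
```

Comparing the bound with 1/3 answers the second part of the question: if the bound lies below 1/3, the data are consistent only with proportions smaller than 1/3 at this confidence level.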
The CI for $\mu$ presented earlier is valid provided that $n$ is large. The resulting interval can be used whatever the nature of the population distribution. The CLT cannot be invoked, however, when $n$ is small. In this case, one way to proceed is to make a specific assumption about the form of the population distribution and then derive a CI tailored to that assumption.
Assumption
The population of interest is normal, so that $X_1, \ldots, X_n$ constitutes a random sample from a normal distribution with both $\mu$ and $\sigma^2$ unknown.
The key result underlying the interval in the earlier section was that for large $n$, the rv $Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has approximately a standard normal distribution. When $n$ is small, $S$ is no longer likely to be close to $\sigma$, so the variability in the distribution of $Z$ arises from randomness in both the numerator and the denominator. This implies that the probability distribution of $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ will be more spread out than the standard normal distribution.
The result on which inferences are based introduces a new family of probability distributions called t distributions.
Theorem
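When $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the rv

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a probability distribution called a t distribution with $n - 1$ degrees of freedom (df).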
Properties of t Distributions
Although the variable of interest is still $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$, we now denote it by $T$ to emphasize that it does not have a standard normal distribution when $n$ is small.
We know that a normal distribution is governed by two parameters; each different choice of $\mu$ in combination with $\sigma^2$ gives a particular normal distribution. Any particular t distribution results from specifying the value of a single parameter, called the number of degrees of freedom, abbreviated df.
We'll denote this parameter by the Greek letter $\nu$. Possible values of $\nu$ are the positive integers 1, 2, 3, ... So there is a t distribution with 1 df, another with 2 df, yet another with 3 df, and so on. For any fixed value of $\nu$, the density function that specifies the associated t curve is even more complicated than the normal density function. Fortunately, we need concern ourselves only with several of the more important features of these curves.
Let $t_\nu$ denote the t distribution with $\nu$ df.
1 Each $t_\nu$ curve is bell-shaped and centered at 0.
2 Each $t_\nu$ curve is more spread out than the standard normal ($z$) curve.
3 As $\nu$ increases, the spread of the corresponding $t_\nu$ curve decreases.
4 As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve (so the $z$ curve is often called the t curve with df $= \infty$).
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

The number of df for $T$ is $n - 1$ because, although $S$ is based on the $n$ deviations $X_1 - \bar{X}, \ldots, X_n - \bar{X}$, the fact that $\sum (X_i - \bar{X}) = 0$ implies that only $n - 1$ of these are freely determined. The number of df for a t variable is the number of freely determined deviations on which the estimated standard deviation in the denominator of $T$ is based. The use of the t distribution in making inferences requires notation for capturing t-curve tail areas, analogous to $z_\alpha$ for the $z$ curve.
Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value. For example, $t_{.05,6}$ is the t critical value that captures an upper-tail area of 0.05 under the t curve with 6 df. Because t curves are symmetric about zero, $-t_{\alpha,\nu}$ captures lower-tail area $\alpha$. Appendix Table A.5 gives $t_{\alpha,\nu}$ for selected values of $\alpha$ and $\nu$. The columns of the table correspond to different values of $\alpha$. To obtain $t_{0.05,15}$, go to the $\alpha = 0.05$ column, look down to the $\nu = 15$ row, and read $t_{0.05,15} = 1.753$.
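Table values such as $t_{0.05,15} = 1.753$ can be reproduced numerically from the definition of an upper-tail area. The sketch below is one possible approach, not the method used to build Table A.5: it integrates the standard t density with Simpson's rule and solves for the critical value by bisection. The function names are made up for this illustration, and the code assumes $0 < \alpha < 0.5$.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t >= 0: 0.5 minus Simpson's-rule area on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Find t_{alpha,df} by bisection; assumes 0 < alpha < 0.5."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha:   # expand until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 15), 3))  # compare with the table value 1.753
```

Evaluating `t_critical(0.025, df)` for increasing `df` also shows the critical values shrinking toward the normal value 1.96, in line with the properties listed earlier.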
Proposition
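Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right)$$

or, more compactly,

$$\bar{x} \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}.$$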
Use R software.
Example (12) Consider the following sample of fat content (in percentage) of $n = 10$ randomly selected hot dogs ("Sensory and Mechanical Assessment of the Quality of Frankfurters," J. of Texture Studies, 1990: 395-409):
25.2 21.3 22.8 17.0 29.8 21.0 25.5 16.0 20.9 19.5
Assuming that these were selected from a normal population distribution, find a 95% CI for (interval estimate of) the population mean fat content. Use your calculator to obtain $\bar{x}$ and $s$.
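The calculation in this example can be checked with a few lines of code. The sketch below takes the critical value $t_{0.025,9} = 2.262$ as a hard-coded table value (an assumption about Table A.5, since Python's standard library has no t quantile function); the function name `t_interval` is made up for the illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Fat content (%) of the n = 10 hot dogs from the example.
fat_content = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]

def t_interval(data, t_crit):
    """Return (lower, upper) for xbar +/- t_crit * s / sqrt(n)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                      # sample sd with the n - 1 divisor
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width

T_025_9 = 2.262  # table value for alpha/2 = 0.025 and 9 df (assumed from Table A.5)
lower, upper = t_interval(fat_content, T_025_9)
print(round(lower, 2), round(upper, 2))
```

Here $\bar{x} = 21.90$ and $s \approx 4.134$, so the interval is roughly 21.90 plus or minus 2.96.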
Definition
Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma^2$. Then the rv

$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared ($\chi^2$) probability distribution with $\nu = n - 1$ df.
Notation: Let $\chi^2_{\alpha,\nu}$, called a chi-squared critical value, denote the number on the horizontal axis such that $\alpha$ of the area under the chi-squared curve with $\nu$ df lies to the right of $\chi^2_{\alpha,\nu}$.
Remark: The chi-squared distribution is not symmetric.
Confidence Interval for $\sigma^2$
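From the result above,

$$P\left(\chi^2_{1-\alpha/2,n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2,n-1}\right) = 1 - \alpha,$$

and solving for $\sigma^2$ gives the inequalities

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}}.$$

A $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population is

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right).$$

Taking square roots of the two endpoints gives a confidence interval for the standard deviation $\sigma$.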
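As an illustration of the variance interval, the sketch below applies it to the hot dog fat-content sample from Example (12). This pairing is only for illustration; the slides do not ask for it. The chi-squared critical values for 9 df, 19.023 and 2.700, are hard-coded as standard table values (an assumption, since Python's standard library has no chi-squared quantile function), and the function name `variance_ci` is made up.

```python
from math import sqrt
from statistics import variance

# Fat content (%) sample from Example (12), reused here for illustration only.
fat_content = [25.2, 21.3, 22.8, 17.0, 29.8, 21.0, 25.5, 16.0, 20.9, 19.5]

def variance_ci(data, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2 / chi2_{alpha/2}, (n-1)s^2 / chi2_{1-alpha/2})."""
    n = len(data)
    s2 = variance(data)                  # sample variance with the n - 1 divisor
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

CHI2_025_9 = 19.023   # upper critical value, area 0.025 to the right, 9 df
CHI2_975_9 = 2.700    # lower critical value, area 0.975 to the right, 9 df

var_lo, var_hi = variance_ci(fat_content, CHI2_025_9, CHI2_975_9)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)   # interval for sigma
print(round(var_lo, 2), round(var_hi, 2))
```

Because the chi-squared distribution is not symmetric, the interval is not centered at $s^2$: the upper endpoint lies much farther from $s^2$ than the lower one.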
Exercise (Suppl) 51
An April 2009 survey of 2253 American adults conducted by the Pew Research Center's Internet & American Life Project revealed that 1262 of the respondents had at some point used wireless means for online access.
1 Calculate and interpret a 95% CI for the proportion of all American adults who at the time of the survey had used wireless means for online access.
2 What sample size is required if the desired width of the 95% CI is to be at most 0.04, irrespective of the sample results?
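Both parts of this exercise can be checked numerically. The sketch below assumes the traditional large-sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$; for the sample size it uses the worst case $\hat{p} = 0.5$, where the width is $z_{\alpha/2}/\sqrt{n}$. A calculation based on the score interval may give a slightly different sample size. The function names are made up for the illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for a 95% interval

def proportion_ci(successes, n):
    """Traditional large-sample 95% CI for a population proportion."""
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def sample_size_for_width(width):
    """Smallest n with z / sqrt(n) <= width, the worst case at p_hat = 0.5."""
    return ceil((z / width) ** 2)

ci_lo, ci_hi = proportion_ci(1262, 2253)    # counts taken from the exercise
print(round(ci_lo, 3), round(ci_hi, 3))
print(sample_size_for_width(0.04))
```

Using $\hat{p} = 0.5$ maximizes $\hat{p}(1-\hat{p})$, which is why the resulting sample size guarantees the width requirement irrespective of the sample results.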