Estimation
Definitions
Confidence Interval: An interval estimate with a specific level of confidence.
Confidence Level: The percent of the time the true parameter (for example, the mean) will lie in interval estimates constructed this way.
Consistent Estimator: An estimator which gets closer to the value of the parameter as the sample size increases.
Degrees of Freedom: The number of data values which are allowed to vary once a statistic has been determined.
Estimator: A sample statistic which is used to estimate a population parameter. A good estimator is unbiased, consistent, and relatively efficient.
Interval Estimate: A range of values used to estimate a parameter.
Maximum Error of the Estimate: The maximum difference between the point estimate and the actual parameter. The maximum error of the estimate is one-half the width of the confidence interval for means and proportions.
Point Estimate: A single value used to estimate a parameter.
Relatively Efficient Estimator: The estimator for a parameter with the smallest variance.
t Distribution: A distribution used when the population variance is unknown.
Unbiased Estimator: An estimator whose expected value is equal to the parameter being estimated.
Introduction to Estimation
One area of concern in inferential statistics is the estimation of the population parameter from the sample statistic. It is important to realize the order here. The sample statistic is calculated from the sample data, and the population parameter is inferred (or estimated) from this sample statistic. Let me say that again: statistics are calculated, parameters are estimated. We talked about the problems of obtaining the value of the parameter earlier in the course when we talked about sampling techniques. Another area of inferential statistics is sample size determination, that is, how large a sample should be taken to make an accurate estimate. In these cases, the statistics can't be used since the sample hasn't been taken yet.
Point Estimates
There are two types of estimates we will find: Point Estimates and Interval Estimates. The point estimate is the single best value. A good estimator must satisfy three conditions:
Unbiased: The expected value of the estimator must be equal to the parameter being estimated.
Consistent: The value of the estimator approaches the value of the parameter as the sample size increases.
Relatively Efficient: The estimator has the smallest variance of all estimators which could be used.
Confidence Intervals
The point estimate is going to be different from the population parameter because of sampling error, and there is no way to know how close it is to the actual parameter. For this reason, statisticians like to give an interval estimate, which is a range of values used to estimate the parameter. A confidence interval is an interval estimate with a specific level of confidence. A level of confidence is the probability that the interval estimate will contain the parameter. The level of confidence is 1 - alpha, so an area of 1 - alpha lies within the confidence interval.
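One way to see what the level of confidence means is to build many intervals from repeated samples and count how often they contain the parameter. The following is only a sketch under stated assumptions: a normal population with a known standard deviation (so a z-based interval applies), made-up values for mu, sigma, and n, and a hypothetical function name coverage_rate.

```python
import random
from statistics import NormalDist, mean

def coverage_rate(confidence=0.95, mu=50.0, sigma=10.0, n=25,
                  trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)                 # fixed seed for repeatability
    alpha = 1 - confidence                    # total area in both tails
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-score with alpha/2 to its right
    half_width = z * sigma / n ** 0.5         # maximum error (sigma assumed known)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(coverage_rate())   # expected to land near 0.95
```

The fraction printed will vary a little with the seed and the number of trials, but it should stay close to the chosen level of confidence.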
Area in Tails
Since the level of confidence is 1-alpha, the amount in the tails is alpha. Statistics has a notation for the score which has a specified area in the right tail: the area is written in parentheses after the letter of the score. Examples:
Z(0.05) = 1.645 (the Z-score which has 0.05 to the right, and 0.4500 between 0 and it)
Z(0.10) = 1.282 (the Z-score which has 0.10 to the right, and 0.4000 between 0 and it)
As a shorthand notation, the () are usually dropped, and the probability is written as a subscript. The Greek letter alpha is used to represent the area in both tails for a confidence interval, and so alpha/2 will be the area in one tail. Here are some common values:
Confidence Level             50%     80%     90%     95%     98%     99%
Area between 0 and z-score   0.2500  0.4000  0.4500  0.4750  0.4900  0.4950
Area in one tail (alpha/2)   0.2500  0.1000  0.0500  0.0250  0.0100  0.0050
z-score                      0.674   1.282   1.645   1.960   2.326   2.576
Notice in the table above that the area between 0 and the z-score is simply one-half of the confidence level. So, if you need a confidence level which isn't given above, all you need to do to find its z-score is divide the confidence level by two, look up that area in the inside part of the Z-table, and read the z-score off the outside.
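For a confidence level that is not in the table, the same lookup can be done in software. Here is a minimal sketch in Python, assuming the standard library's statistics.NormalDist stands in for the Z-table. The function name z_critical is made up for this illustration.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return the z-score that leaves alpha/2 in the right tail."""
    alpha = 1 - confidence_level      # total area in both tails
    # inv_cdf expects the area to the LEFT of the score, which is 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

# Print a z-score for each confidence level listed in the table
for level in (0.50, 0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  z = {z_critical(level):.3f}")
```

Rounded to three decimal places, the printed values should agree with the z-score row of the table.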
Student's t Distribution
When the population standard deviation is unknown and the sample standard deviation is used in its place, the standardized sample mean has a Student's t distribution. The Student's t distribution was created by William S. Gosset, who worked for the Guinness brewery in Dublin. The brewery wouldn't allow him to publish his work under his name, so he used the pseudonym "Student". The Student's t distribution is very similar to the standard normal distribution.
It is symmetric about its mean.
It has a mean of zero.
It has a standard deviation and variance greater than 1.
There are actually many t distributions, one for each degree of freedom.
As the sample size increases, the t distribution approaches the normal distribution.
It is bell shaped.
The t-scores can be negative or positive, but the probabilities are always positive.
Degrees of Freedom
A degree of freedom occurs for every data value which is allowed to vary once a statistic has been fixed. For a single mean, there are n-1 degrees of freedom. This value will change depending on the statistic being used.
When the population standard deviation is unknown, the maximum error of the estimate for a population mean is E = t(alpha/2) * s / sqrt(n), where s is the sample standard deviation and n is the sample size. The t here is the t-score obtained from the Student's t table with n-1 degrees of freedom. The t-score depends on the level of confidence and the sample size. Once you have computed E, I suggest you save it to the memory on your calculator. On the TI-82, a good choice would be the letter E. The reason for this is that the limits for the confidence interval are now found by subtracting and adding the maximum error of the estimate from/to the sample mean.
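The confidence interval is x bar - E < mu < x bar + E. Notice that this formula is the same as for a population mean when the population standard deviation is known. The only thing that has changed is the formula for the maximum error of the estimate, which uses t and s in place of z and sigma.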
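The calculator steps for a mean with unknown population standard deviation can be sketched in Python. This is an illustration under assumptions, not a definitive implementation: the data values are made up, the function name t_interval is hypothetical, and the t-score is typed in from a Student's t table (2.776 is the table value for 95% confidence with 4 degrees of freedom) because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_score):
    """Confidence interval for a mean using E = t * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)               # point estimate of the population mean
    s = stdev(data)                  # sample standard deviation (n-1 divisor)
    E = t_score * s / sqrt(n)        # maximum error of the estimate
    return x_bar - E, x_bar + E, E

data = [10, 12, 14, 16, 18]          # made-up sample, n = 5, so df = 4
t_score = 2.776                      # from a t table: 95% confidence, df = 4
low, high, E = t_interval(data, t_score)
print(f"E = {E:.3f}, interval = ({low:.3f}, {high:.3f})")
```

Storing E in its own variable mirrors the advice to save it on the calculator: both limits reuse the same unrounded value.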
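Recall that a binomial count x can be standardized as z = (x - np) / sqrt(npq), where q = 1 - p.
The best point estimate for p is p hat, the sample proportion: p hat = x / n. If the formula for z is divided by n in both the numerator and the denominator, then the formula for z becomes: z = (p hat - p) / sqrt(pq / n).
Solving this for p to come up with a confidence interval gives the maximum error of the estimate as: E = z(alpha/2) * sqrt(pq / n).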
This is not, however, the formula that we will use. The problem with estimation is that you don't know the value of the parameter (in this case p), so you can't use it to estimate itself - if you knew it, then there would be no problem to work out. So we will replace the parameter by the statistic in the formula for the maximum error of the estimate.
The maximum error of the estimate is E = z(alpha/2) * sqrt(p hat * q hat / n), where q hat = 1 - p hat. The z here is the z-score obtained from the normal table, or the bottom of the t-table as explained in the introduction to estimation. The z-score depends on the level of confidence, so you may get in the habit of writing it next to the level of confidence. When you're computing E, I suggest that you find the sample proportion, p hat, and save it to P on the calculator. This way, you can find q as (1-p). Do NOT round the value for p hat and use the rounded value in the calculations. This will lead to error. Once you have computed E, I suggest you save it to the memory on your calculator. On the TI-82, a good choice would be the letter E. The reason for this is that the limits for the confidence interval are now found by subtracting and adding the maximum error of the estimate from/to the sample proportion: p hat - E < p < p hat + E.
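The same steps for a proportion can be sketched in Python. The counts below are made up, the function name proportion_interval is hypothetical, and statistics.NormalDist is assumed as a stand-in for the normal table. Note that p hat is kept unrounded throughout, as advised above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence_level):
    """Confidence interval for p using E = z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n                            # sample proportion, not rounded
    q_hat = 1 - p_hat
    alpha = 1 - confidence_level             # total area in both tails
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-score with alpha/2 to its right
    E = z * sqrt(p_hat * q_hat / n)          # maximum error of the estimate
    return p_hat - E, p_hat + E, E

# Made-up example: 40 successes in a sample of 100, 95% confidence
low, high, E = proportion_interval(40, 100, 0.95)
print(f"E = {E:.4f}, interval = ({low:.4f}, {high:.4f})")
```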
Population Mean
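Here is the formula for the sample size which is obtained by solving the maximum error of the estimate formula for the population mean for n: n = (z(alpha/2) * sigma / E)^2. Since a sample size must be a whole number, always round the result up.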
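As a rough illustration of the sample size formula for a mean, n = (z(alpha/2) * sigma / E)^2, here is a short Python sketch. The values of sigma and E are made up, the function name sample_size_mean is hypothetical, and statistics.NormalDist is assumed in place of the normal table.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, E, confidence_level):
    """Smallest n whose maximum error of the estimate is at most E."""
    alpha = 1 - confidence_level             # total area in both tails
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-score with alpha/2 to its right
    n = (z * sigma / E) ** 2
    return ceil(n)                           # always round UP to a whole number

# Made-up example: sigma = 15, desired maximum error 3, 95% confidence
print(sample_size_mean(15, 3, 0.95))
```

Rounding up rather than to the nearest whole number keeps the resulting error at or below the requested E.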
Population Proportion
Here is the formula for the sample size which is obtained by solving the maximum error of the estimate formula for the population proportion for n: n = p * q * (z(alpha/2) / E)^2. Some texts use p hat and q hat, but since the sample hasn't been taken, there is no value for the sample proportion. Instead, p and q are taken from a previous study, if one is available. If there is no previous study or estimate available, then use 0.5 for p and q, as these are the values which will give the largest sample size. It is better to have too large a sample and come in under the maximum error of the estimate than to have too small a sample and exceed it.
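The sample size formula for a proportion, n = p * q * (z(alpha/2) / E)^2, can be sketched the same way. The function name sample_size_proportion and the example values are assumptions for illustration. The default of p = 0.5 follows the advice above for the case where no previous estimate exists.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(E, confidence_level, p=0.5):
    """Sample size for estimating a proportion to within E."""
    q = 1 - p
    alpha = 1 - confidence_level             # total area in both tails
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-score with alpha/2 to its right
    n = p * q * (z / E) ** 2
    return ceil(n)                           # always round UP to a whole number

# No previous study: p = 0.5 gives the largest (safest) sample size
print(sample_size_proportion(0.03, 0.95))
# Hypothetical previous estimate p = 0.2 needs a smaller sample
print(sample_size_proportion(0.03, 0.95, p=0.2))
```

Because p * q is largest when p = q = 0.5, any other value of p produces a sample size no larger than the default.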