
AP Statistics Tutorial: Hypothesis Test for Slope of Regression Line

This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y. The test focuses on the slope of the regression line

$Y = \beta_0 + \beta_1 X$

where $\beta_0$ is a constant, $\beta_1$ is the slope (also called the regression coefficient), X is the value of the independent variable, and Y is the value of the dependent variable.

Test Requirements

The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met.

- The dependent variable Y has a linear relationship to the independent variable X.
- For each value of X, the probability distribution of Y has the same standard deviation $\sigma$.
- For any given value of X:
    - The Y values are independent.
    - The Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is acceptable if the sample size is large.

Previously, we described how to verify that regression requirements are met.

The test procedure consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

If there is a significant linear relationship between the independent variable X and the dependent variable Y, the slope will not equal zero.

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that the slope is not equal to zero.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to reject or fail to reject the null hypothesis. The plan should specify the following elements.

- Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
- Test method. Use a linear regression t-test (described in the next section) to determine whether the slope of the regression line differs significantly from zero.
Analyze Sample Data

Using sample data, find the standard error of the slope, the slope of the regression line, the degrees of freedom, the test statistic, and the P-value associated with the test statistic. The approach described in this section is illustrated in the sample problem at the end of this lesson.

Standard error. Many statistical software packages and some graphing calculators provide the standard error of the slope as a regression analysis output. The table below shows hypothetical output for the following regression equation: y = 76 + 35x.

Predictor   Coef   SE Coef   T      P
Constant    76     30        2.53   0.01
X           35     20        1.75   0.04

In the output above, the standard error of the slope (the "SE Coef" entry in the X row) is equal to 20. In this example, the standard error is labeled "SE Coef". However, other software packages might use a different label for the standard error. It might be "StDev", "SE", "Std Dev", or something else.

If you need to calculate the standard error of the slope (SE) by hand, use the following formula:

$SE = s_{b_1} = \sqrt{ \sum (y_i - \hat{y}_i)^2 / (n - 2) } \; / \; \sqrt{ \sum (x_i - \bar{x})^2 }$

where $y_i$ is the value of the dependent variable for observation i, $\hat{y}_i$ is the estimated value of the dependent variable for observation i, $x_i$ is the observed value of the independent variable for observation i, $\bar{x}$ is the mean of the independent variable, and n is the number of observations.

Slope. Like the standard error, the slope of the regression line will be provided by most statistics software packages. In the hypothetical output above, the slope is equal to 35.

Degrees of freedom. The degrees of freedom (DF) is equal to:

$DF = n - 2$

where n is the number of observations in the sample.

Test statistic. The test statistic is a t-score (t) defined by the following equation.

$t = b_1 / SE$

where $b_1$ is the slope of the sample regression line, and SE is the standard error of the slope.

P-value. The P-value is the probability of observing a sample statistic at least as extreme as the test statistic, assuming the null hypothesis is true. Since the test statistic is a t-score, use the t Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
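As a rough sketch, the hand calculation of the slope, its standard error, the t-score, and the degrees of freedom might look like this in Python. The function name `slope_t_test` is hypothetical, and the data are the five aptitude/grade pairs from the simple regression example later in this document, borrowed here only for illustration.

```python
import math


def slope_t_test(xs, ys):
    """Return (b1, se, t, df) for the least-squares slope of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sums of squared deviations and cross-products
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / sxx              # slope of the sample regression line
    b0 = y_bar - b1 * x_bar     # intercept

    # Sum of squared residuals, sum of (y_i - y_hat_i)^2
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

    # SE = sqrt(SSE / (n - 2)) / sqrt(sum of (x_i - x_bar)^2)
    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

    return b1, se, b1 / se, n - 2


# Illustrative data only: aptitude scores (x) and statistics grades (y)
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]

b1, se, t, df = slope_t_test(xs, ys)
print(f"b1 = {b1:.3f}, SE = {se:.3f}, t = {t:.3f}, DF = {df}")
```

The function returns the t-score and degrees of freedom but not the P-value, which still has to be looked up for a t distribution with DF degrees of freedom.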

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding of This Lesson

Problem

The local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following: annual electric bill (in dollars) and home size (in square feet). Output from a regression analysis appears below.

Regression equation: Annual bill = 0.55 * Home size + 15

Predictor   Coef   SE Coef   T      P
Constant    15     3         5.0    0.00
Home size   0.55   0.24      2.29   0.01

Is there a significant linear relationship between annual bill and home size? Use a 0.05 level of significance.

Solution

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

H0: The slope of the regression line is equal to zero.
Ha: The slope of the regression line is not equal to zero.

If the relationship between home size and electric bill is significant, the slope will not equal zero.

Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a linear regression t-test to determine whether the slope of the regression line differs significantly from zero.

Analyze sample data. To apply the linear regression t-test to sample data, we require the standard error of the slope, the slope of the regression line, the degrees of freedom, the t-score test statistic, and the P-value of the test statistic. We get the slope ($b_1$) and the standard error (SE) from the regression output.

$b_1 = 0.55$
$SE = 0.24$

We compute the degrees of freedom and the t-score test statistic, using the following equations.

$DF = n - 2 = 101 - 2 = 99$
$t = b_1 / SE = 0.55 / 0.24 = 2.29$

where DF is the degrees of freedom, n is the number of observations in the sample, $b_1$ is the slope of the regression line, and SE is the standard error of the slope.

Based on the t-score test statistic and the degrees of freedom, we determine the P-value. The P-value is the probability that a t-score having 99 degrees of freedom is more extreme than 2.29. Since this is a two-tailed test, "more extreme" means greater than 2.29 or less than -2.29. We use the t Distribution Calculator to find P(t > 2.29) = 0.0121 and P(t < -2.29) = 0.0121. Therefore, the P-value is 0.0121 + 0.0121 or 0.0242.
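The lesson obtains the tail probabilities from the t Distribution Calculator. As a cross-check, the sketch below approximates the same two-tailed P-value using only the Python standard library. It swaps in a different technique, numerically integrating the Student's t density with Simpson's rule, and the names `t_pdf` and `two_tailed_p` are hypothetical helpers, not part of any library.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Approximate P(|T| > |t|) with Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3       # P(0 < T < |t|)
    return 2 * (0.5 - central_half)    # both tails, by symmetry


# Values taken from the sample problem
b1, se, n = 0.55, 0.24, 101
alpha = 0.05

t = round(b1 / se, 2)                  # rounded to two decimals, as in the lesson
p_value = two_tailed_p(t, n - 2)
reject = p_value < alpha

print(f"t = {t}, DF = {n - 2}, P-value = {p_value:.4f}, reject H0: {reject}")
```

A statistics package would normally supply the t distribution directly. The integration is done by hand here only to keep the example self-contained.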

Interpret results. Since the P-value (0.0242) is less than the significance level (0.05), we reject the null hypothesis and conclude that there is a significant linear relationship between annual bill and home size.

Note: If you use this approach on an exam, you may also want to mention that this approach is only appropriate when the standard requirements for simple linear regression are satisfied.

AP Statistics Tutorial: A Simple Regression Example

Note: Regression computations are usually handled by a software package or a graphing calculator. For this example, however, we will do the computations "manually", since the gory details have educational value.

Problem Statement

Last year, five randomly selected students took a math aptitude test before they began their statistics course. The Statistics Department has three questions.

- What linear regression equation best predicts statistics performance, based on math aptitude scores?
- If a student made an 80 on the aptitude test, what grade would we expect her to make in statistics?
- How well does the regression equation fit the data?

How to Find the Regression Equation

In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades. The last two rows show sums and mean scores that we will use to conduct the regression analysis.

Student   xi    yi    (xi - x̄)   (yi - ȳ)   (xi - x̄)²   (yi - ȳ)²   (xi - x̄)(yi - ȳ)
1         95    85     17          8         289          64         136
2         85    95      7         18          49         324         126
3         80    70      2         -7           4          49         -14
4         70    65     -8        -12          64         144          96
5         60    70    -18         -7         324          49         126
Sum       390   385                          730         630         470
Mean      78    77
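The sums and means in the table can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic, and the variable names (`sxx`, `syy`, `sxy`) are shorthand introduced here rather than notation from the lesson.

```python
# Aptitude test scores (x) and statistics grades (y) for the five students
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]

n = len(xs)
x_bar = sum(xs) / n   # mean aptitude score
y_bar = sum(ys) / n   # mean statistics grade

# Deviations from the means, one entry per student
dx = [x - x_bar for x in xs]
dy = [y - y_bar for y in ys]

sxx = sum(d * d for d in dx)               # sum of (xi - x_bar)^2
syy = sum(d * d for d in dy)               # sum of (yi - y_bar)^2
sxy = sum(a * b for a, b in zip(dx, dy))   # sum of (xi - x_bar)(yi - y_bar)

print(f"means: {x_bar}, {y_bar}; sums: {sxx}, {syy}, {sxy}")
```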

The regression equation is a linear equation of the form $\hat{y} = b_0 + b_1 x$. To conduct a regression analysis, we need to solve for $b_0$ and $b_1$. Computations are shown below.

$b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) \; / \; \sum (x_i - \bar{x})^2$
$b_1 = 470 / 730 = 0.644$

$b_0 = \bar{y} - b_1 \bar{x}$
$b_0 = 77 - (0.644)(78) = 26.768$

Therefore, the regression equation is $\hat{y} = 26.768 + 0.644x$.

How to Use the Regression Equation

Once you have the regression equation, using it is a snap. Choose a value for the independent variable (x), perform the computation, and you have an estimated value ($\hat{y}$) for the dependent variable.

In our example, the independent variable is the student's score on the aptitude test. The dependent variable is the student's statistics grade. If a student made an 80 on the aptitude test, the estimated statistics grade would be:

$\hat{y} = 26.768 + 0.644x = 26.768 + 0.644 \times 80 = 26.768 + 51.52 = 78.288$

Warning: When you use a regression equation, do not use values for the independent variable that are outside the range of values used to create the equation. That is called extrapolation, and it can produce unreasonable estimates.

In this example, the aptitude test scores used to create the regression equation ranged from 60 to 95. Therefore, only use values inside that range to estimate statistics grades. Using values outside that range (less than 60 or greater than 95) is problematic.

How to Find the Coefficient of Determination

Whenever you use a regression equation, you should ask how well the equation fits the data. One way to assess fit is to check the coefficient of determination, which can be computed from the following formula.

$R^2 = \left\{ \frac{1}{N} \cdot \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y} \right\}^2$

where N is the number of observations used to fit the model, $\sum$ is the summation symbol, $x_i$ is the x value for observation i, $\bar{x}$ is the mean x value, $y_i$ is the y value for observation i, $\bar{y}$ is the mean y value, $\sigma_x$ is the standard deviation of x, and $\sigma_y$ is the standard deviation of y. Computations for the sample problem of this lesson are shown below.

$\sigma_x = \sqrt{ \sum (x_i - \bar{x})^2 / N }$
$\sigma_x = \sqrt{730 / 5} = \sqrt{146} = 12.083$

$\sigma_y = \sqrt{ \sum (y_i - \bar{y})^2 / N }$
$\sigma_y = \sqrt{630 / 5} = \sqrt{126} = 11.225$

$R^2 = \left[ \frac{1}{5} \cdot \frac{470}{12.083 \times 11.225} \right]^2 = \left( \frac{94}{135.632} \right)^2 = (0.693)^2 = 0.48$
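The computations above can be checked with a short Python sketch. The function names `fit_line` and `predict` are hypothetical, and the range check in `predict` is one possible way to guard against extrapolation, not something the lesson prescribes. Because the code keeps full precision for the slope instead of rounding it to 0.644, the intercept comes out near 26.78 rather than 26.768; the prediction at x = 80 and the coefficient of determination agree with the lesson's values to the precision shown there.

```python
import math

# Aptitude test scores (x) and statistics grades (y) for the five students
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def predict(x, b0, b1, x_min, x_max):
    """Estimate y at x, refusing to extrapolate outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the fitted range [{x_min}, {x_max}]")
    return b0 + b1 * x


b0, b1 = fit_line(xs, ys)
estimate = predict(80, b0, b1, min(xs), max(xs))

# Coefficient of determination via the population standard deviations
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r_squared = (sxy / n / (sigma_x * sigma_y)) ** 2

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, estimate = {estimate:.3f}, R2 = {r_squared:.2f}")
```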

A coefficient of determination equal to 0.48 indicates that about 48% of the variation in statistics grades (the dependent variable) can be explained by the relationship to math aptitude scores (the independent variable). This would be considered a good fit to the data, in the sense that it would substantially improve an educator's ability to predict student performance in statistics class.
