
Making Sense of Regression Results

Kwamina Banson
Socio-Economics Department

BNARI Seminar Room

30 July 2009

Linear Regression: Introduction

Interpreting SPSS regression output


Coefficients for independent variables

Fit of the regression: R Square

Statistical significance

How to reject the null hypothesis

Multivariate regressions

Academic Performance of Junior High Sch.

What is SPSS?

SPSS is a computer program used for a wide variety of statistical analyses. The name originally stood for Statistical Package for the Social Sciences and later for Statistical Product and Service Solutions. In addition to statistical analysis, data management and data documentation are features of the base software. Statistics included in the base software:

Descriptive statistics: Cross tabulation, Frequencies, Descriptives, Explore, Descriptive Ratio Statistics
Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances), Nonparametric tests
Prediction for numerical outcomes: Linear regression
Prediction for identifying groups: Factor analysis, cluster analysis (two-step, K-means, hierarchical), Discriminant

Interpreting SPSS regression output


y = mx + b

where m is the slope of the line (the coefficient) and b is the y-intercept (the constant).

[Figure: scatter plot of Graduation Rate (vertical axis, 0 to 100) against Average SAT Score (horizontal axis, 0 to 1600) with a fitted regression line, Rsq = 0.3454. Callouts: slope or coefficient; y-intercept or constant; how tight is the fit?]

Interpreting SPSS regression output

An SPSS regression output includes two key tables for interpreting your results:

A Coefficients table that contains the y-intercept (or constant) of the regression, a coefficient for every independent variable, and the standard error of each coefficient.

A Model Summary table that gives you information on the fit of your regression.

Interpreting SPSS regression output: Coefficients


Coefficients (a)

Model 1              Unstandardized B   Std. Error   Standardized Beta   t       Sig.
(Constant)           4.236              7.048                            .601    .549
Average SAT Score    5.88E-02           .007         .588                8.778   .000

a. Dependent Variable: Graduation Rate

y = mx + b, where m is the slope of the line and b is the y-intercept.

Here, we will ONLY LOOK AT UNSTANDARDIZED COEFFICIENTS!


The y-intercept is 4.2% with a standard error of 7.0%. The coefficient for SAT Scores is 0.059 percentage points per SAT point, with a standard error of 0.007.

Interpreting SPSS regression output: Coefficients



The y-intercept or constant is the predicted value of the dependent variable when the independent variable takes on the value of zero. This basic model predicts that when a college admits a class of students who averaged zero on their SAT, 4.2% of them will graduate. The constant is not the most helpful statistic here, since no college admits a class with an average SAT score of zero.

Interpreting SPSS regression output: Coefficients



The coefficient of an independent variable is the predicted change in the dependent variable associated with a one-unit increase in the independent variable. A college whose students' SAT scores are one point higher on average is predicted to have a graduation rate that is 0.059 percentage points higher. An increase of 200 points in average SAT score corresponds to a predicted rise of (200)(0.059) = 11.8 percentage points in the graduation rate.
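The arithmetic on this slide can be reproduced with a few lines of Python. This is only a sketch: the two constants are copied from the Coefficients table, and the function names are made up for illustration.

```python
# Values copied from the SPSS Coefficients table (unstandardized B column).
INTERCEPT = 4.236   # (Constant), in percentage points
SLOPE = 0.0588      # Average SAT Score, shown by SPSS as 5.88E-02


def predict_graduation_rate(avg_sat):
    """Predicted graduation rate (%) from y = mx + b."""
    return SLOPE * avg_sat + INTERCEPT


def predicted_change(delta_sat):
    """Predicted change in graduation rate for a change in average SAT score."""
    return SLOPE * delta_sat


# A 200-point difference in average SAT score: about 11.8 percentage points.
print(round(predicted_change(200), 2))
# Prediction for a hypothetical college with an average SAT score of 1000.
print(round(predict_graduation_rate(1000), 1))
```

Note that the intercept drops out of `predicted_change`: the coefficient alone determines how the prediction moves.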

Interpreting SPSS regression output: Fit of the Regression


Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .588 (a)   .345       .341                12.45%

a. Predictors: (Constant), Average SAT Score

The R Square measures how closely a regression line fits the data in a scatter plot. It can range from zero (no explanatory power) to one (perfect prediction).

An R Square of 0.345 means that differences in SAT scores can explain 35% of the variation in college graduation rates.
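For readers who want to see what R Square measures, here is a minimal sketch of the usual definition, one minus the ratio of unexplained to total variation. The toy numbers are hypothetical and are not the graduation data; the last lines simply check that the R and R Square columns reported by SPSS agree with each other up to rounding.

```python
def r_squared(observed, predicted):
    """R Square = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(observed) / len(observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_residual / ss_total


# Hypothetical toy values, for illustration only.
observed = [50.0, 60.0, 70.0, 80.0]
perfect_fit = [50.0, 60.0, 70.0, 80.0]   # predictions hit every point
flat_line = [65.0, 65.0, 65.0, 65.0]     # predictions ignore x entirely

print(r_squared(observed, perfect_fit))  # perfect prediction
print(r_squared(observed, flat_line))    # no explanatory power

# R Square is the square of R: .588 squared is close to the reported .345.
print(round(0.588 ** 2, 4))
```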

Statistical Significance

What would the null hypothesis look like in a scatterplot?

If the independent variable has no effect on the dependent variable, the scatterplot should look random, the regression line should be flat, and its slope should be zero.

Null hypothesis: The regression coefficient for an independent variable equals zero.
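The t and Sig. columns in the Coefficients table are the test of this null hypothesis. As a rough sketch, t is the coefficient divided by its standard error, and Sig. is the two-sided probability of a t at least that large if the true coefficient were zero. SPSS uses the t distribution for this; because the sample size is not shown on these slides, the code below assumes a large sample and substitutes the standard normal distribution, so its p-values are approximations.

```python
import math


def t_statistic(coefficient, std_error):
    """Test statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


def approx_two_sided_p(t):
    """Two-sided p-value from the standard normal (large-sample assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))


# (Constant) row of the graduation-rate regression: B = 4.236, SE = 7.048.
t_constant = t_statistic(4.236, 7.048)
p_constant = approx_two_sided_p(t_constant)

print(round(t_constant, 3))   # SPSS reports t = .601
print(round(p_constant, 2))   # SPSS reports Sig. = .549
```

Applying the same division to the SAT coefficient with the rounded standard error shown in the table (.007) gives a t near 8.4 rather than the reported 8.778, because SPSS computes t from unrounded values.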


Multivariate Regressions

A multivariate regression uses more than one independent variable (including potential confounders) to explain variation in a dependent variable.
The coefficient for each independent variable (IV) reports its effect on the dependent variable (DV), holding constant all of the other IVs in the regression.

Thought experiment: how do factors such as class size, the school feeding program, and teacher credentials affect the academic performance of Junior High Schools?

Multivariate Regressions

Let's perform a regression analysis using ap2000 as the outcome variable and the variables acs_JH, meals and full as predictors.

ap2000: the academic performance of the school
acs_JH: the average class size in Junior High School
meals: the percentage of students receiving free meals, which is an indicator of poverty
full: the percentage of teachers who have full teaching credentials

We expect that better academic performance would be associated with lower class size, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials.

Multivariate Regressions
Coefficients (a)

Model 1      Unstandardized B   Std. Error   Standardized Beta   t         Sig.
(Constant)   906.739            28.265                           32.080    .000
ACS_JH       -2.682             1.394        -.064               -1.924    .055
MEALS        -3.702             .154         -.808               -24.038   .000
FULL         .109               .091         .041                1.197     .232

a. Dependent Variable: AP2000

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .821 (a)   .674       .671                64.153

a. Predictors: (Constant), FULL, ACS_JH, MEALS

An R Square of 0.674 means that differences in ACS_JH, MEALS and FULL can explain 67% of the variation in academic performance.
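To see what "holding constant all of the other IVs" means in practice, the fitted equation can be written out and evaluated for two schools that differ in only one predictor. This is a sketch: the coefficients come from the Coefficients table, while the two schools and the function name are hypothetical.

```python
# Unstandardized coefficients from the SPSS output for AP2000.
INTERCEPT = 906.739
COEFFICIENTS = {"acs_JH": -2.682, "meals": -3.702, "full": 0.109}


def predict_ap2000(acs_jh, meals, full):
    """Predicted academic performance from the fitted regression equation."""
    return (INTERCEPT
            + COEFFICIENTS["acs_JH"] * acs_jh
            + COEFFICIENTS["meals"] * meals
            + COEFFICIENTS["full"] * full)


# Two hypothetical schools, identical except for the free-meals percentage.
school_a = predict_ap2000(acs_jh=20, meals=50, full=90)
school_b = predict_ap2000(acs_jh=20, meals=60, full=90)

# The gap equals 10 times the MEALS coefficient, whatever the other values are.
print(round(school_b - school_a, 2))
```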

Multivariate Regressions

The coefficient for average class size (acs_JH, b=-2.682) is not statistically significant at the 0.05 level (p=0.055), but it is negative, which would indicate that larger class sizes are related to lower academic performance, as we would expect.

Multivariate Regressions

Next, the effect of meals (b=-3.702, p=.000) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance.
Please note that we are not saying that free meals are causing lower academic performance. The meals variable is highly related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic performance. This result also makes sense.

Multivariate Regressions

Finally, the percentage of teachers with full credentials (full, b=0.109, p=.232) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance. This result was somewhat unexpected.
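The reading of the three predictors can be summarised as a simple decision rule: compare each Sig. value with a chosen threshold and note the sign of the coefficient. The sketch below assumes the conventional 0.05 threshold, which the slides use implicitly; the dictionary layout and function name are illustrative only.

```python
ALPHA = 0.05  # assumed significance threshold

# B and Sig. values copied from the Coefficients table for AP2000.
PREDICTORS = {
    "acs_JH": {"b": -2.682, "sig": 0.055},
    "meals": {"b": -3.702, "sig": 0.000},
    "full": {"b": 0.109, "sig": 0.232},
}


def summarize(predictors, alpha=ALPHA):
    """Report the direction of each coefficient and whether Sig. < alpha."""
    summary = {}
    for name, row in predictors.items():
        summary[name] = {
            "direction": "negative" if row["b"] < 0 else "positive",
            "significant": row["sig"] < alpha,
        }
    return summary


for name, result in summarize(PREDICTORS).items():
    print(name, result["direction"], result["significant"])
```

With a looser threshold such as 0.10, acs_JH would also count as significant, which is why its p-value of 0.055 is usually described as borderline.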

Should we take these results and write them up for publication?

From these results, we would conclude that:

smaller class sizes are related to higher performance (although this coefficient falls just short of significance at the 0.05 level),
fewer students receiving free meals is associated with higher performance, and
the percentage of teachers with full credentials is not related to academic performance in the schools.

Before we write this up for publication, we should do a number of checks to make sure we can firmly stand behind these results. We start by getting more familiar with the data file, doing preliminary data checking, and looking for errors in the data.
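As one possible starting point for preliminary data checking, the sketch below scans records for missing values, percentages outside 0 to 100, and class sizes that are not positive. The records are invented to show what each check catches; they are not taken from the real data file, and the checks themselves are assumptions about what counts as an error.

```python
REQUIRED_FIELDS = ("ap2000", "acs_JH", "meals", "full")
PERCENT_FIELDS = ("meals", "full")  # both are defined as percentages


def find_problems(records):
    """Return (record index, field, problem) for each suspicious value."""
    problems = []
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if value is None:
                problems.append((index, field, "missing"))
            elif field in PERCENT_FIELDS and not 0 <= value <= 100:
                problems.append((index, field, "outside 0-100"))
            elif field == "acs_JH" and value <= 0:
                problems.append((index, field, "not positive"))
    return problems


# Hypothetical records, for illustration only.
records = [
    {"ap2000": 693, "acs_JH": 21, "meals": 67, "full": 76},
    {"ap2000": 570, "acs_JH": -21, "meals": 92, "full": 79},
    {"ap2000": 546, "acs_JH": 19, "meals": None, "full": 168},
]

for problem in find_problems(records):
    print(problem)
```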
