Kwamina Banson
Socio-Economics Department
30 July 2009
Statistical significance
Multivariate regressions
What is SPSS?
SPSS is a computer program used for a wide variety of statistical analyses. The name originally stood for Statistical Package for the Social Sciences and was later expanded as Statistical Product and Service Solutions. In addition to statistical analysis, data management and data documentation are features of the base software. Statistics included in the base software:
Descriptive statistics: Cross tabulation, Frequencies, Descriptives, Explore, Descriptive Ratio Statistics
Bivariate statistics: Means, t-test, ANOVA, Correlation (bivariate, partial, distances), Nonparametric tests
Prediction for numerical outcomes: Linear regression
Prediction for identifying groups: Factor analysis, Cluster analysis (two-step, K-means, hierarchical), Discriminant
[Figure: scatter plot of Graduation Rate with a fitted regression line y = mx + b; the slope (coefficient) of the line is labeled; Rsq = 0.3454]
An SPSS regression output includes two key tables for interpreting your results:
A Coefficients table that contains the y-intercept (or constant) of the regression, a coefficient for every independent variable, and the standard error of each coefficient.
A Model Summary table that gives you information on the fit of your regression.
y = mx + b,
where m is the slope of the line and b is the y-intercept.
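To make the roles of m and b concrete, here is a minimal sketch of how a least-squares line can be computed by hand. The data points and the helper name fit_line are invented for illustration; they are not the college data behind the slides, and SPSS reports these values directly in its Coefficients table.

```python
# Hypothetical (average SAT, graduation rate %) pairs, invented for illustration only.
sat = [900, 1000, 1100, 1200, 1300]
grad = [52.0, 60.0, 68.0, 70.0, 80.0]


def fit_line(x, y):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx            # slope (coefficient)
    b = mean_y - m * mean_x  # y-intercept (constant)
    return m, b


m, b = fit_line(sat, grad)
print(f"slope m = {m:.4f}, intercept b = {b:.2f}")
```

The slope is the part of the output that answers "how much does y change per unit of x"; the intercept only anchors the line.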
The y-intercept or constant is the predicted value of the dependent variable when the independent variable takes on the value of zero. This basic model predicts that when a college admits a class of students who averaged zero on their SAT, 4.2% of them will graduate. The constant is not the most helpful statistic.
The coefficient of an independent variable is the predicted change in the dependent variable that results from a one-unit increase in the independent variable. A college whose students' SAT scores are one point higher on average is predicted to have a graduation rate that is 0.059 percentage points higher. An increase of 200 points in SAT scores corresponds to a predicted rise of (200)(0.059) = 11.8 percentage points in the graduation rate.
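The arithmetic above can be sketched in a few lines. The constant (4.2) and the coefficient (0.059) are the values quoted in the text; the function name and the two SAT averages (1000 and 1200) are assumptions chosen only to show a 200-point difference.

```python
# Values quoted in the text: constant 4.2, SAT coefficient 0.059
# (percentage points of graduation rate per SAT point).
INTERCEPT = 4.2
SLOPE = 0.059


def predicted_grad_rate(avg_sat):
    """Predicted graduation rate (%) from the fitted line y = m*x + b."""
    return INTERCEPT + SLOPE * avg_sat


# Hypothetical colleges whose average SAT scores differ by 200 points.
low = predicted_grad_rate(1000)
high = predicted_grad_rate(1200)
change = high - low
print(f"predicted change: {change:.1f} percentage points")
```

The difference depends only on the slope and the size of the change in SAT, not on the constant, which is why the constant is rarely the statistic of interest.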
The R Square measures how closely a regression line fits the data in a scatter plot. It can range from zero (no explanatory power) to one (perfect prediction).
An R Square of 0.345 means that differences in SAT scores can explain about 35% of the variation in college graduation rates.
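One common way to compute R Square is as one minus the ratio of unexplained variation to total variation. The sketch below does this for a small invented data set (not the college data from the slides), so its result will not match the 0.345 reported above.

```python
# Hypothetical (average SAT, graduation rate %) pairs, invented for illustration.
sat = [900, 1000, 1100, 1200, 1300]
grad = [52.0, 60.0, 68.0, 70.0, 80.0]

n = len(sat)
mean_x = sum(sat) / n
mean_y = sum(grad) / n

# Least-squares slope and intercept of the fitted line.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sat, grad))
sxx = sum((x - mean_x) ** 2 for x in sat)
m = sxy / sxx
b = mean_y - m * mean_x

# Variation left unexplained by the line vs. total variation around the mean.
ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(sat, grad))
ss_tot = sum((y - mean_y) ** 2 for y in grad)

r_square = 1 - ss_res / ss_tot
print(f"R Square = {r_square:.3f}")
```

If the line predicted every point exactly, ss_res would be zero and R Square would be one; if the line did no better than the mean, R Square would be zero.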
Statistical Significance
If the independent variable has no effect on the dependent variable, the scatterplot should look random, the regression line should be flat, and its slope should be zero.
Null hypothesis: The regression coefficient for an independent variable equals zero.
Multivariate Regressions
A multivariate regression uses more than one independent variable (or confound) to explain variation in a dependent variable.
The coefficient for each independent variable (IV) reports its effect on the dependent variable (DV), holding constant all of the other IVs in the regression. Thought experiment: consider the effect of factors such as class size, a school feeding program, and teacher credentials on academic performance in Junior High Schools.
Let's perform a regression analysis using ap2000 as the outcome variable and the variables acs_JH, meals and full as predictors.
ap2000: the academic performance of the school
acs_JH: the average class size in Junior High School
meals: the percentage of students receiving free meals, which is an indicator of poverty
full: the percentage of teachers who have full teaching credentials
We expect that better academic performance would be associated with lower class size, fewer students receiving free meals, and a higher percentage of teachers having full teaching credentials.
Coefficients(a)

Model 1       Unstandardized Coefficients    Standardized Coefficients
              B           Std. Error         Beta                          t          Sig.
(Constant)    906.739     28.265                                           32.080     .000
ACS_JH         -2.682      1.394             -.064                         -1.924     .055
MEALS          -3.702       .154             -.808                        -24.038     .000
FULL             .109       .091              .041                          1.197     .232

a Dependent Variable: AP2000
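The t column of the Coefficients table is each coefficient divided by its standard error. The sketch below recomputes it from the B and Std. Error values printed in the table; small differences in the last digit are expected because the table values are themselves rounded.

```python
# (B, Std. Error, reported t) as printed in the Coefficients table.
rows = {
    "(Constant)": (906.739, 28.265, 32.080),
    "ACS_JH": (-2.682, 1.394, -1.924),
    "MEALS": (-3.702, 0.154, -24.038),
    "FULL": (0.109, 0.091, 1.197),
}

# t statistic for the null hypothesis that the coefficient equals zero.
t_values = {}
for name, (coef, std_err, t_reported) in rows.items():
    t_values[name] = coef / std_err
    print(f"{name:<11} t = {t_values[name]:8.3f}  (reported {t_reported:8.3f})")
```

A coefficient that is large relative to its standard error gives a t far from zero, which is what makes MEALS stand out in this table.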
Model Summary

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .821(a)    .674        .671                 64.153

a Predictors: (Constant), FULL, ACS_JH, MEALS

An R Square of 0.674 means that differences in ACS_JH, MEALS and FULL can explain 67% of the variation in academic performance.
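The unstandardized coefficients define a prediction equation: ap2000 = 906.739 - 2.682(acs_JH) - 3.702(meals) + 0.109(full). The sketch below evaluates it for two hypothetical schools that differ only in meals, to illustrate what "holding the other IVs constant" means. The school values (class size 20, 50% or 60% free meals, 90% full credentials) and the function name are invented for illustration.

```python
# Unstandardized coefficients (B) from the Coefficients table.
COEF = {"const": 906.739, "acs_JH": -2.682, "meals": -3.702, "full": 0.109}


def predict_ap2000(acs_jh, meals, full):
    """Predicted academic performance score from the fitted regression equation."""
    return (
        COEF["const"]
        + COEF["acs_JH"] * acs_jh
        + COEF["meals"] * meals
        + COEF["full"] * full
    )


# Two hypothetical schools, identical except for the share of students on free meals.
base = predict_ap2000(acs_jh=20, meals=50, full=90)
more_meals = predict_ap2000(acs_jh=20, meals=60, full=90)

print(f"baseline prediction:        {base:.1f}")
print(f"10 points more free meals:  {more_meals:.1f}")
print(f"difference:                 {more_meals - base:.2f}")
```

Because class size and credentials are the same for both schools, the difference equals ten times the meals coefficient.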
The average class size (acs_JH, b=-2.682) is not significant at the 0.05 level (p=0.055), but its coefficient is negative, indicating that larger class sizes are related to lower academic performance, as we would expect.
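To see where a p-value near 0.055 comes from, the t statistic for acs_JH can be converted to a two-sided p-value. SPSS uses the t distribution with the model's residual degrees of freedom; the sample size is not given in these slides, so the sketch below assumes a large sample and substitutes the standard normal distribution. That is an approximation, and it comes out slightly below the value SPSS reports.

```python
from statistics import NormalDist


def two_sided_p_normal(t):
    """Two-sided p-value for a test statistic, using a standard normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# t statistic for acs_JH: coefficient divided by its standard error.
t_acs = -2.682 / 1.394
p_acs = two_sided_p_normal(t_acs)

print(f"t = {t_acs:.3f}, approximate p = {p_acs:.3f}")
print("significant at 0.05" if p_acs < 0.05 else "not significant at 0.05")
```

Either way the result sits just above the conventional 0.05 cutoff, which is why the coefficient is described as not significant.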
Next, the effect of meals (b=-3.702, p<.001, shown by SPSS as .000) is significant and its coefficient is negative, indicating that the greater the proportion of students receiving free meals, the lower the academic performance.
Please note that we are not saying that free meals are causing lower academic performance. The meals variable is highly related to income level and functions more as a proxy for poverty. Thus, higher levels of poverty are associated with lower academic performance. This result also makes sense.
Finally, the percentage of teachers with full credentials (full, b=0.109, p=.232) seems to be unrelated to academic performance. This would seem to indicate that the percentage of teachers with full credentials is not an important factor in predicting academic performance. This result was somewhat unexpected.
From these results, we would conclude that lower class sizes are related to higher performance, that fewer students receiving free meals is associated with higher performance, and that the percentage of teachers with full credentials was not related to academic performance in the schools.
Before we write this up for publication, we should do a number of checks to make sure we can firmly stand behind these results. We start by getting more familiar with the data file, doing preliminary data checking, and looking for errors in the data.