Multiple Linear Regression Analysis

When plotting graduation percentage against the other four variables on scatterplots, I learned that
median SAT scores had a positive relationship with graduation percentage, while acceptance rate had a
negative relationship with graduation percentage. The R-squared values for both relationships were low,
so each relationship was weak.
The scatterplots of graduation percentage versus expenditures per student and versus percentage of
students in the top ten percent of their high school class showed a roughly horizontal pattern,
suggesting little linear relationship. Using descriptive statistics, I determined that median SAT scores
are centered around a mean of 1263.1 with a standard deviation of 62.67. That standard deviation is only
about 5% of the mean, so the scores are fairly tightly clustered relative to their level. The acceptance
rate has a mean of 38%, a range of 50 percentage points, and a standard deviation of 13.37 percentage
points. Its maximum lies much farther from the mean than its minimum, which suggests a right-skewed
distribution. For expenditures per student, the mean is about $30,060 with a standard deviation of
$15,463. When I assessed multicollinearity, no pair of variables had a correlation greater than .7, so
no variables were removed due to multicollinearity.
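
A minimal sketch of the kind of multicollinearity screen described above, using only the Python standard library. The column names and values are made up for illustration and are not the data set analyzed here. Comparing the absolute value of the correlation to .7, so that strong negative correlations are flagged too, is an assumption about how the threshold was applied.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for every predictor pair with |r| above threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up illustrative values, NOT the data analyzed in the text.
toy_predictors = {
    "median_sat": [1200, 1250, 1300, 1350, 1400],
    "acceptance_rate": [0.50, 0.30, 0.45, 0.25, 0.35],
    # Hypothetical redundant column: median_sat divided by 100.
    "sat_copy_scaled": [12.0, 12.5, 13.0, 13.5, 14.0],
}

for name_a, name_b, r in flag_collinear_pairs(toy_predictors):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

In this toy example only the deliberately redundant pair is flagged. A flagged pair would be a candidate for dropping one of the two variables before fitting the multiple regression.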
When I ran simple regression models of graduation percentage against each of the other variables, it
became apparent that there were problems with two of the predictors. First, for the percentage of
students in the top 10% of their high school class, the test failed to reject the null hypothesis of no
relationship at the 5% significance level, with a p-value of .3421 and an F-statistic under 1. The
residual plot also showed uneven scatter and did not appear centered on 0. This shows that this variable
does not have much of a linear relationship with graduation percentage. Expenditures per student did not
seem to have much of a relationship with graduation percentage either, with an F-statistic under 1 and a
p-value of .77. Its residual plot showed the same uneven scatter and off-center pattern. As a result of
the residual plot issues and the high p-values, I removed these two variables prior to multiple linear
regression.
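
To make the "F-statistic under 1" criterion concrete, here is a minimal sketch of a simple least-squares regression that reports the slope, R-squared, F-statistic, and mean residual. The data are made up for illustration and are not the data analyzed here. The function name and the returned keys are hypothetical. The p-value is omitted because it requires the F distribution, which the standard library does not provide.

```python
def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x with basic fit statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)           # unexplained variation
    sst = sum((b - mean_y) ** 2 for b in y)        # total variation
    ssr = sst - sse                                # explained variation

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": ssr / sst,
        # Explained mean square (1 df) over residual mean square (n - 2 df).
        "f_stat": ssr / (sse / (n - 2)),
        "mean_residual": sum(residuals) / n,
    }


# Made-up illustrative values, NOT the data analyzed in the text.
toy_x = [1, 2, 3, 4, 5, 6]
toy_y = [5, 7, 4, 8, 5, 7]

fit = simple_regression(toy_x, toy_y)
print({name: round(value, 4) for name, value in fit.items()})
```

An F-statistic below 1 means the predictor explains less variation than one residual degree of freedom does on average, which is what a predictor with essentially no linear relationship looks like. Note also that in a least-squares fit with an intercept the residuals average to zero by construction, so a residual plot that looks off-center usually points to curvature or uneven spread rather than a truly nonzero mean.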
Median SAT and acceptance rate were entered into a multiple linear regression, and both had p-values
under .05, allowing us to reject the null hypothesis of no relationship for each. The model had an
R-squared of .388, indicating that 38.8% of the variation in graduation percentage is explained by
median SAT and acceptance rate, which is not strong at all. The overall Significance F was far below
.05, so a fit this good would be very unlikely if neither variable were related to graduation
percentage. When I used the dummy variable method to bring school type (liberal arts versus university)
into the multiple regression, all coefficients were significant at the 5% significance level. The
F-statistic of 11.75 means that the variation explained per predictor is 11.75 times the unexplained
variation per residual degree of freedom. The R-squared value improved to .439, indicating a stronger
but still weak relationship. Thus, the best linear regression equation includes the dummy variable,
median SAT, and acceptance rate, but not the eliminated variables: Graduation % = 35.1 + 3.41 (Liberal
Arts) + .043 (Median SAT) - 20.95 (Acceptance Rate). Holding the other variables constant, this means
that every one-point increase in median SAT raises predicted graduation % by .043, liberal arts schools
have a predicted graduation % that is 3.41 points higher than universities, and, with acceptance rate
entered as a proportion (.38 for 38%), every 10-percentage-point increase in acceptance rate lowers
predicted graduation % by 2.095.
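
The relationship between the reported R-squared and F-statistic can be sketched as follows. The overall F-statistic is the explained variation per predictor divided by the unexplained variation per residual degree of freedom. The function names are hypothetical, and the count of three predictors (dummy variable, median SAT, acceptance rate) is taken from the equation above. The sample size is not stated in the text, so the sketch works backward from the reported figures to the residual degrees of freedom they imply.

```python
def f_statistic(r_squared, n_predictors, df_residual):
    """Overall F: explained share per predictor over unexplained share per residual df."""
    explained = r_squared / n_predictors
    unexplained = (1.0 - r_squared) / df_residual
    return explained / unexplained


def implied_residual_df(r_squared, f_stat, n_predictors):
    """Residual degrees of freedom implied by a reported R-squared and F-statistic."""
    return f_stat * n_predictors * (1.0 - r_squared) / r_squared


# Reported values for the model with the dummy variable.
R_SQUARED = 0.439
F_REPORTED = 11.75
N_PREDICTORS = 3

df_resid = implied_residual_df(R_SQUARED, F_REPORTED, N_PREDICTORS)
print(f"implied residual df: {df_resid:.1f}")

# Assumption: rounding the implied value to a whole number of degrees of freedom.
f_check = f_statistic(R_SQUARED, N_PREDICTORS, round(df_resid))
print(f"F recomputed from R-squared: {f_check:.2f}")
```

The recomputed F lands close to the reported 11.75, with the small gap attributable to rounding of R-squared. This is why a larger F goes hand in hand with a larger R-squared for a fixed number of predictors and observations.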
A curvilinear model proved superior to the linear multiple regression model. When squared terms for
median SAT and acceptance rate were added, the dummy variable and acceptance rate squared had p-values
that were too high, so both were eliminated. The resulting polynomial model (with median SAT, median SAT
squared, and acceptance rate) increased the R-squared to .5488 and the F-statistic to 18.25. The
residuals all checked out as well, and the model yielded the following equation: Graduation % = -904.74
+ 1.56 (Median SAT) - .0006 (Median SAT^2) - 26.43 (Acceptance Rate).
Because median SAT appears in both a linear and a squared term, the two coefficients must be read
together: the positive 1.56 and the negative .0006 imply that predicted graduation % rises with median
SAT at a decreasing rate, rather than by a fixed 1.56 per point. Holding median SAT constant and with
acceptance rate entered as a proportion, every 10-percentage-point increase in acceptance rate lowers
predicted graduation % by 2.643. For a median SAT score of 1210 and an acceptance rate of 23%, the
fitted model yields an anticipated graduation percentage of 86.35%. Other variables that could be
included to improve the prediction include amount of time spent on campus, average number of classes
taken at the university, and average win total of the college basketball team.
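
The polynomial prediction and the combined effect of the two SAT terms can be checked with a short sketch. It uses the rounded coefficients as printed, takes the acceptance-rate coefficient as negative (consistent with the negative relationship seen in the scatterplot), and assumes acceptance rate is entered as a proportion (.23 for 23%). The function names are hypothetical.

```python
# Rounded coefficients as printed in the polynomial equation.
INTERCEPT = -904.74
B_SAT = 1.56
B_SAT_SQ = -0.0006
B_ACCEPT = -26.43  # assumed negative; acceptance rate assumed to be a proportion


def predict_graduation(median_sat, acceptance_rate):
    """Predicted graduation % from the rounded polynomial equation."""
    return (INTERCEPT
            + B_SAT * median_sat
            + B_SAT_SQ * median_sat ** 2
            + B_ACCEPT * acceptance_rate)


def sat_marginal_effect(median_sat):
    """Change in predicted graduation % per extra SAT point at a given SAT level."""
    return B_SAT + 2 * B_SAT_SQ * median_sat


prediction = predict_graduation(1210, 0.23)
print(f"prediction from rounded coefficients: {prediction:.2f}")
print(f"marginal SAT effect at 1210: {sat_marginal_effect(1210):.3f}")

# How far the prediction moves if the squared-term coefficient changes by 0.00001.
print(f"shift per 0.00001 in squared-term coefficient: {0.00001 * 1210 ** 2:.2f}")
```

Evaluated this way, the rounded coefficients give about 98.3 rather than the reported 86.35. Because 1210 squared is about 1.46 million, each 0.00001 of rounding in the squared-term coefficient moves the prediction by roughly 14.6 points, so the reported figure was presumably computed from unrounded coefficients. The marginal effect at a median SAT of 1210 is about 0.11 graduation points per SAT point, far smaller than the 1.56 linear coefficient taken alone.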