[Figure: statistical inference builds on statistical theory and descriptive sample statistics such as the mean x̄, the variance s², and the proportion p̂]
Hypothesis Testing
• Once the data are ready for analysis, i.e., out-of-range and missing
responses have been cleaned up and the goodness of the measures
has been established, the researcher is ready to test the hypotheses
already developed for the study with appropriate statistical
techniques.
Hypothesis Testing
• Allows a researcher to determine how safe it is to generalize beyond a
specific sample of data.
– Null hypothesis
– Alternative hypothesis
• Confidence level (known or given)
• Significance level (given)
• P-value (the probability of obtaining a statistic at least as extreme as
the one calculated from the sample)
• Decision rule:
– If the p-value is less than alpha, reject the null hypothesis.
– If the p-value is greater than alpha, retain the null
hypothesis.
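The decision rule above can be sketched with a small z-test. This is a minimal illustration, not the slide's own example: the sample values, null mean `mu0`, known `sigma`, and `alpha` below are all made up, and a z-test (rather than a t-test) is assumed because the population standard deviation is treated as known.

```python
from statistics import NormalDist, mean

# Hypothetical two-sided test of H0: mu = 50 vs. H1: mu != 50,
# assuming a known population standard deviation (z-test).
sample = [52.1, 49.8, 53.4, 51.2, 50.9, 52.7, 48.9, 51.5]
mu0, sigma, alpha = 50.0, 1.5, 0.05

n = len(sample)
z = (mean(sample) - mu0) / (sigma / n ** 0.5)
# p-value: probability of a statistic at least as extreme as the one observed
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = p_value < alpha        # the decision rule from the slide
print(round(z, 3), round(p_value, 4), reject_h0)
```

Here the p-value falls below alpha, so the null hypothesis is rejected; with a smaller sample mean it would be retained.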
Hypothesis Testing
• Example: economic model of crime. Hours spent in criminal activities
are modeled as a function of the "wage" of criminal activities, the wage
for legal employment, other income, the probability of getting caught,
the probability of conviction if caught, the expected sentence, and age.
– Functional form of relationship not specified
Simple linear regression model: y = β0 + β1x + u
– y: dependent variable, explained variable, response variable, …
– x: independent variable, explanatory variable, regressor, …
– u: error term, disturbance, unobservables, …
The Simple
Regression Model – Changes in y
• Interpretation of the simple linear regression model
Δy = β1Δx as long as Δu = 0
By how much does the dependent variable change if the independent
variable is increased by one unit? The interpretation is only correct if all
other things remain equal when the independent variable is increased
by one unit.
The Simple
Regression Model
• Example 1: Soybean yield and fertilizer
yield = β0 + β1·fertilizer + u
The error term u contains factors such as rainfall, land quality,
presence of parasites, …
β1 measures the effect of fertilizer on yield, holding all other
factors fixed.
• Example 2: Wage and education, where the error term contains
e.g. intelligence, …
[Figure: scatter plot of the data, x on the horizontal axis and y on the vertical axis (both 0 to 60), with the first and second observations labeled]
The Simple
Regression Model
• What does "as good as possible" mean? Minimize the sum of
squared regression residuals.
• Regression residuals: ûᵢ = yᵢ − ŷᵢ
• Fitted regression line: ŷ = β̂0 + β̂1x (β̂0 is the intercept)
If the return on equity increases by 1 percentage point,
then salary is predicted to change by $18,501
• Causal interpretation?
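The least-squares fit described above can be computed by hand with the standard formulas β̂1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β̂0 = ȳ − β̂1x̄. The data below are made up for illustration:

```python
# Minimal OLS sketch for y = b0 + b1*x: choose b0, b1 to minimize
# the sum of squared residuals (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar            # the fitted line passes through (xbar, ybar)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(round(b0, 3), round(b1, 3))
```

Note that the residuals of this fit sum to zero, one of the algebraic properties of OLS discussed on the next slide.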
The Simple
Regression Model
– Deviations from the regression line sum up to zero: Σ ûᵢ = 0
– Correlation between deviations and regressors is zero: Σ xᵢûᵢ = 0
– Sample averages of y and x lie on the regression line: ȳ = β̂0 + β̂1x̄
The Simple Regression Model –
Sum of Squares
• Goodness-of-Fit
"How well does the explanatory variable explain the dependent variable?"
• Measures of Variation
SST = Σᵢ (yᵢ − ȳ)²   SSR = Σᵢ (ŷᵢ − ȳ)²   SSE = Σᵢ (yᵢ − ŷᵢ)²   (sums over i = 1, …, n)
– Total sum of squares (SST): represents the total variation in the dependent variable
– Explained sum of squares (SSR): represents the variation explained by the regression
– Residual or error sum of squares (SSE): represents the variation not explained by the regression
The Simple
Regression Model
• Decomposition of total variation: SST = SSR + SSE
• Goodness-of-fit measure:
R² = SSR / SST = 1 − SSE / SST
R-squared measures the fraction of the total variation that is
explained by the regression.
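The decomposition SST = SSR + SSE and the two equivalent formulas for R² can be verified numerically; the data below are made up for illustration:

```python
# Sum-of-squares decomposition and R^2 for a simple OLS fit
# (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)                 # total variation
ssr = sum((yh - ybar) ** 2 for yh in yhat)             # explained
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))    # residual
r2 = ssr / sst                                         # = 1 - sse/sst
print(round(r2, 4))
```

Both formulas give the same R² precisely because SST = SSR + SSE.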
The Simple
Regression Model
• Examples: CEO salary and return on equity; wage and education
(the predicted change in wage if years of education are increased
by one year)
The Simple
Regression Model
• Fitted regression
For example:
The data are random and depend on the particular sample that has been drawn
• Interpretation of unbiasedness
– The estimated coefficients may be smaller or larger, depending on the
sample that is the result of a random draw
– However, on average, they will be equal to the values that
characterize the true relationship between y and x in the population
– "On average" means: if sampling were repeated, i.e., if drawing the
random sample and doing the estimation were repeated many times
– In a given sample, estimates may differ considerably from true values
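The repeated-sampling interpretation of unbiasedness can be simulated. In the sketch below the "true" population model y = 1.0 + 2.0·x + u is invented for illustration; each random sample gives a different slope estimate, but the estimates average out to the true slope:

```python
import random

# Repeated-sampling sketch of unbiasedness (illustrative true model).
random.seed(42)
TRUE_B0, TRUE_B1 = 1.0, 2.0

def ols_slope(xs, ys):
    # OLS slope for a simple regression of ys on xs
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))

estimates = []
for _ in range(2000):                       # draw many random samples
    xs = [random.uniform(0, 10) for _ in range(50)]
    ys = [TRUE_B0 + TRUE_B1 * x + random.gauss(0, 1) for x in xs]
    estimates.append(ols_slope(xs, ys))     # estimate varies sample to sample

avg = sum(estimates) / len(estimates)
print(round(avg, 3))                        # close to the true slope 2.0
```

Individual estimates scatter around 2.0, which is exactly the point of the slide: in a given sample the estimate may differ from the true value, but not systematically.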
The Simple
Regression Model
• Estimating the error variance
σ̂² = SSE / (n − 2)   (plug in σ̂ for the unknown σ)
The estimated standard deviations of the regression coefficients are called "standard errors".
They measure how precisely the regression coefficients are estimated.
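For the simple regression model, the standard error of the slope is se(β̂1) = σ̂ / √Σ(xᵢ − x̄)², with σ̂² = SSE/(n − 2). A minimal sketch on made-up data:

```python
import math

# Error variance estimate and standard error of the OLS slope
# (illustrative data; the n - 2 divisor is the degrees-of-freedom correction).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
sigma_hat = math.sqrt(sse / (n - 2))     # plug-in for the unknown sigma
se_b1 = sigma_hat / math.sqrt(sxx)       # standard error of the slope
print(round(sigma_hat, 4), round(se_b1, 4))
```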
The Multiple
Regression Model
Introduction to Regression Analysis
Multiple Regression
Analysis: Estimation
• Definition of the multiple linear regression model:
y = β0 + β1x1 + β2x2 + … + βkxk + u
– y: dependent variable, explained variable, response variable, …
– x1, …, xk: independent variables, explanatory variables, regressors, …
– u: error term, disturbance, unobservables, …
Multiple Regression
Analysis: Estimation
• Motivation for multiple regression
– Incorporate more explanatory factors into the model
– Explicitly hold fixed other factors that would otherwise end up in the error term
– Allow for more flexible functional forms
Example: grade point average at college (colGPA) regressed on high school grade point average (hsGPA) and achievement test score (ACT)
• Interpretation
– Holding ACT fixed (“Controlling for…”), another point on high school grade point average is
associated with another .453 points college grade point average
– Or: If we compare two students with the same ACT, but the hsGPA of student A is one point
higher, we predict student A to have a colGPA that is .453 higher than that of student B
– Holding high school grade point average fixed, another 10 points on ACT are associated
with less than one tenth of a point on college GPA
Multiple Regression
Analysis: Estimation
• Example: Explaining arrest records
Number of times arrested in 1986 regressed on the proportion of prior
arrests that led to conviction, months spent in prison in 1986, and
quarters employed in 1986
• Interpretation:
– Proportion of prior arrests: an increase of 0.5 changes the predicted number
of arrests by −.150(0.5) = −.075, i.e., 7.5 fewer arrests per 100 men
– Months in prison: 12 more months change the predicted number of arrests
by −.034(12) = −.408 for a given man
– Quarters employed: one more quarter changes the predicted number of
arrests by −.104, i.e., 10.4 fewer arrests per 100 men
Multiple Regression
Analysis: Estimation
• OLS Estimation of the multiple regression model
• Random sample
• Regression residuals
• Remarks on MLR.3
– The assumption only rules out perfect collinearity/correlation between
explanatory variables; imperfect correlation is allowed
– If an explanatory variable is a perfect linear combination of other
explanatory variables it is superfluous and may be eliminated
– Constant variables are also ruled out (collinear with intercept)
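Why perfect collinearity is ruled out can be seen numerically: if one regressor is an exact linear function of another, the (demeaned) X'X matrix of the normal equations is singular, so no unique OLS solution exists. A minimal sketch with invented data:

```python
# Perfect collinearity sketch (MLR.3): x2 is an exact linear function
# of x1, so the OLS normal equations have no unique solution.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]          # x2 = 2*x1: perfectly collinear

n = len(x1)
x1bar, x2bar = sum(x1) / n, sum(x2) / n
s11 = sum((a - x1bar) ** 2 for a in x1)
s22 = sum((b - x2bar) ** 2 for b in x2)
s12 = sum((a - x1bar) * (b - x2bar) for a, b in zip(x1, x2))

# Determinant of the 2x2 demeaned X'X matrix; zero means singular
det = s11 * s22 - s12 ** 2
print(det)
```

With imperfect correlation the determinant is merely small, not zero: the estimates exist but become imprecise, which is the multicollinearity problem.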
Detecting Multicollinearity
– A comparison between the R-squared values of both models would be unfair
to the first model because the first model contains fewer parameters
– In the given example, even after adjusting for the difference in degrees of
freedom, the quadratic model is preferred
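The degrees-of-freedom adjustment mentioned above is the adjusted R², which penalizes extra parameters so that models of different size can be compared. The R² values and sample size below are invented for illustration:

```python
# Adjusted R^2 sketch: R^2 never falls when parameters are added,
# so compare models via 1 - (1 - R^2)(n - 1)/(n - k - 1) instead.
def adj_r2(r2, n, k):
    # k = number of slope parameters (excluding the intercept)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 50
linear_r2, quadratic_r2 = 0.70, 0.74     # hypothetical fits
print(round(adj_r2(linear_r2, n, 1), 4),
      round(adj_r2(quadratic_r2, n, 2), 4))
```

In this made-up comparison the quadratic model keeps the higher adjusted R², i.e., its extra parameter earns its keep.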
Variable Selection:
Stepwise Regression
The user first identifies the response, y, and the set of potentially
important independent variables, x1, x2, …, xk, where k is generally
large. The response and independent variables are then entered
into the statistical software, and the stepwise procedure
(also known as mixed selection) begins.
Linear probability
model (LPM)
• The F statistic has an associated p-value that allows you to carry out the
test easily; it is reported in most regression outputs.
Example: Estimation and
Prediction: CI vs. PI
Homework Assignment (see the R script!)