
Regression Analysis
Regression analysis is almost certainly the most important tool at the
econometrician’s disposal.
Regression is concerned with describing and evaluating the
relationship between a given variable and one or more other
variables.
More specifically, regression is an attempt to explain movements in a
variable by reference to movements in one or more other variables.
Steps of Regression Analysis
Collect data on the variables in question
Specify the form of the equation relating the variables
Estimate the equation coefficients
Evaluate the accuracy of the equation
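A minimal sketch of these four steps in Python, assuming a simple linear specification; NumPy's polyfit stands in for any OLS routine, and the data is borrowed from the worked example later in these notes:

```python
import numpy as np

# Step 1: collect data on the variables in question
# (the x/y values from the worked example later in these notes)
x = np.array([15, 10, 25, 35, 45, 50], dtype=float)
y = np.array([50, 60, 35, 30, 25, 20], dtype=float)

# Step 2: specify the form of the equation: y = a + b*x (simple linear)
# Step 3: estimate the coefficients by ordinary least squares
b, a = np.polyfit(x, y, deg=1)   # polyfit returns [slope, intercept] for deg=1

# Step 4: evaluate the accuracy of the equation
y_hat = a + b * x
print(f"y_hat = {a:.2f} + {b:.2f}x")                      # y_hat = 64.36 + -0.92x
print("mean squared residual:", np.mean((y - y_hat) ** 2))
```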
Concepts
Population
◦ Sample
◦ In-Sample Bias
Types of Data
◦ Time Series Data
◦ Cross-Sectional Data
◦ Panel Data
Concepts
Ordinary Least-Squares Regression computes
coefficient values that give the smallest sum of
squared errors
Y = α + βX + ε – Population Equation
Ŷ = α̂ + β̂X – Estimated Equation (hats denote the OLS estimates; the residual e = Y - Ŷ estimates the unobservable error ε)
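To make "smallest sum of squared errors" concrete, the sketch below minimizes the SSE numerically over the two coefficients and checks that the result coincides with a standard OLS fit (NumPy and SciPy assumed available; the same illustrative data as above):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([15, 10, 25, 35, 45, 50], dtype=float)
y = np.array([50, 60, 35, 30, 25, 20], dtype=float)

def sse(params):
    """Sum of squared errors for candidate coefficients (alpha, beta)."""
    alpha, beta = params
    return np.sum((y - (alpha + beta * x)) ** 2)

numeric = minimize(sse, x0=[0.0, 0.0])         # generic numerical minimiser
beta_ols, alpha_ols = np.polyfit(x, y, deg=1)  # closed-form OLS for comparison

print(numeric.x)            # ~[64.36, -0.92]: the SSE-minimising pair
print(alpha_ols, beta_ols)  # the OLS estimates coincide
```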
Dependent vs Independent Variables
Names for y
◦ Dependent variable
◦ Regressand
◦ Effect variable
◦ Explained variable

Names for x
◦ Independent variables
◦ Regressors
◦ Causal variables
◦ Explanatory variables
Concepts
Best Fit Line
Intercept (α). If the curve in question is given as y = f(x), the y-
coordinate of the y-intercept is found by calculating f(0). Functions
which are undefined at x = 0 have no y-intercept.
α = Σy/n - β Σx/n (i.e. α = ȳ - βx̄)
Concepts
Slope (β). The slope of a line is a number that describes both the
direction and the steepness of the line.
◦ The direction of a line is either increasing, decreasing, horizontal or vertical.
◦ A line is increasing if it goes up from left to right. The slope is positive, i.e. m>0.
◦ A line is decreasing if it goes down from left to right. The slope is negative, i.e. m<0.
◦ If a line is horizontal the slope is zero. This is a constant function.
◦ If a line is vertical the slope is undefined
◦ The steepness, incline, or grade of a line is measured by the absolute value of
the slope. A slope with a greater absolute value indicates a steeper line
β = (nΣxy - (Σx)(Σy)) / (nΣx² - (Σx)²)
Example
x        y        xy        x²
15       50       750       225
10       60       600       100
25       35       875       625
35       30       1050      1225
45       25       1125      2025
50       20       1000      2500
∑x = 180   ∑y = 220   ∑xy = 5400   ∑x² = 6700
Concepts
β = (nΣxy - (Σx)(Σy)) / (nΣx² - (Σx)²)
β = (6 × 5400 - (180 × 220)) / ((6 × 6700) - 180²) = -7200 / 7800 = -0.9231

α = Σy/n - β Σx/n
α = 220 / 6 - (-0.9231 × (180 / 6)) = 64.359
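The same arithmetic in Python, reproducing the column sums and the two coefficient formulas (a minimal sketch; the variable names are my own):

```python
x = [15, 10, 25, 35, 45, 50]
y = [50, 60, 35, 30, 25, 20]
n = len(x)

sum_x = sum(x)                                   # 180
sum_y = sum(y)                                   # 220
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 5400
sum_x2 = sum(xi ** 2 for xi in x)                # 6700

beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha = sum_y / n - beta * sum_x / n
print(round(beta, 4), round(alpha, 3))           # -0.9231 64.359
```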
Example
[Scatter plot of y against x with the fitted regression line: f(x) = -0.92x + 64.36, R² = 0.94]
Concepts
Residuals / Error Term
◦ A statistical error (or disturbance) is the amount by which an observation
differs from its expected value, i.e. the population mean. For example, if the mean height in a
population of 21-year-old men is 1.75 meters, and one randomly chosen man
is 1.80 meters tall, then the "error" is 0.05 meters.
◦ A residual is the difference between the observed value of the dependent
variable (y) and the predicted value (ŷ). Consider the
previous example with men's heights and suppose we have a random sample
of n people. The sample mean could serve as a good estimator of the
population mean. Then:
◦ The difference between the height of each man in the sample and the unobservable population mean
is a statistical error, whereas
◦ The difference between the height of each man in the sample and the observable sample mean is a
residual.
Y      X      Ŷ = 5 + 9X    Residual (Y - Ŷ)
100    20     185           -85
120    30     275           -155
90     15     140           -50
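The residual column can be reproduced directly from its definition, using the table's fitted line Ŷ = 5 + 9X (a minimal sketch):

```python
# (observed y, x) pairs from the table above
data = [(100, 20), (120, 30), (90, 15)]

for y_obs, x_val in data:
    y_hat = 5 + 9 * x_val          # predicted value from the fitted line
    residual = y_obs - y_hat       # residual = observed - predicted
    print(x_val, y_obs, y_hat, residual)
# 20 100 185 -85
# 30 120 275 -155
# 15 90 140 -50
```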
Concepts
Total Sum of Squares (TSS): Σ(y - ȳ)², the total variation in y
Sum of Squared Residuals (SSR, also written RSS or SSE): Σ(y - ŷ)², the variation left unexplained
Explained Sum of Squares (ESS): Σ(ŷ - ȳ)², the variation explained by the regression
These are linked by the decomposition TSS = ESS + SSR, sketched in code below.
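A minimal sketch of the decomposition, reusing the fitted line from the earlier worked example (the identity TSS = ESS + SSR holds exactly at the OLS estimates, and up to rounding here):

```python
import numpy as np

x = np.array([15, 10, 25, 35, 45, 50], dtype=float)
y = np.array([50, 60, 35, 30, 25, 20], dtype=float)
y_hat = 64.359 - 0.9231 * x            # fitted line from the worked example

tss = np.sum((y - y.mean()) ** 2)      # total variation in y
ssr = np.sum((y - y_hat) ** 2)         # unexplained (residual) variation
ess = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the line
print(tss, ess + ssr)                  # equal up to coefficient rounding
```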
Concepts
Hypothesis. A testable claim about a population parameter, judged true or false against the data
◦ Null hypothesis (H₀: β = 0)
◦ Alternative hypothesis (H₁: β ≠ 0)

t Statistic (t = β / SE). The t statistic is the estimated coefficient divided by its
standard error. The standard error is an estimate of the standard deviation of the
coefficient, i.e. how much it would vary across repeated samples. It can be thought
of as a measure of the precision with which the regression coefficient is estimated;
as a rule of thumb, |t| greater than 1.96 (roughly 2) indicates significance at the 5% level.
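For a simple two-variable regression, scipy.stats.linregress reports the estimated slope, its standard error, and the p-value, so the t statistic can be checked directly (a minimal sketch on the running example data; SciPy assumed available):

```python
import numpy as np
from scipy.stats import linregress

x = np.array([15, 10, 25, 35, 45, 50], dtype=float)
y = np.array([50, 60, 35, 30, 25, 20], dtype=float)

res = linregress(x, y)
t_stat = res.slope / res.stderr   # t statistic = coefficient / standard error
print(res.slope, res.stderr)      # ~ -0.92, ~0.12
print(t_stat, res.pvalue)         # |t| well above 2, so significant at 5%
```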
Significance Level vs Confidence Level (a 5% significance level corresponds to a 95% confidence level)
Concepts
P Value. The probability of obtaining a result at least as extreme as the observed one if the null
hypothesis were true. If it is less than 5%, the null hypothesis is rejected, i.e. the result does not
fall within the range of what would happen 95% of the time by chance.
R Squared. R-squared is known as the coefficient of determination; it is derived from the
regression equation to quantify how well the model fits the observed data. The value of
R-squared ranges from 0 to 1 (0 to 100 percent).
R² = 1 - RSS / TSS
◦ If your model fits the observed dependent variable values perfectly, R-squared is 1.0 (and
you, no doubt, have made an error… perhaps you've used a form of y to predict y)
◦ More likely, you will see R-squared values like 0.49, for example, which you can interpret by
saying: this model explains 49% of the variation in the dependent variable
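A minimal sketch computing R² = 1 - RSS / TSS on the running example; it recovers the 0.94 reported with the fitted-line chart above:

```python
import numpy as np

x = np.array([15, 10, 25, 35, 45, 50], dtype=float)
y = np.array([50, 60, 35, 30, 25, 20], dtype=float)

slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

rss = np.sum((y - y_hat) ** 2)      # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
r_squared = 1 - rss / tss
print(round(r_squared, 2))          # 0.94: ~94% of the variation in y is explained
```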
