
LEARNING TO USE REGRESSION ANALYSIS

β 3-1
Steps in Applied Regression Analysis
Step 1: Review literature and develop
theoretical model.
Step 2: Specify model: Select independent
variables and functional form.
Step 3: Hypothesize expected signs of coefficients.
Step 4: Collect data. Inspect and clean data.
Step 5: Estimate and evaluate equation.

Step 6: Document results.


β 3-2
Step 1: Review the Literature and
Develop the Theoretical Model

• Best data analysts start with theory!


• It’s smart to review scholarly literature before doing
anything else.
• Many approaches, but searching EconLit is helpful.
• When topic has not been studied, two strategies:
1. Transfer theory from a similar topic to your topic.
2. Consult someone who works in the area.

β 3-3
Step 2: Specify the Model: Select the Independent
Variables and Functional Form

• Most important step in applied regression analysis:
specification of theoretical model.

• Specifying a model involves choosing:


1. Independent variables and how they should
be measured.
2. Functional (mathematical) form of variables.
3. Properties of stochastic error term.

β 3-4
Step 2: Specify the Model (continued)
• Any mistake in these three components leads to
specification error, which can seriously damage the validity of the equation.
• Choose independent variables based on theory.
• Judgment must often be used and researchers impose
priors.

Example: Estimate demand equation for a good.
Theory suggests including prices of complements
and substitutes.
Which ones do you choose?

β 3-5
Step 3: Hypothesize the Expected
Signs of Coefficients

• Once variables are selected, hypothesize expected
signs of coefficients.
• Often, basic theory is general knowledge and
expected coefficient signs need no explanation.
• If there’s uncertainty, opposing theories should be
documented and your hypothesized sign explained.

β 3-6
Step 3: Hypothesize the Expected
Signs of Coefficients (continued)
Example: Impact of class size on student learning.
dependent variable:
Y= student score on grammar test
independent variables:
X1 = income level of student’s family
X2 = students per teacher

Notation with hypothesized signs above coefficients:


$$Y_i = \beta_0 + \overset{+}{\beta_1} X_{1i} + \overset{-}{\beta_2} X_{2i} + \epsilon_i \qquad (3.1)$$

β 3-7
Step 4: Collect the Data.
Inspect and Clean the Data

• Obtaining and preparing an original data set is difficult.

• General rule: the more observations the better.

• Reason there should be as many observations as
possible concerns the concept of degrees of freedom
(first mentioned in Section 2.4).
• With large number of degrees of freedom, every positive
error is likely balanced by a negative error.
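
As a rough illustration of the degrees-of-freedom idea, the sketch below uses the standard definition: number of observations minus number of estimated coefficients (slopes plus the constant term). It applies this to the two-variable class-size model of Equation 3.1. The sample sizes and the function name are hypothetical, chosen only for illustration.

```python
def degrees_of_freedom(n_observations, n_slopes):
    """Observations minus estimated coefficients (slopes plus the constant term)."""
    return n_observations - (n_slopes + 1)

# Hypothetical sample sizes for the class-size model, which has two slopes (X1, X2).
small_sample = degrees_of_freedom(n_observations=5, n_slopes=2)
large_sample = degrees_of_freedom(n_observations=30, n_slopes=2)

print(small_sample, large_sample)
```

With only a handful of observations, almost all of the information in the data is used up estimating the coefficients, which is one way to see why more observations are preferred.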

β 3-8
Step 4: Collect the Data.
Inspect and Clean the Data (continued)
• Another question: does unit of measurement of the
variables matter?
• Short answer: No, except in interpreting the scale of the coefficients.
Example: Independent variable is measured in dollars or
thousands of dollars.
• Constant term and measures of fit are unchanged.
• Slope coefficient of the variable changes by the exact
amount to compensate for the change in units.
• Variable measured in “thousands of $”: coefficient is 50
• Variable measured in “$”: coefficient is 0.05
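
The units point can be checked numerically. The sketch below fits a one-variable OLS line twice, once with the regressor in thousands of dollars and once in dollars. The data are made up for illustration, and `simple_ols` is a hypothetical helper written with the textbook one-regressor formulas rather than any particular package.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (constant, slope, r_squared) for a one-regressor OLS fit."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    constant = y_bar - slope * x_bar
    rss = sum((yi - constant - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return constant, slope, 1 - rss / tss

# Made-up data: a regressor in thousands of dollars and some outcome y.
x_thousands = [20, 35, 50, 65, 80]
y = [1090, 1870, 2590, 3365, 4095]

# Same regressor expressed in dollars.
x_dollars = [1000 * v for v in x_thousands]

b0_k, b1_k, r2_k = simple_ols(x_thousands, y)
b0_d, b1_d, r2_d = simple_ols(x_dollars, y)

print(f"thousands of $: constant={b0_k:.2f} slope={b1_k:.4f} R2={r2_k:.4f}")
print(f"dollars:        constant={b0_d:.2f} slope={b1_d:.6f} R2={r2_d:.4f}")
```

In this made-up data set the slope is about 50 in thousands of dollars and about 0.05 in dollars, while the constant term and the fit are the same in both runs.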
β 3-9
Step 4: Collect the Data.
Inspect and Clean the Data (continued)
• Always review data set for errors.
• Approaches:
• Plot the data and look for outliers.
• Look at mean, maximum and minimum of each
variable.
• Typically, data can be “cleaned” by replacing an
incorrect value with correct value.
• In extremely rare circumstances, drop an observation.
• BE CAREFUL! Mere existence of an outlier is not a
justification for dropping that observation.
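
One possible way to carry out these checks is sketched below: print the minimum, mean and maximum of a variable, then list observations that sit far from the mean so they can be checked against the original source. The income figures, the function names and the 2.5-standard-deviation cutoff are all assumptions made for illustration. A flagged value is a prompt to investigate, not a reason to drop the observation.

```python
from statistics import mean, stdev

def summarize(values):
    """Minimum, mean and maximum of one variable."""
    return {"min": min(values), "mean": mean(values), "max": max(values)}

def flag_for_review(values, z_cutoff=2.5):
    """Indices of observations more than z_cutoff sample standard deviations from the mean."""
    center, spread = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - center) > z_cutoff * spread]

# Made-up household incomes; the seventh entry looks like a data-entry slip.
incomes = [18500, 21000, 19750, 20400, 22100, 19900, 2050000, 20800, 21500, 19300]

print(summarize(incomes))
print(flag_for_review(incomes))  # indices to check against the original source
```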
β 3-10
Step 5: Estimate and Evaluate the Equation
• It can take months to complete steps 1–4!

• Once your equation is estimated, your work is not over.

• Rather, you need to evaluate.

• For example:
• How well did the equation fit the data?
• Were signs and magnitudes of coefficients expected?

• If evaluation indicates a problem, go back to step 1.
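
A common summary of how well an equation fits the data is R², the share of the variation in Y around its mean that the fitted values account for. The sketch below computes it from actual and fitted values. The numbers are made up and `r_squared` is a hypothetical helper name.

```python
def r_squared(actual, fitted):
    """1 minus (residual sum of squares / total sum of squares)."""
    y_bar = sum(actual) / len(actual)
    rss = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    tss = sum((y - y_bar) ** 2 for y in actual)
    return 1 - rss / tss

# Made-up actual values and fitted values from some estimated equation.
actual = [2.0, 4.0, 6.0, 8.0]
fitted = [2.5, 3.5, 6.5, 7.5]

fit = r_squared(actual, fitted)
print(round(fit, 4))
```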

β 3-11
Step 6: Document the Results
• A standard format usually used to present results:

$$\hat{Y}_i = 103.40 + \underset{(0.88)}{6.38}\,X_i \qquad (3.2)$$
$$t = 7.22 \qquad N = 20 \qquad R^2 = 0.73$$
• Number in parentheses is standard error of coefficient.
• t-statistic shown is the one used to test the hypothesis that the true
value of the coefficient is different from zero.
• It is also important to explain the model, assumptions
and document data manipulations in written report.
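
The reported t-statistic can be roughly reproduced from the other numbers in Equation 3.2, since a t-statistic for a zero null hypothesis is the estimated coefficient divided by its standard error. The sketch below uses the rounded figures from the slide, so it gives a value close to, but not exactly, the reported 7.22. The gap presumably reflects rounding of the displayed coefficient and standard error. The function name is hypothetical.

```python
def t_statistic(coefficient, standard_error, hypothesized_value=0.0):
    """(estimate - hypothesized value) / standard error."""
    return (coefficient - hypothesized_value) / standard_error

# Rounded values as displayed in Equation 3.2.
t_slope = t_statistic(coefficient=6.38, standard_error=0.88)
print(round(t_slope, 2))
```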
β 3-12
Example: Using Regression
Analysis to Pick Restaurant Locations
• You’re hired to determine the best location for the next
Woody’s restaurant (a moderately priced, 24-hour, family
restaurant chain).
• You decide to build a regression model to explain the
gross sales volume of each of the restaurants.

Step 1: Review the literature and develop theoretical
model.
• Read about restaurant industry.
• Talk to experts within the firm.

β 3-13
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 2: Specify the model: Select independent
variables and the functional form.
• You decide there are three major determinants of sales:

N = Competition: the number of direct market
competitors within a two-mile radius of the
Woody’s location
P = Population: the number of people living within a
three-mile radius of the Woody’s location
I = Income: the average household income of the
population measured in variable P
β 3-14
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 3: Hypothesize the expected signs of the
coefficients.
• N: More competition in area, the fewer customers the
location will have (negative).
• P: More people in area, the more customers the location
will have (positive).
• I: Unclear—probably positive but could be negative.

• Thus:
$$Y_i = \beta_0 + \overset{-}{\beta_N} N_i + \overset{+}{\beta_P} P_i + \overset{+?}{\beta_I} I_i + \epsilon_i \qquad (3.3)$$

β 3-15
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 4: Collect the data. Inspect and clean the data.
Table 3.1: Data for Woody’s Restaurant Example

β 3-16
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 5: Estimate and evaluate the equation.
• With software and the data set, you estimate:

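$$
\begin{array}{rcccc}
\hat{Y}_i = 102{,}192 & -\,9075\,N_i & +\,0.355\,P_i & +\,1.288\,I_i & \qquad (3.4) \\
 & (2053) & (0.073) & (0.543) & \\
 t = & -4.42 & 4.88 & 2.37 &
\end{array}
$$
$$N = 33 \qquad R^2 = 0.579$$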
• Estimated coefficients have expected signs.
• Overall fit seems reasonable.
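
To make the evaluation concrete, the sketch below stores the estimated coefficients from Equation 3.4, checks them against the signs hypothesized in Step 3, and uses the equation to predict sales for a candidate site. The candidate site's values (3 competitors, 100,000 people, income of 20,000) are hypothetical and are not taken from Table 3.1. The function and variable names are also assumptions.

```python
# Estimated coefficients as reported in Equation 3.4.
COEFFICIENTS = {"constant": 102192.0, "N": -9075.0, "P": 0.355, "I": 1.288}

# Signs hypothesized in Step 3 (+1 positive, -1 negative); I was "probably positive".
EXPECTED_SIGNS = {"N": -1, "P": +1, "I": +1}

def signs_match(coefficients, expected_signs):
    """True if every estimated slope has the hypothesized sign."""
    return all(
        (coefficients[name] > 0) == (sign > 0)
        for name, sign in expected_signs.items()
    )

def predict_sales(competitors, population, income, coefficients=COEFFICIENTS):
    """Predicted gross sales volume from the estimated equation."""
    return (
        coefficients["constant"]
        + coefficients["N"] * competitors
        + coefficients["P"] * population
        + coefficients["I"] * income
    )

# Hypothetical candidate location, for illustration only.
candidate = predict_sales(competitors=3, population=100_000, income=20_000)

print(signs_match(COEFFICIENTS, EXPECTED_SIGNS))
print(round(candidate))
```

Comparing predictions like this across candidate sites is one way the estimated equation could feed into the location decision.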

β 3-17
Example: Using Regression Analysis
to Pick Restaurant Locations (continued)
Step 6: Document the results.
• Equation 3.4 from Step 5 documents results—pulled from
statistical software output (like Table 3.2).

β 3-18

CHAPTER 3: the end
