
MicroEconometrics

Pavel Čížek
(P.Cizek@uvt.nl)

Fall 2016

Econometrics Slide 1
Introduction

Martin Salm (Room K 642, Email: M.Salm@uvt.nl)
Pavel Čížek (Room K 641, Email: P.Cizek@uvt.nl)

• MicroEconometrics
  ◦ linear and nonlinear regression models
  ◦ estimation techniques for single-equation models

• Main book:
  ◦ A. C. Cameron and P. K. Trivedi (2005) Microeconometrics:
    Methods and Applications, Cambridge University Press.

• Classes: 14 lectures + 4 assignments

• Exam: 80% written exam, 20% assignments

Econometrics Slide 2
Course structure

Econometric models
• linear models and causality
• duration models
• nonlinear models for discrete or limited responses

Methodology
• linear models and their estimation
• maximum likelihood and generalized method of moments
• non- and semiparametric estimation

Econometrics Slide 3
Outline

Topics in the second half of the course

• parameter estimation of reduced-form models
  ◦ including binary-choice models

• nonparametric and semiparametric estimation
  ◦ including binary-choice models

• discrete and limited response models

• sample selection models

Econometrics Slide 4
Reduced-form models

Estimation methodology is designed for reduced-form models with
given statistical properties; for example,

      yi = x⊤i β + εi

• basic assumption: E(εi|xi) = 0
• full-rank condition: E(xi x⊤i ) > 0
• the model does not restrict what is yi and what is xi;
  economic modeling determines what is yi and what is xi

Without further structure, it is possible to study E(yi|xi) = x⊤i β,
that is, to
• estimate the parameters β
• test their significance or hypotheses regarding β
• state claims about correlations, not causality

Econometrics Slide 5
Structural models

Models can have not only statistical, but also economic structure
describing how economic behavior, institutions, and laws affect the
relationship between the variables yi and xi; for example,

• the ith firm’s production yi, labor input li, and capital ki can be
  related by the (deterministic) Cobb-Douglas production function:

      yi = Ai · li^α · ki^β

  ◦ α and β are interpreted as elements of the production function
  ◦ its economic validity can be studied: are firms operating
    efficiently under state ownership or under regulators?

• reduced form: ln yi = ln A + α ln li + β ln ki + εi, where εi
  ◦ captures measurement errors
  ◦ contains an unobservable part of the technology Ai

Is this the only possible reduced form?

Econometrics Slide 6
Reduced-form versus structural models

Structural models
• relate to economic theory
• facilitate interpretation

Reduced-form models
• account for heterogeneity
• facilitate estimation

Identification
• For a given structural model, is there only one reduced-form
  model?
• Does a given reduced-form model correspond to multiple
  structural models?
• Do all considered structural models render the same values
  of (some) parameters in the reduced-form model?
  (e.g., consider variation in Ai [in]dependent of i, li, or ki)
Econometrics Slide 7
Parametric estimation methods

Estimation
• Method of moments
• GMM
• Maximum likelihood
• General MLE
• Comparison
• Quasi-MLE
• Quantile regression
• Asymptotics

Econometrics Slide 8
Method of moments - linear regression

Random sample (x1, y1), ..., (xn, yn) following the model

      yi = x⊤i β + εi

• first conditional moment of εi: E(εi|xi) = 0
• unconditional moment equation:
  xi E(εi|xi) = 0 ⇒ E{E(xi εi|xi)} = E(xi εi) = 0
• population equation: E(xi εi) = E{xi (yi − x⊤i β)} = 0
• sample analog equation (to be solved):

      n⁻¹ Σi=1..n xi (yi − x⊤i β) = n⁻¹ Σi=1..n xi yi − n⁻¹ Σi=1..n xi x⊤i β = 0

• solution

      β̂n = [n⁻¹ Σi=1..n xi x⊤i ]⁻¹ [n⁻¹ Σi=1..n xi yi]

Econometrics Slide 9
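The sample analog equation above can be solved in a few lines; a minimal sketch with simulated data (the design, sample size, and true β are illustrative choices, not from the slides):

```python
import numpy as np

# Method of moments for y_i = x_i' beta + eps_i:
# solve the sample analog of E[x_i (y_i - x_i' beta)] = 0.
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])  # intercept + one regressor
beta_true = np.array([-2.0, 1.0])
y = X @ beta_true + rng.normal(size=n)

# beta_hat = [n^-1 sum x_i x_i']^-1 [n^-1 sum x_i y_i]
beta_hat = np.linalg.solve(X.T @ X / n, X.T @ y / n)
```

The 1/n factors cancel, so this is numerically the ordinary least squares solution.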
Generalized method of moments

Generalized method of moments (GMM): define
• data wi (e.g., wi = (yi, xi)⊤ = (yi, ki, li)⊤) of sample size n
• parameters of interest θ ∈ Θ ⊆ Rp; the true value θ0 solves the
• moment conditions g(wi, θ) : Rp → Rk such that

      E{g(wi, θ0)} = 0

  (e.g., g(wi, θ) = xi (yi − x⊤i θ) for E[xi (yi − x⊤i θ)] = 0)

The GMM estimator minimizes with respect to θ

      Qn(θ) = [n⁻¹ Σi=1..n g(wi, θ)]⊤ Wn [n⁻¹ Σi=1..n g(wi, θ)]

• k × k weighting matrix Wn, possibly random (estimated)

Econometrics Slide 10
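A minimal sketch of the GMM objective Qn(θ) for the linear-regression moment condition, with Wn the identity matrix and simulated data (all numbers illustrative). With as many moments as parameters, the minimizer coincides with the method-of-moments solution:

```python
import numpy as np
from scipy.optimize import minimize

# GMM with g(w_i, theta) = x_i (y_i - x_i' theta) and W_n = I.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([0.5, 2.0])
y = X @ theta_true + rng.normal(size=n)

def Q_n(theta, W=np.eye(2)):
    gbar = X.T @ (y - X @ theta) / n   # (1/n) sum_i g(w_i, theta)
    return gbar @ W @ gbar             # quadratic form with weight W_n

theta_gmm = minimize(Q_n, x0=np.zeros(2), method="BFGS").x
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)  # exactly identified benchmark
```

In over-identified problems (k > p), the choice of Wn matters; the efficient two-step estimator would plug in an estimate of the inverse moment covariance.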
Maximum likelihood estimation

Linear regression model: yi = x⊤i β0 + εi, where εi ~ N(0, σ²) iid
• the distribution of yi is known conditionally on xi:

      yi = x⊤i β0 + εi ⇒ yi|xi ~ N(x⊤i β0, σ²)

• the likelihood contribution is the value of the density φ of yi|xi
• the likelihood is the conditional density of {yi}i=1..n given {xi}i=1..n

      Ln(β, σ²) = f(y1, ..., yn|x1, ..., xn; β, σ²)
                = Πi=1..n φ(yi|xi; β, σ²)
      ln Ln(β, σ²) = Σi=1..n ln φ(yi|xi; β, σ²)

(the densities of N(0, σ²) and N(0, 1) are denoted φσ(·) and φ(·))

Econometrics Slide 11
Maximum likelihood estimation

• the normal density function of yi|xi ~ N(x⊤i β, σ²)

      φ(yi|xi; β, σ²) = (2πσ²)^(−1/2) exp{−(yi − x⊤i β)²/(2σ²)}

• the log-likelihood function for yi|xi ~ N(x⊤i β, σ²)

      ln Ln(β, σ²) = Σi=1..n l(yi|xi; β, σ²)
                   = Σi=1..n ln φ(yi|xi; β, σ²)
                   = −(1/2) Σi=1..n [ln(2π) + ln(σ²) + (yi − x⊤i β)²/σ²]

• maximizing this objective function is equivalent to least squares

Econometrics Slide 12
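The equivalence with least squares can be verified numerically; a sketch that maximizes the normal log-likelihood above over (β, ln σ²) on simulated data (all values illustrative) and compares the result with the direct least-squares solution:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
y = X @ np.array([1.0, -3.0]) + 0.5 * rng.normal(size=n)

def neg_loglik(params):
    beta, log_s2 = params[:2], params[2]   # log-parameterization keeps sigma^2 > 0
    resid = y - X @ beta
    # minus ln L_n(beta, sigma^2) from the slide above
    return 0.5 * np.sum(np.log(2 * np.pi) + log_s2 + resid**2 / np.exp(log_s2))

beta_mle = minimize(neg_loglik, x0=np.zeros(3), method="BFGS").x[:2]
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
```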
Maximum likelihood estimation – general case

Maximum likelihood estimation (MLE): define
• data wi = (yi, xi)⊤ (e.g., (yi, xi)⊤ = (yi, ki, li)⊤) of size n
• assume the true conditional density f(yi|xi; θ0) is known up to
  some parameters of interest θ ∈ Θ ⊆ Rp
  (e.g., θ = (β, σ²)⊤ based on the model parameters β and
  the parameters of the density of the errors εi)
• identification: the true parameter θ0 maximizes E[ln f(yi|xi; θ)]
• log-likelihood for observation i: l(wi, θ) = ln f(yi|xi; θ)
• log-likelihood function: ln Ln(θ) = n⁻¹ Σi=1..n l(wi, θ)
• maximum likelihood estimate

      θ̂n = arg maxθ ln Ln(θ) = arg maxθ n⁻¹ Σi=1..n l(wi, θ)

Econometrics Slide 13
Comparison

Assuming correct moment equations, GMM is
• consistent and asymptotically normal
• linear regression: E(εi|xi) = 0

Assuming the correct parametric distribution, MLE is
• consistent, asymptotically normal, and
  can be asymptotically efficient with asymptotic variance I⁻¹(θ), where
  I(θ) = E[−∂² ln f(yi|xi; θ)/∂θ∂θ⊤]
• linear regression: εi ~ N(0, σ²)

The assumptions can be
• relatively weak in the case of GMM
• rather strict in the case of MLE

Inference – in both cases, one can apply, for instance,
• the Wald, likelihood ratio, Lagrange multiplier, and t-tests

Econometrics Slide 14
Quasi-maximum likelihood estimation

Linear regression model: yi = x⊤i β0 + εi, where εi ~ N(0, σ²) iid
• the distribution of yi is known conditionally on xi:

      yi = x⊤i β0 + εi ⇒ yi|xi ~ N(x⊤i β0, σ²)

• the likelihood is the conditional density of {yi}i=1..n given {xi}i=1..n

      ln Ln(β, σ²) = Σi=1..n ln φ(yi|xi; β, σ²)

• the quasi-MLE estimator solves the first-order conditions

      ∂ ln Ln(β, σ²)/∂(β⊤, σ²)⊤ = Σi=1..n ∂ ln φ(yi|xi; β, σ²)/∂(β⊤, σ²)⊤ = 0

  ◦ these have the form of moment conditions
  ◦ they define an estimator that is consistent even without εi ~ N(0, σ²) iid
Econometrics Slide 15
Quasi-maximum likelihood estimation

• the normal density function of yi|xi ~ N(x⊤i β, σ²)

      φ(yi|xi; β, σ²) = (2πσ²)^(−1/2) exp{−(yi − x⊤i β)²/(2σ²)}

• the log-likelihood function for yi|xi ~ N(x⊤i β, σ²)

      ln Ln(β, σ²) = −(1/2) Σi=1..n [ln(2π) + ln(σ²) + (yi − x⊤i β)²/σ²]

• the first-order conditions

      ∂ ln Ln(β, σ²)/∂β = σ⁻² Σi=1..n (yi − x⊤i β) xi = 0

• solving this equation is equivalent to the least squares estimator

Econometrics Slide 16
Quasi-maximum likelihood estimation

• the Laplace density function of yi|xi ~ DExp(x⊤i β, 1)

      φ(yi|xi; β) = (1/2) exp{−|yi − x⊤i β|}

• the log-likelihood function for yi|xi ~ DExp(x⊤i β, 1)

      ln Ln(β) = Σi=1..n [ln(1/2) − |yi − x⊤i β|]

• the first-order conditions (defining the quasi-MLE)

      ∂ ln Ln(β)/∂β = 2 Σi=1..n [1/2 − I(yi − x⊤i β ≥ 0)] xi = 0

• this objective function defines the least absolute deviation estimator

Econometrics Slide 17
Quantile regression

Linear regression model: yi = x⊤i β0 + εi

• least squares minimizes Σi=1..n (yi − x⊤i β)² and is consistent
  if E(εi|xi) = 0
  ⇒ identifies the expectation E(yi|xi) = x⊤i β
• the least absolute deviation estimator minimizes

      Σi=1..n (1/2)|yi − x⊤i β| = Σi=1..n |1/2 − I(yi − x⊤i β ≤ 0)| · |yi − x⊤i β|

  and is consistent if med(εi|xi) = 0
  ⇒ identifies the conditional median med(yi|xi) = x⊤i β
• the quantile regression estimator minimizes

      Σi=1..n |τ − I(yi − x⊤i β ≤ 0)| · |yi − x⊤i β|

  and is consistent if Qτ(εi|xi) = 0
  ⇒ identifies the conditional quantile Qτ(yi|xi) = x⊤i β

Econometrics Slide 18
Quantile regression

Quantile regression (QR) in the linear regression model
• QR is based on the assumption Qτ(εi|xi) = 0
• QR identifies the conditional quantile Qτ(yi|xi) = x⊤i β
• the QR estimator minimizes

      Σi=1..n ρτ(yi − x⊤i β)

  where the check function ρτ(z) = [τ − I(z < 0)] · z
• case τ = 1/2: median regression or least absolute deviation
  estimation, as ρτ(z) = |z|/2

Econometrics Slide 19
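The check-function objective above can be minimized directly; an illustrative sketch on simulated data (design and sample size are arbitrary choices). A derivative-free method is used because ρτ is not differentiable at zero; dedicated linear-programming algorithms are what production QR software actually uses:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 800
X = np.column_stack([np.ones(n), rng.uniform(0, 2, n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)  # symmetric errors: median = mean

def check_loss(beta, tau):
    z = y - X @ beta
    # rho_tau(z) = (tau - I(z < 0)) * z, summed over observations
    return np.sum((tau - (z < 0)) * z)

# tau = 1/2: median regression (least absolute deviations up to a factor 2)
beta_med = minimize(check_loss, x0=np.zeros(2), args=(0.5,),
                    method="Nelder-Mead").x
```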
Quantile regression – simulated examples

[Figure: scatter plots of Y against X for simulated normal data and
log-normal data; the lower panels repeat each scatter plot with
fitted regression quantile lines.]

Econometrics Slide 20
Quantile regression – simulated examples

[Figure: scatter plots of Y against X for simulated normal data and
heteroscedastic data; the lower panels repeat each scatter plot with
fitted regression quantile lines.]
Econometrics Slide 21
Quantile regression: Engel curve

foreach q of numlist 0.25 0.5 0.75 {
    qreg foodexp income, quantile(`q')
}
-------------------------------------------------
library(quantreg)
qr <- rq(foodexp ~ income, tau=c(0.25,0.5,0.75))
summary(qr)

Coefficients:
            tau= 0.25  tau= 0.50  tau= 0.75
(Intercept) 95.4835396 81.4822474 62.3965855
income       0.4741032  0.5601806  0.6440141

Econometrics Slide 22
Quantile regression: Engel curve

[Figure: food expenditure plotted against household income,
with the mean (LSE) fit and the median (LAE) fit.]

Econometrics Slide 23
Quantile regression

If in the linear regression model yi = x⊤i β0(τ) + εi with
Qτ(εi|xi) = 0
• the data form a random sample (yi, xi)i=1..n
• the conditional distribution functions Fi(yi|xi) are absolutely
  continuous with continuous densities fi(yi|xi) uniformly
  bounded away from 0 and ∞ at Qτ(yi|xi), i = 1, ..., n
• the matrices D0 = E(xi x⊤i ) and
  D1(τ) = E{fi(Qτ(yi|xi)) xi x⊤i } are positive definite

then the quantile regression estimator β̂nQR(τ) is consistent and

      √n [β̂nQR(τ) − β0(τ)] →d N(0, τ(1 − τ) D1⁻¹ D0 D1⁻¹)

Econometrics Slide 24
Quantile regression

Buchinsky (1998) Recent Advances in Quantile Regression Models:
A Practical Guideline for Empirical Research. The Journal of Human
Resources 33(1), 88–126.

• properties of the quantile regression estimator
• computation of the quantile regression estimator
• inference and tests based on quantile regression
  (tests of homoscedasticity, symmetry)
• application to Current Population Survey data (1973–1993)
• censored quantile regression (discussed later)

Econometrics Slide 25
Binary choice models

• Binary choice
• Probit and logit
• MLE
• Marginal effects
• Measures of fit
• Application
• Heteroscedasticity
• Simulation
• Semiparametrics
• MSC
• Single index
• Semiparametric LS
• Klein and Spady
• Implementation
• Average derivative
• Outlook

Econometrics Slide 26
Introduction to binary choice models

Binary choice = binary response: a single discrete decision that can
be characterized by the values 0 and 1
• traditionally, y = 1 = “yes, success” and y = 0 = “no, failure”
• examples: labor force participation, university education, foreign
  direct investment, public vs. private transport, ...

Typically derived from a structural model for the latent variable y∗
• y∗ represents monetary utility, profit, ...
• example: seller = price − purchasing value of an object,
  buyer = (monetary) utility from the object − price
  (nontrivial in the cases of education, job, ...)

Factors influencing the decision are available as
• explanatory variables x = (x1, ..., xp)⊤

Econometrics Slide 27
Introduction to binary choice models

Typically derived from a structural model for the latent variable y∗
• two choices a and b with choice characteristics za and zb
• individual characteristics w
• choice a: utility Ua = w⊤δa + za⊤γa + εa
• choice b: utility Ub = w⊤δb + zb⊤γb + εb

Utilities Ua and Ub are unobservable; only the choice is observed:
• choose a if Ua ≥ Ub ⇔ y∗ = Ua − Ub ≥ 0: denoted y = 1
• choose b if Ua < Ub ⇔ y∗ = Ua − Ub < 0: denoted y = 0
• Ua − Ub = w⊤(δa − δb) + za⊤γa − zb⊤γb + εa − εb = x⊤β + ε
• the regression model characterizes the expectation (for individual i)

      E(yi|xi) = P(yi = 1|xi) · 1 + P(yi = 0|xi) · 0 = P(yi = 1|xi)
               = P(yi∗ = Uia − Uib ≥ 0|xi)
               = P(x⊤i β + εi ≥ 0|xi) = P(x⊤i β + εi ≥ 0|x⊤i β)

Econometrics Slide 28
Introduction to binary choice models

What can be identified in
• choice a: utility Ua = w⊤δa + za⊤γa + εa?
• choice b: utility Ub = w⊤δb + zb⊤γb + εb?

The reduced-form model E(yi|xi) = P(x⊤i β + εi ≥ 0|x⊤i β)
corresponds to the difference in utilities Ua − Ub:

      y∗ = w⊤(δa − δb) + za⊤γa − zb⊤γb + εa − εb = x⊤β + ε

• δa and δb cannot be identified separately,
  only their difference δa − δb can be identified
• what about identification of γa and γb (often assumed γa = γb)?
  ◦ if za and zb contain different variables/quantities
  ◦ if za and zb contain common variables/quantities
  ◦ do we have a choice?

Econometrics Slide 29
Probit and logit

Suppose that the latent utility yi∗ follows the linear model

      yi∗ = x⊤i β0 + εi,   εi ~ F

• the observed response is binary (decision, choice, success, ...):
  yi = I(yi∗ > 0) = I(x⊤i β + εi > 0)
• εi is symmetrically distributed and has zero mean: Eεi = 0
• identification by normalization: σ² = var εi = 1, since
  yi = I(x⊤i β + εi > 0) = I(x⊤i β/σ + εi/σ > 0),
  so the scale σ is not identified
• regression function if εi ~ F:

      E(yi|xi) = 1 · P(yi = 1|xi) + 0 · P(yi = 0|xi)
      P(yi = 1|xi) = P(yi∗ > 0|xi) = P(x⊤i β0 + εi > 0|xi)
                   = P(x⊤i β0 > −εi|xi) = F(x⊤i β0)
                   [= P(−x⊤i β0 < εi|xi) = 1 − F(−x⊤i β0)]

Econometrics Slide 30
Probit and logit

Binary-choice model P(yi = 1|xi) = F(x⊤i β)
• F is completely specified (does not depend on parameters)
• probit: F is the standard normal distribution function

      F(t) ≡ Φ(t) = ∫−∞..t φ(s) ds

  (σ normalized to 1)
• logit: F is the (standard) logistic distribution function with
  location parameter 0 and scale parameter 1

      F(t) ≡ Λ(t) = exp(t)/{1 + exp(t)} = 1/{1 + exp(−t)}

  (σ normalized to π/√3 ≈ 1.814)

Econometrics Slide 31
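The logistic normalization π/√3 can be checked by simulation; a small sketch drawing standard-logistic variates via the inverse CDF Λ⁻¹(u) = ln{u/(1 − u)} (sample size chosen arbitrarily). This scale difference is why logit slope coefficients are roughly 1.6–1.8 times the probit ones:

```python
import numpy as np

# Standard deviation of the standard logistic distribution is pi/sqrt(3).
rng = np.random.default_rng(4)
u = rng.uniform(size=1_000_000)
logistic_draws = np.log(u / (1 - u))   # inverse CDF of Lambda(t)
sd_logistic = logistic_draws.std()
target = np.pi / np.sqrt(3)            # ~ 1.814
```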
Maximum likelihood estimation

MLE for probit and logit: P(yi = 1|xi) = F(x⊤i β), F known
• identification also requires E(xi x⊤i ) to be non-singular:
  P(x⊤i β ≠ x⊤i β0) > 0 implies P[F(x⊤i β) ≠ F(x⊤i β0)] > 0
  if F is strictly monotonic and completely specified
• likelihood contribution:

      L(β|yi, xi) = P(yi = 1|xi)^yi P(yi = 0|xi)^(1−yi)
                  = F(x⊤i β)^yi {1 − F(x⊤i β)}^(1−yi)

• log-likelihood contribution:
  l(yi, xi; β) = yi ln F(x⊤i β) + (1 − yi) ln{1 − F(x⊤i β)}
• log-likelihood function:

      ln Ln(β) = Σi=1..n [yi ln F(x⊤i β) + (1 − yi) ln{1 − F(x⊤i β)}]

Econometrics Slide 32
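The probit log-likelihood above can be maximized numerically; a minimal sketch on simulated data (the design and true coefficients are illustrative, and canned routines such as Stata's probit or R's glm would be used in practice):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

def neg_loglik(beta):
    p = norm.cdf(X @ beta)                 # F = Phi for the probit
    p = np.clip(p, 1e-10, 1 - 1e-10)       # guard against log(0)
    # minus ln L_n(beta) from the slide above
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_hat = minimize(neg_loglik, x0=np.zeros(2), method="BFGS").x
```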
Probit Φ(x⊤i β̂n) and logit Λ(x⊤i β̂n): coronary heart disease

probit chd age; predict probit, p
--------------------------------------------------
z <- glm(chd ~ age, family=binomial(link="probit"))
z$fitted.values

[Figure: evidence of coronary heart disease (1 = yes, 0 = no) against
age (in years), with the fitted probit and logit probability curves.]

Econometrics Slide 33
Probit and logit: coronary heart disease data

probit chd age
---------------------------------------------------
         chd |     Coef.  Std. Err.     z     P>|z|
-------------+-------------------------------------
         age |  .0651086   .0133894    4.86   0.000
       _cons | -3.117323    .624082   -5.00   0.000
---------------------------------------------------

logit chd age
---------------------------------------------------
         chd |     Coef.  Std. Err.     z     P>|z|
-------------+-------------------------------------
         age |   .109732   .0242318    4.53   0.000
       _cons | -5.259665   1.139163   -4.62   0.000
---------------------------------------------------
Ratio of slope coefficients: 1.685
Econometrics Slide 34
Interpretation – marginal effects

      pj(x) = ∂P(yi = 1|xi = x)/∂xij = ∂F(x⊤β)/∂xij = f(x⊤β)βj

(pj(x) is auxiliary/temporary notation)
• pj(x) depends on f = F′, but the ratios pj(x)/pk(x) do not
• probit: pj(x) = φ(x⊤β)βj (= 0.399βj at x⊤β = 0)
• logit: pj(x) = λ(x⊤β)βj (= 0.25βj at x⊤β = 0)

Marginal effects are
• reported at the average x̄ = Σi=1..n xi/n, i.e., pj(x̄), or as
• average marginal effects Σi=1..n f(x⊤i β̂n)β̂nj/n

Econometrics Slide 35
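A small sketch of the two reporting conventions for a probit: the marginal effect evaluated at the sample average versus the average marginal effect. The coefficient vector and design below are illustrative stand-ins for a fitted model:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
beta_hat = np.array([-0.5, 1.0])   # hypothetical fitted probit coefficients

# p_j(x-bar) = phi(x-bar' beta) * beta_j, here for the slope (j = 2)
me_at_mean = norm.pdf(X.mean(axis=0) @ beta_hat) * beta_hat[1]
# average marginal effect: mean over observations of phi(x_i' beta) * beta_j
ame = np.mean(norm.pdf(X @ beta_hat)) * beta_hat[1]
```

The two numbers generally differ because φ is nonlinear, which is why software such as Stata's margins lets the user choose.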
Interpretation – marginal effects: coronary heart disease

probit chd age; margins, dydx(age) at(age=(20 40 60))
--------+----------------------------------------
age  20 | .0050015  .0024204  2.07  0.039
at   40 | .0227723  .0041655  5.47  0.000
     60 | .0190241  .0023136  8.22  0.000
-------------------------------------------------

logit chd age; margins, dydx(age) at(age=(20 40 60))
--------+----------------------------------------
age  20 | .0046731  .001931   2.42  0.016
at   40 | .0228294  .0042952  5.32  0.000
     60 | .0182116  .0025315  7.19  0.000
-------------------------------------------------

Econometrics Slide 36
Measures of fit

How to measure fit in binary-choice models?
• percentage correctly predicted (PCP) = Σi=1..n I[yi = ŷi]/n,
  where ŷi = I{F(x⊤i β̂n) > 0.5}
  ◦ misleading if one response is rarely observed
  ◦ the threshold 0.5 is not suitable if P(yi = 1|xi) is always low/high
• pseudo-R² = 1 − ln Ln(β̂n)/ln Ln(β̃n),
  where β̃n is the intercept-only estimate (slopes set to zero)
• other measures exist, but interpretation is more important:
  ◦ marginal effects at the average, pj(x̄)
  ◦ average marginal effects Σi=1..n pj(xi)/n
  ◦ correct predictions per category (yi = 1 and yi = 0)
  ◦ cross-tabulation of yi versus ŷi

Econometrics Slide 37
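Both measures are straightforward to compute once fitted probabilities are available; a sketch for a probit on simulated data (the coefficients stand in for estimates, all numbers illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_hat = np.array([0.2, 1.0])    # hypothetical fitted probit coefficients
y = (X @ beta_hat + rng.normal(size=n) > 0).astype(float)

p = np.clip(norm.cdf(X @ beta_hat), 1e-10, 1 - 1e-10)

# percentage correctly predicted with threshold 0.5
pcp = np.mean(y == (p > 0.5))

def loglik(prob):
    return np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob))

# pseudo-R^2 relative to an intercept-only fit (predicted prob = y-bar)
pseudo_r2 = 1 - loglik(p) / loglik(np.full(n, y.mean()))
```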
Example: coronary heart disease data

. estat classification
                     -------- True --------
Classified |         D          ~D |      Total
-----------+-----------------------+-----------
     +     |        28          12 |         40
     -     |        14          45 |         59
-----------+-----------------------+-----------
     Total |        42          57 |         99
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   21.05%
False - rate for true D         Pr( -| D)   33.33%
False + rate for classified +   Pr(~D| +)   30.00%
False - rate for classified -   Pr( D| -)   23.73%
--------------------------------------------------
Correctly classified                        73.74%
--------------------------------------------------
Econometrics Slide 38
Application

A. van Soest (1995) Structural models of family labor supply, Journal
of Human Resources 30(1), 63–88.

• model of the labor supply of couples forming households
• labor supply of the man and the woman discretized (25–36 choices)
• imperfectly predictable wages and hours restrictions implemented
• estimation via simulated maximum likelihood

Econometrics Slide 39
Distributional assumptions and heteroscedasticity

Latent linear model with heteroscedasticity:

      yi∗ = x⊤i β + εi

• conditional mean E(εi|xi) = 0
• conditional variance var(εi|xi) = var(εi|xi0) ≠ const. σ²
  ◦ generally an unknown function of xi0 = xi without the intercept
  ◦ a parametric form can be assumed to facilitate estimation, e.g.,
    var(εi|xi) = exp(α + x⊤i0 γ)
• estimation under heteroscedasticity, var(εi|xi0) ≠ const. σ²:
  ◦ linear regression model: ordinary LS remains consistent
  ◦ binary-choice model (εi ~ N(0, exp(x⊤i0 γ)) with α = 0):
    the “homoscedastic” maximum likelihood estimator is
    inconsistent if γ ≠ 0
Econometrics Slide 40
Simulated linear regression

set obs 200                 | n <- 200
                            |
gen eps = rnormal()         | eps <- rnorm(n)
gen x = 5*runiform()        | x <- 5*runif(n)
gen y = -2+x+eps            | y <- -2+x+eps
gen yhet = -2+x+eps*x/5     | yhet <- -2+x+eps*x/5
                            |
reg y x                     | z <- lm(y ~ x)
predict linpred, xb         | linpred <- z$fitted.values
reg yhet x                  | z <- lm(yhet ~ x)
predict hetpred, xb         | hetpred <- z$fitted.values

Econometrics Slide 41
Simulated linear regression

[Figure: simulated responses y and y [hetero] plotted against x,
together with the fitted values from both least-squares regressions.]
Econometrics Slide 42
Simulated probit regression

set obs 200                   | n <- 200
                              |
gen eps = rnormal()           | eps <- rnorm(n)
gen x = 5*runiform()          | x <- 5*runif(n)
gen y = -2+x+eps > 0          | y <- -2+x+eps > 0
gen yhet = -2+x+eps*x/5 > 0   | yhet <- -2+x+eps*x/5 > 0
                              |
probit y x                    |
predict linpred, pr           |
probit yhet x                 |
predict hetpred, pr           |
---------------------------------------------------
z <- glm(y ~ x, family=binomial(link="probit"))
linpred <- z$fitted.values
z <- glm(yhet ~ x, family=binomial(link="probit"))
hetpred <- z$fitted.values
Econometrics Slide 43
Simulated probit regression

Introduction

[Figure: scatter of y against x with fitted probit probabilities Pr(y) for the homoscedastic and heteroscedastic simulated samples; legend: y, Pr(y); y [hetero], Pr(y) [hetero]]

Discrete choice

Censored data

Final thoughts
Econometrics Slide 44
Heteroscedasticity – estimation

Binary-choice model if εi|xi ∼ N(0, exp(x⊤i0 γ))

    P(yi = 1|xi) = Φ{x⊤i β / exp(x⊤i0 γ/2)}

• xi0 does not contain an intercept
• x⊤i β and x⊤i0 γ could contain different variables
• proof as for the standard probit:
    P(yi = 1|xi) = P(x⊤i β0 + εi > 0|xi) = P(x⊤i β0 > −εi|xi)
                 = P(x⊤i β0 / exp(x⊤i0 γ/2) > −εi / exp(x⊤i0 γ/2)|xi)
                 = Φ{x⊤i β0 / exp(x⊤i0 γ/2)}
• more flexible than the standard probit (recall how to estimate!)
• complicated marginal effects (interpretation!)
    pj(x) = ∂P(yi = 1|xi = x)/∂xij
          = φ{x⊤i β exp(−x⊤i0 γ/2)} · exp(−x⊤i0 γ/2) [βj − (γj/2)(x⊤i β)]
Final thoughts
Econometrics Slide 45
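The probability and marginal-effect formulas above can be cross-checked numerically. A minimal Python sketch (Python only for illustration — the course material uses Stata/R; the coefficient values are made up, and xi0 is taken equal to xi, ignoring the no-intercept caveat):

```python
import math

def Phi(t):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def phi(t):   # standard normal pdf
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def het_probit_prob(x, beta, gamma):
    # P(y=1|x) = Phi{ x'beta / exp(x'gamma / 2) }
    xb = sum(a * b for a, b in zip(x, beta))
    xg = sum(a * b for a, b in zip(x, gamma))
    return Phi(xb / math.exp(xg / 2.0))

def het_probit_me(x, beta, gamma, j):
    # p_j(x) = phi{x'b exp(-x'g/2)} exp(-x'g/2) [b_j - (g_j/2) x'b]
    xb = sum(a * b for a, b in zip(x, beta))
    s = math.exp(-sum(a * b for a, b in zip(x, gamma)) / 2.0)
    return phi(xb * s) * s * (beta[j] - gamma[j] / 2.0 * xb)

beta, gamma = [0.4, -0.3], [0.2, 0.1]   # illustrative values
x = [1.0, 0.5]
h = 1e-6
num = (het_probit_prob([x[0] + h, x[1]], beta, gamma)
       - het_probit_prob([x[0] - h, x[1]], beta, gamma)) / (2 * h)
ana = het_probit_me(x, beta, gamma, 0)  # should match the finite difference
```

The analytic marginal effect agrees with a central finite difference up to O(h²), which confirms the chain-rule derivation on the slide.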
Probit and heteroscedasticity: coronary heart disease data

Introduction

Estimation probit chd age


Binary choice ---------------------------------------------------
Binary choice
Probit and logit
chd | Coef. Std. Err. z P>|z|
MLE
Marginal effects
-------------+-------------------------------------
Measures of fit age | .0651086 .0133894 4.86 0.000
Application
Heteroscedasticity _cons | -3.117323 .624082 -5.00 0.000
⊲ Simulation
Semiparametrics
---------------------------------------------------
MSC hetprob chd age, het(age)
Single index
Semiparametric LS ---------------------------------------------------
Klein and Spady
Implementation chd | Coef. Std. Err. z P>|z|
Average derivative
Outlook
-------------+-------------------------------------
Density estimation
chd: age | .053903 .04073 1.32 0.186
Regression est. _cons | -2.60493 1.87666 -1.39 0.165
Semiparametrics -------------+-------------------------------------
Discrete choice lnsigma2:age | -.004551 .017336 -0.26 0.793
Censored data ---------------------------------------------------
Final thoughts
Econometrics Slide 46
Semiparametric alternatives

Introduction
Maximum likelihood estimator
Estimation

Binary choice
• can be asymptotically normal and efficient
Binary choice
Probit and logit
• requires strict distributional assumptions
MLE
Marginal effects
• for example, it is inconsistent
Measures of fit
Application
Heteroscedasticity
◦ and highly sensitive to heteroscedasticity
Simulation
◦ and insensitive to misspecification of symmetric unimodal
⊲ Semiparametrics
MSC distribution function
Single index
Semiparametric LS
Klein and Spady
Implementation Semiparametric estimation
Average derivative
Outlook
• methods of estimation that do not rely
Density estimation
on parametric assumptions about the shape
Regression est.
of the error term distribution
Semiparametrics

Discrete choice

Censored data

Final thoughts
Econometrics Slide 47
Maximum score estimation

Introduction
Maximum score estimator (MSE) by Manski (1985)
Estimation

Binary choice
    β̂nMSE = arg maxβ (1/n) Σi=1..n [yi I(x⊤i β ≥ 0) + (1 − yi) I(x⊤i β < 0)]
          = arg minβ (1/n) Σi=1..n |yi − I(x⊤i β > 0)|
⊲ MSC
Single index
Semiparametric LS
Klein and Spady • weak distributional assumptions (med(εi |xi ) = 0)
Implementation
Average derivative applicable under any F and unobserved heteroscedasticity
Outlook
• identification up to a scale as in probit, estimation of
Density estimation
med(yi|xi) = med I(x⊤i β + εi > 0) = I(x⊤i β > 0)
Regression est.

Semiparametrics • slow convergence rate n^{1/3}


Discrete choice • Horowitz (1992): smoothed MSE with rate n^{1/2−δ} for δ > 0
Censored data
• mostly applied as auxiliary estimator (e.g., Hausman test...)
Final thoughts
Econometrics Slide 48
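A toy illustration of the score objective (a Python sketch with simulated data, not part of the course code; the uniform error is deliberately non-Gaussian but has conditional median zero, and β1 = 1 imposes the scale normalization):

```python
import random

def score(b2, data):
    # fraction of correct sign predictions for beta = (1, b2):
    # the maximum-score objective of Manski (1985)
    return sum((y == 1) == (x1 + b2 * x2 >= 0) for (x1, x2), y in data) / len(data)

random.seed(1)
data = []
for _ in range(400):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    eps = random.uniform(-1, 1)              # med(eps|x) = 0, not normal
    data.append((x, 1 if x[0] - x[1] + eps > 0 else 0))

# grid search over the free coefficient; the true value is -1
grid = [i / 50.0 for i in range(-150, 151)]
b2_hat = max(grid, key=lambda b: score(b, data))
```

The sample score function is a step function (hence the grid search and the slow n^{1/3} rate); the maximizer is only set-valued, so any point of the maximizing plateau is returned.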
Single-index models

Introduction
Single-index model:
Estimation

Binary choice
Binary choice
E(Yi |Xi = x) = g(x⊤ β)
Probit and logit
MLE
Marginal effects • covers linear models, binary-choice models, ...
Measures of fit
Application • restricts the form of heteroscedasticity
Heteroscedasticity
Simulation
Semiparametrics
Identification conditions, assuming unknown g : R → R
MSC
⊲ Single index • g is differentiable and not constant on the support of Xi⊤ β
Semiparametric LS
Klein and Spady • Xi has continuously distributed components and its support is
not contained in any proper linear subspace of Rp
Implementation
Average derivative
Outlook
• no intercept and β1 = 1 (location and scale normalization):
Density estimation
g ∗ (x⊤ β) = g(γ + δ · x⊤ β) if g ∗ (t) = g(γ + δt)
Regression est.

Semiparametrics
• coefficient values of discrete variables cannot divide the support
Discrete choice of Xi⊤ β into disjoint subsets (otherwise, g must not be periodic)
Censored data

Final thoughts
Econometrics Slide 49
Semiparametric LS

Introduction
Semiparametric least squares: Ichimura (1993)
Estimation
Nonlinear least squares for E(yi|xi) = g(x⊤i β): g is known

    minβ∈B Σi=1..n {yi − g(x⊤i β)}²
Measures of fit
Application
Heteroscedasticity Semiparametric least squares: g is unknown
Simulation
Semiparametrics • estimate the regression function
MSC
Single index
    g(x⊤i β) = E(Yi|Xi⊤β = x⊤i β) = E(Yi|x⊤i β) by ĝn(x⊤i β)
⊲ Semiparametric LS
Klein and Spady • maximize the sum of least squares to get β̂n from
Implementation
Average derivative
    minβ∈B Σi=1..n {yi − ĝn(x⊤i β)}²
Semiparametrics

Discrete choice
and then estimate ĝn(x⊤i β̂n)
Censored data

Final thoughts
Econometrics Slide 50
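The two-step idea — estimate ĝn on the index by kernel regression, then minimize the least-squares criterion over β — can be sketched as follows (Python for illustration; g = tanh, the Gaussian kernel, the bandwidth h = 0.3, the leave-one-out device, and the grid search are all choices made for this example, not part of Ichimura's definition):

```python
import math, random

def nw(t0, pairs, h):
    # Nadaraya-Watson estimate of E(y | index = t0) with a Gaussian kernel
    num = den = 0.0
    for t, y in pairs:
        w = math.exp(-0.5 * ((t - t0) / h) ** 2)
        num += w * y
        den += w
    return num / den

def sls(b2, data, h=0.3):
    # semiparametric LS criterion: sum_i {y_i - g_hat(x_i' b)}^2,
    # with g_hat a leave-one-out kernel regression on the index
    idx = [(x1 + b2 * x2, y) for x1, x2, y in data]
    return sum((y - nw(t, idx[:i] + idx[i + 1:], h)) ** 2
               for i, (t, y) in enumerate(idx))

random.seed(2)
data = []
for _ in range(120):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    y = math.tanh(x1 - 0.5 * x2) + 0.1 * random.gauss(0, 1)  # g unknown to the estimator
    data.append((x1, x2, y))

grid = [i / 20.0 for i in range(-20, 21)]    # beta1 normalized to 1; true beta2 = -0.5
b2_hat = min(grid, key=lambda b: sls(b, data))
```

Leaving observation i out of its own fit prevents the criterion from rewarding trivial interpolation of yi.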
Klein and Spady

Introduction
Klein and Spady (1993): estimate F and maximize likelihood based
Estimation
on the estimated distribution function F̂n
Binary choice
Binary choice
Probit and logit
• binary response: F(Xi⊤β) = P(Yi = 1|Xi⊤β) = E(Yi|Xi⊤β)
MLE
Marginal effects
• parametric log-likelihood function:
Measures of fit
    ln Ln(β) = Σi=1..n [yi ln F(x⊤i β) + (1 − yi) ln{1 − F(x⊤i β)}]
MSC
Single index
Semiparametric LS
⊲ Klein and Spady
• estimate F(x⊤i β) = P(Yi = 1|x⊤i β) = E(Yi|x⊤i β)
  by F̂n(x⊤i β), maximize likelihood wrt. β
Average derivative
Outlook
    Σi=1..n [yi ln F̂n(x⊤i β) + (1 − yi) ln{1 − F̂n(x⊤i β)}]

to get β̂n and then estimate F̂n(x⊤i β̂n)
Final thoughts
Econometrics Slide 51
Implementation: Binary-choice model

Introduction

Estimation probit <- function(beta, x, y)


Binary choice {
Binary choice
Probit and logit
p <- pnorm(x %*% beta)
MLE
Marginal effects
logl <- -1 * (y == 0) * log(1 - p) +
  -1 * (y == 1) * log(p)
Application
Heteroscedasticity
Simulation
Semiparametrics
return(sum(logl))
MSC }
Single index
Semiparametric LS
Klein and Spady
⊲ Implementation # assume the data are
Average derivative
Outlook
# Y is n x 1 vector for dependent variable
Density estimation
# X is n x p matrix for explanatory variables
Regression est.

Semiparametrics z <- optim(double(p), probit, x=X, y=Y,


Discrete choice method="BFGS")
Censored data print(c("Parameter estimates:",z$par))
Final thoughts
Econometrics Slide 52
Implementation: Binary-choice model

Introduction

Estimation program define probit


Binary choice args lnf Xb
Binary choice
Probit and logit
MLE
Marginal effects
quietly replace `lnf' = ln(normal(`Xb')) if $ML_y1==1
quietly replace `lnf' = ln(normal(-`Xb')) if $ML_y1==0
MSC end
Single index
Semiparametric LS
Klein and Spady
⊲ Implementation * assume the data are
Average derivative
Outlook
* y is the dependent variable
Density estimation
* x1, x2 are the explanatory variables
Regression est.

Semiparametrics ml model lf probit (y = x1 x2)


Discrete choice ml init 1 0 0, copy
Censored data ml maximize
Final thoughts
Econometrics Slide 53
Implementation: Klein and Spady

Introduction

Estimation KS <- function(beta, x, y)


Binary choice {
Binary choice
Probit and logit
# originally p <- pnorm(x %*% beta)
MLE
Marginal effects
beta <- c(0, beta, 1)
index <- x %*% beta
# ksmooth returns fits sorted by x.points: map them back to the data order
p <- ksmooth(index, y, "normal", bandwidth = 0.5,
             x.points = index)$y[rank(index, ties.method = "first")]
logl <- -1 * (y == 0) * log(1 - p) +
  -1 * (y == 1) * log(p)
MSC
Single index
Semiparametric LS return(sum(logl))
Klein and Spady
⊲ Implementation }
Average derivative
Outlook

Density estimation
# assume the data are
Regression est. # Y is n x 1 vector for dependent variable
Semiparametrics # X is n x p matrix for explanatory variables
Discrete choice z <- optim(double(p-2), KS, x=X, y=Y, method="BFGS")
Censored data print(c("Parameter estimates:",z$par))
Final thoughts
Econometrics Slide 54
Implementation: Klein and Spady

Introduction

Estimation program define KS


Binary choice args lnf Xb
Binary choice
Probit and logit
tempvar prob
MLE
Marginal effects
Measures of fit lpoly $ML_y1 ‘Xb’, gen(‘prob’)
Application
Heteroscedasticity
Simulation
Semiparametrics
quietly replace `lnf' = ln(`prob') if $ML_y1==1
quietly replace `lnf' = ln(1-`prob') if $ML_y1==0
⊲ Implementation
Average derivative
Outlook
end
Density estimation

Regression est. ml model lf KS (y = x1 x2, noconst offset(x1))


Semiparametrics ml init 0, copy
Discrete choice ml maximize
Censored data

Final thoughts
Econometrics Slide 55
Average derivative

Direct estimation of single index model E(Yi |Xi ) = g(Xi⊤ β)


Introduction

Estimation
(Powell, Stock, and Stoker, 1989; Härdle and Stoker, 1989)
Binary choice
Binary choice
Probit and logit • denoting m(x) = E(Yi|Xi = x) and f the density of Xi

    m′(x) = ∂g(x⊤β)/∂x = g′(x⊤β) β  ⇒  E{m′(Xi)} = γβ

    E{m′(Xi)} = ∫ m′(x) f(x) dx = − ∫ m(x) f′(x) dx
              = − ∫ m(x) {f′(x)/f(x)} f(x) dx = −E{Yi f′(Xi)/f(Xi)}

⊲ Average derivative
• estimate f and f′ by kernel density estimator

    γβ̂ = − Σi=1..n yi f̂′n(xi) / {n f̂n(xi)}
Final thoughts
Econometrics Slide 56
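A univariate Python sketch of the density-weighted idea (illustrative only: m(x) = 2x so that E{m′(Xi)} = 2; the bandwidth and the trimming of small estimated densities are ad-hoc choices, and both the smoothing and the trimming bias the estimate somewhat toward zero):

```python
import math, random

def kde_and_deriv(x, xs, h):
    # Gaussian-kernel estimates of the density f(x) and its derivative f'(x)
    f = fd = 0.0
    c = 1.0 / (len(xs) * h * math.sqrt(2.0 * math.pi))
    for xi in xs:
        t = (xi - x) / h
        k = math.exp(-0.5 * t * t)
        f += c * k
        fd += c * k * t / h      # d/dx of K((xi - x)/h) brings out a factor t/h
    return f, fd

random.seed(3)
xs = [random.gauss(0.0, 1.0) for _ in range(800)]
ys = [2.0 * x + 0.3 * random.gauss(0.0, 1.0) for x in xs]   # m(x) = 2x

# -(1/n) sum_i y_i f'(x_i)/f(x_i), as a trimmed mean over points where f_hat
# is not too small (the usual practical safeguard against dividing by ~0)
total, used = 0.0, 0
for x, y in zip(xs, ys):
    f, fd = kde_and_deriv(x, xs, h=0.2)
    if f > 0.02:
        total += y * fd / f
        used += 1
ade = -total / used
```

For a standard normal design f′(x)/f(x) = −x, so the population target of −E{Yi f′(Xi)/f(Xi)} is E{2Xi²} = 2; the estimate recovers roughly that value.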
Outlook

Introduction
What are benefits of semiparametric procedures
Estimation

Binary choice
• Estimation under less restrictive assumptions
Binary choice
Probit and logit
• Ability to compute probabilities without distributional
MLE
assumptions: estimate
Marginal effects
Measures of fit
Application
Heteroscedasticity
F (Xi⊤ β) = P (Yi = 1|Xi⊤ β) = E(Yi |Xi⊤ β)
Simulation
Semiparametrics
MSC • Ability to compute marginal effects:
Single index
Semiparametric LS
estimate
Klein and Spady
Implementation
    F′(Xi⊤β) = ∂P(Yi = 1|Xi⊤β)/∂Xi = ∂E(Yi|Xi⊤β)/∂Xi
⊲ Outlook
Density estimation

Regression est. • Requires estimation of densities, distribution functions,


Semiparametrics conditional expectations, and their derivatives
Discrete choice

Censored data

Final thoughts
Econometrics Slide 57
Outlook

Introduction
probit chd age | z <- glm(chd ~ age,
Estimation
predict ind, xb | family=binomial(link="probit"))
Binary choice
Binary choice predict prob, pr | r <- locpoly(z$fitted.values, chd)
Probit and logit
MLE
lpoly chd ind, ci | lines(r)
Marginal effects addplot((line prob ind, sort))
Measures of fit
Application

[Figure: local polynomial smooth of evidence of coronary heart disease (1=yes, 0=no) against the linear prediction, with 95% CI and the probit Pr(chd); kernel = epanechnikov, degree = 0, bandwidth = .5, pwidth = .76]
Censored data

Final thoughts
Econometrics Slide 58
Application

Introduction
Gerfin (1996) Parametric and semiparametric estimation of the
Estimation
binary-response models of labor market participation. Journal of
Binary choice
Binary choice Applied Econometrics 11, 321–339.
Probit and logit
MLE • labor force participation of Swiss and German women
Marginal effects
Measures of fit • parametric and semiparametric estimators compared
Application
Heteroscedasticity
Simulation
Semiparametrics
MSC
Single index
Semiparametric LS
Klein and Spady
Implementation
Average derivative
⊲ Outlook

Density estimation

Regression est.

Semiparametrics

Discrete choice

Censored data

Final thoughts
Econometrics Slide 59
Introduction

Estimation

Binary choice

⊲ Density estimation
Introduction
Motivation
Histogram
Local histogram
Kernel estimator
Related methods Nonparametric density
Kernel and band.
Bias and variance estimation
Bandwidth choice
Plug-in methods
Asymptotics
Confidence intervals
Confidence bands
Testing
Multivariate density

Regression est.

Semiparametrics

Discrete choice

Censored data

Final thoughts

Econometrics Slide 60
Introduction

Introduction
• Parametric regression:
Estimation

Binary choice ◦ concentrates on estimating E(Y |X)


Density estimation
⊲ Introduction
◦ functional and distributional assumptions
Motivation
Histogram
Local histogram
• Semiparametric methods:
Kernel estimator
Related methods ◦ preserve parametric structure for parameters of interest
Kernel and band.
Bias and variance ◦ auxiliary parameters (e.g., the error distribution) are estimated
Bandwidth choice
Plug-in methods without specific parametric assumptions
Asymptotics
Confidence intervals
Confidence bands • Nonparametric function estimation: unconstrained
Testing
Multivariate density
◦ density estimation
Regression est.

Semiparametrics
◦ regression estimation
Discrete choice ◦ curse of dimensionality
Censored data

Final thoughts

Econometrics Slide 61
Motivation

Introduction
Probability density function can
Estimation

Binary choice
• capture and demonstrate stylized facts
Density estimation (e.g., development of income distribution)
Introduction
⊲ Motivation • describe an unknown distribution
Histogram
Local histogram
(e.g., of an estimation procedure in finite samples)
Kernel estimator
Related methods
• help in parametric inference
Kernel and band. (e.g., asymptotic variance of LAD depends on f (0))
Bias and variance
Bandwidth choice
Plug-in methods
Asymptotics • conditional moment estimation:
Confidence intervals
Confidence bands E(Y|X = x) = ∫ y f(x, y)/f(x) dy in regression
Testing
Multivariate density • conditional distribution function estimation:
Regression est. P(Y ≤ t|X = x) = E[I(Y ≤ t)|X = x]
Semiparametrics • derivative of density function
Discrete choice
by differentiating a density estimator
Censored data

Final thoughts

Econometrics Slide 62
Motivation

Introduction
Parametric approach
Estimation

Binary choice
• assume form parametrized by a number of parameters
Density estimation
Introduction
⊲ Motivation
    f(x|µ, σ) = (1/(√(2π) σ)) exp{−(1/2) ((x − µ)/σ)²}
Histogram
Local histogram
Kernel estimator
Related methods
Kernel and band. • estimate µ and σ
• set fˆ(x) = f (x|µ̂, σ̂)
Bias and variance
Bandwidth choice
Plug-in methods
Asymptotics
Confidence intervals
Confidence bands Nonparametric approach
Testing
Multivariate density
• do not assume specific form or parameters
Regression est.
• impose smoothness of the density function
Semiparametrics

Discrete choice
• estimate a general density function
Censored data • example: histogram
Final thoughts

Econometrics Slide 63
Net income example

Introduction
Net income in the U.K. from 1969 to 1983:
Estimation
nonparametric and parametric density estimates
Binary choice

Density estimation
Introduction
⊲ Motivation
Histogram
[Figure: two surface plots, “Kernel density” and “Log-normal density”, of U.K. net income over the years 1971–1981]

Discrete choice

Censored data

Final thoughts

Econometrics Slide 64
Histogram

Introduction
Estimate density f (observations x1 , . . . , xn ∼ F )
Estimation

Binary choice • select origin x0 and bin width h


Density estimation • construct intervals (bins): Ij = [x0 + (j − 1)h, x0 + jh)
Introduction
Motivation • set fj = Σi=1..n I(xi ∈ Ij)/(nh) for each interval
⊲ Histogram
⊲ Histogram
Local histogram • example: observations from χ²(5) with x0 = 0 and h = 1
Kernel estimator
Related methods
[Figure: histogram of the simulated χ²(5) sample; vertical axis in units of 10⁻²]
Econometrics Slide 65
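The construction above in a few lines of Python (illustrative; bins here are indexed from the origin x0, so j is zero-based rather than the slide's one-based j):

```python
import random

def histogram_density(xs, x0, h):
    # bin heights f_j = #{x_i in I_j} / (n h), with I_j = [x0 + j h, x0 + (j+1) h)
    counts = {}
    for x in xs:
        j = int((x - x0) // h)
        counts[j] = counts.get(j, 0) + 1
    n = len(xs)
    return {j: c / (n * h) for j, c in counts.items()}

random.seed(3)
xs = [random.gauss(0.0, 1.0) for _ in range(2000)]
f = histogram_density(xs, x0=-5.0, h=0.5)

# the heights integrate to one: sum_j f_j * h = 1 by construction
total = sum(v * 0.5 for v in f.values())
```

With x0 = −5 and h = 0.5 the bin [0, 0.5) has index j = 10; its height estimates the average N(0,1) density over that bin (≈ 0.38).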
Histogram – properties

Introduction
Mathematical explanation
Estimation

Binary choice
• probability of “falling” into interval Ij = [x0 + (j − 1)h, x0 + jh]
Density estimation
Introduction
Motivation
⊲ Histogram
    P(X ∈ Ij) = ∫_{Ij} f(x) dx ≈ f{x0 + (j − 1/2)h} h
Local histogram
Kernel estimator
Related methods
Kernel and band.
• estimate of the density
Bias and variance
Bandwidth choice
Plug-in methods
Asymptotics
    f̂{x0 + (j − 1/2)h} ≈ (1/h) P(X ∈ Ij) ≈ (1/(hn)) Σi=1..n I(xi ∈ Ij)
Confidence intervals
Confidence bands
Testing
Multivariate density Properties
Regression est.
• step function
Semiparametrics

Discrete choice
• bias ∼ h: fˆ{x0 + (j − 1/2)h} used for all x ∈ Ij
Censored data • variance ∼ 1/nh: all data in Ij used
Final thoughts • dependence on origin and on bin width h
Econometrics Slide 66
Histogram: simulated example

Introduction
100 histograms for 500 observations simulated from N (0, 1)
Estimation

Binary choice

[Figure: four panels of 100 overlaid histogram estimates with bin widths h = 0.1, 0.5, 1.0, 2.0]

Econometrics Slide 67
Local histogram

Introduction
• use interval around any given x: (x − h/2, x + h/2)
Estimation
• estimate
Binary choice
    f̂h(x) = (1/(nh)) Σi=1..n I(x − h/2 ≤ xi ≤ x + h/2)
           = (1/(nh)) Σi=1..n I(−1/2 ≤ (xi − x)/h ≤ 1/2)
⊲ Local histogram
Bias and variance
Bandwidth choice Properties (strictly speaking, we should write hn and fˆhn ,n ):
Plug-in methods

Asymptotics
Confidence intervals • ∫ f̂h(x) dx = 1 (verify)
Confidence bands
Testing • not continuous, but converges to f (x) for h → 0
Multivariate density

Regression est.
    f(x) = F′(x) = lim_{h→0} {F(x + h/2) − F(x − h/2)}/h
                 = lim_{h→0} P(x − h/2 ≤ Xi ≤ x + h/2)/h
Final thoughts

Econometrics Slide 68
Kernel estimator

Introduction
(Local) histogram not smooth ⇒ replace indicator by a smooth
Estimation
function
Binary choice

Density estimation • kernel function K(x)


Introduction
Motivation • positive K(x) ≥ 0
Histogram
Local histogram • ∫_{−∞}^{∞} K(t) dt = 1
⊲ Kernel estimator
Related methods Rosenblatt–Parzen estimator (for bandwidth h)
Kernel and band.
Bias and variance
Bandwidth choice
Plug-in methods
Asymptotics
    f̂h(x) = (1/(nh)) Σi=1..n K((xi − x)/h)
Confidence intervals
Confidence bands
Testing
Multivariate density Further requirements on kernel K(x)
Regression est.
• diminishes with distance from zero: K(−∞) = K(∞) = 0
Semiparametrics
Discrete choice
• is symmetric: ∫ t K(t) dt = 0
Censored data • typically, K(x) > 0 for x ∈ (−1, 1)
Final thoughts

Econometrics Slide 69
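The estimator in code (a minimal Python sketch with the Gaussian kernel; at x = 0 the true N(0,1) density is ≈ 0.399, and the estimate carries the bias/variance discussed on the following slides):

```python
import math, random

def kde(x, xs, h):
    # Rosenblatt-Parzen estimator with the Gaussian kernel:
    # f_h(x) = (1/(n h)) * sum_i K((x_i - x) / h)
    k = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return sum(k((xi - x) / h) for xi in xs) / (len(xs) * h)

random.seed(4)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
f0 = kde(0.0, xs, h=0.3)
```

The bandwidth h = 0.3 is an arbitrary choice here; the bandwidth-selection slides below discuss how to pick it.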
Related methods

Introduction
• Derivative estimation:
Estimation
    f̂h⁽ˢ⁾(x) = (−1)ˢ/(nhˢ⁺¹) Σi=1..n K⁽ˢ⁾{(xi − x)/h}
Binary choice

Density estimation
• Variable bandwidth: each point xi has its own bandwidth hin
Introduction
Motivation
• K th nearest neighbor:
Histogram
⊲ Related methods
    f̂k(x) = (1/(n dk(x))) Σi=1..n K((xi − x)/dk(x)),
Bias and variance
Bandwidth choice
Plug-in methods where dk (x) = distance of x and its k th nearest neighbor
Asymptotics
Confidence intervals • Series estimation: express continuous density as
Confidence bands
Testing
Multivariate density
Regression est.
Semiparametrics
    f̂J(x) = Σj=1..J aj gj(x)
Discrete choice

Censored data
for some orthogonal functions gj (x)
Final thoughts (e.g., gj (x) = Hj (x) or φ(x)xj )
Econometrics Slide 70
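With the uniform kernel the k-th nearest neighbor estimator simplifies: exactly k points fall within distance dk(x), so the sum equals k/2 and f̂k(x) = k/{2n dk(x)}. A Python sketch (illustrative sample from U(0, 1), where the true density at interior points is 1):

```python
import random

def knn_density(x, xs, k):
    # f_k(x) = (1/(n d_k(x))) * sum_i K((x_i - x)/d_k(x)) with the uniform
    # kernel K(t) = 1/2 * I(|t| <= 1); the k points within d_k(x) each
    # contribute 1/2, giving f_k(x) = k / (2 n d_k(x))
    d = sorted(abs(xi - x) for xi in xs)
    return k / (2.0 * len(xs) * d[k - 1])

random.seed(4)
xs = [random.uniform(0.0, 1.0) for _ in range(2000)]
f_mid = knn_density(0.5, xs, k=100)
```

Larger k plays the role of a larger bandwidth: the estimate at a point uses a wider, data-driven neighborhood.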
Kernel functions

Introduction
Examples of various kernel functions
Estimation

Binary choice

Density estimation Kernel Function


Introduction
Motivation Uniform 1/2 · I(|x| ≤ 1)
Histogram
Local histogram Triangular (1 − |x|)I(|x| ≤ 1)
Kernel estimator
Related methods
Quartic 15/16 · (1 − x²)² I(|x| ≤ 1)
⊲ Kernel and band.
Bias and variance
Epanechnikov 3/4 · (1 − x²) I(|x| ≤ 1)
Bandwidth choice Gaussian 1/√(2π) · exp(−x²/2)
Plug-in methods
Asymptotics ... ...
Confidence intervals
Confidence bands
Testing
Multivariate density

Regression est.

Semiparametrics

Discrete choice

Censored data

Final thoughts

Econometrics Slide 71
Kernel functions – graphs

Introduction
Plots of several kernel functions with support [−1, 1]
Estimation

Binary choice

Density estimation
Introduction
[Figure: four panels plotting the uniform, Epanechnikov, triangle, and quartic kernels K(x) on [−1, 1]]

Final thoughts

Econometrics Slide 72
Kernel choice

Introduction
Estimated density of stock returns with different kernels
Estimation
(Pagan and Schwert, 1990; monthly US data 1834–1925)
Binary choice

Density estimation
Introduction
Motivation
[Figure: four panels of stock-return density estimates with the uniform, Epanechnikov, triangle, and quartic kernels, all with h = 0.015]

Econometrics Slide 73
Bandwidth choice

Introduction
Estimated density of stock returns with different bandwidths
Estimation
(Pagan and Schwert, 1990; monthly US data 1834–1925)
Binary choice

Density estimation
Introduction
Motivation
[Figure: four panels of stock-return density estimates with the Epanechnikov kernel and bandwidths h = 0.005, 0.01, and 0.025]

Econometrics Slide 74
Bandwidth choice – simulations

Introduction
100 density estimates for 500 observations simulated from N (0, 1)
Estimation

Binary choice

[Figure: four panels of 100 kernel density estimates with bandwidths h = 0.05, 0.20, 0.80, 3.20]

Econometrics Slide 75
Density estimation – assumptions

Introduction
Kernel estimator of density f
Estimation
Binary choice
Density estimation
Introduction
Motivation
    f̂h(x) = (1/(nh)) Σi=1..n K((xi − x)/h) = (1/(nh)) Σi=1..n wni(x)
Histogram
Local histogram • observations x1 , . . . , xn
Kernel estimator
Related methods • kernel K is symmetric around zero and
Kernel and band.
⊲ Bias and variance
Bandwidth choice
Plug-in methods
Asymptotics
Confidence intervals
Confidence bands
◦ ∫ K(t) dt = 1
◦ ∫ t² K(t) dt = µ₂ ≠ 0
◦ ∫ K²(t) dt = ‖K‖² < ∞
Testing
Multivariate density
• h = hn → 0 as n → ∞
Regression est.

Semiparametrics
• nhn → ∞ as n → ∞
Discrete choice

Censored data • f is twice continuously differentiable


Final thoughts

Econometrics Slide 76
Exact bias and variance

Introduction
• Bias (substitution t = (u − x)/h)
Estimation
  
    E[f̂h(x) − f(x)] = E[(1/h) K((xi − x)/h)] − f(x)
                    = ∫ (1/h) K((u − x)/h) f(u) du − f(x)
                    = ∫ K(t) {f(x + th) − f(x)} dt

• Variance (var Z = E(Z²) − [E(Z)]²)

    var[f̂h(x)] = (1/n) var[(1/h) K((xi − x)/h)]
               = (1/(nh)) ∫ K²(t) f(x + th) dt − (1/n) [∫ K(t) f(x + th) dt]²
Censored data

Final thoughts

Econometrics Slide 77
Asymptotic bias and variance

Introduction
Using the Taylor expansion
Estimation
Binary choice
Density estimation
    f(x + th) = f(x) + th f′(x) + (1/2)(th)² f″(x) + · · ·
Introduction
Motivation • bias up to O(h2 )
Histogram
Local histogram
    E[f̂h(x) − f(x)] = ∫ K(t) {f(x + th) − f(x)} dt
                    = ∫ K(t) {th f′(x) + (1/2)(th)² f″(x)} dt
                    = h f′(x) ∫ K(t) t dt + (h² f″(x)/2) ∫ K(t) t² dt

    bias[f̂h(x)] = (h²/2) f″(x) ∫ t² K(t) dt = (h²/2) f″(x) µ₂(K)
Discrete choice

Censored data

Final thoughts

Econometrics Slide 78
Example – bias vs. density

Introduction
Density = mixture of N (0, 1) (0.3) and t1 − 3 (0.7)
Estimation

Binary choice

Density estimation
Introduction
Density (dashed) and bias effect (solid)
Motivation
Histogram
[Figure: density (dashed) and bias effect (solid); vertical axis in units of 10⁻²]
Econometrics Slide 79
Asymptotic bias and variance

Introduction
Using the Taylor expansion
Estimation
Binary choice
Density estimation
    f(x + th) = f(x) + th f′(x) + (1/2)(th)² f″(x) + · · ·
Introduction
Motivation • bias is up to O(h2 )
Histogram
Local histogram
Kernel estimator
Related methods
Kernel and band.
⊲ Bias and variance
    bias[f̂h(x)] = (h²/2) f″(x) ∫ t² K(t) dt = (h²/2) µ₂(K) f″(x)
Bandwidth choice
Plug-in methods
Asymptotics • variance is up to O(1/nh)
Confidence intervals
Confidence bands
Testing
    var[f̂h(x)] = (1/(nh)) f(x) ∫ K²(t) dt = (1/(nh)) ‖K‖² f(x)
Multivariate density
Multivariate density nh nh
Regression est.

Semiparametrics • mean square error (MSE = Bias2 + Var)


Discrete choice  2
h2 1
Censored data
M SE[fˆh (x)] = µ2 f (x) ′′
+ kKk2 f (x)
Final thoughts 2 nh
Econometrics Slide 80
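The h²-order bias formula can be verified against the exact bias ∫ K(t){f(x + th) − f(x)} dt by numerical quadrature (a Python sketch for the standard normal density with a Gaussian kernel, for which µ₂(K) = 1 and f″(0) = −φ(0); the grid and integration limits are ad-hoc choices):

```python
import math

phi = lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)  # N(0,1) density

def exact_bias(h, x=0.0, grid=4000, lim=8.0):
    # bias[f_h(x)] = integral of K(t) {f(x + t h) - f(x)} dt, trapezoid rule,
    # with kernel K = phi
    dt = 2.0 * lim / grid
    s = 0.0
    for i in range(grid + 1):
        t = -lim + i * dt
        w = 0.5 if i in (0, grid) else 1.0
        s += w * phi(t) * (phi(x + t * h) - phi(x)) * dt
    return s

f2 = -phi(0.0)                # f''(0) = (x^2 - 1) phi(x) at x = 0
approx = 0.5 * 0.2**2 * f2    # (h^2/2) f''(0) mu_2(K) at h = 0.2
```

At h = 0.2 the exact and asymptotic biases already agree to a few percent, and the agreement improves as h shrinks, illustrating the O(h²) claim.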
Example – bias, variance, and MSE vs. bandwidth

Introduction
Density = mixture of N (0, 1) (0.3) and N (−3, 1) (0.7)
Estimation

Binary choice
Squared bias (solid), variance (dashed), and MSE (thick)
Density estimation
Introduction
Motivation

[Figure: squared bias, variance, and MSE (in units of 10⁻⁴) plotted against the bandwidth (in units of 10⁻²)]

Econometrics Slide 81
Bandwidth choice

Introduction
Bias-variance trade-off (see simulation)
Estimation

Binary choice
• MSE pointwise only

MSE[f̂h(x)] = E[f̂h(x) − f(x)]²


Density estimation
Introduction
Motivation
Histogram • Mean integrated squared error (MISE)
Local histogram
Kernel estimator
Related methods
Kernel and band.
    MISE[f̂h] = E ∫ {f̂h(x) − f(x)}² dx = ∫ MSE[f̂h(x)] dx
Bias and variance
⊲ Bandwidth choice
Plug-in methods • Asymptotic MISE (AMISE)
Asymptotics
Confidence intervals
Confidence bands
Testing
Multivariate density
    AMISE[f̂h] = ∫ [{(h²/2) µ₂(K) f″(x)}² + (1/(nh)) ‖K‖² f(x)] dx
              = (h⁴/4) µ₂²(K) ∫ [f″(x)]² dx + (1/(nh)) ‖K‖² ∫ f(x) dx
              = (h⁴/4) µ₂²(K) ‖f″‖² + (1/(nh)) ‖K‖²
Regression est.
Semiparametrics
Discrete choice
Censored data
Final thoughts
Econometrics Slide 82
Bandwidth choice

Introduction
Optimal bandwidth h = minimal error
Estimation

Binary choice
    h_opt = arg min_h AMISE(f̂h)
Density estimation
Introduction
Motivation
Histogram • optimal bandwidth
Local histogram
Kernel estimator
Related methods
Kernel and band.
Bias and variance
    h_opt = {‖K‖² / (‖f″‖² µ₂²(K) n)}^{1/5} ∼ n^{−1/5}
⊲ Bandwidth choice
Plug-in methods
Asymptotics • optimal error AMISE ∼ n^{−4/5} (histogram: n^{−2/3})
Confidence intervals
Confidence bands • kf ′′ k2 unknown
Testing
Multivariate density
Kernel choice by minimizing AMISE
Regression est.

Semiparametrics • K_opt(x) = (3/4)(1 − x²) (Epanechnikov)


Discrete choice • other kernels have just slightly worse efficiencies
Censored data
Gauss (1.04), quartic (1.005), triangle (1.01), unif. (1.06)
Final thoughts

Econometrics Slide 83
Plug-in methods

Introduction
Plug-in methods = assume normality
Estimation

Binary choice
• Silverman’s rule of thumb: f = φ{(x − µ)/σ}/σ
Density estimation ⇒ hROT = 1.06 σ̂ n^{−1/5} for the Gaussian kernel
Introduction
Motivation • Park and Marron plug-in estimator:
Histogram
Local histogram
Kernel estimator
◦ estimate f ′′ (x) by kernel density estimation
Related methods
Kernel and band.
Bias and variance
Bandwidth choice
    f̂″_hROT(x) = (1/(n h³ROT)) Σi=1..n K″((xi − x)/hROT),
⊲ Plug-in methods
Asymptotics
Confidence intervals ◦ use bias correction
Confidence bands

Testing
    (‖f″‖²)^ = ‖f̂″‖² − (1/(n h⁵ROT)) ‖K″‖²
Multivariate density

Regression est.

Semiparametrics ◦ compute optimal bandwidth


Discrete choice
Censored data
Final thoughts
    h_PM = {‖K‖² / ((‖f″‖²)^ µ₂²(K) n)}^{1/5} ∼ n^{−1/5}
Econometrics Slide 84
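Silverman's rule in code (a minimal Python sketch; the constant 1.06 presumes the Gaussian kernel and an approximately normal underlying density, which is exactly the plug-in assumption on this slide):

```python
import random, statistics

def silverman_rot(xs):
    # h_ROT = 1.06 * sigma_hat * n^(-1/5), the Gaussian-kernel rule of thumb
    return 1.06 * statistics.stdev(xs) * len(xs) ** (-0.2)

random.seed(5)
xs = [random.gauss(0.0, 2.0) for _ in range(500)]
h = silverman_rot(xs)
```

Note the two ingredients of h_opt show up directly: the scale of the data through σ̂ and the sample size through the n^{−1/5} rate, so doubling the scale of the data exactly doubles the bandwidth.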
Example: Car weights

Introduction
Beware: Stata example for the car weight data
Estimation
kdensity weight, kernel(epanechnikov) generate(x epan)
Binary choice
kdensity weight, kernel(parzen) generate(x2 parzen)
Density estimation
Introduction line epan parzen x, sort ytitle(Density) legend(cols(1))
Motivation
Histogram
[Figure: Epanechnikov vs. Parzen kernel density estimates of car weight; x-axis Weight (lbs.), y-axis Density]
Econometrics Slide 85
Example: Car weights

Introduction
Beware: Stata example for the car weight data
Estimation
kdens weight, kernel(epanechnikov) generate(epan x) bw(sjpi)
Binary choice
kdens weight, kernel(parzen) generate(parzen x) bw(sjpi)
Density estimation
Introduction line epan parzen x, sort ytitle(Density) legend(cols(1))
Motivation
Histogram
[Figure: Epanechnikov vs. Parzen kernel density estimates of car weight with SJPI bandwidth; x-axis Weight (lbs.), y-axis Density]
Final thoughts

Econometrics Slide 86
Example: Car weights

Introduction
kdensity weight, nograph generate(x fx)
Estimation
kdensity weight if foreign==0, nograph generate(fx0) at(x)
Binary choice
kdensity weight if foreign==1, nograph generate(fx1) at(x)
Density estimation
Introduction line fx0 fx1 x, sort ytitle(Density)
Motivation
Histogram
[Figure: kernel density estimates of car weight for domestic vs. foreign cars; x-axis Weight (lbs.), y-axis Density]
Econometrics Slide 87
Asymptotics – assumptions

Introduction
Provided that kernel K and density f satisfy additionally
Estimation

Binary choice • |x|K(x) → 0 as x → ∞


Density estimation • supx∈R |K(x)| < ∞
Introduction
Motivation
• ∫ K^{2+δ}(x) dx < ∞
Histogram
Local histogram
Kernel estimator
Related methods • f is everywhere continuous
Kernel and band.
Bias and variance
• ∫ |f(x)| dx < ∞
Bandwidth choice
Plug-in methods
⊲ Asymptotics
Confidence intervals the kernel density estimator is asymptotically unbiased
Confidence bands
Testing
Multivariate density lim Efˆh = f and lim sup |Efˆh − f | = 0
n→∞ n→∞ x∈R
Regression est.

Semiparametrics

Discrete choice

Censored data

Final thoughts

Econometrics Slide 88
Asymptotics – consistency and normality

Introduction
Kernel density estimator is (hn → 0 and nhn → ∞)
Estimation
Binary choice
• pointwise consistent: MSE[f̂_h] → 0 and f̂_h →_P f
Density estimation
Introduction
Motivation
Histogram
• uniformly consistent under some regularity conditions and nh²_n → ∞: sup_{x∈R} |f̂_h(x) − f(x)| →_P 0
Local histogram
Kernel estimator
• asymptotically normal (pointwise):
Related methods
Kernel and band.
Bias and variance
√(nh) {f̂_h(x) − E f̂_h(x)} → N( 0, f(x)‖K‖² )
Bandwidth choice
Plug-in methods
⊲ Asymptotics
(the same applies to √(nh)(f̂_h − f) if √(nh) h² → 0,
which does not hold for h_opt, but does under undersmoothing)
Confidence intervals
Confidence bands

• asymptotics of fˆh − f under h = cn−1/5


Testing
Multivariate density

Regression est.
Semiparametrics
Discrete choice
√(nh) {f̂_h(x) − f(x)} → N( (c²/2) f″(x) μ₂(K), f(x)‖K‖² )
Censored data √
Final thoughts • optimal rate of convergence: hopt ∼ n−1/5 ⇒ nh ∼ n2/5
Econometrics Slide 89
Confidence intervals

Introduction
Asymptotic normality ⇒ pointwise confidence intervals
Estimation

Binary choice
• asymptotic confidence interval
Density estimation
Introduction
Motivation
Histogram
Local histogram

f̂_h(x) ± Φ⁻¹(1 − α/2)/√(nh) · { f̂_h(x) ∫ K²(x) dx }^{1/2}
Kernel estimator
Related methods
under undersmoothing (h ∼ n−1/5−δ , δ > 0)
Kernel and band.
Bias and variance
Bandwidth choice
• finite samples
Plug-in methods
Asymptotics ◦ undersmooth (see above)
⊲ Confidence intervals
Confidence bands ◦ estimate bias (difficult in small samples)
Testing
Multivariate density
Regression est.
Semiparametrics

f̂_h(x) − (h²/2) f̂″_h(x) μ₂(K) ± Φ⁻¹(1 − α/2)/√(nh) · { f̂_h(x) ∫ K²(x) dx }^{1/2}
Discrete choice

Censored data ◦ bootstrap variance (improvement in finite samples)


Final thoughts

Econometrics Slide 90
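The asymptotic interval above is straightforward to compute; the sketch below is my illustration (not from the slides), assuming a Gaussian kernel (so ∫K² = 1/(2√π)) and a fixed normal quantile z in place of Φ⁻¹(1 − α/2). The bias term is ignored, as under undersmoothing:

```python
import numpy as np

def kde_ci(x, data, h, z=1.96):
    """Pointwise CI: f_hat(x) +/- z * sqrt( f_hat(x) * ||K||^2 / (n h) )."""
    n = len(data)
    u = (x - data) / h
    f_hat = np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h  # KDE at x
    K2 = 1.0 / (2.0 * np.sqrt(np.pi))        # ||K||^2 for the Gaussian kernel
    half = z * np.sqrt(f_hat * K2 / (n * h))
    return f_hat - half, f_hat + half
```

In practice h should be undersmoothed (h ∼ n^{−1/5−δ}) for the nominal coverage to hold, exactly as the slide notes.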
Confidence bands

Introduction
• confidence intervals – pointwise
Estimation
• confidence bands – interval or R wide
Binary choice

Density estimation
Introduction
◦ available under restrictive assumptions
Motivation (undersmoothing, f on interval (0, 1))
Histogram
Local histogram
" #1/2  1/2
Kernel estimator
ˆ(x)kKk2
f z
fˆ(x)±
Related methods
Kernel and band.
1/2
+ dn
Bias and variance nh {2(1/5 + δ) log(n)}
Bandwidth choice
Plug-in methods
Asymptotics with coverage probability 1 − α = exp[−2 exp(z)] and
Confidence intervals
⊲ Confidence bands dn = {2(1/5 + δ) log(n)}1/2 [1 + log{kK ′ k2 /2πkKk2 }]
Testing
Multivariate density

Regression est.
• confidence bands typically wider than confidence intervals
Semiparametrics

Discrete choice

Censored data

Final thoughts

Econometrics Slide 91
Example: CPS 1985

Introduction
Income distribution in the USA (CPS 1985)
Estimation
Test statistic: T = 57.209 > 2.32 = Φ⁻¹(0.99)
Binary choice

Density estimation Log-normal (solid) and nonparametric (dashed) densities
[Figure: x-axis Income (0–30), y-axis Density]
Econometrics Slide 92
Example: CPS 1985

Introduction
Income distribution in the USA (CPS 1985)
Estimation
kdens wagelog, ci normal bw(oversmooth)
Binary choice

Density estimation
[Figure: kernel density estimate of log wage with 95% CI and normal density overlay; x-axis wlog, y-axis Density]
Censored data

Final thoughts

Econometrics Slide 93
Testing

Introduction
Testing H₀: f = g vs. H₁: f ≠ g for a known g(x, θ)
Estimation

Binary choice
Introduction
Motivation
Histogram
• test statistic

Î = Î(f̂, g) = ∫ {f̂(x) − g(x, θ)}² dx
Local histogram
Kernel estimator
Related methods
Kernel and band.
• asymptotically nh^{1/2} ( Î − c(n) − ∫ bias(f̂)²(x) dx ) → N(0, σ²), where c(n) = ‖K‖²/(nh) and σ² = C(K) ∫ f²(x) dx
Bias and variance
Bandwidth choice • bias eliminated by undersmoothing (h ∼ n−1/5−δ , δ > 0)
Plug-in methods
Asymptotics
⊲ Testing
Multivariate density
Regression est.
• variance estimated by σ̂² = C(K) Σ_{i=1}^n f̂(xᵢ)/n or

σ̂² = C(K) ∫ f̂²(x) dx = (C(K)/(n²h)) Σ_{i=1}^n Σ_{j=1}^n (K∗K)((xᵢ − xⱼ)/h)
Semiparametrics

• test statistic T = nh^{1/2} {Î − ‖K‖²/(nh)}/σ̂ → N(0, 1)


Discrete choice

Censored data

Final thoughts

Econometrics Slide 94
Example: CPS 1985

Introduction
Income distribution in the USA (CPS 1985)
Estimation
Test statistic: T = 57.209 > 2.32 = Φ⁻¹(0.99)
Binary choice

Density estimation Log-normal (solid) and nonparametric (dashed) densities
[Figure: x-axis Income (0–30), y-axis Density]
Econometrics Slide 95
Multivariate density estimation

Data xi ∈ Rd and kernel K : Rd → R


Introduction

Estimation

Binary choice • product kernel K(x) = K1 (x1 ) · . . . · Kd (xd )


Density estimation
Introduction
• radially symmetric kernel K(x) = K₁(‖x‖) / ∫_{R^d} K₁(‖t‖) dt
Motivation
Histogram Multivariate density estimation
Local histogram
Kernel estimator
• single bandwidth: f̂_h(x) = Σ_{i=1}^n K{(x − xᵢ)/h} / (nh^d)
Related methods
Kernel and band. • multiple bandwidths:
Bias and variance
Bandwidth choice
Plug-in methods
Asymptotics
Confidence intervals
Confidence bands
Testing

f̂_h(x) = (1/(n h₁ · … · h_d)) Σ_{i=1}^n K( (x₁ − xᵢ₁)/h₁, …, (x_d − xᵢ_d)/h_d )
⊲ Multivariate density

Regression est.
• most general (H ∈ Rd×d )
Semiparametrics
Discrete choice
Censored data

f̂_H(x) = (1/(n det(H))) Σ_{i=1}^n K{H⁻¹(x − xᵢ)}
Final thoughts

Econometrics Slide 96
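The multiple-bandwidth product-kernel estimator can be sketched directly; this is my illustration (not from the slides), with Gaussian coordinate kernels as an assumption:

```python
import numpy as np

def mv_kde(x, data, h):
    """Product-kernel estimate f_hat(x) = (1/(n h_1...h_d)) sum_i prod_k K((x_k - x_ik)/h_k).

    x: point of shape (d,); data: sample of shape (n, d); h: bandwidths of shape (d,).
    """
    u = (x - data) / h                              # (n, d) standardized distances
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)    # Gaussian kernel per coordinate
    return np.mean(np.prod(K, axis=1)) / np.prod(h)
```

With a single data point at the origin and unit bandwidths in d = 2, the estimate at the origin is the product of two standard normal densities, 1/(2π).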
Multivariate density estimation – properties

Introduction
For a symmetric kernel with second moments and norm
Estimation
Binary choice
( ∫K(x)dx = 1, ∫xK(x)dx = 0, μ₂(K) = ∫xx⊤K(x)dx, ‖K‖² = ∫K²(x)dx )
Density estimation
Introduction
Motivation
• bias (H(f ) = Hessian matrix of f ) [result for H = hId ]
Histogram
Local histogram
Kernel estimator
Related methods

bias[f̂_H(x)] ≈ (1/2) μ₂(K) tr(H⊤ H(f) H)   [= (h²/2) μ₂(K) tr(H(f)) for H = hI_d]
Kernel and band.
Bias and variance
Bandwidth choice • variance [result for H = hId ]
Plug-in methods
Asymptotics
Confidence intervals
Confidence bands

var[f̂_H(x)] ≈ (1/(n det(H))) ‖K‖² f(x)   [= (1/(nh^d)) ‖K‖² f(x) for H = hI_d]
Testing
⊲ Multivariate density

Regression est. • AMISE = Bias2 + Variance


Semiparametrics
• optimal bandwidth h ∼ n−1/(4+d) (ROT, CV, ...)
Discrete choice √
• optimal AM ISE and rate of convergence ∼ n−2/(4+d)
Censored data

Final thoughts • curse of dimensionality


Econometrics Slide 97
Age-income example

Introduction
Joint density of income and age in east Germany, 1991
Estimation

Binary choice

Density estimation
Introduction
Motivation Age-income density estimate
Histogram
Local histogram
Kernel estimator
Related methods
Kernel and band.
Bias and variance
Bandwidth choice
Plug-in methods
Asymptotics
Confidence intervals
Confidence bands
Testing
⊲ Multivariate density

Regression est.
Semiparametrics
Discrete choice
Censored data
Final thoughts
[Figure: 3D surface of the age-income density estimate; age 25–59, income 880–3426]

Econometrics Slide 98
Introduction

Estimation

Binary choice

Density estimation

⊲ Regression est.
Cond. moments
Nonpar. regression
Various estimators
Local linear reg. Nonparametric regression
Local polynomial
Example
Assumptions
estimation
Nadaraya-Watson
Local linear reg.
Bandwidth choice
Cross validation
Asymptotics
Confidence intervals
Testing
Examples
Multivariate reg.

Semiparametrics

Discrete choice

Censored data

Final thoughts
Econometrics Slide 99
Conditional moments

Introduction
Estimation of conditional moments
Estimation

Binary choice
• regression
Density estimation
◦ dependent variable Y (e.g., earnings)
Regression est.
⊲ Cond. moments ◦ explanatory variables X (e.g., age, education)
Nonpar. regression
Various estimators
Local linear reg.
Local polynomial
yi = m(xi ) + εi
Example
Assumptions
E(Y |X) = m(X)
Nadaraya-Watson
Local linear reg.
ln Earnings = m(Age, Education) + ε
Bandwidth choice
Cross validation
Asymptotics • conditional variance
Confidence intervals
Testing
Examples E[{Yi − E(Yi |Xi )}2 |Xi ] = E(Yi2 |Xi ) − [E(Yi |Xi )]2
Multivariate reg.

Semiparametrics

Discrete choice
• conditional probability
Censored data

Final thoughts
E(Yᵢ|Xᵢ) = 1·P(Yᵢ = 1|Xᵢ) + 0·P(Yᵢ = 0|Xᵢ) = P(Yᵢ = 1|Xᵢ)
Econometrics Slide 100
Univariate regression

Introduction
Estimation idea with explanatory variable
Estimation

Binary choice
• discrete (bandwidth 1 ≫ h)
Density estimation
Regression est.
Cond. moments
⊲ Nonpar. regression
Various estimators
Local linear reg.
Local polynomial
Example
Assumptions
Nadaraya-Watson
Bandwidth choice
Cross validation

Ê_n(Yᵢ|Xᵢ = x) = Σ_{i=1}^n I(xᵢ = x) yᵢ / Σ_{j=1}^n I(xⱼ = x)
             = Σ_{i=1}^n I(x − h < xᵢ < x + h) yᵢ / Σ_{j=1}^n I(x − h < xⱼ < x + h)
             = Σ_{i=1}^n I(|x − xᵢ|/h < 1) yᵢ / Σ_{j=1}^n I(|x − xⱼ|/h < 1)
Asymptotics
Confidence intervals
Testing
Examples
• continuous (kernel K , bandwidth h)
Multivariate reg.
Semiparametrics
Discrete choice
Censored data

Ê_n(yᵢ|x) = Σ_{i=1}^n K{(x − xᵢ)/h} yᵢ / Σ_{j=1}^n K{(x − xⱼ)/h}

Final thoughts
Econometrics Slide 101
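The continuous-case estimator above is just a kernel-weighted average of the yᵢ; a minimal sketch (my illustration, not from the slides, with a Gaussian kernel as an assumption):

```python
import numpy as np

def nadaraya_watson(x, xi, yi, h):
    """E_hat(y|x) = sum_i K((x - x_i)/h) y_i / sum_j K((x - x_j)/h)."""
    w = np.exp(-0.5 * ((x - xi) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * yi) / np.sum(w)
```

Because the weights sum to one, a constant response is reproduced exactly at any evaluation point.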
Nonparametric regression

Introduction
Regression model y = E(y|x) + ε = m(x) + ε
Estimation

Binary choice
Density estimation

m(x) = E(y|x) = ∫ y f(y|x) dy = ∫ y f(y, x)/f_x(x) dy
Regression est.
Cond. moments
⊲ Nonpar. regression • joint density f (y, x), x ∈ Rp
Various estimators
Local linear reg.
Local polynomial
Example
Assumptions

f̂(y, x) = (1/(n h′ h^p)) Σ_{i=1}^n K_y((yᵢ − y)/h′) K_x((xᵢ − x)/h)
Nadaraya-Watson
Local linear reg. • marginal density fx (x)
Bandwidth choice
Cross validation
Asymptotics
Confidence intervals
Testing

f̂_x(x) = (1/(n h^p)) Σ_{i=1}^n K_x((xᵢ − x)/h)

• substitute fˆ and fˆx to estimate m(x)


Examples
Multivariate reg.
Semiparametrics
Discrete choice
Censored data

m̂(x) = ∫ y f̂(y, x)/f̂_x(x) dy = ∫ y (1/h′) Σ_{i=1}^n K_y((yᵢ − y)/h′) K_x((xᵢ − x)/h) dy / Σ_{i=1}^n K_x((xᵢ − x)/h)
Final thoughts
Econometrics Slide 102
Nonparametric regression

Introduction
Smooth regression function m(x)
Estimation

Binary choice • integrate using substitution t = (yi − y)/h′


Density estimation
Regression est.
Cond. moments
⊲ Nonpar. regression
Various estimators
Local linear reg.
Local polynomial
Example
Assumptions

m̂(x) = (1/h′) Σ_{i=1}^n K_x((xᵢ − x)/h) ∫ y K_y((yᵢ − y)/h′) dy / Σ_{i=1}^n K_x((xᵢ − x)/h)
     = Σ_{i=1}^n K_x((xᵢ − x)/h) ∫ (yᵢ + th′) K_y(t) dt / Σ_{i=1}^n K_x((xᵢ − x)/h)
Nadaraya-Watson
Local linear reg. • Nadaraya-Watson estimator
Bandwidth choice
Cross validation
Asymptotics
Confidence intervals
Testing
Examples

m̂_h(x) = Σ_{i=1}^n K((x − xᵢ)/h) yᵢ / Σ_{j=1}^n K((x − xⱼ)/h) = r̂_h(x)/f̂_h(x)
Multivariate reg.

Semiparametrics
Discrete choice
• general nonparametric estimator m̂_h(x) = Σ_{i=1}^n w_ni(x) yᵢ
Censored data
with weights wni (x) = wn (xi , x)
Final thoughts
Econometrics Slide 103
Example – Engel curve

Introduction
Food expenditures vs. net income in the U.K., 1973
Estimation

Binary choice

Density estimation Engel Curve (UK, 1973)
[Figure: nonparametric Engel curve fit; x-axis Net income, y-axis Food expenditures]
Econometrics Slide 104
Example – coronary heart disease data

Introduction
probit chd age
Estimation
predict prob, pr
Binary choice
lpoly chd age, addplot((line prob age, sort))
Density estimation

Regression est.

Evidence of coronary heart disease (1=yes, 0=no)
[Figure: local polynomial smooth (degree 0, epanechnikov kernel, bandwidth 4.49) of chd on age with the probit fit overlaid; x-axis Age (in years)]
Censored data

Final thoughts
Econometrics Slide 105
Various estimators

Introduction
General nonparametric estimator m̂_h(x) = Σ_{i=1}^n w_ni(x) yᵢ
Estimation

• Nadaraya-Watson:
Binary choice
Density estimation
w_ni(x) = K{(xᵢ − x)/h} / Σ_{j=1}^n K{(xⱼ − x)/h}
Regression est.
• Variable bandwidth estimation:
Cond. moments
Nonpar. regression
w_ni(x) = K{(xᵢ − x)/hᵢ} / Σ_{j=1}^n K{(xⱼ − x)/hⱼ}
⊲ Various estimators
Local linear reg.
Local polynomial
Example • The k th nearest neighbor estimator:
Assumptions
Nadaraya-Watson
wni (x) = Iki /k = I(xi = kth nearest to x)/k or
Local linear reg.
Bandwidth choice
wni (x) = Iki wk
Cross validation
Asymptotics
Confidence intervals • Known density fx (x):
Testing
Examples
w_ni(x) = K{(xᵢ − x)/h} / [(nh^p) f_x(x)]
Multivariate reg.
Semiparametrics
Discrete choice
• Fixed design (Gasser-Müller estimator):
w_ni(x) = ∫_{s_{i−1}}^{s_i} K{(t − x)/h}/h dt ≈ (sᵢ − sᵢ₋₁) K[(x − ξ)/h]
Censored data

Final thoughts
Econometrics Slide 106
Local linear regression

Introduction
• Nadaraya-Watson minimizes (b0 (x) = m(x), verify)
Estimation
Binary choice
Density estimation
Σ_{i=1}^n {yᵢ − b₀(x)}² K{(xᵢ − x)/h}
Regression est.
i=1
Cond. moments
Nonpar. regression • Local linear regression – minimize
Various estimators
⊲ Local linear reg.
Local polynomial
Example
Assumptions
Σ_{i=1}^n {yᵢ − b₀(x) − b₁(x)(xᵢ − x)}² K{(xᵢ − x)/h}
Nadaraya-Watson
Local linear reg. ◦ at given x, b0 (x), b1 (x) regression constants
Bandwidth choice
Cross validation ◦ weighted least squares regression (around x):
Asymptotics
Confidence intervals
Testing
Examples
Multivariate reg.
Semiparametrics
Discrete choice
Censored data

b̂₀,h(x) = ȳ_h − b̂₁,h(x)(x̄_h − x)
b̂₁,h(x) = Σ_{i=1}^n (yᵢ − ȳ_h)(xᵢ − x̄_h) K{(xᵢ − x)/h} / Σ_{j=1}^n (xⱼ − x̄_h)² K{(xⱼ − x)/h}

for v̄_h = Σ_{i=1}^n vᵢ K{(xᵢ − x)/h} / Σ_{j=1}^n K{(xⱼ − x)/h}
Final thoughts
Econometrics Slide 107
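The weighted least squares problem above can be solved directly at each evaluation point; a sketch (my illustration, not from the slides, with a Gaussian kernel as an assumption):

```python
import numpy as np

def local_linear(x, xi, yi, h):
    """Minimize sum_i {y_i - b0 - b1 (x_i - x)}^2 K((x_i - x)/h); m_hat(x) = b0."""
    w = np.exp(-0.5 * ((xi - x) / h) ** 2)          # Gaussian kernel weights
    X = np.column_stack([np.ones_like(xi), xi - x])  # regressors (1, x_i - x)
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(sw[:, None] * X, sw * yi, rcond=None)
    return b[0]
```

Since the weighted fit is exact for linear data, the estimator reproduces linear regression functions without bias, whatever the design.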
Local polynomial regression

Introduction
• Motivation – Taylor expansion
Estimation
Binary choice
Density estimation
m(xᵢ) ≈ m(x) + (∂m/∂x)(x)(xᵢ − x) + … + (1/p!)(∂^p m/∂x^p)(x)(xᵢ − x)^p
Regression est.
Cond. moments
Nonpar. regression
• Local polynomial regression – minimize
Various estimators
Local linear reg.
⊲ Local polynomial
Example
Assumptions
Σ_{i=1}^n {yᵢ − b₀(x) − b₁(x)(xᵢ − x) − … − b_p(x)(xᵢ − x)^p}² K((xᵢ − x)/h)
Nadaraya-Watson
Local linear reg.
Bandwidth choice
Cross validation
Asymptotics
Confidence intervals
Testing
Examples
Multivariate reg.
Semiparametrics
Discrete choice
Censored data
• Weighted average of yᵢ: m̂(x) = Σ_{i=1}^n w_ni(x) yᵢ

b̂_h(x) = (b̂₀,h(x), …, b̂_p,h(x))⊤ = (X⊤KX)⁻¹ X⊤K (y₁, …, y_n)⊤

for X = (1, xᵢ − x, …, (xᵢ − x)^p)_{i=1}^n and K = diag(K{(xᵢ − x)/h})
• Estimates of derivatives: m̂_h^{(j)}(x) = b̂_j(x) · j!
Final thoughts
Econometrics Slide 108
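The same weighted regression with a degree-p polynomial also delivers derivative estimates via m̂^{(j)}(x) = b̂_j · j!; a sketch (my illustration, not from the slides, Gaussian kernel assumed):

```python
import numpy as np

def local_poly(x, xi, yi, h, p=2):
    """Weighted LS on (1, (x_i-x), ..., (x_i-x)^p); returns (b0, ..., bp)."""
    w = np.exp(-0.5 * ((xi - x) / h) ** 2)           # Gaussian kernel weights
    X = np.vander(xi - x, p + 1, increasing=True)    # columns (x_i - x)^0 ... ^p
    sw = np.sqrt(w)
    b, *_ = np.linalg.lstsq(sw[:, None] * X, sw * yi, rcond=None)
    return b                                         # m_hat^(j)(x) = b[j] * j!
```

For m(x) = x², a local quadratic fit recovers m(0.5) = 0.25, m′(0.5) = 1, and m″(0.5) = 2·b₂ = 2 exactly.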
Simulated example

Introduction
Nadaraya-Watson and local linear regression
Estimation
(m(x) = x + 5 sin(2x), n = 400, h = 0.8)
Binary choice

Density estimation

Regression est. Local constant (red dashed) and linear (blue solid) regression
[Figure: x-axis X (0–5), y-axis Y]

Final thoughts
Econometrics Slide 109
Example NW – coronary heart disease data

Introduction
probit chd age
Estimation
predict prob, pr
Binary choice
lpoly chd age, addplot((line prob age, sort))
Density estimation

Regression est.

Evidence of coronary heart disease (1=yes, 0=no)
[Figure: local polynomial smooth (degree 0, epanechnikov kernel, bandwidth 4.49) with the probit fit overlaid; x-axis Age (in years)]
Censored data

Final thoughts
Econometrics Slide 110
Example LLR – coronary heart disease data

Introduction
probit chd age
Estimation
predict prob, pr
Binary choice
lpoly chd age, degree(1) addplot((line prob age, sort))
Density estimation

Regression est.

Evidence of coronary heart disease (1=yes, 0=no)
[Figure: local polynomial smooth (degree 1, epanechnikov kernel, bandwidth 4.49) with the probit fit overlaid; x-axis Age (in years)]
Censored data

Final thoughts
Econometrics Slide 111
Example – Engel curve

Introduction
Food expenditures vs. net income in the U.K., 1973
Estimation

Binary choice

Density estimation Engel Curve (dashed) and its derivative (solid)
[Figure: x-axis Net income, y-axis Food expenditures]
Econometrics Slide 112
Nonparametric regression – assumptions

Introduction
Finite-sample properties
Estimation

Binary choice
• method comparison
Density estimation • bandwidth choice
Regression est.
Cond. moments Assumptions for yi = m(xi ) + εi
Nonpar. regression
Various estimators • (xi , yi ) i.i.d. sample from (x, y) ∼ f
Local linear reg.
Local polynomial • εi i.i.d. with zero mean and independent of xi
Example
⊲ Assumptions • m and f twice continuously differentiable
Nadaraya-Watson
Local linear reg. • kernel K symmetric with
Bandwidth choice
Cross validation
Asymptotics
Confidence intervals
Testing
Examples
Multivariate reg.
◦ ∫ K(x) dx = 1, ∫ xK(x) dx = 0
◦ ∫ x²K(x) dx = μ₂(K) < ∞
◦ ∫ K²(x) dx = ‖K‖² < ∞
Semiparametrics

Discrete choice • h = hn → 0 and nh → ∞ as n → ∞


Censored data
• f″_x(z) bounded around x, for x in the interior of supp(f_x)
Final thoughts
Econometrics Slide 113
Nadaraya-Watson estimator

Introduction
• Bias (fx (x) > 0)
Estimation
( ′
)
h2
Binary choice
fx (x)m′ (x) 1
Density estimation bias[m̂h (x)] = ′′
m (x) + 2 +O( )+o(h2 )
Regression est.
2 fx (x) nh
Cond. moments
Nonpar. regression
Various estimators • Variance (σ 2 (x) = var(εi |xi ))
Local linear reg.
Local polynomial
Example
Assumptions
⊲ Nadaraya-Watson

var[m̂_h(x)] = (1/(nh)) (σ²(x)/f_x(x)) ‖K‖² + o(1/(nh))
Local linear reg.
Bandwidth choice
Cross validation • Comparison with density estimators
Asymptotics
Confidence intervals
Testing ◦ bias proportional to curvature (m′′ (x))
Examples
Multivariate reg.
◦ extra bias term (m′(x) f_x′(x)/f_x(x))
Semiparametrics

Discrete choice

Censored data

Final thoughts
Econometrics Slide 114
Local linear regression

Introduction
• Bias
Estimation
Binary choice
Density estimation

bias[m̂_h(x)] = (h²/2) μ₂(K) m″(x) + o(h²)
• Variance (σ 2 (x) = var(εi |xi ))
Regression est.
Cond. moments
Nonpar. regression
Various estimators
Local linear reg.
Local polynomial

var[m̂_h(x)] = (1/(nh)) (σ²(x)/f_x(x)) ‖K‖² + o(1/(nh))
Example
Assumptions
Nadaraya-Watson
• Comparison with Nadaraya-Watson estimator
⊲ Local linear reg.
Bandwidth choice ◦ bias independent of fx (design)
Cross validation
Asymptotics ◦ no bias for linear functions m
Confidence intervals
Testing ◦ very similar to bias and variance of density estimator
Examples
Multivariate reg.
• General combination with a parametric estimator m(x, β):
Semiparametrics
Discrete choice
Σ_{i=1}^n {yᵢ − m(xᵢ, β)}² K{(xᵢ − x)/h}
Censored data

Final thoughts
◦ reduced bias if m(x, β) is close to m(x)
Econometrics Slide 115
Simulated example

Introduction
Nadaraya-Watson and local linear regression
Estimation
(m(x) = x + 5 sin(2x), n = 1000, h = 0.8)
Binary choice

Density estimation

Regression est. Local constant (red dashed) and linear (blue solid) regression
[Figure: x-axis X (−5 to 5), y-axis Y]

Final thoughts
Econometrics Slide 116
Bandwidth choice

Introduction
Optimal bandwidth selection
Estimation

Binary choice
• minimize mean integrated squared error
Density estimation
Regression est.
Cond. moments

MISE(m̂) = E ∫ [m̂(x) − m(x)]² dx ≈ c₁/(nh) + c₂h⁴
Nonpar. regression
Various estimators
Local linear reg.
Local polynomial
⇒ h_opt = {c₁/(4c₂)}^{1/5} n^{−1/5}
Example
Assumptions
Nadaraya-Watson
• plug-in estimator – not used, complicated
Local linear reg.
⊲ Bandwidth choice • alternative: minimize mean average squared error
Cross validation " n
#
Asymptotics
1 X
Confidence intervals
M ASE(m̂) = E {m̂(xi ) − m(xi )}2
Testing n
Examples i=1
Multivariate reg.
◦ advantage: if yi − m(xi ) and m̂(xi ) are uncorrelated,
Semiparametrics
Discrete choice
Censored data
E[ (1/n) Σ_{i=1}^n {yᵢ − m̂(xᵢ)}² ] = E[ (1/n) Σ_{i=1}^n {m(xᵢ) + εᵢ − m̂(xᵢ)}² ] = σ² + MASE(m̂)
Censored data σ 2 + M ASE(m̂)
Final thoughts
Econometrics Slide 117
Cross validation

Introduction
Mean average squared error
Estimation

Binary choice
• yi − m(xi ) and m̂(xi ) are correlated
Density estimation • solution: omit the ith observation from the sample to estimate
Regression est. m(xi ) by m̂h,−i (xi ), which is uncorrelated with yi − m(xi )
Cond. moments
Nonpar. regression
Various estimators
Local linear reg. Leave-one-out cross validation
Local polynomial
Example
Assumptions
Nadaraya-Watson
Local linear reg.
Bandwidth choice

h_CV = arg min_h Σ_{i=1}^n {yᵢ − m̂_{h,−i}(xᵢ)}²
⊲ Cross validation
Asymptotics
Confidence intervals • m̂h,−i (xi ) = leave-one-out estimate
Testing
Examples
based on observations 1, . . . , i − 1, i + 1, . . . , n
Multivariate reg.
• hCV → hopt very slowly (∼ n−1/10 )
Semiparametrics

Discrete choice
• often used
Censored data

Final thoughts
Econometrics Slide 118
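Leave-one-out cross validation can be sketched for the Nadaraya-Watson estimator as follows (my illustration, not from the slides; the Gaussian kernel and the grid search over h are assumptions):

```python
import numpy as np

def loo_cv_score(h, xi, yi):
    """CV(h) = sum_i {y_i - m_hat_{h,-i}(x_i)}^2 with Nadaraya-Watson fits."""
    n, score = len(xi), 0.0
    for i in range(n):
        w = np.exp(-0.5 * ((xi - xi[i]) / h) ** 2)
        w[i] = 0.0                                   # leave observation i out
        score += (yi[i] - np.sum(w * yi) / np.sum(w)) ** 2
    return score

def cv_bandwidth(xi, yi, grid):
    """h_CV = the grid value minimizing the leave-one-out criterion."""
    return min(grid, key=lambda h: loo_cv_score(h, xi, yi))
```

For a constant response every leave-one-out prediction is exact, so the criterion is zero at any h; with real data the criterion trades off bias (large h) against variance (small h).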
Example – Engel curve

Introduction
Food expenditures vs. net income in the U.K., 1973:
Estimation
bandwidth selection by cross validation
Binary choice
[Figure: cross-validation criterion over h with optimal h = 0.12884 (quartic kernel, h-range 0.075–0.4, binwidth 0.01, 100 points), and the resulting Engel curve fit; x-axis X, y-axis Y]
Econometrics Slide 119
Asymptotics – consistency

Introduction
Provided that kernel K and density f satisfy additionally
Estimation

Binary choice • |x|K(x) → 0 as x → ∞


Density estimation • supx∈R |K(x)| < ∞
Regression est.
Cond. moments
• ∫ K^{2+δ}(x) dx < ∞

Nonpar. regression
Various estimators
Local linear reg. • f (x) > 0
Local polynomial
Example
• E(εᵢ^{2+δ}) < ∞
Assumptions
Nadaraya-Watson
Local linear reg.
The Nadaraya-Watson estimator is
Bandwidth choice
Cross validation • weakly pointwise consistent: limn→∞ m̂(x) = m(x)
⊲ Asymptotics
Confidence intervals • uniformly strongly consistent (under stronger assumptions)
Testing
Examples
Multivariate reg. lim sup |m̂(x) − m(x)| = 0
Semiparametrics x
Discrete choice
(nh2h → 0, uniform continuity of f and m, ...)
Censored data

Final thoughts
Econometrics Slide 120
Asymptotics – asymptotic distribution

Introduction
The Nadaraya-Watson estimator is (hn → 0 and nhn → ∞)
Estimation

Binary choice • asymptotically normal (pointwise):
Density estimation
Regression est.
Cond. moments
√(nh) {m̂_h(x) − E m̂_h(x)} → N( 0, (σ²(x)/f_x(x)) ‖K‖² )
Nonpar. regression
Various estimators
Local linear reg.
Local polynomial
(the same applies to √(nh)(m̂_h − m) if √(nh) h² → 0, which does not hold for h_opt, but does under undersmoothing)
Example
Assumptions • asymptotics of m̂h − m under h = cn−1/5
Nadaraya-Watson
Local linear reg.
Bandwidth choice
Cross validation
⊲ Asymptotics
Confidence intervals
Testing
Examples
Multivariate reg.

√(nh) {m̂_h(x) − m(x)} → N( c² μ₂(K) { m″(x)/2 + m′(x) f_x′(x)/f_x(x) }, σ²(x)‖K‖²/(c f_x(x)) )

Semiparametrics
• optimal rate of convergence: hopt ∼ n−1/5 ⇒ nh ∼ n2/5
Discrete choice

Censored data
• local linear estimator – analogous
Final thoughts
Econometrics Slide 121
Confidence intervals

Introduction
Asymptotic normality ⇒ pointwise confidence intervals
Estimation

Binary choice
• asymptotic confidence interval
Density estimation
Regression est.
Cond. moments
Nonpar. regression

m̂(x) ± Φ⁻¹(1 − α/2)/√(nh) · [ (σ̂²(x)/f̂(x)) ∫ K²(x) dx ]^{1/2}
Various estimators
Local linear reg.

under undersmoothing (h ∼ n−1/5−δ , δ > 0)


Local polynomial
Example
Assumptions
Nadaraya-Watson
• finite samples
Local linear reg.
Bandwidth choice
Cross validation
◦ undersmooth (see above)
Asymptotics
⊲ Confidence intervals
◦ estimate bias (difficult in small samples)
Testing
Examples
Multivariate reg.
• confidence bands – analogous to density estimation
Semiparametrics

Discrete choice

Censored data

Final thoughts
Econometrics Slide 122
Testing

Introduction
Testing H₀: m = g vs. H₁: m ≠ g for a known g(x, θ)
Estimation

Binary choice • pointwise tests: select xt1 , . . . , xtd , compute


Density estimation
Regression est.
Cond. moments
Nonpar. regression
Various estimators

T_j = √(nh) v_j⁻¹ [m̂(x_{t_j}) − g(x_{t_j}, θ̂)]² ∼ N(0, 1),
and use max_{j=1,…,d} T_j or the average T̄ = Σ_{j=1}^d T_j/d
Local linear reg.
Local polynomial (Tj asymptotically independent)
Example
Assumptions • conditional moment tests E[εi |xi ] = 0
Nadaraya-Watson
Local linear reg. (asymptotically normal test statistics)
Bandwidth choice
Cross validation
◦ ⇒ E{h(xi )E[εi |xi ]} = 0:
Asymptotics
Confidence intervals
⊲ Testing
Examples
Multivariate reg.
Semiparametrics
ρ̂ = Σ_{i=1}^n {m̂(xᵢ) − g(xᵢ, θ̂)}{yᵢ − g(xᵢ, θ̂)}/n
◦ ⇒ E{E[εᵢ|xᵢ]}² = 0:
ρ̃ = Σ_{i=1}^n f̂_x(xᵢ) Ê(ε̂ᵢ|xᵢ){yᵢ − g(xᵢ, θ̂)}/n
Discrete choice

Censored data
• many other possibilities
Final thoughts
Econometrics Slide 123
Example: CPS 1985

Introduction
Income-experience profile (USA, CPS 1985)
Estimation
(education fixed at 12 years)
Binary choice

Density estimation
LS (solid) and NW (dashed) fits at educ = 12
[Figure: x-axis Experience (0–40), y-axis Log Income]
Final thoughts
Econometrics Slide 124
Example: coronary heart disease

Introduction
probit chd age
Estimation
predict prob, pr
Binary choice
lpoly chd age, ci degree(1) addplot((line prob age, sort))
Density estimation

Regression est.

Evidence of coronary heart disease (1=yes, 0=no)
[Figure: local linear smooth with 95% CI and the probit fit overlaid; x-axis Age (in years); kernel = epanechnikov, degree = 1, bandwidth = 4.49, pwidth = 6.73]
Censored data

Final thoughts
Econometrics Slide 125
Multivariate regression

Introduction
Regression function E(y|x) = E(y|x1 , . . . , xd ).
Estimation

Binary choice • Nadaraya-Watson estimator:


Density estimation
Regression est.
Cond. moments

m̂_H(x) = Σ_{i=1}^n K{H⁻¹(xᵢ − x)} yᵢ / Σ_{j=1}^n K{H⁻¹(xⱼ − x)}
Nonpar. regression
Various estimators
Local linear reg.
Local polynomial
• Local linear regression: m̂_H(x) = β̂_{H,0}(x)
Example
Assumptions
Nadaraya-Watson
Local linear reg.
Bandwidth choice
Cross validation

β̂_H(x) = arg min_β Σ_{i=1}^n {yᵢ − β⊤(xᵢ − x)}² K{H⁻¹(xᵢ − x)}
Asymptotics
Confidence intervals
Testing
• Asymptotic mean square error for H = diag(h)
Examples
⊲ Multivariate reg.
Semiparametrics

MSE[m̂(x)] = C₁/(nh^d) + C₂h⁴
Discrete choice
• Curse of dimensionality: √
Censored data

Final thoughts
hopt ∼ n−1/(4+d) ⇒ AM SE ∼ n−4/(4+d) , nh ∼ n−2/(4+d)
Econometrics Slide 126
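The multivariate Nadaraya-Watson estimator above can be sketched in a few lines of Python (illustrative only — the product Gaussian kernel, the bandwidth h = 0.2, and the data-generating process are arbitrary choices, not taken from the slides):

```python
import numpy as np

def nw_multivariate(x0, X, y, h):
    # Nadaraya-Watson estimate of m(x0) = E(y|x = x0) with a product
    # Gaussian kernel and diagonal bandwidth matrix H = diag(h)
    u = (X - x0) / h                          # rows are H^{-1}(x_i - x0)
    w = np.exp(-0.5 * np.sum(u**2, axis=1))   # product Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = X[:, 0]**2 + X[:, 1] + rng.normal(0, 0.1, size=500)
m_hat = nw_multivariate(np.zeros(2), X, y, h=np.array([0.2, 0.2]))
# true m(0, 0) = 0 in this design
```

The curse of dimensionality shows up here directly: with d = 2 regressors, far fewer observations fall inside the kernel window than in the univariate case, so a larger h (and hence more bias) is needed for the same variance.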
Example: CPS 1985

Income as a function of education and experience
and Mincer's equation (USA, CPS 1985)
(education = 2 to 18 years, experience = 0 to 55 years)

[Figure: two wage surfaces — the nonparametric fit Wage = m(Education, Experience) and the parametric fit Wage = a + b*Educ + c*Exp + d*Exp^2.]

Econometrics Slide 127
Introduction

Estimation

Binary choice

Density estimation

Regression est.

⊲ Semiparametrics
Semiparametric LS
Klein and Spady
Example
Semiparametric regression
Average derivative
PLM estimation
Heteroscedasticity
Application

Discrete choice

Censored data

Final thoughts

Econometrics Slide 128


Semiparametric LS

Semiparametric least squares: Ichimura (1993)

Nonlinear least squares for E(Yi|Xi) = g(Xi⊤β) with g known:

      min_{β∈B} Σ_{i=1}^n {Yi − g(Xi⊤β)}²

Semiparametric least squares: g is unknown

• estimate g(Xi⊤β) = E(Yi|Xi⊤β) = E[g(Xi⊤β)|Xi⊤β] using a leave-one-out estimator

      ĝn(Xi⊤β) = Ê_{−i,n}(Yi|Xi⊤β)
               = Σ_{j≠i} Yj K{(Xj − Xi)⊤β / hn} / Σ_{j≠i} K{(Xj − Xi)⊤β / hn}

• minimize or solve the first-order conditions to get β̂n

      min_{β∈B} Σ_{i=1}^n {Yi − ĝn(Xi⊤β)}²

Econometrics Slide 129
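The two steps above — a leave-one-out kernel fit on the index and a least-squares search over β — can be sketched in Python under the normalization β = (1, b). Everything below (the cubic link, the Gaussian kernel, the bandwidth h = 0.3, and the crude grid search in place of a proper optimizer) is an illustrative assumption, not Ichimura's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
v = X @ np.array([1.0, 0.5])                # single index, beta = (1, 0.5)
Y = v**3 + 0.1 * rng.normal(size=n)         # "unknown" link g(v) = v^3

def loo_nw(vb, h):
    # leave-one-out Nadaraya-Watson fit of Y on the scalar index vb
    K = np.exp(-0.5 * ((vb[:, None] - vb[None, :]) / h)**2)
    np.fill_diagonal(K, 0.0)                # drop observation i
    return K @ Y / K.sum(axis=1)

def sls_objective(b, h=0.3):
    # normalization |beta_1| = 1: candidate beta = (1, b)
    vb = X[:, 0] + b * X[:, 1]
    return np.mean((Y - loo_nw(vb, h))**2)

grid = np.linspace(0.0, 1.0, 41)
b_hat = grid[np.argmin([sls_objective(b) for b in grid])]
```

At the true b the leave-one-out fit explains Y up to noise, while at a wrong b the outcome is no longer a function of the candidate index, which inflates the objective.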


Semiparametric LS

Let Yi = g(Xi⊤β) + εi, where Xi does not contain an intercept, and

• β ∈ B compact and |β1| = 1
• g and K be three times (Lipschitz) differentiable wrt. z = x⊤β
• Yi has finite k-th moments, k ≥ 3, and cov(Yi|Xi) is uniformly bounded
• ln hn / [n hn^{3+3/(k−1)}] → 0 and n hn⁸ → 0

Then the semiparametric LS estimator is consistent and

      √n (β̂n − β) →d N(0, V^{−1} Σ V^{−1})

• Σ = E[E(εi²|Xi) · {g′(Xi⊤β)}² {Xi − E(Xi|Xi⊤β)}{Xi − E(Xi|Xi⊤β)}⊤]
• V = E[{g′(Xi⊤β)}² {Xi − E(Xi|Xi⊤β)}{Xi − E(Xi|Xi⊤β)}⊤]
• semiparametrically efficient
• under heteroscedasticity, weighting can be employed

Econometrics Slide 130
Klein and Spady

Klein and Spady (1993): estimate F and maximize the likelihood based on the estimated distribution function F̂n

• binary response: F(Xi⊤β) = P(Yi = 1|Xi⊤β) = E(Yi|Xi⊤β) and leave-one-out estimator

      F̂_{−i,n}(x) = Σ_{yj=1, j≠i} K{(Xj − x)⊤β / hn} / Σ_{j≠i} K{(Xj − x)⊤β / hn}

• maximize the likelihood (ξi is a trimming indicator)

      Σ_{i=1}^n ξi [Yi ln F̂_{−i,n}(Xi⊤β) + (1 − Yi) ln{1 − F̂_{−i,n}(Xi⊤β)}]

  or solve the first-order conditions

Econometrics Slide 131
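The semiparametric likelihood above can be profiled over β in the same leave-one-out fashion as for semiparametric LS. The sketch below is illustrative only: the logit DGP, Gaussian kernel, bandwidth h = 0.3, grid search, and clipping (standing in for the trimming indicator ξi) are all assumptions, not the Klein-Spady implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 2))
v = X @ np.array([1.0, 0.5])                    # true index, beta = (1, 0.5)
Y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-v))).astype(float)

def loo_prob(vb, h):
    # leave-one-out kernel estimate of F(v_i) = P(Y = 1 | index = v_i)
    K = np.exp(-0.5 * ((vb[:, None] - vb[None, :]) / h)**2)
    np.fill_diagonal(K, 0.0)
    return K @ Y / K.sum(axis=1)

def ks_loglik(b, h=0.3):
    # normalization beta = (1, b); clipping plays the role of trimming
    p = np.clip(loo_prob(X[:, 0] + b * X[:, 1], h), 1e-3, 1 - 1e-3)
    return np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

grid = np.linspace(0.0, 1.0, 41)
b_hat = grid[np.argmax([ks_loglik(b) for b in grid])]
```

Unlike a probit or logit fit, nothing about the shape of F was used in estimating b — F itself is recovered from the data along the way.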


Klein and Spady

Klein and Spady (1993)'s estimator, under the assumptions

• data are iid, β ∈ B compact
• P(β) = P(Yi = 1|Xi⊤β) = P(Yi = 1|Xi) ∈ (a, b), where 0 < a and b < 1
• P(Yi = 1|Xi⊤β = t) continuously differentiable in t
• n^{−1/6} < hn < n^{−1/8} and a higher-order kernel is used

is

• consistent and asymptotically normal

      √n (β̂n − β) →d N(0, E[ (∂P(β)/∂β)(∂P(β)/∂β⊤) / {P(β)[1 − P(β)]} ]^{−1})

• semiparametrically efficient
• (parametrically) efficient if E(Xi|Xi⊤β) = c0 + c1(Xi⊤β)

Econometrics Slide 132


Example: labor force participation

Introduction
Married women labor force participation (Mroz, 1987):
Estimation

Binary choice
• binary decision = choice to work (1) or to stay at home (0)
Density estimation • explanatory variables
Regression est.

Semiparametrics
◦ non-wife household income
Semiparametric LS
Klein and Spady
◦ age
⊲ Example
Average derivative
◦ education
PLM
Heteroscedasticity
◦ labor market experience and its square
Application
◦ number of children (below and above 6)
Discrete choice

Censored data

Final thoughts

Econometrics Slide 133


Example: labor force participation

Introduction
Married women labor force participation (Mroz, 1987) – estimates:
Estimation

Binary choice

Density estimation
Stata: regress probit sml
Regression est.
Linear Probit KS
Semiparametrics
Semiparametric LS --------------------------------------------------
Klein and Spady
⊲ Example inlf | Coef. SE | Coef. SE | Coef. SE
Average derivative
PLM
---------+-------------+-------------+------------
Heteroscedasticity nwifeinc | -.011 .004 | -.014 .006 | -.015 .001
Application

Discrete choice
educ | .141 .027 | .151 .029 | .129 .011
Censored data
exper | .382 .019 | .142 .022 | .243 ---
Final thoughts
expersq | -.008 .000 | -.002 .001 | -.005 .000
age | -.061 .008 | -.060 .010 | -.058 .002
kidslt6 | -1 .126 | -1 .137 | -1 .025
kidsge6 | .050 .049 | .041 .050 | .052 .009
_cons | 2.237 .588 | .311 .585 | --- ---
---------+-------------+-------------+------------
Econometrics Slide 134
Example: labor force participation

probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
predict pindex, xb
predict prob, p
lpoly inlf pindex, ci addplot((line prob pindex, sort))

[Figure: local polynomial smooth of "Y=1 if in labor force, 1975" against the linear prediction, with 95% CI, lpoly smooth, and probit Pr(inlf); kernel = epanechnikov, degree = 0, bandwidth = .49, pwidth = .73.]

Econometrics Slide 135


Example: labor force participation

probit inlf nwifeinc educ exper expersq age kidslt6 kidsge6
predict pindex, xb
predict prob, p
lpoly inlf pindex, ci degree(1) addplot((line prob pindex, sort))

[Figure: local polynomial smooth of "Y=1 if in labor force, 1975" against the linear prediction, with 95% CI, lpoly smooth, and probit Pr(inlf); kernel = epanechnikov, degree = 1, bandwidth = .49, pwidth = .73.]

Econometrics Slide 136


Average derivative

Direct estimation of the single-index model E(Yi|Xi) = g(Xi⊤β)
(Powell, Stock, and Stoker, 1989; Härdle and Stoker, 1989)

• moment condition for m(x) = E(Yi|Xi = x)

      m′(x) = ∂g(x⊤β)/∂x = g′(x⊤β) β  ⇒  E[m′(Xi)] = γβ

• integration by parts (f denotes the density of Xi)

      E[m′(Xi)] = ∫ m′(x) f(x) dx = −∫ m(x) f′(x) dx = −E[Yi f′(Xi)/f(Xi)]

• estimate f and f′ by the kernel density estimator and set (an ↓ 0)

      γβ̂n = −Σ_{i=1}^n [Yi f̂n′(Xi) / {n f̂n(Xi)}] · I[f̂n(Xi) > an]

Econometrics Slide 137
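The plug-in estimator above can be sketched in Python. Since γβ identifies β only up to the scale γ, the sketch checks the ratio of the two components. The tanh link, Gaussian kernel, bandwidth rule h = n^{−1/6}, and 5%-quantile trimming are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 1000, 2
X = rng.normal(size=(n, d))
Y = np.tanh(X @ np.array([1.0, 0.5])) + 0.1 * rng.normal(size=n)

h = n**(-1.0 / 6.0)                           # illustrative bandwidth
U = (X[:, None, :] - X[None, :, :]) / h       # (x_i - x_j)/h
Kw = np.exp(-0.5 * np.sum(U**2, axis=2))      # product Gaussian kernel
np.fill_diagonal(Kw, 0.0)                     # leave-one-out
dens = Kw.sum(axis=1)                         # proportional to f_hat(x_i)
grad = -(U * Kw[:, :, None]).sum(axis=1) / h  # proportional to f_hat'(x_i)

keep = dens > np.quantile(dens, 0.05)         # trimming I[f_hat > a_n]
delta = -np.mean(Y[keep, None] * grad[keep] / dens[keep, None], axis=0)
b_hat = delta[1] / delta[0]                   # beta_2/beta_1, scale-free
```

The normalizing constants of the kernel cancel in the ratio f̂′/f̂, which is why the code only needs quantities proportional to the density and its gradient.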


Average derivative

Average derivative estimation of E(Yi|Xi) = g(Xi⊤β) under

• Xi ∈ R^p has a convex support with ⌈p/2 + 2⌉ times differentiable density f
• m′(Xi) and f′(Xi) are Lipschitz and have finite second moments
• other regularity assumptions (on the kernel K etc.)

is consistent and asymptotically normal:

      √n (β̂n − β) →d N(0, Ω)

• Ω = 4E[R(Yi, Xi) R(Yi, Xi)⊤] − 16E[Yi f′(Xi)] E[Yi f′(Xi)]⊤
• R(Yi, Xi) = f(Xi) m′(Xi) − {Yi − m(Xi)} f′(Xi)

Econometrics Slide 138


Other applications – partially linear models

Introduction
Partially linear models (Robinson, 1988, ECO 56, 931–954)
Estimation

Binary choice
• linear regression with one variable Zi entering nonparametrically
Density estimation

Regression est.
Yi = Xi⊤ β +g(Zi )+εi , E(εi |Xi , Zi ) = 0, g:R→R
Semiparametrics
Semiparametric LS Extensions (cf. Tobit and sample selection models)
Klein and Spady
Example
Average derivative
• partially linear single-index model (Xia, Tong, and Li, 1999)
⊲ PLM
Heteroscedasticity
Application
Yi = Xi⊤ β + g(Zi⊤ γ) + εi
Discrete choice
• generalized partially linear models (Carroll et al., 1997)
Censored data

Final thoughts
E(Yi |Xi ) = F (Xi⊤ β + g(Zi ))

e.g., for logit with F ≡ Λ, E(Yi|Xi) = P(Yi = 1|Xi) and

      ln[P(Yi = 1|Xi) / P(Yi = 0|Xi)] = Xi⊤β + g(Zi)
Econometrics Slide 139
Other applications – partially linear models

Model Yi = Xi⊤β + g(Zi) + εi (no constant in Xi) implies

• E(Yi|Zi) = E(Xi|Zi)⊤β + g(Zi) + E(εi|Zi) and

      Yi − E(Yi|Zi) = {Xi − E(Xi|Zi)}⊤β + {εi − E(εi|Zi)}

Estimation (involves p + 1 nonparametric regressions)

• estimate μyi = μy(Zi) = E(Yi|Zi) and μxi = μx(Zi) = E(Xi|Zi)
• estimate β by the least squares estimator β̂n

      β̂n = [Σ_{i=1}^n (Xi − μ̂xi)(Xi − μ̂xi)⊤]^{−1} [Σ_{i=1}^n (Xi − μ̂xi)(Yi − μ̂yi)]

• estimate the function g nonparametrically:
  ◦ set ĝn(Zi) = μ̂yi − μ̂xi⊤β̂n, or
  ◦ nonparametrically regress Yi − Xi⊤β̂n on Zi

Econometrics Slide 140
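Robinson's double-residual recipe above — partial Y and X out of Z nonparametrically, then run OLS on the residuals — takes only a few lines. The DGP (β = 2, g = sin), the Gaussian kernel, and the bandwidth h = 0.3 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
Z = rng.uniform(-2, 2, size=n)
X = Z + rng.normal(size=n)                    # X correlated with Z
Y = 2.0 * X + np.sin(Z) + 0.2 * rng.normal(size=n)   # beta = 2, g = sin

def nw(z0, z, t, h=0.3):
    # Nadaraya-Watson fit of t on z, evaluated at the points z0
    K = np.exp(-0.5 * ((z0[:, None] - z[None, :]) / h)**2)
    return K @ t / K.sum(axis=1)

mu_y = nw(Z, Z, Y)                            # estimate of E(Y|Z_i)
mu_x = nw(Z, Z, X)                            # estimate of E(X|Z_i)
ey, ex = Y - mu_y, X - mu_x
beta_hat = np.sum(ex * ey) / np.sum(ex * ex)  # OLS of residual on residual
g_hat = mu_y - mu_x * beta_hat                # plug-in estimate of g(Z_i)
```

A naive OLS of Y on X alone would be biased here because X and g(Z) are correlated through Z; partialling Z out first removes that channel.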
Other applications – heteroscedastic linear regression

Linear regression Yi = Xi⊤β + εi with var(εi|Xi) = σ²(Xi).

Generalized LS (GLS) solves Σ_{i=1}^n σ^{−2}(Xi) Xi (Yi − Xi⊤β) = 0:

      β̂n^{GLS} = [Σ_{i=1}^n Xi Xi⊤ / σ²(Xi)]^{−1} [Σ_{i=1}^n Xi Yi / σ²(Xi)]

Heteroscedasticity of

• known form: σ²(Xi) = exp(Xi⊤γ) and estimate γ̂n
• unknown form: nonparametric estimate σ̂n²(Xi) of σ²(Xi)
  ◦ Robinson (1987) ECO 55(4), 875–891: compute ei = Yi − Xi⊤β̂^{OLS} and
    nonparametrically estimate σ²(x) = E(εi²|Xi = x) using the squared residuals ei²
  ◦ alternative – fully nonparametric estimation: compute ẽi = Yi − Ê(Yi|Xi) and
    nonparametrically estimate σ²(Xi) = E(εi²|Xi) using the squared residuals ẽi²

Econometrics Slide 141
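Robinson's two-step feasible GLS above can be sketched as: OLS residuals, a kernel regression of the squared residuals on x, then weighted least squares. The linear variance function, Gaussian kernel, and bandwidth h = 0.2 below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 2, size=n)
sd = 0.2 + 0.5 * x                              # error sd depends on x
y = 1.0 + 2.0 * x + sd * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ b_ols)**2                         # squared OLS residuals

# step 1: kernel regression of e2 on x estimates sigma^2(x) = E(eps^2|x)
h = 0.2
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h)**2)
s2_hat = K @ e2 / K.sum(axis=1)

# step 2: feasible GLS = weighted LS with weights 1/sigma^2_hat(x_i)
W = 1.0 / s2_hat
b_fgls = np.linalg.solve((X * W[:, None]).T @ X, (X * W[:, None]).T @ y)
```

The weights downweight the noisy high-x observations, which is exactly what the infeasible GLS with known σ²(x) would do.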
Application

Introduction
Lehrer and Kordas (2013) Matching using semiparametric propensity
Estimation
scores. Empirical Economics 44, 13–45.
Binary choice

Density estimation • look at impact of job training programs


Regression est. • study (simulations) and analyze (application) propensity score
Semiparametrics
matching
Semiparametric LS
Klein and Spady
• matching by probit, Klein-Spady, and maximum score estimators
Example
Average derivative
PLM
Heteroscedasticity
⊲ Application

Discrete choice

Censored data

Final thoughts

Econometrics Slide 142


Introduction

Estimation

Binary choice

Density estimation

Regression est.

Semiparametrics

⊲ Discrete choice
Introduction
Ordered models Discrete choice models
Example
Specification tests
Semiparametrics
Application
Multinomial models
Latent model
Multinomial logit
Latent model
Conditional logit
Multinomial probit
Hierarchy
Semiparametrics

Censored data

Final thoughts

Econometrics Slide 143


Multinomial and ordered response models

Introduction
Data with a discrete response with more than two values
Estimation

Binary choice
• multiple discrete responses yi = 0, 1, . . . , J
Density estimation
◦ ordered response (values not completely arbitrary)
Regression est.
(e.g., credit rating, preference, health plan choice, ...)
Semiparametrics

Discrete choice ◦ unordered (nominal) response


⊲ Introduction
Ordered models
(e.g., mode of transportation, choice of industry for
Example investment, brand choice, ...)
Specification tests
Semiparametrics
Application • explanatory variables xi
Multinomial models
Latent model • motivated by latent (utility-maximization) models
Multinomial logit
Latent model
Conditional logit
Multinomial probit
Hierarchy
Semiparametrics

Censored data

Final thoughts

Econometrics Slide 144


Ordered response models

Discrete response yi = 0, . . . , J, where the responses are ordered
(ratings, preferences for food, no/part-time/full-time job)

Ordered probit and logit models:

• latent (utility) model yi* = xi⊤β + εi, where εi ∼ F
  (e.g., F = N(0, 1) or F = Λ(0, 1))
• cut-off points α1 < α2 < . . . < αJ
• response

      yi = 0  if yi* ≤ α1
      yi = 1  if α1 < yi* ≤ α2
       ...
      yi = J  if αJ < yi*

• identification: no intercept in xi or α1 = 0, as
  α1 < xi⊤β ≤ α2 ⇔ α1 + γ < xi⊤β + γ ≤ α2 + γ
• (yi, xi)_{i=1}^n random sample, full-rank assumption satisfied

Econometrics Slide 145
Ordered response models

Ordered response models:

• probabilities (P(yi = j|xi) = P(αj−1 < xi⊤β + εi ≤ αj | xi))

      P(yi = 0|xi) = F(α1 − xi⊤β)
      P(yi = 1|xi) = F(α2 − xi⊤β) − F(α1 − xi⊤β)
       ...
      P(yi = J|xi) = 1 − F(αJ − xi⊤β)

• log-likelihood contributions (for MLE, FOC, variance, ...)

      l(wi, β) = I(yi = 0) · ln F(α1 − xi⊤β)
               + I(yi = 1) · ln[F(α2 − xi⊤β) − F(α1 − xi⊤β)]
               + ...
               + I(yi = J) · ln[1 − F(αJ − xi⊤β)]

Econometrics Slide 146
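The probabilities and log-likelihood above translate directly into code. A minimal ordered-probit sketch, with the true values β = 1, α = (−0.5, 0.5) and a deliberately crude grid search standing in for a proper Newton-type maximizer (all illustrative assumptions):

```python
import numpy as np
from math import erf

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(size=n)
ystar = 1.0 * x + rng.normal(size=n)                        # true beta = 1
y = (ystar > -0.5).astype(int) + (ystar > 0.5).astype(int)  # cut-offs -0.5, 0.5

_erf = np.vectorize(erf)
def Phi(t):
    # standard normal cdf, elementwise
    return 0.5 * (1.0 + _erf(t / np.sqrt(2.0)))

def loglik(b, a1, a2):
    # ordered probit log-likelihood, summed over i
    p0 = Phi(a1 - b * x)
    p2 = 1.0 - Phi(a2 - b * x)
    p1 = 1.0 - p0 - p2                    # = F(a2 - bx) - F(a1 - bx)
    p = np.where(y == 0, p0, np.where(y == 1, p1, p2))
    return np.sum(np.log(np.clip(p, 1e-12, None)))

# crude grid-search MLE over (beta, alpha1, alpha2) with alpha1 < alpha2
grid_b = np.linspace(0.5, 1.5, 11)
grid_a = np.linspace(-1.0, 1.0, 11)
best = max((loglik(b, a1, a2), b, a1, a2)
           for b in grid_b for a1 in grid_a for a2 in grid_a if a1 < a2)
_, b_hat, a1_hat, a2_hat = best
```

Restricting the search to a1 < a2 enforces the ordering of the cut-off points, which keeps every middle probability nonnegative.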


Ordered response models

Probit fit of yi = I(0.5 + xi + εi > −0.5) + I(0.5 + xi + εi > 1)
for εi ∼ N(0, 1) and n = 1000

[Figure: fitted probabilities Pr(y==0), Pr(y==1), Pr(y==2) and the response y plotted against the linear prediction (cutpoints excluded).]

Econometrics Slide 147


Ordered response models

Ordered response models:

• probabilities (P(yi = j|xi) = P(αj−1 < xi⊤β + εi ≤ αj | xi))

      P(yi = 0|xi) = F(α1 − xi⊤β)
      P(yi = 1|xi) = F(α2 − xi⊤β) − F(α1 − xi⊤β)
       ...
      P(yi = J|xi) = 1 − F(αJ − xi⊤β)

• marginal effects (note the signs of the effects of the middle terms!)

      ∂P(yi = 0|xi)/∂xik = −βk f(α1 − xi⊤β)
      ∂P(yi = 1|xi)/∂xik = βk [f(α1 − xi⊤β) − f(α2 − xi⊤β)]
       ...
      ∂P(yi = J|xi)/∂xik = βk f(αJ − xi⊤β)

Econometrics Slide 148


Example: asset allocation

Introduction
Pension-plan decision of adults
Estimation
(mostly bonds = 0, mixed = 1, mostly stocks = 2)
Binary choice

Density estimation Explanatory variables


Regression est.
• ability to choose pension investment scheme
Semiparametrics

Discrete choice
• profit-sharing plan
Introduction
Ordered models
• age
⊲ Example
Specification tests
• education
Semiparametrics
Application
• gender
Multinomial models
Latent model
• race
Multinomial logit
Latent model
• marital status
Conditional logit
Multinomial probit
Hierarchy
Semiparametrics

Censored data

Final thoughts

Econometrics Slide 149


Example: asset allocation

Introduction

Estimation oprobit pctstck choice prftshr female married


Binary choice age educ black
Density estimation

Regression est. ---------------------------------------------------


Semiparametrics res | Coef. Std. Err. z P>|z|
Discrete choice
Introduction
----------+----------------------------------------
Ordered models choice | .3064487 .1706318 1.80 0.073
⊲ Example
Specification tests prftshr | .5070566 .2066888 2.45 0.014
Semiparametrics
Application female | .0332831 .1951634 0.17 0.865
Multinomial models
Latent model
married | .0133023 .2116247 0.06 0.950
Multinomial logit age | -.044364 .0206048 -2.15 0.031
Latent model
Conditional logit educ | .0257263 .0314588 0.82 0.413
Multinomial probit
Hierarchy
black | .1386562 .2634013 0.53 0.599
Semiparametrics
----------+----------------------------------------
Censored data
/cut1 | -2.444581 1.430851
Final thoughts
/cut2 | -1.428021 1.426402
Econometrics
---------------------------------------------------
Slide 150
Example: asset allocation

Introduction

Estimation margins, predict(outcome(#1)) dydx(age prftshr)


Binary choice margins, predict(outcome(#2)) dydx(age prftshr)
Density estimation margins, predict(outcome(#3)) dydx(age prftshr)
Regression est.

Semiparametrics ---------------------------------------------------
Discrete choice
Introduction
| Delta-method
Ordered models | dy/dx Std. Err. z P>|z|
⊲ Example
Specification tests ----------+----------------------------------------
Semiparametrics
Application 0 prftshr | -.1766166 .070157 -2.52 0.012
Multinomial models
Latent model
age | .0154528 .0070159 2.20 0.028
Multinomial logit ---------------------------------------------------
Latent model
Conditional logit 1 prftshr | .0097728 .0133563 0.73 0.464
Multinomial probit
Hierarchy
age | -.0008551 .0011694 -0.73 0.465
Semiparametrics
---------------------------------------------------
Censored data
2 prftshr | .1668438 .0662334 2.52 0.012
Final thoughts
age | -.0145977 .0066674 -2.19 0.029
Econometrics
---------------------------------------------------
Slide 151
Example: asset allocation

oprobit pctstck choice prftshr female married age educ black
predict ind, xb; predict p1, outcome(#1)
gen y1 = (y == 0)
lpoly y1 ind, ci degree(1) addplot((line p1 ind, sort))

[Figure: local polynomial smooth of I(y = 0) against the linear prediction (cutpoints excluded), with 95% CI, lpoly smooth, and Pr(y=0); kernel = epanechnikov, degree = 1, bandwidth = .37, pwidth = .56.]

Econometrics Slide 152


Example: asset allocation

oprobit pctstck choice prftshr female married age educ black
predict ind, xb; predict p2, outcome(#2)
gen y2 = (y == 1)
lpoly y2 ind, ci degree(1) addplot((line p2 ind, sort))

[Figure: local polynomial smooth of I(y = 1) against the linear prediction (cutpoints excluded), with 95% CI, lpoly smooth, and Pr(y=1); kernel = epanechnikov, degree = 1, bandwidth = .19, pwidth = .29.]

Econometrics Slide 153


Example: asset allocation

oprobit pctstck choice prftshr female married age educ black
predict ind, xb; predict p3, outcome(#3)
gen y3 = (y == 2)
lpoly y3 ind, ci degree(1) addplot((line p3 ind, sort))

[Figure: local polynomial smooth of I(y = 2) against the linear prediction (cutpoints excluded), with 95% CI, lpoly smooth, and Pr(y=2); kernel = epanechnikov, degree = 1, bandwidth = .19, pwidth = .29.]

Econometrics Slide 154


Example: asset allocation

Data plot with cut-off points

[Figure: responses y ∈ {0, 1, 2} plotted against the linear prediction (cutpoints excluded).]

Econometrics Slide 155


Specification tests

Possible problems with the model specification

• parallel regression assumption
  ◦ ordered choice model with constant slopes (probit)

        P(yi ≤ j|xi) = F(αj − xi⊤β)

  ◦ ordered choice model with varying slopes (probit)

        P(yi ≤ j|xi) = F(αj − xi⊤βj)

    might indicate a sequential probit (e.g.,
    P(Yi = 2|Xi) = P(Yi = 2|Yi = 1, Xi) · P(Yi = 1|Xi))
• heteroscedasticity (similar to probit/logit)
• distributional assumptions (similar to probit/logit)

Econometrics Slide 156


Semiparametric alternatives

Single-index model E(yi|xi) = g(xi⊤β) applies,

      E(yi|xi) = Σ_{j=0}^J j · P(yi = j|xi⊤β) = g(xi⊤β),

but the thresholds are in general not identified unless the intercept is:

      I(yi ≥ k) = I(yi* > αk−1) = I(xi⊤β + εi − αk−1 > 0)

• maximum score estimator identifies the intercept

      β̂n^{MSE} = arg min_β (1/n) Σ_{i=1}^n Σ_{j=0}^J |yi − I(xi⊤β > αj−1)|

• Klein & Spady under symmetry of F (Chen, 1999)

      P(yi = 1|xi⊤β = t) = F(t) = 1 − F(−t) = 1 − P(yi = 1|xi⊤β = −t)

Econometrics Slide 157


Application: market entry

Introduction
Bresnahan and Reiss (1991) Entry and competition in concentrated
Estimation
markets. The Journal of Political Economy 99, 977–1009.
Binary choice

Density estimation • study the number of firms in a market given its size and
Regression est. competition
Semiparametrics
• analyze 202 geographically isolated markets (dentists, plumbers,
Discrete choice
Introduction
electricians etc. in county seat cities)
Ordered models
Example
Specification tests
Semiparametrics
⊲ Application
Multinomial models
Latent model
Multinomial logit
Latent model
Conditional logit
Multinomial probit
Hierarchy
Semiparametrics

Censored data

Final thoughts

Econometrics Slide 158


Multinomial and ordered response models

Introduction
Data with a discrete response with more than two values
Estimation

Binary choice
• multiple discrete responses yi = 0, 1, . . . , J
Density estimation
◦ unordered (nominal) response
Regression est.
(e.g., mode of transportation, choice of industry for
Semiparametrics

Discrete choice
investment, brand choice, ...)
Introduction
Ordered models
◦ ordered response (values not completely arbitrary)
Example (e.g., credit rating, preference, health plan choice, ...)
Specification tests
Semiparametrics
Application • explanatory variables xi
⊲ Multinomial models
Latent model • motivated by latent (utility-maximization) models
Multinomial logit
Latent model
Conditional logit
Multinomial probit
Hierarchy
Semiparametrics

Censored data

Final thoughts

Econometrics Slide 159


Latent model

Deriving the multinomial logit model from a latent utility maximization
(McFadden, 1974), where each choice j has its own coefficient βj:

      yij* = xij⊤βj + εij,   j = 0, . . . , J

• utility maximization yi = arg max_{j=0,...,J} yij* implies

      P(yi = j|xi0, . . . , xiJ) = P(yij* > yik* for all k ≠ j | xi0, . . . , xiJ)

• assuming the type I extreme value (Gumbel) distribution
  εij ∼ F(t) = exp(− exp(−t)), it follows that

      P(yi = j|xi0, . . . , xiJ) = exp(xij⊤βj) / Σ_{l=0}^J exp(xil⊤βl)

• coefficients βj or values xij have to vary across choices
  (Σ exp(. . . + xi⊤β)/Σ exp(. . . + xi⊤β) would reduce to Σ exp(. . .)/Σ exp(. . .))

Econometrics Slide 160
Multinomial logit model

Multinomial choice model

      P(yi = j|xi0, . . . , xiJ) = exp(xij⊤βj) / Σ_{l=0}^J exp(xil⊤βl)

• coefficients βj or values xij have to vary across choices

Consider a multinomial response model using characteristics xi of the
decision makers (independent of the choice j). Multinomial logit model:

      P(yi = j|xi) = exp(xi⊤βj) / [1 + Σ_{l=1}^J exp(xi⊤βl)],   j = 1, . . . , J
      P(yi = 0|xi) = 1 / [1 + Σ_{l=1}^J exp(xi⊤βl)]

• normalization β0 = (0, . . . , 0)⊤, as (β0, . . . , βJ) + (γ, . . . , γ)
  generates equal probabilities irrespective of the value of γ
  (Σ exp(. . . + xi⊤γ)/Σ exp(. . . + xi⊤γ) reduces to Σ exp(. . .)/Σ exp(. . .))

Econometrics Slide 161
Multinomial logit model

Multinomial logit model:

      P(yi = j|xi) = exp(xi⊤βj) / [1 + Σ_{l=1}^J exp(xi⊤βl)],   j = 1, . . . , J
      P(yi = 0|xi) = 1 / [1 + Σ_{l=1}^J exp(xi⊤βl)]

• contributions to the log-likelihood function ln Ln(β) = Σ_{i=1}^n li(β):

      li(β) = Σ_{j=0}^J I(yi = j) log P(yi = j|xi)

• note: reduction to the binary logit (P(j|{j, k}) = P(j)/P({j, k}))

      P(yi = j|yi ∈ {j, k}, xi) = exp(xi⊤βj) / [exp(xi⊤βj) + exp(xi⊤βk)]
                                = {1 + exp[−xi⊤(βj − βk)]}^{−1} = Λ{xi⊤(βj − βk)}

  (independence from irrelevant alternatives)

Econometrics Slide 162
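The probabilities and log-likelihood contributions above can be sketched in Python, with β0 normalized to zero. The DGP with three choices, an intercept plus one regressor, and plain gradient ascent in place of Newton's method are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
n, J = 2000, 2                                   # choices 0, 1, 2; beta_0 = 0
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # (1, x_i)
B_true = np.array([[0.5, 1.0],                   # beta_1
                   [-0.5, 1.5]])                 # beta_2

def probs(B):
    # P(y_i = j | x_i) for j = 0..J under the beta_0 = 0 normalization
    eta = np.column_stack([np.zeros(n), X @ B.T])
    e = np.exp(eta - eta.max(axis=1, keepdims=True))   # stable softmax
    return e / e.sum(axis=1, keepdims=True)

y = np.array([rng.choice(J + 1, p=p) for p in probs(B_true)])
Y = np.eye(J + 1)[y]                             # one-hot I(y_i = j)

# ML by gradient ascent; the score for beta_j is sum_i (I(y_i=j) - p_ij) x_i
B = np.zeros((J, 2))
for _ in range(2000):
    G = (Y - probs(B))[:, 1:]                    # residuals for choices 1..J
    B += 0.5 / n * G.T @ X
```

Subtracting the row maximum before exponentiating is the usual numerical safeguard: the probabilities are unchanged, but overflow in exp is avoided.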
Multinomial logit model – interpretation

Multinomial logit (here j ∈ {1, . . . , J}):

• probabilities

      pj(xi) = P(yi = j|xi) = exp(xi⊤βj) / [1 + Σ_{l=1}^J exp(xi⊤βl)]

• partial effects

      ∂P(yi = j|xi)/∂xik = P(yi = j|xi) {βjk − Σ_{l=1}^J βlk exp(xi⊤βl) / [1 + Σ_{l=1}^J exp(xi⊤βl)]}

• (simpler) interpretation of partial effects via

      P(yi = j|xi)/P(yi = 0|xi) = exp(xi⊤βj)  ⇒  ∂[pj(xi)/p0(xi)]/∂xik = exp(xi⊤βj) βjk

Econometrics Slide 163


Example: school and employment decision

Introduction
Employment and schooling decisions of young men
Estimation
(school = 1, home = 2, work = 3)
Binary choice

Density estimation Explanatory variables: characterize individuals


Regression est.
• education
Semiparametrics

Discrete choice
• work experience
Introduction
Ordered models
• race
Example
Specification tests
Semiparametrics
Application
Multinomial models
Latent model
⊲ Multinomial logit
Latent model
Conditional logit
Multinomial probit
Hierarchy
Semiparametrics

Censored data

Final thoughts

Econometrics Slide 164


Example: school and employment decision

Introduction

Estimation
Multinomial logistic regression
Binary choice
------------------------------------------------------
Density estimation
status | Coef. Std. Err. z P>|z|
-------------+----------------------------------------
Regression est.
1 | (base outcome)
Semiparametrics
-------------+----------------------------------------
Discrete choice
Introduction 2 educ | -.6736313 .0698999 -9.64 0.000
Ordered models exper | -.1062149 .173282 -0.61 0.540
Example
Specification tests expersq | -.0125152 .0252291 -0.50 0.620
Semiparametrics
Application
black | .8130166 .3027231 2.69 0.007
Multinomial models _cons | 10.27787 1.133336 9.07 0.000
Latent model
⊲ Multinomial logit
-------------+----------------------------------------
Latent model 3 educ | -.3146573 .0651096 -4.83 0.000
Conditional logit
Multinomial probit
exper | .8487367 .1569856 5.41 0.000
Hierarchy expersq | -.0773003 .0229217 -3.37 0.001
Semiparametrics
black | .3113612 .2815339 1.11 0.269
Censored data
_cons | 5.543798 1.086409 5.10 0.000
Final thoughts
------------------------------------------------------
Econometrics Slide 165
Example: school and employment decision

Introduction

Estimation Multinomial logistic regression


Binary choice Average marginal effects
Density estimation

Regression est. ---------------------------------------------------


Semiparametrics | Delta-method
Discrete choice
Introduction
| dy/dx Std. Err. z P>|z|
Ordered models ----------+----------------------------------------
Example
Specification tests 1 educ | .0173786 .0029057 5.98 0.000
Semiparametrics
Application exper | -.0313191 .0067664 -4.63 0.000
Multinomial models
Latent model
---------------------------------------------------
⊲ Multinomial logit 2 educ | -.0429471 .0030007 -14.31 0.000
Latent model
Conditional logit exper | -.1006127 .0096477 -10.43 0.000
Multinomial probit
Hierarchy
---------------------------------------------------
Semiparametrics
3 educ | .0255685 .0040739 6.28 0.000
Censored data
exper | .1319318 .0107027 12.33 0.000
Final thoughts
---------------------------------------------------
Econometrics Slide 166
Latent model – back to square one

Introduction
Deriving the multinomial logit model from a latent utility maximization
Estimation
(McFadden, 1974) , where each choice j ≥ 1 has own coefficient βj
Binary choice

Density estimation ∗
yij = x⊤
ij βj + εij j = 0, . . . , J
Regression est.

Semiparametrics
∗ implies
• utility maximization yi = arg maxj=0,...,J yij
Discrete choice
Introduction
Ordered models
Example P (yi = j|xi0 , . . . xiJ ) = P (yij > yik for all k 6= j|xi0 , . . . xiJ )
Specification tests
Semiparametrics
Application • assuming type I extreme value (Gumbel) distribution
Multinomial models
Latent model εij ∼ F (t) = exp(− exp(−t)), it follows
Multinomial logit
⊲ Latent model
Conditional logit
Multinomial probit P (yi = j|xi0 , . . . , xiJ ) = exp(x⊤ij βj ) / ΣJl=0 exp(x⊤il βl )
Hierarchy
Semiparametrics

Censored data
• coefficients βj or values xij have to vary across choices
Final thoughts
(exp(. . . + x⊤i β)/ Σ exp(. . . + x⊤i β) = exp(. . .)/ Σ exp(. . .))
Econometrics Slide 167
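The choice probabilities above are a softmax of the latent utilities. A minimal numerical sketch (illustrative Python, not part of the course's Stata material; the coefficient values are hypothetical):

```python
import math

def mnl_probs(x, betas):
    """Multinomial logit probabilities: softmax of the utilities x'beta_j."""
    utilities = [sum(xk * bk for xk, bk in zip(x, b)) for b in betas]
    m = max(utilities)                       # subtract max for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# J + 1 = 3 choices; beta_0 normalized to zero as the base category
betas = [[0.0, 0.0], [0.5, -0.2], [-0.3, 0.4]]
probs = mnl_probs([1.0, 2.0], betas)
print(probs, sum(probs))                     # valid probabilities summing to one
```

Note that adding the same constant to every utility leaves the probabilities unchanged, which is the identification issue flagged in the last bullet.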
Conditional logit model

Introduction
Multinomial choice model
Estimation
Binary choice
Density estimation P (yi = j|xi0 , . . . , xiJ ) = exp(x⊤ij βj ) / ΣJl=0 exp(x⊤il βl )
Regression est.

Semiparametrics • coefficients βj or values xij have to vary across choices


Discrete choice
Introduction Consider a multinomial response model using characteristics xi of
Ordered models
Example individual choices (varying with choices j ) and one common β .
Specification tests
Semiparametrics
Conditional logit model (xi = (xi0 , . . . , xiJ )):
Application
Multinomial models
Latent model pj (xi ) = P (yi = j|xi0 , . . . , xiJ ) = exp(x⊤ij β) / ΣJl=0 exp(x⊤il β)
Multinomial logit
Latent model
⊲ Conditional logit
Multinomial probit • contributions to log-likelihood function ln Ln (β) = Σni=1 li (β):
Hierarchy
Semiparametrics
Censored data
li (β) = ΣJj=0 I(yi = j) log P (yi = j|xi )
Final thoughts

Econometrics Slide 168


Conditional logit model – interpretation

Introduction
Consider a multinomial response model using characteristics xi of
Estimation
individual choices (varying with choices j ) and one common β .
Binary choice

Density estimation • the probability of the choice j


Regression est.
Semiparametrics
Discrete choice pj (xi ) = P (yi = j|xi0 , . . . , xiJ ) = exp(x⊤ij β) / ΣJl=0 exp(x⊤il β)
Introduction
Ordered models • partial effects (xijk = the k th component of xij )
Example
Specification tests
Semiparametrics
Application ∂P (yi = j|xi0 , . . . , xiJ )/∂xijk = pj (xi )[1 − pj (xi )]βk
Multinomial models
Latent model
Multinomial logit ∂P (yi = j|xi0 , . . . , xiJ )/∂xilk = −pj (xi )pl (xi )βk
Latent model
⊲ Conditional logit
Multinomial probit
Hierarchy • odds ratio (independence from irrelevant alternatives)
Semiparametrics

Censored data P (yi = j|xi )


= exp[(xij − xik )⊤ β]
Final thoughts
P (yi = k|xi )
Econometrics Slide 169
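The own-choice partial effect formula can be checked by a finite difference; a small sketch with hypothetical choice-specific covariates and a common β (illustrative Python, not the course's Stata code):

```python
import math

def clogit_probs(xs, beta):
    """Conditional logit probabilities from choice-specific covariates xs
    and one common coefficient vector beta."""
    u = [sum(a * b for a, b in zip(x, beta)) for x in xs]
    m = max(u)
    e = [math.exp(v - m) for v in u]
    s = sum(e)
    return [v / s for v in e]

beta = [0.8, -0.5]                                  # hypothetical common beta
xs = [[1.0, 0.0], [0.5, 1.0], [-0.2, 0.3]]          # one covariate vector per choice
p = clogit_probs(xs, beta)

# analytic own effect: dP(y=0)/dx_{00} = p_0 (1 - p_0) beta_0
analytic = p[0] * (1 - p[0]) * beta[0]

# finite-difference approximation of the same derivative
h = 1e-6
xs_h = [list(x) for x in xs]
xs_h[0][0] += h
numeric = (clogit_probs(xs_h, beta)[0] - p[0]) / h
print(analytic, numeric)                            # the two agree closely
```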
Comparison of multinomial logits

Introduction
Multinomial logit model (e.g., choice of occupation):
Estimation

Binary choice
• individual characteristics used
Density estimation • characteristics of alternative choice unimportant and omitted
Regression est.
Conditional logit (e.g., choice of transport):
Semiparametrics

Discrete choice • characteristics of alternative choices are important and used


Introduction
Ordered models • multinomial logit is a special case of conditional logit
for xij = {xi I(j = l)}Jl=1 and parameter vector (β1 , . . . , βJ )
Example
Specification tests
Semiparametrics
Application Mixed logit:
Multinomial models
Latent model
Multinomial logit
• contains both individual- and choice-specific characteristics
Latent model
⊲ Conditional logit
Multinomial probit Independence from irrelevant alternatives (IIA) assumption
Hierarchy
Semiparametrics
required!
Censored data

Final thoughts
pj (xi )/pl (xi ) = exp(x⊤ij β)/ exp(x⊤il β) = exp[(xij − xil )⊤ β]

Econometrics Slide 170


Multinomial probit model

Introduction
Multiple-choice problem: J choices with utilities
Estimation


Binary choice
yij∗ = x⊤ij β + εij ,  j = 0, . . . , J
Density estimation

Regression est.

Semiparametrics • observable yi = arg maxj=0,...,J yij∗


Discrete choice
Introduction
Ordered models Estimation by multinomial probit:
Example
Specification tests • Ei = (εi1 , . . . , εiJ )⊤ ∼ N (0, Ω), xi = (xi1 , . . . , xiJ )
Semiparametrics
Application • likelihood log Ln (b) = Σni=1 ΣJj=1 yij log P (yi = j|xi ),
Multinomial models
Latent model where probability P (yi = j|xi ) equals
Multinomial logit
Latent model
Conditional logit P (yik∗ < yij∗ , ∀k ≠ j|xi ) = P {x⊤ik b + εik < x⊤ij b + εij , ∀k ≠ j|xi }
⊲ Multinomial probit
Hierarchy
Semiparametrics = ∫ε1 · · · ∫εJ I(x⊤ik b + εik < x⊤ij b + εij , ∀k ≠ j) dΦ(E|Ω)
Censored data
Final thoughts
• estimation and evaluation for a larger J difficult, by simulation
Econometrics Slide 171
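The last bullet — evaluation by simulation — can be illustrated with a crude Monte Carlo frequency simulator for the choice probabilities under equicorrelated normal errors (a sketch only; production software uses smooth simulators such as GHK; all parameter values here are hypothetical):

```python
import random

random.seed(0)

def mnp_probs_mc(xs, beta, corr=0.5, draws=20000):
    """Crude Monte Carlo frequency simulator for multinomial probit choice
    probabilities with equicorrelated normal errors (illustrative only)."""
    counts = [0] * len(xs)
    for _ in range(draws):
        common = random.gauss(0.0, 1.0)      # shared factor induces correlation
        best, best_u = 0, float("-inf")
        for j, x in enumerate(xs):
            eps = (corr ** 0.5) * common + ((1 - corr) ** 0.5) * random.gauss(0.0, 1.0)
            u = sum(a * b for a, b in zip(x, beta)) + eps
            if u > best_u:
                best, best_u = j, u
        counts[best] += 1
    return [c / draws for c in counts]

p = mnp_probs_mc([[1.0, 0.0], [0.5, 1.0], [-0.2, 0.3]], [0.8, -0.5])
print(p)                                     # frequencies estimate P(y = j | x)
```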
Example: school and employment decision

Introduction
Employment and schooling decisions of young men
Estimation
(school = 1, home = 2, work = 3)
Binary choice

Density estimation Explanatory variables: characterize individuals


Regression est.
• education
Semiparametrics

Discrete choice
• work experience
Introduction
Ordered models
• race
Example
Specification tests
Semiparametrics
Application
Multinomial models
Latent model
Multinomial logit
Latent model
Conditional logit
⊲ Multinomial probit
Hierarchy
Semiparametrics

Censored data

Final thoughts

Econometrics Slide 172


Example: school and employment decision

Introduction

Estimation
Multinomial probit regression
Binary choice
------------------------------------------------------
Density estimation
status | Coef. Std. Err. z P>|z|
-------------+----------------------------------------
Regression est.
1 | (base outcome)
Semiparametrics
-------------+----------------------------------------
Discrete choice
Introduction 2 educ | -.4410793 .0413589 -10.66 0.000
Ordered models exper | -.1137917 .1114018 -1.02 0.307
Example
Specification tests expersq | -.0043746 .0155293 -0.28 0.778
Semiparametrics
Application
black | .6047029 .1908654 3.17 0.002
Multinomial models _cons | 6.706148 .6554217 10.23 0.000
Latent model
Multinomial logit
-------------+----------------------------------------
Latent model 3 educ | -.162221 .0385226 -4.21 0.000
Conditional logit
⊲ Multinomial probit
exper | .6721982 .1027514 6.54 0.000
Hierarchy expersq | -.0592846 .0142553 -4.16 0.000
Semiparametrics
black | .2244359 .1795638 1.25 0.211
Censored data
_cons | 2.985019 .6285942 4.75 0.000
Final thoughts
------------------------------------------------------
Econometrics Slide 173
Example: school and employment decision

Introduction

Estimation Multinomial probit regression


Binary choice Average marginal effects
Density estimation

Regression est. ---------------------------------------------------


Semiparametrics | Delta-method
Discrete choice
Introduction
| dy/dx Std. Err. z Logit
Ordered models ----------+----------------------------------------
Example
Specification tests 1 educ | .0157377 .0029057 5.98 0.017
Semiparametrics
Application exper | -.0334263 .0067664 -4.63 -0.031
Multinomial models
Latent model
---------------------------------------------------
Multinomial logit 2 educ | -.0443320 .0030007 -14.31 -0.043
Latent model
Conditional logit exper | -.1064632 .0096477 -10.43 -0.100
⊲ Multinomial probit
Hierarchy
---------------------------------------------------
Semiparametrics
3 educ | .0285944 .0040739 6.28 0.026
Censored data
exper | .1398895 .0107027 12.33 0.013
Final thoughts
---------------------------------------------------
Econometrics Slide 174
Hierarchical models

Introduction
Nested logit model (yi = 0, 1, . . . , J ):
Estimation

Binary choice • split J choices into S groups G1 , . . . , GS


Density estimation • first hierarchy: yi ∈ Gs
Regression est.
Semiparametrics
Discrete choice P (yi ∈ Gs |xi ) = αs [Σl∈Gs exp(ρ−1s x⊤il β)]ρs / ΣSr=1 αr [Σl∈Gr exp(ρ−1r x⊤il β)]ρr
Introduction
Ordered models
Example
Specification tests
Semiparametrics
Application
• second hierarchy: yi = j
Multinomial models
Latent model
Multinomial logit
Latent model
Conditional logit P (yi = j|yi ∈ Gs , xi ) = exp(ρ−1s x⊤ij β) / Σl∈Gs exp(ρ−1s x⊤il β)
Multinomial probit
⊲ Hierarchy
Semiparametrics • independence of irrelevant alternatives assumption needed only
Censored data within groups Gs
Final thoughts

Econometrics Slide 175
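The two hierarchies multiply into unconditional choice probabilities; a sketch (illustrative Python with hypothetical values) that also confirms the conditional-logit special case ρ1 = . . . = ρS = α1 = . . . = αS = 1 noted on the next slide:

```python
import math

def nested_logit_probs(groups, beta, rho, alpha):
    """Nested logit: P(y = j) = P(y in G_s | x) * P(y = j | y in G_s, x).
    groups[s] lists the choice-specific covariate vectors of nest G_s."""
    inclusive = [sum(math.exp(sum(a * b for a, b in zip(x, beta)) / rho[s])
                     for x in g) for s, g in enumerate(groups)]
    denom = sum(alpha[s] * inclusive[s] ** rho[s] for s in range(len(groups)))
    probs = []
    for s, g in enumerate(groups):
        p_group = alpha[s] * inclusive[s] ** rho[s] / denom
        for x in g:
            within = math.exp(sum(a * b for a, b in zip(x, beta)) / rho[s]) / inclusive[s]
            probs.append(p_group * within)
    return probs

beta = [0.6, -0.4]                                  # hypothetical coefficients
groups = [[[1.0, 0.0], [0.5, 1.0]], [[-0.2, 0.3]]]  # S = 2 nests, 3 choices
p = nested_logit_probs(groups, beta, rho=[0.7, 0.7], alpha=[1.0, 1.0])
print(p, sum(p))                                    # probabilities sum to one
```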


Hierarchical models

Introduction
Estimation of the nested logit model:
Estimation

Binary choice
• normalization α1 = 1 required
Density estimation • other restrictions often imposed
Regression est. (e.g., α1 = . . . = αS or ρ1 = . . . = ρS )
Semiparametrics
• 1 − ρs represents correlation of unobservables across groups
Discrete choice
Introduction • limited-information likelihood
Ordered models
Example
Specification tests ◦ estimate λs = ρ−1s β by conditional logit for each group of
Semiparametrics
Application
responses Gs , s = 1, . . . , S
Multinomial models
Latent model
◦ maximize multinomial choice of the group Gs
Multinomial logit
Latent model
Conditional logit
• full-information likelihood
Multinomial probit
⊲ Hierarchy ◦ maximize joint likelihood based on
Semiparametrics

Censored data
P (yi = j|yi ∈ Gs , xi ) · P (yi ∈ Gs |xi )
Final thoughts
• conditional logit: α1 = . . . = αS = ρ1 = . . . = ρS = 1 (test)
Econometrics Slide 176
Semiparametric alternatives

Introduction
Straightforward generalizations of the methods for binary responses;
Estimation
for example,
Binary choice

Density estimation • Maximum score estimator for multinomial choice (Fox, 2007)
Regression est.

Semiparametrics
◦ choice specific characteristics xij
Discrete choice ◦ consider choices yi = k and yi = l
Introduction
Ordered models
Example
Specification tests β̂nMSE = arg maxβ (1/n) Σni=1 [I(yi = k)I(x⊤ik β > x⊤il β) + I(yi = l)I(x⊤ik β < x⊤il β)]
Semiparametrics
Application
Multinomial models
Latent model
Multinomial logit
Latent model
Conditional logit
Multinomial probit ◦ many choices
Hierarchy
⊲ Semiparametrics β̂nMSE = arg maxβ (1/n) ΣJk=1 ΣJl=1 Σni=1 [I(yi = k)I(x⊤ik β > x⊤il β) + I(yi = l)I(x⊤ik β < x⊤il β)]
Censored data
Final thoughts
Econometrics Slide 177
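The pairwise objective simply counts correctly ordered choice-specific indices; a hedged sketch on simulated data (illustrative Python; the data-generating values are hypothetical, not from the slides):

```python
import random

random.seed(5)

def pairwise_score(data, beta):
    """Sample objective of the pairwise maximum score estimator for two
    choices k and l: fraction of correctly ordered choice-specific indices."""
    score = 0
    for chose_k, xk, xl in data:
        uk = sum(a * b for a, b in zip(xk, beta))
        ul = sum(a * b for a, b in zip(xl, beta))
        if (chose_k and uk > ul) or (not chose_k and uk < ul):
            score += 1
    return score / len(data)

# simulate binary choices from latent utilities with a hypothetical true beta
beta_true = [1.0, -0.5]
data = []
for _ in range(3000):
    xk = [random.uniform(-1, 1), random.uniform(-1, 1)]
    xl = [random.uniform(-1, 1), random.uniform(-1, 1)]
    uk = sum(a * b for a, b in zip(xk, beta_true)) + random.gauss(0, 1)
    ul = sum(a * b for a, b in zip(xl, beta_true)) + random.gauss(0, 1)
    data.append((uk > ul, xk, xl))

# the true coefficients score higher than a sign-flipped vector
print(pairwise_score(data, beta_true), pairwise_score(data, [-1.0, 0.5]))
```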
Introduction

Estimation

Binary choice

Density estimation

Regression est.

Semiparametrics

Discrete choice

⊲ Censored data Models for censored and truncated data
Introduction
Truncation
Tobit model
MLE
Tobit – interpretation
Specification
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
Application

Final thoughts

Econometrics Slide 178


Censoring and corner solution responses

Introduction
Censored data
Estimation

Binary choice
• censored responses = some values are not observable; just a
Density estimation lower or upper bound is known
Regression est. (example: duration, income due to bracketing, social
Semiparametrics contributions, taxation rules, toxicity measurements)
Discrete choice
• corner solution responses = response distribution partially
Censored data
⊲ Introduction discrete due to censoring of latent response values
Truncation
(example: alcohol or charitable spending)
Tobit model
MLE
Tobit – interpretation
• estimation similar in both cases, although underlying reasons
Specification and models differ
Alternatives
Two-part Tobit • interpretation differs: for a variable yi with values observed only
Application
Alternatives above a (e.g., a = 0), we can typically be interested in
Sample selection
Two-step estimation
MLE
◦ both models: P (yi > a|xi ) or P (yi ≤ a|xi )
Example
Application
◦ censored model: E(yi |xi )
Final thoughts ◦ corner-solution response: E(yi |xi , yi > a)
Econometrics Slide 179
Linear models?

Latent (uncensored) model: yi∗ = x⊤i β + εi , censored at a from below/above
Introduction
Estimation
Binary choice

Density estimation Transformation to censoring from below at 0


Regression est.

Semiparametrics
• yi = max{yi∗ , a} ⇔ yi − a = max{yi∗ − a, 0}
Discrete choice • yi = min{yi∗ , a} ⇔ −yi = max{−yi∗ , −a}
Censored data • assume yi = max{yi∗ , 0} without loss of generality
⊲ Introduction
Truncation
Tobit model
MLE
Can we use the linear model E(yi |xi ) = x⊤i β?
Tobit – interpretation
Specification • E(yi |xi ) not linear in xi unless range of xi is very limited
Alternatives
Two-part Tobit
(under censoring from below, E(yi |xi ) > E(yi∗ |xi ) = x⊤i β)
Application
Alternatives
• heteroscedasticity due to var(yi |xi ) (see the next slide)
Sample selection
Two-step estimation
• predictions not always positive
MLE
Example
• P (yi = 0|xi ) not predictable
Application

Final thoughts

Econometrics Slide 180
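The bullets above can be seen in a small simulation mirroring the figure's design: OLS on the censored response is biased toward zero even though the latent slope is 1 (illustrative Python sketch, not the course's Stata code):

```python
import random
import statistics

random.seed(1)
n = 20000
x = [random.uniform(-2.0, 2.0) for _ in range(n)]
y_star = [0.5 + xi + random.gauss(0.0, 1.0) for xi in x]   # latent, true slope 1
y = [max(v, 0.0) for v in y_star]                          # censor from below at 0

mx, my = statistics.fmean(x), statistics.fmean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
print(slope)   # noticeably below the true slope of 1
```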


Linear model – censored data?

Introduction
Linear fit of yi∗ = 0.5 + xi + εi and yi = max{0.5 + xi + εi , 0}
Estimation
for εi ∼ N (0, 1) and n = 1000
Binary choice

Density estimation

Regression est.
Semiparametrics
Discrete choice
Censored data
⊲ Introduction
Truncation
Tobit model
MLE
Tobit – interpretation
Specification
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
[Figure: original and censored observations with the true line and the linear prediction; x-axis: x from −4 to 4]
Application

Final thoughts

Econometrics Slide 181


Truncated data

Introduction
Censored data
Estimation
yi = max{0, x⊤i β + εi }
Binary choice

Density estimation
• corner solution: x⊤i β + εi is the unconstrained optimal choice
Regression est.
Semiparametrics
• censoring: x⊤i β + εi represents the latent variable yi∗
Discrete choice

Censored data
Introduction Truncated data – “corner/censored” values are not observed
⊲ Truncation
Tobit model
MLE yi = x⊤
i β + εi
Tobit – interpretation
Specification
Alternatives
• both yi and xi observed only for yi > 0
Two-part Tobit
Application (example: truncated income data)
Alternatives
Sample selection • other thresholds and truncation from above possible (yi ≶ a)
Two-step estimation
MLE
Example
Application

Final thoughts

Econometrics Slide 182


Truncated model – data

Introduction
Linear fit of yi∗ = 0.5 + xi + εi observable only for yi∗ > 0
Estimation
with εi ∼ N (0, 1) and n = 1000
Binary choice

Density estimation

Regression est.
Semiparametrics
Discrete choice
Censored data
Introduction
⊲ Truncation
Tobit model
MLE
Tobit – interpretation
Specification
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
[Figure: original and truncated observations with the true line and the linear prediction; x-axis: x from −4 to 4]
Application

Final thoughts

Econometrics Slide 183


Tobit model

Introduction
Tobit type I model
Estimation

Binary choice
yi = max{0, x⊤i β + εi }
Density estimation

Regression est.

Semiparametrics • corner solution: x⊤i β + εi is the unconstrained optimal choice
Discrete choice • censoring: x⊤i β + εi represents the latent variable yi∗
Censored data
Introduction
• Tobit: error distribution εi |xi ∼ N (0, σ 2 )
Truncation
⊲ Tobit model
⇒ probability of censoring P (yi = 0|xi ) > 0 for any xi
MLE
• E(xi x⊤i ) has full rank, (yi , xi )ni=1 a random sample
Tobit – interpretation
Specification
Alternatives
Two-part Tobit Discussion
Application
Alternatives • maximum likelihood estimation
Sample selection
Two-step estimation • regression functions and marginal effects
MLE
Example • specification testing and extensions
Application

Final thoughts

Econometrics Slide 184


Tobit model – data

Introduction
Linear fit of yi∗ = 0.5 + xi + εi and yi = max{0.5 + xi + εi , 0}
Estimation
for εi ∼ N (0, 1) and n = 1000
Binary choice

Density estimation

Regression est.
Semiparametrics
Discrete choice
Censored data
Introduction
Truncation
⊲ Tobit model
MLE
Tobit – interpretation
Specification
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
[Figure: original and censored observations with the true line and the linear prediction; x-axis: x from −4 to 4]
Application

Final thoughts

Econometrics Slide 185


Tobit model – error distribution

Introduction
Error distribution from yi = max{0.5 + xi + εi , 0}
Estimation
for εi ∼ N (0, 1) and xi = 1: εi = (yi |xi = 1) − 0.5 − 1.0
Binary choice

Density estimation

Regression est.
Semiparametrics
Discrete choice
Censored data
Introduction
Truncation
⊲ Tobit model
MLE
Tobit – interpretation
Specification
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
Application
[Figure: densities of the original and censored error terms; y-axis: Density, x-axis: −5 to 5]

Final thoughts

Econometrics Slide 186


Tobit model – distribution properties

Introduction
Error-term distribution
Estimation

Binary choice
• εi = yi − x⊤i β “observable” only for yi ≥ 0 ⇔ εi ≥ −x⊤i β
Density estimation • εi censored from below at −x⊤i β :
Regression est.

Semiparametrics
◦ positive probability at t = −x⊤i β equal to
Discrete choice P (εi ≤ −x⊤i β|xi ) = Φσ (−x⊤i β) = Φ(−x⊤i β/σ)
Censored data ◦ continuously distributed at t > −x⊤i β with
Introduction
Truncation density φσ (t) = φ(t/σ)/σ
⊲ Tobit model
MLE
Tobit – interpretation Tobit type I model yi = max{0, x⊤i β + εi } – distribution properties
Specification

• P (yi = 0|xi ) = P (x⊤i β + εi ≤ 0|xi )
Alternatives
Two-part Tobit
Application
Alternatives = P (x⊤i β/σ ≤ −εi /σ|xi ) = 1 − Φ(x⊤i β/σ) > 0
Sample selection
Two-step estimation
• P (yi > 0|xi ) = Φ(x⊤i β/σ) > 0
MLE
Example • P (yi = c > 0|xi ) = P (x⊤i β + εi = c > 0|xi ) = 0
Application

Final thoughts
• uncensored yi continuously distributed with density φσ (yi − x⊤i β) = φ{(yi − x⊤i β)/σ}/σ
Econometrics Slide 187
Tobit model – estimation

Introduction
Maximum likelihood estimation (yi∗ ∼ Fy )
Estimation

Binary choice • likelihood contribution for yi = 0:


Density estimation P (yi = 0|xi ) = 1 − Φ(x⊤ i β/σ)
Regression est. • likelihood contribution for yi > 0:
Semiparametrics
fy (yi |xi ) = fε (yi − x⊤i β|xi ) = φ{(yi − x⊤i β)/σ}/σ
Discrete choice
• total likelihood contribution:
Censored data
Introduction
Truncation
l(yi , xi ; β) = I(yi = 0) · ln[1 − Φ(x⊤i β/σ)] + I(yi > 0) · ln[φ{(yi − x⊤i β)/σ}/σ]
Tobit model
⊲ MLE
Tobit – interpretation = I(yi = 0) · ln[1 − Φ(x⊤i β/σ)] − I(yi > 0) · [ln(2πσ 2 )/2 + (yi − x⊤i β)2 /(2σ 2 )]
Specification
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
Application
Final thoughts • MLE: maximize Σni=1 l(yi , xi ; β)

Econometrics Slide 188
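The likelihood contribution above can be coded directly; a small sketch (illustrative Python with a single regressor and hypothetical true values, not the Stata `tobit` command) checks that the true parameters attain a higher sample log-likelihood than a perturbed vector:

```python
import math
import random

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def tobit_loglik(y, x, b0, b1, sigma):
    """Sum of Tobit type I likelihood contributions for one regressor."""
    ll = 0.0
    for yi, xi in zip(y, x):
        index = b0 + b1 * xi
        if yi <= 0.0:
            ll += math.log(1.0 - norm_cdf(index / sigma))        # censored at 0
        else:
            ll += math.log(norm_pdf((yi - index) / sigma) / sigma)
    return ll

random.seed(2)
x = [random.uniform(-2.0, 2.0) for _ in range(5000)]
y = [max(0.5 + xi + random.gauss(0.0, 1.0), 0.0) for xi in x]    # true (0.5, 1, 1)
print(tobit_loglik(y, x, 0.5, 1.0, 1.0) > tobit_loglik(y, x, 0.0, 0.5, 1.0))
```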


Example: female labor supply

Introduction
Annual labor supply in hours for married women (Mroz, 1987)
Estimation
Reduced-form equation using:
Binary choice

Density estimation • hours can be zero (40% of data) or positive


Regression est. • wage not included due to endogeneity
Semiparametrics
• non-wife income
Discrete choice

Censored data
• education
Introduction
Truncation
• labor market experience
Tobit model
⊲ MLE
• age
Tobit – interpretation
Specification
• number of children
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
Application

Final thoughts

Econometrics Slide 189


Example: female labor supply

Introduction

Estimation tobit hours nwifeinc educ exper expersq age


Binary choice kidslt6 kidsge6, ll(0)
Density estimation

Regression est. ---------------------------------------------------


Semiparametrics hours | Coef. Std. Err. t OLS
Discrete choice
----------+----------------------------------------
Censored data
Introduction
nwifeinc | -8.814243 4.459096 -1.98 [-3.45]
Truncation
Tobit model
educ | 80.64561 21.58322 3.74 [28.76]
⊲ MLE exper | 131.5643 17.27938 7.61 [65.67]
Tobit – interpretation
Specification expersq | -1.864158 .5376615 -3.47 [-0.70]
Alternatives
Two-part Tobit
age | -54.40501 7.418496 -7.33 [-30.5]
Application
Alternatives
kidslt6 | -894.0217 111.8779 -7.99 [-422.]
Sample selection kidsge6 | -16.218 38.64136 -0.42 [-32.8]
Two-step estimation
MLE _cons | 965.3053 446.4358 2.16 [1330.]
Example
Application
----------+----------------------------------------
Final thoughts /sigma | 1122.022 41.57903
Econometrics
---------------------------------------------------
Slide 190
Tobit model – conditional expectations

Tobit type I model yi = max{0, x⊤i β + εi } – basic properties
Introduction
Estimation
Binary choice • E(yi |xi ) ≥ max{0, E(x⊤i β + εi |xi )} = max{0, x⊤i β}
Density estimation (Jensen’s inequality; see figure on slide 196)
Regression est.
• med(yi |xi ) = max{0, med(x⊤i β + εi |xi )} = max{0, x⊤i β}
Semiparametrics

Discrete choice
Further, all objects of interest are related by
Censored data
Introduction E(yi |xi ) = P (yi = 0|xi ) · 0 + P (yi > 0|xi ) · E(yi |xi , yi > 0)
Truncation
Tobit model
MLE
= P (yi > 0|xi ) · E(yi |xi , yi > 0)
⊲ Tobit – interpretation
Specification
Alternatives
Two-part Tobit
• note: probability P (yi > 0|xi ) can be expressed as in probit
Application

P (yi > 0|xi ) = P (x⊤i β + εi > 0|xi )
Alternatives
Sample selection
Two-step estimation
MLE = P (x⊤i β/σ + εi /σ > 0|xi ) = Φ(x⊤i β/σ)
Example
Application

Final thoughts
(the identification assumption of probit is var(εi ) = 1)

Econometrics Slide 191


Tobit model – conditional expectations

Introduction
Probability of no censoring
Estimation

Binary choice
P (yi > 0|xi ) = Φ(x⊤
i β/σ)
Density estimation

Regression est.
Expectation conditional on not being censored
Semiparametrics

Discrete choice φ(x⊤


i β/σ)
E(yi |xi , yi > 0) = x⊤
i β +σ
Censored data
Introduction
Φ(x⊤
i β/σ)
Truncation
Tobit model
MLE
(note: Mill’s ratio λ(t) = φ(t)/[1 − Φ(t)])
⊲ Tobit – interpretation
Specification Expectation for censored and uncensored observations
Alternatives
Two-part Tobit
Application E(yi |xi ) = P (yi > 0|xi ) · E(yi |xi , yi > 0)
Alternatives
Sample selection
Two-step estimation = Φ(x⊤i β/σ) [x⊤i β + σ φ(x⊤i β/σ)/Φ(x⊤i β/σ)]
MLE
Example
Application
= Φ(x⊤i β/σ) x⊤i β + σ φ(x⊤i β/σ)
Final thoughts

Econometrics Slide 192
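The closed form E(yi |xi ) = Φ(x⊤i β/σ) x⊤i β + σφ(x⊤i β/σ) can be verified against a direct simulation of max{x⊤i β + εi , 0} (illustrative Python sketch; the index value and scale are hypothetical):

```python
import math
import random

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

random.seed(3)
xb, sigma = 0.7, 1.0        # hypothetical index value x'beta and error scale
draws = [max(xb + sigma * random.gauss(0.0, 1.0), 0.0) for _ in range(200000)]
simulated = sum(draws) / len(draws)
analytic = norm_cdf(xb / sigma) * xb + sigma * norm_pdf(xb / sigma)
print(simulated, analytic)  # the two agree up to simulation noise
```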


Tobit model – marginal effects

Introduction
Marginal effects (estimated at x̄ or as average partial effects)
Estimation

Binary choice
• marginal effects for P (yi > 0|xi )
Density estimation
Regression est. ∂P (yi > 0|xi )/∂xik = ∂Φ(x⊤i β/σ)/∂xik = φ(x⊤i β/σ) βk /σ
Semiparametrics
Discrete choice

Censored data • marginal effects for E(yi |xi , yi > 0) = x⊤ ⊤


i β + σλ(−xi β/σ)
Introduction
Truncation   
Tobit model ∂E(yi |xi , yi > 0) x⊤
i β x⊤
i β x⊤
i β
MLE = βk 1 − λ(− ) + λ(− )
⊲ Tobit – interpretation ∂xik σ σ σ
Specification
Alternatives
Two-part Tobit • marginal effects for
Application
Alternatives
Sample selection ∂E(yi |xi )
Two-step estimation = Φ(x⊤
i β/σ)βk
MLE ∂xik
Example
Application

Final thoughts

Econometrics Slide 193
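The last marginal effect follows because the derivative of Φ(m/σ)m + σφ(m/σ) with respect to the index m is exactly Φ(m/σ) (the φ terms cancel); a quick finite-difference check (illustrative Python, hypothetical σ):

```python
import math

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

sigma = 1.5                      # hypothetical error scale

def e_y(index):
    """E(y|x) = Phi(m/sigma) m + sigma phi(m/sigma) with m = x'beta."""
    return norm_cdf(index / sigma) * index + sigma * norm_pdf(index / sigma)

m, h = 0.4, 1e-6
numeric = (e_y(m + h) - e_y(m - h)) / (2.0 * h)
analytic = norm_cdf(m / sigma)   # hence dE(y|x)/dx_k = Phi(x'beta/sigma) beta_k
print(numeric, analytic)
```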


Example: female labor supply

Introduction

Estimation tobit hours nwifeinc educ exper expersq age


Binary choice kidslt6 kidsge6, ll(0)
Density estimation

Regression est. ---------------------------------------------------


Semiparametrics hours | Coef. Std. Err. t OLS
Discrete choice
----------+----------------------------------------
Censored data
Introduction
nwifeinc | -8.814243 4.459096 -1.98 [-3.45]
Truncation
Tobit model
educ | 80.64561 21.58322 3.74 [28.76]
MLE exper | 131.5643 17.27938 7.61 [65.67]
⊲ Tobit – interpretation
Specification expersq | -1.864158 .5376615 -3.47 [-0.70]
Alternatives
Two-part Tobit
age | -54.40501 7.418496 -7.33 [-30.5]
Application
Alternatives
kidslt6 | -894.0217 111.8779 -7.99 [-422.]
Sample selection kidsge6 | -16.218 38.64136 -0.42 [-32.8]
Two-step estimation
MLE _cons | 965.3053 446.4358 2.16 [1330.]
Example
Application
----------+----------------------------------------
Final thoughts /sigma | 1122.022 41.57903
Econometrics
---------------------------------------------------
Slide 194
Example: female labor supply

Introduction

Estimation Expression: Pr(hours>0) <=> predict(p(0,.))


Binary choice

Density estimation margins, predict(p(0,.)) dydx(*)


Regression est.

Semiparametrics ---------------------------------------------------
Discrete choice
| Delta-method
Censored data
Introduction
| dy/dx Std. Err. z P>|z|
Truncation
Tobit model
----------+----------------------------------------
MLE nwifeinc | -.0024212 .0012202 -1.98 0.047
⊲ Tobit – interpretation
Specification educ | .022153 .0058285 3.80 0.000
Alternatives
Two-part Tobit
exper | .0361402 .0043438 8.32 0.000
Application
Alternatives
expersq | -.0005121 .0001444 -3.55 0.000
Sample selection age | -.0149448 .0019298 -7.74 0.000
Two-step estimation
MLE kidslt6 | -.2455841 .0282462 -8.69 0.000
Example
Application
kidsge6 | -.004455 .0106216 -0.42 0.675
Final thoughts ---------------------------------------------------
Econometrics Slide 195
Example: female labor supply

Introduction

Estimation Expression: E(hours|hours>0) <=> predict(e(0,.))


Binary choice

Density estimation margins, predict(e(0,.)) dydx(*)


Regression est.

Semiparametrics ---------------------------------------------------
Discrete choice
| Delta-method
Censored data
Introduction
| dy/dx Std. Err. z P>|z|
Truncation
Tobit model
----------+----------------------------------------
MLE nwifeinc | -3.968784 2.007582 -1.98 0.048
⊲ Tobit – interpretation
Specification educ | 36.31225 9.703038 3.74 0.000
Alternatives
Two-part Tobit
exper | 59.23938 7.833684 7.56 0.000
Application
Alternatives
expersq | -.8393732 .2423184 -3.46 0.001
Sample selection age | -24.49691 3.362492 -7.29 0.000
Two-step estimation
MLE kidslt6 | -402.5507 50.74877 -7.93 0.000
Example
Application
kidsge6 | -7.302468 17.40427 -0.42 0.675
Final thoughts ---------------------------------------------------
Econometrics Slide 196
Example: female labor supply

Introduction

Estimation Expression: E(max{0,hours}) <=> predict(ystar(0,.))


Binary choice

Density estimation margins, predict(ystar(0,.)) dydx(*)


Regression est.

Semiparametrics ---------------------------------------------------
Discrete choice
| Delta-method
Censored data
Introduction
| dy/dx Std. Err. z P>|z|
Truncation
Tobit model
----------+----------------------------------------
MLE nwifeinc | -5.188622 2.62141 -1.98 0.048
⊲ Tobit – interpretation
Specification educ | 47.47311 12.6214 3.76 0.000
Alternatives
Two-part Tobit
exper | 77.44708 9.997656 7.75 0.000
Application
Alternatives
expersq | -1.097361 .3155947 -3.48 0.001
Sample selection age | -32.02624 4.292112 -7.46 0.000
Two-step estimation
MLE kidslt6 | -526.2779 64.70622 -8.13 0.000
Example
Application
kidsge6 | -9.54694 22.75225 -0.42 0.675
Final thoughts ---------------------------------------------------
Econometrics Slide 197
Specification testing and extensions

Introduction
Extensions
Estimation

Binary choice
• doubly censored/two-limit data
Density estimation

Regression est.
Specification testing
Semiparametrics
• heteroscedasticity and non-normality
Discrete choice
(similar to probit, extend specification or use Hausman test;
Censored data
Introduction
see the censored least absolute deviation)
Truncation
Tobit model
• two-part specification: what if decision P (yi > 0|xi ) is driven by
MLE
Tobit – interpretation
different factors than the average amount E(yi |xi )
⊲ Specification (example: spending on a particular charity, expats’ labour supply)
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
Application

Final thoughts

Econometrics Slide 198


Example: female labor supply

Introduction
predict indt, xb; gen rest = hours - indt
Estimation
gen normd = normalden(rest / 1122.022) / 1122.022
Binary choice
kdens rest if indt>0 & rest>0, ci bw(sjpi) ll(0)
Density estimation
addplot((line normd rest if indt>0 & rest>0, sort))
Regression est.

Semiparametrics

Discrete choice
Censored data
Introduction
Truncation
Tobit model
MLE
Tobit – interpretation
⊲ Specification
Alternatives
Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
[Figure: kernel density of the positive residuals (rest, 0 to 4000) with 95% CI, compared to the fitted normal density normd; y-axis: Density]

Application

Final thoughts

Econometrics Slide 199


Alternative methods

Introduction
• Symmetrically trimmed least squares (Powell, 1986)
Estimation

Binary choice ◦ data yi truncated from below at 0, mean x⊤i β
Density estimation
◦ under conditional symmetry of ε|x ...
Regression est.
◦ . . . truncate symmetrically from above at 2x⊤i β
Semiparametrics

Discrete choice
◦ trimming works under censoring, but inefficiently
Censored data
Introduction
Truncation
β̂ (STLS) = arg minβ∈B Σni=1 {yi − max(x⊤i β, yi /2)}2
Tobit model
MLE
Tobit – interpretation
Specification • Symmetrically censored least squares (Powell, 1986)
⊲ Alternatives
Two-part Tobit
Application ◦ under conditional symmetry of ε|x
Alternatives
Sample selection
Two-step estimation β̂ (SCLS) = arg minβ∈B Σni=1 [{yi − max(x⊤i β, yi /2)}2
MLE
Example
Application
+ I(yi > 2x⊤i β) · {(yi /2)2 − max(0, x⊤i β)2 }]
Final thoughts

Econometrics Slide 200


Alternative methods

Introduction
Censored least absolute deviation (CLAD) method (Powell, 1984)
Estimation

Binary choice
• med(yi |xi ) = max{0, med(x⊤i β + εi |xi )} = max{0, x⊤i β}
Density estimation • assume med(εi |xi ) = 0 and minimize
Regression est.
Semiparametrics
Discrete choice Σni=1 |yi − max{0, x⊤i β}|
Censored data
Introduction
Truncation

Tobit model
• √n-consistent and asymptotically normal estimator
MLE
Tobit – interpretation
• uses only observations with x⊤i β > 0;
Specification
full-rank assumption for E(xi x⊤i |x⊤i β > 0)
⊲ Alternatives
Two-part Tobit (just as for STLS / SCLS)
Application
Alternatives • “only” med(yi |xi ) identified
Sample selection
Two-step estimation (STLS / SCLS identify “only” E(yi |xi ))
MLE
Example • poor performance in small or heavily censored samples
Application

Final thoughts

Econometrics Slide 201
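The CLAD criterion is easy to evaluate; a crude grid search on simulated data recovers parameters near the truth (illustrative Python sketch — practical CLAD implementations use iterative LAD fits, not a grid; the true values here are hypothetical):

```python
import random

random.seed(4)
n = 4000
x = [random.uniform(-2.0, 2.0) for _ in range(n)]
y = [max(0.5 + xi + random.gauss(0.0, 1.0), 0.0) for xi in x]   # true (0.5, 1.0)

def clad_objective(b0, b1):
    """CLAD criterion: sum of absolute deviations from max{0, b0 + b1 x}."""
    return sum(abs(yi - max(0.0, b0 + b1 * xi)) for yi, xi in zip(y, x))

# crude search over a 0.1 grid of candidate intercepts and slopes
best = min(((clad_objective(b0 / 10.0, b1 / 10.0), b0 / 10.0, b1 / 10.0)
            for b0 in range(-5, 16) for b1 in range(0, 21)),
           key=lambda t: t[0])
print(best[1], best[2])   # close to the true intercept and slope
```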


Example: female labor supply

Introduction

Estimation clad hours nwifeinc educ exper expersq age


Binary choice kidslt6 kidsge6
Density estimation

Regression est. ---------------------------------------------------


Semiparametrics hours | Observed Bias Std. Err. Tobit
Discrete choice
----------+----------------------------------------
Censored data
Introduction
nwifeinc | -5.99571 -.972583 5.562 [-8.81]
Truncation
Tobit model
educ | 73.87262 -5.683093 32.531 [80.65]
MLE exper | 115.80310 6.880405 23.476 [131.6]
Tobit – interpretation
Specification expersq | -1.32852 -.160859 0.745 [-1.86]
⊲ Alternatives
Two-part Tobit
age | -57.43443 -2.804438 8.015 [-54.4]
Application
Alternatives
kidslt6 |-1057.3100 -49.193320 209.544 [-894.]
Sample selection kidsge6 |-104.01270 -15.718030 49.884 [-16.2]
Two-step estimation
MLE _cons |1498.64600 155.657400 528.073 [965.3]
Example
Application
---------------------------------------------------
Final thoughts Hausman test possible?
Econometrics Slide 202
Two-part Tobit model

Introduction
Recall the Tobit type I model
Estimation

Binary choice
yi = max{0, x⊤i β + εi }
Density estimation

Regression est.

Semiparametrics • corner solution: x⊤i β + εi is the unconstrained optimal choice
Discrete choice • censoring: x⊤i β + εi represents the latent variable yi∗
Censored data
Introduction
• under error distribution εi |xi ∼ N (0, σ 2 ),
Truncation
Tobit model
MLE
◦ P (yi > 0|xi ) = P (x⊤i β + εi > 0|xi ) = Φ(x⊤i β/σ)
Tobit – interpretation
Specification
◦ E(yi |xi , yi > 0) = x⊤i β + σλ(−x⊤i β/σ)
Alternatives
⊲ Two-part Tobit
Application
Alternatives Regression parameters β
Sample selection
Two-step estimation • determine the influence of xi on the “decision” yi > 0
MLE
Example • determine the influence of xi on the amount yi if yi > 0
Application

Final thoughts

Econometrics Slide 203


Two-part Tobit model

Introduction
Two-part Tobit model (also hurdle model):
Estimation
model the following two decisions separately
Binary choice

Density estimation • participation decision (yi = 0 vs. yi > 0)


Regression est. • amount decision (the magnitude of yi if yi > 0)
Semiparametrics
• two decisions can be related in general (see Tobit type II)
Discrete choice

Censored data
• assume independence of the two decisions for now:
Introduction yi = si qi , where si = I(yi > 0), and
Truncation
Tobit model
MLE
Tobit – interpretation
Fq (qi |xi , si ) = Fq (qi |xi )
Specification
Alternatives
⊲ Two-part Tobit
Application
Alternatives
Sample selection
Two-step estimation
MLE
Example
Application

Final thoughts

Econometrics Slide 204


Application

Melenberg and van Soest (1996) Modelling of vacation expenditures.
Journal of Applied Econometrics 11, 59–76

• analyze vacation spending of households
• homoscedastic and heteroscedastic Tobit compared with CLAD
• two-part model tested; independence of the participation decision and spending found

Econometrics Slide 205


Truncated normal hurdle model

Truncated normal hurdle model (Cragg, 1971)

• decision – model by probit:
  P(si = 1|xi) = P(yi > 0|xi) = Φ(x⊤i γ)
• amount – model using yi = qi ∼ N(x⊤i β, σ^2) truncated at 0

    f(yi|xi) = f(yi|xi, yi > 0) · P(yi > 0|xi)
    f(yi|xi, yi > 0) = [φ{(yi − x⊤i β)/σ}/σ] / Φ{x⊤i β/σ}

• likelihood contribution for the full-information MLE:

    l(yi, xi, β) = I(yi = 0) · ln[1 − Φ(x⊤i γ)]
                 + I(yi > 0) · ln[Φ(x⊤i γ)]
                 + I(yi > 0) · ln[φ{(yi − x⊤i β)/σ}/σ]
                 − I(yi > 0) · ln[Φ(x⊤i β/σ)]

• reduces to the standard Tobit for γ = β/σ

Econometrics Slide 206
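The reduction to the Tobit for γ = β/σ can be checked numerically: for yi > 0 the two Φ terms then cancel and only the Tobit density term remains. A minimal sketch of the yi > 0 contribution (my own illustration):

```python
from math import erf, exp, log, pi, sqrt

def norm_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def cragg_ll_pos(y, xg, xb, sigma):
    """Truncated normal hurdle log-likelihood contribution for y > 0.

    xg = x'gamma (decision index), xb = x'beta (amount index)."""
    return (log(norm_cdf(xg))                       # ln Phi(x'gamma)
            + log(norm_pdf((y - xb) / sigma) / sigma)  # ln phi{(y - x'b)/s}/s
            - log(norm_cdf(xb / sigma)))            # - ln Phi(x'b/s)
```

With xg = xb/σ the first and last terms cancel, leaving exactly the Tobit contribution ln[φ{(y − x⊤β)/σ}/σ].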
Example: female labor supply

craggit inlf nwifeinc ... kidsge6,
    second(hours nwifeinc ... kidsge6) ll(0)

         |   Tobit/s   |   Probit    | Truncated/s
--------------------------------------------------
inlf     | Coef.  SE   | Coef.  SE   | Coef.  SE
---------+-------------+-------------+------------
nwifeinc | -.008 .004  | -.012 .005  | -.000 .005
educ     |  .072 .019  |  .131 .025  | -.035 .027
exper    |  .117 .015  |  .123 .019  |  .085 .026
expersq  | -.002 .000  | -.002 .001  | -.001 .001
age      | -.048 .007  | -.052 .008  | -.032 .010
kidslt6  | -.797 .099  | -.868 .119  | -.570 .181
kidsge6  | -.014 .034  |  .036 .043  |  .121 .051
_cons    |  .860 .398  |  .270 .508  |  2.49 .569
---------+-------------+-------------+------------

Econometrics Slide 207
Semiparametric alternatives

Censored regression model yi = max{0, x⊤i β + εi}

• single-index models applicable (semiparametric LS)
• conditional median assumption med(εi|xi) = 0
• least absolute deviation regression (LAD; biased)

    min_{β∈R^p} ∑_{i=1}^n |yi − x⊤i β|

• censored least absolute deviations regression (CLAD)

    min_{β∈R^p} ∑_{i=1}^n |yi − max{0, x⊤i β}|

  skewed in small samples, no two-part model

Econometrics Slide 208
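The two objective functions are simple to evaluate; a minimal sketch for a scalar regressor (my own illustration). Note that they coincide on any sample where x⊤i β > 0 for all observations:

```python
def lad_loss(beta, xs, ys):
    """LAD objective: sum_i |y_i - x_i * beta| (scalar regressor)."""
    return sum(abs(y - x * beta) for x, y in zip(xs, ys))

def clad_loss(beta, xs, ys):
    """CLAD objective: sum_i |y_i - max{0, x_i * beta}|."""
    return sum(abs(y - max(0.0, x * beta)) for x, y in zip(xs, ys))
```

When x·β < 0 for some observation, the two criteria differ because CLAD censors the fit at zero.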


Two-step estimation of censored regression

Khan and Powell (2001) for yi = max{0, x⊤i β + εi}

    min_{β∈R^p} ∑_{i=1}^n |yi − x⊤i β|   vs.   min_{β∈R^p} ∑_{i=1}^n |yi − max{0, x⊤i β}|

• observation: criteria of QR and CQR equivalent for x⊤i β > 0
• observation: med(yi|xi) = x⊤i β if P(yi > 0|xi) > 0.5
  ⇒ estimate nonparametrically (by the Nadaraya-Watson estimator)
    p(xi) = P(yi > 0|xi) = E[I(yi > 0)|xi]
• observation: med(yi|xi) = x⊤i β if med(yi|xi) > 0
  ⇒ estimate nonparametrically q(xi) = med(yi|xi) “= a(x)”
    (local median regression)

    min_{a(x),b(x)} ∑_{i=1}^n |yi − a(x) − b(x)(xi − x)| K{Hn^{-1}(xi − x)}

Econometrics Slide 209


Two-step estimation of censored regression

Two-step estimation of the censored regression model

    yi = max{0, x⊤i β + εi}

• first step: estimate nonparametrically p̂n(xi) or q̂n(xi)
• second step:
  ◦ select observations with p̂n(xi) > 0.5 or q̂n(xi) > 0
  ◦ estimate β by the linear quantile regression estimator applied
    only to the selected observations
• asymptotic distribution of the estimator is equivalent to the
  “oracle” estimator = quantile regression estimator applied to data
  with med(yi|xi) > 0; the two-step estimator is adaptive

Econometrics Slide 210
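The first step can be sketched with a Nadaraya-Watson estimate of p(x) = P(y > 0 | x) = E[I(y > 0)|x]; a minimal one-regressor illustration (my own sketch, not the authors' code):

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| < 1."""
    return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0

def nw_prob_positive(x0, xs, ys, h):
    """Nadaraya-Watson estimate of p(x0) = P(y > 0 | x = x0) with bandwidth h."""
    w = [epanechnikov((x - x0) / h) for x in xs]
    num = sum(wi * (1.0 if y > 0 else 0.0) for wi, y in zip(w, ys))
    den = sum(w)
    return num / den if den > 0 else float("nan")
```

The second step would then keep only observations with `nw_prob_positive(x_i, ...) > 0.5` and run a linear median regression on that subsample.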


Example: female labor supply

sml inlf nwifeinc educ expersq age kidslt6 kidsge6, offset(exper)
predict indsml, xb
lpoly inlf indsml, generate(indlp plp)

[Figure: local polynomial smooth of inlf (“Y=1 if in lab frce, 1975”)
against the linear prediction; kernel = epanechnikov, degree = 0,
bandwidth = 5.32]

Econometrics Slide 211


Example: female labor supply – second step

lpoly inlf indsml, generate(indlp plp)
qreg hours nwifeinc educ exper expersq
    age kidslt6 kidsge6 if plp>0.5, quantile(50)

Testing?
         |    Tobit    | Two-part T. | Khan & Powell
--------------------------------------------------
inlf     | Coef.  SE   | Coef.  SE   | Coef.  SE
---------+-------------+-------------+------------
nwifeinc | -8.81 4.46  |  .153 5.16  | -1.18 4.52
educ     |  80.6 21.6  | -29.9 22.8  |  21.9 23.2
exper    |  131. 17.3  |  72.6 21.2  |  37.5 17.9
expersq  | -1.86 .537  | -.944 .609  |  .750 .591
age      | -54.4 7.42  | -27.4 8.29  | -26.1 7.85
kidslt6  | -894. 112.  | -485. 154.  | -374. 104.
kidsge6  | -16.2 38.6  | -103. 43.5  | -30.0 41.7
_cons    |  965. 446.  | 2124. 483.  | 1022. 487.
---------+-------------+-------------+------------

Econometrics Slide 212
Tobit type II – sample selection model

Tobit type II – incidental truncation

    y1i = x⊤1i β1 + ε1i
    y2i = I(x⊤2i β2 + ε2i > 0)

• (Two-part) Tobit if y1 ≡ y2, x1 ≡ x2, ε1 ≡ ε2 (and β1 ≡ β2)
• (x2i, y2i) are always observed
• (x1i, y1i) are observed only if y2i = 1
• (ε1i, ε2i) is independent of (x1i, x2i) and has zero mean
• ε2i ∼ N(0, 1) and E(ε1i|ε2i) = ρε2i
  (⇒ E(ε1i ε2i) = E[E(ε1i|ε2i)ε2i] = ρ)

Then

    E(y1i|x1i, x2i, ε2i) = x⊤1i β1 + E(ε1i|x1i, x2i, ε2i) = x⊤1i β1 + ρε2i
    E(y1i|x1i, x2i, y2i) = x⊤1i β1 + ρE(ε2i|x1i, x2i, y2i)

Econometrics Slide 213
Tobit type II – sample selection model

Since in the model

    y1i = x⊤1i β1 + ε1i
    y2i = I(x⊤2i β2 + ε2i > 0),

only data with y2i = 1 are observed and ε2i ∼ N(0, 1), then

    E(y1i|x1i, x2i, y2i = 1)
      = x⊤1i β1 + ρE(ε2i|x1i, x2i, y2i = 1)
      = x⊤1i β1 + ρE(ε2i|x1i, x2i, x⊤2i β2 + ε2i > 0)
      = x⊤1i β1 + ρE(ε2i|x1i, x2i, ε2i > −x⊤2i β2)
      = x⊤1i β1 + ρ φ(x⊤2i β2)/Φ(x⊤2i β2) = x⊤1i β1 + ρλ(−x⊤2i β2),

where λ(−t) = φ(t)/Φ(t) is the inverse Mills ratio

Econometrics Slide 214


Two-step estimation

Heckman (1976): two-step procedure similar to Tobit type II

• estimate the binary-choice model (probit) using all data

    P(y2i = 1|x2i) = Φ(x⊤2i β2)

• obtain “Heckman’s lambdas” λ̂i = λ(−x⊤2i b2n), i = 1, . . . , n
• estimate the linear model for β1 and ρ using data with y2i = 1

    y1i = x⊤1i β1 + ρλ̂i + {ε1i − E(ε1i|y2i = 1)}

Inference

• test H0: ρ = 0 for selection bias (variance of β̂1n and ρ̂n simple
  to estimate under H0 as var(ε1i|x1i, y2i = 1) = var(ε1i|x1i))
• variance estimation difficult if ρ ≠ 0 due to λ̂i and selection
• x1i ≡ x2i possible, but identification only via nonlinearity of λ(·)
  (danger of collinearity if not enough variation in x1i!)

Econometrics Slide 215
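Given the first-step probit indices x⊤2i b2n, the second step is just OLS of y1 on x1 and λ̂. A minimal single-regressor sketch (my own illustration; the probit fit itself is assumed done elsewhere, and the data below are invented):

```python
from math import erf, exp, pi, sqrt
import numpy as np

def mills(t):
    """Heckman's lambda: lambda(-t) = phi(t)/Phi(t) at the probit index t."""
    phi = exp(-t * t / 2) / sqrt(2 * pi)
    Phi = 0.5 * (1 + erf(t / sqrt(2)))
    return phi / Phi

def heckman_second_step(y1, x1, probit_index):
    """OLS of y1 on [1, x1, lambda_hat] over the selected (y2 = 1) sample.

    probit_index holds the fitted first-step indices x2'b2."""
    lam = np.array([mills(t) for t in probit_index])
    X = np.column_stack([np.ones_like(x1), x1, lam])
    coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
    return coef  # (intercept, beta1, rho)
```

The coefficient on λ̂ estimates ρ, so its t-test is the selection-bias test on the slide above.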
MLE estimation

Maximum likelihood estimation

    log Ln(β, σ11, σ12) = ∑_{y2i=0} log P(y2i = 0|x2i)
                        + ∑_{y2i=1} log[P(y2i = 1|x2i) · f(y1i|y2i = 1, x1i, x2i)]

• MLE estimation possible using the Bayes rule

    f(y1i|y2i = 1, ...) = P(y2i = 1|y1i, ...) · f(y1i|...) / P(y2i = 1|...),

  which implies

    P(y2i = 1|...) f(y1i|y2i = 1, ...) = P(y2i = 1|y1i, ...) f(y1i|...)

Econometrics Slide 216


MLE estimation

Maximum likelihood estimation

    log Ln(β, σ11, σ12) = ∑_{y2i=0} log P(y2i = 0|x2i)
                        + ∑_{y2i=1} log[P(y2i = 1|y1i, x1i, x2i) · f(y1i|x1i, x2i)]

• the conditional distribution of y2i = x⊤2i β2 + ε2i given
  ε1i = y1i − x⊤1i β1 follows from the joint normality of (ε1i, ε2i):

    ε2i | ε1i = y1i − x⊤1i β1 ∼ N( σ12 σ11^{-2} (y1i − x⊤1i β1), 1 − σ12^2 σ11^{-2} ),

  where σ11 = sd(ε1i), σ12 = σ21 = cov(ε1i, ε2i), σ22 = var(ε2i) = 1, and

    (ε1i, ε2i)⊤ ∼ N( 0, [σ11^2, σ12; σ21, σ22] ) = N( 0, [σ11^2, σ12; σ21, 1] )

Econometrics Slide 217


MLE estimation

Maximum likelihood estimation

    f(y1i|x1i, x2i) = φ{(y1i − x⊤1i β1)/σ11}/σ11
    P(y2i = 0|x1i, x2i) = 1 − Φ(x⊤2i β2)
    P(y2i = 1|y1i, x1i, x2i) = Φ( [x⊤2i β2 + σ12 σ11^{-2} (y1i − x⊤1i β1)] / (1 − σ12^2 σ11^{-2})^{1/2} )

• log-likelihood contribution (denoting σc^2 = 1 − σ12^2 σ11^{-2})

    li(β, σ11, σ12) = (1 − y2i) log{1 − Φ(x⊤2i β2)}
                    + y2i log Φ( [x⊤2i β2 + σ12 σ11^{-2} (y1i − x⊤1i β1)] / σc )
                    + y2i [log φ{(y1i − x⊤1i β1)/σ11} − log(σ11)]

Econometrics Slide 218
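A minimal numerical sketch of this contribution (my own illustration): with σ12 = 0 the selection and outcome parts factor into a plain probit term plus a normal density term, which gives an easy consistency check.

```python
from math import erf, exp, log, pi, sqrt

def norm_pdf(z):
    return exp(-z * z / 2) / sqrt(2 * pi)

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def heckman_li(y1, y2, x1b1, x2b2, s11, s12):
    """Tobit type II log-likelihood contribution.

    x1b1 = x1'beta1, x2b2 = x2'beta2, s11 = sd(eps1), s12 = cov(eps1, eps2)."""
    if y2 == 0:
        return log(1 - norm_cdf(x2b2))
    sc = sqrt(1 - s12 ** 2 / s11 ** 2)          # conditional sd of eps2 given eps1
    z = (x2b2 + s12 / s11 ** 2 * (y1 - x1b1)) / sc
    return log(norm_cdf(z)) + log(norm_pdf((y1 - x1b1) / s11)) - log(s11)
```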
Example: female wage equation

Married women labor force participation (Mroz, 1987):

• wage offer = observed only for those who choose to work
• explanatory variables for the wage equation
  ◦ education
  ◦ labor market experience
• additional explanatory variables for the participation equation
  ◦ non-wife income
  ◦ age
  ◦ number of children

Econometrics Slide 219


Example: female wage equation

Heckman selection model -- two-step estimates

Probit selection equation
---------------------------------------------------
         |     Coef.  Std. Err.      z    P>|z|
---------+-----------------------------------------
inlf     |
nwifeinc | -.0120237   .0048398   -2.48   0.013
age      | -.0528527   .0084772   -6.23   0.000
educ     |  .1309047   .0252542    5.18   0.000
exper    |  .1233476   .0187164    6.59   0.000
expersq  | -.0018871   .0006      -3.15   0.002
kidslt6  | -.8683285   .1185223   -7.33   0.000
kidsge6  |  .036005    .0434768    0.83   0.408
_cons    |  .2700768   .508593     0.53   0.595
---------------------------------------------------

Econometrics Slide 220
Example: female wage equation

Heckman selection model -- two-step estimates
---------------------------------------------------
         |     Coef.  Std. Err.      z    [OLS]
---------+-----------------------------------------
lwage    |
educ     |  .1090655   .015523     7.03    .107
exper    |  .0438873   .0162611    2.70    .041
expersq  | -.0008591   .0004389   -1.96   -.001
_cons    | -.5781032   .3050062   -1.90   -.552
---------+-----------------------------------------
mills    |
lambda   |  .0322619   .1336246    0.24   0.809
---------+-----------------------------------------
rho      |  0.04861
sigma    |  .66362875
---------+-----------------------------------------

Econometrics Slide 221
Example: female wage equation

Heckman selection model -- MLE estimates
---------------------------------------------------
         |     Coef.  Std. Err.      z    [OLS]
---------+-----------------------------------------
lwage    |
educ     |  .1083502   .0148607    7.29    .107
exper    |  .0428369   .0148785    2.88    .041
expersq  | -.0008374   .0004175   -2.01   -.001
_cons    | -.5526973   .2603784   -2.12   -.552
---------+-----------------------------------------
rho      |  .0266078   .1470778
sigma    |  .6633975   .0227075
lambda   |  .0176515   .0976057
---------------------------------------------------

Econometrics Slide 222


Application

Buchinsky (1998) The dynamics of changes in the female wage
distribution in the USA: a quantile regression approach. Journal of
Applied Econometrics 13, 1–30.

Econometrics Slide 223


Final thoughts

Econometrics Slide 224


Maximum likelihood estimation

Maximum likelihood

• applicable in many nonlinear models
• distributional assumptions necessary

Examples of MLE – discrete-choice responses
• probit/logit, ordered probit/logit, multinomial probit/logit
• count data

Examples of MLE – partially discrete, partially continuous responses
• Tobit, two-part Tobit, Tobit type II (sample selection)
• models with random censoring

Applications
• extensions needed (non-constant thresholds, random censoring,
  random coefficients, sample selection probit, endogeneity, ...)

Econometrics Slide 225


Nonparametric estimation

Nonparametric estimation

• very flexible, minimal assumptions
• not easily applicable directly in models with several/many
  explanatory variables

Semiparametric estimation

• combine benefits of nonparametric and parametric methods

Applications
• testing of distributional or functional assumptions
• estimation with relaxed assumptions, where suitable
• nonparametric identification

Econometrics Slide 226


Microeconomic data

Choices such as

• structural versus reduced-form analysis
• parametric versus semiparametric estimation
• approach to semiparametric estimation
• software package and numerical tools

should primarily depend on

• purpose of the analysis
• availability of data
• characteristics of data

Econometrics Slide 227


The end

Econometrics Slide 228
