
Solutions to the Review Questions at the End of Chapter 11

1. While the linear probability model (LPM) is simple to estimate and intuitive to interpret, it is fatally flawed as a method to deal with binary dependent variables. There are several problems that we may encounter:

- There is nothing in the model to ensure that the fitted probabilities will lie between zero and one.
- Even if we truncated the probabilities so that they take plausible values, this will still result in too many observations for which the estimated probabilities are exactly zero or one. It is simply not plausible to say that the probability of the event occurring is exactly zero or exactly one.
- Since the dependent variable only takes one of two values, for given (fixed in repeated samples) values of the explanatory variables, the disturbance term will also only take on one of two values. Hence the error term cannot plausibly be assumed to be normally distributed.
- Since the disturbances change systematically with the explanatory variables, they will also be heteroscedastic.
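The first and last problems listed for the LPM can be seen on a very small example. The sketch below fits a straight line by OLS to a made-up binary sample (the data and the helper name `ols_fit` are illustrative assumptions, not taken from the text) and shows fitted "probabilities" outside the unit interval, along with an implied error variance that changes with x.

```python
# Linear probability model fitted by OLS on a tiny made-up sample.
# The data below are illustrative only, not taken from the text.

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [-3, -2, -1, 0, 1, 2, 3]
y = [0, 0, 0, 1, 1, 1, 1]

intercept, slope = ols_fit(x, y)
fitted = [intercept + slope * xi for xi in x]

# For a binary y, the disturbance can only equal -p or 1 - p, so its
# variance is p * (1 - p), which changes with x (heteroscedasticity).
# It is only computed here where the fitted value is a valid probability.
error_variance = [p * (1 - p) for p in fitted if 0 <= p <= 1]

for xi, p in zip(x, fitted):
    flag = "" if 0 <= p <= 1 else "  <-- outside [0, 1]"
    print(f"x = {xi:2d}  fitted probability = {p:6.3f}{flag}")
```

In this sample the fitted values at both ends of the x range fall outside the unit interval, which is the behaviour the logit and probit transformations are designed to rule out.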

2. Both the logit and probit model approaches are able to overcome the limitation of the LPM that it can produce estimated probabilities that are negative or greater than one. They do this by using a function that effectively transforms the regression model so that the fitted values are bounded within the (0,1) interval. Visually, the fitted regression model will appear as an S-shape rather than a straight line, as was the case for the LPM. The only difference between the two approaches is that under the logit approach, the cumulative logistic function is used to transform the model, so that the probabilities are bounded between zero and one. But with the probit model, the cumulative normal distribution is used instead. For the majority of the applications, the logit and probit models will give very similar characterisations of the data because the densities are very similar.

3. (a) When maximum likelihood is used as a technique to estimate limited dependent variable models, the general intuition is the same as for any other model: a log-likelihood function (LLF) is formed and then the parameter values are taken to maximise it. The form of this LLF will depend upon whether the logit or probit model is used; further technical details on the estimation are given in the appendix to Chapter 11.
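As a rough illustration of the two transformations and of the maximum likelihood idea, the sketch below defines the cumulative logistic and cumulative normal functions, a log-likelihood that accepts either one, and a Newton-Raphson routine for the logit case. The function names, the sample data and the choice of Newton-Raphson are assumptions made for this example; the text does not prescribe a particular optimiser.

```python
import math

def logistic_cdf(z):
    """Cumulative logistic function used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Cumulative standard normal function used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(beta1, beta2, x, y, cdf):
    """Log-likelihood with P(y_i = 1) = cdf(beta1 + beta2 * x_i)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = cdf(beta1 + beta2 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def fit_logit(x, y, steps=25):
    """Maximise the logit log-likelihood by Newton-Raphson iterations."""
    b1, b2 = 0.0, 0.0
    for _ in range(steps):
        g1 = g2 = h11 = h12 = h22 = 0.0
        for xi, yi in zip(x, y):
            p = logistic_cdf(b1 + b2 * xi)
            w = p * (1.0 - p)
            g1 += yi - p            # gradient with respect to the intercept
            g2 += (yi - p) * xi     # gradient with respect to the slope
            h11 += w                # entries of the negative Hessian
            h12 += w * xi
            h22 += w * xi * xi
        det = h11 * h22 - h12 * h12
        b1 += (h22 * g1 - h12 * g2) / det
        b2 += (h11 * g2 - h12 * g1) / det
    return b1, b2

# Made-up sample in which the zeros and ones overlap, so the MLE exists.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [0, 0, 1, 0, 1, 0, 1]

b1, b2 = fit_logit(x, y)
print("logit estimates:", b1, b2)
print("maximised LLF:", log_likelihood(b1, b2, x, y, logistic_cdf))
for z in (-2.0, 0.0, 2.0):
    print(z, logistic_cdf(z), normal_cdf(z))  # both stay inside (0, 1)
```

The same `log_likelihood` function can be evaluated with `normal_cdf` to obtain the probit objective; only the transformation changes.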


Introductory Econometrics for Finance Chris Brooks 2008

(b) It is tempting, but incorrect, to state that a 1-unit increase in $x_{2i}$, for example, causes an increase of $\beta_2$ (that is, $100\beta_2$ percentage points) in the probability that the outcome corresponding to $y_i = 1$ will be realised. This would have been the correct interpretation for the linear probability model. But for logit and probit models, this interpretation would be incorrect because the form of the function is not $P_i = \beta_1 + \beta_2 x_{2i} + u_i$, for example, but rather $P_i = F(\beta_1 + \beta_2 x_{2i})$, where $F$ represents the (non-linear) logistic or cumulative normal function. To obtain the required relationship between changes in $x_{2i}$ and $P_i$, we would need to differentiate $F$ with respect to $x_{2i}$, and it turns out that this derivative is $\beta_2 f(\beta_1 + \beta_2 x_{2i})$, where $f = F'$ is the corresponding density; for the logit model, $f = F(1 - F)$. So in fact, a 1-unit increase in $x_{2i}$ will cause approximately a $\beta_2 f(\beta_1 + \beta_2 x_{2i})$ increase in probability. Usually, these impacts of incremental changes in an explanatory variable are evaluated by setting each of the explanatory variables to its mean value.

(c) While it would be possible to calculate the values of the standard goodness of fit measures such as RSS, $R^2$ or adjusted $R^2$ for limited dependent variable models, these cease to have any real meaning. If calculated in the usual fashion, these will be misleading because the fitted values from the model can take on any value but the actual values will only be either 0 or 1. The model has effectively made the correct prediction if the predicted probability for a particular entity $i$ with $y_i = 1$ is greater than the unconditional probability that $y = 1$, whereas $R^2$ or adjusted $R^2$ will not give the model full credit for this. Two goodness of fit measures are commonly reported for limited dependent variable models.

The first is the percentage of $y_i$ values correctly predicted, defined as 100 times the number of observations predicted correctly divided by the total number of observations. Obviously, the higher this number, the better the fit of the model. Although this measure is intuitive and easy to calculate, Kennedy (2003) suggests that it is not ideal, since it is possible that a naïve predictor could do better than any model if the sample is unbalanced between 0 and 1. For example, suppose that $y_i = 1$ for 80% of the observations. A simple rule that the prediction is always 1 is likely to outperform any more complex model on this measure but is unlikely to be very useful.

The second is a measure known as 'pseudo-$R^2$', defined as $1 - LLF/LLF_0$, where $LLF$ is the maximised value of the log-likelihood function for the logit or probit model and $LLF_0$ is the value of the log-likelihood function for a restricted model where all of the slope parameters are set to zero (i.e. the model contains only an intercept). Since the likelihood is essentially a joint probability, its value must be between zero and one, and therefore taking its logarithm to form the $LLF$ must result in a negative number. Thus, as the model fit improves, $LLF$ will become less negative and therefore pseudo-$R^2$ will rise. This definition of pseudo-$R^2$ is also known as McFadden's $R^2$.
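The quantities discussed in parts (b) and (c) are simple to compute by hand. The following is a minimal sketch, assuming a logit model with one explanatory variable; the function names, the 0.5 classification cut-off and the predicted probabilities are all hypothetical choices made for illustration (the probabilities are not the output of an actual fitted model).

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_marginal_effect(beta1, beta2, x_value):
    """dP/dx for a logit model: beta2 * F * (1 - F), evaluated at x_value."""
    f = logistic_cdf(beta1 + beta2 * x_value)
    return beta2 * f * (1.0 - f)

def percent_correct(y, probs, cutoff=0.5):
    """100 * (observations predicted correctly) / (total observations)."""
    hits = sum(1 for yi, p in zip(y, probs) if (p > cutoff) == (yi == 1))
    return 100.0 * hits / len(y)

def bernoulli_llf(y, probs):
    """Log-likelihood of binary outcomes given predicted probabilities."""
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p)
               for yi, p in zip(y, probs))

def mcfadden_r2(y, probs):
    """Pseudo-R^2 = 1 - LLF / LLF0, with LLF0 from an intercept-only model."""
    # With only an intercept, the ML fitted probability is the share of ones.
    p_bar = sum(y) / len(y)
    llf = bernoulli_llf(y, probs)
    llf0 = bernoulli_llf(y, [p_bar] * len(y))
    return 1.0 - llf / llf0

# Unbalanced made-up sample: y = 1 for 80% of the observations.
y = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
# Hypothetical predicted probabilities, for illustration only.
model_probs = [0.9, 0.7, 0.8, 0.45, 0.4, 0.9, 0.45, 0.8, 0.7, 0.55]
naive_probs = [1.0] * len(y)  # the "always predict 1" rule

print("model  % correct:", percent_correct(y, model_probs))
print("naive  % correct:", percent_correct(y, naive_probs))
print("pseudo-R^2      :", mcfadden_r2(y, model_probs))
print("marginal effect :", logit_marginal_effect(0.0, 2.0, 0.0))
```

On this sample the naïve rule scores higher on the percentage correctly predicted even though the hypothetical model has a positive pseudo-R^2, which is the weakness of the first measure described above.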

(d) When there is more than one alternative outcome, but they are unordered, this is sometimes called a discrete choice or multiple choice problem. The models used are derived from the principles of utility maximisation: that is, the agent chooses the alternative that maximises his utility relative to the others. Econometrically, this is captured using a simple generalisation of the binary setup. When there were only 2 choices (0, 1), we required just one equation to capture the probability that one or the other would be chosen. If there are now, say, three alternatives, we would need two equations; for four alternatives, we would need three equations. In general, if there are m possible alternative choices, we need m-1 equations. The omitted equation then becomes a reference point, analogous to an omitted categorical dummy variable in a standard regression model, and the parameter interpretations need to be modified accordingly compared with the binary choice setup.

4. (a) The differences between censored and truncated variables are as follows. Censored data occur when the dependent variable has been 'censored' at a certain point so that values above (or below) this cannot be observed. Even though the dependent variable is censored, the corresponding values of the independent variables are still observable. As an example, suppose that a privatisation IPO is heavily oversubscribed, and you were trying to model the demand for the shares using household income, age, education, and region of residence as explanatory variables. The number of shares allocated to each investor may have been capped at, say, 250, so that the observed distribution is cut off at that level. In this example, even though we are likely to have many share allocations at 250 and none above this figure, all of the observations on the independent variables are present and hence the dependent variable is censored, not truncated.

A truncated dependent variable, on the other hand, occurs when the observations for both the dependent and the independent variables are missing when the dependent variable is above (or below) a certain threshold. Thus the key difference from censored data is that we cannot observe the explanatory variables either, and so some observations are completely cut out or truncated from the sample.
For example, suppose that a bank were interested in determining the factors (such as age, occupation and income) that affected a customer's decision as to whether to undertake a transaction in a branch or on-line. Suppose also that the bank tried to achieve this by encouraging clients to fill in an on-line questionnaire when they log on. There would be no data at all for those who opted to transact in person since they probably would not have even logged on to the bank's web-based system and so would not have the opportunity to complete the questionnaire. Thus, dealing with truncated data is really a sample selection problem because the sample of data that can be observed is not representative of the population of interest: the sample is biased, very likely resulting in biased and inconsistent parameter estimates. This is a common problem, which will result whenever data for buyers or users only can be observed while data for non-buyers or non-users cannot. Of course, it is possible, although unlikely, that the population of interest is focused only on those who use the internet for banking transactions, in which case there would be no problem.
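The distinction between censoring and truncation can be made concrete with a few lines of code. The sketch below uses the cap of 250 from the IPO example; the (income, desired shares) pairs and the function names are made up for illustration and are not data from the text.

```python
CAP = 250  # allocation cap from the IPO example

# Hypothetical (household income, desired number of shares) pairs.
investors = [(20, 100), (35, 180), (50, 260), (65, 400), (80, 520)]

def censor(data, cap):
    """Keep every observation; the dependent variable is recorded as at most cap."""
    return [(x, min(y, cap)) for x, y in data]

def truncate(data, cap):
    """Drop whole observations (x and y) whose dependent variable exceeds cap."""
    return [(x, y) for x, y in data if y <= cap]

def mean(values):
    return sum(values) / len(values)

censored = censor(investors, CAP)
truncated = truncate(investors, CAP)

# Censoring: all explanatory-variable values survive, y piles up at the cap.
print("censored :", censored)
# Truncation: the high-demand investors vanish from the sample altogether.
print("truncated:", truncated)

print("mean income, full sample     :", mean([x for x, _ in investors]))
print("mean income, censored sample :", mean([x for x, _ in censored]))
print("mean income, truncated sample:", mean([x for x, _ in truncated]))
```

In the censored sample the explanatory variable is fully observed, whereas the truncated sample has a different mean income from the full sample, which illustrates why truncation amounts to a sample selection problem.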
