Applied Econometrics

I. Testing for Missing Variables

a) What have other studies done? What variables have the authors included?
b) Look for symptoms of an omitted variable, such as unusually large or small coefficients, or coefficients that have the incorrect sign (not conforming to economic intuition).
c) Utilize the Ramsey RESET Test (Regression Equation Specification Error Test) by running a regression in STATA and then typing the command "estat ovtest". The null hypothesis is that the model has no omitted variables. Rejection of the null hypothesis implies that there are possible missing variables and that the model suffers from endogeneity, causing biased coefficient estimates.

Example:

    . estat ovtest
    Ramsey RESET test using powers of the fitted values of nettfa
           Ho:  model has no omitted variables
                   F(3, 9266) =    246.86
                     Prob > F =    0.0000

Here we reject the null hypothesis that the model has no omitted variables, in favor of the alternative that the model does have possible missing variables, at the 1% level. This indicates that our model most likely suffers from endogeneity, causing biased coefficient estimates.

II. Functional Form

Recall from class that the multiple regression framework allows us to model both linear and nonlinear relationships. We can use a simple linear model ("lin-lin"), a "log-lin", a "lin-log", a "log-log" that readily gives us elasticities, or a polynomial model (a special case of which is quadratic).

a) We should choose the model based upon examining the scatterplots of the dependent variable against each independent variable. If the scatterplot exhibits a nonlinear relationship, then we should not use the simple linear model ("lin-lin").
b) Another way to determine which functional form is best is to compare the R² of each form, as long as the dependent variables are the same. You should choose the model with the higher coefficient of determination in this case. However, oftentimes the dependent variables will differ (i.e., the case of comparing ln(y) and y). The rule of thumb if the dependent variables differ is to choose the model with the higher R² only if its value is at least 0.05 greater.
c) An incorrect functional form can lead to biased coefficients (endogeneity).

III. Testing for Multicollinearity

a) If the symptoms of multicollinearity exist (low t-statistics and insignificant coefficients), then you should:
   i) Examine the correlation matrix between the independent variables by typing the command "correlate" followed by the list of your independent variables in STATA. As a rule of thumb, a correlation of 0.8 or higher is indicative of imperfect multicollinearity.
   ii) Calculate the VIF (variance inflation factor). A variance inflation factor greater than or equal to 5 implies that imperfect multicollinearity is likely. In STATA, estimate your regression and then type the command "estat vif" in order to get the variance inflation factor.
b) The possible solutions to imperfect multicollinearity are:
   i) Drop a variable if it is neither significant nor theoretically valid nor likely to cause omitted variable bias. If it is likely to cause omitted variable bias, do not drop the variable. Biased coefficients are more serious than inflated standard errors.
   ii) Transform a variable.

Example:

    [Stata output of "estat vif" and "correlate" for the variables trend, license, and a speed-limit variable whose name is illegible in the source. The VIF values are only partly legible (roughly 9 for trend and for license). Legible correlation: license and trend, 0.9330.]

Multicollinearity appears to exist since the variance inflation factor is greater than 5. We can also tell that the multicollinearity arises between "license" and "trend", since their correlation is greater than 0.80 (it is 0.933).

IV. Testing for Serial Correlation (Autocorrelation)

a) Serial correlation is defined as correlation between the residuals of different observations. It can be caused by a missing variable, an incorrect functional form, or the pure serial correlation that frequently arises in time series data.
b) In order to test for autocorrelation we will utilize the Breusch-Godfrey Lagrange Multiplier Test. In order to implement this test in STATA, type the command "estat bgodfrey" after running your regression. The null hypothesis is that there is no serial correlation.
c) If we find serial correlation, then we can correct for it by utilizing the command "prais" rather than "regress" in STATA.

Example:

    . estat bgodfrey
    Breusch-Godfrey LM test for autocorrelation
    [The table of lags(p), chi2, df, and Prob > chi2 is illegible in the source.]
           H0: no serial correlation

We reject the null hypothesis that there is no serial correlation, in favor of the alternative that autocorrelation exists, at the 5% level. We can correct for this by utilizing the "prais" command rather than "regress".

V. Testing for Heteroskedasticity

a) In order to test for heteroskedasticity we will utilize White's Test, which is another Lagrange Multiplier (LM) Test. In order to implement White's Test in STATA we utilize the command "estat imtest, white" after running our regression. The null hypothesis is homoskedasticity.
b) If we find heteroskedasticity (rejecting the null hypothesis), then we can correct for it by utilizing the weighted least squares estimation procedure, which is BLUE if the other classical assumptions hold. More easily, however, we can adjust the standard errors for heteroskedasticity by making them robust standard errors. Add the following option at the end of your regression command in STATA: "vce(robust)".

Example:

    . estat imtest, white
    White's test for Ho: homoskedasticity
             against Ha: unrestricted heteroskedasticity
             chi2(7)      =      9.66
             Prob > chi2  =    0.2085

Heteroskedasticity does not appear to exist in this case, since we fail to reject the null hypothesis of homoskedasticity at any reasonable level of significance.

ECONOMETRICS LAB: GDP Data

1) Paste the data from the Excel link into STATA's "Data Editor".
2) Descriptive statistics of the data: summarize.
   [summarize gdpgr, detail]
   [summarize (variable name illegible), detail]
3) If you are only interested in a subset of the years (1993-2000), you can filter out the other years:
   [summarize if year >= 1993 & year <= 2000]
4) Let's say you want to generate a logarithmic transformation of unemployment:
   [generate logunemp = ln(unemp)]
   Look in the Data Editor to see the new variable logunemp.
5) You can also label the variable to give it a description:
   [label variable gdpgr "GDP growth"]
6) Create a two-way scatterplot of gdpgr and year:
   [scatter gdpgr year]
7) Create a graph of investment growth:
   [twoway (connected invgr year)]
8) Now let's create a line graph for multiple series, namely the growth rates:
   [twoway (connected consgr gdpgr invgr year)]
9) Now let's obtain a histogram of GDP growth:
   [histogram gdpgr]
10) Let's look at the correlation between each of our variables:
   [correlate, followed by the variable list, which is illegible in the source]
11) Let's do a simple linear regression where GDP growth is the dependent variable and investment growth is the independent variable:
   [regress gdpgr invgr]

Restaurant Data

I = income, N = competitors, P = population, Y = sales

1) Label the variable i "per capita income in three mile radius of restaurant".
   Label the variable p "population in three mile radius of restaurant".
2) What is the dependent variable?
3) Summarize the variable income by finding the mean, median, maximum, and minimum.
4) Generate a scatterplot of sales and income.
5) Create a histogram of income.
6) Run a simple linear regression of sales on income. What does the p-value tell you?
7) Multiple regression:
   [regress sales population competitors income]

Examining the Average Tuition Elasticity of Enrollment for Two-Year Public Colleges

Directions: Find the corresponding Excel dataset on WebCampus and paste the dataset into STATA. Examine your dataset and follow the steps outlined below while answering the questions.

1) What type of data is this? What is the unit of analysis? How many observational entities are there? How many time periods? What is the total number of observations (N)?
2) What is the advantage of utilizing this type of data over other types?

3) Let's begin by recoding the FIPS numbers into regions as defined by the US Census. These are our commands:

    recode fipstate (9 10 11 23 25 33 34 36 42 44 50 = 1) (17 18 19 20 26 27 31 38 39 46 55 = 2) (1 5 12 13 21 22 24 28 29 37 40 45 47 48 51 54 = 3) (2 4 6 8 15 16 30 32 35 41 49 53 56 = 4), gen(region)
    recode region (1 = 1) (2 3 4 = 0), gen(northeast)
    recode region (2 = 1) (1 3 4 = 0), gen(midwest)
    recode region (3 = 1) (1 2 4 = 0), gen(south)
    recode region (4 = 1) (1 2 3 = 0), gen(west)

4) Generate logarithms of our key dependent and independent variables so that we can estimate elasticities easily:

    generate lnencr = ln(encr)
    generate lnen = ln(en)
    generate lntd = ln(td)
    generate lnts = ln(ts)
    generate lntos = ln(tos)
    generate lnfed = ln(fed)
    generate lnlocal = ln(local)
    generate lninst = ln(inst)
    generate lnhs = ln(hs)
    generate lnfac = ln(fac)
    generate lnun = ln(un)
    generate lninc = ln(inc)
    generate lncppub = ln(cppub)
    generate lncppri = ln(cppri)
    generate lnfem = ln(fem)
    generate lnnw = ln(nw)
    [two further generate commands are illegible in the source]

5) Identify the panel data variables (the entity and time variables):

    xtset id year

6) Examine the correlation between our independent variables before we begin:

    correlate td ts tos fed local inst hs fac un inc cppub cppri fem nw

   Where may imperfect multicollinearity be a problem?

7) Estimate the following models in STATA:

   a) $\ln(EN_{it}) = \beta_0 + \beta_1 \ln(TD_{it}) + \beta_2 \ln(FED_{it}) + \beta_3 \ln(HS_{it}) + \beta_4 \ln(INC_{it}) + \beta_5 \ln(CPPUB_{it}) + \beta_6 \ln(UN_{it}) + \beta_7 \ln(FEM_{it}) + \beta_8 \ln(NW_{it}) + \alpha_i + \mu_{it}$

      Which variables are significant? At what level?

   b) $\ln(EN_{it}) = \beta_0 + \beta_1 \ln(TD_{it}) + \beta_2 \ln(FED_{it}) + \beta_3 \ln(HS_{it}) + \beta_4 \ln(INC_{it}) + \beta_5 \ln(CPPUB_{it}) + \beta_6 \ln(UN_{it}) + \beta_7 \ln(FEM_{it}) + \beta_8 \ln(NW_{it}) + \mu_{it}$

      Which variables are significant? At what level?

8) Which model do you prefer (a or b)? Why is it preferred?

9) What is the tuition elasticity of enrollment? Does this result make sense? Why or why not? If not, what may the problem be? If it does make sense, explain why this result is theoretically sound.
10) Run the regression in 7(a) for only the observations in the following regions (thus, a regionally restricted regression in order to compare regions):
   i) Northeast (if northeast==1)
   ii) West (if west==1)
   What do you notice about the tuition elasticity of enrollment across these two regions? What may be driving the result?

Time Series Econometrics

[The equation that opens these notes is illegible in the source.] Here ρ is a measure of the degree of serial correlation (how strongly each error is correlated with the previous one):
   ρ = 0: no serial correlation
   ρ > 0: positive serial correlation
   ρ < 0: negative serial correlation

Causes of Serial Correlation
1) Pure serial correlation: the residuals are correlated.
2) A missing variable can cause serial correlation.
3) An incorrect functional form.

[A note on the effect of serial correlation on the standard errors is illegible in the source. It is followed by two remedies:]
1) Take first differences instead of levels.
2) Utilize Generalized Least Squares (GLS) by replacing [remainder illegible in the source; Section IV uses "prais" in place of "regress"].

Stationarity

A stationary time series has a constant mean and variance. Non-stationary variables cause the following problems:
1) Spurious regression results (false statistical significance).
2) t-statistics do not follow the t distribution.
3) F-statistics do not follow the F distribution.

A non-stationary process is also known as a random walk, or is said to have a unit root. [The remainder of this note is illegible in the source.]

How to Check for Stationarity

Dickey-Fuller test:
   H0: The variable is non-stationary.
   HA: The variable is stationary.
In STATA type "dfuller" and then the variable name.

Step #1: Check whether the variables are stationary. [The branching that follows is only partly legible; it refers to the variables being cointegrated.]
Then check the residuals of the regression for stationarity: estimate the regression, tell STATA to save the residuals by typing "predict resids, residuals", and then type "dfuller resids".

Restaurant Data: Solutions

1) label variable p "population in three mile radius of restaurant"
   label variable i "per capita income in three mile radius of restaurant"

2) Sales is the dependent variable.

3) [Stata output of "summarize i, detail", only partly legible. Legible values: mean 20552.58, median (50th percentile) 19200, minimum 13240, maximum 33242.]

   This shows that the mean of per capita income is $20,552.58, the median is $19,200 (where the cumulative density is 50%), the minimum value is $13,240, and the maximum value is $33,242.

4) scatter y i

   [Figure: scatterplot of sales against per capita income in three mile radius of restaurant.]

5) histogram i [the start() and width() option values are illegible in the source]

   [Figure: histogram of per capita income.]

6) regress y i

   [Stata regression output, only partly legible. Legible values: 33 observations; coefficient on i = 2.339908, t = 3.54, P>|t| = 0.001.]

   The p-value on income is 0.001, which means that at any reasonable level of significance we can conclude that income has a significant effect on sales. Our best guess, or our point estimate, is that for every dollar increase in income we will see a $2.34 increase in sales. The confidence interval tells us that we can be 95% confident that the true effect of a dollar increase in income is between a $0.99 and a $3.69 increase in sales.

PRACTICE

1. You estimate the regression that follows in order to study the determinants of wage. Data are gathered for 526 individuals in 2009:

   WAGE_i = average hourly earnings, in dollars, for the i-th individual
   EXPER_i = years of working experience
   EXPERSQ_i = years of working experience, squared
   EDUC_i = number of years of education
   TENURE_i = number of years with current employer
   FEMALE_i = dummy equal to 1 if female, 0 otherwise
   MARRIED_i = dummy equal to 1 if married, 0 otherwise

   a. Interpret the coefficients of EDUC and FEMALE. Are they significant? At what level?
   b. How much will a 10% increase in average tenure change wage?
   c. What econometric problems appear to exist and not exist? Be specific, backing up your answers with evidence from the regression and the tests that follow.
   d. Based on the regression results, what is the relationship between wages and experience? Explain or graph the relationship.
   e. At what level of experience are wages maximized, ceteris paribus?

    . regress wage exper expersq educ tenure female married

          Source |       SS       df       MS              Number of obs =     526
    -------------+------------------------------           F(  6,   519) =   57.37
           Model |  2855.33214     6  475.888689           Prob > F      =  0.0000
        Residual |  4305.08215   519  8.29495598           R-squared     =  0.3988
    -------------+------------------------------           Adj R-squared =  0.3918
           Total |  7160.41429   525  13.6388844           Root MSE      =  2.8801

    ------------------------------------------------------------------------------
            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           exper |   .2004023   .0372579     5.38   0.000     .1272074    .2735972
         expersq |  -.0040498    .000788    -5.14   0.000    -.0055983   -.0025012
            educ |   .5282458   .0489818    10.78   0.000     .4320187     .624473
          tenure |   .1334383   .0206634     6.46   0.000     .0928441    .1740325
          female |  -1.779127   .2603125    -6.83   0.000    -2.290522   -1.267731
         married |   .0924594   .2936218     0.31   0.753    -.4843746    .6692934
           _cons |  -2.118117   .7126914    -2.97   0.003    -3.518232   -.7180024
    ------------------------------------------------------------------------------

    . estat imtest, white
    White's test for Ho: homoskedasticity
             against Ha: unrestricted heteroskedasticity
             chi2(24)     =  [illegible in the source]
             Prob > chi2  =    0.0000

    . correlate exper expersq educ tenure female married
    (obs=526)

                 |    exper  expersq     educ   tenure   female  married
    -------------+------------------------------------------------------
           exper |   1.0000
         expersq |   0.9610   1.0000
            educ |  -0.2995  -0.3313   1.0000
          tenure |   0.4993   0.4592  -0.0562   1.0000
          female |  -0.0416  -0.0279  -0.0850  -0.1979   1.0000
         married |   0.3170   0.2173   0.0689   0.2398  -0.1661   1.0000

    . estat ovtest
    Ramsey RESET test using powers of the fitted values of wage
           Ho:  model has no omitted variables
                    F(3, 516) =      1.76
                     Prob > F =    0.2360

    . summarize wage exper expersq educ tenure female married

        Variable |   Obs        Mean    Std. Dev.    Min      Max
    -------------+-----------------------------------------------
            wage |   526    5.896103    3.693086     .53    24.98
           exper |   526    17.01711    13.57216       1       51
         expersq |   526    473.4354    616.0448       1     2601
            educ |   526    12.56274    2.769022       0       18
          tenure |   526    5.104563    7.224462       0       44
          female |   526    .4790875     .500038       0        1
         married |   526     .608365    .4885804       0        1

2.
The regression in Problem 1 was re-estimated using the natural log of wage rather than wage, and the results are reported below. Using these results and the tests that follow, answer the question below. For this regression,

   LWAGE_i = natural log of wage
   All other variables are as defined for Problem 1.

   a. Comparing this regression with the one given in Problem 1, which is better? Explain carefully.

    . regress lwage exper expersq educ tenure female married

          Source |       SS       df       MS              Number of obs =     526
    -------------+------------------------------           F(  6,   519) =   66.91
           Model |  64.6920613     6  10.7820102           Prob > F      =  0.0000
        Residual |  83.6376902   519  .161151619           R-squared     =  0.4361
    -------------+------------------------------           Adj R-squared =  0.4296
           Total |  148.329751   525   .28253286           Root MSE      =  .40144

    ------------------------------------------------------------------------------
           lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           exper |   .0300995   .0051931     5.80   0.000     .0198974    .0403017
         expersq |  -.0006012   .0001099    -5.47   0.000     -.000817   -.0003853
            educ |   .0798322   .0068273    11.69   0.000     .0664197    .0932447
          tenure |   .0160739   .0028801     5.58   0.000     .0104157     .021732
          female |  -.2911303   .0362832    -8.02   0.000    -.3624102   -.2198503
         married |   .0564494   .0409259     1.38   0.168    -.0239515    .1368503
           _cons |   .4158424   .0993372     4.19   0.000       .22069    .6109948
    ------------------------------------------------------------------------------

    . estat imtest, white
    White's test for Ho: homoskedasticity
             against Ha: unrestricted heteroskedasticity
             chi2(24)     =  [illegible in the source]
             Prob > chi2  =    0.1782

    . estat ovtest
    Ramsey RESET test using powers of the fitted values of lwage
           Ho:  model has no omitted variables
                    F(3, 516) =      0.69
                     Prob > F =    0.8308
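
The diagnostic sequence used in Problems 1 and 2 (estimate a level model and a log model, then run the RESET, White, and VIF checks on each) can be sketched as a short do-file. This is only an illustration under stated assumptions: it uses Stata's bundled "auto" dataset rather than the wage data, and the variable name lprice is introduced here purely for the example.

```stata
* Illustrative sketch only: the bundled auto data stand in for the wage data.
sysuse auto, clear

* Log of the dependent variable (lprice is a hypothetical name for this example)
generate lprice = ln(price)

* Level model, followed by the diagnostics described in Sections I, III and V
regress price mpg weight foreign
* Ramsey RESET: H0 is that the model has no omitted variables
estat ovtest
* White's test: H0 is homoskedasticity
estat imtest, white
* Variance inflation factors: 5 or more suggests imperfect multicollinearity
estat vif

* Log model with the same regressors, followed by the same two tests
regress lprice mpg weight foreign
estat ovtest
estat imtest, white

* If White's test rejects, re-estimate with heteroskedasticity-robust standard errors
regress lprice mpg weight foreign, vce(robust)
```

Comparing the two sets of test results side by side is the kind of evidence Problem 2 asks for. The coefficients in the last regression are the same as in the preceding log model; only the standard errors change.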

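As a worked illustration of reading the two wage regressions (a sketch that takes the printed coefficient values at face value): in the level model of Problem 1, the quadratic in experience peaks where the marginal effect of EXPER is zero, and in the log model of Problem 2, a coefficient $\hat\beta$ corresponds to an exact percentage change in wage of $100(e^{\hat\beta}-1)$, which is close to $100\hat\beta$ only when $\hat\beta$ is small.

```latex
\begin{align*}
\frac{\partial\,\widehat{WAGE}}{\partial\,EXPER}
  &= \hat\beta_{EXPER} + 2\,\hat\beta_{EXPERSQ}\,EXPER = 0
  \;\Rightarrow\;
  EXPER^{*} = \frac{0.2004023}{2\,(0.0040498)} \approx 24.7 \text{ years} \\[4pt]
\%\Delta\,\widehat{WAGE} \text{ per extra year of } EDUC
  &= 100\,\bigl(e^{0.0798322} - 1\bigr) \approx 8.3\% \\[4pt]
\%\Delta\,\widehat{WAGE} \text{ for } FEMALE = 1
  &= 100\,\bigl(e^{-0.2911303} - 1\bigr) \approx -25.3\%
\end{align*}
```

The approximate readings $100\hat\beta$ would be 8.0% and -29.1%, so the approximation is much less accurate for the larger FEMALE coefficient.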