
PROC ROBUSTREG Robust Regression Models

April 20, 2005 Charlie Hallahan


1

Overview
PROC ROBUSTREG is experimental in SAS/STAT Version 9.* Its main purpose is to detect outliers and provide resistant (stable) results in the presence of outliers. It addresses three types of problems:

- outliers in the y-direction (response direction)
- multivariate outliers in the x-space (leverage points)
- outliers in both the y-direction and the x-space

* These notes closely follow the SAS documentation for ROBUSTREG. Also see the paper "Robust Regression and Outlier Detection with the ROBUSTREG Procedure" by Colin Chen, presented at SUGI 27 in 2002 (http://www2.sas.com/proceedings/sugi27/p265-27.pdf).
2

Overview
ROBUSTREG supports four methods:

1. M estimation: introduced by Huber in 1973. Simplest both computationally and theoretically. Only addresses contamination in the response direction.

2. Least Trimmed Squares (LTS): introduced by Rousseeuw in 1984. It is a so-called high breakdown method. The breakdown value is a measure of the proportion of contamination that an estimation method can withstand and still maintain its robustness. Uses the FAST-LTS algorithm of Rousseeuw and Van Driessen (1998).

3. S estimation: introduced by Rousseeuw and Yohai in 1984. It is a high breakdown method that is more statistically efficient than LTS.

4. MM estimation: introduced by Yohai in 1987. Combines high breakdown value estimation and M estimation. It is a high breakdown method that is more statistically efficient than S estimation.

3

Overview M Estimation
Before getting to the SAS code, it's probably worthwhile to review what's involved with the simplest robust estimator, the M estimator. These notes follow some online documentation for the text Applied Regression Analysis, Linear Models, and Related Methods by John Fox. When the error distribution is normal, least squares (LS) is the most efficient regression estimator. However, LS is very sensitive to outliers (aberrant observations in the y-direction) and to high leverage points (aberrant observations in the x-direction). Such cases result in heavy-tailed error distributions.
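To make the idea concrete, here is a minimal sketch of the iteratively reweighted least squares logic behind an M estimator, applied to the simplest case of estimating a location parameter with Huber weights. This is illustrative Python, not SAS, and the function names are my own, not PROC ROBUSTREG's.

```python
# A bare-bones sketch of the M-estimation idea (location-only case),
# not PROC ROBUSTREG's implementation: iteratively reweighted averaging
# with Huber weights and a MAD-based scale estimate.
from statistics import median

def huber_weight(r, k=1.345):
    """Huber weight: 1 for small scaled residuals, k/|r| beyond the cutoff."""
    return 1.0 if abs(r) <= k else k / abs(r)

def m_estimate_location(y, k=1.345, tol=1e-8, max_iter=100):
    mu = median(y)                                # robust starting value
    s = 1.4826 * median(abs(v - mu) for v in y)   # MAD scale estimate
    for _ in range(max_iter):
        w = [huber_weight((v - mu) / s, k) for v in y]
        mu_new = sum(wi * vi for wi, vi in zip(w, y)) / sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# One gross outlier barely moves the M estimate, unlike the mean (25 here):
data = [9.8, 10.1, 10.0, 9.9, 10.2, 100.0]
print(round(m_estimate_location(data), 2))
```

Regression M estimation iterates the same weighting scheme over weighted least squares fits rather than weighted averages.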

Overview M Estimation

[Slides 4-11, "Overview M Estimation", consist of equations and figures that did not survive the text extraction.]

Leverage Points
Leverage points are outlying points in the x-direction. A leverage point may or may not have an effect on the estimated regression model.

12

Getting Started- M Estimation


M estimation is used when the contamination occurs mostly in the response direction. The data set stack is the stackloss data of Brownlee (1965). The data describe the operation of a plant for the oxidation of ammonia to nitric acid and consist of 21 four-dimensional observations. The response variable (y) represents stackloss, and the explanatory variables are the rate of operation (x1), the cooling water inlet temperature (x2), and the acid concentration (x3).
* M Estimation;
data stack;
   input x1 x2 x3 y;
   datalines;
80 27 89 42
80 27 88 37
 :  :  :  :
56 20 82 15
70 20 91 15
;

13

Getting Started- M Estimation


proc robustreg data=stack;
   model y = x1 x2 x3 / diagnostics leverage;
   id x1;
   test x3;
run;

By default, the procedure does M estimation with the bisquare weight function, and it uses the median method for estimating the scale parameter.

- The MODEL statement specifies the covariate effects.
- The DIAGNOSTICS option requests a table of outlier diagnostics.
- The LEVERAGE option adds leverage-point diagnostic results for continuous effects.
- The ID statement specifies the variable x1 to identify observations in output tables.
- The TEST statement tests the significance of an effect.
14

Getting Started- M Estimation


The ROBUSTREG Procedure

Model Information
   Data Set                          WORK.STACK
   Dependent Variable                y
   Number of Independent Variables   3
   Number of Observations            21
   Method                            M Estimation

   Number of Observations Read   21
   Number of Observations Used   21

Summary Statistics
                                                Standard
Variable      Q1    Median        Q3      Mean  Deviation      MAD
x1       53.0000   58.0000   62.0000   60.4286     9.1683   5.9304
x2       18.0000   20.0000   24.0000   21.0952     3.1608   2.9652
x3       82.0000   87.0000   89.5000   86.2857     5.3586   4.4478
y        10.0000   15.0000   19.5000   17.5238    10.1716   5.9304

Note that the response variable (y) has the biggest discrepancy between the two estimates of scale, the standard deviation and the MAD.

15
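The MAD column appears to be the median absolute deviation scaled by 1.4826, the consistency factor that makes it estimate the same sigma as the standard deviation under normality (an inference from the table, not from the documentation: for y, 1.4826 x 4 = 5.9304). A short Python sketch, not SAS code, of why the two scale estimates diverge when outliers are present:

```python
# Illustrative Python (not SAS): the standard deviation vs. the scaled MAD.
from statistics import median, stdev

def mad_scale(x):
    """Median absolute deviation, scaled to estimate sigma for normal data."""
    m = median(x)
    return 1.4826 * median(abs(v - m) for v in x)

# One y-outlier inflates the standard deviation but barely moves the MAD:
sample = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 60.0]
print(round(stdev(sample), 1), round(mad_scale(sample), 1))
```

A large gap between the two columns, as for y above, is a first hint of contamination in that variable.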

Getting Started- M Estimation


Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1  -42.2854     9.5045    -60.9138    -23.6569        19.79      <.0001
x1          1    0.9276     0.1077      0.7164      1.1387        74.11      <.0001
x2          1    0.6507     0.2940      0.0744      1.2270         4.90      0.0269
x3          1   -0.1123     0.1249     -0.3571      0.1324         0.81      0.3683
Scale       1    2.2819

M estimation yields the fitted linear model:

   ŷ = -42.2854 + 0.9276 x1 + 0.6507 x2 - 0.1123 x3


16

Getting Started- M Estimation


Diagnostics
                 Mahalanobis   Robust MCD             Standardized
Obs        x1       Distance     Distance   Leverage  Robust Residual   Outlier
  1   80.0000       2.2536       5.5284        *          1.0995
  2   80.0000       2.3247       5.6374        *         -1.1409
  3   75.0000       1.5937       4.1972        *          1.5604
  4   62.0000       1.2719       1.5887                   3.0381          *
 21   70.0000       2.1768       3.6573        *         -4.5733          *

Diagnostics Summary
Observation Type   Proportion   Cutoff
Outlier            0.0952       3.0000
Leverage           0.1905       3.0575

Observations 4 and 21 are outliers because their robust residuals exceed the cutoff value in absolute value. Four high leverage points are detected, mainly caused by x1. Note that only observation 21 is a bad leverage point, i.e., an aberrant x-value that results in a large residual.
17
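The two cutoffs in the summary can be (approximately) reproduced: outliers are flagged when the absolute standardized robust residual exceeds 3, and leverage points when the robust MCD distance exceeds the square root of the 0.975 chi-square quantile with 3 degrees of freedom (the number of covariates). A Python sketch using the Wilson-Hilferty approximation to the quantile (SAS presumably uses the exact value; the helper name is mine):

```python
# Approximate the 3.0575 leverage cutoff: sqrt(chi-square 0.975 quantile, df=3).
# Wilson-Hilferty approximation -- close to, not identical with, the exact value.
import math
from statistics import NormalDist

def chi2_quantile_wh(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile function."""
    z = NormalDist().inv_cdf(prob)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3

leverage_cutoff = math.sqrt(chi2_quantile_wh(0.975, df=3))
print(round(leverage_cutoff, 3))  # close to the reported 3.0575
```

Observation 4 just clears the residual cutoff (3.0381 > 3), while its MCD distance of 1.5887 is well below the leverage cutoff, which is why it is an outlier but not a leverage point.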

Getting Started- M Estimation


Two useful plots are the RDPLOT (robust residuals against robust distances) and the DDPLOT (robust distances against classical Mahalanobis distances). Residual plots are also useful. These plots are available with ODS GRAPHICS.

ods html;
ods graphics on;

proc robustreg data=stack
      plots=(rdplot ddplot reshistogram resqqplot);
   model y = x1 x2 x3;
run;

ods graphics off;
ods html close;
18

Getting Started- M Estimation

RDPLOT: [plot of standardized robust residuals against robust MCD distances]

19

Getting Started- M Estimation

DDPLOT: [plot of robust MCD distances against classical Mahalanobis distances]

20

Getting Started- M Estimation

[residual histogram]

21

Getting Started- M Estimation

[residual Q-Q plot]

22

Getting Started- M Estimation


Goodness-of-Fit
Statistic       Value
R-Square       0.6659
AICR          29.5231
BICR          36.3361
Deviance     125.7905

Robust Linear Tests
        Test
Test    Statistic   Lambda   DF   Chi-Square   Pr > ChiSq
Rho       0.9378    0.7977    1        1.18        0.2782
Rn2       0.8092              1        0.81        0.3683

Rho is a robust version of the F-test, and Rn2 is a robust version of the Wald test.

23

Getting Started- M Estimation


The default constant for the bisquare weight function is c* = 4.685. With this value the asymptotic efficiency of the M estimates is 95% with the Gaussian distribution. A smaller value of c lowers the asymptotic efficiency but sharpens the M estimator as an outlier predictor. To use c=3.5, for example, with the stackloss data set:

proc robustreg method=m(wf=bisquare(c=3.5)) data=stack;
   model y = x1 x2 x3 / diagnostics leverage;
   id x1;
   test x3;
run;

* The constant c, representing the cutoff value for a weight of zero, is called k on p. 10.
24
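For reference, here is the bisquare weight function itself in a short Python sketch (illustrative, not SAS source; the formula is the standard Tukey bisquare). It shows why shrinking c from the default 4.685 to 3.5 downweights moderate residuals much more sharply:

```python
def bisquare_weight(r, c=4.685):
    """Tukey bisquare weight for a scaled residual r; zero for |r| >= c."""
    if abs(r) >= c:
        return 0.0
    return (1.0 - (r / c) ** 2) ** 2

# A residual of 3 scale units keeps much less weight under c = 3.5:
print(round(bisquare_weight(3.0), 3))         # weight under the default c
print(round(bisquare_weight(3.0, c=3.5), 3))  # much smaller weight
```

Because moderate residuals are downweighted harder, more observations end up flagged as outliers, as the next table shows.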

Getting Started- M Estimation


Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1  -37.1076     5.4731    -47.8346    -26.3805        45.97      <.0001
x1          1    0.8191     0.0620      0.6975      0.9407       174.28      <.0001
x2          1    0.5173     0.1693      0.1855      0.8492         9.33      0.0022
x3          1   -0.0728     0.0719     -0.2138      0.0681         1.03      0.3111
Scale       1    1.4265

The refitted linear model with c = 3.5 is:

   ŷ = -37.1076 + 0.8191 x1 + 0.5173 x2 - 0.0728 x3


25

Getting Started- M Estimation


Diagnostics
                 Mahalanobis   Robust MCD             Standardized
Obs        x1       Distance     Distance   Leverage  Robust Residual   Outlier
  1   80.0000       2.2536       5.5284        *          4.2719          *
  2   80.0000       2.3247       5.6374        *          0.7158
  3   75.0000       1.5937       4.1972        *          4.4142          *
  4   62.0000       1.2719       1.5887                   5.7792          *
 21   70.0000       2.1768       3.6573        *         -6.2727          *

In addition to observations 4 and 21, observations 1 and 3 are now detected as outliers.

26

Getting Started- LTS Estimation


If the data are contaminated in the x-space, M estimation does not do well. LTS estimation is more appropriate in this situation. In the following example, the data set hbk is an artificial data set created by Hawkins, Bradu, and Kass (1984). Both OLS and M estimation suggest that observations 11 to 14 are serious outliers. However, these four observations were generated from the underlying model, whereas observations 1 to 10 were contaminated. OLS and M estimation cannot distinguish good leverage points (observations 11 to 14) from bad leverage points (observations 1 to 10). In such cases, LTS identifies the true outliers.

27

Getting Started- LTS Estimation


data hbk;
   input index$ x1 x2 x3 y @@;
   datalines;
 1 10.1 19.6 28.3  9.7    2  9.5 20.5 28.9 10.1
 3 10.7 20.2 31.0 10.3    4  9.9 21.5 31.7  9.5
 :   :    :    :    :
35  3.1  2.4  3.0  0.3   36  1.1  2.2  2.7 -1.0
37  0.1  3.0  2.6 -0.6   38  1.5  1.2  0.2  0.9
39  2.1  0.0  1.2 -0.7   40  0.5  2.0  1.2 -0.5
41  3.4  1.6  2.9 -0.1   42  0.3  1.0  2.7 -0.7
 :   :    :    :    :
73  0.3  1.7  2.2  0.4   74  0.0  2.2  1.6 -0.9
75  0.3  0.4  2.6  0.2
;

We'll first do M estimation:

proc robustreg data=hbk method=m;
   model y = x1 x2 x3 / diagnostics leverage;
   id index;
run;

28

Getting Started- LTS Estimation


Diagnostics
                Mahalanobis   Robust MCD             Standardized
Obs   index        Distance     Distance   Leverage  Robust Residual   Outlier
  1    1            1.9168      29.4424        *          0.2434
  3    2            1.8558      30.2054        *          0.4821
  5    3            2.3137      31.8909        *          0.1402
  7    4            2.2297      32.8621        *         -1.1429
  9    5            2.1001      32.2778        *         -0.3896
 11    6            2.1462      30.5892        *          0.0946
 13    7            2.0105      30.6807        *          1.0055
 15    8            1.9193      29.7994        *          0.8921
 17    9            2.2212      31.9537        *         -0.6449
 19   10            2.3335      30.9429        *          0.1768
 21   11            2.4465      36.6384        *        -14.2697          *
 23   12            3.1083      37.9552        *        -14.8641          *
 25   13            2.6624      36.9175        *        -13.6431          *
 27   14            6.3816      41.0914        *        -16.1370          *

M estimation (wrongly) identifies observations 11 to 14 as outliers and misses the real outliers, observations 1 to 10.

29

Getting Started- LTS Estimation


Parameter estimates from M estimation:

Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1   -0.9459     0.1359     -1.2123     -0.6795        48.42      <.0001
x1          1    0.1449     0.0857     -0.0230      0.3127         2.86      0.0908
x2          1    0.1974     0.0506      0.0982      0.2965        15.21      <.0001
x3          1    0.1803     0.0420      0.0978      0.2627        18.38      <.0001
Scale       1    0.8226
30

Getting Started- LTS Estimation


A warning appears on the SAS Log:
NOTE: Algorithm converged for the M estimates.
NOTE: The MCD estimators for covariates have been successfully computed.
WARNING: The data set contains one or more high leverage points, for which M estimation is not robust. It is recommended to use METHOD=LTS or METHOD=MM for this data set.
NOTE: Algorithm converged for the location-scale M estimates.
NOTE: Algorithm converged for the M estimates in the reduced model.

proc robustreg data=hbk fwls method=lts;
   model y = x1 x2 x3 / diagnostics leverage;
   id index;
run;

31

Getting Started- LTS Estimation


Model Information
   Data Set                          WORK.HBK
   Dependent Variable                y
   Number of Independent Variables   3
   Number of Observations            75
   Method                            LTS Estimation

   Number of Observations Read   75
   Number of Observations Used   75

Summary Statistics
                                             Standard
Variable      Q1   Median       Q3     Mean  Deviation     MAD
x1        0.8000   1.8000   3.1000   3.2067     3.6526  1.9274
x2        1.0000   2.2000   3.3000   5.5973     8.2391  1.6309
x3        0.9000   2.1000   3.0000   7.2307    11.7403  1.7791
y        -0.5000   0.1000   0.7000   1.2787     3.4928  0.8896

Note the large differences between the usual and robust location and scale estimates.

32

Getting Started- LTS Estimation


We re-estimate the model using the LTS method:

proc robustreg data=hbk fwls method=lts;
   model y = x1 x2 x3 / diagnostics leverage;
   id index;
run;

The option FWLS requests that the final weighted least squares method be applied.

LTS Profile
   Total Number of Observations         75
   Number of Squares Minimized          57
   Number of Coefficients                4
   Highest Possible Breakdown Value     0.2533

In this case, the LTS estimate minimizes the sum of the 57 smallest squared residuals. It can still pick up the right model even if the remaining 18 observations are contaminated. This corresponds to a breakdown value of about 0.25, which is the default.

33
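The LTS Profile numbers can be checked by hand. If I recall the default correctly (treat the h formula as an assumption, not verified SAS code), the subset size is h = floor((3n + p + 1)/4) and the highest possible breakdown value is (n - h + 1)/n:

```python
# Reproducing the LTS Profile above for n = 75 observations, p = 4 coefficients.
# The default-h formula is my reading of the LTS literature, not SAS source code.
n, p = 75, 4
h = (3 * n + p + 1) // 4          # number of squared residuals minimized
breakdown = (n - h + 1) / n       # highest possible breakdown value
print(h, round(breakdown, 4))     # matches the 57 and 0.2533 in the table
```

Raising h toward n makes LTS behave more like OLS; lowering it raises the breakdown value toward the 0.5 maximum at the cost of efficiency.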

Getting Started- LTS Estimation


LTS Parameter Estimates
Parameter        DF   Estimate
Intercept         1    -0.3431
x1                1     0.0901
x2                1     0.0703
x3                1    -0.0731
Scale (sLTS)      0     0.7451
Scale (Wscale)    0     0.5749

Two robust estimates of the scale parameter are displayed. The weighted scale estimate (Wscale) is a more efficient estimate of the scale parameter.

34

Getting Started- LTS Estimation


Diagnostics
                Mahalanobis   Robust MCD             Standardized
Obs   index        Distance     Distance   Leverage  Robust Residual   Outlier
  1    1            1.9168      29.4424        *         17.0868          *
  3    2            1.8558      30.2054        *         17.8428          *
  5    3            2.3137      31.8909        *         18.3063          *
  7    4            2.2297      32.8621        *         16.9702          *
  9    5            2.1001      32.2778        *         17.7498          *
 11    6            2.1462      30.5892        *         17.5155          *
 13    7            2.0105      30.6807        *         18.8801          *
 15    8            1.9193      29.7994        *         18.2253          *
 17    9            2.2212      31.9537        *         17.1843          *
 19   10            2.3335      30.9429        *         17.8021          *
 21   11            2.4465      36.6384        *          0.0406
 23   12            3.1083      37.9552        *         -0.0874
 25   13            2.6624      36.9175        *          1.0776
 27   14            6.3816      41.0914        *         -0.7875

Diagnostics Summary
Observation Type   Proportion   Cutoff
Outlier            0.1333       3.0000
Leverage           0.1867       3.0575

35

Getting Started- LTS Estimation


As can be seen on the previous slide, LTS correctly identifies the first ten observations as outliers and observations 11 to 14 are identified as good leverage points.
Parameter Estimates for Final Weighted Least Squares Fit
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1   -0.1805     0.1044     -0.3852      0.0242         2.99      0.0840
x1          1    0.0814     0.0667     -0.0493      0.2120         1.49      0.2222
x2          1    0.0399     0.0405     -0.0394      0.1192         0.97      0.3242
x3          1   -0.0517     0.0354     -0.1210      0.0177         2.13      0.1441
Scale       0    0.5572

The final weighted least squares estimates are the least squares estimates computed after deleting the detected outliers. Compare with the M-estimation results on p. 30.
36

Example: Comparison of Robust Estimates


This example illustrates differences in the performance of robust estimates available in the ROBUSTREG procedure. The following statements generate 1000 random observations. The first 900 observations are from a linear model and the last 100 observations are significantly biased in the y-direction. In other words, ten percent of the observations are contaminated with outliers.
data a (drop=i);
   do i = 1 to 1000;
      x1 = rannor(1234);
      x2 = rannor(1234);
      e  = rannor(1234);
      if i > 900 then y = 100 + e;
      else y = 10 + 5*x1 + 3*x2 + .5*e;
      output;
   end;
run;

Since the data, by design, contain outliers but no high leverage points, both the M and MM estimation methods are appropriate.
37

Example: Comparison of Robust Estimates


The true values of the coefficients are 10, 5, and 3, with an error standard deviation of 0.5. We'll first see how OLS performs.

proc reg data=a;
   model y = x1 x2;
run;
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1             19.06712          0.86322     22.09     <.0001
x1          1              3.55485          0.86892      4.09     <.0001
x2          1              2.12341          0.83039      2.56     0.0107

The RMSE estimate of 27.3 greatly overestimates the true error scale.

38

Example: Comparison of Robust Estimates


M estimation with 10% contamination:

proc robustreg data=a method=m;
   model y = x1 x2;
run;

Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1   10.0024     0.0174      9.9683     10.0364       331908      <.0001
x1          1    5.0077     0.0175      4.9735      5.0420      82106.9      <.0001
x2          1    3.0161     0.0167      2.9834      3.0488      32612.5      <.0001
Scale       1    0.5780

Diagnostics Summary
Observation Type   Proportion   Cutoff
Outlier            0.1020       3.0000

39

Example: Comparison of Robust Estimates


MM estimation with 10% contamination:

proc robustreg data=a method=mm;
   model y = x1 x2;
run;

Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1   10.0035     0.0176      9.9690     10.0379       323947      <.0001
x1          1    5.0085     0.0178      4.9737      5.0433      79600.6      <.0001
x2          1    3.0181     0.0168      2.9851      3.0511      32165.0      <.0001
Scale       0    0.6733

Diagnostics Summary
Observation Type   Proportion   Cutoff
Outlier            0.1000       3.0000

40

Example: Comparison of Robust Estimates


The next statements demonstrate that if the percentage of contamination is increased to 40%, the M estimates and MM estimates with default options fail to pick up the underlying model. However, by tuning the constant c for the M estimate and the constants INITH and K0 for the MM estimate, you can increase the breakdown values of these estimates and capture the right model.

data b (drop=i);
   do i = 1 to 1000;
      x1 = rannor(1234);
      x2 = rannor(1234);
      e  = rannor(1234);
      if i > 600 then y = 100 + e;
      else y = 10 + 5*x1 + 3*x2 + .5*e;
      output;
   end;
run;

41

Example: Comparison of Robust Estimates


M estimation with 40% contamination:

proc robustreg data=b method=m(wf=bisquare(c=2));
   model y = x1 x2;
run;

Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1   10.0137     0.0219      9.9708     10.0565       209688      <.0001
x1          1    4.9905     0.0220      4.9473      5.0336      51399.1      <.0001
x2          1    3.0399     0.0210      2.9987      3.0811      20882.4      <.0001
Scale       1    1.0531

Diagnostics Summary
Observation Type   Proportion   Cutoff
Outlier            0.4000       3.0000

42

Example: Comparison of Robust Estimates


MM estimation with 40% contamination:

proc robustreg data=b method=mm(inith=502 k0=1.8);
   model y = x1 x2;
run;

Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1   10.0103     0.0213      9.9686     10.0520       221639      <.0001
x1          1    4.9890     0.0218      4.9463      5.0316      52535.9      <.0001
x2          1    3.0363     0.0201      2.9970      3.0756      22895.5      <.0001
Scale       0    1.8992

Diagnostics Summary
Observation Type   Proportion   Cutoff
Outlier            0.4000       3.0000

43

Example: Comparison of Robust Estimates


Note that the summary statistics for this last data set show no evidence of leverage points, but there do appear to be outliers in the y-direction.

Summary Statistics
                                             Standard
Variable      Q1   Median       Q3     Mean  Deviation      MAD
x1       -0.6546   0.0230   0.7099   0.0222     0.9933   1.0085
x2       -0.7891  -0.0747   0.6839  -0.0401     1.0394   1.0857
y         9.1165  16.1409  99.6590  46.0448    44.2957  18.3994

When there are bad leverage points, the M estimates fail to pick up the underlying model no matter what constant c you use. In this case, the other estimates (LTS, S, and MM) in PROC ROBUSTREG, which are robust to bad leverage points, will pick up the underlying model. The following statements generate 1000 observations with 1% bad high leverage points.

44

Example: Comparison of Robust Estimates


data b (drop=i);
   do i = 1 to 1000;
      x1 = rannor(1234);
      x2 = rannor(1234);
      e  = rannor(1234);
      if i > 600 then y = 100 + e;
      else y = 10 + 5*x1 + 3*x2 + .5*e;
      if i < 11 then x1 = 200 * rannor(1234);
      if i < 11 then x2 = 200 * rannor(1234);
      if i < 11 then y = 100 * e;
      output;
   end;
run;
45

Example: Comparison of Robust Estimates


S estimation with 40% outliers and 1% leverage points:

proc robustreg data=b method=s(k0=1.8);
   model y = x1 x2;
run;

Note that the summary statistics indicate both outliers in the y-direction and leverage points in the x-direction.

Summary Statistics
                                              Standard
Variable      Q1    Median       Q3     Mean  Deviation      MAD
x1       -0.6423  -0.00266   0.7236  -0.2973    32.0322   1.0012
x2       -0.6571   0.0366    0.7210   0.0872    15.8316   1.0273
y         8.7280  16.4729   99.6293  46.4549    44.8562  18.5679

46

Example: Comparison of Robust Estimates


S Profile
   Total Number of Observations    1000
   Number of Coefficients             3
   Subset Size                        3
   Chi Function                   Tukey
   K0                            1.8000
   Breakdown Value               0.4401
   Efficiency                    0.3874

Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1    9.9808     0.0216      9.9383     10.0232       212532      <.0001
x1          1    5.0303     0.0208      4.9896      5.0710      58656.3      <.0001
x2          1    3.0217     0.0222      2.9782      3.0652      18555.7      <.0001
Scale       0    2.2094

Diagnostics Summary
Observation Type   Proportion   Cutoff
Outlier            0.4100       3.0000

47

Example: Comparison of Robust Estimates


MM estimation with 40% outliers and 1% leverage points:

proc robustreg data=b method=mm(inith=502 k0=1.8);
   model y = x1 x2;
run;

Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1    9.9820     0.0215      9.9398     10.0241       215369      <.0001
x1          1    5.0303     0.0206      4.9898      5.0707      59469.1      <.0001
x2          1    3.0222     0.0221      2.9789      3.0655      18744.9      <.0001
Scale       0    2.2134

Diagnostics Summary
Observation Type   Proportion   Cutoff
Outlier            0.4100       3.0000

48

Example: Growth Study of De Long & Summers


Robust regression and outlier detection techniques have considerable applications to econometrics. The following example from Zaman, Rousseeuw, and Orhan (2001) shows how these techniques substantially improve the ordinary least squares (OLS) results for the growth study of De Long and Summers. De Long and Summers (1991) studied the national growth of 61 countries from 1960 to 1985 using OLS. The regression equation they used is:

GDP = β0 + β1 LFG + β2 GAP + β3 EQP + β4 NEQ + ε


where the response variable is the growth in gross domestic product per worker (GDP) and the regressors are labor force growth (LFG), relative GDP gap (GAP), equipment investment (EQP), and non-equipment investment (NEQ).
49

The following statements invoke the REG procedure for the OLS analysis:

Example: Growth Study of De Long & Summers


data growth;
   input country$ GDP LFG EQP NEQ GAP @@;
   datalines;
Argentin  0.0089 0.0118 0.0214 0.2286 0.6079
Austria   0.0332 0.0014 0.0991 0.1349 0.5809
Belgium   0.0256 0.0061 0.0684 0.1653 0.4109
Bolivia   0.0124 0.0209 0.0167 0.1133 0.8634
Botswana  0.0676 0.0239 0.1310 0.1490 0.9474
Brazil    0.0437 0.0306 0.0646 0.1588 0.8498
   :         :      :      :      :      :
U.K.      0.0189 0.0048 0.0694 0.1132 0.4307
U.S.      0.0133 0.0189 0.0762 0.1356 0.0000
Uruguay   0.0041 0.0052 0.0155 0.1154 0.5782
Venezuel  0.0120 0.0378 0.0340 0.0760 0.4974
Zambia   -0.0110 0.0275 0.0702 0.2012 0.8695
Zimbabwe  0.0110 0.0309 0.0843 0.1257 0.8875
;

50

Example: Growth Study of De Long & Summers


OLS estimation:

proc reg data=growth;
   model GDP = LFG GAP EQP NEQ;
run;

Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1             -0.01430          0.01028     -1.39     0.1697
LFG         1             -0.02981          0.19838     -0.15     0.8811
GAP         1              0.02026          0.00917      2.21     0.0313
EQP         1              0.26538          0.06529      4.06     0.0002
NEQ         1              0.06236          0.03482      1.79     0.0787

The OLS analysis indicates that GAP and EQP have a significant influence on GDP at the 5% level.
51

Example: Growth Study of De Long & Summers


M estimation (the default):

proc robustreg data=growth;
   model GDP = LFG GAP EQP NEQ / diagnostics leverage;
   output out=robout r=resid sr=stdres;
run;

Summary Statistics
                                              Standard
Variable      Q1   Median       Q3     Mean   Deviation      MAD
LFG       0.0118   0.0239   0.0281   0.0211     0.00979   0.00949
GAP       0.5796   0.8015   0.8863   0.7258     0.2181    0.1778
EQP       0.0265   0.0433   0.0720   0.0523     0.0296    0.0325
NEQ       0.0956   0.1356   0.1812   0.1399     0.0570    0.0624
GDP       0.0121   0.0231   0.0310   0.0224     0.0155    0.0150

It's not obvious from the summary statistics that there may be outliers or leverage points.

52

Example: Growth Study of De Long & Summers


Parameter Estimates
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1   -0.0247     0.0097     -0.0437     -0.0058         6.53      0.0106
LFG         1    0.1040     0.1867     -0.2619      0.4699         0.31      0.5775
GAP         1    0.0250     0.0086      0.0080      0.0419         8.36      0.0038
EQP         1    0.2968     0.0614      0.1764      0.4172        23.33      <.0001
NEQ         1    0.0885     0.0328      0.0242      0.1527         7.29      0.0069
Scale       1    0.0099

The parameter estimates now show that NEQ is also statistically significant.

53

Example: Growth Study of De Long & Summers


Diagnostics
      Mahalanobis   Robust MCD             Standardized
Obs      Distance     Distance   Leverage  Robust Residual   Outlier
  1       2.6083       4.0639        *         -0.9424
  5       3.4351       6.7391        *          1.4200
  8       3.1876       4.6843        *         -0.1972
  9       3.6752       5.0599        *         -1.8784
 17       2.6024       3.8186        *         -1.7971
 23       2.1225       3.8238        *          1.7161
 27       2.6461       5.0336        *          0.0909
 31       2.9179       4.7140        *          0.0216
 53       2.2600       4.3193        *         -1.8082
 57       3.8701       5.4874        *          0.1448
 58       2.5953       3.9671        *         -0.0978
 59       2.9239       4.1663        *          0.3573
 60       1.8562       2.7135                  -4.9798          *
 61       1.9634       3.9128        *         -2.5959

The diagnostics show that observation 60 (Zambia) is an outlier. While there are several leverage points in the data, none are serious. In this case, M estimation is appropriate.

54

Example: Growth Study of De Long & Summers


The following statements invoke the ROBUSTREG procedure with LTS estimation, which was used by Zaman, Rousseeuw, and Orhan (2001). The results are consistent with those of M estimation.

proc robustreg method=lts(h=33) fwls data=growth;
   model GDP = LFG GAP EQP NEQ / diagnostics leverage;
   output out=robout r=resid sr=stdres;
run;

55

Example: Growth Study of De Long & Summers


Parameter Estimates for Final Weighted Least Squares Fit
Parameter  DF  Estimate  Std Error   95% Confidence Limits   Chi-Square  Pr > ChiSq
Intercept   1   -0.0222     0.0093     -0.0405     -0.0039         5.65      0.0175
LFG         1    0.0446     0.1771     -0.3026      0.3917         0.06      0.8013
GAP         1    0.0245     0.0082      0.0084      0.0406         8.89      0.0029
EQP         1    0.2824     0.0581      0.1685      0.3964        23.60      <.0001
NEQ         1    0.0849     0.0314      0.0233      0.1465         7.30      0.0069
Scale       0    0.0116

The final weighted least squares estimates are identical to those reported in Zaman, Rousseeuw, and Orhan (2001).

56
