Proc Robust Reg

PROC ROBUSTREG Robust Regression Models
April 20, 2005 Charlie Hallahan

1
Overview
PROC ROBUSTREG is experimental in SAS/ETS Version 9.* Main purpose is to detect outliers and provide resistant (stable) results in the presence of outliers Addresses three types of problems: problems with outliers in the y-direction (response direction) problems with multivariate outliers in the x-space (leverage points) problems with outliers in both the y-direction and x-space * These notes closely follow the SAS documentation for ROBUSTREG. Also, see the paper Robust Regression and Outlier Detection with the ROBUSTREG Procedure by Colin Chen presented at SUGI27 in 2002 (http://www2.sas.com/proceedings/sugi27/p265-27.pdf )
2
Overview
ROBUSTREG supports four methods: 1. M estimation: introduced by Huber in 1973. Simplest both computationally and theoretically. Only addresses contamination in the response direction. Least Trimmed Squares (LTS): introduced by Rouseeuw in 1984. It is a so-called high breakdown method. The breakdown value is a measure of the proportion of contamination that an estimation method can withstand and still maintain its robustness. Uses the FAST-LST algorithm of Rouseeuw and Van Driessen (1998). S estimation: introduced by Rouseeuw and Yohai in 1984. It is a high breakdown method that is more statistically efficient than LTS. MM estimation: introduced by Yohai in 1987. Combines high breakdown value estimation and M estimation. It is a high breakdown method that is more statistically efficient than S estimation 3
2.
3.
4.
Overview M Estimation
Before getting to the SAS code, its probably worthwhile to review whats involved with the simplest robust estimator, the M estimator. These notes follow some online documentation for the text Applied Regression Analysis, Linear Models, and Related Methods by John Fox. When the error distribution is normal, least squares (LS) is the most efficient regression estimator. However, LS is very sensitive to outliers (aberrant observations in the y-direction) at high leverage points (aberrant observations in the x-direction). Such cases result in heavy-tailed error distributions.
10
11
Leverage Points
Leverage points are outlying points in the x-direction. A leverage points may or may not have an effect on the estimated regression model.
12
Getting Started- M Estimation

M estimation is used when the contamination occurs mostly in the response direction. The data set stack is the stackloss data of Brownlee (1965). The data describe the operation of a plant for the oxidation of ammonia to nitric acid and consist of 21 fourdimensional observations. The response variable (y) represents stackloss and the explanatory variables are the rate of operation (x1), the cooling water inlet temperature (x2) and the acid concentration (x3).
* M Estimation; data stack; input x1 x2 x3 y; datalines; 80 27 89 42 80 27 88 37 : : : 56 20 82 15 70 20 91 15 ;
13

proc robustreg data=stack; model y = x1 x2 x3 / diagnostics leverage; id x1; test x3; run; By default, the procedure does M estimation with the bisquare weight function, and it uses the median method for estimating the scalar parameter. The MODEL statement specifies the covariate effects The DIAGNOSTICS option requests a table for outlier diagnostics The LEVERAGE option adds leverage point diagnostic results for continuous effects The ID statement specifies the variable x1 to identify observations in output tables the TEST statement tests the significance of an effect
14

The ROBUSTREG Procedure Model Information Data Set Dependent Variable Number of Independent Variables Number of Observations Method WORK.STACK y 3 21 M Estimation
Number of Observations Read Number of Observations Used
21 21
Summary Statistics Standard Deviation 9.1683 3.1608 5.3586 10.1716
Variable x1 x2 x3 y
Q1 53.0000 18.0000 82.0000 10.0000
Median 58.0000 20.0000 87.0000 15.0000
Q3 62.0000 24.0000 89.5000 19.5000
Mean 60.4286 21.0952 86.2857 17.5238
MAD 5.9304 2.9652 4.4478 5.9304
Note the response variable (y) has the biggest discrepancy between the two estimates of scale, the standard deviation and MAD.
15

Parameter Estimates Standard Parameter DF Estimate Error Intercept x1 x2 x3 Scale 1 -42.2854 1 0.9276 1 0.6507 1 -0.1123 1 2.2819 95% Confidence Limits ChiSquare Pr > ChiSq 19.79 74.11 4.90 0.81 <.0001 <.0001 0.0269 0.3683
9.5045 -60.9138 -23.6569 0.1077 0.7164 1.1387 0.2940 0.0744 1.2270 0.1249 -0.3571 0.1324
M estimation yields the fitted linear model:
42.2845 0.9276 x1 0.6507 x 2 01123 y . x3

16

Diagnostics Robust MCD Distance 5.5284 5.6374 4.1972 1.5887 3.6573 Standardized Robust Residual 1.0995 -1.1409 1.5604 3.0381 -4.5733 Obs 1 2 3 4 21 x1 80.000000 80.000000 75.000000 62.000000 70.000000 Mahalanobis Distance 2.2536 2.3247 1.5937 1.2719 2.1768 Leverage * * * * Outlier
* *
Diagnostics Summary Observation Type Outlier Leverage
Proportion 0.0952 0.1905
Cutoff 3.0000 3.0575
Observations 4 and 21 are outliers because their robust residuals exceed the cutoff value in absolute value. 4 high leverage points are detected, mainly caused by x1. Note that only observation 21 is a bad leverage point, i.e., an aberrant x-value that results in a large residual.
17

Two useful plots are the RDPLOT, robust residuals against robust distances. and the DDPLOT, robust distances against classical Mahalanobis distances. Residual plots are also useful. These plots are available with ODS GRAPHICS.
ods html; ods graphics on;

proc robustreg data=stack plots=(rdplot ddplot reshistogram resqqplot); model y = x1 x2 x3; run; ods graphics off; ods html close;
18

RDPLOT:
19

DDPLOT:
20
21
22

Goodness-of-Fit Statistic R-Square AICR BICR Deviance Value 0.6659 29.5231 36.3361 125.7905
Robust Linear Tests* Test Test Statistic 0.9378 0.8092 ChiSquare Pr > ChiSq 1.18 0.81 0.2782 0.3683
Test Rho Rn2
Lambda DF 0.7977 1 1
Rho is a robust version of the F-test, and Rn2 is a robust version of the Wald test.23

The default constant for the bisquare weight function is c* = 4.685. With this value the asymptotic efficiency of the M estimates is 95% with the Gaussian distribution. A smaller value of c lowers the asymptotic efficiency but sharpens the M estimator as an outlier predictor. To use c=3.5, for example, with the stackloss data set:
proc robustreg method=m(wf=bisquare(c=3.5)) data=stack; model y = x1 x2 x3 / diagnostics leverage; id x1; test x3; run;
* The constant c representing the cutoff value for a weight of zero is called k on p. 10.
24

Parameter Estimates Standard Parameter DF Estimate Error Intercept x1 x2 x3 Scale 1 -37.1076 1 0.8191 1 0.5173 1 -0.0728 1 1.4265 95% Confidence Limits ChiSquare Pr > ChiSq 45.97 174.28 9.33 1.03 <.0001 <.0001 0.0022 0.3111
5.4731 -47.8346 -26.3805 0.0620 0.6975 0.9407 0.1693 0.1855 0.8492 0.0719 -0.2138 0.0681
The refitted linear model with c = 3.5 is:
37.1076 0.8191x1 0.5173x 2 0.0728 x 3 y

25

Diagnostics Robust MCD Distance 5.5284 5.6374 4.1972 1.5887 3.6573 Standardized Robust Residual 4.2719 0.7158 4.4142 5.7792 -6.2727
Obs 1 2 3 4 21
x1 80.000000 80.000000 75.000000 62.000000 70.000000
Mahalanobis Distance 2.2536 2.3247 1.5937 1.2719 2.1768
Leverage * * * *
Outlier *
* * *
In addition to observations 4 and 21, observations 1 and 3 are now detected as outliers.
26
Getting Started- LTS Estimation

If the data are contaminated in the x-space, M estimation does not do well. LTS estimation is more appropriate in this situation. In the following example, the data set hbk is an artificial data set created by Hawkins, Bradu, and Kass (1984). Both OLS and M estimation suggest that observations 11 to 14 are serious outliers. However, these four observations were generated from the underlying model, whereas observations 1 to 10 were contaminated. OLS and M estimation cannot distinguish good leverage points (observations 11 to 14) from bad leverage points (observations 1 to 10). In such cases, LTS identifies the true outliers.
27

data hbk; input index$ datalines; 1 10.1 19.6 2 9.5 20.5 3 10.7 20.2 4 9.9 21.5 : : 35 3.1 2.4 36 1.1 2.2 37 0.1 3.0 38 1.5 1.2 ; x1 x2 x3 y @@; 28.3 28.9 31.0 31.7 9.7 10.1 10.3 9.5 39 40 41 42 2.1 0.5 3.4 0.3 0.0 2.0 1.6 1.0 1.2 1.2 2.9 2.7 -0.7 -0.5 -0.1 -0.7
3.0 2.7 2.6 0.2
0.3 -1.0 -0.6 0.9
73 74 75
0.3 0.0 0.3
1.7 2.2 0.4
2.2 1.6 2.6
0.4 -0.9 0.2
Well 1st do M estimation proc robustreg data=hbk method=m; model y = x1 x2 x3 / diagnostics leverage; id index; run;
28

Diagnostics Robust MCD Distance 29.4424 30.2054 31.8909 32.8621 32.2778 30.5892 30.6807 29.7994 31.9537 30.9429 36.6384 37.9552 36.9175 41.0914 Standardized Robust Residual 0.2434 0.4821 0.1402 -1.1429 -0.3896 0.0946 1.0055 0.8921 -0.6449 0.1768 -14.2697 -14.8641 -13.6431 -16.1370 Obs 1 3 5 7 9 11 13 15 17 19 21 23 25 27 index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Mahalanobis Distance 1.9168 1.8558 2.3137 2.2297 2.1001 2.1462 2.0105 1.9193 2.2212 2.3335 2.4465 3.1083 2.6624 6.3816 Leverage * * * * * * * * * * * * * * Outlier
* * * *
M estimation (wrongly) identifies observation 11 to 14 as outliers and misses the real outliers, observations 1 to 10. 29

Parameter estimates from M-estimation:
Parameter Estimates Standard Parameter DF Estimate Error Intercept x1 x2 x3 Scale 1 1 1 1 1 -0.9459 0.1449 0.1974 0.1803 0.8226 0.1359 0.0857 0.0506 0.0420 95% Confidence Limits -1.2123 -0.0230 0.0982 0.0978 -0.6795 0.3127 0.2965 0.2627 ChiSquare Pr > ChiSq 48.42 2.86 15.21 18.38 <.0001 0.0908 <.0001 <.0001
30

A warning appears on the SAS Log:
NOTE: Algorithm converged for the M estimates. NOTE: The MCD estimators for covariates have been successfully computed. WARNING: The data set contains one or more high leverage points, for which M estimation is not robust. It is recommended to use METHOD=LTS or METHOD=MM for this data set. NOTE: Algorithm converged for the location-scale M estimates. NOTE: Algorithm converged for the M estimates in the reduced model.
proc robustreg data=hbk fwls method=lts; model y = x1 x2 x3 / diagnostics leverage; id index; run;
31

Model Information Data Set Dependent Variable Number of Independent Variables Number of Observations Method WORK.HBK y 3 75 LTS Estimation
Number of Observations Read Number of Observations Used
75 75
Summary Statistics Standard Deviation 3.6526 8.2391 11.7403 3.4928
Variable x1 x2 x3 y
Q1 0.8000 1.0000 0.9000 -0.5000
Median 1.8000 2.2000 2.1000 0.1000
Q3 3.1000 3.3000 3.0000 0.7000
Mean 3.2067 5.5973 7.2307 1.2787
MAD 1.9274 1.6309 1.7791 0.8896
Note large differences between the usual and robust location and scale estimates. 32

We re-estimate the model using the LTS method:
proc robustreg data=hbk fwls method=lts; model y = x1 x2 x3 / diagnostics leverage; id index; run;
The option fwls requests that the final weighted least squares method be applied.
LTS Profile Total Number of Observations Number of Squares Minimized Number of Coefficients Highest Possible Breakdown Value 75 57 4 0.2533
In this case, the LTS estimate minimizes the sum of 57 smallest squares of residuals. It can still pick up the right model if the remaining 18 observations are contaminated. This corresponds to a breakdown value around 0.25, which is the default. 33

LTS Parameter Estimates Parameter Intercept x1 x2 x3 Scale (sLTS) Scale (Wscale) DF 1 1 1 1 0 0 Estimate -0.3431 0.0901 0.0703 -0.0731 0.7451 0.5749
Two robust estimates of the scale parameter are displayed. The weighted scale estimate (Wscale) is a more efficient estimate of the scale parameter.
34

Diagnostics Robust MCD Distance 29.4424 30.2054 31.8909 32.8621 32.2778 30.5892 30.6807 29.7994 31.9537 30.9429 36.6384 37.9552 36.9175 41.0914 Standardized Robust Residual 17.0868 17.8428 18.3063 16.9702 17.7498 17.5155 18.8801 18.2253 17.1843 17.8021 0.0406 -0.0874 1.0776 -0.7875 Obs 1 3 5 7 9 11 13 15 17 19 21 23 25 27 index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Mahalanobis Distance 1.9168 1.8558 2.3137 2.2297 2.1001 2.1462 2.0105 1.9193 2.2212 2.3335 2.4465 3.1083 2.6624 6.3816 Leverage * * * * * * * * * * * * * * Outlier * * * * * * * * * *
Diagnostics Summary Observation Type Outlier Leverage
Proportion 0.1333 0.1867
Cutoff 3.0000 3.0575
35

As can be seen on the previous slide, LTS correctly identifies the first ten observations as outliers and observations 11 to 14 are identified as good leverage points.
Parameter Estimates for Final Weighted Least Squares Fit Standard Error 0.1044 0.0667 0.0405 0.0354 95% Confidence Limits -0.3852 -0.0493 -0.0394 -0.1210 0.0242 0.2120 0.1192 0.0177 ChiSquare Pr > ChiSq 2.99 1.49 0.97 2.13 0.0840 0.2222 0.3242 0.1441
Parameter Intercept x1 x2 x3 Scale
DF Estimate 1 1 1 1 0 -0.1805 0.0814 0.0399 -0.0517 0.5572
The final weighted least squares estimates are the least squares estimates computed after deleting the detected outliers. Compare with the M-estimation results on p. 30.
36
Example: Comparison of Robust Estimates

This example illustrates differences in the performance of robust estimates available in the ROBUSTREG procedure. The following statements generate 1000 random observations. The first 900 observations are from a linear model and the last 100 observations are significantly biased in the y-direction. In other words, ten percent of the observations are contaminated with outliers.
data a (drop=i); do i=1 to 1000; x1=rannor(1234); x2=rannor(1234); e=rannor(1234); if i > 900 then y=100 + e; else y=10 + 5*x1 + 3*x2 + .5 * e; output; end; run;
Since the data, by design, has outliers, and not high leverage points, both M and MM estimation methods are appropriate.
37

The true value of the coefficients are 10, 5, and 3 with an error standard deviation of 0.5. Well first see how OLS performs. proc reg data=a; model y = x1 x2; run;
Parameter Estimates Parameter Estimate 19.06712 3.55485 2.12341 Standard Error 0.86322 0.86892 0.83039
Variable Intercept x1 x2
DF 1 1 1
t Value 22.09 4.09 2.56
Pr > |t| <.0001 <.0001 0.0107
The RMSE estimate of 27.3 greatly overestimates the true error scale.
38

M estimation with 10% contamination:
proc robustreg data=a method=m ; model y = x1 x2; run;
Parameter Estimates
Standard Parameter DF Estimate Error

Intercept x1 x2 Scale 1 1 1 1 10.0024 5.0077 3.0161 0.5780 0.0174 0.0175 0.0167
95% Confidence Limits

9.9683 4.9735 2.9834
ChiSquare Pr > ChiSq

<.0001 <.0001 <.0001
10.0364 331908 5.0420 82106.9 3.0488 32612.5
Diagnostics Summary Observation Type Outlier
Proportion 0.1020
Cutoff 3.0000 39

MM estimation with 10% contamination:
proc robustreg data=a method=mm ; model y = x1 x2; run;
Parameter Estimates Standard Parameter DF Estimate Error Intercept x1 x2 Scale 1 1 1 0 10.0035 5.0085 3.0181 0.6733 0.0176 0.0178 0.0168 95% Confidence Limits 9.9690 4.9737 2.9851 ChiSquare Pr > ChiSq <.0001 <.0001 <.0001
10.0379 323947 5.0433 79600.6 3.0511 32165.0
Diagnostics Summary
Observation Type Outlier
Proportion 0.1000
Cutoff 3.0000 40

The next statements demonstrate that if the percentage of contamination is increased to 40 %, the M estimates and MM estimates with default options fail to pick up the underlying model. However, by tuning the constant c for the M estimate and the constants INITH and K0 for the MM estimate, you can increase the breakdown values of these estimates and capture the right model.
data b (drop=i); do i=1 to 1000; x1=rannor(1234); x2=rannor(1234); e=rannor(1234); if i > 600 then y=100 + e; else y=10 + 5*x1 + 3*x2 + .5 * e; output; end; run;
41

M estimation with 40% contamination:
proc robustreg data=b method=m(wf=bisquare(c=2)); model y = x1 x2; run;
Parameter Estimates Standard Error 0.0219 0.0220 0.0210 95% Confidence Limits 9.9708 4.9473 2.9987 ChiSquare Pr > ChiSq <.0001 <.0001 <.0001
Parameter DF Estimate Intercept x1 x2 Scale 1 1 1 1 10.0137 4.9905 3.0399 1.0531
10.0565 209688 5.0336 51399.1 3.0811 20882.4
Proportion 0.4000
Cutoff 3.0000
42

MM estimation with 40% contamination:
proc robustreg data=b method=mm(inith=502 k0=1.8); model y = x1 x2; run;
Parameter Estimates Standard Parameter DF Estimate Error Intercept x1 x2 Scale 1 1 1 0 10.0103 4.9890 3.0363 1.8992 0.0213 0.0218 0.0201 95% Confidence Limits 9.9686 4.9463 2.9970 ChiSquare Pr > ChiSq <.0001 <.0001 <.0001
10.0520 221639 5.0316 52535.9 3.0756 22895.5
Proportion 0.4000
Cutoff 3.0000 43

Note that the summary statistics for this last data set show that there is no evidence of any leverage points, but there do appear to be outliers in the y-direction.
Summary Statistics Standard Deviation 0.9933 1.0394 44.2957
Variable x1 x2 y
Q1 -0.6546 -0.7891 9.1165
Median 0.0230 -0.0747 16.1409
Q3 0.7099 0.6839 99.6590
Mean 0.0222 -0.0401 46.0448
MAD 1.0085 1.0857 18.3994
When there are bad leverage points, the M estimates fail to pick up the underlying model no matter what constant c you use. In this case, other estimates (LTS, S, and MM estimates) in PROC ROBUSTREG, which are robust to bad leverage points, will pick up the underlying model. The following statements generate 1000 observations with 1% bad high leverage 44 points.

data b (drop=i); do i=1 to 1000; x1=rannor(1234); x2=rannor(1234); e=rannor(1234); if i > 600 then y=100 + e; else y=10 + 5*x1 + 3*x2 + .5 * e; if i < 11 then x1=200 * rannor(1234); if i < 11 then x2=200 * rannor(1234); if i < 11 then y= 100*e; output; end; run;
45

S estimation with 40% outliers and 1% leverage points: proc robustreg data=b method=s(k0=1.8); model y = x1 x2; run;
Note the summary statistics indicate both outliers in the y-direction and leverage points in the x-direction.
Summary Statistics Standard Deviation 32.0322 15.8316 44.8562
Variable x1 x2 y
Q1 -0.6423 -0.6571 8.7280
Median -0.00266 0.0366 16.4729
Q3 0.7236 0.7210 99.6293
Mean -0.2973 0.0872 46.4549
MAD 1.0012 1.0273 18.5679
46

S Profile
Total Number of Observations Number of Coefficients Subset Size Chi Function K0 Breakdown Value Efficiency Parameter Estimates
Standard Error 0.0216 0.0208 0.0222 95% Confidence Limits 9.9383 4.9896 2.9782
1000 3 3 Tukey 1.8000 0.4401 0.3874
ChiSquare Pr > ChiSq <.0001 <.0001 <.0001
10.0232 212532 5.0710 58656.3 3.0652 18555.7
Proportion 0.4100
Cutoff 3.0000
47

MM estimation with 40% outliers and 1% leverage points:
proc robustreg data=b method=mm(inith=502 k0=1.8); model y = x1 x2; run;
Parameter Estimates Standard Error 0.0215 0.0206 0.0221 95% Confidence Limits 9.9398 4.9898 2.9789 ChiSquare Pr > ChiSq <.0001 <.0001 <.0001
10.0241 215369 5.0707 59469.1 3.0655 18744.9
Proportion 0.4100
Cutoff 3.0000
48
Example: Growth Study of De Long & Summers

Robust regression and outlier detection techniques have considerable applications to econometrics. The following example from Zaman, Rousseeuw, and Orhan (2001) shows how these techniques substantially improve the ordinary least squares (OLS) results for the growth study of De Long and Summers. De Long and Summers (1991) studied the national growth of 61 countries from 1960 to 1985 using OLS. The regression equation they used is:
GDP = 0 + 1 LFG + 2 GAP + 3 EQP + 4 NEQ +

where the response variable is the growth in gross domestic product per worker (GDP) and the regressors are labor force growth (LFG), relative GDP gap (GAP), equipment investment (EQP), and non-equipment investment (NEQ).
49
The following statements invoke the REG procedure for the OLS analysis:

data growth; input country$ GDP LFG EQP NEQ GAP datalines; Argentin 0.0089 0.0118 0.0214 0.2286 Austria 0.0332 0.0014 0.0991 0.1349 Belgium 0.0256 0.0061 0.0684 0.1653 Bolivia 0.0124 0.0209 0.0167 0.1133 Botswana 0.0676 0.0239 0.1310 0.1490 Brazil 0.0437 0.0306 0.0646 0.1588 : : U.K. 0.0189 0.0048 U.S. 0.0133 0.0189 Uruguay 0.0041 0.0052 Venezuel 0.0120 0.0378 Zambia -0.0110 0.0275 Zimbabwe 0.0110 0.0309 ; 0.0694 0.0762 0.0155 0.0340 0.0702 0.0843 0.1132 0.1356 0.1154 0.0760 0.2012 0.1257
@@;
0.6079 0.5809 0.4109 0.8634 0.9474 0.8498
0.4307 0.0000 0.5782 0.4974 0.8695 0.8875
50

OLS estimation: proc reg data=growth; model GDP = LFG GAP EQP NEQ ; run;
Parameter Estimates Parameter Estimate -0.01430 -0.02981 0.02026 0.26538 0.06236 Standard Error 0.01028 0.19838 0.00917 0.06529 0.03482
Variable Intercept LFG GAP EQP NEQ
DF 1 1 1 1 1
t Value -1.39 -0.15 2.21 4.06 1.79
Pr > |t| 0.1697 0.8811 0.0313 0.0002 0.0787
The OLS analysis indicates that GAP and EQP have a significant influence on GDP at the 5% level.
51

M estimation (the default):
proc robustreg data=growth; model GDP = LFG GAP EQP NEQ / diagnostics leverage; output out=robout r=resid sr=stdres; run;
Summary Statistics
Standard Deviation 0.00979 0.2181 0.0296 0.0570 0.0155
Variable LFG GAP EQP NEQ GDP
Q1 0.0118 0.5796 0.0265 0.0956 0.0121
Median 0.0239 0.8015 0.0433 0.1356 0.0231
Q3 0.0281 0.8863 0.0720 0.1812 0.0310
Mean 0.0211 0.7258 0.0523 0.1399 0.0224
MAD 0.00949 0.1778 0.0325 0.0624 0.0150
Its not obvious from the summary statistics that there may be outliers or leverage points. 52

Parameter Estimates Standard Error 0.0097 0.1867 0.0086 0.0614 0.0328 95% Confidence Limits -0.0437 -0.2619 0.0080 0.1764 0.0242 -0.0058 0.4699 0.0419 0.4172 0.1527 ChiSquare Pr > ChiSq 6.53 0.31 8.36 23.33 7.29 0.0106 0.5775 0.0038 <.0001 0.0069 Parameter DF Estimate Intercept LFG GAP EQP NEQ Scale 1 1 1 1 1 1 -0.0247 0.1040 0.0250 0.2968 0.0885 0.0099
The parameter estimates now show that NEQ is also statistically significant.
53

Diagnostics Obs 1 5 8 9 17 23 27 31 53 57 58 59 60 61 Mahalanobis Distance 2.6083 3.4351 3.1876 3.6752 2.6024 2.1225 2.6461 2.9179 2.2600 3.8701 2.5953 2.9239 1.8562 1.9634
Robust MCD Distance

4.0639 6.7391 4.6843 5.0599 3.8186 3.8238 5.0336 4.7140 4.3193 5.4874 3.9671 4.1663 2.7135 3.9128
Leverage * * * * * * * * * * * * *
Standardized Robust Residual

-0.9424 1.4200 -0.1972 -1.8784 -1.7971 1.7161 0.0909 0.0216 -1.8082 0.1448 -0.0978 0.3573 -4.9798 -2.5959
Outlier
The diagnostics show that observation #60 (Zambia) is an outlier. While there are several leverage points in the data, none are serious. In this case, M estimation 54 is appropriate.

The following statements invoke the ROBUSTREG procedure with LTS estimation, which was used by Zaman, Rousseeuw, and Orhan (2001). The results are consistent with those of M estimation.
proc robustreg method=lts(h=33) fwls data=growth; model GDP = LFG GAP EQP NEQ / diagnostics leverage ; output out=robout r=resid sr=stdres; run;
55

Parameter Estimates for Final Weighted Least Squares Fit Standard Error 0.0093 0.1771 0.0082 0.0581 0.0314 95% Confidence Limits -0.0405 -0.3026 0.0084 0.1685 0.0233 -0.0039 0.3917 0.0406 0.3964 0.1465 ChiSquare Pr > ChiSq 5.65 0.06 8.89 23.60 7.30 0.0175 0.8013 0.0029 <.0001 0.0069 Parameter Intercept LFG GAP EQP NEQ Scale DF Estimate 1 1 1 1 1 0 -0.0222 0.0446 0.0245 0.2824 0.0849 0.0116
The final weighted lease squares estimates are identical to those reported in Zaman, Rousseeuw, and Orhan (2001).
56

Proc Robust Reg

Caricato da

Informazioni sul documento

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Proc Robust Reg

Caricato da

Copyright:

Formati disponibili

PROC ROBUSTREG Robust Regression Models

April 20, 2005 Charlie Hallahan

Getting Started- M Estimation

Getting Started- M Estimation

Getting Started- M Estimation

Number of Observations Read Number of Observations Used

Summary Statistics Standard Deviation 9.1683 3.1608 5.3586 10.1716

Q1 53.0000 18.0000 82.0000 10.0000

Median 58.0000 20.0000 87.0000 15.0000

Q3 62.0000 24.0000 89.5000 19.5000

Mean 60.4286 21.0952 86.2857 17.5238

MAD 5.9304 2.9652 4.4478 5.9304

Getting Started- M Estimation

M estimation yields the fitted linear model:

42.2845 0.9276 x1 0.6507 x 2 01123 y . x3

Getting Started- M Estimation

Diagnostics Summary Observation Type Outlier Leverage

Proportion 0.0952 0.1905

Cutoff 3.0000 3.0575

Getting Started- M Estimation

ods html; ods graphics on;

Getting Started- M Estimation

Getting Started- M Estimation

Getting Started- M Estimation

Getting Started- M Estimation

Getting Started- M Estimation

Test Rho Rn2

Getting Started- M Estimation

Getting Started- M Estimation

The refitted linear model with c = 3.5 is:

37.1076 0.8191x1 0.5173x 2 0.0728 x 3 y

Getting Started- M Estimation

x1 80.000000 80.000000 75.000000 62.000000 70.000000

Mahalanobis Distance 2.2536 2.3247 1.5937 1.2719 2.1768

Getting Started- LTS Estimation

Getting Started- LTS Estimation

3.0 2.7 2.6 0.2

0.3 -1.0 -0.6 0.9

0.3 0.0 0.3

1.7 2.2 0.4

2.2 1.6 2.6

0.4 -0.9 0.2

Getting Started- LTS Estimation

Getting Started- LTS Estimation

Getting Started- LTS Estimation

Getting Started- LTS Estimation

Number of Observations Read Number of Observations Used

Summary Statistics Standard Deviation 3.6526 8.2391 11.7403 3.4928

Q1 0.8000 1.0000 0.9000 -0.5000

Median 1.8000 2.2000 2.1000 0.1000

Q3 3.1000 3.3000 3.0000 0.7000

Mean 3.2067 5.5973 7.2307 1.2787

MAD 1.9274 1.6309 1.7791 0.8896

Getting Started- LTS Estimation

Getting Started- LTS Estimation

Getting Started- LTS Estimation

Diagnostics Summary Observation Type Outlier Leverage

Proportion 0.1333 0.1867

Cutoff 3.0000 3.0575

Getting Started- LTS Estimation

Parameter Intercept x1 x2 x3 Scale

DF Estimate 1 1 1 1 0 -0.1805 0.0814 0.0399 -0.0517 0.5572

Example: Comparison of Robust Estimates

Example: Comparison of Robust Estimates