
STA5102 Model Building (Examples)

Model Building
Example: Surgical Unit
A hospital surgical unit was interested in predicting survival in patients undergoing a
particular type of liver operation. A random selection of 108 patients was available for
analysis. From each patient record, the following information was extracted from the preoperation evaluation:
Y: survival time
X1: blood clotting score
X2: prognostic index
X3: enzyme function test score
X4: liver function test score
X5: age, in years
X6: indicator variable for gender (0 = male, 1 = female)
X7 and X8: indicator variables for history of alcohol use:
    None: X7 = 0, X8 = 0
    Moderate: X7 = 1, X8 = 0
    Severe: X7 = 0, X8 = 1

For illustrative purposes, only X1 to X4 are used, and only the data for the first 54 patients are analysed.

Regression of Y on X1, X2, X3 and X4.

Residual plot [apparent curvature; is the normality assumption reasonable?]

Transform Y to ln Y and obtain the residual plot again.


Using ln Y as the response seems to be the better choice.



Correlation matrix:

Pearson Correlation Coefficients, N = 54
Prob > |r| under H0: Rho=0 (second line of each row)

Variable (label)                  X1        X2        X3        X4       LnY
X1 (Blood Clotting Score)    1.00000   0.09012  -0.14963   0.50242   0.24619
                                        0.5169    0.2802    0.0001    0.0727
X2 (Prognostic Index)        0.09012   1.00000  -0.02361   0.36903   0.46994
                              0.5169              0.8655    0.0060    0.0003
X3 (Enzyme Test)            -0.14963  -0.02361   1.00000   0.41642   0.65389
                              0.2802    0.8655              0.0017    <.0001
X4 (Liver Test)              0.50242   0.36903   0.41642   1.00000   0.64926
                              0.0001    0.0060    0.0017              <.0001
LnY                          0.24619   0.46994   0.65389   0.64926   1.00000
                              0.0727    0.0003    <.0001    <.0001

Remark:
Are there any highly correlated predictor variables?
Are the predictor variables linearly associated with lnY?
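
One way to approach the first question is to convert each correlation between predictors into its t statistic, t = r * sqrt(n - 2) / sqrt(1 - r^2), which is the test behind the "Prob > |r|" values. The sketch below is illustrative only: the function name corr_t_stat is hypothetical, and the two-sided 5% critical value for 52 degrees of freedom is taken as roughly 2.007 from standard t tables (an assumption, not a value given in the output).

```python
import math

N = 54          # cases used in the example
T_CRIT = 2.007  # approximate two-sided 5% critical value, t with 52 df (assumption)

def corr_t_stat(r, n):
    """t statistic for H0: rho = 0, given a sample correlation r from n cases."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Correlations among the predictors, copied from the correlation matrix.
predictor_pairs = {
    ("X1", "X2"): 0.09012,
    ("X1", "X3"): -0.14963,
    ("X1", "X4"): 0.50242,
    ("X2", "X3"): -0.02361,
    ("X2", "X4"): 0.36903,
    ("X3", "X4"): 0.41642,
}

# Keep only the pairs whose |t| exceeds the assumed critical value.
significant = {
    pair: corr_t_stat(r, N)
    for pair, r in predictor_pairs.items()
    if abs(corr_t_stat(r, N)) > T_CRIT
}

for pair, t in significant.items():
    print(pair, round(t, 2))
```

Under these assumptions only the pairs involving X4 are flagged, in line with the printed p-values (0.0001, 0.0060 and 0.0017). The largest correlation between predictors is still only about 0.50.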



All possible regression models

  Number            Adjusted                                    Root                        Variables
in Model  R-Square  R-Square      C(p)        AIC      MSE      MSE         SBC       SSE  in Model
       1    0.4276    0.4166   66.4889  -103.8269  0.14099  0.37549   -99.84889   7.33157  X3
       1    0.4215    0.4104   67.7148  -103.2615  0.14248  0.37746   -99.28357   7.40873  X4
       1    0.2208    0.2059  108.5558   -87.1781  0.19191  0.43807   -83.20011   9.97918  X2
       1    0.0606    0.0425  141.1639   -77.0788  0.23137  0.48101   -73.10079  12.03147  X1
       2    0.6633    0.6501   20.5197  -130.4833  0.08456  0.29079  -124.51634   4.31249  X2 X3
       2    0.5995    0.5838   33.5041  -121.1126  0.10058  0.31715  -115.14561   5.12970  X3 X4
       2    0.5486    0.5309   43.8517  -114.6583  0.11335  0.33668  -108.69138   5.78096  X1 X3
       2    0.4830    0.4627   57.2149  -107.3236  0.12984  0.36034  -101.35663   6.62201  X2 X4
       2    0.4301    0.4078   67.9721  -102.0669  0.14312  0.37831   -96.09998   7.29905  X1 X4
       2    0.2627    0.2338  102.0313   -88.1622  0.18515  0.43029   -82.19528   9.44267  X1 X2
       3    0.7573    0.7427    3.3905  -146.1609  0.06217  0.24934  -138.20494   3.10854  X1 X2 X3
       3    0.7178    0.7009   11.4237  -138.0232  0.07228  0.26885  -130.06723   3.61413  X2 X3 X4
       3    0.6121    0.5889   32.9320  -120.8442  0.09936  0.31521  -112.88823   4.96782  X1 X3 X4
       3    0.4870    0.4562   58.3917  -105.7477  0.13140  0.36250   -97.79178   6.57020  X1 X2 X4
       4    0.7592    0.7396    5.0000  -144.5895  0.06294  0.25087  -134.64461   3.08396  X1 X2 X3 X4

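In summary, (X1, X2, X3) seems to be the best choice: it has the largest adjusted R-square (0.7427), the smallest C(p) (3.3905) and the smallest AIC (-146.1609) and SBC (-138.20494) of all the candidate models.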
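
The criteria in the table can all be recovered from each model's SSE. The sketch below uses the usual textbook definitions, which appear to match the printed output: adjusted R-square = 1 - (n - 1)/(n - p) * SSE/SSTO, C(p) = SSE/MSE(full) - (n - 2p), AIC = n ln(SSE/n) + 2p and SBC = n ln(SSE/n) + p ln(n), with p counting the intercept. The function name selection_criteria is hypothetical; the numbers are copied from the output.

```python
import math

N = 54              # cases used in the example
SSTO = 12.80773     # corrected total sum of squares of ln Y
SSE_FULL = 3.08396  # SSE of the full model (X1 X2 X3 X4)
P_FULL = 5          # parameters in the full model, intercept included

def selection_criteria(sse, p):
    """Subset-selection criteria for a model with p parameters (intercept included)."""
    mse_full = SSE_FULL / (N - P_FULL)
    return {
        "r2": 1 - sse / SSTO,
        "adj_r2": 1 - (N - 1) / (N - p) * sse / SSTO,
        "cp": sse / mse_full - (N - 2 * p),
        "aic": N * math.log(sse / N) + 2 * p,
        "sbc": N * math.log(sse / N) + p * math.log(N),
    }

# Model (X1, X2, X3): SSE = 3.10854, p = 4.
crit = selection_criteria(3.10854, 4)
for name, value in crit.items():
    print(name, round(value, 4))
```

For the full model, C(p) equals p = 5 by construction, which is why that row of the table shows exactly 5.0000.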



Stepwise regression [Significance level to enter = 0.10; significance level to stay = 0.15]
Step 1: Variable X3 Entered: R-Square = 0.4276 and C(p) = 66.4889

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1          5.47615       5.47615     38.84   <.0001
Error              52          7.33157       0.14099
Corrected Total    53         12.80773

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept              5.26426          0.19398    103.83831    736.48   <.0001
X3                     0.01512          0.00243      5.47615     38.84   <.0001

Step 2: Variable X2 Entered: R-Square = 0.6633 and C(p) = 20.5197


Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2          8.49523       4.24762     50.23   <.0001
Error              51          4.31249       0.08456
Corrected Total    53         12.80773

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept              4.35058          0.21436     34.83215    411.93   <.0001
X2                     0.01412          0.00236      3.01908     35.70   <.0001
X3                     0.01539          0.00188      5.66669     67.01   <.0001

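Step 3: X3 cannot be dropped: with X2 in the model, its F value is 67.01 (Pr > F < .0001), so it easily meets the 0.15 significance level to stay.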



Step 4: Variable X1 Entered: R-Square = 0.7573 and C(p) = 3.3905
Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3          9.69919       3.23306     52.00   <.0001
Error              50          3.10854       0.06217
Corrected Total    53         12.80773

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept              3.76618          0.22676     17.14987    275.85   <.0001
X1                     0.09546          0.02169      1.20395     19.37   <.0001
X2                     0.01334          0.00203      2.67242     42.99   <.0001
X3                     0.01645          0.00163      6.33413    101.88   <.0001

Step 5:
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1000 significance level for entry into the model.
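
The F value reported for the variable entering at each step is a partial F statistic: the drop in SSE when the variable is added, divided by the MSE of the larger model. The sketch below recomputes it from the SSE values in the output; the function name partial_f is hypothetical, and the conversion of F to a p-value (which needs the F distribution) is left out.

```python
def partial_f(sse_reduced, sse_full, df_error_full):
    """Partial F for adding one variable: drop in SSE over the larger model's MSE."""
    extra_ss = sse_reduced - sse_full      # equals the Type II SS of the new variable
    mse_full = sse_full / df_error_full
    return extra_ss / mse_full

# SSE values from the output; the corrected total SS is the SSE of the intercept-only model.
f_x3 = partial_f(12.80773, 7.33157, 52)  # Step 1: X3 enters
f_x2 = partial_f(7.33157, 4.31249, 51)   # Step 2: X2 enters, given X3
f_x1 = partial_f(4.31249, 3.10854, 50)   # Step 4: X1 enters, given X2 and X3

print(round(f_x3, 2), round(f_x2, 2), round(f_x1, 2))
```

The three values agree with the F values printed for X3, X2 and X1 at their entry steps (38.84, 35.70 and 19.37) up to rounding of the inputs.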



Diagnostics
Assume that X1, X2, X3 and X8 are in the model.
Interaction terms were added to this model, tested, and found nonsignificant.
Correlation matrix:

Pearson Correlation Coefficients, N = 54
Prob > |r| under H0: Rho=0 (second line of each row)

Variable (label)                  X1        X2        X3        X8       LnY
X1 (Blood Clotting Score)    1.00000   0.09012  -0.14963   0.22414   0.24619
                                        0.5169    0.2802    0.1032    0.0727
X2 (Prognostic Index)        0.09012   1.00000  -0.02361  -0.08372   0.46994
                              0.5169              0.8655    0.5472    0.0003
X3 (Enzyme Test)            -0.14963  -0.02361   1.00000   0.11748   0.65389
                              0.2802    0.8655              0.3975    <.0001
X8 (Alc Use: Heavy)          0.22414  -0.08372   0.11748   1.00000   0.37278
                              0.1032    0.5472    0.3975              0.0055
LnY                          0.24619   0.46994   0.65389   0.37278   1.00000
                              0.0727    0.0003    <.0001    0.0055



Residual plot



Multicollinearity analysis

Parameter Estimates
                                        Parameter   Standard                        Variance
Variable    Label                  DF    Estimate      Error   t Value   Pr > |t|  Inflation
Intercept   Intercept               1     3.85242    0.19270     19.99     <.0001          0
X1          Blood Clotting Score    1     0.07332    0.01897      3.86     0.0003    1.10259
X2          Prognostic Index        1     0.01419    0.00173      8.20     <.0001    1.01992
X3          Enzyme Test             1     0.01545    0.00140     11.07     <.0001    1.04871
X8          Alc Use: Heavy          1     0.35297    0.07719      4.57     <.0001    1.09186
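
The variance inflation factor of a predictor is 1/(1 - R_k^2), where R_k^2 comes from regressing that predictor on the others; equivalently, the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The sketch below checks this against the output using the correlations printed in the Diagnostics correlation matrix. It is a minimal illustration: the helper invert is a hypothetical pure-Python Gauss-Jordan routine written for this small case, not part of any library.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

names = ["X1", "X2", "X3", "X8"]
# Correlations among the predictors, copied from the Diagnostics correlation matrix.
corr = [
    [1.0, 0.09012, -0.14963, 0.22414],
    [0.09012, 1.0, -0.02361, -0.08372],
    [-0.14963, -0.02361, 1.0, 0.11748],
    [0.22414, -0.08372, 0.11748, 1.0],
]

inverse = invert(corr)
vif = {name: inverse[i][i] for i, name in enumerate(names)}
for name, value in vif.items():
    print(name, round(value, 5))
```

All four values are close to 1, far below the commonly quoted warning level of 10, so multicollinearity does not appear to be a concern for this model.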



Diagnostics for outlying cases
Case  residual_LnY  student_LnY  cookd_LnY    h_LnY  rstudent_LnY  dffits_LnY
   1       0.06918      0.33399    0.00081  0.03519       0.33094     0.06320
   2      -0.08416     -0.40555    0.00107  0.03146      -0.40207    -0.07246
   3       0.07887      0.38553    0.00186  0.05876       0.38215     0.09548
   4      -0.14409     -0.71252    0.00887  0.08032      -0.70889    -0.20949
   5       0.28261      1.47042    0.08811  0.16926       1.48855     0.67191
   6      -0.07732     -0.38010    0.00216  0.06946      -0.37676    -0.10294
   7       0.00064      0.00327    0.00000  0.13698       0.00324     0.00129
   8       0.27903      1.36037    0.02105  0.05380       1.37258     0.32731
   9       0.28214      1.36456    0.01492  0.03853       1.37698     0.27565
  10       0.22067      1.08673    0.01852  0.07270       1.08878     0.30486
  11      -0.17547     -0.85583    0.00846  0.05459      -0.85345    -0.20509
  12      -0.18259     -0.90534    0.01528  0.08528      -0.90364    -0.27592
  13      -0.04006     -0.20675    0.00157  0.15552      -0.20472    -0.08786
  14      -0.06089     -0.29683    0.00100  0.05367      -0.29405    -0.07003
  15      -0.13078     -0.66012    0.01159  0.11735      -0.65628    -0.23930
  16       0.19951      0.97187    0.01040  0.05220       0.97131     0.22796
  17       0.59524      3.06161    0.33062  0.14992       3.36960     1.41506
  18      -0.15971     -0.81067    0.01914  0.12710      -0.80779    -0.30824
  19      -0.18676     -0.90639    0.00778  0.04521      -0.90471    -0.19686
  20      -0.03793     -0.19036    0.00087  0.10726      -0.18848    -0.06533
  21      -0.26273     -1.26691    0.01090  0.03285      -1.27497    -0.23497
  22      -0.23207     -1.17870    0.04087  0.12822      -1.18351    -0.45389
  23       0.27876      1.46749    0.10006  0.18852       1.48545     0.71597
  24      -0.22502     -1.08259    0.00685  0.02838      -1.08454    -0.18536
  25      -0.21854     -1.06767    0.01397  0.05775      -1.06923    -0.26470
  26       0.34997      1.67990    0.01383  0.02392       1.71272     0.26812
  27       0.00175      0.00844    0.00000  0.03648       0.00836     0.00163
  28       0.08759      0.49345    0.02002  0.29137       0.48961     0.31395
  29       0.30196      1.46848    0.02227  0.04909       1.48650     0.33776
  30       0.12854      0.62359    0.00362  0.04451       0.61966     0.13374
  31       0.09468      0.46833    0.00386  0.08086       0.46457     0.13780
  32      -0.28607     -1.53630    0.13333  0.22024      -1.55854    -0.82830
  33      -0.16500     -0.79352    0.00358  0.02766      -0.79048    -0.13332
  34      -0.14508     -0.73269    0.01439  0.11818      -0.72918    -0.26694
  35       0.02622      0.12604    0.00009  0.02703       0.12477     0.02080
  36      -0.04128     -0.20151    0.00048  0.05604      -0.19953    -0.04862
  37       0.09732      0.48807    0.00564  0.10593       0.48424     0.16668
  38      -0.22706     -1.29246    0.14725  0.30591      -1.30159    -0.86410
  39       0.21294      1.02439    0.00610  0.02826       1.02492     0.17477
  40       0.21218      1.02598    0.00835  0.03816       1.02654     0.20446
  41       0.24164      1.16899    0.01111  0.03906       1.17348     0.23658
  42      -0.03035     -0.16360    0.00157  0.22623      -0.16197    -0.08758
  43       0.06959      0.35600    0.00415  0.14070       0.35281     0.14276
  44      -0.02524     -0.12182    0.00011  0.03472      -0.12059    -0.02287
  45      -0.45307     -2.24389    0.09132  0.08315      -2.34460    -0.70606
  46       0.22155      1.09281    0.01954  0.07562       1.09503     0.31319
  47       0.01113      0.05481    0.00005  0.07217       0.05425     0.01513
  48      -0.16797     -0.88184    0.03507  0.18402      -0.87981    -0.41781
  49      -0.00162     -0.00779    0.00000  0.02570      -0.00771    -0.00125
  50       0.08982      0.44784    0.00423  0.09537       0.44416     0.14421
  51      -0.13442     -0.66216    0.00692  0.07317      -0.65832    -0.18497
  52      -0.13749     -0.73925    0.03120  0.22208      -0.73579    -0.39313
  53      -0.16208     -0.78253    0.00446  0.03517      -0.77938    -0.14880
  54      -0.23869     -1.19911    0.03515  0.10892      -1.20462    -0.42115
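
The last three columns can be derived from the studentized residual r_i (student_LnY) and the leverage h_ii (h_LnY) alone. The sketch below applies the standard identities for a model with p = 5 parameters and n = 54 cases to a few rows copied from the table. The function name case_diagnostics is hypothetical, and the two cutoffs (leverage above 2p/n, |DFFITS| above 1) are common rules of thumb assumed here for illustration, not criteria stated in the output.

```python
import math

N, P = 54, 5  # cases and parameters (intercept, X1, X2, X3, X8)

def case_diagnostics(student, h):
    """Deleted studentized residual, Cook's distance and DFFITS from r_i and h_ii."""
    rstudent = student * math.sqrt((N - P - 1) / (N - P - student ** 2))
    cooks_d = student ** 2 * h / (P * (1 - h))
    dffits = rstudent * math.sqrt(h / (1 - h))
    return rstudent, cooks_d, dffits

# (student_LnY, h_LnY) for a few cases, copied from the table.
cases = {
    17: (3.06161, 0.14992),
    28: (0.49345, 0.29137),
    38: (-1.29246, 0.30591),
    45: (-2.24389, 0.08315),
}

LEVERAGE_CUTOFF = 2 * P / N  # rule of thumb (assumption), about 0.185

high_leverage = [c for c, (_, h) in cases.items() if h > LEVERAGE_CUTOFF]
influential = [c for c, (r, h) in cases.items()
               if abs(case_diagnostics(r, h)[2]) > 1]  # |DFFITS| > 1 (assumption)

for c, (r, h) in cases.items():
    print(c, [round(v, 5) for v in case_diagnostics(r, h)])
print("high leverage:", high_leverage, "influential:", influential)
```

Among these four cases, the assumed cutoffs mark cases 28 and 38 as outlying in the predictors, while case 17 stands out in the response and is the only one with |DFFITS| above 1.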
