
Categorical Data: Ordinal

Ordered discrete response variable with a fixed number of classes $c$. For convenience we will number the categories $1, \ldots, c$. Consider calving difficulty:

Difficulty   Code
Easy         1
Moderate     2
Difficult    3

Stat 892: Spring 2004

In this case it is reasonable to think of a moderately difficult calving as lying somewhere between an easy and a difficult calving. However, it doesn't make sense to say that the difference between a moderately difficult and an easy calving is the same as the difference between a difficult and a moderately difficult calving.


Response Variable
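The response on individual $i$ is recorded either as $y_i \in \{1, \ldots, c\}$ or as the $c \times 1$ vector

$$z_i = \begin{pmatrix} z_{i1} \\ \vdots \\ z_{ic} \end{pmatrix} \quad \text{with} \quad z_{ij} = \begin{cases} 1 & y_i = j \\ 0 & \text{otherwise.} \end{cases}$$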


Multinomial distribution
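$$y_i \sim \mathrm{Mult}(1, \pi_{i1}, \ldots, \pi_{ic})$$

Mean

$$E(z_i) = \mu_i = \pi_i = \begin{pmatrix} \pi_{i1} \\ \vdots \\ \pi_{ic} \end{pmatrix}$$

Covariance Matrix

$$\mathrm{var}(z_i) = R_i = \mathrm{Diag}(\pi_i) - \pi_i \pi_i'$$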
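The indicator coding and the covariance formula above can be checked numerically. The following is a minimal sketch; the function names and the probability vector `pi_example` are hypothetical and chosen only for illustration.

```python
def indicator_vector(y, c):
    """Return the c-vector z with z[j-1] = 1 when y == j and 0 otherwise."""
    if not 1 <= y <= c:
        raise ValueError("y must be one of the categories 1..c")
    return [1 if j == y else 0 for j in range(1, c + 1)]


def multinomial_cov(pi):
    """Covariance of one multinomial draw: Diag(pi) - pi pi'."""
    c = len(pi)
    return [
        [(pi[j] if j == k else 0.0) - pi[j] * pi[k] for k in range(c)]
        for j in range(c)
    ]


pi_example = [0.6, 0.3, 0.1]      # hypothetical category probabilities
z = indicator_vector(2, 3)        # response in category 2 of 3
R = multinomial_cov(pi_example)   # 3 x 3 covariance matrix
```

Because the elements of $z_i$ always sum to one, every row of `R` sums to zero, so the matrix is singular.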


Link Function
Recall: Logit

$$x_i\beta = \ln\frac{\pi_i}{1-\pi_i}$$

$$\pi_i = \frac{\exp(x_i\beta)}{1+\exp(x_i\beta)}$$

$$\Pr(y_i = 0) = \frac{\exp(x_i\beta)}{1+\exp(x_i\beta)}$$



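Generalize this by modeling

$$\Pr(y_i \le j) = \frac{\exp(I_j + x_i\beta)}{1+\exp(I_j + x_i\beta)}$$

$$I_j + x_i\beta = \ln\frac{\Pr(y_i \le j)}{1-\Pr(y_i \le j)}$$

$$\Pr(y_i = j) = \pi_{ij} = \Pr(y_i \le j) - \Pr(y_i \le j-1)$$

where $\Pr(y_i \le 0) = 0$ and $\Pr(y_i \le c) = 1$.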
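To make the differencing step concrete, here is a small sketch that turns a set of intercepts and a linear predictor into category probabilities. The function names and the example inputs are hypothetical; the intercepts are assumed to be in increasing order.

```python
import math


def cumulative_probs(intercepts, xb):
    """Pr(y <= j) for j = 1..c-1 under the cumulative logit model."""
    return [1.0 / (1.0 + math.exp(-(i_j + xb))) for i_j in intercepts]


def category_probs(intercepts, xb):
    """Pr(y = j) for j = 1..c, by differencing the cumulative probabilities."""
    cum = [0.0] + cumulative_probs(intercepts, xb) + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Hypothetical inputs: two intercepts (so c = 3) and a linear predictor of 0.
example = category_probs([0.0, 1.0], 0.0)
```

With $c$ categories there are $c - 1$ intercepts, and the resulting probabilities always sum to one.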


Coal miner pneumoconiosis data set


                     Diagnosis
Exposure (years)   Normal   Moderate   Severe
       5.8            98        0         0
      15.0            51        2         1
      21.5            34        6         3
      27.5            35        5         8
      33.5            32       10         9
      39.5            23        7         8
      46.0            12        6        10
      51.5             4        2         5

McCullagh and Nelder (1989) Generalized Linear Models, pg. 179
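The observed cumulative percentages can be computed directly from these counts. The sketch below assumes that "Cat 1" means Normal and "Cat 1 or 2" means Normal or Moderate; the variable and function names are illustrative only.

```python
# Counts transcribed from the table: (exposure years, normal, moderate, severe)
miners = [
    (5.8, 98, 0, 0),
    (15.0, 51, 2, 1),
    (21.5, 34, 6, 3),
    (27.5, 35, 5, 8),
    (33.5, 32, 10, 9),
    (39.5, 23, 7, 8),
    (46.0, 12, 6, 10),
    (51.5, 4, 2, 5),
]

# Column totals for Normal, Moderate, Severe.
totals = tuple(sum(row[k] for row in miners) for k in (1, 2, 3))


def cumulative_percentages(normal, moderate, severe):
    """Percent of cases in category 1, and in category 1 or 2."""
    n = normal + moderate + severe
    return 100.0 * normal / n, 100.0 * (normal + moderate) / n


observed = [(years, *cumulative_percentages(n, m, s)) for years, n, m, s in miners]
```

At 5.8 years every miner is in the Normal category, so the observed percentage is 100 and the corresponding empirical logit is not finite.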



[Figure: Cases (%) against Exposure (years), by Severity: Cat 1 and Cat 1 or 2.]


[Figure: Logit against Exposure (years), by Severity: Cat 1 and Cat 1 or 2.]


Model
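$$y_i \stackrel{\text{ind}}{\sim} \mathrm{Mult}(1, \pi_{iN}, \pi_{iM}, \pi_{iS})$$

$$\ln\frac{\pi_{iN}}{1-\pi_{iN}} = I_N + b\,\mathrm{Log(Exp)}_i$$

$$\ln\frac{\pi_{iN}+\pi_{iM}}{1-\pi_{iN}-\pi_{iM}} = I_M + b\,\mathrm{Log(Exp)}_i$$

$$\ln\frac{1-\pi_{iS}}{\pi_{iS}} = I_M + b\,\mathrm{Log(Exp)}_i$$

$$\ln\frac{\pi_{iS}}{1-\pi_{iS}} = -I_M - b\,\mathrm{Log(Exp)}_i$$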


Parameters

Intercepts: $I_N$ and $I_M$
Slope: $b$
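Solving the model equations for the three category probabilities may help in reading the fitted values later. This is a short worked step, writing $L_i$ as shorthand (introduced here only for brevity) for $\mathrm{Log(Exp)}_i$:

```latex
\begin{aligned}
\pi_{iN} &= \frac{\exp(I_N + b L_i)}{1 + \exp(I_N + b L_i)}, \\
\pi_{iS} &= \frac{1}{1 + \exp(I_M + b L_i)}, \\
\pi_{iM} &= 1 - \pi_{iN} - \pi_{iS}
          = \frac{\exp(I_M + b L_i)}{1 + \exp(I_M + b L_i)}
          - \frac{\exp(I_N + b L_i)}{1 + \exp(I_N + b L_i)} .
\end{aligned}
```

The middle probability is nonnegative for every $L_i$ exactly when $I_N \le I_M$, which is why the intercepts must be ordered.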


Program
proc genmod data=miner;
  freq n;
  model score=logtime / dist=multinomial link=cumlogit type3;
run;


Results
Model Information

Data Set                    WORK.MINER
Distribution                Multinomial
Link Function               Cumulative Logit
Dependent Variable          score
Frequency Weight Variable   n

Number of Observations Read    22
Number of Observations Used    22
Sum of Frequencies Read       371
Sum of Frequencies Used       371


Response Profile

Ordered Value   score   Total Frequency
      1           1          289
      2           2           38
      3           3           44

PROC GENMOD is modeling the probabilities of levels of score having LOWER Ordered Values in the response profile table. One way to change this to model the probabilities of HIGHER Ordered Values is to specify the DESCENDING option in the PROC statement.

Criteria For Assessing Goodness Of Fit

Criterion        DF   Value       Value/DF
Log Likelihood        -204.2742

Algorithm converged.


Analysis Of Parameter Estimates

                              Standard    Wald 95% Confidence     Chi-
Parameter    DF   Estimate    Error       Limits                  Square   Pr > ChiSq
Intercept1    1     9.6761    1.3233       7.0826    12.2696      53.47    <.0001
Intercept2    1    10.5817    1.3437       7.9481    13.2154      62.02    <.0001
logtime       1    -2.5968    0.3810      -3.3435    -1.8502      46.47    <.0001
Scale         0     1.0000    0.0000       1.0000     1.0000

NOTE: The scale parameter was held fixed.
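As a cross-check on these estimates, the log likelihood can be re-evaluated outside SAS. The sketch below assumes that `logtime` is the natural log of exposure years (the listing does not say which logarithm was used) and that the reported log likelihood is the sum of count times log fitted probability. The function names are illustrative.

```python
import math

# Reported estimates from the parameter table.
I1, I2, B = 9.6761, 10.5817, -2.5968

# (exposure years, normal, moderate, severe), from the data table.
miners = [
    (5.8, 98, 0, 0), (15.0, 51, 2, 1), (21.5, 34, 6, 3), (27.5, 35, 5, 8),
    (33.5, 32, 10, 9), (39.5, 23, 7, 8), (46.0, 12, 6, 10), (51.5, 4, 2, 5),
]


def invlogit(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_likelihood(data, i1, i2, slope):
    """Sum of count * log(category probability) over all nonzero cells."""
    total = 0.0
    for years, *counts in data:
        x = math.log(years)              # ASSUMPTION: logtime = ln(years)
        p1 = invlogit(i1 + slope * x)    # Pr(score <= 1)
        p2 = invlogit(i2 + slope * x)    # Pr(score <= 2)
        probs = (p1, p2 - p1, 1.0 - p2)
        total += sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)
    return total


loglik = log_likelihood(miners, I1, I2, B)
```

Under these assumptions the result agrees with the reported value of -204.2742 to about three decimal places; the small remainder comes from using estimates rounded to four decimals.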


LR Statistics For Type 3 Analysis

Source    DF   Chi-Square   Pr > ChiSq
logtime    1        96.61       <.0001


Lack of Fit
proc genmod data=miner;
  freq n;
  class time;
  model score=logtime time / dist=multinomial link=cumlogit type1;
run;


Criteria For Assessing Goodness Of Fit

Criterion        DF   Value       Value/DF
Log Likelihood        -202.6940

Algorithm converged.

LR Statistics For Type 1 Analysis

Source        Deviance    DF   Chi-Square   Pr > ChiSq
Intercepts   1010.3241
logtime       817.0967     1        96.61       <.0001
time          810.7760     6         3.16       0.7885

$$2\,(-202.694 - (-204.2742)) = 3.1604$$
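The p-value for this lack-of-fit statistic can be reproduced by hand, since the chi-square upper tail has a closed form when the degrees of freedom are even: for $df = 2m$, $\Pr(X > x) = e^{-x/2}\sum_{k=0}^{m-1}(x/2)^k/k!$. The following is a minimal sketch; the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half / k   # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# Likelihood ratio statistic from the two reported log likelihoods.
lr = 2.0 * (-202.6940 - (-204.2742))
p_value = chi2_sf_even_df(lr, 6)
```

This gives a p-value of about 0.788, in line with the 0.7885 reported for `time`, so there is no evidence of lack of fit for the model that is linear in `logtime`.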


Fitted Values
%macro invlogit(xbeta);
  exp(&xbeta)/(1+exp(&xbeta))
%mend;

data minerlogit;
  set miner3;
  /* linear predictor from the fitted cumulative logit model */
  xbeta=-2.5968*logtime;
  int=9.6761;
  if Severity="Cat 1 or 2" then int=10.5817;
  output;                                      /* observed row */
  Severity=substr(Severity,1,10) || " Pred";
  pestl=int+xbeta;                             /* predicted logit */
  prob=100*%invlogit(int+xbeta);               /* predicted percentage */
  output;                                      /* predicted row */
run;
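The same fitted percentages can be sketched outside SAS. This version assumes `logtime` is the natural log of exposure years and evaluates the fit only at the eight observed exposure values; the names `INTERCEPTS`, `SLOPE`, and `fitted_percentage` are illustrative.

```python
import math

exposure_years = [5.8, 15.0, 21.5, 27.5, 33.5, 39.5, 46.0, 51.5]

# Reported estimates: one intercept per cumulative category, common slope.
INTERCEPTS = {"Cat 1": 9.6761, "Cat 1 or 2": 10.5817}
SLOPE = -2.5968


def fitted_percentage(years, severity):
    """Predicted cumulative percentage for one exposure value."""
    logit = INTERCEPTS[severity] + SLOPE * math.log(years)  # ASSUMPTION: natural log
    return 100.0 * math.exp(logit) / (1.0 + math.exp(logit))


fitted = {
    sev: [fitted_percentage(t, sev) for t in exposure_years]
    for sev in INTERCEPTS
}
```

Because the two curves share a slope and differ only in intercept, the "Cat 1 or 2" curve lies above the "Cat 1" curve at every exposure.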

[Figure: Cases (%) against Exposure (years), by Severity: observed Cat 1 and Cat 1 or 2, with fitted curves Cat 1 Pred and Cat 1 or 2 Pred.]


[Figure: Logit against Exposure (years), by Severity: observed Cat 1 and Cat 1 or 2, with fitted lines Cat 1 Pred and Cat 1 or 2 Pred.]


Effect of Surface and Vision on Balance


http://www.statsci.org/data/oz/ctsib.html

Factors
  Sex (Female and Male)
  Surface (Normal and Foam)
  Vision (Closed, Dome, and Open)


40 Subjects
  Age (yr)
  Weight (kg)
  Height (cm)

Each treatment repeated twice for each subject


Effects

Fixed
  Sex|Surface|Vision
  Age, Weight, and Height as covariates

Random
  Subject
  Surface*Vision*Subject
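Written out, these effects correspond to a cumulative logit model with two random terms. The following is a sketch under the usual assumption of independent normal random effects; the index names ($i$ for subject, $j$ for surface by vision combination, $k$ for replicate, $m$ for cumulative category) and the symbols $s_i$, $w_{ij}$ are introduced here for illustration.

```latex
\ln\frac{\Pr(y_{ijk} \le m \mid s_i, w_{ij})}{1 - \Pr(y_{ijk} \le m \mid s_i, w_{ij})}
  = I_m + x_{ijk}'\beta + s_i + w_{ij},
\qquad
s_i \sim N(0, \sigma_s^2), \quad w_{ij} \sim N(0, \sigma_w^2)
```

Here $x_{ijk}'\beta$ collects the covariates and the Sex|Surface|Vision fixed effects, $s_i$ is the Subject effect, and $w_{ij}$ is the Surface*Vision*Subject effect shared by the two replicates of a treatment within a subject.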


SAS
%let DIR=h:/mixed-model;

data balance;
  infile "&DIR/ctsibuni.txt" firstobs=2 expandtabs;
  length sex $6 Vision $ 6;
  input Subject Sex Age Height Weight Surface $ Vision CTSIB;
  CTSIB=min(CTSIB,3);   /* collapse CTSIB values above 3 into category 3 */
run;

proc print;
run;

proc means data=balance;
  var age weight height;
run;

proc genmod data=balance;
  class sex Vision Subject Surface;
  model score=age weight height sex|Vision|Surface / dist=mult type3;
run;


Genmod
The GENMOD Procedure

Model Information

Data Set             WORK.BALANCE
Distribution         Multinomial
Link Function        Cumulative Logit
Dependent Variable   score

Number of Observations Read   480
Number of Observations Used   480


Class Level Information

Class     Levels   Values
sex            2   female male
Vision         3   closed dome open
Subject       40   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                   21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Surface        2   foam norm


Response Profile

Ordered Value   score   Total Frequency
      1           1          114
      2           2          292
      3           3           74

PROC GENMOD is modeling the probabilities of levels of score having LOWER Ordered Values in the response profile table. One way to change this to model the probabilities of HIGHER Ordered Values is to specify the DESCENDING option in the PROC statement.


Criteria For Assessing Goodness Of Fit

Criterion        DF   Value       Value/DF
Log Likelihood        -243.1001

Algorithm converged.


LR Statistics For Type 3 Analysis

Source               DF   Chi-Square   Pr > ChiSq
Age                   1         0.16       0.6912
Weight                1        23.87       <.0001
Height                1        24.88       <.0001
sex                   1         7.30       0.0069
Vision                2       182.83       <.0001
sex*Vision            2         4.60       0.1001
Surface               1       269.08       <.0001
sex*Surface           1         0.44       0.5081
Vision*Surface        2         7.07       0.0291
sex*Vision*Surface    2         3.09       0.2134


GLIMMIX
proc glimmix data=balance;
  class sex Vision Subject Surface;
  model score=age weight height sex|Vision|Surface
        / dist=mult ddfm=satterthwaite;
  random intercept surface*Vision / subject=subject(sex);
  estimate 'closed vs dome' vision 1 -1 0;
  estimate 'open vs rest' vision -.5 -.5 1;
  estimate 'foam vs norm' surface 1 -1;
  estimate 'F vs M closed' sex 1 -1 sex*vision 1 0 0 -1 0 0;
  estimate 'F vs M dome' sex 1 -1 sex*vision 0 1 0 0 -1 0;
  estimate 'F vs M open' sex 1 -1 sex*vision 0 0 1 0 0 -1;
  estimate 'Weight' weight 1;
  estimate 'Height' height 1;

The GLIMMIX Procedure

Model Information

Data Set                      WORK.BALANCE
Response Variable             score
Response Distribution         Multinomial (ordered)
Link Function                 Cumulative Logit
Variance Function             Default
Variance Matrix Blocked By    Subject(sex)
Estimation Technique          Residual PL
Degrees of Freedom Method     Satterthwaite


Class Level Information

Class     Levels   Values
sex            2   female male
Vision         3   closed dome open
Subject       40   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
                   21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Surface        2   foam norm

Number of Observations Read   480
Number of Observations Used   480


Response Profile

Ordered Value   score   Total Frequency
      1           1          114
      2           2          292
      3           3           74

The GLIMMIX procedure is modeling the probabilities of levels of score having lower Ordered Values in the Response Profile table.

Dimensions

G-side Cov. Parameters       2
Columns in X                40
Columns in Z per Subject     7
Subjects (Blocks in V)      40
Max Obs per Subject         12


Optimization Information

Optimization Technique        Dual Quasi-Newton
Parameters in Optimization    2
Lower Boundaries              2
Upper Boundaries              0
Fixed Effects                 Profiled
Starting From                 Data


Convergence criterion (PCONV=1.11022E-8) satisfied.

Fit Statistics

-2 Res Log Pseudo-Likelihood        6888.28
Pseudo-AIC (smaller is better)      6892.28
Pseudo-AICC (smaller is better)     6892.30
Pseudo-BIC (smaller is better)      6895.65
Pseudo-CAIC (smaller is better)     6897.65
Pseudo-HQIC (smaller is better)     6893.50


Covariance Parameter Estimates

                                              Standard
Cov Parm          Subject         Estimate    Error
Intercept         Subject(sex)      4.0077    1.4750
Vision*Surface    Subject(sex)      1.6234    0.5589


Type III Tests of Fixed Effects

Effect               Num DF   Den DF   F Value   Pr > F
Age                       1    25.64      0.01   0.9046
Weight                    1    27.15      4.18   0.0507
Height                    1    27.18      4.91   0.0353
sex                       1    27.23      1.93   0.1756
Vision                    2    360.4     29.62   <.0001
sex*Vision                2    229.6      2.25   0.1080
Surface                   1      464     63.34   <.0001
sex*Surface               1    221.6      0.32   0.5696
Vision*Surface            2    362.9      1.18   0.3083
sex*Vision*Surface        2    266.7      0.55   0.5784


Estimates

                              Standard
Label            Estimate     Error       DF      t Value   Pr > |t|
closed vs dome    -0.8305     0.4339      214.7     -1.91     0.0569
open vs rest       5.5587     0.7270      464        7.65     <.0001
foam vs norm      -6.7815     0.8521      464       -7.96     <.0001
F vs M closed     -2.8479     1.3687      38.82     -2.08     0.0441
F vs M dome       -1.3362     1.3274      34.9      -1.01     0.3211
F vs M open       -1.0147     1.3608      38.9      -0.75     0.4604
Weight             0.09269    0.04533     27.15      2.04     0.0507
Height            -0.1464     0.06605     27.18     -2.22     0.0353
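Each t value in this table is the estimate divided by its standard error, and for a single degree of freedom effect the squared t value matches the F value in the Type III tests. The sketch below simply checks that arithmetic with numbers copied from the output; the dictionary layout is illustrative.

```python
# label: (estimate, standard error, reported t value), copied from the output
estimates = {
    "closed vs dome": (-0.8305, 0.4339, -1.91),
    "open vs rest": (5.5587, 0.7270, 7.65),
    "foam vs norm": (-6.7815, 0.8521, -7.96),
    "F vs M closed": (-2.8479, 1.3687, -2.08),
    "F vs M dome": (-1.3362, 1.3274, -1.01),
    "F vs M open": (-1.0147, 1.3608, -0.75),
    "Weight": (0.09269, 0.04533, 2.04),
    "Height": (-0.1464, 0.06605, -2.22),
}

# t = estimate / standard error
t_values = {label: est / se for label, (est, se, _) in estimates.items()}

# For one-df effects, t squared should match the Type III F value.
f_from_t = {label: t_values[label] ** 2 for label in ("Weight", "Height", "foam vs norm")}
```

The squared values come out near 4.18 for Weight, 4.91 for Height, and 63.34 for foam vs norm, matching the F values for Weight, Height, and Surface.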


LSMEANS
  estimate 'LSM Vision closed 1' intercept 1 0
           sex .5 .5 age 21.8 weight 71.145 height 172.05
           Vision 1 0 0 Surface .5 .5
           sex*vision .5 0 0 .5 0 0
           surface*vision .5 .5 0 0 0 0
           sex*surface .25 .25 .25 .25
           sex*vision*surface .25 .25 0 0 0 0 .25 .25 0 0 0 0;
  estimate 'LSM Vision dome 1' intercept 1 0
           sex .5 .5 age 21.8 weight 71.145 height 172.05
           Vision 0 1 0 Surface .5 .5
           sex*vision 0 .5 0 0 .5 0
           surface*vision 0 0 .5 .5 0 0
           sex*surface .25 .25 .25 .25
           sex*vision*surface 0 0 .25 .25 0 0 0 0 .25 .25 0 0;
  ods output estimates=lsm;
run;
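The interaction coefficients in these ESTIMATE statements are products of the marginal weights, taken over all level combinations with the first factor varying slowest. The sketch below rebuilds them for the "closed" vision level, assuming the level orders female/male, closed/dome/open, and foam/norm shown in the class level tables. The helper `cell_weights` is hypothetical.

```python
from itertools import product


def cell_weights(*margins):
    """Product of marginal weights for every level combination.

    The first factor varies slowest and the last varies fastest.
    """
    weights = []
    for combo in product(*margins):
        w = 1.0
        for x in combo:
            w *= x
        weights.append(w)
    return weights


sex = [0.5, 0.5]           # average over female and male
vision = [1.0, 0.0, 0.0]   # select closed (order: closed, dome, open)
surface = [0.5, 0.5]       # average over foam and norm

sex_vision = cell_weights(sex, vision)
vision_surface = cell_weights(vision, surface)
sex_surface = cell_weights(sex, surface)
sex_vision_surface = cell_weights(sex, vision, surface)
```

Changing `vision` to `[0.0, 1.0, 0.0]` gives the coefficients used for the dome estimate.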

data lsm;
  set lsm;
  length cat $ 15;
  if substr(label,1,3) = "LSM" then do;
    var=scan(label,2);   /* factor name, e.g. Vision */
    lev=scan(label,3);   /* factor level, e.g. closed */
    c=scan(label,4);     /* cumulative category index (character) */
    if c="1" then cat="Stable";
    if c="2" then cat="<= Mod. Stable";
    label cat="Category";
    prob=exp(estimate)/(1+exp(estimate));   /* inverse logit */
    label prob="Probability";
    output;
  end;
run;


Obs   var       lev      cat              Estimate   prob
  1   Vision    closed   Stable            -6.2103   0.00200
  2   Vision    dome     Stable            -5.3798   0.00459
  3   Vision    open     Stable            -0.2364   0.44118
  4   Vision    closed   <= Mod. Stable     3.6023   0.97346
  5   Vision    dome     <= Mod. Stable     4.4329   0.98826
  6   Vision    open     <= Mod. Stable     9.5763   0.99993
  7   Surface   foam     Stable            -7.3329   0.00065
  8   Surface   norm     Stable            -0.5514   0.36554
  9   Surface   foam     <= Mod. Stable     2.4797   0.92271
 10   Surface   norm     <= Mod. Stable     9.2613   0.99990
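The prob column is the inverse logit of the Estimate column, and differencing the two cumulative probabilities for a level gives the probability of each individual category. The following sketch uses the two Vision open rows; the function names are illustrative, and the third category is simply described as the remaining one because the output does not name it.

```python
import math


def invlogit(x):
    """Inverse logit: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(est_stable, est_mod_or_stable):
    """Probabilities of Stable, Moderately Stable, and the remaining category."""
    p1 = invlogit(est_stable)          # Pr(Stable)
    p2 = invlogit(est_mod_or_stable)   # Pr(<= Mod. Stable)
    return p1, p2 - p1, 1.0 - p2


# Estimates for Vision = open, taken from rows 3 and 6 of the listing.
vision_open = category_probs(-0.2364, 9.5763)
```

For Vision open this gives roughly 0.44 for Stable and 0.56 for the middle category, with almost no probability left for the remaining category.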


[Figure: Estimate against Vision (closed, dome, open), by Category: Stable and <= Mod. Stable.]


[Figure: Probability against Vision (closed, dome, open), by Category: Stable and <= Mod. Stable.]


[Figure: Estimate against Surface (foam, norm), by Category: Stable and <= Mod. Stable.]


[Figure: Probability against Surface (foam, norm), by Category: Stable and <= Mod. Stable.]

