
Stat 250 Gunderson Lecture Notes

Relationships between Categorical Variables
12: Chi-Square Analysis

Inference for Categorical Variables

Having now covered a lot of inference techniques for quantitative responses, we return to
analyzing categorical data, that is, analyzing count data. The three main tests described in the
text that we will cover are:

1. Goodness of Fit Test: this test is for assessing if a particular discrete model is a good fitting
   model for a discrete characteristic, based on a random sample from the population.
   E.g. Has the model for the method of transportation (drive, bike, walk, other) used
   by students to get to class changed from that for 5 years ago?

2. Test of Homogeneity: this test is for assessing if two or more populations are homogeneous
   (alike) with respect to the distribution of some discrete (categorical) variable.
   E.g. Is the distribution of opinion on legal gambling the same for adult males versus
   adult females?

3. Test of Independence: this test helps us to assess if two discrete (categorical) variables are
   independent for a population, or if there is an association between the two variables.
   E.g. Is there an association between satisfaction with the quality of public schools
   (not satisfied, somewhat satisfied, very satisfied) and political party
   (Republican, Democrat, etc.)?

The first test is the one-sample test for count data. The other two tests (homogeneity and
independence) are actually the same test. Although the hypotheses are stated differently and
the underlying assumptions about how the data are gathered are different, the steps for doing the
two tests are exactly the same.

All three tests are based on an $X^2$ test statistic that, if the corresponding H0 is true and the
assumptions hold, follows a chi-square distribution with some degrees of freedom, written
$\chi^2(df)$. So our first discussion is to learn about the chi-square distribution: what the
distribution looks like, some facts, and how to use Table A.5 to find various percentiles.


The Chi-Square Distribution

General Shape:
If we have a chi-square distribution with
df = degrees of freedom,
then the...

Mean is equal to df

Variance is equal to 2(df)

Standard deviation is equal to $\sqrt{2(df)}$

These facts will serve as a useful frame of
reference for making decisions.

Table A.5 provides some upper-tail percentiles for chi-square distributions.

From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

Try It!
Consider the $\chi^2(4)$ distribution.
a. What is the mean for this distribution? ___4___

b. What is the median for this distribution? ___3.36___

c. How likely would it be to get a value of 4 or even larger?
   Draw a picture to help show it. Area is between 0.25 and 0.50.

d. How likely would it be to get a value of 10.3 or even larger?
   Draw a picture to help show it. Area is between 0.025 and 0.05.

This is how bounds for a p-value will be found.
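
The table bounds in the Try It! above can be cross-checked numerically. The following is a minimal Python sketch, not part of the original notes. It assumes the standard closed form for the upper-tail area of a chi-square distribution with an even number of degrees of freedom, and the function name `chi2_upper_tail_even_df` is made up for illustration.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Return P(chi-square(df) >= x) for a positive even df.

    For even df the upper-tail area has the closed form
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    partial_sum = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * partial_sum


df = 4
mean = df                      # the mean of a chi-square distribution is df
sd = math.sqrt(2 * df)         # the standard deviation is sqrt(2 * df)
area_above_4 = chi2_upper_tail_even_df(4.0, df)
area_above_10_3 = chi2_upper_tail_even_df(10.3, df)

print(f"mean = {mean}, sd = {sd:.3f}")
print(f"P(X2 >= 4)    = {area_above_4:.3f}")     # falls between 0.25 and 0.50
print(f"P(X2 >= 10.3) = {area_above_10_3:.3f}")  # falls between 0.025 and 0.05
```

The closed form only covers even df; for odd df a table such as Table A.5 or statistical software is still needed.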


The BIG IDEA

The data consist of observed counts.
We compute expected counts under the H0; these counts are what we would expect (on
average) if the corresponding H0 were true.
Compare the observed and expected counts using the $X^2$ test statistic. The statistic will
be a measure of how close the observed counts are to the expected counts under H0. If
this distance is large, we have support for the alternative Ha.

With this in mind, we turn to our first chi-square test of goodness of fit. We will derive the
methodology for the test through an example. An overall summary of the test will be presented
at the end.

Test of Goodness of Fit: Helps us assess if a particular discrete model is a good fitting
model for a discrete characteristic, based on a random sample from the population.

Goodness of Fit Test

Scenario: We have one population of interest, say all cars exiting a toll road that has four booths
at the exit.

Question: Are the four booths used equally often?

Data: One random sample of 100 cars; we record one variable X, which booth was used (1, 2, 3,
4). The table below summarizes the data in terms of the observed counts.

                   Booth 1   Booth 2   Booth 3   Booth 4
Observed # cars       26        20        28        26

Note: This is only a one-way frequency table, not a two-way table as will be in the homogeneity
and independence tests. We use the notation k = the number of categories or cells, here $k = 4$.

The null hypothesis:

Let $p_i$ = (population) proportion of cars using booth $i$

H0: $p_1 = 0.25$, $p_2 = 0.25$, $p_3 = 0.25$, $p_4 = 0.25$.

Ha: ___not all probabilities specified in H0 are correct___

The null hypothesis specifies a particular discrete model (mass function) by listing the proportions
(or probabilities) for each of the k outcome categories.

The one-way table provides the OBSERVED counts. Our next step is to compute the EXPECTED
counts, under the assumption that H0 is true.


How to find the expected counts?
There were 100 cars in the sample and 4 booths.

If the booths are used equally often (H0 is true), then we would expect

... 25 cars to use Booth #1     (How did you get the 25? 25% of 100, that is, $np$.)

... 25 cars to use Booth #2

... 25 cars to use Booth #3

... 25 cars to use Booth #4

Expected Counts: $E_i = n p_i$
Enter these expected counts in the parentheses in the table below.

Observed Counts (Expected Counts)

                  Booth 1    Booth 2    Booth 3    Booth 4
Number of cars    26 (25)    20 (25)    28 (25)    26 (25)

The $X^2$ test statistic
Next we need our test statistic, our measure of how close the observed counts are to what we
expect under the null hypothesis.

$$X^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E} = \frac{(26 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(28 - 25)^2}{25} + \frac{(26 - 25)^2}{25} = \frac{1 + 25 + 9 + 1}{25} = \frac{36}{25} = 1.44$$

Do you think a value of $X^2 = 1.44$ is large enough to reject H0?

Let's find the p-value, the probability of getting an $X^2$ test statistic value as large or larger than
the one we observed, assuming H0 is true. To do this we need to know the distribution of the $X^2$
test statistic under the null hypothesis.

If H0 is true, then $X^2$ has the $\chi^2$ distribution with degrees of freedom = $k - 1$.
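
The arithmetic for the toll booth statistic can be reproduced in a few lines. This Python sketch is an illustration only (the variable names are assumptions, not from the notes). It builds the expected counts as n times each null proportion and sums the (O - E)^2 / E contributions.

```python
# Toll booth example: observed counts for Booths 1-4 (from the notes).
observed = [26, 20, 28, 26]
null_probs = [0.25, 0.25, 0.25, 0.25]   # proportions specified by H0

n = sum(observed)                        # sample size, 100 cars
expected = [n * p for p in null_probs]   # E_i = n * p_i, each 25.0

# X^2 = sum over all cells of (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(contributions)
df = len(observed) - 1                   # k - 1 degrees of freedom

print("expected counts:", expected)
print("contributions:  ", contributions)
print(f"X^2 = {x2:.2f} with df = {df}")
```

Keeping the per-cell contributions in a list makes it easy to see that Booth 2 accounts for most of the statistic.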


Find the p-value for our toll booth example:

Observed $X^2$ test statistic value = 1.44

df = 4 - 1 = 3.

Sketch the distribution to find bounds: the p-value is > 0.50.

Are the results statistically significant at the 5% significance level? NO

Conclusion at a 5% level: It appears that the 4 booths are used equally often for the population
of all cars represented by our sample.

Aside: Using our frame of reference for chi-square distributions.
Recall that if we have a chi-square distribution with df degrees of freedom, then the mean is
equal to df, and the standard deviation is equal to $\sqrt{2(df)}$.

So, if H0 were true, we would expect the $X^2$ test statistic to be about 3,
give or take about $\sqrt{2 \cdot 3} \approx 2.45$.

Since we reject H0 for large values of $X^2$, and we only observed a value of 1.44,
even less than expected under H0, we certainly do not have enough evidence to reject H0.
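
The table only gives bounds for the p-value, so it can help to see the actual tail area. Below is a hedged Python sketch, not part of the original notes, of the upper-tail probability for any positive integer df. It assumes the standard finite-sum formulas for the chi-square tail, and the helper name `chi2_upper_tail` is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """Return P(chi-square(df) >= x) for a positive integer df and x >= 0."""
    if df < 1 or x < 0:
        raise ValueError("need integer df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_j (x/2)^(j + 1/2) / Gamma(j + 3/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for j in range((df - 1) // 2):
        total += term
        term *= half / (j + 1.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


x2, df = 1.44, 3
p_value = chi2_upper_tail(x2, df)
sd = math.sqrt(2 * df)
distance_in_sd = (x2 - df) / sd   # negative: the statistic is below its H0 mean

print(f"p-value = {p_value:.3f}")                  # larger than 0.50
print(f"(X^2 - df) / sd = {distance_in_sd:.2f}")
```

The negative standardized distance matches the aside: the observed statistic sits below what H0 predicts on average.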

Goodness of Fit Test Summary

Assume: We have 1 random sample of size $n$.

We measure one discrete response X that has $k$ possible outcomes.

Test: H0: A specified discrete model for X: $p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_k = p_{k0}$

Ha: The probabilities are not as specified in the null hypothesis.

Test Statistic: $X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, where expected $= E_i = n p_{i0}$

If H0 is true, then $X^2$ has a $\chi^2$ distribution with $(k - 1)$ degrees of freedom, where $k$ is the
number of categories. The necessary conditions are: at least 80% of the expected counts are
greater than 5 and none are less than 1. Be aware of the sample size (pg 656).
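
The condition on the expected counts is easy to check mechanically. The following Python sketch is only an illustration of the rule of thumb stated above; the function name `expected_counts_ok` and the small-sample example are assumptions, not from the notes.

```python
def expected_counts_ok(expected):
    """Check the rule of thumb from the notes for chi-square tests.

    Returns True when at least 80% of the expected counts are greater
    than 5 and none of them is less than 1.
    """
    if not expected:
        raise ValueError("need at least one expected count")
    share_above_5 = sum(1 for e in expected if e > 5) / len(expected)
    return share_above_5 >= 0.80 and min(expected) >= 1


# Toll booth example from the notes: n = 100, four equally likely booths.
booth_expected = [100 * 0.25] * 4
print(expected_counts_ok(booth_expected))     # True: every expected count is 25

# Hypothetical small sample (not from the notes): n = 12 over the same model.
small_expected = [12 * 0.25] * 4
print(expected_counts_ok(small_expected))     # False: no expected count exceeds 5
```

Note that the check uses the expected counts, not the observed counts.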


Try It! Crossbreeding Peas
For a genetics experiment in the crossbreeding of peas, Mendel obtained the following data in a
sample from the second generation of seeds resulting from crossing yellow round peas and green
wrinkled peas. n = 556

            Yellow Round   Yellow Wrinkled   Green Round   Green Wrinkled
Observed        315              101             108              32
Expected       312.75          104.25          104.25            34.75

Expected counts: 556(9/16) = 312.75, etc.
Do these data support the theory that these four types should occur with probabilities 9/16, 3/16,
3/16, and 1/16 respectively? Use $\alpha = 0.01$.

H0: $p_1 = 9/16$, $p_2 = 3/16$, $p_3 = 3/16$, $p_4 = 1/16$.

$$X^2 = \frac{(315 - 312.75)^2}{312.75} + \frac{(101 - 104.25)^2}{104.25} + \frac{(108 - 104.25)^2}{104.25} + \frac{(32 - 34.75)^2}{34.75} = 0.47$$

With df = 4 - 1 = 3, the p-value is > 0.50, so we cannot reject the null hypothesis. The data do not
refute the theory. In fact, the results look almost too good. Mendel had a fictitious assistant;
perhaps fictitious data too? Or did the assumptions not hold? Or did we just observe a very unusual
sample?

Try It! Desired Vacation Place
The AAA travel agency would like to assess if the distribution of desired vacation place has
changed from the model of 3 years ago. A random sample of 928 adults was polled by the polling
company Ipsos during this past mid-May. One question asked was "Name the one place you
would want to go for vacation if you had the time and the money." The table displays the model
for the distribution of desired vacation place 3 years ago and the observed results based on the
recent poll, with expected counts in parentheses.

                        1 = Hawaii   2 = Europe    3 = Caribbean   4 = Other     Totals
Model 3 years ago          10%          40%            20%            30%         100%
Obs Counts from poll    124 (92.8)   390 (371.2)   125 (185.6)    289 (278.4)     928

a. Give the null hypothesis to test if there has been a significant change in the distribution of
   desired vacation place from 3 years ago.

   H0: $p_1 = 0.10$, $p_2 = 0.40$, $p_3 = 0.20$, $p_4 = 0.30$

b. The observed test statistic is about 31.6 and the corresponding p-value is less than 0.001.
   Interpret this p-value in terms of repeated random samples of 928 adults.
   If repeated random samples of n = 928 adults were obtained and if the distribution of
   desired vacation place has not changed, we would expect to see an $X^2$ statistic of about 31.6
   or larger in less than 0.1% of the repetitions.
   Note: the phrase "if the distribution of desired vacation place has not changed" is saying
   "if the null hypothesis were true."
   The biggest discrepancies between the observed counts and expected counts under the null
   were for the Caribbean (fewer than expected) and then Hawaii (more than expected).
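
To see which categories drive the statistic, the per-category contributions can be listed individually. This Python sketch is illustrative and not part of the original notes; it recomputes the expected counts as n times the model proportions and ranks the (O - E)^2 / E terms. The variable names are assumptions.

```python
# Desired vacation place: model from 3 years ago and counts from the recent poll.
places = ["Hawaii", "Europe", "Caribbean", "Other"]
model = [0.10, 0.40, 0.20, 0.30]        # H0 proportions
observed = [124, 390, 125, 289]

n = sum(observed)                        # 928 adults
expected = [n * p for p in model]        # E_i = n * p_i0

# Contribution of each category to X^2, largest first.
contributions = {
    place: (o - e) ** 2 / e
    for place, o, e in zip(places, observed, expected)
}
ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
x2 = sum(contributions.values())

for place, value in ranked:
    print(f"{place:<10} {value:6.2f}")
print(f"X^2 = {x2:.2f} with df = {len(places) - 1}")
```

Ranking the contributions is a common follow-up to a significant result, since the overall statistic alone does not say where the model fails.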


Test of Homogeneity: Helps us to assess if the distribution for one discrete (categorical)
variable is the same for two or more populations.

Test of Homogeneity

Scenario: We have 2 populations of interest: preschool boys and preschool girls.
Question: Is Ice Cream Preference the same for boys and girls?

Data: 1 random sample of 75 preschool boys,
      1 random sample of 75 preschool girls;
      the two random samples are independent.
The table below summarizes the data in terms of the observed counts.

Observed Counts:
Ice Cream Preference    Boys    Girls
Vanilla (V)              25      26
Chocolate (C)            30      23
Strawberry (S)           20      26
Total                    75      75

Note: The column totals here were known in advance, even before the ice cream preferences
were measured. This is a key idea for how to distinguish between the test of homogeneity and
the test of independence.

The null hypothesis:
H0: The distribution of ice cream preference is the same
    for the two populations, boys and girls.

A more mathematical way to write this null hypothesis is:
H0: $P(X = i \mid \text{population } j) = P(X = i)$ for all $i, j$,
where X is the categorical variable, in this case, ice cream preference.

As we can see, the null hypothesis is stating that the distribution of ice cream preference does
not depend on (is independent of) the population we select from, since the two distributions are
the same.

The null hypothesis looks like: $P(A \mid B) = P(A)$, which is one definition of independent events,
from our previous discussion of independence. This is why the test of homogeneity (comparing
several populations) is really the same as the test of independence. The assumptions are different,
however.

For our homogeneity (comparing several populations) test, we assume we have independent
random samples, one from each population, and we measure 1 discrete (categorical) response.
For the independence test (discussed later) we will assume we have just 1 random sample from
1 population, but we measure 2 discrete (categorical) responses.

Getting back to ICE CREAM ... The table provides the OBSERVED counts. Our next step is to
compute the EXPECTED counts, under the assumption that H0 is true.


How to find the expected counts?
Let's look at those who preferred Strawberry first.

Strawberry: Since there were 46 children who preferred Strawberry overall,
            if the distributions for boys and girls are the same (H0 is true),
            then we would expect 23 of these children to be boys
            and the remaining 23 of these children to be girls.

Note that our sample sizes were the same, 75 boys and 75 girls, 50% of each. If they were not
50-50, we would have to adjust the expected counts accordingly. Let's do the same for the Vanilla
and Chocolate preferences.

Chocolate:  Since there were 53 children who preferred Chocolate overall,
            if the distributions for boys and girls are the same (H0 is true),
            then we would expect 26.5 of these children to be boys
            and the remaining 26.5 of these children to be girls.

Vanilla:    Since there were 51 children who preferred Vanilla overall,
            if the distributions for boys and girls are the same (H0 is true),
            then we would expect 25.5 of these children to be boys
            and the remaining 25.5 of these children to be girls.

Enter these expected counts in the parentheses in the table below.

Observed Counts (Expected Counts)
Ice Cream Preference      Boys         Girls       Total
Vanilla (V)             25 (25.5)    26 (25.5)      51
Chocolate (C)           30 (26.5)    23 (26.5)      53
Strawberry (S)          20 (23)      26 (23)        46
Total                      75           75         150

A Closer Look at the Expected Counts:
Let's look at how we actually computed an expected count so we can develop a general rule: If
H0 were true (i.e., no difference in preferences for boys versus girls), then our best estimate of
P(a child prefers vanilla) = 51/150. Since we had 75 boys, under no difference in preference,
we would expect 75 x (51/150) of them to prefer vanilla. That is, the expected number of boys
preferring vanilla is

$$\frac{(75)(51)}{150} = \frac{(\text{row total})(\text{column total})}{\text{Total } n} = 25.5$$

This quick recipe for computing the expected counts under the null hypothesis is called the
Cross-Product Rule.
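
The Cross-Product Rule applies cell by cell, so it translates directly into a small helper. This is a minimal Python sketch under the assumption that the table is stored as a list of rows; the function name `expected_counts` is hypothetical and not from the notes.

```python
def expected_counts(table):
    """Expected counts for a two-way table via the Cross-Product Rule.

    `table` is a list of rows of observed counts; each expected count is
    (row total) * (column total) / (total n).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Ice cream example: rows are Vanilla, Chocolate, Strawberry; columns are Boys, Girls.
ice_cream = [
    [25, 26],
    [30, 23],
    [20, 26],
]
flavors = ["Vanilla", "Chocolate", "Strawberry"]
for flavor, row in zip(flavors, expected_counts(ice_cream)):
    print(flavor, row)
```

Because the rule uses only the margins, it handles unequal sample sizes without any extra adjustment.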


The $X^2$ test statistic
Next we need to compute our test statistic, our measure of how close the observed counts are
to what we expect under the null hypothesis. Below we are provided the first contribution to
the test statistic value. Determine the remaining contributions, which are summed to get the
value.

$$X^2 = \frac{(25 - 25.5)^2}{25.5} + \frac{(26 - 25.5)^2}{25.5} + \frac{(30 - 26.5)^2}{26.5} + \frac{(23 - 26.5)^2}{26.5} + \frac{(20 - 23)^2}{23} + \frac{(26 - 23)^2}{23} = 1.73$$

There are 6 cells in the table, so 6 terms to add up in the test statistic.

The larger the test statistic, the bigger the differences between what we observed and what
we would expect to see if H0 were true. So the larger the test statistic, the more evidence we
have against the null hypothesis.

Is a value of $X^2 = 1.73$ large enough to reject H0?

We need to find the p-value, the probability of getting an $X^2$ test statistic value as large or larger
than the one we observed, assuming H0 is true. To do this we need to know the distribution of
the $X^2$ test statistic under the null hypothesis.

If H0 is true, then $X^2$ has the $\chi^2$ distribution with degrees of freedom = $(r - 1)(c - 1)$.

Brief motivation for the degrees of freedom formula:
If you knew that 50% were boys, you would know there were 50% girls: $(c - 1)$.
If you knew, say, that 70% liked chocolate or vanilla, you would know 30% liked strawberry: $(r - 1)$.

Find the p-value for our ice cream example:
Observed $X^2$ test statistic value = 1.73, df = (3 - 1)(2 - 1) = 2

Decision at a 5% significance level: (circle one)
Reject H0
Fail to reject H0
Make a sketch and find the bounds for the p-value: 0.25 < p-value < 0.50.

Conclusion: It appears that....
The distribution of ice cream preference is the same for
the populations of boys and girls represented by these
samples.
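
For df = 2 the upper-tail area of the chi-square distribution has a simple closed form, which gives a quick check on the table bounds. This is a supplementary calculation, not part of the original notes:

```latex
P\left(\chi^2(2) \ge x\right) = e^{-x/2}
\quad\Longrightarrow\quad
p\text{-value} = e^{-1.73/2} = e^{-0.865} \approx 0.42
```

The value 0.42 lies between 0.25 and 0.50, in agreement with the bounds read from the table.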


Test of Homogeneity Summary (Comparison of Several Populations)

Assume: We have $c$ independent random samples of sizes $n_1, n_2, \ldots, n_c$
        from $c$ populations.
        We measure 1 discrete response X that has $r$ possible outcomes.

Test:
H0: The distribution for the response variable X is the same for all populations.

Test Statistic: $X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, where expected $= \frac{(\text{row total})(\text{column total})}{\text{Total } n}$

If H0 is true, then $X^2$ has a $\chi^2$ distribution with $(r - 1)(c - 1)$ degrees of freedom. The necessary
conditions are: at least 80% of the expected counts are greater than 5 and none are less than 1.

Try It! What is your Decision?
For a chi-square test of homogeneity, there are 3 populations and 4 possible values of the
discrete characteristic.

If H0 is true, that is, the distribution for the response is the same for all 3 populations, what is the
expected value of the test statistic?

The test statistic is $X^2$.
If H0 is true, the test statistic will have a chi-square distribution
with (3 - 1)(4 - 1) = 6 degrees of freedom.
So if H0 is true, we would expect the test statistic to be about 6.


Try It! Treatment for Shingles
An article had the headline "For adults, chicken pox vaccine may stop shingles." A clinical trial
was conducted in which 420 subjects were randomly assigned to receive the chicken pox vaccine
or a placebo vaccine. Some side effects of interest were swelling and rash around the injection
site. Consider the following results for the swelling side effect.

           Major Swelling   Minor Swelling   No Swelling
Vaccine          54               42             134
Placebo          16               32             142

        Pearson's Chi-squared test

data:  .Table
X-squared = 18.5707, df = 2, p-value = 9.277e-05

a. Give the name of the test to be used for assessing if the distribution of swelling status is the
   same for the two treatment populations.
   Chi-square test of homogeneity

b. Based on the above data, among those chicken pox vaccinated subjects, what percent had
   major swelling around the injection site?

   54/230 = 0.2348, or about 23.5%

c. Based on the above data, among those placebo vaccinated subjects, what percent had major
   swelling around the injection site?

   16/190 = 0.0842, or about 8.4%

d. Assuming the distribution of swelling status is the same for the two treatment populations,
   how many chicken pox vaccinated subjects would you expect to have major swelling around
   the injection site? Show your work.

   (230 x 70)/420 = 38.33

e. Compute the contribution to the chi-square test statistic based on those vaccinated subjects
   who had major swelling around the injection site.

   $(54 - 38.33)^2 / 38.33 = 6.406$

f. Use a level of 0.05 to assess if the distribution of swelling status is the same for the two
   treatment populations.
   Test Statistic Value: __18.571__   p-value: __0.00009__
   Thus, the distribution of swelling status (circle your answer):   does   does not
   appear to be the same for the two treatment populations.
   (Answer: does not, since the p-value is below 0.05.)
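
The software output for the swelling data can be reproduced by hand-rolled code, which also exposes the intermediate expected counts used in parts (d) and (e). This is a hedged Python sketch, not the method used to produce the output in the notes; the function name `chi_square_two_way` is hypothetical, and the p-value line relies on the closed form that holds only for df = 2.

```python
import math


def chi_square_two_way(table):
    """Return (X^2, df, expected counts) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected


# Rows: Vaccine, Placebo. Columns: Major, Minor, No swelling.
swelling = [
    [54, 42, 134],
    [16, 32, 142],
]
x2, df, expected = chi_square_two_way(swelling)

# With df = 2 the upper-tail area is exp(-x/2); other df need a table or software.
p_value = math.exp(-x2 / 2)

print(f"expected Vaccine/Major count = {expected[0][0]:.2f}")
print(f"X-squared = {x2:.4f}, df = {df}, p-value = {p_value:.3e}")
```

The same function works for the test of independence, since the two tests share the expected counts and the statistic.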


Test of Independence: Helps us to assess if two discrete (categorical) variables are
independent for a population, or if there is an association between the two variables.

Test of Independence

Scenario: We have one population of interest, say factory workers.

Question: Is there a relationship between smoking habits and whether or not a factory worker
experiences hypertension?

Data: 1 random sample of 180 factory workers; we measure the two variables:

      Y = hypertension status (yes or no)
      X = smoking habit (non, moderate, heavy)

The table below summarizes the data in terms of the observed counts.
Observed Counts:
                               X = Smoking Habit
Y = Hypertension Status     Non    Mod    Heavy    Total
Yes                          21     36      30       87
No                           48     26      19       93
Total                        69     62      49      180

Get the row and column totals.
Note: neither row nor column totals were known in advance before measuring hypertension and
smoking habit. We only know the overall total of 180.

The null hypothesis:
H0: There is no association between smoking habit and hypertension status
    for the population of factory workers.
    (or The two factors, smoking habit and hypertension status, are independent for the
    population.)

One more mathematical way to write this null hypothesis is:
H0: $P(X = i \text{ and } Y = j) = P(X = i)\,P(Y = j)$

The null hypothesis looks like: $P(A \text{ and } B) = P(A)\,P(B)$, which is one definition of independent
events, from our previous discussion of independence.


Getting back to our FACTORY WORKERS
The two-way table provides the OBSERVED counts. Our next step is to compute the EXPECTED
counts, under the assumption that H0 is true. The expected counts and the test statistic are
found the same way as for the homogeneity test.

Cross-Product Rule: Expected Counts $= \frac{(\text{row total})(\text{column total})}{\text{Total } n}$

Compute and enter these expected counts in the parentheses in the table below.
Observed Counts (Expected Counts):
                                     X = Smoking Habit
Y = Hypertension Status      Non           Mod          Heavy       Total
Yes                       21 (33.35)    36 (29.97)    30 (23.68)      87
No                        48 (35.65)    26 (32.03)    19 (25.32)      93
Total                         69            62            49         180

The $X^2$ test statistic
Our measure of how close the observed counts are to what we expect under the null hypothesis.

$$X^2 = \frac{(21 - 33.35)^2}{33.35} + \frac{(36 - 29.97)^2}{29.97} + \frac{(30 - 23.68)^2}{23.68} + \frac{(48 - 35.65)^2}{35.65} + \frac{(26 - 32.03)^2}{32.03} + \frac{(19 - 25.32)^2}{25.32} \approx 14.5$$

Do you think a value of $X^2 = 14.5$ is large enough to reject H0?

The next step is to find the p-value, the probability of getting an $X^2$ test statistic value as large
or larger than the one we observed, assuming H0 is true. To do this we need to know the
distribution of the $X^2$ test statistic under the null hypothesis.
If H0 is true, then $X^2$ has the $\chi^2$ distribution with degrees of freedom = $(r - 1)(c - 1)$,
here (2 - 1)(3 - 1) = 2.

Aside: Using our frame of reference for chi-square distributions.
If H0 were true, we would expect the $X^2$ test statistic to be about 2,
give or take about $\sqrt{2 \cdot 2} = 2$.
About how many standard deviations is the observed $X^2$ value of 14.5 from the expected value
under H0? What do you think the decision will be?
(14.5 - 2)/2 = 6.25, about 6 standard deviations above the expected value under H0.


Find the p-value for our factory worker example:

Observed $X^2$ test statistic value = 14.5, df = 2

Find the p-value and use it to determine if the results are statistically significant at the 1%
significance level.
Sketch the distribution to show the bounds are:
p-value < 0.001
So the results are statistically significant at the
1% level.

Conclusion at a 1% level: It appears that....
there is an association between smoking and
hypertension for the population of factory
workers represented by this sample.
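
The frame-of-reference reasoning and the p-value bound for the factory worker data can be checked together. The Python sketch below is illustrative only and not part of the original notes; the variable names are assumptions, and the p-value line uses the closed form that is valid only because df = 2 here.

```python
import math

# Rows: hypertension Yes, No. Columns: Non, Mod, Heavy smokers.
observed = [
    [21, 36, 30],
    [48, 26, 19],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Accumulate (O - E)^2 / E with E from the Cross-Product Rule.
x2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        x2 += (o - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)
sd = math.sqrt(2 * df)              # standard deviation of the statistic under H0
sds_above_mean = (x2 - df) / sd     # distance from the H0 mean, in standard deviations
p_value = math.exp(-x2 / 2)         # closed form valid only because df = 2

print(f"X^2 = {x2:.2f}, df = {df}")
print(f"standard deviations above the H0 mean: {sds_above_mean:.2f}")
print(f"p-value = {p_value:.5f}")   # below 0.001
```

Working from unrounded expected counts gives a statistic slightly below 14.5, which does not change the conclusion.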

Test of Independence Summary

Assume: We have 1 random sample of size $n$.
        We measure 2 discrete responses:
        X, which has $r$ possible outcomes,
        and Y, which has $c$ possible outcomes.

Test: H0: The two variables X and Y are independent for the population.

Test Statistic: $X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, where expected $= \frac{(\text{row total})(\text{column total})}{\text{Total } n}$

If H0 is true, then $X^2$ has a $\chi^2$ distribution with $(r - 1)(c - 1)$ degrees of freedom. The necessary
conditions are: at least 80% of the expected counts are greater than 5 and none are less than 1.


Relationship between Age Group and Appearance Satisfaction
Are you satisfied with your overall appearance? A random sample of 150 women was surveyed.
Their answer to this question (very, somewhat, not) was recorded along with their age category
(1 = under 30, 2 = 30 to 50, and 3 = over 50). R was used to generate the following output from
the data.

                     Under 30   30 to 50   Over 50
Very Satisfied          20         10         16
Somewhat Satisfied      18         20         18
Not Satisfied           10         29          9

        Pearson's Chi-squared test

data:  .Table
X-squared = 15.478, df = 4, p-value = 0.003805

a. Give the name of the test to be used for assessing if there is a relationship between age group
   and appearance satisfaction.

   __Chi-square test of independence__

b. Assuming there is no relationship between age group and appearance satisfaction, how many
   older women (over 50) would you expect to be very satisfied with their appearance?

   (46)(43)/150 = 13.19

c. Compute the contribution to the chi-square test statistic based on the older women (over
   50) who were very satisfied with their appearance.

   $(16 - 13.19)^2 / 13.19 = 0.599$

d. Assuming there is no relationship between age group and appearance satisfaction, what is
   the expected value of the test statistic?

   The expected value is (3 - 1)(3 - 1) = (2)(2) = 4 = degrees of freedom.

e. Use a level of 0.05 to assess if there is a significant relationship between age group and
   appearance satisfaction.

   Test Statistic Value: __15.478__   p-value: __0.003805__

   Thus, there (circle your answer):   does   does not
   appear to be an association between age group and appearance satisfaction.
   (Answer: does, since the p-value is below 0.05.)


2x2 Tables: a special case of the two proportion z-test

The z-test for comparing two population proportions is the same as the chi-square test,
provided the alternative is two-sided. The z-test would need to be performed for one-sided
alternatives.
When the conditions for the z-test or chi-square test are not met (sample sizes too small),
there is another alternative test called Fisher's Exact Test.
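
The equivalence for 2x2 tables can be seen numerically: the square of the pooled two-proportion z statistic equals the chi-square statistic, and the two-sided z p-value equals the chi-square p-value with df = 1. The Python sketch below uses hypothetical counts (not from the notes) and assumes the pooled standard error that is used when testing H0: p1 = p2.

```python
import math

# Hypothetical 2x2 data (not from the notes): successes out of n in two groups.
successes = [30, 45]
sizes = [100, 100]

# Pooled two-proportion z statistic for H0: p1 = p2.
p1, p2 = successes[0] / sizes[0], successes[1] / sizes[1]
pooled = sum(successes) / sum(sizes)
std_error = math.sqrt(pooled * (1 - pooled) * (1 / sizes[0] + 1 / sizes[1]))
z = (p1 - p2) / std_error

# Chi-square statistic for the same data laid out as a 2x2 table
# (rows are the groups, columns are success and failure).
table = [[s, n - s] for s, n in zip(successes, sizes)]
col_totals = [sum(col) for col in zip(*table)]
total = sum(sizes)
chi_sq = sum(
    (o - r * c / total) ** 2 / (r * c / total)
    for row, r in zip(table, sizes)
    for o, c in zip(row, col_totals)
)

# Two-sided z p-value and chi-square (df = 1) p-value, both via the normal tail.
p_two_sided_z = math.erfc(abs(z) / math.sqrt(2))
p_chi_sq = math.erfc(math.sqrt(chi_sq / 2))

print(f"z = {z:.4f}, z^2 = {z * z:.4f}, X^2 = {chi_sq:.4f}")
print(f"two-sided z p-value = {p_two_sided_z:.4f}, chi-square p-value = {p_chi_sq:.4f}")
```

Because squaring z discards its sign, the chi-square statistic cannot distinguish the direction of the difference, which is why one-sided alternatives call for the z-test.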

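Stat 250 Formula Card:

Chi-Square Tests

Test of Independence & Test of Homogeneity
   Expected Count: $E = \text{expected} = \frac{\text{row total} \times \text{column total}}{\text{total } n}$
   Test Statistic: $X^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
   df = $(r - 1)(c - 1)$

Test for Goodness of Fit
   Expected Count: $E_i = \text{expected} = n p_{i0}$
   Test Statistic: $X^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
   df = $k - 1$

If Y follows a $\chi^2(df)$ distribution, then E(Y) = df and Var(Y) = 2(df).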


Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write
out an extra problem or summary completed in lecture, create your own summary about these concepts.

