
Introduction

Learning objectives for this lesson
Upon completion of this lesson, you should be able to:
distinguish between discrete and continuous random variables
explain the difference between population, parameter, sample, and statistic
determine if a given value represents a population parameter or sample statistic
find probabilities associated with a discrete probability distribution
compute the mean and variance of a discrete probability distribution
find probabilities associated with a binomial distribution
find probabilities associated with a normal probability distribution using the standard normal table
determine the standard error for the sample proportion and sample mean
apply the Central Limit Theorem properly to a set of continuous data

Random Variables
A random variable is a numerical characteristic of each event in a sample space, or equivalently, of each individual in a population.
Examples:
The number of heads in four flips of a coin (a numerical property of each different sequence of flips).
Heights of individuals in a large population.
Random variables are classified into two broad types:
A discrete random variable has a countable set of distinct possible values.
A continuous random variable is such that any value (to any number of decimal places) within some interval is a possible value.
Examples of discrete random variables:
Number of heads in 4 flips of a coin (possible outcomes are 0, 1, 2, 3, 4).
Number of classes missed last week (possible outcomes are 0, 1, 2, 3, ..., up to some maximum number).
Amount won or lost when betting $1 on the Pennsylvania Daily Number lottery.
Examples of continuous random variables:
Heights of individuals
Time to finish a test
Hours spent exercising last week
Note: In practice, we don't measure accurately enough to truly see all possible values of a continuous random variable. For instance, in reality somebody may have exercised 4.2341567 hours last week, but they probably would round off to 4. Nevertheless, hours of exercise last week is inherently a continuous random variable.

Probability Distributions: Discrete Random Variables
For a discrete random variable, its probability distribution (also called the probability distribution function) is any table, graph, or formula that gives each possible value and the probability of that value. Note: The total of all probabilities across the distribution must be 1, and each individual probability must be between 0 and 1, inclusive.
Examples:
(1) Probability Distribution for Number of Heads in 4 Flips of a Coin

Heads          0      1      2      3      4
Probability    1/16   4/16   6/16   4/16   1/16

This could be found by listing all 16 possible sequences of heads and tails for four flips, and then counting how many sequences there are for each possible number of heads.
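The listing-and-counting procedure just described can be sketched in a few lines of Python. This is only an illustration, not part of the lesson's Minitab workflow; the function name `heads_distribution` is hypothetical. It assumes a fair coin, so all 2^4 = 16 sequences are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(flips=4):
    """Return {number_of_heads: probability} by listing every H/T sequence (fair coin assumed)."""
    sequences = list(product("HT", repeat=flips))           # all 2**flips equally likely sequences
    counts = Counter(seq.count("H") for seq in sequences)   # how many sequences give each head count
    total = len(sequences)
    return {k: Fraction(counts[k], total) for k in range(flips + 1)}

dist = heads_distribution(4)
for heads, prob in dist.items():
    print(heads, prob)
```

Python's `Fraction` prints fractions in lowest terms, so 4/16 appears as 1/4 and 6/16 as 3/8; the values are the same as in the table.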
(2) Probability Distribution for Number of Tattoos Each Student Has in a Population of Students

Tattoos        0       1       2       3       4
Probability    0.850   0.120   0.015   0.010   0.005

This could be found by doing a census of a large student population.
Cumulative Probabilities
Often, we wish to know the probability that a variable is less than or equal to some value. This is called the cumulative probability because, to find the answer, we simply add probabilities for all values qualifying as "less than or equal" to the specified value.
Example: Suppose we want to know the probability that the number of heads in four flips is 1 or less. The qualifying values are 0 and 1, so we add probabilities for those two possibilities.
P(number of heads ≤ 1) = P(number of heads = 0) + P(number of heads = 1) = (1/16) + (4/16) = 5/16
The cumulative distribution is a listing of all possible values along with the cumulative probability for each value.
Examples:
(1) Probability Distribution and Cumulative Distribution for Number of Heads in 4 Flips

Heads                     0      1      2       3       4
Probability               1/16   4/16   6/16    4/16    1/16
Cumulative Probability    1/16   5/16   11/16   15/16   16/16

Each cumulative probability was found by adding probabilities (in the second row) up to the particular column of the table. As an example, for 2 heads, we add probabilities for 0, 1, and 2 heads to get 11/16. This is the probability that the number of heads is two or less.
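The cumulative row of the table is just a running total of the probability row. A minimal Python sketch of that running total, using the heads probabilities from the table (exact fractions are used so the results match the table exactly):

```python
from fractions import Fraction
from itertools import accumulate

heads = [0, 1, 2, 3, 4]
probs = [Fraction(1, 16), Fraction(4, 16), Fraction(6, 16), Fraction(4, 16), Fraction(1, 16)]

# Running totals give the cumulative probability P(X <= x) for each x
cumulative = list(accumulate(probs))

for x, p, c in zip(heads, probs, cumulative):
    print(f"{x} heads: P(X = {x}) = {p}, P(X <= {x}) = {c}")
```

The last cumulative value is always 1, which is a quick check that the probability row is a valid distribution.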
(2) Probability Distribution and Cumulative Distribution for Number of Tattoos Each Student Has in a Population of Students

Tattoos                   0       1       2       3       4
Probability               0.850   0.120   0.015   0.010   0.005
Cumulative Probability    0.850   0.970   0.985   0.995   1.000

As an example, the probability that a randomly selected student has 2 or fewer tattoos = 0.985 (calculated as 0.850 + 0.120 + 0.015).

Mean, also called Expected Value, of a Discrete Variable
The phrase expected value is a synonym for mean value in the long run (meaning for many repeats or a large sample size). For a discrete random variable, the calculation is Sum of (value × probability), where we sum over all values (after separately calculating value × probability for each value), expressed as:
$E(X) = \sum x \, P(X = x)$,
meaning we take each observed X value and multiply it by its respective probability. We then add these products to reach our expected value, labeled E(X). [NOTE: the letter X is a common symbol used to represent a random variable. Any letter can be used.]
Example: A fair six-sided die is tossed. You win $2 if the result is a 1, you win $1 if the result is a 6, but otherwise you lose $1.
The probability distribution for X = amount won or lost is

X              +2     +1     −1
Probability    1/6    1/6    4/6

Expected Value = (2 × 1/6) + (1 × 1/6) + (−1 × 4/6) = −1/6 ≈ −$0.17.

The interpretation is that if you play many times, the average outcome is losing 17 cents per play.
Example: Using the probability distribution for number of tattoos given above (not the cumulative!), the mean number of tattoos per student is
Expected Value = (0 × 0.85) + (1 × 0.12) + (2 × 0.015) + (3 × 0.010) + (4 × 0.005) = 0.20.
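Both expected-value calculations above follow the same "sum of value × probability" recipe, so one small helper covers them. This is a sketch, not part of the lesson's tools; the helper name `expected_value` and the dictionary representation of a distribution are assumptions made for illustration.

```python
from fractions import Fraction

def expected_value(dist):
    """E(X) = sum of value * probability over all values in {value: probability}."""
    return sum(x * p for x, p in dist.items())

# Die game: win $2 on a 1, win $1 on a 6, otherwise lose $1
die_game = {2: Fraction(1, 6), 1: Fraction(1, 6), -1: Fraction(4, 6)}
# Tattoo distribution from the table above
tattoos = {0: 0.850, 1: 0.120, 2: 0.015, 3: 0.010, 4: 0.005}

print(expected_value(die_game))            # -1/6, i.e. about -$0.17 per play
print(round(expected_value(tattoos), 2))   # 0.2
```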
Standard Deviation of a Discrete Variable
Knowing the expected value is not the only important characteristic one may want to know about a set of discrete numbers: one may also need to know the spread, or variability, of these data. For instance, you may "expect" to win $20 when playing a particular game (which appears good!), but the spread for this might be from losing $20 to winning $60. Knowing such information can influence your decision on whether to play.
To calculate the standard deviation we first must calculate the variance. From the variance, we take the square root, and this provides us the standard deviation. Your book provides the following formula for calculating the variance:
$\sigma^2 = \sum (x - \mu)^2 \, P(X = x)$
and the standard deviation is:
$\sigma = \sqrt{\sum (x - \mu)^2 \, P(X = x)}$
In this expression we substitute our result for E(X) into μ, and μ is simply the symbol used to represent the mean of some population.
However, an easier formula to use and remember for calculating the variance is the following:
$\sigma^2 = \sum x^2 \, P(X = x) - \mu^2$
and again we substitute E(X) for μ.
The standard deviation is then found by taking the square root of the variance. Notice in the summation part of this equation that we only square each observed X value and not the respective probability.
Example: Going back to the first example used above for expectation involving the die, we would calculate the standard deviation for this discrete distribution by first calculating the variance, with μ = E(X) = −1/6:
$\sigma^2 = (2^2)(1/6) + (1^2)(1/6) + ((-1)^2)(4/6) - (-1/6)^2 = 1.5 - 0.0278 = 1.472$
So the standard deviation would be the square root of 1.472, or 1.213.
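To see that the definition and the "easier" shortcut formula give the same variance, here is a small Python check on the die game. It is a sketch for illustration only; exact fractions are used so the two results can be compared exactly.

```python
import math
from fractions import Fraction

die_game = {2: Fraction(1, 6), 1: Fraction(1, 6), -1: Fraction(4, 6)}

mu = sum(x * p for x, p in die_game.items())  # E(X) = -1/6

# Definition: sum of (x - mu)^2 * P(x)
var_definition = sum((x - mu) ** 2 * p for x, p in die_game.items())
# Shortcut: sum of x^2 * P(x), minus mu^2 (only x is squared, not P(x))
var_shortcut = sum(x ** 2 * p for x, p in die_game.items()) - mu ** 2

sd = math.sqrt(float(var_shortcut))
print(float(var_shortcut), sd)  # about 1.472 and 1.213
```

Both formulas give exactly 53/36 ≈ 1.472; the shortcut simply avoids subtracting μ from every value.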

Binomial Random Variable
This is a specific type of discrete random variable. A binomial random variable counts how often a particular event occurs in a fixed number of tries. For a variable to be a binomial random variable, these conditions must be met:
There are a fixed number of trials (a fixed sample size).
On each trial, the event of interest either occurs or does not.
The probability of occurrence (or not) is the same on each trial.
Trials are independent of one another.
Examples of binomial random variables:
Number of correct guesses at 30 true-false questions when you randomly guess all answers
Number of winning lottery tickets when you buy 10 tickets of the same kind
Number of left-handers in a randomly selected sample of 100 unrelated people
Notation
n = number of trials (sample size)
p = probability the event of interest occurs on any one trial
Example: For the guessing at true-false questions example above, n = 30 and p = .5 (chance of getting any one question right).
Probabilities for binomial random variables
The conditions for being a binomial variable lead to a somewhat complicated formula for finding the probability any specific value occurs (such as the probability you get 20 right when you guess at 20 true-false questions).
We'll use Minitab to find probabilities for binomial random variables. Don't worry about the by-hand formula. However, for those of you who are curious, the by-hand formula for the probability of getting a specific outcome in a binomial experiment is:
$P(X = x) = \frac{n!}{x!\,(n - x)!} \, p^x (1 - p)^{n - x}$
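The by-hand formula translates directly into code. The sketch below is an alternative to Minitab, not a reproduction of it; it assumes Python 3.8 or later for `math.comb`, which computes n!/(x!(n − x)!). The function name `binomial_pmf` is our own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) variable: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binomial_pmf(3, n=20, p=0.4), 4))  # 0.0123, the Minitab example that follows
```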

Evaluating the Binomial Distribution
One can use the formula to find the probability or, alternatively, use Minitab to find the probability. In the homework, you may use the one that you are more comfortable with unless specified otherwise.
Example Minitab: Using Minitab, find P(x) for n = 20, x = 3, and p = 0.4.
Calc > Probability Distributions > Binomial
Choose Probability since we want to find the probability x = 3. Choose Input constant and type in 3, since that is the value you want to evaluate the probability at. [NOTE: The following graphic is from Minitab Version 14. If using Version 15, Probability of Success has been renamed Event Probability.]

Minitab output:
Probability Density Function

Binomial with n = 20 and p = 0.4

x      P(X = x)
3.00   0.0123

In the following example, we illustrate how to use the formula to compute binomial probabilities. If you don't like to use the formula, you can also just use Minitab to find the probabilities.
Example by hand: Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring.
Find the probability that:
a. There will be no red-flowered plants in the five offspring.
X = # of red-flowered plants in the five offspring. Here, the number of red-flowered plants has a binomial distribution with n = 5, p = 0.25.
$P(X = 0) = \frac{5!}{0!\,5!}(0.25)^0(0.75)^5 = 1(0.25)^0(0.75)^5 = 0.237$
b. Cumulative Probability: There will be fewer than two red-flowered plants.
Answer:
$P(X \le 1) = P(X = 0) + P(X = 1) = \frac{5!}{0!\,5!}(0.25)^0(0.75)^5 + \frac{5!}{1!\,4!}(0.25)^1(0.75)^4 = 0.237 + 0.395 = 0.632$
In the previous example, part a was finding P(X = x) and part b was finding P(X ≤ x). This latter expression is called finding a cumulative probability because you are finding the probability that has accumulated from the minimum to some point, i.e. from 0 to 1 in this example.
To use Minitab to solve a cumulative probability binomial problem, return to Calc > Probability Distributions > Binomial as shown above. Now, however, select the radio button for Cumulative Probability and then enter the respective Number of Trials (i.e. 5) and Event Probability (i.e. 0.25), click the radio button for Input Constant, and enter the x value (i.e. 1).
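The cumulative probability in part b is just a sum of individual binomial probabilities from 0 up to x. Below is a minimal Python sketch of that sum; the helper names are our own, and it is not a description of how Minitab computes it internally.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial(n, p) variable."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add the individual probabilities from 0 up to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 5, 0.25
print(round(binomial_pmf(0, n, p), 3))  # 0.237
print(round(binomial_cdf(1, n, p), 3))  # 0.633
```

The unrounded sum is 0.6328, which rounds to 0.633; the 0.632 in the worked answer comes from adding the already-rounded values 0.237 and 0.395.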
Expected Value and Standard Deviation for Binomial Random Variable
The formula given earlier for discrete random variables could be used, but the good news is that for binomial random variables, the shortcut formulas for the expected value (the mean) and standard deviation are:

Expected Value = np     Standard Deviation = $\sqrt{np(1 - p)}$
After you use these formulas a couple of times, you'll realize they match your intuition. For instance, the expected number of correct (random) guesses at 30 true-false questions is np = (30)(.5) = 15 (half of the questions). For a fair six-sided die rolled 60 times, the expected value of the number of times a 1 is tossed is np = (60)(1/6) = 10. The standard deviations for these would be $\sqrt{(30)(0.5)(0.5)} = \sqrt{7.5} \approx 2.74$ for the true-false test and $\sqrt{(60)(1/6)(5/6)} = \sqrt{8.33} \approx 2.89$ for the die.
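A tiny Python helper applying the shortcut formulas to the two examples in the paragraph above. It is a sketch for checking arithmetic; `binomial_mean_sd` is a hypothetical name.

```python
import math

def binomial_mean_sd(n, p):
    """Shortcut formulas: mean = n*p, standard deviation = sqrt(n*p*(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))

print(binomial_mean_sd(30, 0.5))    # true-false guessing: mean 15, sd about 2.74
print(binomial_mean_sd(60, 1 / 6))  # number of 1s in 60 rolls: mean 10, sd about 2.89
```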

Probability Distributions: Continuous Random Variable
Density Curves
Previously we discussed discrete random variables, and now we consider the continuous type. A continuous random variable is such that all values (to any number of decimal places) within some interval are possible outcomes. A continuous random variable has an infinite number of possible values, so we can't assign a positive probability to each specific value. If we did, the total probability would be infinite, rather than 1, as it is supposed to be.
To describe probabilities for a continuous random variable, we use a probability density function. A probability density function is a curve such that the area under the curve within any interval of values along the horizontal axis gives the probability for that interval.

Normal Random Variables
The most commonly encountered type of continuous random variable is a normal random variable, which has a symmetric bell-shaped density function. The center point of the distribution is the mean value, denoted by μ (pronounced "mew"). The spread of the distribution is determined by the variance, denoted by σ² (pronounced "sigma squared"), or by the square root of the variance, called the standard deviation, denoted by σ (pronounced "sigma").
Example: Suppose vehicle speeds at a highway location have a normal distribution with mean μ = 65 mph and standard deviation σ = 5 mph. The probability density function is shown below. Notice that the horizontal axis shows speeds and the bell is centered at the mean (65 mph).

Probability for an Interval = Area under the density curve in that interval
The next figure shows the probability that the speed of a randomly selected vehicle will be between 60 and 73 miles per hour, with this probability equal to the area under the curve between 60 and 73.

Empirical Rule Review
Recall that in our first lesson we learned that for bell-shaped data, about 95% of the data values will be in the interval mean ± (2 × std. dev.). In our example, this is 65 ± (2 × 5), or 55 to 75. The next figure shows that the probability is about 0.95 (about 95%) that a randomly selected vehicle speed is between 55 and 75.

The Empirical Rule also stated that about 99.7% (nearly all) of a bell-shaped data set will be in the interval mean ± (3 × std. dev.). For our example, this is 65 ± (3 × 5), or 50 to 80. Notice that this interval roughly gives the complete range of the density curve shown above.

Finding Probabilities for a Normal Random Variable
Remember that the cumulative probability for a value is the probability less than or equal to that value. Minitab, Excel, and the TI-83 series of calculators will give the cumulative probability for any value of interest in a specific normal curve.
For our example of vehicle speeds, here is Minitab output showing that the probability = 0.9452 that the speed of a randomly selected vehicle is less than or equal to 73 mph.

To find this probability, use Calc > Probability Distributions > Normal, specify the mean and standard deviation, and enter the value of interest as "Input Constant." Here's what it looks like for our example.

Here is a figure that illustrates the cumulative probability we found using this procedure.

"Greater than" Probabilities
Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written P(X > 73).
For our example, the probability speed is greater than 73 = 1 − 0.9452 = 0.0548.
The general rule for a "greater than" situation is
P(greater than a value) = 1 − P(less than or equal to the value)
Example: Using Minitab, we can find that the probability = 0.1587 that a speed is less than or equal to 60 mph. Thus the probability a speed is greater than 60 mph = 1 − 0.1587 = 0.8413.
The relevant Minitab output and a figure showing the cumulative probability for 60 mph follow:

"In between" Probabilities
Suppose we want to know the probability a normal random variable is within a specified interval. For instance, suppose we want to know the probability a randomly selected speed is between 60 and 73 mph. The simplest approach is to subtract the cumulative probability for 60 mph from the cumulative probability for 73 mph. The answer is
Probability speed is between 60 and 73 = 0.9452 − 0.1587 = 0.7865.
This can be written as P(60 < X < 73) = 0.7865, where X is speed.
The general rule for an "in between" probability is
P(between a and b) = cumulative probability for value b − cumulative probability for value a
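The three rules ("less than or equal", "greater than", "in between") can be checked outside Minitab with Python's standard library. This sketch assumes Python 3.8 or later, where `statistics.NormalDist` provides a `cdf` method that returns the cumulative probability.

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)  # vehicle speeds, mph

p_le_73 = speed.cdf(73)                  # "less than or equal" (cumulative probability)
p_gt_73 = 1 - speed.cdf(73)              # "greater than" = 1 - cumulative
p_gt_60 = 1 - speed.cdf(60)
p_60_73 = speed.cdf(73) - speed.cdf(60)  # "in between" = cdf(b) - cdf(a)

for label, value in [("P(X <= 73)", p_le_73), ("P(X > 73)", p_gt_73),
                     ("P(X > 60)", p_gt_60), ("P(60 < X < 73)", p_60_73)]:
    print(label, round(value, 4))
```

Rounded to four decimals these agree with the values above: 0.9452, 0.0548, 0.8413, and 0.7865.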

Finding Cumulative Probabilities
Using the Standard Normal Table (in the appendix of the textbook, or see a copy at Standard Normal Table): Table A.1 in the textbook gives normal curve cumulative probabilities for standardized scores.

A standardized score (also called a z-score) is
$z = \frac{x - \mu}{\sigma}$
Row labels of Table A.1 give possible z-scores up to one decimal place. The column labels give the second decimal place of the z-score.
The cumulative probability for a value equals the cumulative probability for that value's z-score. Here, the probability speed is less than or equal to 73 mph = the probability the z-score is less than or equal to 1.60. How did we arrive at this z-score?
Example
In our vehicle speed example, the standardized score for 73 mph is
$z = \frac{73 - 65}{5} = 1.60$.
We look in the ".00" column of the "1.6" row (1.6 plus .00 equals 1.60) to find that the cumulative probability for z = 1.60 is 0.9452, the same value we got earlier as the cumulative probability for speed = 73 mph.

Example
For speed = 60 the z-score is
$z = \frac{60 - 65}{5} = -1.00$.
Table A.1 gives the cumulative probability .1587 for z = −1.00, and this is also the cumulative probability for a speed of 60 mph.

Example
Suppose pulse rates of adult females have a normal curve distribution with mean μ = 75 and standard deviation σ = 8. What is the probability that a randomly selected female has a pulse rate greater than 85? Be careful! Notice we want a "greater than" and the interval we want is entirely above average, so we know the answer must be less than 0.5.
If we use Table A.1, the first step is to calculate the z-score of 85:
$z = \frac{85 - 75}{8} = 1.25$
Information from Table A.1: use the ".05" column of the "1.2" row to find that the cumulative probability for z = 1.25 is 0.8944.
This is not yet the answer. This is the probability the pulse is less than or equal to 85. We want a "greater than" probability, so the answer is
P(greater than 85) = 1 − P(less than or equal to 85) = 1 − 0.8944 = 0.1056.
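The table-lookup steps (standardize, read off the cumulative probability, subtract from 1) can be mirrored in Python. This sketch uses `statistics.NormalDist()` with its default mean 0 and standard deviation 1 as a stand-in for Table A.1; `z_score` is a helper name we chose.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist()  # mean 0, standard deviation 1

z = z_score(85, mu=75, sigma=8)  # 1.25
p_less = standard_normal.cdf(z)  # cumulative probability, about 0.8944
p_greater = 1 - p_less           # about 0.1056
print(z, round(p_less, 4), round(p_greater, 4))
```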

Finding Percentiles
We may wish to know the value of a variable that is a specified percentile of the values.
We might ask what speed is the 99.99th percentile of speeds at the highway location in our earlier example.
We might want to know what pulse rate is the 25th percentile of pulse rates.
In Minitab, we can find percentiles using Calc > Probability Distributions > Normal, but we have to make two changes to what we did before: (1) click on the "Inverse Cumulative Probability" radio button (rather than cumulative probability) and (2) enter the percentile ranking as a decimal fraction in the "Input Constant" box.
The 99.99th percentile of speeds (when mean = 65 and standard deviation = 5) is about 83.6 mph. Output from Minitab follows. Notice that now the specified cumulative probability is given first, and then the corresponding speed.

The 25th percentile of pulse rates (when μ = 75 and σ = 8) is about 69.6. Relevant Minitab output is
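Minitab's "Inverse Cumulative Probability" option has a direct counterpart in Python's standard library: `NormalDist.inv_cdf` (Python 3.8 or later) takes a cumulative probability and returns the corresponding value. A short sketch reproducing the two percentiles:

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)
pulse = NormalDist(mu=75, sigma=8)

# inv_cdf: give a cumulative probability (percentile as a decimal), get back the value
print(round(speed.inv_cdf(0.9999), 1))  # 99.99th percentile of speeds, about 83.6
print(round(pulse.inv_cdf(0.25), 1))    # 25th percentile of pulse rates, about 69.6
```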

Normal Approximation to the Binomial
Remember binomial random variables from last week's discussion? A binomial random variable can also be approximated by using the normal random variable methods discussed above. This approximation can take place as long as:
1. The population size is at least 10 times the sample size.
2. np ≥ 10 and n(1 − p) ≥ 10. [These constraints take care of population shapes that are unbalanced because p is too close to 0 or to 1.]

The mean of a binomial random variable is easy to grasp intuitively: Say the probability of success for each observation is 0.2 and we make 10 observations. Then on average we should have 10 × 0.2 = 2 successes. The spread of a binomial distribution is not so intuitive, so we will not justify our formula for standard deviation.
If the sample count X of successes is a binomial random variable for n fixed observations with probability of success p for each observation, then X has a mean and standard deviation as discussed in section 8.4:
Mean = np and standard deviation = $\sqrt{np(1 - p)}$
And as long as the above two requirements for n and p are satisfied, we can approximate X with a normal random variable having the same mean and standard deviation and use the normal calculations discussed previously in these notes to solve for probabilities for X.
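The two requirements and the matching normal distribution can be packaged as one small function. This is a sketch: `normal_approximation` and its optional `population_size` parameter are hypothetical names chosen for illustration, and it returns `None` when a condition fails. It does not apply a continuity correction, since these notes do not discuss one.

```python
import math
from statistics import NormalDist

def normal_approximation(n, p, population_size=None):
    """NormalDist with the binomial's mean and sd if the lesson's conditions hold, else None."""
    if population_size is not None and population_size < 10 * n:
        return None  # condition 1: population at least 10 times the sample size
    if n * p < 10 or n * (1 - p) < 10:
        return None  # condition 2: np >= 10 and n(1 - p) >= 10
    return NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

approx = normal_approximation(30, 0.5)  # true-false guessing example: np = 15
print(approx.mean, round(approx.stdev, 3))
print(normal_approximation(10, 0.2))    # np = 2 < 10, so the approximation is not used
```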

Review of Finding Probabilities
Click on the Inspect icon for an audio/visual example for each situation described. When reviewing any of these examples, keep in mind that they apply when:
1. The variable in question follows a normal, or bell-shaped, distribution.
2. If the variable is not standardized, then you need to standardize the value first by $z = \frac{x - \mu}{\sigma}$.

Finding "Less Than" Probability
Finding "Greater Than" Probability
Finding "Between" Probability
Finding "Either/Or" Probability

Population Parameters and Sample Statistics
S1: A survey is carried out at a university to estimate the proportion of undergraduates living at home during the current term.
Population: undergraduates at the university
Parameter: the true proportion of undergraduates that live at home
Sample: the undergraduates surveyed
Statistic: the proportion of the sampled students who live at home, used to estimate the true proportion
S2: A study is conducted to find the average hours college students spend partying on the weekend.
Population: all college students
Parameter: the true mean number of hours college students spend partying on the weekend
Sample: the students sampled for the study
Statistic: the mean hours of weekend partying calculated from the sample
S1 is concerned with estimating a proportion p, where p represents the true (typically unknown) parameter and p̂ [pronounced "p-hat"] represents the statistic calculated from the sample.
S2 is concerned with estimating a mean μ, where μ [pronounced "mew"] represents the true (typically unknown) parameter and x̄ [pronounced "x-bar"] represents the statistic calculated from the sample.
In either case the statistic is used to estimate the parameter. The statistic can vary from sample to sample, but the parameter is understood to be fixed.
The statistic, then, can take on various values depending on the result of repeated random sampling. The distribution of these possible values is known as the sampling distribution.
Overview of symbols
The following table of symbols provides some of the common notation that we will see throughout the remaining sections.

The difference between "paired" samples and "independent" samples is most easily explained as follows: paired samples arise when the observations are taken on the same individual (e.g. measure a person's stress level before and after an exam), whereas independent samples consist of observations from two distinct groups (e.g. measure the stress levels of men and women before an exam and compare these stress levels). An exception to this is a situation that involves analyzing spouses: although spouses are two distinct people, spousal data are often linked as paired data.

Sampling Distributions of Sample Statistics
Two common statistics are the sample proportion, p̂ (read as p-hat), and the sample mean, x̄ (read as x-bar). Sample statistics are random variables and therefore vary from sample to sample. For instance, consider taking two random samples, each sample consisting of 5 students, from a class and calculating the mean height of the students in each sample. Would you expect both sample means to be exactly the same?
As a result, sample statistics also have a distribution, called the sampling distribution. These sampling distributions, similar to distributions discussed previously, have a mean and standard deviation. However, we refer to the standard deviation of a sampling distribution as the standard error. Thus, the standard error is simply the standard deviation of a sampling distribution. Oftentimes people will interchange these two terms. This is okay as long as you understand the distinction between the two: standard error refers to sampling distributions and standard deviation refers to probability distributions.

Sampling Distributions for Sample Proportion, p̂
If numerous repetitions of samples are taken, the distribution of p̂ is said to approximate a normal curve distribution. Alternatively, this can be assumed if BOTH n × p and n × (1 − p) are at least 10. [SPECIAL NOTE: Some textbooks use 15 instead of 10, believing that 10 is too liberal. We will use 10 for our discussions.] Using this, we can estimate the true population proportion, p, by p̂ and the true standard deviation of p̂ by
$s.e.(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$,
where s.e.(p̂) is interpreted as the standard error of p̂.
Probabilities about the number X of successes in a binomial situation are the same as probabilities about corresponding proportions.
In general, if np ≥ 10 and n(1 − p) ≥ 10, the sampling distribution of p̂ is about normal with mean p and standard error
$SE(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}}$.

Example. Suppose the proportion of all college students who have used marijuana in the past 6 months is p = .40. For a class of size n = 200, representative of all college students on use of marijuana, what is the chance that the proportion of students who have used marijuana in the past 6 months is less than .32 (or 32%)?
Solution. The mean of the sample proportion p̂ is p, and the standard error of p̂ is $SE(\hat{p}) = \sqrt{\frac{p(1 - p)}{n}}$. For this marijuana example, we are given that p = .4. We then determine
$SE(\hat{p}) = \sqrt{\frac{(.4)(.6)}{200}} = 0.0346$
So, the sample proportion p̂ is about normal with mean p = .40 and SE(p̂) = 0.0346.
The z-score for .32 is z = (.32 − .40)/0.0346 = −2.31. Then, using the Standard Normal Table,
Prob(p̂ < .32) = Prob(Z < −2.31) = 0.0104.
Question to ponder: If you observed a sample proportion of .32, would you believe a claim that 40% of college students used marijuana in the past 6 months? Or would you think the proportion is less than .40?
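The marijuana calculation can be reproduced in Python as a check (a sketch using the standard library; `NormalDist()` plays the role of the standard normal table). One small difference to expect: the table answer 0.0104 uses z rounded to −2.31, while the unrounded z of about −2.309 gives roughly 0.0105.

```python
import math
from statistics import NormalDist

p, n = 0.40, 200
se = math.sqrt(p * (1 - p) / n)  # standard error of p-hat, about 0.0346
z = (0.32 - p) / se              # about -2.31
prob = NormalDist().cdf(z)       # P(p-hat < .32), about 0.010
print(round(se, 4), round(z, 2), round(prob, 4))
```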
Sampling Distribution of the Sample Mean
The central limit theorem states that if a large enough sample is taken (typically n > 30), then the sampling distribution of x̄ is approximately a normal distribution with a mean of μ and a standard deviation of $\sigma/\sqrt{n}$. Since in practice we usually do not know μ or σ, we estimate these by x̄ and $s/\sqrt{n}$, respectively. In this case s is the estimate of σ and is the standard deviation of the sample. The expression $s/\sqrt{n}$ is known as the standard error of the mean, labeled s.e.(x̄).
Simulation: Generate 500 samples, each consisting of the heights of 4 men. Assume the distribution of male heights is normal with mean μ = 70" and standard deviation σ = 3.0". Then find the mean of each of the 500 samples of size 4.
Here are the first 10 sample means:
70.4  72.0  72.3  69.9  70.5  70.0  70.5  68.1  69.2  71.8

Theory says that the mean of (x̄) = μ = 70, which is also the population mean, and $SE(\bar{x}) = \sigma/\sqrt{n} = 3/\sqrt{4} = 1.50$.
Simulation shows: Average(500 x̄'s) = 69.957 and SE(of 500 x̄'s) = 1.496
Change the sample size from n = 4 to n = 25 and get descriptive statistics:

Theory says that the mean of (x̄) = μ = 70, which is also the population mean, and $SE(\bar{x}) = \sigma/\sqrt{n} = 3/\sqrt{25} = 0.60$.
Simulation shows: Average(500 x̄'s) = 69.983 and SE(of 500 x̄'s) = 0.592
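A simulation like the one above can be sketched with Python's `random` and `statistics` modules. This is not the software that produced the numbers above; the seed value is an arbitrary assumption, so your averages and SEs will differ slightly from 69.957/1.496 and 69.983/0.592, but they should land close to the theoretical values.

```python
import random
import statistics

def simulate_sample_means(n, reps=500, mu=70.0, sigma=3.0, seed=1):
    """Draw `reps` samples of size n from Normal(mu, sigma) and return their means."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so runs are repeatable
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

for n in (4, 25):
    means = simulate_sample_means(n)
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3),
          "theory SE:", 3.0 / n ** 0.5)
```

The standard deviation of the 500 simulated means plays the role of "SE(of 500 x̄'s)"; with 500 replications it typically lands within a few percent of σ/√n.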
Sampling Distribution of Sample Mean x̄ from a Non-Normal Population
Simulation: Below is a histogram of the number of CDs owned by PSU students. The distribution is strongly skewed to the right.

Assume the population mean number of CDs owned is μ = 84 and σ = 96.
Let's obtain 500 samples of size 4 from this population and look at the distribution of the 500 x-bars:

Theory says that the mean of (x̄) = μ = 84, which is also the population mean, and $SE(\bar{x}) = \sigma/\sqrt{n} = 96/\sqrt{4} = 48$.

Simulation shows Average(500 x̄'s) = 81.11 and SE(500 x̄'s for samples of size 4) = 45.1
Change the sample size from n = 4 to n = 25 and get descriptive statistics and curve:

Theory says that the mean of (x̄) = μ = 84, which is also the population mean, and $SE(\bar{x}) = \sigma/\sqrt{n} = 96/\sqrt{25} = 19.2$.
Simulation shows Average(500 x̄'s) = 83.281 and SE(500 x̄'s for samples of size 25) = 18.268. A histogram of the 500 x̄'s computed from samples of size 25 is beginning to look a lot like a normal curve.
i. The Law of Large Numbers says that as the sample size increases, the sample mean will approach the population mean.
ii. The Central Limit Theorem says that as the sample size increases, the sampling distribution of x̄ (read x-bar) approaches the normal distribution. We see this effect here for n = 25. Generally, we assume that a sample size of n = 30 is sufficient to get an approximate normal distribution for the distribution of the sample mean.
iii. The Central Limit Theorem is important because it enables us to calculate probabilities about sample means.
Example. Find the approximate probability that the average number of CDs owned when 100 students are asked is between 70 and 90.
Solution. Since the sample size is greater than 30, we assume the sampling distribution of x̄ is about normal with mean μ = 84 and $SE(\bar{x}) = \sigma/\sqrt{n} = 96/\sqrt{100} = 9.6$. We are asked to find Prob(70 < x̄ < 90). The z-scores for the two values are
for 90: z = (90 − 84)/9.6 = 0.625 and for 70: z = (70 − 84)/9.6 = −1.46. From tables of the normal distribution we get P(−1.46 < Z < 0.625) = .734 − .072 = .662.
Suppose the sample size was 1600 instead of 100. Then the distribution of x̄ would be about normal with mean 84 and standard deviation $\sigma/\sqrt{n} = 96/\sqrt{1600} = 96/40 = 2.4$. From the empirical rule we know that almost all x-bars for samples of size 1600 will be in the interval 84 ± (3)(2.4), or 84 ± 7.2, that is, between 76.8 and 91.2. The Law of Large Numbers says that as we increase the sample size, the sample mean approaches the population mean with probability 1.00!
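Here is a short Python check of both parts of the CD example, assuming (as the solution does) that the Central Limit Theorem makes the sampling distribution of x̄ approximately normal. `NormalDist` comes from the standard library (Python 3.8+).

```python
import math
from statistics import NormalDist

mu, sigma, n = 84, 96, 100
se = sigma / math.sqrt(n)           # 9.6
xbar = NormalDist(mu, se)           # approximate sampling distribution of x-bar (CLT)
prob = xbar.cdf(90) - xbar.cdf(70)  # about 0.662
print(se, round(prob, 3))

# Same population with n = 1600: SE = 96/40 = 2.4, and mean +/- 3 SE
se_1600 = sigma / math.sqrt(1600)
print(mu - 3 * se_1600, mu + 3 * se_1600)  # roughly 76.8 and 91.2
```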
APPLET
Here is an applet developed by the folks at Rice University that simulates a "sampling distribution". The object here is to give you a chance to explore various aspects of sampling distributions. When the applet begins, a histogram of a normal distribution is displayed at the top of the screen.
The distribution portrayed at the top of the screen is the population from which samples are taken. The mean of the distribution is indicated by a small blue line and the median is indicated by a small purple line. Since the mean and median are the same, the two lines overlap. The red line extends from the mean one standard deviation in each direction. Note the correspondence between the colors used on the histogram and the statistics displayed to the left of the histogram.
The second histogram displays the sample data. This histogram is initially blank. The third and fourth histograms show the distribution of statistics computed from the sample data. The number of samples (replications) that the third and fourth histograms are based on is indicated by the label "Reps=."
Basic Operation
The simulation is set to initially sample five numbers from the population, compute the mean of the five numbers, and plot the mean. Click the "Animated sample" button and you will see the five numbers appear in the histogram. The mean of the five numbers will be computed and the mean will be plotted in the third histogram. Do this several times to see the distribution of means begin to be formed. Once you see how this works, you can speed things up by taking 5, 1,000, or 10,000 samples at a time.

Notice that as you increase the sample size, regardless of the shape you create, the distribution (i.e. look at the histogram) becomes more bell-shaped. This is the theoretical meaning behind the central limit theorem: as sample size increases, even though the population from which the sample originated is not normal (e.g. uniform or chi-square), the sample mean will approximately follow a normal distribution.

Review of Sampling Distributions
In the later part of the last lesson we discussed finding the probability for a continuous random variable that followed a normal distribution. We did so by converting the observed score to a standardized z-score and then applying the Standard Normal Table. For example:
IQ scores are normally distributed with mean, μ, of 110 and standard deviation, σ, equal to 25. Let the random variable X be a randomly chosen score. Find the probability of a randomly chosen score exceeding 100. That is, find P(X > 100). To solve,
$P(X > 100) = P\left(Z > \frac{100 - 110}{25}\right) = P(Z > -0.40) = 1 - 0.3446 = 0.6554$
But what about situations when we have more than one observation, that is, when the sample size is greater than 1? In practice, usually just one random sample is taken from a population of quantitative or qualitative values, and the statistic (the sample mean x̄ or the sample proportion p̂, respectively) is measured one time only. For instance, if we wanted to estimate what proportion of PSU students agreed with the President's explanation of the rising tuition costs, we would only take one random sample, of some size, and use this sample to make an estimate. We would not continue to take samples and make estimates, as this would be costly and inefficient. For samples taken at random, the sample mean {or sample proportion} is a random variable. To get an idea of how such a random variable behaves, we consider this variable's sampling distribution, which we discussed previously in this lesson.
Consider the population of possible rolls X for a single six-sided die, which has a mean, μ, equal to 3.5 and a standard deviation, σ, equal to 1.7. [If you do not believe this, recall our discussion of probabilities for discrete random variables. For the six-sided die you have six possible outcomes, each with the same 1/6 probability of being rolled. Applying your recent knowledge, calculate the mean and standard deviation and see what you get!] If we rolled the die twice, the sample mean, x̄, of these two rolls can take on various values based on what numbers come up. Since these results are subject to the laws of chance, they can be defined as a random variable. From the beginning of the semester, we can apply what we learned to summarize distributions by their center, spread, and shape.
1. Sometimes the mean roll of 2 dice will be less than 3.5, other times greater than 3.5. It should be just as likely to get a lower-than-average mean as it is to get a higher-than-average mean, so the sampling distribution of the sample mean should be centered at 3.5.
2. For the roll of 2 dice, the sample mean could be spread all the way from 1 to 6: think of two "1s" or two "6s" being tossed.
3. The most likely mean roll from the two dice is 3.5: all combinations where the sum is 7. The farther a mean roll is from 3.5, the less likely it is to occur. So the shape of the distribution of the sample means from two rolls would take the form of a triangle.
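The three claims about two rolls can be verified exactly by listing all 36 equally likely pairs of faces, and the same few lines answer the bracketed invitation to compute the die's mean and standard deviation. This is an illustrative Python sketch assuming a fair die; the text bar chart is only a rough stand-in for a histogram.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Population of single rolls: each face has probability 1/6
mu = sum(faces) / 6                                       # 3.5
sigma = math.sqrt(sum((x - mu) ** 2 for x in faces) / 6)  # about 1.71

# Exact sampling distribution of the mean of two rolls (36 equally likely pairs)
means = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))
for m in sorted(means):
    print(float(m), "#" * means[m])  # bars rise to a peak at 3.5 and then fall: a triangle
```

The mean 3.5 occurs in 6 of the 36 pairs (the pairs summing to 7), while the extreme means 1 and 6 each occur in only one pair.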
If we increase the sample size, i.e. the number of rolls, to say 10, then this sample mean is also a random variable.
1. Sometimes the mean roll of 10 dice will be less than 3.5 and sometimes greater than 3.5. Similar to when we rolled the die 2 times, the sampling distribution of x̄ for 10 rolls should be centered at 3.5.
2. For 10 rolls, the distribution of the sample mean would not be as spread out as that for 2 rolls. Getting a "1" or a "6" on all 10 rolls will almost never occur.
3. The most likely mean roll is still 3.5, with lower or higher mean rolls getting progressively less likely. But now there is a much better chance for the sample mean of the 10 rolls to be close to 3.5, and a much worse chance for this sample mean to be near 1 or 6. Therefore, the shape of the sampling distribution for 10 rolls bulges at 3.5 and tapers off at either end. Ta-da! The shape looks bell-shaped, or normal!
This die example illustrates the general result of the central limit theorem: regardless of the population distribution (the distribution for the die is called a uniform distribution because each outcome is equally likely), the distribution of the sample mean will approach normal as sample size increases, and the sample mean, x̄, has the following characteristics:
1. The distribution of x̄ is centered at μ.
2. The spread of x̄ can be measured by its standard deviation, $\sigma_{\bar{x}}$, equal to $\sigma/\sqrt{n}$.

Example
Assume women's heights are normally distributed with μ = 64.5 inches and σ = 2.5 inches. Pick one woman at random. According to the Empirical Rule, the probability is:
68% that her height X is between 62 inches and 67 inches
95% that her height X is between 59.5 inches and 69.5 inches
99.7% that her height X is between 57 inches and 72 inches
Now pick a random sample of 25 women. The sample mean height, x̄, is normal with expected value (i.e. mean) of 64.5 inches and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 2.5/\sqrt{25}$, equal to 0.5. The probability is:

68% that their sample mean height x̄ is between 64 inches and 65 inches
95% that their sample mean height x̄ is between 63.5 inches and 65.5 inches
99.7% that their sample mean height x̄ is between 63 inches and 66 inches
Using the Standard Normal Table for more exact probabilities instead of the Empirical Rule, what is the probability that the sample mean height of 25 women is less than 63.75 inches?
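One way to work the question just posed is sketched below in Python, with `NormalDist()` standing in for the Standard Normal Table. The z-score is (63.75 − 64.5)/0.5 = −1.50, and the table's cumulative probability for z = −1.50 is 0.0668.

```python
import math
from statistics import NormalDist

mu, sigma, n = 64.5, 2.5, 25
se = sigma / math.sqrt(n)   # standard deviation of x-bar: 0.5
z = (63.75 - mu) / se       # -1.5
prob = NormalDist().cdf(z)  # P(sample mean < 63.75), about 0.0668
print(se, z, round(prob, 4))
```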

Proportions

Similar laws apply for proportions. The differences are:
1. For the Central Limit Theorem to apply, we require that both np ≥ 10 and n(1 − p) ≥ 10, where p is the true population proportion. If p is unknown, then we can substitute the sample proportion, p̂.
2. The distribution of the sample proportion, p̂, will have a mean equal to p and a standard deviation of $\sqrt{p(1 - p)/n}$.
To find probabilities associated with some p̂, we follow calculations similar to those for sample means:
$z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$

2007 The Pennsylvania State University. All rights reserved.
