Learning objectives for this lesson
Upon completion of this lesson, you should be able to:
distinguish between discrete and continuous random variables
explain the difference between population, parameter, sample, and statistic
determine if a given value represents a population parameter or sample statistic
find probabilities associated with a discrete probability distribution
compute the mean and variance of a discrete probability distribution
find probabilities associated with a binomial distribution
find probabilities associated with a normal probability distribution using the standard normal table
determine the standard error for the sample proportion and sample mean
apply the Central Limit Theorem properly to a set of continuous data
Random Variables
A random variable is a numerical characteristic of each event in a sample space or, equivalently, of each individual in a population.
Examples:
The number of heads in four flips of a coin (a numerical property of each different sequence of flips).
Heights of individuals in a large population.
Random variables are classified into two broad types:
A discrete random variable has a countable set of distinct possible values.
A continuous random variable is such that any value (to any number of decimal places) within some interval is a possible value.
Examples of discrete random variables:
Number of heads in 4 flips of a coin (possible outcomes are 0, 1, 2, 3, 4).
Number of classes missed last week (possible outcomes are 0, 1, 2, 3, ..., up to some maximum number).
Amount won or lost when betting $1 on the Pennsylvania Daily Number lottery.
Examples of continuous random variables:
Heights of individuals
Time to finish a test
Hours spent exercising last week
Note: In practice, we don't measure accurately enough to truly see all possible values of a continuous random variable. For instance, in reality somebody may have exercised 4.2341567 hours last week, but they would probably round off to 4. Nevertheless, hours of exercise last week is inherently a continuous random variable.
Probability Distributions: Discrete Random Variables
For a discrete random variable, its probability distribution (also called the probability distribution function) is any table, graph, or formula that gives each possible value and the probability of that value. Note: The total of all probabilities across the distribution must be 1, and each individual probability must be between 0 and 1, inclusive.
Examples:
(1) Probability Distribution for Number of Heads in 4 Flips of a Coin
Heads        0     1     2     3     4
Probability  1/16  4/16  6/16  4/16  1/16
This could be found by listing all 16 possible sequences of heads and tails for four flips, and then counting how many sequences there are for each possible number of heads.
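The listing-and-counting approach just described can be sketched in a few lines of Python. This is only an illustration, not part of the original lesson; the variable names are arbitrary, and the coin is assumed to be fair so that all 16 sequences are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**4 = 16 equally likely sequences of four flips (fair coin assumed).
sequences = list(product("HT", repeat=4))

# Count how many sequences contain each possible number of heads.
counts = Counter(seq.count("H") for seq in sequences)

# Probability of each number of heads = (sequences with that count) / 16.
distribution = {heads: Fraction(counts[heads], len(sequences)) for heads in range(5)}

for heads, prob in distribution.items():
    # Fractions print in lowest terms, so 4/16 appears as 1/4 and 6/16 as 3/8.
    print(heads, prob)
```

The probabilities sum to 1, as any probability distribution must.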
(2) Probability Distribution for the number of tattoos each student has in a population of students
Tattoos      0      1      2      3      4
Probability  0.850  0.120  0.015  0.010  0.005
This could be found by doing a census of a large student population.
Cumulative Probabilities
Often, we wish to know the probability that a variable is less than or equal to some value. This is called the cumulative probability because, to find the answer, we simply add the probabilities for all values qualifying as "less than or equal" to the specified value.
Example: Suppose we want to know the probability that the number of heads in four flips is 1 or less. The qualifying values are 0 and 1, so we add the probabilities for those two possibilities.
P(number of heads ≤ 1) = P(number of heads = 0) + P(number of heads = 1) = (1/16) + (4/16) = 5/16
The cumulative distribution is a listing of all possible values along with the cumulative probability for each value.
Examples:
(1) Probability Distribution and Cumulative Distribution for Number of Heads in 4 Flips
Heads                   0     1     2      3      4
Probability             1/16  4/16  6/16   4/16   1/16
Cumulative Probability  1/16  5/16  11/16  15/16  1
Each cumulative probability was found by adding the probabilities (in the second row) up to that particular column of the table. As an example, for 2 heads, we add the probabilities for 0, 1, and 2 heads to get 11/16. This is the probability that the number of heads is two or less.
(2) Probability Distribution and Cumulative Distribution for the number of tattoos each student has in a population of students
Tattoos                 0      1      2      3      4
Probability             0.850  0.120  0.015  0.010  0.005
Cumulative Probability  0.850  0.970  0.985  0.995  1.000
As an example, the probability that a randomly selected student has 2 or fewer tattoos = 0.985 (calculated as 0.850 + 0.120 + 0.015).
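A cumulative distribution is just a running total of the probabilities. The short Python sketch below rebuilds the cumulative row for the tattoo example; the values 0 through 4 for the number of tattoos are taken from the example, and the variable names are illustrative only.

```python
from itertools import accumulate

tattoos = [0, 1, 2, 3, 4]
probs = [0.850, 0.120, 0.015, 0.010, 0.005]

# Running total of the probabilities, rounded to hide floating-point noise.
cumulative = [round(c, 3) for c in accumulate(probs)]

for value, cum in zip(tattoos, cumulative):
    print(f"P(tattoos <= {value}) = {cum:.3f}")
```

The last cumulative probability is always 1, since every possible value is then included.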
Mean, also called Expected Value, of a Discrete Variable
The phrase expected value is a synonym for the mean value in the long run (meaning over many repeats or a large sample size). For a discrete random variable, the calculation is the sum of (value × probability), where we sum over all values (after separately calculating value × probability for each value), expressed as:
\[ E(X) = \sum_i x_i p_i \]
meaning we take each possible X value and multiply it by its respective probability. We then add these products to reach our expected value, labeled E(X). [NOTE: the letter X is a common symbol used to represent a random variable. Any letter can be used.]
Example: A fair six-sided die is tossed. You win $2 if the result is a 1, you win $1 if the result is a 6, but otherwise you lose $1.
The probability distribution for X = amount won or lost is
X            +2    +1    -1
Probability  1/6   1/6   4/6
so E(X) = (2)(1/6) + (1)(1/6) + (-1)(4/6) = -1/6, a loss of about 17 cents per play in the long run.
The expected value is not the only thing worth knowing about a set of discrete numbers: one may also need to know the spread, or variability, of these data. For instance, you may "expect" to win $20 when playing a particular game (which appears good!), but the spread for this might be from losing $20 to winning $60. Knowing such information can influence your decision on whether to play.
To calculate the standard deviation we first must calculate the variance. The square root of the variance then gives the standard deviation. Your book provides the following formula for calculating the variance:
\[ \sigma^2 = \sum_i (x_i - \mu)^2 p_i \]
and the standard deviation is:
\[ \sigma = \sqrt{\sum_i (x_i - \mu)^2 p_i} \]
In this expression we substitute our result for E(X) for \(\mu\), which is simply the symbol used to represent the mean of some population.
However, an easier formula to use and remember for calculating the variance is the following:
\[ \sigma^2 = \sum_i x_i^2 p_i - \mu^2 \]
and again we substitute E(X) for \(\mu\).
The standard deviation is then found by taking the square root of the variance. Notice in the summation part of this equation that we square only each X value and not the respective probability.
Example: Going back to the die example used above for expectation, where E(X) = -1/6, we calculate the standard deviation for this discrete distribution by first calculating the variance:
\[ \sigma^2 = \left[ 2^2 (1/6) + 1^2 (1/6) + (-1)^2 (4/6) \right] - (-1/6)^2 = 1.5 - 0.028 = 1.472 \]
So the standard deviation would be the square root of 1.472, or 1.213.
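For readers who want to check the arithmetic, here is a minimal Python sketch of the die game, using the shortcut variance formula (sum of squared values times probabilities, minus the squared mean). The dictionary layout and names are illustrative choices, not something prescribed by the lesson.

```python
from fractions import Fraction
from math import sqrt

# X = amount won or lost on one toss of a fair six-sided die:
# +2 for a 1, +1 for a 6, -1 for any of the other four faces.
dist = {2: Fraction(1, 6), 1: Fraction(1, 6), -1: Fraction(4, 6)}

# Expected value: sum of value * probability.
mean = sum(x * p for x, p in dist.items())

# Shortcut variance: sum of value**2 * probability, minus mean**2.
variance = sum(x**2 * p for x, p in dist.items()) - mean**2

std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 3))
```

Working in exact fractions gives a mean of -1/6 and a variance of 53/36, which is the 1.472 used above.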
Binomial Random Variable
This is a specific type of discrete random variable. A binomial random variable counts how often a particular event occurs in a fixed number of tries. For a variable to be a binomial random variable, these conditions must be met:
There are a fixed number of trials (a fixed sample size).
On each trial, the event of interest either occurs or does not.
The probability of occurrence (or not) is the same on each trial.
Trials are independent of one another.
Examples of binomial random variables:
Number of correct guesses at 30 True-False questions when you randomly guess all answers
Number of winning lottery tickets when you buy 10 tickets of the same kind
Number of left-handers in a randomly selected sample of 100 unrelated people
Notation
n = number of trials (sample size)
p = probability the event of interest occurs on any one trial
Example: For the guessing at True-False questions example above, n = 30 and p = .5 (chance of getting any one question right).
Probabilities for binomial random variables
The conditions for being a binomial variable lead to a somewhat complicated formula for finding the probability that any specific value occurs (such as the probability you get 20 right when you guess at 20 True-False questions).
We'll use Minitab to find probabilities for binomial random variables. Don't worry about the by-hand formula. However, for those of you who are curious, the by-hand formula for the probability of getting a specific outcome in a binomial experiment is:
\[ P(X = x) = \frac{n!}{x!\,(n-x)!} \, p^x (1-p)^{n-x} \]
Evaluating the Binomial Distribution
One can use the formula to find the probability or, alternatively, use Minitab. In the homework, you may use whichever you are more comfortable with unless specified otherwise.
Example (Minitab): Using Minitab, find P(x) for n = 20, x = 3, and p = 0.4.
Calc > Probability Distributions > Binomial
Choose Probability since we want to find the probability that x = 3. Choose Input Constant and type in 3, since that is the value at which you want to evaluate the probability. [NOTE: In Minitab Version 14 the field for p is labeled Probability of Success; in Version 15 it has been renamed Event Probability.]
Minitab output:
Probability Density Function
Binomial with n = 20 and p = 0.4
x      P( X = x )
3.00   0.0123
In the following example, we illustrate how to use the formula to compute binomial probabilities. If you don't like to use the formula, you can also just use Minitab to find the probabilities.
Example (by hand): Cross-fertilizing a red and a white flower produces red flowers 25% of the time. Now we cross-fertilize five pairs of red and white flowers and produce five offspring.
Find the probability that:
a. There will be no red-flowered plants in the five offspring.
X = # of red-flowered plants in the five offspring. Here, the number of red-flowered plants has a binomial distribution with n = 5, p = 0.25.
\[ P(X = 0) = \frac{5!}{0!\,5!} (0.25)^0 (0.75)^5 = 1 \cdot (0.25)^0 (0.75)^5 = 0.237 \]
b. Cumulative probability: there will be fewer than two red-flowered plants.
Answer:
\[ P(X \le 1) = P(X = 0) + P(X = 1) = \frac{5!}{0!\,5!} (0.25)^0 (0.75)^5 + \frac{5!}{1!\,4!} (0.25)^1 (0.75)^4 = 0.237 + 0.396 = 0.633 \]
In the previous example, part a was finding P(X = x) and part b was finding P(X <= x). This latter expression is called a cumulative probability because you are finding the probability that has accumulated from the minimum to some point, i.e. from 0 to 1 in this example.
To use Minitab to solve a cumulative probability binomial problem, return to Calc > Probability Distributions > Binomial as shown above. Now, however, select the radio button for Cumulative Probability and then enter the respective Number of Trials (i.e. 5) and Event Probability (i.e. 0.25), click the radio button for Input Constant, and enter the x value (i.e. 1).
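If Minitab is not at hand, the by-hand binomial formula is easy to evaluate in Python. The sketch below is an illustration only: the function names `binomial_pmf` and `binomial_cdf` are made up for this example, and the standard-library function `math.comb` supplies the number of ways to choose x successes out of n trials.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and event probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): add the probabilities for 0, 1, ..., x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Minitab example: n = 20, x = 3, p = 0.4
print(round(binomial_pmf(3, 20, 0.4), 4))

# Flower example: n = 5, p = 0.25
print(round(binomial_pmf(0, 5, 0.25), 3))   # part a, no red-flowered plants
print(round(binomial_cdf(1, 5, 0.25), 3))   # part b, fewer than two
```

The first call corresponds to choosing Probability in the Minitab dialog and the last to choosing Cumulative Probability.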
Expected Value and Standard Deviation for a Binomial Random Variable
The formulas given earlier for discrete random variables could be used, but the good news is that for binomial random variables there are shortcut formulas for the expected value (the mean) and standard deviation:
\[ \text{Expected Value} = np \qquad \text{Standard Deviation} = \sqrt{np(1-p)} \]
After you use these formulas a couple of times, you'll realize they match your intuition. For instance, the expected number of correct (random) guesses at 30 True-False questions is np = (30)(.5) = 15 (half of the questions). For a fair six-sided die rolled 60 times, the expected value of the number of times a 1 is tossed is np = (60)(1/6) = 10. The standard deviations would be \(\sqrt{30(.5)(.5)} = \sqrt{7.5} \approx 2.74\) for the True-False test and \(\sqrt{60(1/6)(5/6)} = \sqrt{8.33} \approx 2.89\) for the die.
Probability Distributions: Continuous Random Variable
Density Curves
Previously we discussed discrete random variables, and now we consider the continuous type. A continuous random variable is such that all values (to any number of decimal places) within some interval are possible outcomes. A continuous random variable has an infinite number of possible values, so we can't assign probabilities to each specific value. If we did, the total probability would be infinite, rather than 1, as it is supposed to be.
To describe probabilities for a continuous random variable, we use a probability density function. A probability density function is a curve such that the area under the curve within any interval of values along the horizontal axis gives the probability for that interval.
Normal Random Variables
The most commonly encountered type of continuous random variable is a normal random variable, which has a symmetric bell-shaped density function. The center point of the distribution is the mean value, denoted by \(\mu\) (pronounced "mew"). The spread of the distribution is determined by the variance, denoted by \(\sigma^2\) (pronounced "sigma squared"), or by the square root of the variance, called the standard deviation and denoted by \(\sigma\) (pronounced "sigma").
Example: Suppose vehicle speeds at a highway location have a normal distribution with mean \(\mu\) = 65 mph and standard deviation \(\sigma\) = 5 mph. In the probability density function for these speeds, the horizontal axis shows speeds and the bell is centered at the mean (65 mph).
Probability for an Interval = Area under the density curve in that interval
The probability that the speed of a randomly selected vehicle will be between 60 and 73 miles per hour is equal to the area under the curve between 60 and 73.
Empirical Rule Review
Recall that in our first lesson we learned that for bell-shaped data, about 95% of the data values will be in the interval mean ± (2 × std. dev). In our example, this is 65 ± (2 × 5), or 55 to 75. So the probability is about 0.95 (about 95%) that a randomly selected vehicle speed is between 55 and 75.
The Empirical Rule also stated that about 99.7% (nearly all) of a bell-shaped data set will be in the interval mean ± (3 × std. dev). This is 65 ± (3 × 5), or 50 to 80, for our example. Notice that this interval roughly gives the complete range of the density curve.
Finding Probabilities for a Normal Random Variable
Remember that the cumulative probability for a value is the probability of being less than or equal to that value. Minitab, Excel, and the TI-83 series of calculators will give the cumulative probability for any value of interest in a specific normal curve.
For our example of vehicle speeds, Minitab output shows that the probability is 0.9452 that the speed of a randomly selected vehicle is less than or equal to 73 mph.
To find this probability, use Calc > Probability Distributions > Normal, specify the mean and standard deviation, and enter the value of interest as "Input Constant."
"Greater than" Probabilities
Sometimes we want to know the probability that a variable has a value greater than some value. For instance, we might want to know the probability that a randomly selected vehicle speed is greater than 73 mph, written P(X > 73).
For our example, probability speed is greater than 73 = 1 - 0.9452 = 0.0548.
The general rule for a "greater than" situation is
P(greater than a value) = 1 - P(less than or equal to the value)
Example: Using Minitab we can find that the probability is 0.1587 that a speed is less than or equal to 60 mph. Thus the probability a speed is greater than 60 mph = 1 - 0.1587 = 0.8413.
"In between" Probabilities
Suppose we want to know the probability that a normal random variable is within a specified interval. For instance, suppose we want to know the probability that a randomly selected speed is between 60 and 73 mph. The simplest approach is to subtract the cumulative probability for 60 mph from the cumulative probability for 73. The answer is
Probability speed is between 60 and 73 = 0.9452 - 0.1587 = 0.7865.
This can be written as P(60 < X < 73) = 0.7865, where X is speed.
The general rule for an "in between" probability is
P(between a and b) = cumulative probability for value b - cumulative probability for value a
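The three rules ("less than or equal", "greater than", and "in between") can also be checked outside Minitab. The following is a sketch using Python's standard-library `statistics.NormalDist`, whose `cdf` method returns a cumulative probability; the variable names are illustrative only.

```python
from statistics import NormalDist

# Vehicle speeds: normal with mean 65 mph and standard deviation 5 mph.
speed = NormalDist(mu=65, sigma=5)

p_le_73 = speed.cdf(73)                      # P(X <= 73), about 0.9452
p_gt_73 = 1 - speed.cdf(73)                  # P(X > 73),  about 0.0548
p_gt_60 = 1 - speed.cdf(60)                  # P(X > 60),  about 0.8413
p_between = speed.cdf(73) - speed.cdf(60)    # P(60 < X < 73), about 0.7865

print(round(p_le_73, 4), round(p_gt_73, 4), round(p_gt_60, 4), round(p_between, 4))
```

Every probability here is built from cumulative probabilities alone, which is why a single cumulative table (or a single `cdf` function) is enough.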
Finding Cumulative Probabilities
Using the Standard Normal Table in the appendix of the textbook
Table A.1 in the textbook gives normal curve cumulative probabilities for standardized scores.
A standardized score (also called a z-score) is
\[ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma} \]
Row labels of Table A.1 give possible z-scores up to one decimal place. The column labels give the second decimal place of the z-score.
The cumulative probability for a value equals the cumulative probability for that value's z-score. Here, probability speed less than or equal to 73 mph = probability z-score less than or equal to 1.60. How did we arrive at this z-score?
Example
In our vehicle speed example, the standardized score for 73 mph is
\[ z = \frac{73 - 65}{5} = 1.60 \]
We look in the ".00" column of the "1.6" row (1.6 plus .00 equals 1.60) to find that the cumulative probability for z = 1.60 is 0.9452, the same value we got earlier as the cumulative probability for speed = 73 mph.
Example
For speed = 60 the z-score is
\[ z = \frac{60 - 65}{5} = -1.00 \]
Table A.1 gives a cumulative probability of .1587 for z = -1.00, and this is also the cumulative probability for a speed of 60 mph.
Example
Suppose pulse rates of adult females have a normal curve distribution with mean \(\mu\) = 75 and standard deviation \(\sigma\) = 8. What is the probability that a randomly selected female has a pulse rate greater than 85? Be careful! Notice we want a "greater than" probability and the interval we want is entirely above average, so we know the answer must be less than 0.5.
If we use Table A.1, the first step is to calculate the z-score of 85:
\[ z = \frac{85 - 75}{8} = 1.25 \]
In Table A.1, use the ".05" column of the "1.2" row to find that the cumulative probability for z = 1.25 is 0.8944.
This is not yet the answer. This is the probability the pulse is less than or equal to 85. We want a "greater than" probability, so the answer is
P(greater than 85) = 1 - P(less than or equal to 85) = 1 - 0.8944 = 0.1056.
Finding Percentiles
We may wish to know the value of a variable that is a specified percentile of the values.
We might ask what speed is the 99.99th percentile of speeds at the highway location in our earlier example.
We might want to know what pulse rate is the 25th percentile of pulse rates.
In Minitab, we can find percentiles using Calc > Probability Distributions > Normal, but we have to make two changes to what we did before: (1) click on the "Inverse Cumulative Probability" radio button (rather than Cumulative Probability) and (2) enter the percentile ranking as a decimal fraction in the "Input Constant" box.
The 99.99th percentile of speeds (when mean = 65 and standard deviation = 5) is about 83.6 mph. In the Minitab output for an inverse cumulative probability, the specified cumulative probability is given first, and then the corresponding speed.
The 25th percentile of pulse rates (when \(\mu\) = 75 and \(\sigma\) = 8) is about 69.6.
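Minitab's "Inverse Cumulative Probability" option has a direct counterpart in Python's standard library: the `inv_cdf` method of `statistics.NormalDist`. The sketch below reproduces the two percentiles above; it is meant only as a cross-check, and the variable names are arbitrary.

```python
from statistics import NormalDist

speed = NormalDist(mu=65, sigma=5)   # vehicle speeds (mph)
pulse = NormalDist(mu=75, sigma=8)   # adult female pulse rates

# Enter the percentile ranking as a decimal fraction, just as in Minitab.
speed_9999 = speed.inv_cdf(0.9999)   # 99.99th percentile of speeds
pulse_25 = pulse.inv_cdf(0.25)       # 25th percentile of pulse rates

print(round(speed_9999, 1), round(pulse_25, 1))
```

Feeding either result back into `cdf` returns the percentile ranking you started with, which is a quick way to confirm that the inverse was taken correctly.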
Normal Approximation to the Binomial
Remember binomial random variables from last week's discussion? A binomial random variable can also be approximated by using the normal random variable methods discussed above. This approximation can take place as long as:
1. The population size is at least 10 times the sample size.
2. np ≥ 10 and n(1 - p) ≥ 10. [These constraints take care of population shapes that are unbalanced because p is too close to 0 or to 1.]
The mean of a binomial random variable is easy to grasp intuitively: say the probability of success for each observation is 0.2 and we make 10 observations. Then on average we should have 10 * 0.2 = 2 successes. The spread of a binomial distribution is not so intuitive, so we will not justify our formula for the standard deviation.
If the sample count X of successes is a binomial random variable for n fixed observations with probability of success p for each observation, then X has a mean and standard deviation, as discussed in section 8.4, of:
\[ \text{Mean} = np \quad \text{and} \quad \text{standard deviation} = \sqrt{np(1-p)} \]
As long as the above two requirements for n and p are satisfied, we can approximate X with a normal random variable having the same mean and standard deviation and use the normal calculations discussed previously in these notes to solve for probabilities for X.
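To get a feel for how good the approximation is, the sketch below compares an exact binomial probability with its normal approximation for the True-False guessing setting (n = 30, p = 0.5). The cutoff of 12 correct guesses is a hypothetical value chosen only for illustration. The last line applies a continuity correction (evaluating the normal curve at 12.5 rather than 12), a standard refinement that these notes do not cover and that is included here only for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.5   # True-False guessing example
k = 12           # hypothetical cutoff chosen for illustration

# Requirement 2 from the notes: np >= 10 and n(1 - p) >= 10.
conditions_ok = n * p >= 10 and n * (1 - p) >= 10

# Exact binomial probability P(X <= k).
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# Normal random variable with the same mean and standard deviation.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(k)
approx_cc = approx_dist.cdf(k + 0.5)   # with continuity correction (not in these notes)

print(conditions_ok, round(exact, 4), round(approx, 4), round(approx_cc, 4))
```

With n this small the plain approximation is only rough; it tightens as np and n(1 - p) grow well past 10.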
Review of Finding Probabilities
The four situations reviewed here are listed below. When reviewing any of them, keep in mind that they apply when:
1. The variable in question follows a normal, or bell-shaped, distribution.
2. If the variable is not standardized, then you need to standardize the value first by
\[ z = \frac{x - \mu}{\sigma} \]
Finding "Less Than" Probability
Finding "Greater Than" Probability
Finding "Between" Probability
Finding "Either/Or" Probability
Population Parameters and Sample Statistics
S1 A survey is carried out at a university to estimate the proportion of undergraduates living at home during the current term.
Population: undergraduates at the university
Parameter: the true proportion of undergraduates who live at home
Sample: the undergraduates surveyed
Statistic: the proportion of the sampled students who live at home, used to estimate the true proportion
S2 A study is conducted to find the average hours college students spend partying on the weekend.
Population: all college students
Parameter: the true mean number of hours college students spend partying on the weekend
Sample: the students sampled for the study
Statistic: the mean hours of weekend partying calculated from the sample
S1 is concerned with estimating a proportion p, where p represents the true (typically unknown) parameter and \(\hat{p}\) [pronounced "p-hat"] represents the statistic calculated from the sample.
S2 is concerned with estimating a mean \(\mu\), where \(\mu\) [pronounced "mew"] represents the true (typically unknown) parameter and \(\bar{x}\) [pronounced "x-bar"] represents the statistic calculated from the sample.
In either case the statistic is used to estimate the parameter. The statistic can vary from sample to sample, but the parameter is understood to be fixed.
The statistic, then, can take on various values depending on the result of repeated random sampling. The distribution of these possible values is known as the sampling distribution.
Overview of symbols
The following table of symbols provides some of the common notation that we will see through the remaining sections.
The difference between "paired" samples and "independent" samples is most easily explained by example: paired samples arise when both observations are taken on the same individual (e.g. measure a person's stress level before and after an exam), whereas independent samples consist of observations from two distinct groups (e.g. measure the stress levels of men and women before an exam and compare these stress levels). An exception is a situation that involves analyzing spouses. In such cases, spousal data are often linked as paired data.
Sampling Distributions of Sample Statistics
Two common statistics are the sample proportion, \(\hat{p}\) (read as p-hat), and the sample mean, \(\bar{x}\) (read as x-bar). Sample statistics are random variables and therefore vary from sample to sample. For instance, consider taking two random samples, each consisting of 5 students, from a class and calculating the mean height of the students in each sample. Would you expect both sample means to be exactly the same? Most likely not. As a result, sample statistics also have a distribution, called the sampling distribution. These sampling distributions, similar to the distributions discussed previously, have a mean and standard deviation. However, we refer to the standard deviation of a sampling distribution as the standard error. Thus, the standard error is simply the standard deviation of a sampling distribution. Oftentimes people will interchange these two terms. This is okay as long as you understand the distinction between the two: standard error refers to sampling distributions and standard deviation refers to probability distributions.
Sampling Distributions for Sample Proportion, \(\hat{p}\)
If numerous repetitions of samples are taken, the distribution of \(\hat{p}\) is said to approximate a normal curve distribution. Alternatively, this can be assumed if BOTH n*p and n*(1 - p) are at least 10. [SPECIAL NOTE: Some textbooks use 15 instead of 10, believing that 10 is too liberal. We will use 10 for our discussions.] Using this, we can estimate the true population proportion, p, by \(\hat{p}\) and the true standard deviation of \(\hat{p}\) by
\[ \text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
where s.e.(\(\hat{p}\)) is interpreted as the standard error of \(\hat{p}\).
Probabilities about the number X of successes in a binomial situation are the same as probabilities about the corresponding proportions.
In general, if np >= 10 and n(1 - p) >= 10, the sampling distribution of \(\hat{p}\) is about normal with a mean of p and standard error
\[ SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} \]
Example. Suppose the proportion of all college students who have used marijuana in the past 6 months is p = .40. For a class of size n = 200, representative of all college students on use of marijuana, what is the chance that the proportion of students who have used marijuana in the past 6 months is less than .32 (or 32%)?
Solution. The mean of the sample proportion \(\hat{p}\) is p and the standard error of \(\hat{p}\) is \(SE(\hat{p}) = \sqrt{p(1-p)/n}\). For this marijuana example, we are given that p = .4. We then determine
\[ SE(\hat{p}) = \sqrt{\frac{(.4)(.6)}{200}} = 0.0346 \]
So, the sample proportion \(\hat{p}\) is about normal with mean p = .40 and SE(\(\hat{p}\)) = 0.0346.
The z-score for .32 is z = (.32 - .40) / 0.0346 = -2.31. Then, using the Standard Normal Table,
Prob(\(\hat{p}\) < .32) = Prob(Z < -2.31) = 0.0104.
Question to ponder: If you observed a sample proportion of .32, would you believe a claim that 40% of college students used marijuana in the past 6 months? Or would you think the proportion is less than .40?
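The steps of this solution (standard error, z-score, cumulative probability) translate directly into a few lines of Python. The sketch below uses the standard-library `NormalDist` in place of the Standard Normal Table, so its answer differs very slightly from the table value because z is not rounded to two decimals; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.40, 200          # true proportion and class size from the example
p_hat = 0.32              # sample proportion of interest

se = sqrt(p * (1 - p) / n)        # standard error of p-hat, about 0.0346
z = (p_hat - p) / se              # z-score, about -2.31
prob = NormalDist().cdf(z)        # P(Z < z) from the standard normal curve

print(round(se, 4), round(z, 2), round(prob, 3))
```

A probability of about 1% is what makes the "question to ponder" interesting: a sample proportion this low would be unusual if the true proportion really were .40.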
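Sampling Distribution of the Sample Mean
The central limit theorem states that if a large enough sample is taken (typically n > 30), then the sampling distribution of \(\bar{x}\) is approximately a normal distribution with a mean of \(\mu\) and a standard deviation of \(\sigma/\sqrt{n}\). Since in practice we usually do not know \(\mu\) or \(\sigma\), we estimate the mean by \(\bar{x}\) and the standard deviation \(\sigma/\sqrt{n}\) by \(s/\sqrt{n}\). In this case s is the estimate of \(\sigma\) and is the standard deviation of the sample. The expression \(s/\sqrt{n}\) is known as the standard error of the mean, labeled s.e.(\(\bar{x}\)).
Simulation: Generate 500 samples, each consisting of the heights of 4 men. Assume the distribution of male heights is normal with mean \(\mu\) = 70" and standard deviation \(\sigma\) = 3.0". Then find the mean of each of the 500 samples of size 4.
Here are the first 10 sample means:
70.4  72.0  72.3  69.9  70.5  70.0  70.5  68.1  69.2  71.8
Theory says that the mean of \(\bar{x}\) = \(\mu\) = 70, which is also the population mean, and SE(\(\bar{x}\)) = \(\sigma/\sqrt{n} = 3/\sqrt{4}\) = 1.50.
Simulation shows: Average (500 \(\bar{x}\)'s) = 69.957 and SE (of 500 \(\bar{x}\)'s) = 1.496
Change the sample size from n = 4 to n = 25 and get descriptive statistics:
Theory says that the mean of \(\bar{x}\) = \(\mu\) = 70, which is also the population mean, and SE(\(\bar{x}\)) = \(\sigma/\sqrt{n} = 3/\sqrt{25}\) = 0.60.
Simulation shows: Average (500 \(\bar{x}\)'s) = 69.983 and SE (of 500 \(\bar{x}\)'s) = 0.592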
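A simulation like the one described can be repeated with Python's standard library. This is a sketch under the same assumptions (normal heights with mean 70 and standard deviation 3.0, 500 samples per sample size); the function name `sample_means` and the fixed seed are choices made for this illustration, and because the draws are random the output will be close to, but not identical with, the Minitab figures quoted above.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so that reruns give the same numbers

def sample_means(n, reps=500, mu=70, sigma=3.0):
    """Draw `reps` samples of size n from a normal population; return each sample's mean."""
    return [mean(random.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]

for n in (4, 25):
    means = sample_means(n)
    theory_se = 3.0 / n**0.5
    # Average of the 500 sample means, their standard deviation, and the theoretical SE.
    print(n, round(mean(means), 3), round(stdev(means), 3), round(theory_se, 2))
```

The standard deviation of the simulated sample means should land near 1.50 for n = 4 and near 0.60 for n = 25, matching sigma divided by the square root of n.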
Sampling Distribution of Sample Mean \(\bar{x}\) from a Non-Normal Population
Simulation: A histogram of the number of CDs owned by PSU students shows a distribution that is strongly skewed to the right.
Assume the population mean number of CDs owned is \(\mu\) = 84 and \(\sigma\) = 96.
Let's obtain 500 samples of size 4 from this population and look at the distribution of the 500 x-bars:
Theory says that the mean of \(\bar{x}\) = \(\mu\) = 84, which is also the population mean, and the SE(\(\bar{x}\)) = \(\sigma/\sqrt{n} = 96/\sqrt{4}\) = 48.
sample mean.
iii. The Central Limit Theorem is important because it enables us to calculate probabilities about sample means.
Example. Find the approximate probability that the average number of CDs owned when 100 students are asked is between 70 and 90.
Solution. Since the sample size is greater than 30, we assume the sampling distribution of \(\bar{x}\) is about normal with mean \(\mu\) = 84 and SE(\(\bar{x}\)) = \(\sigma/\sqrt{n} = 96/\sqrt{100}\) = 9.6. We are asked to find Prob(70 < \(\bar{x}\) < 90). The z-scores for the two values are
for 90: z = (90 - 84) / 9.6 = 0.625 and for 70: z = (70 - 84) / 9.6 = -1.46. From tables of the normal distribution we get P(-1.46 < Z < 0.625) = .734 - .072 = .662.
Suppose the sample size was 1600 instead of 100. Then the distribution of \(\bar{x}\) would be about normal with mean 84 and standard deviation \(\sigma/\sqrt{n} = 96/\sqrt{1600}\) = 96/40 = 2.4. From the empirical rule we know that almost all x-bars for samples of size 1600 will be in the interval 84 ± (3)(2.4), that is, 84 ± 7.2, or between 76.8 and 91.2. The Law of Large Numbers says that as we increase the sample size, the probability that the sample mean is close to the population mean approaches 1.00!
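The CD example can be checked numerically in the same way. The sketch below assumes, as the solution does, that the sampling distribution of the sample mean is about normal; it uses the standard-library `NormalDist`, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 84, 96   # population mean and standard deviation of CDs owned

# n = 100: sampling distribution of x-bar is about normal with SE = 96 / 10 = 9.6.
xbar_100 = NormalDist(mu, sigma / sqrt(100))
prob = xbar_100.cdf(90) - xbar_100.cdf(70)   # P(70 < x-bar < 90), about 0.66

# n = 1600: SE = 96 / 40 = 2.4; almost all sample means fall within 3 SEs of the mean.
se_1600 = sigma / sqrt(1600)
interval = (mu - 3 * se_1600, mu + 3 * se_1600)

print(round(prob, 3), round(interval[0], 1), round(interval[1], 1))
```

Comparing the two standard errors (9.6 versus 2.4) shows how a sixteen-fold increase in sample size cuts the spread of the sample mean to one quarter.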
APPLET
Here is an applet developed by the folks at Rice University that simulates a "sampling distribution". The object here is to give you a chance to explore various aspects of sampling distributions. When the applet begins, a histogram of a normal distribution is displayed at the top of the screen.
The distribution portrayed at the top of the screen is the population from which samples are taken. The mean of the distribution is indicated by a small blue line and the median is indicated by a small purple line. Since the mean and median are the same, the two lines overlap. The red line extends from the mean one standard deviation in each direction. Note the correspondence between the colors used on the histogram and the statistics displayed to the left of the histogram.
The second histogram displays the sample data. This histogram is initially blank. The third and fourth histograms show the distribution of statistics computed from the sample data. The number of samples (replications) that the third and fourth histograms are based on is indicated by the label "Reps=."
Basic Operation
The simulation is set to initially sample five numbers from the population, compute the mean of the five numbers, and plot the mean. Click the "Animated sample" button and you will see the five numbers appear in the histogram. The mean of the five numbers will be computed and the mean will be plotted in the third histogram. Do this several times to see the distribution of means begin to be formed. Once you see how this works, you can speed things up by taking 5, 1,000, or 10,000 samples at a time.
Notice that as you increase the sample size, regardless of the shape you create, the distribution (i.e. look at the histogram) becomes more bell-shaped. This is the theoretical meaning behind the central limit theorem: as the sample size increases, even when the population from which the sample originated is not normal (e.g. uniform or chi-square), the distribution of the sample mean will approximate a normal distribution.
Review of Sampling Distributions
In the later part of the last lesson we discussed finding the probability for a continuous random variable that followed a normal distribution. We did so by converting the observed score to a standardized z-score and then applying the Standard Normal Table. For example:
IQ scores are normally distributed with mean, \(\mu\), of 110 and standard deviation, \(\sigma\), equal to 25. Let the random variable X be a randomly chosen score. Find the probability of a randomly chosen score exceeding 100. That is, find P(X > 100). To solve,
\[ z = \frac{100 - 110}{25} = -0.40, \qquad P(X > 100) = P(Z > -0.40) = 1 - 0.3446 = 0.6554 \]
But what about situations when we have more than one observation, that is, when the sample size is greater than 1? In practice, usually just one random sample is taken from a population of quantitative or qualitative values and the statistic (\(\bar{x}\), the sample mean, or \(\hat{p}\), the sample proportion, respectively) is measured one time only. For instance, if we wanted to estimate what proportion of PSU students agreed with the President's explanation of the rising tuition costs, we would take only one random sample, of some size, and use this sample to make an estimate. We would not continue to take samples and make estimates, as this would be costly and inefficient. For samples taken at random, the sample mean (or sample proportion) is a random variable. To get an idea of how such a random variable behaves, we consider this variable's sampling distribution, which we discussed previously in this lesson.
Consider the population of possible rolls X for a single six-sided die, which has a mean, \(\mu\), equal to 3.5 and a standard deviation, \(\sigma\), equal to 1.7. [If you do not believe this, recall our discussion of probabilities for discrete random variables. For the six-sided die you have six possible outcomes, each with the same 1/6 probability of being rolled. Applying your recent knowledge, calculate the mean and standard deviation and see what you get!] If we rolled the die twice, the sample mean, \(\bar{x}\), of these two rolls can take on various values based on what numbers come up. Since these results are subject to the laws of chance, the sample mean can be defined as a random variable. We can apply what we learned at the beginning of the semester and summarize its distribution by its center, spread, and shape.
1. Sometimes the mean roll of 2 dice will be less than 3.5, other times greater than 3.5. It should be just as likely to get a lower than average mean as it is to get a higher than average mean, so the sampling distribution of the sample mean should be centered at 3.5.
2. For the roll of 2 dice, the sample mean could be spread all the way from 1 to 6 (think of two "1s" or two "6s" being tossed).
3. The most likely mean roll from the two dice is 3.5 (all combinations where the sum is 7). The lower or higher the mean roll, the less likely it is to occur. So the shape of the distribution of the sample means from two rolls would take the form of a triangle.
If we increase the sample size, i.e. the number of rolls, to say 10, then this sample mean is also a random variable.
1. Sometimes the mean roll of 10 dice will be less than 3.5 and sometimes greater than 3.5. Similar to when we rolled the die 2 times, the sampling distribution of \(\bar{x}\) for 10 rolls should be centered at 3.5.
2. For 10 rolls, the distribution of the sample mean would not be as spread out as that for 2 rolls. Getting a "1" or a "6" on all 10 rolls will almost never occur.
3. The most likely mean roll is still 3.5, with lower or higher mean rolls getting progressively less likely. But now there is a much better chance for the sample mean of the 10 rolls to be close to 3.5, and a much worse chance for this sample mean to be near 1 or 6. Therefore, the shape of the sampling distribution for 10 rolls bulges at 3.5 and tapers off at either end. Ta-da! The shape looks bell-shaped, or normal!
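The center, spread, and shape claims for 2 and 10 rolls can be verified exactly rather than by simulation. The sketch below builds the exact distribution of the mean of n fair dice by adding one die at a time; the function name `mean_distribution` is made up for this illustration.

```python
from collections import Counter
from fractions import Fraction

def mean_distribution(n_rolls):
    """Exact distribution of the mean of n_rolls fair six-sided dice."""
    sums = Counter({0: 1})                 # one way to have a total of 0 before rolling
    for _ in range(n_rolls):
        nxt = Counter()
        for total, ways in sums.items():
            for face in range(1, 7):       # add one more die to every running total
                nxt[total + face] += ways
        sums = nxt
    return {Fraction(total, n_rolls): Fraction(ways, 6**n_rolls)
            for total, ways in sums.items()}

for n in (2, 10):
    dist = mean_distribution(n)
    center = sum(m * p for m, p in dist.items())
    # Center of the distribution, P(mean = 3.5), and P(mean = 1).
    print(n, center, float(dist[Fraction(7, 2)]), float(dist[Fraction(1)]))
```

For two rolls the probabilities rise in equal steps from 1/36 at a mean of 1 to 6/36 at 3.5 and fall back again (the triangle), while for ten rolls a mean of 1 has probability 1 in 6 to the 10th power, which is "almost never" in a very literal sense.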
This die example illustrates the general result of the central limit theorem: regardless of the population distribution (the distribution for the die is called a uniform distribution because each outcome is equally likely), the distribution of the sample mean will approach normal as the sample size increases, and the sample mean, \(\bar{x}\), has the following characteristics:
1. The distribution of \(\bar{x}\) is centered at \(\mu\).
2. The spread of \(\bar{x}\) can be measured by its standard deviation, \(\sigma_{\bar{x}}\), equal to \(\sigma/\sqrt{n}\).
Example
Assume women's heights are normally distributed with \(\mu\) = 64.5 inches and \(\sigma\) = 2.5 inches. Pick one woman at random. According to the Empirical Rule, the probability is:
68% that her height X is between 62 inches and 67 inches
95% that her height X is between 59.5 inches and 69.5 inches
99.7% that her height X is between 57 inches and 72 inches
Now pick a random sample of 25 women. The sample mean height, \(\bar{x}\), is normal with expected value (i.e. mean) of 64.5 inches and standard deviation, \(\sigma/\sqrt{n} = 2.5/\sqrt{25}\), equal to 0.5. The probability is:
68% that their sample mean height \(\bar{x}\) is between 64 inches and 65 inches
95% that their sample mean height \(\bar{x}\) is between 63.5 inches and 65.5 inches
99.7% that their sample mean height \(\bar{x}\) is between 63 inches and 66 inches
Using the Standard Normal Table for more exact probabilities instead of the Empirical Rule, what is the probability that the sample mean height of 25 women is less than 63.75 inches?
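One way to answer this question is sketched below: standardize 63.75 using the standard deviation of the sample mean (0.5, not 2.5) and look up the cumulative probability. Python's standard-library `NormalDist` stands in for the Standard Normal Table here; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 64.5, 2.5, 25

# Sampling distribution of the sample mean height for 25 women.
xbar = NormalDist(mu=mu, sigma=sigma / sqrt(n))   # standard deviation 0.5

z = (63.75 - mu) / xbar.stdev    # (63.75 - 64.5) / 0.5 = -1.5
prob = xbar.cdf(63.75)           # P(x-bar < 63.75), about 0.0668

print(z, round(prob, 4))
```

The same height would be quite ordinary for a single woman (z = -0.3 using sigma = 2.5), which shows how much less a mean of 25 observations varies than one observation.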
Proportions
Similar laws apply for proportions. The differences are:
1. For the Central Limit Theorem to apply, we require that both np >= 10 and n(1 - p) >= 10, where p is the true population proportion. If p is unknown then we can substitute the sample proportion, \(\hat{p}\).
2. The distribution of the sample proportion, \(\hat{p}\), will have a mean equal to p and a standard deviation of \(\sqrt{p(1-p)/n}\).
To find probabilities associated with some \(\hat{p}\) we follow calculations similar to those for sample means:
\[ z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \]
2007 The Pennsylvania State University. All rights reserved.