5 Describing Populations
In this chapter we describe populations and samples using the language of probability.
5.1 Population
The range of possible values will be called the population.
We define a random variable to be a random number drawn from a population, and the
distribution of a random variable to be a description of its range and the
probabilities of its values.
5.1.1 Discrete random variables
For a discrete random variable $X$ and each value $k$ in its range, $P(X=k) > 0$ and $P(X=k) \le 1$. Furthermore, as $X$ has some value, we have $\sum_k P(X=k) = 1$.
#spike plot
k = 0:4
p = c(1,2,3,2,1)/9
plot(k, p, type="h", xlab="k", ylab="probability", ylim=c(0,max(p))) # argument type="h" for spike plot
points(k, p, pch=16, cex=2) # add the balls to top of spike
#Using sample() to generate random values

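k = 0:2
p = c(1,2,1)/4
sample(k, size=1, prob=p) # one random value from the distribution
sample(k, size=1, prob=p) # a repeated call may give a different value
sample(1:6, size=1) + sample(1:6, size=1) # roll a pair of dice and add
sample(1:6, size=1) + sample(1:6, size=1) # roll again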
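Repeating such draws many times gives a picture of the whole distribution. The following is a minimal sketch, not part of the original notes: the seed, the simulation size n.sim, and the variable names are all assumptions chosen for illustration. It compares the simulated frequencies for the sum of two dice with the exact probabilities.

```r
set.seed(1)       # arbitrary seed, only so the run is reproducible
n.sim = 10000     # number of simulated rolls (an assumption)

# roll two dice n.sim times and add them
rolls = sample(1:6, size=n.sim, replace=TRUE) + sample(1:6, size=n.sim, replace=TRUE)

# estimated P(X=k) for k = 2,...,12
est = table(factor(rolls, levels=2:12)) / n.sim

# exact probabilities for the sum of two fair dice: 1/36, 2/36, ..., 6/36, ..., 1/36
exact = c(1:6, 5:1) / 36

round(rbind(estimated=as.vector(est), exact=exact), 3)
```

The two rows should agree to about two decimal places; the agreement improves as n.sim grows.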
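#The mean and standard deviation of a discrete random variable

population mean: $\mu = \sum_k k \, P(X=k)$
population variance: $\sigma^2 = \sum_k (k-\mu)^2 \, P(X=k)$; the population standard deviation is $\sigma = \sqrt{\sigma^2}$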
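As a small worked example, the mean and standard deviation of a discrete random variable can be computed directly as probability-weighted sums. This sketch reuses the values k and p from the spike plot above; the names mu, sigma2, and sigma are chosen here for illustration.

```r
k = 0:4
p = c(1,2,3,2,1)/9               # P(X=k) for each k; these sum to 1

mu = sum(k * p)                  # population mean: weighted average of the values
sigma2 = sum((k - mu)^2 * p)     # population variance: weighted squared deviation
sigma = sqrt(sigma2)             # population standard deviation

c(mean=mu, variance=sigma2, sd=sigma)
```

For this symmetric distribution the mean is the centre value, 2, and the variance is 12/9.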
5.1.2 Continuous random variables

The p.d.f. and c.d.f.
The mean and standard deviation of a continuous random variable
Quantiles of a continuous random variable
5.1.3 Sampling from a population
A sequence that is both independent and identically distributed is called an i.i.d.
sequence, or a random sample.
Random samples generated by sample()
## toss a coin 10 times. Heads=1, tails=0
sample(0:1, size=10, replace=TRUE)
## roll a die 10 times
sample(1:6, size=10, replace=TRUE)
## sum of two dice, rolled 10 times
sample(1:6, size=10, replace=TRUE) + sample(1:6, size=10, replace=TRUE)
## a random sample can also be produced by specifying the probabilities using prob=
sample(0:1, size=10, replace=TRUE, prob=c(1-.62,.62))
5.1.4 Sampling distributions

The distribution of a statistic is known as its sampling distribution.
The sampling distribution of a statistic can be quite complicated. However, for many
common statistics, properties of the sampling distribution are known and are related to
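the population parameters. For example, the sample mean of a random sample has mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where $\mu$ and $\sigma$ are the population mean and standard deviation and $n$ is the sample size.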
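The link between the sampling distribution of the sample mean and the population parameters can be checked by simulation. The sketch below is illustrative only: the normal population, the values of mu, sigma, and n, the seed, and the number of replications are all assumptions. The simulated sample means should centre on mu and have a spread close to sigma/sqrt(n).

```r
set.seed(1)                       # arbitrary seed for reproducibility
mu = 100; sigma = 10; n = 25      # assumed population parameters and sample size

# draw 10,000 random samples of size n and record each sample mean
xbar = replicate(10000, mean(rnorm(n, mean=mu, sd=sigma)))

mean(xbar)                        # close to mu
sd(xbar)                          # close to sigma/sqrt(n)
sigma / sqrt(n)
```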
5.2 Families of distributions

5.2.1 The d, p, q, and r functions
R has four types of functions for getting information about a family of distributions.
The d functions return the p.d.f. of the distribution, whereas the p functions return the
c.d.f. of the distribution. The q functions return the quantiles, and the r functions
return random samples from a distribution.
dunif(x=1, min=0, max=3)
punif(q=2, min=0, max=3)
qunif(p=1/2, min=0, max=3)
runif(n=1, min=0, max=3)
The arguments to these functions can be vectors:

ps = seq(0,1,by=.2) # vector
names(ps) = as.character(seq(0,100,by=20)) # give names
qunif(ps, min=0, max=1)
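The d, p, and q functions are tied together: the c.d.f. is the area under the p.d.f., and the quantile function inverts the c.d.f. A minimal sketch of these relationships for the Uniform(0,3) family used above follows; the variable names are chosen here for illustration, and integrate() is used only as a numerical check.

```r
# d: height of the Uniform(0,3) density at x = 1
d = dunif(1, min=0, max=3)

# p: area under the density to the left of 2
p = punif(2, min=0, max=3)

# q: inverts p, so the quantile for probability p is 2 again
q = qunif(p, min=0, max=3)

# numerical check that p is the area under d from 0 to 2
# (extra arguments min= and max= are passed on to dunif)
area = integrate(dunif, lower=0, upper=2, min=0, max=3)$value

c(d=d, p=p, q=q, area=area)
```

The same pattern holds for other families, for example dnorm(), pnorm(), and qnorm().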
5.2.2 Binomial, normal, and some other named distributions

Bernoulli random variables
n = 10; p = 1/4
sample(0:1, size=n, replace=TRUE,prob=c(1-p,p))
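Bernoulli values can also be drawn with rbinom() by setting size=1, since a Bernoulli random variable is a binomial with a single trial. The following sketch assumes the same n and p as above; the seed and the name x are illustrative.

```r
set.seed(1)                      # arbitrary seed for reproducibility
n = 10; p = 1/4

x = rbinom(n, size=1, prob=p)    # n Bernoulli(p) values: 1 = success, 0 = failure
x
sum(x)                           # number of successes in n trials
```

The sum of the n Bernoulli values is a single draw from a Binomial(n, p) distribution.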
Binomial random variables
Example 5.5: Tossing ten coins Toss a coin ten times. Let X be the number of heads. If
the coin is fair, X has a Binomial(10,1/2) distribution.
The probability that X=5 can be found directly from the distribution with the choose()
function:
choose(10,5)*(1/2)^5 * (1/2)^(10-5)
dbinom(5, size=10, prob=1/2)
The probability that there are six or fewer heads, $P(X \le 6) = \sum_{k \le 6} P(X=k)$, can be given
either of these two ways:
sum(dbinom(0:6,size=10,prob=1/2))
pbinom(6,size=10,prob=1/2)
If we wanted the probability of seven or more heads, we could answer using
$P(X \ge 7) = 1 - P(X \le 6)$, or using the extra argument lower.tail=FALSE. This returns $P(X > k)$
rather than $P(X \le k)$.
sum(dbinom(7:10,size=10,prob=1/2))
1 - pbinom(6,size=10,prob=1/2)
pbinom(6,size=10,prob=1/2, lower.tail=FALSE) # k=6 not 7!
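The three routes to the probability of seven or more heads (summing the p.d.f., taking the complement of the c.d.f., and asking for the upper tail) can be compared side by side. This is a small sketch with illustrative variable names; all.equal() is used because floating-point results may differ in the last digits.

```r
n = 10; p = 1/2

by.sum   = sum(dbinom(7:n, size=n, prob=p))             # add up P(X=7),...,P(X=10)
by.compl = 1 - pbinom(6, size=n, prob=p)                # 1 - P(X <= 6)
by.tail  = pbinom(6, size=n, prob=p, lower.tail=FALSE)  # P(X > 6)

c(by.sum, by.compl, by.tail)
all.equal(by.sum, by.compl)
all.equal(by.sum, by.tail)
```

All three equal 176/1024, since there are 120 + 45 + 10 + 1 = 176 ways to get seven or more heads out of 2^10 equally likely outcomes.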
A spike plot (Figure 5.4) of the distribution can be produced using dbinom():
heights=dbinom(0:10,size=10,prob=1/2)
plot(0:10, heights, type="h",main="Spike plot of X", xlab="k", ylab="p.d.f.")
points(0:10, heights, pch=16,cex=2)
Normal random variables
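A normal random variable with mean $\mu$ and standard deviation $\sigma$ is below $\mu + z\sigma$ with the same probability that a standard normal is below $z$. We can verify this with the p function:

pnorm(1.5, mean=0, sd=1)
pnorm(4.75, mean=4, sd=1/2) # same z-score as above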
How much area is no more than one standard deviation from the mean? We use pnorm()
to find this:
pnorm(1) - pnorm(-1)
1 - 2*pnorm(-2) # subtract area of two tails
diff(pnorm(c(-3,3))) # use diff to subtract
Example 5.8: Testing the rules of thumb We can test the rules of thumb using random
samples from the normal distribution as provided by rnorm().
First we create 1,000 random samples and assign them to res:
mu = 100; sigma = 10
res = rnorm(1000,mean = mu,sd = sigma)
k = 1; sum(res > mu - k*sigma & res < mu + k*sigma)
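The same check can be run for k = 1, 2, and 3 at once and set against the exact areas from pnorm(). This is a sketch under stated assumptions: it uses a larger sample of 10,000 (rather than 1,000) to reduce simulation noise, an arbitrary seed, and proportions instead of counts.

```r
set.seed(1)                               # arbitrary seed for reproducibility
mu = 100; sigma = 10
res = rnorm(10000, mean=mu, sd=sigma)     # larger sample than above (an assumption)

ks = 1:3
# proportion of values within k standard deviations of the mean
observed = sapply(ks, function(k) mean(res > mu - k*sigma & res < mu + k*sigma))
# exact normal areas within k standard deviations
exact = pnorm(ks) - pnorm(-ks)

round(rbind(k=ks, observed=observed, exact=exact), 4)
```

The exact row gives the familiar rule-of-thumb values of roughly 68%, 95%, and 99.7%.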
5.2.3 Popular distributions to describe populations

Uniform distribution
res = runif(50, min=0, max=10)
## fig= setting uses bottom 35% of diagram
par(fig=c(0,1,0,.35))
boxplot(res,horizontal=TRUE, bty="n", xlab="uniform sample")
## fig= setting uses top 75% of figure
par(fig=c(0,1,.25,1), new=TRUE)
hist(res, prob=TRUE, main="", col=gray(.9))
lines(density(res),lty=2)
curve(dunif(x, min=0, max=10), lwd=2, add=TRUE)
rug(res)
Exponential distribution
res = rexp(50, rate=1/5)
## boxplot
par(fig=c(0,1,0,.35))
boxplot(res, horizontal=TRUE, bty="n",xlab="exponential sample")
## histogram
par(fig=c(0,1,.25,1), new=TRUE)
## store values, then find largest y one to set ylim=
tmp.hist = hist(res, plot=FALSE)
tmp.edens = density(res)
tmp.dens = dexp(0, rate=1/5)
y.max = max(tmp.hist$density, tmp.edens$y, tmp.dens)
## make plots
hist(res, ylim=c(0,y.max), prob=TRUE, main="",col=gray(.9))
lines(density(res), lty=2)
curve(dexp(x, rate=1/5), lwd=2, add=TRUE)
rug(res)
Lognormal distribution
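A lognormal sample can be generated with rlnorm(), whose parameters meanlog and sdlog describe the distribution on the log scale. The sketch below is illustrative and not from the original notes: the parameter values, sample size, and seed are assumptions. It checks that the logarithm of the sample behaves like a normal sample.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
res = rlnorm(10000, meanlog=0, sdlog=1)      # assumed parameters

summary(res)                                 # all values positive, long right tail
c(mean=mean(log(res)), sd=sd(log(res)))      # close to meanlog and sdlog
```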
Quantiles of the t, F, and chi-squared distributions are found with the q functions:
qt(c(.025,.975), df=10) # 10 degrees of freedom
qf(c(.025,.975), df1=10, df2=5) # 10 and 5 degrees of freedom
qchisq(c(.025,.975), df=10) # 10 degrees of freedom
5.3 The central limit theorem

5.3.1 Normal parent population
That is, with greater and greater probability, the random value of $\bar{X}$ is close to the mean, $\mu$, of the parent population. This phenomenon of the sample average concentrating on the
mean is known as the law of large numbers.
For example, if adult male heights are normally distributed with mean 70.2 inches and standard
deviation 2.89 inches, the average height of 25 randomly chosen males is again normal
with mean 70.2 but standard deviation 1/5 as large. The probability that the sample
average is between 70 and 71 is found with
mu=70.2; sigma=2.89; n=25
diff( pnorm(70:71, mu, sigma/sqrt(n)) )
[1] 0.5522
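This probability can be checked by simulating the sampling process directly. The following is a minimal sketch; the seed, the number of replications, and the variable names are assumptions made for illustration.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
mu = 70.2; sigma = 2.89; n = 25

# average height of 25 randomly chosen males, repeated 10,000 times
xbar = replicate(10000, mean(rnorm(n, mean=mu, sd=sigma)))

simulated = mean(xbar > 70 & xbar < 71)      # proportion of averages between 70 and 71
exact = diff(pnorm(70:71, mu, sigma/sqrt(n)))

c(simulated=simulated, exact=exact)
```

The simulated proportion should land within a percentage point or two of the exact value.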
5.3.2 Nonnormal parent population

The central limit theorem states that for any parent population with mean $\mu$ and
standard deviation $\sigma$, the sampling distribution of $\bar{X}$ for large n satisfies
$P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le b\right) \approx P(Z \le b)$, where $Z$ is a standard normal random variable.
Figure 5.9 illustrates the central limit theorem for data with an Exponential(1)
distribution. This parent population and simulations of the distribution of $\bar{X}$ for n=5, 25,
and 100 are drawn. As n gets bigger, the sampling distribution of $\bar{X}$ becomes more and
more bell shaped.
pnorm(.9, mean=1, sd = 1/sqrt(20))

[1] 0.3274
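The call above can be read as a normal approximation to the probability that the average of n=20 values, from a population with mean 1 and standard deviation 1, is below 0.9. The sketch below assumes that reading and an Exponential(1) parent population; the seed and the number of replications are also assumptions. It compares the approximation with a simulation and with the exact value, using the fact that a sum of n independent Exponential(1) values has a Gamma(n, 1) distribution.

```r
set.seed(1)                                        # arbitrary seed for reproducibility
n = 20

# sample mean of n Exponential(1) values, repeated 20,000 times
xbar = replicate(20000, mean(rexp(n, rate=1)))

simulated = mean(xbar < 0.9)                       # simulated P(sample mean < 0.9)
normal.approx = pnorm(.9, mean=1, sd=1/sqrt(n))    # central limit theorem approximation
exact = pgamma(0.9 * n, shape=n, rate=1)           # P(sum < 0.9*n) for a Gamma(n, 1) sum

c(simulated=simulated, normal.approx=normal.approx, exact=exact)
```

With n=20 the normal approximation is close but not exact, because the sampling distribution of the mean is still somewhat right-skewed.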
