In this chapter we describe populations and samples using the language of probability.
5.1 Population
The range of values that a measurement can take will be called the population.
We define a random variable to be a random number drawn from a population, and the
distribution of a random variable to be a description of its range and the
probabilities of its values.
5.1.1 Discrete random variables
$P(X=k) > 0$ and $P(X=k) \le 1$. Furthermore, as X has some value, we have $\sum_k P(X=k) = 1$.
# spike plot
k = 0:4
p = c(1,2,3,2,1)/9
plot(k, p, type="h", xlab="k", ylab="probability", ylim=c(0,max(p))) # argument type="h" for spike plot
points(k, p, pch=16, cex=2) # add the balls to top of spike
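As a quick check on the spike-plot example, the sketch below confirms that p satisfies the two conditions for a discrete distribution and uses cumsum() to get cumulative probabilities. The variable names valid_range, sums_to_one, and cum_p are ours, and comparing with all.equal() rather than == is a choice made to avoid floating-point surprises.

```r
k = 0:4
p = c(1, 2, 3, 2, 1) / 9

# every probability is positive and at most 1
valid_range = all(p > 0 & p <= 1)

# the probabilities sum to 1 (compared with a tolerance, not ==)
sums_to_one = isTRUE(all.equal(sum(p), 1))

# cumulative probabilities P(X <= k) for k = 0, ..., 4
cum_p = cumsum(p)
names(cum_p) = k
cum_p
```

For instance, cum_p["2"] gives P(X <= 2) = (1+2+3)/9.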
population variation:
# a random sample can also be produced by specifying the probabilities using prob=:
sample(0:1, size=10, replace=TRUE, prob=c(1-.62,.62))
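To see that prob= behaves as intended, one can draw a much larger sample and look at the proportion of 1s, which should settle near .62. This is only a sketch: the seed, the sample size n, and the name prop_ones are arbitrary choices of ours.

```r
set.seed(1)   # arbitrary seed, used only so the run is reproducible
n = 10000     # assumed sample size, much larger than the 10 used above

# draw 0s and 1s with P(0) = .38 and P(1) = .62
x = sample(0:1, size = n, replace = TRUE, prob = c(1 - .62, .62))

# the proportion of 1s estimates the probability of a 1
prop_ones = mean(x)
prop_ones
```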
Example 5.5: Tossing ten coins Toss a coin ten times. Let X be the number of heads. If
the coin is fair, X has a Binomial(10,1/2) distribution.
The probability that X=5 can be found directly from the distribution with the choose()
function, or more conveniently with dbinom():
choose(10,5)*(1/2)^5 * (1/2)^(10-5)
dbinom(5, size=10, prob=1/2)
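The agreement between the formula and dbinom() holds for every k, not just k=5. A minimal sketch, in which binom_pmf is a hypothetical helper of ours and not a base R function:

```r
# hypothetical helper: P(X = k) for a Binomial(n, p) variable, from the formula
binom_pmf = function(k, n, p) choose(n, k) * p^k * (1 - p)^(n - k)

by_hand  = binom_pmf(0:10, n = 10, p = 1/2)   # formula, all k at once
built_in = dbinom(0:10, size = 10, prob = 1/2) # R's built-in function

# TRUE when the two vectors agree up to floating-point tolerance
same = isTRUE(all.equal(by_hand, built_in))
same
```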
The probability that there are six or fewer heads, $P(X \le 6) = \sum_{k \le 6} P(X=k)$, can be given
in either of these two ways:
sum(dbinom(0:6,size=10,prob=1/2))
pbinom(6, size=10, prob=1/2)
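The complementary probability, seven or more heads, follows in the same way. The sketch below shows three equivalent routes; the variable names are ours.

```r
p_at_most_6 = pbinom(6, size = 10, prob = 1/2)

# P(X > 6) by the complement rule
p_more_than_6 = 1 - p_at_most_6

# the same upper tail, requested directly from pbinom()
p_upper = pbinom(6, size = 10, prob = 1/2, lower.tail = FALSE)

# or by summing the individual probabilities for k = 7, ..., 10
p_summed = sum(dbinom(7:10, size = 10, prob = 1/2))

c(p_more_than_6, p_upper, p_summed)
```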
A spike plot (Figure 5.4) of the distribution can be produced using dbinom():
heights=dbinom(0:10,size=10,prob=1/2)
plot(0:10, heights, type="h",main="Spike plot of X", xlab="k", ylab="p.d.f.")
points(0:10, heights, pch=16,cex=2)
How much area is no more than one standard deviation from the mean? We use pnorm()
to find this:
pnorm(1) - pnorm(-1)
1 - 2*pnorm(-2) # subtract area of two tails
diff(pnorm(c(-3,3))) # use diff to subtract
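The three calculations for one, two, and three standard deviations can be collected into one vectorized call. This is a sketch; within_k is a hypothetical helper name of ours.

```r
# hypothetical helper: area under the standard normal density
# within k standard deviations of the mean
within_k = function(k) pnorm(k) - pnorm(-k)

areas = within_k(1:3)
round(areas, 4)
```

The rounded values are 0.6827, 0.9545, and 0.9973, which is where the familiar 68%, 95%, 99.7% rules of thumb come from.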
Example 5.8: Testing the rules of thumb We can test the rules of thumb using random
samples from the normal distribution as provided by rnorm().
First we create 1,000 random samples and assign them to res:
mu = 100; sigma = 10
res = rnorm(1000,mean = mu,sd = sigma)
k = 1; sum(res > mu - k*sigma & res < mu + k*sigma)
k = 2; sum(res > mu - k*sigma & res < mu + k*sigma)
k = 3; sum(res > mu - k*sigma & res < mu + k*sigma)
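The counts are easier to judge as proportions set beside the theoretical areas. A sketch under our own assumptions: the seed is arbitrary, and observed, expected, and comparison are names we introduce.

```r
set.seed(2)   # arbitrary seed for reproducibility
mu = 100; sigma = 10
res = rnorm(1000, mean = mu, sd = sigma)

# proportion of the sample within k standard deviations of mu, k = 1, 2, 3
observed = sapply(1:3, function(k) mean(abs(res - mu) < k * sigma))

# theoretical areas for the normal distribution
expected = pnorm(1:3) - pnorm(-(1:3))

comparison = rbind(observed, expected)
comparison
```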
Exponential distribution
res = rexp(50, rate=1/5)
## boxplot
par(fig=c(0,1,0,.35))
boxplot(res, horizontal=TRUE, bty="n",xlab="exponential sample")
## histogram
par(fig=c(0,1,.25,1), new=TRUE)
## store values, then find largest y one to set ylim=
tmp.hist = hist(res, plot=FALSE)
tmp.edens = density(res)
tmp.dens = dexp(0, rate=1/5)
y.max = max(tmp.hist$density, tmp.edens$y, tmp.dens)
## make plots
hist(res, ylim=c(0,y.max), prob=TRUE, main="",col=gray(.9))
lines(density(res), lty=2)
curve(dexp(x, rate=1/5), lwd=2, add=TRUE)
rug(res)
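The right skew visible in the boxplot and histogram also shows up in the population quantities of the exponential distribution with rate 1/5. The sketch below uses qexp() and pexp(); the variable names are ours.

```r
rate = 1/5

# the mean of an exponential distribution is 1/rate
pop_mean = 1 / rate

# the median is the 0.5 quantile, which equals log(2)/rate
pop_median = qexp(0.5, rate = rate)

# probability of exceeding the mean, which equals exp(-1)
p_above_mean = pexp(pop_mean, rate = rate, lower.tail = FALSE)

c(mean = pop_mean, median = pop_median, p_above_mean = p_above_mean)
```

The median falls below the mean, and well under half of the population lies above the mean, both signs of a long right tail.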
Lognormal distribution
qt(c(.025,.975), df=10) # 10 degrees of freedom
qf(c(.025,.975), df1=10, df2=5) # 10 and 5 degrees of freedom
qchisq(c(.025,.975), df=10) # 10 degrees of freedom
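The "q" functions invert the corresponding "p" functions, so these quantiles cut off the central 95% of each distribution. A short sketch for the t-distribution with 10 degrees of freedom (the variable names are ours):

```r
q = qt(c(.025, .975), df = 10)

# pt() undoes qt(): we recover the probabilities we started with
back = pt(q, df = 10)

# the t-distribution is symmetric about 0, so the two quantiles mirror each other
symmetric = isTRUE(all.equal(q[1], -q[2]))

# area between the two quantiles
central = diff(pt(q, df = 10))

list(quantiles = q, back = back, symmetric = symmetric, central = central)
```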
That is, with greater and greater probability, the random value of $\bar{X}$
is close to the mean, $\mu$, of the parent population. This phenomenon of the sample
average concentrating on the mean is known as the law of large numbers.
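A simulation makes the law of large numbers concrete. The sketch below tracks the running average of a growing sample; the Normal(100, 10) parent population, the seed, and the sample size are illustrative assumptions of ours.

```r
set.seed(3)   # arbitrary seed for reproducibility
mu = 100; sigma = 10
x = rnorm(10000, mean = mu, sd = sigma)

# running_mean[n] is the average of the first n observations
running_mean = cumsum(x) / seq_along(x)

# the sample average at a few increasing sample sizes
running_mean[c(10, 100, 1000, 10000)]
```

The later entries should sit closer to mu than the early ones typically do, although any single run can wander.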
For example, if adult male heights are normally distributed with mean 70.2 inches and
standard deviation 2.89 inches, the average height of 25 randomly chosen males is again
normal with mean 70.2 but standard deviation 1/5 as large, since $\sqrt{25} = 5$. The
probability that the sample average is between 70 and 71 is found with
mu=70.2; sigma=2.89; n=25
diff( pnorm(70:71, mu, sigma/sqrt(n)) )
[1] 0.5522
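The exact answer can be checked by simulation: draw many samples of size 25, average each one, and see how often the average lands between 70 and 71. This is a sketch; the seed and the number of replications (10,000) are our assumptions.

```r
set.seed(4)   # arbitrary seed for reproducibility
mu = 70.2; sigma = 2.89; n = 25

# 10,000 simulated sample averages, each from n normal heights
xbar = replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

sim_prob   = mean(xbar > 70 & xbar < 71)              # simulated probability
exact_prob = diff(pnorm(70:71, mu, sigma / sqrt(n)))  # exact probability
sim_sd     = sd(xbar)                                 # should be near sigma/5

c(simulated = sim_prob, exact = exact_prob, sd_of_means = sim_sd)
```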
Figure 5.9 illustrates the central limit theorem for data with an Exponential(1)
distribution. This parent population and simulations of the distribution of $\bar{X}$
for several sample sizes, up to n = 100, are drawn. As n gets bigger, the sampling
distribution of $\bar{X}$ becomes more bell shaped.
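A simulation along these lines can be sketched as follows. The helper names sim_means and skew, the number of replications, and the smaller sample size n = 5 are our assumptions; only n = 100 comes from the text. Skewness near 0 is used here as a rough numerical stand-in for "bell shaped".

```r
set.seed(5)   # arbitrary seed for reproducibility

# hypothetical helper: m simulated sample means, each from n Exponential(1) draws
sim_means = function(n, m = 2000) replicate(m, mean(rexp(n, rate = 1)))

xbar_5   = sim_means(5)
xbar_100 = sim_means(100)

# hypothetical helper: sample skewness (0 for a symmetric distribution)
skew = function(x) mean((x - mean(x))^3) / sd(x)^3

c(skew_n5 = skew(xbar_5), skew_n100 = skew(xbar_100))
```

Histograms of xbar_5 and xbar_100, for example with hist(), give the visual version of the same comparison.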