
Assignment: Statistical Inference Course Project

Rodrigo Farruguia
April 1, 2016

Overview:
In this project you will investigate the exponential distribution in R and compare it with the Central Limit Theorem. The exponential distribution can be
simulated in R with rexp(n, lambda), where lambda is the rate parameter. The
mean of the exponential distribution is 1/lambda and the standard deviation is also
1/lambda. Set lambda = 0.2 for all of the simulations. You will investigate the
distribution of averages of 40 exponentials. Note that you will need to do a
thousand simulations.
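As a quick sanity check of the two facts quoted above, the sketch below draws one large sample with rexp() and compares its mean and standard deviation with 1/lambda. The seed (1), the sample size of one million and the names big_sample and check are arbitrary choices for this illustration, not part of the assignment.

```r
# Sanity check: for rate lambda, the exponential has mean 1/lambda and sd 1/lambda.
set.seed(1)                              # arbitrary seed, only for this illustration
lambda <- 0.2
big_sample <- rexp(1e6, rate = lambda)   # one million draws (arbitrary size)

check <- c(sample_mean = mean(big_sample),
           sample_sd   = sd(big_sample),
           theory      = 1 / lambda)
print(round(check, 2))
```

With this many draws, both sample statistics should land very close to 5.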

Simulations
We draw 1000 samples of size 40 from an exponential distribution with lambda = 0.2,
using rexp(). For this distribution, 1/lambda is both the mean and the standard
deviation.
# set the seed for reproducibility
set.seed(304)
# parameters taken from the overview
lambda <- 0.2 # rate parameter of the exponential distribution
n <- 40 # number of exponentials in each sample
numsims <- 1000 # number of simulations

Let's compare the sample mean with the theoretical mean.

For samples of size n, the theoretical mean of the sample average is
$\mu_{\bar{x}} = 1/\lambda = 5$. As the table shows, the mean of the simulated
sample means is very close to it.
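With lambda = 0.2, the theoretical value in the table follows from linearity of expectation, since each of the n exponentials has mean 1/lambda:

```latex
\mu_{\bar{x}} = E[\bar{x}]
  = \frac{1}{n}\sum_{i=1}^{n} E[x_i]
  = \frac{1}{n}\cdot n\cdot\frac{1}{\lambda}
  = \frac{1}{\lambda} = \frac{1}{0.2} = 5
```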
                     Mean
Mean from samples    5.013
Theoretical mean     5.000

The histogram below shows the distribution of the sample means. The vertical lines
mark the mean of the sample means (red) and the theoretical mean (blue). Given how
close the two values in the table are, the lines almost overlap.

[Figure: Histogram of the Sample Means, with vertical lines at the sample mean and the theoretical mean; x-axis: Sample Mean, y-axis: Frequency]

Now let's compare the variance of the sample means with the theoretical variance.

The sample variance is the variance of the 1000 simulated sample means. The
theoretical variance is the squared standard deviation of the exponential
distribution, (1/lambda)^2, divided by the sample size n = 40, that is
$\sigma^2_{\bar{x}} = \mathrm{Var}(\bar{x}) = \sigma^2/n$ with $\sigma = 1/\lambda$.
As expected, the two values are very close.
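Because the n exponentials in each sample are independent, their variances add, and the theoretical value in the table works out as follows for lambda = 0.2 and n = 40:

```latex
\sigma^2_{\bar{x}} = \mathrm{Var}(\bar{x})
  = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(x_i)
  = \frac{\sigma^2}{n}
  = \frac{(1/\lambda)^2}{n}
  = \frac{5^2}{40} = \frac{25}{40} = 0.625
```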
                            Variance
Variance from the sample    0.637
Theoretical variance        0.625

Do we have a normal distribution?

By the Central Limit Theorem, the averages of the samples should be approximately
normally distributed. We check this by plotting the density of the sample means
(brown curve) against a normal density with the theoretical mean and variance (red
curve).
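A simple numerical complement to the plot, offered as an illustrative addition rather than part of the original analysis: if the sample means are approximately normal with mean 1/lambda and variance (1/lambda)^2/n, then roughly 95% of the standardized means should fall within ±1.96. The sketch regenerates the sample means with the report's parameters and seed; the names sample_means, z and within are hypothetical.

```r
# Regenerate the simulated sample means with the report's parameters
set.seed(304)
lambda <- 0.2; n <- 40; numsims <- 1000
sample_means <- rowMeans(matrix(rexp(numsims * n, rate = lambda), numsims, n))

# Standardize with the theoretical mean and the theoretical standard error
z <- (sample_means - 1 / lambda) / sqrt((1 / lambda)^2 / n)

# Under approximate normality, about 95% of z should lie in [-1.96, 1.96]
within <- mean(abs(z) <= 1.96)
print(within)
```

Because the exponential is skewed, the proportion need not be exactly 0.95 at n = 40; a value close to it is consistent with the CLT approximation.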

[Figure: Histogram, sample means fitting normal curve; x-axis: Sample mean, y-axis: density]

[Figure: Normal probability plot; x-axis: theoretical, y-axis: sample]

Finally, the normal probability (Q-Q) plot compares the quantiles of the sample
means with those of a normal distribution. The points fall approximately on a
straight line, so the distribution of the sample means is approximately normal.
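The qqplot.data() helper in the appendix draws its reference line through the first and third quartiles of the data and of the standard normal. A base-R sketch of the same computation follows. The correlation between the ordered sample means and normal quantiles (via ppoints()) is an illustrative single-number summary of linearity, not part of the original analysis; the names sample_means and qq_cor are hypothetical.

```r
set.seed(304)
lambda <- 0.2; n <- 40; numsims <- 1000
sample_means <- rowMeans(matrix(rexp(numsims * n, rate = lambda), numsims, n))

# Reference line through the quartiles, mirroring qqplot.data()
y <- quantile(sample_means, c(0.25, 0.75))
x <- qnorm(c(0.25, 0.75))
slope <- unname(diff(y) / diff(x))
intercept <- unname(y[1] - slope * x[1])

# Correlation between ordered sample means and standard normal quantiles;
# values close to 1 indicate a nearly straight Q-Q plot
qq_cor <- cor(sort(sample_means), qnorm(ppoints(numsims)))
print(c(slope = slope, intercept = intercept, qq_cor = qq_cor))
```

For approximately normal data, the slope estimates the standard deviation of the sample means (theoretically sqrt(0.625), about 0.79) and the intercept estimates their center (about 5).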

Appendix: code used to generate the supporting graphs and calculations


### libraries
library(ggplot2) # plots
library(knitr)   # kable() tables for the report
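### simulation, means and variances
# each row is one simulation of n exponentials with rate lambda
experimentaldist <- matrix(data = rexp(n = numsims * n, rate = lambda), numsims, n)
experimentalmean <- rowMeans(experimentaldist)  # one mean per simulation
actualmean <- mean(experimentalmean)            # mean of the sample means
theoreticalmean <- 1 / lambda                   # theoretical mean, 1/lambda
actualvariance <- var(experimentalmean)         # variance of the sample means
theoreticalvariance <- (1 / lambda)^2 / n       # theoretical variance, sigma^2/n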
### first table
r1 <-data.frame("Mean"=c(actualmean,theoreticalmean),
row.names = c("Mean from samples ","Theoretical mean"))

kable(x = round(r1,3),align = 'c')


### graph 1 histogram
experimentalmeandata <- as.data.frame(experimentalmean)
ggplot(experimentalmeandata, aes(experimentalmean))+
geom_histogram(bins= 40, alpha=.5, position="identity", fill="green", col="black")+
geom_vline(xintercept = theoreticalmean, col="blue", linetype = "longdash",show.legend=TRUE)+
geom_vline(xintercept = actualmean, col="red", linetype = "longdash", show.legend =TRUE)+
ggtitle("Histogram of the Sample Means (40 bins)")+
xlab("Sample Mean")+
ylab("Frequency")
### second table
r2 <-data.frame("Variance"=c(actualvariance, theoreticalvariance),
row.names = c("Variance from the sample ","Theoretical variance"))
kable(x = round(r2,3),align = 'c')

### graph 2 histogram


ggplot(experimentalmeandata, aes(experimentalmean))+
geom_histogram(aes(y=..density..),bins = 40, alpha=.5, position="identity", fill="green", col="b
geom_density(col="brown", size=1)+
stat_function(fun = dnorm, col = "red", args = list(mean = theoreticalmean, sd = sqrt(theoretica
ggtitle ("Histogram, sample means fitting normal curve ")+
xlab("Sample mean")+
ylab("Density")

### graph 3 line plot


qqplot.data <- function (vec) # argument: vector of numbers
{
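# reference line through the first and third quartiles of the data
# and of the standard normal distribution
y <- quantile(vec[!is.na(vec)], c(0.25, 0.75))
x <- qnorm(c(0.25, 0.75))
slope <- diff(y)/diff(x)
int <- y[1L] - slope * x[1L]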
d <- data.frame(resids = vec)
ggplot(d, aes(sample = resids)) + stat_qq(col="blue") + geom_abline(slope = slope, intercept = int, co
}
qqplot.data (experimentalmean) +ggtitle ("Normal probability plot ")
