Lecture One:
Types of Data:
Numerical or quantitative data are real numbers with specific numerical values.
Nominal or qualitative data are non-numerical data sorted into categories on the basis of qualitative
attributes.
Ordinal or ranked data are nominal data that can be ranked.
- The population is the complete set of data that we seek to obtain information about
- The sample is a part of the population that is selected (or sampled) in some way using a
sampling frame
- A characteristic of a population is called a parameter
- A characteristic of a sample is called a statistic
- The difference between our estimate and the true (usually unknown) parameter is the
sampling error
- In a random sample, all population members have an equal chance of being sampled
Lecture Two:
Lecture Three:
Measures of Centre:
Mean/Average: population mean µ; sample mean x̄ = Σx / n
- easy to calculate
- sensitive to extreme observations
Median: middle number, or average of two middle numbers
- not sensitive to extreme observations
Mode: most frequently occurring number
- only used for finding most common outcome
x̄ – µ = sampling error
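As a quick illustration, here is a minimal Python sketch (not from the lectures; the data are hypothetical) computing the three measures of centre with the standard library:

```python
# Hypothetical sample; note how the extreme value 13 pulls the mean up
# while leaving the median and mode untouched.
from statistics import mean, median, mode

sample = [2, 3, 3, 5, 8, 13]
print(mean(sample))    # 5.67 - sensitive to the extreme observation
print(median(sample))  # 4.0  - average of the two middle numbers (3 and 5)
print(mode(sample))    # 3    - most frequently occurring value
```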
Measures of Variation:
Population variance measures the average of the squared deviations between each observation and the population mean: σ² = (1/N) Σ(xᵢ − µ)²
Population standard deviation is the square root of the population variance: σ = √[(1/N) Σ(xᵢ − µ)²]
Sample variance measures the average of the squared deviations between each observation and the sample mean: s² = (1/(n − 1)) Σ(xᵢ − x̄)²
Sample standard deviation is the square root of the sample variance: s = √[(1/(n − 1)) Σ(xᵢ − x̄)²]
Coefficient of variation measures the variation in a sample (given by its standard deviation) relative to that sample's mean. It is expressed as a percentage to provide a unit-free measurement, letting us compare different samples: CV = 100 × (s / x̄) %
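A small sketch of these measures with hypothetical data; Python's statistics module uses the n − 1 (sample) divisor, matching s² and s:

```python
# Sample variance, standard deviation and coefficient of variation.
from statistics import mean, stdev, variance

sample = [12.0, 15.0, 11.0, 14.0, 13.0]   # hypothetical observations
s2 = variance(sample)        # sum((x - x_bar)^2) / (n - 1) -> 2.5
s = stdev(sample)            # square root of s2
cv = 100 * s / mean(sample)  # unit-free, expressed as a percentage
print(s2, s, cv)
```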
Lecture Four:
Measures of Association:
Covariance measures the co-variation between two sets of observations.
With a population of size N having paired observations (x₁, y₁), (x₂, y₂), …, (x_N, y_N), and with µₓ and µᵧ being the respective means of the xᵢ and yᵢ terms, covariance is calculated as
COV(X, Y) = (1/N) Σ (xᵢ − µₓ)(yᵢ − µᵧ), summing over i = 1, …, N
If we have a sample of size n, with sample means x̄ and ȳ, the covariance is calculated as
cov(x, y) = (1/(n − 1)) Σ (xᵢ − x̄)(yᵢ − ȳ), summing over i = 1, …, n
Problems with covariance: it is difficult to interpret the strength of a relationship because covariance
is sensitive to units.
Correlation gives us a measure of association which is not affected by units.
Sample correlation coefficient: r = cov(x, y) / (sₓsᵧ), where sₓ and sᵧ are the sample standard deviations
Population correlation coefficient: ρ = COV(X, Y) / (σₓσᵧ), where σₓ and σᵧ are the population standard deviations
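A hedged sketch with made-up paired data, assuming NumPy is available; np.cov defaults to the n − 1 divisor, matching the sample formulas:

```python
# Sample covariance and correlation with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical paired data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

cov_xy = np.cov(x, y)[0, 1]        # off-diagonal entry of the 2x2 matrix
r = np.corrcoef(x, y)[0, 1]        # cov(x, y) / (s_x * s_y), unit-free
print(cov_xy, r)                   # r close to +1: strong positive linear fit
```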
Lecture Five:
A random experiment is a procedure that generates outcomes that are not known with certainty
until observed.
A random variable (RV) is a variable with a value that is determined by the outcome of an
experiment.
A discrete random variable has a countable number (K) of possible outcomes, each with a specific probability associated with it.
Univariate data has one random variable.
Bivariate data has two random variables.
If X is a random variable with K possible outcomes, then an individual value of X is written as xi,
i=1,2,3…K
The probability of observing X = xᵢ is written as P(X = xᵢ) or p(xᵢ), where
- 0 ≤ p(xᵢ) ≤ 1
- Σ p(xᵢ) = 1
- That is, all probabilities must lie between 0 and 1, and together they sum to 1
Expected value/mean of a random variable is the value of X one would expect to get on average over a large/infinite number of repeated trials: µₓ = E(X) = Σ xᵢ p(xᵢ)
Variance of a random variable is the probability-weighted average of the squared deviations between each possible outcome and the expected value: σ² = V(X) = Σ (xᵢ − µₓ)² p(xᵢ), or equivalently Σ xᵢ² p(xᵢ) − µₓ²
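For instance, a minimal sketch with a hypothetical discrete distribution, checking that the two variance formulas agree:

```python
# Expected value and variance of a hypothetical discrete random variable.
outcomes = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]              # must sum to 1

mu = sum(x * p for x, p in zip(outcomes, probs))                 # E(X)
var = sum((x - mu) ** 2 * p for x, p in zip(outcomes, probs))    # V(X)
var_alt = sum(x * x * p for x, p in zip(outcomes, probs)) - mu**2
print(mu, var, var_alt)                   # both variance forms give 0.81
```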
Lecture Six:
Binomial Distribution:
- Each trial is independent
- There are n trials, each with two possible outcomes: success, with probability p, or failure, with probability q = 1 − p
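A short sketch of binomial probabilities, assuming SciPy is available and using hypothetical values of n and p:

```python
# Binomial probabilities for 10 independent trials with P(success) = 0.3.
from scipy.stats import binom

n, p = 10, 0.3
print(binom.pmf(4, n, p))                  # P(X = 4)
print(binom.cdf(4, n, p))                  # P(X <= 4)
print(binom.mean(n, p), binom.var(n, p))   # np and npq
```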
Lecture Seven:
Normal Distribution:
Normal distribution is bell-shaped and symmetrical.
The total area under the curve is equal to 1.
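A sketch of normal probabilities under assumed parameters (SciPy assumed available), illustrating symmetry about the mean:

```python
# Normal probabilities: half the area lies below the mean, and areas
# between points come from differences of the CDF.
from scipy.stats import norm

mu, sigma = 50.0, 5.0                        # hypothetical parameters
print(norm.cdf(mu, mu, sigma))               # 0.5 - symmetry about the mean
print(norm.cdf(60, mu, sigma) - norm.cdf(40, mu, sigma))  # P(40 < X < 60)
print(norm.ppf(0.975))                       # 1.96, the familiar z value
```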
Lecture Nine:
If we take repeated samples of size n from a population X and record 𝑥̅ for each, the collection of 𝑥̅
can be represented as a random variable 𝑋̅ with its own distribution. This is called a sampling
distribution.
Standardising 𝑋̅:
Z = (X̄ − µ_X̄) / σ_X̄ = (X̄ − µ) / (σ/√n)
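A simulation sketch (hypothetical population, NumPy assumed) showing that the mean and standard deviation of the simulated X̄ values match µ and σ/√n:

```python
# Simulate 5000 samples of size n and record the sample mean of each.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 10.0, 2.0, 30
xbars = rng.normal(mu, sigma, size=(5000, n)).mean(axis=1)
print(xbars.mean())                           # close to mu
print(xbars.std(ddof=0), sigma / np.sqrt(n))  # close to sigma / sqrt(n)
```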
Lecture Ten:
We can approximate the distribution of p̂ = X/n by a normal distribution with mean p and variance pq/n: p̂ ≈ N(p, pq/n), with standard error √(pq/n), provided np̂ and nq̂ are both ≥ 5
Lecture Eleven:
Principles of Estimation:
We can construct point estimators to make a specific guess of the parameter value, or interval estimators to give a range of values in which the parameter may lie.
Sample statistics such as the mean, median, variance etc. are all examples of point estimates of population parameters.
Confidence Intervals:
Rather than finding the probability content of a given interval, confidence intervals find the interval
on the basis of sample data with a given probability content. These intervals can then be used to
guess the location of population parameters.
This means that in repeated samples of size n, the probability of a randomly chosen interval covering the true population mean, µ, is the stated confidence level, 1 − α.
The interval is random because 𝑋̅ varies from sample to sample, therefore the interval is random but
μ is not.
Interpretation of the confidence interval is that we are x% confident the interval covers the mean.
The confidence interval estimator is X̄ ± z_{α/2} (σ/√n); a numerical sketch follows the list below
- As the level of confidence increases, the z score becomes more extreme and the interval
widens
- As the population standard deviation increases, the standard error increases and the interval
widens
- As the sample size increases, the standard error decreases so the interval narrows
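The sketch referenced above, with hypothetical values and SciPy assumed available:

```python
# A z-based confidence interval for the mean when sigma is known.
from scipy.stats import norm
import math

xbar, sigma, n, conf = 102.3, 15.0, 36, 0.95   # hypothetical values
z = norm.ppf(1 - (1 - conf) / 2)               # z_{alpha/2}, ~1.96 here
half_width = z * sigma / math.sqrt(n)
print(xbar - half_width, xbar + half_width)
```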
Lecture Twelve:
If we do not know µ or σ², we replace σ² with s² and build our confidence intervals using t-values: t = (X̄ − µ) / (s/√n), with n − 1 degrees of freedom
If the table does not give the degrees of freedom you want, approximate, and write a note explaining the approximation used.
Confidence interval estimator: X̄ ± t_{α/2, df} (s/√n); this can only be used if X̄ ~ N (a sketch follows the list below)
- As the level of confidence increases, the t score becomes more extreme so the interval
widens
- As the sample standard deviation increases, the standard error increases so the interval
widens
- As the sample size increases, the interval narrows
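The sketch referenced above, with hypothetical sample values; note the t critical value exceeds the corresponding z value, widening the interval:

```python
# A t-based interval when sigma is unknown, using n - 1 degrees of freedom.
from scipy.stats import t
import math

xbar, s, n, conf = 102.3, 14.2, 16, 0.95       # hypothetical sample values
t_crit = t.ppf(1 - (1 - conf) / 2, df=n - 1)   # t_{alpha/2, n-1}, ~2.13 here
half_width = t_crit * s / math.sqrt(n)
print(xbar - half_width, xbar + half_width)
```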
Let θ̂ be the estimator of a parameter θ, with the standard error of θ̂ being s_θ̂.
A (1 − α)100% confidence interval for θ is [θ̂ − c_{1−α/2} s_θ̂, θ̂ + c_{1−α/2} s_θ̂], where c_{1−α/2} cuts off an upper-tail probability of α/2.
To find a confidence interval estimate of the proportion we use: p̂ ± z_{α/2} √(p̂q̂/n)
- As the level of confidence increases, the z scores become more extreme, widening the
interval
- As 𝑝̂ and 𝑞̂ approach 0.5, the standard error increases so the interval widens
- As the sample size increases, the standard error decreases, so the interval narrows
Margin of error is half the width of an interval estimate, which is equal to z_{α/2} (σ/√n)
Any specified maximum allowable margin of error is called the error bound, B.
B = z_{α/2} (σ/√n) → B² = z²_{α/2} σ²/n
To find the sample size needed for a specified error bound we calculate: n = z²_{α/2} σ²/B² = (z_{α/2} σ/B)²
For proportions, the error bound is B = z_{α/2} √(p̂q̂/n); however, to find n we set p̂ = 0.5, which gives a conservative, wide interval estimate, and then solve n = (z_{α/2} √(p̂q̂)/B)²
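A sketch of both sample-size formulas with hypothetical inputs (SciPy assumed); n is always rounded up:

```python
# Sample size for a target error bound B: once for a mean (sigma assumed
# known) and once for a proportion with the conservative p_hat = 0.5.
from scipy.stats import norm
import math

z = norm.ppf(0.975)                    # 95% confidence
sigma, B = 12.0, 2.0                   # hypothetical sigma and error bound
n_mean = math.ceil((z * sigma / B) ** 2)
n_prop = math.ceil((z * math.sqrt(0.5 * 0.5) / 0.03) ** 2)  # B = 0.03
print(n_mean, n_prop)                  # always round n up
```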
Lecture Thirteen:
- Estimators are statistics that are random variables before the sample is drawn
- We use estimators to guess unknown parameter values
- A point estimator gives a single guess of a parameter
- A confidence interval gives a range of parameter values that are consistent with observed
sample data
The null hypothesis is usually an assertion about a specific value of the parameter and always has the
‘=’ sign.
The null is assumed true unless the evidence in the data supports the notion that it is not true.
The alternative hypothesis is the maintained hypothesis, where the truth lies if the null is not true.
Values of Z that are evidence in support of H0 are in the acceptance region. All other values of Z are
in the rejection region.
Level of significance is the probability of the test statistic falling into the rejection region given H0 is
true, it is α.
The values of the test statistic that lie on the boundary between the acceptance and rejection
regions are called critical values.
Method:
Eg. At 5% level of significance, there is sufficient evidence in this sample to reject the null hypothesis
that the average width of paper produced by this machine has not changed.
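A sketch of this method in Python (hypothetical numbers consistent with the paper-width example; SciPy assumed):

```python
# Two-tailed z-test of H0: mu = 100 with sigma known.
from scipy.stats import norm
import math

mu0, sigma, n, xbar, alpha = 100.0, 4.0, 25, 102.1, 0.05
z = (xbar - mu0) / (sigma / math.sqrt(n))      # test statistic, 2.625 here
z_crit = norm.ppf(1 - alpha / 2)               # critical value, 1.96
print(z, "reject H0" if abs(z) > z_crit else "do not reject H0")
```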
Lecture 14:
One-tailed tests:
If the hypothesis involves < or > then we have a one-tailed test; the method is the same, but the shape of the rejection region is different.
With α = 0.05 in a one-tailed test in the lower tail, the critical value is −z_α = −z_0.05 = −1.645
The p-value is the marginal level of significance: any larger level of significance than the p-value will put the calculated test statistic in the rejection region, and any smaller level of significance than the p-value will put it in the acceptance region
The p-value gives the probability of observing a statistic as extreme as the observed value of the test
statistic given that the null hypothesis is true.
For z-statistics:
- For one-tailed tests in the upper tail, the p-value is P(Z>z|H0 is true)
- For one-tailed tests in the lower tail, the p-value is P(Z<z|H0 is true)
- For a two-tailed test, if z is positive the p-value is 2P(Z>z|H0 is true), and if z is negative the p-value is 2P(Z<z|H0 is true); this can also be written as the p-value being 2P(Z>|z| |H0 is true)
The smaller the p-value, the stronger the evidence against the null: a very small p-value suggests the data are unlikely to have come from the population described in the null, while a large p-value means the observed statistic is quite likely to be observed if the null is true
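A sketch computing the three p-value cases above for a hypothetical observed z (SciPy assumed):

```python
# p-values from the standard normal for a hypothetical test statistic.
from scipy.stats import norm

z = 2.625                                   # hypothetical observed statistic
p_upper = 1 - norm.cdf(z)                   # one-tailed, upper: P(Z > z)
p_lower = norm.cdf(-z)                      # one-tailed, lower: P(Z < -z)
p_two = 2 * (1 - norm.cdf(abs(z)))          # two-tailed: 2P(Z > |z|)
print(p_upper, p_lower, p_two)
```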
Power of a test:
1 − β is called the power of the test, and among tests of a given size we want the one with the most power, as this gives the greatest probability of making the correct decision.
There is however a trade-off between size and power. If we drive size to 0 so we never make a Type I
error by always accepting the null, we never reject a false null so our power is 0.
Similarly, we can have a power of unity by always rejecting the null, but then we always incorrectly
reject true nulls so our size goes to unity.
Lecture Fifteen:
Independent samples: two samples with observations which are unrelated to each other
If X ~ N, or n is large, then X̄ ~ N(µ, σ²/n)
If X1 and X2 are both random variables and independent samples are taken from each population,
the sample means will be independent random variables;
- E[X̄₁ − X̄₂] = µ₁ − µ₂
- V[X̄₁ − X̄₂] = σ₁²/n₁ + σ₂²/n₂
- (X̄₁ − X̄₂) ~ N[(µ₁ − µ₂), (σ₁²/n₁ + σ₂²/n₂)]
A confidence interval for (µ₁ − µ₂) is (X̄₁ − X̄₂) ± z_{α/2} √(σ₁²/n₁ + σ₂²/n₂), and we can test hypotheses about (µ₁ − µ₂) by using
z = [(X̄₁ − X̄₂) − (µ₁ − µ₂)] / √(σ₁²/n₁ + σ₂²/n₂)
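A sketch of the interval and test for µ₁ − µ₂ with hypothetical sample results and known variances:

```python
# Two-sample z interval and test with known population variances.
import math

xbar1, xbar2 = 54.0, 50.5                    # hypothetical sample means
var1, var2, n1, n2 = 9.0, 16.0, 40, 50       # known sigma^2 and sample sizes
se = math.sqrt(var1 / n1 + var2 / n2)
z = (xbar1 - xbar2 - 0) / se                 # H0: mu1 - mu2 = 0
ci = (xbar1 - xbar2 - 1.96 * se, xbar1 - xbar2 + 1.96 * se)  # z_{0.025} = 1.96
print(z, ci)
```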
- E[p̂₁ − p̂₂] = p₁ − p₂
- V[p̂₁ − p̂₂] = p₁q₁/n₁ + p₂q₂/n₂
If n₁p̂₁ ≥ 5 and n₂p̂₂ ≥ 5 then (p̂₁ − p̂₂) ~ N[(p₁ − p₂), (p₁q₁/n₁ + p₂q₂/n₂)]
Observed frequency is the number of times a particular outcome occurred out of the total number of observations
Expected frequency is the number of times we would expect that observation to occur if the null
hypothesis were true
For nominal data we use the chi-squared distribution: we sum the squared differences between the observed and expected frequencies of the k outcomes relative to their expected frequencies: χ² = Σ (o − e)²/e
If the observed values are close to the expected values, the test statistic will be small
With univariate data with k outcomes, the test statistic has degrees of freedom = k-1
The critical value for hypothesis testing will be χ²_{α, k−1}
We reject H0 if χ² > χ²_{α, k−1}
Interpretation: we conclude there is/is not sufficient evidence in this sample to reject the null hypothesis, in favour of the alternative hypothesis that the distribution of the variable differs from the hypothesised distribution, at the x% level of significance.
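A sketch of the univariate chi-squared test for a hypothetical fair die (k = 6, so df = 5; SciPy assumed):

```python
# Chi-squared test of equal frequencies for 120 hypothetical die rolls.
from scipy.stats import chi2

observed = [18, 22, 16, 25, 19, 20]         # hypothetical counts, n = 120
expected = [120 / 6] * 6                    # equal frequencies under H0
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
crit = chi2.ppf(0.95, df=len(observed) - 1) # chi^2_{0.05, 5} ~ 11.07
print(stat, crit, stat > crit)              # reject H0 if stat > crit
```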
We can use the previous method to test whether a univariate distribution fits the shape we would expect if it matched a hypothesised distribution, for example where we assume the data came from a normally distributed population – this is a goodness-of-fit test
Here the H0: 𝑋~𝑁(𝜇, 𝜎 2 ) and HA: X is not distributed as hypothesised in the null
We continue to use the previous method; however, as this is bivariate data, we use degrees of freedom = (r − 1)(c − 1), i.e. (the number of rows − 1) times (the number of columns − 1). For example, with four rows and three columns we would have df = (4 − 1)(3 − 1) = 3 × 2 = 6
Here, H0 is that the two variables are independent and HA is that the two variables are not
independent
If H0 is true, then P(A ∩ B) = P(A)P(B); that is, if the null were true, the joint probability of any pair of outcomes would equal the product of their individual probabilities
This means we can calculate the expected frequencies of the data, then calculate χ² = Σ (o − e)²/e and compare it to the critical value. We reject H0 if χ² > χ²_{α, (r−1)(c−1)}
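A sketch of the test of independence on a hypothetical 2×3 table; SciPy's chi2_contingency derives the expected frequencies from the margins:

```python
# Chi-squared test of independence on a hypothetical contingency table.
from scipy.stats import chi2_contingency

table = [[30, 20, 10],
         [20, 30, 40]]                      # hypothetical observed counts
stat, p_value, df, expected = chi2_contingency(table, correction=False)
print(stat, p_value, df)                    # df = (2-1)(3-1) = 2
```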
Correlation lies in [−1, 1], with −1 being a perfectly linear negative relationship, 1 being a perfectly linear positive relationship, and 0 being no linear relationship at all.
We assume the errors (ε) are random and normally distributed with a mean of zero, so we can construct estimates of β₀ and β₁ using sample data. Therefore ŷ = β̂₀ + β̂₁x, and the residual, e, is y − ŷ
We want the residual for each observation to be as small as possible so the estimated line is more
likely to fit the real one so we aim to minimise the sum of the squared residuals, ∑ 𝑒 2 .
To find the values of 𝛽0 and 𝛽1 that will minimise ∑ 𝑒 2 we differentiate ∑ 𝑒 2 with respect to each
coefficient and set the results equal to zero.
∂Σe²/∂β̂₀ = −2 Σ(y − β̂₀ − β̂₁x) = 0 → β̂₀ = ȳ − β̂₁x̄
∂Σe²/∂β̂₁ = −2 Σ x(y − β̂₀ − β̂₁x) = 0 → β̂₁ = (Σxy − nx̄ȳ) / (Σx² − nx̄²) = SSxy / SSx
where SSxy = Σxy − (1/n)(Σx)(Σy)
OLS estimators 𝛽̂0 and 𝛽̂1 give point estimates of 𝛽0 and 𝛽1 , this method will find the best line of fit
possible for the data.
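A sketch of these OLS formulas computed directly with NumPy on hypothetical data:

```python
# OLS slope and intercept via the formulas above.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical data
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

n = len(x)
b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)                               # intercept and slope estimates
```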
The estimated intercept coefficient β̂₀ is interpreted as: when the independent variable is equal to 0, the dependent variable is estimated to be β̂₀ on average.
The estimated slope coefficient β̂₁ is interpreted as: for each one-unit increase in the independent variable, the dependent variable increases (or decreases) by β̂₁ on average.
The better the fit of a regression line, the smaller the scatter of observations around the line and
hence the smaller the standard error of the estimate.
Goodness of fit:
The deviation of y from its mean is composed of two parts: the explained and the unexplained deviation.
R2 measures the proportion of variation in the dependent variable which can be explained by the
variation in the independent variable.
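A self-contained sketch of R² on hypothetical data (np.polyfit gives the same line as the OLS formulas above):

```python
# R^2 = 1 - unexplained variation / total variation.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])
b1, b0 = np.polyfit(x, y, 1)                 # slope, intercept (same as OLS)
y_hat = b0 + b1 * x
ss_res = np.sum((y - y_hat) ** 2)            # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)         # total variation
print(1 - ss_res / ss_tot)                   # R^2, close to 1 for a good fit
```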
Lecture Nineteen:
Regression assumptions:
- A linear relationship exists between x and y in the population and the data can be modelled as yᵢ = β₀ + β₁xᵢ + εᵢ, i = 1, 2, 3, …, n
- 𝐸[𝜀𝑖 |𝑥𝑖 ] = 0 →𝐸[𝑦𝑖 |𝑥𝑖 ] = 𝛽0 + 𝛽1 𝑥𝑖
- 𝑉[𝜀𝑖 |𝑥𝑖 ] = 𝜎𝜀2
- 𝐶𝑜𝑣[𝜀𝑖 , 𝜀𝑗 ] = 0, 𝑖 ≠ 𝑗
Given the assumptions and the CLT, β̂₀ ~ N(β₀, σ²_β̂₀) and β̂₁ ~ N(β₁, σ²_β̂₁)
From this, we can see that (β̂₀ − β₀)/σ_β̂₀ ~ N(0, 1) and (β̂₁ − β₁)/σ_β̂₁ ~ N(0, 1)
Since σ²_β̂₀ and σ²_β̂₁ are unknown, they can be estimated using s²_ε = (1/(n − 2)) Σ(yᵢ − β̂₀ − β̂₁xᵢ)²
We then use (β̂₀ − β₀)/s_β̂₀ ~ t_{n−2} and (β̂₁ − β₁)/s_β̂₁ ~ t_{n−2}
σ²_β̂₁ = σ²_ε / Σ(xᵢ − x̄)²
- bigger when σ²_ε gets bigger, i.e. the data get noisier
- smaller as n increases
- smaller the greater the variability in x
The smaller σ²_β̂₁ is, the more precise inferences about β₁ are.
Confidence intervals:
Hypothesis testing:
To test a null of the form H0: βⱼ = βⱼ* against an alternative, the test statistic is (β̂ⱼ − βⱼ*)/s_β̂ⱼ, with either a t_{n−2} or z distribution
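A sketch of both the interval and the test for a slope, using hypothetical regression output (SciPy assumed):

```python
# t-based confidence interval and test for a slope with n - 2 df.
from scipy.stats import t

b1_hat, se_b1, n = 2.0, 0.45, 25        # hypothetical estimate, SE, sample size
t_crit = t.ppf(0.975, df=n - 2)         # two-tailed, 5% level
ci = (b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1)
t_stat = (b1_hat - 0) / se_b1           # H0: beta_1 = 0
print(ci, t_stat, abs(t_stat) > t_crit) # reject H0 if |t| > t_crit
```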
Extrapolation: predicting a y value for a given x outside the range of the observed data. The further from that range, the less accurate our prediction, because while our model may fit our data well, the relationship in the unobserved region may not fit as well, or may not even be linear.
The confidence interval for the expected value of the dependent variable gives the range of values which would cover the average value of the dependent variable, at a given value of the independent variable, with x% confidence.
The prediction interval gives the range of values we would expect to cover a single value of the dependent variable, at a given value of the independent variable, with x% confidence.
Proportionate change is (y₁ − y₂)/y₂, with percentage change being this multiplied by 100.
From calculus, we can show that proportionate change can be approximated using the natural logarithm, ln (log base e).
This means in the model ln(ŷ) = β̂₀ + β̂₁x, β̂₁ measures the proportionate change in y for a unit change in x, with 100β̂₁ measuring the percentage change in y for a unit change in x.
Similarly, in ŷ = β̂₀ + β̂₁ln(x), β̂₁ measures the change in y for a proportionate change in x, with β̂₁/100 measuring the change in y for a one-percent change in x.
From this, in ln(ŷ) = β̂₀ + β̂₁ln(x), β̂₁ represents the proportionate change in y for a proportionate change in x, which measures elasticity.
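A sketch of the log-log case on hypothetical data; the fitted slope is an estimate of the elasticity:

```python
# Log-log regression: slope ~ % change in y per % change in x.
import numpy as np

x = np.array([10.0, 20.0, 40.0, 80.0])       # hypothetical data
y = np.array([100.0, 70.0, 50.0, 35.0])
b1, b0 = np.polyfit(np.log(x), np.log(y), 1)
print(b1)                                    # ~-0.5: y falls ~0.5% per 1% rise in x
```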
Lecture Twenty-one:
Many relationships theorised may be functions of more than one independent variable so we use
multiple regression as a technique to test these theories.
We can either add another independent variable as having its own effect on the dependent variable,
or we can test the relationship of two independent variables’ effects on the dependent variable.
For hypothesis testing, we take 1 degree of freedom for each independent variable, i.e. in single
variable regression we had n-2 degrees of freedom because there was only one independent
variable. In general, degrees of freedom for slope coefficients in regression analysis is n-k-1 where k
is the number of independent variables.
We test for multicollinearity by checking the correlation coefficients between each pair of
independent variables.
If z=0 there is no effect on the regression, showing the dummy variable does not affect the
dependent variable, however if it is 1 then 𝛽̂2 will be the extra amount added to the dependent
variable, meaning the intercept of the estimated regression will be (𝛽̂0 + 𝛽̂2 ).
Hypothesis testing:
If we want to know whether the effect of the dummy variable is statistically significant at the x% level of significance, we test H0: β₂ = 0 using the test statistic (β̂₂ − β₂)/s_β̂₂, rejecting if the test statistic is < −t_{α/2, n−k−1} or > t_{α/2, n−k−1}
Interpretation: This will conclude that there is/is not sufficient evidence in the sample to suggest
that the dummy variable has an effect on the dependent variable at an x% level of significance.
This measures whether or not the dummy variable has an effect on the slope of the regression. If z = 0 there is no effect, showing the dummy variable does not affect the dependent variable; if z = 1 then β̂₂ is the extra amount added to the dependent variable per unit increase in xᵢ, meaning the slope of the estimated regression will be (β̂₁ + β̂₂).
We can also test a dummy variable’s effect on slope and intercept by adding the two together, or
test numerous variables by adding more 𝛽̂s.
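Finally, a sketch of intercept and slope dummies fit by least squares on simulated data (NumPy assumed; the recovered coefficients approximate the ones used to generate it):

```python
# Multiple regression with a dummy z affecting both intercept and slope.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, n)
z = (rng.uniform(size=n) > 0.5).astype(float)     # dummy variable, 0 or 1
y = 1.0 + 2.0 * x + 3.0 * z + 0.5 * z * x + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x, z, z * x])    # intercept + slope dummy
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(betas)                                      # ~[1, 2, 3, 0.5]
```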