TOPIC 1 Types of Statistical Analysis

Sample Variance:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Mult chain rule: $P(A \cap B) = P(B|A) \cdot P(A)$
Box Plots:
lower fence = LQ - 1.5*IQR
upper fence = UQ + 1.5*IQR
Find limits of whiskers (LX = smallest value > LF, UX = greatest value < UF)
UQ - LQ = spread (IQR)
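A minimal sketch of the fence and whisker rules above. The quartile convention is an assumption (the "inclusive" interpolation of Python's `statistics.quantiles`); other conventions give slightly different LQ and UQ. The function name `box_plot_limits` and the data are illustrative only.

```python
import statistics

def box_plot_limits(values):
    """Return fences, whisker limits and outliers for a box plot."""
    data = sorted(values)
    # Assumption: quartiles by "inclusive" linear interpolation.
    lq, _median, uq = statistics.quantiles(data, n=4, method="inclusive")
    iqr = uq - lq                      # spread = UQ - LQ
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    # Whiskers stop at the most extreme values strictly inside the fences.
    inside = [v for v in data if lower_fence < v < upper_fence]
    return {
        "LF": lower_fence,
        "UF": upper_fence,
        "LX": min(inside),
        "UX": max(inside),
        "outliers": [v for v in data if v <= lower_fence or v >= upper_fence],
    }

limits = box_plot_limits([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(limits)
```

Values outside the fences (here the 50) are reported as outliers rather than stretching the whisker.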
Standardised score (z-score):
$z = \frac{x - \bar{x}}{s}$
Measures the number and direction of SDs the considered value is away from the mean
If $z \approx 0$, the measurement is located near the mean
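The sample variance and z-score formulas can be checked with a short sketch. The helper names and the data are made up for illustration.

```python
import math

def sample_variance(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def z_scores(xs):
    """z = (x - xbar) / s for every observation."""
    xbar = sum(xs) / len(xs)
    s = math.sqrt(sample_variance(xs))
    return [(x - xbar) / s for x in xs]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_variance(data))
print(z_scores(data))
```

Observations equal to the mean get z = 0; the sign of z gives the direction from the mean.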
Linear Transformations:
$y_i = a + c \cdot x_i$
$\bar{y} = a + c\bar{x}$
$s_y^2 = c^2 s_x^2$
$s_y = |c| \cdot s_x$

TOPIC 2 Probability
Sample space = {.}
Elementary events: no subsets, cannot be broken down
Compound events: made up of elementary events, e.g. S = {A, CA}
Permutations: $^{n}P_{r} = \frac{n!}{(n-r)!}$
Combinations: $^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$
Conditional prob rule:
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
Bayes Rule:
If A and B dependent
$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$
Statistical Independence
Events A & B stat ind if:
a. P(A|B) = P(A), i.e. knowledge that B occurred has no effect on P(A), and vice versa
b. $P(A \cap B) = P(A) \times P(B)$
c. Events are physically independent

TOPIC 3 Discrete Random Variables
Population Variance
$\sigma^2 = E[(X - \mu)^2] = \sum_{i} x_i^2 P(X = x_i) - \mu^2$
R.v. Transformations:
Y = a + c*X
Mean = E[Y] = a + c*E[X]
Variance = Var[a + c*X] = $c^2$ Var(X)
Sd. = $\sigma_y = |c| \cdot \sigma_x$
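The r.v. transformation rules can be verified numerically on a small discrete distribution. The pmf and the constants a, c below are invented for illustration only.

```python
import math

def mean(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of x^2 * P(X = x), minus the squared mean."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu ** 2

pmf_x = {0: 0.2, 1: 0.5, 2: 0.3}   # hypothetical distribution of X
a, c = 3.0, -2.0                   # hypothetical constants

# Distribution of Y = a + c*X: same probabilities, shifted and scaled values.
pmf_y = {a + c * x: p for x, p in pmf_x.items()}

print(mean(pmf_y), a + c * mean(pmf_x))
print(variance(pmf_y), c ** 2 * variance(pmf_x))
print(math.sqrt(variance(pmf_y)), abs(c) * math.sqrt(variance(pmf_x)))
```

The shift a drops out of the variance, and a negative c still gives a positive sd because of |c|.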
Bernoulli Distributions: exp has 2 mutually exclusive outcomes (success, fail), inde trials, constant $\pi$
Binomial Distrib: fixed # identical trials, trials independent, constant $\pi$ for exp., 2 poss. outcomes on each trial, binom. variable x = # successes in n trials
Poisson Distrib: events occur randomly over a continuum of time at constant rate, each event occurs inde of the other events, expected # events each unit = $\lambda$, events occur at low frequency
Poisson Approx to Binom.
For binom with small $\pi$ + large n:
E[X] = $n\pi$, Var(X) = $n\pi(1-\pi) \approx n\pi$
$\lambda = n\pi$ → poisson
Geometric Dist.: X = # trials till 1st success. Conditions: same as Bernoulli
Negative Binom: k = # successes required, trials inde, 2 possible outcomes at each trial, constant $\pi$, X = # trials till kth success
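A quick numerical check of the Poisson approximation to the binomial listed in this topic. The values n = 1000 and π = 0.003 are arbitrary choices for a "large n, small π" case.

```python
import math

def binom_pmf(k, n, pi):
    """P(X = k) for X ~ Binomial(n, pi)."""
    return math.comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, pi = 1000, 0.003      # hypothetical: large n, small pi
lam = n * pi             # lambda = n * pi

# Largest pointwise difference between the two pmfs over k = 0..15.
max_gap = max(abs(binom_pmf(k, n, pi) - poisson_pmf(k, lam)) for k in range(16))
print(lam, max_gap)
```

Increasing π while holding n fixed widens the gap, which is why the approximation is reserved for small π.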
Hypergeometric: random drawing of n elements without replacement from a set of N elements
N = populatn size, n = # elements drawn, r = # successes in pop, hyperge variable x = # successes in sample
Poisson approx. for binom: if $\pi$ is small and n is large

TOPIC 4 Continuous Random Variables
Discrete: prob takes one value
Continuous: prob between 2 values
Uniform Distn: X~U(a,b)
Prob = area under curve
Normal Distn:
If $X \sim N(\mu, \sigma^2)$, standardizing: $z = \frac{x - \mu}{\sigma}$
Then $Z \sim N(0,1)$
Normal approx. for binom (with continuity correction):
$P(X \le x) \approx P\left(Z \le \frac{x + \frac{1}{2} - n\pi}{\sqrt{n\pi(1-\pi)}}\right)$
$n\pi > 15$ and $n(1-\pi) > 15$ for CLT check
Empirical Rule (Normdist) and Chebyshev (any prob. dist):
Within 1 sd of $\mu$: 68.26% (Chebyshev: at least 0)
Within 2 sd of $\mu$: 95.44% (Chebyshev: at least 3/4)
Within 3 sd of $\mu$: 99.7% (Chebyshev: at least 8/9)
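The Empirical Rule percentages and the Chebyshev bounds can be reproduced with a short sketch. It uses the standard identities P(|Z| < k) = erf(k/√2) for the normal case and 1 - 1/k² for Chebyshev; the function names are illustrative.

```python
import math

def normal_within(k):
    """P(|Z| < k) for Z ~ N(0, 1)."""
    return math.erf(k / math.sqrt(2))

def chebyshev_bound(k):
    """Lower bound on P(|X - mu| < k*sd) for any distribution with finite variance."""
    return 1 - 1 / k ** 2

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4), chebyshev_bound(k))
```

Chebyshev is much looser because it must hold for every distribution, not only the normal.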
TOPIC 5 Sampling Distributions
Exactly Norm distn:
If $X \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Standardising Norm Distn:
If sample chosen from populatn: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Central Limit Thm:
Provided parent distributn has finite variance, as n increases ($n \ge 30$) distributn of xbar approaches normdist
Normal approx. for binom: $X \approx N(n\pi,\ n\pi(1-\pi))$

TOPIC 6 Statistical Inference
C.I. for $\mu$ ($\sigma$ known): $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Type I Error: H0 is true but we reject it
Type II Error: H0 is false but we incorrectly retain it
Prob of Type I error is the level of signif (alpha)
If populatn variance known → z test:
$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
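A minimal sketch of the z test with known population variance, together with the matching two-sided interval. The two-sided p-value and the numbers are assumptions for illustration; a one-sided test would use a single tail.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Return z_obs, two-sided p-value and the (1 - alpha) C.I. for mu."""
    std_normal = NormalDist()               # N(0, 1)
    se = sigma / math.sqrt(n)               # standard error of xbar
    z_obs = (xbar - mu0) / se
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z_obs)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    return z_obs, p_two_sided, ci

# Hypothetical sample summary: xbar = 52 from n = 36 with sigma = 6.
z_obs, p_value, ci = z_test(xbar=52.0, mu0=50.0, sigma=6.0, n=36)
print(z_obs, p_value, ci)
```

H0 is rejected at level alpha exactly when mu0 falls outside the two-sided interval.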
TOPIC 7 Inference about a single populatn mean & Investigating Normality
When $\sigma$ unknown, test statistic now:
$t_{obs} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
df = n-1; as df → ∞, t → Z
C.I. for $\mu$: $\bar{x} \pm t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}$
Investigating Normality:
IQR/S ~ 1.3
Construct normal probability plot → if data approx. normdist, points fall approx. on a straight line within confidence bounds
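A small sketch of the t statistic and the IQR/S normality check. The sample is invented, and the quartile convention ("inclusive" interpolation) is an assumption. Critical t values are not computed here because the Python standard library has no t distribution; they would come from tables or a statistics package.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t_obs, df) for H0: mu = mu0 when sigma is unknown."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)            # sample sd, divisor n - 1
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1

def iqr_over_s(sample):
    """Ratio IQR / s; roughly 1.3 for normally distributed data."""
    lq, _median, uq = statistics.quantiles(sample, n=4, method="inclusive")
    return (uq - lq) / statistics.stdev(sample)

sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0]   # hypothetical data
t_obs, df = t_statistic(sample, mu0=10.0)
print(t_obs, df, iqr_over_s(sample))
```

With only eight observations the IQR/S ratio is a rough guide, so it would normally be read alongside the normal probability plot.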
TOPIC 8 Inference Regarding two Populatn Means
T-test Requirements:
Parent populatn normdist
2 samples inde, observations within each sample inde
The 2 populatn s.d. are the same: $\sigma_1 = \sigma_2$
Assumptions:
Data come from normal or approx. normdist
$\sigma_1 = \sigma_2$ if $\frac{S_{larger}}{S_{smaller}} < 2$
2-Sample Test:
$t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Pooled Variance:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Confidence interval:
$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,\alpha/2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

TOPIC 9 Inference Regarding Proportions
$\pi$ = population proportion (constant)
p = sample proportion, p = X/n
CLT check: $np \ge 15$, $n(1-p) \ge 15$
Proportion of Binomdist:
$z_{obs} = \frac{(p - \pi) \pm \frac{1}{2n}}{\sqrt{\frac{\pi(1-\pi)}{n}}}$
1/2n is the continuity correction
When $p - \pi < 0$ use +1/2n, when $p - \pi > 0$ use -1/2n
Confidence Interval:
$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
Two proportions:
$\hat{\pi}$ = weighted avg of $p_1$, $p_2$ = (successes in sample 1 + successes in sample 2) / total n
$z_{obs} = \frac{p_1 - p_2}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Confidence Interval for 2 Prop:
$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Here we don't assume $\pi_1 = \pi_2$
For one sided H-test for $\pi$ > A:
c.i.: (X, 1), where the lower limit X uses - (not +)
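The two-proportion test and interval can be sketched as follows. The counts are invented, and the function names are illustrative. Note the two different standard errors: the pooled estimate for the test (which assumes equal proportions under H0) and the unpooled one for the interval.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z_obs using the pooled (weighted average) proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """(1 - alpha) C.I. for pi_1 - pi_2, without assuming pi_1 = pi_2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical counts: 60 of 100 successes versus 45 of 100.
z_obs = two_proportion_z(60, 100, 45, 100)
low, high = two_proportion_ci(60, 100, 45, 100)
print(z_obs, low, high)
```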
TOPIC 10 Regression and Correlation
Assumptions:
Linearity: true linear trend for the conditional expected value of Y given X
Normality: residuals normdist
Constant Variance: variability about regression line constant
Independence: response values inde
Mean of response (Y):
Linear function of predictor (x)
$E[Y|X=x] = \beta_0 + \beta_1 x$
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
regression coeff: $\hat{\beta}_1 = \frac{SS_{XY}}{SS_{XX}}$
sample intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$
$SS_{XX} = \sum x_i^2 - n\bar{x}^2 = (n-1)S_x^2$
$SS_{XY} = \sum x_i y_i - n\bar{x}\bar{y}$
$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = SS_{YY} - \hat{\beta}_1 SS_{XY}$
$SS_{tot} = SS_{YY} = \frac{SS_{XY}^2}{SS_{XX}} + (SS_{YY} - \hat{\beta}_1 SS_{XY})$
$\hat{\sigma} = s = \sqrt{\frac{SS_{YY} - \hat{\beta}_1 SS_{XY}}{n-2}}$
Coeff of Determination:
Proportion of total var in Y explained by linear relation of Y and X
$R^2 = \frac{SS_{reg}}{SS_{tot}} = \frac{SS_{XY}^2}{SS_{XX}\,SS_{YY}}$
Correlation Coeff:
$r = \hat{\rho} = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}}$
$t_{obs} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
Test for slope (H0: $\beta_1 = 0$):
$t_{obs} = \frac{\hat{\beta}_1 - 0}{est.se.(b)}$
$est.se.(b) = \frac{s}{\sqrt{SS_{XX}}}$
Confidence interval of B:
$\hat{\beta}_1 \pm t_{n-2,\,\alpha/2} \cdot est.se.(b)$
Predicted Values:
Est. standard error:
$\hat{se}(\hat{E}(Y|X=x_c)) = s\sqrt{\frac{1}{n} + \frac{(x_c - \bar{x})^2}{SS_{XX}}}$
c.i.:
$\hat{y}_c \pm t_{n-2,\,\alpha/2} \cdot \hat{se}(\hat{E}(Y|X=x_c))$
Prediction interval:
$\hat{y}_c \pm t_{n-2,\,\alpha/2} \cdot s\sqrt{1 + \frac{1}{n} + \frac{(x_c - \bar{x})^2}{SS_{XX}}}$
Prediction int. takes into account variability in parameter estimation + variability due to individuals

TOPIC 11 Sample Size & Power
P(Type 1 error) = alpha
P(Type 2 error) = beta
Power = P(reject H0 | H0 false) = 1 - beta
Larger sample size, n → higher power
Higher alpha (larger significance level) → higher power
Minimize beta for fixed value of alpha, i.e. increase difference between hypoth value and true value of populatn parameter
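The effect of n, alpha and the true difference on power can be illustrated with a sketch for one specific case: a one-sided z test of H0: μ = μ0 against H1: μ > μ0 with known σ. That choice of test, the function name and all numbers are assumptions made for the example.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power = 1 - beta for the upper-tailed z test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)              # rejection cut-off
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))    # true difference in SE units
    beta = std_normal.cdf(z_crit - shift)               # P(retain H0 | H0 false)
    return 1 - beta

base = power_one_sided_z(mu0=50, mu_true=52, sigma=6, n=36)
print(base)
print(power_one_sided_z(50, 52, 6, n=64))               # larger n
print(power_one_sided_z(50, 52, 6, n=36, alpha=0.10))   # higher alpha
print(power_one_sided_z(50, 54, 6, n=36))               # larger true difference
```

Each of the three changes raises the power above the base case, matching the list above.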
TOPIC 12 Categorical Data
Goodness of Fit:
$E_i$ = expected count under H0 in cell i
$\chi^2_{obs} = \sum_{i=1}^{k}\frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i}$
Where $\frac{1}{2}$ is contin correction.
Further observed counts from expected counts, more willing to reject the H0
Deg freedom = k-1
CLT Check: $E_i \ge 5$ → assumption for $\chi^2_{obs}$ ok
IF $E_i < 5$ in any cell before test carried out → need to: get larger n, or pool classes
NOTE: approx. by $\chi^2$ distribution strongly affected by small expected values → less accurate for small $E_i$
Shape of $\chi^2$ totally dependent on no. of d.f. (v); $v = E[\chi^2_v]$ and is right skewed
Contingency Tables
H0: i and j independent
H1: i and j related
If H0 true $\pi_{ij} = \pi_i \times \pi_j$
But we do not have $\pi_i$ or $\pi_j$
Estimate:
$\hat{\pi}_i = \frac{\text{row}(i)\ \text{total}}{N} = P_i$
$\hat{\pi}_j = \frac{\text{column}(j)\ \text{total}}{N} = P_j$
Therefore if H0 true
$\hat{\pi}_{ij} = \hat{\pi}_i \cdot \hat{\pi}_j$
$E_{ij} = \hat{\pi}_i\,\hat{\pi}_j\,N = \frac{\text{column}(j)\ \text{total}}{N} \cdot \frac{\text{row}(i)\ \text{total}}{N} \cdot N$
Test Statistic:
$\chi^2_{obs} = \sum_{i}\sum_{j}\frac{\left(|O_{ij} - E_{ij}| - \frac{1}{2}\right)^2}{E_{ij}}$
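A minimal sketch of the contingency-table calculation: expected counts from row and column totals, then the test statistic. The table is made up. The `correction` parameter is the continuity correction (set it to 0 for the uncorrected statistic), and clamping the corrected difference at zero is an added safeguard, not something stated in these notes.

```python
def expected_counts(table):
    """E_ij = (row i total) * (column j total) / N under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_obs(table, correction=0.5):
    """Sum over all cells of (|O - E| - correction)^2 / E."""
    expected = expected_counts(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            # Assumption: never let the correction push the difference below 0.
            diff = max(abs(o - e) - correction, 0.0)
            total += diff ** 2 / e
    return total

observed = [[30, 20], [20, 30]]    # hypothetical 2 x 2 table of counts
print(expected_counts(observed))
print(chi_square_obs(observed), chi_square_obs(observed, correction=0.0))
```

All expected counts here are 25, so the "Ei ≥ 5" check is satisfied before the statistic is compared with the chi-square distribution.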
