
TOPIC 1 Types of Statistical Analysis

Sample Variance:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Box Plots:
lower fence = LQ - 1.5*IQR
upper fence = UQ + 1.5*IQR
Find limits of whiskers (LX = smallest value > LF, UX = greatest value < UF)
UQ - LQ = spread

Standardised score (z-score):
$z = \frac{x - \bar{x}}{s}$
Measures # and direction of SDs the considered value is away from the mean
If $z \approx 0$, the measurement is located near the mean

Linear Transformations:
$y_i = a + c x_i$
$\bar{y} = a + c\bar{x}$
$s_y^2 = c^2 s_x^2$
$s_y = |c|\, s_x$

TOPIC 2 Probability

Sample space = {.}
Elementary events: no subsets, cannot be broken down
Compound events: made up of elementary events, e.g. S={A, CA}

Permutations: $^{n}P_{r} = \frac{n!}{(n-r)!}$
Combinations: $^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$

Conditional prob rule:
$P(A|B) = \frac{P(A \cap B)}{P(B)}$

Mult chain rule: $P(A \cap B) = P(B|A)\,P(A)$

Bayes Rule (if A and B dependent):
$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$

Statistical Independence
Events A & B stat ind if:
a. $P(A|B) = P(A)$, i.e. knowledge that B occurred has no effect on P(A), and vice versa
b. $P(A \cap B) = P(A) \times P(B)$
c. Events are physically independent

TOPIC 3 Discrete Random Variables

Population Variance:
$\sigma^2 = E[(X - \mu)^2] = \sum_{i \ge 1} x_i^2\, P(X = x_i) - \mu^2$

R.v. Transformations:
$Y = a + cX$
Mean = $E[Y] = a + c\,E[X]$
Variance = $Var[a + cX] = c^2\, Var(X)$
Sd. = $\sigma_Y = |c|\, \sigma_X$
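A minimal sketch of the population variance and r.v. transformation rules above, checked numerically. The distribution, the constants `a` and `c`, and the helper name `mean_var` are made up for illustration only.

```python
from math import sqrt

def mean_var(values, probs):
    """Mean and variance of a discrete r.v.: Var = sum(x^2 * P(X=x)) - mu^2."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum(x ** 2 * p for x, p in zip(values, probs)) - mu ** 2
    return mu, var

# Hypothetical distribution of X (probabilities sum to 1)
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

# Hypothetical linear transformation Y = a + c*X
a, c = 5.0, -2.0

mu_x, var_x = mean_var(values, probs)
mu_y, var_y = mean_var([a + c * x for x in values], probs)

print("E[X], Var(X):", mu_x, var_x)
print("E[Y] vs a + c*E[X]:", mu_y, a + c * mu_x)
print("Var(Y) vs c^2*Var(X):", var_y, c ** 2 * var_x)
print("sd(Y) vs |c|*sd(X):", sqrt(var_y), abs(c) * sqrt(var_x))
```

Note that a negative `c` flips the sign of the mean shift but the sd still scales by `|c|`.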
Bernoulli Distributions: exp has 2 mutually exclusive outcomes (success, fail), inde trials, constant $\pi$
Binomial Distrib: fixed # identical trials, trials independent, constant $\pi$ for exp., 2 poss. outcomes on each trial, binom. variable x = # successes in n trials
Poisson Distrib: events occur randomly over a continuum of time at constant rate, each event occurs inde of the other events, expected # events each unit = $\lambda$, events occur at low frequency
Poisson Approx to Binom.
For binom with small $\pi$ + large n:
$E[X] = n\pi$, $Var(X) = n\pi(1-\pi) \approx n\pi$
$\lambda = n\pi$ → Poisson
Geometric Dist.: X = # trials till 1st success. Conditions: same as Bernoulli
Negative Binom: k = # successes, trials inde, 2 possible outcomes at each trial, constant $\pi$, X = # trials till kth success
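The Poisson approximation to the binomial can be sketched as follows. The values `n = 500` and `pi = 0.01` are hypothetical, chosen so that $\pi$ is small and n is large; the function names are illustrative.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, pi):
    """P(X = x) for a binomial with n trials and success probability pi."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson with expected count lam."""
    return exp(-lam) * lam ** x / factorial(x)

# Hypothetical setting: small pi, large n
n, pi = 500, 0.01
lam = n * pi  # lambda = n * pi

for x in range(6):
    print(x, round(binom_pmf(x, n, pi), 5), round(poisson_pmf(x, lam), 5))
```

The two columns should agree closely here; with a larger $\pi$ or a smaller n the gap widens.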
Hypergeometric: random drawing of n elements without replacement from a set of N elements. N = populatn size, n = # elements drawn, r = # successes in pop, hyperge variable x = # successes in sample

Poisson approx. for binom: use if $\pi$ is small and n is large

TOPIC 4 Continuous Random Variables
Discrete: prob takes one value
Continuous: prob between 2 values

Uniform Distn: X ~ U(a, b)
Prob = area under curve

Normal Distn:
If $X \sim N(\mu, \sigma^2)$, standardizing:
$z = \frac{x - \mu}{\sigma}$
Then $Z \sim N(0, 1)$

Empirical Rule (Normdist) and Chebyshev (any prob. dist):
                   Empirical    Chebyshev (at least)
1 sd of $\mu$      68.26%       0
2 sd of $\mu$      95.44%       3/4
3 sd of $\mu$      99.7%        8/9

TOPIC 5 Sampling Distributions

Exactly Norm distn:
If $X \sim N(\mu, \sigma^2)$ then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

Standardising Norm Distn:
If sample chosen from populatn:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Central Limit Thm:
Provided parent distributn has finite variance, as n increases ($n \ge 30$) the distributn of $\bar{x}$ approaches normdist

Normal approx. for binom:
$\mu_X = n\pi$, $Var(X) = n\pi(1-\pi)$
$P(X \le x) \approx P\left(Z \le \frac{x + \frac{1}{2} - n\pi}{\sqrt{n\pi(1-\pi)}}\right)$
$n\pi > 15$ and $n(1-\pi) > 15$ for CLT check

TOPIC 6 Statistical Inference

C.I. for $\mu$:
$\bar{x} \pm z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$

Type I Error: H0 is true but we reject it
Type II Error: H0 is false but we incorrectly retain it
Prob of Type I error is the level of signif (alpha)
If populatn variance known → z-test:
$z_{obs} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
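A rough sketch of the z-test with known populatn variance, using only the Python standard library. The function name `z_test` and the numbers in the example are assumptions for illustration; the two-sided p-value and interval are the standard normal-theory ones.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """z_obs = (xbar - mu0) / (sigma / sqrt(n)), with two-sided p-value and C.I. for mu."""
    se = sigma / sqrt(n)
    z_obs = (xbar - mu0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    ci = (xbar - z_crit * se, xbar + z_crit * se)
    return z_obs, p_two_sided, ci

# Hypothetical sample summary: xbar = 52 from n = 36, known sigma = 6, H0: mu = 50
z_obs, p_value, ci = z_test(xbar=52.0, mu0=50.0, sigma=6.0, n=36)
print("z_obs:", z_obs)
print("two-sided p-value:", p_value)
print("95% C.I. for mu:", ci)
```

Reject H0 at level alpha when the p-value is below alpha, which for a two-sided test matches mu0 falling outside the C.I.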

TOPIC 7 Inference about a single populatn mean & Investigating Normality

When $\sigma$ unknown, test statistic is now:
$t_{obs} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
df = n-1; as df → ∞, t → Z

C.I. for $\mu$:
$\bar{x} \pm t_{n-1,\,\alpha/2}\, \frac{s}{\sqrt{n}}$

Investigating Normality:
IQR/S ~ 1.3
Construct normal probability plot → if data approx. normdist, points fall approx. on a straight line within confidence bounds
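As an illustration of the one-sample t statistic and the IQR/S check, here is a small sketch with made-up data. The critical value $t_{n-1,\alpha/2}$ still has to come from tables, since the standard library has no t distribution; the quartile method used by `statistics.quantiles` is an assumption and other conventions give slightly different IQRs.

```python
from math import sqrt
from statistics import mean, stdev, quantiles

def t_statistic(data, mu0):
    """t_obs = (xbar - mu0) / (s / sqrt(n)) with df = n - 1."""
    n = len(data)
    t_obs = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t_obs, n - 1

def iqr_over_s(data):
    """IQR / s; a value near 1.3 is consistent with normality."""
    q1, _, q3 = quantiles(data, n=4)
    return (q3 - q1) / stdev(data)

# Hypothetical measurements, testing H0: mu = 5.0
data = [4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.2, 5.4]
t_obs, df = t_statistic(data, mu0=5.0)
ratio = iqr_over_s(data)
print("t_obs:", t_obs, "df:", df)
print("IQR/s:", ratio)
```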
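TOPIC 8 Inference Regarding two Populatn Means

T-test Requirements:
Parent populatn normdist
2 samples inde, observations within each sample inde
The 2 populatn s.d. are the same, $\sigma_1 = \sigma_2$

2-Sample Test:
$\sigma_1 = \sigma_2$ if $\frac{S_{larger}}{S_{smaller}} < 2$

Pooled Variance:
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

$t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

Confidence interval:
$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,\alpha/2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Assumptions: data come from normal or approx. normdist

TOPIC 9 Inference Regarding Proportions
$\pi$ = population proportion (constant)
p = sample proportion, p = X/n
CLT check: $np \ge 15$, $n(1-p) \ge 15$

Proportion of Binomdist:
$z_{obs} = \frac{|p - \pi| - \frac{1}{2n}}{\sqrt{\frac{\pi(1-\pi)}{n}}}$
1/2n is the continuity correction
Working with the signed difference $p - \pi$: when $p - \pi < 0$ use +1/2n, when $p - \pi > 0$ use -1/2n

Confidence Interval:
$p \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

Two proportions:
$\hat{\pi}$ = weighted avg of $p_1$, $p_2$ = $\frac{X_1 + X_2}{n_1 + n_2}$ (successes in sample 1 + sample 2, over total n)
$z_{obs} = \frac{p_1 - p_2}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

Confidence Interval for 2 Prop:
$(p_1 - p_2) \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
Here we don't assume $\pi_1 = \pi_2$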
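The two-sample formulas from Topics 8 and 9 can be sketched as below. The function names and the example counts are hypothetical, and both tests are written for the null hypothesis of no difference ($\mu_1 - \mu_2 = 0$ and $\pi_1 = \pi_2$), which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def pooled_t(sample1, sample2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0, with df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    t_obs = (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_obs, n1 + n2 - 2

def two_prop_z(x1, n1, x2, n2, alpha=0.05):
    """z statistic using the pooled proportion, plus the unpooled C.I. for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # weighted average of p1 and p2
    z_obs = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half = z_crit * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z_obs, (p1 - p2 - half, p1 - p2 + half)

# Hypothetical data
t_obs, df = pooled_t([1, 2, 3], [2, 3, 4])
z_obs, ci = two_prop_z(x1=60, n1=100, x2=45, n2=100)
print("t_obs:", t_obs, "df:", df)
print("z_obs:", z_obs, "C.I. for p1 - p2:", ci)
```

The test uses the pooled proportion because H0 claims a common $\pi$, while the interval keeps $p_1$ and $p_2$ separate.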

For one-sided H-test for $\pi > A$:
c.i. is one-sided, (X, 1): use − only, not ±

TOPIC 10 Regression and Correlation

Assumptions:
Linearity: true linear trend for the conditional expected value of Y given X
Normality: residuals normdist
Constant Variance: variability about regression line constant
Independence: response values inde

Mean of response (Y): linear function of predictor (x)
$E[Y|X=x] = \beta_0 + \beta_1 x$
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$

regression coeff: $\hat{\beta}_1 = \frac{SS_{XY}}{SS_{XX}}$
sample intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

$SS_{XX} = \sum x_i^2 - n\bar{x}^2 = (n-1)s_x^2$
$SS_{XY} = \sum x_i y_i - n\bar{x}\bar{y}$

$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = SS_{YY} - \hat{\beta}_1 SS_{XY}$
$SS_{tot} = SS_{YY} = \frac{SS_{XY}^2}{SS_{XX}} + (SS_{YY} - \hat{\beta}_1 SS_{XY})$
$\hat{\sigma} = s = \sqrt{\frac{SS_{YY} - \hat{\beta}_1 SS_{XY}}{n-2}}$

Coeff of Determination: proportion of total var in Y explained by linear relation of Y and X
$R^2 = \frac{SS_{reg}}{SS_{tot}} = \frac{SS_{XY}^2}{SS_{XX}\, SS_{YY}}$

Correlation Coeff:
$r = \hat{\rho} = \frac{SS_{XY}}{\sqrt{SS_{XX}\, SS_{YY}}}$
$t_{obs} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$

$t_{obs} = \frac{\hat{\beta}_1 - \beta_{1,0}}{est.se.(b)}$ ($\beta_{1,0}$ = value of the slope under H0)
$est.se.(b) = \frac{s}{\sqrt{SS_{XX}}}$

Confidence interval of B:
$\hat{\beta}_1 \pm t_{n-2,\,\alpha/2}\; est.se.(b)$

Predicted Values:
Est. standard error:
$\hat{se}(\hat{E}(Y|X=x_c)) = s\sqrt{\frac{1}{n} + \frac{(x_c - \bar{x})^2}{SS_{XX}}}$
c.i.:
$\hat{\beta}_0 + \hat{\beta}_1 x_c \pm t_{n-2,\,\alpha/2}\; \hat{se}(\hat{E}(Y|X=x_c))$
Prediction interval:
$\hat{\beta}_0 + \hat{\beta}_1 x_c \pm t_{n-2,\,\alpha/2}\; s\sqrt{1 + \frac{1}{n} + \frac{(x_c - \bar{x})^2}{SS_{XX}}}$
Prediction int. takes into account variability in parameter estimation + variability due to individuals

TOPIC 11 Sample Size & Power

Type 1 error = alpha
Type 2 error = beta
Power = P(reject H0 | H0 false) = 1 - beta
Larger sample size, n → higher power
Higher alpha → higher power
Minimize beta for fixed value of alpha, i.e. increase difference between hypoth value and true value of populatn parameter
Larger significance level, alpha
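To see how n, alpha and the true difference drive power, here is a minimal sketch. It assumes a one-sided z-test of H0: $\mu = \mu_0$ against H1: $\mu > \mu_0$ with known $\sigma$; that setting, the function name `z_test_power` and all numbers are assumptions for illustration, not taken from the notes above.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power = P(reject H0 | true mean is mu_true) for the one-sided z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha)       # rejection cutoff on the z scale
    shift = (mu_true - mu0) / (sigma / sqrt(n))    # true difference in standard errors
    return 1 - NormalDist().cdf(z_crit - shift)

# Hypothetical baseline: mu0 = 50, true mean 52, sigma = 6, n = 36, alpha = 0.05
base = z_test_power(50, 52, 6, 36)
bigger_n = z_test_power(50, 52, 6, 72)
bigger_alpha = z_test_power(50, 52, 6, 36, alpha=0.10)
bigger_gap = z_test_power(50, 53, 6, 36)
print("baseline power:", base)
print("larger n:", bigger_n)
print("larger alpha:", bigger_alpha)
print("larger true difference:", bigger_gap)
```

Each of the three changes raises power relative to the baseline, and beta is 1 minus the printed value.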

TOPIC 12 Categorical Data

IF $E_i < 5$ in any cell before test carried out → need to: get larger n, or pool classes

Goodness of Fit:
$E_i$ = expected count under H0 in cell i
$\chi^2_{obs} = \sum_{i=1}^{k} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i}$
Where 1/2 is the contin correction.
The further the observed counts are from the expected counts, the more willing we are to reject H0
NOTE: approx. by $\chi^2$ distribution strongly affected by small expected values → less accurate for small $E_i$
CLT Check: $E_i \ge 5$ → assumption for $\chi^2_{obs}$ ok
Deg freedom = k-1
Shape of $\chi^2$ totally dependent on # of d.f. (v); $v = E[\chi^2_v]$, and it is right skewed

Contingency Tables
H0: i and j independent
H1: i and j related
If H0 true, $\pi_{ij} = \pi_i \times \pi_j$
But we do not have $\pi_i$ or $\pi_j$
Estimate:
$\hat{\pi}_i = \frac{\text{row}(i)\ \text{total}}{N} = P_i$
$\hat{\pi}_j = \frac{\text{column}(j)\ \text{total}}{N} = P_j$
Therefore if H0 true:
$\hat{\pi}_{ij} = \hat{\pi}_j \hat{\pi}_i$
$E_{ij} = \hat{\pi}_j \hat{\pi}_i N = \frac{\text{column}(j)\ \text{total}}{N} \cdot \frac{\text{row}(i)\ \text{total}}{N} \cdot N$
Test Statistic:
$\chi^2_{obs} = \sum_{i}\sum_{j} \frac{\left(|O_{ij} - E_{ij}| - \frac{1}{2}\right)^2}{E_{ij}}$
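The goodness-of-fit and contingency-table statistics can be sketched in a few lines. The counts and function names are hypothetical; the 1/2 continuity correction is applied to every cell because that is how the formulas above are written, and it can be switched off by passing `correction=0`.

```python
def chi2_gof(observed, expected, correction=0.5):
    """Sum over cells of (|O - E| - correction)^2 / E."""
    return sum((abs(o - e) - correction) ** 2 / e for o, e in zip(observed, expected))

def expected_counts(table):
    """E_ij = (row i total) * (column j total) / N under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_table(table, correction=0.5):
    """Chi-square statistic for a contingency table of observed counts."""
    exp = expected_counts(table)
    return sum(
        (abs(o - e) - correction) ** 2 / e
        for row_o, row_e in zip(table, exp)
        for o, e in zip(row_o, row_e)
    )

# Hypothetical goodness-of-fit data: k = 4 cells, equal expected counts
gof = chi2_gof([16, 24, 22, 18], [20, 20, 20, 20])

# Hypothetical 2x2 contingency table
table = [[30, 20], [20, 30]]
exp = expected_counts(table)
stat = chi2_table(table)
print("goodness of fit chi2_obs:", gof, "df:", 4 - 1)
print("expected counts:", exp)
print("contingency chi2_obs:", stat)
```

Before trusting either statistic, check that every expected count is at least 5, as noted above.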
