Agenda: Hypothesis testing, Z test, t test (independent and dependent samples), Analysis of Variance (ANOVA), F-test
Research Process
Problem Definition → Approach Development → Research Design → Fieldwork & Data Collection → Data Analysis → Report & Presentation
G K Saini
Non-parametric tests. One sample: Chi-Square, K-S (Kolmogorov-Smirnov), Runs, Binomial. Independent samples: Chi-Square, Mann-Whitney, Median, K-S, K-W ANOVA (Kruskal-Wallis).
Multivariate analysis: Multivariate Analysis of Variance, Factor Analysis, Confirmatory Factor Analysis, Canonical Correlation, Multiple Discriminant Analysis, Structural Equation Modeling and Path Analysis.
What is a Hypothesis?
A hypothesis is an assumption about a population parameter. An example of a parameter is the population mean, $\mu$. The parameter must be identified before analysis. Example: "I claim the average monthly productivity of a salesman is $\mu = 24$!"
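The null hypothesis H0 is always about a population parameter ($H_0: \mu = 24$), not about a sample statistic. It begins with the assumption that the null hypothesis is true, similar to the notion of "innocent until proven guilty"; it refers to the status quo; and it may or may not be rejected.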
Take a sample ($\bar{X} = 28$) and ask how likely this sample mean is if H0 is true.
$\mu_{H_0} = 40000$ (hypothesized value of the population mean).
H0: Yearly output per worker after training is 40000 units, $H_0: \mu = 40000$ (null hypothesis).
H1: Yearly output per worker after training is not 40000 units, $H_1: \mu \neq 40000$ (alternate hypothesis).
$\alpha = 0.05$ (level of significance).
$\sigma = 2000$ (we know) and $n = 100$, so $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 2000/\sqrt{100} = 200$ (standard error of the mean).
Acceptance region: $\mu_{H_0} \pm 1.96\,\sigma_{\bar{x}}$
Upper confidence limit $= 40000 + 1.96(200) = 40392$ units; lower confidence limit $= 40000 - 1.96(200) = 39608$ units.
Since the sample mean (39650) lies within the acceptance region, we do not reject (accept) the null hypothesis. That means there is no significant difference between the hypothesized mean and the observed sample mean: productivity after the training is consistent with the expected level (40,000 units).
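The confidence-limit arithmetic above can be reproduced in a few lines. This is a minimal Python sketch that hard-codes the example's numbers and the table value 1.96; the variable names are illustrative assumptions, not part of the original material.

```python
import math

# Training example: known sigma, n = 100, alpha = 0.05, two-tailed test
mu_h0 = 40000   # hypothesized population mean (units per worker per year)
sigma = 2000    # known population standard deviation
n = 100         # sample size
x_bar = 39650   # observed sample mean
z_crit = 1.96   # two-tailed critical z for alpha = 0.05 (normal table value)

se = sigma / math.sqrt(n)       # standard error of the mean
lower = mu_h0 - z_crit * se     # lower confidence limit
upper = mu_h0 + z_crit * se     # upper confidence limit

# Do not reject H0 when the sample mean falls inside [lower, upper]
reject_h0 = not (lower <= x_bar <= upper)
print(f"SE = {se:.1f}, acceptance region = [{lower:.0f}, {upper:.0f}]")
print("Reject H0" if reject_h0 else "Do not reject H0")
```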
Standardized (z) approach, same problem: $H_0: \mu = 40000$; $H_1: \mu \neq 40000$; $\alpha = 0.05$; $\sigma = 2000$ (we know), so $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 2000/\sqrt{100} = 200$ (SE of mean).
$z = \dfrac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}$
The 0.95 acceptance region contains two equal areas of 0.475, one on each side of the mean. The z value for 0.475 of the area under the curve is 1.96.
Here $z = (39650 - 40000)/200 = -1.75$. Don't reject H0 if $-1.96 < z < 1.96$; indeed $-1.96 < -1.75 < 1.96$.
Since the z value ($-1.75$) lies within the acceptance region ($\pm 1.96$), we do not reject (accept) the null hypothesis. As with the confidence-limit approach, there is no significant difference between the hypothesized mean and the observed sample mean: productivity after training is consistent with the expected 40,000 units.
Critical-value approach: if the test statistic falls beyond the critical values, reject H0; otherwise, do not reject H0.
p-value approach: compare the p-value with $\alpha$. If p-value $\ge \alpha$, do not reject H0; if p-value $< \alpha$, reject H0.
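As a sketch of both decision rules on the training example, the snippet below computes z, the two-tailed critical value and the two-tailed p-value with Python's standard-library `statistics.NormalDist`. Using `NormalDist` in place of a printed normal table is our substitution; the inputs are the example's values.

```python
from statistics import NormalDist

# Training example: H0: mu = 40000 vs H1: mu != 40000, alpha = 0.05
mu_h0, se, x_bar, alpha = 40000, 200.0, 39650, 0.05

z = (x_bar - mu_h0) / se                      # standardized sample mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-tailed test
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

# Critical-value rule: reject if |z| > z_crit; p-value rule: reject if p < alpha
print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

For this example the p-value comes out at roughly 0.08, above $\alpha = 0.05$, so both rules agree: do not reject H0.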
Source: Arsham H., Kuiper's P-value as a Measuring Tool and Decision Procedure for the Goodness-of-fit Test, Journal of Applied Statistics, Vol. 15, No.3, 131-135, 1988.
Level of Significance, $\alpha$
Defines the unlikely values of the sample statistic if the null hypothesis is true; this is called the rejection region of the sampling distribution. It is designated by $\alpha$ (the level of significance); typical values are .01, .05 and .10. It is selected by the researcher at the beginning, and it provides the critical value(s) of the test.
Decision table (jury-trial analogy: innocent/guilty): rejecting H0 when H0 is false is a correct decision.
Type II Error: fails to reject a false null hypothesis. The probability of a Type II error is $\beta$. The power of the test is $1 - \beta$.
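$H_0: \mu \ge 3$, $H_1: \mu < 3$ (lower-tailed test); $\alpha = .05$; $n = 100$; Z test; critical value $Z = -1.645$.
7. Collect data: 100 employees surveyed.
8. Compute test statistic and p-value: computed test statistic $Z = -2$, p-value = .0228.
9. Decision: since $-2 < -1.645$ (equivalently, $.0228 < .05$), reject H0.
10. Express conclusion.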
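The lower-tailed numbers above (critical value $-1.645$, p-value .0228 for $Z = -2$) can be checked with a short sketch. As before, Python's `statistics.NormalDist` stands in for the normal table. This is an assumption of the sketch, not part of the original slides.

```python
from statistics import NormalDist

# Lower-tailed test: H0: mu >= 3 vs H1: mu < 3, alpha = .05, n = 100
alpha = 0.05
z = -2.0                              # computed test statistic reported in the text

z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value, about -1.645
p_value = NormalDist().cdf(z)         # area to the left of z, about .0228

reject_h0 = z < z_crit                # equivalently: p_value < alpha
print(f"critical Z = {z_crit:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```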
Rejection Region
Lower-tailed test: $H_0: \mu \ge \mu_0$, $H_1: \mu < \mu_0$. Upper-tailed test: $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$.
[Figure: rejection regions for the lower- and upper-tailed tests]
Lower-tailed test: Z must be significantly below 0 to reject H0.
Upper-tailed test: small values of Z don't contradict H0, so don't reject H0!
Z test statistic: $Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
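Upper-tailed example: $\alpha = 0.05$, $n = 25$, critical value: 1.645 (rejection region: the upper .05 tail).
Test statistic: $Z = 1.50$.
Decision: do not reject H0 at $\alpha = .05$, since $1.50 < 1.645$.
Conclusion: no evidence at the .05 level that the true mean exceeds the hypothesized value.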
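Two-tailed example: $\alpha = 0.05$, $n = 25$, critical values: $\pm 1.96$ (rejection regions of .025 in each tail).
Test statistic: $Z = 1.50$.
Decision: do not reject H0 at $\alpha = .05$, since $-1.96 < 1.50 < 1.96$.
Conclusion: no evidence that the true mean is not 368.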
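The only difference between the two examples is where the rejection region sits. The sketch below uses a hypothetical helper, `z_decision` (our name, not from the text), to make the one-tailed versus two-tailed rule explicit. It assumes a standard normal test statistic and uses `statistics.NormalDist` for the critical values.

```python
from statistics import NormalDist

def z_decision(z: float, alpha: float, tail: str) -> bool:
    """Return True if H0 is rejected; tail is 'upper', 'lower' or 'two'."""
    std = NormalDist()
    if tail == "upper":
        return z > std.inv_cdf(1 - alpha)           # about 1.645 at alpha = .05
    if tail == "lower":
        return z < std.inv_cdf(alpha)               # about -1.645 at alpha = .05
    if tail == "two":
        return abs(z) > std.inv_cdf(1 - alpha / 2)  # about ±1.96 at alpha = .05
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

# Both examples in the text use Z = 1.50, alpha = 0.05, n = 25
for tail in ("upper", "two"):
    print(tail, "reject H0" if z_decision(1.50, 0.05, tail) else "do not reject H0")
```

With $Z = 1.50$ both versions return "do not reject". A value such as 1.70 would be rejected by the upper-tailed test but not by the two-tailed one, which shows how the choice of tail changes the decision.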
$t = \dfrac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}} = \dfrac{84 - 90}{2.46} = -2.44$
Reject the null hypothesis, since $|t| = 2.44$ is greater than the critical value 1.729 (for 19 d.f. at the 0.10 level of significance, two-tailed).
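The one-sample t computation can be sketched as follows. The text does not give the raw data, so this sketch reuses the reported estimated standard error (2.46) and the table critical value (1.729).

```python
# One-sample t test: x_bar = 84, mu_H0 = 90, estimated SE = 2.46,
# 19 d.f. (so n = 20), 0.10 level of significance (two-tailed)
x_bar, mu_h0, se_hat = 84, 90, 2.46
t_crit = 1.729  # t-table value quoted in the text for 19 d.f.

t = (x_bar - mu_h0) / se_hat  # about -2.44
reject_h0 = abs(t) > t_crit   # two-tailed: compare |t| with the critical value
print(f"t = {t:.2f}, reject H0: {reject_h0}")
```

If SciPy is available, `scipy.stats.t.ppf(0.95, 19)` gives the same critical value (about 1.729) without a printed table.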
Directional hypotheses can take the forms: less than, more than, less than or equal to, and greater than or equal to.
t-Distribution Table
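Sampling distribution of the difference between sample means (the distribution of all possible values of $\bar{x}_1 - \bar{x}_2$):
$\mu_{\bar{x}_1} = \mu_1$, $\mu_{\bar{x}_2} = \mu_2$, $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\sigma_{\bar{x}_1 - \bar{x}_2}}$, where $\bar{x}_1 - \bar{x}_2$ is the difference between the sample means.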
$H_0: \mu_1 = \mu_2$ (there is no difference: prices in Delhi and Mumbai are equal); $H_1: \mu_1 \neq \mu_2$ (there is a difference)
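$\alpha = 0.05$
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{(0.40)^2}{200} + \dfrac{(0.60)^2}{175}} = \sqrt{0.00286} = 0.053$
Standardizing the difference of sample means: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\sigma_{\bar{x}_1 - \bar{x}_2}}$, where $(\mu_1 - \mu_2)_{H_0}$ is the hypothesized difference between the two means.
Since $2.83 > 1.96$, reject the null hypothesis H0.
Interpretation: there is a significant difference in T-shirt prices between Delhi and Mumbai.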
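The standard error of the difference can be checked with two small helper functions, `se_diff` and `z_diff` (hypothetical names). The extract gives the two standard deviations and sample sizes but not the observed difference in means. The sketch therefore reproduces only the standard error and then applies the decision rule to the reported $z = 2.83$.

```python
import math

def se_diff(sd1, n1, sd2, n2):
    """Standard error of x_bar1 - x_bar2 for two large independent samples."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def z_diff(diff_means, se, hyp_diff=0.0):
    """Standardize an observed difference in means against the hypothesized difference."""
    return (diff_means - hyp_diff) / se

# Delhi (1): sigma1 = 0.40, n1 = 200; Mumbai (2): sigma2 = 0.60, n2 = 175
se = se_diff(0.40, 200, 0.60, 175)
print(f"SE = {se:.3f}")  # about 0.053, matching the text

z_reported = 2.83  # z value reported in the text
z_crit = 1.96      # two-tailed critical value at alpha = 0.05
print("Reject H0" if abs(z_reported) > z_crit else "Do not reject H0")
```

Given the observed difference $\bar{x}_1 - \bar{x}_2$, `z_diff(difference, se)` returns the test statistic. For a non-zero hypothesized difference, pass it as `hyp_diff`.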
Mumbai (2): sample mean price 9.10 dollars, standard deviation 0.60 dollars, sample size 175.
Question: Please test whether prices are about 0.10 dollars lower in Delhi than in Mumbai, at the 0.05 level.
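$H_0: \mu_1 = \mu_2 - 0.10$ (prices are 0.10 dollars lower in Delhi than in Mumbai)
$H_1: \mu_1 \neq \mu_2 - 0.10$ (prices are not 0.10 dollars lower in Delhi than in Mumbai); $\alpha = 0.05$
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\sigma_{\bar{x}_1 - \bar{x}_2}}$
Standardizing the difference of sample means; $(\mu_1 - \mu_2)_{H_0}$ is the hypothesized difference between the two means.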
Test for Differences Between Means: Small Sample Size: Independent Samples
Pooled variance $s_p^2$: a weighted average of $s_1^2$ and $s_2^2$
Firm B
$H_0: \mu_1 = \mu_2$ (there is no difference in the productivity of the two firms); $H_1: \mu_1 > \mu_2$ (productivity is higher in firm 1 than in firm 2); $\alpha = 0.05$
Estimated Standard Error of the Difference between Two Sample Means with Small Samples and Equal Pop. Variance
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Various Forms of Hypothesis?
Considering Two Samples
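$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = 6.27$
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(80 - 70) - 0}{6.27} = 1.59$
The critical value for 22 d.f. at the 0.05 level of significance (one-tailed) is 1.717.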
Since $1.59 < 1.717$, we do not reject (accept) the null hypothesis. Interpretation: there is no significant difference between the productivity of the two firms.
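The pooled-variance t test can be sketched as follows. The individual sample standard deviations are not given here, so the hypothetical helper `pooled_se` is shown for completeness. The worked example instead uses the reported standard error of 6.27 and the table value 1.717.

```python
import math

def pooled_se(s1, n1, s2, n2):
    """Estimated SE of x_bar1 - x_bar2 assuming equal population variances."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)  # pooled variance
    return math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)

# Firm example: x_bar1 = 80, x_bar2 = 70, estimated SE = 6.27,
# n1 + n2 - 2 = 22 d.f., one-tailed alpha = 0.05 -> critical t = 1.717 (table value)
x1, x2, se_hat, t_crit = 80, 70, 6.27, 1.717
t = ((x1 - x2) - 0) / se_hat  # hypothesized difference under H0 is 0
print(f"t = {t:.2f}, reject H0: {t > t_crit}")  # upper-tailed because H1: mu1 > mu2
```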
Tests for Differences Between Means: Small Sample Size: Dependent (Paired) Samples
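| S. no. | Sales before promotion | Sales after promotion | Difference (x) | x² |
|---|---|---|---|---|
| 1 | 60 | 64 | 4 | 16 |
| 2 | 70 | 75 | 5 | 25 |
| 3 | 65 | 68 | 3 | 9 |
| 4 | 63 | 64 | 1 | 1 |
| 5 | 67 | 69 | 2 | 4 |
| 6 | 70 | 74 | 4 | 16 |
| 7 | 73 | 75 | 2 | 4 |
| 8 | 75 | 79 | 4 | 16 |
| 9 | 64 | 70 | 6 | 36 |
| 10 | 68 | 71 | 3 | 9 |
| Mean / Total | 67.5 (mean) | 70.9 (mean) | 34 (total) | 136 (total) |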
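$\bar{x} = \dfrac{\sum x}{n} = \dfrac{34}{10} = 3.4$
$s = \sqrt{\dfrac{\sum x^2 - n\bar{x}^2}{n - 1}} = \sqrt{\dfrac{136 - 10(3.4)^2}{9}} = \sqrt{2.267} = 1.505$
$\hat{\sigma}_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{1.505}{\sqrt{10}} = 0.476$
$t = \dfrac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}} = \dfrac{3.4 - 2.5}{0.476} = 1.890$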
If the test concerns whether one mean is significantly higher or significantly lower than the other, use a one-tailed test.
Since the t value (1.890) is greater than the critical value (1.833 for 9 d.f. at the 0.05 level of significance, one-tailed), reject the null hypothesis. Interpretation: the improvement in sales after the promotion is significantly greater than the hypothesized increase of 2.5.
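The paired computation can be reproduced directly from the before/after sales listed above. This sketch uses Python's `statistics.stdev`, which divides by $n - 1$ exactly as the formula for $s$ does. The hypothesized mean increase (2.5) and the critical value (1.833) are taken from the text.

```python
import math
from statistics import mean, stdev

# Sales before and after the promotion for the 10 cases in the table
before = [60, 70, 65, 63, 67, 70, 73, 75, 64, 68]
after = [64, 75, 68, 64, 69, 74, 75, 79, 70, 71]
mu_h0 = 2.5     # hypothesized mean increase used in the text
t_crit = 1.833  # one-tailed t-table value for 9 d.f., alpha = 0.05

diffs = [a - b for a, b in zip(after, before)]  # 4, 5, 3, 1, 2, 4, 2, 4, 6, 3
n = len(diffs)
d_bar = mean(diffs)         # 3.4
s = stdev(diffs)            # sample SD with n - 1 in the denominator, about 1.5055
se_hat = s / math.sqrt(n)   # about 0.476
t = (d_bar - mu_h0) / se_hat  # about 1.890

print(f"d_bar = {d_bar}, s = {s:.3f}, SE = {se_hat:.3f}, t = {t:.3f}")
print("Reject H0" if t > t_crit else "Do not reject H0")
```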
ANOVA: Exercise 1
A firm adopted three promotional campaigns in three different territories for the same product and wants to know whether any one of them is more effective than the others in generating sales. Sales figures for 5 days were observed at random for each method, as given below.
| Day | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| News Paper Ads (1) | 25 | 30 | 36 | 38 | 31 |
| TV Advertising (2) | 31 | 39 | 38 | 42 | 35 |
| Direct Marketing (3) | 24 | 30 | 28 | 25 | 28 |
Assumptions
Each sample is drawn from a normal population, and each population has the same variance, $\sigma^2$. However, with larger sample sizes the normality assumption is not essential.
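We know $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$, so $\sigma^2 = n\,\sigma_{\bar{x}}^2$ (population variance).
Estimating $\sigma_{\bar{x}}^2$ by $s_{\bar{x}}^2$: $\hat{\sigma}^2 = n\,s_{\bar{x}}^2 = \dfrac{n \sum (\bar{x} - \bar{\bar{x}})^2}{k - 1}$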
So the process of ANOVA is: determine one estimate of $\sigma^2$ from the variance between the sample means; determine a second estimate of $\sigma^2$ from the variance within the samples; compare the two estimates of $\sigma^2$ to reject or accept H0.
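$\hat{\sigma}_b^2 = \dfrac{\sum n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$ (estimate of the between-column (sample) variance), where $n_j$ = size of the jth sample and $\bar{x}_j$ = sample mean of the jth sample.
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$ (sample variance)
$\hat{\sigma}_w^2 = \sum \left(\dfrac{n_j - 1}{n_T - k}\right) s_j^2$ (estimate of the within-column variance), where $n_j$ = size of the jth sample, $s_j^2$ = variance of the jth sample, $n_T = \sum n_j$ = total sample size, and $k$ = number of samples.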
The F-ratio, $F = \hat{\sigma}_b^2 / \hat{\sigma}_w^2$, should be close to one if the null hypothesis is true. So the smaller the F-value, the more likely it is that H0 is true; the larger the F-value, the less likely it is that H0 is true. In practice, when the population means are unequal, the variance among the sample means tends to be higher than the variance within the samples. This results in a higher F-value, leading to rejection of H0.
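To make the two variance estimates concrete, here is a sketch that applies the formulas above to the sales data of ANOVA Exercise 1. The critical value 3.89 for (2, 12) d.f. at the 0.05 level is taken from a standard F table; it is not stated in the text.

```python
from statistics import mean, variance

# ANOVA Exercise 1: five days of sales under each promotional campaign
samples = [
    [25, 30, 36, 38, 31],  # News Paper Ads (1)
    [31, 39, 38, 42, 35],  # TV Advertising (2)
    [24, 30, 28, 25, 28],  # Direct Marketing (3)
]
k = len(samples)                    # number of samples
n_t = sum(len(s) for s in samples)  # total sample size n_T
grand_mean = mean([x for s in samples for x in s])

# Between-column estimate: sum of n_j * (x_bar_j - grand mean)^2, divided by k - 1
var_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)
# Within-column estimate: sum of ((n_j - 1) / (n_T - k)) * s_j^2
var_within = sum((len(s) - 1) / (n_t - k) * variance(s) for s in samples)

f_ratio = var_between / var_within
f_crit = 3.89  # F-table value for (k - 1, n_T - k) = (2, 12) d.f., alpha = 0.05
print(f"between = {var_between:.2f}, within = {var_within:.2f}, F = {f_ratio:.2f}")
print("Reject H0" if f_ratio > f_crit else "Do not reject H0")
```

On these figures the between-column estimate is 125 and the within-column estimate is about 16.67, giving F = 7.5. This exceeds the table value, so the sketch rejects the hypothesis that the three campaigns have equal mean sales.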
The F-distribution
Calculating d.f. for distribution
d.f. in the numerator of the F ratio $= k - 1$; d.f. in the denominator of the F ratio $= n_T - k$
The F-distribution is useful for calculating ratios of variances of normally distributed statistics. An F distribution is positively skewed, and its shape depends on the degrees of freedom of the numerator and of the denominator. The value of F is never negative. The F-distribution approaches symmetry as the d.f. increase.
Comparing the F-value with the critical value and taking the decision: rejecting or accepting H0
Source: http://www.socialresearchmethods.net/kb/expfact.php
For all settings, the 4 hrs/week condition worked better than the 1 hr/week condition.
In-class training was better than pull-out training for all amounts of time.
4 hrs/week always works better than 1 hr/week, and the in-class setting always works better than pull-out.
The combination of 4 hrs/week and the in-class setting does better than the other three combinations.
Chi-Square ($\chi^2$)
Chi-Square Distribution
$\nu$ = degrees of freedom. The sampling distribution of the $\chi^2$ statistic can be approximated by a continuous curve known as the $\chi^2$ distribution, given that H0 is true. There is a different distribution for each number of degrees of freedom. The smaller the d.f., the more right-skewed the distribution; the $\chi^2$ curve approaches symmetry with increasing d.f. and tends toward the normal distribution. Since the $\chi^2$ distribution is a probability distribution, the total area under each $\chi^2$ curve is 1.0.
Used when more than two samples (variables) are being investigated: chi-square as a test of independence of attributes. The chi-square test allows us to test whether more than two population proportions can be considered equal.
Whether the differences observed among various sample proportions are significant or only due to chance.
$H_0: p_{s1} = p_{s2} = p_{s3} = p_{s4}$ (null hypothesis)
$H_1: p_{s1}, p_{s2}, p_{s3}, p_{s4}$ are not all equal (alternative hypothesis)
If the columns are not contingent on the rows, then the row and column frequencies are independent. The test of whether the columns are contingent on the rows is known as the chi-square test of independence.
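$\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$ (chi-square statistic), where $f_o$ is an observed frequency and $f_e$ the corresponding expected frequency.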
Steps: observed and expected frequencies; calculating expected frequencies; comparing expected and observed frequencies; accepting or rejecting the null hypothesis.
Chi-Square: Exercise 1
An IT firm was trying to find out which variables influenced the attrition rate. The type of educational institution was suggested as a possible variable influencing attrition. A sample of 500 employees was selected; the data are given below.

| | Tier I eng. college (IITs) | Tier II eng. college (NITs) | Tier III eng. college (others) |
|---|---|---|---|
| With the firm | 0 | 50 | 80 |
| Left the firm | 250 | 100 | 20 |

Is there any evidence from the above data of a relation between type of college and attrition?
Chi-Square: Exercise 2
The following table shows the classification of 1000 workers in a factory according to the disciplinary action taken by the management and their promotional experience.

| Promotional experience \ Disciplinary action | Offenders | Non-offenders |
|---|---|---|
| Promoted | 30 | 70 |
| Not promoted | 670 | 230 |

Examine whether the disciplinary action taken and promotional experience are associated.
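Here is a minimal sketch of the chi-square test of independence for Exercise 2. It computes each expected frequency as (row total × column total) / grand total. The degrees of freedom $(r - 1)(c - 1)$ and the table value 3.841 for 1 d.f. at $\alpha = 0.05$ are standard results supplied here; they are not stated in the text.

```python
# Exercise 2: rows = promotional experience, columns = disciplinary action
observed = [
    [30, 70],    # Promoted:     offenders, non-offenders
    [670, 230],  # Not promoted: offenders, non-offenders
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
        chi_square += (f_o - f_e) ** 2 / f_e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1) = 1
chi_crit = 3.841  # chi-square table value for 1 d.f. at alpha = 0.05
print(f"chi-square = {chi_square:.2f}, d.f. = {df}")
print("Reject H0 (associated)" if chi_square > chi_crit else "Do not reject H0")
```

On these figures $\chi^2$ is about 84.7, far above 3.841, so the sketch concludes that disciplinary action and promotional experience are associated.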
Locating the Critical Value in the Chi-square Table: Right Tailed vs. Left Tailed Test
Right-tailed test: to find the critical value, use the upper-limit table for right-tailed tests.
Left-tailed test: to find the critical chi-square value for a left-tailed test, use the table labeled "lower limits." Subtract your significance level ($\alpha$) from 1. For example, if your significance level is 0.025, then $1 - 0.025 = 0.975$. Find this value at the top of the chi-square table, heading a column.