
T distribution

Student t-test

Steps in hypothesis testing concerning $\mu$

1. Set up the hypotheses and select the level of significance. Example: $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$, with $\alpha = 0.05$.
2. Select the appropriate test statistic. Example: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
3. Generate the decision rule. Example: reject $H_0$ if $Z \ge Z_{1-\alpha}$; do not reject $H_0$ if $Z < Z_{1-\alpha}$.
4. Compute the test value.
5. Draw a conclusion about $H_0$ by comparing the test value (4) with the decision rule (3).

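Test of hypothesis concerning $\mu$

Assumptions: normal distribution or large sample ($n \ge 30$); simple random samples.

Case 1: $\sigma$ known. Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Case 2: $\sigma$ unknown and $n \ge 30$. Test statistic: $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Case 3: $\sigma$ unknown and $n < 30$. Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, df = $n - 1$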

Estimation with two samples is concerned with estimating $(\mu_1 - \mu_2)$, the difference in means between groups/populations. Tests of hypotheses in the two-sample case are also concerned with the difference in the means, e.g.
$H_0: \mu_1 - \mu_2 = 0$ (no difference in means) versus
$H_1: \mu_1 - \mu_2 \ne 0$ (the means are different), or
$H_1: \mu_1 > \mu_2$ (the mean of population 1 is larger than the mean of population 2), or
$H_1: \mu_1 < \mu_2$ (the mean of population 1 is smaller than the mean of population 2).

General format
Test value $= \dfrac{(\text{observed value}) - (\text{expected value})}{\text{standard error}}$

Here $\bar{X}_1 - \bar{X}_2$ is the observed difference and $\mu_1 - \mu_2$ is the expected difference, which is 0 when the null hypothesis is $\mu_1 = \mu_2$ (equivalent to $\mu_1 - \mu_2 = 0$). The standard error of the difference is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.

If $\sigma_1^2$ and $\sigma_2^2$ are not known, the researcher can use the variances $s_1^2$ and $s_2^2$ obtained from the respective samples, provided both sample sizes are 30 or more. The formula is then:
$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
where the numerator is the difference in means and the denominator is the standard error of the difference in means, provided $n_1 \ge 30$ and $n_2 \ge 30$.

In a comparison between two means, the same basic steps of hypothesis testing as for Z are followed. When comparing two means using a t-test, the researcher must decide whether the two samples are independent or dependent. Two assumptions for the test of the difference between two means:
1. The samples must be independent of each other.
2. The populations from which the samples were drawn must be normally distributed.

Student t-test
1. Testing the difference between two means: independent large samples
2. Testing the difference between two means: independent small samples
3. Testing the difference between two means: small dependent samples

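Testing the difference between two means of independent large samples

General formula
Test statistic: $Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$

Confidence interval: $(\bar{X}_1 - \bar{X}_2) \pm Z_{1-\alpha/2}\,(se)$, where $se = \sqrt{s_1^2/n_1 + s_2^2/n_2}$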

Example 1.
A survey found that the average hotel room rate in Zaria is N88.42 and the average room rate in Funtua is N80.61. Assume that the data were obtained from two samples of 50 hotels each and that the standard deviations were N5.62 and N4.83, respectively. At $\alpha = 0.05$, can it be concluded that there is a significant difference in the rates?

Solution
Step 1. State the hypotheses: $H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \ne \mu_2$ (claim).
Step 2. Find the critical value: for a two-tailed test at $\alpha = 0.05$, $Z = \pm 1.96$.
Step 3. Compute the test value:
$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
$= \dfrac{88.42 - 80.61}{\sqrt{5.62^2/50 + 4.83^2/50}}$
$= \dfrac{7.81}{\sqrt{31.5844/50 + 23.3289/50}}$
$= \dfrac{7.81}{\sqrt{0.6317 + 0.4666}}$
$= \dfrac{7.81}{\sqrt{1.0983}}$
$= \dfrac{7.81}{1.048}$ (note: 1.048 is the standard error)
$= 7.45$
Step 4. Make the decision: reject $H_0$ if $|Z_{calc}| > Z_{tab}$. Since $7.45 > 1.96$, reject the null hypothesis ($H_0$).
Step 5. Summarize the result: there is enough evidence to support the claim that the means are not equal. Hence there is a significant difference in the rates.
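
The arithmetic in Example 1 can be checked with a short script. This is a minimal sketch in Python using only the standard library; the function name `two_sample_z` is an illustrative choice and does not come from the slides.

```python
import math

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """Large-sample Z statistic for the difference between two independent means.

    Returns the statistic together with the standard error it was built from.
    """
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # standard error of the difference
    return (mean1 - mean2) / se, se

# Hotel room rates from Example 1: Zaria vs Funtua, 50 hotels each.
z, se = two_sample_z(88.42, 80.61, 5.62, 4.83, 50, 50)

critical = 1.96  # two-tailed critical value at alpha = 0.05, as used in the example
reject_h0 = abs(z) > critical

print(f"se = {se:.4f}, z = {z:.2f}, reject H0: {reject_h0}")
```

The comparison uses the absolute value of the statistic because the alternative hypothesis is two-sided.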


Fixing the confidence limits

$(\bar{X}_1 - \bar{X}_2) \pm Z_{1-\alpha/2}\sqrt{s_1^2/n_1 + s_2^2/n_2}$
$= (88.42 - 80.61) \pm 1.96\sqrt{5.62^2/50 + 4.83^2/50}$
$= 7.81 \pm 1.96\sqrt{31.5844/50 + 23.3289/50}$
$= 7.81 \pm 1.96\sqrt{0.6317 + 0.4666}$
$= 7.81 \pm 1.96\sqrt{1.0983}$
$= 7.81 \pm 1.96(1.0480)$
$= 7.81 \pm 2.0540$
Therefore $7.81 - 2.0540 = 5.7560$ and $7.81 + 2.0540 = 9.8640$, so CI = (5.7560, 9.8640).
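
The interval can also be computed directly, with the critical value taken from the standard normal distribution rather than from a table. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`); the helper name `diff_means_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def diff_means_ci(mean1, mean2, sd1, sd2, n1, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 (independent samples)."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)     # standard error of the difference
    diff = mean1 - mean2
    return diff - z_crit * se, diff + z_crit * se

# Hotel room rate data from Example 1.
lower, upper = diff_means_ci(88.42, 80.61, 5.62, 4.83, 50, 50)
contains_zero = lower <= 0 <= upper

print(f"95% CI = ({lower:.3f}, {upper:.3f}), contains 0: {contains_zero}")
```

An interval that excludes 0 corresponds to rejecting the hypothesis of equal means at the matching significance level.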

Using the confidence interval to test hypotheses

State the hypotheses: $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \ne 0$.
Make a decision: if the CI does not contain 0, reject the null hypothesis. The 95% CI (5.7560, 9.8640) does not contain 0, therefore $H_0$ is rejected.
Summary: there is enough evidence that the means are not the same. There is a significant difference in mean rates.

Suppose the mean cholesterol level of males aged 50 is 241. An investigator wishes to examine whether cholesterol levels are significantly reduced by modifying diets only slightly. A random sample of 12 patients agrees to participate in the study and follows the modified diet for 3 months. After 3 months, their cholesterol levels are measured and summary statistics are produced on the n = 12 subjects. The mean cholesterol level is 235 with a standard deviation of 12.5. Based on the data, is there statistical evidence that the modified diet reduces cholesterol?

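1. Set up the hypotheses: $H_0: \mu = 241$ versus $H_1: \mu < 241$.
2. Select the appropriate test statistic: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$ (Case 3: $\sigma$ unknown and $n < 30$).
3. Decision rule: reject $H_0$ if $t \le -1.796$ (df = 11, $\alpha = 0.05$); do not reject $H_0$ if $t > -1.796$.
4. Compute the test value by substituting into $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$:
$t = \dfrac{235 - 241}{12.5/\sqrt{12}} = \dfrac{-6}{12.5/3.464} = \dfrac{-6}{3.608} = -1.66$
5. Conclusion: since $-1.66 > -1.796$, do not reject $H_0$. The data do not provide significant evidence at $\alpha = 0.05$ that the modified diet reduces cholesterol.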
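
The one-sample calculation for the cholesterol data can be reproduced as follows. This is a minimal sketch; `one_sample_t` is an illustrative name, and the critical value is the table value quoted in the decision rule rather than something computed by the code.

```python
import math

def one_sample_t(sample_mean, mu_0, sample_sd, n):
    """t statistic for a one-sample test of the mean; returns (t, df)."""
    t_value = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))
    return t_value, n - 1

# Cholesterol example: n = 12, sample mean 235, sample sd 12.5, mu_0 = 241.
t, df = one_sample_t(235, 241, 12.5, 12)

t_critical = -1.796           # lower-tail table value for df = 11, alpha = 0.05
reject_h0 = t <= t_critical   # lower-tailed test: H1 is mu < 241

print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

Because the alternative is one-sided (a reduction), only the lower tail is checked.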


Example 4
An investigation is undertaken to examine the mean times to relief from headache pain under two entirely different treatments: medication versus relaxation treatment. Patients suffering from chronic headaches are enrolled in a study and randomly assigned to one of the two treatments under investigation. Patients are instructed to either take the assigned medication or perform the relaxation exercises at the onset of their next headache. They are also instructed to record the time in minutes until the headache pain is resolved.

Fifteen subjects are assigned to the medication treatment and report a mean time to relief of 33.8 minutes with a variance of 2.85 (minutes squared). A second random sample of 15 subjects is assigned to the relaxation treatment and reports a mean time to relief of 22.4 minutes with a variance of 3.07 (minutes squared). The data layout is shown below.

Patients with chronic headaches
Randomize into two groups:

Medication: $n_1 = 15$, $\bar{X}_1 = 33.8$ minutes, $s_1^2 = 2.85$
Relaxation treatment: $n_2 = 15$, $\bar{X}_2 = 22.4$ minutes, $s_2^2 = 3.07$

Are these sample means statistically significantly different? Run an appropriate test to assess whether there is a significant difference in the mean time to relief under the two treatments, using a 5% level of significance.

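Formula
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{1/n_1 + 1/n_2}}$, df = $n_1 + n_2 - 2$
where $S_p$ is the pooled standard deviation:
$S_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$, with $s_i^2 = \dfrac{\sum X_i^2 - (\sum X_i)^2/n_i}{n_i - 1}$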

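Substituting in the formula, with $S_p = \sqrt{\dfrac{14(2.85) + 14(3.07)}{28}} = \sqrt{2.96} = 1.72$:
$t = \dfrac{33.8 - 22.4}{1.72\sqrt{1/15 + 1/15}} = \dfrac{11.4}{0.628} = 18.15$

$t = 18.15 > 2.048$ (two-sided critical value at $\alpha = 0.05$, df = $n_1 + n_2 - 2 = 28$). Reject $H_0$: there is significant evidence of a difference in the mean relief time between medication and relaxation therapy.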
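
The pooled calculation can be sketched in Python as below. The helper name `pooled_t` is hypothetical, the inputs are the summary statistics given for the two treatment groups, and the critical value 2.048 is the standard t-table entry for df = 28 at a two-sided 5% level (an assumption supplied here, since the code does not compute t quantiles).

```python
import math

def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """Two-sample t statistic with a pooled standard deviation.

    Assumes independent samples from populations with a common variance.
    Returns (t, pooled_sd, df).
    """
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    t_value = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t_value, pooled_sd, df

# Medication vs relaxation: means 33.8 and 22.4, variances 2.85 and 3.07, n = 15 each.
t, sp, df = pooled_t(33.8, 22.4, 2.85, 3.07, 15, 15)

t_critical = 2.048  # t-table value, df = 28, two-sided alpha = 0.05
reject_h0 = abs(t) > t_critical

print(f"Sp = {sp:.2f}, t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

With equal group sizes the pooled variance reduces to the simple average of the two sample variances, which is why $S_p^2 = 2.96$ here.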

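Two dependent populations

Attributes: samples are matched or paired, n (number of pairs) $\ge 30$
Test statistic: $Z = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$
Confidence interval: $\bar{d} \pm Z_{1-\alpha/2}\,S_d/\sqrt{n}$

Attributes: samples are matched or paired, n (number of pairs) $< 30$
Test statistic: $t = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$, df = $n - 1$
Confidence interval: $\bar{d} \pm t_{1-\alpha/2}\,S_d/\sqrt{n}$, df = $n - 1$

where $\bar{d}$ and $S_d$ are the mean and standard deviation of the difference scores.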

Example
A nutrition expert is examining a weight loss programme to evaluate its effectiveness. Ten subjects are randomly selected for the investigation. Each subject's initial weight is recorded, they follow the programme for six weeks, and they are weighed again. The data are given below:

Subject   Initial weight   Final weight   Difference (d)   d^2
1         180              165            15               225
2         142              138            4                16
3         126              128            -2               4
4         138              136            2                4
5         175              170            5                25
6         205              197            8                64
7         116              115            1                1
8         142              128            14               196
9         157              144            13               169
10        136              130            6                36

$\sum d = 66$

$\sum d^2 = 740$

$\bar{d} = \sum d / n = 66/10 = 6.6$
$S_d^2 = \dfrac{\sum d^2 - (\sum d)^2/n}{n - 1} = \dfrac{740 - (66)^2/10}{9} = \dfrac{304.4}{9} = 33.82$
$S_d = \sqrt{S_d^2} = \sqrt{33.82} = 5.82$

Test the hypothesis

1. Set up the hypotheses: $H_0: \mu_d = 0$ versus $H_1: \mu_d \ne 0$.
2. Select the appropriate statistic: $t = \dfrac{\bar{d} - \mu_d}{S_d/\sqrt{n}}$

$t = \dfrac{6.6 - 0}{5.82/\sqrt{10}} = 3.59$, df = $n - 1 = 10 - 1 = 9$
$t_{calc} = 3.59$ and $t_{tab}$ ($\alpha = 0.05$, df = 9) = 2.262. Since $t_{calc} > t_{tab}$, reject $H_0$.
We have significant evidence at $\alpha = 0.05$ that there is a mean weight loss following the six-week programme.
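
The paired analysis can be run from the raw weights rather than from the hand-computed sums. The sketch below uses only the Python standard library; the variable names are illustrative, and the critical value 2.262 is the table value quoted for df = 9.

```python
import math
from statistics import mean, stdev

# Weights of the ten subjects before and after the six-week programme.
initial = [180, 142, 126, 138, 175, 205, 116, 142, 157, 136]
final = [165, 138, 128, 136, 170, 197, 115, 128, 144, 130]

# Difference scores d = initial - final (positive means weight was lost).
diffs = [before - after for before, after in zip(initial, final)]

n = len(diffs)
d_bar = mean(diffs)   # mean of the difference scores
s_d = stdev(diffs)    # sample standard deviation (n - 1 in the denominator)
t = (d_bar - 0) / (s_d / math.sqrt(n))

t_critical = 2.262    # table value for df = 9, two-sided alpha = 0.05
reject_h0 = abs(t) > t_critical

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Working on the differences turns the two dependent samples into a single sample, so the one-sample t machinery applies with df = n - 1.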


Fixing the confidence interval

Recall $\bar{d} \pm t_{1-\alpha/2}\,S_d/\sqrt{n}$, with $\bar{d} = 6.6$, $t_{1-\alpha/2} = 2.262$, $S_d = 5.82$ and $\sqrt{10} = 3.162278$.
$6.6 \pm 2.262 \times 5.82/3.162278$
$= 6.6 \pm 2.262 \times 1.8404$
$= 6.6 \pm 4.163$
$6.6 - 4.163 = 2.437$ and $6.6 + 4.163 = 10.763$, so the 95% CI is (2.437, 10.763).
The interval does not contain 0, so reject $H_0$: we have significant evidence at $\alpha = 0.05$ that the programme produces a mean weight loss after six weeks, in agreement with the t-test.
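
A short check of the interval arithmetic, written as a sketch from the summary statistics alone. The function name `paired_ci` is hypothetical; the inputs are the mean difference 6.6, the rounded standard deviation 5.82, n = 10 and the table value 2.262 for df = 9.

```python
import math

def paired_ci(d_bar, s_d, n, t_crit):
    """Confidence interval for the mean difference of paired samples."""
    margin = t_crit * s_d / math.sqrt(n)  # divide by sqrt(n), do not multiply
    return d_bar - margin, d_bar + margin

lower, upper = paired_ci(6.6, 5.82, 10, 2.262)
excludes_zero = lower > 0 or upper < 0

print(f"95% CI = ({lower:.3f}, {upper:.3f}), excludes 0: {excludes_zero}")
```

The margin of error is the critical value times the standard error $S_d/\sqrt{n}$, so it shrinks as the number of pairs grows.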

Chi-square table


Chi-Square Analysis

Goodness of fit test
Test of independence
Test of heterogeneity
Used for tests of hypotheses on multi-variable data in one-sample and in two-or-more-sample applications. The test statistic for each of these tests follows the chi-square distribution ($\chi^2$).

Goodness of fit test

Test statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$, where O = observed frequency and E = expected frequency.
E.g. volunteers at a teen hotline have been assigned based on the assumption that 40% of all calls are drug-related, 25% are sex-related, 25% are stress-related and 10% concern educational issues.

For this investigation, each call is classified into one category based on the primary issue raised by the caller. To test the hypothesis, the following data are collected from 120 randomly selected calls placed to the teen hotline. Based on the data, is the assumption regarding the distribution appropriate?


Topical issue   Number of calls
Drugs           52
Sex             38
Stress          21
Education       9

1. Set up the hypotheses: $H_0: p_1 = 0.40,\ p_2 = 0.25,\ p_3 = 0.25,\ p_4 = 0.10$ versus $H_1$: $H_0$ is false.
2. Select the appropriate statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
3. Select the level of significance, $\alpha = 0.05$. Here df = (number of categories) - 1 = 4 - 1 = 3, so the critical value from the table at df = 3 and $\alpha = 0.05$ is $\chi^2 = 7.815$.

4. Decision rule: reject $H_0$ if $\chi^2 \ge 7.815$; do not reject $H_0$ if $\chi^2 < 7.815$.
5. Compute the test statistic.

Organized computations of the test statistic

Topical issue   O (observed frequency)   E (expected frequency)   (O-E)   (O-E)^2/E
Drugs           52                       120(0.40) = 48           4       (4)^2/48 = 0.333
Sex             38                       120(0.25) = 30           8       (8)^2/30 = 2.133
Stress          21                       120(0.25) = 30           -9      (-9)^2/30 = 2.700
Education       9                        120(0.10) = 12           -3      (-3)^2/12 = 0.750
Total           120                      120                      0       5.917

The test statistic is $\chi^2 = 5.917$.

Conclusion: do not reject $H_0$ since $5.917 < 7.815$. We do not have significant evidence at $\alpha = 0.05$ to show that the distribution of topical issues in the calls placed to the teen hotline differs from the assumed one (40% drug-related, 25% sex-related, 25% stress-related and 10% education-related).
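
The goodness of fit computation for the hotline data can be sketched in a few lines of Python. The dictionary names are illustrative, the assumed proportions are 40%, 25%, 25% and 10%, and the critical value 7.815 is the table value for df = 3 at $\alpha = 0.05$ rather than something the code derives.

```python
# Observed calls per topic (120 calls in total) and the assumed proportions.
observed = {"Drugs": 52, "Sex": 38, "Stress": 21, "Education": 9}
assumed = {"Drugs": 0.40, "Sex": 0.25, "Stress": 0.25, "Education": 0.10}

n = sum(observed.values())

# Expected frequency for each category under H0: n times the assumed proportion.
expected = {topic: n * p for topic, p in assumed.items()}

# Chi-square statistic: sum over categories of (O - E)^2 / E.
chi_sq = sum((observed[t] - expected[t]) ** 2 / expected[t] for t in observed)

df = len(observed) - 1   # number of categories minus 1
critical = 7.815         # chi-square table value, df = 3, alpha = 0.05
reject_h0 = chi_sq >= critical

print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {reject_h0}")
```

Keeping the expected counts unrounded until the end avoids the small discrepancies that appear when each term is rounded before summing.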


Test of Independence
This considers applications involving two or more samples, or two categorical variables. Our interest is to evaluate whether the two categorical variables are related (dependent/associated) or unrelated (independent/not associated). The following example illustrates the use of the $\chi^2$ test of independence.

Example. The following data were collected in a multisite study of medical effectiveness in type II diabetes. Three sites were involved in the study: a health maintenance organization (HMO), a university teaching hospital (UTH), and an independent practice association (IPA). Patients with type II diabetes were enrolled in the study from each site and monitored over a three-year period. The data below show the treatment regimens of patients by site.

Treatment regimens by site

Site    Diet & exercise   Oral hypoglycemics   Insulin   Total
HMO     294               827                  579       1700
UTH     132               288                  352       772
IPA     189               516                  404       1109
Total   615               1631                 1335      3581

The table above is a 3 x 3 cross-tabulation, or contingency table. Both site and treatment regimen are categorical variables. Site is called the row variable and treatment regimen the column variable. The number of rows in the table is denoted R and the number of columns C; in this table, R = 3 and C = 3. The row and column totals are shown on the right and at the bottom of the table, respectively.

The 9 combinations of site and treatment regimen are called the cells of the table. For example, patients in the HMO treated by diet and exercise make up one cell of the table, patients in the HMO treated by oral hypoglycemics make up another cell, and so on. We wish to use the data to test the hypothesis that the two variables (site and treatment regimen) are independent (i.e. there is no difference in treatment regimen across sites). The hypotheses are written as follows.

1. Set up the hypotheses: $H_0$: site and treatment regimen are independent (no relationship between site and treatment regimen) versus $H_1$: $H_0$ is false (site and treatment regimen are related).
2. Select the significance level ($\alpha = 0.05$).
3. Select the appropriate statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$

4. Decision rule: to select the appropriate critical value, we first determine df = (R - 1)(C - 1) = (3 - 1)(3 - 1) = (2)(2) = 4. From the table, $\chi^2 = 9.49$. Reject $H_0$ if $\chi^2_{calc} \ge 9.49$; otherwise do not reject $H_0$.
5. Compute the test statistic.

To compute the test statistic, note that the observed values are displayed in the cells. We compute the expected values and put them in parentheses in each cell. The expected value for each cell is the product of the totals of the row and column in which the cell is located, divided by the total number of patients in the investigation, e.g. the expected frequency for HMO and diet & exercise = (1700 x 615)/3581 = 291.96.

Treatment regimens by site: observed counts, with expected counts (row total x column total / 3581) in parentheses

Site    Diet & exercise   Oral hypoglycemics   Insulin        Total
HMO     294 (291.96)      827 (774.28)         579 (633.76)   1700
UTH     132 (132.58)      288 (351.61)         352 (287.80)   772
IPA     189 (190.46)      516 (505.10)         404 (413.44)   1109
Total   615               1631                 1335           3581

Note: the marginal totals of the observed counts equal the marginal totals of the expected counts.

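Using the observed and expected frequencies, we compute the test statistic:
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
$= \dfrac{(294 - 291.96)^2}{291.96} + \dfrac{(827 - 774.28)^2}{774.28} + \dfrac{(579 - 633.76)^2}{633.76}$
$+ \dfrac{(132 - 132.58)^2}{132.58} + \dfrac{(288 - 351.61)^2}{351.61} + \dfrac{(352 - 287.80)^2}{287.80}$
$+ \dfrac{(189 - 190.46)^2}{190.46} + \dfrac{(516 - 505.10)^2}{505.10} + \dfrac{(404 - 413.44)^2}{413.44}$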

$\chi^2 = 0.014 + 3.590 + 4.732 + 0.003 + 11.509 + 14.320 + 0.011 + 0.235 + 0.215 = 34.629$
Conclusion: reject $H_0$ since $34.629 > 9.49$. We have significant evidence ($\alpha = 0.05$) to show that site and treatment regimen are not independent (i.e. they are related).
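
The whole test of independence, from marginal totals to the final statistic, can be sketched as below. This is an illustrative Python version, not the slides' own procedure; it assumes the UTH / oral hypoglycemics cell is 288, the value used in the worked sum and the one consistent with the UTH row total of 772, and it takes the critical value 9.49 from the table for df = 4.

```python
# Observed counts: rows are sites, columns are treatment regimens
# (diet & exercise, oral hypoglycemics, insulin).
observed = [
    [294, 827, 579],  # HMO
    [132, 288, 352],  # UTH (288 assumed, see lead-in)
    [189, 516, 404],  # IPA
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic summed over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (R - 1)(C - 1)
critical = 9.49                                    # table value, df = 4, alpha = 0.05
reject_h0 = chi_sq >= critical

print(f"chi-square = {chi_sq:.3f}, df = {df}, reject H0: {reject_h0}")
```

Deriving the marginal totals from the cells, instead of typing them in, doubles as a check that the table has been entered correctly.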
