
Stats 2 - Review Problems

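1) a) Two different midterms were given to stats sections A and B. Test the claim that the midterm given to section A was harder than that given to section B. There were 35 students sampled in section A, which had a class average of 55% and an SD of 5%. There were 25 students in section B, which had a class average of 59% and an SD of 3%. Assume equal variances.

Note that if the midterm given to section A was harder, the scores would be lower. We must do a 2-sample T-test because the two groups are independent. Note also that since the sample sizes are not the same, the samples cannot be dependent. Because we have a < sign in Ha, we are doing a 1-tailed test.

Ho: $\mu_a = \mu_b$    Ha: $\mu_a < \mu_b$
or
Ho: $\mu_a - \mu_b = 0$    Ha: $\mu_a - \mu_b < 0$

$\alpha = 0.05$
$\bar{x}_a = 55$, $s_a = 5$, $n_a = 35$
$\bar{x}_b = 59$, $s_b = 3$, $n_b = 25$
df = $n_a + n_b - 2$ = 35 + 25 - 2 = 58

$$s_p = \sqrt{\frac{(n_a - 1)s_a^2 + (n_b - 1)s_b^2}{n_a + n_b - 2}} = \sqrt{\frac{(34)(5^2) + (24)(3^2)}{58}} = 4.2871$$

$$\text{T-stat} = \frac{(\bar{x}_a - \bar{x}_b) - (\mu_a - \mu_b)}{s_p\sqrt{\frac{1}{n_a} + \frac{1}{n_b}}} = \frac{(55 - 59) - 0}{4.2871\sqrt{\frac{1}{35} + \frac{1}{25}}} = -3.5631$$

Look up 3.5631 in the T-table at 58 df and we find that it falls to the right of the last column, which corresponds to a one-tailed probability of 0.0005. (This probability will depend on the table you have available.) The p-value is therefore less than 0.0005.

T-crit = -1.671 from the T-table at 60 df, the closest row to 58 df. It is negative because we are dealing with the left-hand side of the curve.

$$\text{CI upper limit} = (\bar{x}_a - \bar{x}_b) + |\text{T-crit}| \cdot s_p\sqrt{\frac{1}{n_a} + \frac{1}{n_b}} = (55 - 59) + 1.671(1.1226) = -2.1241$$

CI = (-100, -2.1241). The interval is one-sided, so only the upper limit is computed; -100 marks the open lower end.

Reject Ho because p < $\alpha$, T-stat < T-crit, and our CI does not contain the hypothesized $\mu_a - \mu_b$ of 0. We conclude that the midterm given to section A was harder.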
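The pooled calculation in part a) can be checked with a short script. This is a minimal sketch that works from the summary statistics given in the question; the function name `pooled_t` is made up for illustration, and the critical value is typed in from the T-table rather than computed.

```python
import math

def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b, hyp_diff=0.0):
    """Return (pooled SD, standard error, T-stat, df) from summary statistics."""
    df = n_a + n_b - 2
    # Pooled SD: weighted average of the two sample variances
    sp = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    # Standard error of the difference in means under equal variances
    se = sp * math.sqrt(1 / n_a + 1 / n_b)
    t = ((mean_a - mean_b) - hyp_diff) / se
    return sp, se, t, df

sp, se, t_stat, df = pooled_t(55, 5, 35, 59, 3, 25)

# Left-tail critical value read from the T-table (60 df row), as in the text
t_crit = -1.671
# Upper limit of the one-sided confidence interval
ci_upper = (55 - 59) + abs(t_crit) * se

print(f"sp = {sp:.4f}, SE = {se:.4f}, T-stat = {t_stat:.4f}, df = {df}")
print(f"CI upper limit = {ci_upper:.4f}")
print("Reject Ho:", t_stat < t_crit)
```

The same function can be reused for any independent 2-sample problem with equal variances by changing the six summary values.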

b) All students were required to complete assignment #1. To ensure that students did not copy one another, the profs created 25 variations of the assignment and gave 1 copy to one student in each section. Test the claim that section A scored more than 5 marks higher than section B. The average mark was 25 in section A with an SD of 2. The average mark was 21 in section B with an SD of 1. 32 students from each section were sampled.

Because we have now given the same assignments to 2 groups, we have a dependent situation. We must define what d will be. In our case we will define it as the mean of section A minus the mean of section B. Note that because we are dealing with a dependent situation, we would need some raw data to calculate the mean difference and the SD of the differences. You will have to find these in Minitab. For our example, let's use a mean difference of 4 and an SD of the differences of 1.5.

Ho: $\mu_d = 5$    Ha: $\mu_d > 5$    where $\mu_d = \mu_a - \mu_b$

$\alpha = 0.05$
$\bar{d} = 4$, $s_d = 1.5$, $n = 32$
df = $n - 1$ = 32 - 1 = 31

$$\text{T-stat} = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{4 - 5}{1.5 / \sqrt{32}} = -3.7712$$

Look up 3.7712 in the T-table at 31 df and we find that it falls to the right of the last column, which corresponds to a one-tailed probability of 0.0005. (This probability will depend on the table you have available.) Be careful: the T-stat is negative but the rejection area is in the right tail, so the p-value is the area to the right of -3.7712, meaning p > 1 - 0.0005 => p > 0.9995.

T-crit = 1.697 from the T-table at 30 df. Again, we are dealing with the right-hand side of the curve, so this remains positive.

$$\text{CI lower limit} = \bar{d} - \text{T-crit} \cdot \frac{s_d}{\sqrt{n}} = 4 - 1.697\left(\frac{1.5}{\sqrt{32}}\right) = 3.5500$$

CI = (3.5500, 100). The interval is one-sided, so only the lower limit is computed; 100 marks the open upper end.

Do not reject Ho because p > $\alpha$, T-stat < T-crit, and our CI does contain the hypothesized $\mu_d$ of 5.

c) If in parts a) and b), we increase the sample sizes to 105, how would your answers change? Show the new solution.

*a) Note that if the sample size increases to over 100, we have a large sample and can thus assume that the sample SD is approximately equal to the population SD. We then end up calculating a Z-stat instead of a T-stat, as shown below. Notice that for the large sample formula we cannot use a pooled standard deviation.

Ho: $\mu_a = \mu_b$    Ha: $\mu_a < \mu_b$
or
Ho: $\mu_a - \mu_b = 0$    Ha: $\mu_a - \mu_b < 0$

$\alpha = 0.05$
$\bar{x}_a = 55$, $\sigma_a = 5$, $n_a = 105$
$\bar{x}_b = 59$, $\sigma_b = 3$, $n_b = 105$

$$\text{Z-stat} = \frac{(\bar{x}_a - \bar{x}_b) - (\mu_a - \mu_b)}{\sqrt{\frac{\sigma_a^2}{n_a} + \frac{\sigma_b^2}{n_b}}} = \frac{(55 - 59) - 0}{\sqrt{\frac{5^2}{105} + \frac{3^2}{105}}} = -7.02935$$

Look up -7.02935 in the Z-table and we find a value of 0.5 (the statistic is beyond the last row of the table), so our p-value is 0.5 - 0.5 = 0.

Z-crit = -1.645 from the Z-table. It is negative because we are dealing with the left-hand side of the curve.

Reject Ho because p < $\alpha$ and |Z-stat| > |Z-crit|.

*b)

Ho: $\mu_d = 5$    Ha: $\mu_d > 5$    where $\mu_d = \mu_a - \mu_b$

$\alpha = 0.05$
$\bar{d} = 4$, $\sigma_d = 1.5$, $n = 105$

$$\text{Z-stat} = \frac{\bar{d} - \mu_d}{\sigma_d / \sqrt{n}} = \frac{4 - 5}{1.5 / \sqrt{105}} = -6.8313$$

Look up -6.8313 in the Z-table and we find a value of 0.5. (This probability will depend on the table you have available.) Be careful: the rejection area is to the right of this line, meaning p = 0.5 + 0.5 = 1.

Z-crit = 1.645. Again, we are dealing with the right-hand side of the curve, so this remains positive.

Do not reject Ho because p > $\alpha$ and Z-stat < Z-crit.
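The two large-sample statistics in part c) can be reproduced as follows. This is a rough sketch using only the Python standard library; the helper names `z_two_sample` and `z_paired` are invented here, and `statistics.NormalDist` stands in for the printed Z-table, so the p-values it returns are exact tail areas rather than table lookups.

```python
import math
from statistics import NormalDist

def z_two_sample(mean_a, sd_a, n_a, mean_b, sd_b, n_b, hyp_diff=0.0):
    """Large-sample Z-stat for two independent means (no pooling)."""
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return ((mean_a - mean_b) - hyp_diff) / se

def z_paired(mean_diff, sd_diff, n, hyp_diff):
    """Z-stat for the mean of paired differences."""
    return (mean_diff - hyp_diff) / (sd_diff / math.sqrt(n))

std_normal = NormalDist()  # standard normal: mean 0, SD 1

# Part c) *a): left-tailed test, so p is the area to the left of Z
z_a = z_two_sample(55, 5, 105, 59, 3, 105)
p_a = std_normal.cdf(z_a)

# Part c) *b): right-tailed test, so p is the area to the right of Z
z_b = z_paired(4, 1.5, 105, 5)
p_b = 1 - std_normal.cdf(z_b)

print(f"*a) Z = {z_a:.5f}, p = {p_a:.4f}, reject Ho: {p_a < 0.05}")
print(f"*b) Z = {z_b:.4f}, p = {p_b:.4f}, reject Ho: {p_b < 0.05}")
```

Calling `z_paired(4, 1.5, 32, 5)` applies the same formula to the original sample size of 32 from part b); in that case the result is compared against the T-table instead of the Z-table.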

d) If in parts a) and b), there were only 200 students in each stats section, how would your answers change? Show the new solution.

You would have to apply the finite population correction factor (FPCF), which is not covered for the 2-sample case. I will show how to apply it with 1 sample.
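For reference, here is a minimal sketch of how the correction enters a 1-sample standard error. It assumes the usual form of the factor, sqrt((N - n) / (N - 1)). Purely for illustration, it treats the differences from part b) (mean 4, SD 1.5, n = 32) as a single sample drawn from a population of N = 200. The function names are hypothetical, and the numbers are only an example of the mechanics, not a solution from the course.

```python
import math

def fpc(pop_size, sample_size):
    """Finite population correction factor, assuming the form sqrt((N - n) / (N - 1))."""
    return math.sqrt((pop_size - sample_size) / (pop_size - 1))

def corrected_se(sd, n, pop_size):
    """Standard error of a sample mean, shrunk by the correction factor."""
    return (sd / math.sqrt(n)) * fpc(pop_size, n)

N = 200   # population size stated in part d)
n = 32    # sample size from part b), used here as an assumption

se_plain = 1.5 / math.sqrt(n)      # standard error without the correction
se_fpc = corrected_se(1.5, n, N)   # standard error with the correction
t_fpc = (4 - 5) / se_fpc           # test statistic using the corrected SE

print(f"FPCF = {fpc(N, n):.4f}")
print(f"SE without correction = {se_plain:.4f}, with correction = {se_fpc:.4f}")
print(f"Corrected test statistic = {t_fpc:.4f}")
```

Because the factor is always below 1 when n > 1, the corrected standard error is smaller and the test statistic moves further from zero.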

e) Given the test you chose to use in part a), what boxplot(s) should you be looking at to determine normality? If the data was not normal, what would your hypotheses look like?

Because this is an independent test, we would want to look at the boxplot from each of the samples. If even one was not normal, we would perform a Mann-Whitney test with the following hypotheses:

Ho: $M_a = M_b$    Ha: $M_a < M_b$
or
Ho: $M_a - M_b = 0$    Ha: $M_a - M_b < 0$

f) Given the test you chose to use in part b), what boxplot(s) should you be looking at to determine normality? If the data was not normal, what would your hypotheses look like?

Because this is a dependent test, we would want to look at the boxplot of the differences. If it were not normal, we would perform a Wilcoxon test with the following hypotheses:

Ho: $M_d = 5$    Ha: $M_d > 5$    where $M_d = M_a - M_b$

2a) The profs for the class claim that exactly 75% of students should pass the final. To test this you sample 50 students from last year's class and find that 36 of them passed the final. Test the profs' claim.

We are now dealing with binary data, and we have 1 sample, so we will do a 1-proportion test, but first we must validate the assumptions:

n = 50, np = 50(0.75) = 37.5 > 10, nq = 50(0.25) = 12.5 > 10

Assumptions are met, 1-proportion test is ok.

Ho: $p = 0.75$    Ha: $p \neq 0.75$

$\alpha = 0.05$
$\hat{p} = 36/50 = 0.72$

$$\text{Z-stat} = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} = \frac{0.72 - 0.75}{\sqrt{\frac{(0.75)(0.25)}{50}}} = -0.4899$$

Look up -0.49 in the Z-table and we find a value of 0.3121. (This was using the negative table.) Because we have a 2-tailed test, we have a rejection area at both ends. So p-value = 2*0.3121 = 0.6242.

Do not reject Ho because p > $\alpha$.

b) The profs for the class also claim that less than 5% of students should score over 90 percent on the final exam. To test this you sample 50 students from last year's class and find that 2 scored over 90%. Test the profs' claim.

Ho: $p = 0.05$    Ha: $p < 0.05$

$\hat{p} = 2/50 = 0.04$

n = 50, np = 50(0.05) = 2.5 < 10, nq = 50(0.95) = 47.5 > 10

Assumptions are not met, we must use binomial.
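The check-then-test logic used in 2a) and 2b) can be sketched as below. The helper names are invented for illustration, the threshold of 10 is the one used in these notes, and `statistics.NormalDist` stands in for the printed Z-table. The exact binomial tail it computes for 2b) is the same sum that is worked by hand next.

```python
import math
from statistics import NormalDist

def approx_ok(n, p0, threshold=10):
    """Normal approximation is allowed only if n*p and n*q both exceed the threshold."""
    return n * p0 > threshold and n * (1 - p0) > threshold

def one_prop_z(successes, n, p0):
    """Z-stat for a 1-proportion test against the hypothesized proportion p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def binom_cdf(k, n, p):
    """Exact binomial probability P(X <= k)."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

# 2a) 36 of 50 passed, claim p = 0.75, two-tailed
ok_a = approx_ok(50, 0.75)
z_a = one_prop_z(36, 50, 0.75)
p_a = 2 * NormalDist().cdf(-abs(z_a))  # double the smaller tail

# 2b) 2 of 50 scored over 90%, claim p < 0.05, left-tailed
ok_b = approx_ok(50, 0.05)
p_b = binom_cdf(2, 50, 0.05)  # exact tail, used because the check fails

print(f"2a) approximation ok: {ok_a}, Z = {z_a:.4f}, p = {p_a:.4f}")
print(f"2b) approximation ok: {ok_b}, exact p = {p_b:.4f}")
```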

We need to calculate our P-value as P(x<=2). So x can take on values of 0, 1, 2. We must apply the binomial equation for each value of x.

$$P(x \le 2) = P(x=0) + P(x=1) + P(x=2) = \binom{50}{0}(0.05)^0(0.95)^{50} + \binom{50}{1}(0.05)^1(0.95)^{49} + \binom{50}{2}(0.05)^2(0.95)^{48} = 0.5405$$

Do not reject Ho because p > $\alpha$.

c) Test the claim that you are more than 10% more likely to pass the final if you passed the midterm. You sampled last semester's students and found that 15/20 of them passed the final after passing the midterm and that 12/20 of them passed the final after failing the midterm.

Ho: $p_p = p_f + 0.1$    Ha: $p_p > p_f + 0.1$
or
Ho: $p_p - p_f = 0.1$    Ha: $p_p - p_f > 0.1$

$\alpha = 0.05$
$\hat{p}_p = 15/20 = 0.75$, $n_p = 20$
$\hat{p}_f = 12/20 = 0.6$, $n_f = 20$

Z-stat => apply formula in number 11 on the cheat sheet

Z-stat = 0.3412

Look up 0.3412 in the Z-table and we find a value of 0.6331. p = 1 - 0.6331 = 0.3669.

Do not reject Ho because p > $\alpha$.

4) A local sports store assumes that they see twice as many customers on Saturdays and Sundays as they do on each weekday. Test to see if this distribution of customers is accurate using last week's customer counts given below.

Day          Customers   Expected   Chi-Squared
Monday          24          33        2.4545
Tuesday         25          33        1.9394
Wednesday       33          33        0.0000
Thursday        15          33        9.8182
Friday          44          33        3.6667
Saturday        74          66        0.9697
Sunday          82          66        3.8788
Total          297         297       22.7273

Step 1 - Calculate the number of observations by adding up all observed values.

Step 2 - Derive Expected Values. Let x be the expected number of customers on a weekday. Let 2x be the expected number of customers on a weekend day. So the total weekly customers would be x + x + x + x + x + 2x + 2x = 9x = 297. Solving for x, we get a value of 33. So we would expect 33 customers on each weekday and 66 customers on each weekend day.

Step 3 - Calculate the chi value for each cell. $\chi^2 = (o - e)^2 / e$. $\chi^2(\text{Monday}) = (24 - 33)^2 / 33 = 2.4545$

Step 4 - Calculate the Chi Stat value by adding up the chi values from each cell. $\chi^2$ Stat = 2.4545 + 1.9394 + 0 + 9.8182 + 3.6667 + 0.9697 + 3.8788 = 22.7273

Step 5 - Hypothesis Test

Ho: Data follows the described distribution
Ha: Data follows some other distribution

df = (k - 1) = (7 - 1) = 6
$\chi^2$ crit = 12.5916

Since $\chi^2$ Stat > $\chi^2$ crit, we reject Ho: the data does not follow the described distribution. The P-value is less than 0.005.
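Steps 1 to 5 of problem 4 can be sketched in a few lines. This is a minimal illustration using plain dictionaries; the variable names are made up, and the critical value is typed in from the chi-squared table at 6 df rather than computed.

```python
# Observed customer counts from last week
observed = {
    "Monday": 24, "Tuesday": 25, "Wednesday": 33, "Thursday": 15,
    "Friday": 44, "Saturday": 74, "Sunday": 82,
}

# Claimed pattern: each weekend day sees twice the customers of a weekday
weights = {day: 2 if day in ("Saturday", "Sunday") else 1 for day in observed}

# Step 1: total number of observations
total = sum(observed.values())

# Step 2: expected count per day (x for a weekday, 2x for a weekend day)
x = total / sum(weights.values())
expected = {day: x * w for day, w in weights.items()}

# Step 3: chi-squared contribution of each cell, (o - e)^2 / e
cells = {day: (observed[day] - expected[day]) ** 2 / expected[day] for day in observed}

# Step 4: the test statistic is the sum of the cell contributions
chi_stat = sum(cells.values())

# Step 5: compare with the table value at df = k - 1 and alpha = 0.05
df = len(observed) - 1
chi_crit = 12.5916  # read from the chi-squared table, as in the text

for day in observed:
    print(f"{day:<10}{observed[day]:>5}{expected[day]:>8.0f}{cells[day]:>10.4f}")
print(f"Chi Stat = {chi_stat:.4f}, df = {df}, reject Ho: {chi_stat > chi_crit}")
```

Changing the `weights` dictionary is enough to test a different claimed distribution against the same counts.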
