
Bio Factsheet

April 2003 www.curriculumpress.co.uk Number 122


Answering Exam Questions on Statistics
Examination questions may require the calculation and interpretation of statistical measures and tests. This Factsheet discusses
strategies for approaching such questions and gives guidance on common mistakes to avoid. Factsheets 79 and 85 cover the chi-squared
test and t-test specifically. A later Factsheet will cover diagrams and their interpretation.
What can they ask you?
Exactly what is examinable depends on the specification you are studying, but there are three main categories:
• basic statistical calculations and their interpretation
• chi-squared test
• t-test

Basic statistical calculations and their interpretation
All specifications require you to calculate the mean; some also require the standard deviation. You need to remember the formula for the mean, but you will be given the formula for the standard deviation.

Calculating the mean
a) For a list of numbers, just add them all up and divide by how many there are.
b) For a table of grouped data, follow this procedure:

Step 1. Find the midpoint of each class by adding its endpoints and dividing by two. Add it to the table. Call this column "x".
Step 2. Add another column, and put in it the values of x × number of individuals (f).
Step 3. mean = (total of "x × f" column) ÷ (total of "f" column)

eg: Find the mean of the following data

Length          Number of          x                       x × f
(nearest cm)    individuals (f)
9 - 11          6                  (9 + 11) ÷ 2 = 10       60
12 - 14         11                 (12 + 14) ÷ 2 = 13      143
15 - 17         10                 (15 + 17) ÷ 2 = 16      160
18 - 20         8                  (18 + 20) ÷ 2 = 19      152
21 - 23         4                  (21 + 23) ÷ 2 = 22      88

mean = (60 + 143 + 160 + 152 + 88) ÷ (6 + 11 + 10 + 8 + 4) = 603 ÷ 39 = 15.46

Calculator Tip:- To do this sum on your calculator, you need to put brackets around all of the top and all of the bottom, like this:
(60 + 143 + 160 + 152 + 88) ÷ (6 + 11 + 10 + 8 + 4)

Calculator Tip:- Most scientific or graphical calculators will allow you to calculate mean and standard deviation automatically. This can save a lot of time! However, not all calculators do it in the same way, so you need to consult your calculator instruction book and practise well in advance of the exam.

One of the commonest mistakes candidates make when using the calculator is not clearing all the data before starting a new calculation. You can usually do this on a scientific calculator by going into the statistics mode and then pressing SHIFT or 2ND and "AC". To check it works, press the button that you would normally use to get the mean - if it gives you a number, you haven't cleared the data properly!

Calculating the standard deviation
The formula for this that you will be given is:

$$\text{standard deviation} = \sqrt{\frac{\sum x^2}{n} - \text{mean}^2}$$

Σ means "sum of", so Σx² means "square each value then add them up"

a) For a list of numbers:
i) Square each number and add up the squares (this gives Σx²)
ii) Divide your answer to i) by how many numbers there are (this gives Σx²/n)
iii) Find the mean and square it.
iv) Take the answer to iii) away from the answer to ii) (this gives everything inside the square root)
v) Square root the answer to iv) (this gives the standard deviation)

eg: Find the standard deviation of 2, 5, 6, 7, 8

i) Σx² = 2² + 5² + 6² + 7² + 8² = 178
ii) Σx²/n = 178 ÷ 5 = 35.6
iii) Mean = (2 + 5 + 6 + 7 + 8) ÷ 5 = 5.6   Mean² = 5.6² = 31.36
iv) Σx²/n - mean² = 35.6 - 31.36 = 4.24
v) standard deviation = √4.24 = 2.0591

b) For a table of grouped data
i) Complete the columns "x" and "x × f" as for finding the mean
ii) Add another column, which is x² × f
iii) Find the total of the "x² × f" column (this gives Σx²)
iv) Divide your answer to iii) by the total of the "f" column (this gives Σx²/n)
v) Find the mean, as described under "Calculating the mean", and square it
vi) Take the answer to v) away from the answer to iv) (this gives everything inside the square root)
vii) Square root the answer to vi) (this gives the standard deviation)

eg. Find the standard deviation of the following data

Length          Number of          x       x × f     x² × f
(nearest cm)    individuals (f)
9 - 11          6                  10      60        600
12 - 14         11                 13      143       1859
15 - 17         10                 16      160       2560
18 - 20         8                  19      152       2888
21 - 23         4                  22      88        1936

iii) Σx² = 600 + 1859 + 2560 + 2888 + 1936 = 9843
iv) Total of f column = 6 + 11 + 10 + 8 + 4 = 39
    Σx²/n = 9843 ÷ 39 = 252.3846
v) Mean² = (603 ÷ 39)² = 239.0592 (using the unrounded mean, not 15.46)
vi) Σx²/n - mean² = 252.3846 - 239.0592 = 13.3254
vii) Standard deviation = √13.3254 = 3.650
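The steps for a list of numbers can be checked with a short Python sketch. This is only an illustration of the formula given in this Factsheet (square root of Σx²/n minus the mean squared); the function names `mean` and `standard_deviation` are made up for this example and are not part of any exam specification.

```python
import math

def mean(values):
    """Add the values up and divide by how many there are."""
    return sum(values) / len(values)

def standard_deviation(values):
    """Square root of (sum of squares / n) minus the mean squared."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n  # steps i) and ii)
    inside_root = mean_of_squares - mean(values) ** 2  # steps iii) and iv)
    return math.sqrt(inside_root)                      # step v)

data = [2, 5, 6, 7, 8]
print(round(mean(data), 4))                # 5.6
print(round(standard_deviation(data), 4))  # 2.0591
```

The printed values match the worked example for 2, 5, 6, 7, 8.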

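The grouped-data procedure (midpoints, then the "x × f" and "x² × f" columns) can be sketched in the same way. The layout of `classes` and the function name `grouped_mean_and_sd` are assumptions made for this illustration. The mean is kept unrounded throughout, so the standard deviation comes out at about 3.650; squaring the rounded mean 15.46 by hand gives roughly 3.657 instead, which is an example of the rounding-too-early error.

```python
import math

# Each class is (lower endpoint, upper endpoint, number of individuals f)
classes = [(9, 11, 6), (12, 14, 11), (15, 17, 10), (18, 20, 8), (21, 23, 4)]

def grouped_mean_and_sd(classes):
    total_f = 0      # total of the "f" column
    total_xf = 0.0   # total of the "x × f" column
    total_x2f = 0.0  # total of the "x² × f" column
    for lower, upper, f in classes:
        x = (lower + upper) / 2  # midpoint of the class
        total_f += f
        total_xf += x * f
        total_x2f += x * x * f
    mean = total_xf / total_f
    inside_root = total_x2f / total_f - mean ** 2
    return mean, math.sqrt(inside_root)

grouped_mean, grouped_sd = grouped_mean_and_sd(classes)
print(f"{grouped_mean:.2f} {grouped_sd:.3f}")  # 15.46 3.650
```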

Interpreting the mean and standard deviation
The mean, of course, is the average - but that does not mean half the values are below and half above it, or that it is a common value. For example, the mean of the values 1, 1, 2, 3, 100 is 21.4; this is nowhere near any of the actual values, and four out of the five values are below it!

The mean also does not distinguish between these two data sets:-
A: 48, 49, 50, 51, 52
B: 35, 40, 50, 62, 63
Both sets of data have mean 50, but they are not very similar.

This is where the standard deviation comes in. This measures how spread out the data are - the bigger the standard deviation, the greater the spread. For example, for data set A above, the standard deviation is 1.414, and for set B, it is 11.296.

So, for example if you know the following:
Data set 1: mean = 45.2   standard deviation = 2.13
Data set 2: mean = 43.7   standard deviation = 10.03
We know that data set 2 is more spread out than data set 1. Let's consider which would be more likely to have a value in it above 50, say.
For data set 1, 50 is more than 2 standard deviations away from the mean (45.2 + 2 × 2.13 = 49.46)
For data set 2, 50 is less than 1 standard deviation away from the mean (43.7 + 10.03 = 53.73).
This tells us that 50 is a less "extreme" or "uncommon" value for data set 2 than for data set 1. So data set 2 is more likely to have values above 50.

Statistical tests
In the exam, you will always be told which statistical test to use if you are being required to do calculations. You will be given any tables you need.
There are various types of questions:-
• understanding statistical terms like degrees of freedom, significance, etc
• interpreting results and drawing conclusions
• doing the calculations according to the test formula
• finding degrees of freedom
• using statistical tables
Some of these are the same for both t-test and chi-squared; others are specific to the test.

Understanding statistical terms

Hypotheses: the purpose of a statistical test is to decide between the null hypothesis and the alternative hypothesis. The exact form of these hypotheses depends on the test. When you are carrying out the test, you accept the null hypothesis, unless you have convincing evidence otherwise (in a court of law, the "null hypothesis" is that the person is innocent - he is only decided to be guilty if there is enough evidence).

Test statistic: this is the value calculated from your data. The formula for it depends on the test you are doing.

Critical value: this is the value you compare the test statistic to, to decide whether you are going to accept or reject the null hypothesis.
For both t-test and chi-squared test, you reject the null hypothesis if your test statistic is greater than the critical value.
Critical values come from statistical tables.

Significance level: It is possible to reject the null hypothesis even if it is true, because "unusual" results can occur by chance (eg it is possible - although unlikely - to get 100 heads in succession when tossing a coin). The significance level is the chance of rejecting the null hypothesis when it is true. Significance levels may be written as percentages (10%, 5%, 1%) or as decimals (0.1, 0.05, 0.01).
The normal significance level in science is 5%. Use this unless you are told otherwise.

Degrees of freedom: you do not need to know the exact meaning, although you do need to know how to calculate them (see below). The idea is that the amount of data you have affects the critical value - this is because you are much more likely to get unusual results by chance if you only have a few observations, than if you have a lot of observations.

Interpreting results and drawing conclusions
You must remember that if the value you calculate (the test statistic) is greater than the value from the tables (the critical value), then you reject the null hypothesis. Otherwise you accept it.

You then need to relate this back to the original hypotheses; this will be discussed in more detail for each test.

Choose your words carefully - a statistical test does not "prove" a hypothesis is true - there is always a chance that a wrong decision could be made. It is normal to say "the result is significant at the 5% level" or "the alternative hypothesis was accepted at the 5% level".

The remainder of the section is divided between the chi-squared test and the t-test.

Chi-squared test
There are two main types of chi-squared test you may have to do:
a) Testing to see if there is a difference
b) Testing to see if the theoretical ratios predicted by genetics apply

The hypotheses for the tests are
a) H0: there is no difference between the different conditions
   H1: there is a difference between the different conditions

b) H0: the observations are in accordance with the predictions of genetics
   H1: the observations are not in accordance with the predictions of genetics

Calculations for the test formula
In chi-squared, you will need to calculate expected frequencies, and then the value of chi-squared, using the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

O is observed values - the data from the question
E is expected values - the ones you calculate
Σ means sum of

a) To calculate expected values when you are testing for a difference, you just add up all the values and divide by the number of them.

b) To calculate expected values for genetics, you have to use the genetic ratio. The procedure is:
i) Add up all the values from the data you are given
ii) Add up all the numbers in the genetic ratio (eg for 9:3:3:1, do 9 + 3 + 3 + 1 = 16). This tells you the number of parts you will be dividing your total from i) into.
iii) Find out how much one part is, by dividing your total from i) by your total from ii)
iv) Find out the expected frequencies, by multiplying one part by the numbers in the ratio (eg by 9, 3, 3 and 1)

Once you have calculated the expected frequencies, you substitute into the formula above to find the chi-squared value.

Finding degrees of freedom
You need to learn this formula:

For chi-squared:
degrees of freedom = number of categories - 1
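The chi-squared calculation described in this Factsheet (expected frequencies, then the sum of (O - E)²/E, then degrees of freedom) can be sketched in Python as follows. The function names and the observed counts are hypothetical, chosen only so the arithmetic is easy to follow; they do not come from any exam question.

```python
def expected_equal(observed):
    """Testing for a difference: every expected value is the average of the data."""
    average = sum(observed) / len(observed)
    return [average] * len(observed)

def expected_from_ratio(observed, ratio):
    """Genetics: split the total of the data into parts according to the ratio."""
    one_part = sum(observed) / sum(ratio)   # steps i) to iii)
    return [one_part * r for r in ratio]    # step iv)

def chi_squared(observed, expected):
    """Sum of (O - E) squared divided by E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for four phenotypes, tested against a 9:3:3:1 ratio
observed = [90, 30, 28, 12]
expected = expected_from_ratio(observed, [9, 3, 3, 1])
statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1      # number of categories - 1

print(expected)             # [90.0, 30.0, 30.0, 10.0]
print(round(statistic, 3))  # 0.533
print(degrees_of_freedom)   # 3
```

With these made-up counts the test statistic is far below any critical value for 3 degrees of freedom in standard tables, so the null hypothesis would be accepted.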

Using statistical tables

All you have to do is to read down to find the number of degrees of freedom you have, and across to find the significance level (usually 5% = 0.05).

chi-squared tables
df    0.10    0.05    0.025    0.01     0.005
1     2.71    3.84    5.02     6.63     7.88
2     4.61    5.99    7.38     9.21     10.60
3     6.25    7.81    9.35     11.34    12.84
4     7.78    9.49    11.14    13.28    14.86

For a chi-squared test with 1 degree of freedom at a significance level of 5%, the critical (tables) value is 3.84

t-table
      Significance level
df    0.1      0.05     0.01
7     1.895    2.365    3.499
8     1.860    2.306    3.355
9     1.833    2.262    3.250
10    1.812    2.228    3.169
11    1.796    2.201    3.106

For a t-test with 10 degrees of freedom at a significance level of 5%, the critical (tables) value is 2.228
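Reading a table "down then across" and then comparing with the test statistic can be sketched as a dictionary lookup. Only the 10% and 5% columns of the tables above are copied here, and the names `CHI_SQUARED_TABLE`, `T_TABLE`, `critical_value` and `reject_null` are invented for this illustration. Using the size of the test statistic (ignoring a minus sign) is an assumption added for the two-tailed t-test, where t can come out negative; it makes no difference for chi-squared.

```python
# df -> {significance level: critical value}, copied from the tables above
CHI_SQUARED_TABLE = {
    1: {0.10: 2.71, 0.05: 3.84},
    2: {0.10: 4.61, 0.05: 5.99},
    3: {0.10: 6.25, 0.05: 7.81},
    4: {0.10: 7.78, 0.05: 9.49},
}
T_TABLE = {
    7: {0.10: 1.895, 0.05: 2.365},
    8: {0.10: 1.860, 0.05: 2.306},
    9: {0.10: 1.833, 0.05: 2.262},
    10: {0.10: 1.812, 0.05: 2.228},
    11: {0.10: 1.796, 0.05: 2.201},
}

def critical_value(table, df, level=0.05):
    """Read down to the degrees of freedom, then across to the significance level."""
    return table[df][level]

def reject_null(test_statistic, critical):
    """Reject the null hypothesis if the test statistic exceeds the critical value."""
    # abs() is an assumption for two-tailed t-tests, where t may be negative
    return abs(test_statistic) > critical

print(critical_value(CHI_SQUARED_TABLE, 1))  # 3.84
print(critical_value(T_TABLE, 10))           # 2.228
print(reject_null(4.2, critical_value(CHI_SQUARED_TABLE, 1)))  # True
```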

Common mistakes
These are some of the commonest errors candidates make:-

• Rounding errors, due to rounding too early. If in doubt, use all the figures. It is useful to keep figures in your calculator, to avoid having to keep writing down and re-entering data. Learn how to use your calculator memory.

• Calculator errors - putting the correct figures into the calculator wrongly. See the calculator tips in this Factsheet and practise using your calculator well before the exam.

• Failure to show working - hence throwing away all the marks if there is even one tiny error in calculation.

• Failure to recall the formulae for degrees of freedom - these have to be learnt. If you get them wrong, they will invalidate your tables value and your conclusion.

• Not drawing conclusions correctly - you must learn that if your calculated value is larger than the tables value, you reject the null hypothesis.

• Getting the hypotheses the wrong way round - if your calculated result is greater than the tables value, then:
   • for the t-test, there is a difference between the means
   • for testing for a difference in chi-squared, there is a difference
   • for genetics chi-squared, the results are not as predicted by genetics

t-test
There are two types of t-test, paired and unpaired. The exam will always make it clear which you should do. You will always be given the relevant formulae.

The hypotheses for both tests are
H0: mean 1 = mean 2
H1: mean 1 ≠ mean 2
(This is a 2-tailed test - you may also come across 1-tailed tests, but in the exam you will never have to choose between the two)

Calculations for the test formula
The calculations for either type of t-test are similar to those for finding means and standard deviations. You also need to be able to substitute into a formula. Provided you can do calculations like the ones on page 1, you will not have a problem with these. Remember, you will be given any formulae you require.

The paired t-test first requires you to find the differences between each pair of values. You then work with these differences only.

paired t-test:

$$t = \frac{\bar{x}\sqrt{n-1}}{s}$$

x̄ is the mean of the differences
n is the number of pairs
s is the standard deviation of the differences

In the unpaired t-test, you will need to use these formulae:

$$s = \sqrt{\frac{\sum x_1^2 - n_1\bar{x}_1^2 + \sum x_2^2 - n_2\bar{x}_2^2}{n_1 + n_2 - 2}}$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

x̄1 and x̄2 are the means of the two samples
n1 and n2 are the sizes of the two samples
Σ means "sum of"

Exam questions will get you to do these calculations bit by bit and "follow through" marks are likely to be awarded - so if you calculate s wrong, for example, but use your value correctly to calculate the value of t, then you will get the rest of the marks.

Calculator Tips:-
To carry out any calculation that is set out as a fraction, you must put brackets round the top and round the bottom.
It is probably easier to work out the number inside the square root first, then take the square root, rather than trying to do it all in one go.
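Both t-test formulae can be sketched in Python, working out the number inside the square root first as the calculator tip suggests. The function names and the sample data are hypothetical and only illustrate substitution into the formulae; the standard deviation of the differences uses the same form as the standard deviation formula on page 1 of this Factsheet. The second value returned is the number of degrees of freedom for that type of test.

```python
import math

def mean(values):
    return sum(values) / len(values)

def standard_deviation(values):
    """Square root of (sum of squares / n) minus the mean squared."""
    n = len(values)
    return math.sqrt(sum(x * x for x in values) / n - mean(values) ** 2)

def paired_t(first, second):
    """Paired t-test: work only with the differences between each pair."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)  # number of pairs
    t = mean(differences) * math.sqrt(n - 1) / standard_deviation(differences)
    return t, n - 1       # degrees of freedom = number of pairs - 1

def unpaired_t(sample1, sample2):
    """Unpaired t-test: calculate s first, then substitute it into t."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    top = (sum(x * x for x in sample1) - n1 * m1 ** 2
           + sum(x * x for x in sample2) - n2 * m2 ** 2)
    s = math.sqrt(top / (n1 + n2 - 2))
    t = (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2  # degrees of freedom = n1 + n2 - 2

# Hypothetical measurements, for illustration only
t_paired, df_paired = paired_t([12, 15, 11, 14, 13], [10, 13, 12, 11, 12])
t_unpaired, df_unpaired = unpaired_t([2, 5, 6, 7, 8], [1, 3, 4, 4, 6])
print(f"paired: t = {t_paired:.3f}, df = {df_paired}")
print(f"unpaired: t = {t_unpaired:.3f}, df = {df_unpaired}")
```

Each t value would then be compared with the tables value for its degrees of freedom at the 5% level.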

Finding degrees of freedom
You need to learn these formulae:

For paired t-test:
degrees of freedom = number of pairs - 1
For unpaired t-test:
degrees of freedom = number in 1st sample + number in 2nd sample - 2

Acknowledgments: This Factsheet was researched and written by Cath Brown.
Curriculum Press, Unit 305B The Big Peg, 120 Vyse Street, Birmingham B18 6NF
Bio Factsheets may be copied free of charge by teaching staff or students, provided that their school is a registered subscriber.
No part of these Factsheets may be reproduced, stored in a retrieval system, or transmitted, in any other form or by any other means, without the prior permission of the publisher.
ISSN 1351-5136
