University of York, UK
University of Durham, UK
Published online: 02 Oct 2012.
To cite this article: Emma Marsden & Carole J. Torgerson (2012) Single group, pre- and post-test
research designs: Some methodological concerns, Oxford Review of Education, 38:5, 583-616, DOI:
10.1080/03054985.2012.731208
To link to this article: http://dx.doi.org/10.1080/03054985.2012.731208
This article provides two illustrations of some of the factors that can influence findings from
pre- and post-test research designs in evaluation studies, including regression to the mean
(RTM), maturation, history and test effects. The first illustration involves a re-analysis of data
from a study by Marsden (2004), in which pre-test scores are plotted against gain scores to
demonstrate RTM effects. The second illustration is a methodological review of single group,
pre- and post-test research designs (pre-experiments) that evaluate causal relationships between
intervention and outcome. Re-analysis of Marsden's prior data shows that learners with higher
baseline scores consistently made smaller gains than those with lower baseline scores, demonstrating that RTM is clearly observable in single group, pre-post test designs. Our review found
that 13% of the sample of 490 articles were evaluation studies. Of these evaluation studies,
about half used an experimental design. However, a quarter used a single group, pre-post test
design, and researchers using these designs did not mention possible RTM effects in their explanations, although other explanatory factors were mentioned. We conclude by describing how
using experimental or quasi-experimental designs would have enabled researchers to explain
their findings more accurately, and to draw more useful implications for pedagogy.
Keywords: education research; pre-experiment; regression to the mean; control group; research design;
research methods
Introduction
Pre-experimental designs in evaluation studies
Evaluations of educational policy and practice interventions that rely on the single
group pre-experimental research design (also known as the before and after or
comparison group is usually necessary to account for the possible effects of these
on post-test scores (Cook & Campbell, 1979; Shadish et al., 2002; Torgerson &
Torgerson, 2008).
Maturation
Learners tend to improve in their educational outcomes over time simply due to
increasing maturity. In the absence of a control group we cannot control for maturation effects because these will tend to affect post-test scores regardless of any
new intervention being evaluated. The greater the time difference between pre- and post-test, the greater the potential effects due to maturation (Cook & Campbell, 1979; Shadish et al., 2002).
Test effects
Evaluations usually involve some form of measurement, before and after the intervention. It is possible that improvements can result from the test itself, attributable
to factors such as participants remembering questions or the questions raising
awareness and triggering learning after the pre-test, independent of the subsequent
intervention. Ideally, two or more equivalent versions of the same test should be
used, counter-balanced amongst participants at pre- and post-test. However, to
fully ascertain whether any learning occurred as a result of simply having done the
test, it is necessary to test participants who did not receive the pre-test. Thus,
there can be four groups of participants: pre- and post-test no intervention; pre- and post-test with intervention; post-test only no intervention; post-test only with
intervention. This is the Solomon four group design (Shadish et al., 2002).
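
The logic of the four groups can be sketched in a few lines of code. This is a minimal illustration, not an analysis from any study cited here: the class name, function name and the four post-test means are hypothetical, and the estimates are simple contrasts of group means rather than a full factorial analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolomonMeans:
    """Post-test means for the four groups (all values used below are hypothetical)."""
    pretest_intervention: float
    pretest_control: float
    posttest_only_intervention: float
    posttest_only_control: float


def solomon_effects(m: SolomonMeans) -> dict:
    """Contrast the four post-test means as a 2 x 2 layout (pre-tested x intervention)."""
    # Intervention effect, averaged over pre-tested and non-pre-tested groups.
    intervention = ((m.pretest_intervention + m.posttest_only_intervention)
                    - (m.pretest_control + m.posttest_only_control)) / 2
    # Effect of having taken the pre-test, averaged over both conditions.
    testing = ((m.pretest_intervention + m.pretest_control)
               - (m.posttest_only_intervention + m.posttest_only_control)) / 2
    # Does the pre-test change how well the intervention works?
    interaction = ((m.pretest_intervention - m.pretest_control)
                   - (m.posttest_only_intervention - m.posttest_only_control))
    return {"intervention": intervention, "testing": testing, "interaction": interaction}


means = SolomonMeans(
    pretest_intervention=62.0,
    pretest_control=55.0,
    posttest_only_intervention=58.0,
    posttest_only_control=52.0,
)
effects = solomon_effects(means)
print(effects)
```

With these made-up means, part of the apparent advantage of the pre-tested groups is attributable to the test itself rather than to the intervention, which is the distinction the two post-test-only groups make visible.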
The RTM phenomenon and an illustration of its effects
RTM is a statistical phenomenon that affects all pre-experimental designs that
include, or analyse data from, participants selected on the basis of an extreme,
usually low but sometimes high, pre-test score (Cook & Campbell, 1979).
Galton (1886) discovered the statistical phenomenon of RTM in his work on
the heritability of height, describing it as 'regression to mediocrity'. Thorndike
(1942) reminded us how RTM, or the 'regression fallacy', can affect educational
research. The phenomenon can affect measurement known to contain an error
component (as well as a true measurement), for example a reading test. Many
test results have a normal distribution with most values clustered around the mean
and a smaller number of markedly lower or higher results. In almost all educational tests a proportion of the results will have a random error component (the
error term). Scores towards the ends of a distribution will, on average, be more
likely to have a higher error term than those nearer the mean. When students are
re-tested the results towards the ends of the distribution will tend to move closer
to the mean (to their true value) than the results in the middle range. A minority
will not regress to the mean but the majority will, moving the mean of the subgroups towards the whole sample mean on post-test. The regression effect is,
therefore, most evident for students with the lowest and highest pre-test scores.
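
The mechanism can be illustrated with a minimal simulation sketch. It assumes a classical test theory model that is not part of the original analysis: each observed score is a stable true score plus independent random error, and there is no intervention at all. The function names, sample size, mean and standard deviations are hypothetical values chosen only for illustration.

```python
import random
import statistics


def simulate_retest(n=10_000, mean=50.0, true_sd=10.0, error_sd=5.0, seed=0):
    """Return (pre, post) observed scores for n learners with NO intervention.

    Each observed score = stable true score + independent random error.
    """
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n):
        true_score = rng.gauss(mean, true_sd)
        pre.append(true_score + rng.gauss(0.0, error_sd))
        post.append(true_score + rng.gauss(0.0, error_sd))
    return pre, post


def mean_gain_by_quartile(pre, post):
    """Mean gain (post - pre) for the lowest and highest pre-test quartiles."""
    order = sorted(range(len(pre)), key=lambda i: pre[i])
    quarter = len(order) // 4
    lowest, highest = order[:quarter], order[-quarter:]

    def mean_gain(indices):
        return statistics.fmean([post[i] - pre[i] for i in indices])

    return mean_gain(lowest), mean_gain(highest)


pre, post = simulate_retest()
low_gain, high_gain = mean_gain_by_quartile(pre, post)
overall_gain = statistics.fmean(post) - statistics.fmean(pre)

print(f"lowest quartile mean gain:  {low_gain:+.2f}")
print(f"highest quartile mean gain: {high_gain:+.2f}")
print(f"whole-sample mean gain:     {overall_gain:+.2f}")
```

Under these assumptions the lowest quartile shows a positive mean gain and the highest quartile a negative one, while the whole-sample gain stays close to zero, even though no learner's true score changed.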
In order to illustrate the RTM phenomenon we re-analysed data from a study
undertaken by the first author (Marsden, 2004, 2006). To ascertain whether
RTM affected the data, participants' change scores (i.e., post-test minus pre-test
scores) were plotted against pre-test scores. If RTM were present a negative correlation would be observed, because participants with high pre-test scores would, on
average, tend to have smaller gains than participants with low pre-test scores. As
expected, Figure 1 shows a strong negative correlation (−0.65, p < 0.001) between
the pre-test and change scores.1
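
A short derivation shows why a negative correlation is expected even without any intervention. It is a sketch under a classical test theory assumption that is ours and is not stated in the source: pre- and post-test scores $X_1$ and $X_2$ share a true score $T$ with variance $\sigma_T^2$ and carry independent errors $e_1, e_2$ with common variance $\sigma_e^2$.

```latex
\begin{align*}
X_1 &= T + e_1, \qquad X_2 = T + e_2,\\
\operatorname{Cov}(X_1,\; X_2 - X_1) &= \operatorname{Cov}(T + e_1,\; e_2 - e_1) = -\sigma_e^2,\\
\operatorname{Corr}(X_1,\; X_2 - X_1)
  &= \frac{-\sigma_e^2}{\sqrt{(\sigma_T^2 + \sigma_e^2)\cdot 2\sigma_e^2}}
   = -\sqrt{\frac{1 - \rho}{2}},
  \qquad \rho = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_e^2}.
\end{align*}
```

Under this model the correlation is negative whenever the test is less than perfectly reliable ($\rho < 1$). A genuine intervention effect, ceiling effects or unequal variances at the two time points would change the value, so the expression is not an estimate for the Marsden data.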
The lower and upper quartiles of the pre-test scores were extracted for each of
the four measures (listening, speaking, reading and writing). This created two subgroups, lower and upper, coming from the same contextual group with the same
intervention. The pre- to post-test gains made by these lower and upper groups
were compared. This simulated a pre-experiment that compared the effectiveness
of the intervention for those with the lowest and highest scores at the outset.
(Once the inter-quartile range of the test scores was eliminated, the remaining
samples were very small, and it is emphasised that our aim was solely to illustrate
the existence of RTM effects.) Of the 16 comparisons (i.e., pre-to-post and
pre-to-delayed post tests, in two groups, in four outcome measures), six showed
statistically significantly larger gains by the lower groups than the upper groups. A
Figure 1. Pre-test scores on a test plotted against gains between pre- and post-tests (data from
Marsden, 2004). Horizontal axis: pre-test score; vertical axis: pre-post gain.
Design issues
We can address the issues illustrated above by introducing a control or comparison
group formed by random allocation. Then, maturation, history, test effects and
RTM effects will affect both groups similarly and cancel out when comparing
groups. Consequently, group differences in changes from pre- to post-tests can be
appropriately ascribed to the intervention.
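
A minimal simulation sketch shows how the comparison works. All names and numbers are hypothetical, and the model is deliberately simple: maturation and test effects are fixed additive shifts that apply to every participant, and allocation is by an unbiased coin flip. RTM does not appear here because no one is selected on an extreme pre-test score.

```python
import random
import statistics


def simulate_trial(n=4000, true_effect=3.0, maturation=2.0, test_effect=1.5,
                   error_sd=5.0, seed=1):
    """Simulate pre-to-post gains for randomly allocated intervention and control groups."""
    rng = random.Random(seed)
    gains = {"intervention": [], "control": []}
    for _ in range(n):
        true_score = rng.gauss(50.0, 10.0)
        pre = true_score + rng.gauss(0.0, error_sd)
        group = rng.choice(("intervention", "control"))  # random allocation
        # Maturation and test effects reach everyone; only one group gets the intervention.
        shift = maturation + test_effect
        if group == "intervention":
            shift += true_effect
        post = true_score + shift + rng.gauss(0.0, error_sd)
        gains[group].append(post - pre)
    return gains


gains = simulate_trial()
# What a single group, pre-post design would report as "the effect".
single_group_estimate = statistics.fmean(gains["intervention"])
# What the comparison with a randomised control group reports.
controlled_estimate = single_group_estimate - statistics.fmean(gains["control"])

print(f"single group gain:         {single_group_estimate:.2f}")
print(f"difference in mean gains:  {controlled_estimate:.2f}")
```

In this sketch the single group gain bundles the intervention effect together with maturation and test effects, whereas the difference in mean gains between the groups stays close to the simulated intervention effect.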
Although including a randomised control group will deal with temporal and RTM
effects, using a selected control group, not formed by random allocation, may not.
There are several ways a selected control group can introduce bias, the most obvious
of which is selection bias. The members of a control group may be systematically different in some variable, often unobserved, which influences outcome; consequently,
any difference observed between the groups at outcome may be due to selection
rather than to treatment. Secondly, even well matched control groups may introduce
bias through difference in history. For example, groups may have been exposed to
different interventions that may accelerate maturation. Also, RTM can differentially
affect different contextual groups even though they may appear to be matched at pre-test. This is because it is an individual's position in relation to their own contextual
group's mean that determines whether their score is likely to regress up or down to
their group's mean, and by how much. Most of the extreme values will, regardless of
the presence or effectiveness of an intervention, regress towards the mean, though a
minority do not. Identifying those values that will regress on re-testing, given no
intervention, and those that will not, is difficult, if not impossible.
The effects of history, maturation, test effects and RTM often do not operate
alone. Usually, differences in pre- and post-test gains are a combination of all four
factors. A methodological review of studies of psychological, educational and
behavioural treatments (Lipsey & Wilson, 1993) showed that the pre- post-test
design consistently overestimates effectiveness by an average of 61% compared with
studies with a control group. This greater improvement seen in before and after
studies compared with quasi-experiments (i.e., experiments with a control group) is
entirely predictable given what we know about history, maturation, test effects and
RTM effects. Indeed, gains in control or comparison groups can be observed,
demonstrating that not all of the gains in the experimental group are attributable to
the intervention itself. For example, Norris and Ortega's (2000) meta-analysis of
78 second language education studies found that the average effect size of true
control and comparison treatment groups was d=0.30 (st.dev.=0.39), that 15
groups with no experimental intervention made small but important gains, and that
change over time in control groups was a consistent phenomenon.
Review methods
We searched 13 educational research journals: British Educational Research Journal, Cambridge Journal of Education, Educational Studies, International Journal of
Science Education, Journal of Educational Research, Journal of Research in Reading,
Journal of Teacher Education, Language Learning and Technology, Oxford Review of
Education, Reading and Writing: An Interdisciplinary Journal, Research in the Teaching of English, Reading Research Quarterly and Science Education, for 2009, using
the database Educational Resources Information Centre. The year 2009 provided the
most recent full cycle of journal issues before the start of the review process. This
is not a representative sample of education journals or of educational research
itself, and we note that it is likely that the number of articles that fit our criteria
from any one journal is potentially positively correlated with the number of articles published by that journal in that year, and/or with the amount of detail provided in the report.
In order to be selected for the review, papers had to: be unique empirical studies; compare a construct (e.g., attitude, knowledge, behaviour) before and after an
intervention; have at least one quantified measure; employ a study design that did
not include a control or comparison group or any other mechanism that could
have potentially addressed the known biases of using a single group design. Independent data extraction of the studies (with double data extraction of over 80% of
the studies) retrieved information about: the topic; the nature of the intervention;
the outcome measures; and the results. We also recorded whether: the author/s
derived a causal inference between intervention and outcome; the author/s mentioned RTM as a possible explanation for the results; the author/s mentioned other
potential explanatory factors for the results.
Results
In total, 490 articles were published in 2009 in the 13 journals. We found 64
(13%) evaluated innovative interventions and used experimental, quasi-experimental or pre-experimental designs (with quantitative and/or qualitative outcome measurements).3 Of these 64, 19 were included at the first screening stage. At the
second screening stage we excluded three studies (Graebner et al., 2009; MacArthur & Lembo, 2009; Tsaparlis & Papaphotis, 2009) because they did not fit our
criteria. This left 16 (25%) evaluation studies that met our criteria (i.e., pre-post
designs without a control or comparison group). (Note, 48 studies evaluated interventions using designs with control or comparison groups.)
Detailed information about each pre-experiment is presented in the Appendix.
Table 1. Consideration of potential factors explaining the observed changes, other than, or in addition to, the experimental intervention

Characteristic of study | Studies
No mention of any potential explanatory factor (except the intervention). | Annetta et al.; Ducate & Lomicka; Newton & Newton; Sherrod & Wilhelm; Spalding et al.; Taylor & Jones.
Acknowledgement that other (unspecified) factors may be involved. | Grace; Park et al.
Specific alternative explanatory factors mentioned. | Brady et al. (characteristics of intervention and other extraneous variables pertaining to quality of intervention: support, time, resources; the measure was not standardised). Evagorou et al. (self-selection bias). Guisasola et al. (tests influence learning). Jones et al. (self-selection bias). Miedijensky & Tal (influence of regular school, time, maturation). O'Byrne (indirectly: tests influence learning). Sherin & van Es (relationship between the intervention and the outcome measure may be cyclical). Wilhelm (differential maturation, differential exposure to the intervention, differential motivation).
impact of an intervention on low and high achievers (though assignment to conditions used matched randomisation at the school level rather than at the level of
individual participants, and so RTM may have affected the different groups
unevenly, as described above).
Discussion
Control (or comparison) groups are important for avoiding unwarranted interpretations of data from pre-post measurements. It should be noted that 14 of the 64
evaluation studies did use a comparison group, without pre-intervention measures;
and 34 of the 64 studies used both a pre-post design and a control/comparison
group (with or without random allocation to groups).
The use of control and comparison groups principally avoids unwarranted interpretations (internal validity). It can also improve ecological validity. For example,
using test only groups can inform decisions when the intervention would be
added to the normal programme, and using comparison groups can help practitioners determine the relative merits of different interventions. As discussed above,
random allocation is the best way of addressing history, maturation, test effects
and RTM effects. If a control group cannot be formed by random assignment then
a contemporaneous control group is preferable to no control group.
Another way of partly controlling for RTM effects is to undertake repeated multiple baseline measurements, in an interrupted time series design, until a stable score
is achieved so as to reduce the margin of error of the test. This improves the validity
of associating any future gains with the intervention rather than RTM. This is often
done in cognitive psychology research in order to find an asymptote that is more
likely to reflect the true value of the construct being measured. MacArthur and Lembo (2009) evaluated cognitive strategy instruction for writing skills. The three participants did between three and five pre-test essays to obtain a stable baseline (p. 1029).
The post-test consisted of three more essays. For two students, post-test scores were
all higher than stable baseline scores. For the other student, a slight increase was
observed at post-test over baseline. The authors note that the percentage of nonoverlapping data between stable baseline and post-test was 100% (p. 1029).
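
The reason repeated baseline measurement reduces the margin of error can be sketched numerically. The sketch assumes, as a simplification of ours, that the errors on successive baseline tests are independent with the same standard deviation; practice or test effects would violate this. The function name and the error standard deviation of 5 points are hypothetical.

```python
import math


def baseline_error_sd(error_sd: float, k: int) -> float:
    """SD of the measurement error in the mean of k independent baseline tests."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return error_sd / math.sqrt(k)


# Error in the averaged baseline for one, three and five baseline tests.
for k in (1, 3, 5):
    print(f"{k} baseline test(s): error SD = {baseline_error_sd(5.0, k):.2f}")
```

Because the error shrinks only with the square root of the number of tests, each additional baseline measurement buys less precision than the one before, which is part of why the approach is time consuming.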
Whilst such a research design is statistically more satisfactory, for the participant, teacher and policy-maker, it is time consuming and difficult to justify pedagogically. Randomisation is therefore probably a preferable method of addressing
the RTM problem, particularly as it also eliminates selection bias.
Pre-experimental designs do, however, have a role to play in educational
research. For example, before and after data can determine the promise of an intervention during its development phase. In this case researchers will investigate the
potential for an intervention to improve scores in an iterative cycle of testing and
developing, though the researcher should guard against over-interpretation beyond
the observation that the intervention has promise. Many of the studies we
reviewed also made useful contributions by demonstrating feasibility of implementation. However, pre-experimental research in which the observed magnitude of
gains over time is ascribed uniquely to a causal relationship between the intervention and the outcome measures is a concern. Furthermore, caution must be exercised when using pre-experimental research to inform sample size calculations for
RCTs because such studies over-estimate the intervention effects and lead to an
underestimation of the sample size (Torgerson & Torgerson, 2008).
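
The consequence for sample size calculations can be sketched with the usual normal-approximation formula for comparing two group means, n per group = 2(z_{1-α/2} + z_{power})² / d². The formula choice, the function name, the true effect of d = 0.50 and the reading of a 61% overestimate as a multiplication of d by 1.61 are all illustrative assumptions of ours, not calculations from the source.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm to detect a standardised effect size."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


true_d = 0.50                 # hypothetical true effect
inflated_d = true_d * 1.61    # the same effect, overestimated by 61%

n_needed = n_per_group(true_d)
n_planned = n_per_group(inflated_d)
print(f"needed per group for d = {true_d:.2f}:  {n_needed}")
print(f"planned per group for d = {inflated_d:.3f}: {n_planned}")
```

In this illustration a trial planned on the inflated estimate would recruit well under half of the participants it actually needs, leaving it underpowered to detect the true effect.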
We do not know the extent to which the effects outlined earlier influenced the
findings reported in the studies we reviewed. Thirteen of the 16 studies included all
the participants in all analyses, and did not split the pre-test data into high and low
scorers. In such studies one might argue that the movement up to the mean from
the lower scorers and the movement down to the mean from the higher scorers
may have cancelled out the effects of RTM. However, this is by no means certain,
as the movement of the lower and upper outliers due to RTM may not have been
equivalent. Indeed, equal upwards and downwards movement is unlikely given the
combined effects of history, maturation, test effects and the intervention (experimental or comparison). The combined effects of these factors may reduce any
regression down to the mean of the higher scorers but increase the regression up to
the mean of the lower scorers. Clearly, some of the difference might be due to the
intervention actually being effective at improving the outcomes measured, but how
much, if any, is impossible to know due to the limitations within the design.
Conclusions
In our small-scale methodological review of pre-experimental studies we have illustrated that a number of authors of such research designs did not take into account
the potential biasing effects of history, maturation, test effects and RTM in the
discussion of their results. We found several studies that divided the participants
on the basis of their pre-test scores into low and high achievers and argued that an
intervention was more beneficial for those with low scores at baseline, but did not
discuss RTM as a possible factor influencing this finding.
In pre-experiments, history, maturation, test effects or RTM effects may not
explain all of the pre-post differences observed in these studies, and the experimental interventions may be responsible for some of the effects observed. However, because random allocation to experimental and comparison groups was not
used, we cannot tell the extent to which the differences were due to history, maturation, test or the regression artefact. We know, however, that some of the
observed difference is likely to be artefactual.
Randomised controlled trials are widely used to control for selection bias, that
is, where participants are selected on characteristics that may bias the results. This
paper has highlighted how randomised control groups are also important to
control for history, maturation, test and the RTM phenomenon. Our review found
about one fifth of the evaluation studies did use a comparison group, and about
half used pre-intervention measures in addition to a comparison group, some with
random allocation. This illustrates that such designs are feasible in instructional
settings.
Acknowledgements
We thank David Torgerson for his useful comments and suggestions on an earlier
draft of the paper.
Notes
1. The data were, in fact, from a trial that used matched randomisation to an experimental
and a comparison group, thereby controlling for RTM effects.
2. For reasons of space statistics are not provided, but the data can be found in Marsden,
2004.
3. The majority of studies that were NOT evaluation studies aimed to explore potential relationships, define constructs, or document processes.
Notes on contributors
Emma Marsden is Senior Lecturer in Language Education in the Department of
Education at the University of York. Her interests are in research methods
and design, both for general educational and applied linguistics research,
language learning theories, and foreign language education. She has worked
on projects funded by the ESRC, DfES, British Academy, HEA LLAS
Subject Centre and the British Council.
Carole Torgerson is Professor of Education in the School of Education at Durham
University. Her main methodological research interests are in experimental
methods (randomised controlled trials and quasi-experiments) and research
synthesis. She has received awards from the DfES, DfCSF, Home Office,
ESRC, HEA, CfBT, HEFCE, NIHR and a range of other organisations.
References
Annetta, L., Mangrum, J., Holmes, S., Collazo, K. & Cheng, M. (2009) Bridging realty to virtual reality: investigating gender effect and student engagement on learning through video game play in an elementary school, International Journal of Science Education, 31(8), 1091–1113.
Bell, J., Donnelly, J., Homer, M. & Pell, G. (2009) A value-added study of the impact of science curriculum reform using the national pupil database, British Educational Research Journal, 35(1), 119–135.
Benati, A., Lee, J. & McNulty, E. (2010) Exploring the effects of Processing Instruction on a discourse-level guided composition, in: A. Benati & J. Lee (Eds) Processing instruction and discourse (London, Continuum), 97–147.
Ben-David, A. & Zohar, A. (2009) Contribution of meta-strategic knowledge to scientific inquiry learning, International Journal of Science Education, 31(12), 1657–1682.
Brady, S., Gillis, M., Smith, T., Lavalette, M., Liss-Bronstein, L., Lowe, E., North, W., Russo, E. & Wilder, T.D. (2009) First grade teachers' knowledge of phonological awareness and code concepts: examining gains from an intensive form of professional development and corresponding teacher attitudes, Reading and Writing: An Interdisciplinary Journal, 22(4), 425–429.
Campbell, D.T. & Stanley, J.C. (1963) Experimental and quasi-experimental designs for research (Chicago, IL, Rand McNally).
Cook, T.D. & Campbell, D.T. (1979) Quasi-experimentation: design and analysis issues for field settings (Boston, MA, Houghton Mifflin).
Ducate, L. & Lomicka, L. (2009) Podcasting: an effective tool for honing language students' pronunciation? Language Learning and Technology, 13(3), 66–86.
Evagorou, M., Korfiatis, K., Nicolaou, C. & Constantinou, C. (2009) An investigation of the potential of interactive simulations for developing thinking skills in elementary school: a case study with fifth-graders and sixth-graders, International Journal of Science Education, 31(5), 655–674.
Galton, F. (1886) Regression towards mediocrity in hereditary stature, The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.
Grace, M. (2009) Developing high quality decision-making discussions about biological conservation in a normal classroom setting, International Journal of Science Education, 31(4), 551–570.
Graebner, I.T., de Souza, E.M.T. & Saito, C.H. (2009) Action-research and food and nutrition security: a school experience mediated by conceptual graphic representation tool, International Journal of Science Education, 31(6), 809–827.
Guisasola, J., Solbes, J., Barragues, J.-I., Morentin, M. & Moreno, A. (2009) Students' understanding of the special theory of relativity and design for a guided visit to a science museum, International Journal of Science Education, 31(15), 2085–2104.
Jones, G., Taylor, A. & Broadwell, B. (2009) Estimating linear size and scale: body rulers, International Journal of Science Education, 31(11), 1495–1509.
Lipsey, M.W. & Wilson, D.B. (1993) The efficacy of psychological, educational and behavioral treatment: confirmation from meta-analysis, American Psychologist, 48(12), 1181–1209.
MacArthur, C.A. & Lembo, L. (2009) Strategy instruction in writing for adult literacy learners, Reading and Writing, 22(9), 1021–1039.
Marsden, E. (2004) Teaching and learning of French verb inflections: a classroom experiment using processing instruction. Unpublished Ph.D. dissertation, University of Southampton.
Marsden, E. (2006) Exploring input processing in the classroom: an experimental comparison of processing instruction and enriched input, Language Learning, 56, 507–566.
McCutchen, D., Green, L., Abbott, R. & Sanders, E. (2009) Further evidence for teacher knowledge: supporting struggling readers in grades three through five, Reading and Writing: An Interdisciplinary Journal, 22(4), 401–423.
Miedijensky, S. & Tal, T. (2009) Embedded assessment in project-based science courses for the gifted: insights to inform teaching all students, International Journal of Science Education, 31(18), 2411–2435.
Moore, M. & Wade, B. (1998) Reading and comprehension: a longitudinal study of ex-Reading Recovery students, Educational Studies, 24, 195–203.
Newton, D.P. & Newton, L.D. (2009) Knowledge development at the time of use: a problem-based approach to lesson planning in primary teacher training in a low knowledge, low skill context, Educational Studies, 35(3), 311–321.
Norris, J. & Ortega, L. (2000) Effectiveness of L2 instruction: a research synthesis and quantitative meta-analysis, Language Learning, 50, 417–528.
O'Byrne, B. (2009) Knowing more than words can say: using multimodal assessment tools to excavate and construct knowledge about wolves, International Journal of Science Education, 31(4), 523–539.
Park, H., Khan, S. & Petrina, S. (2009) ICT in science education: a quasi-experimental study of achievement, attitudes toward science, and career aspirations of Korean middle school students, International Journal of Science Education, 31(8), 993–1012.
Shadish, W.R., Cook, T.D. & Campbell, D.T. (2002) Experimental and quasi-experimental designs for generalized causal inference (Boston, Houghton Mifflin).
Sherin, M.G. & van Es, E.A. (2009) Effects of video club participation on teachers' professional vision, Journal of Teacher Education, 60(1), 20–37.
Sherrod, S.E. & Wilhelm, J. (2009) A study of how classroom dialogue facilitates the development of geometric spatial concepts related to understanding the cause of moon phases, International Journal of Science Education, 31(7), 873–894.
Spalding, E., Wang, J., Lin, E. & Hu, G. (2009) Analyzing voice in the writing of Chinese teachers of English, Research in the Teaching of English, 44(1), 23–51.
Taylor, A. & Jones, G. (2009) Proportional reasoning ability and concepts of scale: Surface area to volume relationships in science, International Journal of Science Education, 31(9), 1231–1247.
Thorndike, R.L. (1942) Regression fallacies in the matched groups experiment, Psychometrika, 7, 85–102.
Torgerson, C. & Torgerson, D. (2008) Designing and running randomised trials in health, education and the social sciences: an introduction (Basingstoke, Palgrave Macmillan).
Tsaparlis, G. & Papaphotis, G. (2009) High-school students' conceptual difficulties and attempts and conceptual change: the case of basic quantum chemical concepts, International Journal of Science Education, 31(7), 895–930.
Wilhelm, J. (2009) Gender differences in lunar-related scientific and mathematical understandings, International Journal of Science Education, 31(15), 2105–2122.
Brady, Gillis,
Smith,
Annetta,
Mangrum,
Holmes,
Collazo &
Cheng.
Study
U.S.A.
n=74 students
First grade
teachers;
No.
(Continued)
RTM mentioned?
(Points that could
relate to RTM, as
interpreted by
reviewers)
No.
Yes. the MEGA
integrated into an
elementary school
science class did result
in the learning of key
science concepts for
fifth-grade boys and
girls learning of
simple machines
(p. 1104).
Elementary
school;
Sample size
5th grade
Examine students
learning of simple
students of
machines;
varying
academic levels
1011 years;
Measures
Participants;
Science
Education;
Topic;
Setting;
Country
Objective
of study;
Results, as
reported by
authors, including
statistics and
Did authors ascribe
references to
a causal
relevant tables
where appropriate relationship?
Data extracted from included studies with a pre-experiment (single group pre-post test) design
Appendix
Elementary
school;
Professional
Lavalette,
development of
Lissteachers;
Bronstein,
Lowe, North,
Russo &
Wilder
Study
Topic;
Setting;
Country
Appendix (Continued)
Measures
Participants;
Sample size
professional
development for
building the
knowledge of firstgrade teachers in the
areas of phonological
awareness and
phonics;
Objective
of study;
weak knowledge of
phonological
awareness and
phonics concepts
prior to PD [average
42.6% correct] and
large, significant
gains in each year by
year-end [average
74.1% correct] on
all [three] subtests
and on the total
score (pp. 436,
437). Repeated
Measures ANOVAs
showed statistically
significant differences
for phonological
awareness, Code,
Fluency and Oral
language, (p. 437).
With large effect
sizes for [PA] and
[C] (.73 and .80) (p.
443).
Attitudes: Repeated
measure ANOVAs
RTM mentioned?
(Points that could
relate to RTM, as
interpreted by
reviewers)
(Continued)
generated substantial
overall gains on the
survey of teachers
knowledge (p. 443)
Thus, there are effects
for time with final
teacher knowledge
scores ... being
significantly higher
than their respective
beginning scores (p.
437) assessment of
teacher attitudes
indicates that positive
feelings about the PD
increased, as did
personal commitment
to participate (p. 439)
The present study
demonstrated the value
of an intensive form of
PD provided by skilled
mentors for building
teacher knowledge (p.
447).
But authors state: At (But authors state
this point one cannot we only have been
Results, as
reported by
authors, including
statistics and
Did authors ascribe
references to
a causal
relevant tables
where appropriate relationship?
598
E. Marsden and C.J. Torgerson
Ducate &
Lomicka
Study
Foreign
language
education;
U.S.A.
Topic;
Setting;
Country
Appendix (Continued)
Measures
Participants;
Sample size
n=57 (n=65
for analysis of
Teacher
Knowledge)
Measures: Survey to
assess teacher
knowledge needed to
teach basic reading
skills and a Teacher
Attitude Survey
(TAS) to measure
attitudes to the
professional
development.
Undergraduates Using podcasts to
improve foreign
(1822 years
language
old);
pronunciation;
Objective
of study;
Yes, indirectly.
Negative result
reported, causal
relationship implied
indicated significant
effects of time with
higher scores at the
end of the year for
self-efficacy
and positive
attitudes toward PD
lower scores at
the end of the year
on negative attitudes
(p. 438). Tables 3 &
4.
No statistically
significant
improvement in
most of the measures
Results, as
reported by
authors, including
statistics and
references to
Did authors ascribe
relevant tables
a causal
where appropriate relationship?
No.
(Continued)
able to identify a
modest portion of
the variance
accounting for
teachers responses
and scores)
(p. 446).
RTM mentioned?
(Points that could
relate to RTM, as
interpreted by
reviewers)
Study
U.S.A.
University;
Topic;
Setting;
Country
Appendix (Continued)
Sample size
Attitudes towards
pronunciation using
Pronunciation
Attitude Inventory.
Measures
Participants;
n=12 German
students; n=10
French
students.
Objective
of study;
(comprehensibility,
accentedness, and
attitudes towards
pronunciation) (p.
73). One sig
difference in French
comprehensibility
ratings (p. 73).
Grace
n=13
Measures: 2 tests,
both used as pre and
post tests. Each test
with tasks
corresponding to
seven thinking skills.
Cyprus.
Decision-making in science classrooms;
15–16 year olds;
Can peer group decision-making discussions help develop students'
Elementary
school;
Thinking skills;
Evagorou,
Korfiatis,
Nicolaou &
Constantinou.
Considerable
improvements in the
participants' system
thinking skills, on six
measures, but not on
feedback thinking
(pp. 664–669 and
Tables 2, 3 & 4).
No.
No.
Yes (and indirectly,
no). The proposed
learning environment
provoked considerable
improvements in some
system thinking skills
during a relatively brief
learning process.
(p. 656) after the
instruction the total
number of referred
elements increased
(p. 664). We have to
admit the failure of
our intervention in
promoting feedback
thinking (p. 671).
But authors state that
results: could be
positively affected by
the fact that [the
students] voluntarily
participated in the
project (p. 669).
n=131 (4 intact
classes)
Secondary
school;
personal reasoning in
relation to
conservation issues?
Measures: Pre and
post questionnaire
(p. 557). A
comparison of pre- and post-test
comments revealed a
general shift to
higher-level
responses following
the discussions
(p. 559). 54% of
student exhibited an
increased quality of
response; 40%
remained at the same
level, and 6%
dropped down a
level Almost 20% of
students moved from
level 3 to level 4 (p.
559).
But it is not possible
to establish with
certainty that the
differences between an
individual's pre-test
and post-test
statements were the
direct result of the
Guisasola,
Solbes,
Barragues,
Morentin &
Moreno
U.K.
Physics education in Engineering course;
1st year undergraduates;
How does a museum visit influence students' understanding of the Special theory of Relativity (STR) and its applications? Do students use more scientific arguments when discussing topics related to the Special theory of Relativity after visiting the exhibition?
University
n=35
Measures: to
measure
understanding, a
questionnaire as the
pre-test; a written
report structured
around similar
questions to pre-test
for the post test.
Increases between
pre and post
measures in
understanding.
Figures 1, 2 and 3
showing differences
between pre and post
measures for: correct
explanations of
aspects of STR;
scientific arguments
applied; proportions
of three or more
mentions of
applications.
discussions (p.
556).
Spain
Miedijensky
& Tal
11–13 years;
n=19
12–15 year olds;
To document student views on and reactions to assessment for learning, amongst students taking one-year project-based science courses for the gifted.
Summer Camp;
U.S.A.
Impacts of
assessment for
learning
approach
amongst gifted
and talented;
pull-out
programme for
gifted and
talented;
Measure: 20 item
test to assess
understanding of
metric scale.
Significant
differences between
the pre/post
questionnaires were
found with regard to
the three main
categories and most
of the subcategories.
differences for
object estimation,
kinaesthetic estimation
..., and body ruler
(p. 1504)
No.
Yes. Causal links
between experiencing
AFL and positive
views about AFL: Our
findings indicate
positive impacts of
AFL on the students'
views of assessment
(p. 2430). Also,
relationship between
n=86
Israel
Measures: pre-post
questionnaire:
general view of
assessment, ideas
about assessment
modes, and
relationships
Teacher
education;
Newton &
Newton
between assessment
and learning;
12 post-treatment
interviews
Primary teacher trainees, and PGCE tutors;
Impact of a problem-solving approach to lesson planning in an area where trainees
Statistically
significant increase,
(very large effect
size), in students'
reported confidence
considered to
contribute to this shift.
We cannot entirely
denounce this concern;
however, our data
indicate nothing of this
sort of assessment was
employed in the
regular schools; and
indicate the students
strongly associated
their views to the
assessment
components, and
provided relevant
examples that support
our claim (p. 2430).
No.
n=75 PGCE
students; and
PGCE teacher
training course;
Measures: Before
and after
comparisons of a)
O'Byrne
reported self-confidence and b)
quality of solutions
to problems.
n=3 PGCE
tutors
Primary school;
n (aggregated
over 3
consecutive
years)=43;
concept of form
emerged once they
were provided carefully
constructed assessment
and learning tasks that
drew out this
knowledge (p. 533);
Some of these shifts
were no doubt the
result of small-group
investigations
(p. 534). Multimodal
Descriptive statistics
of individual test
items (pp. 529–535).
Tables 1 & 2.
Measurement-dependent gains between pre and post test, e.g. Every pair of pre–post-unit drawings showed refinement of concepts of wolves in nature as distinct from fictional or imaginary wolves, more commonly featured in pre-unit drawings (p. 538).
Measures: Pre-post True-false test; Pre-post drawings of wolves.
U.K.
Understanding of wolves;
Grade 2 pupils from one school;
Middle school;
Measures:
Comparisons of
Impact of Computer
Assisted Instruction
(CAI) on
achievement and
attitudes;
U.S.A.
No.
Yes. CAI was
significantly correlated
with improvement in
most of the
achievement groups
(p. 1003).
Collectively, student
achievement in the
post-achievement test
improved significantly
compared to their
achievement in science
prior to CAI (p.
1006).
Assessment Tools
Support Gradual
Concept Change (p.
537).
Implicit in design is
that test-effect used,
indirectly, as a learning
tool.
n=234
Korea
Teachers
Impact on attitudes
to science and future
courses and career
aspirations measured
by pre- and post-questionnaires.
Document effect of
discussing video
recorded lessons
(video clubs) on
teacher learning.
post-achievement
test to students'
Grade Point Average
in the previous year.
Quantitative
improvement on all
indicators of
teachers' attention to
students'
mathematical
thinking, amongst all
teachers on all
measures. Tables 3–11,
pp. 25–32 ... not
only did the teachers,
over time, come to
use more
sophisticated
No.
Yes. there is a
strong alignment
between the reasoning
strategies developed in
the video club and
those displayed in the
later classroom
observations.
n=4 + 7
7th grade;
Understanding geometric spatial
Measures: quantity
and nature of
discussion during
video clubs (pre–post); classroom
observations (early–late); noticing
interviews (pre +
post)
Will classroom
dialogue facilitate
Sherrod &
Wilhelm
Learners
demonstrated new
strategies for
reasoning about
student thinking ...
they also came to
notice more complex
issues of student
thinking (p. 27).
And in Meeting 1,
only 25% of the
comments about the
student concerned
mathematical
thinking ... in
Meeting 10, 92% ...
were ... to do with
mathematical
thinking (p. 28).
Yes (implicitly).
Following classroom
No.
Middle school;
concepts related
to the cause of
lunar phases;
Effect of 3 week
summer writing
workshop on
n=92 (5 classes taught by same teacher)
Measures: 2D drawings, before and after dialogue. And the Lunar Phases Concept Inventory pre- and post-measure.
students'
understanding of
lunar concepts
related to geometric
spatial visualisation?
U.S.A.
Spalding, Wang, Lin & Hu
Development of writing in English as a foreign language;
Chinese teachers of English as a
understanding of
three concepts
tested: significant
gains in scientific
and geometric spatial
understanding,
but also
accelerated
dedication to inquiry
teaching (p. 877);
increased proportions
of students
demonstrating
understanding
(p. 881).
No.
Yes. The writing
course led to
significant gains in
writing scores (p. 48).
China.
n=57
A significant
correlation between
proportional
reasoning ability and
students'
understanding of
surface area to
volume relationships.
Mean score on the
pre-test was 54.42
(SD 20.41) whereas
the mean score on
the post-test
increased to 75.89
(SD 19.71) (pp.
course of the
institute (p. 23).
Greatest gains in
voice. Tables 1, 2
& 3.
No.
Yes. Results of a
paired-sample t-test
suggested that the
significant changes in
pre-test and post-test
for the ASAVA were
not due to random
chance but instead are
probably due to the
intervention the
students received as a
result of completing
the surface area to
volume application
tasks (p. 1236).
Wilhelm
n=19
Middle level
students;
Middle school;
U.S.A.
Science
education;
Examine gender
differences in lunar
phases
understanding.
Measures:
Understanding tested
by pre-post
achievement tests.
Relationship of
understanding to
proportional
reasoning ability
tested in one off
achievement test.
1235–1236, 1236–1237).
No.
Yes. The partial η²
value of 0.703
indicates that
approximately 70.3%
of the gain in lunar-related understanding
can be directly
attributed to the
inquiry Moon unit
(p. 2113). The partial
η² value of 0.151
indicates that
approximately 15.1%
of the gain in lunar-
n=123
University;
U.S.A.
Measures: a Lunar
Phases Concept
Inventory (20 item)
and a Geometric
Spatial Assessment
(GSA) (16 item).
related understanding
can be directly
attributed to the
inquiry Moon unit
(p. 2116). Findings
suggest that both
scientific and
mathematical
understandings can be
significantly improved
for both sexes through
the use of spatially
focused, inquiry-oriented curriculum
such as REAL
(pp. 2105, 2120).