This article concerns the use of assessment for learning (formative assessment) and assessment of learning (summative assessment), and how one can affect the other in either positive or negative ways. It makes a case for greater use of teachers' judgements in summative assessment, the reasons for this being found in the research that is reviewed in the first sections of the article. This research, concerning the impact of summative assessment, particularly high-stakes testing and examinations, on students' motivation for learning and on teachers and the curriculum, reveals some seriously detrimental effects. Suggestions for changes that would reduce the negative effects include making greater use of teachers' summative assessment. However, this raises other issues, about the reliability and validity of teachers' assessment. Research on ways of improving the dependability of teachers' summative assessment suggests actions that would equally support more effective use of assessment to help learning. The later sections of the article address the issues and opportunities relating to the possibility of assessment that serves both formative and summative purposes, with examples of what this means in practice, leading to the conclusion that the distinction between formative and summative purposes of assessment should be maintained, while assessment systems should be planned and implemented to enable evidence of students' ongoing learning to be used for both purposes.
Introduction
All assessment in the context of education involves making decisions about what is
relevant evidence for a particular purpose, how to collect the evidence, how to
interpret it and how to communicate it to intended users. Such decisions follow from
the purpose of conducting the assessment. These purposes include helping learning,
*Haymount Coach House, Bridgend, Duns, Berwickshire, TD11 3DJ, UK. Email: wynne@torphin.freeserve.co.uk
W. Harlen
The reference here to a threat to validity of the assessment is but one of several. High-stakes tests are inevitably designed to be as objective as possible, since there is a premium on reliable marking in the interests of fairness. This has the effect of reducing what is assessed to what can be readily and reliably marked. Generally this excludes many worthwhile outcomes of education such as problem-solving and critical thinking.
- When passing tests is high stakes, teachers adopt a teaching style which emphasizes transmission teaching of knowledge, thereby favouring those students who prefer to learn in this way and disadvantaging and lowering the self-esteem of those who prefer more active and creative learning experiences.
- High-stakes tests can become the rationale for all that is done in classrooms and can permeate teachers' own assessment interactions.
- Repeated practice tests reinforce the low self-image of the lower-achieving students.
- Tests can influence teachers' classroom assessment, which is interpreted by students as purely summative regardless of teacher intention, possibly as a result of teachers' over-concern with performance rather than process.
- Students are aware of the performance ethos in the classroom and that the tests give only a narrow view of what they can do.
- Students dislike selection and high-stakes tests, show high levels of test anxiety (particularly girls) and prefer other forms of assessment.
- Feedback on assessments has an important role in determining further learning. Judgemental feedback may influence students' views of their capability and likelihood of succeeding. Students use feedback from earlier performance on similar tasks in deciding the effort they invest in further tasks.
- A school's assessment culture influences students' feelings of self-efficacy, so teacher collegiality is important and should be encouraged by school management.
- An education system that puts great emphasis on evaluation and selectivity produces students with strong extrinsic orientation towards grades and social status.
The review not only identified the negative impacts of testing, but also gave clues as to what actions could be taken to reduce these impacts. Suggested action included, at the class level: explaining to students the purpose of tests and other assessments of their learning, and involving them in decisions about tests; using assessment to convey to students a sense of progress in their learning; providing feedback that helps further learning; and developing students' self-assessment skills and their use of criteria relating to learning, rather than test performance. It is noteworthy that these actions refer to several of the key features of assessment used to help learning.
Implications for assessment policy were drawn from the findings by convening a consultation conference of experts representing policy-makers, practitioners, teacher educators and researchers. The policy implications included steps that should be taken to reduce the high stakes of summative assessment, by using a wider range of indicators of school performance, and by using a more valid approach to tracking standards at the national level, through testing a sample of students rather than a whole age group. It was also emphasized that more valid information about individual student performance was needed than could be obtained through testing alone, and that more use should be made of teachers' judgements as part of summative assessment. We now turn to the potential advantages and disadvantages of this latter course of action.
This excludes the role of teachers as markers or examiners in the context of external examinations, where they do not mark their own students' work.
In addition to defining reliability and validity it was found useful to discuss approaches in terms of dependability. The interdependence between the concepts of reliability and validity means that increasing one tends to decrease the other. Dependability is a combination of the two, defined in this instance as the extent to which reliability is optimized while ensuring validity. This definition prioritizes validity, since a main reason for using teachers' assessment rather than depending entirely on tests for external summative assessment is to increase the construct validity of the assessment.
The main findings from the two systematic reviews of research on the use of teachers' assessment for summative purposes are given in Box 2.
- The extent to which the assessment tasks, and the criteria used in judging them, are specified is a key variable affecting dependability. Where neither tasks nor criteria are well specified, dependability is low.
- Detailed criteria, describing progressive levels of competency, have been shown to be capable of supporting reliable assessment by teachers.
- Tightly specifying tasks does not necessarily increase reliability and is likely to reduce validity by reducing the opportunity for a broad range of learning outcomes to be included.
- Greater dependability is found where there are detailed, but generic, criteria that allow evidence to be gathered from the full range of classroom work.
- Bias in teachers' assessments is generally due to teachers taking into account information about non-relevant aspects of students' behaviour, or being apparently influenced by gender, special educational needs, or the general or verbal ability of a student in judging performance in a particular task.
- Researchers claim that bias in teachers' assessment is susceptible to correction through focused workshop training.
materials and opportunities for learning available and, most importantly, making
clear the purposes and goals of the work.
Some examples of using assessment in this way are provided by Maxwell (2004) and Black et al. (2003). Maxwell describes the approach to assessment used in the Senior Certificate in Queensland, in which evidence is collected over time in a student portfolio, as progressive assessment. He states that

All progressive assessment necessarily involves feedback to the student about the quality of their (sic) performance. This can be expressed in terms of the student's progress towards desired learning outcomes and suggested steps for further development and improvement. . . .

For this approach to work, it is necessary to express the learning expectations in terms of common dimensions of learning (criteria). Then there can be discussion about whether the student is on-target with respect to the learning expectations and what needs to be done to improve performance on future assessment where the same dimensions appear. As the student builds up the portfolio of evidence of their performance, earlier assessment may be superseded by later assessment covering the same underlying dimensions of learning. The aim is to report where the student got to in their learning journey, not where they started or where they were on the average across the whole course. (Maxwell, 2004, pp. 2–3)
for improving learning. The use of computers makes this information available, in some cases instantly, so that it provides feedback for the learner and the teacher that can be used both formatively and summatively. In these cases the process of assessment itself begins to impact on performance; teaching and assessment begin to coalesce. Factors identified as values of using computers for learning then become equally factors of value for assessment. These include: speed of processing, which supports speed of learning; elements of motivation such as confidence, autonomy, self-regulation and enthusiasm, which support concentration and effort; ease of making revisions and improved presentation, which support quality of writing and other products; and information handling and organization, which support understanding (NCET, 1994).
Using formative assessment information for summative assessment
The approaches discussed above are linked to summative assessment as an occasional, if frequent, event. In between classroom tests, whether administered by computer or otherwise, there are innumerable other classroom events in which teachers gather information about the students by observing, questioning, listening to informal discussions among students, by reviewing written work and by using students' self-assessment (Harlen & James, 1997). In formative assessment this information may be used immediately to provide students with help or it may be stored and used to plan learning opportunities at a later stage. The information gathered in this way is often inconclusive and may be contradictory, for what students can do is likely to be influenced by the particular context. This variation, which would be a problem for summative assessment, is useful information for formative purposes, suggesting the contexts in which students can be helped to develop their ideas and skills. By definition, information gathered at this level of detail relates to all aspects of students' learning. It is valuable information that is well suited to deciding next steps for individual learners or groups. An important question is: can this rich but sometimes inconsistent information be used for summative assessment purposes as well as for formative assessment, for which it is so well suited? If not, then separate summative assessment will be necessary.
A positive answer to this question was given by Harlen & James (1997), who proposed that both purposes can be served providing that a distinction is made between the evidence and the interpretation of the evidence. For formative assessment the evidence is interpreted in relation to the progress of a student towards the goals of a particular piece of work, next steps being decided according to where a student has reached. The interpretation is in terms of what to do to help further learning, not what level or grade a student has reached. For this purpose it is important for teachers to have a view of progression in relation to the understanding and skills they are aiming for their students to achieve. The course of progression can be usefully expressed in terms of indicators, which serve the purpose both of focusing attention on relevant aspects of students' behaviour and of enabling teachers to see where
[Figure 1. Formative and summative assessment using the same evidence but different criteria]
reporting levels. In this process the change over time can be taken into account so that, as in the Queensland portfolio assessment, preference is given to evidence that shows progress during the period covered by the summative assessment. This process is similar to the one teachers are advised to use in arriving at their teacher assessment for reporting at the end of key stages in the National Curriculum assessment. The difference is that in the approach suggested here teachers have gathered information in ways suggested above (incorporating the key features of formative assessment) over the whole period of students' learning, and used it to help students with their learning.
The detailed indicators will map onto the broader criteria, as suggested in
Figure 1. The mapping will smooth out any misplacement of the detailed
indicators. But it is important not to see this mapping as a summation of
judgements about each indicator. Instead the evidence is re-evaluated against the
broader reporting criteria.
Conclusion
What do the research evidence reviewed and the arguments presented here have to say about whether teachers' summative assessment and assessment for learning need to be considered as distinct from each other, and about how they can be harmonized? There seems to be value in maintaining the distinction between formative and summative purposes of assessment while seeking synergy in the processes of assessment. These different purposes are real. One can conduct the same assessment and use it for different purposes, just as one can travel between two places for different purposes. As the purpose of a journey is the basis for evaluating its success, so the purpose of an assessment is the basis for evaluating whether that purpose has been achieved. If we fuse, or confuse, formative and summative purposes, experience strongly suggests that good assessment will come to mean good assessment of learning, not for learning.
It is suggested here that the synergy of formative and summative assessment
comes from making use of the same evidence for the two purposes. This can be,
as in the Queensland example, where work collected in the portfolio is used to
Black, P., Harrison, C., Lee, C., Marshall, B. & Wiliam, D. (2002) Working inside the black box (London, King's College London).
Black, P., Harrison, C., Lee, C., Marshall, B. & Wiliam, D. (2003) Assessment for learning: putting it into practice (Maidenhead, Open University Press).
Broadfoot, P., Pollard, A., Osborn, M., McNess, E. & Triggs, P. (1998) Categories, standards and instrumentalism: theorizing the changing discourse of assessment policy in English primary education, paper presented at the Annual Meeting of the American Educational Research Association, 13–17 April, San Diego, California, USA.
Carter, C. R. (1997) Assessment: shifting the responsibility, Journal of Secondary Gifted Education, 9(2), Winter 1997/8, 68–75.
Crooks, T. J. (1988) The impact of classroom evaluation practices on students, Review of Educational Research, 58, 438–81.
Cumming, J. & Maxwell, G. S. (2004) Assessment in Australian schools: current practice and trends, Assessment in Education, 11(1), 89–108.
Dweck, C. S. (1992) The study of goals in psychology, Psychological Science, 3, 165–7.
Gordon, S. & Rees, M. (1997) High-stakes testing: worth the price?, Journal of School Leadership, 7, 345–68.
Harlen, W. (2004a) A systematic review of the evidence of reliability and validity of assessment by teachers used for summative purposes (EPPI-Centre Review), Research Evidence in Education Library, issue 3 (London, EPPI-Centre, Social Science Research Unit, Institute of Education). Available at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_three.htm
Harlen, W. (2004b) A systematic review of the evidence of the impact on students, teachers and the curriculum of the process of using assessment by teachers for summative purposes (EPPI-Centre Review), Research Evidence in Education Library, issue 4 (London, EPPI-Centre, Social Science Research Unit, Institute of Education). Available at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_four.htm
Harlen, W. & Deakin Crick, R. (2002) A systematic review of the impact of summative assessment and tests on students' motivation for learning (EPPI-Centre Review), Research Evidence in Education Library, issue 1 (London, EPPI-Centre, Social Science Research Unit, Institute of Education). Available at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_one.htm
Harlen, W. & Deakin Crick, R. (2003a) Teaching and motivation for learning, Assessment in Education, 10(2), 169–208.
Harlen, W. & Deakin Crick, R. (2003b) A systematic review of the impact on students and teachers of the use of ICT for assessment of creative and critical thinking skills (EPPI-Centre Review), Research Evidence in Education Library, issue 2 (London, EPPI-Centre, Social Science Research Unit, Institute of Education). Available at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_two.htm
Harlen, W. & James, M. J. (1997) Assessment and learning: differences and relationships between formative and summative assessment, Assessment in Education, 4(3), 365–80.
Harlen, W. & Qualter, A. (2004) The teaching of science in primary schools (4th edn) (London, David Fulton).
Hutchinson, C. (2001) Assessment is for learning: the way ahead (Internal Policy Paper, Scottish Executive Education Department (SEED)).
Jackson, B. (1989) A comparison between computer-based and traditional assessment tests, and their effects on pupil learning and scoring, School Science Review, 69, 809–15.
Johnston, J. & McClune, W. (2000) Selection project sel 5.1: pupil motivation and attitudes: self-esteem, locus of control, learning disposition and the impact of selection on teaching and learning, in: The effects of the selective system of secondary education in Northern Ireland, Research Papers, Vol. II (Bangor, Co. Down, Department of Education), 1–37.