
Assessment, Learning and Judgement in Higher Education

Gordon Joughin
Editor

Editor
Gordon Joughin
University of Wollongong
Centre for Educational Development & Interactive Resources (CEDIR)
Wollongong NSW 2522
Australia
gordonj@uow.edu.au

ISBN: 978-1-4020-8904-6 e-ISBN: 978-1-4020-8905-3


DOI: 10.1007/978-1-4020-8905-3
Library of Congress Control Number: 2008933305

© Springer Science+Business Media B.V. 2009


No part of this work may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, electronic, mechanical, photocopying, microfilming, recording
or otherwise, without written permission from the Publisher, with the exception
of any material supplied specifically for the purpose of being entered
and executed on a computer system, for exclusive use by the purchaser of the work.

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com
In memory of Peter Knight
Preface

There has been a remarkable growth of interest in the assessment of student
learning and its relation to the process of learning in higher education over the
past ten years. This interest has been expressed in various ways – through large
scale research projects, international conferences, the development of principles
of assessment that supports learning, a growing awareness of the role of
feedback as an integral part of the learning process, and the publication of
exemplary assessment practices. At the same time, more limited attention has
been given to the underlying nature of assessment, to the concerns that arise
when assessment is construed as a measurement process, and to the role of
judgement in evaluating the quality of students’ work.
It is now timely to take stock of some of the critical concepts that underpin
our understanding of the multifarious relationships between assessment and
learning, and to explicate the nature of assessment as judgement. Despite the
recent growth in interest noted above, assessment in higher education remains
under-conceptualized. This book seeks to make a significant contribution to
conceptualizing key aspects of assessment, learning and judgement.
The book arose from the Learning-oriented Assessment Project (LOAP)
funded by the Hong Kong University Grants Committee, led by a team from
The Hong Kong Institute of Education and involving all ten of the higher
education institutions in Hong Kong between 2003 and 2006. LOAP initially
focused on assessment practices, with the goal of documenting and
disseminating practices that served explicitly to promote student learning.
This goal was achieved through conferences, symposia, and the publication of
two collections of learning-oriented assessment practices in English1 (Carless,
Joughin, Liu, & Associates, 2006) and Chinese2 (Leung & Berry, 2007). Along
with this goal, the project sought to reconceptualize the relationship between
assessment and learning, building on research conducted in the UK, the USA,
Europe, Asia and Australia, and drawing on leading assessment theorists in
higher education. The initial outcome of this was a Special Issue of the journal,
Assessment and Evaluation in Higher Education, one of the leading scholarly
publications in this field (Vol 31, Issue 4: ‘‘Learning-oriented assessment:
Principles and practice’’).

1. Carless, D., Joughin, G., Liu, N-F., & Associates. (2006). How assessment supports learning:
   Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
2. Leung, P., & Berry, R. (Eds.). (2007). Learning-oriented assessment: Useful practices. Hong
   Kong: Hong Kong University Press. (Published in Chinese)
In the final phase of the project, eight experts on assessment and learning in
higher education visited Hong Kong to deliver a series of invited lectures at The
Hong Kong Institute of Education. These experts, Tim Riordan from the USA,
David Boud and Royce Sadler from Australia, Filip Dochy from Belgium, and
Jude Carroll, Kathryn Ecclestone, Ranald Macdonald and Peter Knight from
the UK, also agreed to contribute a chapter to this book, following the themes
established in their lectures. Georgine Loacker subsequently joined Tim
Riordan as a co-author of his chapter, while Linda Suskie agreed to provide
an additional contribution from the USA. This phase of the project saw its
scope expand to include innovative thinking about the nature of judgement in
assessment, a theme particularly addressed in the contributions of Boud, Sadler
and Knight.
The sudden untimely death of Peter Knight in April 2007 was a great shock
to his many colleagues and friends around the world and a great loss to all
concerned with the improvement of assessment in higher education. Peter had
been a prolific and stimulating writer and a generous colleague. His colleague
and sometime collaborator, Mantz Yorke, generously agreed to provide the
chapter which Peter would have written. Mantz’s chapter appropriately draws
strongly on Peter’s work and effectively conveys much of the spirit of Peter’s
thinking, while providing Mantz’s own unique perspective.
My former colleagues at the Hong Kong Institute of Education, David
Carless and Paul Morris, the then President of the Institute, while not directly
involved in the development of this book, provided the fertile ground for its
development through their initial leadership of LOAP. I am also indebted to
Julie Joughin who patiently completed the original preparation and formatting
of the manuscript.

Wollongong, February 2008
Gordon Joughin
Contents

1  Introduction: Refocusing Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1
   Gordon Joughin

2  Assessment, Learning and Judgement in Higher Education: A Critical
   Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  13
   Gordon Joughin

3  How Can Practice Reshape Assessment? . . . . . . . . . . . . . . . . . . . . . . . . . . .  29
   David Boud

4  Transforming Holistic Assessment and Grading into a Vehicle
   for Complex Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  45
   D. Royce Sadler

5  Faulty Signals? Inadequacies of Grading Systems and a Possible
   Response . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  65
   Mantz Yorke

6  The Edumetric Quality of New Modes of Assessment:
   Some Issues and Prospects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  85
   Filip Dochy

7  Plagiarism as a Threat to Learning: An Educational Response . . . . . . . . . . 115
   Jude Carroll

8  Using Assessment Results to Inform Teaching Practice and Promote
   Lasting Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
   Linda Suskie

9  Instrumental or Sustainable Learning? The Impact of Learning
   Cultures on Formative Assessment in Vocational Education . . . . . . . . . . . . 153
   Kathryn Ecclestone

10 Collaborative and Systemic Assessment of Student Learning:
   From Principles to Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
   Tim Riordan and Georgine Loacker

11 Changing Assessment in Higher Education: A Model in Support
   of Institution-Wide Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
   Ranald Macdonald and Gordon Joughin

12 Assessment, Learning and Judgement: Emerging Directions . . . . . . . . . . . . 215
   Gordon Joughin

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229


Contributors

About The Editor


Gordon Joughin is Coordinator of Academic Development in the Centre for
Educational Development and Interactive Resources at the University of
Wollongong, Australia. He has worked in several Australian universities as
well as the Hong Kong Institute of Education where he directed the final phase
of the Hong Kong wide Learning Oriented Assessment Project. His research
and recent writing has focused on the relationship between learning and
assessment, with a special emphasis on oral assessment. His most recent book
(with David Carless, Ngar-Fun Liu and Associates) is How Assessment
Supports Learning: Learning-oriented Assessment in Action (Hong Kong
University Press). He is a member of the Executive Committee of the Higher
Education Research and Development Society of Australasia and a former
President of the Association’s Hong Kong Branch.

About The Authors


David Boud is Dean of the University Graduate School and Professor of Adult
Education at the University of Technology, Sydney. He has been President of
the Higher Education Research and Development Society of Australasia and is
a Carrick Senior Fellow. He has written extensively on teaching, learning and
assessment in higher and professional education, and more recently on
workplace learning. In the area of assessment he has been a pioneer in
developing learning-centred approaches to assessment and has particularly
focused on self-assessment (Enhancing Learning through Self Assessment,
Kogan Page, 1995) and building assessment skills for long-term learning
(Rethinking Assessment in Higher Education: Learning for the Longer Term,
Routledge, 2007). His work can be accessed at www.davidboud.com.
Jude Carroll is the author of A Handbook for Deterring Plagiarism in
Higher Education (Oxford Centre for Staff and Learning Development,
2007, 2nd edition) and works at Oxford Brookes University in the UK. She is
Deputy Director of the Assessment Standards Knowledge Exchange (ASKe),
a Centre for Excellence in Learning and Teaching focused on ensuring
students understand assessment standards. She researches, writes and delivers
workshops about student plagiarism and about teaching international students
in ways that improve learning for all students.
Filip Dochy is Professor of Research on Teaching and Training and Corporate
Training, based jointly in the Centre for Educational Research on Lifelong
Learning and Participation and the Centre for Research on Teaching and
Training, at the University of Leuven, Belgium. He is a past president of
EARLI, the European Association for Research on Learning and Instruction,
and the Editor of Educational Research Review, the official journal of EARLI.
His current research focuses on training, team learning, teacher education and
higher education, assessment, and corporate learning, and he has published
extensively on these themes.
Kathryn Ecclestone is Professor in Education at the Westminster Institute of
Education at Oxford Brookes University. She has worked in post-compulsory
education for the past 20 years as a practitioner in youth employment schemes
and further education and as a researcher specialising in the principles, politics
and practices of assessment and its links to learning, motivation and autonomy.
She has a particular interest in socio-cultural approaches to understanding the
interplay between policy, practice and attitudes to learning and assessment.
Kathryn is a member of the Assessment Reform Group and the Access to Higher
Education Assessment working group for the Quality Assurance Agency. She is
on the Editorial Board of Studies in the Education of Adults and is books review
editor for the Journal of Further and Higher Education. She has published a
number of books and articles on assessment in post-compulsory education.
Georgine Loacker is Senior Assessment Scholar at Alverno College, Milwaukee,
USA. While heading the English Department, she participated in the
development of the ability-based education and student assessment process
begun there in 1973 and now internationally recognized. She continues to
contribute to assessment theory through her writing and research on
assessment of the individual student. She has conducted workshops and
seminars on teaching for and assessing student learning outcomes throughout
the United States and at institutions in the UK, Canada, South Africa, Costa
Rica and in Australia and New Zealand as a Visiting Fellow of the Higher
Education Research and Development Society of Australasia.
Ranald Macdonald is Professor of Academic Development and Head of
Strategic Development in the Learning and Teaching Institute at Sheffield
Hallam University. He has been responsible for leading on policy and
practice aspects of assessment, plagiarism, learning and teaching. A previous
Co-chair of the UK’s Staff and Educational Development Association, Ranald
is currently Chair of its Research Committee and a SEDA Fellowship holder.
He was awarded a National Teaching Fellowship in 2005 and was a Visiting
Fellow at Otago University, New Zealand in 2007. Ranald’s current research and
development interests include scholarship and identity in academic development,
enquiry-focused learning and teaching, assessment and plagiarism, and the
nature of change in complex organisations. He is a keen orienteer, gardener
and traveller, and combines these with work as much as possible!
Tim Riordan is Associate Vice President for Academic Affairs and Professor
of Philosophy at Alverno College. He has been at Alverno since 1976 and in
addition to his teaching has been heavily involved in developing programs and
processes for teaching improvement and curriculum development. In addition
to his work at the college, he has participated in initiatives on the scholarship
of teaching, including the American Association for Higher Education Forum
on Exemplary Teaching and the Association of American Colleges and
Universities’ Preparing Future Faculty Project. He has regularly presented at
national and international conferences, consulted with a wide variety of
institutions, and written extensively on teaching and learning. He is co-editor
of the 2004 Stylus publication, Disciplines as Frameworks for Student Learning:
Teaching the Practice of the Disciplines. He was named the Marquette
University School of Education Alumnus of the Year in 2002, and he received
the 2001 Virginia B. Smith Leadership Award sponsored by the Council for
Adult and Experiential Learning and the National Center for Public Policy and
Higher Education.
D. Royce Sadler is a Professor of Higher Education at the Griffith Institute for
Higher Education, Griffith University in Brisbane, Australia. He researches the
assessment of student achievement, particularly in higher education. Specific
interests include assessment theory, methodology and policy; university grading
policies and practice; formative assessment; academic achievement standards
and standards-referenced assessment; testing and measurement; and assessment
ethics. His research on formative assessment and the nature of criteria and
achievement standards is widely cited. He engages in consultancies for
Australian and other universities on assessment and related issues. A member
of the Editorial Advisory Boards for the journals Assessment in Education, and
Assessment and Evaluation in Higher Education, he also reviews manuscripts on
assessment for other journals.
Linda Suskie is a Vice President at the Middle States Commission on Higher
Education, an accreditor of colleges and universities in the mid-Atlantic region
of the United States. Prior positions include serving as Director of the American
Association for Higher Education’s Assessment Forum. Her over 30 years of
experience in college and university administration include work in assessment,
institutional research, strategic planning, and quality management.
Ms. Suskie is an internationally recognized speaker, writer, and consultant
on a broad variety of higher education assessment topics. Her latest book is
Assessment of Student Learning: A Common Sense Guide (Jossey-Bass Anker
Series, 2004).
Mantz Yorke is currently Visiting Professor in the Department of Educational
Research, Lancaster University. He spent nine years in schools and four in
teacher education at Manchester Polytechnic before moving into staff
development and educational research. He then spent six years as a senior
manager at Liverpool Polytechnic, after which he spent two years on
secondment as Director of Quality Enhancement at the Higher Education
Quality Council. He returned to his institution as Director of the Centre for
Higher Education Development, with a brief to research aspects of institutional
performance, in particular that relating to ‘the student experience’. He has
published widely on higher education, his recent work focusing on
employability, student success, and assessment.
Chapter 1
Introduction: Refocusing Assessment

Gordon Joughin

The complexity of assessment, both as an area of scholarly research and as a
field of practice, has become increasingly apparent over the past 20 years.
Within the broad domain of assessment, assessment and learning has emerged
as a prominent strand of research and development at all levels of education. It
is perhaps no coincidence that, as the role of assessment in learning has moved
to the foreground of our thinking about assessment, a parallel shift has
occurred towards the conceptualisation of assessment as the exercise of profes-
sional judgement and away from its conceptualisation as a form of quasi-
measurement. Assessment, learning and judgement have thus become central
themes in higher education. This book represents an attempt to add clarity to
the discussion, research and emerging practices relating to these themes.
In seeking to develop our understanding of assessment, learning and judge-
ment, the emphasis of this book is conceptual: its aim is to challenge and extend
our understanding of some critical issues in assessment whilst remaining
mindful of how assessment principles are enacted in practice. However, under-
standing, while a worthwhile end in itself, becomes useful only when it serves
the purpose of improvement. Hence an underlying thread of the book is change.
Each chapter proposes changes to the ways in which we think about assessment,
and each suggests, explicitly or by implication, important changes to our
practices, whether at the subject or course level, the level of the university, or
the level of our higher education systems as a whole.
The concepts of assessment, learning and judgement draw together the three
core functions of assessment. While assessment can fulfil many functions,1 three
predominate: supporting the process of learning; judging students’ achievement in
relation to course requirements; and maintaining the standards of the profession
or discipline for which students are being prepared. Each of these is important,
with each having particular imperatives and each giving rise to particular issues
of conceptualisation and implementation.

1. Brown, Bull and Pendlebury (1997), for example, list 17.

Assessment and Learning


There are numerous ways in which assessment may be seen to promote learning
and which have now been well rehearsed in the higher education literature (see,
for example, Carless, Joughin, Liu, & Associates, 2006; Gibbs & Simpson,
2004–2005). Four of these are particularly pertinent to this book, with authors
offering insights into aspects of assessment and learning that are especially
problematic.
The first is through the design of assessment tasks as learning tasks, so that
the process of meeting assessment requirements requires students to engage in
processes expected to lead to lasting learning of a worthwhile kind. Sitting for
an examination or completing a multiple-choice test will not lead to such
learning; working on a project over a period of time, engaging in field work,
or producing a carefully constructed essay can well be learning processes which
also result in products that provide information about the student’s knowledge
and skills. Assessment that engages students in the process of learning needs to
involve tasks for which responses need to be created by the student, rather than
responses that fall to hand from existing texts, a quick search of the Internet, or
from the work of fellow students.
The second and most commonly referred to way in which assessment can
promote learning is through feedback, where feedback is defined as a process of
identifying gaps between actual and desired performance, noting ways of brid-
ging those gaps, and then having students take action to bridge the gaps.
Feedback thus conceived is a moderately complex process of learning which is
often difficult to enact to the satisfaction of students or teachers.
The third way in which assessment can support learning is through the
development of students’ capacity to evaluate the quality of their own work
while they are undertaking assessment tasks, a process most clearly articu-
lated by Sadler (particularly influentially in Sadler (1989) and interestingly
extended in this book) and Boud (2007 and Chapter 3 of this book).
Assessment thus functions to help students learn about assessment itself,
particularly as it pertains to their own work, since evaluating and improving
one’s work becomes an essential requirement of their successful future
practice.
A fourth way involves the use of assessment results to inform teaching, and
thus, indirectly, to improve student learning. While this is a function of assess-
ment that is often acknowledged in higher education, how it can be operatio-
nalised is rarely addressed.

Assessment and Judgement

Assessment as judging achievement draws attention to the nature of assessment
as the exercise of professional judgment, standing in contrast to misplaced
notions of assessment as measurement. As Peter Knight pointed out,
‘‘measurement theory expects real and stable objects, not fluctuating and con-
tested social constructs’’ (Knight, 2007a, p. 77). One of the contributors to this
book, David Boud, previously has argued persuasively that assessment is best
reframed in terms of informing judgement, that is, ‘‘the capacity to evaluate
evidence, appraise situations and circumstances astutely, to draw sound con-
clusions and act in accordance with this analysis’’ (Boud, 2007, p. 19). Elsewhere
Knight (2007b) noted a set of increasingly important learning outcomes that
not only defy measurement but are extraordinarily difficult to judge, namely the
wicked competencies associated with graduate attributes or generic learning
outcomes.
The process of making judgements, and the bases on which they are made,
are often considered to be unproblematic: experienced teachers apply pre-
determined criteria based on required course outcomes to allocate marks and
determine grades. This treatment of student work is strongly challenged in this
book, particularly by Sadler when he addresses the nature of expert judgement
as essentially holistic.
When assessment’s purpose is defined as judging achievement, the achieve-
ment referred to is usually internally referent, that is, it is concerned with
performance associated with the content and goals of the university course,
often defined as learning objectives or intended learning outcomes. With this
internal reference point, the responsibility of the academic to his or her depart-
ment and its programmes is to the fore.
Assessment as maintaining professional or disciplinary standards moves the
focus of assessment outside the course and locates it in the world of profes-
sional or disciplinary practice. When assessment is construed as a means of
protecting and promoting practice standards, assessment methods need to
move towards the forms of ongoing assessment that are found in the work-
place. This is a challenge for many university teachers, since it locates assess-
ment beyond the sphere of academia. Where a course has a significant
practical placement, this step may be only slightly less difficult. With a focus
on professional standards, the identification of the academic with his or her
discipline or profession is to the fore, while responsibility to the institution
moves to the background.
The maintenance of standards also raises questions about the quality stan-
dards of assessment practices themselves. Employers and professional bodies,
as well as universities, need confidence in the abilities of graduates as certified
by their awards.
When assessment has been conceived of as an act of measurement with
an almost exclusive purpose of determining student achievement, traditional
standards of testing and measurement have applied. High stakes assessment
practices in the final year or years of high school often find their way into
university, with standards for reliability, validity and fairness, such as the
Standards for Educational and Psychological Testing developed by the Amer-
ican Educational Research Association, the American Psychological Associa-
tion and the National Council on Measurement in Education (1999) being
seen as essential if assessment is to be credible to its various stakeholders. Such
standards have been developed within a measurement paradigm, with an
almost exclusive emphasis on measuring achievement such that not only is
learning not seen as a function of assessment, but little or no concession is
made to the consequential effects of assessment on how students go about
learning.
Assessment practices designed to promote learning, while seeking to balance
assessment’s learning and judging functions, call not for the simple abandon-
ment of traditional standards from the measurement paradigm but for the
rethinking of standards that address assessment’s multiple functions in a
more compelling way.

Assessment and Change

We talk and write about assessment because we seek change, yet the process
of change is rarely addressed in work on assessment. Projects to improve
assessment in universities will typically include references to dissemination
or embedding of project outcomes, yet real improvements are often limited
to the work of enthusiastic individuals or to pioneering courses with ener-
getic and highly motivated staff. Examples of large scale successful initia-
tives in assessment are difficult to find. On the other hand, the exemplary
assessment processes of Alverno College continue to inspire (and senior
Alverno staff have contributed to this book) and the developing conceptua-
lisation of universities as complex adaptive systems promises new ways of
understanding and bringing about realistic change across a university
system.
Assessment is far from being a technical matter, and the improvement of
assessment requires more than a redevelopment of curricula. Assessment occurs
in the context of complex intra- and interpersonal factors, including teachers’
conceptions of and approaches to teaching; students’ conceptions of and
approaches to learning; students’ and teachers’ past experiences of assessment;
and the varying conceptions of assessment held by teachers and students
alike. Assessment is also impacted by students’ varying motives for entering
higher education and the varying demands and priorities of teachers who
typically juggle research, administration and teaching within increasingly
demanding workloads. Pervading assessment are issues of power and com-
plex social relationships.

Re-Focusing Assessment

This book does not propose a radical re-thinking of assessment, but it does call
for a re-focusing on educationally critical aspects of assessment in response to
the problematic issues raised across these themes by the contributors to this
book. The following propositions which emerge from the book’s various chap-
ters illustrate the nature and scope of this refocusing task:
• Frequently cited research linking assessment and learning is often misunder-
  stood and may not provide the sure foundations for action that are fre-
  quently claimed.
• Assessment is principally a matter of judgement, not measurement.
• The locus of assessment should lie beyond the course and in the world of
  practice.
• Criteria-based approaches can distort judgement; the use of holistic
  approaches to expert judgement needs to be reconsidered.
• Students need to become responsible for more aspects of their own assess-
  ment, from evaluating and improving assignments to making claims for their
  acquisition of complex generic skills across their whole degree.
• Traditional standards of validity, reliability and fairness break down in the
  context of new modes of assessment that support learning; new approaches
  to quality standards for assessment are required.
• Plagiarism is a learning issue; assessment tasks that can be completed without
  active engagement in learning cannot demonstrate that learning has occurred.
• Students entering higher education come from distinctive and powerful
  learning and assessment cultures in their prior schools and colleges. These
  need to be acknowledged and accommodated as students are introduced to
  the new learning and assessment cultures of higher education.
• Universities are complex adaptive systems and need to be understood as such
  if change is to be effective. Moreover, large scale change depends on gen-
  erating consensus on principles rather than prescribing specific practices.

The Structure of the Book

The themes noted above of assessment, learning, judgement, and the imperatives
for change associated with them, are addressed in different ways throughout the
book, sometimes directly and sometimes through related issues. Moreover,
these themes are closely intertwined, so that, while some chapters of this book
focus on a single theme, many chapters address several of them, often in quite
complex ways.
In Chapter 2, Joughin provides a short review of four matters that are at the
heart of our understanding of assessment, learning and judgement and which
for various reasons are seen as highly problematic. The first of these concerns
how assessment is often defined in ways that either under-represent or conflate
the construct, leading to the suggestion that a simple definition of assessment
may facilitate our understanding of assessment itself and its relationship to
other constructs. The chapter then addresses the empirical basis of two propo-
sitions about assessment and learning that have become almost articles of
belief in assessment literature: that assessment drives student learning and
that students’ approaches to learning can be improved by changing assessment.
Finally, research on students’ experience of feedback is contrasted with the
prominence accorded to feedback in learning and assessment theory.
Three chapters – those of Boud, Sadler and Yorke – address problematic
issues of judgement and the role of students in developing the capacity to
evaluate their own work.
Boud’s chapter, How Can Practice Reshape Assessment, begins this theme.
Boud seeks to move our point of reference for assessment from the world
of teaching and courses to the world of work where assessment is part of day-
to-day practice as we make judgements about the quality of our own work,
typically in a collegial context where our judgements are validated. Boud
argues that focusing on what our graduates do once they commence work,
specifically the kinds of judgements they make and the contexts in which these
occur, can inform how assessment is practised within the educational
institution.
Boud’s contribution is far more than a plea for authentic assessment. Firstly,
it represents a new perspective on the purpose of assessment. As noted earlier in
this chapter, assessment is usually considered to have three primary functions –
to judge achievement against (usually preset) educational standards; to pro-
mote learning; and to maintain standards for entry into professional and other
fields of practice – as well as a number of subsidiary ones. Boud foregrounds
another purpose, namely to develop students’ future capacity to assess their
own work beyond graduation. We are well accustomed to the notion of learning
to learn within courses as a basis for lifelong learning in the workplace and
beyond. Boud has extended this in a remarkable contribution to our under-
standing. Along the way he provides useful insights into the nature of practice,
argues for conceptualising assessment as informing judgement, and highlights
apprenticeship as a prototype of the integration of assessment and feedback in
daily work. He concludes with a challenge to address 10 assessment issues
arising from the practice perspective.
Sadler, in his chapter on Transforming Holistic Assessment and Grading into a
Vehicle for Complex Learning, issues a provocative but timely challenge to one
of the current orthodoxies of assessment, namely the use of pre-determined
criteria to evaluate the quality of students’ work. Sadler brings a strong sense
of history to bear, tracing the progressive development of analytic approaches
to Edmund Burke in 1759, noting literature in many fields, and specifically
educational work over at least forty years. This chapter represents a significant
extension of Sadler’s previous seminal work on assessment (see especially
Sadler, 1983, 1989, the latter being one of the most frequently cited works on
formative assessment). In his earlier work, Sadler established the need for
students to develop a capacity to judge the quality of their work similar to that
of their teacher, using this capacity as a basis to improve their work while it is
under development. Sadler’s support for this parallels Boud’s argument regard-
ing the need for practice to inform assessment, namely that this attribute is
essential for students to perform adequately in the world of their future work.
Sadler builds on his earlier argument that this entails students developing a
conception of quality as a generalized attribute, since this is how experts,
including experienced teachers, make judgements involving multiple criteria.
Consequently he argues here that students need to learn to judge work holi-
stically rather than analytically. This argument is based in part on the limita-
tions of analytic approaches, in part on the nature of expert judgement, and it
certainly requires students to see their work as an integrated whole. Fortunately
Sadler not only argues the case for students’ learning to make holistic judge-
ments but presents a detailed proposal for how this can be facilitated.
Yorke’s chapter, Faulty Signals? Inadequacies of Grading Systems and a
Possible Response, is a provocative critique of one of the bases of assessment
practices around the world, namely the use of grades to summarise student
achievement in assessment tasks. Yorke raises important questions about the
nature of judgement involved in assessment, the nature of the inferences made
on the basis of this judgement, the distortion which occurs when these infer-
ences come to be expressed in terms of grades, and the limited capacity of grades
to convey to others the complexity of student achievement.
Yorke points to a number of problems with grades, including the basing of
grades on students’ performance in relation to a selective sampling of the
curriculum, the wide disparity across disciplines and universities in types of
grading scales and how they are used, the often marked differences in distribu-
tions of grades under criterion- and norm-referenced regimes, and the ques-
tionable legitimacy of combining grades derived from disparate types of
assessment and where marks are wrongly treated as lying on an interval scale.
Grading becomes even more problematic when what Knight (2007a, b) termed
wicked competencies are involved – when real-life problems are the focus of
assessment and generic abilities or broad-based learning outcomes are being
demonstrated. Yorke proposes a radical solution – making students responsible
for preparing a claim for their award based on evidence of their achievements
from a variety of sources, including achievement in various curricular compo-
nents, work placements and other learning experiences. Such a process would
inevitably make students active players in learning to judge their own achieve-
ments and regulate their own learning accordingly.
When assessment practices are designed to promote learning, certain
consequences ensue. Dochy, Carroll and Suskie examine three of these.
Dochy, in The Edumetric Quality of New Modes of Assessment: Some Issues
and Prospects, notes that when assessment focuses on learning and moves away
from a measurement paradigm, the criteria used to evaluate the quality of the
assessment practices need to change. Carroll links learning directly to the need
to design assessment that will deter plagiarism. Suskie describes how the careful
analysis of assessment outcomes can be a powerful tool for further learning
through improved teaching.
Dochy begins his chapter by overviewing the negative effects that traditional
forms of high stakes assessment can have on students’ motivation and self-
esteem, on teachers’ professionalism, and, perhaps most importantly, on limit-
ing the kinds of learning experiences to which students are exposed. He quickly
moves on to describe the current assessment culture in higher education which
he depicts as strongly learning oriented, emphasizing the integration of assess-
ment and instruction and using assessment as a tool for learning. This is a
culture in which students are critically involved, sharing responsibility for such
things as the development of assessment criteria.
Most important for Dochy is the basing of new modes of assessment on
complex, real life problems or authentic replications of such problems. Such
forms of assessment challenge traditional ways of characterising the quality of
assessment in terms of validity and reliability, so that these constructs need to
be reframed. He argues that classic reliability theory should be replaced by
‘generalisability theory’ which is concerned with how far judgements of com-
petence in relation to one task can be generalised to other tasks. Dochy draws
on Messick’s well known conceptualization of validity in terms of content,
substantive, structural, external, generalizability, and consequential aspects,
interpreting and expanding these by considering transparency, fairness, cogni-
tive complexity, authenticity, and directness of assessment based on immediate
holistic judgements. It is this reconceptualisation of criteria for evaluating new
modes of assessment that leads to Dochy’s emphasis on the edumetric rather
than the psychometric qualities of assessment.
Carroll’s chapter on Plagiarism as a Threat to Learning: An Educational
Response explores the relationship between plagiarism and learning, arguing
from a constructivist perspective the need for students to transform, use or
apply information, rather than reproduce information, if learning is to occur –
plagiarised work fails to show that students have done this work of making
meaning and consequently indicates that the process of learning has been
bypassed. Interestingly, Carroll links students’ capacity to understand the
problem with plagiarism to Perry’s stages of intellectual and ethical develop-
ment (Perry, 1970), with students at a dualistic stage having particular difficulty
in seeing the need to make ideas their own rather than simply repeating the
authoritative words of others, while students at the relativistic stage are more
likely to appreciate the nature of others’ work and the requirements for dealing
with this according to accepted academic principles. The chapter concludes
with a series of questions to be addressed in setting assessment tasks that will
tend to engage students in productive learning processes while also inhibiting
them from simply finding an answer somewhere and thus avoiding learning.
In her chapter on Using Assessment Results to Inform Teaching Practice and
Promote Lasting Learning, Suskie lists thirteen conditions under which students
learn most effectively. These then become the basis for elucidating the ways in
which assessment can promote what she terms ‘‘deep, lasting learning’’. Given
the limitations to assessment as a means of positively influencing students’
approaches to learning noted in Chapter 1, Suskie’s location of assessment
principles in broader pedagogical considerations is an important qualification.
The use of assessment to improve teaching is often suggested as an important
function of assessment, but how this can occur in practice is rarely explained.
Suskie suggests that a useful starting point is to identify the decisions about
teaching that an analysis of assessment results might inform, then clarify the
frame of reference within which this analysis will occur. The latter could be
based on identifying students’ strengths and weaknesses, the improvement in
results over the period of the class or course, how current students’ performance
compares with previous cohorts, or how current results compare to a given set
of standards. Suskie then elucidates a range of practical methods for summar-
ising and then interpreting results in ways that will suggest foci for teacher
action.
While most of the chapters in this book are strongly student oriented,
Ecclestone’s chapter provides a particularly sharp focus on students’ experience
of assessment before they enter university and the consequences this may have
for their engagement in university study and assessment. Ecclestone notes that
how students in higher education respond to assessment practices explicitly
aimed at encouraging engagement with learning and developing motivation
will be influenced by the learning cultures they have experienced before entering
higher education. Consequently university teachers seeking to use assessment to
promote learning need to understand the expectations, attitudes and practices
regarding assessment that students bring with them based on their experience of
assessment and learning in schools or in the vocational education sector.
Ecclestone’s chapter provides critical insights into motivation and the
concept of a learning culture before exploring the learning cultures of two
vocational courses, with emphases on formative assessment and motivation.
The research reported by Ecclestone supports the distinction between the spirit
and the letter of assessment for learning, with the former encouraging learner
engagement and what Ecclestone terms sustainable learning, while the latter is
associated with a more constrained and instrumental approach to learning.
Two chapters deal with institution level change – Riordan and Loacker
through an analysis of system level processes at Alverno College, and Macdo-
nald and Joughin through proposing a particular conceptual understanding of
university systems.
While Yorke raises serious issues regarding the assessment of complex
learning outcomes, Riordan and Loacker approach this without reservation
in their chapter on Collaborative and Systemic Assessment of Student Learning:
From Principles to Practice. Perhaps buoyed by a long experience of successfully
developing an approach to education grounded in Alverno’s ability-based
curriculum, they present a set of six tightly integrated principles that have
come to underpin the Alverno approach to assessment. Some of these principles
are clearly articulated statements that reinforce the assertions of other contri-
butors to this book. Thus, along with Sadler and Boud, they emphasize the
importance of self-assessment at college as a precursor to self-assessment as
a practicing graduate. And like Dochy, they emphasize the importance of
performance assessment based on the kinds of contexts students will face later
in their working lives. However, in contrast to Sadler, they locate criteria at the
heart of the process of feedback, using it to help students develop their sense of
their learning in action and to develop a language for the discussion of their
performance.
Each of the preceding chapters has proposed change. In suggesting new
ways of thinking about assessment, learning and judgement, they have also
argued for new practices. In most cases the changes advocated are not incre-
mental but rather call for significant changes at the level of programmes,
institutions or even more broadly across the higher education sector. Macdo-
nald and Joughin’s chapter on Changing Assessment in Higher Education: A
Model in Support of Institution-Wide Improvement consequently addresses
change in the context of universities as complex adaptive systems, drawing on
the insights of organisational and systems theorists to propose a model of
universities that will contribute to the support of institution-wide change in
assessment, based on participation, conversation, unpredictability, uncertainty
and paradox.
The concluding chapter outlines a series of progressions in our thinking
about assessment which arise from the preceding chapters. These progressions
in turn give rise to a set of challenges, each far from trivial, which could well set a
constructive agenda for theorizing, researching, and acting to improve assess-
ment. This agenda would include the ongoing exploration of central tenets of
assessment; renewing research into taken-for-granted beliefs about the influ-
ence of assessment on learning; realigning students’ and teachers’ roles in
assessment; and applying emerging understandings of universities as complex
adaptive systems to the process of improving assessment across our higher
education institutions.

References
American Educational Research Association, the American Psychological Association, & the
National Council on Measurement in Education (1999). Standards for Educational and
Psychological Testing. Washington DC: American Educational Research Association.
Angelo, T. (1999). Doing assessment as if learning matters most. AAHE Bulletin, 51(9), 3–6.
Boud, D. (2007). Reframing assessment as if learning were important. In D. Boud &
N. Falchikov (Eds), Rethinking assessment for higher education: Learning for the longer
term (pp. 14–25). London: Routledge.
Brown, G., Bull, J., & Pendlebury, M. (1997). Assessing student learning in higher education.
London: Routledge.
Carless, D., Joughin, G., Liu, N.F., & Associates (2006). How assessment supports learning:
Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
Gibbs, G., & Simpson, C. (2004–5). Conditions under which assessment supports students’
learning. Learning and Teaching in Higher Education, 1, 3–31.
Joughin, G. (2004, November). Learning oriented assessment: A conceptual framework. Paper
presented at Effective Teaching and Learning Conference, Brisbane, Australia.
Joughin, G., & Macdonald, R. (2004). A model of assessment in higher education institutions.
The Higher Education Academy. Retrieved September 11, 2007, from http://www.heacademy.ac.uk/resources/detail/id588_model_of_assessment_in_heis
Knight, P. (2007a). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.),
Rethinking assessment in higher education (pp. 72–86). Abingdon and New York: Routledge.
Knight, P. (2007b). Fostering and assessing ‘wicked’ competencies. The Open University.
Retrieved November 5, 2007, from http://www.open.ac.uk/cetl-workspace/cetlcontent/documents/460d1d1481d0f.pdf
Perry, W. (1970). Forms of intellectual and ethical development in the college years: A scheme.
New York: Holt.
Sadler, R. (1989). Formative assessment and the design of instructional systems. Instructional
Science, 18(2), 119–144.
Chapter 2
Assessment, Learning and Judgement in Higher
Education: A Critical Review

Gordon Joughin

Introduction

The literature on assessment in higher education is now so vast that a
comprehensive review of it would be ambitious under any circumstances. The
goals of this chapter are therefore modest: it offers a brief review of four central
concepts regarding assessment, learning and judgement that are considered
problematic because they are subject to conceptual confusion, lack the clear
empirical support which is often attributed to them, or give rise to contra-
dictions between their theoretical explication and actual practice. The first of
these concepts concerns how assessment is defined – the chapter thus begins by
proposing a reversion to a simple definition of assessment and noting the
relationships between assessment and learning in light of this definition. The
second concept concerns the axiom that assessment drives student learning.
The third concept concerns the widely held view that students’ approaches to
learning can be improved by changing assessment. The fourth and final matter
addressed is the contradiction between the prominence of feedback in theories
of learning and its relatively impoverished application in practice.

Towards a Simple Definition of Assessment

Any discussion of assessment in higher education is complicated by two factors.
Firstly, assessment is a term that is replete with emotional associations, so that
any mention of assessment can quickly become clouded with reactions that
inhibit profitable discussion. The statement that ‘‘assessment often engenders
strong emotions’’ (Carless, Joughin, Liu & Associates, 2006, p. 2) finds particular
support in the work of Boud who has pointed out that ‘‘assessment probably
provokes more anxiety among students and irritation among staff than any other
feature of higher education’’ (Boud, 2006, p. xvii), while Falchikov and Boud’s
recent work on assessment and emotion has provided graphic descriptions of the
impact of prior assessment experiences on a group of teachers completing a
masters degree in adult education (Falchikov & Boud, 2007). In biographical
essays, these students reported few instances of positive experiences of assessment
in their past education, but many examples of bad experiences which had major
impacts on their learning and self-esteem. Teachers in higher education will be
affected not only by their own past experience of assessment as students, but
also by their perceptions of assessment as part of their current teaching role.
Assessment in the latter case is often associated with high marking loads, anxiety-
inducing deadlines as examiners board meetings approach, and the stress of
dealing with disappointed and sometimes irate students.
There is a second difficulty in coming to a clear understanding of assessment.
As Rowntree has noted, ‘‘it is easy to get fixated on the trappings and outcomes of
assessment – the tests and exams, the questions and marking criteria, the grades
and degree results – and lose sight of what lies at the heart of it all’’ (Rowntree,
2007, para. 6). Thus at an individual, course team, department or even university
level, we come to see assessment in terms of our own immediate contexts,
including how we view the purpose of assessment, the roles we and our students
play, institutional requirements, and particular strategies that have become part
of our taken-for-granted practice. Our understanding of assessment, and thus
how we come to define it, may be skewed by our particular contexts.
Space does not permit the exploration of this through a survey of the
literature and usage of the term across universities. Two definitions will suffice
to illustrate the issue and highlight the need to move towards a simple definition
of assessment.
Firstly, an example from a university policy document. The University of
Queensland, Australia, in its statement of assessment policy and practice,
defines assessment thus:
Assessment means work (e.g., examination, assignment, practical, performance) that a
student is required to complete for any one or a combination of the following reasons:
the fulfillment of educational purposes (for example, to motivate learning, to provide
feedback); to provide a basis for an official record of achievement or certification of
competence; and/or to permit grading of the student. (The University of Queensland,
2007)

Here we might note that (a) assessment is equated with the student’s work; (b) there
is no reference to the role of the assessor or to what is done in relation to the work;
and (c) the various purposes of assessment are incorporated into its definition.
Secondly, a definition from a frequently cited text. Rowntree defines assess-
ment thus:
... assessment can be thought of as occurring whenever one person, in some kind of
interaction, direct or indirect, with another, is conscious of obtaining and interpreting
information about the knowledge and understanding, or abilities and attitudes of that
other person. To some extent or other it is an attempt to know that person. (Rowntree,
1987, p. 4)

In this instance we might note the reference to ‘‘interaction’’ with its implied
reciprocity, and that assessment is depicted as the act of one person in relation
to another, thereby unintentionally excluding self-assessment and the student’s
monitoring of the quality of his or her own work.
Both of these definitions have been developed thoughtfully, reflect impor-
tant aspects of the assessment process, and have been rightly influential in their
respective contexts of institutional policy and international scholarship. Each,
however, is problematic as a general definition of assessment, either by going
beyond the meaning of assessment per se, or by not going far enough to specify
the nature of the act of assessment. Like many, perhaps most, definitions of
assessment, they incorporate or omit elements in a way that reflects particular
contextual perspectives, giving rise to the following question: Is it possible to
posit a definition of assessment that encapsulates the essential components
of assessment without introducing superfluous or ancillary concepts? The
remainder of this section attempts to do this.
One difficulty with assessment as a term in educational contexts is that its
usage often departs from how the term is understood in everyday usage.
The Oxford English Dictionary (2002-) is instructive here through its location
of educational assessment within the broader usage of the term. Thus it
defines ‘‘to assess’’ as ‘‘to evaluate (a person or thing); to estimate (the quality,
value, or extent of), to gauge or judge’’ and it defines assessment in education
as ‘‘the process or means of evaluating academic work’’. These two definitions
neatly encompass two principal models of assessment: where assessment is
conceived of quantitatively in terms of ‘‘gauging’’ the ‘‘extent of’’ learning,
assessment follows a measurement model; where it is construed in terms of
‘‘evaluation’’, ‘‘quality’’, and ‘‘judgement’’, it follows a judgement model.
Hager and Butler (1996) have explicated the distinctions between these para-
digms very clearly (see also Boud in Chapter 3). They describe a scientific
measurement model in which knowledge is seen as objective and context-free
and in which assessment tests well-established knowledge that stands
apart from practice. In this measurement model, assessment utilises closed
problems with definite answers. In contrast, the judgement model integrates
theory and practice, sees knowledge as provisional, subjective and context-
dependent, and uses practice-like assessment which includes open problems
with indefinite answers. Knight (2007) more recently highlighted the impor-
tance of the distinction between measurement and judgement by pointing out
the common mistake of applying measurement to achievements that are not,
in an epistemological sense, measurable and noting that different kinds of
judgement are required once we move beyond the simplest forms of knowl-
edge. Boud (2007) has taken the further step of arguing for assessment that not
merely applies judgement to students’ work but serves actively to inform
students’ own judgement of their work, a skill seen to be essential in their
future practice.
Assessment as judgement therefore seems to be at the core of assessment, and
its immediate object is a student’s work. However, one further step seems
needed. Is assessment merely about particular pieces of work or does the object
of assessment go beyond the work? Two other definitions are instructive.
Firstly, in the highly influential work of the Committee on the Foundations
of Assessment, Knowing What Students Know: The Science and Design of
Educational Assessment, assessment is defined as ‘‘a process by which educators
use students’ responses to specially created or naturally occurring stimuli to
draw inferences about the students’ knowledge and skills’’ (Committee on the
Foundations of Assessment, 2001, p. 20).
Secondly, Sadler (in a private communication) has incorporated these
elements in a simple, three-stage definition: ‘‘The act of assessment consists of
appraising the quality of what students have done in response to a set task so
that we can infer what students can do1, from which we can draw an inference
about what students know.’’
From these definitions, the irreducible core of assessment can be limited to
(a) students’ work, (b) judgements about the quality of this work, and
(c) inferences drawn from this about what students know. Judgement and
inference are thus at the core of assessment, leading to this simple definition:
To assess is to make judgements about students’ work, inferring from this what
they have the capacity to do in the assessed domain, and thus what they know,
value, or are capable of doing. This definition does not assume the purpose(s) of
assessment, who assesses, when assessment occurs or how it is done. It does,
however, provide a basis for considering these matters clearly and aids the
discussion of the relationship between assessment and learning.

Assessment, Judgement and Learning

What then is the relationship between assessment, understood as judgement
about student work and consequent inferences about what they know, and
the process of learning? The following three propositions encapsulate some
central beliefs about this relationship as expressed repeatedly in the literature
on assessment and learning over the past forty years:
• What students focus on in their study is driven by the work they believe they
will be required to produce.
• Students adapt their approaches to learning to meet assessment require-
ments, so that assessment tasks can be designed to encourage deep
approaches to learning.
• Students can use judgements about their work to improve their consequent
learning.
The following three sections address each of these propositions in turn.

1
Since what they have done is just one of many possible responses, and the task itself was also
just one of many possible tasks that could have been set.
Assessment as Driving Student Learning

The belief that students focus their study on what they believe will be assessed is
embedded in the literature of teaching and learning in higher education. Derek
Rowntree begins his frequently cited text on assessment in higher education by
stating that ‘‘if we wish to discover the truth about an educational system, we
must look into its assessment procedures’’ (Rowntree, 1977, p. 1). This dictum
has been echoed by some of the most prominent writers on assessment in recent
times. Ramsden, for example, states that ‘‘from our students’ point of view,
assessment always defines the actual curriculum’’ (Ramsden, 2003, p. 182);
Biggs notes that ‘‘students learn what they think they will be tested on’’
(Biggs, 2003, p. 182); Gibbs asserts that ‘‘assessment frames learning, creates
learning activity and orients all aspects of learning behaviour’’ (Gibbs, 2006,
p. 23); and Bryan and Clegg begin their work on innovative assessment in higher
education with the unqualified statement that ‘‘research of the last twenty years
provides evidence that students adopt strategic, cue-seeking tactics in relation
to assessed work’’ (Bryan & Clegg, 2006, p. 1).
Three works from the late 1960s and early 1970s are regularly cited to
support this view that assessment determines the direction of student learning:
Miller and Parlett’s Up to the Mark: A Study of the Examination Game (1974),
Snyder’s The Hidden Curriculum (1971) and Becker, Geer and Hughes’ Making
the Grade (1968; 1995). So influential has this view become, and so frequently
are these works cited to support it, that a revisiting of these studies seems
essential if we are to understand this position correctly.
Up to the Mark: A Study of the Examination Game is well known for
introducing the terms cue-conscious and cue-seeking into the higher education
vocabulary. Cue-conscious students, in the study reported in this book,
‘‘talked about a need to be perceptive and receptive to ‘cues’ sent out by
staff – things like picking up hints about exam topics, noticing which aspects
of the subject the staff favoured, noticing whether they were making a good
impression in a tutorial and so on’’ (Miller & Parlett, 1974, p. 52). Cue-seekers
took a more active approach, seeking out staff about exam questions and
discovering the interests of their oral examiners. The third group in this study,
termed cue-deaf, simply worked hard to succeed without seeking hints on
examinations. While the terms ‘‘cue-conscious’’ and ‘‘cue-seeking’’ have
entered our collective consciousness, we may be less aware of other aspects
of Miller and Parlett’s study. Firstly, the study was based on a very particular
kind of student and context – final year honours students in a single depart-
ment of a Scottish university. Secondly, the study’s sample was small – 30.
Thirdly, and perhaps most importantly, the study’s quantitative findings do
not support the conclusion that students’ learning is dominated by assess-
ment: only five of the sample were categorized as cue-seekers, 11 were
cue-conscious, while just under half (14), the largest group, were cue-deaf.
One point of departure for Miller and Parlett’s study was Snyder’s equally
influential study, The Hidden Curriculum, based on first-year students at
the Massachusetts Institute of Technology and Wellesley College. Snyder
concluded that students were dominated by the desire to achieve high grades,
and that ‘‘each student figures out what is actually expected as opposed to
what is formally required’’ (Snyder, 1971, p. 9). In short, Snyder concluded
that all students in his study were cue-seekers. Given Miller and Parlett’s
finding that only some students were cue seekers while more were cue con-
scious and even more were cue deaf, Snyder’s conclusion that all students
adopted the same stance regarding assessment seems unlikely. The credibility
of this conclusion also suffers in light of what we now know about variation in
how students can view and respond to the same context, making the singular
pursuit of grades as a universal phenomenon unconvincing. Snyder’s metho-
dology is not articulated, but it does not seem to have included a search for
counter-examples.
The third classic work in this vein is Making the Grade (Becker, Geer &
Hughes, 1968; 1995), a participant observation study of students at the
University of Kansas. The authors described student life as characterized by
‘‘the grade point perspective’’ according to which ‘‘grades are the major insti-
tutionalized valuable of the campus’’ (1995, p. 55) and thus the focus of atten-
tion for almost all students. Unlike Snyder, these researchers did seek contrary
evidence, though with little success – only a small minority of students ignored
the grade point perspective and valued learning for its own sake. However, the
authors are adamant that this perspective represents an institutionalization
process characteristic of the particular university studied and that they would
not expect all campuses to be like it. Moreover, they state that ‘‘on a campus
where something other than grades was the major institutionalized form of
value, we would not expect the GPA perspective to exist at all’’ (Becker, Geer &
Hughes, 1995, p. 122).
Let us be clear ... about what we mean. We do not mean that all students invariably
undertake the actions we have described, or that students never have any other motive
than getting good grades. We particularly do not mean that the perspective is the only
possible way for students to deal with the academic side of campus life. (Becker, Geer &
Hughes, 1995, p. 121)

Yet many who have cited this study have ignored this emphatic qualification by
its authors, preferring to cite this work in support of the proposition that
students in all contexts, at all times, are driven in their academic endeavours
by assessment tasks and the desire to achieve well in them.
When Gibbs (2006, p. 23) states that ‘‘students are strategic as never before’’,
we should be aware (as Gibbs himself is) that students today could scarcely be
more strategic than Miller and Parlett’s cue-seekers, Snyder’s freshmen in hot pursuit
of grades, or Becker, Geer and Hughes’ students seeking to optimize their
GPAs. However, we must also be acutely aware that we do not know the extent
of this behaviour, the forms it may take in different contexts, or the extent to
which these findings can be applied across cultures, disciplines and the thirty or
more years since they were first articulated.
One strand of research has not simply accepted the studies cited above but
has sought to extend this research through detailed
empirical investigation. This work has been reported by Gibbs and his associ-
ates in relation to what they have termed ‘‘the conditions under which assess-
ment supports student learning’’ (Gibbs & Simpson, 2004). The Assessment
Experience Questionnaire (AEQ) developed as part of their research is
designed to measure, amongst other things, the consistency of student effort
over a semester and whether assessment serves to focus students’ attention on
particular parts of the syllabus (Gibbs & Simpson, 2003). Initial testing of the
AEQ (Dunbar-Goddett, Gibbs, Law & Rust, 2006; Gibbs, 2006) supports the
proposition that assessment does influence students’ distribution of effort and
their coverage of the syllabus, though the strength of this influence and
whether it applies to some students more than to others is unclear. One
research study, using a Chinese version of the AEQ and involving 108 education
students in Hong Kong, produced contradictory findings, leaving it
unclear whether students tended to believe that assessment allowed them to
be selective in what they studied or required them to cover the entire syllabus
(Joughin, 2006). For example, while 58% of the students surveyed agreed or
strongly agreed with the statement that ‘‘it was possible to be quite strategic
about which topics you could afford not to study’’, 62% agreed or strongly
agreed with the apparently contradictory position that ‘‘you had to study the
entire syllabus to do well in the assessment’’.
While the AEQ-based research and associated studies are still embryonic and
not widely reported, they do promise to offer useful insights into how students’
patterns and foci of study develop in light of their assessment. At present,
however, findings are equivocal and do not permit us to make blanket state-
ments – the influence of assessment on students’ study patterns and foci may
well vary significantly from student to student.

Assessment and Students’ Approaches to Learning


Numerous writers have asserted that assessment not only determines what
students focus on in their learning, but that it exercises a determinative influence
on whether students adopt a deep approach to learning in which they seek to
understand the underlying meaning of what they are studying, or a surface
approach based on becoming able to reproduce what they are studying without
necessarily understanding it (Marton & Säljö, 1997). Nightingale and her
colleagues expressed this view unambiguously when they claimed that ‘‘student
learning research has repeatedly demonstrated the impact of assessment on
students’ approaches to learning’’ (Nightingale & O’Neil, 1994, quoted by
Nightingale et al., 1996, p. 6), while numerous authors have cited with approval
Elton and Laurillard’s aphorism that ‘‘the quickest way to change student
learning is to change the assessment system’’ (Elton & Laurillard, 1979, p. 100).
Boud, as noted previously, took a more circumspect view, stating that ‘‘Assess-
ment activities . . . influence approaches to learning that students take’’ (Boud,
2006, p. 21, emphasis added), rather than determining such approaches. How-
ever, it remains a widespread view that the process of student learning can be
powerfully and positively influenced by assessment. A review of the evidence is
called for.
The tenor of the case in favour of assessment directing students’ learning
processes was set in an early study by Terry who compared how students
studied for essays and for ‘‘objective tests’’ (including true/false, multiple choice,
completion and simple recall tests). He found that students preparing for
objective tests tended to focus on small units of content – words, phrases and
sentences – thereby risking what he terms ‘‘the vice of shallowness or super-
ficiality’’ (Terry, 1933, p. 597). On the other hand, students preparing for essay-
based tests emphasized the importance of focusing on large units of content –
main ideas, summaries and related ideas. He noted that while some students
discriminated between test types in their preparation, others reported no
difference in how they prepared for essay and objective tests.
Meyer (1934) compared what students remembered after studying for an
essay-type test and for an examination which included multiple-choice, true/
false and completion questions, concluding that the former led to more com-
plete mastery and that other forms of testing should be used only in exceptional
circumstances. He subsequently asked these students to describe how they
studied for the type of exam they were expecting, and how this differed from
how they would have studied for a different type of exam. He concluded that
students expecting an essay exam attempted to develop a general view of the
material, while students expecting the other types of exam focused on detail.
Students certainly reported being selective in how they studied, rather than
simply following a particular way of studying regardless of assessment format
(Meyer, 1935).
The conclusion to which these two studies inevitably lead is summarized by
Meyer as follows:
The kind of test to be given, if the students know it in advance, determines in large
measure both what and how they study. The behaviour of students in this habitual way
places greater powers in the teacher’s hands than many realize. By the selection of
suitable types of tests, the teacher can cause large numbers of his students to study, to a
considerable extent at least, in the ways he deems best for a given unit of subject-matter.
Whether he interests himself in the question or not, most of his students will probably
use the methods of study which they consider best adapted to the particular types of
tests customarily employed. (Meyer, 1934, pp. 642–643)

This is a powerful conclusion which seems to be shared by many contemporary
writers on assessment 70 years later. But has it been supported by subsequent
research? There is certainly a strong case to be made that it has not. In the
context of the considerable work done in the student approaches to learning
tradition, the greatest hope for assessment and learning would be the prospect
of being able to design assessment tasks that will induce students to adopt deep
approaches to learning. Thirty years of studies have yielded equivocal results.
Early studies by Marton and Säljö (summarized in Marton & Säljö,
1997) sought to see if students’ approaches to learning could be changed by
manipulating assessment tasks. In one study, Marton sought to induce a deep
approach by getting students to respond to questions embedded within a text
they were reading and which invited them to engage in the sort of internal
dialogue associated with a deep approach. The result was the opposite, with
students adapting to the task by responding to the questions without engaging
deeply with them. In another study, different groups of students were asked
different kinds of questions after reading the same text. The set of questions for
one group of students focused on facts and listing ideas while those for another
group focused on reasoning. The first group adopted a surface approach – the
outcome that was expected. In the second group, however, approximately half
of the sample interpreted the task as intended while the other half technified the
task, responding in a superficial way that they believed would yet meet the
requirements. Marton and Säljö’s conclusion more than 20 years after their
original experiments is a sobering one:
It is obviously quite easy to induce a surface approach and enhance the tendency to take a
reproductive attitude when learning from texts. However, when attempting to induce a
deep approach the difficulties seem quite profound. (Marton & Säljö, 1997, p. 53)

More than 30 years after these experiments, Struyven, Dochy, Janssens and
Gielen (in press) have come to the same conclusion. In a quantitative study of
790 first year education students subjected to different assessment methods
(and, admittedly, different teaching methods), they concluded that ‘‘students’
approaches to learning were not deepened as expected by the student-activating
teaching/learning environment, nor by the new assessment methods such as
case based evaluation, peer and portfolio assessment’’ (pp. 9–10). They saw this
large-scale study, based in genuine teaching contexts, as confirming the experi-
mental findings of Marton and Säljö (1997) and concluded that ‘‘although it
seems relatively easy to influence the approach students adopt when learning, it
also appears very difficult’’ (p. 13).
One strand of studies appears to challenge this finding, or at least make us
think twice about it. Several comparative studies, in contexts of authentic
learning in actual subjects, have considered the proposition that different
forms of assessment can lead to different kinds of studying. Silvey (1951)
compared essay tests and objective tests, finding that students studied general
principles for the former and focused on details for the latter. Scouller (1998)
found that students were more likely to employ surface strategies when
preparing for a multiple choice test than when preparing for an assignment
essay. Tang (1992, 1994) found students applied ‘‘low level’’ strategies such as
memorization when preparing for a short answer test but both low level and
high level strategies were found amongst students preparing for the assign-
ments. Thomas and Bain (1982) compared essays and objective tests, finding
that students tended to use either deep or surface strategies irrespective of
assessment type (essay or objective test) though a subsequent study found that
‘‘transformation’’ approaches increased and ‘‘reproductive’’ approaches
decreased with a move from multiple-choice exams to open-ended assignments
(Thomas & Bain, 1984). Sambell and McDowell (1998) reported that a move
from a traditional unseen exam to an open book exam led to a shift towards a
deep approach to learning. Finally, in my own study of oral assessment
(Joughin, 2007) many students described adopting a deep approach to learning
in relation to oral assessment while taking a more reproductive approach to
written assignments.
The interpretation of the results of this strand of research is equivocal.
While the studies noted above could be seen to support the proposition that
certain kinds of assessment can tend to induce students to adopt a deep approach,
they are also consistent with the conclusion that (a) students who have the capacity
and inclination to adopt a deep approach will do so when this is appropriate to the
assessment task, but that they can also adopt a surface approach when this seems
appropriate, while (b) other students will tend to consistently adopt a surface
approach, regardless of the nature of the task. Consequently, the influence of
assessment on approaches to learning may not be that more appropriate forms of
assessment can induce a deep approach to learning, but rather that inappropriate
forms of assessment can induce a surface approach. Thus Haggis (2003) con-
cluded that ‘‘despite frequent claims to the contrary, it may be that it is almost
impossible to ‘induce’ a deep approach if it is not ‘already there’’’ (Haggis, 2003,
p. 104), adding a degree of pessimism to Ramsden’s earlier conclusion that ‘‘what
still remains unclear, however, is how to encourage deep approaches by attention
to assessment methods’’ (Ramsden, 1997, p. 204).
While this section has highlighted limits to improving learning through
changing assessment and contradicted any simplistic notions of inducing
students to adopt deep approaches to learning, it nevertheless reinforces the
vital importance of designing assessment tasks that call on students to adopt a
deep approach. To do otherwise is to impoverish the learning of many students.
Certainly where current assessment tasks lend themselves to surface
approaches, Elton and Laurillard’s dictum referred to previously remains
true: ‘‘the quickest way to change student learning is to change the assessment
system’’ (Elton & Laurillard, 1979, p. 100). However the change may apply only
to some students, while the learning of others remains largely unaffected.

Improving Learning and Developing Judgement:
Contradictions Between Feedback Theory and Practice

The final two interactions between assessment and learning noted in the open-
ing section of this chapter are concerned with the use of judgement to shape
learning and the development of students’ capacity to judge the quality of their
own work. Feedback is at the centre of both of these processes and has received
considerable attention in the assessment literature over the past two decades.
There is widespread agreement that effective feedback is central to learning.
Feedback figures prominently in innumerable theories of effective teaching.
Ramsden (2003) includes appropriate assessment and feedback as one of his six
key principles of effective teaching, noting strong research evidence that the quality of
feedback is the most salient factor in differentiating between the best and worst
courses. Using feedback (whether extrinsic or intrinsic) and reflecting on the
goals-action-feedback process are central to the frequently cited ‘‘conversa-
tional framework’’ of learning as presented by Laurillard (2002). Rowntree
(1987, p. 24) refers to feedback as ‘‘the lifeblood of learning’’. ‘‘Providing feed-
back about performance’’ constitutes one of Gagne’s equally well known con-
ditions of learning (Gagne, 1985).
Black and Wiliam, in their definitive meta-analysis of assessment and class-
room learning research (Black & Wiliam, 1998), clearly established that feed-
back can have a powerful effect on learning, though noting that this effect can
sometimes be negative and that positive effects depend on the quality of the
feedback. Importantly, they followed Sadler’s definition of feedback (Sadler,
1989), noting that feedback only serves a formative function when it indicates
how the gap between actual and desired levels of performance can be bridged
and leads to some closure of this gap. Their study suggested a number of aspects
of feedback associated with learning, including feedback that focuses on the
task and not the student. While their review drew mainly on school-level studies,
with a small number at tertiary level, their findings have been widely
accepted within the higher education literature.
The work of Gibbs and Simpson noted earlier is located exclusively in the
context of higher education and nominates seven conditions required for feed-
back to be effective, including its quantity and timing; its quality in focusing on
learning, being linked to the assignment criteria, and being able to be under-
stood by students; and the requirement that students take notice of the feedback
and act on it to improve their work and learning (Gibbs & Simpson, 2004).
Nicol and Macfarlane-Dick (2006) also posit seven principles of feedback,
highlighting feedback as a moderately complex learning process centred on
self-regulation. Based on the work of Sadler, Black and Wiliam and others,
their principles include encouraging teacher and peer dialogue and self-esteem,
along with the expected notions of clarifying the nature of good performance,
self-assessment, and opportunities to close the gap between current and desired
performance.
Most recently, Hounsell and his colleagues (Hounsell, McCune, Hounsell, &
Litjens, 2008) have proposed a six-step guidance and feedback loop, based on
surveys and interviews of students in first and final year bioscience courses. The
loop begins with students’ prior experiences of assessment, moves through
preliminary guidance and ongoing clarifications through feedback on perfor-
mance to supplementary support and feed-forward as enhanced understanding
is applied in subsequent work.
If theory attests to the critical role of feedback in learning and recent work
has suggested principles for ensuring that this role is made effective, what does
empirical research tell us about actual educational practice? Four studies
suggest that the provision of feedback in higher education is problematic.
Glover and Brown (2006), in a small interview study of science students,
reported that students attended to feedback but often did not act on it, usually
because feedback was specific to the topic covered and was not relevant to
forthcoming work. The authors subsequently analysed feedback provided on
147 written assignments, noting the relative absence of ‘‘feed-forward’’ or
suggestions on how to improve and excessive attention to grammar. Chanock
(2000) highlighted a more basic problem – that students often simply do not
understand their tutors’ comments, a point supported by Ivanič, Clark and
Rimmershaw (2000) in their evocatively titled paper, ‘‘What am I supposed to
make of this? The messages conveyed to students by tutors’ written comments’’.
Higgins, Hartley and Skelton (2002), in an interview and survey study in
business and humanities, found that 82% of students surveyed agreed that
they paid close attention to feedback, while 80% disagreed with the statement
that ‘‘Feedback comments are not that useful’’, though this study did not report
if students actually utilized feedback in further work. Like the students in
Hyland’s study (Hyland, 2000) these students may have appreciated the feed-
back they received without actually using it.
It appears from the literature and research cited in this section that the
conceptualisation of feedback may be considerably in advance of its application,
with three kinds of problems being evident: problems that arise from the com-
plexity of feedback as a learning process; ‘‘structural problems’’ related to the
timing of feedback, its focus and quality; and what Higgins, Hartley and Skelton
(2001, p. 273) refer to as ‘‘issues of power, identity, emotion, discourse and
subjectivity’’. Clearly, assertions about the importance of feedback to learning
stand in contrast to the findings of empirical research into students’ experience
of assessment, raising questions regarding both the theoretical assumptions
about the centrality of feedback to learning and the frequent failure to bring
feedback effectively into play as part of teaching and learning processes.

Conclusion

The literature on assessment and learning is beset with difficulties. Four of these
have been noted in this chapter, beginning with problems associated with
conflated definitions of assessment. Of greater concern is the reliance by
many writers on foundational research that is not fully understood and fre-
quently misinterpreted. Certainly the assertions that assessment drives learning
and that students’ approaches to learning can be improved simply by changing
assessment methods must be treated cautiously in light of the nuanced research
which is often associated with these claims. Finally, the failure of feedback in
practice to perform the pre-eminent role accorded it in formative assessment
theory raises concerns about our understanding of learning and feedback’s role
within it.

Acknowledgment I am grateful to Royce Sadler for his critical reading of the first draft of this
chapter and his insightful suggestions for its improvement.

References
Becker, H. S., Geer, B., & Hughes, E. C. (1968; 1995). Making the grade: The academic side of
college life. New Brunswick: Transaction.
Biggs, J. B. (2003). Teaching for quality learning at university (2nd ed.). Maidenhead: Open
University Press.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education,
5(1), 7–74.
Boud, D. (2006). Foreword. In C. Bryan & K. Clegg (Eds.), Innovative assessment in higher
education (pp. xvii–xix). London and New York: Routledge.
Boud, D. (2007). Reframing assessment as if learning were important. In D. Boud &
N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer
term (pp. 14–25). London and New York: Routledge.
Bryan, C., & Clegg, K. (2006). Introduction. In C. Bryan & K. Clegg (Eds.), Innovative
assessment in higher education (pp. 1–7). London and New York: Routledge.
Carless, D., Joughin, G., Liu, N-F., & Associates (2006). How assessment supports learning:
Learning-oriented assessment in action. Hong Kong: Hong Kong University Press.
Chanock, K. (2000). ‘Comments on essays: Do students understand what tutors write?’
Teaching in Higher Education, 5(1), 95–105.
Committee on the Foundations of Assessment; Pellegrino, J.W., Chudowsky, N., & Glaser, R.,
(Eds.). (2001). Knowing what students know: The science and design of educational assessment.
Washington: National Academy Press.
Dunbar-Goddett, H., Gibbs, G., Law, S., & Rust, C. (2006, August–September). A methodology
for evaluating the effects of programme assessment environments on student learning. Paper
presented at the Third Biennial Joint Northumbria/EARLI SIG Assessment Conference,
Northumbria University.
Elton, L., & Laurillard, D. M. (1979). Trends in research on student learning. Studies in
Higher Education, 4, 87–102.
Falchikov, N., & Boud, D. (2007). Assessment and emotion: the impact of being assessed. In
D. Boud, & N. Falchikov, (Eds.), Rethinking Assessment for Higher Education: Learning
for the Longer Term (pp. 144–155). London: Routledge.
Gagne, R. M. (1985). The conditions of learning and theory of instruction. New York: CBS
College Publishing.
Gibbs, G. (2006). How assessment frames student learning. In C. Bryan & K. Clegg (Eds.),
Innovative assessment in higher education (pp. 23–36). London: Routledge.
Gibbs, G., & Simpson, C. (2003, September). Measuring the response of students to assessment:
The Assessment Experience Questionnaire. Paper presented at the 11th International
Improving Student Learning Symposium, Hinckley.
Gibbs, G., & Simpson, C. (2004). Conditions under which assessment supports students’
learning. Learning and Teaching in Higher Education, 1, 3–31.
Glover, C., & Brown, E. (2006). Written feedback for students: Too much, too detailed or too
incomprehensible to be effective? Bioscience Education ejournal, vol 7. Retrieved 5 November
from http://www.bioscience.heacademy.ac.uk/journal/vol7/beej-7-3.htm
Hager, P., & Butler, J. (1996). Two models of educational assessment. Assessment and
Evaluation in Higher Education, 21(4), 367–378.
Haggis, T. (2003). Constructing images of ourselves? A critical investigation into ‘approaches
to learning’ research in higher education. British Educational Research Journal, 29(1),
89–104.
Higgins, R., Hartley, P., & Skelton, A. (2001). Getting the message across: The problem of
communicating assessment feedback. Teaching in Higher Education, 6(2), 269–274.
Higgins, R., Hartley, P., & Skelton, A. (2002). The conscientious consumer: Reconsidering
the role of assessment feedback in student learning. Studies in Higher Education, 27(1),
53–64.
Hounsell, D., McCune, V., Hounsell, J., & Litjens, J. (2008). The quality of guidance and
feedback to students. Higher Education Research and Development, 27(1), 55–67.
Hyland, P. (2000). Learning from feedback on assessment. In P. Hyland & A. Booth (Eds.),
The practice of university history teaching (pp. 233–247). Manchester, UK: Manchester
University Press.
Ivanič, R., Clark, R., & Rimmershaw, R. (2000). What am I supposed to make of this? The
messages conveyed to students by tutors’ written comments. In M. R. Lea & B. Stierer
(Eds.), Student writing in higher education: New contexts (pp. 47–65). Buckingham, UK:
SRHE & Open University Press.
Joughin, G. (2006, August–September). Students’ experience of assessment in Hong Kong
higher education: Some cultural considerations. Paper presented at the Third Biennial
Joint Northumbria/EARLI SIG Assessment Conference, Northumbria University.
Joughin, G. (2007). Student conceptions of oral presentations. Studies in Higher Education,
32(3), 323–336.
Knight, P. (2007). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.),
Rethinking assessment in higher education (pp. 72–86). Abingdon and New York:
Routledge.
Laurillard, D. (2002). Rethinking university teaching (2nd ed.). London: Routledge.
Marton, F., & Säljö, R. (1997). Approaches to learning. In F. Marton, D. Hounsell, &
N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 39–58). Edinburgh: Scottish
Academic Press.
Meyer, G. (1934). An experimental study of the old and new types of examination: 1. The
effect of the examination set on memory. The Journal of Educational Psychology, 25,
641–661.
Meyer, G. (1935). An experimental study of the old and new types of examination: II.
Methods of study. The Journal of Educational Psychology, 26, 30–40.
Miller, C. M. L., & Parlett, M. (1974). Up to the mark: A study of the examination game.
London: Society for Research into Higher Education.
Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learn-
ing: A model and seven principles of good feedback practice. Studies in Higher Education,
31(2), 199–218.
Nightingale, P., & O’Neil, M. (1994). Achieving quality in learning in higher education.
London: Kogan Page.
Nightingale, P., Wiata, I. T., Toohey, S., Ryan, G., Hughes, C., & Magin, D. (1996). Assessing
learning in universities. Sydney, Australia: University of New South Wales Press.
Oxford English Dictionary [electronic resource] (2002-). Oxford & New York: Oxford
University Press, updated quarterly.
Ramsden, P. (1997). The context of learning in academic departments. In F. Marton,
D. Hounsell, & N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 198–216).
Edinburgh: Scottish Academic Press.
Ramsden, P. (2003). Learning to teach in higher education (2nd ed.). London: Routledge.
Rowntree, D. (1977). Assessing students: How shall we know them? (1st ed.). London: Kogan
Page.
Rowntree, D. (1987). Assessing students: How shall we know them? (2nd ed.). London: Kogan
Page.
Rowntree, D. (2007). Designing an assessment system. Retrieved 5 November, 2007, from
http://iet.open.ac.uk/pp/D.G.F.Rowntree/Assessment.html
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instruc-
tional Science, 18, 119–144.
Sambell, K., & McDowell, L. (1998). The construction of the hidden curriculum. Assessment
and Evaluation in Higher Education, 23(4), 391–402.
Scouller, K. (1998). The influence of assessment method on students’ learning approaches:
Multiple choice question examination versus assignment essay. Higher Education, 35,
453–472.
Silvey, G. (1951). Student reaction to the objective and essay test. School and Society, 73,
377–378.
Snyder, B. R. (1971). The hidden curriculum. New York: Knopf.
Struyven, K., Dochy, F., Janssens, S., & Gielen, S. (in press). On the dynamics of students’
approaches to learning: The effects of the teaching/learning environment. Learning and
Instruction.
Tang, K. C. C. (1992). Perception of task demand, strategy attributions and student learning.
Eighteenth Annual Conference of the Higher Education Research and Development
Society of Australasia, Monash University Gippsland Campus, Churchill, Victoria,
474–480.
Tang, K. C. C. (1994). Effects of modes of assessment on students’ preparation strategies. In
G. Gibbs (Ed.), Improving student learning: Theory and practice (pp. 151–170). Oxford,
England: Oxford Centre for Staff Development.
Terry, P. W. (1933). How students review for objective and essay tests. The Elementary School
Journal, April, 592–603.
The University of Queensland. (2007). Assessment Policy and Practices. Retrieved 5 November,
2007, from http://www.uq.edu.au/hupp/index.html?page=25109
Thomas, P. R., & Bain, J. D. (1982). Consistency in learning strategies. Higher Education, 11,
249–259.
Thomas, P. R., & Bain, J. D. (1984). Contextual dependence of learning approaches: The
effects of assessments. Human Learning, 3, 227–240.
Chapter 3
How Can Practice Reshape Assessment?

David Boud

Introduction

Assessment in higher education is being challenged by a multiplicity of demands.
The activities predominantly used – examinations, assignments and other kinds
of tests – have emerged from within an educational tradition lightly influenced by
ideas from psychological measurement, but mostly influenced by longstanding
cultural practices in the academic disciplines. Assessment in higher education
has for a long time been a process influenced more from within the university
than externally. It has typically been judged in terms of how well it meets
the needs of educational institutions for selection and allocation of places in
later courses or research study, and whether it satisfies the expectations of the
almost exclusively academic membership of examination committees.
Within courses, it has been judged by how well it meets the needs of those
teaching. In more recent times it is judged in terms of how well it addresses the
learning outcomes for a course.
When we think of assessment as a feature of educational programs and
construct it as part of the world of teaching and courses, our points of reference
are other courses and assessment that occurs to measure knowledge acquired.
Assessment is positioned as part of a world of evaluating individuals in an
educational system separated from engagement in the everyday challenges of
work. In contrast, in the everyday world of work, assessments are an intrinsic
part of dealing with the challenges that any form of work generates. When we
learn through our lives, we necessarily engage in assessment. We make judge-
ments about what needs to be done and whether we have done it effectively.
While we may do this individually, we also do it with colleagues and others in
the situations in which we find ourselves. This occurs in a social context, not in
isolation from others. We also make judgements about situations and groups,
not just about individuals, and when we make judgements about individuals we
do so in a very particular context. Judgements are typically validated as part of
a community of practice. Indeed, the community of judgement defines what
constitutes good work.
Given the increasing focus on a learning outcomes-oriented approach to
education, it is useful to examine assessment from a perspective outside the
immediate educational enterprise. How can it be deployed to meet the ends
rather than the means of education? How can it address the needs of continuing
learning? Such a focus on what graduates do when they practise after they
complete their courses can enable us to look afresh at assessment within
educational institutions to ensure that it is not undermined by short-term and
local needs (Boud & Falchikov, 2006). This focus can be gained from looking at
practice in the world for which graduates are being prepared.
This chapter investigates how an emphasis on practice might be used to
examine assessment within higher education. Practice is taken pragmatically
here as representing what students and graduates do when they exercise their
knowledge, skills and dispositions with respect to problems and issues in the
world. For some, this will mean acting as a professional practitioner and
engaging in practice with a client or customer, for others it will be the practice
of problem analysis and situation improvement in any work or non-work
context. Acts of practice are many and varied. It is the intention here to use
the perspective that they provide to question how students are prepared for
practice, particularly with respect to how assessment operates to enhance and
inhibit students’ capacity to engage effectively in practice.
However, practice is not an unproblematic concept and what constitutes
practice is not self-evident. The chapter therefore starts with an examination of
practice and how it is used. In doing so, it draws on theoretical notions of practice
that are currently influencing the literature.
We then move from practice to examine some of the ideas emerging about
assessment in higher education that can be linked to this. These include a
focus on assessment for learning, the role of the learner and the need to take
account of preparing for learning beyond graduation. The argument high-
lights a view of assessment as informing judgements in contrast to a view
of assessment as measuring learning outcomes. Implications for the ways in
which assessment priorities in higher education can be reframed are consid-
ered throughout.

Practice and Practice Theory


What is practice and why might it be a useful prompt for considering assess-
ment? In one sense practice is simply the act of doing something in a particular
situation, for example, analysing particular kinds of problems and applying
the results to make a change. That is, it is a description of the everyday acts of
practitioners, and most graduates find themselves in positions in which they
can be regarded as practitioners of one kind or another even if they are not
involved in what has traditionally been regarded as professional work.
However, analysing problems in the generic sense is not a useful representa-
tion of practice. Practice will involve analysing particular kinds of problems in
a certain range of contexts. What is important for practice is what is done, by
whom, and in what kind of setting.
However, practice has come in recent years to be more than a way of
referring to acts of various kinds. It is becoming what Schatzki (2001) refers
to as best naming the primary social thing. That is, it is a holistic way of bringing
together the personal and social into sets of activities that can be named and
referred to and which constitute the domains of the everyday world in which we
operate. Professions, occupations and many other activities can be regarded as
sets of practices. To talk of them as practices is to acknowledge that they are not
just the exercise of the knowledge and skills of practitioners, but to see them as
fulfilling particular purposes in particular social contexts. The practice is mean-
ingful within the context in which it takes place. To abstract it from the
environment in which it operates is to remove key features of the practice.
Teaching, for example, may occur in the context of a school. Courses that
prepare school teachers take great care to ensure that what is learned is learned
in order to be exercised in this context, so that courses for vocational teachers or
higher education teachers might differ markedly, not as a result of fundamental
differences in how different students learn, but because of the social, institu-
tional and cultural context of the organizations in which teaching occurs. The
practices, therefore, of different kinds of teachers in different kinds of settings,
differ in quite profound ways, while retaining similar family resemblances.
Practice is also a theoretical notion that provides a way of framing ways in
which we can investigate the world. Schatzki (2001) has identified the practice
turn in contemporary theory and Schwandt (2005) has proposed ways of model-
ling the practice fields, that is, those areas represented in higher education that
prepare students for particular professional practices. In our present context it
is important to make a distinction between practice as a location for activity and
doing an activity (i.e., a hospital ward is a setting for student nurses’ learning in
practice) on the one hand, and practice as a theoretical construct referring to the
nature of activity (i.e., practice is where skills, knowledge and dispositions come
together and perform certain kinds of work), on the other (Schatzki, 2001).
Considerations of practice therefore enable us to sidestep theoretical bifurca-
tions, such as those between individual and social, structure and agency or
system and lifeworld.
This theorisation of practice points to the need to consider assessment as
more than individual judgements of learning. It brings in the importance of the
nature of the activity that performs work and the setting in which it occurs.
Practice is a holistic conception that integrates what people do, where they do it
and with whom. It integrates a multifaceted range of elements into particular
functions in which all the elements are needed.
Schwandt (2005) encapsulates these ideas in two models that represent
different characteristic views of practice. The first of these, Model1, is based
in scientific knowledge traditions and broadly describes the views that underpin
common traditional programs. It is instrumental and based on means-end
rationalities. Practice is seen as an array of techniques that can be changed,
improved or learned independently of the ‘‘contingent and temporal circum-
stances’’ (p. 316) in which practices are embedded. The kind of knowledge
generated about practice ought to be ‘‘explicit, general, universal and systematic’’
(p. 318). To achieve this, such knowledge must by definition eliminate the
inherent complexity of the everyday thinking that actually occurs in practice.
The second, Model2, draws from practical knowledge traditions. In it practice is
a ‘‘purposeful, variable engagement with the world’’ (p. 321). Practices are fluid,
changeable and dynamic, characterised by their ‘alterability, indeterminacy and
particularity’ (p. 322). What is important is the specific situation in which
particular instances of practice occur and hence the context-relativity of prac-
tical knowledge. In this model, knowledge must be a flexible concept, capable of
attending to the important features of specific situations. Practice is understood
as situated action.
While Schwandt (2005) presents these models in apparent opposition to each
other, for our present purposes both need to be considered, but because the first
has been overly emphasised in discussions of courses and assessment, we will
pay attention to Model2. As introduced earlier, the key features of a practice
view from the practical knowledge traditions are that, firstly, practice is neces-
sarily contextualised, that is, it cannot be discussed independently of the settings
in which it occurs. It always occurs in particular locations and at particular
times; it is not meaningful to consider it in isolation from the actual sites of
practice. Secondly, practice is necessarily embodied, that is, it involves whole
persons, including their motives, feelings and intentions. To discuss it in
isolation from the persons who practise is to misunderstand practice.
Furthermore, we must also consider the changing context of professional
practice (Boud, 2006). This involves, firstly, a collective rather than an indivi-
dual focus on practice, that is, a greater emphasis on the performance of teams
and groups of practitioners. Secondly, it involves a multidisciplinary and,
increasingly, a transdisciplinary focus of practice. In this, practitioners of
different specialisations come together to address problems that do not fall
exclusively in the practice domain of any one discipline. It is not conducted by
isolated individuals, but in a social and cultural context in which what one
professional does has necessarily to link with what others do. This is even the
case in those professions in which past cultural practice has been isolationist.
Thirdly, there is a new emphasis on the co-production of practice and the
co-construction of knowledge within it. Professionals are increasingly required
to engage clients, patients and customers as colleagues who co-produce solu-
tions to problems and necessarily co-produce the practices in which they are
engaged.
Practice and practice theory point to a number of features we need to
consider in assessment. The first is the notion of context: the knowledge and skills
used in a particular practice setting. The kinds of knowledge and skills utilised
depend on the setting. The second is the bringing together of knowledge and
skills to operate in a particular context for a particular purpose. Practice involves
these together, not each operating separately. Thirdly, knowledge and skills
require a disposition on the part of the practitioner, a willingness to use these
for the practice purpose. Fourthly, there is a need in many settings to work with
other people who might have different knowledge and skills to undertake prac-
tice. And, finally, there is the need to recognise that practice needs to take account
of and often involve those people who are the focus of the practice.
How well does existing assessment address these features of practice? In
general, it meets these requirements very poorly. Even when there is an element
of authentic practice in a work setting as part of a course, the experience is
separated from other activities and assessment of course units typically occurs
separately from placements, practical work and other located activities. Also,
the proportion of assessment activities based upon practice work, either on
campus or in a placement, or indeed any kind of working with others, is
commonly quite small and is rarely the major part of assessment in higher
education. When the vast bulk of assessment activities are considered, they
may use illustrations and examples from the world of practice, but they do not
engage with the kinds of practice features we have discussed here. Significantly,
assessment in educational institutions is essentially individualistic (notwith-
standing some small moves towards group assessment in some courses). All
assessment is recorded against individuals and group assessments are uncom-
mon and are often adjusted to allow for individual marks. If university learning
and assessment is to provide the foundation for students to subsequently engage
in practice, then it needs to respond to the characteristics identified as being
central to practice and find ways of incorporating them into common assess-
ment events. This needs to occur whether or not the course uses placements at
all or whether or not it is explicitly vocational.
Of course, in many cases we do not know the specific social and organisational
contexts in which students will subsequently practice. Does this mean we must
reject the practice perspective? Not at all: while the specifics of any given context
may not be known, it is known that the exercise of knowledge and skill will not
normally take place in isolation from others, both other practitioners and people
for whom a service or product is provided. It will take place in an organisational
context. It will involve emotional investments on the part of the practitioner and
it will involve them in planning and monitoring their own learning. These
elements therefore will need to be considered in thinking about how students
are assessed and examples of contexts will need to be assumed for these purposes.

Assessment Embedded in Practice


Where can we look for illustrations of assessment long embedded in practice?
A well-established example is that of traditional trade apprenticeships. While
they do not occur in the higher education sector, and they have been modified in
recent years through the involvement of educational institutions, they show a
different view of the relationship between learning, assessment and work than is
familiar from universities. Mention of apprenticeships here is not to suggest
that university courses should be more like apprenticeships, but to demonstrate
that assessment can be conceived of in quite different terms than is common-
place in universities and can meaningfully engage with notions of practice. In
the descriptions of apprenticeship that follow, I am grateful for the work of
Kvale, Tanggaard and Elmholdt and their observations about how it works and
how it is accepted as normal (Kvale, 2008; Tanggaard & Elmholdt, 2008).
In an apprenticeship a learner works within an authentic site of production.
The learner is immersed in the particular practice involved. It occurs all around
and practice is not contrived for the purposes of learning: the baker is baking,
the turner is machining metal and the hairdresser is cutting real hair for a person
who wants the service. The apprentice becomes part of a community of practice
over time (Lave & Wenger, 1991). They start by taking up peripheral roles in
which they perform limited and controlled aspects of the practice and move on
by being given greater responsibility for whole processes. They may start by
washing, but not cutting hair, they may assemble the ingredients for bread and
so on. The apprentice is surrounded by continual opportunities for guidance
and feedback from experienced practitioners. The people they learn from are
practising the operations involved as a normal part of work. They are not doing
it for the benefit of the learner, but they provide on-going role models of how
practice is conducted from which to learn. When the apprentice practises,
assessment is frequent, specific and standards-based. It doesn’t matter whether
the piece is manufactured better than can be done by other apprentices. What
counts is to what tolerances the metal is machined. If the work is not good
enough, it gets repeated until the standards are reached. Norm-referenced
assessment is out and standards-based assessment is in.

Assessment for Learning

In considering what a practice perspective might contribute to assessment, it is
necessary to explore what assessment aims to do, what it might appropriately
do and what it presently emphasises.
Assessment in higher education has mainly come from the classroom tradi-
tion of teachers ‘marking’ students’ work, overlaid with external examinations
to judge performance for selection and allocation of scarce resources. The latter
focus has meant that an emphasis on comparability, consistency across indivi-
duals, defensibility and reliability has dominated. Summative assessment in
many respects has come to define the norm for all assessment practice and
other assessment activities are often judged in relation to this, especially as
summative assessment pervades all stages of courses. While formative purposes
are often acknowledged in university assessment policies, these are usually
subordinate to the summative purpose of certification. There are indications
that this is changing, though, and that an exclusive emphasis on summative
purposes has a detrimental effect on learning. Measurement of performance
can have negative consequences for what is learned (Boud, 2007).
The classroom assessment tradition has been reinvigorated in recent times
through a renewed emphasis on research on formative assessment in education
generally (Black & Wiliam, 1998), on self assessment in higher education (Boud,
1995) and on how assessment frames learning (Gibbs, 2006). This work has
drawn attention to otherwise neglected aspects of assessment practice and the
effects of teachers’ actions and learning activities on students’ work. Sadler
(1989, 1998) has for a long time stressed the importance of bridging the gap
between comments from teachers, often inaccurately termed ‘feedback’, and
what students do to effect learning. Nicol and MacFarlane-Dick (2006) have
most recently developed principles to guide feedback practice that can influence students in regulating their own learning. Regrettably, while there are many rich ideas for improving assessment practice to assist learning, they have been eclipsed by the greater imperatives of grading and classification.
It is worth noting in passing that it is tempting to align the need for
certification and the need for learning. This, though, would be an inappropriate simplification that fails to acknowledge the contradictions between being
judged and developing the capacity to make judgements. We have the difficulty
of the limitations of language in having one term in most parts of the English-
speaking world – assessment – to describe something with often opposing
purposes. While it may be desirable to choose new terms, we are stuck with
the day-to-day reality of using assessment to encompass many different and
often incompatible activities.
Elsewhere (Boud, 2007), I have argued that we should reframe the notion of
assessment to move away from the unhelpful polarities prompted by the for-
mative/summative split and an emphasis on measurement as the guiding meta-
phor in assessment thinking. This would enable us to consider how assessment
can be used to foster learning into the world of practice beyond the end of the
course. The focus of this reframing is around the notion of informing judgement
and of judging learning outcomes against appropriate standards. Students must
necessarily be involved in assessment as assessment is a key influence in their
formation and they are active subjects. Such involvement enables assessment to
contribute not only to learning during the course but also to future learning
beyond the end of the course through the development of students’ capacity for
making judgements about the quality of their work, and thus making decisions about their learning. Unless students develop the capacity to make judgements about their own learning, they cannot be effective learners now or in the future.
A move away from measurement-oriented views of assessment is not new.
Hager and Butler (1996), for example, have drawn attention to what they
identify as two contrasting models of educational assessment. The first of
these is what they term the scientific measurement model in which practice is
derived from theory, knowledge is a ‘given’ for practical purposes, knowledge is
‘impersonal’ and context free, assessment is discipline-driven and assessment
deals with structured problems. The second they term the judgemental model in
which practice and theory are loosely symbiotic, knowledge is understood as
provisional, knowledge is a human construct and reflects the context in which it
is generated and used, assessment is problem-driven and assessment deals with
unstructured problems. Such a judgemental model places assessment within
the context of practice and the use of knowledge.
The notion of informing judgement can be a more generative and integrating
notion through which to view assessment than the polarising view of formative
and summative assessment. It focuses on the key act of assessment that needs to
be undertaken by both teachers and students in ensuring that learning has
occurred. Assessment may in this light be more productively seen as a process
of human judgement than as a process of scientific measurement. Such a
perspective is also helpful for a practice-based orientation to assessment. In
order to improve practice we need helpful information that not only indicates
what we are doing, but also the judgements that go with this. Interpretation is
needed, as raw data alone does not lend itself to fostering change. This high-
lights the importance of feedback, yet as teachers with increasingly large classes,
we can never provide students with as much or as detailed feedback as they
need. Indeed, after the end of the course there is no teacher to offer feedback.
Consequently, judgements must be formed for ourselves, whether we are stu-
dents or practitioners, drawing upon whatever resources are available to us.
The idea of assessment as informing judgement can take us seamlessly from
being a student to becoming a professional practitioner. It is an integral part of
ongoing learning and developing the capacity to be a practitioner.
There are many implications of viewing assessment as informing student
judgement (Boud & Falchikov, 2007). The most fundamental is to create the
circumstances that encourage students to see themselves as active agents in
their own learning. Without a powerful sense that they are actively shaping
themselves as persons who can exercise increasingly sophisticated kinds of
judgement, they are destined to become dependent on others. These others
may be the lecturer or tutor while in university, but later these may morph
into experts, authority figures and employers. Of course, there is nothing wrong
in respecting the judgement of such people and taking these into account.
Difficulties arise, though, when judgements cannot be made apart from such people. When this occurs, substantial risks follow: calculations may not be sufficiently checked, ethical considerations may be glossed over, and the implications of decisions may not be adequately considered. If someone else is believed to be the final
arbiter of the quality of one’s work, then responsibility has not been accepted.
An emphasis on producing students who see themselves as active agents
requires a new focus on fostering reflexivity and self-regulation through all
aspects of a course, not just assessment tasks. It cannot be expected that judge-
ment comes fully developed into play in assessment acts. What precedes assess-
ment must also make a significant contribution. This leads to the importance of
organising opportunities for developing informed judgement throughout
programs. It could be argued that structuring occasions for this is the most
important educational feature of any course. Fostering reflexivity and self-
regulation is not something that can take place in one course unit and be expected
to benefit others. It is a fundamental attribute of programs that needs to be
developed and sustained throughout. Assessment must be integrated with learn-
ing and integrated within all elements of a program over time. As soon as one unit
is seen to promote dependency, it can detract from the overall goal of the
program.
These are more demanding requirements than they appear at first sight. They
require vigilance for all assessment acts, even the most apparently innocuous.
For example, tests at the start of a course unit that assess simple recall of terminology as a prerequisite for what follows can easily give the message that remembering facts is what is required in the course. This does not
mean that all such tests are inappropriate, but it does mean that the total
experience of students needs to be considered, and perhaps tests of one kind
need to be balanced with quite different activities in order to create a program
that has the ultimate desired outcomes.
This points to the need to examine the consequences of all assessment acts
for learning. Do they or do they not improve students’ judgement, and if so,
how do they do it? If they do not, then they are insufficiently connected with education's main raison d'être to be justified – if assessment does not
actively support the kinds of learning that are sought, then it is at least a missed
opportunity, if not an act that undermines the learning outcome.

Apprenticeship as a Prototype

It can help to gain some perspective on higher education practice by considering the other example of a situation mentioned earlier in which assessment and learning occur. As we have seen, there is a tradition of assessment that comes
from the world of work. This tradition of the artisan, craftsperson and apprentice
predates almost all of formal education (Tanggaard & Elmholdt, 2008). Here the
emphasis has been on the formation of the person into an expert practitioner. All
judgements are made in comparison to what is good work and how that can be
achieved. While there is an ultimate concern with final products, what is more
important is how they are achieved. Correct processes are what is emphasised.
These will lead to good products, but final production of complete items may be
delayed for a considerable time. What is particularly interesting in the case of
apprenticeships is how robust this practice has been over time and changing
patterns of work and how the de facto assessment activities have been able to
withstand the re-regulation of assessment tasks into new competency-based
frameworks. One of the main reasons for the strength of apprenticeships is that
they are unambiguously preparing students for particular forms of practice that
are visible to them and their workplace teachers throughout their training.

What more can we learn from the apprenticeship tradition? Firstly, it has a
strong base in a community of practice of which the apprentice is gradually
becoming a part. That is, the apprentice sees the practices of which he or she is
becoming a part. They can imagine their involvement and vicariously, and
sometimes directly, experience the joys and frustrations of competent work.
The apprentice is a normal part of the work community and is accepted as a part
of it. There are clear external points of reference for judgements that are made
about work itself. These are readily available and are used on a regular basis.
Practice is reinforced by repetition and skill is developed. In this, assessment
and feedback are integrated into everyday work activities. They are not isolated
from it and conducted elsewhere. Final judgements are based on what a person
can do in a real setting, not on a task abstracted from it. Grades are typically not
used. There is no need to extract an artificial measure when competence or what
is ‘fit-for-purpose’ are the yardsticks.
Of course, we can only go so far in higher education in taking such an
example. Higher education is not an apprenticeship and it is not being suggested
that it should become like one. In apprenticeship, the skills developed are of a
very high order, but the range of activity over which they are deployed can be
more limited than in the professions and occupations for which higher education
is a preparation. The challenge for higher education is to develop knowledge and
skills in a context in which the opportunities for practice and the opportunities to
see others practise are more restricted but where high-level competencies are required. This does not suggest that practice be neglected; it means, though, that it is even more important to focus on opportunities for practice to be considered than it is in the apparent luxury of the everyday practice setting of the apprenticeship.

Assessment for Practice

What else can we take from a practice view for considering how assessment in
higher education should be conducted? The first, and most obvious considera-
tion is that assessment tasks must be contextualized in practice. Learning occurs
for a purpose, and while not all ultimate purposes are known at the time of
learning, it is clear that they will all involve applications in sites of practice. To
leave this consideration out of assessment then is to denature assessment and
turn it into an artificial construct that does not connect with the world. Test
items that are referential only to ideas and events that occur in the world of
exposition and education may be suitable as intermediate steps towards later
assessment processes, but they are not realistic on their own. While there may be
scope in applied physics, for example, for working out the forces on a weightless
pulley or on an object on a frictionless surface, to stop there is to operate in a
world where there is a lack of consequence. To make this assumption is to deny
that real decisions will have to be made and to create the need for students to
unlearn something in order to operate in the world.

The second consideration, which is related to this, is that performance
should be judged by the standards of the practice itself, not an abstraction
which owes little to an understanding of it. The key question to be considered is:
what is the appropriate community of judgement that should be the reference
point for assessment purposes? That is, from where should standards be drawn?
This is an issue for both students as they learn to discern appropriate sources of
standards for their work and also for teachers and assessors. The community of
judgement may vary for any given set of subject matter. What is judged to be an
appropriate level and type of mathematical knowledge, for example, may vary
between engineers who use mathematics, and mathematicians who may have a
role in teaching it.
A focus on standards also draws attention to the problem that there are far
more things to learn, know and do than can possibly be included in the assess-
ment regime of any particular course or unit of study. Rather than attempt to
squeeze an excessive number of outcomes into assessment acts, it may be
necessary, as the late Peter Knight has persuasively argued (Knight, 2007), to
ensure that the environment of learning provides sufficient opportunities to
warrant that learning has occurred rather than to end-load course assessment
with so many tasks that they have to be approached by students in a manner
that produces overload and ensures they are dealt with in a superficial way.
Finally, recognition from practice that learning is necessarily embodied and
engages the emotions and volition of learners points to the need to acknowledge
that assessment has visceral effects rather than to ignore them. This implies that
assessment needs to have consequences other than the grading of students.
Students need to be involved in the impact of their learning on others. Part of
this may be simulated, but part, as occurs when there are placements in practice
settings, may be real. When they are real, as happens in the teacher education
practicum or the nursing clinical placement, students are involved with real
children or patients, but they are supervised and the risk of possible negative
consequences is controlled. Nevertheless, such settings provide students with
the social-emotional environment to enable them to experience the emotional
consequences of their actions.
To draw this together, what is being argued here is not that we must move
assessment into practice-settings in the way that has occurred in some aspects of
vocational education and training, but that an awareness of and sensitivity to
practice needs to pervade ways in which assessment is conceptualised and to
balance some of the short-term and technical considerations that have domi-
nated the agenda.
While consideration of practice can reshape ways in which we think about
assessment, practice settings can also create challenges for assessment. In con-
sidering this there is a need to distinguish between the locations available during
courses for students to practise, and practice-settings more widely. During
courses students are exposed to a limited range of settings and may take up
partial roles of practice within them. The argument here is not about the
location of assessment, nor of its content, but of the overriding purpose of
involving students in making judgements so that they and others can take a view
about whether learning suitable for life after courses has been achieved and
what more needs to be engaged in.

Implications
Why then should we consider starting with practice as a key organiser for
assessment? As we have seen, it is anchored in the professional world, not the
world of educational institutions. This means that there are multiple views of
practice that are available external to the educational enterprise. Practice
focuses attention on work outside the artefacts of the course – there is a point of reference for decision-making beyond course-determined assessment criteria, and actions that take place have consequences beyond those of formal assessment requirements. These create possibilities for new approaches to assessment.
In addition to this, judgements of those in a practice situation (professional or
client) make a difference to those involved. That is, there are consequences
beyond the learning of the student that frame and constrain actions. These
provide a reality-check not available in an internally-referenced assessment
context. These considerations raise the stakes, intensify the experience and
embody the learner more thoroughly in situations that anticipate engagement
as a full professional. As we see beyond our course into the world of profes-
sional practice, assessment becomes necessarily authentic: authenticity does not
need to be contrived.
To sum up, a ‘practice’ perspective helps us to focus on a number of issues
for assessment tasks within the mainstream of university courses. These include:
1. Locating assessment tasks in authentic contexts.
These need not necessarily involve students being placed in external work
settings, but involve the greater use of features of authentic contexts to frame
assessment tasks. They could model or simulate key elements of authentic
contexts.
2. Establishing holistic tasks rather than fragmented ones.
The least authentic of assessment tasks are those taken in isolation and
disembodied from the settings in which they are likely to occur. While
tasks may need to be disaggregated for purposes of exposition and rehearsal
of the separate elements, they need to be put back together again if students
are to see knowledge as a whole.
3. Focusing on the processes required for a task rather than the product or
outcome per se.
Processes and ways of approaching tasks can often be applied from one
situation to another whereas the particularities of products may vary mark-
edly. Involving students in ways of framing tasks in assessment is often
neglected in conventional assessment.
4. Learning from the task, not just demonstrating learning through the task.
A key element of learning from assessment is the ability to identify cues from
tasks themselves which indicate how they should be approached, the criteria to
be used in judging performance and what constitutes successful completion.
5. Being conscious of the need to refine the judgements of students, not just the judgement of students by others.
Learning in practice involves the ability to continuously learn from the
tasks that are encountered. This requires progressive refinement of judgements
by the learner which may be inhibited by the inappropriate deployment of the
judgements of others when learners do not see themselves as active agents.
6. Involving others in assessment activities, away from an exclusive focus on the
individual.
Given that practice occurs in a social context, it is necessary that the skill of
involving others is an intrinsic part of learning and assessment. Assessment
with and for others needs far greater emphasis in courses.
7. Using standards appropriate to the task, not comparisons with other students.
While most educational institutions have long moved from inappropriate
norm-referenced assessment regimes, residues from them still exist. The
most common is the use of generic rather than task-specific standards
and criteria that use statements of quality not connected to the task in
hand (e.g., abstract levels using terms such as adequate or superior perfor-
mance, without a task-oriented anchor).
8. Moving away from an exclusive emphasis on independent assessment in each
course unit towards development of assessment tasks throughout a program
and linking activities from different courses.
The greatest fragmentation often occurs through the separate treatment of
individual course units for assessment purposes. Generic student attributes
can only be achieved through coordination and integration of assessment
tasks across units. Most of the skills of practice discussed here develop over
time and need practice over longer periods than a semester and cannot be
relegated to parts of an overall program.
9. Acknowledging student agency and initiation rather than having students always respond to the prompts of others.
The design of assessment so that it always responds to the need to build
student agency in learning and development is a fundamental challenge
for assessment activities. This does not mean that students have to choose
assessment tasks, but that they are constructed in ways that maximise active
student involvement in them.
10. Building in an awareness of co-production of outcomes with others.
Practitioners not only work with others, but they co-produce with them.
This implies that there need to be assessment tasks in which students co-construct outcomes. While this does not necessarily require group assessment as such, it does require designing activities with multi-participant outcomes into an overall regime.

The challenge the practice perspective creates is to find ways of making some
of these shifts in assessment activities in a higher education context that is
moving rapidly in an outcomes-oriented direction, but which embodies the
cultural practices of an era deeply sceptical of the excessively vocational. The
implication of taking such a perspective is not that more resources are required
or that we need to scrap what we are doing and start again. It does however
require a profound change of perspective. We need to move from privileging our own academic content and assessing students as if our part of the course were more important than anything else, to a position that is more respectful of the
use of knowledge, of the program as a whole and the need to build the capacity of
students to learn and assess for themselves once they are out of our hands. Some
of the changes required are incremental and involve no more than altering
existing assessment tasks by giving them stronger contextual features. However,
others create a new agenda for assessment and provoke us to find potentially
quite new assessment modes that involve making judgements in co-production of
knowledge. If we are to pursue this agenda, we need to take up the challenges and
operate in ways in which our graduates are being increasingly required to operate
in the emerging kinds of work of the twenty-first century.

References
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education,
5(1), 7–74.
Boud, D. (1995). Enhancing learning through self assessment. London: Kogan Page.
Boud, D. (2006, July). Relocating reflection in the context of practice: Rehabilitation or
rejection? Keynote address presented at Professional Lifelong Learning: Beyond Reflec-
tive Practice, a conference held at Trinity and All Saints College. Leeds: Institute for
Lifelong Learning, University of Leeds. Retrieved 20 October, 2007, from http://www.
leeds.ac.uk/educol/documents/155666.pdf
Boud, D. (2007). Reframing assessment as if learning was important. In D. Boud &
N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer
term (pp. 14–25). London: Routledge.
Boud, D., & Falchikov, N. (2006). Aligning assessment with long term learning. Assessment
and Evaluation in Higher Education, 31(4), 399–413.
Boud, D., & Falchikov, N. (2007). Developing assessment for informing judgement. In D. Boud & N. Falchikov (Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 181–197). London: Routledge.
Gibbs, G. (2006). How assessment frames student learning. In K. Clegg & C. Bryan (Eds.),
Innovative assessment in higher education. London: Routledge.
Hager, P., & Butler, J. (1996). Two models of educational assessment. Assessment and
Evaluation in Higher Education, 21(4), 367–378.
Knight, P. (2007). Grading, classifying and future learning. In D. Boud & N. Falchikov (Eds.),
Rethinking assessment in higher education: Learning for the longer term (pp. 72–86). London: Routledge.
Kvale, S. (2008). A workplace perspective on school assessment. In A. Havnes & L. McDowell (Eds.), Balancing dilemmas in assessment and learning in contemporary education (pp. 197–208). New York: Routledge.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation.
Cambridge, UK: Cambridge University Press.
Nicol, D. J., & MacFarlane-Dick, D. (2006). Formative assessment and self-regulated learn-
ing: A model and seven principles of good feedback practice. Studies in Higher Education,
31(2), 199–218.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instruc-
tional Science, 18, 119–144.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education,
5(1), 77–84.
Schatzki, T. R. (2001). Introduction: Practice theory. In T. Schatzki, K. Knorr Cetina, &
E. von Savigny (Eds.), The practice turn in contemporary theory (pp. 1–14). London:
Routledge.
Schwandt, T. (2005). On modelling our understanding of the practice fields. Pedagogy,
Culture & Society, 13(3), 313–332.
Tanggaard, L., & Elmholdt, C. (2008). Assessment in practice: An inspiration from appren-
ticeship. Scandinavian Journal of Educational Research, 52(1), 97–116.
Chapter 4
Transforming Holistic Assessment and Grading
into a Vehicle for Complex Learning

D. Royce Sadler

Introduction

One of the themes running through my work since 1980 has been that students
need to develop the capacity to monitor the quality of their own work during its
actual production. For this to occur, students need to appreciate what consti-
tutes work of higher quality; to compare the quality of their emerging work with that higher quality; and to draw on a store of tactics to modify their work as
necessary. In this chapter, this theme is extended in two ways. The first is an
analysis of the fundamental validity of using preset criteria as a general
approach to appraising quality. The second is a teaching design that enables
holistic appraisals to align pedagogy with assessment.
For the purposes of this chapter, a course refers to a unit of study that forms a
relatively self-contained component of a degree program. A student response to
an assessment task is referred to as a work. The assessed quality of each work is
represented by a numerical, literal or verbal mark or grade. Detailed feedback
from the teacher may accompany the grade. For the types of works of interest in
this chapter, grades are mostly produced in one of two ways.
In analytic grading, the teacher makes separate qualitative judgments on a
limited number of properties or criteria. These are usually preset, that is, they
are nominated in advance. Each criterion is used for appraising each student’s
work. The teacher may prescribe the criteria, or students and teachers may
negotiate them. Alternatively, the teacher may require that students develop
their own criteria as a means of deepening their involvement in the assessment
process. In this chapter, how the criteria are decided is not important. After the
separate judgments on the criteria are made, they are combined using a rule or
formula, and converted to a grade. Analytic grading is overtly systematic. By
identifying the specific elements that contribute to the final grade, analytic
grading provides the student with explicit feedback. The template used in
implementing the process may be called a rubric, or any one of scoring, marking
or grading paired with scheme, guide, matrix or grid. As a group, these models
are sometimes referred to as criterion-based assessment or primary trait analysis.
In holistic or global grading, the teacher responds to a student’s work as a
whole, then directly maps its quality to a notional point on the grade scale.
Although the teacher may note specific features that stand out while appraising,
arriving directly at a global judgment is foremost. Reflection on that judgment
gives rise to an explanation, which necessarily refers to criteria. Holistic grading
is sometimes characterised as impressionistic or intuitive.
The relative merits of analytic and holistic grading have been debated for
many years, at all levels of education. The most commonly used criterion for
comparison has been scorer reliability. This statistic measures the degree of
consistency with which grades are assigned to the same set of works by different
teachers (inter-grader reliability), or by the same teacher on separate occasions
(temporal reliability). Scorer reliability is undoubtedly a useful criterion, but is
too narrow on its own. It does not take into account other factors such as the
skills of the markers in each method, or the extent to which each method is able
to capture all the dimensions that matter.
The use of analytic grading schemes and templates is now firmly established
in higher education. Internationally, rapid growth in popularity has occurred
since about 1995. Nevertheless, the basic ideas are not new. Inductively decom-
posing holistic appraisals goes back at least to 1759, when Edmund Burke set
out to identify the properties that characterise beautiful objects in general. In
the forward direction, the theory and practice of assembling overall judgments
from discrete appraisals on separate criteria has developed mostly over the last
50 years. It has given rise to an extensive literature touching many fields. Key
research areas have been clinical decision making (Meehl, 1954/1996) and
human expertise of various types (Chi, Glaser & Farr, 1988; Ericsson &
Smith, 1991). The terminology used is diverse, and includes ‘policy capturing’
and ‘actuarial methods’.
Specifically in educational assessment, Braddock, Lloyd-Jones, and Schoer
(1963) reported early developmental work on analytic approaches to grading
English composition, and the rationale for it; primary trait scoring is described
in Lloyd-Jones (1977). Researchers in higher education assessment have
explored in recent years the use of criteria and rubrics, specifically involving
students in self- and peer-assessment activities (Bloxham & West, 2004;
Orsmond, Merry & Reiling, 2000; Rust, Price & O’Donovan, 2003; Woolf,
2004). Many books on assessment in higher education advocate analytic
grading, and provide practitioners with detailed operational guidelines.
Examples are Freeman and Lewis (1998), Huba and Freed (2000), Morgan,
Dunn, Parry, and O’Reilly (2004), Stevens and Levi (2004), Suskie (2004), and
Walvoord and Anderson (1998).
For the most part, both the underlying principles and the various methods of
implementation have been accepted uncritically. In this chapter, the sufficiency
of analytic grading as a general approach for relevant classes of student works is
called into question, on both theoretical and practical grounds. The basic
reason is that it sets up appraisal frameworks that are, in principle, sub-optimal.
Although they work adequately for some grading decisions, they do a disservice
to others by unnecessarily constraining the scope of appraisals. The assumption
that using preset criteria is unproblematic has had two inhibiting effects. First,
teachers typically have not felt free to acknowledge, especially to students, the
existence or nature of certain limitations they encounter. Second, there has been
little or no imperative to explore and develop alternative ways forward.
The theme of this chapter is developed around five propositions. The first
four are dealt with relatively briefly; the fifth is assigned a section of its own. The
driving principle is that if students are to achieve consistently high levels of
performance, they need to develop a conceptualisation of what constitutes
quality as a generalised attribute (Sadler, 1983). They also need to be inducted
into evaluating quality, without necessarily being bound by tightly specified
criteria. This approach mirrors the way multi-criterion judgments are typically
made by experienced teachers. It is also an authentic representation of the ways
many appraisals are made in a host of everyday contexts by experts and
non-experts alike. Equipping students with evaluative insights and skills there-
fore contributes an important graduate skill. All five propositions are taken into
account in the second half of the chapter, which outlines an approach to the
assessment of complex student productions.

Applicable Types of Assessment Tasks


The types of tasks to which this chapter applies are those that require divergent
or ‘open’ responses from students. Divergent tasks provide opportunities
for learners to demonstrate sophisticated cognitive abilities, integration of
knowledge, complex problem solving, critical reasoning, original thinking,
and innovation. Producing a response requires abilities in both design and
production, allowing considerable scope for creativity. There are no formal
techniques or recipes which, if followed precisely, would lead to high-quality
responses. There is also no single correct or best answer, result or solution.
Common formats for divergent responses include field and project reports,
seminar presentations, studio and design productions, specialised artefacts, pro-
fessional performances, creative works, term papers, essays, and written assign-
ments. In assessing achievement across a broad range of disciplines and
professions, divergent responses predominate. Within each genre, student
works may take quite different forms, yet be of comparable quality. This char-
acteristic is regarded as highly desirable in many domains of higher education.
Determining the quality of divergent types of works requires skilled, qualita-
tive judgments using multiple criteria. A qualitative judgment is one made
directly by a person, the person’s brain being both the source and the instrument
for appraisal (Sadler, 1989). The judgment cannot be reduced to a set
of measurements or formal procedures that lead to the ‘correct’ appraisal.
Qualitative judgments are unavoidable in many fields of higher education, and
both holistic and analytic grading are based on them. The two approaches differ
primarily in their granularity. Holistic grading involves appraising student works
as integrated entities; analytic grading requires criterion-by-criterion judgments.
Historically, a steady swing has occurred away from holistic and towards
analytic judgments, and then a further trend has occurred within analytic
judgments. When scoring guides and marking schemes first became common,
the focus tended to be on either the inclusion or omission of specific content, or
the structure of the response. For a written piece, this structure could be
Introduction, Statement of the problem, Literature review, Development of
an argument or position, and Conclusion. The subsequent shift has concen-
trated on properties or dimensions related to quality. Regardless of focus, all
analytic grading schemes introduce formal structure into the grading process,
ostensibly to make it more objective and thus reduce the likelihood of favourit-
ism or arbitrariness.

The First Four Propositions


Already referred to briefly above, Proposition 1 is that students need to develop
the capacity to monitor the quality of their work during its actual production
(Sadler, 1989). In relation to creating responses to assessment tasks, this
capability needs to be acquired as a course proceeds. Teaching therefore
needs to be designed so as to make specific provision for its development. As
the learning sequence progresses, students’ understanding of quality needs not
only to grow but also to become broadly consonant with that held by the
teacher. This is partly because the teacher usually has a strong say in the final
grade, and partly because the teacher’s feedback does not make much sense
otherwise. But there are deeper implications. Ultimately, the concept of quality
needs to relate to works that graduates will produce after their formal studies
are completed, as they demonstrate professional expertise. This implies, there-
fore, that the teacher’s frame of reference about quality should reflect the
conventions and expectations evident in other relevant environments such as
the arts and professions, industry, and elsewhere in academia.
Self-monitoring means that students make conscious judgments on their
own, without help from teachers or peers. It entails being weaned away from
ongoing dependence on external feedback, irrespective of its source or char-
acter. Among other things, self-monitoring requires an appreciation of what
makes a work of high quality. It also requires enough evaluative skill to
compare, with considerable detachment, the quality of what the producer is
creating with what would be needed for it to be of high quality. For self-
monitoring to have any impact on an emerging work, the student also needs a
repertoire of alternative moves upon which to draw at any pertinent point or
stage in the development. Otherwise the work cannot be improved. This in turn
necessitates that the student becomes sensitive to where those ‘pertinent points’
are, as they arise, during construction.
Many of the students whose usual levels of performance are mediocre are
hampered by not knowing what constitutes work of high quality. This sets an
upper bound on their ability to monitor the quality of their own developing
work (Sadler, 1983). Raising students’ knowledge about high quality and their
capability in self-monitoring can lead to a positive chain of events. These are
improved grades, increased intrinsic satisfaction, enhanced motivation and, as
a consequence, higher levels of achievement. Within the scope of a single course,
it is obviously not realistic to expect learners to become full connoisseurs. But as
a course proceeds, the learners’ judgments about the quality of their own works
should show progressively smaller margins of error. Self-monitoring raises self-
awareness and increases the learner’s metacognition of what is going on.
Attaining this goal is not intrinsically difficult, but it does require that a number
of specific conditions be met. Not to achieve the goal, however, represents a
considerable opportunity loss.
Proposition 2 is that students can develop evaluative expertise in much the
same way as they develop other knowledge and skills, including the substantive
content of a course. Skilled appraisal is just one of many types of expertise,
although it seldom features explicitly among course objectives. Initially, a key
tool for developing it is credible feedback, primarily from the teacher and peers.
Feedback usually takes the form of descriptions, explanations or advice
expressed in words. Preset criteria coupled with verbal feedback stem from a
desire to tell or inform students. It might be thought that the act of telling would
serve to raise the performance ceiling for learners, but just being told is rarely an
adequate remedy for ignorance. The height of the ceiling depends on what
students make of what is ‘told’ to them.
The next step along the path can be taken when relevant examples
complement the verbal descriptions (Sadler 1983, 1987, 1989, 2002). Examples
provide students with concrete referents. Without them, explanatory
comments remain more or less abstract, and students cannot interpret them
with certainty. If the number of examples is small, they need to be chosen as
judiciously as examples are for teaching. If examples are plentiful, careful selection
is not as critical, provided they cover a considerable range of the quality spectrum.
Even more progress can be made if teachers and learners actively discuss the
descriptions and exemplars together. Reading verbal explanations, seeing perti-
nent exemplars, and engaging in discourse provide categorically different cogni-
tive inputs. But the best combination of all three still does not go far enough.
The remaining element is that students engage in making evaluative
decisions themselves, and justifying those decisions. No amount of telling,
showing or discussing is a substitute for one’s own experience (Sadler 1980,
1989). The student must learn how to perceive works essentially through the
eyes of an informed critic, eventually becoming ‘calibrated’. Learning
environments are self-limiting to the extent that they fail to make appropriate
provision for students to make, and be accountable for, serious appraisals.
Proposition 3 is that students’ direct evaluative experience should be relevant
to their current context, not translated from another. The focus for their
experience must therefore be works of a genre that is substantially similar to
the one in which they are producing. Apart from learning about quality, closely
and critically examining what others have produced in addressing assessment
tasks expands the student’s inventory of possible moves. These then become
available for drawing upon to improve students’ own work. This is one of the
reasons peer assessment is so important.
However, merely having students engage in peer appraisal in order to make
assessment more participatory or democratic is not enough. Neither is treating
students as if they were already competent assessors whose appraisals deserve
equal standing with those of the teacher, and should therefore contribute to
their peers’ grades. The way peer assessment is implemented should reflect the
reasons for doing it. Learners need to become reasonably competent not only at
assessing other students’ works but also at applying that knowledge to their
own works.
Proposition 4 is that the pedagogical design must function not only effectively
but also efficiently for both teachers and students. There is obviously little point
in advocating changes to assessment practices that are more labour intensive than
prevailing procedures. Providing students with sufficient direct evaluative experi-
ence can be time consuming unless compensating changes are made in other
aspects of the teaching. This aspect is taken up later in the chapter.

The Fifth Proposition


Ideally, students should learn how to appraise complex works using approaches
that possess high scholarly integrity, are true to the ways in which high-quality
judgments are made professionally, and have considerable practical potential
for improving their own learning. Proposition 5 is that students in many higher
education contexts should learn how to make judgments about the quality of
emerging and finished works holistically rather than using analytic schemes.
The case for this proposition is developed by first analysing the rationale for
analytic judgments, and then mounting a critique of the method generally.
In recent years, analytic grading schemes using preset criteria have been
advocated as superior to holistic appraisals. The rationale for this is more
often implied than stated. Basically, it is that such systems:

a) improve consistency and objectivity in grading, because the appraisal
process is broken down into smaller-scale judgments;
b) make transparent to students, as an ethical obligation, the key qualities that
will be taken into account;
c) encourage students to attend to the assessment criteria during development
of their work, so the criteria can play a product-design role which comple-
ments the assessment task specifications;
d) enable grading decisions to be made by comparing the quality of a student’s
work with fixed criteria and standards rather than to the learner’s previous
level of achievement, the performance of others in the class, or the teacher’s
personal tastes or preferences; and
e) provide accurate feedback more efficiently, with less need for the teacher to
write extensive comments.

These arguments appear sound and fair. Who would argue against an
assessment system that provides more and better feedback, increases transpar-
ency, improves accountability, and achieves greater objectivity, all with no
increase in workload? On the other hand, the use of preset criteria accompanied
by a rule for combining judgments is not the only way to go. In the critique
below, a number of issues that form the basis of Proposition 5 are set out. The
claim is that no matter how comprehensive and precise the procedures are, or
how meticulously they are followed, they can, and for some student works do,
lead to deficient or distorted grading decisions. This is patently unfair to those
students.
In the rest of this section, the case for holistic judgments is presented in
considerable detail. Although any proposal to advocate holistic rather than
analytic assessments might be viewed initially as taking a backward step, this
chapter is underwritten by a strong commitment to students and their learning.
Specifically, the aim is to equip students to work routinely with holistic apprai-
sals; to appreciate their validity; and to use them in improving their own work.
The five clauses in the rationale above are framed specifically in terms of criteria
and standards. This wording therefore presupposes an analytic model for grad-
ing. By the end of this chapter, it should become apparent how the ethical
principles behind this rationale can be honoured in full through alternative
means. The wording of the rationale that corresponds to this alternative would
then be different.

Beginning of the Case

Whenever a specific practice becomes widespread in a field of human activity,
each implementation of it contributes to its normalisation. The message that
this practice is the only or preferred approach does not have to be commu-
nicated explicitly. Consistent uncritical use sends its own strong signals. In this
section, two particular analytic assessment schemes, analytic rating scales and
analytic rubrics, are singled out for specific attention. Both are common in
higher education, and both are simple for students to understand and apply.
With analytic rating scales, multiple criteria (up to 10 or more for complex
works) are first specified. Each criterion has an associated scale line defined. In
use, the appraiser makes a qualitative judgment about the ‘strength’ or level of
the work on each criterion, and marks the corresponding point on the scale
line. If a total score is required, the relative importance of each criterion is
typically given a numerical weighting. The sub-scores on the scales are multi-
plied by their respective weightings. Then all the weighted scores are summed,
and the aggregate either reported directly or turned into a grade using a
conversion table.
An analytic rubric has a different format. Using the terminology in Sadler
(1987, 2005), a rubric is essentially a matrix of cross-tabulated criteria and
standards or levels. (This nomenclature is not uniform. Some rubrics use
qualities/criteria instead of criteria/standards as the headings.) Each standard
represents a particular level on one criterion. Common practice is for the
number of standards to be the same for all criteria, but this is not strictly
necessary. A rubric with five criteria, each of which has four standards, has a
total of 20 cells. Each cell contains a short verbal description that sets out a
particular strength of the work on the corresponding criterion. This is usually
expressed either as a verbal quantifier (how much), or in terms of sub-attributes
of the main criterion. For each student work, the assessor identifies the single
cell for each criterion that best seems to characterise the work. The rubric may
also include provision for the various cell standards to carry nominated ranges
of numerical values that reflect weightings. A total score can then be calculated
and converted to a grade.
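Again as an illustrative sketch only, the cell-selection and totalling mechanics of an analytic rubric might be represented like this; the criteria, cell descriptors, point values and grade cut-offs are all invented for the example.

```python
# Hypothetical analytic rubric: each criterion has four standards (cells), each cell
# pairing a short descriptor with a nominated point value that reflects its weighting.
rubric = {
    "relevance": [("barely addresses the task", 1), ("addresses parts of the task", 2),
                  ("addresses most of the task", 3), ("fully addresses the task", 4)],
    "coherence": [("disjointed throughout", 1), ("some logical flow", 2),
                  ("mostly well organised", 3), ("tightly argued throughout", 4)],
    "support for assertions": [("assertions unsupported", 2), ("limited evidence", 4),
                               ("sound evidence for most claims", 6),
                               ("strong evidence throughout", 8)],
}

# For one work, the assessor identifies the single cell (by index) that best seems to
# characterise the work on each criterion (illustrative choices only).
chosen_cells = {"relevance": 3, "coherence": 2, "support for assertions": 2}

# Cell values are totalled and the total converted to a grade (cut-offs invented).
total = sum(rubric[criterion][cell][1] for criterion, cell in chosen_cells.items())
grade = "A" if total >= 14 else "B" if total >= 11 else "C" if total >= 8 else "D"
print(f"Total: {total} -> Grade: {grade}")  # 13 -> B for these choices
```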
Holistic rubrics form a less commonly used category than the two above.
They associate each grade level with a reasonably full verbal description,
which is intended as indicative rather than definitive or prescriptive. These
descriptions do not necessarily refer to the same criteria for all grade levels.
Holistic rubrics are essentially different from the other two and have a differ-
ent set of limitations, particularly in relation to feedback. They are not
considered further here.
In most analytic schemes, there is no role for distinctly global primary
appraisals. At most, there may be a criterion labelled overall assessment that
enjoys essentially the same status as all the other criteria. Apart from the
technical redundancy this often involves, such a concession to valuing overall
quality fails to address what is required. Students therefore learn that global
judgments are normally compounded from smaller-scale judgments. The learn-
ing environment offers regular reinforcement, and the stakes for the student
are high.
Grading methods that use preset criteria with mechanical combination
produce anomalies. The two particular anomalies outlined below represent
recurring patterns, and form part of a larger set that is the subject of ongoing
research. These anomalies are detected by a wide range of university teachers, in
a wide range of disciplines and fields, for a wide range of assessment task types
and student works. The same anomalies are, however, invisible to learners, and
the design of the appraisal framework keeps them that way. Therein lies the
problem.

Anomaly 1

Teachers routinely discover some works for which global impressions of their
quality are categorically at odds with the outcomes produced by conscientious
implementation of the analytic grading scheme. Furthermore, the teacher is at a
loss to explain why. A work which the teacher would rate as ‘brilliant’ overall
may not be outstanding on all the preset criteria. The whole actually amounts
to more than the sum of its parts. Conversely, a work the teacher would rate as
mediocre may come out extremely well on the separate criteria. For it, the whole
is less than the sum of its parts. This type of mismatch is not confined to
educational contexts.
Whenever a discrepancy of this type is detected, teachers who accept as
authoritative the formula-based grade simply ignore their informal holistic
appraisal without further ado. Other teachers react differently. They question
why the analytic grade, which was painstakingly built from component judg-
ments, fails to deliver the true assessment, which they regard as the holistic
judgment. In so doing, they implicitly express more confidence in the holistic
grade than in the analytic. To reconcile the two, they may adjust the reported
levels on the criteria until the analytic judgment tells the story it ‘should’. At the
same time, these teachers remain perplexed about what causes such anomalies.
They feel especially troubled about the validity and ethics of what could be
interpreted by others as fudging. For these reasons, they are generally reluctant
to talk about the occurrence of these anomalies to teaching colleagues or to
their students. However, many are prepared to discuss them in secure research
environments.
What accounts for this anomaly? There are a number of possible contributing
factors. One is that not all the knowledge a person has, or that a group of people
share, can necessarily be expressed in words (Polanyi, 1962). There exists no
theorem to the effect that the domain of experiential or tacit knowledge, which
includes various forms of expertise, is co-extensive and isomorphic with the
domain of propositional knowledge (Sadler, 1980). Another possible factor is
the manner in which experts intuitively process information from multiple
sources to arrive at complex decisions, as summarised in Sadler (1981). Certain
holistic appraisals do not necessarily map neatly onto explicit sets of specified
criteria, or simple rules for combination. Further exploration of these aspects
lies outside the scope of this chapter. It would require tapping into the extensive
literature on the processes of human judgment, and into the philosophy of
so-called ineffable knowledge.

Anomaly 2
This anomaly is similar to the first in that a discrepancy occurs between the
analytically derived grade and the assessor’s informal holistic appraisal. In this
case, however, the teacher knows that the source of the problem is with a
particular criterion that is missing from the preset list. That criterion may be
important enough to set the work apart, almost in a class of its own. To simplify
the analysis, assume there is just one such criterion. Also assume that the
teacher has checked that this criterion is not included on the specified list in
some disguised form, such as an extreme level on a criterion that typically goes
under another name, or some blend of several criteria. Strict adherence to the
analytic grading rule would disallow this criterion from entering, formally or
informally, into either the process of grading or any subsequent explanation. To
admit it would breach the implicit contract between teacher and student that
only specified criteria will be used.
Why do sets of criteria seem comprehensive enough for adequately apprais-
ing some works but not others? Part of the explanation is that, for complex
works, a specified set of criteria is almost always a selection from a larger pool
(or population) of criteria. By definition, a sample does not fully represent a
population. Therefore, applying a fixed sample of criteria to all student works
leaves open the possibility that some works may stand out as exceptional on
the basis of unspecified criteria. This applies particularly to highly divergent,
creative or innovative responses. Arbitrarily restricting the set of criteria for
these works introduces distortion into the grading process, lowering its
validity.
To illustrate this sampling phenomenon, consider a piece of written work
such as an essay or term paper, for which rubrics containing at least four criteria
are readily available. Suppose the teacher’s rubric prescribes the following
criteria:
 relevance
 coherence
 presentation
 support for assertions.
Behind these sits a much larger pool of potentially valid criteria. An idea of the
size of this pool can be obtained by analysing a large number of published sets.
One such (incomplete) collection was published in Sadler (1989). In alphabetical
order, the criteria were:
accuracy (of facts, evidence, explanations); audience (sense of); authenticity; clarity;
coherence; cohesion; completeness; compliance (with conventions of the genre);
comprehensiveness; conciseness (succinctness); consistency (internal); content
(substance); craftsmanship; depth (of analysis, treatment); elaboration; engagement;
exemplification (use of examples or illustrations); expression; figures of speech; flair;
flavour; flexibility; fluency (or smoothness); focus; global (or overall) development;
grammar; handwriting (legibility); ideas; logical (or chronological) ordering (or
control of ideas); mechanics; novelty; objectivity (or subjectivity, as appropriate);
organization; originality (creativity, imaginativeness); paragraphing; persuasiveness;
presentation (including layout); punctuation (including capitalization); readability;
referencing; register; relevance (to task or topic); rhetoric (or rhetorical effectiveness);
sentence structure; spelling; style; support for assertions; syntax; tone; transitions;
usage; vocabulary; voice; wording.

The majority of these would be familiar to experienced teachers, but dealing
with them as separate properties is not at all straightforward. In the abstract,
the listed criteria may appear to represent distinct qualities. When they come to
be applied, however, some seem to merge into others. The reasons are manifold.
First, they are uneven in their scope, some being broad, others narrow. Second,
even common criteria often lack sharp boundaries and standardised interpreta-
tions, being defined differently by different assessors. When their meanings are
probed, vigorous debate usually ensues. Some interpretations are contextually
dependent, being defined differently by the same teacher for different assess-
ment tasks. Third, some are subtle or specialised. An example is flair, which is
relevant to writing and many other creative arts, and appears to capture a
special and valued characteristic that is hard to describe. Fourth, some criteria
are effectively alternatives to, or nested within, others. In addition, one cluster
of criteria may have the same coverage as another cluster, but without one-to-
one correspondence. Finally, suppose it were possible to assemble and clarify
the whole population of relevant criteria. Any attempt to use them all would be
unworkable for assessors and students alike. The obvious way out of this bind is
to restrict the list of criteria to a manageable number of, say, the most impor-
tant, or the most commonly used. In contexts similar to that of written works,
such restriction necessarily leaves out the majority.

A Way Forward: Design for Teaching and Learning


Should the teacher disclose the existence of anomalies to students and, in so
doing, expose the weaknesses of analytic grading? A substantial part of the
rationale for prior specification is that students are entitled to have advance
knowledge of the basis for their teachers’ appraisals. Traditionally, holistic
grading allowed the teacher to keep the reasons more or less private. Given
the anomalies above, it is ironic that strictly honouring a commitment to preset
criteria can be achieved only at the expense of non-disclosure of anomalies
and other limitations once the grading is completed.
The two anomalies above, along with other deficiencies not covered in this
chapter, raise doubts about the sufficiency of relying on analytic frameworks
that use prespecified criteria. These difficulties are structural, and cannot be
dealt with by making templates more elaborate. If there is to be a way forward,
it has to approach the problem from a different direction. The rest of this
chapter is devoted to outlining a design that has dual characteristics. It seeks
to reinstate holistic appraisals for grading a large and significant class of
complex student works. It also allows for at least some criteria to be specified
in advance – without grade determination being dependent on following an
inflexible algorithm. Although this is a step back from grading all responses to
an assessment task by fixing both the criteria and the combination formula, the
commitment to validity, openness and disclosure can remain undiminished.
This goal can be achieved by shifting the agenda beyond telling, showing and
discussing how appraisals of students’ works are to be made. Instead, the
process is designed to induct students into the art of making appraisals in a
substantive and comprehensive way. Such a pedagogical approach is integral to
the development of a valid alternative not only to analytic grading but also to
holistic grading as done traditionally.
This induction process can function as a fundamental principle for learning-
oriented assessment. Properly implemented, it recognises fully the responsibility of
the teacher to bring students into a deep knowledge of how criteria actually
function in making a complex appraisal, and of the need for the assessor to supply
adequate grounds for the judgment. In the process, it sets learners up
for developing the capability to monitor the quality of their own work during
its development. In particular, it provides for an explicit focus on the quality of how
the student work is coming together – as a whole – at any stage of development.
The approach outlined below draws upon strategies that have been trialled
successfully in higher education classes along with others that are worthy of
further exploration and experimentation. Major reform of teaching and learn-
ing environments requires significant changes to approaches that have become
deeply embedded in practice over many years. Because the use of preset criteria
has gained considerable momentum in many academic settings, transition
problems are to be expected. Some ideas for identifying and managing these
are also included.

Developing Expertise

The general approach draws from the work of Polanyi (1962). It could be
described as starting learners on the path towards becoming connoisseurs.
There is a common saying that goes, ‘‘I do not know how to define quality,
but I know it when I see it’’. In this statement, to know quality is to recognise,
seize on, or apprehend it. To recognise quality ‘‘when I see it’’ means that I can
recognise it – in particular cases. The concrete instance first gives rise to
perception, then to recognition. Formally defining quality is a different matter
altogether. Whether it is possible for a particular person to construct a defini-
tion depends on a number of factors, such as their powers of abstraction and
articulation. There is a more general requirement, not limited to a particular
person. Are there enough similarly classified cases that share enough key
characteristics to allow for identification, using inductive inference, of what
they have in common? The third factor is touched on in the account of the first
anomaly described above. It is that some concepts appear to be, in principle,
beyond the reach of formal definition. Many such concepts form essential
elements of our everyday language and communication, and are by no means
esoteric. Given these three factors, there is no logical reason to assume that
creating a formal definition is always either worth the effort, or even possible.
On the other hand, it is known empirically that experts can recognise quality
(or some other complex characteristic) independently of knowing any
definition. This is evidence that recognition can function as a fundamentally
valid, primary act in its own right (Dewey, 1939). It is also what makes
connoisseurship a valuable phenomenon, rather than one that is just intriguing.
Basically, connoisseurship is a highly developed form of competence in quali-
tative appraisal. In many situations, the expert is able to give a comprehensive,
valid and carefully reasoned explanation for a particular appraisal, yet is unable
to do so for the general case. In other situations, an explanation for even a
particular case is at best partial. To accept recognition as a primary evaluative
act opens the door to the development of appraisal explanations that are
specifically crafted for particular cases without being constrained by predeter-
mined criteria that apply to all cases. Holistic recognition means that the
appraiser reacts or responds (Sadler, 1985), whereas building a judgment up
from discrete decisions on the criteria is rational and stepwise. That is the key
distinction in intellectual processing. The idea of recognising quality when it
is observed immediately strikes a familiar chord with many people who work
in both educational and non-educational contexts where multi-criterion
judgments are required. Giving primacy to an overall assessment has its coun-
terparts in many other fields and professions.
This does not imply that grading judgments should be entirely holistic or
entirely analytic, as if these were mutually exclusive categories. The two
approaches are by no means incompatible, and how teachers use them is a
matter of choice. To advocate that a teacher should grade solely by making
global judgments without reference to any criteria is as inappropriate as
requiring all grades to be compiled from components according to set rules.
Experienced assessors routinely alternate between the two approaches in order
to produce what they consider to be the most valid grade. This is how they
detect the anomalies above, even without consciously trying. In doing this, they
tend to focus initially on the overall quality of a work, rather than on its
separate qualities. Among other things, these assessors switch focus between
global and specific characteristics, just as the eye switches effortlessly between
foreground (which is more localised and criterion-bound) and background
(which is more holistic and open). Broader views allow things to be seen in
perspective, often with greater realism. They also counter the atomism that
arises from breaking judgments up into progressively smaller elements in a bid
to attain greater precision.
Inducting students into the processes of making multi-criterion judgments
holistically, and only afterwards formulating valid reasons for them, requires a
distinctive pedagogical environment. The end goal is to bring learners at least
partly into the guild of professionals who are able to make valid and reliable
appraisals of complex works using all the tools at their disposal. Among other
things, students need to learn to run with dual evaluative agendas. The first
involves scoping the work as a whole to get a feel for its overall quality; the
second is to pay attention to its particular qualities.
In some learning contexts, students typically see only their own works. Such
limited samples cannot provide a sufficient basis for generalisations about
quality. For students to develop evaluative expertise requires that three inter-
connected conditions be satisfied. First, students need exposure to a wide
variety of authentic works that are notionally within the same genre. This is
specifically the same genre students are working with at the time. The most
readily available source of suitable works consists of responses from other
students attempting the same assessment task. As students examine other
students’ works through the lens of critique, they find that peer responses can
be constructed quite differently from one another, even those that the students
themselves would judge to be worth the same grade. They also discover, often
to their surprise, works that do not address the assessment task as it was
specified. They see how some of these works could hardly be classified as
valid ‘‘responses’’ to the assessment task at all, if the task specifications were
to be taken literally. This phenomenon is common knowledge among higher
education teachers, and is a source of considerable frustration to them.
The second condition is that students need access to works that range across
the full spectrum of quality. Otherwise learners have difficulty developing the
concept of quality at all. To achieve sufficient variety, the teacher may need to
supplement student works from the class with others from outside. These may
come from different times or classes, or be specially created by the teacher or
teaching colleagues. The third condition is that students need exposure to
responses from a variety of assessment tasks. In a fundamental sense, a develop-
ing concept of quality cannot be entirely specific to a particular assessment task.
In the process of making evaluations of successive works and constructing
justifications for qualitative judgments, fresh criteria emerge naturally. The
subtlety and sophistication of these criteria typically increase as evaluative
expertise develops. A newly relevant criterion is drawn – on demand – from
the larger pool of background or latent criteria. It is triggered or activated by
some property of a particular work that is noteworthy, and then added
temporarily to the working set of manifest criteria (Sadler, 1983). The work
may exhibit this characteristic to a marked degree, or to a negligible degree. On
either count, its relevance signals that it needs to be brought into the working
set. As latent criteria come to the fore, they are used in providing feedback on
the grade awarded.
Latent criteria may also be shared with other students in the context of the
particular work involved. Starting with a small initial pool of criteria, extending
it, and becoming familiar with needs-based tapping into the growing pool,
constitute key parts of the pedagogical design. This practice expands students’
personal repertoires of available criteria. It also reinforces the way criteria are
translated from latent to manifest, through appraising specific works. As addi-
tional criteria need to be brought into play, a class record of them may be kept
for interest, but not with a view to assembling a master list. This is because the
intention is to provide students with experience in the latent-to-manifest trans-
lation process, and the limitations inherent in using fixed sets of criteria.
Students should then come to understand why this apparently fluid approach to
judging quality is not unfair or some sort of aberration. It is, in a profound
sense, rational, normal and professional.
Although it is important that students appreciate why it is not always possible
to specify all the criteria in advance, certain criteria may always be relevant. In
the context of written works, for example, some criteria are nearly always
important, even when not stated explicitly. Examples are grammar, punctuation,
referencing style, paragraphing, and logical development, sometimes grouped as
mechanics. These relate to basic communicative and structural features. They
facilitate the reader’s access to the creative or substantive aspects, and form part
of the craft or technique side of creative work. Properly implemented, they are
enablers for an appraisal of the substance of the work. A written work is difficult
to appraise if the vocabulary or textual structure is seriously deficient.
In stark contrast to the student’s situation, teachers are typically equipped with
personal appraisal resources that extend across all of the aspects above. Their
experience is constantly refreshed by renewed exposure to a wide range of student
works, in various forms and at different levels of quality (Sadler, 1998). This is so
naturally accepted as a normal part of what being a teacher involves that it hardly
ever calls for comment. It is nevertheless easy to overlook the importance of
comparable experience for students grappling with the concept of quality.
Providing direct evaluative experience efficiently is a major design element.
It is labour intensive, but offsets can be made in the way teaching time is
deployed. Evaluative activity can be configured to be the primary pedagogical
vehicle for teaching a considerable proportion of the substantive content of a
course. For example, in teaching that is structured around a lecture-tutorial
format, students may create, in their own time, responses to a task that requires
them to employ specific high-order intellectual skills such as extrapolating,
making structural comparisons, identifying underlying assumptions, mount-
ing counter-arguments, or integrating elements. These assessment tasks should
be strictly formative, and designed so that students can respond to them
successfully only as they master the basic content. Tutorial time is then spent
having students make appraisals about the quality of, and providing informed
feedback on, multiple works of their peers, and entering into discussions about
the process. In this way, student engagement with the substance of the course
takes place through a sequence of produce and appraise rather than study and
learn activities. Adaptations are possible for other modes of teaching, such as
studio and online.

Challenges of Transition

Complex learning, regardless of the field, requires multiple attempts (practice),
in a supportive, low-stakes environment, with good feedback. To make pro-
gress on the road to connoisseurship is to replace initially inconsistent degrees
of success in appraisal, and therefore self-monitoring, by progressively higher
levels of expertise. The pedagogical environment in many higher education
institutions is not set up in ways that facilitate this type of learning. The
necessary changes cut across a number of well-established policies and prac-
tices, and therefore require careful management. Six obstacles or potential
sources of resistance are identified below. The first three are outlined only;
the second three are discussed in more detail.
Obstacle 1 is a view held by many students and teachers, overtly or by
default, that virtually every course exercise or requirement should contribute
towards the course grade. Unless something counts, so the thinking goes, it is
not worth doing. Once this climate is established, teachers know that students
will put little effort into any exercise that does not carry credit towards the
course grade. Student and teacher positions reinforce each other, setting up a
credit accumulation economy. The development of evaluative expertise requires
production, peer appraisal and peer feedback in a context where there is neither
credit nor penalty for trial and error, experimentation, or risk taking. If these
are to become normalised as legitimate processes in learning, ways have to be
found to subvert the credit accumulation economy.
Obstacle 2 arises with teachers who, on occasion, use grading for purposes
other than reporting exclusively on each student’s achievement. Teachers may
inflate a grade to reward a student who has made an outstanding effort, or who
has shown a marked improvement. Such rewards compromise the meaning of
grades and retard the development of evaluative expertise. Students and faculty
have to be hard-nosed in their focus on quality alone, including the match
between the assessment task specifications and the nature of the student
responses. They should therefore rigorously exclude all non-achievement influ-
ences from the assessment environment.
Obstacle 3 arises through institutional policy, grading practices in other
courses, or both. If all or most other courses use rubrics, for example, students
may be wary of any grading method that appears to be unsystematic, subjective
or unfair. Students may also be reluctant to engage in peer appraisal unless they
have a rubric in front of them. Many students have internalised the principle
that using preset criteria is automatically the only or the best way to appraise
complex outcomes. Such conditioning has to be explicitly unlearned (Sadler,
1998). Some ground may be made by explaining the role of appraisal in learning
to produce high-quality works. To develop expertise in this domain follows the
same basic principles that are used in developing other complex skills or forms
of expertise. These involve gathering or receiving information (being told),
processing that information in the light of actual examples (being shown),
and applying the principles for oneself (doing).
Obstacles 4–6 refer specifically to the appraisal processes described in this
chapter, and are conceptually connected. They are expressed particularly
strongly in some cultures. Obstacle 4 is the belief that appraisal or grading is
a teacher role, not a student role. The teacher is perceived not only as the
authoritative figure for course content but also as the only person who has
the knowledge and experience to assign grades. Student peers are simply not
qualified; they may judge too harshly or too leniently, or give feedback that is
superficial and lacking in credibility.
Obstacle 5 has to do with the students’ perceptions of themselves. They may
feel ill-equipped to grade the works of peers. This is true initially, of course, but
changing both the actuality and the self-perception are two of the course goals.
The rationale for using other students’ works is not just one of convenience. It
ensures that the student responses, which are used as raw material, are as
authentic as it is possible to get. This case may be easier to establish with
students if the Course Outline includes an objective specifically to that effect,
and a complementary statement along the following lines:
The learning activities in this course involve self- and peer-assessment. This is an
important part of how the course is taught, and how one of its key objectives will be
attained. Other students will routinely be making judgments about the quality of your
work, and you will make judgments about theirs. The students whose work you
appraise will not necessarily be those who appraise yours. If this aspect of the teaching
is likely, in principle, to cause you personal difficulty, please discuss your concern with
the person in charge of the course within the first two weeks of term.

Obstacle 6 is students’ fear of exposure, loss of face or impending sense of
humiliation among their peers. This may be because they lack experience, status
or skill. These feelings are personal, about themselves and about how confident
they are. Unless students are already familiar with engaging in peer assessment,
they may appear to accept the logic behind a transition to a different pedagogy, but
retain their reservations and reluctance. Such students need reassurance that just
starting on this path is likely to be the hardest part. Once they become accustomed
to it, they typically find it highly rewarding, and their learning improves.
Learning the skills of appraisal and self-monitoring can be compared with
the ways in which many other skills are learned. They are not easy for the
uninitiated, and learners may feel embarrassed at their early attempts if they
know others will become aware of their efforts. By contrast, when young
children are learning to speak, their first bumbling attempts to say a few
words are not treated with scorn. The children are encouraged and cheered on
whenever they get it right. They then repeat the performance, maybe again and
again. Translate this into the higher education context. As soon as students
realise that they are making progress, their confidence grows. They accept new
challenges and become motivated to try for more. Furthermore, they derive joy
and satisfaction from the process. If that could be made a more widespread
phenomenon, it is surely a goal worth striving for.

Conclusion
The practice of providing students with the assessment criteria and their weight-
ings before they respond to an assessment task is now entrenched in higher
education. The rationale contains both ethical and practical elements. Most
analytic rubrics and similar templates fix the criteria and the rule for combining
separate judgments on those criteria to produce the grade. This practice is rarely
examined closely, but in this chapter it is shown to be problematic, in principle
and in practice.
To address this problem, the unique value of holistic judgments needs to be
appreciated, with openness to incorporating criteria that are not on a fixed list.
To maintain technical and personal integrity, a number of significant shifts in
the assessment environment are necessary. The commitment to mechanistic use
of preset criteria needs to be abandoned. Teachers and students need to be
inducted into more open ways of making grading decisions and justifying them.
Students need to be provided with extensive guided experience in making global
judgments of works of the same types they produce themselves. Ultimately, the
aim is for learners to become better able to engage in self-monitoring the
development of their own works.
Shifting practice in this direction requires a substantially different alignment
of pedagogical priorities and processes. It also requires specific strategies to
overcome traditions and sources of potential resistance. The aim is to turn the
processes of making and explaining holistic judgments into positive enablers for
student learning.

Acknowledgment I am grateful to Gordon Joughin for his constant support and critical
readings of different versions of this chapter during its development. His many suggestions
for improvement have been invaluable.

References
Bloxham, S., & West, A. (2004). Understanding the rules of the game: marking peer assess-
ment as a medium for developing students’ conceptions of assessment. Assessment &
Evaluation in Higher Education, 29, 721–733.
Braddock, R., Lloyd-Jones, R., & Schoer, L. (1963). Research in written composition.
Urbana, Ill.: National Council of Teachers of English.
Burke, E. (1759). A philosophical enquiry into the origin of our ideas of the sublime and beautiful,
2nd ed. London: Dodsley. (Facsimile edition 1971. New York: Garland).
Chi, M. T. H., Glaser, R., & Farr, M. J. (Eds.), (1988). The nature of expertise. Hillsdale,
NJ: Lawrence Erlbaum.
Dewey, J. (1939). Theory of valuation. (International Encyclopedia of Unified Science, 2 (4)).
Chicago: University of Chicago Press.
Ericsson, K. A., & Smith, J. (Eds.), (1991). Toward a general theory of expertise: Prospects and
limits. New York: Cambridge University Press.
Freeman, R., & Lewis, R. (1998). Planning and implementing assessment. London: Kogan Page.
Huba, M. E., & Freed, J. E. (2000). Learner-centered assessment on college campuses: Shifting
the focus from teaching to learning. Needham Heights, Mass: Allyn & Bacon.
Lloyd-Jones, R. (1977). Primary trait scoring. In C. R. Cooper & L. Odell (Eds.), Evaluating
writing: Describing, measuring, judging. Urbana, Ill.: National Council of Teachers of
English.
Meehl, P. E. (1996). Clinical versus statistical prediction: A theoretical analysis and a review of
the evidence (New Preface). Lanham, Md: Rowman & Littlefield/Jason Aronson. (Original
work published 1954).
Morgan, C., Dunn, L., Parry, S., & O’Reilly, M. (2004). The student assessment handbook:
New directions in traditional and online assessment. London: RoutledgeFalmer.
Orsmond, P., Merry, S., & Reiling, K. (2000). The use of student derived marking criteria in
peer and self-assessment. Assessment & Evaluation in Higher Education, 25, 23–38.
Polanyi, M. (1962). Personal knowledge. London: Routledge and Kegan Paul.
Rust, C., Price, M., & O’Donovan, R. (2003). Improving students’ learning by developing
their understanding of assessment criteria and processes. Assessment & Evaluation in
Higher Education, 28, 147–164.
Sadler, D. R. (1980). Conveying the findings of evaluative inquiry. Educational Evaluation and
Policy Analysis, 2(2), 53–57.
Sadler, D. R. (1981). Intuitive data processing as a potential source of bias in naturalistic
evaluations. Educational Evaluation and Policy Analysis, 3(4), 25–31.
Sadler, D. R. (1983). Evaluation and the improvement of academic learning. Journal of
Higher Education, 54, 60–79.
Sadler, D. R. (1985). The origins and functions of evaluative criteria. Educational Theory, 35,
285–297.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of
Education, 13, 191–209.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instruc-
tional Science, 18, 119–144.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education:
Principles, Policy & Practice, 5, 77–84.
Sadler, D. R. (2002). Ah! . . . So that’s ‘Quality’. In P. Schwartz & G. Webb (Eds.), Assessment:
case studies, experience and practice from higher education (pp. 130–136). London: Kogan
Page.
Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher
education. Assessment and Evaluation in Higher Education, 30, 175–194.
Stevens, D. D., & Levi, A. J. (2004). Introduction to rubrics: an assessment tool to save grading
time, convey effective feedback and promote student learning. Sterling, Va: Stylus
Publishing.
Suskie, L. (2004). Assessing student learning: A common sense approach. Boston, Mass: Anker
Publishing.
Walvoord, B. E., & Anderson, V. J. (1998). Effective grading: A tool for learning and assessment.
Etobicoke, Ontario: John Wiley.
Woolf, H. (2004). Assessment criteria: Reflections on current practices. Assessment & Evalua-
tion in Higher Education, 29, 479–493.
Chapter 5
Faulty Signals? Inadequacies of Grading Systems
and a Possible Response

Mantz Yorke

Grades are Signals

Criticism of grading is not new. Milton, Pollio and Eison (1986) made a fairly
forceful case that grading was suspect, but to little avail – not surprisingly, since
a long-established approach is difficult to dislodge. There is now more evidence
that grading does not do what many believe it does (especially in respect of
providing trustworthy indexes of student achievement), so the time may be ripe
for a further challenge to existing grading practices.
Grade is inherently ambiguous as a term, in that it can be used in respect of
raw marks or scores, derivatives of raw marks (in the US, for example, the letter
grades determined by conversion of raw percentages), and overall indexes of
achievement. In this chapter the ambiguity is not entirely avoided: however, an
appreciation of the context in which the term ‘grade’ is used should mitigate the
problem of multiple meaning.
Grades are signals of achievement which serve a number of functions includ-
ing the following:
• informing students of their strengths and weaknesses;
• informing academic staff of the success or otherwise of their teaching;
• providing institutions with data that can be used in quality assurance and
enhancement; and
• providing employers and others with information germane to recruitment.
For summative purposes, grading needs to satisfy a number of technical
criteria, amongst which validity and reliability are prominent. (For formative
purposes, the stringency of the criteria can be relaxed – a matter that will not be
pursued here.)
A detailed study of grading in Australia, the UK and the US (Yorke, 2008)
suggests that the signals from grading are not always clear – indeed, the stance
taken in this chapter is that they are generally fuzzy and lack robustness.

M. Yorke
Department of Educational Research, Lancaster University, LA1 4YD, UK
e-mail: mantzyorke@mantzyorke.plus.com

Threats to the Robustness of Grading

Felton and Koper (2005, p. 562) refer to grades as ‘‘inherently ambiguous
evaluations of performance with no absolute connection to educational
achievement’’. This chapter argues that the charge holds, save in a relatively
small number of circumstances where the intended learning and the assessment
process can be very tightly specified (some computer-marked work falls into
this category) – and even then there might be argument about the validity of the
sampling from the universe of possible learning outcomes.
What is the evidence for the charge?

Sampling

Ideally, assessment is based on a representative sampling of content reflecting
the expected learning outcomes for the curriculum. The word representative
indicates that not every aspect of the learning outcomes can be accommodated –
something that dawned rather slowly on those responsible for the system of
National Vocational Qualifications that was promoted in the early 1990s in the
UK, and which sought comprehensiveness of coverage (see Jessup, 1991).
Sampling from the curriculum reflects the preferences of interested parties –
of the institution and, in some instances, of stakeholders from outside. The
validity of the assessment depends on the extent to which the assessment process
captures the achievements expected in the curriculum – a perspective based on
internal consistency. Interested parties from outside the academy are likely to
be more interested in the extent to which the assessment outcomes predict
performance in, say, the workplace, which is a very different matter.
With increased interest by governments in the ability of graduates to demon-
strate their effectiveness in the world beyond academe, the assessment of work-
based and work-related learning is of increasing importance. However, such
assessment raises a number of challenges, some of which are discussed later in
this chapter: for now, all that is necessary is to note the complexity that this
introduces in respect of sampling and validity.

Grading Scales

There is considerable variation in grading scales, from the finely-grained per-
centage scale to the quite coarse scales of grade-points used in some Australian
universities. Where percentages are used, there seem to be norms relating to
both national and disciplinary levels. Regarding the national level, there is a
broad gradation from the high percentages typically awarded in US institutions
to the more modest percentages typical of practice in the UK, with Australian
practice lying in between. In the UK and Australia, the raw percentage is
typically used in classifying performances whereas in the US percentage scores
are converted into grade-points for cumulation into the grade-point average
(GPA). Studies undertaken by the Student Assessment and Classification
Working Group (SACWG) have drawn attention to the variation between
subjects regarding the distribution of percentages: science-based subjects tend
to have wider and flatter distributions than those in the humanities and social
sciences, for example1. Figure 5.1, which is based on marks in first-year modules
in a post-1992 university in the UK, illustrates something of the variability that
can exist in the distributions of marks awarded.

[Figure 5.1 consists of six histograms – Business, Statistics, Sociology, Healthcare,
Modern History and Contract Law – with marks on the horizontal axis and percentage
frequencies (0–12) on the vertical axis.]

Fig. 5.1 Mark distributions for six first-year modules
Notes: The heights of the histogram bars are percentage frequencies, in order to facilitate
comparisons.
Vertical dotted lines indicate the median mark in each case.
In the Contract Law module a bare pass (40%) was awarded to 26 per cent of the student
cohort.

1 It is instructive to look at the statistics produced by the Higher Education Statistics Agency
[HESA] regarding the profiles of honours degree classifications for different subject areas in the
UK (see raw data for ‘Qualifications obtained’ at www.hesa.ac.uk/holisdocs/pubinfo/stud.htm
or, more revealingly, the expression of these data in percentage format in Yorke, 2008, p. 118).
The three modules on the left-hand side of Fig. 5.1 evidence broadly similar
marking distributions, but the three on the right-hand side are quite different
(though all show some indication of the significance of the bare pass percentage
of 40 for grading practice). The distribution for Statistics uses almost the full
range of percentages, no doubt because it is easy to determine what is correct
and what is incorrect. In social sciences, matters are much less likely to be cut
and dried. The Healthcare module has attracted high marks (the influence of a
nurturing approach, perhaps?). The very large proportion of bare passes in Law
suggests that the approach adopted in marking is at some variance with that in
other subjects. In passing, it is worth noting that an early study by Yorke et al.
(1996) found student marks in Law tending to be lower than for some other
subjects – a finding that is reflected in the HESA statistics on honours degree
classifications, and is unlikely to be attributable to lower entry standards.
Figure 5.1 illustrates a more general point that is well known by those
attending examination boards involving multiple subjects: that is, that the
modules and subjects chosen by a student can have a significant influence on
the grades they achieve.
In the US, the conversion of raw percentages to grade-points allows some
mitigation of the inter-module and inter-subject variation.
One of the problems with percentage grading (which has attracted some
comment from external examiners) is the reluctance in subjects in the arts,
humanities and social sciences to award very high percentages. The issue is
most apparent in modular schemes where examination boards have to deal with
performances from a variety of subject areas. Some institutions in the UK have
adopted grading scales of moderate length (typically of between 16 and 25 scale-
points) in order to encourage assessors to use the top grades (where appro-
priate) and hence avoid the psychological ‘set’ of assuming that 100 per cent
implies absolute perfection. The issue is, of course, what signifies an excellent
performance for a student at the particular stage they have reached in their
programme of study. Whilst there is some evidence that scales of moderate
length do encourage greater use of the highest grades, it is unclear why this does
not carry over into the honours degree classification (Yorke et al., 2002).
A factor likely to exert some influence on the distribution of grades is the
extent to which assessment is norm- or criterion-referenced.

Norm- and Criterion-Referencing

The distinction between norm- and criterion-referenced assessment is often
presented as being fairly clear-cut. Conceptually, it is. It is when actual
assessment practice is examined that the sharpness of the distinction
becomes a blur. An outcomes-based approach to curriculum and pedagogy
is a modern, softer restatement of the ‘instructional objectives’ approach that
was most clearly expressed in the work of Mager (1962) and which led to the
promotion of mastery or competence-based learning. Under criterion-
referencing, every student could achieve the stated objectives to a high
standard: the consequence would be that grades could bunch at the top end
of the scale. The contrast with a grading system that judges a student’s
achievement against those of their peers, rather than in absolute terms, is
clear. There should be no surprise when norm-referencing and criterion-
referencing produce radically different distributions of grades. However,
the ‘upper-ended’ distributions possible under criterion-referencing do not
fit with sedimented expectations of a normal distribution of grades, and can
lead to unwarranted accusations of grade inflation.
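The contrast between the two referencing procedures can be made concrete with a minimal
Python sketch; the standard of 70 and the ‘top fifth’ quota are invented for the purpose of
illustration and imply nothing about actual practice.

# Hypothetical contrast only: the standard (70) and the quota (top 20 per cent) are invented.
def criterion_referenced_awards(marks, standard=70):
    """Award the top grade to every performance that meets the stated standard."""
    return [mark >= standard for mark in marks]

def norm_referenced_awards(marks, proportion=0.2):
    """Award the top grade only to the highest-scoring fixed share of the cohort."""
    n_top = max(1, round(len(marks) * proportion))
    cutoff = sorted(marks, reverse=True)[n_top - 1]
    return [mark >= cutoff for mark in marks]

cohort = [72, 75, 78, 81, 84, 88, 90, 93, 95, 97]  # a cohort that has met the stated objectives
print(sum(criterion_referenced_awards(cohort)))  # 10 - grades bunch at the top of the scale
print(sum(norm_referenced_awards(cohort)))       # 2  - only the top fifth 'excels'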
The blurring of the boundary between norm- and criterion-referencing has a
number of possible causes, the first two of which are implicitly illustrated in
Exhibit 5.1 below:
• Criteria are often stated loosely.
• Criteria may be combined in variable ways.
• Stated criterion levels can be exceeded to varying extents by students.
• A grade distribution skewed towards the upper end may be seen as implau-
sible and hence may be scaled back to something resembling a norm-
referenced distribution (for example, this can happen in the US when a raw
percentage is converted into a letter grade).
• Assessors’ grading behaviour is tacitly influenced by norm-referencing.
• Different subject disciplines (or components of disciplines) adopt different
approaches to grading.
• Individuals may vary in the approaches they bring to grading (see Ekstrom &
Villegas, 1994, and Yorke, Bridges, & Woolf, 2000, for evidence on this
point).

Approaches to Grading

Some assessors approach grading analytically, specifying components of a per-
formance and building up a total mark from marks awarded for the individual
components. The components can be quite tightly specified (particularly in
science-based curricula) or more general. Hornby (2003) refers to this approach
as ‘menu marking’. Others adopt a holistic approach, judging the overall merit of
the work before seeking to ascribe values (numerical or verbal) to the component
parts. It is likely that assessment practice is not often as polarised as this, and that
it is a matter of tendency rather than absoluteness. One problem is that assess-
ment based on the two polar approaches can give significantly discrepant marks,
and there is some evidence that assessors juggle their marking until they arrive at
a mark or grade with which they feel comfortable (Baume & Yorke, with Coffey,
2004; Hawe, 2003; Sadler, Chapter 4). Assessors sometimes find that the learning
objectives or expected outcomes specified for the work being marked do not
cover everything that they would like to reward (see Sadler, Chapter 4 and, for
empirical evidence, Webster, Pepper & Jenkins, 2000, who found assessors using
unarticulated criteria when assessing dissertations). Unbounded tasks, such as
creative work of varying kinds, are particularly problematic in this respect.
Some would like to add ‘bonus marks’ for unspecified or collateral aspects of
achievement. Walvoord and Anderson (1998), for example, suggest holding back
a ‘‘fudge factor’’ of 10 percent or so that you can award to students whose work shows a
major improvement over the semester. Or you may simply announce in the syllabus and
orally to the class that you reserve the right to raise a grade when the student’s work
shows great improvement over the course of the semester. (p. 99)

This suggestion confuses two purposes that assessment is expected to serve – to
record actual achievement and to encourage students. An extension of the
general point is the reluctance to fail students, often because the assessor
wants to give the student another chance to demonstrate that they really can
fulfil the expectations of the course (see, for example, Brandon & Davies, 1979;
Hawe, 2003). This may be more prevalent in subject areas relating to public
service (teaching, nursing and social work, for example) in which there tends to
be an underlying philosophy of nurturing, and in which an espoused commit-
ment to public service is regarded as a positive attribute.
Assessors’ approaches to grading are also influenced by the circumstances of
the assessment, such as the number of items to be marked, the time available,
whether all the submitted work is marked in one timeslot or spread out over a
number of slots, personal fatigue, and so on.

The Relationship Between Grade and Meaning

The relationship between the grade and its meaning is unclear. In general, a mark
or grade summarises a multidimensional performance, as Exhibit 5.1 indicates.

Exhibit 5.1 An extract from a list of descriptors of levels of student achievement

First Class

It is recognised in all marking schemes that there are several different ways of
obtaining a first class mark. First class answers are ones that are exceptionally
good for an undergraduate, and which excel in at least one and probably several
of the following criteria:
• comprehensive and accurate coverage of area;
• critical evaluation;
• clarity of argument and expression;
• integration of a range of materials;
• depth of insight into theoretical issues;
• originality of exposition or treatment.
Excellence in one or more of these areas should be in addition to the qualities
expected of an upper second.

Upper second class
Upper second class answers are a little easier to define since there is less variation between
them. Such answers are clearly highly competent and a typical one would possess the
following qualities:
• generally accurate and well-informed;
• reasonably comprehensive;
• well-organised and structured;
• displaying some evidence of general reading;
• evaluation of material, though these evaluations may be derivative;
• demonstrating good understanding of the material;
• clearly presented.
Source: Higher Education Quality Council (1997a, p. 27).

It is noticeable that the expansiveness of the criteria tails off as the level of
performance decreases, reflecting an excellence minus approach to grading that
often appears in lists of this type. The criteria for the lower levels of passing are
inflected with negativities: for example, the criteria for third class in this
example are all stated in terms of inadequacy, such as ‘‘misses key points of
information’’. An outsider looking at these criteria could be forgiven for assum-
ing that third class was equivalent to failure. In contrast, what might a threshold
plus approach look like?
Performances tend not to fit neatly into the prescribed ‘boxes’ of criteria,
often exceeding the expectations established in terms of some criteria and
failing to reach others. Whilst labelling some criteria as essential and others as
desirable helps, it does not solve all the problems of assigning a grade to a piece
of work. Some have thought to apply fuzzy set theory to assessment, on the
grounds that this allows a piece of work to be treated as having membership,
with differing levels of intensity, of the various grading levels available
(see relatively small scale research studies by Biswas, 1995; and Echauz &
Vachtsevanos, 1995, for example). Whilst this has conceptual attractiveness,
the implementation of the approach in practical settings offers challenges that
are very unlikely to be met.
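By way of illustration only, the Python sketch below assigns a single piece of work partial
membership of two adjacent grade bands; the triangular membership functions and band
boundaries are hypothetical and are not taken from the studies cited above.

# A sketch of the fuzzy-membership idea; all membership functions here are hypothetical.
def triangular(x, left, centre, right):
    """Membership rises from 0 at left to 1 at centre and falls back to 0 at right."""
    if x <= left or x >= right:
        return 0.0
    if x <= centre:
        return (x - left) / (centre - left)
    return (right - x) / (right - centre)

grade_bands = {"B": (55, 65, 75), "A": (65, 80, 101)}  # (left, centre, right) for each band

mark = 72
memberships = {grade: round(triangular(mark, *band), 2) for grade, band in grade_bands.items()}
print(memberships)  # {'B': 0.3, 'A': 0.47} - the work partly 'belongs' to both bands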
A further issue has echoes of Humpty Dumpty’s claim regarding language in
Through the Looking-Glass – words varying in meaning according to the
user. Webster et al. (2000) found evidence of the different meanings given to terms
in general assessment usage, exemplifying the variation in academics’ use of
analysis and evaluating. There is a further point which has an obvious relevance
to student learning – that the recipients of the words may not appreciate the
meaning that is intended by an assessor (e.g., Chanock, 2000). If one of the aims
of higher education is to assist students to internalise standards, feedback com-
ments may not be as successful as the assessor may imagine.
More generally, a number of writers (e.g. Sadler, 1987, 2005; Tan & Prosser,
2004; Webster et al., 2000; Woolf, 2004) have pointed to the fuzziness inherent
in constructs relating to criteria and standards, which suggests that, collectively,
conceptions of standards of achievement may be less secure than many would
prefer.

The Nature of the Achievement

A grade by itself says nothing about the kind of performance for which it was
awarded – an examination, an essay-type assignment, a presentation, and so on.
Bridges et al. (2002) presented evidence to support Elton’s (1998) contention
that rising grades in the UK were to some extent associated with a shift in
curricular demand from examinations towards coursework. Further, as Knight
and Yorke (2003, pp. 70–1) observe, a grade tells nothing of the conditions
under which the student’s performance was achieved. For some, the perfor-
mance might have been structured and ‘scaffolded’ in such a way that the
student was given a lot of guidance as to how to approach the task. Other
students might have achieved a similar level of achievement, but without much
in the way of support. Without an appreciation of the circumstances under
which the performance was achieved, the meaning – and hence the predictive
validity – of the grade is problematic.
The problem for the receiver of the signal is that they have little or no
information about how the grading was performed, and hence are unable to
do other than come to an appreciation of a student’s performance that is
influenced by their own – possibly faulty – presuppositions about grading.

The Cumulation of Grades

Many set considerable store by the overall grade attained by a student,
whether in the form of a GPA or an honours degree classification. As noted
above, what is often overlooked is that the profile of results obtained by a
student may contain performances of very different character: for example, in
Business Studies there may well be a mixture of essay-type assessments,
analyses of case study material, exercises based on quantitative methods,
and presentations of group work. Performances can vary considerably
according to the nature of the task.
The overall grade is often determined by averaging the percentages or
grades awarded for the curriculum components that have been taken. This
makes the assumption that these scores can be treated as if they were consis-
tent with an interval scale, on which Dalziel (1998) has cast considerable
doubt. Some institutions in the UK avoid the metrical difficulties, noted by
Dalziel, of averaging or summation by using ‘mapping rules’ for converting
the profile of module grades into an overall honours degree classification – for
example, by stipulating that a majority of module grades must reach the level
of the classification under consideration whilst none of the remainder falls
below a specified level.
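The difference between averaging and a mapping rule can be illustrated schematically; the class
boundaries and the ‘majority plus floor’ rule in the Python sketch below are invented for the
illustration and do not reproduce any institution’s regulations.

# Hypothetical cumulation rules only; boundaries and thresholds are invented.
def classify(mark):
    """Map a percentage to a UK-style honours class label."""
    if mark >= 70:
        return "First"
    if mark >= 60:
        return "2:1"
    if mark >= 50:
        return "2:2"
    return "Third"

def by_average(marks):
    return classify(sum(marks) / len(marks))

def by_mapping_rule(marks, target=60, floor=50):
    """Award a 2:1 if a majority of module marks reach 60 and none falls below 50."""
    majority = sum(mark >= target for mark in marks) > len(marks) / 2
    return "2:1" if majority and min(marks) >= floor else "2:2"

profile = [62, 61, 61, 60, 58, 56]   # one student's module marks
print(by_average(profile))       # 2:2 - the mean (59.7) falls just short of the boundary
print(by_mapping_rule(profile))  # 2:1 - most modules at 60 or above, none below 50

In this invented example the two approaches classify the same profile of marks differently,
which is one reason why the choice of cumulation method is consequential.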
In addition to the metrical problems, other factors weaken the validity of the
overall index of performance. Astute students can build into their programmes
the occasional ‘easy’ module of study (see Rothblatt’s, 1991, p. 136, comment
on this), or – in the US – can opt to take some modules on a pass/fail assessment
basis, knowing that undifferentiated pass grades are not counted in the compu-
tation of GPA. The differences between subjects regarding the awarding of
grades were noted earlier. Suppose a student were to take a joint honours
programme in computing and sociology, where the former is associated with
wide mark spreads and the latter with narrow spreads. Assume that in one
subject the student performs well whereas in the other the student produces a
middling performance. If they do well in computing, the ‘leverage’ on the
overall grade of the computing marks may be sufficient to tip the balance in
favour of a higher classification, whereas if the stronger performance is in
sociology the leverage of the better performance may be insufficient to tip
the balance in the same way.2
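A small numerical illustration, using invented marks, makes the leverage effect concrete: only
the subject marked across the wider range can pull the overall mean towards a higher
classification.

# Invented marks: computing is assumed to be marked across a wide range, sociology narrowly.
computing_strong, sociology_middling = [82, 75, 68], [62, 60, 59]
computing_middling, sociology_strong = [58, 55, 52], [66, 65, 64]

def overall(subject_a, subject_b):
    marks = subject_a + subject_b
    return round(sum(marks) / len(marks), 1)

# Both hypothetical students are strong in one subject and middling in the other.
print(overall(computing_strong, sociology_middling))   # 67.7 - the wide spread pulls the mean up
print(overall(computing_middling, sociology_strong))   # 60.0 - the narrow spread cannot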
Lastly, institutions vary considerably in the general methodologies they use
when cumulating achievements into a single index. Whilst most institutions in
the US use the GPA, those in Australia and the UK have varying approaches.3
Below the level of the general methodology, sources of variation include
• the modules that ‘count’ in the determining of an overall index of
achievement;
• rules for the inclusion of marks/grades for retaken modules;
• weighting of performances at different levels of the programme; and
• approaches to dealing with claims from students for discretionary treat-
ment because of personal circumstances that may have affected their
performance.4

Grade Inflation

Grade inflation is widely perceived to be a problem, though the oft-expressed
concern has to be tempered by the kind of cool analysis provided by Adelman
(forthcoming). Adelman’s analysis of original transcript data suggests that
commentators may be allowing themselves to be led by data from elite institu-
tions, whereas data from other institutions (which make up the vast bulk of US
higher education) are much less amenable to the charge of grade inflation.
A similar conclusion was reached by Yorke (2008) for institutions in England,
Wales and Northern Ireland. Whatever the merits of the argument regarding
grade inflation, the problem is that those outside the system find difficulty in
interpreting the signals that grading sends.

2 If the comparison involves a poor and a middling performance, then correspondingly the
subject with the wider spread will exert the stronger effect on the overall classification.
3 See AVCC (2002) in the case of Australia and also Yorke (2008) for details of the UK and a
summary of practice in Australia and the US.
4 The relevance of these depends on the system being considered: all do not apply to all
systems. Brumfield (2004) demonstrates in some detail the variation within higher education
in the US.

Rising grades are not necessarily an indication of grade inflation. Space
considerations preclude a detailed discussion of this issue, which is treated at
length in Yorke (2008). Table 5.1 summarises distinctions between causes of
rising grades that may or may not be inflationary, some of which have already
been flagged. Note that student choice appears in both columns, depending on
the view one takes regarding students’ exercise of choice.

Table 5.1 Possible causes of rising grades

Causes of grade inflation                  Causes of non-inflationary rises in grades
Grading practices                          Curriculum design (especially the approach
                                           to assessment)
Students making ‘strategic’ choices        Students making ‘strategic’ choices
regarding their programmes of study        regarding their programmes of study
Easing of standards                        Improved teaching
Avoidance of awarding low grades           Improved student motivation and/or learning
Giving students a ‘helping hand’           Changes in participation in higher education
                                           Student evaluations of teaching
                                           Political and economic considerations at the
                                           level of the institution
                                           Maintaining an institution’s position vis-à-vis
                                           others

The Political Context


This brief critique of grading has, up to this point, largely ignored the context
of contemporary higher education. Governments around the world, drawing
on human capital theory, emphasise the links between higher education
and economic success, although there are differences in the extent to which
they press higher education institutions to incorporate an explicit focus on
workforce development or employability. Following the Dearing Report
(National Committee of Inquiry into Higher Education, 1997) in the UK
there has been increased political interest in the preparation of graduates to
play a part in the national economic life.5 This can be seen in exhortations
to institutions to involve students in work-based learning, and to encourage
students to document and reflect on their developing capabilities in the two
spheres of academic study and work environments. Whilst institutions have
responded, some within them find the direct link with employment to be
unpalatable, seeing it as some sort of surrender by higher education to the
demands of employers or as part of a narrowly-conceived ‘skills agenda’.
However, the Enhancing Student Employability Co-ordination Team in the
UK (ESECT) and others have made a specific case for relating employability
to the kind of ‘good learning’ that would generally be seen as appropriate in
higher education6. There has been some shift of emphasis in many first degree
programmes in the UK in the direction of achievements that are not disciplin-
ary-specific but, in a lifelong learning perspective, this is arguably of less
significance than may be believed (Yorke, 2003).

5 This is not new, since the point was made in the Robbins Report of 1963 (Committee on
Higher Education, 1963).

Rebalancing the Demand

This blurring of the distinction between academic study and the workplace
implies a rebalancing of the demands on students, especially in respect of
bachelors degrees but also in the more overtly work-based foundation degrees7
recently developed in some institutions in the UK. The late Peter Knight
referred to the work arena8 as demanding wicked competences which are
‘‘achievements that cannot be neatly pre-specified, take time to develop and
resist measurement-based approaches to assessment’’ (Knight, 2007a, p. 2).
As Knight pointed out, a number of aspects of employability could be
described as wicked competences (see the list in Yorke & Knight, 2004/06,
p. 8), as could the components of ‘graduateness’ listed by the Higher Education
Quality Council (1997b, p. 86).
Whilst the rebalancing of content and process is contentious to some, the
greater (but perhaps less recognised) challenge lies in the area of assessment.
Assessment of performance in the workplace is particularly demanding since it
is typically multidimensional, involving the integration of performances. Eraut
(2004) writes, in the context of medical education but clearly with wider
relevance:
treating [required competences] as separate bundles of knowledge and skills for assess-
ment purposes fails to recognize that complex professional actions require more than
several different areas of knowledge and skills. They all have to be integrated together
in larger, more complex chunks of behaviour. (p. 804)

6 There is a developing body of literature on this theme, including the collection of resources available on the website of the Higher Education Academy (see www.heacademy.ac.uk/resources/publications/learningandemployability) and Knight and Yorke (2003, 2004).
7 Two years full time equivalent, with a substantial amount of the programme being based in the workplace (see Higher Education Funding Council for England, 2000).
8 He also pressed the relevance of 'wicked' competences to studying in higher education, but this will not be pursued here.

It is perhaps not unreasonable to see in Eraut's comment an assessment-oriented analogue of the 'Mode 2' production of knowledge via multi-disciplinary
problem-solving, which is contrasted with ‘Mode 1’ in which problems are
approached with reference to the established theoretical and empirical knowledge-
base of a single discipline (for an elaboration, see Gibbons et al., 1994).
Knight (2007a) investigated perceptions of informants in six subject areas
(Accounting; Early Years Teaching; Nursing; Secondary Teaching; Social
Work and Youth Work) regarding the following, all of which are characterised
by their complexity:
• developing supportive relationships;
• emotional intelligence;
• group work;
• listening and assimilating;
• oral communication;
• professional subject knowledge;
• relating to clients;
• self-management (confidence and effectiveness); and
• 'taking it onwards' – acting on diagnoses (in social work).
Knight was surprised that the general perception of his informants was that
these were not difficult to assess. Subsequent interviews with a sub-sample did not
allow Knight to rule out the possibility that the methodology he used was
inadequate, but the more likely explanation is that the respondents believed
(rather unquestioningly) that the assessment methods they used were adequate
for their purposes. It is, after all, difficult to place long-established practices under
critical scrutiny unless something fairly drastic happens to precipitate action.
Assessment of work-based learning cannot be subject to the same degree of
control as assessment of discipline-specific studies. Expected learning outcomes
for work-based activity can be stated in only general terms since workplace
situations vary considerably. Hence there is a requirement for the assessor to
exercise professional judgement regarding the student’s performance in the
workplace, and the potential tension in tutor-student relationships is acknowl-
edged. The assessment situation may become more fraught if workplace assess-
ment involves a person in authority over the placement student, and especially
so if that person combines the roles of mentor and assessor.
One of Knight’s recommendations following his study of wicked compe-
tences is that
Any interventions to enhance the assessment of ‘wicked’ competences should begin
by helping colleagues to appreciate the inadequacies of current practices that are
typically – and wrongly – assumed to be ‘good enough’. This is a double challenge for
innovators. Not only does assessment practice have to be improved, but colleagues
need to be convinced of the need to improve it in the first place. (Knight, 2007a, p. 3)

The argument above regarding the necessity of assessing performances in the workplace on a post hoc basis, but with reference to broad criteria, has a much wider applicability. Those involved in assessing creative achievements have long acknowledged the point, with Eisner's (1979) work on 'connoisseurship' being influential. However, when assessments in higher education are analysed, at their heart is generally found some sort of professional judgement. Marks and grades are typically signals of professional judgement, rather than measurements. In agglomerating varied aspects of performance into a single index, such as the mark or grade for a module, or for a collection of modules, they conceal rather than reveal, and are of only limited value in conveying where the student has been particularly successful and where less so. This poses a problem for the world outside the academy: how might it become better informed as to graduates' achievements?

What Can be Warranted?

Knight (2006) made a powerful case that summative assessments were ‘local’ in
character, meaning that they related to the particular circumstances under
which the student’s performance was achieved. Hence any warranting by the
institution would not necessarily be generalisable (pp. 443–4). He also argued
that there were some aspects of performance that the institution would not be in
any position to warrant, though it might be able to attest that the student did
undertake the relevant activity (for example, a placement).9
Part of the problem of warranting achievement stems from a widely held, but often unacknowledged, misperception that assessments are tantamount to measurements of student achievement. This is particularly detectable in the insouciant way in which marks or grades are treated as steps on interval scales when overall indices of achievement are being constructed. However, as Knight (2006, p. 438) tartly observes, "True measurement carries invariant meanings". The local character of grades and marks, on his analysis, demolishes any pretence that assessments are measurements.
Knight is prepared to concede that, in subjects based on science or mathematics, it is possible to determine with accuracy some aspects of student performance (the kinds of achievement at the lower end of the taxonomies of Bloom (1956) and Anderson and Krathwohl (2001), for instance). However, as soon as these achievements are applied to real-life problems, wicked competences enter the frame, and the assessment of achievement moves from what Knight characterises as 'quasi-measurement' to judgement. Where the subject matter is less susceptible to quasi-measurement, judgement is perforce to the fore, even though there may be some weak quasi-measurement based on marking schemes or weighted criteria.

9 See also Knight and Yorke (2003, p. 56) on what can be warranted by an institution, and what not.

Knight’s (2007a, 2007b) line regarding summative assessment is basically


that, because of the difficulties in warranting achievement in a manner that
allows the warrant some transfer-value to different contexts, attention should
be concentrated on the learning environment of the student. A learning envir-
onment supportive of the development of wicked competences could be
accorded greater value by employers than one that was less supportive.
Knight’s argument is, at root, probabilistic, focusing on the chances of students
developing the desired competences in the light of knowledge of the learning
environment.

A Summary of the Position so Far

1. Marks and grades are, for a variety of reasons, fuzzier signals than many
believe them to be.
2. They signal judgements more than they act as measures.
3. Whilst judgements may have validity, they will often lack precision.
4. Judgements may be the best that assessors can achieve in practical situations.
5. If assessments generally cannot be precise, then it is necessary to rethink the
approach to summative assessment.

Imagining an Alternative

The Honours Degree Classification in the UK

Over the past couple of years, debate has taken place in the UK about the utility
of the honours classification system that has long been in use for bachelors
degrees (Universities UK & GuildHE, 2006; Universities UK & Standing Con-
ference of Principals, 2004, 2005). This has led to suggestions that the honours
degree classification is no longer appropriate to a massified system of higher
education, and that it might be replaced by a pass/fail dichotomisation, with the
addition of a transcript of the student’s achievement in curricular components
(this is broadly similar to the Diploma Supplement adopted across the Eur-
opean Union).10 However, the variously constituted groups considering the
matter have not concentrated their attention on the assessments that are cumu-
lated into the classification.

10 'Pass/not pass' is probably a better distinction, since students who do not gain the required number of credits for an honours degree would in all probability have gained a lesser number of credits (which is a more positive outcome than would be signified by 'fail'). There has subsequently been a retreat from the conviction that the classification was no longer fit for purpose (see Universities UK & GuildHE, 2007).

Abandoning Single Indexes of Achievement

Imagine that there were a commission of inquiry into summative assessment practice which concluded that the kinds of summative assessment typically used
in higher education were for various reasons11 insufficiently robust for the
purposes to which they are put and, further, that increases in robustness
could not be obtained without a level of cost that would be prohibitive. In the
light of this hypothetical review, it would be readily apparent that not much
would be gained by tinkering with existing assessment methodologies and that a
radically different approach might offer the prospect of progress.
Whilst Knight’s (2007a, 2007b) line on assessment – that the focus should be
on the learning context rather than straining at the gnat of warranting indivi-
duals’ achievements – makes sense at the conceptual level, its adoption would
probably not satisfy external stakeholders. Unless there were a close link
between stakeholder and institution (likely in only some specific cases), the
stakeholders may be ill-informed about the learning environment and hence
revert to using the public reputation of the institution as the criterion, thereby
unquestioningly reinforcing the reputational status quo. In practice, they are
likely to call for something ‘closer to the action’ of student performance. Three
sources of information about student performance are:
• assessments of various kinds that have been undertaken within the institution;
• judgements made by assessors regarding workplace performance (the assessors might come from the institution and/or the workplace); and
• information from students themselves regarding their achievements, backed up by evidence.
Whilst all of these will be fallible in their different ways, it may be possible for
an employer (say) to develop an appreciation of an applicant’s strengths and
weaknesses through qualitative triangulation. This would be a more costly and
time-consuming process than making an initial sifting based on simplistic para-
meters such as overall grading and/or institution attended, followed by a finer
sifting and interview. Hence the proposition might be given an abrupt rejection
on the grounds of impracticability. However, those with a wider perspective
might take a different view, seeing that the costs could be offset in two ways:
• the employer would probably develop a more appropriate shortlist from which to make the final selection, and
• the chances of making an (expensive) inappropriate selection would be decreased.
A couple of other considerations need to be mentioned. First, professional
body accreditation might be an issue. However, if the profile of assessments (both in curricular design and student achievement) were demonstrably met, then accreditation should in theory not be a difficulty.

11 Some of these have been mentioned above. A fuller analysis can be found in Yorke (2008).

Second, with globalisation leading to greater mobility of students, the
transfer-value of achievements could be a problem. However, there are pro-
blems already. National systems have very varied approaches to grading
(Karran, 2005). The European Credit Transfer and Accumulation System
(ECTS) bases comparability on norm-referencing whereas some institutions
adopt a predominantly criterion-referenced approach to assessment. Further,
international conversion tables prepared by World Education Services may
mislead12 (as an aside, a passing grade of D from the US does not figure in
these tables). It might make more sense, for the purpose of international
transfer, to adopt a pass/not pass approach supported by transcript evidence
of the person’s particular strengths instead of trying to align grade levels from
different systems.

Claims-Making

Following Recommendation 20 of the Dearing Report (National Committee of Inquiry into Higher Education, 1997, p. 141), students in the UK are expected
to undertake personal development planning [PDP] and to build up a portfolio
of achievements to which they can refer when applying for jobs. At present,
uptake has been patchy, partly because academics and students tend to see this
as ‘just another chore’, and because the latter see no direct relationship to
learning and assessment, and hence the benefit is seen as slender when com-
pared with the effort required. (One student, responding to a survey of the first
year experience in the UK wrote: "I ... felt the PDP compulsory meetings were a total waste of time – sorry!")
If, however, students were required to make a claim for their award, rather
than have the award determined by some computational algorithm, PDP would
gain in potency. The requirement would encourage the metacognitive activities
of reflection and self-monitoring (see Sadler, Chapter 4). Claims-making would
also restore to summative assessment a programme-wide perspective on learn-
ing and achievement that has tended to get lost in the unitisation of modular
schemes in the UK.
Requiring students to claim for their award would, in effect, ask the student to answer the question: "How have you satisfied, through your work, the aims stated for your particular programme of study?" (Yorke, 1998, p. 181).13 The multidimensionality of 'graduateness' as depicted by the Higher Education Quality Council (1997b) and in the literature on employability suggests that students from the same cohort might make quite different cases for their award whilst fulfilling the broad expectations set out for it. For example, one might centre the claim on a developed capacity to relate the disciplinary content to practical situations whereas another might opt to make a case based on high levels of academic achievement.

12 See the tables available by navigating from www.wes.org/gradeconversionguide/ (retrieved August 8, 2007). Haug (1997) argues, rather as Knight (2006) does in respect of 'local' assessment, that a grade has to be understood in the context of the original assessment system and the understanding has to be carried into the receiving system. This requires more than a 'reading off' of a grade awarded on a scale in one system against the scale of the other, since simplistic mathematical conversions may mislead.
13 The argument is elaborated in Knight and Yorke (2003, pp. 159ff).

A student’s claim could be required to consist of not only their record of
achievements in curricular components,14 but also to incorporate evidence from
learning experiences such as those generated through work placement. Wicked
competences could be more clearly brought into the picture. The preparation of
a claim would assist the student in making applications for jobs (as most will
want to do), and the institution in preparing supporting references. When the
claim is used prospectively, as in applying for a job, relevant extra-curricular
experience could also be brought into play since this could indicate to a
potential employer some attributes and achievements that it might value but
which are not highlighted in the higher education experience.
The claims-making approach is not limited to students who enter higher
education straight from school, since it can be adapted to the needs of older
students who bring greater life-experience to their studies. A mind-set on
graduates aged around 21 needs to be avoided, particularly in the UK since
demographic data suggest a fairly sharp decline in the number of people aged 18
from the year 2011 (Bekhradnia, 2006), and hence imply the possibility of a
system-wide entry cohort of rising age, with a greater number bringing to their
studies experience of employment that is more than casual in character.
One point (similar to that which Sadler makes in Chapter 4) needs to be made
here – that the formalisation of claims-making would need to be no more
onerous in toto than the assessment approach it replaced. The claims procedure
would need to be streamlined so that assessors were not required, unless there
were a particular reason to do so, to work through a heap of evidence in a
personal portfolio. Examination boards might require less time to conduct their
business, and the role of external examiners (where these exist) would need to be
reviewed.

Claims-Making and Learning

It is only now that this tributary of argument joins the main flow of this
book. Summative assessment is very often focused on telling students how
well they have achieved in respect of curricular intentions – in effect, this is a
judgement from on high, usually summarised in a grade or profile of grades.

14 In the interests of making progress, a number of quite significant reservations regarding the robustness of summative assessment of academic achievement would need to be set aside.

Claims-making would bring students more into the picture, in that they would
have to consider, on an ongoing basis and at the end of their programmes, the
range of their achievements (some – perhaps many – of which will as a matter of
course be grades awarded by academics). They would have to reflect on what
they have learned (or not), the levels of achievement that they have reached, and
on what these might imply for their futures. Knowing that an activity of this
sort was ‘on the curricular agenda’ would prompt the exercise of reflectiveness
and self-regulation which are of enduring value.
In a context of lifelong learning, would not the involvement of students in
claims-making be more advantageous to all involved than students merely
being recipients of ex cathedra summative judgements?

References
Adelman, C. (forthcoming). Undergraduate grades: A more complex story than ‘inflation’. In
L. H. Hunt (Ed.), Grade inflation and academic standards. Albany: State University of
New York Press.
Anderson, L. W., & Krathwohl, D. R. (2001). A taxonomy for learning, teaching and assess-
ment. New York: Addison Wesley Longman.
AVCC [Australian Vice Chancellors’ Committee]. (2002). Grades for honours programs (con-
current with pass degree), 2002. Retrieved October 10, 2007, from http://www.avcc.edu.au/
documents/universities/key_survey_summaries/Grades_for_Degree_Subjects_Jun02.xls
Baume, D., & Yorke, M., with Coffey, M. (2004). What is happening when we assess, and how
can we use our understanding of this to improve assessment? Assessment and Evaluation in
Higher Education, 29(4), 451–77.
Bekhradnia, B. (2006). Demand for higher education to 2020. Retrieved October 14, 2007, from
http://www.hepi.ac.uk/downloads/22DemandforHEto2020.pdf
Biswas, R. (1995). An application of fuzzy sets in students' evaluation. Fuzzy Sets and Systems,
74(2), 187–94.
Bloom, B. S. (1956). Taxonomy of educational objectives, Handbook 1: Cognitive domain.
London: Longman.
Brandon, J., & Davies, M. (1979). The limits of competence in social work: The assessment of
marginal work in social work education. British Journal of Social Work, 9(3), 295–347.
Bridges, P., Cooper, A., Evanson, P., Haines, C., Jenkins, D., Scurry, D., Woolf, H., & Yorke,
M. (2002). Coursework marks high, examination marks low: Discuss. Assessment and
Evaluation in Higher Education, 27(1), 35–48.
Brumfield, C. (2004). Current trends in grades and grading practices in higher education:
Results of the 2004 AACRAO survey. Washington, DC: American Association of Collegi-
ate Registrars and Admissions Officers.
Chanock, K. (2000). Comments on essays: Do students understand what tutors write?
Teaching in Higher Education, 5(1), 95–105.
Committee on Higher Education. (1963). Higher education [Report of the Committee
appointed by the Prime Minister under the chairmanship of Lord Robbins, 1961–63].
London: Her Majesty’s Stationery Office.
Dalziel, J. (1998). Using marks to assess student performance: Some problems and alterna-
tives. Assessment and Evaluation in Higher Education, 23(4), 351–66.
Echauz, J. R., & Vachtsevanos, G. J. (1995). Fuzzy grading system. IEEE Transactions on
Education, 38(2), 158–65.
Eisner, E. W. (1979). The educational imagination: On the design and evaluation of school
programs. New York: Macmillan.
Ekstrom, R. B., & Villegas, A. M. (1994). College grades: An exploratory study of policies and
practices. New York: College Entrance Examination Board.
Elton, L. (1998). Are UK degree standards going up, down or sideways? Studies in Higher
Education, 23(1), 35–42.
Eraut, M. (2004). A wider perspective on assessment. Medical Education, 38(8), 803–4.
Felton, J., & Koper, P. T. (2005). Nominal GPA and real GPA: A simple adjustment that
compensates for grade inflation. Assessment and Evaluation in Higher Education, 30(6),
561–69.
Gibbons, M., Limoges, C., Nowotny, H., Schwartzman, S., Scott, P., & Trow, M. (1994). The
new production of knowledge: The dynamics of science and research in contemporary
societies. London: Sage.
Haug, G. (1997). Capturing the message conveyed by grades: Interpreting foreign grades.
World Education News and Reviews, 10(2), 12–17.
Hawe, E. (2003). ‘It’s pretty difficult to fail’: The reluctance of lecturers to award a failing
grade. Assessment and Evaluation in Higher Education, 28(4), 371–82.
Higher Education Funding Council for England. (2000). Foundation degree prospectus.
Bristol: Higher Education Funding Council for England. Retrieved October 14, 2007,
from http://www.hefce.ac.uk/pubs/hefce/2000/00_27.pdf
Higher Education Quality Council. (1997a). Assessment in higher education and the role of
‘graduateness’. London: Higher Education Quality Council.
Higher Education Quality Council. (1997b). Graduate standards programme: Final report
(2 vols). London: Higher Education Quality Council.
Hornby, W. (2003). Assessing using grade-related criteria: A single currency for universities?
Assessment and Evaluation in Higher Education, 28(4), 435–54.
Jessup, G. (1991). Outcomes: NVQs and the emerging model of education and training. London:
Falmer.
Karran, T. (2005). Pan-European grading scales: Lessons from national systems and the
ECTS. Higher Education in Europe, 30(1), 5–22.
Knight, P. (2006). The local practices of assessment. Assessment and Evaluation in Higher
Education, 31(4), 435–52.
Knight, P. (2007a). Fostering and assessing ‘wicked’ competences. Retrieved October 10, 2007,
from http://www.open.ac.uk/cetl-workspace/cetlcontent/documents/460d1d1481d0f.pdf
Knight, P. T. (2007b). Grading, classifying and future learning. In D. Boud & N. Falchikov
(Eds.), Rethinking assessment in higher education: Learning for the longer term (pp. 72–86).
Abingdon, UK: Routledge.
Knight, P. T., & Yorke, M. (2003). Assessment, learning and employability. Maidenhead, UK:
Society for Research in Higher Education and the Open University Press.
Knight, P., & Yorke, M. (2004). Learning, curriculum and employability in higher education.
London: RoutledgeFalmer.
Mager, R.F. (1962). Preparing objectives for programmed instruction. Belmont, CA: Fearon.
Milton, O., Pollio, H. R., & Eison, J. (1986). Making sense of college grades. San Francisco,
CA: Jossey-Bass.
National Committee of Inquiry into Higher Education. (1997). Higher education in the learning society. Middlesex: NCIHE Publications.
Rothblatt, S. (1991). The American modular system. In R. O. Berdahl, G. C. Moodie, & I. J. Spitzberg, Jr. (Eds.), Quality and access in higher education: Comparing Britain and the United States (pp. 129–141). Buckingham, England: SRHE and Open University Press.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of
Education, 13(2), 191–209.
Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher
education. Assessment and Evaluation in Higher Education, 30(2), 176–94.
Tan, K. H. K., & Prosser, M. (2004). Qualitatively different ways of differentiating student
achievement: A phenomenographic study of academics’ conceptions of grade descriptors.
Assessment and Evaluation in Higher Education, 29(3), 267–81.
Universities UK & GuildHE. (2006). The UK honours degree: provision of information – second
consultation. London: Universities UK & GuildHE. Retrieved October 6, 2007, from
http://www.universitiesuk.ac.uk/consultations/universitiesuk/
Universities UK & GuildHE. (2007). Beyond the honours degree classification: The Burgess
Group final report. London: Universities UK and GuildHE. Retrieved 17 October, 2007,
from http://bookshop.universitiesuk.ac.uk/downloads/Burgess_final.pdf
Universities UK & Standing Conference of Principals. (2004). Measuring and recording
student achievement. London: Universities UK & Standing Conference of Principals.
Retrieved October 5, 2007, from http://bookshop.universitiesuk.ac.uk/downloads/
measuringachievement.pdf
Universities UK & Standing Conference of Principals. (2005). The UK honours degree: provision
of information. London: Universities UK & Standing Conference of Principals. Retrieved
October 5, 2007, from http://www.universitiesuk.ac.uk/consultations/universitiesuk/
Walvoord, B. E., & Anderson, V. J. (1998). Effective grading: A tool for learning and assess-
ment. San Francisco: Jossey-Bass.
Webster, F., Pepper, D., & Jenkins, A. (2000). Assessing the undergraduate dissertation.
Assessment and Evaluation in Higher Education, 25(1), 71–80.
Woolf, H. (2004). Assessment criteria: Reflections on current practices. Assessment and
Evaluation in Higher Education, 29(4), 479–93.
Yorke, M. (1998). Assessing capability. In J. Stephenson & M. Yorke (Eds.), Capability and
quality in higher education (pp. 174–191). London: Kogan Page.
Yorke, M. (2003). Going with the flow? First cycle higher education in a lifelong learning
context. Tertiary Education and Management, 9(2), 117–30.
Yorke, M. (2008). Grading student achievement: Signals and shortcomings. Abingdon, UK:
RoutledgeFalmer.
Yorke, M., Barnett, G., Bridges, P., Evanson, P., Haines, C., Jenkins, D., Knight, P.,
Scurry, D., Stowell, M., & Woolf, H. (2002). Does grading method influence honours
degree classification? Assessment and Evaluation in Higher Education, 27(3), 269–79.
Yorke, M., Bridges, P., & Woolf, H. (2000). Mark distributions and marking practices in UK
higher education. Active Learning in Higher Education, 1(1), 7–27.
Yorke, M., Cooper, A., Fox, W., Haines, C., McHugh, P., Turner, D., & Woolf, H. (1996).
Module mark distributions in eight subject areas and some issues they raise. In N. Jackson
(Ed.), Modular higher education in the UK (pp. 105–7). London: Higher Education Quality
Council.
Yorke, M., & Knight, P. T. (2004/06). Embedding employability into the curriculum. York,
England: The Higher Education Academy. Retrieved October 14, 2007, from http://www.
heacademy.ac.uk/assets/York/documents/ourwork/tla/employability/id460_embedding_
employability_into_the_curriculum_338.pdf
Chapter 6
The Edumetric Quality of New Modes
of Assessment: Some Issues and Prospects

Filip Dochy

Introduction

Assessment has played a crucial role in education and training since formal
education commenced. Certainly, assessment of learning has been seen as the
cornerstone of the learning process since it reveals whether the learning process
results in success or not. For many decades, teachers, trainers and assessment
institutes were the only partners seen as crucial in the assessment event.
Students were seen as subjects who were to be tested without having any
influence on any other aspect of the assessment process. Recently, several
authors have called our attention to what is often termed ‘new modes of
assessment’ and ‘assessment for learning’. They stress that assessment can be
used as a means to reinforce learning, to drive learning and to support learning –
preferably when assessment is not perceived by students as a threat, an event
they have to fear, the sword of Damocles. These authors also emphasize that
the way we assess students should be congruent with the way we teach and the
way students learn within a specific learning environment. As such, the ‘new
assessment culture’ makes a plea for integrating instruction and assessment.
Some go even further: students can play a role in the construction of assessment tasks and the development of assessment criteria, and the scoring of performance can be shared amongst students and teachers. New modes of assessment that arise from such thinking are, for example, 90° or 180° feedback, writing samples, exhibitions, portfolio assessments, peer- and co-assessment, project and
product assessments, observations, text- and curriculum-embedded questions,
interviews, and performance assessments. It is widely accepted that these new
modes of assessment lead to a number of benefits in terms of the learning
process: encouraging thinking, increasing learning and increasing students’
confidence (Falchikov, 1986, 1995).

F. Dochy
Centre for Educational Research on Lifelong Learning and Participation, Centre
for Research on Teaching and Training, University of Leuven, Leuven, Belgium
e-mail: filip.dochy@ped.kuleuven.be

However, the scientific measurement perspectives of both teachers and researchers, stemming from the earlier, highly consistent framework, where
assessment was separate from instruction and needed to be uniformly adminis-
tered, can form a serious hindrance to a wider introduction of these new
assessment methods. As Shepard (1991) and Segers, Dochy and Cascallar
(2003) indicated, instruction derives from the emergent constructivist para-
digm, while testing has its roots in older paradigms. So, we argue, the tradi-
tional criteria for evaluating the quality of assessment need to be critically
revised. The question that needs to be asked is whether we can maintain
traditional concepts such as reliability and validity, or if these concepts need
to be considered more broadly in harmony with the development of the new
assessment contexts.
In this chapter, attention is paid to this new evolution within assessment and especially to the consequences of this new approach for screening the edumetric quality of educational assessment.

About Testing: What has Gone Wrong?

A quick review of the research findings related to traditional testing and its
effects shows the following.
First of all, a focus on small scale classroom assessment is needed instead of a
focus on large scale high stakes testing. Gulliksen (1985), after a long career in
measurement, stated that this differentiation is essential: "I am beginning to believe that the failure to make this distinction is responsible for there having been no improvement, and perhaps even a decline, in the quality of teacher-made classroom tests..." (p. 4). Many investigations have now pointed to the
massive disadvantages of large scale testing (Amrein & Berliner, 2002; Rigsby &
DeMulder, 2003): students learn to answer the questions better, but may not
learn more; a lot of time goes into preparing such tests; learning experiences are
reduced; it leads to de-professionalisation of teachers; changes in content are
not accompanied by changes in teaching strategies; teachers do not reflect on
their teaching to develop more effective practices; effort and motivation of
teachers are decreasing; teachers question whether this is a fair assessment
for all students; and the standards don’t make sense to committed teachers or
skilled practitioners. Such external tests also influence teachers’ assessments.
They often emulate large scale tests on the assumption that this represents good
assessment practice. As a consequence, the effect of feedback is to teach the
weaker student that he lacks ability, so that he loses confidence in his own learning (Black & Wiliam, 1998). One of my British colleagues quipped: "A-levels are made for those who can't reach it, so they realise how stupid they are".
In the past decade, research evidence has shown that the use of summative
tests squeezes out assessment for learning and has a negative impact on motiva-
tion for learning. Moreover, the latter effect is greater for the less successful
students and widens the gap between high and low achievers (Harlen & Deakin
Crick, 2003; Leonard & Davey, 2001). Our research also shows convincingly
that students’ perceptions of the causes of success and failure are of central
importance in the development of motivation for learning (Struyven, Dochy, &
Janssens, 2005; Struyven, Gielen, & Dochy, 2003). Moreover, too much summative testing affects not only students' motivation, but also that of their teachers. High stakes tests result in educational activities directed towards the content of the tests. As a consequence, the diversity of learning experiences for students is reduced and teachers use a small range of instructional strategies, which leads to the deprofessionalisation of teachers (Rigsby & DeMulder, 2003). Firestone and Mayrowitz (2000) state: "What was missing ... was the structures and opportunities to help teachers reflect on their teaching and develop more effective practices" (p. 745).
Recently, the finding that assessment steers learning has gained a lot of
attention within educational research (Dochy, 2005; Dochy, Segers, Gijbels, &
Struyven, 2006; Segers et al., 2003). But motivation has also been investigated as
one of the factors that is in many cases strongly influenced by the assessment or
the assessment system being used.

Characteristics of New Assessment Modes


The current assessment culture can be characterized as follows (Dochy, 2001;
Dochy & Gijbels, 2006).
There is a strong emphasis on the integration of assessment and instruction.
Many assessment specialists take the position that appropriately used educa-
tional assessments can be seen as tools that enhance the instructional process.
Additionally, there is strong support for representing assessment as a tool for
learning. The position of the student is that of an active participant who shares
responsibility in the process, practices self assessment, reflection and collabora-
tion, and conducts a continuous dialogue with the teacher. Students participate
in the development of the criteria and the standards for evaluating their per-
formance. Both the product and process are being assessed. The assessment
takes many forms, all of which could generally be referred to as unstandardised
assessments embedded in instruction. There is often no time pressure, and a
variety of tools that are used in real life for performing similar tasks are
permitted. The assessment tasks are often interesting, meaningful, authentic,
challenging and engaging, involving investigations of various kinds. Students
also sometimes document their reflections in a journal and use portfolios to
keep track of their academic/vocational growth. Reporting practices shift from
a single score to a profile, i.e. from quantification to a portrayal (Birenbaum,
1996).
New assessment modes such as observations, text- and curriculum-embedded
questions, interviews, over-all tests, simulations, performance assessments,
writing samples, exhibitions, portfolio assessment, product assessments, and modes of peer- and co-assessment have been investigated increasingly in recent
years (Birenbaum & Dochy, 1996; Dochy & Gijbels, 2006; Segers et al., 2003;
Topping, 1998) and a set of criteria for new assessment practices has been
formulated (Birenbaum, 1996; Feltovich, Spiro, & Coulson, 1993; Glaser,
1990; Shavelson, 1994).
Generally, five characteristics are shared amongst these new assessments.
Firstly, good assessment requires that students construct knowledge (rather
than reproduce it). The coherence of knowledge, its structure and interrelations
are targets for assessment. Secondly, the assessment of the application of
knowledge to actual cases is the core goal of these so-called innovative assess-
ment practices. This means assessing the extent to which students are able to
apply knowledge to solve real life problems and take appropriate decisions.
Thirdly, good assessment instruments ask for multiple perspectives and context
sensitivity. Students not only need to know ‘what’ but also ‘when’, ‘where’ and
‘why’. This implies that statements as answers are not enough; students need to
have insight into underlying causal mechanisms. Fourthly, students are actively
involved in the assessment process. They have an active role in discussing the
criteria, in administering the assessment or fulfilling the task and sometimes in
acting as a rater for peers or self. Finally, assessments are integrated within the
learning process and are congruent with the teaching method and learning
environment.

Effects of Assessment on Learning: Pre- and Post-assessment Effects

Investigation into the effects of assessment on learning is often summarized as consequential validity (Boud, 1995; Sambell, McDowell, & Brown, 1997).
Consequential validity asks what the consequences are of the use of a certain
type of assessment on education and on the students’ learning processes, and
whether the consequences that are found are the same as the intended effects.
The research that explicitly looks into the effects of assessment is
now catching up with the research on traditional testing (Askham, 1997;
Birenbaum, 1994; Boud, 1990; Crooks, 1988; Dochy & Moerkerke, 1997;
Frederiksen, 1984; Gibbs, 1999; Gielen, Dochy, & Dierick, 2007; Gielen,
Dochy, et al., 2007; McDowell, 1995; Sambell et al., 1997; Scouller, 1998;
Tan, 1992; Thomas & Bain, 1984; Thomson & Falchikov, 1998; Trigwell &
Prosser, 1991a, 1991b).
The influence of formative assessment arises mainly from the fact that, after the assessment, students look back on the results and on the learning processes upon which the assessment was based. This makes it possible to adjust learning (the post-assessment effect). A special form of feedback is that which students provide for
themselves, by using metacognitive skills while answering or solving assessment
tasks. In this case, the student is then capable of drawing conclusions – after
or even during assessment – about the quality of his learning behaviour
(self-generated or internal feedback). He can then resolve to do something
about it.
The influence of summative assessment is less obvious since the post-
assessment effects of summative assessment are often minor. Besides, a
student often does not know after the assessment what he did wrong and
why. A student who passes is usually not interested in how he performed and
in what ways he could possibly improve. The influence of summative assess-
ment on learning behaviour can, instead, be called pro-active. These are pre-
assessment effects. Teachers are often not aware of these effects. There is
evidence that these pre-assessment effects on learning behaviour outweigh
the post-assessment effects of feedback (Biggs, 1996). An important differ-
ence between the pre- and post-assessment effects is that the latter are
intentional, whilst the former are more in the nature of side effects, because
summative assessment intends to orientate, to select or to certify. Nevo
(1995), however, points out the existence of a third effect of assessment on
learning. Students also learn during assessment because, at that moment,
they often have to reorganize their acquired knowledge and to make links and discover relationships between ideas that they had not previously noticed while studying. When assessment incites students to
thought processes of a higher cognitive nature, it is possible that assessment
becomes a rich learning experience for them. This goes for both formative
and summative assessment. We call this the plain assessment effect. This
effect can spur students to learn (with respect to content), but it does not
really affect learning behaviour, except for the self-generated feedback that
we discussed before.
A further question that can be put is which elements of assessment could be
responsible for these pre- and post-assessment effects: for example, the content
of assessment, the type, the frequency, the standards that are used, the content
of feedback, and the way the feedback is provided (Crooks, 1988; Gielen,
Dochy, & Dierick, 2003, 2007).

The Pre-assessment Effects of Summative Assessment

This influence of assessment on learning, which is called the pre-assessment effect, is discussed by several authors who use different terminology. Freder-
iksen and Collins (1989) discuss systemic validity. The backwash effects are
discussed by Biggs (1996), and the feed-forward function by Starren (1998).
Boud (1990) refers to an effect of the content of assessment on learning. He
states that students tend to focus on those subjects and assimilation levels that form part of assessment (and thus can earn marks), to the
detriment of the others. They are also encouraged to do this by the
consequences linked to assessment (Biggs, 1996; Elton & Laurillard, 1979; Marton & Säljö, 1976; Scouller, 1998; Scouller & Prosser, 1994; Thomas & Bain, 1984). When external tests are used, we see that teachers are highly influenced by them in their instruction – what is called teaching to the test. In theory it would not be problematic for instruction and learning to focus on assessment (see above), if the range of those tests were not limited to what can be easily tested (reproduction and skills of a lower order). At this point the problematic character of assessment-driven instruction, or what Birenbaum calls test-driven instruction, becomes clear. Frederiksen (1984) reveals that in many tests the problems to be solved are well-structured, even though the aim of education is (or should be) to prepare pupils and students for real problem solving tasks, which are almost always ill-structured:
. . . Ill-structured problems are not found in standardized achievement tests. If such an
item were found, it would immediately be attacked as unfair. However, this would only
be unfair if schools do not teach students how to solve ill-structured problems. To make
it fair, schools would have to teach the appropriate problem-solving skills. Ability to
solve ill-structured problems is just one example of a desirable outcome of education
that is not tested and [therefore] not taught. (Frederiksen, 1984, p. 199)

In the literature, pre-assessment effects are also ascribed to the way in which an
assessment is carried out. Sometimes, the described negative influence of tests is
immediately linked to a certain mode of assessment, namely multiple choice
examinations. Nevertheless, it is important to pay attention to the content of
assessment, apart from the type, since this test bias can also occur with other
types of assessment.
The frequency of assessment can also influence learning behaviour. Tan
(1992) revealed that the learning environment’s influence on learning beha-
viour can be so strong that it cuts across the students’ intentions. Tan’s
research reveals that the majority of students turn to a superficial learning
strategy in a system of continuous summative assessment. They did not do this because they preferred it or intended to, but because the
situation gave them incentives to do so. As a result of the incongruity between
learning behaviour and intention, it turned out that many students were really
dissatisfied with the way in which they learnt. They had the feeling that they
did not have time to study thoroughly and to pay attention to their interests.
In this system, students were constantly under pressure. As a result they
worked harder but their intrinsic motivation disappeared and they were
encouraged to learn by heart. Crooks (1988) also drew attention to the fact
that, in relation to the frequency of assessment, it is possible that frequent
assessment is not an aid – it could even be a hindrance – when learning results
of a higher order are concerned, even when assessment explicitly focuses on it.
According to him, a possible explanation could be that students need more
breathing space in order to achieve in-depth learning and to reach learning
results of a higher order. We should note that Crooks’ discussion is of
summative, and not formative, assessment.
Summative assessment, however, does not always affect the learning process in a negative way: "Backwash is no bad thing", as Biggs (1996) empha-
sizes. More than that, assessment can have a positive effect on learning
(cf. McDowell, 1995; Dochy, Moerkerke, & Martens, 1996; Dochy & McDowell,
1997). McDowell (1995) points out that it is obvious that students learn and
behave differently in courses with assessment which requires higher order
learning and an authentic/original task than when they are examined in a
traditional way. When assessment is made up of an exam paper, then there is
no sense in memorising (also see Vermunt, 1992). Students also point out this
difference themselves. By using interviews, Sambell et al. (1997) investigated
students’ perceptions of the effects of assessment on their learning behaviour.
Many students found that traditional assessment methods had a negative effect
on their learning process. They felt that the quality of their learning was often "tarnished", because they consciously turned to "inferior" learning behaviour in order to be prepared for the kind of assessment that only led to short term
learning. These experiences contrast with their experiences of assessment that
channelled their efforts in the search for comprehension and focused on the
processes of critical questioning and analysing. In general, this was perceived as
a level of learning that satisfied them more.
It is important to note that assessment often pays more inherent attention
to the formative function, which partly comes under the category of post-
assessment effects. However, new types of assessment also fail sometimes.
McDowell (1995) refers to the example of a group task that forms part of, or is included in, the final assessment. As a result there are conflicting pressures
between learning and the production of the product. When a student aims to
improve his weaknesses during production and focuses his learning on these
points, he risks jeopardizing his final results, as well as those of the group.
McDowell concludes that, with formal summative assessment, students tend to
play safe and to mainly use their strong points, whereas a personal breakthrough
regarding learning often contains an element of risk. In order to encourage this,
it is therefore also necessary to build in formative assessment.
Pre-assessment effects work in two directions. They can influence learning
behaviour in either a positive or a negative way. New types of assessment that fit
into the assessment culture explicitly try to take into account this aspect of the
consequential validity of assessment.

The Post-assessment Effects of Assessment-for-Learning on the Learning Process

In what way does formative assessment play its role with regard to learning
behaviour and are there also differences in the effect depending on the type of
formative assessment? Assessment for learning, or formative assessment, does
not always have positive effects on learning behaviour. Askham (1997) points
out that it is an oversimplification to think that formative assessment always results in in-depth learning and summative assessment always leads to super-
ficial learning:
Some tutors are moving to a more formative approach by setting short answer tests
on factual information, undertaken at regular intervals throughout a course of study.
This clearly satisfies the need for feedback and does provide the opportunity for
improvement, but may encourage only the surface absorption of facts and superficial
knowledge. (Askham, 1997, p. 301)

The content and type of the tasks used in formative assessment can, in practice, have a negative effect and thus undo the positive effect of feedback. If formative assessment suits the learning goals and the instruction, it can result in students refining their less effective learning strategies and in confirming the results of students who are doing well.
Based upon their review of 250 research studies, Black and Wiliam (1998)
conclude that formative assessment can be effective in improving students’
learning. In addition, when the responsibility for assessment is handed over to
the student, the formative assessment process seems to be more effective.
The most important component of formative assessment, as far as influence
on learning behaviour is concerned, is feedback. Although the ways in which
feedback can be given and received are endless, we can differentiate between
internal (self-generated) and external feedback (Butler & Winne, 1995). The
latter form can be further split up into immediate or delayed feedback, global
feedback or feedback per criterion, with or without suggestions for improve-
ment. The characteristics of effective feedback, as described by Crooks (1988),
have already been discussed. If we want to know why feedback works, we have
to look into the functions of this feedback.
Martens and Dochy (1997) identify two main functions of feedback which
can influence students’ learning behaviour. The first, the cognitive function,
implies that information is provided about the learning process and the thorough
command of the learning goals. This feedback has a direct influence on the
knowledge and views of students (confirmation of correct views, improvement
or restructuring of incorrect ones), but can also influence the cognitive and meta-
cognitive strategies that students make use of during their learning process.
Secondly, feedback can be interpreted as a positive or a negative confirmation.
The impact of feedback as confirmation of the student’s behaviour depends on
the student’s interpretation (e.g. his attribution style). A third function of for-
mative assessment that has nothing to do with feedback, but that supports the
learning process, is the activation of prior knowledge through e.g. prior knowl-
edge tests. This is also a post-assessment effect of assessment.
Martens and Dochy (1997) investigated the effects of prior knowledge tests
and progress tests with feedback for students in higher education. Their results
revealed that the influence of feedback on learning behaviour is limited. An
effect that was apparent was that 24% of the students reported that they studied
certain parts of the course again, or over a longer period, as a result of the
feedback. Students who were kept informed of the progress they made seemed
to study longer in this self study context. As far as method of study and
motivation are concerned there were, in general, no differences from the control
group. They conclude that adult students are less influenced by feedback from
one or more assessments. Negative feedback did not seem to have dramatically
negative consequences for motivation. There is, however, a tendency towards
interaction with perceived study time. Possible explanations for the unconfirmed
hypotheses are that the content and timing of the feedback were not optimal
and that the learning materials were self-supporting.
Here again we note that the contents, the type/mode and the frequency of
assessment are important factors influencing the impact of formative assess-
ment on learning behaviour. Dochy, Segers, and Sluijsmans (1999) also point
out the importance of the student’s involvement in assessment. The most
important factor, and one that is unique to formative assessment, is the feedback component. This, however, does not always seem to produce the desired effects. The content and timing of feedback, and the extent to which it is attuned to the student's needs, again seem to be important. We refer again to the characteristics of effective
feedback from Crooks.
Biggs (1998) asks us, however, not to overlook summative assessment in this
euphoria about formative assessment. First of all, the effects of summative
assessment on learning behaviour are also significant (see above). Secondly,
research from Butler (1988) reveals that formative and summative assessment
can also interact in their effects on learning behaviour and can only be separated
artificially because students always feel the influence of both and because
instruments can also fulfil both functions.
Crooks (1988) and Tan (1992), on the other hand, quote various investiga-
tions that point out the importance of the separation of opportunities for
feedback and summative assessment. Also, when an assessment also counts towards a final result, students seem to pay less attention to feedback and learn less from it. On the basis of research by McDowell (1995), we have already mentioned a disadvantage of combining the two. Black (1995) also has reservations about the combination of formative and summative assessment. When summative assessment is conducted externally, it has to be separated from the formative aspect, because the negative backwash effects can be detrimental to learning and can undermine good support of the learning process. But even when summative assessment is conducted internally, completely or partially, by the teacher himself, the relationship between the two functions has to be managed. The function of the assessment must first be settled, and decisions about its form and method can then be made. Struyf, Vandenberghe, and Lens (2001), however, point out that in practice the two will be mixed anyway and that formative assessment will be the basis for a summative judgement on
what someone has achieved.
What matters to us in this discussion is that the teacher or lecturer ultimately
pays attention to the consequential validity of assessment, whether it is for-
mative, summative, or even both.

Searching for New Quality Indicators of Assessment

Edumetric indicators, such as validity and reliability, are traditionally used to evaluate the quality of educational assessment. The validity question refers to
the extent to which assessment measures what it purports to measure. Does the
content of assessment correlate with the goals of education? Reliability
was traditionally defined as the extent to which a test measures consistently.
Consistency in test results demonstrated objectivity in scoring: the same results
were obtained if the test was judged by another person or by the same person at
another time.
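As an illustrative aside that is not drawn from this chapter, the traditional notion of reliability just described is usually formalised in classical test theory; the sketch below uses the standard psychometric symbols (observed score X, true score T, error E), which belong to that illustration rather than to anything defined in this book.

% Classical test theory: an observed score is assumed to be a true score plus random error,
%   X = T + E, with E uncorrelated with T.
\[
\rho_{XX'} \;=\; \frac{\sigma_{T}^{2}}{\sigma_{X}^{2}}
\;=\; \frac{\sigma_{T}^{2}}{\sigma_{T}^{2} + \sigma_{E}^{2}}
\]
% In practice this reliability coefficient is estimated, for example, as the correlation
% between two administrations of the same test (test-retest) or between the scores
% awarded by two independent markers; perfectly consistent scoring would give a value of 1.

On this reading, the discussion that follows can be seen as questioning whether such replication-based estimates are obtainable, or even meaningful, for non-standardised assessment tasks.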
The meaning of the concept of reliability was determined by the then
prevailing opinion that assessment needs to fulfil, above all, a selection func-
tion. As mentioned above, fairness in testing was aligned with objectivity.
Striving to achieve objectivity in testing and comparing scores resulted in the
use of standardised testing forms, such as multiple-choice tests. These kinds of
tests, which in practice measure above all the reproduction of knowledge, are now
criticised for their negative influence on instruction. Because great weight was
attributed to test results at a management level, tutors attuned their education
to the content and the level of knowledge that was asked for in the test. As a
consequence, lower levels of cognitive knowledge were more likely to be
attended to (Dochy & Moerkerke, 1997).
As a reaction to the negative effects of multiple-choice tests on education,
new assessment modes were developed. These new assessment modes judge
students on their performances when using knowledge creatively to solve
domain-specific problems (Birenbaum, 1996). Assessment tasks are real-life
problems, or authentic representations of real-life problems. Since an important
goal of higher education today is to educate students in using their knowledge to
solve real-life problems, new assessment modes seem more valid than standar-
dised tests. Indeed, traditional tests always assume that student answers are an
indication of competence in a specific domain. When using authentic assess-
ment, interpretation of answers is not needed because the assessment is itself a direct
indication of the required competence. Precisely because of their authentic,
non-standardised character, these assessment modes score unfavourably on a
conventional reliability measurement, however, because the starting-points are
not the same.
Taking the unique characteristics of new assessment modes into consideration,
the traditional method of measuring reliability can be questioned. In the first
place, a conventional reliability measurement of new assessment modes that
are, in contrast to traditional tests, not standardised, may give a ‘‘false’’ picture
of the results. Secondly, using new assessment modes inevitably implies that
different assessors judge distinct knowledge and skills in different ways, at
different times. Also, one could question whether validity should not receive a
higher priority (Dochy, 2001). As Frank (personal communication, July
13, 2001) points out,

In the textile world in which I spent most of my working life, many of the consumer
tests have been designed with reliability in mind but not validity. One simple example of
this is a crease test which purports to test the tendency of a material to get creased
during use. The problem is that the test is carried out under standard conditions at a
temperature of twenty degrees and a relative humidity of sixty-five percent. When worn
the clothing is usually at a higher temperature and a much higher humidity and textile
fabrics are very sensitive to changes in these variables but still they insist on measuring
under standard conditions, thus making the results less valid for the user.

It is clear that a new assessment culture cannot be evaluated on the basis of
criteria from a previous era. To do justice to the uniqueness of new assessment
modes, the traditionally used criteria need to be expanded and other, more
suitable, criteria for evaluating the quality of assessment need to be developed.

Revised Edumetric Criteria for Evaluating Assessment: A Review

Various authors have proposed ways to extend the criteria, techniques and
methods used in traditional psychometrics (Cronbach, 1989; Frederiksen &
Collins, 1989; Haertel, 1991; Kane, 1992; Linn, 1993; Messick, 1989). In this
section, an integrated overview is given.
Within the literature on quality criteria for evaluating assessment, a distinc-
tion can be made between authors who present an expanded vision of
validity and reliability (Cronbach, 1989; Kane, 1992; Messick, 1989) and those
who propose specific criteria, sensitive to the characteristics of new assessment
modes (Baartman, Bastiaens, Kirschner, & Van der Vleuten, 2007; Dierick &
Dochy, 2001; Frederiksen & Collins, 1989; Haertel, 1991; Linn, Baker, &
Dunbar, 1991).

Construct Validity as a Unitary Concept for Evaluating the Quality of Assessment

Within the classical tradition, validity encompassed three components: content, criterion and construct validity aspects.
The content validity aspect investigates how well the range and types of tasks
used in assessment are an appropriate reflection of the knowledge domain that
is being measured.
The term construct is a more abstract concept that refers to psychological
thinking processes underlying domain knowledge, for example, the reasoning
processes involved in solving an algebra problem. Measuring construct validity
involves placing the construct that is being measured within a conceptual
network and estimating relationships between measurements of related con-
structs through statistical analysis. The extent to which there is a correlation
between scores on tests that are measuring the same construct is called criterion
validity. Evidence that a judgement is criterion valid is used as an argument for
the construct validity of an assessment.
Because new assessment modes use complex multidisciplinary problems, it is
not clear how to measure the validity of assessment in a psychometric way.
Within the new assessment culture, a more realistic approach must be sought
for measuring construct validity. Since the psychometric approach has been
renounced, there is a need to redefine the term construct. It can be argued that
this term can best be replaced by the term competence. Indeed, goals within
higher education today tend to be formulated in terms of attainable
competencies.
Research has also demonstrated that the choice of an assessment form is determined not only by the kind of knowledge and skills to be measured, but also by its broader effects on the nature and content of education (Birenbaum & Dochy, 1996). Examples include the negative influences that standardised assessment can have on the kind of knowledge that is asked for, on the way education is offered and, as a result, on the way students learn the subject matter. New modes of assessment attempt to counteract these negative influences. It can be argued that assessing higher-order skills will lead students to learn those kinds of knowledge and skills; it has indeed been shown that examinations exert the strongest influence on the learning activities of students. To evaluate the suitability of an assessment form, it is therefore important to ask not only whether the assessment is an appropriate measure of the intended knowledge and skills, but also whether its use has achieved the intended effects (Messick, 1989).
The arguments above have led to critical reflection on the traditional way of
validating new assessment modes. This has resulted in the use of the term
construct validity as a unified criterion for evaluating the quality of assessment.
The authors of the Standards for Educational and Psychological Testing define
construct validity as ‘‘a unitary concept, requiring multiple lines of evidence, to
support the appropriateness, meaningfulness, and usefulness of the specific
inferences made from test scores’’ (American Educational Research Associa-
tion, American Psychological Association, National Council on Measurement
in Education, 1985, p. 9). In this construct validity criterion, the three traditional
aspects for evaluating validity are integrated. To describe this construct validat-
ing process, a connection is sought with the interpretative research tradition.
Authors such as Kane (1992) and Cronbach (1989) use an argument-based approach to validating assessment, in which evidence is sought (i) to support the plausibility of the inferences and assumptions made in the proposed interpretative argument, and (ii) to refute potential counter-arguments.
Messick (1994) offers the most elaborated vision of construct validity. He
describes this concept as an evaluative summary of both evidence for the actual,
as well as potential, consequences of score interpretation and use. Following
his argument, the concept of validity encompasses six distinguishable parts:
content, substantive, structural, external, generalisability, and consequential
aspects of construct validity, that conjointly function as general criteria for all
educational assessment. The content aspect of validity means that the range
and type of tasks used in assessment must be an appropriate reflection (content
relevance, representativeness) of the construct domain. Increasing achieve-
ment levels in assessment tasks should reflect increases in expertise in the
construct domain. The substantive aspect emphasizes the consistency between
the processes required for solving the tasks in an assessment, and the processes
used by domain experts in solving tasks (problems). Furthermore, the internal
structure of assessment – reflected in the criteria used in assessment tasks, the
interrelations between these criteria, and the relative weight placed on scoring
these criteria – should be consistent with the internal structure of the construct
domain. If the content aspect (relevance, representativeness of content and performance standards) and the substantive aspect of validity are guaranteed, score interpretation based on one assessment task should be generalisable to other tasks that assess the same construct (the generalisability aspect). The external aspect of validity
refers to the extent to which the assessment scores’ relationships with other
measures and non-assessment behaviours reflect the expected high, low, and
interactive relationships. The consequential aspect of validity includes evidence
and rationales for evaluating the intended and unintended consequences of
score interpretation and use.
The view that the three traditional validity aspects are integrated within one
concept for evaluating assessment was also suggested by Messick (1989). The
aspect most stressed recently, however, is evaluating the influence of assessment
on education. Additionally, the concept of reliability becomes part of the construct validation process. Indeed, for new modes of assessment, it is not important to rank students' scores along a normal curve; the most important question is how reliable the judgement is that a student is, or is not, competent.
The generalisability aspect of validity investigates the extent to which the decision that a student is competent can be generalised to other tasks, and is thus reliable. Measuring reliability can then be interpreted as a question of the accuracy of generalising assessment results to a broader domain of competence.
In the following section we discuss whether generalisability theory, rather than classical reliability theory, can be used to express the reliability, or accuracy, of assessment.

Measuring Reliability: To What Extent Can Accuracy of Assessment Be Generalised?

In classical test theory, reliability can be interpreted in two different ways. On the
one hand reliability can be understood as the extent to which agreement between
raters is achieved. In the case of performance assessment, reliability can be
described in terms of agreement between raters on one specified task and on
one specified occasion. The reliability can be raised, according to this definition,
by using detailed procedures and standards to structure the judgements obtained.
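To make this operational definition concrete, the short sketch below (in Python, using invented marks for eight students; the data and variable names are illustrative assumptions rather than material from the studies discussed here) computes the two indices most commonly reported under this view of reliability: the proportion of identical marks and the inter-rater correlation.

import numpy as np

# Hypothetical marks awarded by two raters to the same eight students,
# for one specified task on one specified occasion.
rater_a = np.array([7, 5, 8, 6, 9, 4, 7, 6])
rater_b = np.array([7, 6, 8, 5, 9, 4, 6, 6])

exact_agreement = (rater_a == rater_b).mean()        # proportion of identical marks
consistency = np.corrcoef(rater_a, rater_b)[0, 1]    # agreement in rank ordering

print(f"exact agreement: {exact_agreement:.2f}")
print(f"inter-rater correlation: {consistency:.2f}")

The two indices can diverge: a rater who is consistently one mark more lenient than a colleague produces low exact agreement but a perfect correlation, which already hints at the limits of treating agreement as the sole quality criterion.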
Heller, Sheingold, and Myford (1998), however, propose that measurements of inter-rater reliability in authentic assessment do not necessarily indicate whether raters are making sound judgements, nor do they provide a basis for improving the technical quality of a test. Differences between ratings sometimes represent more accurate and meaningful measurement than absolute agreement would. Suen, Logan, Neisworth and Bagnato (in press) argue that the focus on objectivity among raters, as a desirable characteristic of an assessment procedure, leads to a loss of relevant information. In high-stakes decisions, procedures which include ways of weighing high-quality information from multiple perspectives may lead to a better decision than those in which information from a single perspective is taken into account.
On the other hand, reliability refers to the consistency of results obtained
from a test when students are re-examined with the same test and the same rater
on a different occasion. According to this concept, reliability can be improved
by using tasks which are similar in format and context. This definition of
reliability in terms of consistency over time is, in the opinion of Bennet
(1993), problematic: between the test and the retest the world of a student will
change and that will affect the re-examination. Therefore, it is difficult to see
how an assessor can assure that the same aspects of the same task can be
assessed in the same way on different occasions. Assessors will look primarily
for developments rather than consistency in scores.
Taking the above definitions into account, it looks as though reliability and the intention of assessment are opposed to each other: achieving high reliability by making concessions in how the student's knowledge and skills are assessed will lead to a decrease in content validity. In the new test culture, weighing information from multiple perspectives may result in a better decision than using information taken from a single perspective. In that case, different test modes can be used, and the resulting information from these different tests can be combined to reach accurate decisions.
Classical reliability theory cannot be used, in line with the vision underlying new assessment modes, to express the reliability of a test, because assessment is about determining the real competence of students with a set of tests and tasks, not about achieving a normally distributed set of results. Additionally, with classical reliability theory one can only examine the consistency of a test on a certain occasion, or the level of agreement between two raters.
Assessment involves measuring the competencies of students on different
occasions and in different ways, possibly by different raters. Assessment has to
be considered as a whole and not as a set of separate parts. Thus, the reliability
of the assessment as a whole is far more interesting than the reliability of the
separate parts. It is far more useful to examine in what way, and to what extent, students' behaviours can be generalised (or transferred) to, for example, professional reality, and to determine the tasks, occasions and raters required for that purpose.
A broader view of reliability has led to the development of generalisability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972). Instead of asking how accurate the observed scores are as a reflection of the true scores, generalisability theory asks how accurately the observed scores allow one to generalise to the behaviour of a student in a well-defined universe.
What is possible with generalisability theory that is not possible with relia-
bility theory?
Classical reliability theory tries to answer the question of how accurately
observed scores are consistent with the true scores. The more the scores from a
test correspond with the hypothetical true scores, the smaller the error and the
higher the reliability. The observed score is subdivided into a true score part and
an error part. The error contains all sorts of possible sources of variation in the
score (for example, the items, the occasions, the raters). Because of the complexity and the open-ended character of the tasks in assessment, larger error is possible (Cronbach, Linn, Brennan, & Haertel, 1997). The different sources of error cannot be discriminated using classical theory. Generalisability theory, given a well-defined design, can discriminate between different sources of error; it is even possible to identify and estimate the size of (and the interactions between) the different sources of error simultaneously. In this way, the researcher can answer the question of the extent to which the observed data can be generalised to a well-defined universe: in other words, the extent to which the measurement corresponds with reality, and which components, or interactions between components, cause inaccuracy.
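The difference can be sketched compactly in standard notation (a minimal formulation following Cronbach et al., 1972, using the symbols p, t and r that also appear in Table 6.1; it is offered as an illustration rather than as a formula taken verbatim from the studies cited). Classical test theory models an observed score as a true score plus a single, undifferentiated error term, and defines reliability as the proportion of observed score variance due to true scores:

$$X = T + E, \qquad \rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}.$$

Generalisability theory instead decomposes the variance of the score of person p on task t as judged by rater r into separately estimable components:

$$\sigma^2(X_{ptr}) = \sigma^2_p + \sigma^2_t + \sigma^2_r + \sigma^2_{pt} + \sigma^2_{tr} + \sigma^2_{pr} + \sigma^2_{ptr,e}.$$

Here $\sigma^2_p$ plays the role of the universe (true) score variance, while each remaining component is a source of error that classical theory would lump into the single term $\sigma^2_E$.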
Various researchers have tried to demonstrate the workings and the benefits of generalisability theory. Shavelson, Webb and Rowley (1989) and Brennan and Johnson (1995), for example, show which variance components can be estimated using generalisability theory, and what high or low variance in these components means for the measurement (the so-called G-study). With this information, assessment can be optimised: by varying the number of tasks, raters or occasions, one can estimate the reliability of different measurement procedures and in this way construct an efficient and, as far as possible, reliable assessment (the so-called D-study).
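As an illustration of this logic, the sketch below (in Python) works through a deliberately simplified design in which persons are crossed with raters only; the scores, the variable names and the restriction to a single facet are assumptions made for brevity, not a reproduction of the studies cited above. The G-study estimates the variance components from the two-way analysis-of-variance mean squares, and the D-study then projects the generalisability coefficient for different numbers of raters.

import numpy as np

# Hypothetical essay marks: rows are persons (students), columns are raters.
scores = np.array([
    [14, 15, 13],
    [10, 11, 12],
    [17, 16, 18],
    [12, 12, 11],
    [15, 14, 16],
], dtype=float)

n_p, n_r = scores.shape
grand = scores.mean()

# G-study: sums of squares and mean squares for the two-way crossed design.
ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_r   # person-by-rater residual

ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

var_res = ms_res                                       # sigma^2(pr,e): residual/interaction
var_p = max((ms_p - ms_res) / n_r, 0.0)                # sigma^2(p): universe score variance
var_r = max((ms_r - ms_res) / n_p, 0.0)                # sigma^2(r): rater leniency differences

# D-study: how dependable is a relative decision based on n raters?
for n_raters in (1, 2, 3, 5):
    relative_error = var_res / n_raters
    g_coefficient = var_p / (var_p + relative_error)
    print(f"{n_raters} rater(s): generalisability coefficient = {g_coefficient:.2f}")

Extending the sketch to the full persons x tasks x raters design of Table 6.1 adds components for tasks and for the person-by-task and task-by-rater interactions, but it follows the same principle of separating, rather than pooling, the sources of error.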
Fan and Chen (2000) provide another example of applying generalisability theory. They demonstrate that inter-rater reliability coefficients computed under classical reliability theory often overestimate the true reliability, and that generalisability theory can provide much more accurate values.

Specific Criteria for Evaluating the Quality of New Assessment Modes

Apart from an expansion of the traditional validity and reliability criteria, new
criteria can be suggested for evaluating the quality of assessment: transparency,
fairness, cognitive complexity and authenticity of tasks, and directness of assessment
(Dierick, Dochy, & Van de Watering, 2001; Dierick, Van de Watering, &
Muijtjens, 2002; Frederiksen & Collins, 1989; Gielen et al., 2003; Haertel, 1991;
Linn, et al., 1991). These criteria were developed to highlight the unique char-
acteristics of new assessment modes.
An important characteristic of new assessment modes is the kind of tasks that are used. Authentic assessment tasks call on higher-order skills and so stimulate a deeper learning approach by students.
The first criterion that distinguishes new assessment modes from tradi-
tional tests is the extent to which assessment tasks are used to measure
problem solving, critical thinking and reasoning. This criterion is called
cognitive complexity (Linn et al., 1991). To judge whether assessment tasks
meet this criterion, we can analyse whether there is consistency between the
processes required for solving the tasks and those used by experts in solving
such problems. Next, it is necessary to take into account students’ familiarity
with the problems and the ways in which students attempt to solve them
(Bateson, 1994).
Another criterion for evaluating assessment tasks is authenticity. Shepard
(1991) describes authentic tasks as the best indicators of attainment of learning
goals. Indeed, traditional tests always assume an interpretation from student
answers to competence in a specific domain. When using authentic assessment,
interpretation of answers is not needed, because assessment is already a direct
indication of competence.
The criterion authenticity of tasks is closely related to the directness of
assessment. Powers, Fowles, and Willard (1994) argue that the extent to
which teachers can judge competence directly is relevant evidence of the direct-
ness of assessment. In their research, teachers were asked to give a global
judgement about the competence ‘‘general writing skill’’ for writing tasks, with-
out scoring them. Thereafter, trained assessors scored these works, following
predetermined standards. Results indicate that there was a clear correlation
between the global judgement of competence by the teachers and the marks
awarded by the assessors.
When scoring assessment, the criterion of fairness (Linn, et al. 1991) plays an
important role. The central question is whether students have had a fair chance
to demonstrate their real ability. Bias can occur for several reasons. Firstly,
because tasks are not congruent with the received instruction/education.
Secondly, because students do not have equal opportunities to demonstrate
their real capabilities on the basis of the selected tasks (e.g. because they are
not accustomed to the cultural content that is asked for), and thirdly, because of
prejudgment in scoring. Additionally, it is also important that students under-
stand the criteria that are used in assessment. ‘‘Meeting criteria improves
learning’’: communicating these criteria to students when the assignments are
given improves their performances, as they can develop clear goals to strive for
in learning (Dochy, 1999).
A final criterion that is important when evaluating assessment is the trans-
parency of the scoring criteria that are used. Following Frederiksen and Collins
(1989), the extent to which students can judge themselves and others as reliably as trained assessors provides a good indicator for this criterion.
Though these criteria seem to be good indicators for evaluating the quality of
new assessments, they cannot be seen as completely different from those
formulated by Messick (1994). The difference is that these criteria are more
concrete and more sensitive to the unique characteristics of new assessment
modes. They specify how validity can be evaluated in an edumetric, instead of a
psychometric, way.
The criteria authenticity of tasks and cognitive complexity can be seen as
further specifications of Messick’s (1994) content validity aspect. Authenticity
of tasks means that the content and the level of tasks need to be an adequate
representation of the real problems that occur within the construct/competence
domain that is being measured. To investigate the criterion of cognitive com-
plexity, we need to judge whether solving assessment tasks requires the same
thinking processes that experts use for solving domain-specific problems. This
criterion corresponds to what Messick calls the substantive aspect of validity.
The criteria directness and transparency are relevant in the context of the
consequential validity of assessment. The way that competence is assessed,
directly or from an interpretation of student’s answers, has an immediate effect
on the nature and content of education and the learning processes of students. In
addition, the transparency of the assessment methods used influences the learning
process of students, as their performance will improve if they know exactly which
assessment criteria will be used (see the above-mentioned argument).
The criterion fairness forms part of both Messick’s (1994) content validity
aspect and his internal structural aspect. To give students a fair chance to
demonstrate what their real capabilities are, the tasks offered need to be
varied so that they contain the whole spectrum of knowledge and skills
needed for the competence measured. It is also important that the criteria used for assessing a task, and the weights given to them, are an adequate reflection of the criteria used by experts for assessing competence in a specific domain.
On the question of whether there is really a gap between the old and the new assessment streams in formulating criteria for evaluating the quality of assessment, it can be argued that there is a clear difference in approach and background: within psychometrics, the criteria of validity and reliability are interpreted differently than they are within the edumetric approach.

Evaluating New Assessment Modes According to the New Edumetric Approach

If we integrate the most important changes within the assessment field with
regard to the criteria for evaluating assessment, conducting a quality assessment
inquiry involves a comprehensive strategy that addresses the evaluation of:
1. the validity of assessment tasks;
2. the validity of performance assessment scoring;
3. the generalisability of assessment; and
4. the consequential validity of assessment.
During this inquiry, arguments will be found that support or refute the
construct validity of assessment. Messick (1994) suggested that two questions
must be asked whenever a decision about the quality of assessment is made: Is
the assessment any good as a measure of the characteristics it is intended to
assess? and Should the assessment be used for the proposed purpose? In
evaluating the first question, evidence of the validity of the assessment tasks,
of the assessment performance scoring, and the generalisability of the assess-
ment, must be considered. The second question evaluates the adequacy of the
proposed use (intended and unintended effects) against alternative means of
serving the same purpose. In the evaluative argument, the evidence obtained
during the validity inquiry will be considered, and carefully weighed, to reach a
conclusion about the adequacy of assessment use for the specific purpose.
In Table 6.1, an overview is given of questions that can be used as guidelines
to collect supporting evidence for, and to examine possible threats to, construct
validity.

Table 6.1 A framework for collecting supporting evidence for, and examining possible threats to, construct validity

1. Validity of the Assessment Tasks
Procedure: Establish an explicit conceptual framework for the assessment, that is, a construct definition with content and cognitive specifications. Purpose: to provide a frame of reference for reviewing assessment tasks or items with regard to the purported construct/competence.
Review questions (judging how well the assessment matches the content and cognitive specifications of the construct/competence that is measured):
A. Does the assessment consist of a representative set of tasks that cover the spectrum of knowledge and skills needed for the construct/competence being measured?
B. Are the tasks authentic in that they are representative of the real-life problems that occur within the construct/competence domain that is being measured?
C. Do the tasks assess complex abilities in that the same thinking processes are required for solving the tasks that experts use for solving domain-specific problems?

2. Validity of Assessment Scoring
Procedure: Identify rival explanations for the observed performance and collect multiple types of evidence on these rival explanations of assessment performance. Purpose: to identify possible weaknesses in the interpretation of scoring, and to provide a basis for refuting alternative explanations for performance or for revising the assessment, if necessary.
Review questions (searching for evidence of the appropriateness of the inference from the performance to an observed score):
A. Is the task congruent with the received instruction/education?
B. Do all students have equal opportunities to demonstrate their capabilities on the basis of the selected tasks?
– Can all students demonstrate their performance in the selected mode of assessment?
– Are students accustomed to the cultural content that is asked for?
– Are students sufficiently motivated to perform well?
– Can students use the necessary equipment?
– Do any students have inappropriate advantages?
C. Do the scoring criteria relate directly to the particular construct/competence that the task is intended to measure?
D. Are the scoring criteria clearly defined (transparent)?
E. Do students understand the criteria that are used to evaluate their performance?

3. The Generalisability of the Assessment Score
Review questions (searching for evidence of the appropriateness of the inference from the observed score to a conclusion about expected performance in the competence/construct domain): Within the current vision of assessment, the traditional concept of reliability is replaced by the notion of generalisability. Measuring reliability then becomes a question of the accuracy of generalisation, or transfer, of assessment. The goal of using generalisability theory in reliability inquiry is to explain the consistency/inconsistency in scoring, and to focus on understanding and reducing possible sources of measurement error. For persons (p) crossed with tasks (t) and raters (r), the total variance decomposes as

$$\sigma^2(X_{ptr}) = \sigma^2_p + \sigma^2_t + \sigma^2_r + \sigma^2_{pt} + \sigma^2_{tr} + \sigma^2_{pr} + \sigma^2_{ptr,e}$$

where p = person (student), t = task, and r = rater. The generalisability coefficient (reliability coefficient) is the ratio of universe score variance to observed score variance.

4. Consequences of Test Use
Review questions (searching for evidence of what the assessment claims to do, and investigating whether the actual consequences are also the expected consequences):
– How do students prepare themselves for education?
– What kind of learning strategies do students use?
– Which kind of knowledge is measured?
– Does assessment stimulate the development of various skills?
– Does assessment stimulate students to apply their knowledge in realistic situations?
– Are long term effects perceived?
– Is breadth and depth in learning actively rewarded, instead of merely by chance?
– Does making expectations and criteria explicit stimulate independence?
– Is relevant feedback provided for progress?
– What are the effects on the system of using the assessment, other than what the assessment claims?
Techniques: observation of the instructional strategies; comparison of the quality of learning results obtained with former test methods with those obtained using the new form of assessment; presenting statements of expected (and unexpected) consequences of assessment to the student population; holding semi-structured key group interviews.
Promises for new forms of assessment: they encourage high quality learning and active participation; encourage instructional strategies and techniques that foster reasoning, problem-solving, and communication; have no detrimental effect on instruction because they evaluate the cognitive skill of interest directly (directness of assessment); give feedback opportunities; improve the performance of students by formulating clear criteria (transparency of assessment); encourage a sense of ownership, personal responsibility, independence and motivation; and ameliorate the learning climate (Dochy et al., 1999; Marcoulides & Simkin, 1991; Sambell et al., 1997; Riley, 1995; Topping, 1998; . . .).

Overall purpose: to verify that the inferences made from the assessment are sound. Findings should be reported in the form of an evaluative argument that integrates the evidence for the construct validity of the assessment:
1. Is the assessment any good as a measure of the construct it is intended to assess?
2. Should the assessment be used for the proposed purpose?
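To give a purely hypothetical sense of how the coefficient defined in part 3 of the table behaves, suppose a G-study had estimated the variance components $\sigma^2_p = 4$, $\sigma^2_{pt} = 2$, $\sigma^2_{pr} = 1$ and $\sigma^2_{ptr,e} = 3$ (invented values, used only for illustration). For a relative decision based on $n_t$ tasks and $n_r$ raters, the error variance and the generalisability coefficient are

$$\sigma^2_\delta = \frac{\sigma^2_{pt}}{n_t} + \frac{\sigma^2_{pr}}{n_r} + \frac{\sigma^2_{ptr,e}}{n_t n_r}, \qquad E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_\delta}.$$

With two tasks and one rater this gives $\sigma^2_\delta = 1 + 1 + 1.5 = 3.5$ and $E\rho^2 \approx 0.53$; with four tasks and two raters, $\sigma^2_\delta = 0.5 + 0.5 + 0.375 = 1.375$ and $E\rho^2 \approx 0.74$. The D-study question is thus answered directly: adding tasks and raters shrinks the separable error components and raises the dependability of the decision that a student is competent.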

Validity of Assessment Tasks Used

What are the arguments in support of the construct validity of new assessment
modes?
Assessment development begins with establishing an explicit conceptual frame-
work that describes the construct domain being assessed: content and cognitive
specifications should be identified. During the first stage, validity inquiry judges
how well assessment matches the content and cognitive specifications of the con-
struct being measured. The defined framework can then be used as a guideline to
select assessment tasks. The following aspects are important to consider. Firstly,
the tasks used must be an appropriate reflection of the construct or, according to
new assessment modes, the competence that needs to be measured. Next, with
regard to the content, the tasks should be authentic in that they are representative
of real life problems that occur in the knowledge domain being measured. Finally,
the cognitive level needs to be complex, so that the same thinking processes are
required that experts use for solving domain-specific problems.
New assessment modes score better on these criteria than standardised tests,
precisely because of their authentic and complex problem characteristics.

Validity of Assessment Scoring

The next aspect that needs to be investigated is whether the scoring of the assessment is valid. The fairness criterion plays an important role here. It requires, on the one hand, that the assessment criteria fit and are appropriately used, so that they are an adequate reflection of the criteria used by experts, and that appropriate weightings are given for assessing different competences. On the other hand, it requires that students have a fair chance to demonstrate their real abilities.
Possible problems that can occur are, firstly, that relevant assessment criteria
could be lacking, so certain competence aspects do not get enough attention.
Secondly, irrelevant, personal assessment criteria could be used. Because assessment measures the ability of students at different times, in different ways, and by different judges, there is less chance that these problems will occur with new modes of assessment, since any potential bias in a single judgement will tend to be balanced out. As a result, the totality of all the assessments will give a more precise picture of the real competence of a person than standardised assessment, where the decision of whether a student is competent is reduced to one judgement at one time.

Generalisability of Assessment

This step in the validating process investigates to what extent assessment can be
generalised to other tasks that measure the same construct. This indicates that
score interpretation is reliable and supplies evidence that assessment really
measures the intended construct.
Problems that can occur are under-representation of the construct, and
variance which is irrelevant to the construct. Construct under-representation
means that assessment is too limited, so that important construct dimensions
cannot be measured. In the case of variance which is irrelevant to the construct,
the assessment may be too broad, thus containing systematic variance that is
irrelevant for measuring the construct (Dochy & Moerkerke, 1997). In this
context, the breadth of the construct or the purported competence needs to be
defined before a given interpretation is considered reliable and validity can be
discussed.
Messick (1994) argues that the validated interpretation gives meaning to the
measure in the particular instance, and to evidence of the generality of inter-
pretation over time and across groups and settings, showing how stable, and
thus reliable, that meaning is likely to be. On the other hand, Frederiksen and
Collins (1989) have moved away from the idea that assessment can only be
called reliable if the interpretation can be generalised to a broader domain. They
use another model in which the fairness of the scoring is crucial for reliability, but the replicability and generalisability of the performance are not. In any case, it can be argued that assessment in which a number of authentic, representative tasks are used to measure a specific competence is less sensitive to the above-mentioned problems. The purported construct is, after all, directly measured. ''Authentic'' means that the tasks are realistic, for example, working with
extensive case study material and not with a short case study that only lists
relevant information.

Consequences of Assessment

The last question that needs to be asked is what the consequences are of using a
particular assessment form for instruction, and for the learning process of
students. Linn et al. (1991) and Messick (1995) stressed not only the importance
of the intended consequences, but also the unintended consequences, positive
and negative. Examples of intended educational consequences concern the
increased involvement and quality of the learning of students (Boud, 1995),
improvements in reflection (Brown, 1999), improvements and changes in teach-
ing method (Dancer & Kamvounias, 2005; Sluijsmans, 2002), increased feelings
of ownership and higher performances (Dierick et al., 2001; Dierick et al., 2002;
Farmer & Eastcott, 1995; Gielen et al., 2003; Orsmond, Merry, & Reiling,
2000), increased motivation and more direction in learning (Frederiksen & Col-
lins, 1989), better reflection skills and more effective use of feedback (Brown,
1999; Gielen et al., 2003) and the development of self-assessment skills and
lifelong learning (Baartman, Bastiaens, Kirschner, & Van der Vleuten, 2005;
Boud, 1995). Examples of unintended educational consequences are, among
other things, surface learning approaches and rote learning (e.g., Nijhuis, Segers,
& Gijselaers, 2005; Segers, Dierick, & Dochy, 2001), increased test anxiety
(e.g., Norton et al., 2001) and stress (e.g., Evans, McKenna, & Oliver, 2005;
Pope, 2001, 2005), cheating and tactics to impress teachers (e.g., Norton et al.,
2001), gender bias (e.g., Langan et al., 2005), prejudice and unfair marking (e.g.,
Pond, Ui-Haq, & Wade, 1995), lower performances (e.g., Segers & Dochy, 2001),
and resistance towards innovations (e.g., McDowell & Sambell, 1999).
Consequential validity investigates whether the actual consequences of
assessment are also the expected consequences. This is a very important ques-
tion, as each form of assessment also steers the learning process. This can be
made clear by presenting statements of expected (and unexpected) conse-
quences of assessment to the student population, or by holding semi-structured
key group interviews. Using this last method, unexpected effects also become
clear. When evaluating consequential validity, the following aspects are impor-
tant to consider: what students understand the requirements for assessment to
be; how students prepare themselves for education; what kind of learning
strategies are used by students; whether assessment is related to authentic
contexts; whether assessment stimulates students to apply their knowledge to
realistic situations; whether assessment stimulates the development of various
skills; whether long term effects are perceived; whether effort, instead of mere chance, is actively rewarded; whether breadth and depth in learning are rewarded; whether independence is stimulated by making expectations and criteria expli-
cit; whether relevant feedback is provided for progress; and whether competen-
cies are measured, rather than just the memorising of facts.

The Way Forward: Practical Guidelines

From this chapter, we can derive the following practical guidelines to be kept in
mind:
 The use of summative tests squeezes out assessment for learning and has a
negative impact on motivation for learning.
 Some research reveals that the majority of students turn to a superficial learning strategy in a system of continuous summative assessment.
 Making greater use of new modes of assessment supports recent views on student learning.
 Assessment steers learning.
 The consequential validity of assessments should be taken into account when
designing exams. Consequential validity asks what the consequences are of
the use of a certain type of assessment on education and on the students’
learning processes, and whether the consequences that are found are the
same as the intended effects.
 The influence of formative assessment is mainly due to the fact that students look back on the results after the assessment, as well as on the learning processes upon which the assessment is based.
 In using formative and summative modes of assessment, there are often
conflicting pressures between learning and the production of the product.
 Formative assessment can be effective in improving students’ learning, and
when the responsibility for assessment is handed over to the student, the
formative assessment process seems to be even more effective.
 The most important component of formative assessment is feedback. One of
the merits of implementing peer assessment as a tool for learning is that it can
actually increase the consequential validity of the parent-assessment method
to which it is attached by making it more feasible to include challenging and
authentic tasks in one’s assessment system; by helping to make the assess-
ment demands more clear to the students; by providing a supplement or a
substitute for formative staff assessment; and finally, by supporting the
response to teacher feedback. Our empirical studies showed that qualitative formative peer assessment (referred to as peer feedback), applied as a tool for learning, is not an inferior form of feedback. Peer feedback might be considered a worthy substitute for staff feedback. It might even lead to
higher performance than staff feedback when it is extended with measures
to enhance the mindful reception of feedback by means of an a priori
question form or an a posteriori reply form administered to the assessee
(Gielen et al., 2007). Moreover, it is important to stimulate or support
students in providing more constructive feedback in order to raise the
effectiveness of peer feedback. In order to be considered ‘‘constructive’’,
feedback should be specific, appropriate to the assessment criteria, contain
positive as well as negative comments, and include some justifications,
suggestions and thought-provoking questions. Assessees who receive this
type of feedback make better revisions, resulting in greater progress
between the draft and the final version of the essay. Examples of measures
that can be taken to enhance the constructiveness of peer feedback
are increasing the social interaction between peers; training peer assessors
in providing constructive feedback; training assessees on how to make
sure themselves that they receive the feedback they need; or installing a
quality control system in which student-assessors are rewarded for
good feedback or punished for clearly poor feedback. Feedback can
reach a high level of constructiveness without necessarily being correct or
complete. Peer feedback and staff feedback can play a complementary
role. Peer feedback can be more specific, and is better at activating, motivating and coaching students; staff feedback is more trustworthy, and it helps students to understand the assessment requirements and the structure of
the course. Finally, practitioners should notice that implementing peer
assessment as a tool for learning does not necessarily result in a saving of
time (Gielen et al., 2007).
 Apart from an expansion of the traditional validity and reliability criteria,
new criteria can be suggested for evaluating the quality of assessment:
transparency, fairness, cognitive complexity and authenticity of tasks, and
directness of assessment.
 In conducting a quality assessment inquiry, we should evaluate: the validity
of assessment tasks; the validity of performance assessment scoring; the
generalisability of assessment; and the consequential validity of assessment.
Since this may be time-consuming, support for lecturers from a central support department, and co-operation between schools and universities in using new modes of assessment, are promising avenues to pursue.

Conclusion

In this chapter, it is argued that student centred education implies a new way of
assessing students. Assessment no longer means ‘‘testing’’ students. New assess-
ment modes differ in various aspects from the characteristics of traditional tests.
Students are judged on their actual performance in using knowledge in a
creative way to demonstrate competence. The assessment tasks used are authen-
tic representations of real-life problems. Often, students are given responsibility
for the assessment process, and both knowledge and skills are measured.
Furthermore, in contrast to traditional tests, assessment modes are not stan-
dardised. Finally, assessment implies that at different times, different knowl-
edge and skills are measured by different raters. For new assessment modes the
most important question, with regard to quality, is: How justified is the decision
that a person is competent?
Because of these specific characteristics, the application of the traditional theoretical quality criteria is no longer self-evident. Attention has therefore been paid in this chapter to the consequences of these developments for screening the edumetric quality of assessment. It can be concluded that the traditional interpretation of validity and reliability is no longer tenable but needs, at least, to be expanded and changed. For new assessment modes, it is important to attend to the following quality aspects: cognitive complexity, authenticity of tasks, fairness, transparency of the assessment procedure, and the influence of assessment on education.

References
American Educational Research Association, American Psychological Association, National
Council on Measurement in Education (1985). Standards for educational and psychological
testing. Washington, DC: Author.
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning.
Education Policy Analysis Archives, 10(18). Retrieved August 15, 2007, from http://epaa.
asu.edu/epaa/v10n18/.
Askham, P. (1997). An instrumental response to the instrumental student: assessment for
learning. Studies in Educational Evaluation, 23(4), 299–317.
Baartman, L. K. J., Bastiaens, T. J., Kirschner, P. A., & Van der Vleuten, C. P. M. (2007).
Teachers’ opinions on quality criteria for competency assessment programmes. Teaching
and Teacher Education. Retrieved August 15, 2007, from http://www.fss.uu.nl/edsci/
images/stories/pdffiles/Baartman/baartman%20et%20al_2006_teachers%20opinions%
20on%20quality%20criteria%20for%20caps.pdf
Baartman, L. K. J., Bastiaens, T. J., Kirschner, P. A., & Van der Vleuten, C. P. M. (2005). The
wheel of competency assessment. Presenting quality criteria for competency assessment
programmes. Paper presented at the 11th biennial Conference for the European Associa-
tion for Research on Learning and Instruction (EARLI), Nicosia, Cyprus.
Bateson, D. (1994). Psychometric and philosophic problems in ‘authentic’ assessment,
performance tasks and portfolios. The Alberta Journal of Educational Research, 11(2),
233–245.
Bennet, Y. (1993). Validity and reliability of assessments and self-assessments of work-based
learning assessment. Assessment & Evaluation in Higher Education, 18(2), 83–94.
Biggs, J. (1996). Assessing learning quality: Reconciling institutional, staff and educational
demands. Assessment & Evaluation in Higher Education, 21(1), 5–16.
Biggs, J. (1998). Assessment and classroom learning: A role for summative assessment?
Assessment in Education: Principles, Policy & Practices, 5, 103–110.
Birenbaum, M. (1994). Toward adaptive assessment – The student’s angle. Studies in Educa-
tional Evaluation, 20, 239–255.
Birenbaum, M. (1996). Assessment 2000. In: M. Birenbaum & F. Dochy, (Eds.). Alternatives
in assessment of achievement, learning processes and prior knowledge. Boston: Kluwer
Academic.
Birenbaum, M., & Dochy, F. (Eds.) (1996). Alternatives in assessment of achievement, learning
processes and prior knowledge. Boston: Kluwer Academic.
Black, P. (1995). Curriculum and assessment in science education: The policy interface.
International Journal of Science Education, 17(4), 453–469.
Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education:
Principles, Policy & Practices, 5(1), 7–74.
Boud, D. (1990). Assessment and the promotion of academic values. Studies in Higher
Education, 15(1), 101–111.
Boud, D. (1995). Assessment and learning: Contradictory or complementary? In P. Knight
(Ed.), Assessment for learning in higher education (pp. 35–48). London: Kogan Page.
Brennan, R. L., & Johnson, E. G. (1995). Generalisability of performance assessments.
Educational Measurement: Issues and Practice, 11(4), 9–12.
Brown, S. (1999). Institutional strategies for assessment. In S. Brown, & A. Glasner (Eds.),
Assessment matters in higher education: Choosing and using diverse approaches (pp. 3–13).
Buckingham: The Society of Research into Higher Education/Open University Press.
Butler, D. L. (1988). A critical evaluation of software for experiment development in research
and teaching. Behavior Research Methods, Instruments, & Computers, 20, 218–220.
Butler, D. L., & Winne, P. H. (1995). Feedback and self-regulated learning: A theoretical
synthesis. Review of Educational Research, 65(3), 245–281.
Cronbach, L. J. (1989). Construct validation after thirty years. In R. L. Linn (Ed.), Intelli-
gence: Measurement, theory and public policy (pp. 147–171). Chicago: University of Illinois
Press.
Cronbach, L. J., Gleser, G. C., Nanda, H. & Rajaratnam, N. (1972). The dependability of
behavioral measurements: Theory of generalizability for scores and profiles. New York:
Wiley.
Cronbach, L. J., Linn, R. L., Brennan, R. L. & Haertel, E. H. (1997). Generalizability analysis
for performance assessments of students’ achievement or school effectiveness. Educational
and Psychological Measurement, 57(3), 373–399.
Crooks, T. (1988). The impact of classroom evaluation practices on students. Review of
Educational Research, 58(4), 438–481.
Dancer, D., & Kamvounias, P. (2005). Student involvement in assessment: A project designed
to assess class participation fairly and reliably. Assessment and Evaluation in Higher
Education, 30, 445–454.
Dierick, S., & Dochy, F. (2001). New lines in edumetrics: New forms of assessment lead to new
assessment criteria. Studies in Educational Evaluation, 27, 307–329.
Dierick, S., Dochy, F., & Van de Watering, G. (2001). Assessment in het hoger onderwijs.
Over de implicaties van nieuwe toetsvormen voor de edumetrie [Assessment in higher
education. About the implications of new test forms for edumetrics]. Tijdschrift voor Hoger
Onderwijs, 19, 2–18.
Dierick, S., Van de Watering, G., & Muijtjens, A. (2002). De actuele kwaliteit van assessment:
Ontwikkelingen in de edumetrie [The actual quality of assessment: Developments in edu-
metrics]. In F. Dochy, L. Heylen, & H. Van de Mosselaer (Eds.) Assessment in onderwijs:
Nieuwe toetsvormen en examinering in studentgericht onderwijs en competentiegericht onder-
wijs [Assessment in education: New testing formats and examinations in student-centred
education and competence based education] (pp. 91–122). Utrecht: Lemma BV.
Dochy, F. (1999). Instructietechnologie en innovatie van probleemoplossen: over constructiegericht academisch onderwijs [Instructional technology and innovation of problem solving: On construction-oriented academic education]. Utrecht: Lemma.
Dochy, F. (2001). A new assessment era: Different needs, new challenges. Research Dialogue
in Learning and Instruction, 2(1), 11–20.
Dochy, F. (2005). Learning lasting for life and assessment: How far did we progress? Presidential
address at the EARLI conference 2005, Nicosia, Cyprus. Retrieved October 18, 2007
from http://perswww.kuleuven.be/u0015308/Publications/EARLI2005%20presidential%
20address%20FINAL.pdf
Dochy, F., & Gijbels, D. (2006). New learning, assessment engineering and edumetrics. In
L. Verschaffel, F. Dochy, M. Boekaerts, & S. Vosniadou (Eds.), Instructional psychology:
Past, present and future trends. Sixteen essays in honour of Erik De Corte. New York:
Elsevier.
Dochy, F., & McDowell, L. (1997). Assessment as a tool for learning. Studies in Educational
Evaluation, 23, 279–298.
Dochy, F. & Moerkerke, G. (1997). The present, the past and the future of achievement
testing and performance assessment. International Journal of Educational Research, 27(5),
415–432.
Dochy F., Moerkerke G., & Martens R. (1996). Integrating assessment, learning and instruc-
tion: Assessment of domain-specific and domain transcending prior knowledge and
progress. Studies in Educational Evaluation, 22(4), 309–339.
Dochy, F., Segers, M., Gijbels, D., & Struyven, K. (2006). Breaking down barriers
between teaching and learning, and assessment: Assessment engineering. In D. Boud &
N. Falchikov (Eds.), Rethinking assessment for future learning. London: RoutledgeFalmer.
Dochy, F., Segers, M., & Sluijsmans, D. (1999). The use of self-, peer- and co-assessment in
higher education: A review. Studies in Higher Education, 24(3), 331–350.
Elton L. R. B., & Laurillard D. M. (1979). Trends in research on student learning. Studies in
Higher Education, 4(1), 87–102.
Evans, A. W., McKenna, C., & Oliver, M. (2005). Trainees’ perspectives on the assessment and
self-assessment of surgical skills. Assessment and Evaluation in Higher Education, 30, 163–174.
Falchikov, N. (1986). Product comparisons and process benefits of collaborative peer group
and self-assessments. Assessment and Evaluation in Higher Education, 11(2), 146–166.
Falchikov, N. (1995). Peer feedback marking: Developing peer assessment. Innovations in
Education and Training International, 32(2), 395–430.
Fan, X., & Chen, M. (2000). Published studies of interrater reliability often overestimate
reliability: computing the correct coefficient. Educational and Psychological Measurement,
60(4), 532–542.
Farmer, B., & Eastcott, D. (1995). Making assessment a positive experience. In P. Knight (Ed.),
Assessment for learning in higher education (pp. 87–93). London: Kogan Page.
Feltovich, P. J., Spiro, R. J. & Coulson, R. L. (1993). Learning, teaching, and testing for
complex conceptual understanding. In N. Frederiksen, R. J. Mislevy, & I. I. Bejar (Eds.),
Test theory for a new generation of tests. Hillsdale, NJ: Lawrence Erlbaum.
Firestone, W. A., & Mayrowitz, D. (2000). Rethinking ‘‘high stakes’’: Lessons from the
United States and England and Wales. Teachers College Record, 102, 724–749.
Frederiksen, J. R., & Collins, A. (1989). A system approach to educational testing. Educa-
tional Researcher, 18(9), 27–32.
Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning.
American Psychologist, 39(3), 193–202.
Gibbs, G. (1999). Using assessment strategically to change the way students learn, In
S. Brown & A. Glasner (Eds.), Assessment matters in higher education: Choosing and
using diverse approaches, Buckingham: Open University Press.
Gielen, S., Dochy, F., & Dierick, S. (2003). Evaluating the consequential validity of new
modes of assessment: The influence of assessment on learning, including pre-, post-, and
true assessment effects. In M. S. R. Segers, F. Dochy, & E. Cascallar (Eds.), Optimising
new modes of assessment: In search of qualities and standards (pp. 37–54). Dordrecht/
Boston: Kluwer Academic Publishers.
Gielen, S., Dochy, F., & Dierick, S. (2007). The impact of peer assessment on the consequential
validity of assessment. Manuscript submitted for publication.
Gielen, S., Tops, L., Dochy, F., Onghena, P., & Smeets, S. (2007). Peer feedback as a substitute
for teacher feedback. Manuscript submitted for publication.
Gielen, S., Dochy, F., Onghena, P., Janssens, S., Schelfhout, W., & Decuyper, S. (2007).
A complementary role for peer feedback and staff feedback in powerful learning environ-
ments. Manuscript submitted for publication.
Glaser, R. (1990). Testing and assessment; O Tempora! O Mores! Horace Mann Lecture,
University of Pittsburgh, LRDC, Pittsburgh, Pennsylvania.
Gulliksen, H. (1985). Creating better classroom tests. Educational Testing Service.
Haertel, E. H. (1991). New forms of teacher assessment. Review of Research in Education,
17, 3–29.
Harlen, W., & Deakin Crick, R. (2003). Testing and motivation for learning. Assessment in
Education, 10(2), 169–207.
Heller, J. I., Sheingold, K., & Myford, C. M. (1998). Reasoning about evidence in portfolios:
Cognitive foundations for valid and reliable assessment. Educational Assessment, 5(1), 5–40.
Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Langan, A. M., Wheater, C. P., Shaw, E. M., Haines, B. J., Cullen, W. R., Boyle, J. C., et al.
(2005). Peer assessment of oral presentations: Effects of student gender, university affilia-
tion and participation in the development of assessment criteria. Assessment and Evalua-
tion in Higher Education, 30, 21–34.
Leonard, M, & Davey, C. (2001). Thoughts on the 11 plus. Belfast: Save the Children Fund.
Linn, R. L. (1993). Educational assessment: Expanded expectations and challenges. Educa-
tional Evaluation and Policy Analysis, 15(1), 1–16.
Linn, R. L., Baker, E., & Dunbar, S. (1991). Complex, performance-based assessment:
Expectations and validation criteria. Educational Researcher, 20(8), 15–21.
Martens, R., & Dochy, F. (1997). Assessment and feedback as student support devices.
Studies in Educational Evaluation, 23(3), 257–273.
Marton, F. & Säljö, R. (1976). On qualitative differences in learning. Outcomes and process.
British Journal of Educational Psychology, 46, 4–11, 115–127.
McDowell, L. (1995). The impact of innovative assessment on student learning. Innovations in
Education and Training International, 32(4), 302–313.
McDowell, L., & Sambell, K. (1999). The experience of innovative assessment: Student
perspectives. In S. Brown, & A. Glasner (Eds.), Assessment matters in higher education:
Choosing and using diverse approaches (pp. 71–82). Buckingham: The Society of Research
into Higher Education/Open University Press.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assess-
ment. Educational Researcher, 18(2), 5–11.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance
assessments. Educational Researcher, 23(2), 13–22.
Messick, S. (1995). Validity of psychological assessment. Validation of inferences from
persons’ responses and performances as scientific inquiry into score meaning. American
Psychologist, 50, 741–749.
Nevo, D. (1995). School-based evaluation: A dialogue for school improvement. London:
Pergamon.
Nijhuis, J., Segers, M. R. S., & Gijselaers, W. (2005). Influence of redesigning a learning
environment on student perceptions and learning strategies. Learning Environments
Research: An International Journal, 8, 67–93.
Norton, L. S., Tilley, A. J., Newstead, S. E., & Franklyn-Stokes, A. (2001). The pressures of
assessment in undergraduate courses and their effect on student behaviours. Assessment
and Evaluation in Higher Education, 26, 269–284.
Orsmond, P., Merry, S., & Reiling, K. (2000). The use of student derived marking criteria in
peer and self-assessment. Assessment & Evaluation in Higher Education, 25(1), 23–38.
Pond, K., Ui-Haq, R., & Wade, W. (1995). Peer review: A precursor to peer assessment.
Innovations in Education and Training International, 32, 314–323.
Pope, N. (2001). An examination of the use of peer rating for formative assessment in
the context of the theory of consumption values. Assessment and Evaluation in Higher
Education, 26, 235–246.
Pope, N. (2005). The impact of stress in self- and peer assessment. Assessment and Evaluation
in Higher Education, 30, 51–63.
Powers, D., Fowles, M., & Willard, A. (1994). Direct assessment, direct validation? An
example from the assessment of writing? Educational Assessment, 2(1), 89–100.
Rigsby, L. C., & DeMulder, E. K. (2003). Teachers' voices interpreting standards: Compromising teachers' autonomy or raising expectations and performances. Education Policy Analysis Archives, 11(44). Retrieved August 15, 2007, from http://epaa.asu.edu/epaa/
v11n44/
Sambell, K., McDowell, L., & Brown, S. (1997). But is it fair? An exploratory study of student
perceptions of the consequential validity of assessment. Studies in Educational Evaluation,
23(4), 349–371.
Scouller, K. (1998). The influence of assessment method on students' learning approaches:
Multiple choice question examination versus assignment essay. Higher Education, 35,
453–472.
Scouller, K. M., & Prosser, M. (1994). Students’ experiences in studying for multiple choice
question examinations. Studies in Higher Education, 19(3), 267–279.
Segers, M., & Dochy, F. (2001). New assessment forms in problem-based learning: The value-
added of the students’ perspective. Studies in Higher Education, 26(3), 327–343.
Segers, M. S. R., Dierick, S., & Dochy, F. (2001). Quality standards for new modes of
assessment. An exploratory study of the consequential validity of the OverAll Test.
European Journal of Psychology of Education, XVI, 569–588.
Segers, M., Dochy, F., & Cascallar, E. (2003). Optimizing new modes of assessment: In search
for qualities and standards. Boston: Kluwer Academic.
Shavelson, R. J. (1994). Guest Editor Preface. International Journal of Educational Research,
21, 235–237.
Shavelson, R. J., Webb, N. M. & Rowley, G. L. (1989). Generalizability Theory. American
Psychologist, 44(6), 922–932.
Shepard, L. (1991). Interview on assessment issues with Lorrie Shepard. Educational
Researcher, 20(2), 21–23.
Sluijsmans, D. M. A. (2002). Student involvement in assessment: The training of peer assess-
ment skills. Unpublished doctoral dissertation, Open University, Heerlen, The
Netherlands.
Starren, H. (1998). De toets als hefboom voor meer en beter leren [The test as a lever for more and better learning]. Academia, February 1998.
Struyf, E., Vandenberghe, R., & Lens, W. (2001). The evaluation practice of teachers as a
learning opportunity for students. Studies in Educational Evaluation, 27(3), 215–238.
Struyven, K., Dochy, F., & Janssens, S. (2005). Students’ perceptions about evaluation and
assessment in higher education: a review. Assessment and Evaluation in Higher Education,
30(4), 331–347.
Struyven, K., Gielen, S., & Dochy, F. (2003). Students’ perceptions on new modes of assess-
ment and their influence on student learning: the portfolio case. European Journal of
School Psychology, 1(2), 199–226.
Suen, H. K., Logan, C. R., Neisworth, J. T., & Bagnato, S. (1995). Parent-professional congruence: Is it necessary? Journal of Early Intervention, 19(3), 243–252.
Tan, C. M. (1992). An evaluation of continuous assessment in the teaching of physiology.
Higher Education, 23(3), 255–272.
Thomas, P., & Bain, J. (1984). Contextual dependence of learning approaches: The effects of
assessments. Human Learning, 3, 227–240.
Thomson, K., & Falchikov, N. (1998). Full on until the sun comes out: The effects of
assessment on student approaches to studying. Assessment & Evaluation in Higher Educa-
tion, 23(4), 379–390.
Topping, K. (1998). Peer-assessment between students in colleges and universities. Review of
Educational Research, 68(3), 249–276.
Trigwell, K., & Prosser, M. (1991a). Improving the quality of student learning: The influence
of learning context and student approaches to learning on learning outcomes. Higher
Education, 22(3), 251–266.
Trigwell, K., & Prosser, M. (1991b). Relating approaches to study and quality of learning
outcomes at the course level. British Journal of Educational Psychology, 61(3), 265–275.
Vermunt, J. D. H. M. (1992). Qualitative-analysis of the interplay between internal and
external regulation of learning in two different learning environments. International
Journal of Psychology, 27, 574.
Chapter 7
Plagiarism as a Threat to Learning:
An Educational Response

Jude Carroll

Introduction

Plagiarism is widely discussed in higher education. Concern about the rising level and severity of cases of student plagiarism continues to grow. Worries
about student plagiarism are heard in many countries around the world.
This chapter will not rehearse the full range of issues linked to student
plagiarism. Guidance on how it might be defined, on how students can be taught the necessary skills, and on how cases can be handled when they occur is easy to find on the web (see, for example, JISC-iPAS, the UK government-sponsored Internet Plagiarism Advisory Service). Guidance is equally common in printed format (Carroll, 2007). Instead, the chapter explores the connections
between learning and plagiarism and explains why this link should be central
to discussions about the issue. The chapter also discusses how the link with
learning differentiates the treatment of plagiarism within higher education
from the way in which the issue is handled outside the academy. It argues that,
in the former, discussions of plagiarism should centre on whether or not the
submitted work warrants academic credit and not, as happens outside of
higher education, on the integrity of the plagiarist. Actions designed to clarify
what is meant by learning and to encourage students to do their own work are
also likely to discourage plagiarism, but this becomes a secondary and valuable offshoot of a pedagogic process rather than the goal of a 'catch and punish' approach to dealing with the issue. The chapter begins by reviewing why it is difficult to
keep the focus on learning when so many pressures would distract from this
goal. It then considers what theories of learning are especially useful in under-
standing why higher education teachers and administrators should be con-
cerned about students who plagiarise and concludes with suggestions about
how to encourage students to do their own work rather than to copy others’
efforts or commission others to do the work for them.

J. Carroll
The Oxford Centre for Staff and Learning Development, Oxford Brookes University,
Oxford, UK
e-mail: jrcarroll@brookes.ac.uk

Media Coverage of Plagiarism

Even a cursory review of media coverage post-2000 would show that plagiar-
ism, both within and without higher education, has a prominent position in
discussions of contemporary behavior and misbehavior. Macfarlane (2007)
refers to current interest as ‘‘almost prurient curiosity’’ and as a modern ‘‘obsession’’, then, like many other commentators, lists recent examples ranging from the UK Prime Minister’s case for the Iraq war in 2002, which included unauthorized copying, to Dan Brown’s 2006 high-profile court case over The Da Vinci Code. Such stories are common and the fallout significant for high-profile cases.
Journalists have been sacked, Vice Chancellors prompted to resign, novelists
have had to withdraw books and musicians have gone to court to protect their
work. Students are commonly referred to these cases when they are introduced
to academic writing requirements and urged not to follow these examples and to
attend to the consequences. For example, the 2007 WriteNow site in the UK
(http://www.writenow.ac.uk/student_authorship.html) primarily encourages
students to be authors rather than copiers and explains why this is important
but it also warns them about plagiarism and showcases these instances.
Resources from the site designed to encourage authorship include the statement, ‘‘Some people in universities hate students who plagiarize!’’, followed by the suggestion that teachers warn their students:
There are quite a lot of people in universities who are fed up with reading student essays
that have been pasted in from web sites . . . [S]tudents have to be extremely careful not
to get themselves in a position where they are suddenly accused of cheating and risk
having disciplinary action taken against them.

Connections between cheating and plagiarism such as these are common, but
often they are not warranted.1 Uncoupling plagiarism from integrity is not an argument for overlooking rule breaking. When students plagiarise, they
have not met some important requirements and therefore should not be
awarded academic credit for the learning they have bypassed through plagiar-
ism. Moreover, in some instances, additional penalties are warranted because
the plagiarism also included serious cheating, a point which will be discussed in
more detail below.
By keeping the central focus on learning (rather than student cheating), it is
possible to develop ways of deterring students from adopting unacceptable methods of generating coursework other than simply warning them about the dangers. When cases of student plagiarism do occur, learning-centred procedures for dealing with them make it more likely that decisions and penalties are fair and proportionate. However, only some ways of looking at learning are useful in
explaining to students why their plagiarism is important and only some ways
in which students themselves conceive of their responsibilities as learners help

1 The authorship presentation also mentions ‘‘carelessness’’ and ‘‘hurry’’ and the more neutral ‘‘pressure’’ to explain why plagiarism might occur.
them make sense of academic regulations about citation and attribution. To that end, this chapter begins by exploring learning theories labelled as
constructivist.

Constructing Meaning, Constructing Understanding

Constructivist learning theories are often described as growing out of work
done by the Swiss thinker, Piaget (1928), who described how learners create
schema which he said were formed when students link together perceptions they
already hold with ideas they encounter and/or experiences and actions they
have taken. Piaget saw learners as not simply acquiring new information but
rather as actively combining and linking ideas, thereby making their own
meaning(s). Atherton (2005), in his catalogue of learning theories, notes that
Piaget’s overall insights spurred others to add their own. These include Dewey
(1938) who is usually credited with encouraging the use of real life problem-
solving in education and with valuing students’ critical thinking. Bruner (1960)
focused on ways students might categorise knowledge and build hierarchies of
meaning. Vygotsky (1962) added a further dimension arising from observing
that children learned more effectively when working in collaboration with an
adult; however, ‘‘it was by no means always the case that the adult was teaching
the children how to perform the task, but that the process of engagement with the
adult enabled [the children] to refine their thinking or their performance to make
it more effective’’ (Atherton, 2005). Vygotsky’s variant, social constructivism,
stressed the importance of social interaction in building shared understandings
(as opposed to purely personal ones). These are now a central plank of many
activities designed to enhance students’ understanding of assessment require-
ments, including those linked to plagiarism (Rust, O’Donovan, & Price, 2005).
All these theories and theoreticians share a belief that learning happens as
each student links new ideas to concepts and information which they have
already encountered. This active process is akin to building objects in the real
world and thus the use of the construction metaphor in the label. Students
show they have learned (as opposed to just knowing about things) as they
transform information, use it in new ways or apply it to specific situations.
Constructivists would argue that students do not show very much understanding
by finding and reproducing others’ ideas or by quoting their words (although
admittedly, selection, ordering and accurate reproduction is a form of use).

Different Ways to Show Learning


In many educational settings, both teachers and students would be unfamiliar
with the learning approach outlined in the last paragraph. In many tertiary
classrooms around the world, teachers and learners are more likely to be
occupied in learning tasks that involve matching well-selected questions to authoritative answers. In this sort of learning environment, a teacher’s job is
to identify questions worth pursuing, then to select and vet resources (often
in the form of text-books) in order to help students to answer the questions
which teachers deem worthy of students’ attention. For some questions, tea-
chers should tell students the most authoritative and reliable answers. In
parallel with this style of teaching, the student must know what questions will
be asked (usually in an examination) and what answers will be acceptable
to assessors. Students must learn to retrieve and use acceptable answers quickly
and accurately (often from memory) and be able to differentiate between good
and not-so-good answers for particular settings and circumstances.
This kind of learning is often described pejoratively (especially by
constructivists) as mug and jug learning where the student is the empty
mug and the teacher is the jug of expertise and information. The teacher
is described as didactic, transmitting knowledge, and the student is often
described as passive, absorbing knowledge then repeating it in examina-
tions or descriptive papers rather than altering or evaluating it. Indeed,
students in this kind of learning system report that they are penalised
for changing or revising received information (Handa & Power, 2005;
Robinson & Kuin, 1999). However, those who teach in such systems insist
that simple reproduction is not what is required. Students are expected to
understand what they present and teachers assume understanding is
demonstrated by skilful reproduction. They cite examples where repetitive
learning strategies lead to reliable recall and deep understanding (Au &
Entwistle, 1999). Students say that they cannot retain information without
understanding, that they cannot use it unless they have a sense of its meaning, and that the very act of selecting information to answer questions signals agreement and judgment.
The purpose of this chapter is not to resolve which of the two approaches,
the constructivist or the objectivist/reproductive, has more pedagogic merit.
Nor is it possible to characterise any one educational system as requiring
only constructivist or only reproductive learning. In any UK A-level class-
room, you will find teachers priming students to answer examination ques-
tions that test their knowledge and also guiding them on writing coursework
which is designed to allow students to demonstrate their own research and
use of information. Hayes and Introna (2005) describe seeing Engineering
lecturers in an Indian university dictating answers to examination questions
(and students copying their words and the advice on correct punctuation for
examination answers) then watching the same students engaging in problem-
based learning in another classroom. Didactic, reproductive learning is said
to be the norm in what are often referred to as non-Western universities
where teachers are likely to ask students ‘‘Show me you know about x’’-type
questions. What is less often discussed is how frequently these types of
questions also appear in assessment tasks in the UK, Australia, Canada,
or the USA. In all these places, you will hear teachers bemoaning their
students’ lack of initiative in reading beyond the set text and their preference for spoonfeeding. Nevertheless, universities in Anglophone countries, and what are often termed Western universities more generally, base their assessment
criteria and judgements about whether or not the student has learned
on assumptions that are more akin to constructivist than to reproductive
thinking. This is especially true of coursework. Teachers’ underpinning
constructivist assumptions explain why they are so exercised about student
plagiarism.

Plagiarism and the Student’s Own Work

In a plagiarised piece of work (or, what is more usual, plagiarised elements within a piece of student coursework), the student did not do the work of
making meaning and transforming ideas; therefore, he or she has offered no
evidence of learning and cannot be awarded academic credit. Bypassing learn-
ing through plagiarism means the student bypasses the opportunity for their
own development as well. A student who copies or pays someone to produce
their coursework remains the same student at the end of the assessment,
whereas one who did the work does not, even if the change is too small to be
easily described.

Distractions in Maintaining the Link Between Plagiarism and Learning

Learning does not usually get a prominent position in most commentaries about students’ plagiarism. Many institutions have Academic Integrity policies
rather than ones with the word plagiarism in the title, presumably to underscore
the place of academic values and beliefs. Certainly it makes logical sense to tell
students what they must do and why it is important for them to do their own
work before telling them what they must not do (plagiarise) and before laying
out the potentially grim consequences if they breach regulations. However, a
less welcome effect is to recast student plagiarism as a moral rather than a
pedagogic issue. Serious cheating involving plagiarism does happen. However,
most plagiarism is not cheating and the number of cases involving deliberate
attempts to gain unfair advantage, to deceive assessors or to fraudulently obtain
academic awards is a small or even very small percentage of overall plagiarism cases. It is sometimes difficult to believe this to be the case since newspapers stress the concerns and often fuel the perceived moral panic which surrounds the issue. Statistics, too, look worrying: ‘‘80% of students cheat’’ (Maslen, 2003), though such headlines downplay details as to frequency, what percentage of cheating occurs through plagiarism, and any impact on students’ learning.
Other studies have been more specific to plagiarism and they too look
worrying. McCabe (2006) found that four in ten students regularly cut-and-
paste without citation but did not investigate the extent of the practice in
individual pieces of work. Narrowing the focus to deliberate cheating involving
plagiarism, even lower rates emerge. Two per cent of students in one Canadian
study admitted to commissioning others to write their coursework – but we do
not know how often this happened. Actual experience is lower still as Lambert,
Ellen, and Taylor (2006) found when investigating cases in New Zealand. They
estimated that five per cent of students had a case dealt with informally by their
teachers in the previous academic year and fewer than one per cent were
managed formally, which they presumed constituted the more serious exam-
ples. In 2007, the official figures for cases in the UK, even for very large
universities, were fewer than 100 for upwards of 20,000 students.
In my own institution, we found deliberate cheating involving plagiarism in
0.015% of coursework submitted in one academic year, 2005/2006. Students in
these cases had paid ghost writers, submitted work which had been done by
someone at another university, paid professional programmers to create their
code, or had copied-and-pasted substantial amounts from others’ work into
their own without attribution. Serious cases revealed students cheating by
manipulating their references; some had altered texts using the ‘‘find and
replace’’ button in ways that were intended to create a false assumption in the
assessor as to whose work was being judged. They had stolen others’ work or
coerced others into sharing it. One student had gone to the library, borrowed
someone’s thesis then ripped out the text, returned the cover, scanned the
contents and submitted the work under his own name. (Note: we caught this
particular student because he used his real name when taking out the original!)
These are all unacceptable acts. Some are incompatible with a qualification
and where this was proven, the students did not graduate. All of them are
incompatible with awarding academic credit. However, the number of stu-
dents involved in this type of deliberate cheating is very small – one case in
1500 submissions. Even if, in all these examples, the percentage of cheating
cases were increased in some substantial way to adjust for under-detection
(which undoubtedly occurred), this level of deliberate cheating does not
threaten the overall value of a degree from our university nor does it support
the view that our graduates’ skills and knowledge cannot be trusted. It is not
cause for panic.
It is not just the relatively small number of cases that should reassure those
who might panic and dissuade those who focus too strongly on cheating. Most
of the cases in the previous examples will be one-off instances. Only a small
number of students use cheating as the regular or even exclusive means of
gaining academic credit rather than, as most studies show, as a one-off mishandling of assessment requirements. Szabo and Underwood (2004) found six per
cent of students who admitted to deliberate plagiarism said they did so regu-
larly. The two per cent of Canadian students purchasing essays probably did so
only once although sites where students can commission others to do their
coursework usually include a section for users’ comments and in these, satisfied
customers sometimes explicitly state they will use the service again. A few sites
actively promote regular use as, for example, one which offers to produce PhD
dissertations chapter by chapter to allow discussion with supervisors. In every
institution, there are undoubtedly instances where all (or almost all) of a
student’s credit has been acquired by fraudulent means. However, in my own
university, with a student cohort of 18,000, we deal with only a handful of cases
per year which involve repeated acts by the same student.

Distractors not Linked to Worries about Cheating


Although an over-emphasis on cheating and the resulting moral panic is probably
the most significant distractor from dealing with plagiarism as a teaching and
learning issue, another distraction is an over-emphasis on citation practices.
Teachers can give the (usually mistaken) impression that adhering to referencing
guidelines is more important than writing the text which citations are designed
to support. I remember one mature student, looking back at his undergraduate
career, who said, ‘‘I was tempted to forget the essay and just hand in the reference
list since that was all teachers commented about.’’ As the WriteNow quote
above reminds students, carelessness leads to risk and disciplinary action.
Students’ worries about the risks of imperfect citation are not all misplaced.
Students have been accused of plagiarism when using informal attributions
along the lines of ‘‘As Brown says in his book’’ rather than a fully fledged
Harvard citation. They worry that less-than-perfect citations might trigger
disciplinary action. They are concerned about whether their attempts to re-write complex academic prose into their own words, sometimes also in an unfamiliar language and nearly always in an unfamiliar disciplinary context, will be sufficiently different to escape accusations of plagiarism. Yet Howard (2000) and
Pecorari (2003), both of whom have studied students’ often clumsy attempts to
paraphrase, refer to the result as patch writing and describe it as an often
necessary step along the road to fully competent academic writing. Patch
writing is plagiarism but, more importantly, it is a signal of the need for more teaching and more learning.
Over-policing citation regulations is a perhaps understandable act by
teachers who feel they must do something about plagiarism and are not sure
what that something could be. They are correct that a student who sticks too
closely to the original text or who is less than assiduous in their use of quotation
marks, even if the student’s text makes it clear that he or she is using others’
ideas, has plagiarised and, unless action is taken, will probably continue to do
so. There may be utility in awarding a small penalty to ensure the student
attends to the matter as a way of encouraging the student in future to adhere
more closely to recognised conventions. However, these are issues of learning
and apprenticeship, not of lack of integrity.
Plagiarism, as Opposed to Cheating, Is Common

Study after study confirms Park’s statement that ‘‘plagiarism is doubtless common and getting more so’’ (Park, 2003, p. 471). Many studies support
the rule-of-thumb finding that at least ten per cent of students submit work
that is ‘not their own’ because they have cut-and-pasted others’ words, or
paraphrased badly, or leaned too heavily on the work of fellow students. This
rule-of-thumb ten per cent rises steeply in some disciplines and in some kinds
of assessment such as the traditional academic essay or the lab report. In the
examples mentioned in previous paragraphs, there were ten recorded cases of
negligent practice or academic malpractice involving students’ submitting
others’ work for each recorded case of deliberate misconduct. This larger
number of plagiarism cases reflects students’ relatively common difficulties
with authorship and their varied interpretations of the requirement to ‘‘do
your own work’’.
If plagiarism is defined as ‘‘submitting someone else’s work as your own’’
then the converse, namely ‘‘submitting your own work’’, must be valuable.
Students need to see that, in submitting their own work, they demonstrate to
the teacher that they have learned. These two interrelated ideas – submitting
your own work and showing your own learning – would seem to be clear-cut aims, expressed in plain English. Indeed, most institutional approaches to
deterring and dealing with plagiarism start with definitions that are in straight-
forward language, offering examples and instances where actions would be
acceptable and unacceptable. However, quandaries remain which can only be
resolved through discussion and example. Here are some examples of potential
misunderstanding:
• If students’ work is deemed plagiarised only when it is submitted and if students are expected to generate that work over time, then when must a student stop sharing and co-operating with others and, instead, start working independently to ensure he or she is doing their own work?
• Where assessment criteria include marks for grammar and spelling, can students use a proof reader? Why is it acceptable to ask a librarian how to conduct a search when paying someone to create a list of sources is not?
• How much transformation is needed before the student can claim the result as ‘‘my own’’ even if the final draft acknowledges the original source? Can a student leave ten words unchanged? Eight? Five?
• Which is more valuable: a perfect assessment artefact or one ‘‘made by me’’? Does the teacher want the best that can be made or the best that I can make?
• If common knowledge need not be cited, how do I know what falls under that category? If everyone knows it, which everyone are we talking about?
In quandaries such as these (and many others), social constructivist learning
theory is helpful in reminding staff and students that shared understanding grows
through interaction, practice, and above all, through feedback. Declarative knowledge such as that provided by a student handbook can supply a definition, and an hour’s plagiarism lecture can explain the rules, but neither is much help with quandaries such as the above. Resolving them will need time, interaction and teachers’ tolerance of their students’ less-than-skilful attempts.
Eventually, many students do master referencing and citation rules,
thereby avoiding plagiarism. However, they may still regard doing the refer-
encing as a strange or arbitrary academic obsession. What is the point, many
wonder (and especially those who are non-native writers), of paraphrasing a
piece of text in flawless English to create an awkwardly expressed, personal
version? Students who know that they must change the original to avoid
plagiarism may do so via the ‘‘find and replace’’ button in Word in order to
create a ‘‘paraphrase’’ (sic). They may generate text by collecting and assem-
bling others’ words then smoothing the joins to make it seem as if the final
result is an authored (as opposed to scrap-booked) piece of work. Students
who do these things (and many more) do not see their actions as breaching
academic regulations because they have a particular view of themselves as
learners and because their ideas about learning are often very different from
those held by their teachers.

Students’ Ideas About Learning


Many studies have attempted to get inside students’ perceptions and beliefs
about learning and to understand the stages of their cognitive development.
One explanation is especially useful in linking ideas about plagiarism to stu-
dents’ understanding of learning. Perry (1970) derived insights from his work
with American undergraduates over several decades to create an account of
students’ cognitive development in which they start out as dualist thinkers. Students at this stage of cognitive development regard answers as either right or wrong, true or false; they see learning as acquiring the answers which their teachers or textbooks provide. For these dualist students, discussions of academic integrity
policies and tutorials on academic writing are likely to make no sense. Citations
and gathering others’ ideas appear unnecessary as they generally see judgments
and evaluations as self-evident. Dualist students can often see paraphrasing as
inexplicable and regard an assessment task where the answer may not exist and
definitely cannot be found, but instead must be created by the student, as more
akin to tricks the teacher is playing than to elements of learning. Since such
students must nevertheless be taught to adhere to academic regulations, Angelo
(2006) suggests adopting a pragmatic approach. He suggests focussing on
behaviours and acceptable practices, rather like teaching new drivers the rules
of the Highway Code, instructing them on the best way to do an assignment and
reminding them of the consequences should they not comply with rules against
plagiarism.
Students do usually move from the dualist position to the next level of
cognitive development which Perry terms the ‘multiplistic’ stage. Studies on
how rapidly students move to multiplistic thinking show wide variation. One
study by Baxter Magolda (2004) which followed the cognitive development
of a group of US undergraduate students over 16 years found some required
several years to move to the next stage, a few never did so, and most
managed to shift their thinking within their undergraduate study. At the
multiplistic stage, students recognise multiple perspectives on an issue, situa-
tion or problem; however, they do not necessarily develop strategies for
evaluating them. Students at this stage say, ‘‘Well, that’s your opinion and
we are all entitled to our own views’’. A multiplistic thinker is likely to
regard an essay that requires him or her to structure an argument as an
invitation to document a disagreement. A student operating at this level is
likely to find choosing between sources difficult and to view the concept
of authority as clashing with their ‘‘it is all relative’’ thinking. Citation
becomes a stumbling block to the free-flowing expression of the student’s
own ideas and opinions. Again, as for dualist students, teaching about
academic writing and avoiding plagiarism needs to focus on rules and
acceptable behaviours whilst ensuring students know there are value-driven
reasons for academic integrity policies.
Only when students reach the stage Perry terms relativistic are requirements
around plagiarism likely to be genuinely understood and acted upon.
(Note: the previously mentioned longitudinal study found that not all stu-
dents reach this point, even after several years of university study.) Relati-
vistic students, as the term implies, see knowledge as linked to particular
frames of reference and see their relationship to knowledge as detached and
evaluative. Once students see learning in this way, they are able to review
their own thinking and see alternative ways of making decisions. When they
eventually reach a conclusion (and they see this as difficult), relativistic
students are willing to accept that the position is personal, chosen and/or
even defensible. Perry describes students at this level as recognising that those
in authority (i.e. experts, authors of research publications and, indeed, their
own teachers) generally arrive at their positions by similar routes to the
student’s own and therefore are also open to review and questioning. For
relativistic students, using others’ authority to support an argument, citing
information gathered through research and reading, and paraphrasing as a
way of showing that points have been understood and personalised seem
almost second nature (albeit still tricky to do well as I myself find after many
decades of practice). Work produced by relativistic students will be their own
even if the resulting document resembles many which have been written
before.
The challenge comes when it is the teacher’s task to ensure that all students
are able to comply with academic regulations that specify students must do their
own work and avoid plagiarism. These rules apply to all students, including
those with a dualistic view of knowledge, those who have studied previously in
educational environments that valued the ability to memorise then reproduce accurately, and those who are used to accessing information rather than trans-
forming it and becoming personally involved with its use. ‘‘Do your own work’’
may mean very different things to students who see the task as finding rather
than making the answer: it may seem pointless to a student who believes
they have found someone else’s answer, already perfectly formed, and it
makes little sense to a student who seems to believe that the teacher who asks
for a 3000-word essay is more interested in that number of words than in the evidence they contain, and more interested in the referencing practices than in
assessing the document for evidence that the student has met the learning
outcomes in their coursework.

A Reality Check and a Recommendation

It is not possible to move students towards understanding and accepting constructivist, multiplistic perspectives at a rate faster than they are willing or
able to go. Some may move very slowly indeed. Most students do not arrive in
higher education with a fully developed set of academic skills which will equip
them to tackle complex academic writing tasks. Perhaps they never did, but certainly now, in the 21st century, with widening participation and increasing numbers of students travelling around the world to study, many, if not most,
students will not arrive at university able to locate sources, evaluate them,
read texts for ‘‘useful bits’’, construct an argument, support their views with
others’ authority, use in-text citation and find their way around a specific
referencing system. Above all, teachers cannot assume that students’ motiva-
tions and values match their own. Students may not attend spon-
taneously to the things teachers see as important and many say that extrinsic
motivations dominate their decisions. Good grades, rather than the learning
stated in course outcomes, are what matters for future employment. Our
students may be stretched, stressed, distracted and lacking in confidence in
themselves as learners. They may be operating with a small vocabulary of
English words and finding it difficult or impossible to do themselves justice in
an unfamiliar language. They may be confused and alienated by new and
unfamiliar academic expectations. The list could go on.
The response, as far as plagiarism is concerned, is neither despair nor panic, nor a return to exams, nor raised entry requirements. Instead, by linking
plagiarism and learning, it is possible to adopt measures which shape and direct
student efforts towards the learning outcomes so they do their own work
because there is little alternative. The remainder of this chapter outlines what
these strategic prompts and requirements might be, focusing primarily on
assessment as this holds a key place in the overall management of student
plagiarism (Macdonald & Carroll, 2006).
Creating ‘Do Your Own Work’ Assignments

When students are presented with an assignment, they report asking themselves
a series of questions about the task such as
• Can I do this?
• Can I do it well (or well enough)?
• Has someone else already done it and if so, can I find the result?
• If I do it myself, what else in my life will suffer?
• Is it worth putting in the time and effort to do it or should I spend my time on other assignments?
• If I copy or find someone else’s answer, will I be caught?
• If I’m caught, what will happen?
and so on.
By asking themselves a similar series of questions when setting assessment
tasks, teachers can create courses and assessment tasks that encourage learning
and discourage plagiarism. In practice, reviewing one’s own assessments seems
more difficult than doing this with and for one’s colleagues. Often the person who
sets the assignment sees it as a valid and plausible task whereas someone else can
identify opportunities for copying and collusion. Equally, interrogating assess-
ment tasks becomes more likely if it is part of the normal programme/course
design process rather than the responsibility of individual course designers or
teachers. The following section suggests ways to interrogate course designs and
assessment tasks, either individually or as a programme team.

Questions that Encourage Learning and Discourage Plagiarism

1. Are you sure the students will have been taught the skills they will need
to display in the assessment?
Answering this question positively requires you to be able to list the skills
students will need, especially those usually classed as study skills or linked to
academic writing. Necessary skills extend beyond the use of a specific referen-
cing system to include those for locating and using information, for structuring
an argument, and for using others’ authority to underpin the student’s own
ideas and views.
As well as generic skills, students will also need explicit teaching of discipline-
specific ways to handle an assessment. Some discipline-specific skills will be
linked to the previous generic list which is largely concerned with writing.
Writing an essay as a historian, a biologist or a paediatric nurse will have
specific requirements which must be learned. Whilst some students are espe-
cially good at spotting cues and picking up these matters implicitly, most only
become skilful through explicit instruction, practice and acting on feedback.
For all students, learning to complete assessments in the discipline is part of
learning the discipline itself.
Students who lack the skills to complete assignments often feel they have no
alternative but to plagiarise.
2. Have you designed a task that encourages students to invest time and effort?
Have you designed ways to dissuade them from postponing assessment tasks to
‘the last minute’?
One of the conditions named by Gibbs and Simpson (2004–2005) when they
looked at ways in which assessment supports learning was time on task. Of
course, time on its own will not always lead to learning valued by the teacher.
Students could be distracted by unfocused reading, or side-tracked into creating
an over-elaborate presentation. They could be staring at notes or spending
hours transcribing lectures. However, unless students put in the time necessary,
they cannot develop the understanding and personal development required for
tertiary learning. Course design solutions which counter student procrastina-
tion and focus time on task include early peer review of drafts where students
read each other’s work in progress then comment in writing, with the require-
ment that the student includes the peer feedback in their final submission by
either taking the comments into account or explaining why they did not do so.
Other ways to capture students’ time include asking them to log their efforts in
on-line threaded discussion or to chunk tasks and monitor their completion.
Ensuring activity is occurring is not the same as assessing it. Formatively assessing all of this activity would be time-consuming and unrealistic, given teachers’ workloads. How-
ever, verifying that work has started by asking to see it and then signing and
dating the student’s hard copy for later submission alongside the finished article
can be relatively undemanding on teacher time and hugely encouraging to
students to ‘get going’.
Students who delay work until the last minute often see little alternative to
plagiarism.
3. Are there opportunities for others to discuss and interact with the student’s
assessment artefact during its production?
This is unlikely to happen unless you specifically design into the course ways in which students can share and comment on each other’s work. You
might design in peer review and/or feedback as mentioned in the last section;
alternatively, you could set aside some face-to-face time to observe and support
the assessment work as part of scheduled class meetings. Evidence of activity
from students’ peers, workplace mentors, or tutors can enrich the student’s own
understanding as well as authenticate their efforts.
4. Turning to the task itself, can the student find the answer somewhere? Does the
answer already exist in some form?
Verbs are often crucial here. Asking students to rank, plan, alter, invent or be
ready to debate signals they must do the work whereas asking the student to
show his or her knowledge or to demonstrate understanding of a theory or
idea (‘‘What is the role of smoking cessation in the overall goal of improving
public health?’’) usually needs a few on-line searches, some cut-and-paste and
a bit of text-smoothing. Students tell me they can do this ‘‘work’’ in a short
time – 30 minutes was the guess from a UK student who reported being given
exactly the smoking-cessation title quoted above. Those with reasonable writing skills
engaging in cut-paste-smooth ‘‘authorship’’ (sic) will rarely if ever be identified,
leaving the non-native English speakers and poor writers to trigger the asses-
sor’s suspicions.
In general, all tasks that ask the student to discuss, describe or explain
anything are more appropriately judged through examination, if at all, rather
than through coursework.
5. Can the student copy someone else’s answer?
In general, if the students sense that the answer already exists, either because
the problem is well-known or the issue has been well-covered, then they see
finding the answer as being a sensible way to allocate their time rather than as
being plagiarism. Students report no difficulty finding answers if they suspect
(or know) that the questions have not changed since the last time the course was
run, even if the work was not returned to students. They describe easy ways to
access others’ coursework or locate informal collections of past student work
(and some collections are now located outside of particular universities). Copy-
ing between students is common if the assessment task/problem has one answer
or a small number of ways in which the problem can be solved.
6. Is the brief clear about what will be assessed and which aspects of the work must
be done by the student?
Students often sub-contract parts of the work if they consider it to be less
important or not relevant to the assessment decision. This means that the brief
may need to specify whether or not proofreading is acceptable; who may help
with locating and selecting sources; how much help from fellow students is
encouraged; and conversely, where such help should stop. If these are not
stated, a student must guess or make an effort to ask. Students say they shy
away from taking such matters to teachers, assuming they should use tutorial
time for ‘‘interesting’’ questions instead.
Assessment criteria often state which aspects will be judged for a grade and
students can usefully have these drawn to their attention as a way of clarifying
which work will be assessed.
7. Does the question, in the way that it is posed, push students towards asking
themselves, ‘‘How and where do I find that?’’ or ‘‘How do I make that?’’
Some of the factors that prompt one answer or the other have already been
mentioned, such as novelty and clarity of the assessment brief. Here, the issue
turns more to the nature of the task itself.
Assessments that encourage making, and therefore learning, include those
where students must include any of the following:
• their own experience or data;
• recent and specific information such as legislation, current events or recently published material;
• application of generic theories in specific, local settings or with particular individuals such as patients the student has cared for or experiments/projects the student has carried out;
• individual or individualised data; or
• reference to specific texts, notes or class activity.
Where assessment briefs are general and dated (‘‘Analyse the Eurovision song
contest using anthropological theory’’), students can see the assignment as a
‘find it’ opportunity, involving cut-paste-smooth authorship. (As an aside, when
writing this chapter, I made up this example question about Eurovision, basing
it loosely on an examination question which I came across several years ago.
Then, out of curiosity, I did a quick Google search and, based on less than a
minute’s inspection, I found at least ten sources that would be suitable for ‘cut-
paste-smooth’ authorship [sic].) By making the task more specific and recent
(‘‘Analyse the 2008 Eurovision song contest . . .’’), you lessen the opportunities
for finding ready-made answers as there has been less time to collect a corpus of
data and little chance that it has already been evaluated.
However, by re-shaping the question even further, you signal that it is an
invitation to construct an answer. Consider a question such as ‘‘Analyse x number of voting decisions by countries participating in the 2008 Eurovision song contest using your knowledge of social identity . . .’’. Faced with this task, the
student must seek the voting data then apply social identity theory. Also,
by noting which students choose which countries, you can see whether
inter-student copying seems common. Alternatively, to lessen the chances
that students copy from each other, you might include a personal experience
or individual dimension to the assessment task, for example, ‘‘Draw at least
three parallels between decisions made by xx nations in the 2008 Eurovision
song contest and decisions you yourself have made between yy and zz date
which, in both cases, could be construed as expressions of social identity . . .’’.

Will This Prevent Plagiarism?

Pedagogic approaches to dealing with student plagiarism focus on students’ learning rather than on making assumptions about their character or on worry-
ing too much about catching those who cheat. Approaches which prioritise
learning engage students’ time and authenticate their effort. Such approaches
aim to shift students’ understanding towards what is valued in their own
learning – that is, making original work and transforming transmitted knowl-
edge into higher cognitive levels of thinking where they create new understand-
ings and analyse and evaluate knowledge. By designing in teaching and
apprenticeship-type practice of academic skills and by designing out easy
chances to copy and find answers, teachers encourage learning. However,
these practices can never ensure that all students’ behaviour is in line with
academic regulations. A few students will deliberately plagiarise and will not
be dissuaded from doing so by learning-focused interventions. They may be
influenced by fear of being caught and by the consequences of punishment
(Norton, Newstead, Franklyn-Stokes, & Tilley, 2001). Some may be stuck in
dualist views of learning, unable to see their responsibilities as being other than
providing answers to teachers’ questions. However, for most students most of
the time, using strategies that support and encourage learning and which
discourage or remove opportunities for copying will tip them into doing their
own work, whether they want to or not. Rethinking assessment and course
design can only be effective if it operates in conjunction with other actions
designed to deal with student plagiarism such as good induction, well-resourced
skills teaching, written guidance, and procedures that are used and trusted by
teaching staff. Thus, the institution as a whole needs an integrated series of
actions to ensure its students are capable of meeting the requirement that they
do their own work because it is only in this way that they do their own learning.

References
Angelo, T. (2006, June). Managing change in institutional plagiarism practice and policy. Key-
note address presented at the JISC International Plagiarism Conference, Newcastle, UK.
Atherton, J. (2005). Learning and teaching: Piaget’s developmental theory. Retrieved August
11, 2007, from http://www.learningandteaching.info/learning/piaget.htm
Au, C., & Entwistle, N. (1999, August). ‘Memorisation with understanding’ in approaches
to studying: cultural variant or response to assessment demands? Paper presented at
the European Association for Research on Learning and Instruction Conference,
Gothenburg, Sweden.
Baxter Magolda, M. (2004). Evolution of a constructivist conceptualization of epistemological reflection. Educational Psychologist, 39(1), 31–42.
Bruner, J. (1960). The process of education. Cambridge, MA: Harvard University Press.
Carroll, J. (2007). A handbook for deterring plagiarism in higher education. Oxford: Oxford
Brookes University.
Dewey, J. (1938). Experience and education. New York: Macmillan.
Gibbs, G., & Simpson, C. (2004–2005). Conditions under which assessment supports
students’ learning. Learning and Teaching in Higher Education, 1, 3–31.
Handa, N., & Power, C. (2005). Land and discover! A case study investigating the cultural context of plagiarism. Journal of University Teaching and Learning Practice, 2(3b), 64–84.
Hayes, N., & Introna, L. (2005). Cultural values, plagiarism, and fairness: When plagiarism gets in the way of learning. Ethics & Behavior, 15(3), 213–231.
Howard, R. M. (2000). Sexuality, textuality: The cultural work of plagiarism. College English,
62(4), 473–491.
JISC-iPAS Internet Plagiarism Advisory Service. Retrieved October 11, 2007, from www.jiscpas.ac.uk/index.
Lambert, K., Ellen, N., & Taylor, L. (2006). Chalkface challenges: A study of academic
dishonesty amongst students in New Zealand tertiary institutions. Assessment and Evalua-
tion in Higher Education, 31(5), 485–503.
McCabe, D. (2006, June). Ethics in teaching, learning and assessment. Keynote address at the
JISC International Plagiarism Conference, Newcastle, UK.
Macdonald, R., & Carroll, J. (2006). Plagiarism: A complex issue requiring a holistic
approach. Assessment and Evaluation in Higher Education, 31(2), 233–245.
Macfarlane, R. (2007, March 16). There’s nothing original in a case of purloined letters. Times
Higher Education Supplement, p. 17.
Maslen, G. (2003, January 23). 80% admit to cheating. Times Higher Education Supplement.
Norton, L. S., Newstead, S. E., Franklyn-Stokes, A., & Tilley, A. (2001). The pressures of assessment in undergraduate courses and their effect on student behaviours. Assessment
and Evaluation in Higher Education, 26(3), 269–284.
Park, C. (2003). In other (people’s) words: Plagiarism by university students – literature and
lessons. Assessment and Evaluation in Higher Education, 28(5), 471–488.
Pecorari, D. (2003). Good and original: Plagiarism and patch writing in academic second-
language writing. Journal of Second Language Writing, 12, 317–345.
Perry, W. (1970). Forms of intellectual and ethical development in the college years: A scheme.
New York: Holt.
Piaget, J. (1928). The child’s conception of the world. London: Routledge and Kegan Paul.
Robinson, V., & Kuin, L. (1999). The explanation of practice: Why Chinese students copy assignments. Qualitative Studies in Education, 12(2), 193–210.
Rust, C., O’Donovan, B., & Price, M. (2005). A social constructivist assessment process
model: How the research literature shows us this could be best practice. Assessment and Evaluation in Higher Education, 30(3), 231–240.
Szabo, A., & Underwood, J. (2004). Cybercheats: Is information and communication tech-
nology fuelling academic dishonesty? Active Learning in Higher Education, 5(2), 180–199.
Vygotsky, L. (1962). Thought and language. Cambridge, Mass: M.I.T. Press.
Write Now CETL. (2007). Retrieved September 8, 2007 from www.writenow.ac.uk/index.
html.
Chapter 8
Using Assessment Results to Inform Teaching
Practice and Promote Lasting Learning

Linda Suskie

Introduction

While some may view systematic strategies to assess student learning as merely
chores to satisfy quality assurance agencies and other external stakeholders, for
faculty who want to foster lasting learning assessment is an indispensable tool
that informs teaching practice and thereby promotes lasting learning. Suskie
(2004c) frames this relationship by characterizing assessment as part of a
continual four-step teaching-learning-assessment cycle.
The first step of this cycle is articulating expected learning outcomes. The
teaching-learning-process is like taking a trip – one cannot plot out the route
(curriculum and pedagogies) without knowing the destination (what students
are to learn). Identifying expected student learning outcomes is thus the first
step in the process. The more clearly the expected outcomes are articulated
(e.g., articulating that one plans to visit San Francisco rather than simply the
western United States), the easier it is to assess whether the outcome has been
achieved (i.e., whether the destination has been reached).
The second step of the teaching-learning-assessment cycle is providing suffi-
cient learning opportunities, through curricula and pedagogies, for students to
achieve expected outcomes. Students will not learn how to make an effective
oral presentation, for example, if they are not given sufficient opportunities to
learn about the characteristics of effective oral presentations, to practice deli-
vering oral presentations, and to receive constructive feedback on them.
The third step of the teaching-learning-assessment cycle is assessing how well
students have achieved expected learning outcomes. If expected learning out-
comes are clearly articulated and if students are given sufficient opportunity to
achieve those outcomes, often this step is not particularly difficult – students’
learning opportunities become assessment opportunities as well. Assignments
in which students prepare and deliver oral presentations, for example, are not
just opportunities for them to hone their oral presentation skills but also

L. Suskie
Middle States Commission on Higher Education, Philadelphia, PA, USA
e-mail: LSuskie@msche.org


opportunities for faculty to assess how effectively students have developed those skills.
The final step of the teaching-learning-assessment cycle is using results to
inform teaching-learning practice and thereby promote lasting learning. As
Dochy has noted in Chapter 6, assessments can be viewed as tools to enhance
the instructional process. Indeed, the best assessments have what Dochy,
along with Linn and Dunbar (1991) and Messick (1989, 1994), have called
consequential validity and what Moran and Malott (2004) have called pedagogical
validity – assessment results are used as the basis for appropriate action and,
specifically, to help ‘‘achieve the instructional objectives’’ (Moran & Malott,
2004, p. 137).
This chapter explores this conception of assessment as a means of promoting
lasting learning through the consideration of three topics. First, teaching prac-
tices that have been shown through research to promote deep, lasting learning
are reviewed. Next, some key underlying principles for using assessment results
to inform teaching practices are discussed. Finally, practical suggestions for
using assessment results to inform teaching practice and promote lasting learn-
ing are offered and explained through examples drawn from three very different
kinds of assessment tools: rubrics (rating scales or scoring guides), multiple
choice tests, and qualitative assessments such as reflective writing.

Teaching Practices that Promote Deep, Lasting Learning

Today’s faculty are, in many ways, living in a golden age of education: their
teaching practices can be informed by several decades of extensive research
and publications documenting teaching practices that promote deep, lasting
learning (e.g., Angelo, 1993; Association of American Colleges and Universities,
2002; Astin, 1993; Barr & Tagg, 1995; Chickering & Gamson, 1987, 1991; Ewell &
Jones, 1996; Huba & Freed, 2000; Kuh, 2001; Kuh, Schuh, Whitt, & Associates,
1991; Light, 2001; McKeachie, 2002; Mentkowski & Associates, 2000; Palmer,
1998; Pascarella, 2001; Pascarella & Terenzini, 2005; Romer & Education
Commission of the States, 1996). Suskie (2004b) has aggregated this work into
a list of 13 conditions under which students learn most effectively (p. 311):
1. Students understand course and program goals and the characteristics of
excellent work.
2. They are academically challenged and given high but attainable
expectations.
3. They spend more time actively involved in learning and less time listening to
lectures.
4. They engage in multidimensional real-world tasks in which they explore,
analyze, justify, evaluate, use other thinking skills, and arrive at multiple
solutions. Such tasks may include realistic class assignments, field experi-
ences, and service-learning opportunities.

5. The diversity of their learning styles is respected; they are given a variety of
ways to learn and to demonstrate what they’ve learned.
6. They have positive interactions with faculty and work collaboratively with
fellow students.
7. They spend significant time studying and practicing.
8. They receive prompt, concrete feedback on their work.
9. They have opportunities to revise their work.
10. They participate in co-curricular activities that build on what they are
learning in the classroom.
11. They reflect on what and how they have learned and see coherence in their
learning.
12. They have a synthesizing experience such as a capstone course, independent
study, or research project.
13. Assessments focus on the most important course and program goals and
are learning activities in their own right.
These principles elucidate two ways that assessment activities play a central
role in promoting deep, lasting learning. First, as Dochy has noted in Chapter 6,
providing a broad array of activities in which students learn and are assessed
(Principle 5) promotes deep, lasting learning (Biggs, 2001; Entwistle, 2001). This
rich array of evidence also provides a more complete and more meaningful
picture of student learning, making assessment evidence more usable and useful
in understanding and improving student learning. At its best, this broad array
of learning/assessment activities is designed to incorporate three other princi-
ples for promoting lasting learning:
Principle 5: Students learn more effectively when the diversity of their learning
styles is respected and they are given a variety of ways to learn and to demonstrate
what they’ve learned. Suskie (2000) has noted that, because every assessment is
inherently imperfect, any decisions related to student learning should be based
on multiple sources of evidence. Furthermore, because every assessment favors
some learning styles over others, students should have ‘‘a variety of ways to
demonstrate what they’ve learned’’ (p. 8).
Principle 11: Students learn more effectively when they reflect on what and how
they have learned and see coherence in their learning. In Chapter 4, Sadler has
argued extensively on the educational value of students’ practicing and devel-
oping skill in critical self-appraisal. Self-reflection fosters metacognition: learn-
ing how to learn by understanding how one learns (Suskie, 2004a).
Principle 12: Students learn more effectively when they have a synthesizing
experience such as a capstone course, independent study, or research project. Such
experiences give students opportunities to engage in synthesis – integrating what
they have learned over their academic careers into a new whole (Henscheid, 2000).
The second way that assessment activities play a central role in promoting
deep, lasting learning is through the time and energy students spend learning
what they will be graded on, as Dochy has discussed extensively in Chapter 6.
Indeed, Entwistle (2001) asserts, ‘‘It is not the teaching-learning environment

itself that determines approaches to studying, but rather what students believe
to be required’’ (p. 16). Biggs (2001) concurs, noting, ‘‘[Students] see the assess-
ment first, then learn accordingly, then at last see the outcomes we are trying to
impart’’ (p. 66). The assessments that are used to grade students can thus be a
powerful force influencing what and how students learn. This idea is captured in
the final principle: ‘‘Assessments focus on the most important course and
program goals and are learning activities in their own right.’’ Four of the
principles are keys to making this happen:
Principle 1: Students learn more effectively when they understand course and
program goals and the characteristics of excellent work. As Sadler has discussed
in Chapter 4, students who want to do well learn more effectively when they
understand clearly why they have been given a particular assignment, what they
are expected to learn from it, and how they will be graded on it. If students
know, for example, that their papers will be graded in part on how effective the
introduction is, some will try their best to write an effective introduction.
Principle 2: Students learn more effectively when they are academically chal-
lenged and given high but attainable expectations. While there is a limit to what
students can achieve in a given amount of time (even the best writing faculty, for
example, cannot in a few weeks enable first-year students to write at the level
expected of doctoral dissertations), many students respond remarkably well to
high standards, provided that they are given a clear roadmap on how to achieve
them (De Sousa, 2005). First-year students, for example, may be able to write
quite competent library research papers if the research and writing process is
broken down into relatively small, manageable steps (identifying the research
topic, finding appropriate library resources, reading and analyzing them, out-
lining the paper, etc.) and students are guided through each step.
Principle 7: Students learn more effectively when they spend significant time
studying and practicing. While this is an obvious truism, there is evidence that
students in some parts of the world spend relatively little time on out-of-class
studies. Those who attended college a generation or two ago may recall the
adage to study two hours outside of class for every hour spent in class. For a
full-time student spending 15 hours in class, this would mean spending about 30
hours on out-of-class studies. But according to the National Survey of Student
Engagement (2006), about 45% of today’s American college freshmen and
seniors report spending less than 11 hours a week preparing for class (studying,
reading, writing, doing homework or lab work, analyzing data, rehearsing, and
other academic activities). Only about ten percent spend more than 25 hours a
week preparing for class. While these results reflect part-time as well as full-time
students, it is nonetheless clear that, at least in the United States, much more
could be done to have students take responsibility for their learning.
Principle 8: Students learn more effectively when they receive prompt, concrete
feedback on their work. As Dochy has discussed in Chapter 6, we all benefit from
constructive feedback on our work, and students are no different (Ovando,
1992). The key is to get feedback to students quickly enough that they can use
the feedback to improve their learning. Faculty know all too well that final

exams graded after classes end are often not retrieved by students and, if they
are, checked only for the final grade. The problem is that providing construc-
tive, timely feedback can take a great deal of time. Here are some practical
suggestions on ways to minimize that time:
 Return ungraded any assignments that show little or no effort, with a request
that the paper be redone and resubmitted within 24 hours. As Walvoord and
Anderson (1998) point out, if students make no effort to do an assignment
well, why should the professor make any effort to offer feedback?
 When appropriate (see Sadler, Chapter 4 of this book) use rubrics (rating
scales or scoring guides) to evaluate student work. They speed up the process
because much feedback can be provided simply by checking or circling
appropriate boxes rather than writing comments.
 Use Haswell’s (1983) minimal marking method: rather than correct gramma-
tical errors, simply place a check in the margin next to the error, and require
the student to identify and correct errors in that line.
 Provide less feedback on minor assignments. Some short homework assign-
ments, for example, can be marked simply with a plus symbol for outstand-
ing work, a checkmark for adequate work, and a minus symbol for minimal
effort.

Using Assessment Results to Promote Lasting Learning: Two Underlying Principles

Assessment results can provide a wealth of information to help faculty under-
stand how effective their teaching is and how it might be improved, provided
that two principles are followed when the assessments are planned.

Articulate the Decisions that Assessment Will Inform

MacGregor, Tinto, and Lindblad (2001) note, ‘‘If you’re not clear on the goals
of an assessment and the audiences to which that assessment will be directed, it’s
hard to do the assessment well. So your first task is to ask yourself why, with
whom, and for what purpose you are assessing. . .’’ (p. 48). Assessment results
can inform decisions in at least five areas:
Learning outcomes. Assessment results might help faculty decide, for exam-
ple, if their statements of expected learning outcomes are sufficiently clear and
focused or if they have too many intended learning outcomes to cover in the
allotted instructional time.
Curriculum. Assessment results might help faculty decide, for example, if
classes or modules taught by several faculty have sufficient uniformity across
sections or whether a service-learning component is achieving its goal.

Pedagogy. Assessment results might help faculty decide, for example, whether
online instruction is as effective as traditional classroom-based instruction or
whether collaborative learning is more effective than traditional lectures.
Assessment. Assessment results can, of course, help faculty decide how useful
their assessment strategies have been and what changes are needed to improve
their effectiveness.
Resource allocations. Assessment results can provide a powerful argument
for resource allocations. Disappointing evidence of student writing skills, for
example, can lead to fact-based arguments for, say, more writing tutors or more
online writing software. Poor student performance on a technology examina-
tion required for licensure in a profession can be a compelling argument for
upgrading technologies in laboratories.
Understanding and articulating the decisions that a particular assessment
will inform helps to ensure that the assessment results will indeed help enlighten
those decisions. Suppose, for example, that a professor is assessing student
learning in a biology laboratory. If the results are to inform decisions about
the laboratory’s learning outcomes, for example, the assessments will need to be
designed to assess each expected outcome. If the results are to inform decisions
about curriculum, the assessments will need to be designed to assess each aspect
of the curriculum. And if the results are to inform decisions about pedagogy, the
assessments may need to be designed to provide comparable information on
different pedagogical approaches.

Develop Assessment Strategies that Will Provide Appropriate Frames of Reference to Inform Those Decisions

Suskie (2007) has observed that assessment results considered in isolation are
meaningless – they have significance only if they are compared against some
kind of benchmark or frame of reference. Suskie has identified nine such frames
of reference, four of which are especially relevant to most faculty.
Under the strengths and weaknesses frame of reference, faculty compare the
sub-scores of an assessment to identify students’ relative strengths and weak-
nesses. An assessment of writing skills, for example, might determine that
students are relatively good at writing a strong introduction but relatively
weak at supporting arguments with appropriate evidence. This frame of refer-
ence is often of great interest and great value to faculty, as it tells them what
their students are learning well and what areas need more or different attention.
In order to use this frame of reference, the assessment must be designed to
generate comparable results on aspects of the trait being assessed, such as
effective writing. This is generally accomplished by using a rubric or rating
scale that lists the aspects being evaluated. Some published tests and surveys
also generate sub-scores that can be compared with one another. In contrast, a
holistic assessment, generating only one score per student without subscores,
cannot provide this kind of information.

Under the improvement frame of reference, faculty compare student
assessment results at the beginning and end of a class, module, course, or
program. Faculty might, for example, give students the final examination on
the first day of class and compare those scores against those on the same
examination given at the end of instruction. Or faculty might give students
writing assignments in the first and last weeks that are evaluated using the same
criteria.
Such a value-added approach is intrinsically appealing, as it appears to
convey how much students’ learning has improved as a result of faculty teach-
ing. But this approach has a number of shortcomings. One is that students must
be motivated to do their best on the entry-point assessment. This can be a
challenge because grading, generally a strong motivator, is often inappropriate
at this point: Is it fair to grade students when they have not yet had an
opportunity to learn anything?
Another challenge is that both entry- and exit-point assessment information
must be collected for this approach to be meaningful. This is not possible in
situations in which sizable numbers of students either transfer into a class,
module, course, or program after it has begun or drop out of it before it is
completed. In these situations, the students who persist from beginning to end
may not be a representative sample of all students who are subject to
instruction.
Yet another challenge with this value-added approach is that gain scores, the
difference between entry- and exit-point assessment results, are notoriously
unreliable. As noted by Banta and Pike (2007) and Pike (2006), the measure-
ment error of gain scores is essentially double that of each assessment alone.
This sizable measurement error can mask meaningful gains.
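A brief classical test theory sketch, added here as an illustration rather than drawn from Banta and Pike, shows why. If each observed score is a true score plus an error term, and the errors on the two testing occasions are independent with equal variance, then

\[
X_{\mathrm{pre}} = T_{\mathrm{pre}} + e_{\mathrm{pre}}, \qquad
X_{\mathrm{post}} = T_{\mathrm{post}} + e_{\mathrm{post}},
\]
\[
\operatorname{Var}(e_{\mathrm{post}} - e_{\mathrm{pre}})
  = \operatorname{Var}(e_{\mathrm{post}}) + \operatorname{Var}(e_{\mathrm{pre}})
  = 2\sigma_e^{2},
\qquad
\mathit{SEM}_{\mathrm{gain}} = \sqrt{2}\,\sigma_e \approx 1.4\,\sigma_e .
\]

That is, under these assumptions the error variance of a gain score is double that of either assessment alone, so a gain must be correspondingly large before it can be distinguished from measurement noise.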
But perhaps the major concern about the improvement or value-added
frame of reference is that it is often confused with the pre-post experimental
design (Campbell & Stanley, 1963) used in the social sciences. In pre-post
experimental designs, subjects are randomly assigned to control and experi-
mental treatments; this allows the research to separate the impact of the
treatment from extraneous factors. In higher education, however, faculty
cannot randomly assign students to institutions or programs, so if faculty
find significant growth they cannot conclude that it is due to the learning
experience or to extraneous factors. If a student’s oral communication
skills improve, for example, faculty cannot be certain that the improvement
is due to work in class or to, say, concurrent participation in a club or a
part-time job in which the student uses and improves oral communication
skills.
Under the historical trends frame of reference, faculty compare student
assessment results against those of prior cohorts of students. This frame of
reference is of particular interest to faculty who want to know if their efforts
to improve their curricula and pedagogies are yielding desired improvements
in student learning. This frame of reference can only be used, of course, if
identical or parallel assessments can be utilized with successive cohorts of

students. This is not always possible – sometimes curricula must change in order to meet employer and societal demands, so assessments used a few
years ago may no longer be appropriate or relevant today. Another chal-
lenge with this approach is that, as with the improvement frame of reference
discussed above, this is not an experimental design with random assignments
to cohorts. As a result, faculty cannot be certain that changes in student
learning are due to changes in curricula or pedagogies or due to changes in
the students themselves. Faculty teaching at an institution that has increased
its admission standards, for example, cannot be certain that the growth they
see in student learning is due to changes in curricula and pedagogies or is
simply because today’s students are more talented, motivated, or prepared
to learn.
Under the standards frame of reference, faculty compare assessment results
against an established standard, set either by the faculty or by a regional or
national agency or organization. Faculty might decide, for example, that stu-
dents must answer at least 70% of test questions correctly in order to pass an
examination, or a nursing licensure agency might state that nursing students
must earn a particular score on a licensure examination in order to be licensed
to practice. This frame of reference is of interest to faculty who want or need to
ensure that students are meeting particular standards. Many colleges and uni-
versities, for example, want to ensure that all students graduate with a parti-
cular level of writing skill.
The challenge with this approach is, not surprisingly, setting an appropri-
ate standard. If the standard has been set by an agency or organization, the
work is done, of course, but setting a defensible, valid local standard can be
very difficult and time-consuming. While faculty have been doing this for
generations (setting a standard, for example, that students must answer at
least 65% of questions correctly in order to pass a final examination), in
reality these standards are often set arbitrarily and without clear justification.
Livingston and Zieky (1982) offer a variety of techniques for establishing
defensible standards.

Practical Suggestions for Summarizing, Interpreting, and Using Assessment Results to Promote Deep, Lasting Learning

In order to use assessment results to inform teaching practice and thereby improve student learning, the results must be summarized in a way that busy faculty
can quickly and easily understand. They must then be interpreted in appro-
priate ways so they may be used to inform teaching practice and thereby
promote deep, lasting learning. Summaries, analyses, and interpretations
should aim to answer two fundamental questions: (1) What have we learned
about our students’ learning? and (2) What are we going to do about what we

have learned? The key is to ensure that the steps that are taken build upon
practices that promote deep, lasting learning.
What follow are practical suggestions for summarizing, interpreting, and
using assessment results to answer these questions for three different assessment
practices: the use of rubrics (rating scales or scoring guides), multiple choice
tests, and reflective writing exercises.

Rubrics

As Sadler has discussed in Chapter 4, rubrics are a list of the criteria used to
evaluate student work (papers, projects, performances, portfolios, and the like)
accompanied by a rating scale. Using rubrics can be a good pedagogical
practice for several reasons:
 Creating a rubric before the corresponding assignment is developed, rather
than vice versa, helps to ensure that the assignment will address what the
professor wants students to learn.
 Giving students the rubric along with the assignment is an excellent way to help
them understand the purpose of the assignment and how it will be evaluated.
 Using a rubric to grade student work ensures consistency and fairness.
 Returning the marked rubric to students with their graded assignment gives
them valuable feedback on their strengths and weaknesses and helps them
understand the basis of their grade.
In order to understand rubric results and use them to inform teaching
practice, it is helpful to tally students’ individual ratings in a simple chart.
Table 8.1 provides a hypothetical example for the results of a rubric used to
evaluate the portfolios of 30 students studying journalism.
It is somewhat difficult to understand what Table 8.1 is saying. While it
seems obvious that students performed best on the fourth criterion (‘‘under-
standing professional ethical principles and working ethically’’), their other
relative strengths and weaknesses are not as readily apparent. Table 8.1
would be more useful if the results were somehow sorted. Let us suppose that,
in this hypothetical example, the faculty’s goal is that all students earn at least
‘‘very good’’ on all criteria. Table 8.2 sorts the results based on the number of
students who score either ‘‘excellent’’ or ‘‘very good’’ on each criterion. Table 8.2
also converts the raw numbers into percentages, because this allows faculty to
compare student cohorts of different sizes. The percentages are rounded to the
nearest whole percentage, to reduce the volume of information to be digested
and to keep the reader from focusing on trivial differences.
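A summary such as Table 8.2 can be produced by hand or with a few lines of code. The following Python sketch is purely illustrative; the abbreviated criterion labels and the data structure are assumptions rather than anything prescribed in this chapter. It converts the raw tallies of Table 8.1 into whole-number percentages and sorts the criteria by the combined ''excellent'' plus ''very good'' percentage:

# Illustrative tallies from Table 8.1: (excellent, very good, adequate, inadequate)
# counts for each rubric criterion, with labels abbreviated here.
tallies = {
    "4. Professional ethical principles": (27, 3, 0, 0),
    "6. Critical, creative, independent thinking": (2, 20, 5, 3),
    "10. Basic numerical and statistical concepts": (10, 8, 9, 3),
}

def to_percentages(counts):
    total = sum(counts)
    pct = [round(100 * c / total) for c in counts]              # whole-number percentages
    combined = round(100 * (counts[0] + counts[1]) / total)     # excellent + very good
    return {"excellent + very good": combined, "excellent": pct[0],
            "very good": pct[1], "adequate": pct[2], "inadequate": pct[3]}

summary = {criterion: to_percentages(counts) for criterion, counts in tallies.items()}

# Report the strongest results first, as in Table 8.2.
for criterion, row in sorted(summary.items(),
                             key=lambda item: item[1]["excellent + very good"],
                             reverse=True):
    print(f"{row['excellent + very good']:3d}%  {criterion}")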
Now the results jump out at the reader. It is immediately apparent that
students not only did relatively well on the fourth criterion but also on the
first, third and fifth. It is equally apparent that students’ weakest area (of those
evaluated here) is the tenth criterion (‘‘applying basic numerical and statistical

Table 8.1 An example of tallied results of a rubric used to evaluate the portfolios of 30 students studying journalism

The student: (number of the 30 students rated Excellent / Very good / Adequate / Inadequate)
1. Understands and applies the principles and laws of freedom of speech and press. (15 / 14 / 1 / 0)
2. Understands the history and role of professionals and institutions in shaping communications. (18 / 8 / 4 / 0)
3. Understands the diversity of groups in a global society in relationship to communications. (12 / 17 / 1 / 0)
4. Understands professional ethical principles and works ethically in pursuit of truth, accuracy, fairness, and diversity. (27 / 3 / 0 / 0)
5. Understands concepts and applies theories in the use and presentation of images and information. (12 / 17 / 1 / 0)
6. Thinks critically, creatively, and independently. (2 / 20 / 5 / 3)
7. Conducts research and evaluates information by methods appropriate to the communications profession(s) studied. (6 / 21 / 3 / 0)
8. Writes correctly and clearly in forms and styles appropriate for the communications profession(s) studied and the audiences and purposes they serve. (9 / 14 / 6 / 1)
9. Critically evaluates own work and that of others for accuracy and fairness, clarity, appropriate style and grammatical correctness. (6 / 19 / 3 / 2)
10. Applies basic numerical and statistical concepts. (10 / 8 / 9 / 3)
11. Applies tools and technologies appropriate for the communications profession(s) studied. (8 / 19 / 3 / 0)

concepts’’). Another area of relative weakness is the sixth criterion (‘‘thinking


critically, creatively, and independently’’). Table 8.2 thus provides a clear road-
map for faculty reflection on students’ relative strengths and weaknesses.
Professors might begin this reflection by first examining how the curriculum
addresses the application of basic numerical and statistical concepts by review-
ing syllabi to identify the courses or modules in which this skill is addressed. The
faculty might then discuss how well the identified courses or modules follow the

Table 8.2 An improved version of Table 8.1

The student: (Excellent + Very good / Excellent / Very good / Adequate / Inadequate)
3. Understands the diversity of groups in a global society in relationship to communications. (97% / 40% / 57% / 3% / 0%)
5. Understands concepts and applies theories in the use and presentation of images and information. (97% / 40% / 57% / 3% / 0%)
11. Applies tools and technologies appropriate for the communications profession(s) studied. (90% / 27% / 63% / 10% / 0%)
7. Conducts research and evaluates information by methods appropriate to the communications profession(s) studied. (90% / 20% / 70% / 10% / 0%)
2. Understands the history and role of professionals and institutions in shaping communications. (87% / 60% / 27% / 13% / 0%)
9. Critically evaluates own work and that of others for accuracy and fairness, clarity, appropriate style and grammatical correctness. (83% / 20% / 63% / 10% / 7%)
8. Writes correctly and clearly in forms and styles appropriate for the communications profession(s) studied and the audiences and purposes they serve. (77% / 30% / 47% / 20% / 3%)
6. Thinks critically, creatively, and independently. (73% / 7% / 67% / 17% / 10%)
10. Applies basic numerical and statistical concepts. (60% / 33% / 27% / 30% / 10%)

thirteen principles for promoting deep, lasting learning articulated in this chapter. They might, for example, ask themselves:
 Are we giving enough time and attention in these classes to applying basic
numerical and statistical concepts? Are we giving students enough classwork
and assignments on this skill?
 Do students spend enough time actively applying numerical and statistical
concepts?
 Are the assignments in which they apply numerical and statistical concepts
real world problems, the kinds that may have more than one ‘‘correct’’ answer?
 Would students benefit from working with fellow students on these assign-
ments rather than alone?
 Do we give students sufficient feedback on their work in applying numerical
and statistical concepts? Do we give them sufficient opportunities to correct
or revise their work?
Discussion of these and similar questions will doubtless lead to ideas about
ways to strengthen students’ skills in applying numerical and statistical con-
cepts. Faculty might decide, for example, to incorporate the application of
numerical and statistical concepts into more courses or modules, to give stu-
dents more practice through additional homework assignments, and to give
students more collaborative projects in which they must interpret real world
data with their fellow students.

Multiple Choice Tests

The validity and usefulness of multiple choice tests are greatly enhanced if they
are developed with the aid of a test blueprint or outline of the knowledge and
skills being tested. Table 8.3 is an example of a simple test blueprint.
In this example, the six listed objectives represent the professor’s key objec-
tives for this course or module, and the third, fourth, and sixth objectives are
considered the most important objectives. This blueprint is thus powerful
evidence of the content validity of the examination and a good framework for
summarizing the examination results, as shown in Table 8.4. Again, the results
in Table 4 have been sorted from highest to lowest to help readers grasp the
results more quickly and easily.

Table 8.3 Example of a test blueprint for a statistics examination


1 item Determine the value of t needed to find a confidence interval of a given size.
1 item Understand the effect of p on the standard error of a proportion.
6 items Choose the appropriate statistical analysis for a given research problem.
4 items Decide on the appropriate null and alternative hypotheses for a given
research problem and state them correctly.
2 items Identify the critical value(s) for a given statistical test.
4 items Choose the appropriate standard error formula for a given research problem.

Table 8.4 Results of a statistics examination, matched to the test blueprint


Percentage of students answering correctly   Learning objective
95% Determine the value of t needed to find a confidence interval
of a given size.
88% Understand the effect of p on the standard error of a proportion.
85% Decide on the appropriate null and alternative hypotheses for
a given research problem and state them correctly.
79% Identify the critical value(s) for a given statistical test.
62% Choose the appropriate standard error formula for a given
research problem.
55% Choose the appropriate statistical analysis for a given
research problem

Table 8.4 makes clear that students overall did quite well in determining the
value of t needed to find a confidence interval, but a relatively high proportion
were unable to choose the appropriate statistical analysis for a given research
problem. Table 8.4 provides another clear roadmap for faculty reflection on
students’ relative strengths and weaknesses. The professor might consider how
to address the weakest area – choosing appropriate statistical analyses – by again
reflecting on the practices that promote deep, lasting learning discussed earlier in
this chapter. The professor might, for example, decide to address this skill by
revising or expanding lectures on the topic, giving students additional homework
on the topic, and having students work collaboratively on problems in this area.
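A blueprint-based summary such as Table 8.4 can be generated in the same spirit. In the Python sketch below the item numbers and the proportions answering correctly are hypothetical, since the chapter does not report item-level data; the point is only that items are grouped by objective, averaged, and sorted:

# Hypothetical mapping of objectives to item numbers, echoing the blueprint in Table 8.3.
blueprint = {
    "Choose the appropriate statistical analysis for a given research problem": [3, 7, 12],
    "Identify the critical value(s) for a given statistical test": [2, 6],
}
# Hypothetical proportion of students answering each item correctly.
item_scores = {2: 0.80, 3: 0.55, 6: 0.78, 7: 0.50, 12: 0.60}

summary = {}
for objective, items in blueprint.items():
    summary[objective] = round(100 * sum(item_scores[i] for i in items) / len(items))

# List objectives from strongest to weakest, as in Table 8.4.
for objective, pct in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pct}%  {objective}")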
Another useful way to review the results of multiple choice tests is to
calculate what testing professionals (e.g., Gronlund, 2005; Haladyna, 2004;
Kubiszyn & Borich, 2002) call the discrimination of each item (Suskie,
2004d). This metric, a measure of the internal reliability or internal consistency
of the test, is predicated on the assumption that students who do relatively well
on a test overall will be more likely to get a particular item correct than those
who do relatively poorly. Table 8.5 provides a hypothetical example of discri-
mination results for a 5-item quiz. In this example, responses of the ten students
with the highest overall scores on this quiz are compared against those of the ten
students with the lowest overall quiz scores.

Table 8.5 Discrimination results for a short quiz taken by 30 students

Item number | ''Top 10'' students answering correctly | ''Bottom 10'' students answering correctly | Difference (discrimination)
1  |  10  |   0  |  10
2  |   8  |   6  |   2
3  |   5  |   5  |   0
4  |  10  |  10  |   0
5  |   4  |   8  |  -4

These five items have widely varying levels of discrimination:


 Item 1 has the best possible discrimination – all the top students answered it
correctly, while none of the bottom students did. This is truly an item that
‘‘separates the wheat from the chaff,’’ discriminating students who have truly
mastered class objectives from those who have not.
 Item 2 is an example of an item with good discrimination, though not as
strong as Item 1, simply because it is easier. Items that fifty per cent of
students get wrong have the greatest potential for discrimination; easier
items have lower potential for discrimination.
 Item 3 has no discrimination – equal numbers of top and bottom students
answered it correctly. While an item with no discrimination is not inherently
a poor item, it would be worth a closer look to see why a number of top
students struggled with it while some bottom students did not. Asking the
top students why they got this question wrong would probably give the
professor ideas on ways to revise the question for future administrations.
 Item 4 also has no discrimination, but this is simply because everyone
answered it correctly. As already noted, easy items cannot discriminate
well between top and bottom students, and items that are so easy that
everyone answers them correctly will, of course, not discriminate at all.
 Item 5 discriminates negatively – students in the top group were more likely
to answer it incorrectly than students in the bottom group. It is very likely
that students in the top group misinterpreted either the question or one or
more of its options, probably reading more into the item than the professor
intended. This is an item that performed so poorly that it should be removed
from the scores of these students and revised before it is used again. As with
Item 3, the top students who got this question wrong would doubtless be able
to give the professor suggestions on how to revise the item to minimize future
misinterpretations.
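The discrimination values in Table 8.5 can be computed by hand for a short quiz, or with a short script along the following lines. This Python sketch is an illustration only, since the chapter does not supply the underlying response matrix; it ranks students by total score, takes the top and bottom groups, and subtracts each item's number of correct answers in the bottom group from the number in the top group:

def item_discrimination(responses, group_size=10):
    """responses: one list of 0/1 item scores per student (1 = answered correctly)."""
    ranked = sorted(responses, key=sum, reverse=True)   # students ordered by total score
    top, bottom = ranked[:group_size], ranked[-group_size:]
    n_items = len(responses[0])
    return [sum(student[i] for student in top) - sum(student[i] for student in bottom)
            for i in range(n_items)]

# Applied to a 30-student, 5-item response matrix, the function would return a list
# such as [10, 2, 0, 0, -4], matching the pattern of differences shown in Table 8.5.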

Reflective Writing

Reflective writing is a learning strategy in which students reflect and write on what and how they have learned. Students engaged in reflective writing typi-
cally reflect and write on ‘‘the larger context, the meaning, and the implications
of an experience or action’’ (Branch & Paranjape, 2002, p. 1185) and ‘‘pull
together a broad range of previous thinking or knowledge in order to make
greater sense of it for another purpose that may transcend the previous bounds
of personal knowledge or thought’’ (Moon, 2001, p. 5). Reflective writing thus
helps students develop a number of skills (Costa & Kallick, 2000), including
skill in synthesis (pulling together what they have learned in order to see the big picture) and metacognition (understanding how one learns).
Reflective writing can also be a valuable assessment strategy. Costa and
Kallick (2000) note that reflective writing provides an opportunity for

‘‘documenting learning and providing a rich base of shared knowledge’’ (p. 60),
while the Conference on College Composition and Communication notes that
‘‘reflection by the writer on her or his own writing processes and performances
holds particular promise as a way of generating knowledge about writing’’
(2006). Reflective writing may be especially valuable for assessing ineffable
outcomes such as attitudes, values, and habits of mind. An intended student
learning outcome to ‘‘be open to diverse viewpoints’’ would be difficult to assess
through a traditional multiple choice test or essay assignment, because students
would be tempted to provide what they perceive to be the ‘‘correct’’ answer
rather than accurate information on their true beliefs and views.
Because reflective writing seeks to elicit honest answers rather than ‘‘best’’
responses, reflective writing assignments may be assessed and the results used
differently than other assessment strategies. While the structure of a student’s
reflective writing response can be evaluated using a rubric, the thoughts and
ideas expressed may be so wide-ranging that qualitative rather than quantita-
tive assessment strategies may be more appropriate.
Qualitative assessment techniques are drawn from qualitative research
approaches, which Marshall and Rossman (2006) describe as ‘‘naturalistic,’’
‘‘fundamentally interpretive,’’ relying on ‘‘complex reasoning that moves
dialectically between deduction and induction,’’ and drawing on ‘‘multiple
methods of inquiry’’ (p. 2). Qualitative assessment results may thus be
summarized differently than quantitative results such as those from rubrics
and multiple choice tests, which typically yield ratings or scores that can be
summarized using descriptive and inferential statistics. Qualitative research
techniques aim for naturalistic interpretations rather than, say, an average
score.
Qualitative assessment techniques typically include sorting the results into
categories (e.g., Patton, 2002). Tables 8.6 and 8.7 provide an example of a
summary of qualitative assessment results from a day-long workshop on the
assessment of student learning, conducted by the author. The workshop
addressed four topics: principles of good practice for assessment, promoting
an institutional culture of assessment, the articulation of learning outcomes,
and assessment strategies including rubrics. At the end of the day, participants
were asked two questions, adapted from the minute paper suggested by Angelo

Table 8.6 Responses to ‘‘What was the most useful or meaningful thing you
learned today?’’ by Participants at a one-day workshop on assessing student
learning
Percent of respondents (%) Category of response
40 Assessment strategies (e.g., rubrics)
20 Culture of assessment
16 Principles of good practice
10 Articulating learning outcomes
13 Miscellaneous

Table 8.7 Responses to ''What question remains uppermost on your mind as we end this workshop?'' by participants at a one-day workshop on assessing student learning
Percent of Respondents (%) Category of Response
27 Culture of assessment
13 Organizing assessment across an institution
43 Unique questions on other topics
16 No response

and Cross (1993): ‘‘What was the most useful or meaningful thing you learned
today?’’ and ‘‘What one question is uppermost on your mind as we end this
workshop?’’
For the first question, ‘‘What was the most useful or meaningful thing you
learned today?’’ (Table 8.6), the author sorted comments into five fairly obvious
categories: the four topics of the workshop plus a ‘‘miscellaneous’’ category. For
the second question, ‘‘What one question is uppermost on your mind as we end
this workshop?’’ (Table 8.7), potential categories were not as readily evident.
After reviewing the responses, the author settled on the categories shown in
Table 8.7, then sorted the comments into the identified categories. Qualitative
analysis software is available to assist with this sorting – such programs search
responses for particular keywords provided by the professor.
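As a rough illustration of what such software does, the Python sketch below assigns each response to the first category whose keywords it contains, falling back to a miscellaneous category. The category names, keyword lists, and sample responses are assumptions made for the example, not features of any particular package:

# Illustrative keyword lists; in practice these would be refined iteratively.
keywords = {
    "Assessment strategies": ["rubric", "blueprint", "scoring guide"],
    "Culture of assessment": ["culture", "buy-in", "resistance"],
    "Articulating learning outcomes": ["outcome", "objective", "goal"],
}

def categorize(response):
    text = response.lower()
    for category, words in keywords.items():
        if any(word in text for word in words):
            return category
    return "Miscellaneous"

responses = [
    "How do I build a rubric for laboratory reports?",
    "How can we create a culture of assessment on our campus?",
]
tally = {}
for response in responses:
    category = categorize(response)
    tally[category] = tally.get(category, 0) + 1
print(tally)   # {'Assessment strategies': 1, 'Culture of assessment': 1}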
The analysis of qualitative assessment results – identifying potential cate-
gories for results and then deciding the category into which a particular
response is placed – is, of course, inherently subjective. In the workshop
example described here, the question, ‘‘Would it be helpful to establish an
assessment steering committee composed of faculty?’’ might be placed into the
‘‘culture of assessment’’ category by one person and into the ‘‘organizing assess-
ment’’ category by another. But while qualitative assessment is a subjective
process, open to inconsistencies in categorizations, it is important to note that
any kind of assessment of student learning has an element of subjectivity, as the
questions that faculty choose to ask of students and the criteria used to evaluate
student work are a matter of professional judgment that is inherently subjective,
however well-informed. Inconsistencies in categorizing results can be mini-
mized by having two readers perform independent categorizations, then review-
ing and reconciling differences, perhaps with the introduction of a third reader
for areas of disagreement.
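A simple agreement check between the two readers can also be run before differences are reconciled. The Python fragment below is a minimal sketch that reports raw percent agreement (a chance-corrected index such as Cohen's kappa could be substituted); the category labels are hypothetical:

def percent_agreement(reader_a, reader_b):
    """reader_a, reader_b: equal-length lists of category labels assigned independently."""
    matches = sum(a == b for a, b in zip(reader_a, reader_b))
    return round(100 * matches / len(reader_a))

reader_a = ["culture", "organizing", "culture", "miscellaneous"]
reader_b = ["culture", "culture", "culture", "miscellaneous"]
print(percent_agreement(reader_a, reader_b))   # 75, i.e. one response in four to reconcile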
Qualitative assessment results can be extraordinarily valuable in helping
faculty understand and improve their teaching practices and thereby improve
student learning. The ‘‘Minute Paper’’ responses to this workshop (Tables 8.6
and 8.7) provided a number of useful insights to the author:
 The portion of the workshop addressing assessment strategies was clearly
very successful in conveying useful, meaningful ideas; the author could take
satisfaction in this and leave it as is in future workshops.

 The portion of the workshop addressing learning outcomes was not especially
successful; while there were few if any questions about this topic, few partici-
pants cited it as especially useful or meaningful. (A background knowledge
probe (Angelo & Cross, 1993) would have revealed that most participants
arrived at the workshop with a good working knowledge of this topic.) The
author used this information to modify her workshop curriculum to limit
coverage of this topic to a shorter review.
 Roughly one in eight participants had questions about organizing assess-
ment activities across their institution, a topic not addressed in the work-
shop. The author used this information to modify her workshop curriculum
to incorporate this topic. (Reducing time spent on learning outcomes
allowed her to do this.)
The portion of the workshop addressing promoting a culture of assessment
was clearly the most problematic. While a fifth of all respondents found it useful
or meaningful, over a quarter had questions about this topic when the workshop
concluded. Upon reflection, the author realized that she placed this topic at the
end of the workshop curriculum, addressing it at the end of the day when she
was rushed and participants were tired. She modified her curriculum to move
this topic to the beginning of the workshop and spend more time on it.
As a result of this reflective writing assignment and the subsequent changes
made in curriculum and pedagogy, participant learning increased significantly
in subsequent workshops, as evidenced by the increased proportion of com-
ments citing organizing assessment activities and promoting a culture of assess-
ment as the most useful or meaningful things learned and the smaller
proportion of participants with questions about these two areas.

Conclusion

Why do faculty assess student learning? One longstanding reason, of course, is to form a basis for assigning grades to students. Another recently emerging
reason is to demonstrate to various constituents – government agencies, quality
assurance agencies, taxpayers, employers, and students and their families – that
colleges and universities are indeed providing students with the quality educa-
tion they promise. But the most compelling reason for many faculty to engage in
assessing student learning is for the opportunity it provides to improve teaching
practices and thereby foster deep, lasting learning. This chapter has described a
number of ways that assessment activities can accomplish these ends:
 Provide a broad array of learning and assessment activities.
 Design assessment activities (e.g., assignments, tests) so that they address key
learning outcomes.
 Help students understand course and program expectations and the char-
acteristics of excellent work.

 Challenge students by giving them high but attainable expectations.


 Require students to spend significant time studying and practicing.
 Give students prompt, concrete feedback on their work.
 Articulate the decisions that assessment results are to inform.
 Design assessments so that they will provide appropriate frames of reference
to inform those decisions.
 Use test blueprints to plan multiple choice tests and rubrics to plan other
assignments and tests.
 Summarize assessment results into simple tables, perhaps with results sorted
so that the best and most disappointing results can be quickly identified.
 Use the results of rubrics and multiple choice tests to identify students’
relative strengths and weaknesses and ways that the assessments themselves
might be improved.
 Use the results of qualitative assessments to identify areas in which students
are confused, dissatisfied with their learning, or fail to demonstrate attain-
ment of key learning outcomes.
 Use recent research on strategies that promote deep, lasting learning, along
with feedback from students, to plan how to address assessment results that
are disappointing and thereby improve teaching practice and promote last-
ing learning.
Faculty who have a passion for teaching are always looking for ways to
improve their practice and foster lasting student learning. Once they understand
the nature and use of assessment, they quickly come to realize that assessment is
one of the best tools in their teaching toolbox for achieving these ends.

References
Angelo, T. A. (1993, April). A ‘‘teacher’s dozen’’: Fourteen general, research-based principles
for improving higher learning in our classrooms. AAHE Bulletin, 45(8), 3–7, 13.
Angelo, T. A., & Cross, K. P. (1993). Classroom assessment techniques: A handbook for college
teachers (2nd ed.). San Francisco: Jossey-Bass.
Association of American Colleges and Universities. (2002). Greater expectations: A new
vision for learning as a nation goes to college. Washington, DC: Author.
Astin, A. W. (1993). What matters in college: Four critical years revisited. San Francisco:
Jossey-Bass.
Banta, T. W., & Pike, G. R. (2007). Revisiting the blind alley of value added. Assessment
Update, 19(1), 1–2, 14–15.
Barr, R. B., & Tagg, J. (1995). From teaching to learning: A new paradigm for undergraduate
education. Change, 27(6), 12–25.
Biggs, J. (2001). Assessing for quality in learning. In L. Suskie (Ed.), Assessment to promote
deep learning: Insight from AAHE’s 2000 and 1999 assessment conferences (pp. 65–68).
Branch, W. T., & Paranjape, A. (2002). Feedback and reflection: Teaching methods for
clinical settings. Academic Medicine, 77, 1185–1188.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for
research. Chicago: Rand McNally.
Chickering, A. W., & Gamson, Z. (1987). Seven principles for good practice in undergraduate
education. AAHE Bulletin, 39(7), 5–10.

Chickering, A. W., & Gamson, Z. (1991). Applying the seven principles for good practice in
undergraduate education. New directions for teaching and learning, No. 47. San Francisco:
Jossey-Bass.
Conference on College Composition and Communication. (2006). Writing assessment: A
position statement. Urbana, IL: National Council of Teachers of English. Retrieved
September 4, 2007, from http://www.ncte.org/cccc/resources/123784.htm
Costa, A., & Kallick, B. (2000). Getting into the habit of reflection. Educational Leadership,
57(7), 60–62.
De Sousa, D. J. (2005). Promoting student success: What advisors can do (Occasional Paper
No. 11). Bloomington, Indiana: Indiana University Center for Postsecondary Research.
Entwistle, N. (2001). Promoting deep learning through teaching and assessment. In L. Suskie
(Ed.), Assessment to promote deep learning: Insight from AAHE’s 2000 and 1999 assessment
conferences (pp. 9–20).
Ewell, P. T., & Jones, D. P. (1996). Indicators of ‘‘good practice’’ in undergraduate education:
A handbook for development and implementation. Boulder, CO: National Center for Higher
Education Management Systems.
Gronlund, N. E. (2005). Assessment of student achievement (8th ed.). Boston: Allyn & Bacon.
Haladyna, T. M. (2004). Developing and validating multiple choice items. Boston, MA: Allyn &
Bacon.
Haswell, R. (1983). Minimal marking. College English, 45(6), 600–604.
Henscheid, J. M. (2000). Professing the disciplines: An analysis of senior seminars and
capstone courses (Monograph No. 30). Columbia, South Carolina: National Resource
Center for the First-Year Experience and Students in Transition.
Huba, M. E., & Freed, J. E. (2000). Learner-centered assessment on college campuses: Shifting
the focus from teaching to learning (pp. 32–64). Needham Heights, MA: Allyn & Bacon.
Kubiszyn, T., & Borich, G. D. (2002). Educational testing and measurement: Classroom application and management (7th ed.). San Francisco: Jossey-Bass.
Kuh, G. (2001). Assessing what really matters to student learning: Inside the National Survey
of Student Engagement. Change, 33(3), 10–17, 66.
Kuh, G. D., Schuh, J. H., Whitt, E. J., & Associates. (1991). Involving colleges: Successful
approaches to fostering student learning and development outside the classroom. San Francisco:
Jossey-Bass.
Light, R. (2001). Making the most of college: Students speak their minds. Cambridge, MA:
Harvard University Press.
Linn, R., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and
validation criteria. Los Angeles: University of California Center for Research on Evalua-
tion, Standards, and Student Testing.
Livingston, S. A., & Zieky, M. J. (1982). Passing scores: A manual for setting standards on
performance on educational and occupational tests. Princeton: Educational Testing
Service.
MacGregor, J., Tinto, V., & Lindblad, J. H. (2001). Assessment of innovative efforts: Lessons
from the learning community movement. In L. Suskie (Ed.), Assessment to promote deep
learning: Insight from AAHE’s 2000 and 1999 assessment conferences (pp. 41–48).
McKeachie, W. J. (2002). Teaching tips: Strategies, research, and theory for college and university teachers (11th ed.). Boston: Houghton Mifflin.
Marshall, C., & Rossman, G. B. (2006). Designing qualitative research (4th ed.). Thousand
Oaks, CA: Sage.
Mentkowski, M., & Associates. (2000). Learning that lasts: Integrating learning, development,
and performance in college and beyond. San Francisco, CA: Jossey-Bass.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement. New York:
Macmillan.
Messick, S. (1994). The interplay of evidence and consequences in the validation of perfor-
mance assessments. Educational Researcher, 23(2), 13–23.

Moon, J. (2001). Reflection in learning and professional development: Theory and practice.
London: Routledge.
Moran, D. J., & Malott, R. W. (Eds.). (2004). Evidence-based educational methods. San Diego:
Elsevier Academic Press.
National Survey of Student Engagement. (2006). Engaged learning: Fostering success for all
students: Annual report 2006. Bloomington, IN: Author.
Ovando, M. N. (1992). Constructive feedback: A key to successful teaching and learning.
Austin, TX: University of Texas at Austin, College of Education, Department of Educa-
tion Administration. ERIC Document Reproduction Service No. ED 404 291.
Palmer, P. J. (1998). The courage to teach: Exploring the inner landscape of a teacher’s life.
San Francisco: Jossey-Bass.
Pascarella, E. T. (2001). Identifying excellence in undergraduate education: Are we even close?
Change, 33(3), 19–23.
Pascarella, E. T., & Terenzini, P. T. (2005). How college affects students: A third decade of
research. San Francisco: Jossey-Bass.
Patton, M. Q. (2002). Qualitative research & evaluation methods (3rd ed.). Thousand Oaks,
CA: Sage.
Pike, G. R. (2006). Assessment measures: Value-added models and the Collegiate Learning
Assessment. Assessment Update, 18(4), 5–7.
Romer, R., & Education Commission of the States. (1996, April). What research says about
improving undergraduate education. AAHE Bulletin, 48(8), 5–8.
Suskie, L. (2000, May). Fair assessment practices: Giving students equitable opportunities to
demonstrate learning. AAHE Bulletin, 52(9), 7–9.
Suskie, L. (2004a). Encouraging student reflection. In Assessing student learning: A common
sense guide (pp. 168–184). San Francisco: Jossey-Bass Anker Series.
Suskie, L. (2004b). Using assessment findings effectively and appropriately. In Assessing
student learning: A common sense guide (pp. 300–317). San Francisco: Jossey-Bass Anker
Series.
Suskie, L. (2004c). What is assessment? Why assess? In Assessing student learning: A common
sense guide (pp. 3–17). San Francisco: Jossey-Bass Anker Series.
Suskie, L. (2004d). Writing a traditional objective test. In Assessing student learning:
A common sense guide (pp. 200–221). San Francisco: Jossey-Bass Anker Series.
Suskie, L. (2007). Answering the complex question of ‘‘How good is good enough?’’ Assess-
ment Update, 19(4), 1–2, 14–15.
Walvoord, B., & Anderson, V. J. (1998). Effective grading: A tool for learning and assessment.
San Francisco: Jossey-Bass.
Chapter 9
Instrumental or Sustainable Learning?
The Impact of Learning Cultures on Formative
Assessment in Vocational Education

Kathryn Ecclestone

Introduction

There is growing interest amongst researchers, policy makers and teachers at all levels of the British education system in assessment that encourages
engagement with learning, develops autonomy and motivation and raises
levels of formal achievement. These goals have been influenced by develop-
ments in outcome-based and portfolio-based qualifications in post-school
education, including higher education. There is parallel interest in developing
learning to learn skills and encouraging a positive attitude to learning after
formal education through assessment that serves immediate goals for achieve-
ment whilst establishing a basis for learners to undertake their own assessment
activities in future (see Boud & Falchikov, 2007). More specifically, research
in the school sector offers insights about how to promote a sophisticated
understanding of formative assessment that changes how students and tea-
chers regard the purposes of assessment and their respective roles in it in order
to enhance learning (see Assessment Reform Group, 2002; Black & Wiliam,
1998; Gardner, 2006).
Yet, despite such compelling goals and the apparently unproblematic
nature of principles and methods to encourage them, theoretical and empiri-
cal research about the assessment experiences of post-compulsory students
in the UK shows that promoting sustainable learning through assessment is
not straightforward. Recent studies show that students and their teachers
have different expectations about the type of learners suitable for vocational
and academic courses, the purposes of assessment as either to foster subject-
knowledge or personal development, and about ‘‘appropriate’’ forms of
assessment. Students progressing to university have therefore experienced
very different approaches to formative assessment, leading to different
expectations about what they can or should expect in terms of feedback
and help in improving their work. Taken together, these studies suggest that

K. Ecclestone
Westminster Institute of Education, Oxford Brookes University, Oxford, UK
e-mail: kecclestone@brookes.ac.uk


staff in universities need to understand more about the ways in which learning cultures that students experience before university are a powerful
influence on expectations, attitudes and practices as they progress into
higher education (see Davies & Ecclestone, 2007; Ecclestone, 2002; Torrance
et al., 2005).
As a contribution to debate about how staff in universities might encourage
formative assessment for sustainable learning, the chapter draws on two
studies: one explored the factors that help and hinder teachers in changing
their formative assessment practices, the other explored the effects of forma-
tive and summative assessment on students’ and teachers’ ideas about learn-
ing, assessment and achievement (Davies & Ecclestone, 2007; Ecclestone
et al., in progress; Torrance et al., 2005). First, the chapter summarises some
barriers to better understanding of formative assessment in assessment
systems in the UK. Second, it summarises the concept of learning cultures as
an aid to understanding the effects of formative assessment on attitudes to
learning. Third, it applies this concept to case study data from in-depth inter-
views and observations of assessment activities in an Advanced Vocational
Business Studies qualification in a further education (tertiary) college (two
tutors, eight students) and an Advanced Vocational Science Qualification
(one teacher, two students) in a school. This section explores the different
factors in the learning cultures that affected students’ and teachers’ attitudes
to learning and the role of assessment in their learning. Finally, it evaluates
implications of this discussion for assessment practices in universities for
young people progressing to higher education from different learning cultures
in schools and colleges.

Barriers to Understanding Formative Assessment in Post-compulsory Education

A Divided System

Most young people in the UK progress to university from general advanced level academic qualifications or general advanced level vocational qualifica-
tions. Some go to higher education from a work-based qualification. After
compulsory schooling, students can take these qualifications in schools or
further education colleges. Despite government attempts since 2000 to encou-
rage more mixing of vocational and academic options, the UK’s qualification
tracks remain very segregated. This leads students to have strong images of
what is appropriate for them in terms of content and assessment approaches,
and to choose university options on the basis of particular images about the sort
of learners they are (see Ball, Maguire, & Macrae, 2000; Ball, David, & Reay,
2005; Torrance et al., 2005).

The past 30 years have seen a huge shift in British post-school qualifications, from methods and processes based on competitive, norm-referenced examinations for selection towards assessment that encourages more widespread achievement.
There is now a strong emphasis on raising standards of achievement for every-
one through better feedback, sharing the outcomes and criteria with students
and recording achievement and progress in portfolios. One effect of this shift is
a harmonisation of assessment methods and processes between long-running,
high status academic qualifications in subjects such as history, sociology and
psychology and newer, lower status general vocational qualifications such as
leisure and tourism, health and social care. General academic and general
vocational qualifications now combine external examinations set by an award-
ing body, assignments or coursework assessed by the awarding body, and coursework or assignments assessed by teachers. Some general academic qualifica-
tions are still assessed entirely by external examinations.
A series of ad hoc policy-based and professional initiatives has encouraged
alternative approaches to both formative and summative assessment, such as
outcome and competence-based assessment, teacher and work-place assess-
ment, and portfolios of achievement. These have blurred the distinction
between summative and formative assessment and emphasised outcome-
based assessment as the main way to raise levels of participation, achievement,
confidence and motivation amongst young people and adults who have not
succeeded in school assessment (see, for example, Jessup, 1991; McNair, 1995;
Unit for Development of Adult and Continuing Education, 1989).
One effect has been to institutionalise processes for diagnostic assessment,
the setting and reviewing of targets, processes to engage students with assess-
ment specifications and criteria, methods of support and feedback to raise grade
attainment or improve competence, and ways of recording achievement. In
vocational and adult education, these processes tend to be seen as a generic
tool for motivation and achievement and for widening participation. In
contrast, similar practices in schools and higher education are more strongly
located in the demands of subject disciplines as the basis for higher achievement
and better student engagement.

Confusion about Formative Assessment

There is currently no watertight definition of formative assessment. Black and Wiliam define formative assessment as ''encompassing all those activities undertaken by teachers and/or by their students which provide information to be used as feedback to modify the teaching and learning activities in which they are engaged'' (Black & Wiliam, 1998, p. 7).
Formative assessment is sometimes described as assessment for learning as
distinct from assessment of learning:
Assessment for learning is any assessment for which the first priority in its design and
practice is to serve the purpose of promoting students’ learning. It thus differs from
assessment designed primarily to serve the purposes of accountability, or of ranking, or of certifying competence. An assessment activity can help learning if it provides
information to be used as feedback, by teachers, and by their students, in assessing
themselves and each other, to modify the teaching and learning activities in which they
are engaged. Such assessment becomes ‘formative assessment’ when the evidence is
actually used to adapt the teaching work to meet learning needs. (Black & Wiliam,
1998, p. 2)

The Assessment Reform Group has developed principles of formative assessment that encourage teachers to develop the links between information about
students’ progress towards learning goals, adaptations to planning and teach-
ing based on feedback and dialogue, and attention to the ways in which students
learn. This requires encouragement of autonomy, some choice about activities
and students’ understanding of goals, criteria and the purpose of feedback
(Assessment Reform Group, 2002). Royce Sadler (1989) argues that formative
assessment has to close the gap between current states of understanding and
final goals as part of demystifying and communicating the guild knowledge of
subject disciplines. More recently, Black has suggested that all feedback is synonymous with formative assessment or assessment for learning, and that feedback can take many forms. He argues that especially valuable feedback can be seen in peer-
and self-assessment, in new approaches to discussion work and to teachers’
written feedback, and in more sensitive and open-ended questioning in class
(Black, 2007).
Despite agreement at the level of research, ideas about formative assessment
in everyday practice reveal images of learning as attaining objectives, where
knowledge is fixed and externally-defined (see Hargreaves, 2005). The princi-
ples outlined above can also mask practices that equate formative assessment
with continuous or modular feedback and monitoring for summative tasks
spread through a course. In contrast, the same rhetoric can convey images of
learning as the construction of knowledge, where knowledge needs reworking
by students so that it makes sense to them (Hargreaves, 2005).
Conflicting images of learning and teaching therefore affect the underlying
assumptions and apparently unproblematic practices of sharing goals and
criteria, giving feedback, closing the gap and adapting teaching to suit learning
needs. Activities based on transmission of the teacher’s expertise, knowledge
and advice (or the knowledge of those designing the assessment specifications)
have a very different ethos and outcome from formative assessment based on
transaction between teachers and students about processes, the content of an
activity or task or about its goals. In turn, formative assessment that aims to
transform students’ and teachers’ understanding of concepts and processes
associated with learning a subject offers a higher degree of challenge. For
example, Hargreaves shows how the notion of closing the gap is often rooted
in teacher-led images of performance, delivery, adapting teaching in the light of
assessment information, or as a gift from teacher to pupil (Hargreaves, 2005). It
is therefore important to pay attention to the language that teachers, qualifica-
tion designers and students use, as well as to their practices.

Attempts to define formative assessment do not, therefore, offset the widespread misunderstanding amongst practitioners and institution managers that some
activities are formative and others summative. In post-compulsory education,
there is a tendency to see formative assessment as a series of teacher-led
techniques for feedback, diagnosis and review where, despite an accompanying
rhetoric of ‘‘engaging students with learning’’, the techniques and associated
formal paperwork are often solely to ‘‘track’’ students towards their summative
targets (see Ecclestone, 2002; Torrance et al., 2005). Formative assessment is also widely, and mistakenly, seen as synonymous with
continuous or modular assessment where summative tasks are broken up into
interim ones. Yet, more holistic definitions lead to further difficulty because
teachers, reasonably, associate such activities as questioning, written feedback
and practice examination questions with teaching.

Differentiating Between the Spirit and the Letter of Formative Assessment

The need to be clearer and more precise about the purposes of formative
assessment is confirmed by research which shows that the same assessment
activities or methods can lead to very different kinds of learning in different
contexts. In the Learning How to Learn Project in the Economic and Social
Science Research Council’s Teaching and Learning Research Programme
(TLRP), Marshall and Drummond use the evocative terms spirit and letter of
formative assessment as assessment for learning (AfL) to capture how it was
practised in the classroom:
The ‘spirit’ of AfL. . .we have characterized as ‘high organisation based on ideas’,
where the underpinning principle is promoting pupil autonomy . . ..This contrasts
with those lessons where only the procedures, or ‘letter’ of AfL seem in place. We use
these headings – the ‘spirit’ and ‘letter’ – to describe the types of lessons we watched,
because they have a colloquial resonance which captures the essence of the differences
we observed. In common usage adhering to the spirit implies an underlying principle
which does not allow a simple application of rigid technique. In contrast, sticking to the
letter of a particular rule is likely to lose the underlying spirit it was intended to
embody. (Marshall & Drummond, 2006, p. 137)

They found that teachers working in the spirit of AfL encouraged students to
become more independent and critical learners in contrast to those working in
the letter of AfL, where formative assessment activities were teacher-centred
and teacher-led in order to transmit knowledge and skills. Researchers and
teachers in the Improving Formative Assessment project have found the terms
letter and spirit helpful in characterising formative assessment practices emer-
ging from our research, and we connect them in this chapter to instrumental and
sustainable formative assessment.
This useful distinction between spirit and letter illuminates the ways in which
formative assessment might enable students to go beyond extrinsic success in
meeting targets and, instead, to combine better performance with engagement and good learning habits in order to develop learning autonomy. The distinc-
tion enables a contrast to be drawn with techniques based on a teacher-centred,
transmission view of knowledge and learning and which encourage compliant,
narrow responses. However, the spirit and letter are not neatly separated:
teachers in this project often had a particular goal and focus of attention
in mind, but shifted between these and others during a lesson (Marshall &
Drummond, 2006). The same phenomenon is also apparent amongst vocational
and adult education teachers (see Derrick & Gawn, 2007).

Motivation

Consideration of the spirit and letter of formative assessment suggests that a more nuanced understanding of the forms of motivation and autonomy that
assessment promotes is also important if we are to understand the effects of
formative assessment on learning.
In order to go beyond an old and somewhat unrealistic dichotomy between
intrinsic and extrinsic motivation, German researchers have undertaken long-
itudinal studies of the ways in which students combine strategic approaches to
learning based on external motivation, with self-determination and personal
agency. This work uses well-known psychological constructs of motivation,
such as students’ and teachers’ attribution of achievement to effort, luck,
ability, the difficulty of a particular task or to other external factors, and the
extent to which students have a sense of agency or locus of control.
The resulting typology offers ‘‘a systematically ordered spectrum of con-
structs’’ that illuminates individual behaviours and activities in different con-
texts whilst recognising that motivation is affected strongly by social factors in a
learning group and by family, peers and work colleagues (Prenzel, Kramer,
& Dreschel, 2001). The typology proved useful and illuminating in a study of
vocational education students’ experiences of, and responses to, assessment
(Ecclestone, 2002). Another study is currently evaluating whether or not the
typology might help teachers understand their students' motivation better (Ecclestone et al., in progress). My descriptions here draw on Prenzel's original
categories and insights gained from these two studies.

Amotivated
Amotivated learners lack any direction for learning, and are, variously, indif-
ferent or apathetic. Sometimes this state is an almost permanent response to
formal education or assessment and therefore hard to shift, or it appears at
points during a course. There is a sense that amotivated learners are drifting or
hanging on until something better appears. However, it is important to recog-
nise the obvious point that, for all of us at different times, our deepest, most
intrinsic motivation can revert to states where we are barely motivated or not
motivated at all! In this context, surviving the pressure of targets or trying to achieve something difficult requires the reward or punishment of external
motivation.

External
Learning takes place largely in association with reinforcement, reward, or to
avoid threat or punishment, including: short-term targets, prescriptive out-
comes and criteria, frequent feedback (this might be about the task, the person’s
ego or feelings or the overall goal) and reviews of progress, deadlines, sanctions
and grades. In post-compulsory education, external motivation sometimes
takes the form of financial incentives (payment to attend classes or rewards
for getting a particular grade) or sanctions (money deducted for non-attendance
on courses).
External motives are sometimes essential at the beginning of a course, or at
low points during it. They are not, therefore, negative and can be used strate-
gically as a springboard for other forms of motivation or to get people through
difficult times in their learning. Yet, if left unchecked, external motives can
dominate learning all the way through a course and lead to instrumental
compliance rather than deep engagement.

Introjected/internalised
Introjected is a term from therapy describing a state in which someone has internalised an external
supportive structure and can articulate it as her or his own: in a qualification,
this might comprise the vocabulary and procedures of criteria, targets
and overall assessment requirements. Good specifications of grade criteria
and learning outcomes and having processes or tasks broken into small
steps enable learners to use the official specifications independently of tea-
chers. Nevertheless, although introjected motivation enables students to
articulate the official requirements and criteria almost by rote, it is not self-
determined.
For learners disaffected by assessment in the past, introjected motivation is
powerful and, initially, empowering. However, like external motivation, it can
become a straitjacket by restricting learners and teachers to prioritising the formal requirements, especially in contexts where contact time and resources are limited.

Identified
Learning occurs when students accept content or activities that may hold no
incentive in terms of processes or content (they might even see them as a burden)
but which are necessary for attaining a pre-defined goal such as a qualification and its short-term targets. It links closely to introjected motivation, and the goals that students identify with can be course-related, personal or social, or all three, as a means to a desirable end.

Intrinsic
Learners perceive any incentives as intrinsic to the content or processes of a
formal learning or assessment activity, such as enjoyment of learning something
for its own sake, helping someone else towards mastery or being committed to
others outside or inside the learning group. It is often more prevalent amongst
learners than teachers might assume and takes idiosyncratic, deeply personal
and sometimes fleeting forms. Intrinsic motivation is context-specific: someone
can therefore show high levels of intrinsic motivation in one task or context and
not in another. Learning is highly self-determined and independent of external
contingencies.

Interested
Interested motivation is characterised by learners recognising the intrinsic value
of particular activities and goals and then assigning their own subjective criteria
for what makes something important: these include introjected and identified
motives. Like intrinsic motivation, it relies on students assigning deeply perso-
nal meanings of relevance to content, activities and contexts. It is accompanied
by feelings of curiosity and perhaps risk or challenge, and encouraged by a
sense of flow, connection or continuity between different elements of a task or
situation.
High levels of self-determination, a positive identity or sense of self and the
ability to attribute achievement to factors within one’s own control are inte-
grated in a self-image associated with being a successful learner. Interested
motivation relates closely to Maslow’s well-known notion of self-actualisation,
where identity, learning activities, feelings of social and civic responsibility and
personal development are fused together. It is therefore often correlated with
good peer and social dynamics in a learning group.
Interested motivation characterised ideas about learning and personal
identity amongst some young people in Ball, Maguire and Macrae’s study of
transitions into post-compulsory education, training or work. The processes
and experiences of becoming somebody and of having an imagined future were
not only rooted strongly in formal education but also in their sense of having
positive opportunities in the local labour market and in their social lives.
Formal education therefore played an important and positive, although not dominant, part in their evolving personal identity (Ball, Maguire, & Macrae,
2000).
Despite the descriptive and analytical appeal of these types, it is, of course,
crucial to bear in mind that they are not stable or neatly separated categories.
Nor can motivation be isolated from structural factors such as class, gender and
race, opportunities for work and education, and students’ and teachers’ percep-
tions of these factors. In formal education, students might combine aspects of
more than one type, they might change from day to day, and they might show
interested motivation strongly in one context (such as a hobby) but not at all in
the activities required of them at school, college or university.
The factors that make someone an interested or barely motivated learner are
therefore idiosyncratic, very personal and changeable: the most motivated,
enthusiastic young person can show high levels of interested motivation in
year one of a two year vocational course but be barely hanging on through
external and introjected motivation by the end of year two, simply because they
are tired of the course and want to move on. Some need the incentives of
external rewards and sanctions and to internalise the official demands and
support structures as a springboard to develop deeper forms of motivation
(see Davies & Ecclestone, 2007; Ecclestone, 2002).
Nevertheless, these caveats do not detract from a strong empirical connection, shown in Prenzel's studies and my own, between intrinsic and interested motivation (based on high levels of self-determination) and positive evidence of the conditions listed below, and, conversely, between amotivation or external motivation and poor evidence of these conditions:
• support for students' autonomy, e.g., choices for self-determined discovery, planning and acting;
• support for competence, e.g., effective feedback about knowledge and skills in particular tasks and how to improve them;
• social relations, e.g., cooperative working, a relaxed and friendly working atmosphere;
• relevance of content, e.g., applicability of content, proximity to reality, connections to other subjects (here it is important to note that ''relevance'' is not limited to personal relevance or application to everyday life);
• quality of teaching and assessment, e.g., situated in authentic, meaningful problem contexts, adapted to students' starting points; and
• teachers' interest, e.g., expression of commitment to students.

A Cultural Understanding of Assessment

The Concept of Learning Culture


It is not sufficient simply to promote agreed meanings and principles of
formative assessment or motivation and to improve teachers’ formative tech-
niques. Instead, we need to know more about the ways in which the idiosyncratic features of local ''learning cultures'', and the political and cultural conditions within which they operate, affect whether formative assessment influences motivation positively or negatively. The concept of ''learning culture'' enables us to analyse how
formative assessment practices help to foster students’ deep engagement with
learning in some contexts and their instrumental compliance with assessment
targets in others. It was developed in the Transforming Learning Cultures in
Further Education (TLC) project which drew on the well-known work of Pierre
Bourdieu to define it as:
a particular way to understand a learning site1 as a practice constituted by the actions, dispositions and interpretations of the participants. This is not a one way process.
Cultures are (re)produced by individuals, just as much as individuals are (re)produced
by cultures, though individuals are differently positioned with regard to shaping and
changing a culture – in other words, differences in power are always at issue too.
Cultures, then, are both structured and structuring, and individuals’ actions are neither
totally determined by the confines of a learning culture, nor are they totally free.
(James & Biesta, 2007, p. 18)

A learning culture is therefore not the same as a list of features that comprise a
course or programme, nor is it a list of factors that affect what goes on in a
learning programme; rather, it is a particular way of understanding the effects
of any course or programme by emphasising the significance of the interactions
and practices that take place within and through it. These interactions and
practices are part of a dynamic, iterative process in which participants (and
environments) shape cultures at the same time as cultures shape participants.
Learning cultures are therefore relational and their participants go beyond the
students and teachers to include parents, college managers at various levels,
policy makers and national awarding bodies.
Learning culture is not, then, synonymous with learning environment since the
environment is only part of the learning culture:
a learning culture should not be understood as the context or environment within
which learning takes place. Rather, ‘learning culture’ stands for the social practices
through which people learn. A cultural understanding of learning implies, in other
words, that learning is not simply occurring in a cultural context, but is itself to be
understood as a cultural practice. (James & Biesta, 2007, p. 18, original emphases)

Instead, the TLC project shows that learning cultures are characterised by the
interactions between a number of dimensions:
• the positions, dispositions and actions of the students;
• the positions, dispositions and actions of the tutors;
• the location and resources of the learning site, which are not neutral, but enable some approaches and attitudes, and constrain or prevent others;
• the syllabus or course specification, the assessment and qualification specifications;
• the time tutors and students spend together, their interrelationships, and the range of other learning sites students are engaged with;
• issues of college management and procedures, together with funding and inspection body procedures and regulations, and government policy;
• wider vocational and academic cultures, of which any learning site is part; and
• wider social and cultural values and practices, for example around issues of social class, gender and ethnicity, the nature of employment opportunities, social and family life, and the perceived status of Further Education as a sector.

1 In the TLC project, the term 'learning site' was used, rather than 'course', to denote more than classroom learning. In the IFA project we use the more usual terms 'course' and 'programme'.

Learning culture is intrinsically linked to a cultural theory of learning ‘‘[that]
aims to understand how people learn through their participation in learning
cultures, [and] we see learning cultures themselves as the practices through
which people learn’’ (James & Biesta, 2007, p. 26).
A cultural understanding illuminates the subtle ways in which students and
teachers act upon the learning and assessment opportunities they encounter and
the assessment systems they participate in. Teachers, students, institutional man-
agers, inspectors and awarding bodies all have implicit and explicit values and
beliefs about the purposes of a course or qualification, together with certain
expectations of students’ abilities and motivation. Such expectations are explicit
or implicit and can be realistic or inaccurate. Other influential factors in learning
cultures are the nature of relationships with other students and teachers, their
lives outside college and the resources available to them during the course (such
as class contact time): all these make students active agents in shaping expecta-
tions and practices in relation to the formal demands of a qualification (see, for
example, Ecclestone, 2002; Torrance et al., 2005).

The Learning Culture of the Advanced Vocational Certificate of Education (AVCE) Science

Data and analysis in this section are taken from a paper by Davies and
Ecclestone (2007). The student group at Moorview College, a school in a
rural area of south west England, comprised 16 Year 13 students aged 17–18,
with roughly equal numbers of boys and girls, and three teachers (teaching
the physics, chemistry and biology elements of the course respectively).
The learning culture was marked by a high level of synergy and expansiveness,
of which teachers’ formative assessment practices were a part. Teachers
regarded formative assessment as integral to good learning, ‘‘part of what we
do’’ rather than separate practices, where formative assessment was a subtle
combination of helping students to gain realistic grades, alongside developing
their enthusiasm for and knowledge of scientific principles and issues, and their
skills of self-assessment. It encouraged them to become more independent and
self-critical learners. Strong cohesion between teachers’ expectations, attitudes
to their subject and aspirations for their students was highly significant in the
way they practised formative assessment.
Synergy stemmed from close convergence between teachers and students
regarding both expectations and dispositions to learning. The AVCE teachers
expected students to achieve, while accepting that they did not usually arrive on
the course with such high GCSE grades as those in A-level science subjects.
There was also a general consensus between teachers and students that science
was intrinsically interesting as well as practically relevant. The teachers were confident and enthusiastic about their subject areas, teaching subjects that
were part of the accepted academic canon translated into a vocational syllabus.
Most students saw the course as a positive choice, although for others it was
a second choice when they failed to achieve high enough grades to take a single-
subject science A-level. However, once on the course, there was a high level of
enthusiasm and commitment and a desire to study science for its own sake rather than for any vocational relevance. Most did not initially aim for higher education but
soon became motivated to apply to study in a vocational branch of science (such
as forensic science), or a different field (such as architectural technology) where
the vocational A-level grades would help them achieve the necessary points
score. Their relationship with the course enabled some to broaden their hor-
izons for action considerably during Year 13, such horizons being the arena
within which actions can be taken and decisions made.
There was, therefore, an influential ethos amongst students and teachers of
progression in a clear route to something desirable and interesting. This rein-
forced the strong subject culture, applied approvingly by teachers and students
to real vocational and life contexts. Although grades were crucial, formative
assessment was far from being predominantly grade-focused, despite the strong
institutional ethos.
A powerful factor in the learning culture was the beliefs and commitment of
the main teacher whose assessment practices we focused on in a project designed
to improve formative assessment practices in vocational education and adult
literacy and numeracy programmes (Ecclestone et al., in progress). Derek
Armstrong insisted that he would not simply teach to the test; instead, he
emphasised constantly the importance of developing students’ understanding
of the value of scientific knowledge and their ability to become more indepen-
dent learners as an essential preparation for university. He believed strongly
that an advantage of the vocational over the academic A-level was that it taught
students to be ‘‘in charge of their own learning’’ (1st interview). This involved his
belief that students should acquire self-knowledge in order to be able to learn
effectively, including knowing when to ask for help: ‘‘I think students must
know how good they are, and know what their limitations are’’ (1st interview).
His teaching and assessment practice encouraged dispositions that could lead to
deeper learning, rather than success in meeting targets:
I’m a lot more comfortable with saying, ‘‘You’re actually getting a grade that is much
more appropriate to what you’ve done, rather than one which we could have forced you
to get, by making you do exactly what we know needs to be done’’, which obviously we
know happens more and more in education because it’s all results driven. (Derek, 2nd
interview)

Indeed, he was not prepared to compromise to meet a target-driven educa-


tional culture:
There’s no point in jumping through hoops for the sake of jumping through hoops and
there’s no point in getting grades for the sake of getting grades. I know that’s not the
answer, because the answer is – no, we should be getting them to get grades. But that’s
never as I’ve seen it and it never will be. (Derek, 3rd interview)

Derek espoused a theory of learning that encouraged him and students to construct their knowledge and understanding together, by working actively to understand mistakes, learn from them and build new insights. Class-
room observations and student interviews revealed that this espoused theory
was also his theory-in-use (Argyris & Schon, 1971). He routinely asked students
to explain a point to the rest of the group rather than always doing this himself.
Although students did not conceptualise their learning in exactly the same way,
and placed a higher premium on their grades, they also showed appreciation of
the way their understanding and appreciation of science was developing:
Some of the teachers teach you the subject and some of the teachers just help you learn
it. Mr Armstrong will help you learn it and understand it. (Nick, student, 1st interview)

Despite its collaborative nature, this learning culture was rooted in a strong
belief on both sides that the teacher is the most crucial factor in learning:
‘‘I believe they all know they can’t do it without me’’ (Derek, 3rd interview).
When asked what motivated his students on the AVCE, Derek’s answer was
unequivocal: ‘‘I’m going to put me, me, me, me’’ (1st interview).

Motivation in AVCE Science


Drawing on Prenzel’s typology of motivation, our study showed that expecta-
tions of positive achievement for all students interacted with, and were also
shaped by, expectations of students’ motivation. Teachers showed high levels of
intrinsic motivation, where engagement with topics and ideas was rooted in
their intrinsic value rather than for external reward (such as grades) and also
interested motivation, where a sense of personal and learning identity is bound
up with the subject, its activities and possibilities. They expected students to
develop intrinsic and interested motivation too. Students wanted a qualification
in order to achieve their individual goals (external motivation) but the goals
stemmed from interest in the course/science and their sense of becoming
somebody in a subject with progression and future possibilities (intrinsic and
interested motivation).
Students’ motivation appeared to stem from a symbiotic relationship
between their teachers’ expertise and enthusiasm, the supportive group
dynamics, the focus on collaborative learning, and their own vocational
goals. This did, however, fluctuate with individuals and over time.
While the learning culture of AVCE Science was characterised by a high level
of synergy, it was also reasonably expansive. Despite the constraints of the
syllabus and the assessment criteria, teachers took opportunities to promote an
interest in scientific issues and topics, and Derek encouraged students to
develop individual approaches to meet the criteria. Moreover, the vocational
relevance of the AVCE contributed towards the expansive nature of the
learning culture. Although the course was not highly practical and did not
include work placements, it did include relevant trips and experimental work.
It was vocational above all, though, in the way the teachers related the knowl-
edge they taught to real-life experience. As Derek summed up:
I think the real life concepts that we try and pull out in everything works very well.
I think we’re incredibly fortunate to have time to teach learning, as opposed to time to
teach content. (1st interview, original emphases)

Students had generally begun the course expecting the work to be reasonably
easy and therefore restrictive and ‘‘safe’’, rather than challenging. In fact, the
teachers’ pedagogy and formative assessment enabled students to accept chal-
lenge and risk, not in what they were learning, but in how they were learning.
Our observations and interviews showed that they found themselves explaining
work to their fellow students, joining in Derek’s explanations, negotiating how
they might go about tasks and losing any initial inhibitions about asking
questions of Derek and of one another.

The Relationship Between the Learning Culture and Formative Assessment


In theory, there was potential for negative tension between Moorview's target-driven, achievement-orientated ethos and the AVCE Science teachers' commit-
of a continuum of teaching and assessment techniques deeply embedded in their
day-to-day practice. Derek used formative assessment to help students construct
their knowledge, rather than solely to achieve targets. As he put it, ‘‘I can teach
them to enjoy the science – but I wouldn’t call that formative assessment’’
(2nd workshop). His declaration, ‘‘My primary concern has never been their
final grade’’ (3rd interview), should not, though, be taken to imply a cavalier
attitude towards helping students to gain reasonable grades. All his students
achieved high enough grades in the AVCE to gain their choice of university place.
Students’ willingness to admit to misunderstandings or half-understandings,
and for teachers to diagnose these, is crucial to effective formative assessment
(Black, 2007). The learning culture encouraged students to became involved in
peer- and self-assessment in different ways and to view a problem as an inter-
esting issue to be explored with one another and with Derek, rather than as an
indicator of their lack of ability. High levels of synergy and expansiveness were
reflected in Derek’s formative assessment practices which he sees simply as part
of teaching and learning. As he put it, ‘‘I know no jargon’’ (1st interview).
Critical and positive feedback was also integral to his teaching:
I don’t think there is any point in scribbling on a piece of paper, ‘‘This isn’t done right.
This is how it should be done’’. I think you’ve actually got to go through and do it
with them. They’ve got to know where the issues and the problems are for themselves.
(1st interview)

His approach to formative assessment might, in other learning cultures, have been a technique used in the letter of formative assessment. However, he refused
to give students a ‘‘check list’’ to use with the criteria ‘‘because that’s not
preparing, that’s not what the course is about, and they should be working
more independently’’ (2nd interview). Instead, he used the technique in the spirit
of formative assessment, to develop students’ deeper understanding both of the
coursework subject matter and of their own ability to be self-critical of their
assignment. He wanted to encourage them to be able to say, ''well, it's not
actually as difficult as you think it is’’ (2nd interview).

The Learning Culture of Advanced Vocational Business

Data and analysis here are drawn from a different project that explored students' and teachers' attitudes to assessment and observed formative assessment practices (Torrance et al., 2005). I focus here on students
in a further education college, Western Counties, where students progressed
from their local school to the college. The group comprised eleven students,
three boys and eight girls, all of whom had done Intermediate Level Business at
school and who were therefore continuing a vocational track. Students in this
group saw clear differences between academic and vocational qualifications:
academic ones were ‘‘higher status, more well-known’’. Yet, despite the lower
status of the vocational course, students valued its relevance.
Learning and achievement were synonymous and assessment became the
delivery of achievement, mirroring the language of inspection reports and policy
texts. Vocational tutors reconciled official targets for delivery with educational
goals and concerns about students’ prospects. They saw achievement as largely
about growing confidence and ability to overcome previous fears and failures.
Personal development was therefore more important than the acquisition of
skills or subject knowledge: this comment was typical:
[students] develop such a lot in the two years that we have them... the course gives
them an overall understanding of business studies really, it develops their under-
standing and develops them as people, and hopefully sets them up for employment.
It doesn’t train them for a job; it’s much more to develop the student than the content...
that’s my personal view, anyway.... (Vocational course leader, quoted in Torrance
et al., 2005, p. 43)

Some, but not all, students had a strong sense of identity as second chance
learners, an image that their tutors also empathised with from their own
educational experience and often used as a label. This led to particular ideas
about what students liked and wanted. One tutor expressed the widely held view
that ‘‘good assessment’’ comprised practical activities, work-experience and
field trips: ‘‘all the things these kids love... to move away from all this written
assessment’’. For her, assessment should reflect ‘‘the way that [vocational]
students prefer to learn ... they are often less secure and enjoy being part of
one group with a small team of staff...[assessment is] more supported, it’s to do
with comfort zones – being in a more protected environment’’. Beliefs about
‘‘comfort zones’’ and ‘‘protecting’’ students meant that teachers aimed to mini-
mise assessment stress or pressure. Teachers and students liked working in a
lively, relaxed atmosphere that combined group work, teacher input, time to
work on assignments individually or in small friendship-based groups, and
feedback to the whole group about completed assignments.

Motivation

There was a high level of synergy between students’ goals and the official policy
goals their teachers had to aim for, namely to raise attainment of grades,
maintain retention on courses and encourage progression to formal education
at the next level.
Three students were high achievers, gaining peer status from consistently
good work, conscientious studying and high grades. They referred to them-
selves in the same language as teachers, as ‘‘strong A-grade students’’,
approaching all their assignments with confidence and certainty and unable
to imagine getting low grades. They combined identified and interested motiva-
tion from the typology discussed above. Crucial to their positive identity as
successful students was that less confident or lower achieving students saw them
as a source of help and expertise. In an earlier study of a similar advanced level
business group in a further education college, successful students drew upon
this contrast to create a new, successful learning identity (see Ecclestone, 2004).
The majority of students combined the introjected motivation of being
familiar with the detail of the assessment specifications and the identified
motivation of working towards external targets, namely achieving the qualifi-
cation. They worked strategically in a comfort zone, adopting a different grade
identity from the high achievers. Most did not aim for A-grades but for Cs or
perhaps Bs. They were unconcerned about outright failure, since, as teachers
and students knew, not submitting work was the only cause of failure.
Goals for retention and achievement, together with teachers’ goals for
personal development and students’ desire to work in a conducive atmosphere
without too much pressure, encouraged external, introjected and identified
motivation and much-valued procedural autonomy. This enabled students to
use the specifications to aim for acceptable levels of achievement, with freedom
to hunt and gather information to meet the criteria, to escape from ‘‘boring’’
classrooms and to work without supervision in friendship groups (see also
Bates, 1998). Synergy between teachers and students in relation to these features
of the learning culture encouraged comfortable, safe goals below students’
potential capacity, together with instrumental compliance. The official specifi-
cations enabled students to put pressure on teachers to ‘‘cover’’ only relevant
knowledge to pass the assignment, or to pressurise teachers in difficult subjects
to ‘‘make it easy’’.

The Relationship Between the Learning Culture and Formative Assessment


Meeting the summative requirements dominated formative activities completely.
Teachers saw their main assessment role as a translator of official criteria,
breaking up the strongly framed assignment briefs into sequential tasks to
meet each criterion. Students prepared their assignments, working to copies
of the official criteria specified for grades in each unit. Posters of criteria and
grades gained by each student for each assignment were displayed on classroom
walls.
Formative assessment for both students and teachers focused on raising
grade achievement. Students could submit a completed draft for feedback:
this had to reflect a best attempt and they could not submit a half-hearted
version in the hope of feedback to make it pass. There was wide variation in
arrangements for this: in some courses, drafting was done numerous times while
in others, only one opportunity was offered. Lesson time was almost entirely
dominated by assessment and was used to introduce each assignment but also to
talk through the outcomes of draft assignments, outlined here by one of the
college tutors:
I talk through the assessment criteria grid with them and the assignment brief,
pinpointing the relationships between P, M and D [pass, merit and distinction] and
that it does evolve through to D. The students like to go for the best grade possible and
discuss how they could go about getting an M. There again, some students just aim for
basic pass and those are the ones who leave everything to the last minute. Then I see a
draft work, read through it, make notes, talk to each one, show the good areas in
relation to the criteria and explain why and how if they have met them, saying things
like ‘you’ve missed out M2’.... some will action it, some won’t. It’s generally giving
them ideas and giving them a platform to achieve the outstanding M or D criteria.
(Torrance et al., 2005, p. 46)

Tutors spent a great deal of time marking draft and final work, starting with
grade criteria from the Es through the Cs to the As (the grade descriptors were
changed from ‘P’, ‘M’, ‘D’ to ‘A’, ‘B’, ‘C’ in 2000). Where students had not done
enough to get an E, teachers offered advice about how to plug and cover gaps,
cross-referenced to the assessment specifications. Students had strong expecta-
tions that teachers would offer advice and guidance to improve their work.
There was a strong sense that assessment feedback and allowing time to do
work on assessed assignments in lessons had replaced older notions of lesson
planning and preparation.
Assignments were meticulously organised and staff and students in both
programmes had internalised the language of ''evidencing the criteria'', ''signposting'', ''cross-referencing'' and generating ''moderatable evidence''. Demands from the awarding body that teachers must generate ''evidence'' that enabled moderation processes to ensure parity and consistency between different centres offering the qualification around the country led to student assignments with very similar formats. Teachers in the Advanced
Business course felt that this made idiosyncratic interpretation or genuine local
design of content impossible. Yet, as we saw in the case of AVCE Science above,
this sterile approach is not inevitable: the learning culture of this college, with its strong conformity to official targets, and teachers' beliefs about students and what the qualification was primarily for, were powerful factors shaping formative assessment.

The Implications of Different Learning Cultures for Attitudes to Learning and Assessment

The learning cultures of the two courses influenced formative assessment practices, just as those formative assessment practices simultaneously emerged
from and reinforced the levels of synergy and expansiveness within the learning
cultures. There were major differences between teachers’ overall approaches to
formative assessment.
A socio-cultural analysis of the effects of assessment on motivation, auton-
omy and achievement shows that students and teachers bring particular
dispositions to their teaching and assessment practices. The formal demands
of an assessment system interact with these dispositions and other features of
a learning culture whilst also creating and enabling new dispositions and
practices. The processes involved in navigating a tightly specified, regulated
assessment system socialise students into particular expectations about what
counts as appropriate feedback, help and support and about how easy or
difficult it is to act on feedback in order to get a good grade.
A cultural understanding also illuminates how feedback, internalising the
criteria, self-assessment and using detailed grade descriptors and exemplars can,
in some learning cultures, encourage superficial compliance with atomised tasks
derived from the assessment criteria, bureaucratic forms of self-assessment and
high expectations of coaching and support. In the learning culture of Advanced
Business, teachers used these processes to build confidence for learners they saw
as second chance or fragile. A significant number of students developed proce-
dural autonomy and introjected and identified motivation as part of a new,
successful learning identity. A minority used these as springboards to deeper
engagement and enthusiasm, and a sense of real achievement.
Business students accepted the requirements without complaint or dissent
and learned to navigate the various demands and processes. They liked the
assessment system and were far from passive in it: indeed, their ideas about
enjoyable learning, beliefs about their abilities and acceptable assessment
methods, together with their strategic approach, were as influential as the
official specifications and targets and teachers’ ideas about students’ needs.
Collusion was therefore integral to commitment, compliance and comfort
zones.
In the learning culture of Advanced Science, images of second chance
learners were much less prominent: despite a sense that the students did not
have good enough grades to do the academic alternative to the vocational
course, teachers had high expectations of intrinsic motivation for an interesting
subject. In contrast, there was a compelling sense in the Advanced Business learning culture that students were following pre-determined tracks that
conformed to and confirmed an existing identity as a type of learner. There
were also strong stereotypes about what vocational students expect, need or
want, and can deal with. In many courses, narrow instrumentalism has become
central to those expectations, making assessment in post-compulsory education
not merely for learning or of learning: instead, it is learning:
The clearer the task of how to achieve a grade or award becomes, and the more detailed
the assistance given by tutors, supervisors and assessors, the more likely the candidates
are to succeed; but succeed at what? Transparency of objectives, coupled with extensive
use of coaching and practice to help learners meet them, is in danger of removing
the challenge of learning and reducing the quality and validity of outcomes
achieved ... assessment procedures and practices come completely to dominate
learning experience, and ‘criteria compliance’ comes to replace ‘learning’. (Torrance
et al., 2005, p. 46)

In contrast, the learning culture of AVCE Science was shaped by the qualification
design, the subject enthusiasm of the teachers and a clear sense of vocational
knowledge, together with a system of selection as students progressed from
compulsory schooling that guaranteed a certain level of achievement and
motivation. Selection was not available for the Advanced Business course: in
order to meet recruitment targets, the college took almost every student who
applied. These features, and the practices and expectations of teachers and
students, combined to produce a much more expansive learning culture, includ-
ing the way formative assessment was conceptualised and practised.
Formative assessment practices as part of the learning culture of a vocational
course are, to some extent, linked to the ways in which managers, practitioners,
parents and students perceive such courses. Differences in the learning cultures
of these two courses have also raised the question of what students and educa-
tors expect from a vocational course. Images and expectations of what counted
as vocational, and the kind of status attached to it, were key factors in
shaping the learning culture. Its meaning in Advanced Science stemmed
strongly from the way teachers linked scientific knowledge to real life situations.
In Advanced Business, it seemed to be synonymous with the greater ratio of
coursework to exams and with activities that enabled students to gain the qualifica-
tion without too much difficulty. Vocational courses were generally accepted by
students, their parents and certain teachers as being of lower status than the
academic single subject courses at GCSE or A-level.
These two case studies therefore illuminate the subtle ways in which some
learning cultures foster a predominantly instrumental approach to learning and
formative assessment, while others encourage sustainable forms of formative
assessment. It seems that the high level of synergy and the expansive nature of
the learning culture of AVCE Science both encouraged and encompassed
practices in the spirit of formative assessment. In contrast, the more restrictive
learning culture of Advanced Business encouraged and perpetuated practices
that are essentially in the letter of it. As a result of these differences, formative
assessment in the latter was a straitjacket on the potential for sustainable learning whereas formative assessment in the former can be seen as a spring-
board for it. There is potential for certain practices to become springboards in
Advanced Business, but our analysis of the learning culture suggests that this
will not be easy. Some learning cultures therefore offer more potential for
teachers to use formative assessment as a springboard for sustainable learning
than others.

Implications for Formative Assessment in Higher Education

Learners progressing to higher education in Britain will experience more attempts to demystify assessment, engage students with the criteria and provide
as much transparency as possible: paper after paper at a European conference
on assessment in higher education in 2006 presented these processes unproble-
matically and uncritically as innovative and progressive2. Unwittingly, this
offers an image of learners who cannot or will not cope without these aids.
Rather than engaging with these processes in the spirit of formative assessment, many students simply adapt to them:
I get an A4 piece of paper for each of 5 assessment criteria ... examine the texts and lift
out from the texts and put it in whichever section I feel relevant, that’s how I do it:
you’re getting your five pieces of work, separate pieces of the question and you just
string them all together and structure it. (Male, high achieving BA Sociology of Sport
student quoted by Bloxham & West, 2006, p. 7)

As the analysis here shows, students progressing from learning cultures that have
already socialised them into instrumental attitudes to formative assessment are
likely to expect similar approaches and to resist those that are more challenging.
Yet, instrumentalism is not entirely negative. Instead, it has contradictory
effects: indeed, instrumentalism to achieve meaningful, challenging tasks can-
not be said to be uneducational per se. In a socio-cultural context of serious
concern about disengagement and the professed need to keep young people in
formal education as long as possible, instrumentalism enables some students in
the British education system to achieve when they would not otherwise have
done so and to go beyond their previous levels of skill and insight. It also
encourages others to work in comfortable confines of self-defined expectations
that are below their potential.
These conclusions raise questions about the purpose and effects of formative
assessment used primarily to increase motivation and participation rather than
to develop subject-based knowledge and skills. Assessment systems can privilege
broad or narrow learning outcomes, and external, introjected, identified, intrinsic
or interested forms of motivation. They can also reinforce old images and stereotypes
of learning and assessment or encourage new ones, and therefore offer comfortable,
familiar approaches or risky, challenging ones. However, in the British
education system, socio-political concerns about disengagement from formal
education amongst particular groups have institutionalised formative assessment
practices designed to raise formal levels of achievement rather than to
develop deep engagement with subject knowledge and skills.

2 'Assessment for Excellence': The Third Biennial Joint Northumbria/EARLI SIG Assessment Conference, Darlington, England, August 30 – September 1, 2006.
Vocational students are undoubtedly ‘‘achieving’’ in some learning cultures
but we have to question what they are really ‘‘learning’’. A tendency to reinforce
comfortable, instrumental motivation and procedural autonomy, in a segre-
gated, pre-determined track, raises serious questions about the quality of educa-
tion offered to young people still widely seen as second best, second chance
learners. Nevertheless, as the learning culture of Advanced Vocational Science
shows, these features are not inevitable.

References
Assessment Reform Group. (2002). 10 principles of assessment for learning. Cambridge:
University of Cambridge.
Ball, S.J., David, M., & Reay, D. (2005). Degrees of difference. London: RoutledgeFalmer.
Ball, S.J., Maguire, M., & Macrae, S. (2000). Choices, pathways and transitions post-16: New
youth, new economies in the global city. London: RoutledgeFalmer.
Bates, I. (1998). Resisting empowerment and realising power: An exploration of aspects of
General National Vocational Qualifications. Journal of Education and Work, 11(2),
109–127.
Black, P. (2007, February). The role of feedback in learning. Keynote presentation to
Improving Formative Assessment in Post-Compulsory Education Conference, Univer-
sity of Nottingham, UK.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education,
5(1), 7–74.
Bloxham, S. & West, A. (2006, August–September). Tell me so that I can understand. Paper
presented at European Association for Learning and Instruction Assessment Special
Interest Group Bi-annual Conference, Darlington, England.
Boud, D., & Falchikov, N. (Eds.), (2007). Re-thinking assessment in higher education:
Learning for the long term. London: RoutledgeFalmer.
Davies, J., & Ecclestone, K. (2007, September). Springboard or strait-jacket?: Formative
assessment in vocational education learning cultures. Paper presented at the British Educa-
tional Research Association Annual Conference, Institute of Education, London.
Derrick, J., & Gawn, J. (2007, September). The ‘spirit’ and ‘letter’ of formative assessment in
adult literacy and numeracy programmes. Paper presented at the British Educational
Research Association Annual Conference, Institute of Education, London.
Ecclestone, K. (2002). Learning autonomy in post-compulsory education: The politics and
practice of formative assessment. London: RoutledgeFalmer.
Ecclestone, K. (2004). Learning in a comfort zone: Cultural and social capital in outcome-
based assessment regimes. Assessment in Education, 11(1), 30–47.
Ecclestone, K., Davies, J., Derrick, J., Gawn, J., Lopez, D., Koboutskou, M., & Collins, C.
(in progress). Improving formative assessment in vocational education and adult literacy and
numeracy programmes. Project funded by the Nuffield Foundation/National Research
Centre for Adult Literacy and Numeracy/Quality Improvement Agency. Nottingham:
University of Nottingham. www.brookes.ac.uk/education/research
European Association for Learning and Instruction. (2006, August–September). Assessment
Special Interest Group Bi-annual Conference, Darlington, England.
Gardner, J. (Ed.). (2006). Assessment and learning. London: Sage.
Hargreaves, E. (2005). Assessment for learning: Thinking outside the black box. Cambridge
Journal of Education, 35(2), 213–224.
James, D., & Biesta, G. (2007). Improving learning cultures in further education. London:
Routledge.
Jessup, G. (1991). Outcomes: NVQS and the emerging model of education and training.
London: Falmer Press.
Marshall, B., & Drummond, M. J. (2006). How teachers engage with assessment for learning:
Lessons from the classroom. Research Papers in Education, 21(2), 133–149.
McNair, S. (1995). Outcomes and autonomy. In J. Burke (Ed.), Outcomes, learning and the
curriculum: implications for NVQs, GNVQs and other qualifications. London: Falmer
Press.
Prenzel, M., Kramer, K., & Dreschel, B. (2001). Self-interested and interested learning in
vocational education. In K. Beck (Ed.), Teaching-learning processes in business education.
Boston: Kluwer.
Sadler, R. (1989). Formative assessment and the design of instructional systems. Instructional
Science, 18, 119–144.
Torrance, H., Colley, H., Garratt, D., Jarvis, J., Piper, H., Ecclestone, K., & James, D. (2005).
The impact of different modes of assessment on achievement and progress in the learning and
skills sector. London: Learning and Skills Development Agency.
Unit for Development of Adult and Continuing Education. (1989). Understanding learning
outcomes. Leicester, UK: Unit for Development of Adult and Continuing Education.
Chapter 10
Collaborative and Systemic Assessment of Student Learning: From Principles to Practice

Tim Riordan and Georgine Loacker

Introduction

What can be learned from an institution with more than thirty years' experience in
the teaching and assessment of student learning outcomes? Much of that depends
on what those within it have learned and what they can tell us. This chapter aims
to find out in one particular case. In doing so, we focus on the process of going
from principles to practice, rather than merely on a description of our practice.
Much has been written about the ability-based curriculum and assessment at
Alverno College (Arenson, 2000; Bollag, 2006; Levine, 2006; Mentkowski &
Associates, 2000; Twohey, 2006; Who needs Harvard?, 2006; see also www.alverno.edu),
but this particular chapter emphasizes how certain shared princi-
ples have informed the practice of the Alverno faculty and how the practice of the
faculty has led to emerging principles. Our reasoning is that the principles are
more apt to be something that Alverno faculty can hold in common with faculty
from other institutions, and the process can suggest how an institution can come
to its unique practice from a set of shared principles. The instances of Alverno
practice that we do include are examples of the kind of practice the faculty at
one institution have constructed as expressions of their shared principles.
Since 1973, the faculty of Alverno College have implemented and refined a
curriculum that has at its core the teaching and assessment of explicitly articu-
lated learning outcomes that are grounded in eight core abilities integrated with
disciplinary concepts and methods. In this chapter we explore the shared
principles and common practices that are integral to teaching, learning, and
assessment at Alverno. Indeed, one of the fundamental principles at the college
is that the primary focus of the faculty is on how to teach and assess in ways that
most effectively enhance student learning. This principle has permeated and
shaped the culture of the college and informed the scholarly work of the faculty,
and it serves as the impetus for our ongoing reflection and discussion with
colleagues throughout higher education.

T. Riordan
Alverno College, Milwaukee, WI, USA
e-mail: Tim.Riordan@alverno.edu


We also consider in this chapter what Alverno faculty have learned as they
have worked on their campus and with other higher education institutions to
make assessment a meaningful and vital way to enhance student learning. We
draw from the experience of our faculty, from their work with many institutions
interested in becoming more steadily focused on student learning, and from
research grounded in practice across higher education. We hope that what we
have to say here will be a stimulus for further reflection on issues we all face as
educators in the service of student learning.
Although, as our title implies, we emphasize in this chapter the collaborative
and systemic dimensions of assessment, some important shifts in the thinking of
Alverno faculty happened more in fits and starts than in an immediate, shared
sense of direction; so we begin this introduction with a couple of ideas that
emerged both in individual reflection by Alverno faculty and in larger discussions.
A key insight many of us gradually came to was that we had been teaching as
scholars rather than as scholar teachers: we had the conviction that, if we did
well in conveying our scholarship, learning was the student’s job. We did care
whether the students were learning, though, and many of us were beginning to
raise questions because of that. Then we began to learn that our caring had to
take on more responsibility. Gradually we came to learn what that responsi-
bility is. We knew we needed to integrate our scholarship and our teaching, and
consequently our assessment, at the level of the individual student as well as of
the program and institution. Our ideas about assessment did not come to us
fully-formed; we simply kept asking ourselves questions about what would be
necessary to put student learning at the center of our scholarly work. We have
since come to learn much about what that entails.
For a start, we decided we were responsible for deciding and expressing what
we thought our students should learn. We made that a foundational assumption
which turned into an abiding first principle.

It Is Important to Clearly Articulate What We Require Our Students to Learn

The need to clearly and publicly articulate learning outcomes was and remains a
critical first principle for all of us at Alverno. Doing so provides a common
framework and language for faculty in developing learning and assessment experi-
ences. It helps students internalize what they are learning because they are more
conscious of the learning that is expected.
The process of articulating learning outcomes can begin in many ways. For
us, it grew out of faculty discussions in response to questions posed by the
president of the college. More than thirty years ago, we engaged in the process
of articulating student learning outcomes when the president challenged us to
consider what ‘‘students could not afford to miss’’ in our respective fields of
study and to make public our answers to one another. Although we entered
those discussions confident that we knew what we wanted our students to learn,
we sometimes struggled to be as clear as we hoped. That is when we began to
realize that if we found it hard to articulate what students should learn, it would
be much harder still for students to figure it out for themselves. We also discovered in those
discussions that, although each discipline emphasizes its unique concepts and
methods, there are some learning outcomes that seem to be common across all
fields and that faculty want students to learn no matter what their major area of
study is. The faculty agreed that all students would be required to demonstrate
eight abilities that emerged from the learning outcomes discussions. Since that
time, we have taught and assessed for the following abilities in the contexts of
study in the disciplines or in interdisciplinary areas, both in the general educa-
tion curriculum and the majors:
 Communication
 Analysis
 Problem Solving
 Valuing in Decision-Making
 Social Interaction
 Developing a Global Perspective
 Effective Citizenship
 Aesthetic Engagement.
The faculty have revised and refined the meaning of the abilities many times
over the years, but our commitment to the teaching and assessment of the
abilities in the context of disciplinary study has remained steadfast. In order
to graduate from the college, all students must demonstrate the abilities as part
of their curriculum, so faculty design of teaching and assessment processes is
informed by that requirement. We have in the course of our practice learned
more about the nature of what is involved in this kind of learning.
For instance, we were initially thinking of these abilities in terms of what we
would give the students; however, the more we thought about the abilities we
had identified and about our disciplines and the more we attended to the
research on the meaning of understanding, the more we realized what a student
would have to do to develop the abilities inherent in our disciplines. We moved
from questions like, ‘‘What is important for students to remember or know?’’ to
‘‘Are they more likely to remember or know something if they do something
with it?’’ This led us to another key principle at the foundation of our practice.

Education Goes Beyond Knowing to Being Able to Use What One Knows
We, along with other educational researchers, have seen that, for students, using
what they know moves their knowledge to understanding and that observing how
students use what they know or understand is usually the most effective way to
assess what they have learned. We are also committed to the notion that students
who graduate from our college should be able to apply their learning in the variety
of contexts they will face in life, whether it be to solve a life problem or to
appreciate a work of art. In all of these senses, we believe that education involves
not only knowing but also being able to use or do what one knows.
This principle has been confirmed not only by our own practice at Alverno
but also by scholars like Gardner (1999) and Wiggins & McTighe (2005) who
have demonstrated through their research that understanding is further devel-
oped and usually best determined in performance. They argue that students
grow in their understanding as they use what they have studied, and that
students’ understanding is most evident when they are required to use or
apply a theory or concept, preferably in a context that most closely resembles
situations they are likely to face in the future. For example, having engineering
students solve actual engineering problems using the theories they have studied
is usually a more effective way of assessing and improving their learning than
merely asking them to take a test on those theories. This principle has been
important to us from the start, and its implications for our practice continue to
emerge.
The nature of the eight abilities themselves reflects an emphasis on how
students are able to think and what they are able to do, so design of teaching
and assessment in the curriculum has increasingly been more about engaging
students in the practice of the disciplines they are studying, not just learning
about the subject by listening to the teacher. In most class sessions Alverno
students are, for example, engaging in communicating, interacting with one
another, doing active analysis based on assignments, arguing for a position,
exploring different perspectives on issues, or engaging in other processes that
entail practice of the abilities in the context of the disciplines. In fact, we often
make the point that design of teaching has much more to do with what the
students will do than what the teacher will say or do. Even our physical facilities
have reflected our growing commitment to this kind of learning. Our class-
rooms are furnished not with rows of desks, but tables with chairs, to create a
more interactive learning environment, one in which students are engaged in the
analytic processes they are studying as opposed to one in which they are
watching the teacher do the analysis for them. This is not to suggest that there
is no room for input and guidance from the teacher, but, in the end, we have
found that it is the student’s ability to practice the discipline that matters.
In the same spirit, our assessment design is guided by the principle that
students improve and demonstrate their understanding most reliably when they
are using it, and that what they do with their understanding is what will make
them most effective in the variety of contexts they will face in the future.
There has always been an emphasis on practice of a field in the professional
areas like nursing and education. Students must demonstrate their under-
standing and ability by what they can do in clinicals or in student teaching.
We take the same approach to assessment in disciplines across the curriculum.
Students in philosophy are required to give presentations and engage in
group interactions in which they use theories they have studied to develop a
philosophical argument on a significant issue; English students draw from their
work in literary criticism to write an editorial in which they address a simulated
controversy over censorship of a novel in a school district; chemistry students
develop their scientific understanding by presenting a panel on dangers to the
environment.
In all of these instances faculty have designed assessments that require
students to demonstrate understanding and capability in performance that
involves using what they have learned. But what makes for effective assessment
design in addition to the emphasis on engaging students in using what they have
studied? Essential to effective assessment of student learning, we have found, is
the use of performance criteria to provide feedback to students – another
critical principle guiding our educational practice.

Feedback Based on Criteria Enhances Student Learning

As the authors throughout this book insist, assessment should not only evaluate
student learning but enhance it as well. Our own practice has taught us that
effective and timely feedback to students on their performance is critical to making
assessment an opportunity for improvement in their learning. In addition, we have
learned how important specific criteria are as a basis for feedback – criteria made
known to the student when an assignment or assessment is given.
A question many ask of Alverno faculty is, ‘‘What has been the greatest
change for you as a result of your practice, as teachers, in your ability-based
curriculum?’’ Most of us point to the amount of time and attention we give to
feedback. We now realize that merely evaluating the work of students by
documenting it with a letter or number does little to inform students of what
they have learned and what they need to do to improve. How many of us as
students ourselves received grades on an assignment or test with very little clue
as to what we did well or not? We see giving timely and meaningful feedback as
essential to teaching and assessment so that our students do not have a similar
clueless experience. In fact, our students have come to expect such feedback
from us and are quick to call us to task if we do not provide it.
The challenge, we have learned, is to develop the most meaningful and
productive ways of giving feedback. Sometimes a few words to a student in a
conversation can be enough; other times extensive written feedback is neces-
sary. There are times when giving feedback to an entire class of students is
important in order to point out patterns for improvement. Increasingly the
question of how to communicate with students about their performance from a
distance is on our minds.
Perhaps most important in giving feedback, however, is being able to use
criteria that give students a picture of what their learning should look like in
performance. This is a principle we have come to value even more over time.
Although we could say with confidence from the start that we wanted our
students to be able to ‘‘communicate effectively,’’ to ‘‘analyze effectively,’’ to
‘‘problem-solve effectively,’’ we also knew that such broad statements would
not be of much help to students in their learning. Therefore we have given more
attention as faculty to developing and using criteria that make our expectations
more explicit. It is helpful to students, for example, when they understand that
effective analysis requires them to ‘‘use evidence to support their conclusions,’’
and effective problem-solving includes the ability to ‘‘use data to test a hypoth-
esis.’’ The process of breaking open general learning outcomes into specific
performance components provides a language faculty can use to give feedback
to students about their progress in relation to the broader learning outcomes.
As we have emphasized, students learn the abilities in our curriculum in the
context of study in the disciplines, and this has important implications for
developing criteria as well. Criteria must not only be more specific indicators
of learning, but must also reflect an integration of abilities with the disciplines.
In other words, we have found that another significant dimension of developing
criteria for assessment and feedback is integrating the abilities of our curricu-
lum with the disciplinary contexts in which students are learning them. This
disciplinary dimension involves, in this sense, a different form of specificity. In a
literature course the generic ‘‘using evidence to support conclusions’’ becomes
‘‘using examples from the text to support interpretations’’; while in psychology
the use of evidence might be ‘‘uses behavioral observations to diagnose a
disorder’’. It has been our experience that giving feedback that integrates
more generic language with the language of our disciplines assists students to
internalize the learning outcomes of the curriculum in general and to appreciate
both the uniqueness of, and the similarities across, disciplinary boundaries. The following exam-
ple illustrates how our faculty create criteria informed by the disciplinary
context and how this enhances their ability to provide helpful feedback.
Two of the criteria for an assigned analysis of Macbeth’s motivation in
Shakespeare’s play might be ‘‘Identifies potentially credible motives for Mac-
beth’’ and ‘‘Provides evidence that makes the identified motives credible.’’ These
criteria can enable an instructor, when giving feedback, to affirm a good
performance by a student in an assessment and still be precise about limitations
that a student should be aware of in order to improve future performance: ‘‘You
did an excellent job of providing a variety of possible credible motives for
Macbeth with evidence that makes them credible. The only one I’d seriously
question is ‘‘remorse’’. Do you think that Macbeth’s guilt and fear moved into
remorse? Where, in your evidence of his seeing Banquo’s ghost and finally
confronting death, do you see him being sorry for what he did?’’ In such a
case, the instructor lets the student know that the evidence she provided for
some of the motives made them credible, and she realizes that she has demon-
strated effective analysis. The question about ‘‘remorse’’ gives her an opportu-
nity to examine what she had in mind in that case and either discover where her
thinking went askew or find what she considers sufficient evidence to open a
conversation about it with the instructor.
We have tried to emphasize here how vital using criteria to give instructor
feedback is to student learning as part of the assessment process, but just as
critical is the students’ ability to assess their own performance. How can they
best learn to self assess? First, we have found, through practice with criteria; then
through our modeling, in our own feedback to them, of what it means to judge on the basis
of criteria and to plan for continued improvement of learning. After all, the
teacher will not be there to give feedback forever; students must be able to
determine for themselves what they do well and what they need to improve.
Criteria are important, then, not only for effective instructor feedback but also
as the basis for student self assessment.

Self Assessment Is Integral to Learning

In our work with students and one another we have been continually reminded of
the paradox in education that the most effective teaching eventually makes the
teacher unnecessary. Put another way, students will succeed to the extent that they
become independent life-long learners who have learned from us but no longer
depend on us to learn. We have found that a key element in helping students
develop as independent learners is to actively engage them in self assessment
throughout their studies.
Self assessment is such a critical aspect of our educational approach that
some have asked why we have not made it the ninth ability in our curriculum.
The fact that we have not included self assessment as a ninth ability required for
graduation reflects its uniqueness: it underlies the learning and assess-
ment of all of the abilities. Our belief in its importance has prompted us to make
self assessment an essential part of the assessment process to be done regularly
and systematically. We include self assessment in the design of our courses and
work with one another to develop new and more effective ways of engaging
students in self assessment processes. We also record key performances of self
assessment in a diagnostic digital portfolio where students can trace their own
development of it along with their development of the eight abilities.
We have learned from our students that their practice of self assessment
teaches them its importance. Our research shows that beginning students ‘‘make
judgments on their own behavior when someone else points out concrete
evidence to them’’ and that they ‘‘expect the teacher to take the initiative in
recognizing their problems’’ (Alverno College Assessment Committee/Office of
Educational Research and Evaluation, 1982). Then, gradually, in the midst of
constant practice, they have an aha experience. Of what sort? Of having judged
accurately, of realizing that they understand the criteria and can use them
meaningfully, of trusting their own judgment enough to free themselves from
complete dependence on the teacher.
The words of one student in a panel responding to the questions of a group of
educators provide an example of such an experience of self assessment. The
student said, ‘‘I think that’s one of the big things that self assessment gives you, a
lot of confidence. When you know that you have the evidence for what you’ve
done, it makes you feel confident about what you can and can’t do, not just
looking at what somebody else says about what you did, but looking at that and
seeing why or why not and back up what you said about what you did. It’s really
confidence-building.’’
Implicit in all that we have said here about elements of the assessment
process, including the importance of feedback and self assessment, is the vision
of learning as a process of development. We emphasize with students the idea
that they are developing as learners, and that this process never ends; but what
else have we done to build on this principle of development in our practice?

Learning Is a Developmental Process

Faculty recognize that, although students may not progress in a strictly linear
fashion as learners, design of teaching and assessment must take into account the
importance of creating the conditions that assist students to develop in each stage
of their learning. This has led us to approach both curriculum and teaching design
with a decidedly developmental character.
After committing to the eight abilities as learning outcomes in the curricu-
lum, we asked ourselves what each of these abilities would look like at different
stages of a student’s program of study. What would be the difference, for
example, between the kind of analysis we would expect of a student in her
first year as opposed to what we would expect when she is about to graduate? As
a result of these discussions, the faculty articulated six generic developmental
levels for each of the eight abilities. For instance, the levels for the ability of
Analysis are:
Level 1 Observes accurately
Level 2 Draws reasonable inferences from observations
Level 3 Perceives and makes relationships
Level 4 Analyzes structures and relationships
Level 5 Refines understanding of disciplinary frameworks
Level 6 Applies disciplinary frameworks independently.
We would not argue that the levels we have articulated are the final word on
the meaning and stages of analysis or any of the abilities; indeed, we continually
refine and revise their meaning. On the other hand, we recognize the value of
providing a framework and language that faculty and students share in the
learning process. All students are required to demonstrate all eight abilities through
level four in their general education curriculum and to specialize in some of them
at levels five and six, depending on which abilities faculty decide are inherent in,
and thus most important for, their respective majors. For example, nursing majors
and philosophy majors, like all other majors, are required to demonstrate all eight
abilities through level four. At levels five and six, however,
nursing students would specialize in problem solving, social interaction, and
valuing in decision-making while philosophy students would specialize in ana-
lysis, valuing in decision-making, and aesthetic engagement.
We quickly learned in implementing such a developmental approach to the
curriculum that we would need to rely on each other to ensure that students were,
indeed, having the kinds of experiences that fostered the development we envisioned.
From articulating the stages of the curriculum to designing our courses, we needed
to be on the same page and in constant collaboration to make it work for students.
How has this emphasis on development affected the ways in which we design
courses? It has meant that we always ask ourselves as part of the design process
at what stage in our curriculum students will be taking a course. If students are
taking a course in the first year of their program, the learning outcomes in the
course usually reflect levels one and two of the generic institutional abilities in
disciplinary or interdisciplinary contexts. With respect to the ability of analysis,
for example, faculty design courses in the first year that focus on developing the
ability to ‘‘Observe accurately’’ and ‘‘Draw reasonable inferences from observa-
tions’’, no matter what the disciplinary or interdisciplinary context is. Whether
students are studying psychology or sociology, biology or chemistry, philoso-
phy or history, the courses they take in the first year are designed with the
beginning levels of analysis in mind; and courses later in the curriculum are
designed on the basis of more sophisticated analytic abilities, as reflected in the
developmental levels indicated above. This is true for all eight of the abilities:
faculty integrate the teaching and assessment of abilities they are responsible for
into course design according to the developmental level of the course in the
curriculum, whether in general education or the major.
But, we are often asked, what have you done to ensure that faculty sustain
and improve a developmental and coherent curriculum? What have you found
helpful in fostering an ongoing spirit of inquiry consistent with your principles
regarding learning as a developmental process? Attention to structures and
processes aligned with those principles has been critical for us.

It Is Important to Create Structures and Processes That Foster Collaborative Responsibility for Student Learning
While it is true that individual faculty are responsible for the particular group of
students they are teaching, the Alverno faculty have practiced the principle that
they also share responsibility for student learning as a whole. What one faculty
member does in a particular class can affect how students learn in other classes.
Consequently, faculty collaboration with one another about issues of teaching and
assessment is necessary to ensure the developmental and coherent nature of the
curriculum. This principle has made a difference in the structures and processes we
have developed over time.
The faculty at Alverno take seriously the idea that teaching and assessment
are not individual enterprises, but collaborative ones. In addition to the com-
mon commitment to the teaching and assessment of shared learning outcomes,
we believe that we are responsible for assisting one another to improve the quality of
student learning across the college, and that this requires organizing our
structures to allow for the necessary collaboration. There is no
question that this does challenge the image of the individual scholar. We do
recognize that keeping current in one’s own field is part of our professional
responsibility, but we have also noted that a consequence of specialization and a
limited view of research in higher education is that faculty are often more
connected with colleagues in their disciplinary specialization than with faculty
members on their own campus, even in their own department. As recent discus-
sions in higher education about the nature of academic freedom have suggested,
the autonomy of individual faculty members to pursue scholarship and to have
their teaching unfettered by forces that restrict free and open inquiry presumes a
responsibility to foster learning of all kinds. We believe that if the learning of
our students is a primary responsibility, then our professional lives should
reflect that, including how and with whom we spend our time. There are several
ways in which we have formalized this as a priority in institutional structures
and processes.
One important structural change, given the significance of the eight abilities at
the heart of the curriculum, was the creation of a department for each of the
abilities. In practice this means that
faculty at the college serve in both a discipline department and an ability depart-
ment. Depending on their interest and expertise, faculty join ability departments,
the role of which is to do ongoing research on each respective ability, refine and
revise the meaning of that ability, provide workshops for faculty, and regularly
publish materials about the teaching and assessment of that ability. These depart-
ments provide a powerful context for cross-disciplinary discourse and they are
just as significant as the discipline departments in taking responsibility for the
quality and coherence of the curriculum.
But how is this collaboration supported? In order to facilitate it, the aca-
demic schedule has been organized to make time for this kind of work. No
classes are scheduled on Friday afternoons, and faculty use that time to meet in
discipline or ability departments or to provide workshops for one another on
important developments in curriculum, teaching, and assessment. In addition,
the faculty hold three Institutes a year – at the beginning of each semester and at
the end of the academic year – and those are also devoted to collaborative
inquiry into issues of teaching, learning, and assessment. As a faculty we expect
one another to participate in the Friday afternoon work and the College
Institutes as part of our responsibility to develop as educators because of
what we can learn from one another and can create together in the service of
student learning.
The value placed on this work together is strongly reinforced by criteria for
the academic ranks, which include very close attention to the quality and
significance of contributions faculty make to this kind of ongoing discourse on
teaching, learning, and assessment. In fact, it was during one of our College
Institutes focused on effective teaching that we noted, somewhat sheepishly,
that the criteria we had for faculty performance with respect to teaching at that
point said simply, ‘‘Teaches effectively’’. We have since redefined that general
statement into increasingly complex levels to include the different principles we
have explored in this chapter. These principles have shaped our view not only of
teaching but also of scholarship grounded in our commitment to student
learning. In this spirit, faculty are evaluated not only on the quality of their
own teaching but also on their contributions to the quality of teaching and
learning across the institution and, indeed, to higher education in general. As
faculty proceed through the academic ranks, the criteria for this dimension of
scholarship reflect more intensive and extensive contributions.

Making Teaching a Scholarly Enterprise Should Be a Priority


The tradition of scholarship in higher education has emphasized both rigorous
study of the subject under investigation and making the study and its results public
to a community of scholars for review and critique. Our own work on teaching and
assessment at Alverno has certainly been informed by that tradition, but our focus
on teaching and assessment has led us to rethink the scope of what is studied, the
ways in which we might define rigor, and the form of and audience for making our
ideas public.
In his landmark publication, Scholarship Reconsidered, Boyer (1997) chal-
lenged the academic community to broaden and deepen its notion of scholar-
ship, and one very significant aspect of his analysis called for a focus on what
he referred to as the scholarship of teaching. Since that time, serious work has
been done in higher education circles on what this might mean and how we
might make it a more substantive and recognized dimension of scholarly
work. Particularly noteworthy has been the work of the Carnegie Foundation
for the Advancement of Teaching. Under the leadership of
Lee Shulman and Pat Hutchings, Carnegie (www.carnegie.org) has developed
multiple initiatives and fostered ongoing discourse on the scholarship of
teaching. At Alverno we have become increasingly committed to scholarly
inquiry into teaching and learning and have made considerable inroads in
establishing expectations regarding scholarship that contributes to the
improvement of student learning.
We know, for example, that we are more likely to help students learn if we
understand how they learn, and this implies that our scholarly inquiry includes
study beyond our own disciplinary expertise. What do we know about the
different learning styles or ways of knowing that students bring to their learning
(Kolb, 1976)? What are cognitive psychologists and other learning theorists
saying about the most effective ways of engaging students in their learning
(Bransford et al., 2000)? What can developmental theories tell us about the
complexities of learning (Perry, 1970)? We regularly share with one another the
literature on these kinds of questions and explore implications for our teaching
practice.
The research of others on learning has been a great help to us, but we have
also come to realize that each student and group of students is unique, and this
has led us to see that assessment is an opportunity to understand more about
our students as learners and to modify our pedagogical strategies accordingly.
In this sense, our faculty design and implement assignments and assessments
with an eye to how each one will assist them to know who their students are. We
are consistently asking ourselves how we have assisted students to make con-
nections between their experience and the disciplines they are studying, how we
have drawn on the previous background and learning of students, and how we
can challenge their strengths and address areas for improvement.
This focus on scholarly inquiry into the learning of our students is a supple-
ment to, not a substitute for, the ongoing inquiry of faculty in their disciplines,
but our thinking about what this disciplinary study involves and what it means
to keep current in our fields has evolved. Increasingly we have approached our
disciplines not only as objects of our own study but also as frameworks for
student learning. What has this meant for us in practice?
Perhaps most importantly we have tried to move beyond the debate in higher
education about the extent to which research into one’s field informs one’s
teaching or even whether research or teaching should take precedence in a
hierarchy of professional responsibilities. At Alverno we have taken a position
and explored its implications in order to act on them. We have asked ourselves
how our role as educators affects the kind of inquiry we pursue in our disciplines
and how we engage students in the practice of those disciplines in ways that are
meaningful and appropriate to the level at which they are practicing them.
From this perspective there is less emphasis on sharing what we know about
our disciplines and more on figuring out how to assist students to use the
discipline themselves. This is not to suggest that the content of disciplinary
study is unimportant for the faculty member or the student, but it does mean
that what faculty study and what students study should be considered in light of
how we want our students to be able to think and act as a result of their study. In
this view faculty scholarship emerges from questions about student learning,
and disciplines are used as frameworks for student learning.
Clearly essential to any scholarly work is making one’s ideas public to
colleagues for critical review and dialogue. This is an expectation we have for
our faculty as well, but we have embraced a view of ‘‘making one’s ideas public’’
that is not restricted to traditional forms of publication. As we have suggested
throughout this chapter, the first audience for our ideas on teaching, learning,
and assessment is usually our colleagues here at the college. Because our first
responsibility is to the learning of our students, our Friday afternoon sessions
and College Institutes provide opportunities for us to share and critically
consider the insights and issues we are encountering in our teaching and in
the research on teaching we have studied. Our faculty also regularly create
written publications on various aspects of our scholarly work on teaching,
learning, and assessment. For example, Learning that Lasts (Mentkowski and
Associates, 2000) is a comprehensive look at the learning culture of Alverno
College based on two decades of longitudinal studies and on leading educa-
tional theories. Assessment at Alverno College: Student, Program, and Institu-
tional (Loacker & Rogers, 2005) explores the conceptual framework for
assessment at Alverno with specific examples from across disciplines and pro-
grams, while Self Assessment at Alverno College (Alverno College Faculty, 2000)
provides an analysis of the theory and practice of Alverno faculty in assisting
students to develop self assessment as a powerful means of learning. Ability-based
Learning Outcomes: Teaching and Assessment at Alverno College (Alverno Col-
lege Faculty, 2005) articulates the meaning of each of the eight abilities in the
Alverno curriculum and gives examples of teaching and assessment processes for
each. And in Disciplines as Frameworks for Student Learning: Teaching the
Practice of the Disciplines (Riordan & Roth, 2005), Alverno faculty from a variety
of disciplines consider how approaching disciplines as frameworks for learning
transforms the way they think about their respective fields. In most instances
these are collaborative publications based on the work of different groups of
faculty, like members of ability departments or representatives from the assess-
ment council or faculty who have pursued a topic of interest together and have
insights to share with the higher education community. The collective wisdom of
the faculty on educational issues also serves as the basis for regular workshops we
offer for faculty and staff from around the world, both by hosting them on campus
and through consulting opportunities outside the college.
Our remarks about scholarly inquiry are not intended to diminish the value
of significant scholarly inquiry that advances understanding of disciplinary
fields and the connections among them. On the other hand, we have come to
value the fertile ground for inquiry that lies at the intersection of student
learning and the disciplines. We believe that it deserves the same critical and
substantive analysis because it has the potential to not only enhance student
learning but also to help us re-imagine the meaning and practice of the dis-
ciplines themselves.

Reviewing What We Have Learned


In one sense, assessment of student learning is not a new thing. We teachers
throughout the world at all levels have always been involved in determining
what and how well our students have learned. What has challenged us, parti-
cularly in higher education, is the call to raise serious questions about the kind
of learning that is most significant, the most effective ways of determining
whether students are learning, how assessment processes not only determine
student learning but improve it, and how assessment can be used to improve
design of curriculum and instruction in the service of student learning. Many of
us across the higher education community have perhaps raised these questions
for ourselves individually, but can sporadic individual efforts to focus on student
learning enhance that learning as much as a focus to which an entire faculty is
committed? What might be possible if a faculty together organized to find better
ways of improving student learning through assessment? Now institutions are
being required, by stakeholders and accrediting bodies, to not only raise the
questions but explain how they are answering them. We think it is important to
brainstorm about what individual faculty might do to bring their institutions to
a willingness to grapple with these questions and come to answers that fit the
institution and its students.
In this chapter we have explored our experience in addressing these questions
as an institution over an extended period of time. We would like to further
exchange what we have learned in the process, based on our own practice as well
as on what our colleagues have told us and what you can tell us in return.
We have learned from colleagues that our commitment to collaboration on
issues of teaching, learning, and assessment is perhaps the most important dimen-
sion of creating a culture of learning. We recognize that our particular structures
might not be appropriate or most effective for other institutions, but our
experience and that of others across higher education reinforces the value of
taking shared responsibility for student learning. However this might be trans-
lated into practice, we have found that it makes for a more coherent, develop-
mental, and even satisfying learning experience for students, because the faculty
are working together toward common learning outcomes and are developing as
well as sharing with one another the most effective ways of teaching and
assessing those outcomes.
We have learned in our work with faculty around the world on assessment of
learning outcomes that the more faculty take personal ownership of learning
outcomes across a program and/or institution the more success they will have in
doing meaningful assessment. Frequently we have heard stories from our col-
leagues at other institutions that faculty are sometimes suspicious, even resent-
ful, of assessment because they perceive it as handed down from someone or
some group. If, on the other hand, they see assessment of learning outcomes as
helping them address questions they themselves have about the quality of their
teaching and student learning, they are more likely to embrace it. At Alverno
we, as the faculty, see ourselves as responsible for articulating the eight abilities
at the heart of the curriculum. Initially we saw them and still see them as related
to the significant processes of thinking in our own fields of study. How then, we
asked, shall we evaluate each student’s performance of them? The commitment of
the faculty to consistent, shared, and systemic assessment of student learning
emerged, therefore, not in response to external stakeholders or accountability
concerns, but in recognition of the need to design forms of assessment that reflect
the kind of curriculum we had articulated. While many institutions now may be
responding to external pressures as the basis for their work on assessment,
ultimately the success of that work will depend on whether faculty embrace it
because they see it as enhancing how students learn what is important.
We have also learned that the effectiveness of learning outcomes in promoting
student learning depends heavily on the extent to which the outcomes are actually
required. From the start of our ability-based curriculum, student success through-
out the curriculum has depended on demonstration of the eight abilities in the
context of disciplinary study. Our students know that graduation from the college,
based on performance in their courses in general education and their majors, is
directly related to their achievement of learning outcomes. They know this
because the faculty have agreed to hold them to those standards. We realize that
learning outcomes that are seen by faculty as aspirations at best and distractions at
worst will not be taken seriously by students and, thus, will not become vital
dimensions of the curriculum, no matter what assessment initiatives emerge.
We have learned as well how important it is to make a focus on student learning
a significant priority in our scholarship. When most of us in higher education
complete our graduate work we have as our primary focus scholarly inquiry
into the discipline we have chosen. Indeed, for many the focus is a very
specialized aspect of a discipline. Another way of saying this is that our
primary identity is based on our responsibility to the discipline. But what
happens when we make student learning the kind of priority we have suggested
throughout this chapter, indeed throughout this book? The reflections here
attempt to indicate how the lens of student learning has affected both our work
as a community of educators and our individual professional identities.
Finally, we have learned that, although our collaboration as Alverno colleagues
has been essential to a focus on student learning, connections with colleagues
beyond the institution are critical to our ongoing inquiry. These colleagues include
the more than four hundred volunteer assessors from the business and profes-
sional community who contribute their time and insight to our students’ learn-
ing. They also include the colleagues we connect with in workshops,
consultations, grant projects, conferences, and other venues; or those who are
contributing to the growing body of literature on teaching, learning, and
assessment we have found so valuable. We owe them a great debt and are
certain our students have been the beneficiaries as well. It is in this spirit that
this chapter aims to reflect and foster the kind of continuing conversation we
need in order to make assessment of student learning the powerful and dynamic
process it can be.

Looking Forward

What does the future hold for us, both at Alverno College and in the higher
education community at large? How can we continue to improve in our efforts
to optimize student learning? Drawing on our own principles and practices and
on our work with hundreds of institutions around the world, we propose a few
final thoughts.
Make assessment of student learning central to teaching, not an addition to it.
One of the most frequently asked questions at our workshops on teaching and
assessing student learning is how we can devote so much time and energy to
the practices we have described in this chapter. This question usually comes out
of the assumption that assessment of student learning outcomes is something
we are adding to the already long list of faculty responsibilities in higher
education. At Alverno, however, assessing student learning has become an
organic part of the teaching and learning process. The question is not, ‘‘How
will we find the time and resources to add on assessment of student learning
outcomes?'' Rather, the question is, ''How will we determine whether our
students have learned the kinds of thinking and doing we consider essential to
the degree we are conferring?’’ This is not a question peripheral to teaching; it is
at the heart of teaching. Whether we call it ‘‘assessment’’ or not, we strongly
believe that designing processes to determine what students are learning and to
help them improve is critical to the teaching enterprise. As institutions of higher
education seek to become more focused on student learning, this integral
connection between teaching and assessment will be important.
Think carefully about priorities, including asking hard questions about what is
not as important as student learning. Are we in higher education spending too
much time, energy, and money on interests and activities that are not contribut-
ing to student learning in effective ways? This is another way to think about the
resource questions that always seem to emerge in discussions about assessment.
Take a hard look at the connection, or lack of it, between allocation of resources
and student learning. One practical step we have taken at Alverno to ensure
productive use of resources in the service of student learning is to make as our
primary scholarly responsibility collaborative inquiry among our faculty about
teaching, learning, and assessment within and across our disciplines. This is
reflected in how we focus our intellectual energy, how we spend our time, and
how we evaluate faculty performance. We understand that different institutions
will have different scholarly requirements depending on their missions, but
asking difficult questions about the benefits to our students is important.
Find or create the language or pathways that will most actively engage faculty
in productive work on teaching, learning, and assessment. It sometimes seems that
using the word ‘‘assessment’’ is counterproductive in encouraging people to take
it seriously in relation to teaching and learning. It has so many connotations
associated not only with accountability, but even with backdoor attempts to
negatively evaluate faculty, that many hope it will just go away. On the other
hand, committed faculty surely take seriously the responsibility to help their
students learn and to create ways of determining whether students have
learned or whether something needs to be done to improve their learning. Account-
ability is here to stay and will provide the heat, but doesn’t the light need to
come from faculty commitment to student learning? The more we can do to tap
into that important resource of faculty commitment with the language we use
and the initiatives we develop, the more likely we are to engage and sustain
interest in assessment of student learning outcomes.
Clearly, as we at Alverno study our curriculum and our students' performance,
we realize that we do not experience learning outcomes and student assessment
as add-ons to our learning environment. Rather, we understand them as
recently discovered elements integral to the teaching/learning process that
help to define it more meaningfully. A student might be overheard, for instance,
describing a debate for which she is gathering evidence on an incident or
character in Hamlet – not as another type of speech to be learned but as a
way of understanding Shakespeare’s characters and style more deeply. In a self
assessment she might describe her problem solving process in creating a new art
work – not as a generic strategy to be learned but as a new understanding of the
role of her intellect in her production of art.
Through our refining and redefining of the curriculum, we recognize that our
practice now results from carefully thought-through revisions of our disciplin-
ary and educational philosophies – revisions that, had we foreseen them when
we began, might have kept us from proceeding. Because we thought we were
already doing a good job of teaching, we didn’t realize that students could learn
better if we changed our methods. We didn’t foresee that focusing on student
learning might threaten the ongoing existence of our favorite lecture or might
require us to listen more than we speak.
As we look ahead, we are determined to commit ourselves to keep re-
examining our practice as we learn from our own experience as well as from
other emerging theories. We also need to re-examine it as new seismic shifts like
globalization and technology defy our strategic planning and demand new
kinds of transformation. Where will our re-examining take us? We’re not
sure, but if we can keep observing and tracking growth in our students’ under-
standing and performance, we will take that growth as a prod to keep learning
rather than as a testimony to success achieved.
Signs within the educational community across the globe suggest that incor-
poration of student learning outcomes into teaching, learning, and assessment
is not just the unique focus of our institution; nor is it just a possibly reversible
trend. (Note: the First International Conference on Enhancing Teaching and
Learning Through Assessment, Hong Kong, 2005; the Bologna Declaration,
Europe, 1999; the Sharing Responsibility for Essential Learning Outcomes
Conference, Association of American Colleges and Universities, 2007). Instead,
these signs seem to us to be evidence of growth in the kind of inquiry that can
keep education alive with questioning that is intellectually and morally life-
sustaining.

References
Alverno College Assessment Committee/Office of Research and Evaluation. (1982). Alverno
students’ developing perspectives on self assessment. Milwaukee, WI: Alverno College
Institute.
Alverno College Faculty. (2005). Ability-based learning outcomes: Teaching and assessment at
Alverno College. Milwaukee, WI: Alverno College Institute.
Alverno College Faculty. (Georgine Loacker, Ed.). (2000). Self assessment at Alverno College.
Milwaukee, WI: Alverno College Institute.
Arenson, K. W. (2000, January 1). Going higher tech degree by degree. The New York Times,
p. E29.
Bollag, B. (2006, October 27). Making an art form of assessment. The Chronicle of Higher
Education, pp. A1–A4.
Boyer, E. (1997). Scholarship reconsidered: Priorities of the professoriate. New York: John
Wiley & Sons.
Bransford, J. D., Brown, A. L., & Cocking, R. R. (2000). How people learn: Brain, mind,
experience, and school (Expanded ed.). Washington, DC: National Academy Press.
Gardner, H. (1999). The disciplined mind: What all students should understand. New York:
Simon & Schuster.
Kolb, D. (1976). The learning style inventory: Technical manual. Boston: McBer.
Levine, A. (2006). Educating school teachers. The Education Schools Project Report #2.
Washington, DC: The Education Schools Project.
Loacker, G., & Rogers, G. (2005). Assessment at Alverno College: Student, program, and
institutional. Milwaukee, WI: Alverno College Institute.
Mentkowski, M. and Associates. (2000). Learning that lasts: Integrating learning, perfor-
mance, and development. San Francisco: Jossey-Bass.
Perry, W.G., Jr. (1970). Forms of intellectual and ethical development in the college: A Scheme.
Austin, TX: Holt, Rinehart & Winston.
Riordan, T., & Roth, J. (Eds.). (2005). Disciplines as frameworks for student learning: Teaching
the practice of the disciplines. Sterling, VA: Stylus Publications.
Twohey, M. (2006, August 6). Tiny college earns big reputation. The Milwaukee Journal
Sentinel, pp. A1, A19.
Who needs Harvard? Forget the ivy league. (2006, August 21). Time, pp. 37–44.
Wiggins, G. & McTighe, J. (2005). Understanding by design (Expanded 2nd ed.). Alexandria,
VA: Association for Supervision and Curriculum Development (ASCD).
Chapter 11
Changing Assessment in Higher Education:
A Model in Support of Institution-Wide
Improvement

Ranald Macdonald and Gordon Joughin

Introduction

Assessment of student learning is a complex matter being dealt with in large,
complex institutions, involving diverse collections of staff and students engaged
in intensive activities that lie at the heart of higher education. Not surprisingly,
assessment has become a focal point of research and innovation in higher
education. Consequently, the literature on assessment is now considerable
and growing rapidly, while national projects to generate ways of improving
assessment have become commonplace in many countries. At the same time,
however, attempts to improve assessment often fall short of their promise. Well-
founded statements of assessment principles fail to work their way down from
the committees which draft them; individual lecturers’ innovative assessment
practices take hold within their immediate sphere of influence without infiltrat-
ing a whole programme; limited progress is made in encouraging departments
to take seriously the assessment of generic qualities or attributes considered as
essential learning outcomes for all graduates.
What then does it take to improve assessment in a university? This chapter
presents a response to this situation focused on an emerging understanding of
the nature of higher education institutions in the context of change. The
purpose of this chapter therefore is to present a model of higher education
institutions that is relevant to this context, highlighting those aspects of uni-
versities and similar organisations which require attention if assessment prac-
tices are to be improved. We believe that the challenges facing educational
organisations, their aspirations and those of their staff, are best construed
and most successfully pursued by clarifying their nature in the light of recent
organisational thinking, and that failure to do this will inevitably lead to
unrealistic goals, misdirection and consequent dissipation of energy, and, ulti-
mately, disappointment, while at the same time more effective ways of achieving
improvement in assessment will not be pursued. The chapter therefore begins
with a model of higher education institutions which seeks to clarify our under-
standing of the nature of universities, followed by an exploration of the impli-
cations of the model for changing assessment practices.

A Model of Institutional Impacts on Assessment, Learning and Teaching

Because of the complex nature of both assessment and universities, a model is
considered to be a necessary component in addressing the issue of improving
assessment. Keeves (1994, p. 3865) notes that ‘‘research in education is
concerned with the action of many factors, simultaneously or in a causal
sequence, in a problematic situation. Thus it is inevitable that research in the
field of education should make use of models in the course of its inquiries to
portray the interrelations between the factors involved.’’ Models are useful
for more than research, however, since they are representations designed
to help us see more clearly what we already know, portray things in ways
that have not occurred to us before, or provide us with insights into areas
that have previously confused us. In the case of an organisation such as
a university, a model needs to balance both formal structures and the roles
and behaviours of people who may act outside the remit of their formal
responsibilities.
Models by their nature and purpose simplify things – by naming the elements
of a system and describing relationships between those elements, they seek to
provide a usable tool for analysis, discovery and action, and in so doing risk
omitting elements or relationships which some may see as important (Beer,
Strachey, Stone, & Torrance, 1999). This may be what Trowler and his collea-
gues had in mind when claiming that ‘‘all models are wrong; some are useful’’
(Trowler, Saunders, & Knight, 2003, p. 36, citing Nash, Plugge & Eurlings,
2000). We acknowledge that, like all models, our model is open to critique and
revision.
Our model is presented below in Fig. 11.1, followed by an explanation of its
components.
Many of the elements of our model can be located at particular levels of the
institution, or, in the case of ‘context’, beyond the institution. Other elements,
namely those associated with quality, leadership, management and administra-
tion, cut across these levels. The foundational level of the model is the unit of
study where learning and assessment happens – variously termed the module,
subject, unit or course. The second level of the model represents the collection of
those units that constitute the student’s overall learning experience – typically the
degree programme or a major within a degree. At the third level, academic staff
are normally organised into what are variously termed departments, schools or
faculties (where the latter term refers to a structural unit within the organisation).
Fig. 11.1 A model of institutional impacts on assessment, learning and teaching

The fourth level represents institutional-level entities that support learning,
teaching and assessment. Finally, the model acknowledges the important influ-
ence of the wider, external context in which universities operate.
The number of levels and their names may vary from institution to institu-
tion and between countries, but essentially they reflect a growing distance from
where learning, teaching and assessment happens, though they may be no less
influential in terms of the impact they may have.

The Module Level

Each level within the model has a number of elements. At the level of the
module, where learning and its assessment occurs, we might see module design,
teachers’ experience of teaching and assessment, and students’ previous and
current experiences of being assessed. This is where the individual academic or
the teaching team designs assessment tasks and where students are typically
most actively engaged in their studies. This is also the level at which most
research into and writing about assessment, at least until recently, has been
directed, ranging from the classic studies into how assessment directs students’
learning (e.g., Miller & Parlett, 1974; Snyder, 1971) to more recent studies into
student responses to feedback, and including comprehensive textbooks on
assessment in higher education (e.g., Boud & Falchikov, 2007; Heywood,
2000; Miller, Imrie & Cox, 1998), not to mention a plethora of how-to books
and collections of illuminating case studies (e.g., Brown & Glasner, 1999;
Bryan & Clegg, 2006; Knight, 1995; Schwarz & Gibbs, 2002).
The Course or Programme Level

At the course or programme level, the elements might include programme
and overall curriculum design, procedures for appointing and supporting
course teams, and how resources are allocated for course delivery. At this
level we would also see aspects such as innovation in learning and teaching,
and it is here that students’ overall experience of learning and assessment
occurs.
Both the importance of this level and its relative neglect are highlighted by
recent arguments for focusing attention at the programme level (e.g., by Knight,
2000; Knight & Trowler, 2000; Knight & Yorke, 2003). This level involves more
than the sum of individual modules; it is where disciplinary and professional
identity is to the fore and where innovation in teaching and assessment impacts
on the student experience.

The Department or Faculty Level

It is at the departmental or faculty level1 that cultural issues become a major
determinant of staff behaviour. It is here that local policies and procedures,
including the allocation of staff to particular responsibilities, can support or
impede innovations in teaching and assessment. It is where the take up of staff
development opportunities develops greater capacity for innovation and aca-
demic leadership. Messages about the relative importance of research, teaching
and management are enacted at this level, and it is here that issues of identity
and morale come to the fore.

The Institutional Level

The institutional level elements may include the overall framework for setting
principles, policies and regulations around learning, teaching and assessment,
the allocation of resources to learning and teaching as against other activities,
and rewards and recognition for learning and teaching in comparison to
rewards and recognition for research, consultancy or management. At this
level, overall resourcing decisions are made regarding teaching and learning,
probation and promotion policies are determined, assessment principles are
formulated and institutional responses to external demands are devised.

1 Where ‘faculty’ is used to refer to an organisational unit rather than, as is common in the US and some other countries, to individual academic staff.
The External Level

Finally, the external level will have a large number of elements, including the
regulatory requirements of governments, the role of government funding bodies
in supporting quality enhancement and innovation in teaching and learning,
national and international developments within the higher education sector,
requirements of professional accrediting bodies, media perception and por-
trayal of higher education, and the external research environment. With respect
to learning and assessment, this level may well result in institutional responses
to curriculum design and content as well as overall principles such as the
requirement for general graduate attributes to be reflected in assessed student
learning outcomes. In addition to this, students’ expectations can be strongly
influenced by external forces, including their experience of other levels of
education, their engagement in work, and the operation of market forces in
relation to their chosen field of study.

Quality, Leadership, Management and Administration


Several functional elements cut across these five levels and may impact on any or
all of their elements. In our model we have identified two main functional themes,
firstly quality, including quality assurance (doing things better) and quality
enhancement (doing better things) and secondly, roles. The functional roles in
the model include leadership – conceived in terms of visioning, new ideas and
strategy; management – concerned with putting fully-operating systems into
place and maintaining them; and administration – the day-to-day operation of
systems (Yorke, 2001). The functional aspects may relate to quality through an
emphasis on quality assurance, quality enhancement or both.

Relationships

If the elements of the model indicate what aspects of an institution require
attention in attempts to improve assessment, it is how these elements relate
to each other that gives the model its dynamic. The possible connections
between elements are not infinite, but they are certainly numerous and
complex. We indicate this, albeit in a limited way, through the two-way
arrows in Fig. 11.1.

Applying the Model


How useful is the model? One test is to consider the following fictional, but
nevertheless realistic, scenario that will resonate with many academics.
For some time students have been complaining that they are not all being
treated the same when it comes to their assessment. There are different
arrangements for where and when to hand in written assignments; the time
taken to provide feedback varies enormously, despite university guidelines;
different practices exist for late work and student illness; some staff only
seem to mark between 35 and 75 per cent whereas others seem prepared to
give anything between zero and 100; in some courses different tutors put
varying degrees of effort into identifying plagiarism; procedures for handing
back work vary with the result that some students do not even bother to
collect it. All this in spite of the fact that this (UK) university has policies and
guidelines in place which fully meet the Quality Assurance Agency’s Code of
Practice on Assessment (The Quality Assurance Agency, 2006a).

Which aspects of the university, as identified by the elements of the model,
would need to be addressed to rectify or ameliorate the range of issues raised
here? Which connections between elements need strengthening? In the sce-
nario, the principles, policies and regulations at the institutional level have not
connected with the students’ experience of assessment at the module level. Is
this due to the nature of the policies themselves, to the departmental cultures
which subvert the policies, and/or to the teachers’ limited exposure to assess-
ment theory?
We don’t propose to resolve these matters here, but simply to indicate that
we think the model is useful in considering such situations. It provides a frame-
work for identifying what might need to be looked into to resolve such problems
and a starting point for defining problems, analysing their bases, and directing
activities to address them. It suggests that, to implement a planned innovation
or to address a given problem, a number of elements will need to be worked
with and several critical relationships will be involved.
How many elements, or how much of a system, needs to be brought into play
for change to be effective? At least three possibilities are apparent. The first,
which we can quickly dismiss, would be to suggest that, if we can select the
‘‘right’’ element, intervention at a single point might do the trick. The second is
to consider if there might be a small number of elements which, in combination,
might lead to large-scale change, in line with Malcolm Gladwell’s popularised
notion of the tipping point, according to which significant change can occur
when a small number of factors, none of which seem to be of major significance
in themselves, coalesce, leading to change that is significant or even dramatic
(Gladwell, 2000).
Both of these possibilities pander to an almost certainly mistaken belief in
leverage points within organisations. Most of us have a strong tendency to
believe that there are such points and an equally strong desire to know where
they are, since this would give us considerable power in working towards
our goals (Meadows, 1999). This search is illusory and unlikely to succeed,
not least because of the loosely coupled nature of universities as organisations
(Knight & Trowler, 2000) – change in one element may or may not lead to
significant change in adjoining elements.
The third possibility, and the one that seems most plausible to us, is that all or
most elements need to be taken into account. The need to address some elements
may be more immediately apparent, while the need to address others may be less
so. Moreover, as we intervene at one level of the institution, and in relation to
one element at that level, other elements and other levels are likely to be affected,
though the nature and degree of this impact may be difficult to predict.
Each of these possibilities is based on an understanding of the university as a
more-or-less linear organisation in which action in one part of the organisation
has more-or-less predictable consequences in other parts through a chain of
cause-and-effect leading to a desired outcome. That universities do not operate
quite, or even at all, like this is soon recognised by anyone charged with bringing
about widespread change in teaching, learning, or assessment within their
university. A more sophisticated approach is needed.

Systems Approaches

So far we have not used the term system, but it is clear that our model represents
a systems approach to understanding universities and change, so an exploration
of systems thinking is inevitable if we are to understand higher education
institutions adequately and use such understanding to adopt change strategies
that have a reasonable chance of success. In this section, therefore, we consider
some fundamental concepts in systems theory.
The essential feature of a systems approach is to view the university as a
whole, rather than focusing on one or even several of its parts. A systems
approach therefore complements the expanding body of work more narrowly
focused on specific aspects of the university such as assessment strategies,
students’ experience of assessment in their courses, plagiarism or departmental
responses to assessment issues. A systems approach is also a reaction to the
reductionism implicit in much thinking about learning, teaching and assess-
ment, according to which change depends on improvements in one or two key
areas of academic practice.

Hard and Soft Systems

Within systems thinking, a useful distinction is often made between hard
and soft systems. Checkland has documented the shift in thinking from hard to
soft systems well (Checkland 1993; 1999). Hard systems are associated
with organisations in which clearly defined, uncontested goals can be achieved
and problems corrected through the systematic application of agreed means. The
system is seen as somewhat self-evident and its existence is independent of those
working towards its goals. Importantly, the system is open to direct manipulation
by human intervention – things can be made to happen.
In contrast to this, soft systems methodologies come into play where objec-
tives are less clear, where assumptions about elements of the system cannot be
taken for granted, and where there is a challenging issue needing to be
addressed, in short, in situations where organisational, management and lea-
dership issues are involved. Checkland’s acronym, CATWOE, indicates the
flavour of soft systems thinking (Checkland, 1993). It refers to the elements
involved in defining a situation, where there are customers (the beneficiaries of
the system); actors (the agents who undertake the various activities within the
system); a transformation process (which results, in the case of higher education
institutions, in graduates and research outputs); a Weltanschauung, or world-
view; ownership (those who fund the system); and environmental or contextual
constraints on the system. Checkland distinguishes the move towards soft
systems as a move away from working with obvious problems towards working
with situations which some people may regard as problematical, which are not
easily defined, and which involve approaches characterised by inquiry and
learning (Checkland, 1999).
When the university is construed as a hard system, it is considered to have
agreed goals of good practice, clearly defined structures designed to achieve
these goals, and processes that are controlled and somewhat predictable. This is
a view of universities which may appeal to some, particularly senior managers
keen to introduce change and move their institution in particular directions.
However, hard systems thinking is inappropriate where ends are contested,
where stakeholders have differing (and often competing) perspectives and
needs, and where understanding of the system is necessarily partial. Inappro-
priate as it may be, the temptation to see the university as a hard system
remains, especially in the minds of some administrators or leaders seeking to
achieve specific ends through imposing non-consensual means. When this
occurs, change becomes problematic, since change in a soft system involves
not merely working on the right elements of the system, but understanding
transformational processes in particular ways, including understanding one’s
own role as an actor within the system.
The difficulties in effecting change within universities as systems are often
attributed to the ‘loose-coupling’ between units. Loose-coupling is well
described by Eckel, Green, Hill and Mallon as characteristic of an organisation
in which activity in one unit may be only loosely connected with what happens in
another unit. This compares with more hierarchical or ‘tightly-coupled’ organizations
where units impact more directly on each other. As a result, change within a university
is likely to be small, improvisational, accommodating, and local, rather than large,
planned, and organization-wide. Change in higher education is generally incremental
and uneven because a change in one area may not affect a second area, and it may affect
a third area only after significant time has passed. If institutional leaders want to
achieve comprehensive, widespread change, they must create strategies to compensate
for this decentralization. (Eckel, Green, Hill & Mallon, 1999, p. 4)
Loose coupling can apply to both hard and soft systems thinking. Both assume
a degree of causal relationship between elements or units, and in both cases
loose coupling highlights that these connections are not necessarily direct and
immediate. In our earlier scenario, we might consider the loose couplings
involved as we follow the impact of the university’s assessment policies as
they move from the university council down to faculties and departments and
into the design of course assessment and its implementation by individual
academic staff and students, noting how original intentions become diluted,
distorted or completely lost at various stages.
Loose coupling is an evocative term and, until recently, few would have
contested its use to describe the university. Loose coupling, however, assumes a
degree of connectivity or linearity between university units. This view has been
challenged by a quite different understanding which sees the university as a non-
linear system, where interactions between elements are far from direct and
where notions of causality in change need to be radically rethought, in short,
the university is seen as a complex adaptive system.

Complex Adaptive Systems


The notion of universities as complex adaptive systems promises interesting
insights. Thinking about complex adaptive systems has been developed in a
number of fields, including genetics, economics, and mathematics, and has
some of its foundations in chaos theory. It is applied to management and
organisation theory, and to an analysis of higher education institutions, by
way of analogy, and therefore we should proceed cautiously in our use of it here.
As noted earlier regarding models in general, the test of complexity theory’s
applicability to universities will lie in its usefulness. Having noted this caveat,
three constructs from the writing on complex adaptive systems seem particu-
larly suggestive, namely agents, self organization, and emergence.

Agents
Stacey describes a complex adaptive system as consisting of ‘‘a large number of
agents, each of which behaves according to some set of rules. These rules require
the agents to adjust their behaviors to that of other agents. In other words,
agents interact with, and adapt to, each other’’ (Stacey, 2003, p. 237). Each unit
in a university can be thought of as an agent, as can each individual within each
unit, with each agent operating according to its own set of rules while adjusting
to the behaviour of other individuals and units.
The concept of agents introduces considerable complexity in two ways.
Firstly, it draws our attention to the large number of actors involved in any
given university activity. At the level of the individual course or module, we
have the body of enrolled students, tutorial groups, and individual students, as
well as a course coordinator, individual tutors, and the course team as a whole,
including administrative staff. To these we could add the department which
interacts in various ways with each of the agents mentioned, and multiple agents
outside the department, including various agents of the university as well as
agents external to the university.
Secondly, each agent operates according to their own rules. This under-
standing is in contrast to the view that the various players in a university are
all operating within an agreed order, with certain values and parameters of
behaviour being determined at a high level of the organisation. Instead of
seeking to understand an organisation in terms of an overall blueprint, we
need to look at how each agent or unit behaves and the principles associated
with that behaviour. What appear to be cohesive patterns of behaviour asso-
ciated with the university as a whole are the result of the local activity of this
array of agents, each acting according to their own rules while also adapting to
the actions of other agents around them.
Many writers have noted the many sets of rules operating in universities,
often pointing out that the differing sets of values which drive action can be
complex and are typically contrasting. While some values, such as the impor-
tance of research, may be shared across an institution, variation is inevitable,
with disciplinary cultures, administrators, academics and students all having
distinctive beliefs (Kezar, 2001). Complexity theory draws our attention to
these variations, while suggesting that further variation is likely to be present
across each of the many agents involved in the university system.
One implication for our model is that it may need to be seen in the light of
the different agents within a university, with the concept of agency brought to
bear on each of the model’s elements. Aligned with this, we would interpret the
model as incorporating the actions within each agent, interactions between
agents, and a complex of intra- and inter-agent feedback loops (Eoyang,
Yellowthunder, & Ward, 1998). The model thus becomes considerably more
complex, and understanding and appreciating the rules according to which the
various agents operate, and how they might adapt to the actions of other
agents, becomes critical to understanding the possibilities and limitations of
change.

Self-organisation

With a multiplicity of agents each functioning according to their own rules, a
complex adaptive system would seem likely to behave chaotically. The prin-
ciple of self-organisation seeks to explain why this is not the case, and why
organisations, including universities, exhibit organised patterns of behaviour
(Matthews, White, & Ling, 1999). Systems as a whole and agents within a
system self-organise, change and evolve, often in response to external chal-
lenges. Within a complex system of agents, however, each agent acts according
to its own rules and according to what it believes is in its best interests. Tosey
(2002) illustrates this well with reference to students adjusting their behaviour in
meeting, or not meeting, assignment deadlines – their patterns changed in direct
response to their tutors’ lack of consistency in applying rules. Multiply this
kind of scenario countless times and we have a picture of the university as a
collection of units and individuals constantly changing or re-organising, in
unpredictable ways, in response to change. The result is not chaos, but neither
is it predictable or easily controlled, since multifarious internal and external
factors are involved.

Emergence

The characteristics of agents and self-organisation give rise to the third core
construct of complex adaptive systems. Many of us are accustomed to thinking
in terms of actions and results, of implementing processes that will lead, more or
less, to intended outcomes. The notion of emergence in complex adaptive
systems is a significant variation on this. Emergence has been defined as ‘‘the
process by which patterns or global-level structures arise from interactive local-
level processes. This ‘structure’ or ‘pattern’ cannot be understood or predicted
from the behaviour or properties of the components alone’’ (Mihata, 1997,
p. 31, quoted by Seel, 2005). In other words, how universities function, what
they produce, and certainly the results of any kind of change process, emerge
from the activities of a multiplicity of agents and the interactions between them.
In such a context, causality becomes a problematic notion (Stacey, 2003);
certainly the idea of one unit, or one individual at one level of the organisation,
initiating a chain of events leading to a predicted outcome, is seen as misleading.
As Eckel et al. (1999, p. 4) note, ‘‘(e)ffects are difficult to attribute to causes’’,
thus making it hard for leaders to know where to focus attention. On the other
hand, and somewhat paradoxically, small actions in one part of a system may
lead to large and unforeseeable consequences.
Emergence also suggests the need for what Seel (2005) refers to as watchful
anticipation on the part of institutional managers, as they wait while change
works its way through different parts of a system.
Emergence implies unpredictability, and, for those of us in higher education,
it reflects the notion of teaching at the edge of chaos (Tosey, 2002). Along with the
concept of agency, the complexity which emergence entails ‘‘challenges managers
to act in the knowledge that they have no control, only influence’’ (Tosey, 2002,
p. 10). Seel (2005) suggests a set of preconditions for emergence, or ways of
influencing emergence, including connectivity – emergence is unlikely in a frag-
mented organisation; diversity – an essential requirement if new patterns are to
emerge; an appropriate rate of information flow; the absence of inhibiting factors
such as power differences or threats to identity; some constraints to action or
effective boundaries; and a clear sense of purpose.
Let’s return to our earlier scenario. Does viewing the university as a complex
adaptive system help us to see this situation differently, and in a more useful
way? For a start, it leads us to focus more strongly on the agents involved –
the university’s policy unit; whatever committee and/or senior staff are
responsible for quality assurance; the department and its staff; and the
students as individuals and as a body. What drives these agents, and what
motivates them to comply with or resist university policy? How do they self-
organise when faced with new policies and procedures? What efforts have
been made across the university to communicate the Quality Assurance
Agency’s codes of practice, and what, apart from dissatisfied students, has
emerged from these efforts? Perhaps most importantly, how might Seel’s
preconditions for emergence influence attempts to improve the situation?

Complex Adaptive Systems and Change

It is, at least in part, the failure of such policies and the frustrations experienced
in seeking system-wide improvements in teaching, learning and assessment that
have led to an interest in harnessing the notion of the university as a complex
adaptive system in the interests of change. While our model suggests that
change can be promoted by identifying critical elements of the university system
and key relationships between them, perhaps the facilitation of change in a
complex adaptive system requires something more.
Two management theorists have proposed principles of organisational
change in the context of complex adaptive systems. The first of these, Dooley
(1997), suggests seven principles for designing complex adaptive organisations.
We would argue that universities are complex adaptive organisations – they do
not need to be designed as such. We interpret Dooley’s suggestions as ways of
living with this complexity and taking the best advantage of the university’s
resulting organisational qualities. Dooley’s principles are as follows:
(a) create a shared purpose;
(b) cultivate inquiry, learning, experimentation, and divergent thinking;
(c) enhance external and internal interconnections via communication and
technology;
(d) instill rapid feedback loops for self-reference and self-control;
(e) cultivate diversity, specialisation, differentiation, and integration;
(f) create shared values and principles of action; and
(g) make explicit a few but essential structural and behavioral boundaries.
(Dooley, 1997, pp. 92–93)
Dooley’s list is prescriptive. It suggests how complexity theory could be
applied to an organisation. The second theorist, Stacey, in contrast, eschews
the notion of application, implying as it does the intentional control of a system
by a manager who stands outside it. Rather, ‘‘managers are understood to be
participants in complex responsive processes, engaged in emergent enquiry into
what they are doing and what steps they should take next’’ (Stacey, 2003,
p. 414). Consequently, instead of offering applications or prescriptions, Stacey
notes how the theory of complex adaptive systems shifts our focus of attention.
In the context of our model, the elements remain, but the focus becomes
directed in particular ways, namely:
• on the quality of participation, that is, on how self-organising units are responding to initiatives, and on recognising the responses and decisions that emerge from them;
• on the quality of conversational life, noting the themes that organise colleagues’ conversational relating (along with themes that block free-flowing conversation) and bringing themes from outside the institution (or one’s part of it) into conversations;
• on the quality of anxiety and how it is lived with. Anxiety is an inevitable consequence of free-flowing conversation based on a search for new meanings. Public conversations have private implications, such as a threat to one’s professional identity; principles of self-organisation apply at the individual as well as the unit level.
• on the quality of diversity. Diversity is essential if new ways of doing things are to emerge. Deviance, eccentricity and subversion play important roles in emergence.
• on unpredictability and paradox. Understanding universities as complex adaptive systems leads to the recognition of the limits to predictability, that we must often act not knowing what the consequences will be, and that surprise becomes part of the dynamic of the process of change. Actions should be judged on whether they create possibilities of further action rather than on whether they produce predicted desired outcomes.

So what do we do now to bring about change in our scenario? As well as
looking at the various units, how they operate, and what has resulted, we are
likely to look closely at whatever happens as change is initiated. We will keep
in touch with colleagues, teaching teams, departments and other units to see
how they are responding. We will look out for, and encourage the expression
of, disparate views about what should happen. We will cease looking only
for outcomes that meet our objectives, and be alert to the unexpected. We
will work in ways that energise our colleagues. We will seek to create
contexts for conversations and learning.

Some Practical Implications: Change as Conversation


By starting with a model and exploring how aspects of the change literature help
us to put some flesh on it for a complex institution such as a university, we may
lose sight of the main purpose of our enquiry in this chapter – to understand
how to bring about improvements in assessment most effectively in order to
improve the practices and experiences of all those engaged in the process.
Clearly it is not just the elements of our model which need exploring but
the relationships between them – the arrows in Fig. 11.1 – and how patterns
of interrelationships overlay the model. Exploring the relationship between
systems thinking and complexity, Stacey (2007) draws on a responsive pro-
cesses perspective whereby individuals are seen as interdependent persons
whose social interactions produce patterns of relationships which develop
over time. Often characterised as conversations, these interactions allow
more novel strategies to emerge, depending on how valued these conversa-
tions are within the organisation. The case study presented towards the end
of this chapter provides a genuine and current example of a change process
which, as well as using institutional structures, arose from coffee bar con-
versations. Importantly, it gives credence to those conversations that Shaw
(2002) and Owen (1997) identify as critical change mechanisms in complex
organisations.
The educational change literature, examining large change initiatives in
compulsory and post-compulsory education in recent years, alludes to many
of the complexity constructs without naming them as such. Fullan, a well-
known writer on educational change within schools, is a prime example. He
notes that significant educational change ‘‘consists of changes in beliefs,
teaching styles, and materials, which can come about only through a process
of personal development in a social context’’ (Fullan, 2001, p.124, his empha-
sis). He goes on to stress the need for teachers to be able to ‘‘converse about the
meaning of change’’ (ibid.). Fullan recognises that change involves individuals
in a social context and that there may be resistance to change or tokenistic
compliance, possibly based on past histories of change or engagement with
management. Dialogue therefore has to be the starting point for change. Unless
individuals are engaged in ways that ‘‘alter their ways of thinking and doing’’,
they are unlikely to move through the anxiety and feelings of uncertainty (what
Fullan calls the ‘‘implementation dip’’) to successful implementation.
Hopkins’ work is also illuminating. In seeking to make connections
between research and experiences in schools and higher education, Hopkins
(2002) stresses the role of networks in supporting innovation and educa-
tional change by bringing together those with like-minded interests. This is
similar to Lave and Wenger’s concept of communities of practice (1991)
which are ‘‘ . . .groups of people who share a concern, a set of problems, or a
passion about a topic, and who deepen their knowledge and expertise in this
area by interacting on an ongoing basis’’ (Wenger, McDermott & Snyder,
2002, p. 4). They are separate from but may be part of the institution; they do
not come about as the result of organisational design. Though Wenger does
not use the terms, communities of practice may be thought of as emergent
and self-organising as the result of social interaction between agents within
the organisation.
Recognising that consensus is difficult in higher education because of the
individual autonomy which academics have, Hopkins (2002) outlines a set of
principles upon which successful improvement programmes in higher education
should draw. Such programmes should be:
• achievement focused;
• empowering in aspiration;
• research based and theory rich;
• context specific;
• capacity building in nature;
• inquiry driven;
• implementation oriented;
• interventionist and strategic;
• externally supported; and
• systemic.
Hopkins acknowledges that change initiatives will not necessarily draw on
all of these principles and that, because of the differing contexts, cultures and
nature of institutions, they may need to adopt differentiated approaches to
choose or adapt appropriate strategies. This highlights the fact that one-size-fits-all
approaches to change, perhaps using the latest fad in organisational development
or strategic management, are unlikely to have the desired effects. The
practical application of any model or set of conceptual constructs such as those
expounded in the complexity literature will depend on how well we understand
the way our organisation works and how the important networks within it
operate. One approach to developing this understanding is through apprecia-
tive inquiry.
Appreciative inquiry was developed by David Cooperrider and colleagues
at Case Western Reserve’s School of Organization Behaviour in the 1980s
(Cooperrider, Whitney, & Stavros, 2005). The approach is based on the premise
that organisations change in the direction in which they inquire, so that an
organisation which inquires into problems will keep finding problems but an
organisation which attempts to appreciate what is best in itself will discover
more and more that is good. It can then use these discoveries to build a new
future where the best becomes more common.
Appreciative inquiry can be applied to many kinds of organisational devel-
opment, including change management processes involved in quality enhance-
ment initiatives. The first step in appreciative inquiry is to select the affirmative
topic – that is, the topic that will become the focus of questioning and future
interventions. In our case, this would be assessment. An appreciative inquiry
into assessment would then explore two questions:
• What factors give life to assessment in this course/programme/university
when it is and has been most successful and effective? This question seeks to
discover what the organisation has done well in the past and is doing well in
the present in relation to assessment.
• What possibilities, expressed and latent, provide opportunities for more
vital, successful, and effective assessment practices? This question asks the
participants to envisage and design a better future. (Cooperrider, Whitney, &
Stavros, 2005, p. 32)
Participants then engage in appreciative interviews, one-on-one dialogues
using questions related to highpoint experiences, valuing what gives life to the
organisation/project/initiative at its best.
Whether one uses appreciative inquiry or some other approach to reflecting
on the organisation, a key focus is on asking questions, engaging in conversa-
tions and, drawing on complexity theory, enabling change to emerge. This may
prove problematic in situations where quality assurance processes have created
a compliance culture rather than one characterised as unpredictable, creative,
challenging and paradoxical. Hopkins (2002) observes that changes in schools
have been characterised by increases in accountability and adherence to
national educational reform programmes. One aspect of similar reforms in
higher education in the UK and elsewhere has been the introduction of self-
evaluation – currently called institutional audit in the UK (Quality Assurance
Agency, 2006a) – which encourages a more open, reflective evaluation of the
institution, including through the evaluation of subjects and courses. Since this
process includes external examiners, there is a triangulation of review and
evaluation which is increasingly focused on quality enhancement and evalua-
tion for development and learning, not just for accountability purposes
(Chelimsky, 1997).
The following example illustrates how a change programme that is still in
progress developed, not out of a top-down diktat, but out of a number of
occurrences at different levels of a university. The reader is asked to reflect on
how the matters we have considered in this chapter – systems approaches (parti-
cularly complex adaptive systems), change as conversation, and approaches to
educational change – can aid us in understanding the nature of successful change
within higher education institutions. It would require a more in-depth presenta-
tion to highlight how agents, self-organisation and emergence all played their
parts, but their roles are all present in the scenario which follows.

A Case Study in Institutional Change: Sheffield Hallam University

Sheffield Hallam University, no doubt like many others, has made changes
to assessment regulations and frameworks, processes and systems, and aca-
demic practices over the years without necessarily ensuring that the various
aspects are well integrated or that the whole is viewed holistically.
In 2005 the Assessment Working Group, which had been addressing assess-
ment regulations and procedures for a number of years, acknowledged some-
thing was not right. At the same time feedback from students was suggesting
that, whilst the quality of their learning experience overall was very good, there
were some aspects of assessment and feedback which needed addressing – a fact
reinforced in some subject areas by the results of the 2006 National Student
Survey.
It was in this context that the first author and a senior academic in one of the
university’s faculties produced a paper on Profile Assessment that addressed a
number of issues concerning assessment, feedback and re-assessment. This was
the outcome of the first of many coffee bar discussions! A further outcome was
a spin-off from one of the University’s Centres for Excellence in Teaching and
Learning (CETLs) with the proposal by the first author to establish TALI –
The Assessment for Learning Initiative.
At the same time, with the growing use of Blackboard as the university’s
virtual learning environment and the need for it to communicate effectively
with the University’s central data system, along with increasing complexity
in the assessment regulations as changes were made from year to year, there
was a growing misalignment between the practices of academics, the regula-
tions and procedures, and the systems used to manage and report on the
outcomes of the assessment procedures. There was also a perception that the
assessment regulations and procedures drove the academic practices, rather
than the other way around, producing what was commonly referred to as
‘the burden of assessment’.
The opportunity was taken to gather a group together, involving senior
staff from key departments and academics from the faculties at a local
conference centre, Ranmoor Hall, to discuss the issues widely and produce
suggestions from what became known as the Ranmoor Group. The Assess-
ment Project was established to focus on the deliberate actions required to
support the development of:
• assessment practices that are learner focused and promote student engagement and attainment;
• regulations that are clear, consistent and student centred; and
• assessment processes that are efficient and effective and which enable the delivery of a high quality experience for staff and students.
A lead was taken in each of these areas by an appropriate senior member
of the relevant department under the aegis of the Assessment Development
Group. This group comprised three separate working groups and interacted
with other key parts of the university. It reported to an Assessment Project
Board, chaired by the Pro Vice-Chancellor, Academic Development, to
ensure the appropriate governance of the project and the availability of
people and resources to make it work.
One aspect, The Assessment for Learning Initiative (TALI), illustrates
how change has been occurring. TALI’s aim has been to achieve large-scale
cultural change with regard to assessment practice across the university with
a focus on:
• research informed change, supported by a research post in the institution’s
academic development unit (the Learning and Teaching Institute), with a
strong focus on the student and staff experience of assessment;
• the development of resources and case studies sharing good practice and
focusing on key assessment themes such as feedback and academic integ-
rity and supported by the appointment of secondees to key roles in each
faculty; and
• the innovative use of technology to create learning experiences and improve
the efficiency and effectiveness of assessment by, for example, promoting the
use of audio feedback and the development of feedback tools.
TALI has engaged with large numbers of staff through faculty and
subject group meetings, more coffee bar encounters, and generally raising
the profile of assessment across the university. Most importantly, it has
engaged with staff at all levels and with students, including through the
Students’ Union. The faculty appointments ensure the local ownership of
the initiative and remove much of the sense of a top-down imposition.
The initiative has been received enthusiastically across the university as
people feel they genuinely have something to contribute and that the changes
are designed to help them focus assessment more on learning and less on
fitting into institutional constraints.
As one would expect in a large institution (28,000+ students) with a diverse
student and academic population, change has not always gone smoothly.
However, the enthusiasm and commitment of the whole team to the success
of the initiative, including the Pro Vice-Chancellor, Academic Development,
is resulting in significant progress. The PVC recently said that this was the
most exciting change initiative that he had been involved with at the University
because it was so successful and was leading to genuine change.

Conclusion

Early in this chapter we posed the question, ‘‘What does it take to improve
assessment in a university?’’ and suggested that part of the response lay in a
model of the university that would help us to see more clearly those aspects of
the university, and the relationships between them, that would require attention
if initiatives for change were to be effective.
It is possible, in light of the various matters canvassed in this chapter, that
the complexities of universities may defy the kind of simple depiction we have
presented in Fig. 11.1. However, we do not wish to propose a new model here.
We believe that a model which is true to the complexity of higher education
institutions still needs to identify the key elements of those institutions and the
connections between them as we have done in our model. On the other hand, a
useful model must emphasise the role of agents in relation to the elements,
including both individuals and units involved in or likely to be affected by change.
Moreover, such a model needs to recognise the complexity of interactions
between agents, along with the agents’ essentially self-organising nature.
Finally, the model needs to be considered in relation to an ‘overlay’ of change
management, or conditions for improvement such as those proposed by Dooley
(1997) and Stacey (2003, 2007). In short, while a simple model of higher
education systems can be useful in drawing attention to certain aspects of the
system, the model as represented in Fig. 11.1 is only a partial representation
which needs to be understood in terms of the complexity and change factors
considered in this chapter.
We noted at the beginning of the chapter that models seek to simplify what
they are describing in order to assist thinking and action. Now we see that a
model of higher education institutions which might help us consider processes
for improving teaching, learning and assessment needs to be more complex than
we initially envisaged. Is a simple model of a complex system possible? The case
study suggests a model cannot be prescriptive as institutions differ – one size
does not fit all. We offer these thoughts in the hope that they will prompt further
thinking and discussion, and in the belief that conversations towards develop-
ing a more comprehensive model will be as valuable as what may emerge.

An Endnote
This chapter reflects a journey by the authors. It began when Ranald partici-
pated in a workshop run by Gordon at a conference in Coventry, UK in July
2003. This was followed by lengthy email exchanges which resulted in an article
for the Institute for Learning and Teaching (now the UK’s Higher Education
Academy) website (Joughin & Macdonald, 2004). The article introduced the
model and explored the various elements and relationships, though in little
depth and without locating it in any theoretical or conceptual literatures.
Two subsequent visits by Ranald to the Hong Kong Institute of Education,
where Gordon was then working, allowed us to develop our ideas further and
we began to draw on a variety of literatures, not least in the areas of systems and
complexity. The ideas for this chapter have developed whilst we have been
writing and are likely to continue to change from the point at which we are
writing in October 2007.
Our discussions and writing themselves reflect aspects of complexity, with ourselves as the agents, self-organising our thoughts in ways that lead to emergent ideas for influencing policy and practice in our roles as academic developers. Further, we can identify with other areas of the literature, such as communities of practice, appreciative inquiry and educational change. Perhaps our present concern is to develop a highly pragmatic approach to facilitating change, one informed by the latest organisational development or strategic management thinking but not dominated by it. We note with interest
that the subtitle of Stacey, Griffin and Shaw’s book applying complexity theory
to management (Stacey, Griffin & Shaw, 2000) is ‘‘Fad or Radical Challenge to Systems Thinking?’’ Perhaps it does not matter which it is so long as it aids
understanding and leads to effective change leadership and management
around important educational issues such as learning and assessment.

References
Beer, S., Strachey, C., Stone, R., & Torrance, J. (1999). Model. In A. Bullock & S. Trombley (Eds.), The new Fontana dictionary of modern thought (3rd ed., pp. 536–537). London: Harper Collins.
Boud, D., & Falchikov, N. (Eds.). (2007). Rethinking assessment in higher education. Abing-
don, UK: Routledge.
Brown, S., & Glasner, A. (Eds.). (1999). Assessment matters in higher education. Buckingham,
UK: SRHE/Open University Press.
Bryan, C., & Clegg, K. (2006). Innovative assessment in higher education. Abingdon, UK:
Routledge.
Checkland, P. (1993). Systems thinking, systems practice. Chichester: John Wiley and Sons.
Checkland, P. (1999). Soft systems methodology: A 30-year retrospective. Chichester: John
Wiley and Sons.
Chelimsky, E. (1997). Thoughts for a new evaluation society. Evaluation, 3(1), 97–118.
Cooperrider, D. L., Whitney, D., & Stavros, J. M. (2005). Appreciative inquiry handbook.
Brunswick, Ohio: Crown Custom Publishing.
Dooley, K. (1997). A complex adaptive systems model of organization change. Nonlinear
Dynamics, Psychology, and Life Sciences, 1(1), 69–97.
Eckel, P., Green, M., Hill, B., & Mallon, W. (1999). On change III: Taking charge of change:
A primer for colleges and universities. Washington, DC: American Council on Education.
Retrieved October 12, 2007, from http://www.acenet.edu/bookstore/pdf/on-change/on-
changeIII.pdf
Eoyang, G., Yellowthunder, L., & Ward, V. (1998). A complex adaptive systems (CAS)
approach to public policy making. Society for Chaos Theory in the Life Sciences. Retrieved
October 12, 2007, from http://www.chaos-limited.com/gstuff/SCTPLSPolicy.pdf
Fullan, M. (2001). The new meaning of educational change (3rd ed.). London: RoutledgeFalmer.
Gladwell, M. (2000). The tipping point. London: Abacus.
Heywood, J. (2000). Assessment in higher education: Student learning, teaching, programmes
and institutions. London: Jessica Kingsley.
Hopkins, D. (2002). The evolution of strategies for educational change – implications for higher
education. Retrieved October 10, 2007, from archive at The Higher Education Academy
Website: http://www.heacademy.ac.uk
Joughin, G., & Macdonald, R. (2004). A model of assessment in higher education institutions.
The Higher Education Academy. Retrieved September 11, 2007, from http://www.
heacademy.ac.uk/resources/detail/id588_model_of_assessment_in_heis
Keeves, J. P. (1994). Longitudinal research methods. In T. Husén & N. T. Postlethwaite (Eds.), The international encyclopedia of education (2nd ed., Vol. 6, pp. 3512–3524). Oxford: Pergamon Press.
Kezar, A. (2001). Understanding and facilitating organizational change in the 21st century:
Recent research and conceptualizations. San Francisco: Jossey-Bass.
Knight, P. (Ed.). (1995). Assessment for learning in higher education. London: Kogan
Page.
Knight, P. (2000). The value of a programme-wide approach to assessment. Assessment and
Evaluation in Higher Education, 25(3), 237–251.

Knight, P., & Trowler, P. (2000). Department-level cultures and the improvement of learning
and teaching. Studies in Higher Education, 25(1), 69–83.
Knight, P., & Yorke, M. (2003). Assessment, learning and employability. Buckingham, UK:
SRHE/Open University Press.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cam-
bridge, UK: Cambridge University Press.
Matthews, K. M., White, M. C., & Ling, R. G. (1999). Why study the complexity sciences in
the social sciences? Human Relations, 52(4), 439–462.
Meadows, D. (1999). Leverage points: Places to intervene in a system. Hartland: The Sustain-
ability Institute. Retrieved September 27, 2007, from http://www.sustainabilityinstitute.
org/pubs/Leverage_Points.pdf
Miller, A. H., Imrie, B. W., & Cox, K. (1998). Student assessment in higher education. London:
Kogan Page.
Miller, C. M. L., & Parlett, M. (1974). Up to the mark: A study of the examination game.
London: SRHE.
Nash, J., Plugge, L., & Eurlings, A. (2000). Defining and evaluating CSCL Projects. Unpub-
lished paper, Stanford, CA: Stanford University.
Owen, H. (1997). Open space technology: A user’s guide. San Francisco: Berrett-Koehler
Publishers.
Quality Assurance Agency. (2006a). Code of practice for the assurance of academic quality
and standards in higher education, Section 6: Assessment of students. Retrieved October
12, 2007, from http://www.qaa.ac.uk/academicinfrastructure/codeOfPractice/section6/
default.asp
Quality Assurance Agency for Higher Education. (2006b). Handbook for institutional audit:
England and Northern Ireland. Mansfield: The Quality Assurance Agency for Higher
Education.
Schwarz, P., & Gibbs, G. (Eds.). (2002). Assessment: Case studies, experience and practice
from higher education. London: Kogan Page.
Seel, R. (2005). Creativity in organisations: An emergent perspective. Retrieved October 3, 2007, from http://www.new-paradigm.co.uk/creativity-emergent.htm
Shaw, P. (2002). Changing conversations in organizations: A complexity approach to change.
Abingdon, UK: Routledge.
Snyder, B. R. (1971). The hidden curriculum. New York: Knopf.
Stacey, R. (2003). Strategic management and organizational dynamics: The challenge of
complexity (4th ed.). Harlow, England: Prentice-Hall.
Stacey, R. D. (2007). Strategic management and organizational dynamics (5th ed.). Harlow:
Pearson Educational Limited.
Stacey, R. D., Griffin, D., & Shaw, P. (2000). Complexity and management: Fad or radical
challenge to systems thinking? London: Routledge.
Tosey, P. (2002). Teaching at the edge of chaos. York: LTSN Generic Centre.
Retrieved October 4, 2007, from http://www.heacademy.ac.uk/resources.asp?
process=full_record&section=generic&id=111
Trowler, P., Saunders, M., & Knight, P. (2003). Changing thinking, changing practices. York: Higher Education Academy. Available from enquiries@heacademy.ac.uk
Wenger, E, McDermott, R., & Snyder, W. M. (2002). Cultivating communities of practice.
Boston, Mass: Harvard Business School Press.
Yorke, M. (2001). Assessment: A guide for senior managers. York: Higher Education Academy.
Chapter 12
Assessment, Learning and Judgement: Emerging Directions

Gordon Joughin
Centre for Educational Development and Interactive Resources, University of Wollongong, Australia
e-mail: gordonj@uow.edu.au

Introduction

This book began by noting the complexity of assessment of student learning as a field of scholarship and practice and proposing a re-consideration of a range
of issues concerning the very meaning of assessment, the nature and process of
making professional judgements about the quality of students’ work, the
various relationships between assessment and the process of student learning,
and the intricacies of changing how assessment is thought of and practised
across an institution.
The complexities of assessment are indicated by the range of matters
addressed in this single volume: foundational empirical research; the role of
the work context in determining approaches to assessment; how judgements
are made; the limitations of grades as reporting instruments; quality measures
for new modes of assessment; students’ experience of assessment cultures;
assessment as a source of insight for teaching; the role of plagiarism in subvert-
ing learning; and the principles and processes involved in institution-wide
enhancement of assessment. Perhaps it is appropriate that the penultimate
chapter drew on complexity theory as an aid to understanding higher education
institutions in search of improved assessment.
Each author has argued for changes in thinking and/or practice within the
particular focus of their chapter. However, the chapters’ various foci are not
isolated from each other; they are inter-related in building a picture of a
coherent culture of assessment appropriate for the first decades of the 21st
century. Thus the definition of assessment proposed in the introductory chapter
provides a basis for understanding the approaches to assessment discussed in
each of the chapters, while its emphasis on assessment as judgement is particu-
larly reinforced by Boud and Sadler. Dochy’s argument for edumetric rather
than psychometric standards for judging the quality of assessment provides

support for the forms of assessment that encourage ability-based learning (Riordan and Loacker) and require students to create rather than simply find
answers to assessment tasks (Carroll). And the capacity for self-assessment
which underpins Riordan and Loacker’s work at Alverno College is the very
same capacity which forms the basis for Yorke’s approach to students’ claims
making in relation to their own learning at the completion of their degree. The
notion of assessment cultures introduced by Ecclestone highlights the impor-
tance of understanding assessment in terms of students’ and teachers’ experi-
ence of assessment, an experience which permeates their careers (whether as
students or teachers) and which must be accommodated as students encounter
the new approaches to assessment argued for in this book. Finally, using
assessment results to improve teaching, and devising specific ways of doing
this in relation to particular forms of assessment (Suskie), is a principle that can
be applied to each of the approaches to assessment argued for elsewhere in the
book, not least those approaches that place the development of students’ capacity to evaluate their own work at the heart of assessment in support of learning.
Thus, while the chapters have been conceived independently (though admit-
tedly within the context of the learning-oriented assessment approach noted in
the Preface), there is a high degree of coherence across them, so that the book as
a whole constitutes a consistent argument for an integrated approach to assess-
ment based on a set of progressions:
 from conceptualizing assessment as a process of quasi-measurement, to con-
ceptualizing assessment as a process of informing, and making, judgements;
 from judgements based on criteria abstracted from an informed under-
standing of quality, to judgements based on an holistic appreciation of
quality;
 from assessments located within the frame of reference of the course, to
assessments located in a frame of reference beyond the course in the world of
practice;
 from simple grades as expressions of achievement, to more complex and
comprehensive representations of knowledge and abilities;
 from assessment as the endpoint of learning, to assessment as a starting
point for teaching;
 from assessment as discipline- and course-focused, to assessment focused on
generic abilities aligned with discipline knowledge;
 from standardised testing, to qualitative judgements of competence through
practice-based or practice-like tasks;
 from students as objects of assessment, to students as active subjects
participating in assessing their own work; and
 from a conception of the university as an organisation susceptible to sys-
temic change through managerial interventions, to a conception of the
university as a complex organism where change depends on the values,
intentions and actions of interdependent actors.

These progressions raise a number of challenges, each far from trivial, which
could well set the agenda for theorising, researching, and acting to improve
assessment for the next decade and beyond.

Conceptual Challenges

The conceptual challenges posed here call for the ongoing exploration of
central tenets of assessment, subjecting them to critical scrutiny, and extending
our understanding of the emerging challenges to orthodoxy noted in this
book. Core concepts and issues that demand further scrutiny include the
following:
The purposes of assessment. While the multiple functions of assessment have
long been recognised, confusion continues regarding how these functions can be
held in creative tension. Terminology does not help – the very term formative
assessment, for example, suggests to many a particular type of assessment rather than a particular function that assessment performs. The term learning-oriented
assessment is a step in the right direction, but only if it is seen as a way of
looking at assessment, rather than as a type of assessment. A single piece of
assessment may be, to varying extents, oriented towards learning, oriented
towards informing judgements about achievement, and oriented towards main-
taining the standards of a discipline or profession. Of course, according to the
definition of assessment proffered in Chapter 2, assessment will always entail
judgements about the quality of students’ work, irrespective of the purpose to
which these judgements are put. The challenge is to define assessment in its own
right, and to use this singular definition in understanding and defining each of
assessment’s multiple purposes.
The location of assessment. Boud has argued for locating assessment in
the context of the world of work which students will inhabit on graduation.
This context not only helps to define what kinds of assessment tasks will be
appropriate, but also emphasises the need for students to develop the capacity
to assess their own work, since this becomes a routine function in the workplace.
Once this perspective is adopted, other constructs fall into place, including the
use of complex, authentic tasks, the inclusion of generic attributes as integral
objects of assessment, and tasks that deter plagiarism and engage students in
learning through embedding assessment in current but constantly changing real
world contexts. The challenge is to unite the often necessarily abstracted nature
of learning in universities with its ultimate purpose – students need not only to
learn, but to learn in order to be able to act, and to be able to act in practice-like
contexts where they can experience themselves as responsible agents and be
aware of the consequences of their actions.
Assessment and judgement. Assessment as measurement represents a para-
digm that has been strongly articulated over decades. Assessment as judgement,
though underpinning what Dochy terms the new modes of assessment which
have been the focus of attention in assessment literature for at least the past
twenty years, has received far less attention. Eisner’s work on connoisseurship
(Eisner, 1985) and Sadler and Boud’s work described in this book are notable
exceptions. The challenge is to continue the work of conceptualizing and
articulating assessment as judgement, drawing on both education and cognate
disciplines to illuminate our understanding of the nature of professional judge-
ment and how judgements are made in practice. The use of criteria has come to
be seen as central to judgement, to the extent that criterion-referenced assess-
ment has become an established orthodoxy in the assessment literature and the
policies, if not the practices, of many universities. Indeed, both Suskie and
Riordan and Loacker see criteria as central to learning. Sadler has challenged
us to reconsider the unthinking application of criteria as a basis of judgement,
to review our understanding of quality and its representation, and to legitimate
the role of holistic judgements.

Research Challenges

Given the centrality of assessment to learning, possibly every aspect of assessment noted in this book could constitute a worthy object of research. Three
pivotal aspects of assessment and learning, widely acknowledged as such
throughout the higher education literature, certainly demand further scrutiny.
Firstly, the role of assessment as a driver of student learning. We cannot
assume, from the empirical studies of the 1960s and 1970s, that assessment
dominates students’ academic lives in the ways often supposed in much con-
temporary writing about assessment. There are two reasons for this. The first,
noted in Chapter 3, is that the earlier research itself was equivocal: while
assessment loomed large in the considerations of many students, this was far
from a universal experience and contextual factors needed to be taken into
account. The second is the truism that times have changed, and that students, universities, and teaching practices have presumably changed too.
There is a need to consider anew the role that assessment plays in students’
academic lives, including, perhaps, replicating (with appropriate variations)
those seminal studies reviewed in Chapter 3.
Secondly, the role of assessment in influencing students’ approaches to learning.
Despite the 30 years that have elapsed since the pioneering work of Marton
and Säljö (see Marton & Säljö, 1997), there is no clear indication that forms of
assessment per se can induce a deep approach to learning amongst students, nor
do we have detailed studies on the ways in which forms of assessment interact
with what Ramsden (2003) describes as a student’s overall orientation to study or
tendency to adopt a deep or surface approach to learning irrespective of context.
If assessment plays a major role in learning, and if deep approaches to learning
are essential for high quality learning outcomes, more focused research into the
relationship between assessment formats and approaches to learning is needed.
Thirdly, the role of feedback in students’ experience of learning. If feedback is
indeed essential to learning, how do we respond to the growing number of
studies indicating inadequacies in the quantity, quality and timeliness of feed-
back and the difficulties, especially in semesterised courses, of incorporating
assessment processes that ensure that feedback is actually used by students to
improve their work and learning? Has the role of feedback in learning been
exaggerated? Or is feedback less dependent on the overt actions of teachers than
we have thought? Qualitative studies of students’ experience of feedback in
relation to their learning may provide essential insights into this key aspect of
learning through assessment.

Practice Challenges

While the emphasis of this book has been on emerging understandings of assessment, the implications of these understandings for the practice of assess-
ment are considerable. While some of these implications have been noted by
individual authors, many of them cluster around three sets of challenges: the
redesign of assessment tasks; realigning the role of students in assessment; and
the development of academics’ professional expertise as assessors of their
students’ learning.

Re-designing Assessment Tasks


Notwithstanding the considerable developments that have occurred in moving
towards forms of assessment that support learning in all of the ways considered
in this book, much assessment remains dominated by a measurement paradigm
principally designed to determine a student’s achievement. Essays, unseen
examinations, and multiple-choice tests, for example, continue to be the staple
forms of assessment in many disciplines and in many universities, and they
continue to be the forms of assessment with which many academics are most
familiar. Where this is the case, the arguments presented in this book call for
the re-design of assessment, with assessment tasks that incorporate certain
requisite qualities and perform certain critical functions in relation to learning.
The following are some of the more important of these requirements:
 Assessment tasks should incorporate the characteristics of practice, includ-
ing the contextual and embodied nature of practice, requiring the engage-
ment of the student as a whole person.
 Assessment tasks need to both develop and reflect students’ generic abilities –
for example, to communicate effectively within their discipline, to work
collaboratively, and to act in ways that are socially responsible – as well as
developing discipline specific knowledge and skills.

 Assessment tasks should require responses that students need to create for
themselves, and should be designed to avoid responses that can be simply
found, whether on the Internet, in the work of past students, or in prescribed
texts or readings.
 Assessment tasks should become the basis of learning rather than its result,
not only in the sense of students’ responses informing ongoing teaching as
proposed by Suskie in Chapter 8, but perhaps more importantly in Sadler’s
sense of assessment as a process whereby students produce and appraise
rather than study and learn (Sadler, Chapter 4).

Reassigning Assessment Roles: Placing Students at the Centre of Assessment

Central to the argument of this book is the role of students as active agents
in the acts of judgement that are at the heart of assessment. This requires
recognising that assessment is appropriately a matter that engages students,
not just teachers, in acts of appraisal or judgement, and therefore working with
students whose conceptions of assessment tend to place all authority in the hands of teachers, in order to reshape those conceptions. In short,
this entails making assessment an object of learning, devoting time to help
students understand the nature of assessment, learn about assessment and their role in it, especially in terms of self-monitoring, and learn about assessment in ways that parallel how they go about learning the substantive content of their discipline.

Professional Development

These challenges to practice clearly call for more than a simple change in
assessment methods. They require a highly professional approach to assessment
in a context where, as Ramsden has argued, ‘‘university teachers frequently
assess as amateurs’’ (Ramsden, 2003, p. 177). Developing such an approach
places considerable demands on the ongoing formation of university teachers,
though fortunately at a time when the professional development of academics
as teachers is being given increasing attention in many countries. This forma-
tion is undoubtedly a complex process, but it would seem to entail at least the
following: providing access to existing expertise in assessment; providing time
and appropriate contexts for professional development activities; motivating
staff to engage in professional development through recognising and rewarding
innovations in assessment; developing banks of exemplary practices; incorpor-
ating the expertise of practitioners outside the academy; and, perhaps critically,
making assessment a focus of scholarly activity for academics grappling with
its challenges.

The Challenge of Change

In Chapter 11 we posed the question, ‘‘What does it take to improve assessment across an institution?’’ Clearly, while more professional approaches to assess-
ment by academics may be a sine qua non of such improvement, this is but one
factor amongst many, as the arguments presented in that chapter make clear.
Reconceptualizing assessment as judgement and reconfiguring the relationship
between assessment and learning occur in the context of universities as com-
plex adaptive systems comprising multifarious agents operating within and
across different organisational levels, and with identities constituted both
within and outside the university. Making assessment a focus of conversation
across the institution and locating this discussion in relation to the concerns of
the various agents within the university, including course development commit-
tees, policy developers, examination committees, deans and heads of depart-
ments, staff and student associations, and those agents outside the university
with vested interest in programs, including parents, professional organisations,
and politicians, requires an exceptional level of informed and skilled leadership.

References
Eisner, E. (1985). The art of educational evaluation: A personal view. London: Falmer Press.
Marton, F., & Säljö, R. (1997). Approaches to learning. In F. Marton, D. Hounsell, &
N. Entwistle (Eds.), The experience of learning (2nd ed., pp. 39–58). Edinburgh: Scottish
Academic Press.
Ramsden, P. (2003). Learning to teach in higher education (2nd ed.). London: Routledge-
Falmer.
Author Index

A Bloom, B. S., 77
Adelman, C., 73 Bloxham, S., 46, 172
Amrein, A. L., 86 Bollag, B., 175
Anderson, L. W., 70, 77 Borich, G. D., 145
Anderson, V. J., 46, 137 Boud, D., 2, 3, 6, 7, 9, 13, 14, 15, 20, 29, 30,
Angelo, T., 123 35, 36, 88, 89, 106, 153, 195, 215, 217, 218
Angelo, T. A., 134, 148, 149 Boyer, E., 185
Arenson, K. W., 175 Braddock, R., 46
Askham, P., 88, 91, 92 Branch, W. T., 146
Astin, A. W., 134 Brandon, J., 70
Atherton, J., 117 Bransford, J. D., 186
Au, C., 118 Brennan, R. L., 99
Bridges, P., 69, 72
Brown, A. L., 121
B Brown, E., 24
Baartman, L. K. J., 95, 106 Brown, G., 1
Bagnato, S., 98 Brown, S., 88, 106, 195
Bain, J., 88, 90 Brumfield, C., 73
Bain, J. D., 21, 22 Bruner, J., 117
Baker, E., 95 Bryan, C., 17, 195
Ball, S. J., 154, 160 Bull, J., 1
Banta, T. W., 139 Burke, E., 6, 46
Barr, R. B., 134 Butler, D. L., 92, 93
Bastiaens, T. J., 95, 106 Butler, J., 15, 35
Bates, I., 168
Bateson, D., 100
Baume, D., 69 C
Baxter Magolda, M., 124 Campbell, D. T., 139
Becker, H. S., 17, 18 Carless, D., 2, 13
Beer, S., 194 Carroll, J., 7, 8, 115, 125, 216
Bekhradnia, B., 81 Cascallar, E., 86
Bennet, Y., 98 Chanock, K., 24, 71
Berliner, D. C., 86 Checkland, P., 199, 200
Biesta, G., 162, 163 Chelimsky, E., 208
Biggs, J. B., 17 Chen, M., 99
Biggs, J., 89, 91, 93, 135, 136 Chi, M. T. H., 46
Birenbaum, M., 87, 88, 90, 94, 96 Chickering, A. W., 134
Biswas, R., 71 Clark, R., 24
Black, P., 23, 35, 86, 92, 93, 153, 155, 156, 166 Clegg, K., 17, 195


Coffey, M., 69 Eurlings, A., 194


Collins, A., 89, 95, 100, 106 Evans, A. W., 107
Cooperrider, D. L., 207, 208 Ewell, P. T., 134
Costa, A., 146
Coulson, R. L., 88
Cox, K., 195 F
Cronbach, L. J., 95, 96, 99 Falchikov, N., 30, 36, 85, 88, 153, 195
Crooks, T., 88, 89, 90, 92, 93 Fan, X., 99
Cross, K. P., 148, 149 Farmer, B., 106
Farr, M. J., 46
Felton, J., 66
D Feltovich, P. J., 88
Dalziel, J., 72 Firestone, W. A., 87
Dancer, D., 106 Fowles, M., 100
Davey, C., 87 Franklyn-Stokes, A., 129
David, M., 154 Frederiksen, J. R., 89, 95, 100, 106
Davies, J., 154, 161, 163 Frederiksen, N., 88, 90
Davies, M., 70 Freed, J. E., 46, 134, 142
De Sousa, D. J., 136 Freeman, R., 46
Deakin Crick, R., 87 Fullan, M., 206
DeMulder, E. K., 86, 87
Derrick, J., 158
Dewey, J., 57, 117 G
Dierick, S., 88, 89, 95, 100, 106 Gagne, R. M., 23
Dochy, F., 7, 8, 10, 21, 85, 86, 87, 88, 89, 91, Gamson, Z., 134
92, 93, 95, 96, 100, 104, 106, 134, 135, 136, Gardner, H., 178
215, 217 Gardner, J., 153
Dooley, K., 204 Gawn, J., 158
Dreschel, B., 158 Geer, B., 17, 18
Drummond, M. J., 157, 158 Gibbons, M., 76
Dunbar, S., 95 Gibbs, G., 2, 17, 18, 19, 23, 35, 88, 129, 195
Dunbar, S. B., 134 Gielen, S., 21, 87, 88, 89, 100, 106, 108
Dunbar-Goddett, H., 19 Gijbels, D., 87, 88
Dunn, L., 46 Gijselaers, W., 106
Gladwell, M., 198
Glaser, R., 46, 62
E Glasner, A., 195
Eastcott, D., 106 Gleser, G. C., 99
Ecclestone, K., 9, 153, 154, 156, 157, 158, Glover, C., 24
161, 163, 164, 166, 168, 216 Green, M., 200
Echauz, J. R., 71 Griffin, D., 211, 212
Eckel, P., 200, 203 Gronlund, N. E., 145
Eisner, E., 77, 218 Gulliksen H., 86
Eison, J., 65
Ekstrom, R. B., 69
Ellen, N., 120 H
Elmholdt, C., 34, 37 Haertel, E. H., 95, 99, 100
Elton, L., 19, 22 Hager, P., 15, 35
Elton L. R. B., 90 Haggis, T., 22
Entwistle, N., 118, 135 Haladyna, T. M., 145
Eoyang, G., 202 Handa, N., 118
Eraut, M., 75, 76 Hargreaves, E., 156
Ericsson, K. A., 46 Harlen, W., 87

Hartley, P., 24 Kuh, G. D., 134


Haswell, R., 137 Kuin, L., 118
Haug, G., 80 Kvale, S., 34
Hawe, E., 69, 70
Hayes, N., 118
Heller, J. I., 98 L
Henscheid, J. M., 135 Lambert, K., 120
Heywood, J., 195 Langan, A. M., 107
Higgins, R., 24 Laurillard, D., 19, 20, 22, 23, 90
Hill, B., 200 Lave, J., 34, 206
Hopkins, D., 206, 207, 208 Law, S., 19
Hornby, W., 69 Lens, W., 93
Hounsell, D., 23 Leonard, M., 87
Hounsell, J., 23 Levi, A. J., 46
Howard, R. M., 121 Levine, A., 175
Huba, M. E., 46, 134 Lewis, R., 46
Hughes, E. C., 17, 18 Light, R., 134
Hyland, P., 24 Lindblad, J. H., 137
Ling, R. G., 202
Linn, R., 95, 99, 100, 106, 134
I Litjens, J., 23
Imrie, B. W., 195 Liu, N-F., 2, 13
Introna, L., 118 Livingston, S. A., 140
Ivanic, R., 24 Lloyd-Jones, R., 46
Loacker, G., 9, 175, 187, 216, 218
Logan, C. R., 98
J
James, D., 162, 163
Janssens, S., 21, 87 M
Jenkins, A., 70 Macdonald, R., 9, 10, 125, 193, 211
Jessup, G., 66, 155 Macfarlane, R., 116
Johnson, E. G., 99 MacFarlane-Dick, D., 23, 35
Jones, D. P., 134 MacGregor, J., 137
Joughin, G., 1, 2, 5, 9, 10, 13, 19, 22, 193, Macrae, S., 154, 160
211, 215 Mager, R. F., 69
Maguire, M., 154, 160
Mallon, W., 200
K Malott, R. W., 134
Kallick, B., 146, 151 Marshall, B., 157, 158
Kamvounias, P., 106 Marshall, C., 147
Kane, M., 95, 96 Martens, R., 91, 92
Karran, T., 80 Marton, F., 19, 21, 90, 218
Keeves, J. P., 194 Maslen, G., 119
Kezar, A., 202 Matthews, K. M., 202
Kirschner, P. A., 95, 106 Mayford, C. M., 98
Knight, P., 3, 7, 15, 39, 72, 75, 76, 77, 78, 79, Mayrowitz, D., 87
80, 194, 195, 196, 199 McCabe, D., 120
Kolb, D., 185 McCune, V., 23
Koper, P. T., 66 McDermott, R., 206
Kramer, K., 158 McDowell, L., 22, 88, 91, 93, 107
Krathwohl, D. R., 77 McKeachie, W. J., 134
Kubiszyn, T., 145 McKenna, C., 107
Kuh, G., 134 McNair, S., 155

McTighe, J., 178 Pike, G. R., 139


Meadows, D., 198 Plugge, L., 194
Meehl, P. E., 46 Polanyi, M., 53, 56
Mentkowski, M., 134, 175, 187 Pollio, H. R., 65
Merry, S., 46, 106 Pond, K., 107
Messick, S., 8, 95, 96, 97, 101, 102, 106, 134 Pope, N., 107
Meyer, G., 20 Power, C., 118
Miller, A. H., 195 Powers, D., 100
Miller, C. M. L., 17, 18, 195 Prenzel, M., 158, 161, 165
Milton, O., 65 Price, M., 46, 117
Moerkerke G., 88, 91, 94, 106 Prosser, M., 72, 88, 90
Moon, J., 146
Moran, D. J., 134
Morgan, C., 46 R
Muijtjens, A., 100 Rajaratnam, N., 99
Ramsden, P., 17, 22, 23, 218, 220
Reay, D., 154
N Reiling, K., 46, 106
Nanda, H., 99 Rigsby, L. C., 86, 87
Nash, J., 194 Rimmershaw, R., 24
Neisworth J. T., 98 Riordan, T., 9, 175, 187, 216, 218
Nevo, D., 89 Robinson, V., 118
Newstead, S. E., 129 Rogers, G., 187
Nicol, D. J., 23, 35 Romer, R., 134
Nightingale, P., 19 Rossman, G. B., 147
Nijhuis, J., 106 Roth, J., 187
Norton, L. S., 106, 129 Rothblatt, S., 73
Rowley, G. L., 99
Rowntree, D., 14, 17
O Royce Sadler, D., 45, 156
O’Donovan, B., 117 Rust, C., 19, 46, 117
O’Donovan, R., 46
O’Neil, M., 19
O’Reilly, M., 46 S
Oliver, M., 107 Säljö, R., 19, 21, 90, 218
Orsmond, P., 46, 106 Sadler, D. R., 2, 3, 6, 7, 9, 16, 23, 35, 47,
Ovando, M. N., 136 48, 49, 50, 52, 53, 59, 69, 72, 80, 81,
Owen, H., 206 135, 141, 156
Sambell, K., 22, 88, 91, 104, 107
Saunders, M., 194
P Schatzki, T. R., 31
Palmer, P. J., 134 Schoer, L., 46
Paranjape, A., 146 Schuh, J. H., 134
Park, C., 122 Schwandt, T., 31, 32
Parlett, M., 17, 18, 195 Schwarz, P., 195
Parry, S., 46 Scouller, K., 21, 88, 90
Pascarella, E. T., 134 Scouller, K. M., 90
Patton, M. Q., 147 Seel, R., 203, 204
Pecorari, D., 121 Segers, R., 86, 87, 88, 93, 106, 107
Pendlebury, M., 1 Shavelson, R. J., 88, 99
Pepper, D., 70 Shaw, P., 206, 211
Perry, W., 8, 123, 124, 186 Sheingold, K., 98
Piaget, J., 117 Shepard, L., 86, 100

Silvey, G., 21 U
Simpson, C., 2, 19, 23, 127 Ui-Haq, R., 107
Skelton, A., 27 Underwood, J., 120
Sluijsmans, D., 93, 106
Smith, J., 46
Snyder, B. R., 17, 18, 195 V
Snyder, W. M., 206 Vachtsevanos, G. J., 71
Spiro, R. J., 88 Van de Watering, G., 100
Stacey, R., 201, 203, 204, 206, 211, 212 Van der Vleuten, C. P. M., 95, 106
Stanley, J. C., 139 Vandenberghe, R., 93
Starren, H., 89 Vermunt, J. D. H. M., 91
Stavros, J. M., 207, 208 Villegas, A. M., 69
Stevens, D. D., 46 Vygotsky, L., 117
Stone, R., 194
Strachey, C., 194
Struyf, E., 93 W
Struyven, K., 21, 87 Wade, W., 107
Suen, H. K., 98 Walvoord, B., 46, 70, 137
Suskie, L., 7, 8, 9, 46, 133, 134, 135, 138, 145, Ward, V., 202
216, 218, 220 Webb, N. M., 99
Szabo, A., 120 Webster, F., 70, 71, 72
Wenger, E., 34, 206
West, A., 46, 172
T White, M. C., 202
Tagg, J., 134
Whitney, D., 207, 208
Tan, C. M., 88, 90, 93
Whitt, E. J., 134
Tan, K. H. K., 72
Wiggins, G., 178
Tang, K. C. C., 21
Wiliam, D., 23, 35, 86, 92, 153,
Tanggaard, L., 34, 37
155, 156
Taylor, L., 120
Willard, A., 100
Terenzini, P. T., 134
Winne, P. H., 92
Terry, P. W., 20
Woolf, H., 46, 69, 72
Thomas, P., 21, 22, 88, 90
Thomson, K., 88
Tilley, A., 129
Tinto, V., 137 Y
Topping, K., 88, 104 Yellowthunder, L., 202
Torrance, H., 154, 157, 163, 167, 169, 171 Yorke, M., 7, 9, 65, 68, 74, 75, 80, 196,
Torrance, J., 194 197, 216
Tosey, P., 203
Trigwell, K., 88
Trowler, P., 194, 196, 199 Z
Twohey, M., 175 Zieky, M. J., 140
Subject Index

A B
Abilities, 7, 47, 105, 163, 175, 177, 180, 183, Backwash, 89, 91, 93
216, 219 Bias, 90, 100, 105, 107
Ability-based curriculum, 9, 175, 179, 189 Bologna Declaration, 191
Ability-based Learning Outcomes:
Teaching and Assessment at Alverno
College, 187 C
Aesthetic engagement, 177, 183 Carnegie Foundation for the Advancement
Agents, 36, 41, 201–202, 210 of Teaching and Learning, 185
Alverno College, 4, 9, 175, 181, 187, CATWOE, 200
189, 216 Change, 1, 4, 74, 204–210, 221
Amotivated learners, 158 Cheating, 107, 121, 122–123
Analysis, 9, 71, 73, 79, 95, 144, 145, 163, 167, Citation practices, 121
172, 177 Claims-making, 80–82
Analytic, 6, 7, 45, 46, 48, 50, 51, 52, 53, 54, 55, Classification, 67, 78
56, 57, 62, 69, 160, 178, 183 Classroom assessment, 35, 86
Analytic rating scales, 51 Clinical decision making, 46
Analytic rubrics, 51, 62 Co-assessment, 85, 88
Appraising, 16, 45, 48, 54, 58 Co-construction of knowledge, 32
Appreciative inquiry, 207, 208, 211 Code of Practice on Assessment, 198
Apprenticeship, 37–38 Cognitive complexity, 8, 99, 100, 101, 108, 109
Approaches to learning, 13, 16, 19–22, 24, Cognitive development, 123, 124
158, 218 Committee on the Foundations of
Assessment at Alverno College, 175, 187 Assessment, 16
Assessment-driven instruction, 90 Communication, 76, 104, 139, 142–143,
Assessment Experience Questionnaire, 19 177, 204
Assessment for learning, 34–37, 85, 91–93, Community of judgement, 30, 39
107, 155, 157, 209 Community of practice, 30, 34, 38
Assessment of learning, 85, 155, 188 Competency grades, 37, 102
Assessment for practice, 38–40 Complex adaptive systems, 4, 10, 201,
Assessment Reform Group, 153, 156 204–205, 208, 221
Authentic assessment, 6, 8, 21, 33, 34, 40, Complexity theory, 201, 202, 204, 208,
47, 54, 58, 61, 87, 91, 94, 98, 99, 100, 101, 211, 215
105, 106, 107, 108, 109, 127, 129, Conditions of learning, 23
161, 217 Conference on College Composition and
Authenticity, 8, 40, 100, 101, 108, 109 Communication, 147
Authentic practice, 33 Connoisseurship, 57, 77, 218
Autonomy, 153, 156, 158, 161, 168, 170, 173, Consequential validity, 88, 91, 93, 101, 102,
184, 207 107, 108, 109, 134


Constructivist learning theories, 117 Experiences of assessment, 4, 14, 23, 91


Construct validity, 95–97, 102, 104, 105 External examinations, 34, 155
Content validity, 95, 98, 101, 144 External motives, 159
Conversation, 10, 23, 179, 180, 189,
205–211, 221
Conversational framework, 23 F
Co-production of practice, 32 Fairness, 4, 5, 8, 94, 99, 100, 101, 105, 106,
Course work, 155 108, 109, 141, 142, 143
Criteria, 3, 5, 6, 7, 8, 10, 14, 23, 40, 41, 45, 46, Feedback, 2, 6, 10, 13, 14, 22–24, 34, 35,
47, 49–62, 69, 70, 71, 72, 76, 77, 85–88, 95, 36, 38, 45, 48, 49, 51, 52, 58, 59, 60, 61,
96, 97, 99–101, 103, 104, 105, 107, 108, 63, 71, 85, 86, 89, 92, 93, 104, 106, 107,
109, 119, 122, 128, 139, 141, 148, 155, 156, 108, 123, 127, 133, 136, 137, 141, 144,
159, 165, 167, 168, 169, 171, 172, 179–181, 155, 156, 157, 159, 161, 166, 169, 170,
184, 185, 216, 218 179–181, 182, 195, 198, 202, 204, 208,
Criterion-based assessment, 46 209, 210, 219
Critical reasoning, 47 Feedback loop, 23, 202, 204
Critical thinking, 100, 117 Feed-forward, 23, 24, 89
Cue-conscious, 17 First International Conference on
Cue-deaf, 17 Enhancing Teaching and Learning
Cue- seeking, 17 Through Assessment, 191
Curriculum, 7, 9, 17, 18, 66, 68, 72, 74, 85, 87, Formative assessment, 6, 9, 25, 35, 88,
133, 137, 138, 142, 149, 175, 177, 178, 179, 90, 91, 92, 93, 107, 108, 127,
180, 181, 182, 183, 184, 187, 188, 189, 191, 153–173, 217
196, 197 Fuzzy set theory, 71

D G
Dearing Report, 74, 80 Generalisabilty, 77, 96, 97, 99, 102, 103,
Deep approach, 16, 19, 21, 22, 218 105–106, 109
Definition of assessment, 6, 13–16, 215, 217 Generic abilities, 7, 216, 219
Diagnostic assessment, 155 Ghost writers, 120
Directness, 8, 99, 100, 101, 104, 108 Global grading, 46
Disciplines as Frameworks for Student Global perspective, 177
Learning: Teaching the Practice of the Grade inflation, 69, 73–74
Disciplines, 187 Grade point perspective, 18
Discrimination, 145, 146 Grades, 3, 65, 72–73, 74, 77, 78, 82, 125,
149, 159
Grading, 6, 7, 14, 35, 39, 42, 45–62, 65–82,
E 138, 139
Edumetric, 85–109 Graduateness, 75, 81
Effective citizenship, 177
Emergence, 201, 203–204
Emotion, 13, 14, 24, 33, 39, 76 H
Enhancing Student Employability Hard systems, 199–200
Co-ordination Team, 75 The Hidden Curriculum, 17, 18
Evaluative expertise, 49, 58, 60 Higher Education Academy, 75, 211
Examination, 2, 14, 17, 20, 29, 30, 34, 68, 72, Higher Education Funding Council for
81, 90, 98, 118, 128, 129, 138, 139, 140, England, 75
144, 145, 155, 157, 219, 221 Higher Education Quality Council, 71,
Expectations, 9, 29, 48, 69, 70, 71, 81, 75, 81
104, 107, 125, 134, 136, 149, 150, 153, High stakes assessment, 4, 8, 86, 87
154, 163, 165, 169, 170, 171, 172, 180, Holistic assessment, 3, 5, 6, 7, 8, 31, 40,
185, 186, 197 45–62, 69, 138, 157, 216, 218

I Multidisciplinary, 32, 96
Improving Formative Assessment, 157 Multiple choice, 2, 20, 21, 22, 90, 94, 134,
Inference, 16, 56, 102, 103 141, 144–146, 147, 150, 219
Institutional impacts, 194–195 Multiple choice tests/Multiple-choice tests,
Integrity, 50, 62, 115, 116, 119, 121, 123, 94, 134, 141, 144, 145, 147, 150, 219
124, 210
Internet Plagiarism Advisory Service, 115
Inter-rater reliability, 98, 99 N
Intrinsic motivation, 90, 158, 160, 165, 170 National Committee of Inquiry into Higher
Introjected motivation, 159, 161, 168 Education, 74, 80
New modes of assessment, 5, 7, 8, 85–109,
215, 217
J Norm-referenced (assessment), 7, 34, 41, 155
Judgement/judgment, 1, 3–4, 5, 6, 7, 8, 10,
13–25, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 76, 77, 78, 79, 81, 82, 93, 96, O
97, 98, 100, 105, 119, 215–221 Objectives, 3, 49, 69, 70, 134, 144, 146, 156,
200, 205
Objective test, 22
L Open book examination, 22
Latent criteria, 58 Oral assessment, 22
Learning culture, 9, 161–173, 187 Oral presentation, 133
Learning to learn, 6, 153 Outcomes, 3, 4, 8, 14, 21, 37, 39, 40, 41, 42,
Learning-oriented assessment, 216, 217 53, 60, 66, 68, 70, 78, 90, 96, 133, 136, 138,
Learning outcomes, 3, 7, 9, 29, 30, 35, 37, 66, 153, 155, 156, 159, 169, 171, 188, 189, 190,
76, 125, 133, 137, 138, 147, 149, 150, 159, 199, 203, 205, 209
172, 175, 176, 177, 180, 182, 183, 184, 187,
188, 189, 190, 191, 193, 197, 218
Learning styles, 135, 185 P
Learning that Lasts, 187 Patch writing, 121
Loose-coupling, 200 Pedagogical validity, 134
Pedagogy, 45, 61, 68, 138, 149, 166
Peer appraisal, 50, 60
M Peer-assessment, 46, 61
Making the Grade, 17, 18 Peer review, 127
Manifest criteria, 58 Perceptions of assessment, 14
Marking, 14, 34, 46, 48, 68, 69, 70, 77, 107, Performance, 2, 3, 7, 9, 10, 14, 23, 32, 34, 35,
137, 169 39, 41, 47, 49, 51, 61, 66, 67, 68, 69, 70, 71,
Massachusetts Institute of Technology, 18 72, 73, 75, 76, 77, 79, 85, 87, 94, 97, 100,
Measurement, 3, 4, 5, 7, 15, 29, 35, 36, 48, 75, 101, 102, 103, 104, 106, 107, 108, 109, 117,
77, 86, 94, 95, 96, 98, 99, 103, 139, 216, 138, 141, 147, 156, 158, 178, 179, 180, 181,
217, 219 185, 188, 189, 190, 191
Measurement model, 15, 35 Performance assessment, 10, 97, 102, 109
Memorising, 91, 107 Personal development planning, 80
Menu marking, 69 Plagiarism, 5, 7, 8, 115–130, 198, 199, 215, 217
Metacognition, 49, 135, 146 Portfolio, 21, 80, 81, 85, 88, 153, 181
Minimal marking method, 137 Post-assessment effects, 89–91
Misconduct, 122 Post-compulsory education, 154–160, 171, 206
Models, 15, 31, 32, 34, 35, 46, 194, 201, 211 Practice, 1, 2, 3, 4, 5, 6, 8, 9, 10, 13, 15, 22–24,
Monitor, 45, 48, 49, 56, 127 29–42, 46, 50, 51, 52, 56, 58, 59, 60, 65, 66,
Motivation, 8, 9, 49, 74, 86, 87, 90, 93, 104, 68, 73, 76, 79, 86, 87, 88, 92, 120, 122, 123,
106, 107, 125, 153, 155, 158–161, 163, 126, 129, 133–150, 155, 156, 161, 163, 166,
165–166, 168–170, 171, 172, 173, 180 170, 175–191, 193, 198, 199, 200, 206, 208,
Multi-criterion judgements, 57 215, 216, 218, 219–220

Pre-assessment effects, 89–91 Self-regulation, 23, 36, 82


Primary trait analysis, 46 Situated action, 32
Problem-solving, 76, 104, 180 Social constructivism, 117
Professional judgement, 1, 76, 77, 218 Social interaction, 108, 117, 177, 183, 206
Professional practice, 32, 40 Soft systems, 199–201
Purpose (of assessment), 3, 6, 14, 16, Standardised testing, 94, 216
137, 141 Standards, 1, 3, 4, 5, 6, 9, 34, 35, 39, 41,
51, 52, 55, 68, 69, 71, 72, 74, 86, 87,
89, 90, 94, 95, 96, 97, 98, 100, 105,
Q 109, 136, 140, 144, 145, 155, 189,
Quality assurance, 65, 133, 149, 197, 198, 215, 216, 217
204, 208 Standards-based (assessment), 34
Quality Assurance Agency, 198, 208 Standards for Educational and
Quasi-measurement, 1, 77, 216 Psychological Testing, 4, 96
Student Assessment and Classification
Working Group, 67
R Substantial validity, 101
Referencing, 54, 59, 68–69, 80, 121, 123, 125, Summative assessment, 34, 36, 77, 78, 79,
126, 169 80, 81, 89–91, 92, 93, 107, 154, 155
Reflection, 46, 80, 87, 95, 96, 97, 99, 101, Summative testing, 86, 107
105, 106, 135, 142, 145, 147, 149, 175, Surface approach, 19, 21, 22, 218
176, 189 Sustainable learning, 9, 153–173
Reflective writing, 134, 141, 146–149 Systemic validity, 89
Reflexivity, 36–37 Systems approaches, 199, 208
Regulation, 23, 36, 37, 82
Relativistic students, 124
Reliability, 4, 5, 8, 34, 46, 65, 86, 94, 95, T
97–99, 101, 103, 106, 108, 109, 145 Teaching-learning-assessment cycle,
Reproduction, 90, 94, 117, 118 133, 134
Research, 1, 4, 6, 9, 10, 17, 18, 19, 20, 22, Temporal reliability, 46
23, 24, 29, 35, 46, 52, 53, 71, 86, 87, 88, Test, 2, 20, 21, 22, 37, 38, 90, 94, 95,
90, 92, 93, 96, 100, 107, 118, 124, 134, 135, 96, 97, 98, 99, 103, 106, 118, 140,
136, 139, 142, 145, 147, 153, 156, 157, 158, 144, 145, 147, 150, 164, 178, 179,
176, 178, 181, 184, 186, 187, 193, 194, 195, 180, 197, 201
196, 197, 200, 202, 206, 207, 210, 215, Test bias, 90
218–219 Test blueprint, 144, 145
Rubrics, 46, 51, 52, 54, 60, 62, 134, 137, Test-driven instruction, 90
141–144, 147, 150 Transformation, 22, 122, 191, 200
Rules, 45, 51, 53, 54, 57, 62, 72, 73, 76, 116, Transmission, 156, 158
122, 123, 124, 157, 201, 202, 203 Transparency, 8, 51, 99, 100, 101, 104, 108,
109, 171, 172

S
Scholarship, 15, 176, 184, 185, 186, 189, 215 U
Scholarship Reconsidered, 185 Understanding, 1, 4, 5, 6, 9, 10, 14, 19, 23,
Scholarship of teaching, 185 39, 48, 71, 80, 103, 117, 118, 122, 123,
Selection, 29, 34, 49, 54, 79, 94, 117, 125, 129, 135, 138, 141, 146, 153,
155, 171 154–163, 165, 166, 167, 170, 177, 178,
Self-assessment, 10, 15, 23, 106, 156, 163, 179, 182, 187, 191, 193, 194, 199, 200,
166, 170, 216 201, 202, 205, 207, 208, 215, 216, 217,
Self Assessment at Alverno College: Student, 218, 219
Program, and Institutional, 187 Unseen exam, 22, 219
Self-monitoring, 48, 49, 60, 61, 62, 80, 220 Up to the Mark, 17

V W
Validity, 4, 5, 8, 45, 51, 53, 54, 55, 65, 66, 72, Warranting, 77, 78, 79
73, 78, 86, 88, 89, 91, 93, 94, 95–97, 98, 99, Watchful anticipation, 203
101, 102, 104, 105, 106, 107, 108, 109, 134, Wellesley College, 18
144, 171 Who Needs Harvard?, 175
Valuing in decision-making, 177, 183 Wicked competencies, 3, 7
Vocational, 153–173 Work-based assessment, 74, 75, 76, 154
Vocational Qualifications, 66, 154, Write Now, 116, 121
155, 167 Written assignment, 22, 24, 47, 198
