

Article in Assessment & Evaluation in Higher Education · July 2015











Author’s final manuscript. This article was published online in July 2015. It will be assigned to an issue of the journal with final page numbers in due course. Publication details: Sadler, D. R. (2015 online): Three in-course assessment reforms to improve higher education learning outcomes, Assessment & Evaluation in Higher Education, DOI:10.1080/02602938.2015.1064858

Three In-Course Assessment Reforms to Improve Higher Education Learning Outcomes

D. Royce Sadler School of Education, The University of Queensland

Abstract A current international concern is that, for too large a proportion of graduates, their higher-order cognitive and practical capabilities are below acceptable levels. The constituent courses of academic programs are the most logical sites for developing these capabilities. Contributing to patchy attainment are deficiencies in three particular aspects of assessment practice: the design and specifications of many assessment tasks; the minimum requirements for awarding a passing grade in a course and granting credit towards the degree; and the accumulation of points derived from quizzes, assessments or activities completed during the teaching period. Rethinking and reforming these would lead to improvements for significant sub-populations of students. Pursuing such a goal would also have significant positive implications for academic teachers but be contingent on favourable contextual settings including departmental and institutional priorities.

Keywords: generic skills, higher education competencies, learning outcomes, academic standards, higher education grades


This article is mainly about cognitive capabilities that are important in most academic fields: proficiency in thinking, reasoning, synthesising, conceptualising, evaluating and communicating. These ‘higher-order’ capabilities form a subset of what are also variously called ‘intended learning outcomes’ (Biggs and Tang 2011); or some combination of ‘generic’, ‘graduate’ or ‘higher education’ with ‘competencies’, ‘skills’ or ‘attributes’. With the rapid expansion of higher education worldwide, it is natural to ask about the extent to which all students can demonstrate adequate levels of such ‘higher-order’ capabilities by the time they graduate. But what is meant by ‘adequate’? This is the fundamental question. A number of agencies and commentators referenced in the next section have alleged that while many graduates do achieve desired standards, many others do not. This article is based on the premise that the most logical, direct and appropriate site for developing capabilities is within the courses that constitute degree programs. Research by Jones (2009, 2013) has demonstrated that interpretations of the competences differ from field to field, sometimes widely. This is the nature of disciplines. However, there are reasonable grounds for believing that capabilities developed thoroughly in one context (a particular course or sequence of courses) normally have a transferable element to them. This allows them to be reconfigured and repurposed for use in other contexts at other times. As Strathern (1997, 320), an anthropologist, explained it:

In making transferable skills an objective, one cannot reproduce what makes a skill work, i.e. its embeddedness… [W]hat is needed is the very ability to embed oneself in diverse contexts, but that can only be learnt one context at a time…[I]f you embed yourself in site A you are more likely, not less, to be able to embed yourself in site B. But if in Site A you are always casting around for how you might do research in B or C or D, you never learn that. There is a lesson here for disciplines. …Somehow we have to produce embedded knowledge: i.e. insights that are there for excavating later, when the context is right, but not until then … [W]e have not to block or hinder …the organism's capacity to use time for the absorption of information …time-released knowledge or delayed-reaction comprehension. [Capitalization in the original].

Reforming three particular assessment practices would increase the likelihood that more students, especially those currently at the minimum ‘pass’ level, would achieve the levels expected of all graduates. The three form a mutually interdependent package. They are: the design and specification of assessment tasks; the requirements for a Pass; and the design of course assessment programs. Wherever these are not currently being practiced as aspects of normal institutional quality assurance, they amount to reforms that require enabling changes to be made elsewhere in the learning environment.


Two widely read books by Bok (2006) and Arum and Roksa (2010) respectively describe unevenness in graduate outcomes as perceived in the USA. Bok (2006, 7-8) wrote: ‘Survey after survey of students and recent graduates shows that they are remarkably pleased with their college years’. Overall, they also ‘achieve significant gains in critical thinking, general knowledge, moral reasoning, quantitative skills, and other competencies’. At the same time and fully compatible with that, ‘colleges and universities, for all the benefits they bring, accomplish far less for their students than they should. Many seniors graduate without being able to write well enough to satisfy their employers’ (8) by expressing themselves ‘with clarity, precision,… style and grace’ (82). ‘Many cannot reason clearly or perform competently in analysing complex, nontechnical problems, even though faculties rank critical thinking as the primary goal of a college education’ (8). ‘The ability to think critically – to ask pertinent questions, recognize and define problems, identify the arguments on all sides of an issue, search for and use relevant data, and arrive in the end at carefully reasoned judgments – is the indispensable means of making effective use of information’


Here, Bok has raised quite specific concerns. They may be valid to a greater or lesser extent for particular institutions, academic degree programs or component courses; there is usually no independent way of telling. However, his portrayal of the situation in the USA resonates with similar concerns raised in other countries. These are reflected in the number of national and international discussions, policies, projects, regulations, instruments and forms of cooperation aimed at assuring graduate outcomes (Australian Learning and Teaching Council 2010; Bergan and Damian 2010; Lewis 2010; Williams 2010; Douglass, Thomson and Zhao 2012; Blömeke, Zlatkin-Troitschanskaia, Kuhn and Fege 2013; Dill and Beerkens 2013; Sadler 2013b; Shavelson 2010, 2013; Tremblay 2013; Coates 2014). Part of the overall unease is because, globally, higher education has expanded rapidly without matching increases in public funding directed specifically towards teaching. Despite what may seem an overwhelming challenge, progress could be made by ensuring that the course grades entered on students’ academic transcripts can be trusted to represent adequate levels of the expected graduate competencies. Across a full degree program, the transcript reports student performance on a large range of demanding tasks, in a wide variety of courses, studied over a considerable period of time, and covering substantial disciplinary and professional territory. Specialised tests of graduate competencies are not set up to do this (Shavelson 2013). If third parties are to draw reasonably robust conclusions about a graduate’s acquired overall capability or competence, the grades on transcripts must be trustworthy.

Reform 1: Assessment task design and specifications

Grading a student’s performance involves drawing an inference from what the student produces. The quality of the inference depends on several factors, two obvious ones being the quality of the data (the student production) and the ability of the assessor. The quality of the data is the focus here. Ordinarily, students respond to assessment items. An ambiguous item is unlikely to give rise to good quality data because different students will most likely interpret the item differently. This is why the stimulus needs to be both well designed and clearly specified. It must set up a fresh problem to be solved, a question to be answered, an issue to be addressed, or a position to be critiqued or defended. Students may or may not be able to do the task well, but at least there is no excuse for not knowing the type of response required.

To make this concrete, consider this poor assessment task: ‘Write an essay on directive and supportive leadership styles’. Any student who simply writes separate detailed accounts of the two leadership styles technically fulfils the requirements of the task description. However, high performing students delve deeply into a topic as a matter of course. They may, for example, describe the two leadership styles briefly but then go on to analyse similarities, differences, and the superior fit of one of the styles for a particular purpose. (Many other possibilities exist.) These students’ comprehensive understanding of both the topic itself and the assessment context leads them to be analytical rather than descriptive, and high marks typically follow.

Regardless of the actual form of the assessment task specifications, examiners and markers find that student responses generally range from low to high quality for any reasonably sized student group. In some examiners’ eyes, this range would be sufficient evidence to conclude that the structure and content of the assessment task is unproblematic, thus reinforcing the status quo. That reasoning is faulty.

An example of better design for the leadership styles task would be to set up some scenario involving two particular types of organisations (say, a voluntary association and a business employing mostly casual staff). Ask students to explain which leadership style, or which aspects of each, might suit the two organisations. Making the intention clear in this way makes separate descriptions unnecessary, because how well students know the two styles will be evident in their responses. The improved design also makes it reasonable to hold all students to the task requirements. This is an important consideration if the evidence of achievement is not to be compromised by poor item structure. ‘Poor quality evidence of a student’s achievement must not be confused with evidence of poor achievement’ (Sadler 2014b, 286).

In general, tasks need to stimulate higher-order thought processes such as: hypothesising; extrapolating (or interpolating); exploring and articulating relationships among things; estimating the likely effects of varying the parameters of a system; redesigning something to suit a new purpose; using analogues as explanatory tools; outlining and defending a scenario; and evaluating inadequacies or errors in solutions or arguments. Given the huge variety of expected outcomes in different disciplines, fields and professions, academics in those fields are best placed to determine the nature of well-formed questions that push the students into the right amount of unfamiliar territory.

Ideally, the task specifications identify for all students the genre of response required. Critical reviews, arguments, underlying assumptions that have to be identified, and causal explanations are all distinct response genres (Sadler 2014a). This does not mean that students should be given copious instructions on how to go about the task or detailed rubrics and statements of criteria and standards of the type often recommended (Grunert O'Brien, Millis and Cohen 2008). It means they have the right to know the genre for their response. It is both illogical and counterproductive to appraise the quality of a student work as a member of a particular genre if the work is not actually a member of that genre. (The concept of ‘response genre’ is not identical with ‘writing genre’ as Gardner and Nesi [2013] use the term in connection with teaching academic writing to students.)

Creating demanding assessment tasks from scratch is hard work if the tasks are to tap into higher-order operations on ideas and information. A straightforward way to proceed is to collect a broad range of existing tasks that require students to construct responses of considerable length. Sources include previous assignment tasks, project descriptions and examinations in the field. Similar material from related discipline fields may also prove useful for ideas. Academics, individually or in groups, generally can, without special tuition or much difficulty, scrutinise the materials, broaden their own insights, and differentiate them according to quality. In so doing, they expand their own understanding of the possibilities and can decide which to avoid, emulate or adapt to suit their own context and purpose. They can also imagine themselves as students faced with responding to particular task specifications, trying to figure out how they would proceed. Potential sources also include real-life problems in the relevant field. These may be of special value in assessing graduate capability late in a degree program. Although it may not be feasible to deal with the complexities of the full problem in its context, doing away with unnecessary detail has to be balanced against the cost of providing students with experience in deciding for themselves what is necessary and what may be safely discarded to make the problem amenable to solution (Taylor 1974).

Iterative improvement of task specifications

Professional test developers routinely engage in revising task designs and specifications in the light of experience. In higher education, a simple but revealing check on whether an assessment task actually requires higher-order thinking and production is to ask one or two competent others. They should interpret the wording of the specifications literally and either indicate the absolute minimum that could be done to complete the task, or better still, actually attempt the task itself. A more thorough check is to compare task specifications with actual student works or performances and analyse how students responded to the tasks. This process is passive and distinctly different from that used to score responses. What is sought is at least a partial diagnosis of any deficiencies in the task design or specifications. Where at least some responses technically do fall within a literal interpretation but are much simpler than was intended, it may not have been imagined that such interpretations would be possible. At the opposite extreme is a response that really 'captures' what was intuitively hoped for but not fully conceptualised when the specifications were written. Capitalise on that for the future. The final check is to consult students themselves (Hounsell 1987) rather than try to infer what they ‘must have been thinking’ as they went about the task. This is the only independent source that can confirm or disconfirm their understanding or reactions (Alderson 1986). What went on in their heads while they were working out how to respond to the task, and then during the planning and production phases? Were they surprised by how the quality of their work was appraised?

Intended learning outcomes

To digress briefly, it might be thought at this point that assessment task design should start with statements of course objectives, graduate capabilities or intended learning outcomes. Biggs and Tang (2011) recommended this as foundational to achieving what they termed constructive alignment. The treatment above, however, began directly with consideration of assessment task design and specification. The rationale for this is as follows. Statements of objectives typically use abstract terminology to frame higher-order cognitive competencies such as ‘critical analysis’, ‘problem solving’ and the like. These terms are open to wide interpretation (Weissberg 2013), and adding more words cannot solve the problem. The explanation can be found in Sadler’s (2014b) parallel argument about the impossibility of expressing academic achievement standards in verbal or other codified form. The same reasoning applies to learning outcomes. The key terms in the language used cannot be interpreted unambiguously. They ‘float’ according to context because they have imagined rather than concrete referents. On the other hand, assessment tasks and specifications are material formulations that can be exhibited, argued about and administered. They provide the sharpest and most direct tool available for discussing, clarifying and communicating course intentions for students and academics alike.

Reform 2: Grading at the Pass level

Many of the objects, products and processes used in everyday life have their quality governed by external standards that are set by some recognised authority and sharply discriminate pass from fail. Independent licensed testing agencies apply these standards using calibrated testing procedures. No corresponding infrastructure exists for marking, grading and reporting course-based student achievement in higher education. Exactly what ‘standards’ and ‘comparability’ mean receives relatively little attention. Yet markers constantly need to make sound judgments about the quality of work in order to infer underlying competence or capability. A central issue is where to pitch the course grade boundaries. An especially important one is the lower boundary for a Passing grade, because that usually determines whether credit will be granted towards the degree. Where should that lower boundary be set so that when all courses are taken together, the result satisfies discipline-based expectations, professional accreditation requirements and the capabilities society expects of all higher education graduates? In ordinary conversation, something is said to be ‘passable’ when it is adequate or satisfactory for the purpose. The speaker initially assumes – and hearers could clarify if they need to – what ‘adequacy’ means in the context. Sometimes, tone of voice can indicate that the requirements must technically be met, even if only just. The following has been distilled from several existing definitions:

Pass (v): to demonstrate attainment, achievement or proficiency at or exceeding a level accepted as satisfactory but not necessarily of the highest level; to satisfy fully a minimum agreed performance requirement; to show sufficiency or adequacy to purpose; to meet expectations, conform to specifications or reach some fixed and approved standard.

How do institutions conceptualise what should count as a pass? Some clues can be found in their published grade descriptors, where these exist, although the statements may not necessarily correspond closely with actual grading decisions. Consider the five statements in Table 1 outlining what a Pass represents in five different institutions, all obtained from their web sites. All use the word ‘Pass’ explicitly as a grade label or refer to a ‘pass’ in associated documentation. In some cases, conditions apply. For example, the number of courses that can be passed at the minimum level and also credited towards a degree may be strictly limited. In these statements, expectations range from concessions to students who stay the full length of courses but may actually learn very little through to notionally adequate levels of capability. Also in there can be found open tolerance of low levels of performance on higher-order objectives (specifically, the ability to make sound judgments, act independently, engage in analysis, and communicate clearly) and specific endorsement of participation in class towards course credit. Participation is not strictly an element of ‘achievement’ or ‘competence’ at all. Taken together, these grade descriptors send mixed messages about what it means to ‘pass’.

Table 1 Five grade descriptors for the lowest level of achievement in a course for which credit can be counted towards the degree. Conditions may apply.



|   | Grade* | Descriptor |
|---|--------|------------|
| 1 | 50-59 | Satisfactory. Demonstrates appreciation of subject matter and issues. Addresses most of the assessment criteria adequately but may lack in depth and breadth. Often work of this grade demonstrates only basic comprehension or competency. Work of this grade may be poorly structured and presented. (Monash University). |
| 2 | D (D+, D, D-) | Earned by work that is unsatisfactory but indicates some minimal command of the course materials and some minimal participation in class activities that is worthy of course credit toward the degree. (Harvard University, College of Arts and Sciences). |
| 3 | 40-49; 3rd Pass | Acceptable attainment of most intended learning outcomes, displaying a qualified familiarity with a minimally sufficient range of relevant materials, and a grasp of the analytical issues and concepts which is generally reasonable, albeit insecure. (University of Stirling). |
| 4 | E | Sufficient: A performance that meets the minimum criteria, but no more. The candidate demonstrates a very limited degree of judgement and independent thinking. (University of Oslo). |
| 5 | D | Deficient in mastery of course material; originality, creativity, or both apparently absent from performance; deficient performance in analysis, synthesis, and critical expression, oral or written; ability to work independently deficient. (Dartmouth College). |

* Grade code as entered on academic transcript.

Although these formal grade descriptors indicate particular orientations, the definitive measure of the adequacy of an institution’s standards is whether the lowest-performing students who gain credit for a course achieve higher-order objectives to a sufficient degree. In the case of written responses, that includes the quality of writing. This can be determined only by scrutinizing student responses to well-constructed assessment tasks. If a grade of D is officially the lowest on the credit-earning scale but all students gain at least a B-, the salient issue is whether the work awarded a B- deserves credit in terms of higher-order outcomes. At the upper end of the grade scale, the issue is whether all students who gain the highest available grade really do demonstrate excellence or a high level of distinction. This is not the end of the story, and key questions still need to be asked: What is meant and implied by ‘acceptable standards’ or ‘to a sufficient degree’? How can appropriate standards be set collaboratively so as to reflect a broad consensus? What is required to give course grades integrity and currency across courses, programs and institutions? How may standards be given material form so they can remain stable reference points over time? These have been at least partially addressed both theoretically (Sadler 2013a, 2014b) and in field trials (Watty et al. 2014).

Reform 3: Redesigning course assessment plans

This Reform is about the timing, purpose and structure of assessment during and at the end of a course. Specific aspects are the practice of combining marks awarded during a course with those awarded at the end; ensuring that assessment during a course functions formatively; and changing the parameters of summative assessment for grading.

Accumulation of marks

In theory, a course grade is meant to represent a student’s level of capability attained by the end of a course. ‘[G]rading… is the assignment of a letter or number to indicate the level of mastery the student has attained at the end of a course of study’ (Schrag 2001, 63). It is literally the out-come that goes on record. This is entirely consistent with the customary (and legitimate) way of expressing intended learning outcomes: ‘By the end of this course, students should: …’. Whether the actual path of learning is smooth or bumpy, and regardless of the effort the student has (or has not) put in, only the final achievement status should matter in determining the course grade (Sadler 2009, 2010b). However, in many higher education institutions, accumulating marks or points for work assessed during a period of learning (continuous assessment) is the prevailing practice, mandated or at least endorsed by the institution. Readily available software provides bookkeeping tools for it. These make it easy to progressively ‘bank’ marks, then weight and process them at the point of withdrawal for conversion into the course grade. The common arguments for accumulation are essentially instrumentalist (Isaksson 2008). The purpose is not so much to help learners attain adequate levels of complex knowledge and skills by the end of a course as to keep them working and provide multiple opportunities for feedback. In any case, so it is argued, students need, expect, appreciate and thrive under continuous assessment (Trotter 2006; Hernández 2012). However, notwithstanding its superficial appeal, accumulation actually diverts attention from the goal of achieving a satisfactory level by course end. First, accumulating performance measures during the learning period maps the shape of the actual learning path into the grade (Sadler 2010b). In general, the context and actions of both teacher and learner influence the rate and depth at which learning occurs. For many students, coming to grips with and then overcoming false starts, errors, bumbling attempts and time spent going up blind alleys lead to deep understandings by the end of a course. Students can take bold risks that end in disasters and safely ‘make’ conceptual connections that later have to be unlearned. For well over a century, the role of spacing during the total time available for developing high-order knowledge and skills has been extensively studied. This research provides robust findings on how humans learn (Ebbinghaus 1885; Bloom 1974; Conway, Cohen and Stanhope 1992; Rohrer and Pashler 2010; Budé et al. 2011). This is ‘especially marked in sequential learning in which competence is attained only after a series of learning experiences that may take months or years to complete before the learner has developed a satisfactory degree of attainment in the field’ (Bloom 1974, 682). Second, accumulation lends itself to awarding and banking marks for a variety of non-achievements for the purpose of influencing student behaviour. Marks are used to incentivise and reward student effort, engagement in preferred activities, completion of exercises or work stages, and participation. These behaviours and activities may well assist learning, but they do not constitute the final level of achievement, or even part of it.
On the debit side of the ledger, marks may be deducted to penalise late submission, cheating or plagiarism. The cost of using marks to modify behaviour is contamination of the grade. Other ways have to be found. Quite apart from behaviour management, many students insist they have a moral right for aspects other than unadulterated achievement to be included in their grades (Zinn et al. 2011; Tippin, Lafreniere and Page 2012). Overall, the banking model takes data from non-achievement contaminants, early ‘deficits’, and idiosyncratic paths of learning and mixes them all into the final grade. The grade is then logically impossible to disentangle and hence interpret (Brookhart 1991; Sadler 2010b). Equally serious is that no coherent concept of a ‘standard’ can apply to such a mishmash of data. Finally, although accumulating marks may succeed in motivating and focusing student effort, the pressure and drive typically ease off once the ledger balance approaches the Pass score cut-off. This allows students to sidestep the challenge of gaining a command over the course as a whole, especially its higher-order objectives. Put another way, accumulation invites students to valorise externally offered proximate goals at the very time that the eventual goal should be kept front and centre in their minds. A person’s perspective on the fullness of the eventual goal to be achieved, or the central purpose to be served, can play a determinative role in how they approach and manage their own learning, and the task of becoming competent (Sommers 1980; Entwistle 1995; Sadler 2014a). A steady stream of extrinsic rewards is a poor substitute for developed intrinsic rewards where students take primary responsibility for their own learning. Extrinsic rewards work directly against the students-as-learners maturation process in which they progress towards becoming independent, self-directed, lifelong learners.

Designing during-course formative assessment

When the drag imposed by accumulating low or irrelevant marks is eliminated, during-course tests and assessments are freed up to function purely formatively. ‘Purely’ indicates high stakes for learning but zero influence on the summative grade. True, in a broader context, students may use information about end-of-course performance in one course to improve their performance in subsequent courses, but that is a different issue. Within a single course, formative and summative assessments need to be clearly separated so that they can serve their respective purposes. Given a set of course objectives, formative assessment is commonly viewed narrowly as giving students assessment tasks and then feedback so that they can improve (Nicol and Macfarlane-Dick 2006). Despite all the effort typically invested in creating better and better feedback, it too often makes practically no difference. Sadler (2010a, 2013c) argued that the principal reason is that feedback is basically about telling students: the transmission model of teaching transposed into an assessment setting. The alternative is to offer students formative assessment opportunities that provide authentic evaluative experience of the type they need in order to become better able to recognise, monitor and control the quality of works they themselves are to produce. This matter is discussed at length in Sadler (1989, 2010a, 2013c, 2014a). Students need to be exposed to a variety of complex tasks and their corresponding response genres. This immerses them in decision spaces that are similar to those inhabited by marker-teachers in which judgments are made about whether a particular work falls within the required response genre, and if so, the macro-level and micro-level determinants of its quality. Students need to become competent not only in making judgments about their own works but also in defending those judgments and figuring out how those works could have been made better.
This involves learning to notice aspects that make a difference to quality, and to pass over those that make only negligible difference. In other words, they need practice in appraising works holistically, so that they come to understand how the appropriate use of smaller-scale tactics enables a larger-scale purpose to be accomplished. ‘This type of “seeing” typically goes unrecognized in most of the research on assessment for learning, where the focus has been on feedback’ (Sadler 2013c, 58). Part of the agenda is specifically and deliberately to induct students into appreciating the types and ranges of problems, issues or questions that could legitimately be set as assessment tasks in the course. Multiple assessment tasks that demand complex cognitive and other capabilities serve multiple purposes that include conveying the intended learning outcomes for the course and equipping students for summative assessment at the end of the course. Students need to be challenged with problems that develop, activate and coordinate the same cognitive processes and professional skills they will need as graduates. By definition, capability implies the freedom, versatility and adaptability to tackle successfully problems that have not been delineated or anticipated in advance, and to do so on demand, unaided and to a satisfactory standard. There are not just a handful of stereotypic problems or types of tasks that characterise the course but a wide range of possibilities that entail diverse cognitive and practical skills in different combinations. Research by Entwistle (1995) showed that the best preparation for course examinations comes about only by having a thorough grasp of the whole course. Sound assessment plans, tasks and specifications are crucial to this. The choice of assessment task format is an important meta-parameter in the design of course assessment programs, both formative and summative. Extensive use of multiple choice tests reduces if not eliminates altogether the number of written prose responses, and with it a valuable opportunity to develop competence in discipline-focused writing. Creating precise and cogent prose promotes high-level learning primarily because it requires ‘careful, probing thought’ (Bok 2006, 103). In her classic 1977 article, Emig wrote that ‘Clear writing by definition is that writing which signals without ambiguity the nature of conceptual relationships, whether they be coordinate, subordinate, superordinate, causal, or something other’ (126). In Sternglass’s research, students repeatedly reported that: ‘Only through writing [papers of a type that]…required them to integrate theory with evidence did they achieve the insights that moved them to complex reasoning about the topic under consideration’ (1997, 295). Bok (2006), Zorn (2013) and many others have argued that the best site for developing good writing is within the disciplines themselves, not separately as a specialist activity.

Re-inventing end-of-course summative assessment

The plan for summative assessment in a course amounts to more than just ensuring that each assessment task is well designed and specified. A basic tenet of assessment is that the evidence of academic achievement should be unquestionably the student’s own work. Common threats to the integrity of achievement data include cheating, collusion, plagiarism, outsourcing term papers and using substitute test takers. Increasingly sophisticated digital technologies and telecommunications have contributed to the problem (Park 2003; Walker 2010). The traditional way of satisfying the secure data requirement has been to use previously unseen assessment tasks in invigilated, time-restricted written examinations. These assessment formats typically fail to take advantage of the technologies and tools of production currently used in most workplaces: keyboard input, office productivity software, the internet and web searching. Exploring ways to address this is an active area of research. Williams (2006) and Williams and Wong (2009), for instance, have trialled forms of open-book, open-web assessments. Composing text and manipulating data using a keyboard (rather than pen and paper) are now so common that these tools should be readily available for candidates. Editing of text, in particular, has the potential to improve learning during testing. Sommers (1980) highlighted the special role that editing and revising one’s work can play in creating and clarifying meaning:

[E]xperienced writers seek to discover (to create) meaning in the engagement with their writing, in revision. They seek to emphasize and exploit the lack of clarity, the differences of meaning, the dissonance, that writing as opposed to speech allows in the possibility of revision. Writing has spatial and temporal features not apparent in speech – words are recorded in space and fixed in time – which is why writing is susceptible to reordering and later addition. Such features make possible the dissonance that both provokes revision and promises, from itself, new meaning. (386)

Additional concerns about traditional examinations have their roots in typical examination conditions. Students often experience considerable stress because of both the strict time limits and the ‘summary’ nature of high stakes, make-or-break events. In some cases, medical researchers have explored coping strategies and the possible use of medication (Edwards and Trimble 1992). Removing or relaxing problematic examination conditions could well include making time limits generous (within reasonable limits) and allowing review time and re-examination (with an accompanying fee if necessary). If it is objected that all students in a course should perform under identical conditions, the reply is straightforward. Students with special needs typically have accommodations made for them, but within any course, some students may be just below the threshold at which special accommodations would apply. In addition, the quality of a student’s response as appraised against standards rather than against other students’ work is a clearer indicator of their capability than the speed of task completion.

This section is concluded with two observations that apply regardless of the mode or medium of response: efficiency and sampling. An efficient plan results in high levels of valid achievement information relative to the costs of getting it, including time in setting and marking student work, and administrative overheads. Appropriate sampling involves coverage across both the course subject matter (a preoccupation with many examiners) and the range of relevant intended higher-order outcomes. These two together are somewhat analogous to evaluating the economic potential of a mineral deposit by drilling a series of cores into a prospective ore body to test its lateral extent and its richness (Whateley and Scott 2006). Emphasising depth in thinking and precision in expression may well result in higher quality but more condensed outputs.

Implications for students

In many higher education institutions, reforms along the lines sketched out above would shift a significant measure of responsibility from the educational environment (teacher, program director, resources, technology and the institution) to the students themselves. For students to rise to the challenge of passing each course without concessions of any kind, they have to set their priorities, and coordinate the resources over which they have control (prior knowledge, personal talent, effort and time) in order to gain credit towards their degrees. For that, they need a clear sense of ‘future-mindedness’ or ‘prospection’ (Osman 2014; Seligman et al. 2013). This makes sense for the student only when the socially constructed context or ‘order’ in which they will live when they finish their degrees is sufficiently stable or secure for them to know all the effort will have been worthwhile (Schatzki 2001). In the short term, this involves attending to course achievement goals as they come, and for each, a sense of agency over personal performance.

Goal setting

Extensive research over several decades in a wide variety of field and laboratory settings has investigated the impact that so-called hard goals have on task performance. Progressive reviews of this work are available in Locke et al. (1981), Locke et al. (1990), and the first and last chapters of Locke and Latham (2013). Hard goals are specific and clear rather than general or vague, difficult and challenging rather than simple or easy, and closer to the upper limit of a person’s capacity to perform than was their initial level of performance. Goals that require students to stretch for them generally lead to substantial gains in performance. They act to focus attention, mobilise effort and increase persistence at a task. In contrast, do-your-best goals often fare little better than having no goals at all. As one would expect, the degree of improvement is moderated by other factors, including the complexity of the task, the learner’s ability, the strategies employed and various contextual constraints (Locke et al. 1981). However, the general conclusion is that ‘…an individual [cannot] thrive without goals to provide a sense of purpose… If the purpose is neither clear nor challenging, very little gets accomplished’ (Locke and Latham 2009, 22).

Arranging the learning environment so that all students have an adequate grasp of the higher-order outcomes stated in course outlines is a clear imperative for universities and colleges. Setting standards that some students initially see as tough (and possibly even unfair or coercive, depending on their initial expectations) is part of that. Serious students adapt pragmatically to hard constraints provided the settings are known, fair and relevant. The consequences of a hard-earned Pass are highly positive in terms of both credit towards the degree and personal sense of accomplishment. Carried out ethically, hard goals work constructively for the student in both the short and the long term (Sadler 2014b).

Student sense of agency

The nub of student agency is the belief that, in the matter of passing courses and gaining credit, one is significantly responsible for one’s own learning and achievement. This is captured nicely by Pacherie (2008, 195), who drew a distinction between a long-term and an ‘occurrent’ sense of agency. The long-term sense is ‘a sense of oneself as an agent apart from any particular action … a form of self-narrative where one’s past actions and projected future actions are given a general coherence and unified through a set of overarching goals, motivations, projects and general lines of conduct’. The occurrent sense is that which ‘one experiences at the time one is preparing or performing a particular action’. Pacherie was not writing specifically about higher education, but Lindgren and McDaniel (2012) were: ‘Agency can shape both the process and the outcomes of student learning… People are more driven to achieve the agendas they set for themselves’ (346). The three reforms outlined above, including formative assessment implemented according to sound principles, can contribute to the growth of student agency in learning not only by imposing rigorous but reasonable conditions for students to succeed but also by providing effective developmental support.

Being vividly aware that one is in control of one’s actions brings with it a personal sense of responsibility. Frith (2014) summarised an ancient Hellenistic perspective on this, which in essence is that one’s sense of agency is developed through two factors. The first is the cognitive binding that links one’s intentional action (say, considerable effort) to its outcome (Passing the course). The second is the belief that an alternative action (investing little or no effort) would have led to a different outcome (Fail) accompanied by an experience of regret. The second part of this is known as ‘counterfactual reasoning’ because although it is valid to think this way, it is essentially hypothetical, being contrary to what actually happened (Roese 1997). If the likelihood of failure in a course is low or non-existent, the sense of agency is weakened or disappears altogether.

For students to gain clarity on a complex course-based achievement goal (something radically different from trying to improve by, say, one grade), they must understand what high-level achievement looks like and experience for themselves what reaching it entails. Overall, students need to see and appreciate the purpose to be served, experience success in moving towards its attainment, and be motivated, with grit and determination, to follow through to completion. Genuine achievement for which a student works hard and produces a high quality result brings about levels of fulfilment and confidence that come only from possessing deep and thorough knowledge of some body of worthwhile material or attaining proficiency in high-level professional skills. The terms pleasure, satisfaction, motivation and accomplishment have many nuanced and overlapping meanings, but there is little doubt about the legitimacy of ‘pleasure as a by-product of successful striving’ (Duncker 1941, 391).
This is categorically different from, in the modern context, having satisfying experiences in the classroom (although the two may co-occur) or experiencing success in winning against others. For some students more than others, developing this type of personal capital demands substantial striving and struggling and induces considerable stress. However, little by way of significant and enduring learning comes cheaply, and experiencing success at something that was originally thought to be out of reach brings a distinctive personal reward, a palpable sense of accomplishment. Not to insist on a demonstration of an adequate level of higher-order capabilities is to deprive students of both an important stimulus to achieve and the satisfaction of reaching a significant goal.

Inhibitors of change

Some inhibitors are conceptual in nature. One of these consists of the multiple meanings attached to the term ‘standard’. Add to that a limited awareness of the need for externally validated anchorage points for standards generally and Passing grades in particular (Sadler 2011, 2013a). Others have to do with assessment practices that detract from the integrity of course marks and grades. Some have been criticized in the literature for decades (Oppenheim, Jahoda and James 1967; Elton 2004; Sadler 2009), but they are now so deeply embedded in assessment cultures they are resistant to change. In addition, new practices keep coming along and are added incrementally. Accepted uncritically, these often become popular through being labelled as ‘innovative’ or ‘best practice’. They are defended strongly by academic teachers, students and administrators and may even be mandated in institutional assessment policies. Accumulating marks is but one example. The fact that they reduce the integrity of course grades goes largely unheralded.

Whether hard goals are actually set and enforced depends on a variety of other factors as well, some of which are related to the grading dispositions of individual academics. At successively higher levels in the chain of authority, the freedom of academics to make significant changes depends on: an enabling and supportive context provided by academic department heads and program directors; the fixedness of the prevailing assessment traditions, grading policies and academic priorities; and requirements externally set by governments or accrediting agencies.

Internal momentum and culture

Recent trends in higher education assessment practice have included: minimising the proportion of achievement data that is secure; allowing lower-order outcomes to be substituted for higher-order as the minimum for course credit; programming the learning path so it is presented in small manageable self-contained steps to facilitate smooth, painless progress from step to step; and markers reading between the lines of poorly composed written work accompanied by making generous inferences as to the students’ level of understanding. The underlying drive is to ensure the least possible discomfort or stress for students (Fiamengo 2013). At the institutional policy level lie: curriculum freedom that allows students the flexibility to pick and choose courses from a wide range to make up a substantial part of a degree program; and credit transfer policies and recognition of prior learning that impose few restrictions. These decrease the effectiveness of coherent sequences of courses specifically designed to promote development of higher-order outcomes, which require considerable time and multiple encounters to mature properly. Institutional factors are also influenced by financial considerations, particularly continuity in total income from student fees and government funding. In principle, assuring the quality of all graduates, maintaining student entry levels and ensuring a satisfactory enrolments-based income stream are not incompatible. However, in practice, academic achievement standards can be compromised to avoid rebalancing internal resource allocations to prioritise teaching.

At the scale of individual academics, the following statements all have their origins in conversations with academics in universities in different countries. They reveal a range of problematic dispositions and attitudes related to passing courses. ‘Many students are low-entry or from disadvantaged backgrounds.
They have limited ability to achieve well on higher-order outcome measures, but they nevertheless benefit greatly from the experience of higher education.’ ‘Students who put in substantial effort no doubt learn something of importance in the process and therefore deserve to pass.’ ‘While it is disappointing when students submit low-level work, there is no guarantee those students would gain employment directly in the fields of their degrees anyway.’ ‘Students who fail courses suffer adverse personal and social consequences, such as loss of face, additional fees and delay to graduate earnings, so avoiding failing grades is important.’ ‘When students have to pay substantial fees, they expect to pass and in any case would appeal against failure.’ ‘All grade results are reviewed by the Assessment Review Committee and, with very few exceptions, approved without amendment.’ ‘Consistent with the principle of academic freedom, professors must be free to decide, according to their own professional judgments, the grades to be assigned.’ ‘Creative ways are found for students to earn enough marks for them to at least pass, with scaffolding and active coaching there to help.’ ‘Students these days need a qualification even if it means they are not truly qualified at the end. In any case, graduates learn most of what they need to know after graduation.’ ‘Cutting out cumulative assessment and instead, grading according to serious standards would produce high failure rates and consequential loss of income. The institution would not tolerate that’. Finally, ‘I know I am generous in grading, but I need to keep my teaching evaluation scores up so I can look forward to tenure’. Whether there is a causal link between grades and teaching evaluations is debated, but ‘[r]egardless of the true relationship between grades and teaching evaluations, the fact that many instructors perceive a positive correlation between assigned grades and student evaluations of teaching has important consequences when there also exists a perception that student course evaluations play a prominent role in promotion, salary, and tenure decisions’ (Johnson 2003, 49). Most of these comments amount to admissions that things as they exist may not be as they ought to be, but by implication, not much can be done about it. Addressing inflated pass rates at their source by raising actual achievement levels is the only valid means of ensuring grade integrity.
No amount of tinkering with other variables, and no configuration of proxy measurements, will make the difference required.


In recent decades, the focus for evaluating teaching quality has been heavily weighted towards inputs (student entry levels, participation rates, facilities, resources and support services) and a select group of outcomes (degree completions, employability, starting salaries and student satisfaction, experience or engagement). Conspicuously absent is anything to do with actual academic achievement in courses. This has allowed a number of sub-optimal assessment practices to become normalised into assessment cultures. One of the consequences is that too many students have been able to graduate without the capabilities expected of graduates, yet this is not necessarily apparent from their transcripts. The focus in this article has been on student outcomes rather than inputs, with particular emphasis on the higher-order capabilities of students. Many students fail to master these, yet they gain credit in course after course and eventually graduate. Directly addressing the deficient aspects of assessment culture and practice could radically alter this state of affairs, but it would require a transformation in thinking and practice on the part of many academics. The ultimate aim is to ensure that all students accept a significant proportion of the responsibility for achieving adequate levels of higher-order outcomes. Bluntly put, no student would be awarded a pass in a course without being able to demonstrate these levels.

For some students, this would necessitate a major change in their priorities. For academics, both their assessment practices and the nature of the student-teacher relationship would change. Undoubtedly, determination to pursue this end would have significant washback effects on teaching, learning, and course and program objectives, but that is intended. The likelihood of success depends on finding a rational, ethical and affordable way to do it. This may require re-engineering some parts of the transition path, creating other parts from scratch, and reworking priorities, policies, and practices to a considerable extent. In particular, it would entail rebalancing institutional resource allocations in order to cater for student cohorts that have become much more diversified. Except for aims geared narrowly to economic and employment considerations, this goal is broadly consistent with traditional and many recent statements of the real purposes of higher education.

Sources for grade descriptors in Table 1

1. Monash University, Assessment in Coursework Units Policy: Grade Descriptors; Course label Pass P (above Credit C; below Fail, F). Accessed 26-May-2015.


2. Harvard University, College of Arts and Sciences; The Grading System. Grade label D (above C; below E). Course Requirements for the Degree: All candidates for the Bachelor of Arts or the Bachelor of Science degree must pass 16.0 full courses and receive letter grades of C– or higher in at least 10.5 of them (at least 12.0 to be eligible for a degree with honors). Additional note: ‘Grades of D+ through D– are passing but unsatisfactory grades.’ Accessed 26-May-2015.


3. University of Stirling, University Common Marking Scheme; Grade label 3rd Pass (above 2.2 Pass; below: Fail-Marginal). Accessed 26-May-2015.

4. University of Oslo Grading system: Grading scale with letter values. Grade label: E Sufficient (above D Satisfactory; below F Fail). Accessed 26-May-2015.

5. Dartmouth College. Grade descriptions; Grade label: D (above C; below E); Credit eligibility: Requirements for the Degree of Bachelor of Arts. II. A student must pass thirty-five courses… No more than eight courses passed with the grade of D … may be counted toward the thirty-five courses required for graduation. Accessed 26-May-2015.


Alderson, J. C. 1986. “Innovations in Language Testing?” In Innovations in Language Testing: Proceedings of the IUS/NFER Conference, edited by M. Portal, 93-105. Windsor, Berkshire: NFER-Nelson.

Arum, R., and J. Roksa. 2010. Academically Adrift: Limited Learning on College Campuses. Chicago: University of Chicago Press.

Australian Learning and Teaching Council. 2010. Learning and Teaching Academic Standards Project Final Report. Strawberry Hills, NSW: Australian Learning and Teaching Council.

Bergan, S., and R. Damian, eds. 2010. Higher Education for Modern Societies: Competences and Values. Higher Education Series No. 15. Strasbourg: Council of Europe Publishing.

Biggs, J. B., and C. Tang. 2011. Teaching for Quality Learning at University: What the Student Does. 4th ed. Maidenhead, UK: McGraw-Hill/Society for Research into Higher Education/Open University Press.

Blömeke, S., O. Zlatkin-Troitschanskaia, C. Kuhn, and J. Fege, eds. 2013. Modeling and Measuring Competencies in Higher Education: Tasks and Challenges. Rotterdam: Sense Publishers.

Bloom, B. S. 1974. “Time and Learning.” American Psychologist 29 (9): 682-688.


Locke, E. A., G. P. Latham, K. J. Smith, and R. E. Wood. 1990. A Theory of Goal Setting and Task Performance. Englewood Cliffs, NJ: Prentice Hall.

Locke, E. A., and G. P. Latham. 2009. “Has Goal Setting Gone Wild, or Have its Attackers Abandoned Good Scholarship?” Academy of Management Perspectives 23 (1): 17-23.

Rohrer, D., and H. Pashler. 2010. “Recent Research on Human Learning Challenges Conventional Instructional Strategies.” Educational Researcher 39 (5): 406-412.


Sadler, D. R. 2010b. “Fidelity as a Precondition for Integrity in Grading Academic Achievement.” Assessment & Evaluation in Higher Education 35 (6): 727-743.


Sadler, D. R. 2013c. “Opening up Feedback: Teaching Learners to See.” In Reconceptualising Feedback in Higher Education: Developing Dialogue with Students, edited by S. Merry, M. Price, D. Carless, and M. Taras, 54-63. London: Routledge.

Sadler, D. R. 2014a. “Learning from Assessment Events: The Role of Goal Knowledge.” In Advances and Innovations in University Assessment and Feedback, edited by C. Kreber, C. Anderson, N. Entwistle, and J. McArthur, 152-172. Edinburgh: Edinburgh University Press.

Sadler, D. R. 2014b. “The Futility of Attempting to Codify Academic Achievement Standards.” Higher Education 67 (3): 273-288. doi:10.1007/s10734-013-9649-1.

Schatzki, T. R. 2001. “Practice Mind-ed Orders.” In The Practice Turn in Contemporary Theory, edited by T. R. Schatzki, K. K. Cetina, and E. von Savigny, 50-63. London:

Watty, K., M. Freeman, B. Howieson, P. Hancock, B. O’Connell, P. de Lange, and A. Abraham. 2014. “Social Moderation, Assessment and Assuring Standards for Accounting Graduates.” Assessment & Evaluation in Higher Education 39 (4): 461-478. doi:10.1080/02602938.2013.848336.

Weissberg, R. 2013. “Critically Thinking about Critical Thinking.” Academic Questions 26 (3): 317-328. doi:10.1007/s12129-013-9375-2.

Whateley, M. K. G., and B. C. Scott. 2006. “Evaluation Techniques.” Chap. 10 in Introduction to Mineral Exploration. 2nd ed., edited by Charles J. Moon, Michael K. G. Whateley, and Anthony M. Evans, 199-252. Malden, MA: Blackwell.

Williams, G. 2010. “Subject Benchmarking in the UK.” In Public Policy for Academic Quality: Analyses of Innovative Policy Instruments, edited by D. D. Dill, and M. Beerkens, 157-181. Dordrecht: Springer. doi:10.1007/978-90-481-3754-1_9.

Williams, J. B. 2006. “The Place of the Closed Book, Invigilated Final Examination in a Knowledge Economy.” Educational Media International 43 (2): 107-119.