
How You Can Learn To Love Large-Scale Assessment:

Let Me Count the Ways


An Outline For Our Future At The University of Alberta

Dr. Mark Gierl, Professor and Canada Research Chair

Centre for Research in Applied Measurement and Evaluation


University of Alberta

Presentation at the Centre for Teaching and Learning (CTL) Teaching Big
Symposium
University of Alberta, August 2012

TO BEGIN

Educational measurement is a discipline and a
profession focused on the use of methodologies
for assigning test scores to examinees, typically
on a numeric scale, so we can make inferences
about their knowledge, skills, and competencies

Once a static and largely quantitatively driven
field, educational measurement is now undergoing
profound changes driven by recent developments in
the learning sciences, mathematical statistics,
computer technology, educational psychology, and
computing science; as a result, our contemporary
assessments barely resemble their predecessors of
a decade ago

Centre for Research in Applied Measurement and Evaluation

OVERVIEW
BACKGROUND
Measurement, Evaluation, and Cognition (MEC)
Program in the Department of Educational Psychology
Centre for Research in Applied Measurement and
Evaluation (CRAME)
PRESENTATION
Four principles of testing in large classrooms
Two applications for putting principles into
practice
Plea for our collective future
My presentation today will have four key messages

OVERVIEW
Measurement, Evaluation, and Cognition (MEC) is one
of 8 areas in the Department of Educational
Psychology
Graduate students (16 currently) who receive an
MEd or PhD in MEC specialize in educational
measurement, statistics, research methods,
cognition applied to assessment, and/or program
evaluation
Our graduates work in the private sector at
testing companies like the Educational Testing
Service (ETS) or in the public sector for
different agencies (e.g., Alberta Education;
Medical Council of Canada)
MEC has five faculty members: Drs. Mark Gierl,
Jacqueline Leighton, Ying Cui, Cheryl Poth, and
Sharla King
The Centre for Research in Applied Measurement and
Evaluation (CRAME) is a centre within MEC focused
on conducting research in the areas of educational
measurement, cognitive psychology, and statistics
with the goal of making assessment an integral
part of learning and instruction

OVERVIEW
MESSAGE #1: Educational measurement is a
specialized discipline where you can earn a
graduate degree at both the MEd and PhD
levels; this indicates that testing is
embedded in a discipline that requires
rigorous and comprehensive training
MESSAGE #2: You have colleagues at the
University of Alberta who actually love to
talk about tests and who train graduate
students who also like and excel in our
discipline [resources exist on campus]

TESTING TIPS BY MARK


HOW TO MAKE A GOOD MULTIPLE-CHOICE TEST ITEM
The item measures specific content, as outlined in the test
specifications.
The item is based on an important topic in the curriculum and is
designed to measure key thinking and problem-solving skills.
The item is carefully edited, formatted, and presented using
correct grammar, punctuation, capitalization, and spelling.
The central idea is included in the stem, not the options.
The stem of the item is worded positively and avoids negatives
such as NOT or EXCEPT.
Only one of the options is clearly correct.
The correct option is not cued by item-writing errors such
as a conspicuous correct option or blatantly incorrect
options.
All of the distractors are plausible (e.g., distractors based
on typical errors made by students).
Etc., etc., etc., etc., etc., etc.
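
A few of the tips above can be checked mechanically before an item goes to human review. The Python sketch below is an illustration only, not a CRAME tool: the `MCItem` record, the `check_item` function, and the 1.5 length ratio used to flag a conspicuous correct option are all assumptions. It flags negative wording in the stem, items without exactly one keyed option, duplicate options, and a key much longer than the distractors; tips about content, importance, and plausibility still need a human reviewer.

```python
import re
from dataclasses import dataclass


@dataclass
class MCItem:
    """Hypothetical multiple-choice item record used only for this sketch."""
    stem: str
    options: list[str]
    keys: list[int]  # indices of the options marked correct


# Negatives named in the tip, matched as whole words in any letter case.
NEGATIVES = re.compile(r"\b(not|except)\b", re.IGNORECASE)

# Assumed threshold: a key longer than 1.5x the mean distractor length may stand out.
LENGTH_RATIO = 1.5


def check_item(item: MCItem) -> list[str]:
    """Return the tip violations that can be detected without reading for content."""
    problems = []
    if NEGATIVES.search(item.stem):
        problems.append("negative wording in stem")
    if len(item.keys) != 1:
        problems.append("item must have exactly one correct option")
    if len({o.strip().lower() for o in item.options}) != len(item.options):
        problems.append("duplicate options")
    if len(item.keys) == 1:
        key = item.keys[0]
        distractors = [o for i, o in enumerate(item.options) if i != key]
        if distractors:
            mean_len = sum(len(d) for d in distractors) / len(distractors)
            if len(item.options[key]) > LENGTH_RATIO * mean_len:
                problems.append("correct option is conspicuously longer than distractors")
    return problems


print(check_item(MCItem("Which one of the following is NOT a prime number?",
                        ["2", "3", "9", "7"], [2])))
# ['negative wording in stem']
```

A length heuristic like this only catches one kind of cueing; a reviewer still has to judge whether distractors are blatantly wrong.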

OUR FOUR PRINCIPLES


PRINCIPLE #1: We will shift from infrequent
summative assessments (e.g., 2 midterms + final) to
more frequent formative assessment (e.g., 8-10
exams or more per term)
PRINCIPLE #2: Testing on-demand is required where
students can write exams at any time and at any
location
PRINCIPLE #3: Assessments will be scored
immediately and students will receive both instant
and detailed feedback on their overall
performance as well as their problem-solving
strengths and weaknesses
PRINCIPLE #4: You will spend less time and less
effort implementing these principles in your
large classes compared to the amount of time you
currently spend on assessment-related activities;
in fact, much less
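
Principle #3 depends on scoring that happens the moment a student submits. A minimal Python sketch, assuming each item is tagged with the skill it measures (the skill labels, the `score_with_feedback` function, and the 0.7 mastery cut are hypothetical), shows how a single response record can yield both an overall score and a list of strengths and weaknesses:

```python
from collections import defaultdict


def score_with_feedback(responses, key, skill_of_item, mastery_cut=0.7):
    """Score one examinee on submission and summarize results by skill.

    responses:     item id -> option the student selected (missing = omitted)
    key:           item id -> correct option
    skill_of_item: item id -> skill label (hypothetical tagging scheme)
    mastery_cut:   assumed proportion correct needed to call a skill a strength
    """
    correct = defaultdict(int)
    administered = defaultdict(int)
    for item, answer in key.items():
        skill = skill_of_item[item]
        administered[skill] += 1
        if responses.get(item) == answer:
            correct[skill] += 1

    strengths, weaknesses = [], []
    for skill in sorted(administered):
        proportion = correct[skill] / administered[skill]
        if proportion >= mastery_cut:
            strengths.append(skill)
        else:
            weaknesses.append(skill)

    return {
        "overall": f"{sum(correct.values())}/{len(key)}",
        "strengths": strengths,
        "weaknesses": weaknesses,
    }


key = {"q1": "b", "q2": "d", "q3": "a", "q4": "c"}
skills = {"q1": "diagnosis", "q2": "diagnosis", "q3": "management", "q4": "management"}
print(score_with_feedback({"q1": "b", "q2": "d", "q3": "c"}, key, skills))
# {'overall': '2/4', 'strengths': ['diagnosis'], 'weaknesses': ['management']}
```

In practice the skill tags and cut scores would come from the test specifications rather than being fixed in code.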

COMPUTER-BASED TESTING

APPLICATION #1:
COMPUTER-BASED TESTING

PAPER-BASED TESTING
Test Development

Test Administration

Test Reporting

COMPUTER-BASED TESTING

In short, computer-based testing is a very good
thing and it is here to stay; computer-based
testing either eliminates or automates two-thirds
of the testing activities that you currently do
manually

Admittedly, we are focusing on examples that use
objectively scored assessment items, but examples
can also be cited for automated essay scoring of
student-produced assessment tasks

The architecture for a computer-based testing
system is feasible [PAPER-BASED TESTING IS DEAD]

MESSAGE #3: The University of Alberta needs a
computer-based testing system because YOU need
this system for all of your classes, big and small

COMPUTER-BASED TESTING
Test Development

Test Administration

*ELIMINATED*

Test Reporting

*AUTOMATED*

AUTOMATIC ITEM GENERATION

APPLICATION #2:
AUTOMATIC ITEM
GENERATION

ONE WAY TO CREATE TEST ITEMS

Professor writing test items the day before the
midterm exam

AUTOMATIC ITEM GENERATION

Another way to address this item development
challenge is with automatic item generation (AIG)

Automatic item generation is the process of using
item models to generate test items with the aid
of computer technology; with this approach,
hundreds or even thousands of items can be
generated with a single item model
While the idea of automatic item generation may
be viewed as a dream come true, I am here to
tell you that the dream is well within our reach
because of developments in modern educational
measurement theory

A 54-year-old woman has a laparoscopic
cholecystectomy. On post-operative day 3
she has a temperature of 38.5°C. Physical
examination reveals a red and tender wound
and calf tenderness. Which one of the
following is the best next step?
a. Mobilize
b. Antibiotics
c. Anticoagulation
d. Reopen the wound

AUTOMATIC ITEM GENERATION

AUTOMATIC ITEM GENERATION


That ugly diagram is a cognitive model
highlighting the knowledge, skills, and content
required to make a medical diagnosis
The model includes three key outcomes:
1. Identify THE PROBLEM (i.e., Post-Operative
Fever);
2. Specify Sources of information required to
diagnose the problem (e.g., Type of Surgery); and
3. Describe KEY features within each information
source (e.g., Guarding and Rebound) needed to
create different instances of the problem
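
The three levels of the cognitive model (the problem, the sources of information, and the key features within each source) map naturally onto a nested data structure. The Python sketch below is hypothetical: the class and method names are illustrative, only a few of the post-operative fever values are included for brevity, and it leaves out the links between features and the correct answer that a real cognitive model encodes.

```python
from dataclasses import dataclass, field


@dataclass
class InformationSource:
    """Level 2: a source of information used to diagnose the problem."""
    name: str
    features: list[str] = field(default_factory=list)  # level 3: key features


@dataclass
class CognitiveModel:
    """Level 1: the problem, plus the sources and features that define its instances."""
    problem: str
    sources: list[InformationSource] = field(default_factory=list)

    def outline(self) -> list[str]:
        """Flatten the three levels into an indented outline."""
        lines = [f"Problem: {self.problem}"]
        for source in self.sources:
            lines.append(f"  Source: {source.name}")
            lines.extend(f"    Feature: {f}" for f in source.features)
        return lines


fever = CognitiveModel(
    problem="Post-Operative Fever",
    sources=[
        InformationSource("Type of Surgery", ["Gastrectomy", "Appendectomy"]),
        InformationSource("Physical Examination", ["Guarding and Rebound", "Calf Tenderness"]),
    ],
)
print("\n".join(fever.outline()))
```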

AUTOMATIC ITEM GENERATION

AUTOMATIC ITEM GENERATION

Next, an item model is created, where an item model
is like a template or a mould of the assessment task
(i.e., it's a target where we want to place the
content in the test item)
A 54-year-old woman has a <Type of Surgery>. On postoperative day <Timing of Fever> the patient has a
temperature of 38.5°C. Physical examination reveals
<Physical Examination>. Which one of the following is the
best next step?
Type of Surgery: Gastrectomy, Right Hemicolectomy, Left
Hemicolectomy, Appendectomy, Laparoscopic Cholecystectomy
Timing of Fever: 1 to 6 days
Physical Examination: Red and Tender Wound, Guarding and
Rebound, Abdominal Tenderness, Calf Tenderness
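
Filling an item model amounts to substituting one value for each element slot. A minimal Python sketch, assuming slots are written as `<Element Name>` using the element names listed above (Type of Surgery, Timing of Fever, Physical Examination); the `render` function is hypothetical and is not how IGOR works internally:

```python
import re

# The item model above, with slots written as <Element Name>.
ITEM_MODEL = (
    "A 54-year-old woman has a <Type of Surgery>. On postoperative day "
    "<Timing of Fever> the patient has a temperature of 38.5°C. Physical "
    "examination reveals <Physical Examination>. Which one of the following "
    "is the best next step?"
)

# Any text between angle brackets is treated as an element name.
SLOT = re.compile(r"<([^<>]+)>")


def render(model: str, values: dict[str, str]) -> str:
    """Replace every slot in the item model with the value chosen for that element."""
    def fill(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for element {name!r}")
        return values[name]
    return SLOT.sub(fill, model)


print(render(ITEM_MODEL, {
    "Type of Surgery": "laparoscopic cholecystectomy",
    "Timing of Fever": "3",
    "Physical Examination": "a red and tender wound and calf tenderness",
}))
```

Note that the fixed article "a" in the template would read "a appendectomy" for one of the listed surgeries; a production template needs to handle that kind of grammatical agreement.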

AUTOMATIC ITEM GENERATION


Finally, we combine this
information systematically to
produce new items
To accomplish this complex
combinatorial task, we created
software for item generation
called IGOR (Item GeneratOR)
IGOR was programmed using Java
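
IGOR itself is written in Java; the short Python sketch below is not IGOR, only an illustration of the combinatorial step under stated assumptions. It crosses every value of the three elements in the post-operative fever item model with `itertools.product`, picks "a" or "an" for the surgery, and applies no constraints, whereas a real generator would use the cognitive model to pair each combination with its correct answer and discard implausible ones. With 5 surgeries, 6 fever days, and 4 examination findings, the crossing yields 5 × 6 × 4 = 120 stems. The names `ELEMENTS`, `STEM`, and `generate_stems` are hypothetical.

```python
from itertools import product

# Element values from the post-operative fever item model (lowercased for the stem).
ELEMENTS = {
    "Type of Surgery": ["gastrectomy", "right hemicolectomy", "left hemicolectomy",
                        "appendectomy", "laparoscopic cholecystectomy"],
    "Timing of Fever": [str(day) for day in range(1, 7)],
    "Physical Examination": ["a red and tender wound", "guarding and rebound",
                             "abdominal tenderness", "calf tenderness"],
}

STEM = ("A 54-year-old woman has {article} {surgery}. On postoperative day {day} "
        "the patient has a temperature of 38.5°C. Physical examination reveals "
        "{exam}. Which one of the following is the best next step?")


def generate_stems(elements=ELEMENTS):
    """Yield one stem for every combination of element values (no constraints applied)."""
    for surgery, day, exam in product(elements["Type of Surgery"],
                                      elements["Timing of Fever"],
                                      elements["Physical Examination"]):
        article = "an" if surgery[0] in "aeiou" else "a"
        yield STEM.format(article=article, surgery=surgery, day=day, exam=exam)


stems = list(generate_stems())
print(len(stems))  # 120
```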

AUTOMATIC ITEM GENERATION


When we used our method with 5 different item
models developed for the MCC QE Part I in surgery,
more than 20,000 items were generated:
Item Model 1: Gallstones (288 items)
Item Model 2: Hernias (256 items)
Item Model 3: Aneurysm (5,184 items)
Item Model 4: Post Operation Management (7,488 items)
Item Model 5: Post Operation Fever (7,680 items)
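
As a quick arithmetic check on the "more than 20,000" figure, the five model counts sum to 288 + 256 + 5,184 + 7,488 + 7,680 = 20,896 items. The Python snippet below simply reproduces that total and each model's share of it:

```python
# Item counts per model, copied from the list of item models above.
counts = {
    "Gallstones": 288,
    "Hernias": 256,
    "Aneurysm": 5184,
    "Post Operation Management": 7488,
    "Post Operation Fever": 7680,
}

total = sum(counts.values())
print(f"Total: {total:,} items")  # Total: 20,896 items
for model, n in counts.items():
    print(f"{model:<26} {n:>6,} ({n / total:.1%})")
```

The two post-operative models alone account for roughly 73% of the generated items.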

We have also developed item models at the K-12
levels in Language Arts, Social Studies, Science,
and Math, as well as AP Biology and Architecture,
in addition to 10 different content areas in
Medicine, producing millions of test items

AUTOMATIC ITEM GENERATION


16. A 60-year-old woman has been booked for a laparoscopic cholecystectomy for
symptomatic gallstones. Prior to her surgery, she presents to the Emergency Department
with a history of feeling faint and unwell. She has had rigors. On physical
examination, her temperature is 40°C. Her white blood count is 22 × 10⁹/L; aspartate
aminotransferase 63 U/L; alanine aminotransferase 78 U/L; alkaline phosphatase 450 U/L;
amylase level 200 U/L and bilirubin 50 µmol/L. Which one of the following is the most
likely diagnosis?
(a) Cholecystitis.
(b) Cholangitis.
(c) Pancreatitis.
(d) Hepatic abscess.
(e) Duodenal ulcer.
39. An obese 61-year-old male collapsed with sudden pain at a shopping center and is
brought to hospital by ambulance. He is diaphoretic. His pulse is 96/minute; blood
pressure 100/70 mm Hg; he complains of severe pain in his abdomen and left flank. Which one
of the following is the most likely diagnosis?
(a) Acute hemorrhagic pancreatitis.
(b) Ruptured aortic aneurysm.
(c) Mesenteric vascular occlusion.
(d) Acute diverticulitis.
(e) Volvulus of sigmoid colon.

CONCLUSION
Educational measurement is a specialized discipline
requiring advanced graduate training; this implies
that assessment contains many complex and thorny
issues, but please remember that you have colleagues
on campus who can help you deal with these issues
Our discipline is undergoing profound changes that
will yield much better methods for evaluating
students while at the same time requiring less time
and effort from the examiner because much of the
unpleasant work is being automated; computer-based
testing and automatic item generation are but two
examples from a list of many
MESSAGE #4: There is no going back to the good
old days; therefore, we must work together to
structure our future at the University of Alberta
by building and implementing these new assessment
systems, but also recognize that this work is just
getting started

THANK YOU
Dr. Mark J. Gierl
(mark.gierl@ualberta.ca)
6-110 Education Centre North
