
Assessing Programming Ability in Introductory Computer Science: Why Can't Johnny Code?

Robyn McNamara
Computer Science & Software Engineering
Monash University

Can Johnny Program?

Most second years can (kind of), but evidence suggests that many can't.
Not just at Monash: this is a global problem.

McCracken report (2002): four universities on three continents found that their second years lacked the ability to create a program

papers from every continent (except Antarctica) indicate problems with the programming ability of CS2 students

Antarctica doesn't seem to have any tertiary programs in IT.

Kinds of assessment

Formative: feedback to students

how am I doing? how can I improve?

what are the lecturers looking for?

Summative: counts toward final mark

pracs, exams, assignments, tests, prac exams, hurdles, etc.

Purpose of CS assessment

Ensure that students enter the next level of study or work practice with enough knowledge and skill to be able to succeed. May include:

programming ability

grasp of computing theory

problem-solving/analytical ability

more!

These skills are mutually reinforcing rather than orthogonal.

What's at stake
Inadequate assessment in early years can lead to:

inadequate preparation for later-year courses

watering down content

grade inflation

hidden curriculum effects (Snyder)

to students, assessment defines curriculum

poor student morale, which brings

attrition

plagiarism (Ashworth et al., 1997)

Characteristics of good assessment

Reliability: is it a good measure?

if you test the same concept twice, students should get similar marks (cf. precision)

can be evaluated quantitatively using established statistical techniques (AERA et al., 1985)

Validity: is it measuring the right thing?

not directly quantifiable; measured indirectly using (e.g.) correlation studies

this is what I'm interested in!
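As a rough illustration of the kind of established statistical technique the reliability point refers to, the sketch below computes Cronbach's alpha, a common internal-consistency coefficient. The slides do not say which statistic was intended, so this choice, the function name `cronbach_alpha`, and the marks are all assumptions made for illustration.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: one list of marks per item, all in the same student order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total mark per student across all items.
    totals = [sum(marks) for marks in zip(*item_scores)]
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical marks for three questions testing the same concept
# (five students); these numbers are NOT from the study.
q1 = [2, 4, 5, 7, 9]
q2 = [3, 4, 6, 6, 9]
q3 = [1, 5, 5, 8, 8]

alpha = cronbach_alpha([q1, q2, q3])
print(f"alpha = {alpha:.2f}")
```

Values near 1 suggest that the items rank students consistently; this says nothing about whether they measure the right thing, which is the validity question.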

Types of validity

Content validity: assessment needs to

be relevant to the course

cover all of the course (not just the parts that are easy to assess)

Who discovered the Quicksort algorithm?


a) Donald Knuth
b) C.A.R. Hoare
c) Edsger Dijkstra
d) Alan Turing

Types of validity

Construct validity: assessment measures the psychological construct (skill, knowledge, attitude) it's supposed to measure.

Can't be evaluated directly, so we have to use other forms of validity as a proxy (Cronbach & Quirk, 1976)

You can store several items of the same type in an:
a) pointer
b) array
c) struct
d) variable

Example: time and construct validity

Allocating too little time for a task threatens validity

you end up assessing time management or organizational skills

Allocating too much time can also threaten validity!

students can spend a long time working on programming tasks

they can go through many redesign cycles instead of just a few intelligent ones

even with an unintelligent heuristic, a student can eventually converge on a good-enough answer given enough iterations

not a true test of problem-solving/design ability: a construct validity threat

Types of validity

Criterion validity: the assessment results correspond well with other criteria that are expected to measure the same construct

predictive validity: results are a good predictor of performance in later courses

concurrent validity: results correlate strongly with results in concurrent assessment (e.g. two parts of the same exam, exam and prac in same year, corequisite courses, etc.)

We can measure this!
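One way criterion validity can be measured is with a correlation coefficient between two sets of marks for the same students. The sketch below computes Pearson's r in plain Python; the function name `pearson_r` and the marks are hypothetical and only illustrate the calculation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of marks."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 marks")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all marks are equal")
    return sxy / sqrt(sxx * syy)

# Hypothetical percentage marks for the same six students in two
# concurrent assessments (NOT data from the study).
exam_pct = [35.0, 48.0, 55.0, 62.0, 71.0, 90.0]
prac_pct = [60.0, 55.0, 70.0, 65.0, 80.0, 85.0]

print(f"r = {pearson_r(exam_pct, prac_pct):.2f}")
```

Using marks from the same year gives an estimate of concurrent validity; pairing first-year marks with later-year marks for the same students gives an estimate of predictive validity.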

Method

Took CSE1301 prac and exam results from 2001, only those who had sat both the exam and at least one prac
Grouped exam questions into

multiple choice

short answer

programming

Calculated percentage mark for each student in each exam category, plus overall exam and overall prac
Generated scatterplots and best-fit lines from percentage marks
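The steps above can be sketched roughly as follows, assuming raw marks per exam category and an ordinary least-squares best-fit line. The slides do not say which fitting method or tools were used, and the category maxima, field names, and student records here are hypothetical. Plotting is left out to keep the sketch dependency-free.

```python
# Hypothetical maximum marks per exam category (not from the slides).
MAX_MARKS = {"mcq": 20, "short_answer": 30, "programming": 50}

def percentages(raw):
    """Convert one student's raw category marks to percentage marks."""
    return {cat: 100.0 * raw[cat] / MAX_MARKS[cat] for cat in MAX_MARKS}

def best_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical student records: raw exam marks plus overall prac percentage.
students = [
    {"mcq": 8, "short_answer": 12, "programming": 10, "prac_pct": 55.0},
    {"mcq": 12, "short_answer": 18, "programming": 25, "prac_pct": 62.0},
    {"mcq": 15, "short_answer": 21, "programming": 30, "prac_pct": 70.0},
    {"mcq": 18, "short_answer": 27, "programming": 45, "prac_pct": 78.0},
]

xs = [percentages(s)["programming"] for s in students]  # exam code %
ys = [s["prac_pct"] for s in students]                  # prac %
slope, intercept = best_fit(xs, ys)
print(f"prac% = {slope:.2f} * code% + {intercept:.1f}")
```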

Predictions

Programming questions on the exam should be the best predictor of prac mark...

...followed by short answer...

...with multiple choice being the worst predictor

programming skills are clearly supposed to be assessed by on-paper coding questions and pracs

many short answer questions cover aspects of programming, e.g. syntax

Sounds reasonable, right?

MCQ vs Short Answer

Strong correlation: 0.8
Same students, same exam (so same day, same conditions, same level of preparation)

MCQ vs Code

Correlation 0.82
Note the X-intercept of 30% for best-fit line

MCQ vs Prac

Correlation only 0.55
We predicted a relatively poor correlation here, so that's OK
Note the Y-intercept

Short Answer vs Code

Correlation 0.86
SA is a better predictor than MCQ; so far so good
Note the X-intercept at 20: a guesswork effect?

Short Answer vs Prac

Correlation 0.59
Stronger than MCQ, as expected, but only slightly.

Code vs Prac

Correlation still only 0.59: no better than short answer!
Note that the best-fit line has a Y-intercept of more than 50%!

Exam vs Prac

Note that someone who got zero for the exam could still expect 45% in the pracs

45% was the hurdle requirement for the pracs
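The intercept remarks on the scatterplot slides can be read straight off a best-fit line y = slope * x + intercept: the Y-intercept is the predicted y mark for someone scoring zero on x, and the X-intercept is the x mark at which the predicted y mark reaches zero. The slides report intercepts but not slopes, so the slopes below are hypothetical, and the helper names are illustrative only.

```python
def predict(slope, intercept, x):
    """Predicted y mark (in %) for an x mark (in %) on a best-fit line."""
    return slope * x + intercept

def x_intercept(slope, intercept):
    """The x mark at which the best-fit line predicts a y mark of zero."""
    if slope == 0:
        raise ValueError("a horizontal line has no X-intercept")
    return -intercept / slope

# Exam vs prac: the slide reports a Y-intercept of 45%.
# The slope of 0.5 is HYPOTHETICAL, chosen only to make the example concrete.
print(predict(0.5, 45.0, 0.0))    # predicted prac mark for a zero exam mark

# MCQ vs code: the slide reports an X-intercept of 30%.
# A HYPOTHETICAL line with slope 1.0 and intercept -30 has that X-intercept.
print(x_intercept(1.0, -30.0))
```

A positive X-intercept means that a student can pick up some marks on the x-axis assessment while the line predicts none on the y-axis assessment, which is consistent with the guesswork effect suggested for the short answer results.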

Summary

Exam programming and lab programming are strongly correlated, so they're measuring something. But...
Exam programming results are not a better predictor of ability in pracs than short answer questions, and only slightly better than multiple choice.
Something is definitely not right here!
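One hedged way to compare the correlations reported on the preceding slides is to square them: r squared is the share of variance in one mark that a linear relationship with the other accounts for. The r values below are those reported in the slides; the dictionary layout and labels are illustrative.

```python
# Correlations as reported on the scatterplot slides.
reported_r = {
    "MCQ vs short answer": 0.80,
    "MCQ vs code": 0.82,
    "MCQ vs prac": 0.55,
    "short answer vs code": 0.86,
    "short answer vs prac": 0.59,
    "code vs prac": 0.59,
}

# Share of variance accounted for by a linear fit.
r_squared = {pair: r * r for pair, r in reported_r.items()}

# List the pairs from strongest to weakest relationship.
for pair, r2 in sorted(r_squared.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pair:22s} r = {reported_r[pair]:.2f}  r^2 = {r2:.2f}")
```

On this reading, the exam components account for well over half of each other's variance, while no exam component accounts for much more than a third of the variance in prac marks.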

What next?

I still haven't asked the really important questions:

what do we think we're assessing?

what do the students think they're preparing for?

are pracs or exams better predictors of success in later courses, especially at second-year level?

what are the factors that affect success in programming-based assessment tasks, other than programming ability?

computer programming and computer science: how are they different? What are the ramifications for our teaching and assessment? (This is a big and probably postdoctoral question.)

What's next?

Current plan for my PhD research: three stages

What do we think we're doing?

interview lecturers to determine what skills they are trying to assess

What are we doing?

obtain finely grained assessment results for first-year and second-year core subjects for one cohort and analyse these results to see which tasks have highest predictive validity

interview students to determine how they approach assessment

What should we be doing?

suggest feasible ways we can improve assessment validity

Bibliography
Reliability and validity
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational Measurement.
Cronbach, L. J. & Quirk, T. J. (1976). Test validity. In International Encyclopedia of Education.
Oosterhof, A. (1994). Classroom applications of educational measurement. McMillan.

General and CS
Ashworth, P., Bannister, P. & Thorne, P. (1997) Guilty in whose eyes? University students' perceptions of cheating and plagiarism in academic work and assessment, Studies in Higher Education 22(2), pp. 187-203.
Barros, J. A. et al., Using lab exams to ensure programming practice in an introductory programming course, ITiCSE 2003 pp. 16-20.
Chamillard, A. & Joiner, J. K., Evaluating programming ability in an introductory computer science course, SIGCSE 2000 pp. 212-216.
Daly, C. & Waldron, J. (2001) Introductory programming, problem solving, and computer assisted assessment, Proc. 6th Annual International CAA Conference, pp. 95-107.
Daly, C. & Waldron, J. (2004) Assessing the assessment of programming ability, SIGCSE 2004 pp. 210-213.

de Raadt, M., Toleman, M. & Watson, R. (2004) Training strategic problem solvers, SIGCSE 2004 pp. 48-51.
Knox, D. & Woltz, U. (1996) Use of laboratories in computer science education: Guidelines for good practice, ITiCSE 1996 pp. 167-181.
Kuechler, W. L. & Simkin, M. G. (2003) How well do multiple choice tests evaluate student understanding in computer programming classes? Jnl of Information Systems Education 14(4), pp. 389-399.
Lister, R. (2001) Objectives and objective assessment in CS1, SIGCSE 2001 pp. 292-297.
McCracken, M. et al., A multi-national, multi-institutional study of assessment of programming skills of first-year CS students, SIGCSE 2002 pp. 125-140.
Ruehr, F. & Orr, G. (2002) Interactive program demonstration as a form of student program assessment, Jnl of Computing Sciences in Colleges 18(2), pp. 65-78.

Sambell, R. & McDowell, L. (1998). The construction of the hidden curriculum: Messages and meanings in the assessment of student learning, Jnl of Assessment and Evaluation in Higher Education 23(4), pp. 391-402.
Snyder, B. R. (1973). The hidden curriculum, MIT Press.
Thomson, K. & Falchikov, N. (1998). 'Full on until the sun comes out': The effects of assessment on student approaches to study, Jnl of Assessment and Evaluation in Higher Education 23(4), pp. 379-390.
