Robyn McNamara
Computer Science & Software Engineering
Monash University
Most second years can (kind of), but evidence suggests that many can't.
Not just at Monash: this is a global problem.
McCracken report (2002): four universities on three continents found that their second years lacked the ability to create a program
papers from every continent (except Antarctica) indicate problems with the programming ability of CS2 students
Antarctica doesn't seem to have any tertiary programs in IT.
Kinds of assessment
Formative: feedback to students
how am I doing? how can I improve?
what are the lecturers looking for?
Summative: counts toward final mark
pracs, exams, assignments, tests, prac exams, hurdles, etc.
Purpose of CS assessment
Ensure that students enter the next level of study or work practice with enough knowledge and skill to be able to succeed. May include:
programming ability
grasp of computing theory
problem-solving/analytical ability
more!
These skills are mutually reinforcing rather than orthogonal.
What's at stake
Inadequate assessment in early years can lead to:
inadequate preparation for later-year courses
watering down content
grade inflation
hidden curriculum effects (Snyder)
to students, assessment defines curriculum
poor student morale, which brings
attrition
plagiarism (Ashworth et al., 1997)
Reliability: is it a good measure?
if you test the same concept twice, students should get similar marks (cf. precision)
can be evaluated quantitatively using established statistical techniques (AERA et al., 1985)
Validity: is it measuring the right thing?
not directly quantifiable; measured indirectly using (e.g.) correlation studies
this is what I'm interested in!
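As an illustration of the "established statistical techniques" for reliability mentioned above, here is a minimal sketch of Cronbach's alpha, one widely used internal-consistency statistic. The slides do not say which technique was intended, so the choice of alpha, the function name `cronbach_alpha`, and the sample scores are all assumptions made for this example.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-student score lists.

    item_scores[s][i] is the mark student s received on item i.
    Values near 1 suggest the items measure the same thing consistently.
    """
    k = len(item_scores[0])  # number of items (questions)
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two students")
    # Variance of each individual item across students.
    item_vars = [variance([student[i] for student in item_scores])
                 for i in range(k)]
    # Variance of each student's total score.
    total_var = variance([sum(student) for student in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical marks (not from the study): 4 students, 3 questions
# that are all meant to test the same concept.
scores = [
    [2, 3, 2],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 5],
]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

If every student scores identically on every item, alpha is exactly 1; unrelated items push it toward 0.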
Types of validity
Content validity: assessment needs to
be relevant to the course
cover all of the course (not just the parts that are easy to assess)
Types of validity
Construct validity: assessment measures the psychological construct (skill, knowledge, attitude) it's supposed to measure.
Can't be evaluated directly, so we have to use other forms of validity as a proxy (Cronbach & Quirk, 1976)
Allocating too little time for a task threatens validity
you end up assessing time management or organizational skills
Allocating too much time can also threaten validity!
students can spend a long time working on programming tasks
they can go through many redesign cycles instead of just a few intelligent ones
even with an unintelligent heuristic, a student can eventually converge on a good-enough answer given enough iterations
not a true test of problem-solving/design ability: a construct validity threat
Types of validity
Criterion validity: the assessment results correspond well with other criteria that are expected to measure the same construct
predictive validity: results are a good predictor of performance in later courses
concurrent validity: results correlate strongly with results in concurrent assessment (e.g. two parts of the same exam, exam and prac in same year, corequisite courses, etc.)
We can measure this!
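Concurrent validity as described above comes down to a correlation between two sets of marks for the same students. The sketch below computes a Pearson correlation coefficient; the slides do not state which coefficient was used, so Pearson's r is an assumption, and the function name `pearson_r` and the marks are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length mark lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical percentage marks for five students (not the CSE1301 data).
mcq_pct = [40, 55, 60, 72, 85]
prac_pct = [50, 58, 66, 70, 90]
print(f"r = {pearson_r(mcq_pct, prac_pct):.2f}")
```

A value close to 1 means the two assessments rank students similarly; a value near 0 means one tells you little about the other.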
Method
Took CSE1301 prac and exam results from 2001, only those who had sat both the exam and at least one prac
Grouped exam questions into
multiple choice
short answer
programming
Calculated percentage mark for each student in each exam category, plus overall exam and overall prac
Generated scatterplots and best-fit lines from percentage marks
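The two numeric steps of the method, converting raw marks to percentages and fitting a line, can be sketched as follows. This assumes an ordinary least-squares fit, which the slides do not confirm; the function names, the category totals, and the student records are hypothetical. Plotting is left out to keep the example dependency-free.

```python
def percentage(marks_awarded, marks_available):
    """Convert a raw mark to a percentage of the marks available."""
    return 100.0 * marks_awarded / marks_available


def best_fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys).

    Returns (slope, intercept) for the line y = slope * x + intercept.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical raw marks: (programming questions out of 30, pracs out of 50).
students = [(6, 28), (15, 33), (21, 40), (27, 46)]
code_pct = [percentage(code, 30) for code, _ in students]
prac_pct = [percentage(prac, 50) for _, prac in students]

slope, intercept = best_fit_line(code_pct, prac_pct)
print(f"prac% = {slope:.2f} * code% + {intercept:.1f}")
```

Working in percentages puts every category on the same 0 to 100 scale, so intercepts from different plots can be compared directly.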
Predictions
Programming questions on the exam should be the best predictor of prac mark...
...followed by short answer...
...with multiple choice being the worst predictor
programming skills are clearly supposed to be assessed by on-paper coding questions and pracs
many short-answer questions cover aspects of programming, e.g. syntax
Sounds reasonable, right?
Strong correlation: 0.8
Same students, same exam (so same day, same conditions, same level of preparation)
MCQ vs Code
Correlation 0.82
Note the X-intercept of 30% for best-fit line
MCQ vs Prac
Correlation only 0.55
We predicted a relatively poor correlation here, so that's OK
Note the Y-intercept
Correlation 0.86
SA is a better predictor than MCQ; so far so good
Note the X-intercept at 20: a guesswork effect?
Correlation 0.59
Stronger than MCQ, as expected, but only slightly.
Code vs Prac
Correlation still only 0.59: no better than short answer!
Note that the best-fit line has a Y-intercept of more than 50%!
Exam vs Prac
Note that someone who got zero for the exam could still expect 45% in the pracs
45% was the hurdle requirement for the pracs
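The intercepts noted on the preceding slides can be read straight off the equation of a best-fit line. The symbols a (slope) and b (Y-intercept) are notation introduced here for illustration, and the reading assumes the first-named assessment in each slide title is plotted on the X axis, which the slides do not state.

```latex
\begin{align*}
\hat{y} &= a x + b
  && \text{best-fit line: predicted mark } \hat{y} \text{ from mark } x \\
\hat{y}\,\big|_{x=0} &= b
  && \text{Y-intercept: predicted mark for a student scoring } 0 \text{ on } x \\
\hat{y} = 0 \;&\Longrightarrow\; x_0 = -\frac{b}{a}
  && \text{X-intercept, defined for } a \neq 0
\end{align*}
```

Under this reading, the Exam vs Prac line has b of about 45%, the predicted prac mark for an exam mark of zero. A positive X-intercept with a positive slope means b is negative: a student needs a mark of about x_0 on the X-axis assessment before the line predicts anything above zero on the other one, which fits the guesswork effect suggested above.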
Summary
Exam programming and lab programming are strongly correlated, so they're measuring something. But...
Exam programming results are not a better predictor of ability in pracs than short-answer questions, and only slightly better than multiple choice.
Something is definitely not right here!
What next?
I still haven't asked the really important questions:
what do we think we're assessing?
what do the students think they're preparing for?
are pracs or exams better predictors of success in later courses, especially at second-year level?
what are the factors that affect success in programming-based assessment tasks, other than programming ability?
computer programming and computer science: how are they different? What are the ramifications for our teaching and assessment? (This is a big and probably postdoctoral question.)
What's next?
Current plan for my PhD research: three stages
What do we think we're doing?
interview lecturers to determine what skills they are trying to assess
What are we doing?
obtain finely grained assessment results for first-year and second-year core subjects for one cohort and analyse these results to see which tasks have highest predictive validity
interview students to determine how they approach assessment
What should we be doing?
suggest feasible ways we can improve assessment validity
Bibliography
Reliability and validity
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational Measurement.
Cronbach, L. J. & Quirk, T. J. (1976). Test validity. In International Encyclopedia of Education.
Oosterhof, A. (1994). Classroom applications of educational measurement. Macmillan.
Bibliography
General and CS
Ashworth, P., Bannister, P. & Thorne, P. (1997) Guilty in whose eyes? University students' perceptions of cheating and plagiarism in academic work and assessment, Studies in Higher Education 22(2), pp. 187-203.
Barros, J. A. et al., Using lab exams to ensure programming practice in an introductory programming course, ITiCSE 2003 pp. 16-20.
Chamillard, A. & Joiner, J. K., Evaluating programming ability in an introductory computer science course, SIGCSE 2000 pp. 212-216.
Daly, C. & Waldron, J. (2001) Introductory programming, problem solving, and computer assisted assessment, Proc. 6th Annual International CAA Conference, pp. 95-107.
Daly, C. & Waldron, J. (2004) Assessing the assessment of programming ability, SIGCSE 2004 pp. 210-213.
Bibliography
de Raadt, M., Toleman, M. & Watson, R. (2004) Training strategic problem solvers, SIGCSE 2004 pp. 48-51.
Knox, D. & Woltz, U. (1996) Use of laboratories in computer science education: Guidelines for good practice, ITiCSE 1996 pp. 167-181.
Kuechler, W. L. & Simkin, M. G. (2003) How well do multiple choice tests evaluate student understanding in computer programming classes? Jnl of Information Systems Education 14(4), pp. 389-399.
Lister, R. (2001) Objectives and objective assessment in CS1, SIGCSE 2001 pp. 292-297.
McCracken, M. et al., A multi-national, multi-institutional study of assessment of programming skills of first-year CS students, SIGCSE 2002 pp. 125-140.
Ruehr, F. & Orr, G. (2002) Interactive program demonstration as a form of student program assessment, Jnl of Computing Sciences in Colleges 18(2), pp. 65-78.
Bibliography
Sambell, R. & McDowell, L. (1998). The construction of the hidden curriculum: Messages and meanings in the assessment of student learning, Jnl of Assessment and Evaluation in Higher Education 23(4), pp. 391-402.
Snyder, B. R. (1973). The hidden curriculum, MIT Press.
Thomson, K. & Falchikov, N. (1998). 'Full on until the sun comes out': The effects of assessment on student approaches to study, Jnl of Assessment and Evaluation in Higher Education 23(4), pp. 379-390.