
J Sci Educ Technol

DOI 10.1007/s10956-017-9690-4

Collaborative and Competitive Video Games for Teaching Computing in Higher Education
Spencer Smith1 · Samantha Chan1

© Springer Science+Business Media New York 2017

Abstract This study measures the success of using a collaborative and competitive video game, named Space Race, to teach computing to first year engineering students. Space Race is played by teams of four, each with their own tablet, collaborating to compete against the other teams in the class. The impact of the game on student learning was studied through measurements using 485 students, over one term. Surveys were used to gauge student reception of the game. Pre- and post-tests, and in-course examinations, were used to quantify student performance. The game was well received, with at least 82% of the students that played it recommending it to others. In some cases, game participants outperformed non-participants on course exams. On the final course exam, all of the statistically significant (p < 0.05) comparisons (42% of the relevant questions) showed a performance improvement for game participants, with a maximum grade improvement of 41%. The findings also suggest that some students retain the knowledge obtained from Space Race for at least 7 weeks. The results of this study provide strong evidence that a collaborative and competitive video game can be an effective tool for teaching computing in post-secondary education.

Keywords Improving classroom teaching · Post-secondary education · Programming and programming languages · Collaborative learning · Game design

Spencer Smith
smiths@mcmaster.ca

1 Computing and Software Department, McMaster University, Hamilton, Ontario, L8S 4K1, Canada

Introduction

Many studies have explored the benefits of video games for teaching computing, but there are still aspects of this topic that have not been investigated. For instance, many of the previous studies have focused on teaching via tools for game creation, as opposed to learning directly through game play. Moreover, the previous studies are often aimed at younger learners, not at college or university students. Although previous studies have looked at single player games to motivate learning through competition, fewer studies have looked at multiplayer games and the associated opportunity for collaborative learning. Combining the above gaps in the literature, a question emerges: can a competitive and collaborative game be used to improve learning when teaching computing to university students? This study aims to address this question by measuring the use of such a game to teach first year engineering students introductory computing. Unlike many of the previous studies on using games to teach computing, the findings will be supported by extensive empirical data, based on measurements from a class of 485 students.

The class in question, ENGINEER 1D04 at McMaster University, Canada, is an introductory programming course, taught in the Python language. The current curriculum prescribes 3 h of hands-on lab activities over the course of 12 weeks. Our experience has been that this amount of time is insufficient for beginners. Combining this problem with a general lack of motivation to practice programming, many students find themselves struggling with the course content.

To motivate the students, and hopefully improve their learning, an activity was added to ENGINEER 1D04 that combines four pedagogical ideas and technologies:

(i) educational computer games are motivational learning tools [1] (Barnes et al. 2008; Pirker et al. 2014; Prensky 2007; Barab and Dede 2007; Annetta et al. 2009; Connolly et al. 2012); (ii) a tablet platform has the advantages of being portable, affordable, and technologically relevant to younger people (Baloian et al. 2013); (iii) social competition contributes to the enjoyment felt by computer game players (Vorderer et al. 2003) and has been observed to motivate educational computer game players (Boyce et al. 2011); and (iv) a collaborative component, through multiplayer teams, can be added to motivate those students not influenced by competition. Empirical studies have shown that individuals demonstrate higher levels of cooperation within their teams when there is competition between groups (Burton-Chellew et al. 2010). Those that are more individualistic in nature are also more likely to contribute under group competition, as opposed to a standard public situation (Probst et al. 1999).

Of the ideas listed above, collaborative learning seems particularly suited to motivating first year college students. In an essay on approaches to improving student retention, Tinto (2000) argues "... that colleges and universities should make learning communities and the collaborative pedagogy that underlies them the hallmark of the first year experience. They should ensure that shared learning is the norm, not the exception, of student first year experience." With respect to computer science education specifically, some hypothesize that students may develop the skills necessary to succeed in a game more readily if friends are present to motivate and challenge them (Hicks 2010). Continuing along this line, traditional computer science education may not be well suited to millennial students (Nickel and Barnes 2010); collaborative educational games hold the promise of making the learning experience better for both students and educators (Nickel and Barnes 2010). As one example, collaborative learning has been incorporated into Epik (Edutainment by Playing and Interacting with Knowledge), an online application for the development of quiz games (Sampaio et al. 2013). Epik is an example of one of the few serious games that have been designed with multiplayer support. Wendel et al. (2013) show that collaborative multiplayer serious games are possible and can be enjoyable, but the scope of their study means that they stop short of incorporating subject-specific tasks and evaluating the learning outcomes of their game.

The current study quantifies the effectiveness of a collaborative and competitive video game versus a traditional approach to teaching. The newly developed game, Space Race, is played by teams of four, with each student controlling their own tablet. Teams compete against one another for the best time for completing each of the four levels.

Using Space Race, this research provides computer science and software engineering educators and pedagogical researchers with insight on the feasibility and effectiveness of a collaborative and competitive video game as a teaching tool. As seen in the "Literature Review" section, teaching computer programming through a game in higher education remains a relatively new field. This study contributes the following to this field:

– Provides evidence that a majority of students in engineering would like to see video games incorporated into education.
– Demonstrates how cooperation and group discussion can be harnessed effectively through a game to teach students basic computer programming concepts.
– Highlights the importance of feedback in educational video games.
– Finds that a collaborative and competitive video game can have a positive effect on student knowledge in programming immediately after gameplay.
– Shows that a collaborative and competitive educational video game that teaches programming can positively impact student understanding of course material.
– Reveals that programming knowledge obtained in a video game can be used in a non-gaming context, like on a test.
– Indicates that the understanding acquired through a collaborative and competitive video game that teaches programming can be retained for at least 7 weeks by some students that played the game.

The "Literature Review" section provides a literature review for the use of video games for teaching computing. The design of Space Race is given in the "Game Design" section and an overview of the experimental procedure used to measure the impact of Space Race is provided in the "Experimental Procedure" section. In the "Game Reception and Feedback" section, the survey results from the study participants are summarized and analyzed. The "Educational Effectiveness of the Game" section measures the impact of the game on student learning, first through a comparison of pre- and post-quiz results and then by comparing final examination results between game participants and non-participants. Further details on the design and assessment of Space Race for teaching engineering students introductory computing can be found in Chan (2014), while a brief overview can be found in Chan and Smith (2014).

[1] The meta-analysis of Wouters et al. (2013) suggests that serious games are not actually more motivating than conventional instruction. However, their analysis still supports the use of serious games because their data shows an increase in student learning and retention.

Literature Review

Empirical evidence is currently lacking to support game-based learning in computing. Here, the term game-based learning refers to the use of games as a tool to teach programming; as mentioned previously, it does not refer to the development of games as a method of teaching programming. As such, many popular block-based graphical languages such as StarLogo The Next Generation (Yoon 2005), Scratch (Maloney et al. 2004), Alice2 (Kelleher et al. 2002), Robocode (Kumar and Khurana 2012; Long 2007), and Cleogo (Cockburn and Bryant 1998) will not be considered here. These programming environments are not considered games, since students have to build a game, not play one, to learn.

Very little research has been done to explore game-based learning as a method of instruction for programming. Furthermore, most of the research that has been completed does not definitively compare the effects of game-based learning to more traditional methods. Moreover, almost none of the research has evaluated whether the programming knowledge acquired from the game is transferable to contexts outside of the game, and none of the games employ collaboration or teamwork in gameplay. Many games have been proposed by researchers, but they have either not been implemented or not investigated (Connolly et al. 2007).

Table 1 lists some games that have been proposed by different researchers for implementation in software engineering or computer science education. Although Chaffin et al. (2009) use the differences between pre- and post-tests to quantify what the game taught participants, no control group is used to show what the students would have learned following a traditional approach.

Table 2 lists the few conclusions that other researchers have drawn after developing and studying the effects of a video game that teaches programming or computing concepts to students. The success of these existing projects with educational games provides the motivation for the game (Space Race) used in this study. In contrast to the projects summarized in Table 2, the current project has considerably more participants (485 students), is aimed at higher education, incorporates cooperation and competition, includes a statistical analysis, measures longer term impact (7 weeks after gameplay), and evaluates the transfer of the knowledge gained from the game to a non-gaming context (a test).

Game Design

Space Race is a four-player Android tablet game that was developed for this research project. The game has four levels and is inspired by Spaceteam, a collaborative Android and iPhone game developed by Sleeping Beast Games© 2012 (Smith 2014). When playing, each player has their own tablet.

The game features cooperation among team members and competition between teams. Players progress through a level by correctly answering a series of questions served by the game. Game performance is dependent upon the time it takes to complete the instructions in a level; the shorter the time, the better the performance. The best time for each team is recorded per level on a public leaderboard that players are able to access through the main menu in the game. The teams are listed from best to worst according to time.

Each level in Space Race introduces new programming concepts that rely on a firm understanding of the concepts taught in the previous levels. The programming language being taught in all four levels is Python, version 2.7. The emphasis is on syntax and introductory concepts. To keep the interaction with the game simple, the decision was made to not display tutorial-style feedback for incorrect answers. The idea is that students will give one another feedback or discuss the problem with the instructor. As seen in the "Game Reception and Feedback" section, some students would have preferred in-game feedback. Each level of Space Race was designed to be completed in an hour or less. On average, teams completed each level within 45 min.

Level 1 Game Design

Figure 1 depicts the game screen for level 1. The player can interact with level 1 through four touch buttons and a shake motion. The four touch buttons are used to select answers and the shake motion is used to indicate an Error response.

The gameplay of level 1 is very similar to that of a multiple-choice quiz game. However, modifications were made to ensure the design goal of cooperation between team members is attained. Every player on the team of four will view the same game screens. The only difference in the information being displayed is the Python data type shown in the Panel Type display. A different data type is randomly assigned to each player at the beginning of the game, from the possible types of int, str, bool, or float/long.

The objective of this level is to answer all the questions in the shortest time. All four players will see the same question in the Question Monitor and the same four values for the Answer Buttons. The player to select the correct answer must have the data type that corresponds to the data type of the variable in the question. The following is an example question:

1 X = 6
2 Y = 5
3 Z = X + Y
4 Z = ?
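The example question can be checked by running the code directly: Z binds to an int with value 11, so the player holding the int panel type must answer. A minimal sketch, runnable under Python 2.7 or 3:

```python
# The level 1 example question, executed outside the game.
X = 6
Y = 5
Z = X + Y

print(Z)                    # 11, so the correct Answer Button is 11
assert Z == 11
assert isinstance(Z, int)   # the player with the int panel type answers
```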

Table 1 Games proposed for implementation in computing education

Author(s) Game(s) Programming topics

Martin (2000) Simulation game Information systems development
Ford Jr and Minsker (2003) "TREEZ" computer game Tree traversal techniques using iterative and recursive algorithms
Navarro and van der Hoek (2004) SimSE interactive simulation game Software engineering process (project management)
Rajaravivarma (2005) Number and word games Basic programming skills (I/O, strings, arrays, etc.)
Li et al. (2008) "Bomberman" Introductory C programming language
Muratet et al. (2009) Real-time strategy game Basic programming skills (I/O, strings, arrays, etc.)
Chaffin et al. (2009) Programming puzzles Recursion

The values in the answer buttons are 10, 11, 12, and 13. To match the correct type, the player with the Panel Type of int must be the player to select 11. This encourages the team to discuss the correct answer together.

In some cases, the code block shown in the Question Monitor may raise Python exceptions, as shown below.

1 X = 6
2 Y == X
3 Y = ?

Since a Python NameError exception is raised by line 2, all four players must shake their tablets to acknowledge the error.

The level is complete when all 45 questions have been answered correctly. Each iteration of this level generates the same questions in the same order. The only potential change between runs is which player should select the correct answer, since their panel types are determined randomly each time. Using the same questions ensures that each team has an equal opportunity to improve their overall time with every iteration of this level.

Levels 2, 3, and 4 Game Design

Since the underlying game mechanics for levels 2, 3, and 4 are the same, a combined description is given here. For these levels, students are provided with a "Cheat Sheet" that concisely summarizes each level's concepts.

The game screen for level 2 can be seen in Fig. 2. All of the touchscreen controls are displayed together on a single control panel. Each touchscreen control on the panel is paired with a touch button that is used to submit that control's selected value. These Submit Buttons will turn green if the control's submitted value is correct, red when the control's submitted value is incorrect, and yellow if the incorrect control has been changed.
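The exception question from level 1 behaves the same way in plain Python: comparing Y to X before Y has been assigned raises a NameError, which is the cue for all four players to shake their tablets. A minimal sketch:

```python
# Reproducing the level 1 exception question: Y is used before it
# is bound, so Python raises NameError on the comparison in line 2.
X = 6
try:
    Y == X
except NameError as err:
    print("Error response required:", err)
```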

Table 2 Educational computing games with quantified impact on learning

Author(s) Game(s) (topics): Evaluation

Kahn (1999) ToonTalk (programs, algorithms): 12 pairs of students played the game and were able to complete the puzzles. Effectiveness was not quantified.
Long (2007), Bierre et al. (2006) Robocode (Java and .NET): 83 community members were surveyed. 80% of respondents perceived that their programming skills increased (Long 2007). Incorporation in education may not be as effective because students copy open-source code for robots online (Bierre et al. 2006).
Eagle and Barnes (2008) Wu's Castle (arrays and loops): For 27 students, the score for the array portion of the final exam was 51% higher for game participants (p = .01).
Papastergiou (2009) LearnMem (computer memory concepts): LearnMem1 was compared to LearnMem2, a computer-based teaching tool, with 88 students randomly split between the 2 options. No significant difference on the pre-test [F(1, 86) = 2.625, p = .109]. LearnMem1 is significantly better on the post-test [F(1, 83) = 8.853, p = .004]. A survey showed students found LearnMem1 more appealing.
Kazimoglu et al. (2012) Program Your Robot (program, debug): 25 students played the game and reported enjoying it. Effectiveness was not quantified.
Browne and Anand (2013) 6 tablet video games (binary search, quicksort, Dijkstra's algorithm, etc.): 101 students played the games in 6 separate sessions. Quizzes revealed students perform better with a combination of game and traditional instruction over either type alone. No consistent trend was found between game-only or traditional-only instruction.

Fig. 1 Game screen for level 1

The objective for this level is for players to execute the instructions for four different programs in the shortest possible time. Before the game begins, each player is randomly assigned a unique program from four available programs. Each program is paired with a specific panel that the player will see on their screen. This means that players will view unique panels and program instructions on their game screens. Table 3 lists the program and panel pairs used for level 2. During the game, the players are served one instruction at a time from their assigned program. A new instruction is served when the previous instruction has been correctly executed. Instructions indicate the value that controls should be set to. Players can receive instructions for a control on their own control panel or for a teammate's control panel.

In Fig. 2, player 1 has received the instruction Aileron = 9%2. This is an example of a player receiving an instruction that affects their own panel. To correctly execute the instruction, player 1 must set their Aileron control to the value of 1 and press the Submit Button. Once the instruction has been cleared, player 1 will be served their next instruction.

A player may also receive an instruction that does not refer to a control on their control panel. For instance, player 2 may be served the instruction Message = "DAB", when the Message control is found on player 4's control panel, as shown in Table 3. In this instance, player 2 must verbally communicate to player 4 that Message should be set to DAB. Once player 4 completes this task correctly, player 2's instruction will be cleared. Should any player be presented with an instruction that raises a Python SyntaxError, every team member must shake their tablets to acknowledge the exception.

Similar to a regular program, some instructions will require controls to be set to a new value that depends on that control's current value or another control's current value. As an example, a possible instruction a player may receive is as follows: Feedback = Feedback + "C". The player with the Feedback control would then refer to their Previous Values Display, as shown in Fig. 2, to determine the current value of Feedback. This current value is the previously correct value for the control. Assuming that the current value of Feedback is DCA, the player would have to set Feedback to DCAC.

For research purposes, the game ensures that every player receives the same learning experience. To achieve this, every program has the same kind of instructions, but the instructions do not occur in the same order. For instance, Aileron = 9%2 is instruction #1 from Program 1 and Torque = 5%3 is instruction #4 from Program 3. These instructions are of the same kind; they both test the player's understanding of the modulus (%) operator. By the end of the level, all the players will have given and received a set of instructions that covers the same range of topics.

With the way the game is designed, it is possible for players to finish all the instructions at different points in time. A player will be given the instruction "Listen to teammates!" if they have completed their programs

Fig. 2 Game screen for level 2
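The level 2 instructions discussed above map directly onto Python expressions. A quick check of the modulus instructions and the Feedback concatenation, using the values from the text:

```python
# Level 2 instructions evaluated outside the game.
Aileron = 9 % 2            # remainder of 9 divided by 2
Torque = 5 % 3             # remainder of 5 divided by 3

Feedback = "DCA"           # previous correct value from the display
Feedback = Feedback + "C"  # the served instruction

assert Aileron == 1        # player 1 sets the Aileron control to 1
assert Torque == 2
assert Feedback == "DCAC"
```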



Table 3 Level 2 programs, panels, control names, and control types

Panel Name Type

Program 1
1 Aileron int
4 Elevator int
4 Rudder float/int
3 Text string
2 Break bool

Program 2
2 Flaps int
1 Slats int
1 Hover float/int
4 Message string
3 Horn bool

Program 3
3 Torque int
2 Pitch int
2 Yoke float/int
1 Speaker string
4 Light bool

Program 4
4 Roll int
3 Yaw int
3 Altitude float/int
2 Feedback string
1 Lock bool

before everyone else. They will then have to continue to participate in the game by executing the instructions verbally communicated from their team members. The level ends when all four programs have been completed.

The objective and game mechanics for level 3 are the same as level 2. The only difference lies in the controls/control types that are available and the introduction of file I/O. The control types for level 3 are int, str, list of int, and list of str. The layout for level 3 is shown in Fig. 3. As with level 2, all the players are randomly assigned a program and panel pair. For this level, the program and panel pairs are also matched with a specific file, where the file consists of three characters separated by newlines. The file I/O instructions in any given program will only refer to the file that is matched with that program. This design limits confusion by ensuring that a player will never have to ask their team members for the contents of an opened file. Furthermore, it brings the player's focus towards understanding how the lines of code should be interpreted rather than where the file is located.

Finally, the game screen for level 4 can be seen in Fig. 4. Like the previous two levels, every player will have the same controls, but all the controls will have unique names, except for the Terminate Loop button. Unlike level 3, all the players will view the same file in their File Viewers.

The game mechanics for this level are similar to levels 2 and 3, except for the addition of for-loops. The four programs a team must complete are, once again, paired with a panel and randomly assigned to each player at the beginning of the game. However, the four programs are no longer separate and distinct programs. The programs follow a pattern; instructions alternate between a level 2 or 3 style instruction and a group of instructions for a for-loop. The for-loop instructions are the same across all four programs. An example is shown below.

1 for Rudder in range(3):
2     Hover = Hover + 1
3     Torque.append(Rudder)
4     Roll[Rudder] = Rudder

One control from each panel will be affected in the group of instructions. This means that all of the team members will be able to participate in the execution of the for-loop. Each line in the for-loop will be highlighted sequentially, starting with line 1. The purpose of the highlighter is to indicate the line to be executed. The highlighter advances once the instruction is executed by the correct player. After the loop body is executed, the highlighter will loop back to the first line. Once the loop has been executed to completion, all four players must press the Terminate Loop button to indicate that the for-loop has terminated. This will then prompt the game to serve the next set of level 2 or 3 style instructions. The gameplay for the for-loop forces the players to traverse through the loop like a computer program.

Experimental Procedure

The study was conducted with voluntary participants from ENGINEER 1D04 in the winter term of 2013–2014. Any student from the class of 485 was free to volunteer, with no restrictions based on their characteristics (e.g., age, gender, affiliation). We did not collect data on demographic characteristics, but we did survey the students with respect to programming experience and attitudes towards video games in education (Chan 2014).

By their self-determined decision on whether or not to play Space Race, the students were naturally divided into two groups: a control group that did not play Space Race and an experimental group that did. (The potential for self-selection bias is discussed in the "Benchmark for Student Abilities" section.) Since students were not required to play all four levels, the number of participants varied across the four levels, as shown in Table 4.

Fig. 3 Game screen for level 3
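Looking back at the level 4 for-loop block from the game design: the trace the players perform can be reproduced in ordinary Python. The starting values below are assumptions for illustration; in the game, the current control values come from each player's panel:

```python
# Tracing the shared level 4 for-loop, one highlighted line at a time.
Hover = 0                  # assumed starting value (int control)
Torque = []                # assumed starting value (list control)
Roll = [None, None, None]  # assumed starting value (list control)

for Rudder in range(3):    # Rudder takes the values 0, 1, 2
    Hover = Hover + 1      # increment an int control
    Torque.append(Rudder)  # grow a list control by one element
    Roll[Rudder] = Rudder  # set a list element by index

assert Hover == 3
assert Torque == [0, 1, 2]
assert Roll == [0, 1, 2]
```

After three passes, each player has pressed a control once per iteration, mirroring how the highlighter walks the whole team through the loop body.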

No limits were placed on communication between those that played the game and those that did not. However, diffusion (where the experimental group influences the control group) is not a significant concern since the game was not available to students outside of the lab setting. Moreover, if the game participants shared any of their potentially increased understanding to improve the knowledge of the non-participants, the bias would be in the conservative direction of weakening the observed value of Space Race, since the effect would be to reduce the performance gap between the two groups.

The surveys on programming experience and attitude showed that the experimental and control groups were very similar (Chan 2014, Chapter 6). An overwhelming majority of students in both groups consider themselves novice programmers. The students from both the experimental and control groups had approximately the same video gaming habits. Only 7% of the control group and 13% of the experimental group had played an educational video game. Even so, 64% of the control group and 67% of the experimental group agreed that they would enjoy playing educational video games.

Table 5 outlines the weeks when performance data was collected. Starting from week 3, each Space Race level was available for students to play for 1 week. Each level was designed to incorporate the concepts taught in the ENGINEER 1D04 lectures, labs, and tutorials from the previous week.

The experimental group was asked to complete a survey (presented in the "Game Reception and Feedback" section) after each level. Students used this survey to evaluate the perceived usability, effectiveness, and quality of the game. The survey allowed students to report how the game affected their motivation to study ENGINEER 1D04. Completion of the survey was voluntary.

Like many experimental studies, we collected performance data by using a pre-test, followed by our intervention (the game), followed by a post-test. The difference between the pre- and post-test measures the effectiveness of the intervention. Game-based learning research, in particular, has successfully used this technique. Some examples include an educational game in civil engineering (Ebner and Holzinger 2007), an educational virtual reality game (Manos 2005), and game-based learning in high school computer science (Papastergiou 2009).

In our experiment, two identical quizzes were given to the experimental group to complete before and after playing a level in Space Race. The details of the quiz questions can be found in Chan (2014). Each quiz features five multiple-choice questions related to the material covered in the level. Solutions to the quiz are not offered after completion of the pre- or post-quiz. The quizzes are completed only once
Fig. 4 Game screen for level 4
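The pre-test/intervention/post-test design described in this section reduces to a difference of scores. A sketch with made-up quiz scores (out of 5), not the study's data:

```python
# Hypothetical pre- and post-quiz scores for five participants;
# illustrative of the gain computation only, not real study data.
pre_scores = [2, 3, 1, 4, 2]
post_scores = [4, 4, 3, 5, 3]

# Per-student gain, and the mean gain attributed to the intervention.
gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
mean_gain = sum(gains) / float(len(gains))  # float() keeps Python 2.7 exact

print(gains)      # [2, 1, 2, 1, 1]
print(mean_gain)  # 1.4
```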



Table 4 Space Race participation numbers

Level Experiment Control
1 236 249
2 214 271
3 194 291
4 190 295

by the students. That is, the students are not asked to complete the quiz again if they choose to re-play the level. All questions have to be completed before the quiz can be submitted. The students were instructed to complete the quiz individually.

Examinations for ENGINEER 1D04 were completed by the entire class; therefore, the impact of Space Race can be assessed by comparing the exam performance between the control and experimental groups. Since the majority of the exam questions were designed by the 1D04 instructor, not by us, the questions should have minimal bias. 1D04 used two midterm exams and one final exam to evaluate students' proficiency. All exams were multiple choice.

Game Reception and Feedback

This section provides the results of the student survey, which included Likert scale questions and written answers. An excerpt from the survey is shown in Fig. 5; the full questionnaire is available in Chan (2014). The survey results are used throughout this section to support the qualitative observations made while watching students play Space Race. The number of respondents for levels 1 to 4 was 180, 180, 92, and 98, respectively. The Likert response results of the survey (questions 1 through 24) from level 1 are shown in Fig. 6. The Likert responses for the other levels look qualitatively similar (Chan 2014).

To determine if the survey responses on the different levels are actually similar, each question was evaluated to determine if the results across the four levels can be combined. A Kruskal-Wallis H test (Kruskal and Wallis 1952) was performed, using SciPy (SciPy Developers 2014), to compare and evaluate the results. This non-parametric test was chosen because the Likert scale responses are ordinal and because there are more than two categorical groups. The analysis follows steps similar to the example in Opie and Sikes (2004, p. 210). Kruskal-Wallis tests the null hypothesis that the response distributions are equal. The p value indicates how likely any observed differences would have occurred if responses were drawn from the same population. The results of the Kruskal-Wallis H test, performed on questions 1 through 24 of the survey across the four categorical groups, are shown in Table 6. If the p value for a given response is not less than 0.05, one cannot say that there is a significant difference in attitudes on the survey for the Likert responses between the four categorical groups. However, if there is a significant result, then one can conclude that at least one of the groups is different from the others. The test does not identify where the differences occur or how many differences actually exist (Kruskal and Wallis 1952).

The results of Table 6 indicate that there are significant differences for at least one group for questions 1, 3, 6, 12, 13, and 14. Since the game mechanics of levels 2, 3, and 4 are very similar, while the game mechanics for level 1 are different, the Kruskal-Wallis H test was performed again, but this time only with the responses for levels 2, 3, and 4. The results in Table 6 indicate that for levels 2, 3, and 4, only questions 1 and 3 have responses that are significantly different. Going forward, all of the questions that had insignificant differences between the four levels are analyzed using the pooled data. In those rare cases where it is necessary, the analysis of the responses will consider the differences between levels. Further details can be found in Chan (2014).

Table 5 Experimental timeline and course schedule

Week no. Item
Week 1 Classes begin
Week 3 Space Race level 1, survey
Week 4 Space Race level 2, survey
Week 5 Space Race level 3, survey
Week 6 Space Race level 4, survey, and midterm 1
Week 10 Midterm 2
Week 12 Classes end
Week 13 Final exam

Playability

Questions 1–5 and 24 on the Likert response portion of the survey relate to playability. A majority of the written feedback mentioned the need for a more interactive tutorial on gameplay. One student said that they "like the game however the instructions were not very intuitive and it was hard to initially learn how to play. After learning how to play it was more fun." An example of the current tutorial page, which is shown before each level, is shown in Fig. 7. The page features the game screen, along with explanations for each control, which students access by clicking on the appropriate button. We observed that the students rarely, if ever, read the instructions before playing the game. This behavior may have contributed to the perceived difficulty in learning how to play the game. It also suggests that the traditional method of defining game rules through text is no
Fig. 5 Sample survey questions

longer effective. Perhaps, as a student suggested, “...a video or example would be much more effective.” While a video tutorial could be more interesting for students to review, it is not clear whether a tutorial presented in a different manner would have any effect. Andersen et al. (2012) show that the value of tutorials is highly dependent upon the complexity of the game; they have a surprisingly negligible effect on player engagement in games that are less complex or similar to existing games in the same genre. Furthermore, providing help on-demand, during gameplay, can either have positive, negative, or negligible effects on player engagement. Given the relatively simplistic game mechanics of Space Race, a

Fig. 6 Level 1 survey Likert response results


Table 6 Survey levels 1, 2, 3, and 4 Kruskal-Wallis H test. Significant at p < 0.05

                 Levels 1, 2, 3, and 4           Levels 2, 3, and 4
Quest    H-stat     p-val    Signif?     H-stat    p-val    Signif?

Q1 32.861 0 Yes 27.41 0 Yes


Q2 7.752 0.051 No 0.549 0.76 No
Q3 22.865 0 Yes 10.798 0.005 Yes
Q4 4.083 0.253 No 4.165 0.125 No
Q5 6.419 0.093 No 0.458 0.795 No
Q6 7.894 0.048 Yes 4.001 0.135 No
Q7 7.139 0.068 No 4.314 0.116 No
Q8 1.389 0.708 No 1.311 0.519 No
Q9 2.171 0.538 No 0.088 0.957 No
Q10 1.117 0.773 No 0.929 0.628 No
Q11 5.624 0.131 No 0.887 0.642 No
Q12 19.038 0 Yes 0.152 0.927 No
Q13 11.028 0.012 Yes 4.012 0.135 No
Q14 11.175 0.011 Yes 4.67 0.097 No
Q15 4.62 0.202 No 2.696 0.26 No
Q16 4.062 0.255 No 2.119 0.347 No
Q17 4.788 0.188 No 4.586 0.101 No
Q18 0.717 0.869 No 0.625 0.732 No
Q19 1.616 0.656 No 1.487 0.475 No
Q20 4.214 0.239 No 3.863 0.145 No
Q21 4.224 0.238 No 3.081 0.214 No
Q22 0.264 0.967 No 0.073 0.964 No
Q23 2.848 0.416 No 0.935 0.626 No
Q24 1.802 0.614 No 0.459 0.795 No
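The screening reported in Table 6 can be illustrated with a minimal Kruskal-Wallis H computation. This is a sketch, not the study’s code: it omits the tie correction (so the illustrative data uses distinct values), and it uses the fact that for three groups (as in the levels 2, 3, and 4 comparison) the statistic has 2 degrees of freedom, where the chi-square p value reduces to exp(-H/2).

```python
import math

# Minimal Kruskal-Wallis H statistic (no tie correction): rank the pooled
# responses, then compare the groups' mean ranks. The data is illustrative.
def kruskal_h(*groups):
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # 1-based ranks
    s = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)

# Three groups, so df = 2, as in the levels 2, 3, and 4 comparison:
h = kruskal_h([1, 3, 5], [2, 4, 9], [6, 7, 8])
p = math.exp(-h / 2)  # chi-square survival function for df = 2
assert round(h, 1) == 3.2
assert p > 0.05       # no significant difference; responses could be pooled
```

A significant H only says that at least one group differs, matching the interpretation given in the text.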

complete tutorial seems unnecessary. However, the impact of on-demand help during gameplay should be investigated further.

Overall, a large majority of the students enjoyed the game, with at least 82% agreeing with the statement “I enjoyed playing the video game” (survey question 4, Fig. 5). Level 2 had the highest proportion of students in agreement with the statement, at 88%. A student commented that they “enjoyed the whole game, [because] it helped [them] learn some basic programming techniques and cleared up some uncertainty.” Further feedback included the suggestion that a “storyline element [be added to the game] as it makes it much more immersive and easier to remember when told as a narrative.”

Teachability

Questions 6–8, 11, 12, 15–17, 22, and 23 on the survey are the Likert responses that relate to the teachability of the game. These questions explore students’ perceptions of the effectiveness of Space Race for teaching basic programming concepts. The teachability questions also explore whether Space Race was able to motivate students to study the course material and whether students prefer the game over traditional teaching methods.

When students were asked if “the video game helped [them] to learn the basic programming concepts being presented” (question 6), at least 85% of students answered in agreement in all four levels. Level 4 had both the largest proportion of students that agreed with the statement (88%) and the largest that disagreed (6.5%). The Kruskal-Wallis H test results shown in Table 6 reveal that there was a significant difference in the results between the four groups. However, this can most likely be attributed to the difference in the proportion of students that chose “Strongly Agree” as opposed to “Agree” since all four levels had approximately the same proportion of students that chose “Neither Agree nor Disagree” and “Disagree”/“Strongly Disagree”; all were comparatively low. Notably, level 3 and level 4 had almost double the proportion of students that selected “Strongly Agree” versus level 1.

For question 8, at least 82% of the students felt that they “would recommend that others try to learn the basic programming concepts with [Space Race]” for all four levels.
However, from the written feedback, a student noted that “the game confirms knowledge [; it] doesn’t really teach.” Some other students expressed the same feelings and felt that Space Race needed “a better response system to incorrect answers.” For example, a student suggested that “hints [appear] if a team continuously got a certain question wrong.” To summarize, students felt that the game needed more feedback to teach them concepts, instead of simply confirming correct knowledge. Although the game itself does not directly teach, it certainly does create multiple opportunities for team members to teach each other. This will be discussed further in the “Cooperation” section. Furthermore, the game pushes students to reference their “Cheat Sheets” in levels 2, 3, and 4 for assistance and feedback. Students felt that “the cheat sheet was very helpful in summarizing the game.”

From our observations, although the game does not directly teach new concepts, it does create many “teachable moments.” An instructor was always present during game play to answer student questions. This allowed the instructor to understand where the students needed to improve. Moreover, the game helped students identify the concepts they were not familiar with. The game encourages students to ask questions. A student is perhaps more likely to receive help from an instructor when they are working with team members, because a less assertive student’s more assertive teammates will make sure that the question is asked. Also, there is less shame associated with not understanding a concept if a student’s peers also do not understand. This, in turn, could lower a student’s anxiety towards approaching their instructor, an authority figure, for help. Finally, asking an instructor for help in the context of a game could perhaps be perceived as lower risk for the student than asking a question in another context, such as during lectures, tutorials, or labs.

Question 15 was used to assess students’ preference for an educational video game over traditional teaching methods. At least 65% of the students agreed with the following statement: “I would prefer to learn basic programming concepts through a video game as opposed to other traditional teaching methods such as lectures, tutorials, or textbooks.” In the written response section, a student also said that “when [they] played this game [they] felt as if [they] were studying. [They] would enjoy using this game as a break from the regular studying for tests and exams.” Another student stated that they “learned a lot from this game as [they] were able to apply what [they learned] in lectures and tutorials.” Finally, a student said that “the interactions between team members bolstered the understanding of concepts and created a better learning experience than just reading a textbook.”

The results of questions 16 and 17 can be used to gauge how effective Space Race was in motivating students to be more interested in the subject of programming. When students were asked if “the video game stimulated [their] curiosity on the basic programming concept being presented” (question 16), at least 74% agreed and at most 7.4% disagreed with the statement for all the levels. Furthermore, at least 67% agreed that “the video game has motivated [them] to review the course material” across all four levels. Therefore, for some students, the game has the side effect of motivating them to seek an understanding of programming outside the context of the game.

Fig. 7 Space Race level 3 tutorial page
Cooperation

Perhaps the most important aspect of Space Race is the incorporation of cooperation into the game. Questions 9, 10, 13, 14, 18, and 21 provide insight on how cooperation affected student learning. From question 9, across all levels, as little as 2.3% and at most 7.4% disagreed with the statement that “[their] team members helped [them] to better understand the basic programming concepts being presented.” Furthermore, only 0 to 2.3% disagreed with the statement that “cooperation between team members makes this video game an effective teaching tool for learning basic programming concepts” (question 18) for all the levels. From this, one can conclude that cooperation played a large part in the effectiveness of Space Race as a teaching tool. Students have also provided the following feedback with regard to cooperation, which shows that they used their teammates to facilitate learning:

– “I really enjoyed the cooperative aspects of the game because it does a tremendous job of encouraging teamwork and sharing individual knowledge for the benefit of a common goal....”
– “When we were stuck on a question, the team would offer suggestions and we would get through the question together. This was helpful when questions were hard and especially when the player did not know the correct answer on their own.”
– “[My team members] helped me a lot, as sometimes my initial guess was incorrect, but they [would] correct me and explain the concept.”
– “Communicating with team members encourages discussion of concepts which I like.”
– “I really liked the cooperative nature of this video game. Individually, we may have struggle [sic] with the concepts; but as a team, we conquered them.”

The individuals in each group had different levels of understanding of computing. From the feedback listed above, it is clear that the less informed students felt they were able to learn from their more capable peers. For their part, the more capable peers also seem to have generally benefitted from the experience. From the comments received on the survey, most of the students that felt confident with the course material still enjoyed the game; they felt a sense of pride in being able to instruct others. For example, a student stated that they “[were] the lead of the team so [they] did not learn anything from [their team members] however, [they] taught them a lesson or 2 [sic] about Python.” It is important to note that there were very few comments on the survey where students felt their team members did not teach them anything new. Some of the “negative” comments could even have a positive interpretation. For example, a student said that “[their teammates] slowed [them] down but [it] was fun to play with them.” Another said “[their teammates were] great but they need to be more efficient.” It was much more common for students to comment that their team members were able to teach them a concept they did not understand. To quantify, across all levels, only 9 out of 231 (3.9%) responses for question 26, “How did your team members affect your ability to learn the basic programming concepts being presented,” could be interpreted as being “negative.”

The positive attitude towards cooperation in Space Race may be partly because the interaction between teammates is face-to-face, as opposed to being virtual, as is the case for many other video games. The benefits of face-to-face interaction are supported by the success of Augmented Reality games for teaching science (Dunleavy et al. 2009; Cheng and Tsai 2013). As Dunleavy et al. (2009) observe, in-person interaction has a much greater bandwidth on multiple dimensions when compared to Multi-User Virtual Environments (MUVEs). Moreover, instructional conversation, like that between teammates explaining concepts to one another, is generally believed to provide learning benefits, especially when the talk is interpretive; that is, the greatest learning gains come when the conversation is generated in the service of analysis or explanation (Palincsar 1998). Playing in groups was also shown to be beneficial for learning in the meta-analysis of the cognitive and motivational effects of serious games (Wouters et al. 2013).

Educational Effectiveness of the Game

This section will quantify the educational effectiveness of Space Race. Specifically, the pre- and post-quizzes will be analyzed, followed by the ENGINEER 1D04 exam results. The quiz results demonstrate the immediate effects of Space Race, while the exam results demonstrate whether the knowledge gained can be retained over time. The exam results also allow consideration of whether the understanding gained through Space Race can be transferred outside of the game.

We use the conventional value of p < 0.05 as our test for statistical significance, even though we have multiple tests. In cases with n tests, the Bonferroni adjustment suggests lowering the p value to 0.05/n to account for the increased chance of getting an incorrect “significant” result (Type I error) (Bland and Altman 1995). However, we have not used the Bonferroni correction because it is too conservative, being concerned with the universal null hypothesis, which states that all null hypotheses are true simultaneously (Perneger 1998). The Bonferroni correction has the logical weakness that the results will change depending on the number of tests. Moreover, the Bonferroni correction increases the likelihood of type II errors (Perneger 1998). Rather than use statistics to make sense of the combination of the test
results, we present the statistical results separately for each test and then use common sense to interpret them. Rather than think of our approach as multiple tests of the same hypothesis, we think of it as a series of individual tests, each with their own hypothesis. Our null hypothesis is not that “the proportion of students that are correct do not differ between the experimental and control groups for the examination”, but rather that “the proportion of students that are correct do not differ between the experimental and control groups for Question X.”

Benchmark for Student Abilities

Since members of the experimental group were volunteers, we need to first assess that self-selection bias is minimal before causation can be attributed to Space Race. A method of benchmarking must be employed to determine the distribution of students in the experimental and control groups. To do this, a combination of each student’s math and physics exam marks was used. Both of these exams are mandatory for all first year engineering students at McMaster. Almost all of the study participants will have taken these exams in the term preceding our experimental study. Math and physics were chosen because research has shown that some misunderstandings arise from parallel or identical causes between the domains of mathematics, physics, and computer programming (Perkins and Simmons 1988).

Each student was assigned a score S, which is calculated using the formula

S = Average(M, Average(P1, P2)),    (1)

where M, P1, and P2 are the grades on the mathematics final exam, physics midterm 1, and physics midterm 2. When using this equation, the following rules were applied to special cases:

1. If only one of P1 or P2 was available, the available mark was used.
2. If only one of Average(P1, P2) or M was available, the available mark was used.
3. If no marks were available, the student was omitted from the evaluations.

Following these rules, five students could not be assigned a score. Table 7 shows the number of students omitted for each level. The number of participants eliminated is much smaller than the size of the original sample they were removed from. Therefore, the elimination of participants is considered to have a negligible effect on the observed results.

Visual comparison of the score distributions between the experimental and control groups suggests that these groups can be considered as drawn from the same population. To confirm this, a two-sided t test was performed for each level to determine whether the differences between the two groups are insignificant. Since the t test assumes the samples are normally distributed, the normality test from SciPy was used to first verify that both the experimental and control groups are normally distributed. The results are shown in columns 4 and 5 of Table 7. Given that all the p values are larger than the level of significance of 0.05, the null hypothesis that the samples come from a normal distribution cannot be rejected. As such, the distributions of scores for both the experimental and control groups are assumed to be normal.

Having established normality for the distribution of scores for both the control and experimental groups, the t test was performed with the null hypothesis that the two independent samples, the experimental and control groups, have the same distribution. The results of the t test are shown in the last two columns of Table 7. It can be seen that the p value for all four levels is well above 0.05. Consequently, the null hypothesis cannot be rejected. This suggests that there is very weak evidence that the distributions of scores between the experimental and control groups are not the same, which implies that self-selection bias is not a concern. Therefore, differences in performance between the two groups can reasonably be attributed, at least in part, to the influence of Space Race.

Pre- and Post-quiz Results

As mentioned previously, in the “Experimental Procedure” section, the pre- and post-quizzes for each level were identical. An example quiz question, for level 1, question 5 (L1Q5), is shown in Fig. 8. The full set of quiz questions is available in Chan (2014). Students were not told which questions they answered correctly.

Table 8 shows the percentage of students that answered correctly on the pre- and post-quiz. The table also includes the number of students that successfully completed the pre- and post-quiz questions for each level (indicated by N). This number may be less than the number of students that actually played and completed that level of Space Race, since some students were unable to submit the post-quiz due to technical issues. The quiz data, both pre and post, for these students has been eliminated from the results.

McNemar’s test was chosen to verify the significance of the differences between the pre- and post-quiz results for each question. McNemar’s test can be used when the data being investigated is paired and nominal and the outcome of interest is a proportion (McNemar 1947). The null hypothesis in this instance is that the proportion of subjects that answer a question correctly is the same before and after playing Space Race. The results of McNemar’s test can be seen in Table 8. The table shows that 65% (13/20) of all the results are significant. Additionally, all of the results that are
Table 7 Experimental and control group statistics

        Eliminated                Normality test p-val      t test
Lvl   Experiment   Control     Experiment   Control     t-stat   p-val
1     2 of 236     3 of 249    0.421        0.595       0.5214   0.6024
2     2 of 214     3 of 271    0.364        0.699       1.2440   0.2142
3     2 of 194     3 of 291    0.081        0.669       1.0046   0.3157
4     1 of 190     4 of 295    0.355        0.668       0.8312   0.4064
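The normality-then-t-test screening summarized in Table 7 was done with SciPy; the sketch below is a dependency-free illustration of the pooled two-sided t statistic on synthetic scores, not the study’s code. For samples as large as those in Table 7, the normal approximation to the two-sided p value is adequate.

```python
import math

# Pooled two-sample t statistic for comparing the experimental and control
# score distributions; the scores below are synthetic.
def t_statistic(xs, ys):
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)  # sample variances
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)  # pooled variance
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

t = t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
assert round(t, 6) == -1.0
# Large-sample (normal) approximation to the two-sided p value:
p = math.erfc(abs(t) / math.sqrt(2))
assert p > 0.05  # the null hypothesis of equal distributions is not rejected
```

With SciPy available, `scipy.stats.ttest_ind` returns the same statistic together with an exact t-distribution p value.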

significant are cases where the proportion of students that were correct in the post-quiz exceeded that of the pre-quiz (positive delta correct). On the other hand, all of the negative delta correct results were relatively small and statistically insignificant.

Level 1 had two questions where the difference was not significant: questions 2 and 3 (abbreviated to L1Q2 and L1Q3). The insignificance of the results suggests that there was no change in the proportion of students that were correct before and after game play. Both questions required students to generalize what they observed in Space Race. The game mechanics of level 1 make it easy for students to simply guess until they got the right answer. Moreover, the game does not provide feedback for incorrect answers. This combination could have resulted in the poor results on L1Q2 and L1Q3.

Out of the four levels, students improved the least on the level 2 quiz, with only one of the positive delta correct cases being significant. L2Q1 saw negative improvement with a delta of −1.1%. However, the percentage of students that were correct before and after playing Space Race is over 95%. This suggests that the question was perhaps too easy; the concept it tested was already understood by the large majority of students prior to game play.

L2Q2 did not show a significant improvement in performance after the students completed the game. L2Q2, shown in Fig. 9, was used to assess whether students understood that the value of a variable (of an immutable type) cannot change when the value is used, only when a new value is assigned to the variable.

The topic of variable assignment was handled in Space Race with instructions like “Yoke = float(7/2).” The student must change their Yoke control to 3.0 because a value has been assigned to it. Without assignment, none of the values for their controls would change. Unfortunately, students appear to have been unable to abstract their understanding from Space Race and apply it to L2Q2. Perhaps, L2Q2 would have met with more success if there was an explicit example in the game where an instruction does not cause any controls to change because assignment has not taken place. Providing in-game feedback may also have been helpful. The game could have explicitly revealed to the player that changing control values because of an instruction is the same as changing a variable’s value through an assignment statement.

Table 8 shows insignificant improvement for L2Q3, which is surprising because of its close relationship with L2Q5. L2Q5 asks students to identify, from a list of types, which type is not mutable. Eighty-five percent of the students were able to correctly identify strings as the correct answer after playing Space Race, a 16% improvement from the pre-quiz. However, only 31% of the students were able to select the correct answer for the seemingly similar question L2Q3, shown in Fig. 10.

L2Q3 (Fig. 10) requires students to identify the option that produces an error; the correct answer (A) is the one where the line of code incorrectly attempts to change the string via: “X[0] = "E".” The results from L2Q5 and L2Q3 suggest that students know strings are not mutable, but they do not understand what this implies. All of the options from L2Q3 appear in the questions for level 2. That is, all of the incorrect answers for L2Q3 are instructions that students have to execute in the game and the correct answer is an “Error” result that they would have faced. The problem might be traced to the use of Python’s slice notation in L2Q3, since the students were observed to struggle with this notation. The inability to recognize the “Error” option in L2Q3 as the correct answer may perhaps highlight Space Race’s failure to teach students to identify exceptions. Feedback from the game to explain why their error was cleared might be beneficial.
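Both concepts discussed above can be demonstrated in a few lines of Python, the course language. This snippet is illustrative rather than taken from the game; note that the game’s “Yoke = float(7/2)” yields 3.0 under Python 2 semantics, where 7/2 is integer division, so the Python 3 equivalent below uses floor division.

```python
# L2Q2's concept: a variable only changes when a value is assigned to it.
yoke = float(7 // 2)   # Python 3 equivalent of the game's "float(7/2)"
assert yoke == 3.0
doubled = yoke * 2     # using yoke's value does not change yoke
assert yoke == 3.0
yoke = doubled         # only a new assignment rebinds the variable
assert yoke == 6.0

# L2Q3's concept: strings are immutable, so item assignment raises an error.
x = "Hello"
try:
    x[0] = "E"         # the "Error" option students failed to identify
except TypeError:
    pass               # 'str' object does not support item assignment
# A new string must be built instead, e.g. with the slice notation that
# students were observed to struggle with:
x = "E" + x[1:]
assert x == "Eello"
```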
Fig. 8 Sample quiz question, level 1, question 5 (L1Q5)

Table 8 shows that levels 3 and 4 had the greatest number of significant results. The only insignificant result is for L3Q4 which, like L2Q1, had a high proportion of students correct for both the pre- and post-quiz. L3Q5 (Fig. 11) had the highest improvement out of all the quiz questions, with a delta correct of 65%. This question is almost identical
Table 8 Pre- and post-quiz McNemar’s test results. Significant at p < 0.05

Quest   Pre (%)   Post (%)   Delta (%)   Chi-Sqr   p-val   Signif?
Level 1, N = 230
1 17.83 40.87 23.04 44.59 0 Yes
2 26.09 30.43 4.35 1.79 0.181 No
3 71.3 69.13 −2.17 0.36 0.547 No
4 39.57 46.96 7.39 6.42 0.011 Yes
5 19.13 60 40.87 83.36 0.000 Yes
Level 2, N = 188
1 96.28 95.21 −1.06 0.33 0.564 No
2 23.94 28.72 4.79 3.24 0.072 No
3 26.6 30.85 4.26 2 0.157 No
4 53.19 54.79 1.6 0.36 0.549 No
5 69.15 85.11 15.96 22.5 0.000 Yes
Level 3, N = 171
1 66.67 73.68 7.02 5.14 0.023 Yes
2 30.41 46.2 15.79 13.75 0.000 Yes
3 66.08 81.29 15.2 16.1 0.000 Yes
4 90.06 90.64 0.58 0.04 0.835 No
5 17.54 82.46 64.91 98.57 0.000 Yes
Level 4, N = 167
1 79.04 92.81 13.77 17.06 0.000 Yes
2 61.08 69.46 8.38 6.53 0.011 Yes
3 49.1 59.88 10.78 9 0.003 Yes
4 72.46 85.03 12.57 13.36 0.000 Yes
5 31.14 61.68 30.54 40.02 0.000 Yes
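The McNemar computation behind Table 8 depends only on the discordant pairs, i.e., students correct on exactly one of the two quizzes. A minimal version, without the continuity correction and with illustrative counts rather than the study’s data, is:

```python
import math

# McNemar's test for paired pre/post answers (no continuity correction).
# b: students correct on the pre-quiz only; c: correct on the post-quiz only.
def mcnemar(b, c):
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) survival function
    return chi2, p

chi2, p = mcnemar(b=5, c=25)   # illustrative discordant counts
assert round(chi2, 2) == 13.33
assert p < 0.05                # the improvement would be judged significant
```

Students correct (or incorrect) on both quizzes do not enter the statistic, which is why the test suits paired nominal data.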

to one of the instructions that appears in the programs of level 3.

Even though this concept was explained on the “Cheat Sheet,” many students struggled with this instruction. This is not surprising, since the percentage of students that got L3Q5 correct (answer C) in the pre-quiz (18%) was the lowest of all the questions. While playing the game, students would often require the instructor’s help on this point. This question is different from the other instructions because it is less intuitive, since the empty string ("") is part of the answer. This provided the instructor with an opportunity to explain the concept in more depth. This suggests that the game may be most effective when it motivates interaction with an instructor on a topic that requires deeper explanation.

Fig. 9 Question L2Q2

Comparing the Effects of Space Race on Different Students

The next question we asked was whether student ability had an influence on the degree of improvement in the post-test results (Chan 2014). To compare the effects of Space Race on “stronger” and “weaker” students, all the students within the experimental group were sorted into three separate bins based on their S value (Eq. 1):

– Bin 0: S < 50%
– Bin 1: 50% ≤ S ≤ 75%
– Bin 2: S > 75%

The students in Bin 0, Bin 1, and Bin 2 represent the weakest, average, and strongest students, respectively. The analysis from the previous section was run on each bin to determine the delta in correct answers on the post-quiz, using the p value to judge significance (Chan 2014). Results show that 6/20 (30%), 11/20 (55%), and 10/20 (50%) of the improvements are significant for Bins 0, 1, and 2, respectively. This indicates that Space Race had a larger positive impact on the average and stronger students than on the weaker students. Also, it appears that the impact of Space Race on stronger and average students is approximately the same.
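The benchmark score S of Eq. 1 and the binning above can be sketched as follows; the function names and example marks are ours, not the study’s code.

```python
# A sketch of the benchmark score S (Eq. 1) with the special-case rules
# from the text, plus the three ability bins; all marks are percentages.
def benchmark_score(m=None, p1=None, p2=None):
    """Return S = Average(M, Average(P1, P2)), or None if no marks exist."""
    physics = [p for p in (p1, p2) if p is not None]
    p_avg = sum(physics) / len(physics) if physics else None   # rule 1
    parts = [x for x in (m, p_avg) if x is not None]           # rule 2
    if not parts:
        return None                                            # rule 3: omit
    return sum(parts) / len(parts)

def ability_bin(s):
    """Bin 0: S < 50; Bin 1: 50 <= S <= 75; Bin 2: S > 75."""
    if s < 50:
        return 0
    return 1 if s <= 75 else 2

assert benchmark_score(m=80, p1=60, p2=70) == 72.5   # Average(80, 65)
assert benchmark_score(m=None, p1=60) == 60          # rules 1 and 2
assert benchmark_score() is None                     # rule 3
assert ability_bin(72.5) == 1                        # an "average" student
```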
Fig. 10 Question L2Q3

Midterm Exam Results

This section compares the performance of the experimental and control groups on the multiple-choice midterm exams. We selected the ten questions from midterm 1 and the two questions from midterm 2 that covered the same concepts as Space Race. Each selected question was matched with one or more game levels. This was necessary to determine which students belong to the experimental or control groups, since not everyone that participated completed all of the levels.

Table 9 shows the proportion of students that were correct in both the experimental and control groups for each question of the midterm exams. The assigned levels for each question are listed, along with the delta correct value, which shows the difference in percentage of students that were correct between the experimental and control groups.

The chi-square statistic was chosen to test the significance of each result. This statistic can be used to investigate whether the distribution of the categorical variables differ from one another (Moore 1976). It compares the counts of categorical responses between two or more independent groups. The null hypothesis posits that the proportion of students that are correct do not differ between the experimental and control groups. Table 9 shows the results of the chi-square test.

The results show that 4/12 (33%) of the questions had a significant difference in performance between the experimental and control groups. All of these significant results are for a positive delta correct value, with the largest improvement being 15% for question 27 on midterm 2. That is, all of the significant results are represented by cases where the experimental group outperforms the control group. The one negative delta correct result was not statistically significant. From this, one can conclude that students from the experimental group either perform equally well or better than the students from the control group. One pattern that can be drawn from the results in Table 9 is that level 1 did not result in a significant positive delta correct value.

Final Exam Results

The final exam results are presented in the same manner as the midterm 1 and 2 results in the “Midterm Exam Results” section. The chi-square statistic was again used to test the significance of the results, as shown in Table 10. The interpretation of the final exam results differs from the midterms, because the final occurred at a later time, as shown in Table 5. At least 7 weeks passed between each student’s last exposure to Space Race and their completion of the final exam. Therefore, the performance of the experimental group on the final exam can indicate whether the students were able to retain the concepts they learned in Space Race over a longer time period.

Questions 5, 6, and 7 on the final exam were set by us, not by the instructor. These questions are identical to the pre- and post-quiz questions given in Space Race: question 5 from L3Q5 (Fig. 11), question 6 from L2Q3 (Fig. 10), and question 7 from L1Q5 (Fig. 8).

Fig. 11 Question L3Q5

Table 11 shows a comparison of how students that completed the post-quiz for each relevant question performed on the exam. For example, for question 5 (L3Q5), 141 students had the correct answer on their post-quiz; of those 141 students, 114 students (about 80%) had the correct answer on their final exam. Alternatively, 30 students got question 5 (L3Q5) incorrect on their post-quiz, but 15 of them (about 50%) got the correct answer on the final. The results show that only 55% of the students that had the correct answer on their post-quiz for question 6 selected the correct answer again on the final exam. However, questions 5 and 7 showed significantly better retention rates, with 80% of the students maintaining their correct answer. Not all of the students retained their knowledge after playing Space Race, but the majority did. In particular, questions that saw a large delta correct from pre- to post-quiz results had a higher retention rate. In all cases, some students that originally selected the incorrect answer on the post-quiz chose the correct answer on the final. Space Race may have been a contributing factor to motivating them to study the topic being tested.
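The 2 × 2 chi-square comparison used for Tables 9 and 10 can be written compactly with the standard shortcut formula for a fourfold table; the counts below are illustrative, not the study’s data.

```python
import math

# Chi-square test for a 2x2 table [[a, b], [c, d]], where rows are the
# experimental and control groups and columns are correct/incorrect counts.
def chi_square_2x2(a, b, c, d):
    """Return (statistic, p-value) with df = 1 (no continuity correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) survival function
    return chi2, p

# e.g. 120/200 correct in the experimental group vs 90/200 in the control:
chi2, p = chi_square_2x2(120, 80, 90, 110)
assert round(chi2, 2) == 9.02
assert p < 0.05  # this delta correct of 15% would be judged significant
```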
Table 9 Midterm chi-square test results. Significant if p < 0.05

Quest   Level     Expt correct (%)   Control correct (%)   Delta correct (%)   Chi-square   p-val   Signif?
Midterm 1
1 1 54.69 50.00 4.69 0.64 0.423 No
5 1, 2 89.58 93.33 −3.75 1.22 0.270 No
9 1 96.35 94.44 1.91 0.40 0.527 No
10 4 46.75 36.24 10.51 3.71 0.054 No
13 3 51.92 46.76 5.16 0.77 0.380 No
17 4 74.03 63.30 10.72 4.27 0.039 Yes
20 3, 4 35.26 29.63 5.63 1.07 0.300 No
23 2, 3, 4 59.30 47.50 11.80 4.71 0.030 Yes
27 3, 4 70.51 63.43 7.09 1.73 0.188 No
28 3 70.51 57.87 12.64 5.69 0.017 Yes
Midterm 2
5 1 78.80 75.14 3.67 0.50 0.479 No
27 3, 4 45.70 30.37 15.32 8.30 0.004 Yes

Table 10 Final exam chi-square test results. Significant if p < 0.05

Quest  Level  Expt correct (%)  Control correct (%)  Delta correct (%)  Chi-square  p-val  Signif?
4 1 51.83 51.08 0.76 0.00 0.965 No
5 (L3Q5) 3, 4 77.56 36.20 41.37 61.22 0.000 Yes
6 (L2Q3) 2, 3 48.26 32.68 15.57 8.83 0.003 Yes
7 (L1Q5) 1 80.63 63.98 16.65 12.26 0.000 Yes
11 4 88.16 82.67 5.49 1.73 0.189 No
16 1, 2 84.82 84.95 −0.13 0.01 0.913 No
17 1 37.70 43.55 −5.85 1.11 0.293 No
21 4 84.87 87.11 −2.24 0.22 0.640 No
23 3, 4 82.69 79.19 3.51 0.51 0.474 No
29 3, 4 76.28 61.54 14.74 8.42 0.004 Yes
44 3, 4 33.33 21.27 12.07 6.27 0.012 Yes
46 1 64.92 67.74 −2.82 0.22 0.638 No

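The summary statements drawn from Table 10 below (5/12 results significant, no negative delta significant, largest delta 41%) can be checked mechanically from the table's delta correct and p-value columns. A short sketch, with the tuples transcribed directly from Table 10:

```python
# (question, delta correct %, p-value) transcribed from Table 10.
final_exam = [
    ("4", 0.76, 0.965), ("5 (L3Q5)", 41.37, 0.000), ("6 (L2Q3)", 15.57, 0.003),
    ("7 (L1Q5)", 16.65, 0.000), ("11", 5.49, 0.189), ("16", -0.13, 0.913),
    ("17", -5.85, 0.293), ("21", -2.24, 0.640), ("23", 3.51, 0.474),
    ("29", 14.74, 0.004), ("44", 12.07, 0.012), ("46", -2.82, 0.638),
]

# Questions where the group difference is significant at p < 0.05.
significant = [(quest, delta) for quest, delta, p in final_exam if p < 0.05]

print(f"{len(significant)}/{len(final_exam)} significant")  # 5/12
print(all(delta > 0 for _, delta in significant))           # True: all favour the game
print(max(delta for _, delta, _ in final_exam))             # 41.37
```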
Table 11 Final exam results compared to post-quiz results

Students correct on the post-quiz:
Quest    Sample   Post-quiz correct    Exam correct  Exam/post-quiz
Quest 5  N = 171  141                  114           80.85%
Quest 6  N = 188  58                   32            55.17%
Quest 7  N = 230  138                  111           80.43%

Students incorrect on the post-quiz:
Quest    Sample   Post-quiz incorrect  Exam correct  Exam/post-quiz
Quest 5  N = 171  30                   15            50.00%
Quest 6  N = 188  130                  51            39.23%
Quest 7  N = 230  92                   70            76.09%
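The retention rates in Table 11 are simple conditional proportions: the share of students answering correctly on the exam, given their post-quiz result. A short sketch reproducing the table's percentages from its counts:

```python
# Counts transcribed from Table 11: question -> (post-quiz count, exam-correct count).
post_quiz_correct = {"Quest 5": (141, 114), "Quest 6": (58, 32), "Quest 7": (138, 111)}
post_quiz_incorrect = {"Quest 5": (30, 15), "Quest 6": (130, 51), "Quest 7": (92, 70)}

def retention(counts):
    """Percentage of each group that answered correctly on the final exam."""
    return {q: round(100 * exam / quiz, 2) for q, (quiz, exam) in counts.items()}

print(retention(post_quiz_correct))    # {'Quest 5': 80.85, 'Quest 6': 55.17, 'Quest 7': 80.43}
print(retention(post_quiz_incorrect))  # {'Quest 5': 50.0, 'Quest 6': 39.23, 'Quest 7': 76.09}
```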
Questions 5, 6, and 7 had the largest difference in performance between the experimental group and the control group, as seen in Table 10. Out of these questions, question 5 had the biggest delta correct value (41%). Interestingly, question 5 (L3Q5) also had the greatest delta correct value for the pre- and post-quiz, as seen in Table 8. This could suggest that most of the students that did not play Space Race failed to learn the concept tested in question 5.

Overall, 5/12 (42%) of the results from Table 10 were significant. None of the negative delta correct values were significant results. This, again, suggests that students that played Space Race either performed equally well or better than the students that did not play Space Race. For questions 5, 6, and 7, it is unclear whether the experimental group performed better than the control group simply because they had seen the questions before, although it is important to note that the post-quiz did not provide any feedback to students; that is, students were not told whether their answer was correct or incorrect, unless they sought the answer out themselves.

Concluding Remarks

This paper has explored the effectiveness of a collaborative and competitive video game in improving student understanding of basic programming. Based on the results of this study, game-based learning can effectively be used to teach basic computer programming concepts to students in higher education. In particular, cooperation within a game world can be used effectively to encourage students to teach one another. Also, in some cases, the knowledge learned in the game can be transferred to a non-gaming context and retained by the student.

As proposed in the "Introduction" section, the combination of pedagogical ideas and technology used in this study was successful in motivating students and improving their performance. The specific ideas that are supported by the findings include the following: (i) educational computer games can be used as motivational learning tools; (ii) a tablet platform can be used effectively for educational games in higher education; (iii) social competition can contribute to the enjoyment and motivation felt by computer game players; and (iv) a collaborative component, through multiplayer teams, can be added to motivate those students not influenced by competition.

This study demonstrated that students that played Space Race were able to outperform students that did not play Space Race on course exams in some instances. However, it does not directly compare the educational effectiveness of the video game to more traditional methods of teaching. One could argue that gaming participants did better in some cases simply because they spent more time on computing, since playing all four levels of Space Race required about an hour per week for 4 weeks. A future study should look at whether non-gaming participants that spent the same amount of time with traditional methods of learning, such as reading a textbook or attending a tutorial, would perform equally well or surpass the performance of the gaming participants.

Work still remains to gain a better understanding of how collaborative and competitive video games can be used to teach programming (and other topics). There are a vast number of ways that a game can be designed to teach computer programming. Space Race is only one example of a collaborative and competitive video game that can be incorporated into a course. Further investigation is necessary to fully judge the educational effectiveness. Specifically, this study was able to show, in a limited manner, that knowledge acquired from gameplay can be retained by some students over 7 weeks. This was shown with three questions on the final exam. Future work should include a greater amount of data, potentially over a longer time period, so as to judge knowledge retention more completely.

Given the relative success of Space Race, the question posed at the outset of this paper can be answered in the affirmative: a competitive and collaborative video game can be used to improve learning when teaching computing to post-secondary students. The work presented here will hopefully motivate other educators and researchers to use collaborative and competitive video games to motivate and engage this generation's learners.

Acknowledgments The support from the Faculty of Engineering, McMaster University, is gratefully acknowledged, as is the participation of the first year engineering students. The specific individuals that we would like to thank for their contributions are Christopher Anand, Kevin Browne, Andrew Curtis, Douglas Down, Steve Drekic, and Michael Viveros.