Published by Hammill Institute on Disabilities and SAGE (http://www.sagepublications.com)
Downloaded from http://isc.sagepub.com at ESC NAL D EST PROFESS-ACATLAN on March 18, 2010
Cook et al. / Group Experimental Research
Intervention in School and Clinic, Volume 44, Number 2, November 2008, pp. 76-82
© 2008 Hammill Institute on Disabilities
DOI: 10.1177/1053451208324504
http://isc.sagepub.com
Using evidence-based practices, or those instructional techniques shown by research to improve student outcomes meaningfully, increases the performance of students with disabilities and should therefore be a priority for special educators. But how does a practice come to be considered evidence based? The unique characteristics of group experimental research (i.e., the use of a meaningful comparison group and the active manipulation of an intervention) allow research consumers to conclude whether an intervention causes desired changes in student outcomes. As such, group experimental research is one type of research that is well suited to determine evidence-based practices. Examples of group experimental research are provided from the contemporary special education literature.
Keywords: effective instruction; law; legal; policy; personnel; preparation; professional development; control group designs; research; education; training; teachers
Educators are often encouraged to read and apply the research reported in professional journals (Greenwood & Maheady, 2001). However, without a sufficient background in research design and statistics, interpreting the research literature can prove deceptive. For example, not all research can answer questions such as “What works?” and “Which practices are evidence based?” Lacking training in the principles of research design, many educators may form false beliefs that practices are evidence based after reading research that in fact does not address the issue of whether a practice works.

Different research designs result in specific types of information that answer different questions. For example, qualitative research might yield important information regarding how specific teachers feel about implementing an instructional strategy or why they decided to maintain or discontinue the use of a teaching technique. However, the determination of whether a practice works, or is evidence based, cannot be made on the basis of teachers’ perceptions. Indeed, only one type of research design allows research consumers to draw conclusions as to whether a teaching practice works, and that is experimental research. Although experimental research (i.e., including group experimental and quasi-experimental research, as well as single-subject research) cannot indicate why a teaching practice works or how teachers and students feel about an instructional approach, both of which are important issues addressed by other types of research, it is unique in that it can determine the general effectiveness of a practice.

Despite the recent attention paid to evidence-based practices (EBPs) in legislation and the professional literature (e.g., Cook & Schirmer, 2006; Odom et al., 2005), a great deal of uncertainty remains about which educational practices are evidence based and how EBPs are identified. It is important for teachers, administrators, parents, policy makers, teacher educators, and other educational stakeholders to understand why and how a practice comes to be considered evidence based. In particular, without understanding some fundamental aspects of experimental research design, educators cannot (a) examine the research literature and reliably determine which instructional practices are most likely to be effective or (b) understand the basis on which practices are or are not determined to be evidence based by researchers. Similar to single-subject research (see Tankersley, Harjusola-Webb, & Landrum, 2008 [this issue]), group experimental research is designed to definitively demonstrate whether an intervention causes change in student outcomes and is an EBP.

Group Experimental Research

The two hallmarks of group experimental research are (a) using a control or comparison group, with which the outcomes of those receiving a particular treatment are compared, and (b) actively initiating the treatment or independent variable. Although multiple types of experimental and quasi-experimental designs exist (Campbell & Stanley, 1963), for the sake of brevity and clarity, we focus on what Campbell and Stanley (1963) referred to as the pretest–posttest control group design and the nonequivalent control group design. In these designs, one group of participants (the experimental group) receives the intervention being tested, while another group (the control group) is taught as usual, without the intervention. Researchers measure student outcomes (e.g., reading performance) in both of these designs before (pretest) and after (posttest) the intervention is introduced to the experimental group. In well-designed group experimental research, if meaningful differences in outcomes are observed between those who did and those who did not receive an intervention, assuming the research is otherwise sound, research consumers can conclude that the intervention caused the difference in group outcomes. With the exception of single-subject research, which is often considered a type of experimental research (Creswell, 2005; Rumrill & Cook, 2001), no other research design allows consumers to infer such cause-and-effect relationships from research findings (e.g., if Intervention X is implemented, increased student performance results).

Control Groups

In determining whether an intervention is effective, researchers must do more than show that student performance improved after an intervention was implemented. Without a comparable control group (also referred to as a comparison group) that does not receive the intervention, one cannot be sure that it was the intervention that produced or caused any change in outcomes that may be observed. For example, suppose that a group of 20 fourth grade students with learning disabilities (LD) began participating in classwide peer tutoring (CWPT; Greenwood, Maheady, & Delquadri, 2002) in reading for 20 minutes a day, in addition to 40 minutes of typical reading instruction they already received from their resource room teacher. Before the intervention began, a pretest indicated that these students read correctly, on average, 90 words per minute on grade-level passages. After 10 weeks, the students read an average of 125 words correctly per
minute on similar grade-level passages. At first glance, it is tempting to conclude that the intervention was effective and caused the increase in reading performance. However, without a meaningful control group, one has no way of knowing whether these students would have improved just as much without the intervention being implemented. It is possible that typical instruction would have produced the same gains or that some other variable (e.g., the school started a new breakfast program that improved students’ nutrition) is responsible for the increase in performance.

In group experimental designs, a comparison group that does not receive the intervention is used to evaluate the impact of the intervention. For example, assume that the researchers in the above example also measured the reading performance of 20 additional fourth grade students with LD in the same school who did not receive CWPT but instead received typical reading instruction from the same resource room teacher for 60 minutes per day. Assume that the mean number of words read correctly per minute improved among students in the control group from 92 to 110 over the semester. The researchers now use statistical tests to compare the two groups, addressing the question of whether a gain from a mean of 90 to 125 (observed for the experimental group) meaningfully exceeds a gain from a mean of 92 to 110 (observed for the control group).

By using a control group that shared many characteristics with the experimental group (e.g., a new principal’s high expectations, the amount of time receiving reading instruction, grade level, the disability status of students), the researchers eliminated many alternative explanations for the greater performance gains observed in the experimental group. For example, neither the new principal’s high expectations nor the amount of time devoted to reading could have caused the larger gains in the experimental group, because both groups had the same new principal and worked on reading for 1 hour per day. When the control and experimental groups are equivalent in regard to a variable, the researchers have controlled or accounted for that variable, meaning that it cannot have caused any outcome difference between the two groups. When experimental and control groups are highly similar, researchers have controlled for a large number of variables. This similarity among groups increases one’s confidence that the one variable on which researchers made sure the groups differed—the independent variable, in this case CWPT—was responsible for the difference in outcomes between the two groups.

True experiments. The nature of the control group is a critical aspect of group experimental research. If, in the previous example, most of the students who received the intervention had, for instance, better nutrition or more support at home than students in the control group, the difference in reading performance might be due to these factors rather than the implementation of CWPT. The more similar the comparison and experimental groups are, the more confidence one has that any differences in outcomes between the groups are caused by the intervention. In true experiments, participants are randomly assigned to either the experimental (those receiving the intervention) or the control (those not receiving the intervention) group. Random assignment does not guarantee that the groups will be identical (e.g., by chance, students with greater support at home could be disproportionately placed in the control group). However, it does eliminate the possibility of any systematic bias being introduced by the researchers and, especially when a study involves a large number of participants, makes it highly likely that groups will be functionally equivalent in every possible way with the exception of the treatment variable being examined (Creswell, 2005; Rumrill & Cook, 2001). If the groups are functionally equivalent in every possible way except the presence of the intervention, any differences in outcomes are logically caused by the intervention, the one variable on which the groups differ.

Researchers can use methods beyond random assignment to groups to heighten the likelihood that members in the control and experimental groups are functionally equivalent, such as blocking and matching (Creswell, 2005). These methods are particularly appropriate in studies with relatively small numbers of participants, because random assignment is less likely to produce functionally equivalent groups in these situations. In blocking, researchers make sure that the same numbers of participants from each level of a variable or variables of interest are represented in the control and experimental groups. For example, if the researchers in the previous example wanted to ensure that both groups were similar in terms of IQ, they could group students into three levels: below 100, 100 to 115, and above 115. They then would randomly assign an equal number of students from each group to the control and experimental groups, thus ensuring that students at any level of IQ are not overrepresented in one group. Matching provides an even more rigorous method for enhancing the comparability of groups. In this method, researchers rank all participants on a variable of interest (e.g., IQ) and pair participants according to rank (i.e., students with the highest and second highest IQ scores are paired, those with the third and fourth highest IQs are paired, and so forth). Researchers then randomly assign one participant from each pair to the two groups, thereby ensuring that
the control and experimental groups are functionally equivalent on the matched variable.

Quasi-experiments. True experiments in which participants are randomly assigned to groups are often impractical. For example, it is often difficult, unethical, or both to randomly assign half of the students in a class to receive an intervention that their classmates do not. Rather than randomly assigning participants to groups, quasi-experimental studies involve researchers using pre-existing or intact groups (Creswell, 2005; Rumrill & Cook, 2001). For example, a researcher might train the fourth grade teachers at one school to use CWPT and compare their outcomes after a semester of implementation with those of fourth grade students at another school that does not use CWPT. Assume that on average, students in the school using CWPT increased their words read correctly per minute on a grade-level passage by 35, whereas the control group increased by only 18. To the degree that the experimental and control groups are functionally equivalent, research consumers can conclude that CWPT is effective and is responsible for the differences in the groups’ outcomes.

However, a quasi-experimental study can provide only tentative support for the claim that CWPT caused the superior performance increase in the intervention group. Because of nonrandom assignment, the control and experimental groups may vary in a number of relevant ways that researchers were unable to control, introducing a host of alternative explanations for why outcomes improved more in the experimental group (Creswell, 2005; Rumrill & Cook, 2001). Among an almost endless variety of potential differences, it is possible that the teachers were more experienced at one school, that one school had smaller classes, or that the students in one school had received better instruction in reading in previous years.

Researchers can improve the soundness of quasi-experimental studies by measuring characteristics related to the target outcome(s) (e.g., previous student achievement, teacher experience) and demonstrating that the control and experimental groups are functionally similar. If the groups are found to differ on one or more variables, researchers can statistically control for such differences in data analysis. However, when participants are not randomly assigned to groups, the groups may vary along more dimensions than researchers can reasonably measure and control for with statistical techniques. Because of the inherent limitations associated with not randomly assigning participants to groups, the credibility of findings from quasi-experiments is a subject of debate (Gersten, Baker, & Lloyd, 2000). Gersten et al. recommended that educators consider findings from quasi-experimental research as providing meaningful support for the effectiveness of an instructional practice when several pretest measures show the experimental and control groups to be comparable (i.e., the groups differ by 0.50 standard deviations or less on each measure), and even then, results should be interpreted with considerable caution.

Active Manipulation of a Treatment Variable

In combination with using a comparison group, group experimental research must involve active manipulation of the intervention (i.e., systematically introducing the intervention and/or removing it from instructional practice) to determine whether an instructional practice causes improved student outcomes. In nonexperimental research, researchers often observe and compare outcomes as they naturally occur. For example, assume that researchers wanted to examine the impact of teacher praise on the on-task behavior of students with emotional and behavioral disorders. The researchers observed the frequency of teacher praise for 30 elementary teachers of students with emotional and behavioral disorders throughout the semester, as well as the on-task behavior of their students. The researchers then grouped the teachers into high-praise and low-praise groups of 15 teachers each. After statistically controlling for relevant differences between the groups (e.g., teacher experience, student academic performance), the researchers found that students attending the classrooms of the high-praise teachers were more frequently on task than the students of the low-praise teachers. Although it is tempting to conclude that greater teacher praise caused high levels of on-task behavior, because the researchers did not actively introduce the intervention, one cannot draw clear conclusions regarding the direction of the relationship. That is, it is quite possible that students’ on-task behavior caused teachers’ high levels of praise, rather than teacher praise causing students to be on task more frequently.

In experimental research, the direction of a cause-and-effect relationship is established in part because of the chronology of events. In the example regarding the relation between teacher praise and students’ on-task behavior, it is not possible to determine which variable occurred first, or whether one caused the other. When researchers introduce an instructional practice that was not previously used, however, the direction of the relationship is easier to infer. Suppose in the example of teacher praise that researchers observe that two classrooms are functionally equivalent, including equally low rates of teacher praise.
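Gersten, Baker, and Lloyd’s (2000) comparability guideline discussed in this article, under which quasi-experimental groups are treated as comparable only when pretest measures differ by 0.50 standard deviations or less, can be illustrated with a short computation. This is only a sketch: the scores below are hypothetical (not drawn from any study discussed here), and expressing the mean difference in pooled standard deviation units is one common convention, not necessarily the exact procedure those authors prescribed.

```python
import statistics

def standardized_difference(group_a, group_b):
    """Difference between two group means, expressed in pooled
    standard deviation units (an effect-size-style measure)."""
    mean_a = statistics.mean(group_a)
    mean_b = statistics.mean(group_b)
    # Pooled variance weights each group's sample variance by its
    # degrees of freedom (n - 1).
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean_a - mean_b) / pooled_var ** 0.5

# Hypothetical pretest scores (words read correctly per minute) for two
# intact groups in a quasi-experiment; illustrative numbers only.
experimental = [88, 92, 95, 90, 85, 93, 89, 91, 94, 87]
control = [89, 93, 95, 88, 87, 94, 90, 91, 96, 89]

d = standardized_difference(experimental, control)
print(f"standardized difference = {d:.2f}")   # -0.25 for these scores
print(f"comparable (|d| <= 0.50): {abs(d) <= 0.50}")
```

In practice, a researcher would run such a check on each pretest measure before interpreting posttest differences between intact groups, and would still interpret results cautiously, as Gersten et al. advise.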
By providing training to one teacher in regularly delivering praise for on-task behavior and measuring the teacher’s frequent praise to students, the researchers introduce teacher praise at a high level into only one of the classrooms. The other classroom, in which the teacher’s rate of praise was not changed systematically, serves as the control. If students’ on-task behavior is observed to increase in the classroom in which praise was implemented, but not in the low-praise classroom, it is reasonable to conclude not only that praise and student behavior are related, but that there is some directionality to this relationship. Because the researchers actively introduced, or manipulated, the higher rate of teacher praise before an observed increase in students’ on-task behavior occurred, change in student behavior could not have caused the higher levels of teacher praise in the experimental group. Rather, all else being equal between the two groups, increased teacher praise logically must have caused the differential increase in student on-task behavior observed in the experimental group. Thus, using an equivalent comparison group and actively manipulating a treatment variable work together to enable researchers to determine whether an instructional practice causes change in student outcomes.

Examples of Group Experimental Research in Special Education

True experimental research. Xin, Jitendra, and Deatline-Buchman (2005) investigated the efficacy of schema-based instruction (SBI), an instructional practice for problem solving in math, for middle school students who were at risk for math failure. Xin et al. randomly assigned 11 students to the SBI group and 11 to a general strategy instruction (GSI) group, which served as a control group. The authors demonstrated that random assignment resulted in similar composition of the experimental and control groups on such variables as gender (6 girls in the SBI group and 5 in the GSI group), age (respective means of 153.8 and 156.7 months), and intelligence (mean standard scores of 92 for both groups). Students in the GSI group received instruction for mathematical problem solving in which they were taught to use four steps: (a) read to understand, (b) draw a picture to represent the problem, (c) solve the problem, and (d) look back and check. The SBI group received instruction in identifying the type of problem presented and using schematic diagrams corresponding to the problem type to solve the problem. The SBI group used the following procedure for mathematical problem solving that incorporated the use of schematic diagrams: (a) read to understand, (b) identify the problem type and use the schema diagram to represent the problem, (c) transform the diagram into a math sentence and then solve the problem, and (d) look back and check.

Xin et al. (2005) used 32-item tests composed of multiplication and division word problems to assess participants’ performance in math problem solving. The researchers implemented the SBI and GSI interventions through 12 one-hour training sessions after giving a pretest to all participants and before assessing the effects of the interventions on a posttest. Given the random assignment of participants to groups, it is not surprising that the mean performance of the two groups did not differ significantly on the pretest (GSI group M = 29.85, SBI group M = 25.19). However, the SBI group performed significantly higher than the control group at posttest (e.g., GSI posttest M = 47.55, SBI group M = 79.41). A significant difference between groups also was observed on assessments administered 1 to 2 weeks after the interventions were completed (i.e., maintenance) and 3 weeks to 3 months after the conclusion of instruction (i.e., follow-up). Because Xin et al. (a) randomly assigned participants to groups, which were demonstrated to be functionally equivalent except for the differences in problem-solving instruction, and (b) introduced the instructional practice between pre- and posttesting, one can have considerable confidence that SBI caused meaningful gains in performance on multiplication and division word problems beyond those produced by GSI.

Quasi-experimental research. Using a quasi-experimental design, Woodward and Brown (2006) also investigated the effectiveness of an instructional practice for middle school students at risk for failure in mathematics. Participating students had been identified for intense, remedial instruction in mathematics at two middle schools, both of which were lower-middle-class, suburban schools. Thirty-nine of the 53 participating students were identified with LD. Researchers assigned the 25 at-risk students at one school to the experimental group and the 28 students at risk for math failure in the other school to the control group. The researchers documented the similarity of the groups in these areas: number of students with LD (25 in the experimental group, 14 in the control group), mean score on a math test (12.36 in the experimental group, 13.36 in the control group), score on a math attitudinal survey (50.92 in the experimental group, 53.32 in the control group), and gender (11 girls and 14 boys in the experimental group, 15 girls and 13 boys in the control group).

Students in the experimental group received instruction in math for 55 minutes a day using a curriculum emphasizing conceptual understanding that was designed and empirically validated for students at risk