Quantitative research methods
in educational planning
Series editor: Kenneth N. Ross
Module 1
T. Neville Postlethwaite
Institute of Comparative Education
University of Hamburg
Educational research:
some basic concepts
and terminology
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible
for the educational policy research programme conducted by the Southern and
Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ).
iiep/web doc/2005.01
The designations employed and the presentation of material throughout the publication do not imply the expression of
any opinion whatsoever on the part of UNESCO concerning the legal status of any country, territory, city or area or of
its authorities, or concerning its frontiers or boundaries.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means: electronic, magnetic tape, mechanical, photocopying, recording or otherwise, without permission
in writing from UNESCO (International Institute for Educational Planning).
Contents
1. Introduction
6. Conclusion
Appendix A. Terminology used in educational research
   Formative and summative evaluation
   Assessment, evaluation, and research
   Measurement
   Surveys and experiments
   Tests
      1. Test items
      2. Sub-scores/Domain scores
   Variable
      1. Types of variables
   Indicator
   Attitude scales
Appendix B. Further reading suggestions
   Introductory texts
   Examples of educational research studies that aimed to have an impact on educational planning
   Encyclopedias and handbooks
   Journals
Appendix C. Exercises
1. Introduction
2. Descriptive research provides information about conditions,
situations, and events that occur in the present. An example is
a survey of the physical condition of school buildings, undertaken
in order to establish a descriptive profile of the facilities that
exist in a typical school.
Types of educational research
Descriptive questions
In the field of educational planning, the research carried out on
descriptive questions is often focused on comparing the existing
conditions of schooling with: (i) legislated benchmark standards,
(ii) conditions operating in several other school systems, or (iii)
conditions operating in several sectors of a single school system.
• Do the supplies and equipment in classrooms in the schools
match the legislated standards set by the Ministry? (The
supplies and equipment might be textbooks, exercise books,
pencils, erasers, seats and desks. The Ministry may have norms
that each student in a particular grade must have one mother
tongue textbook, one math textbook, one science textbook and
one social studies textbook, four exercise books, three pencils
and one eraser in a year, and that each student must have
one seat and one writing place. The research required in this
situation then consists of undertaking a count of the supplies
and equipment in all schools, or in a scientific sample of schools
that can be used to estimate the situation in all schools, and
then matching every school or classroom against the Ministry’s
norms. The main aim of this kind of research study would be to
examine whether there are particular districts or regions which
are under-supplied or over-supplied.)
Correlational questions
Behind these kinds of questions, there is often an assumption
that if an association is found between variables then it provides
evidence of causation. However, care must be exercised when
moving between the notions of association and causation. For
example, an ‘association’ may be discovered between the incidence
of classroom libraries and average class reading scores. However,
the real ‘cause’ of higher reading scores may be that students from
high socio-economic backgrounds, who tend to be in classes
with classroom libraries, read better than other students because
their home environments (in terms of physical, emotional, and
intellectual resources) facilitate the acquisition of reading skills.
Three types of research questions in educational planning
Causal questions
Causal questions are usually the most important to educational
planners. For example, in some schools it is considered normal for
children to have a desk at which to sit. In other schools the children
sit on the ground and write on their laps. It is important to know if
schools (with a particular socio-economic background of children)
with a shortage of desks and seats achieve less well than schools
(with a similar socio-economic background of children) with an
adequate supply of desks and chairs. Or, to put the question in a
different way, is it the desks and chairs, or something else, that
really causes the better achievement? It may be a better supply
of books, or better qualified teachers, or some other factor entirely.
It is, therefore, important to disentangle the relative influence on
achievement of each of the many input and process factors in schools.
Thus, causal questions take one of two forms. Some examples are:
4. Identifying research issues for educational planning
The reason why all educational planners should be prepared to
undertake research is that it is important to be sure of the facts
before making suggestions for changes in educational policies and
practices. The maxim must always be “when in doubt, find out”.
The questions listed above are only examples and are very general
in nature. It is up to each Ministry and educational planning office
to pose its own questions in order to remove doubt. The formulation
of research questions is, however, not an easy matter.
Literature review
The review of literature aims to describe the ‘state of play’ in the
area selected for study. That is, it should describe the point reached
by the discipline of which the particular research study will form
a part. An effective literature review is not merely a summary
of research studies and their findings. Rather, it represents a
‘distillation’ of the essential issues and inter-relationships associated
with the knowledge, arguments, and themes that have been
explored in the area. Such literature reviews describe what has been
written about the area, how this material has been received by other
scholars, the major research findings across studies, and the major
debates in terms of substantive and methodological issues.
Research design
Given the specific research questions that have been posed, a
decision must be taken on whether to adopt an experimental design
for the study or a survey design. Further, if a survey design is to
be used, a decision must be taken on whether to use a longitudinal
design, in which data are collected on a sample at different points
of time, or a cross-sectional design, in which data are collected at a
single point of time.
Instrumentation
Occasionally, data that are required to undertake a research study
already exist in Ministry files, or in the data archives of research
studies already undertaken, but this is rarely the case. Where
data already exist, the analysis of them is known as “secondary
data analysis”. But, usually, primary data have to be collected.
From the specific research questions established in the first step
of a research study it is possible to determine the indicators and
variables required in the research, and also the general nature of
questionnaire and/or test items, etc. that are required to form these.
Decisions must then be taken on the medium by which data are
to be collected (questionnaires, tests, scales, observations, and/or
interviews).
Sequential stages in the research process

↓
Stage 2 (Literature review): Search for, and review of, previous studies that (a) identify controversies, debates, and knowledge gaps in the field; (b) elucidate theoretical foundations that need to be tested empirically; and/or (c) provide excellent models in terms of design, management, analysis, reporting, and policy impact.
↓
Stage 3 (Research design): Development of the overall research design, including specification of the information that is to be collected, from which individuals, and under what research conditions.
↓
Stage 4 (Instrumentation): Construction of operational definitions of key variables and selection/preparation of the instruments (tests, questionnaires, observation schedules, etc.) to be employed in the measurement of these variables.
↓
Stage 5 (Pilot testing): Pilot testing of instruments and of data collection and recording procedures and techniques. Use of the results to revise instruments and to refine all data collection procedures.
↓
Stage 6 (Data collection): Data collection and data preparation prior to the main data analysis.
↓
Stage 7 (Data analysis): Data summarization and tabulation.
↓
Stage 8 (Research report): Writing of research report(s).
Pilot testing
At the pilot testing stage the instruments (tests, questionnaires,
observation schedules, etc.) are administered to a sample of the
kinds of individuals that will be required to respond in the final
data collection. For example, school principals and/or teachers and/
or students in a small number of schools in the target population.
If the target population has been specified as, for example, Grade
5 in primary school, knowledge should exist in the Ministry, or in
the inspectorate, about which schools are good, average, and poor
schools in terms of educational achievement levels or in the general
conditions of school buildings and facilities. A ‘judgement sample’
of five to eight schools can then be drawn in order to represent a
range of achievement levels and school conditions. It is in these
schools that the pilot testing should be undertaken.
The same can be said about the procedures for entering data,
cleaning data, and merging files. This work is usually undertaken
by the planning office data processing unit, but again the results
of the pilot testing experience can help to ‘de-bug’ the procedures.
Once the instruments and procedures have been finalized, the main
data collection can begin.
Data collection
When a probability sample of schools for the whole of the target
population under consideration has been selected, and the
instruments have been finalized, the next task is to arrange the
logistics of the data collection. If a survey is being undertaken in
a large country, this can require the mobilization of substantial
resources and many people.
After the data have been entered, cleaned, and merged – which
often requires the student, teacher, and school files for a particular
school to be joined into one record – the data analysis can begin.
Data analysis
If there are unequal probabilities of selection for members of the
sample, or if there is a small amount of (random) non-response,
then the calculation of sampling weights has to be undertaken.
For teacher and school data there are choices which can be made
about weighting. For example, if one is conducting a survey and
each student in the target population had the same probability of
entering the sample, then the school weights can either be designed
to reflect the probability of selecting a school, or school weights
can be made proportional to the weighted number of students in
the sample in the school. In the latter case, the value reported for
a school variable represents what the ‘average student’
experiences. This matter has been discussed in more detail in the
module on ‘Sample Design’.
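The weighting idea above can be sketched in Python. This is a hypothetical illustration: the function name and the selection probabilities are invented, not taken from the module.

```python
# Hypothetical sketch: sampling weights computed as the inverse of each
# unit's selection probability, then scaled so that the weights sum to the
# sample size. All names and numbers are illustrative only.

def sampling_weights(selection_probs):
    """Return inverse-probability weights, normalized to sum to n."""
    raw = [1.0 / p for p in selection_probs]
    n = len(raw)
    total = sum(raw)
    return [w * n / total for w in raw]

# Example: three schools drawn with unequal selection probabilities.
# A school that was less likely to be selected receives a larger weight.
weights = sampling_weights([0.5, 0.25, 0.25])
print(weights)
```

The normalization step is a common convention so that weighted analyses report the correct effective sample size; other scalings (e.g. weights summing to the population size) are equally valid choices.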
a. Descriptive
Typically, the first step in the data analyses is to produce descriptive
statistics separately for each variable. These statistics are often
called univariates. Some variables are continuous – for example
‘size of school’ which can run from, say, 50 to 2,000. In this case
the univariate statistics consist of a mean value for all schools,
the standard deviation of the values, and a frequency distribution
showing the number of schools of different sizes. Other variables
are proportions or percentages. Such a variable could be the
percentage of teachers with different types of teacher training.
                              All schools     Rural schools    Urban schools
Educational provision         Mean    S.D.    Mean    S.D.     Mean    S.D.
Desks per classroom            X       X       X       X        X       X
Chairs per classroom           X       X       X       X        X       X
Floor space per student        X       X       X       X        X       X
Pens/Pencils per student       X       X       X       X        X       X
The first pair of columns presents the mean value and standard
deviation for all schools in the sample for desks per classroom,
chairs per classroom, floor space per student, etc. However, the total
sample is ‘broken down’ in the second and third pairs of columns
into rural and urban schools.
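The univariate statistics described above (a mean, a standard deviation, and a frequency distribution) can be sketched with Python's standard library. The school sizes used here are invented for illustration.

```python
# Sketch: univariate descriptive statistics for one continuous variable,
# 'size of school'. The data values are illustrative only.
import statistics
from collections import Counter

school_sizes = [120, 250, 250, 400, 800, 1500]

mean = statistics.mean(school_sizes)
sd = statistics.stdev(school_sizes)   # sample standard deviation
freq = Counter(school_sizes)          # frequency distribution of sizes

print(f"mean={mean:.1f}, sd={sd:.1f}")
print(sorted(freq.items()))           # (size, number of schools) pairs
```

For a percentage-type variable, the analogous univariate summary would be the proportion of cases in each category rather than a mean and standard deviation.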
b. Correlational
In this case product moment correlations or cross tabulations can
be calculated. There are statistical tests which can be applied to
determine whether the association is more than would occur by
chance. When the association between two variables is examined,
this is known as ‘bivariate’ analysis.
c. Causal
If the research design used is an experimental one, then tests can be
applied to see if the performance of the experimental group (that is,
the group subjected to the new treatment) is better than the control
group.
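A hedged sketch of such a comparison, using the classical equal-variance two-sample t statistic on invented scores (the group sizes, scores, and variable names are all illustrative):

```python
# Sketch: comparing an experimental group (new treatment) with a control
# group using the pooled-variance two-sample t statistic.
import statistics

experimental = [62, 65, 70, 72, 75]   # group taught with the new treatment
control      = [55, 58, 60, 63, 64]

n1, n2 = len(experimental), len(control)
m1, m2 = statistics.mean(experimental), statistics.mean(control)
v1, v2 = statistics.variance(experimental), statistics.variance(control)

# Pooled variance and t statistic with df = n1 + n2 - 2
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / (pooled * (1 / n1 + 1 / n2)) ** 0.5
print(f"t = {t:.2f} with {n1 + n2 - 2} degrees of freedom")
```

The computed t would then be compared against the critical value from a t distribution to judge whether the difference between the groups is larger than chance alone would produce.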
Research report
There are three major types of research reports. The first is the
Technical Report written in great detail and showing all of the
research details. This is typically read by other researchers. It is
this report that provides evidence that the research was conducted
soundly. This is usually the report which is written first.
The second report is for the senior policy makers in the Ministry
of Education. It is in the form of an Executive Summary of about 5
or 6 pages. It reports the major findings succinctly and explains, in
simple terms, the implications of the findings for future action and/
or policy.
6. Conclusion
Each system of education has its political goals, its general
educational goals, and its specific educational objectives. For
example, some political goals stress equality of opportunity, others
stress quality of education, and many stress both.
Appendix A
The operational research aims emerging from these general and
specific research questions could include the following:
are trained to teach the new units, and that the units are tried out
in a range of schools. Examples of the kinds of questions asked in
formative evaluation would be: Do the specific objectives which
have been developed cover the general objectives to be learned
which are in the curriculum? Can the teachers cope with the new
units? Are there any ‘gaps’ in the curriculum units which result in
a poor coverage of some of the specific objectives? Can the layout of
the curriculum units be changed so as to make the material more
interesting for students?
Measurement
Measurement is a process that assigns a numerical description to
some attribute of an object, person, or event. Just as rulers and stop-
watches can be used to measure, for example, height and speed, so
can other quantities of educational interest be measured indirectly
through the use of achievement tests, questionnaires and the like.
Tests
A test is an instrument or procedure that presents a sequence of
tasks to which a student is to respond. The results are then used to
form measures of the student's relative standing on the trait to
which the test refers.
1. Test items
A test may be an achievement, intelligence, aptitude, or practical
test. A test consists of questions, known as items.
An item is divided into two parts: the stem and the answer. Stems
pose the question. For example, a stem could be:
• What is the sum of 40 and 8?
• or in Reading Comprehension it could be a reading passage
followed by specific questions.
The answer could be an ‘open-ended’, a ‘closed’, or a ‘fill-in’ answer.
For example, in the first stem given above an open or fill in answer
could require the student to write the answer in a box. Or, it could
be put into multiple choice format as follows:
• What is the sum of 40 and 8?
a. 84 b. 50 c. 48 d. 408
In this closed format the student is requested to tick the correct
answer.
2. Sub-scores/Domain scores
The score of a student on the whole test is known as the “total test
score”. A sub-score refers to the achievement of the students on a
sub-set of items in the overall test. Thus, for example, in a Science
test it may be considered desirable to classify the items into Biology
items, Chemistry items, and Physics items. Each of these constitutes
a domain of the test and the scores on each are known as ‘sub-
scores’ or ‘domain scores’.
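Computing a total test score and domain sub-scores from item-level results can be sketched as follows. The item names, domain assignments, and scores are invented for illustration.

```python
# Sketch: total score and domain sub-scores for a small Science test.
# 1 = item answered correctly, 0 = incorrectly. All data are illustrative.

item_scores = {
    "q1": 1, "q2": 0, "q3": 1,   # Biology items
    "q4": 1, "q5": 1,            # Chemistry items
    "q6": 0, "q7": 1,            # Physics items
}
domains = {
    "Biology": ["q1", "q2", "q3"],
    "Chemistry": ["q4", "q5"],
    "Physics": ["q6", "q7"],
}

total_score = sum(item_scores.values())                 # total test score
sub_scores = {d: sum(item_scores[i] for i in items)     # one domain score
              for d, items in domains.items()}          # per sub-set of items
print(total_score, sub_scores)
```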
Variable
The term variable refers to a property whereby the members of a
group being studied differ from one another. Labels or numbers
may be used to describe the way in which one member of a group is
the same or different from another.
1. Types of variables
Variables may be classified according to the type of information
which different classifications or measurements provide. There are
four main types of variables: nominal, ordinal, interval, and ratio.
a. Nominal
This type of variable permits statements to be made only about
equality or difference. Therefore we may say that one individual is
the ‘same as’ or ‘different from’ another individual. For example,
colour of hair, religion, country of birth.
b. Ordinal
This type of variable permits statements about the rank ordering
of the members of a group. Therefore we may make statements
about some characteristics of an individual being ‘greater than’ or
‘less than’ other members of a group. For example, physical beauty,
agility, happiness.
c. Interval
This type of variable permits statements about the rank ordering
of individuals. It also permits statements to be made about the
‘size of the intervals’ along the scale which is used to measure the
individuals and to compare distances at points along the scale.
It is important to note that interval variables do not have true
zero points. The numbering of the years in describing dates is an
interval scale because the distance between points on the scale is
comparable at any point, but the choice of a zero point is a socio-
cultural decision.
d. Ratio
This type of variable permits all the statements which can be made
for the other three types of variables. In addition, a ratio variable
has an absolute zero point. This means that a value for this type of
variable may be spoken of as ‘double’ or ‘one third of’ another value.
For example, physical height or weight.
1. Validity
Validity is the most important characteristic to consider when
constructing or selecting a test or measurement technique. A valid
test or measure is one which measures what it is intended to measure.
Validity must always be examined with respect to the use which
is to be made of the values obtained from the measurement
procedure. For example, the results from an arithmetic test may
have a high degree of validity for indicating skill in numerical
calculation, a low degree of validity for indicating general reasoning
ability, a moderate degree of validity for predicting success in future
mathematics courses, and no validity at all for predicting success in
art or music.
a. Content validity
This type of validity refers to the extent to which a test measures
a representative sample of subject-matter content and behavioural
content from the syllabus which is being measured. For example,
consider a test which has been designed to measure “Competence
in Using the English Language”. In order to examine the content
validity of the test one must initially examine the subject-matter
knowledge and the behavioural skills which were required to
complete the test, and then after this examination compare these
to the subject-matter knowledge and behavioural skills which
are agreed to comprise correct and effective use of the English
language. The test would have high content validity if there was a
close match between these two areas.
b. Criterion-related validity
This type of validity refers to the capacity of the test scores to
predict future performance or to estimate current performance
on some valued measure other than the test itself. For example,
‘Reading Readiness’ scores might be used to predict a student’s
future reading achievement, or a test of dictionary skills might be
used to estimate a student’s skill in the use of the dictionary (as
determined by observation).
c. Construct validity
This type of validity is concerned with the extent to which test
performance can be interpreted in terms of certain psychological
constructs. A construct is a psychological quality which is assumed
to exist in order to explain some aspect of behaviour. For example,
“Reasoning Ability” is a construct. When test scores are interpreted
as measures of reasoning ability, the implication is that there is
a quality associated with individuals that can be properly called
reasoning ability and that it can account to some degree for
performance on the test.
2. Reliability
Reliability refers to the degree to which a measuring procedure
gives consistent results. That is, a reliable test is a test which would
provide a consistent set of scores for a group of individuals if it was
administered independently on several occasions.
Note that reliability refers to the nature of the test scores and not to
the test itself. Any particular test may have a number of different
reliabilities, depending on the group involved and the situation in
which it is used. Reliability is expressed by the ‘Reliability
Coefficient’ (for groups of individuals) or the ‘Standard Error of
Measurement’ (for individuals).
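One widely used reliability coefficient is Cronbach's alpha; the module speaks only of a 'Reliability Coefficient' without naming one, so this is a sketch of one common choice, on invented item scores.

```python
# Sketch: Cronbach's alpha, a common internal-consistency reliability
# coefficient. Rows are students, columns are items; data are illustrative.
import statistics

scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

k = len(scores[0])                                       # number of items
item_vars = [statistics.variance(col) for col in zip(*scores)]
total_var = statistics.variance([sum(row) for row in scores])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"alpha = {alpha:.2f}")  # closer to 1 means more consistent scores
```

Alpha rises when the items 'hang together' (students who do well on one item tend to do well on the others), which is exactly the consistency that the text describes.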
Indicator
An indicator generally refers to one or more pieces of numerical
information related to an entity that one wishes to measure. In
some cases, it consists of information about only one variable
and this information may be gathered by only one question on a
questionnaire. For example, consider an indicator of classroom
library availability. In this case the indicator may be assessed by a
single variable (which has only two values) that is measured by one
question on a questionnaire:
Possession        No      Yes
Car               ___     ___
Refrigerator      ___     ___
T.V.              ___     ___
Video             ___     ___
etc.
Attitude scales
Probably the least debated definition of an attitude is: “a moderate
to intense emotion that prepares or predisposes an individual to
respond consistently in a favourable or unfavourable manner when
confronted with a particular object” (Anderson, 1985). In education,
attitudes which are often measured are ‘Like School’, ‘Interest
in Subject-matter’, and ‘Teacher Satisfaction with Classroom
Conditions’. Each of these titles implies a high to low measure.
Thus, ‘Like School’ implies a measure that ranks students from
those who love school to those who hate school.
Strongly disagree     Disagree     Uncertain     Agree     Strongly agree
Since the Likert Scale is the one most frequently used in educational
research, a short explanation is given here. For example, consider
the development of a ‘Like School’ scale to be used with 14-year-old
students. The researcher must first of all listen carefully to how
14-year-old students describe their like or dislike of schools. Both
positive and negative statements are used, after editing, to form a
set of statements about ‘Like School’. An example is given below:
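Scoring such a scale is commonly done by assigning 1 to 5 to the five response categories and reverse-coding the negatively worded statements, so that a high total always indicates a favourable attitude. A sketch, with invented statements:

```python
# Sketch: scoring a five-point Likert 'Like School' scale. Negatively
# worded statements are reverse-coded (6 - value) so that a high total
# always means a more favourable attitude. Statements are invented.

RESPONSES = {"strongly disagree": 1, "disagree": 2, "uncertain": 3,
             "agree": 4, "strongly agree": 5}

statements = [
    ("I look forward to coming to school.", False),  # positively worded
    ("School is a waste of my time.", True),         # negatively worded
]

def score(answers, statements):
    total = 0
    for (text, reverse), answer in zip(statements, answers):
        value = RESPONSES[answer]
        total += (6 - value) if reverse else value   # reverse-code negatives
    return total

print(score(["agree", "strongly disagree"], statements))  # 4 + 5 = 9
```

A student who agrees with the positive statement and strongly disagrees with the negative one receives a high score, matching the intent of ranking students from those who love school to those who hate it.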
Appendix B
Examples of educational research studies
that aimed to have an impact on educational
planning
Asmah bt Mohd Taib, Ahmad, Siatan, Solehan bin Remot, &
Nordin, Abu Bakar. (1982). Moral education in Malaysia.
Evaluation in Education, 6 (1), 109-136.
Jiyono & Suryadi, Ace. (1982). The planning, sampling, and some
preliminary results of the Indonesian Repeat 9th Grade survey.
Evaluation in Education, 6 (1), 5-30.
Journals
American Educational Research Journal
Applied Measurement in Education
Assessment in Education
Comparative Education Review
Educational Assessment
Educational Evaluation and Policy Analysis
International Journal of Educational Research
International Journal of Educational Development
International Review of Education
Journal of Education Policy
Research Papers in Education: Policy and Practice
Review of Educational Research
Studies in Educational Evaluation
Appendix C
Exercises
Select one of the five general aims above that you believe would
probably receive a high priority in your country. For that general
aim write five specific research questions. For each of the five
specific research questions, prepare several operationalized
research aims that focus on the performance of the education
system in meeting these aims. Then, write down a broad outline
of the sequence of activities that would need to be undertaken in
order to assess the system’s performance with respect to these
aims.
When this has been completed, discuss and write down, in outline
form only, the sequence of activities to be undertaken in the
research study or studies.
Collate the work of the small groups. Again refine the wording.
Then write down in detail the sequential activities to be
undertaken in a research study for each general aim covered, in
order to provide valid, reliable, and useful information for
decision-makers to assess the extent to which the general aims
have been addressed.
Since 1992 UNESCO’s International Institute for Educational Planning (IIEP) has been
working with Ministries of Education in Southern and Eastern Africa in order to undertake
integrated research and training activities that will expand opportunities for educational
planners to gain the technical skills required for monitoring and evaluating the quality
of basic education, and to generate information that can be used by decision-makers to
plan and improve the quality of education. These activities have been conducted under
the auspices of the Southern and Eastern Africa Consortium for Monitoring Educational
Quality (SACMEQ).
In 2004 SACMEQ was awarded the prestigious Jan Amos Comenius Medal in recognition
of its “outstanding achievements in the field of educational research, capacity building,
and innovation”.
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible for
SACMEQ’s educational policy research programme. All modules may be found on two
Internet Websites: http://www.sacmeq.org and http://www.unesco.org/iiep.
Quantitative research methods
in educational planning
Series editor: Kenneth N. Ross
Module 2
Ian D. Livingstone
From educational policy issues to specific
research questions and the basic elements
of research design

Contents
1. Introduction
   Preparing the ground
6. Summary
7. Annotated bibliography
1. Introduction
Who are the researchers?
They could be:
1. Policy makers
• want research to deal with their own particular problems, and
may not necessarily be interested in the relationship of these
issues to the broader socio-political context, the ‘fabric’ of
society.
The systematic identification of educational policy research issues
Anticipating difficulties
If the ground is not properly prepared by a good dialogue between
the policy maker and the researcher, major difficulties can arise.
Two very simple models of this kind are shown in the figures. The
first, the linear model, may be useful in the physical sciences, but it
has not generally been found to be appropriate in the social sciences,
and in education in particular. One does not begin with a problem,
devise a research strategy to solve it, obtain a solution, and
promulgate the results, as Figure 1 would indicate.
1. The first model is the linear one referred to above, in which basic
research will lead to applied research, which in turn will lead to
development, and then application.
3. The third model is the interactive one, which assumes that some
sort of back-and-forth dialogue will take place between policy
makers and researchers (often through intermediaries), and
that this will result in a compromise acceptable to all parties,
and allow sound policy directions to be determined. This model
too tends to err on the optimistic side.
5. The fifth is the tactical one. Here the policy makers delay
making a decision on a matter about which they are
EXERCISE 1
The first and second level goals are the ones of interest to policy
makers and planners, and are the ones with which researchers in
education ministries are most likely to be concerned.
Some examples of research topics which would fall into this area of
learning goals and curriculum are:
b. Curriculum development
This follows on naturally from needs assessment surveys. Once
the needs of society are known, including both the needs of
citizens and the needs of employers, it is necessary to translate
these requirements into actual curriculum statements of what
shall be taught. A good feedback mechanism between employers
and the education system can help to ensure that what is taught
is appropriate, that the curriculum is up-to-date and relevant
and not imported from some other, quite different society, or ten
years behind the times! There is much research to be done here,
and many different curriculum development models exist to
guide the researcher. Furthermore, if a policy is made to bring in
a curriculum innovation without careful trialling, including sound
on-going formative evaluation, the new curriculum may be poorly
implemented and eventually unsuccessful in bringing about desired
results.
c. Provision of resources
Again, following on naturally from curriculum development is
the provision of appropriate curriculum resources to allow the
curriculum to be implemented as intended. In many developing
countries, a very large amount of time and energy has been spent
on the production of textbooks, learning packages, and other
curriculum materials, and on the setting up of libraries and resource
centres. More recently, some countries have set up educational radio
and/or TV networks, and have introduced computers to schools,
with all the necessary hardware and software. Apart from these,
there are the obvious needs for school buildings - classrooms,
laboratories, with all their necessary science equipment, and sports
facilities and equipment for cultural activities.
entrusted with the task will need research to guide them in the
sorts of provisions they might make, and the likely costs of those
provisions.
a. Examinations
Virtually every country has written examinations in one form or
another. Some have national examinations at several points in
the education system, which determine the rate of promotion of
students. Results on such examinations provide an indication of the
level of education received by the student, and also an indication
of attainment relative to other students at this level. But they also
act as a filter, a form of selection, a mechanism for rationing of
scarce resources, to control the entry of students to higher levels of
education, and eventually their career paths into the occupational
hierarchy in the world of work. Although examinations may take
various forms (a single national examination, a number of regional
examinations, teacher-based assessments, or a combination of
these), virtually every country has them, at a higher or lower level.
In many developing countries, the first such examination is that
for selection for entry to secondary school. Whatever the form
of the examinations may be, their validity (particularly content
b. Educational enrolments
Once the population base is determined, political decisions are
likely to determine the extent of provision for education. But
politicians need guidance on what is possible (e.g., in providing a
pre-school service, or expanding a system of secondary schools),
and basic statistical research can provide that guidance. Information
needs to be available (in relation to both the statutory school
beginning and leaving age) on such matters as: the percentage
of children who are not in school, or some other educational
institution, at every level, the rural/urban and male/female balance,
c. Educational structures
Once the characteristics of the relevant population base have
been ascertained, it is then possible to proceed to consider the
educational structures necessary to cater for those who wish (and
are able) to take advantage of them. This is likely to involve studies
of the location of schools and other educational institutions (school
‘mapping’), the provision of various alternative types of secondary
and tertiary education (comprehensive secondary schools,
vocational training institutions, teachers colleges, universities).
It will be necessary to advise on the likely effects of automatic
promotion or grade repetition policies, which are in turn linked to
examination pass-rates. Investigations are needed on the prevalence
of school truancy, ‘stop-out’ and ‘drop-out’ and the reasons for
them. Studies should be undertaken on the retention rates of
various institutions (including tertiary institutions such as teachers
colleges and universities). All of these will have a major bearing on
the quantity of education which must be provided.
a. Unit costs
Policy makers need to know the cost of particular forms and levels
of education, and their various economic rates of return, both
private and social. This is desirable if the demand is to be estimated
accurately. At the same time, it should be appreciated that human
beings do not always behave rationally, and traditional rate of
return analysis makes strong assumptions. Its results should always
be placed alongside other information which takes political and
social realities into account before major financial decisions are
made. It is also helpful if the actual costs of running institutions
are known, to ascertain whether economies of scale are possible
(e.g., with small rural schools), and whether the marginal costs of
bringing in extra students are likely to be relatively small.
b. Resource allocation
In most countries, education is seen as a public good, to be provided
for all its citizens as one of their rights, at least up to a certain
level. But when times are tough, there is likely to be some pressure
towards user-pays, particularly if it is believed that some students
(e.g., those attending university) are receiving an undue share of the
c. Administrative structures
Some education systems are highly centralised, others devolve a
large amount of responsibility to the local level in administrative
matters, and sometimes in curriculum as well. Most achieve a blend
between the two. Research can be valuable in determining the
best compromise, in determining the cost-effectiveness of various
alternative patterns of administration, and ascertaining the effects
of these upon teachers, principals, members of school governing
bodies, and parents (Wylie, 1990). The reward and incentive
structures for teachers also have a considerable bearing upon the
quality of education, and the evenness of its spread across rural and
urban areas in any country. Research is also needed on alternative
teaching strategies and delivery systems (e.g., distance learning,
small group, problem-based enquiry learning) and their impact on
costs and the physical environments for learning.
a. Teacher selection
Every country must pay considerable attention to the way in which
it selects its teaching force, because the lives of its future citizens
and its own economic welfare are in their hands. Research is
b. Teacher education
The settings in which pre-service education of teachers is carried
out, whether at a university, a teachers college or both, the level
and length of training, and the balance between education theory
and classroom practice are all legitimate topics for research.
In-service education is another issue which is becoming of increasing
importance, as new and updated curricula and teaching methods
are introduced (e.g., in science) which make much heavier demands,
both upon teachers’ knowledge of their subjects, their ability to
use new equipment and new approaches (e.g., discovery, problem-solving
methods), and on their ability to cater for individual needs.
They may be required to work in team teaching situations in open
plan classrooms, and generally cope with a much more flexible and
less-structured teaching environment, in which the traditional
‘lock-step’ rote learning is no longer acceptable.
c. Teacher effectiveness
All the matters mentioned above have a bearing on the general
issue of teacher effectiveness. In spite of the vast amount of
research into this area, we still do not know enough about
what makes for effective teaching, in any global, international
sense. This probably differs at different class levels, in different
countries, under different teaching conditions, and with different
community expectations. But it is important to have evidence about
effective and ineffective teaching behaviours to plan the content
a. Monitoring achievement
It is not usual to undertake monitoring of achievement at every
grade level, because of the sheer expense of the operation.
Some countries do not even undertake formal monitoring at
all, through the use of tests or other assessments, because they
are not convinced that it is a cost-effective way to maintain
standards. They may prefer to use informal methods of quality
assurance by concentrating upon teacher in-service training, or
by providing standardised tests of achievement to guide teachers
in their curriculum and assessment decisions. But many countries
do select important ‘check-points’ in the system at which to
b. Comparative evaluation
On a slightly broader front, system evaluation studies are desirable,
to consider such topics as the following. Is my country investing
more or less in the education of its population than other similar
countries, seen as a proportion of its GDP? How does the country
fare in relation to these other countries on a range of social
indicators, such as school enrolment ratios, graduation rates,
proportion of enrolment in higher level science and mathematics
courses, etc.? Are we producing a sufficient supply of highly-qualified
persons to compete with the output from other rapidly developing
countries? And even, what proportion of the total
educational budget should be devoted to educational research!
c. Inspection
Another traditional way in which educational systems maintain a
quality check has been through regular inspection of their teachers, at
least at the primary and secondary levels. The quality of education
provided in any country is crucially affected by the quality of the
d. Miscellaneous
An additional Miscellaneous category has been added at the bottom,
to cater for such things as research on methods of information
dissemination, the preparation of research bibliographies, clearing-
house activities, research methodologies, and any other topic which
may not fit neatly into the grid. A summary of all these topics is
contained in the table below, grouped according to the classification
given, and expanded to indicate the sorts of activities which might
fall into each category.
A Pre-school
B Primary School
C Secondary School
D Tertiary Education
E Non-formal Education
F Other
These are not the only possible categories, of course. Some
countries with selective secondary schools might wish to divide the
Secondary School category up into two or more. Others with no
organised pre-school services may not need to include this category.
But the pattern will remain the same. Every cell in the table now has
its own code. For example, 1.3B would refer to a project on the
provision of resources at primary school level; 5.2D would refer to a
study of the selection of lecturers for some form of tertiary education
(perhaps teacher education); and the code 7.0F might refer to a
miscellaneous project on establishing an education index for the whole
education system. Projects can span several levels of the system,
and should be entered under each relevant level. Occasionally
a project may fit more than one content category. In this case, it
should be allocated to the category it fits best, cross-referencing it to
another category if this is thought desirable.
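A coding scheme like this lends itself to simple tooling. Below is a minimal sketch, with purely illustrative helper names that are not part of any actual ministry system, of composing and parsing such grid codes:

```python
# Hypothetical sketch: composing and parsing project codes such as "1.3B",
# where "1.3" is the content category and sub-category, and the final
# letter is the education level (A = pre-school ... F = other).

LEVELS = {"A": "Pre-school", "B": "Primary School", "C": "Secondary School",
          "D": "Tertiary Education", "E": "Non-formal Education", "F": "Other"}

def make_code(category: int, subcategory: int, level: str) -> str:
    """Build a grid code, e.g. make_code(1, 3, "B") gives '1.3B'."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return f"{category}.{subcategory}{level}"

def parse_code(code: str) -> tuple:
    """Split a grid code back into (category, subcategory, level)."""
    body, level = code[:-1], code[-1]
    category, subcategory = body.split(".")
    return (int(category), int(subcategory), level)

print(make_code(1, 3, "B"))   # -> 1.3B
print(parse_code("5.2D"))     # -> (5, 2, 'D')
```

A project spanning several levels would simply receive one code per relevant level, as the text suggests.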
                 Pre-school  Primary  Secondary  Tertiary  Non-formal  Other
                     A          B         C         D          E         F
...
7. Miscellaneous
EXERCISE 2
Below are ten research projects which you may assume have been suggested to
your Ministry of Education. Classify each one onto the grid above, by giving it
the correct code. Compare your answers with those of other members of your
group.
EXERCISE 3
Now consider whether there are any variations which you think
should be made to the grid to fit the education system in your own
country, either to the content categories or the levels.
Think up ten new projects which you think would appeal to policy
makers in your country, and use your own grid to classify them.
(You can stay with the grid above if you think it is suitable for
application in your country.)
3. Setting priorities for educational policy research issues
When establishing a policy-oriented research programme, it is
highly desirable to establish a mechanism to determine national
priorities. Researchers are not always good judges of what is important
nationally. They tend to see research projects in terms of ideas,
models or methodologies which are of interest to them. On the
other hand, some administrators lack foresight, and are only able
to identify projects when problems arise in Parliament or there is a
national outcry. It is usually then too late to initiate research which
will deliver the desired results on time.
EXERCISE 4
EXERCISE 5
1. Examine the list of ten projects given in Exercise 2 above, which you
labelled according to their location on the content grid.
… EXERCISE 5
3 A moderately important project; could be done
2 A rather unimportant project; probably not a high priority right
now
1 An unimportant project; need not be done at this time
not important
Next consider the feasibility of the project, rating each project in a similar way,
from 5 down to 1. Note that importance is not the same as feasibility. A project
is important if we ought to do it. A project is feasible if we can do it, i.e., it is
possible, within the limitations of our resources of personnel, equipment and
finance, or those that we can obtain.
3 A moderately easy project to mount; some difficulty could be
experienced
2 A difficult project to get underway
1 A quite impossible project to carry out at this time
not feasible
… EXERCISE 5
4. Now add together your ratings to obtain a total score out of 10 for
each project.
5. Come together with your group again, and on the basis of your ratings, see
if you can obtain a consensus on the order in which the projects should be
placed, from top priority down to bottom. It is suggested you enter the
results of the ratings for the whole group on a chart like the one below.
Using a desk calculator, find the mean and standard deviation of the ratings
in each row, i.e., for each project, and use this as the basis for discussion.
Mean SD
1.
2.
3.
… etc.
Did you find the same five projects at the top of the list as you did previously?
If you did not, can you explain why?
6. Note that the Standard deviation (S.D.) gives a rough measure of how
strong the agreement was within the group. A large standard deviation
(over 2, say) means there was quite a bit of difference in viewpoint; a small
standard deviation means you were all basically in agreement.
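The calculation in step 5 can be sketched as follows; the ratings below are invented purely for illustration, one total score (importance plus feasibility, out of 10) per rater:

```python
# Sketch: summarising each project's pooled group ratings with a mean
# and a standard deviation, as in step 5 of the exercise.
import statistics

ratings = {
    "Project 1": [9, 8, 10, 7, 9],   # invented totals, one per group member
    "Project 2": [4, 9, 2, 8, 5],
}

for project, scores in ratings.items():
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)    # sample standard deviation
    # A large SD (over 2, say) signals disagreement within the group
    verdict = "good agreement" if sd <= 2 else "viewpoints differ"
    print(f"{project}: mean={mean:.1f}, SD={sd:.2f} ({verdict})")
```

Projects would then be ranked by mean, with the standard deviation flagging where the group should discuss its differences.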
in which the effort is made to cast light on current conditions and
problems through a deeper and fuller understanding of what has
already been done. If we believe the answer exists somewhere in
the present, we use the survey approach. In this approach we seek
to cast light on current problems by a further description and
understanding of current conditions. In other words, we seek more
fully to understand the present through a data-gathering process
which enables us to describe it more fully and adequately than is
now possible.
If, on the other hand, our interest is in predicting what will happen
in the future, that is, if we undertake something new and different,
or make some changes in the present conditions, we use the
experimental approach, which is experimental in that it seeks to
establish on a trial (or experimental) basis a new situation. Then,
through a study of this new situation under controlled conditions,
the researcher is able to make a more generalised prediction of what
would happen if the condition were widely instituted.
P. Most of those who are still in school at that time take it.
Clarifying priority educational policy research issues and developing specific research questions and research hypotheses
R. Well, perhaps that might be part of the reason. They might have
more support and resources at home to do well at school and
pursue their education. How could you check it out?
P. Yes.
R. Could you get the results from boys and girls separately from
those schools which administer the tests in mathematics at this
level?
R. And then you could compare the mean scores of boys and girls
on the standardised test to see whether the same differences in
favour of girls occurred at a younger age.
P. Well, actually, now you mention it, I’ve had a suspicion about
the mathematics examinations for Form 3 over the last three
years. I have a feeling that these examinations have been
prepared in order to favour the performance of girls!
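The comparison the researcher proposes amounts to contrasting the two group means on the standardised test. A minimal sketch follows; all scores are invented, and a real analysis would apply a proper significance test to the actual examination data:

```python
# Illustrative sketch: comparing mean standardised-test scores of girls
# and boys, with a Welch-style t statistic for the difference in means.
import statistics
import math

girls = [62, 58, 71, 66, 60, 69]   # invented scores
boys  = [55, 61, 52, 58, 49, 57]

mean_g, mean_b = statistics.mean(girls), statistics.mean(boys)
var_g, var_b = statistics.variance(girls), statistics.variance(boys)

# Welch's t: no assumption that the two groups have equal variances
t = (mean_g - mean_b) / math.sqrt(var_g / len(girls) + var_b / len(boys))
print(f"girls mean={mean_g:.1f}, boys mean={mean_b:.1f}, t={t:.2f}")
```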
EXERCISE 6
Now that the real reason for the request for research has
emerged, try to continue the dialogue between the policy maker
and researcher. See if you can arrive at a plan of action to solve the
problem, which you have started to clarify by this discussion. You
will note it was not the problem you first thought it was!
Research question
What is the need for a secondary school on the island of Marino?
Is there a need?
Detailed research questions     Examples of interview questions
What about the future?          What is the birth rate in the whole country?
                                Is the birth rate on the island of Marino
                                likely to be any different?
                                Is there much internal migration to or from
                                Marino?
...                             ...
EXERCISE 7

EXERCISE 8
Now imagine you were interviewing the head man of the local
tribe on the island. What research questions would you want to
ask him? (Many of them will be different!)
Discuss this exercise with your group, and compare your answers.
Some would not accept ‘It prepares a child for primary school’.
Some would not accept ‘It has economic benefit’. The answer to
this question lies in the field of values and is not really a matter for
scientific investigation. However, if someone told you what kind of
answer they would accept as evidence (e.g., it prepares a child for
school) then the question could be researched, within those terms.
EXERCISE 9
Example 1
• Topic: The relationship between age of entry to primary school
and subsequent school success.
Note that the general topic has been made more specific by the
research question, in that school success has been defined in
terms of the English language only. Clearly there are many other
definitions of school success, and these would all need their own
research questions.
Note next that the research hypothesis has further narrowed the
research question, in specifying that performance in English is
defined in operational terms as written performance only (not
spoken performance or listening skills) and that this performance is
limited to what can be measured on a standardised test of reading
comprehension (not vocabulary, for example). Other hypotheses
would be needed to cover other aspects of English.
Furthermore, the age of 12 years has been set as the point at which
the measurement is to be done. If it was desired to see whether
the advantage persisted to a later age, it may be necessary to test
Example 2
• Topic: The use of micro-computers in diagnosing errors in basic
arithmetic.
EXERCISE 10
As you go, check each of your statements against the four criteria
given above: relational, non-trivial, testable, clear and concise.
When you have finished, exchange your problem statements with
those of another member of your group. Can you improve on one
another’s problem statements? Discuss and critique them together.
• Observation
• Theory
• Literature Review
knowledge, skills and attitudes) for a new learning task, and if the
quality of instruction is adequate, then they should all learn the
task, given sufficient time. Two typical hypotheses following from
this theory would be:
• Hypothesis 1
Following corrective instruction (mastery learning) the
relationship between original learning and corrected learning
will be zero.
• Hypothesis 2
Given equal prior learning, corrective instruction (mastery
learning) will produce greater achievement than non-corrective
instruction.
In the former, more quantitative approach, the data are fairly well
structured, as for example, with a structured observational schedule
or an interview schedule. Researchers often start with the second
approach, by recording conversations with people on the topic of
study, before moving on to a more structured analysis.
2. Research questions identify the target population from which
any sample will be drawn. Only when it is known exactly whom the
policy decisions concern will it be possible to decide which subjects
to consult and study. If you do not identify
the people you are most interested in, those who are most
able to provide the information required, you risk omitting
important respondents from the project.
5. Moving from specific research questions and research hypotheses to the basic elements of research design
Example 1
This is drawn from the illustration in Exercise 7 above, about the
new secondary school on the island of Marino.
Research question:
Sub-questions:
Is there a need?
Where is the need?
Why is there a need?
What kind of need is there likely to be in the future?
…
Type of study:
Once such a table is complete, you will have some idea of the
size and scope of the project, the number of interviews and
questionnaires which will be required, and a rough estimate of the
amount of time and money which will be needed. The sample sizes
can be modified at a later stage in the planning, of course, when
the statistical treatment of the data is finally determined. But it is
helpful to have some guide at this early stage.
The source of the data for a few of the questions in the sample has
been indicated by the codes in the final column of Table 2 below.
EXERCISE 11
3. Write out the main research question again, and then at least
four sub-questions which follow from it, as in Exercise 7. These
give your general and specific aims for the project.
Example 2
For some research projects, particularly experimental studies, it
is possible to go one stage further, and actually list the outcome
variables and key predictor variables which are to be considered.
This was less relevant for the question on the secondary school on
the island of Marino, because the outcome was simply whether or
not a school would be built, and many of the so-called ‘predictor’
variables were really background demographic information,
political intentions by government, and central and local
perceptions of need. It would be difficult to carry out an empirical,
statistical analysis on such data.
The project arose from findings in other studies that children with
high achievement levels invariably came from schools with large
libraries and/or homes with many books. Access to books seemed
to be important for language learning, and it was hypothesised that
a substantial increase in the supply of books available to children
might improve their language learning. This was actually done by
a ‘book flood’, the donation of a large number of books for use in
school classrooms.
These were divided up into two groups of four schools each, one
to adopt what was known as a Shared Book approach to reading,
the other to adopt a Silent Reading approach. (These terms are
described in the reference above.) A third, matched group of four
schools was used as a control group in the experiment.
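The allocation of schools to conditions can be sketched as follows. This is an illustrative randomisation with invented school names, not the matched-assignment procedure the study actually used:

```python
# Sketch: randomly assigning twelve schools to the three conditions of
# the 'book flood' design: Shared Book, Silent Reading, and a control
# group, four schools each. School names are invented.
import random

schools = [f"School {i}" for i in range(1, 13)]
random.seed(42)            # fixed seed so the assignment is reproducible
random.shuffle(schools)

groups = {
    "Shared Book":    schools[0:4],
    "Silent Reading": schools[4:8],
    "Control":        schools[8:12],
}
for condition, members in groups.items():
    print(condition, "->", members)
```

In the study itself the control schools were matched to the treatment schools, which a simple shuffle like this does not attempt.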
Research question
Tables like this bring system to the research, and show critical
points in the timing. It is important that the research does not get
behind schedule, particularly if (as in this case) the fieldwork has
to take place at particular times to fit in with school holidays. The
delay of even a week or two here could set the whole project back a
whole term, or even a whole year.
Experimental group:   Shared Book    Silent Reading    Control
                        Tribal region
Grade level      A      B      C      D      E
K
1
2
3
4
5
6
Total
                              Source of data
Tribal      Ministry     Ministry    Principals of
region      (National)   (Local)     primary school    Parents    Tribal heads
Region A
Region B
Region C
Region D
Region E
For the third research question, Table 7 might give a convenient way
of recording the information, by classifying the reasons given by
various groups in their efforts to justify the location of a secondary
school. These reasons would be drawn from open-ended questions
asked at interviews, and coded afterwards into a few convenient
and logically coherent categories.
                              Source of data
            Ministry     Ministry    Principals of
Reason      (National)   (Local)     primary school    Parents    Tribal heads
Reason 1
Reason 2
Reason 3
Reason 4
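Filling a tally table of this kind can be sketched as follows; the coded interview responses below are invented purely for illustration:

```python
# Sketch: tallying the coded reasons each respondent group gave for its
# preferred school location, producing cell counts for a table like the
# one above. The (group, reason) pairs are invented.
from collections import Counter

responses = [
    ("Parents", "Reason 1"), ("Parents", "Reason 1"), ("Parents", "Reason 3"),
    ("Tribal heads", "Reason 2"), ("Tribal heads", "Reason 2"),
    ("Ministry (National)", "Reason 1"), ("Ministry (Local)", "Reason 4"),
]

table = Counter(responses)   # cell counts keyed by (group, reason)
for (group, reason), count in sorted(table.items()):
    print(f"{group:20s} {reason}: {count}")
```

In a real analysis the pairs would come from coding the open-ended answers after the interviews, as the text describes.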
6 Summary
You have now come to the end of this module. In it you have traced
the path that a research worker engaged in policy research must
tread. You should now have a good grip on the aims of research into
educational policy issues, and on how to get started on it.
7 Annotated bibliography
Borg, Walter R. and Meredith D. Gall (1983). Educational Research:
An Introduction. London, Longman. [This text contains a good
section on developing the research proposal and planning the
research.]
Livingstone, Ian D., Barry Eagle and John Laurie (1988). The
Computer as a Diagnostic Tool in Mathematics. Study 13 of the
Evaluation of Exploratory Studies in Educational Computing.
Wellington, New Zealand Council for Educational Research.
Travers, Kenneth J. and Ian Westbury (eds) (1989). The IEA Study
of Mathematics I: Analysis of Mathematics Curricula. Oxford,
Pergamon Press, for the International Association for [the
Evaluation of] Educational Achievement. [This is the first of a
three-volume work on the massive collaborative, international
study of mathematics carried out under the auspices of the IEA
during the 1980s. This volume on the mathematics curriculum
breaks new ground in the way in which it conceptualises and
measures the various curriculums. See pp.5-10.]
Since 1992 UNESCO’s International Institute for Educational Planning (IIEP) has been
working with Ministries of Education in Southern and Eastern Africa in order to undertake
integrated research and training activities that will expand opportunities for educational
planners to gain the technical skills required for monitoring and evaluating the quality
of basic education, and to generate information that can be used by decision-makers to
plan and improve the quality of education. These activities have been conducted under
the auspices of the Southern and Eastern Africa Consortium for Monitoring Educational
Quality (SACMEQ).
In 2004 SACMEQ was awarded the prestigious Jan Amos Comenius Medal in recognition
of its “outstanding achievements in the field of educational research, capacity building,
and innovation”.
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible for
SACMEQ’s educational policy research programme. All modules may be found on two
Internet Websites: http://www.sacmeq.org and http://www.unesco.org/iiep.
Quantitative research methods
in educational planning
Series editor: Kenneth N. Ross
Module 3
Kenneth N. Ross
Sample design for educational survey research
Content
© UNESCO 1
Module 3 Sample design for educational survey research
5. References 70
Appendix 1
Random number tables for selecting a simple random sample
of twenty students from groups of students of size 21 to 100 73
Appendix 2
Sample design tables (for roh values of 0.1 to 0.9) 78
Appendix 3
Estimation of the coefficient of intraclass correlation 81
Populations: desired, defined, and excluded
In any educational research study it is important to have a precise
description of the population of elements (persons, organizations,
objects, etc.) that is to form the focus of the study. In most studies
this population will be a finite one that consists of elements
which conform to some designated set of specifications. These
specifications provide clear guidance as to which elements are to be
included in the population and which are to be excluded.
Sampling frames
The selection of a sample from a defined target population requires
the construction of a sampling frame. The sampling frame is
commonly prepared in the form of a physical list of population
elements – although it may also consist of rather unusual listings,
such as directories or maps, which display less obvious linkages
between individual list entries and population elements. A
well-constructed sampling frame allows the researcher to ‘take hold’
of the defined target population without the need to worry about
contamination of the listing with incorrect entries or entries which
represent elements associated with the excluded population.
Representativeness
The notion of ‘representativeness’ is a frequently used, and often
misunderstood, notion in social science research. A sample is often
described as being representative if certain percentage frequency
distributions of element characteristics within the sample data are
similar to corresponding distributions within the whole population.
1. Judgement sampling
The process of judgement, or purposive, sampling is based on the
assumption that the researcher is able to select elements which
represent a ‘typical sample’ from the appropriate target population.
The quality of samples selected by using this approach depends
on the accuracy of subjective interpretations of what constitutes a
typical sample.
2. Convenience sampling
A sample of convenience is the terminology used to describe a
sample in which elements have been selected from the target
population on the basis of their accessibility or convenience to the
researcher.
3. Quota sampling
Quota sampling is a frequently used type of non-probability
sampling. It is sometimes misleadingly referred to as ‘representative
sampling’ because numbers of elements are drawn from various
target population strata in proportion to the size of these strata.
2. Stratified sampling
The technique of stratification is often employed in the preparation
of sample designs because it generally provides increased accuracy
in sample estimates without leading to substantial increases in
costs. Stratification does not imply any departure from probability
sampling – it simply requires that the population be divided
into subpopulations called strata and that probability sampling
be conducted independently within each stratum. The sample
estimates of population parameters are then obtained by combining
information from each stratum.
3. Cluster sampling
A population of elements can usually be thought of as a hierarchy
of different sized groups or ‘clusters’ of sampling elements. These
groups may vary in size and nature. For example, a population of
school students may be grouped into a number of classrooms, or it
may be grouped into a number of schools. A sample of students may
then be selected from this population by selecting clusters of students
• Randomly select one class, then include all students in this class
in the sample. (p = 1/6 x 4/4 = 1/6).
this population and the sample mean calculated for each sample,
the average of the resulting sampling distribution of sample means
would be referred to as the expected value.
E(ȳ) = Ȳ

var(ȳ) = S²/n
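This property can be illustrated with a short simulation. The population values below are artificial and purely illustrative, not data from the module:

```python
import random
import statistics

# Illustrative finite population of 1000 'scores' (artificial data).
random.seed(1)
population = [random.gauss(50, 10) for _ in range(1000)]
pop_mean = statistics.mean(population)

# Draw many simple random samples of 25 students and record each mean.
sample_means = [statistics.mean(random.sample(population, 25))
                for _ in range(2000)]

# The average of the sampling distribution of sample means (the
# expected value) sits very close to the population mean.
print(round(pop_mean, 1), round(statistics.mean(sample_means), 1))
```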
where the standard error of the sample mean is equal to the square
root of the variance of the sample mean. Similarly, we can be “95
percent confident” that the population mean lies within the range
specified by:
ȳ ± 1.96 √var(ȳ)
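These limits can be computed directly. A minimal numerical sketch, with sample figures invented for illustration:

```python
import math

# Hypothetical sample: mean 50, element variance 100, n = 400 students.
y_bar, s2, n = 50.0, 100.0, 400

var_mean = s2 / n                 # variance of the sample mean
se = math.sqrt(var_mean)          # standard error of the sample mean

lower = y_bar - 1.96 * se         # 95 per cent confidence limits
upper = y_bar + 1.96 * se
print(round(lower, 2), round(upper, 2))   # → 49.02 50.98
```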
deff = var(ȳ_c) / var(ȳ_srs)

deff = 1 + (b − 1) × roh
where b is the size of the selected clusters, and roh is the coefficient
of intraclass correlation.
n_c = n* × deff

n_c = n* × [1 + (b − 1) × roh]
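A small helper, assuming the standard two-stage form deff = 1 + (b − 1) × roh. Even a modest roh inflates the variance sharply when clusters are large:

```python
def deff(b, roh):
    """Design effect: variance inflation of a two-stage cluster sample
    (cluster size b, intraclass correlation roh) relative to a simple
    random sample of the same size."""
    return 1 + (b - 1) * roh

print(round(deff(20, 0.1), 2))   # → 2.9
print(round(deff(5, 0.3), 2))    # → 2.2
```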
var(p) = p(1 − p) / n*

n* ≥ p(1 − p) / [SE(p)]² = (0.5 × 0.5) / (0.025)² = 400
That is, the size of the simple random sample, n*, would have to be
greater than, or equal to, about 400 students in order to obtain 95
per cent confidence limits of p ± 5 per cent.
n_c = n* × [1 + (b − 1) × roh]
    = 400 × [1 + (20 − 1) × 0.1]
    = 400 × 2.9 = 1160

a = n_c / b = 1160 / 20 = 58
That is, for roh = 0.1, a two-stage cluster sample of 1160 students
(consisting of the selection of 58 primary sampling units followed
by the selection of clusters of 20 students) would have sampling
accuracy equivalent to a simple random sample of 400 students.
In Table 1.1 the planning equation has been employed to list sets
of values for a, b, deff, and nc which describe a group of two-stage
cluster sample designs that have sampling accuracy equivalent
to a simple random sample of 400 students. Three sets of sample
designs have been listed in the table – corresponding to roh values
of 0.1, 0.2, and 0.4. In a study of school systems in ten developed
countries, Ross (1983: 54) has shown that values of roh in this range
are typical for achievement test scores obtained from clusters of
students within schools.
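Rows of a design table like Table 1.1 can be regenerated from the planning equation. The cluster sizes used below are illustrative choices and need not match those printed in the module's table:

```python
n_star = 400                      # required effective sample size
for roh in (0.1, 0.2, 0.4):
    print(f"roh = {roh}")
    for b in (1, 2, 5, 10, 20):   # illustrative cluster sizes
        deff = 1 + (b - 1) * roh
        nc = n_star * deff        # total students required
        a = nc / b                # number of schools to select
        print(f"  b={b:2d}  deff={deff:4.1f}  a={a:5.0f}  nc={nc:5.0f}")
```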
The main use of sample design tables like the one presented in Table
1.1 is to permit the researcher to choose, for a given value of roh,
one sample design from among a list of equally accurate sample
design options. The final choice between equally accurate options
is usually guided by cost factors, or data analysis strategies, or a
combination of both of these.
The choice of a sample design option may also depend upon the
data analysis strategies that are being employed in the research. For
example, analyses may be planned at both the between-student and
between-school levels of analysis. In order to conduct analyses at
the between-school level, data obtained from individual students
may need to be aggregated to obtain files consisting of school
records based on student mean scores. This type of analysis
EXERCISE A
The reader should work through the 12 steps of the sample design.
Where tabulations are presented, the figures in each cell should be
verified by hand calculation using the listing of schools that has been
provided in step 3. After working through all steps, the following
questions should be addressed:
Step 1
List the basic characteristics of the sample design
1. Desired target population: Grade Six students in Country X.
8. Selection equation

Probability = a_h × (N_hi / N_h) × (n_hi / N_hi) = (a_h × n_hi) / N_h
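The selection equation can be sketched as a function. The stratum figures below are hypothetical, chosen only to show that the two stages multiply out to a_h × n_hi / N_h:

```python
def selection_probability(a_h, N_hi, n_hi, N_h):
    """Overall probability that a student is selected: a_h schools drawn
    with probability proportional to size from a stratum of N_h students,
    then n_hi of the school's N_hi students drawn at the second stage."""
    p_school = a_h * N_hi / N_h     # first stage: school selection
    p_student = n_hi / N_hi         # second stage: student within school
    return p_school * p_student     # = a_h * n_hi / N_h

# Hypothetical figures: 2 schools drawn from a stratum of 400 students,
# a school of 80 students, and 20 students drawn within it.
print(selection_probability(a_h=2, N_hi=80, n_hi=20, N_h=400))  # → 0.1
```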
Step 2
Prepare brief and accurate written descriptions of
the desired target population, the defined target
population, and the excluded population
1. Desired Target Population: “Grade Six students in Country X”.
Step 3
Locate a listing of all primary schools in Country_X
that includes the following information for each
school that has students in the desired target
population
1. Description of listing
2. Contents of listing
Step 4
Use the listing of schools in the desired target
population to prepare a tabular description of
the desired target population, the defined target
population, and the excluded population
There are 1100 students attending 20 schools in the desired target
population. Two of these schools are ‘special schools’, and their
100 students are therefore assigned to the excluded population
– leaving 1000 students in 18 schools as the defined target
population.
Step 5
Select the stratification variables
The stratification variables are ‘Region’ (which has two categories:
‘Region_1’ and ‘Region_2’) and ‘School Size’ (which has two
categories: ‘Large’ and ‘Small’ schools). These two variables
combine to form the following four strata.
Step 6
Apply the stratification variables to the desired,
defined, and excluded population
Table 3 Schools and students in the desired, defined
and excluded populations listed by the four
strata
Step 7
Establish the allocation of the sample across strata
The sample specifications in Step 1 require five schools to be
selected in a manner that provides a proportionate allocation of
the sample across strata. Since a fixed-size cluster of 20 students
is to be drawn from each selected school, the number of schools
to be selected from each stratum must be proportional to the total
stratum size.
                Population of students        Sample allocation
Stratum         Schools       Students
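Proportionate allocation can be sketched in a few lines. The stratum sizes below are hypothetical and chosen so that simple rounding yields whole schools; a real allocation may need a largest-remainder adjustment when the shares do not round cleanly:

```python
# Hypothetical numbers of defined-target-population students per stratum.
students = {"Stratum_1": 400, "Stratum_2": 200,
            "Stratum_3": 200, "Stratum_4": 200}
total = sum(students.values())
schools_to_select = 5

# Allocate schools in proportion to each stratum's share of students.
allocation = {h: round(schools_to_select * n / total)
              for h, n in students.items()}
print(allocation)
```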
Stratum_1 5 0 1 4 4 1 5
Stratum_2 0 4 1 3 1 3 4
Stratum_3 4 0 1 3 2 2 4
Stratum_4 0 5 1 4 3 2 5
Country X 9 9 4 14 10 8 18
Step 9
For schools with students in the defined target
population, prepare a separate list of schools for
each stratum with ‘pseudoschools’ identified by a
bracket ( [ )
The sample design specifications in Step 1 require that 20 students
be drawn for each selected school. However, one school in the
defined target population, School_M, has only 10 students. This
school is therefore combined with a similar (and nearby) school,
School_N, to form a ‘pseudoschool’.
Note that School_D and School_R do not appear on the list because
they are members of the excluded population.
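The mechanics of forming pseudoschools can be sketched as below. Apart from School_M's 10 students, the names and sizes are hypothetical, and the sketch ignores the awkward case of a small school falling at the very end of a list:

```python
cluster_size = 20
# Hypothetical stratum list (name, students in defined target population).
schools = [("School_K", 45), ("School_M", 10),
           ("School_N", 80), ("School_P", 35)]

merged = []
carry_name, carry_size = None, 0
for name, size in schools:
    if carry_name:
        # A too-small school is merged with the next school on the list.
        merged.append((carry_name + "+" + name, carry_size + size))
        carry_name, carry_size = None, 0
    elif size < cluster_size:
        carry_name, carry_size = name, size   # too small: hold for merging
    else:
        merged.append((name, size))

print(merged)
# → [('School_K', 45), ('School_M+School_N', 90), ('School_P', 35)]
```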
Step 10
For schools with students in the defined target
population, assign ‘lottery tickets’ such that each
school receives a number of tickets that is equal
to the number of students in the defined target
population
Note that the pseudoschool made up from School_M and School_N
has tickets numbered 1 to 90 because these two schools are treated
as a single school for the purposes of sample selection.
Step 11
Select the sample of schools
• Selection of two schools for stratum 1
The ‘winning tickets’ for the first stratum are drawn by using a
‘random start – constant interval’ procedure whereby a random
number in the interval of 1 to 200 is selected as the first winning
ticket and the second ticket is selected by adding an increment of
200. Assuming a random start of 105, the winning ticket numbers
would be 105 and 305. This results in the selection of School_B
(which holds tickets 101 to 160) and School_G (which holds
tickets 231 to 310). The probability of selection is proportional to
the number of tickets held and therefore each of these schools is
selected with probability proportional to the number of students in
the defined target population.
Only one winning ticket is required for the other strata and
therefore one random number must be drawn between 1 and 200
for each stratum. Assuming these numbers are 65, 98, and 176, the
selected schools are School_F, School_L, and School_T.
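The 'random start – constant interval' draw for stratum 1 can be sketched as follows. The ticket ranges for School_B (101–160) and School_G (231–310) are those quoted in the text; the other ranges are hypothetical fillers:

```python
# (school, first ticket, last ticket); only School_B and School_G's
# ranges come from the text.
tickets = [
    ("School_A", 1, 100),
    ("School_B", 101, 160),
    ("School_C", 161, 230),
    ("School_G", 231, 310),
    ("School_H", 311, 400),
]
random_start, interval, draws = 105, 200, 2

# Winning tickets: random start, then add a constant interval.
winning = [random_start + i * interval for i in range(draws)]
selected = [name
            for w in winning
            for (name, lo, hi) in tickets
            if lo <= w <= hi]
print(winning, selected)   # → [105, 305] ['School_B', 'School_G']
```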
• Sample of Schools
Step 12
Use a table of random numbers to select a simple
random sample of 20 students in each selected
school
Within a selected school a table of random numbers is used to
identify students from a sequentially numbered roll of students in
the defined target population. The application of this procedure
has been described in detail by Ross and Postlethwaite (1991). A
summary of the procedure has been presented below.
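In modern software the role of the random number table can be played by a pseudo-random generator. A sketch, with a hypothetical school roll of 83 students:

```python
import random

random.seed(42)                      # fixed seed for reproducibility

roll = list(range(1, 84))            # sequentially numbered roll, 1..83
sample = sorted(random.sample(roll, 20))   # simple random sample, no repeats

# 20 distinct students are drawn; no student appears twice.
print(len(sample), len(set(sample)))  # → 20 20
```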
Step 1
List the basic characteristics of the sample design
1. Desired Target Population: Grade Six students in Zimbabwe.
minimum cluster size, and ‘n’ has been used to describe the
total sample size.
The rows of Table 4 that correspond to a minimum cluster
size of one refer to the effective sample size. That is, they
describe the size of a simple random sample which has
equivalent accuracy. Therefore, the pairs of figures in the
fourth and fifth columns in the table all refer to sample
designs which have equivalent accuracy to a simple random
sample of size 400. The second and third columns refer to
an effective sample size of 1600, and the final two pairs
of columns refer to effective sample sizes of 178 and 100,
respectively.
The most important columns of figures in Appendix 2 for this
example are the fourth and fifth columns because they list a
variety of two-stage samples that would result in an effective
sample size of 400.
To illustrate, consider the intersection of the fourth and
fifth columns of figures with the third row of figures in the
first page of Appendix 2. The pair of values a=112 and n=560
indicate that if roh is equal to 0.1 and the minimum cluster
size, b, is equal to 5, then the two-stage cluster sample design
required to meet the required sampling standard would be
5 students selected from each of 112 schools – which would
result in a total sample size of 560 students.
The effect of a different value of roh, for the same
minimum cluster size, may be examined by considering the
corresponding rows of the table for roh=0.2, 0.3, etc. For
example, in the case where roh=0.3, a total sample size of
880 students obtained by selecting 5 students from each of
176 schools would be needed to meet the required sampling
standard.
8. Selection Equation

Probability = a_h × (N_hi / N_h) × (n_hi / N_hi) = (a_h × n_hi) / N_h
Step 2
Prepare brief and accurate written descriptions of
the desired target population, the defined target
population, and the excluded population
1. Desired target population: “Grade Six students in Zimbabwe”.
Step 3
Locate a listing of all primary schools in Zimbabwe
that includes the following information for each
school that has students in the desired target
population
EXERCISE B
Use the SAMDEM software to list the following groups of schools in the
Zimbabwe.dat file:
Step 4
Use the listing of schools in the desired target
population to prepare a tabular description of
the desired target population, the defined target
population, and the excluded population
EXERCISE C
Step 5
Select the stratification variables
EXERCISE D
Step 6
Apply the stratification variables to the desired,
defined, and excluded population
EXERCISE E
Step 7
Establish the allocation of the sample across strata
EXERCISE F
EXERCISE G
Table 10 Schools with students in the defined target population listed by school
size, school location, and school type (Zimbabwe Grade Six 1991)
Table 11 Students in the defined target population listed by school size, school
location, and school type (Zimbabwe Grade Six 1991)
Step 9
For schools with students in the defined target
population, prepare a separate list of schools for
each stratum with ‘pseudoschools’ identified with a
bracket ( [ )
EXERCISE H
Use the SAMDEM software to list the schools having students in the
defined target population and located in ‘stratum 2: Harare region small
schools’.
Step 10
For the defined target population, assign ‘lottery
tickets’ such that each school receives a number of
tickets that is equal to the number of students in
the defined target population
EXERCISE I
Use the SAMDEM software to assign ‘lottery tickets’ for schools having
students in the defined target population and located in ‘stratum 2:
Harare region small schools’. Repeat this exercise for one other stratum.
Step 11
Select the sample of schools.
EXERCISE J
Step 12
Use a table of random numbers to select a simple
random sample of 20 students in each school.
EXERCISE K
• Greater Flexibility
The Jackknife may be applied to a wide variety of sample
designs whereas (i) Balanced Repeated Replication is designed
for application to sample designs that have precisely two
primary sampling units per stratum, and (ii) Independent
Replication requires a large number of selections per stratum
so that a reasonably large number of independent replicated
samples can be formed.
• Ease of Use
The Jackknife does not require specialized software systems
whereas (i) Balanced Repeated Replication usually requires
the prior establishment by computer of complex Hadamard
matrices that are used to ‘balance’ the half samples of data that
form replications of the original sample, and (ii) Taylor’s Series
methods require specific software routines to be available for
each statistic under consideration.
y* = k × y_all − (k − 1) × (1/k) Σ y_i
Quenouille’s contribution was to show that, while y_all may have bias
of order 1/n as an estimate of y, the Jackknife value, y*, has bias of
order 1/n².
y_i* = k × y_all − (k − 1) × y_i

y* = (1/k) Σ y_i*

var(y*) = [1 / (k(k − 1))] Σ (y_i* − y*)²
Tukey (1958) set forward the proposal that the pseudovalues could
be treated as if they were approximately independent observations
and that Student’s t distribution could be applied to these estimates
in order to construct confidence intervals for y*. Later empirical
work conducted by Frankel (1971) provided support for these
proposals when the Jackknife technique was applied to complex
sample designs and a variety of simple and complex statistics.
Substituting for y_i* in the expression for var(y*) permits the variance
of y* to be estimated from the k subsample estimates, y_i, and their
mean – without the need to calculate pseudovalues.
var(y*) = [(k − 1) / k] Σ (y_i − ȳ)²

where ȳ = (1/k) Σ y_i.
Wolter (1985, p. 156) has shown that replacing y* by y_all in the right
hand side of the first expression for var(y*) given above provides a
conservative estimate of var(y*) – the overestimate being equal to
(y_all − y*)² / (k − 1).
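The shortcut described here – computing var(y*) from the k subsample estimates rather than from the pseudovalues – can be verified numerically. The data below are artificial, and the statistic is the mean, the simplest case:

```python
import random
import statistics

random.seed(7)
# 12 primary sampling units ('groups') of 5 artificial scores each.
groups = [[random.gauss(50, 10) for _ in range(5)] for _ in range(12)]
k = len(groups)

def mean_of(gs):
    return statistics.mean(x for g in gs for x in g)

y_all = mean_of(groups)                                          # full-sample estimate
y_i = [mean_of(groups[:i] + groups[i + 1:]) for i in range(k)]   # delete-one estimates

pseudo = [k * y_all - (k - 1) * yi for yi in y_i]                # pseudovalues
y_star = statistics.mean(pseudo)                                 # Jackknife estimate

# Variance from pseudovalues, and the pseudovalue-free shortcut.
var_pseudo = sum((p - y_star) ** 2 for p in pseudo) / (k * (k - 1))
y_bar = statistics.mean(y_i)
var_short = (k - 1) / k * sum((yi - y_bar) ** 2 for yi in y_i)

print(abs(var_pseudo - var_short) < 1e-9)   # → True
```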
var(y*) ≈ [1 / (k(k − 1))] Σ (y_i* − y_all)²
Wolter (1985, p.180) and Rust (1985) have presented an extension
of these formulae for complex stratified sample designs in which
there are k_h primary sampling units in the hth stratum (where h =
1, 2, ..., H). In this case, the formula for the variance of y_all employs
y_hi to denote the estimator derived from the same functional form as
y_all – calculated after deleting the ith primary sampling unit from
the hth stratum.
var(y_all) = Σ_h [(k_h − 1) / k_h] Σ_i (y_hi − y_all)²

where K = Σ k_h is the total number of samples that are formed.
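The stratified extension can be sketched as a small function. The delete-one estimates below are hypothetical numbers chosen only to exercise the summation:

```python
def jackknife_var_stratified(y_hi_by_stratum, y_all):
    """Jackknife variance for a stratified design: y_hi_by_stratum holds,
    for each stratum h, the k_h estimates obtained by deleting one
    primary sampling unit at a time."""
    var = 0.0
    for y_hi in y_hi_by_stratum:
        k_h = len(y_hi)
        var += (k_h - 1) / k_h * sum((y - y_all) ** 2 for y in y_hi)
    return var

y_all = 50.0
y_hi = [[49.8, 50.1, 50.2],   # hypothetical delete-one estimates, stratum 1
        [50.0, 49.9]]         # hypothetical delete-one estimates, stratum 2
print(round(jackknife_var_stratified(y_hi, y_all), 4))   # → 0.065
```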
5 References
Brickell, J. L. (1974). “Nominated samples from public schools and
statistical bias”. American Educational Research Journal, 11(4),
333-341.
Size of Group
R21 R22 R23 R24 R25 R26 R27 R28 R29 R30
1 1 1 2 1 1 1 1 1 4
2 2 2 3 2 2 2 2 2 6
3 3 3 4 3 3 3 3 3 7
4 5 5 5 4 5 4 4 4 8
5 6 6 7 5 6 6 6 5 10
6 7 7 8 6 7 7 7 6 12
7 8 9 9 7 8 8 8 9 15
8 9 11 10 9 9 11 11 10 16
9 10 12 11 11 10 12 12 11 17
10 11 13 12 12 12 13 13 12 18
11 12 14 13 13 13 15 15 14 19
12 13 15 15 14 14 17 17 16 20
13 14 16 16 15 15 18 18 21 22
14 15 17 17 16 16 19 19 22 23
16 16 18 18 18 18 20 20 23 24
17 17 19 19 20 19 21 21 24 25
18 18 20 21 21 21 22 22 25 26
19 20 21 22 23 22 25 25 27 27
20 21 22 23 24 25 27 27 28 29
21 22 23 24 25 26 28 28 29 30
Size of Group
R31 R32 R33 R34 R35 R36 R37 R38 R39 R40
1 4 1 2 1 4 1 1 1 2
4 6 2 3 2 5 3 4 2 3
5 7 4 4 7 7 6 7 3 5
8 9 5 5 8 8 7 9 5 6
10 10 7 6 9 10 8 11 6 8
11 11 10 7 11 13 9 12 7 12
14 12 12 12 14 16 10 14 8 13
15 13 13 13 15 17 11 16 12 16
16 14 14 14 16 19 12 17 14 17
17 15 16 16 17 21 13 18 15 18
19 16 18 18 21 22 17 19 17 23
20 17 19 19 22 24 18 22 20 24
22 21 21 21 23 25 19 23 24 25
23 23 23 23 24 26 24 24 26 31
24 24 25 25 25 27 25 25 27 33
25 25 26 26 26 28 28 26 28 34
27 26 28 28 32 29 29 28 29 37
29 28 31 31 33 31 30 31 31 38
30 30 32 32 34 33 34 32 36 39
31 31 33 33 35 36 36 37 38 40
Size of Group
R41 R42 R43 R44 R45 R46 R47 R48 R49 R50
1 2 1 1 3 1 3 1 1 3
4 3 2 3 5 11 5 3 2 4
5 4 6 4 7 12 10 6 4 5
7 5 11 6 8 15 17 15 6 6
8 6 13 7 10 17 24 19 8 7
10 8 14 8 11 21 25 20 13 8
12 10 15 9 12 22 27 24 15 11
13 13 17 11 16 23 28 26 17 13
15 16 20 13 19 24 30 27 20 14
17 20 25 15 23 25 31 29 23 16
20 21 26 18 24 26 32 30 24 20
21 23 28 19 25 28 33 31 27 22
25 24 31 24 32 29 34 32 31 24
27 27 32 25 34 32 36 34 32 29
28 29 33 32 35 34 37 36 34 32
29 31 37 35 37 36 38 37 37 33
32 34 38 38 38 40 39 38 41 36
34 35 39 40 40 41 42 40 43 37
35 36 40 43 43 44 44 44 44 40
39 41 42 44 44 45 45 48 47 45
Size of Group
R51 R52 R53 R54 R55 R56 R57 R58 R59 R60
3 4 3 1 1 2 2 3 3 1
4 5 6 2 6 5 6 5 4 5
8 6 8 4 7 7 7 6 5 6
9 10 9 8 9 10 10 8 8 7
10 13 11 10 10 12 17 16 10 9
12 17 12 11 12 15 19 17 11 11
13 18 14 16 14 18 23 21 15 14
15 19 15 20 16 24 29 23 16 16
17 20 18 24 29 25 34 27 17 19
26 22 19 25 31 27 35 32 20 21
27 25 23 27 36 32 37 37 24 24
29 27 26 28 38 34 38 38 26 28
31 28 28 29 40 38 43 42 29 29
32 32 33 32 42 43 44 44 31 39
36 35 38 37 44 45 46 46 36 40
39 40 42 39 47 47 48 48 38 49
40 42 43 41 48 49 52 49 42 50
44 49 48 46 52 50 53 50 48 51
47 51 51 47 53 52 55 52 50 52
48 52 53 49 54 55 57 56 53 54
Size of Group
R61 R62 R63 R64 R65 R66 R67 R68 R69 R70
2 9 8 5 3 1 4 2 6 5
5 12 12 8 4 2 5 4 9 6
7 14 14 10 7 8 6 5 11 7
8 22 15 17 10 9 10 10 14 9
11 23 16 19 14 11 14 16 23 11
18 25 18 21 18 19 17 17 24 13
19 27 24 25 19 22 18 21 29 17
21 28 32 26 20 23 20 22 31 18
22 29 33 27 28 25 21 23 36 21
27 33 34 29 37 26 25 28 40 22
30 41 38 33 40 27 29 31 44 36
34 42 39 35 42 30 30 32 48 39
35 43 40 37 46 28 34 33 49 41
39 46 42 39 47 46 37 37 50 47
46 50 45 45 49 50 39 42 52 49
48 51 46 48 51 52 41 49 55 50
49 53 48 49 58 53 47 55 59 53
50 57 54 54 59 54 57 58 60 55
53 59 61 58 61 61 58 61 67 61
54 62 63 61 65 66 63 67 68 67
Size of Group
R71 R72 R73 R74 R75 R76 R77 R78 R79 R80
9 3 1 4 2 6 3 12 8 6
15 6 5 15 7 20 5 13 10 10
17 10 9 19 9 21 6 23 14 17
20 12 11 23 15 22 7 24 15 19
22 18 12 24 16 24 10 28 21 21
25 20 13 27 23 25 16 32 23 23
27 22 14 28 29 30 24 46 31 26
28 26 21 35 36 31 31 47 32 31
34 29 28 38 37 35 32 48 34 32
35 32 29 42 41 37 36 49 41 36
37 38 34 47 43 39 38 52 46 41
39 41 37 49 45 50 45 53 48 42
46 43 42 50 46 51 48 57 54 44
48 44 48 51 50 58 62 59 58 45
50 47 54 52 53 61 64 63 61 46
53 48 57 56 60 63 70 64 62 50
60 50 62 62 66 65 71 67 68 61
61 52 63 64 67 67 72 70 71 71
62 61 68 65 69 68 73 75 75 73
64 66 72 72 72 73 75 76 77 76
Size of Group
R81 R82 R83 R84 R85 R86 R87 R88 R89 R90
9 9 2 4 3 1 1 6 2 1
13 15 4 6 6 8 3 8 12 2
16 16 14 7 7 13 7 16 14 5
33 23 17 13 25 24 14 23 15 10
40 27 30 14 27 34 16 26 39 22
41 28 35 19 30 35 17 29 48 24
42 29 41 25 32 39 19 32 52 25
44 33 42 31 36 45 20 35 56 27
45 42 49 34 45 47 26 42 57 31
46 43 52 39 48 50 28 43 60 35
54 48 53 40 50 52 30 48 62 40
59 50 58 44 54 61 40 50 64 46
64 56 63 59 58 64 53 55 66 51
71 57 67 62 63 65 54 59 67 53
72 60 69 70 64 66 60 63 68 54
73 61 71 73 66 67 66 64 70 73
74 66 74 74 68 68 77 70 75 81
75 67 76 77 72 75 80 74 76 83
76 69 77 83 83 80 83 81 88 85
79 71 80 84 85 83 86 86 89 90
Size of Group
R91 R92 R93 R94 R95 R96 R97 R98 R99 R100
7 8 1 1 7 2 3 8 3 7
15 15 7 8 9 4 10 11 11 21
20 17 9 9 12 6 26 17 20 22
29 21 19 10 14 9 33 23 29 25
36 25 20 14 18 16 37 25 30 27
38 26 21 15 24 18 39 31 34 35
42 31 22 20 32 21 40 36 38 40
43 34 30 23 35 28 41 38 42 49
49 40 31 36 46 32 48 42 50 50
50 41 32 46 49 47 50 48 51 52
54 49 35 48 54 50 51 54 53 56
58 54 36 57 60 51 53 58 54 57
59 61 40 58 63 63 61 63 60 75
67 69 46 60 64 64 62 70 62 79
73 71 51 61 69 70 64 71 63 81
80 78 62 72 78 73 65 78 68 86
84 79 72 79 87 78 68 89 79 87
85 81 73 81 88 80 70 91 92 92
87 83 76 87 92 89 82 92 94 93
91 84 89 91 95 92 97 93 99 94
Appendix 2
Appendix 3
Estimation of the coefficient of intraclass correlation

estimated roh = (BCMS − WCMS) / (BCMS + (b − 1) × WCMS)
If the numerator and denominator on the right hand side of the above
expression are both divided by the value of WCMS, the estimated roh
can be expressed in terms of the value of the F statistic and the value
of b.
estimated roh = (BCMS/WCMS − 1) / (BCMS/WCMS + (b − 1))

estimated roh = (F − 1) / (F + b − 1)
Note that, in situations where the number of elements per cluster varies,
the value of b is sometimes replaced by the value of the average cluster
size.
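Both forms of the estimate can be sketched side by side; the mean squares below are invented figures chosen so that F = 3:

```python
def estimated_roh(F, b):
    """Intraclass correlation estimated from the one-way ANOVA
    F statistic (F = BCMS / WCMS) and the cluster size b."""
    return (F - 1) / (F + b - 1)

def estimated_roh_from_ms(bcms, wcms, b):
    """The same estimate expressed with the mean squares directly."""
    return (bcms - wcms) / (bcms + (b - 1) * wcms)

# With BCMS = 60, WCMS = 20 and b = 10, F = 3 and both forms agree.
print(round(estimated_roh(3.0, 10), 4))                  # → 0.1667
print(round(estimated_roh_from_ms(60.0, 20.0, 10), 4))   # → 0.1667
```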
Quantitative research methods
in educational planning
Series editor: Kenneth N. Ross
Module 4
Richard M. Wolf
Judging educational research based on experiments and surveys
Content
1. Introduction and purpose 1
Module 4 Judging educational research based on experiments and surveys
6. Selection 24
7. Drop-out 24
8. Interaction 25
9. Diffusion of experimental treatment 26
10. Resentful demoralization of students receiving less desirable
treatments 26
11. Generalizability 27
Appendix
Adult Education Project: Thailand 53
Objectives of the evaluation project 54
Design 55
The construction of measures 62
Data collection 70
The participants’ achievement 74
Regression analyses 80
Conclusions and recommendations 89
References 93
Additional readings 94
© UNESCO III
1 Introduction and purpose
2 Research as a way of knowing
Research is a way of knowing the world and what happens in it. Its
application is governed by various principles, canons, and rules. It
can be contrasted with several other ways of knowing: authority,
tradition, and personal experience. Each of these has its limitations
and advantages.
long-term consequences such as soil depletion despite apparent
short-term benefits. That is, the existence of traditional knowledge
does not necessarily ensure that it is also useful knowledge.
However, traditions can often be strong and educational planners
and policy-makers need to take them into account.
study of individuals over time in order to describe the process of
development and the stability and change in various characteristics.
Survey studies furnish a picture of a group of individuals or an
organization at a particular point in time. They often contain a
number of items of information for describing the individuals
or organization. Finally, experimental studies assess the effects
of particular kinds of interventions on various outcomes of
individuals.
matter how carefully planned and carried out, is a presumption of
causation. The presumption may be quite strong, however, if there
is considerable previous evidence and a strong theory favouring a
certain conclusion. For example, the evidence on the relationship
between cigarette smoking and lung cancer in human beings is
based on associational studies (it would be highly unethical to
conduct experimental studies in this area with human beings).
However, the weight of evidence from many studies and a strong
physiological theory make the conclusions from such studies strongly
presumptive of a causal relationship, even though these studies do
not definitively establish one.
exercise this level of control cannot be termed an experimental
study and any causal conclusions from it must be regarded as
presumptive.
variables. However, this does not mean that the possibility of causal
relationships cannot be explored. They often are. In recent years,
a number of highly technical statistical procedures have been
developed that are used to explore possible causal relationships
among variables. The general term for such procedures is causal
modelling. Briefly, such procedures allow one to set up a network of
hypothesized causal relationships among variables and to test the
tenability of these relationships.
These models and the associated statistical procedures are often
quite complex. They are based on a rather simple notion, however.
Although the existence of an association between two variables does
not mean that the variables are causally related, any genuine causal
relationship must produce an association. The researcher can
therefore examine the information collected in a study to see
whether an association exists between the two variables in question.
If there is no association, then the researcher’s theory is
disconfirmed. If the two variables are associated, then the possible
causal relationship between them remains tenable. This is not to say
that a causal relationship has been established; it is only that its
existence remains a possibility.
A great deal of rather complex analytic work goes on in the area
of causal modelling and readers of research reports often have
difficulty in following it. The basic goal of causal modelling should
be clearly kept in mind though. Causal modelling, at best, tests the
tenability of causal relationships; it does not establish them.
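The disconfirmation logic described above can be sketched in a few lines. This is an illustrative example only, assuming Python with numpy; the variables, the simulated data, and the threshold are all invented:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented data: a hypothesized cause (hours of homework per week) and
# an outcome (test score) for 200 students, with an association built in.
homework = rng.normal(5, 2, size=200)
score = 50 + 3 * homework + rng.normal(0, 5, size=200)

# The check at the heart of causal modelling: is there an association?
r = np.corrcoef(homework, score)[0, 1]

if abs(r) < 0.1:  # illustrative threshold for "no association"
    print("No association: the causal theory is disconfirmed.")
else:
    print(f"Association found (r = {r:.2f}): the causal theory remains")
    print("tenable, but a causal relationship has NOT been established.")
```

The branch structure mirrors the argument in the text: absence of association disconfirms the theory, while presence of association merely leaves it tenable.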
The main characteristics of experimental and survey studies
statistical analysis while in qualitative studies, such information is
not subjected to these procedures.
Subjects
R    O1    T1    O3
R    O2    T2    O4
In the diagram, ‘R’ denotes that a random assignment procedure
has been used to assign students to groups and groups to treatment
conditions. ‘O’ denotes an observation of performance. Such
observation may consist of tests that are administered to students.
Note that observations ‘O’ are obtained both before the introduction
of the treatment conditions and after completion of the treatment
period. While it is considered desirable to have both before (pre-
test) and after (post-test) observations, the former are not considered
crucial in an experimental study and are sometimes omitted
without jeopardizing the integrity of the design. The symbol ‘T1’
denotes the major treatment condition that is being studied. It may
be a particular instructional treatment or method of teaching, a
type of organizational structure, or some other intervention that
is being studied. ‘T2’ denotes the alternative treatment to which it
is being compared. In some studies, T2 is simply a condition of no
treatment at all. In this way, researchers can assess the effect of a
particular intervention in relation to no intervention at all. The ‘no
intervention’ alternative is only occasionally studied in the field of
education since there are usually legal regulations that prevent very
uneven treatment of students – unless, of course, all treatments are
considered equally beneficial.
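The design diagram discussed above can be expressed as a small simulation. This is an illustrative sketch only; the number of students, the score model, and the treatment gain are invented:

```python
import random
random.seed(1)

# Sketch of the R O1 T1 O3 / R O2 T2 O4 design: students are randomly
# assigned ('R') to two conditions, and the treatment effect is
# estimated from the difference in post-test observations (O3 vs O4).
students = list(range(40))
random.shuffle(students)                       # 'R': random assignment
group_t1, group_t2 = students[:20], students[20:]

def post_test(student, treatment_gain):
    # Invented model: a baseline ability plus the effect of treatment.
    baseline = 50 + (student % 10)
    return baseline + treatment_gain

o3 = [post_test(s, treatment_gain=8) for s in group_t1]   # T1
o4 = [post_test(s, treatment_gain=0) for s in group_t2]   # T2: no treatment

effect = sum(o3) / len(o3) - sum(o4) / len(o4)
print(f"Estimated treatment effect: {effect:.1f}")
```

Because assignment is random, the two groups have comparable baselines on average, so the post-test difference estimates the treatment effect even without a pre-test.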
Experimental studies and some factors that often threaten their validity
1. History
In educational research experiments, events other than the
experimental treatment can occur during the time between the
pre-test and the post-test. For example, an in-service programme
to improve the knowledge and proficiency of teachers of reading
may be undertaken by a particular school. At the same time, some
of the teachers may be enrolled in university courses leading to
an advanced degree. As part of the programmes, these teachers
may be taking a course in the teaching of reading. It is certainly
possible to assess the teachers’ knowledge and proficiency in the
teaching of reading at the conclusion of the in-service programme,
but it would be virtually impossible to determine how much of their
performance is due to the in-service programme and how much
to their graduate course. This inability to determine the source of
an effect, namely, the enhanced knowledge and proficiency in the
teaching of reading renders the results of the study uninterpretable.
History, as a source of internal invalidity, opens the results of a
study to alternative interpretations. Readers of research reports
should routinely ask themselves whether a demonstrated effect is
due to the intervention under study or to something else.
2. Maturation
While an experiment is being undertaken, normal biological and
psychological growth and development processes are almost certain
to continue to occur. These processes may produce changes in the
experimental subjects that are mistakenly attributed to differences
in treatment. Maturation effects are often noticed in long-term
experiments in which students learn a great deal through the
natural processes of exposure to stimuli that are a normal part
of their socio-cultural environment. For example, students at a
particular grade level in primary school who have mastered some of
the rudiments of reading will undoubtedly improve in their reading
3. Testing
In most educational experiments a pre-test is administered before
the experimental treatment which is then followed by a post-test.
The very administration of the pre-test can improve performance
on the post-test in a manner that is independent of any treatment
effect. This occurs when pre-testing enhances later performance
through providing practice in the skills required for the post-test, by
improving the ‘test-wiseness’ (or test-taking skills) of the students,
or by sensitizing the students to the purposes of the experiment.
The effect of retesting can sometimes be reduced by making sure
that students are given a different set of questions to answer when
a test is readministered. There are two ways to eliminate a testing
effect. The first would be to test only once, at the completion of
the treatment period. This can be troublesome since it would
deprive the researcher of information about the proficiency of
students at the beginning of the programme (‘O1’ and ‘O2’ in the
above diagram). The second way to eliminate a testing effect is to
randomly divide each group of students in half and administer the
test to one half of the group before the period of treatment and to
the other half of the group after instruction.
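The second remedy can be sketched as follows. This is an illustrative sketch only; the group size and student labels are invented:

```python
import random
random.seed(7)

# Randomly split the group in half: one half is tested before the
# treatment period, the other half after instruction, so no student
# is tested twice and the retesting effect is eliminated.
students = [f"student_{i:02d}" for i in range(30)]
random.shuffle(students)
half = len(students) // 2

pretest_half = students[:half]    # tested before the treatment period
posttest_half = students[half:]   # tested after instruction

# No student appears in both halves, so post-test scores cannot be
# inflated by practice gained on the pre-test.
assert not set(pretest_half) & set(posttest_half)
print(len(pretest_half), len(posttest_half))  # 15 15
```

Because the halves are formed at random, the pre-test half's average is an unbiased estimate of the whole group's starting proficiency.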
4. Instrumentation
A difference in the pre-test and post-test scores for an experiment
may sometimes occur because of a change in the nature or
quality of the measurement instrument during the course of the
experiment. For example, the scores of essay tests may change from
pre-test to post-test because different standards are used by two
sets of scorers on different occasions. If the scorers of the pre-test
essays are particularly stringent in their grading while the scorers of
the post-tests are lenient, then gains in the essay scores may all be
due to the differences in standards used by the scorers rather than
the exposure of students to effective teaching. The same situation
may hold for more objective measures of student performance. For
example, the researcher might simply ask easier questions on a post-
test than on a pre-test. Instrumentation problems also often arise
when the amount of proficiency required to go from, say, a score
of six to twelve is different from the amount required to go from
a score of twelve to eighteen. Test scores are typically treated as if
the difference between score points is uniform throughout the test,
and therefore the research worker must be sensitive to the nature
of the instruments that are used and the units of measurement that
express performance.
5. Statistical regression
When students are selected for a treatment on the basis of extreme
scores, later testing invariably shows that these students, on
average, perform somewhat closer to the average for all students.
This phenomenon was observed by the psychologist Lewis Terman
in his studies of gifted children over half a century ago.
Terman sought to identify a group of gifted children. His major
criterion for classifying them as gifted was the score obtained on
the Stanford-Binet Intelligence Test. As part of his initial follow-
up of these children, Terman had them retested and found, to his
surprise, that the average intelligence test score had ‘regressed’
rather dramatically (eight points) toward the average. More recently,
remedial educational programmes have been developed in a
number of countries to help disadvantaged students. A common
practice in such programmes is to select individuals who score
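The regression phenomenon described in this section can be illustrated with a small simulation. All figures are invented; the sketch assumes only that an observed score is a stable 'true score' plus random error:

```python
import random
random.seed(3)

# Each student's observed score = true score + random error.
true_scores = [random.gauss(100, 10) for _ in range(2000)]
test1 = [t + random.gauss(0, 8) for t in true_scores]
test2 = [t + random.gauss(0, 8) for t in true_scores]

# Select the top scorers on the first test (as Terman selected his
# gifted group on the Stanford-Binet).
selected = sorted(range(2000), key=lambda i: test1[i], reverse=True)[:100]

mean1 = sum(test1[i] for i in selected) / 100
mean2 = sum(test2[i] for i in selected) / 100
print(f"First test: {mean1:.1f}, retest: {mean2:.1f}")
```

The retest mean falls back toward the overall average because the extreme first-test scores were partly the product of favourable random error, which does not repeat itself on the second testing.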
6. Selection
In a study that seeks to compare the effects of treatments on
different groups of students, the group receiving one treatment
might be more able, older, more receptive, etc. than a group
receiving another, or no treatment. In this case, a difference
between groups on post-test scores may be due to prior differences
between the groups and not necessarily the differences between
treatments. For example, if students volunteer to participate in an
experimental learning programme, they can differ considerably
from students who decide to continue in a conventional programme.
In this case, the act of volunteering may indicate that the volunteer
students are looking for new challenges and may approach their
lessons with greater zeal. If differences favouring the experimental
programme are found, one faces the task of trying to decide how
much such results reflect the effects of the programme and how
much the special characteristics of the volunteer students.
7. Drop-out
In experiments that run for a long period of time there may be
differential drop-out rates between the groups of students receiving
the experimental treatments. For example, random allocation of
students to several educational programmes may have ensured
comparable groups for the pre-test – but if one group incurs a loss
of low-performing students during the course of the experiment,
that group’s average performance level will increase, regardless of
the effectiveness of the programme to which it was exposed.
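The drop-out effect described above can be illustrated with a minimal sketch; all figures are invented:

```python
# Two groups start with identical score distributions, but one loses
# its lowest scorers during the experiment. Its average rises with no
# treatment effect at all.
scores = list(range(40, 80))           # identical baseline for both groups

group_a = scores[:]                    # no drop-out
group_b = sorted(scores)[8:]           # the 8 lowest scorers dropped out

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
print(mean_a, mean_b)  # 59.5 63.5
```

The four-point gap is produced entirely by differential attrition, which is why drop-out rates must be reported and compared across treatment groups.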
8. Interaction
It is possible for some of the above factors to occur in combination.
For example, a source of invalidity could be selection-maturation
interaction whereby, due to a lack of effective randomization, major
age differences occur between the treatment groups – which, in
turn, permits the possibility of differential rates of maturity or
development between the groups. These latter differences may
result in differences in post-test scores independently of treatment
effects. Another example illustrates how the joint operation
of several factors can lead one erroneously to conclude that a
programme has been effective. A study was conducted on the
effects of the ‘Sesame Street’ educational television programme
in the USA. In the first year of the study of that programme’s
effectiveness, four different groups of children were examined
to judge the instructional effects of the programme. The groups
were established on the basis of the amount of time spent viewing
‘Sesame Street’. This ranged from rarely or never watching the
programme to viewing it more than five times a week. Scores on
the ‘Sesame Street’ pre-tests were found to be highly related to the
amount of time spent viewing the programme. That is, the higher
the initial score, the more time was spent watching the programme.
Scores on the post-test, and hence the gains, were likewise highly
related to the time spent viewing the programme. The
combination of pre-test performance and self-selection into viewing
category made it impossible to assess the effectiveness of the first
year of ‘Sesame Street’ from the data collected.
11. Generalizability
The ten threats to validity described above focus upon the need
to ensure that the effect of the experimental treatment is not
confounded by extraneous variables. All of these threats can
be managed through exerting firm controls over the design and
execution of an experiment. However, the implementation of these
controls may lessen the ‘realism’ of the environment in which
an experiment is conducted and consequently may affect the
generalizability of the research findings to other populations, other
treatments, other measurement instruments, and other social/
economic/cultural/environmental settings. Any research study
is conducted in a particular time and place and with particular
students, treatment variables, and measurement variables. To what
extent can the results of any one study be generalized to other cases?
Strictly speaking, this question is unanswerable. However, at the
very least, the reader of research reports must decide how similar
the group under study is to the groups he/she is responsible for and
how similar the conditions in his/her setting are to the one in which
the study was conducted. In order to make these decisions it will
be extremely helpful if the writer of a research report has carefully
described the setting in which the study occurred, the students
who were studied, the particular treatment that was investigated,
and the types of measurements that were taken. In short, the
generalizability of a study’s findings for a different setting is a
matter that requires careful consideration, not automatic acceptance.
sample has to be chosen in such a way that it is representative of the
larger population of which it is a part.
Survey studies have also been used for comparative purposes. The
studies conducted by the International Association for the Evaluation
of Educational Achievement (IEA), have been used to compare the
performance of students at various age and grade levels in different
nations. The identification of achievement differences has often led
to a closer examination of various nations’ educational systems with
a view to improving them. Within nations, comparisons have often
been made between various types of schools, for example, single-sex
versus coeducational, in order to assess the relationship
between the sex composition of schools and achievement.
Survey studies and some factors that often threaten their validity
3. Instrumentation
Once one has decided what population to study, the next step is
deciding what items of information should be collected via the data
collection instruments (tests, questionnaires, etc.) that have been
designed for the study. One may choose to study a very limited set
of variables or a fairly large set of variables. However, the research
questions governing a study should determine the information
that is to be collected. Some variables may be obtained at very low
cost. For example, asking a member of a sample to report his or her
sex will require only a few seconds of the respondent’s time. On
the other hand, obtaining an estimate of a student’s proficiency in
mathematics or science may require one or more hours of testing
time. Time considerations will be a major determinant of how much
information is collected from each respondent. Usually, a researcher
or a group of researchers will need to make compromises between
all the information that they wish to collect and the amount of
time available for the collection of information. The data collection
instruments should be clear in terms of the information they seek,
retain data disaggregated at an appropriate level, and permit the
matching of data within hierarchically designed samples or across
time. Furthermore, they must be designed to permit subsequent
statistical analysis of data for reliability and (if possible) validity.
The basic requirements are that the questions posed do not present
problems of interpretation to the respondent, and that, when forced-choice
options are provided, the choices are mutually exclusive and
are likely to discriminate among respondents.
9 Other issues that should be considered when evaluating the quality of educational research
Previous sections have identified specific threats to the integrity
of research studies. In this section more general considerations
regarding the quality of research studies are presented. Some of
these may strike the reader as being self-evident. If this is the
case, then the reader has a fairly good grasp of the basic canons of
research. Unfortunately, too many studies are carried out in which
these canons are violated.
The term demand characteristics refers to all the cues available to
subjects regarding the nature of a research study. These can include
rumours as well as facts about the study, the setting, and the
instructions given to subjects, along with the status and personality
of the researcher. Such cues can influence the conduct of the study
and, more importantly, its results. At present, research is
underway to determine the conditions under which such factors
may influence the outcomes of studies.
Box 1 Research evaluation framework
1. Problem
a. is clearly stated and understandable;
b. includes the necessary variables;
c. has theoretical value and currency (impact on ideas);
d. has practical value and usability (impact on practice).
2. Literature review
a. is relevant and sufficiently complete;
b. is presented comprehensively and logically;
c. is technically accurate.
3. Hypotheses and/or questions
a. are offered, and in directional form where possible;
b. are justified and justifiable;
c. are clearly stated.
4. Design and method
a. is adequately described;
b. fits the problem;
c. controls for major effects on internal validity;
d. controls for major effects on external validity.
5. Sampling
a. gives a clear description of the defined target population;
b. employs probability sampling to ensure representativeness;
c. provides appropriate estimates of sampling error.
6. Measures
a. are adequately described and operationalized;
b. are shown to be valid;
c. are shown to be reliable.
7. Statistics
a. are the appropriate ones to use;
b. are used properly.
8. Results
a. are clearly and properly presented;
b. are reasonably conclusive;
c. are likely to have an impact on theory, policy, or practice.
9. Discussion
a. provides necessary and valid conclusions;
b. includes necessary and valid interpretations;
c. covers appropriate and reasonable implications.
10. Write-up
a. is clear and readable;
b. is well-organized and structured;
c. is concise.
Suggested guide for scoring
5 = As good or as clear as possible; could not have been improved.
4 = Well done but leaves some room for improvement.
3 = Is marginal but acceptable; leaves room for improvement.
2 = Is not up to standards of acceptability; needs great improvement.
1 = Is unacceptable; is beyond improvement.
1. Problem
The first criterion in the framework involves the research problem.
The above statement of objectives was clear and understandable,
had some theoretical value and considerable practical value. While
the necessary variables were not explicitly stated, they were strongly
implied. The first objective referred to skills and knowledge to be
gained in typing and sewing courses. Furthermore, the third and
fourth objectives specified employment in typing or sewing within
six months of the end of the course and use of skills within six
months after completing the course. The second objective referred
to variables “...having an effect on achievement of participants” but
did not specify what these variables were. These variables were
presented later in the report of the study. A rating of 4 would seem
suitable for the statement of the problem.
2. Literature review
The study report contained no literature review. This may have been
omitted due to the extreme length of the report (28 pages) or the fact
that the project was one that had an extreme practical orientation. In
any case, the omission of any literature review was troubling. If one
were to be generous, one could give the study a rating of NA (Not
Applicable). The alternative would be to give it a rating of 2.
fairly well. However, the study was not able to control adequately
for either internal or external validity. The reason for this was
that the researchers had virtually no control over the selection of
participants for the study or the assignment of participants to
courses of different lengths. Another design issue that the authors
should have addressed was the rather limited time of 6 months
that was used for the tracer study. Given the prevailing economic
conditions, perhaps at least one year would have been more
appropriate. At best, a rating of 3 must be given on this criterion.
5. Sampling
There were a number of difficulties encountered in sampling. First,
the selection of which of the Lifelong Education Centres would be
included presented problems. According to the investigators, “The
selection of centres was made by purposive sampling rather than
simple random sampling as originally foreseen, because of time
limitations and in order to have a sufficient number of participants”
(p. 55). The use of ‘purposive sampling’ was questioned earlier
in this module. It is a somewhat elegant way of saying that the
sampling was less than desirable. Second, within each centre,
already established classes of 150 or 200 hours duration were
selected at random with the proviso that the number of participants
per centre should be at least 30. This condition was not always
met. More serious, however, was the use of intact classes. The
inability randomly to assign participants to classes of different
durations presents real problems since it is possible that more able
participants could be assigned to one type of class length. Third,
the data collection was seriously compromised in some cases. The
authors report, “In some cases, centres had already completed the
course a week or two before the dates arranged with the centre for
the administration of the tests. The teachers tried to persuade the
participants to return to take the tests but, unfortunately, many of
them did not do so” (p. 55). In addition, “...participants enrolled but
never attended the course or dropped out of the course before it
finished because they had found a job, were already satisfied with
what they had learned, became tired of the course, or were needed
on the land for seasonal work” (p. 55). The effect of these events
was to introduce a bias into the study whose influence is unknown.
The lack of a clear definition of a target population, the lack of
probability sampling, and the difficulties encountered in actually
obtaining the sample raise serious questions about the adequacy of
the groups that were studied. A rating of 2 on this criterion seems
warranted.
6. Measures
There were several measures used to assess the achievement of
the participants. Tests were developed in both typing and sewing
for the study. In addition, questionnaires were developed to assess
the background of participants and other variables of interest.
A teacher questionnaire was also developed for use in the study.
The description of the development of the instruments was quite
thorough (pp. 57-64) and there is considerable evidence of content
validity. The reliabilities of the cognitive tests of typing were rather
low (Thai + English = .60 and Thai = .64). The performance tests, in
contrast, showed high reliabilities (typing = .87 to .95 and sewing
total = .92). Clearly, the instruments are one of the outstanding
features of the study. A rating of 4 or 5 is clearly warranted.
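Internal-consistency reliability coefficients such as those reported above are often estimated with Cronbach's alpha, alpha = k/(k−1) × (1 − Σ item variances / variance of total score). The following is an illustrative sketch only; the item data are invented, and the procedures actually used in the Thai study are not known from the report excerpt:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list of scores per test item, all of equal length."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]     # total score per examinee
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Four invented dichotomous items answered by six examinees (rows = items).
items = [
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
]
print(round(cronbach_alpha(items), 2))  # 0.78
```

Coefficients in the .60s, as found for the cognitive typing tests, indicate considerable measurement error; the .87 to .95 range for the performance tests is far more satisfactory.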
7. Statistics
The authors used standard statistical procedures to analyze
their data. They presented the means obtained on each measure
for each subject for each course length and indicated whether
differences between groups were statistically significant or not.
Unfortunately, they did not present the standard deviations of the
scores. This made it somewhat difficult to interpret the results that
were obtained. For example, the authors report (p. 70) a difference
8. Results
Some comments on the results were presented under the above
criterion. The general lack of differences between participants in
the 150 hour course and those in the 200 hour course is notable
and led to the conclusion that little was gained by having courses
of more than 150 hours. This is a finding that is likely to have
considerable impact on policy and practice. At the conclusion of the
report, the authors suggest that, “Serious consideration should be
given to abandoning the 200 hour sewing course ...” If adopted, such
a recommendation would have a great impact on policy and practice
and should result in a considerable saving of money. A rating of 4
seemed appropriate.
9. Discussion
The authors provided a thoughtful discussion of their results. At
the beginning of the study, the authors expressed the expectation
that graduates of the programme would be able to use their
newly acquired skills in typing or sewing to gain employment.
Subsequently, they found out that employment opportunities
were quite limited and that participants used their newly acquired
skills in sewing, for example, to sew clothes for themselves. The
absence of economic opportunities for programme graduates
was therefore not interpreted as any reflection on programme
10. Write-up
Many of the comments that have been made about this study could
only be given because the authors were so clear and thorough in
their write-up. Despite a few omissions – standard deviations for
the performance measures, for example – the authors described
their study in a clear and thorough way. Some of this may be due to
the fact that the authors were given a generous page allotment by
the editors of the journal (28 pages). In any case, the study is almost
a model for a clear, coherent presentation of a research endeavour. A
rating of 4 or 5 is clearly warranted on this criterion.
Criterion Rating
1. Problem 4
2. Literature review NA or 2
3. Hypotheses and/or questions 4
4. Design and method 3
5. Sampling 2
6. Manipulation and measures 4
7. Statistics 4
8. Results 4
9. Discussion 3
10. Write-up 4 or 5
11 Summary and conclusions
This module has sought to furnish a guide to readers of educational
research reports. The intended audience for this guide is educational
planners, administrators, and policy-makers. A sincere effort has been
made to make this module as non-technical as possible. It is doubtful
that this effort has fully succeeded though. There are technical issues
involved in research studies and any attempt to avoid them would
be irresponsible. Rather, technical issues have been addressed when
necessary and an attempt has been made to address them in as simple
and direct a way as possible. It is felt that the ideas that underlie
technical issues in research are well within the grasp of the readers of
this module and that assistance with technical details can be sought
when needed.
Appendix
which are special talks on the law and elections, and participation
in special activities of the province).
Design
A cross-sectional survey was conducted throughout the 24 Lifelong
Education Centres to acquire a sufficient number of courses for
both subject areas. Practical tests and a cognitive test (typing only)
were administered at the end of each course to assess the courses
and to identify variables associated with achievement. Six months
after the end of the courses, a tracer study was conducted to assess
how and to what extent the graduates were using the skills learned.
To sum up, the total achieved sample was 498, divided into 135
for sewing (150 hours), 147 for sewing (200 hours), 130 for typing
(150 hours), and 86 for typing (200 hours). The achieved sample is
presented in Table 1.
Table 1. Achieved sample

Sewing
150-hour courses                            200-hour courses
1. Chiengmai            -   23   14    37   1. Nakornswan    14   12   15    41
2. Khonken             11   13    6    30   2. Petchaboon    47    -    7    54
3. Nakornratchasima    10    -   11    21   3. Ratburi       14    -    8    22
4. Nakornswan           7    8    -    15   4. Surin         26    -    4    30
5. Samusakorn           -    -    9     9
6. Ubonratchathanee    12   11    -    23

Typing
150-hour courses                            200-hour courses
1. Chiengmai            9    5   19    33   1. Angthong       2    -   12    14
2. Khonken              9   12    9    30   2. Ayuthaya       5    1    3     9
3. Nakornratchasima    16    -    6    22   3. Petchaboon    15    5    -    20
4. Uthaithanee         12    -    1    13   4. Samusakorn     -    -   17    17
5. Ubonratchathanee     9   17    6    32   5. Surin          9   10    7    26
                            Sewing                  Typing
                      150 hrs   200 hrs       150 hrs   200 hrs
1. Angthong              -         -             -         3
2. Ayuthaya              -         -             -         2
3. Chiengmai             9         -             9         -
4. Khonken               8         -             8         -
5. Nakornratchasima      5         -             5         -
6. Nakornswan            4        10             -         -
7. Petchaboon            -        14             -         5
8. Ratburi               -         6             -         -
9. Samusakorn            2         -             -         4
10. Surin                -         7             -         7
11. Uthaithanee          -         -             3         -
12. Ubonratchathanee     6         -             8         -
Total                   34        37            33        21
                            Sewing                  Typing
                      150 hrs   200 hrs       150 hrs   200 hrs
1. Angthong              -         -             -        11
2. Ayuthaya              -         -             -         4
3. Chiengmai            29         -            19         -
4. Khonken              16         -            14         -
5. Nakornratchasima     14         -             8         -
6. Nakornswan           14        21             -         -
7. Petchaboon            -        49             -        15
8. Ratburi               -        19             -         -
9. Samusakorn            8         -             -        13
10. Surin                -        20             -        10
11. Uthaithanee          -         -             6         -
12. Ubonratchathanee     9         -            16         -
Sub-Total               90       109            63        53
                     (66.7%)   (74.1%)       (48.5%)   (61.6%)
The differences between the tracer sample and the drop-out sample
were small – being less than one third of a standard deviation on
each student characteristic. The largest differences were
noted for hours of attendance in the sewing and typing courses.
The maximum difference in attendance hours was 4.8 hours for the
typing course.
There was one cognitive test for typing, practical tests in both
typing and sewing, and a one-page student questionnaire on
background information (sex, education, age, previous experience,
motivation, siblings, father's education, mother's education, father's
occupation, mother's occupation, and machine at home). A three-day
workshop was held in Bangkok for course
content analysis and test construction. Eight teachers from Lifelong
Education Centres, mobile Trade-Training Schools and the Non-
Formal Education Department at the Ministry of Education, as well
as two local experts, were invited to participate in this workshop.
Cognitive test of typing: content analysis of the curricula for the
150 and 200 hour courses was undertaken by the instructors
teaching the typing courses and team members. Table 4 presents the
topic areas and objectives, and the number of items per topic. The
number of items per topic represents the weights accorded to each
topic. The items were in multiple-choice form, with four alternatives
per item, only one of which was the correct answer.
[Table 4: topic areas and objectives, with the number of items per
topic; the totals were 50 and 55 items.]
characteristics of the final test were: for Thai + English typing,
with a maximum score of 55, mean = 32.25, S.D. = 5.621, and
KR-21 = .596; for Thai typing, with a maximum score of 45,
mean = 23.298, S.D. = 5.408, and KR-21 = .636.
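KR-21 needs only the number of items, the test mean, and the test standard deviation, so these reliabilities can be checked directly. A minimal sketch (the small gaps between these checks and the reported values reflect the rounded means and standard deviations):

```python
def kr21(k, mean, sd):
    """Kuder-Richardson formula 21: an approximate reliability
    estimate computed from the number of items (k), the test
    mean, and the test standard deviation alone."""
    variance = sd ** 2
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

# Figures reported above for the typing cognitive tests:
print(round(kr21(55, 32.25, 5.621), 3))   # Thai + English: ~0.588
print(round(kr21(45, 23.298, 5.408), 3))  # Thai: ~0.630
```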
Two scores were calculated. The first was a combined speed and
accuracy score which was the number of correctly typed words per
minute. The formula used was:
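One common way of combining speed and accuracy, shown here only as an illustration and not necessarily the study's exact formula, is to convert strokes into gross words, deduct a penalty per error, and divide by the time taken:

```python
def net_words_per_minute(strokes, errors, minutes, penalty_strokes=10):
    """Combined speed-and-accuracy score. Five strokes count as one
    word (a common typing-test convention); each error deducts
    penalty_strokes worth of work. The penalty size is an assumption,
    not a value taken from the study."""
    gross_words = strokes / 5
    penalty_words = errors * penalty_strokes / 5
    return (gross_words - penalty_words) / minutes

# e.g. 1,500 strokes with 5 errors in a 10-minute test:
print(net_words_per_minute(1500, 5, 10))  # 29.0
```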
[Table 5: maximum score, mean, S.D., and KR-21.]
The experts selected this for the final testing. The scoring criteria
were set by the sewing instructors and approved by the local
experts. The scoring system was based on the difficulty of, and the
time consumed by, each step in the sewing process. Thus, the experts
agreed that, out of a total of 100 points, body measurement should
receive 10 points, building pattern 25 points, laying and cutting
fabric 25 points, and sewing 15 points.
2. Building pattern (25 points) was divided into five sections; each
section was awarded five points. They were (i) front and back
pieces, (ii) collar, (iii) sleeve, (iv) bent lines, and (v) calculation.
In each section five points were awarded if the measurement
figures were correctly converted into pattern figures and the
pattern was drawn correctly. One point was subtracted for each
mistake.
3. Laying, tracing and cutting (25 points) was divided into three
sections. They were (i) front and back pieces (10 points), (ii)
collar (10 points), and (iii) interfacing of collar and sleeves (5
points). Scoring depended on how well students laid the cloth
in relation to its line (grain), on whether they left enough spare
cloth for sewing, and on whether they cut out all the pieces
correctly. Three points were subtracted for each mistake in
laying, cutting, front-and-back pieces, and collar; and two
points for incorrect interfacing.
4. Sewing (15 points) was divided into five sections and each
section was awarded three points. These were: (i) sewing the
shoulder and side seams, (ii) sewing the collar, (iii) sewing
the sleeves, (iv) sewing the button-holes and buttons, and (v)
hemming. If the participants did the sewing process correctly,
i.e., started from part 1 and went through to part 5 in the
correct order, they would get a full score. If they jumped one
step, they lost three points. Sub-scores were calculated for each
of the four major process sections.
There were two major sub-scores for product: goodness of fit (10
points) and tidiness of sewing (15 points). (See Figure 2).
1. Goodness of fit when the dress was fitted to the model was
also divided into five parts, each receiving two points. The
parts were body, sleeves, collar, length of blouse, and overall
goodness of fit. Two points were awarded for the fit in each
part.
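The deduction-based sub-scores described above lend themselves to a simple computation. This sketch covers only the building-pattern rubric; the floor of zero per section is an added assumption the rubric does not state:

```python
# Building pattern sub-score: five sections worth 5 points each,
# one point subtracted per mistake (floored at zero per section,
# an assumption the rubric does not state).
SECTIONS = ["front and back pieces", "collar", "sleeve",
            "bent lines", "calculation"]

def building_pattern_score(mistakes):
    """mistakes: mapping from section name to number of mistakes."""
    return sum(max(0, 5 - mistakes.get(s, 0)) for s in SECTIONS)

print(building_pattern_score({"collar": 1, "bent lines": 2}))  # 25 - 3 = 22
```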
At the final testing, two Ministry sewing experts were the scorers.
Each scorer scored every garment according to the criteria, and
the average of the two was taken to represent a participant's score.
Unfortunately, scorers were instructed to give only a global score for
each of the six major sub-scores. If we were to repeat the exercise,
we would have them score each separate item (for example, within
the body measurement score). The reliability has been calculated for each
[Table: mean, S.D., and KR-21 for (A) the process and (B) the
product sub-scores.]
Data collection
The research team and staff visited and collected data at sample
centres during March and early June, 1981, at the end of the typing
and sewing courses. The course instructors were asked to fill out the
questionnaires and return them to the research team. At that time
the team administered both cognitive and practical tests for both
courses to the students.
About six months after giving the tests, the tracer study was carried
out by the team members and some additional staff. All data were
coded and punched in Bangkok. The analyses were undertaken at
the National Statistical Office in Bangkok and the D.K.I. Computer
Centre in Jakarta.
The teachers on these sewing courses were all women who tended
to be older and more experienced than the typing teachers, but with
fewer formal qualifications.
The long (200 hours) sewing course attenders were an elite. They
were older women who already had some experience in sewing and
were seeking to reach a higher standard. Sixty per cent of them
wanted to sew for themselves and their families, and 72 per cent
[Table 7: participant and course characteristics, by typing course
(short 150 hrs and long 200 hrs, each Thai + English and Thai) and
sewing course (short 150 hrs and long 200 hrs).]
Table 7 (continued)

                                   Typing                         Sewing
                        Short (150 hrs)  Long (200 hrs)     Short       Long
                        Thai+Eng  Thai   Thai+Eng  Thai   (150 hrs)  (200 hrs)
Course variables
Class size: Under 16      69.4    60.3     34.0    61.5     43.7       59.9
Shift
  Morning                 34.7    51.7     42.6    28.3     29.6       68.7
  Afternoon               40.3     8.6     10.6    15.4     40.7        8.2
  Evening                 25.0    39.7     46.8    56.4     29.6       22.4
Equipment: Lacking        59.1    74.1     76.6     2.6     56.3       72.1
Attendance: 75% +         90.2    82.9     79.0    64.5     92.6       63.5
Quality of facility
  Excellent                -       -       34.0    48.7     17.0        -
  Fair                    98.6    69.0     66.0    51.3     83.0      100
  Poor                     1.4    31.1      -       -        -          -
Table 8 Mean achievement scores in sewing for 150 and 200 hour courses
had machines at home. They were taught by the oldest and most
experienced teachers, generally during the morning shifts. As with all
longer courses, however, their attendance record was less complete.
It was not clear why these latter students did not rather put
their energies into a Thai plus English typing course; perhaps their
knowledge of English was insufficient. Clearly, students did not treat
these courses as graded steps in accomplishment: it was not true to
say, for example, that students doing a combined Thai plus English
course were already able to type in Thai (about 90 per cent in both
the short and long Thai plus English courses lacked previous typing
experience). Quite a large number of students in these courses were
young teenagers picking up an additional credit while attending
day-time classes in adult continuing education.
As can be seen, the total score for the 200 hour course was
significantly higher than that for the 150 hour course. The largest
differences were for sewing and cutting. However, for body
measurement and building patterns, there was no difference. Table 9
presents the results for typing. As can be seen, the total scores on
the cognitive test for longer courses were higher than for shorter
courses. However, it was only significant for the Thai typing course.
The major difference was for the principle of typing official letters.
For Thai plus English typing, the major differences were for ‘how to
feed and release papers and set intervals’ and ‘principle of typing
official letters’.
For the practical test, the scores for the longer courses were not
significantly higher than those for the shorter courses on 'speed'.
Remarkably, the scores on 'format plus tidiness' for the longer
courses were lower than those for the shorter courses.
The objectives of the tracer study were to find out whether the
graduates took employment within six months of the end of the
course, and how participants used the skills they learned in the
course.
The criteria used for these objectives were the amount of money
the participants earned, the money they saved, the amount of time
spent typing, and the reasons why some did not use their knowledge
and skills. Table 10 presents this information.
Table 9. Mean achievement scores in typing for 150 and 200 hour courses

I. Cognitive test
Course           Basic knowledge  Parts  Maintenance  Body position  Feed return  Manipulation
Thai
  200 hours           2.18         2.54      0.69         1.77          2.74          4.90
  150 hours           1.84         2.67      0.90         1.76          2.66          4.50
  Difference          0.34        -0.13   **-0.21         0.01          0.08          0.40
Thai + English
  200 hours           7.23         3.55      0.77         1.70          2.85          5.06
  150 hours           6.45         4.14      0.47         1.78          3.97          4.07
  Difference        **0.78        -0.59      0.30        -0.08      ***-1.12       ***0.99

II. Cognitive test (continued) and practical test
                                                                      Practical
Course           Speed typing  Carbon  Stencil  Principle    Total    Speed   Format
Thai
  200 hours           1.46      1.23     1.13      6.21      24.85     8.59    2.74
  150 hours           1.29      1.16     0.90      4.59      22.26     7.47    3.02
  Difference          0.17      0.07     0.23   ***1.62      *2.59     1.12   -0.28
Thai + English
  200 hours           2.53      1.45     1.02      7.02      33.19    16.47    6.28
  150 hours           2.07      1.36     1.44      5.97      31.84    16.01    6.68
  Difference         *0.46      0.09  ***-0.42    *1.05       1.35     0.46   -0.4
[Table 10: results of the tracer study for the typing courses
(150 hours: Thai + English, N = 32; Thai, N = 31; 200 hours:
Thai + English, N = 32; Thai, N = 22) and the sewing courses
(150 hours, N = 90; 200 hours, N = 109).]
Table 10 (continued)

                                Typing                           Sewing
                      150 hours        200 hours         150 hours  200 hours
                    Thai+Eng  Thai   Thai+Eng  Thai
                    N = 32   N = 31  N = 32   N = 22      N = 90    N = 109

• If they don't sew, why?                                 (N = 6)   (N = 2)
  No machine           -       -        -       -          33.3        -
  Lack confidence      -       -        -       -          33.3        -
  No time              -       -        -       -          33.3      100.0
Across the four types of typing courses, many participants stated
that they continued to type, that is, to use the skill, six months
after the end of the course. Seventy-eight per cent from the short course (Thai +
English) and 59 per cent from the long course (Thai only) used their
typing skills. Those who did not use their skills gave as reasons that
they had no typewriter at home, no time or that their occupation did
not require the typing skills they had learned in the course. Only
a few (15.6 per cent) earned money from typing because many of
them were unable to find a job (there were very few employment
possibilities in their area) and they did not possess a typewriter at
home. Some stated that they lacked confidence or that they typed
free of charge for their friends. Many of them had no time because
they were still studying.
However, just over 50 per cent typed for their own pleasure up to
one hour a week and some of them up to three hours per week.
Those who had taken the Thai typing courses spent more time
typing than those who had taken the Thai plus English courses. In
fact, 39 per cent of Thai typing course participants typed more than
three hours a week.
Regression analyses
Two main sets of regression analyses were conducted. The first
concerned sewing and the second typing.
Let us now examine the results. Tables 11 and 12 report the results
for achievement in the course and in the tracer study respectively.
The complex multi-stage sample design employed in this study did
not conform to the well-known model of simple random sampling.
Consequently, it was not possible to employ the standard error
estimates provided by the SPSS computer programme package (Nie
et al., 1975). Instead, it was decided to use the formula provided
by Guilford and Fruchter (1973: p. 145) for the standard error
of correlation coefficients to obtain approximate estimates of
the standard error of a standardized regression coefficient.
This decision provided more conservative error limits than the
SPSS programme and consequently represented a more realistic
approach to testing the significance of the standardized regression
coefficients (Ross, 1978). Accordingly, since the number of students
in Table 11 is 282, 0.12 represents two standard errors, while the
number of students in Table 12 is 199, so 0.14 represents two
standard errors. Only 28 per cent of the variance was explained by
the variables in the model of predicted achievement at the end of
the course (Table 11). This is disappointing, and clearly more effort
will have to be made to identify other variables that are likely
to be influencing sewing achievement, and to include these in future
studies of this kind. The only variables to survive the regression
were the participant's age, the additional training of the teacher,
and the quality of the facilities as perceived by the teacher. Older
participants had higher achievement scores; better quality facilities
were associated with higher achievement; and so was having a
teacher who had attended an in-service course on sewing in the
previous five years. The latter two variables are clearly policy
variables, and it would seem advantageous to attempt to supply
machines and materials of adequate quality and to give teachers
special in-service courses.
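The quoted error limits are consistent with the simple approximation se(ß) ≈ 1/√n, which is roughly what the cited Guilford and Fruchter formula gives for small coefficients. Treating that approximation as an assumption, the figures can be checked quickly:

```python
import math

def two_standard_errors(n):
    """Approximate two standard errors for a standardized regression
    coefficient, using se ~ 1/sqrt(n). A rough check based on the
    approximation described in the text, not the authors' exact
    computation."""
    return 2 / math.sqrt(n)

print(round(two_standard_errors(282), 2))  # 0.12 (Table 11)
print(round(two_standard_errors(199), 2))  # 0.14 (Table 12)
print(round(two_standard_errors(116), 2))  # 0.19 (Table 13)
```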
                          Block I         Block II
                        r       ß       r       ß
Father's occupation   -.16      *               *
Kits                   .20      *               *
Participant's age      .30     .25             .15
Previous experience    .16      *               *
R²                             .12             .28

Note: N = 282, 2 se (ß) = 0.12.
ß not exceeding two standard errors are asterisked (*).
Size of class, level of teacher training, the reason for joining the
class, age and father’s occupation were not associated with money
earned. Again, however, the regression accounted for only 31 per
cent of the variance.
Firstly, the sewing course itself was important because those with
higher scores earned more and saved more, but we must remember
that higher scores were primarily a function of participants’ age,
additional teacher training and the quality of the facilities.
personal study objective (for a career, for credit or for self, family
and others). Block II included teacher qualification, size of class, the
shift attended (morning, afternoon, or evening), and finally the type
of course participants enrolled in (150 or 200 hours).
The achieved sample size for these interviews was 116 participants
(53.7 per cent of all typing participants). As a result, 0.19 was
calculated as being about two standard errors for the beta
coefficients in this analysis. The results of the analysis are
presented in Table 13.
          Block I         Block II
        r       ß       r       ß
R²             .09             .14

Note: N = 116, 2 se (ß) = .19.
ß not exceeding two standard errors are asterisked (*).
16 per cent from the 150 hour course and 7 per cent from the 200
hour course were employed. In sewing, 42 per cent became employed,
but with no difference between the short and long courses.
for typing is disappointing, but participants were asked how much
they typed, irrespective of whether this was for employment or not.
A further 66 per cent from the Thai typing course indicated that
they typed, but without reimbursement. From the Thai plus English
course, a further 62 per cent from the short course and 53 per cent
from the long course typed, but again, without reimbursement.
In general, typists were typing below one hour per week after
six months. Many of the participants in the typing courses were
students in other adult continuing education courses held in the
centres. These courses were for obtaining educational certification
equivalent to full-time schooling. Fifty per cent of all participants
in the Thai typing courses were students as were just over 50 per
cent in the Thai plus English courses. It could not be expected that
those who were still students six months after the end of the course
would be employed. Only 42 per cent of the participants in the
sewing classes were employed but another 50 per cent sewed for
their families and their own use. Thus, the sewing course can be
regarded as highly successful.
more), and the score at the end of the course. The other three factors
were being from farming families, being members of smaller classes
at the centre, and having teachers with higher levels of pre-service
training.
EXERCISE
Select two published articles from an educational research or
educational evaluation journal that have the following general
features.
References
Brickell, J.L. (1974). Nominated samples from public schools and
statistical bias. American Educational Research Journal, Vol. 11,
No. 4, pp. 333-341.

Coleman, J.; Hoffer, T.; Kilgore, S. (1987). Public and private schools.
Washington: National Center for Educational Statistics.
Additional readings
Borg, W.; Gall, M. (1989). Educational research: an introduction (Fifth
edition). New York: Longman.
Since 1992 UNESCO’s International Institute for Educational Planning (IIEP) has been
working with Ministries of Education in Southern and Eastern Africa in order to undertake
integrated research and training activities that will expand opportunities for educational
planners to gain the technical skills required for monitoring and evaluating the quality
of basic education, and to generate information that can be used by decision-makers to
plan and improve the quality of education. These activities have been conducted under
the auspices of the Southern and Eastern Africa Consortium for Monitoring Educational
Quality (SACMEQ).
In 2004 SACMEQ was awarded the prestigious Jan Amos Comenius Medal in recognition
of its “outstanding achievements in the field of educational research, capacity building,
and innovation”.
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible for
SACMEQ’s educational policy research programme. All modules may be found on two
Internet Websites: http://www.sacmeq.org and http://www.unesco.org/iiep.
Quantitative research methods
in educational planning
Series editor: Kenneth N. Ross

Module 5
Item writing for tests and examinations
Graeme Withers
Content

Module 5 Item writing for tests and examinations
17. Exercises 83
Item writing – art or science? 1
The term ‘item writing’, used in the title of this document, draws
attention to this essential independence – the separate skills,
abilities or pieces of knowledge which make up human learning are
considered individually in the test. This is the prime focus. However,
the discussion (and the test development process) begins and ends
with consideration of a second focus: what happens when these items
achieve additional significance or importance by being grouped or
combined with other items to form a test instrument. We must
continually remember that our building blocks are part of a larger
whole.
Test specifications or blueprints 2
Figure 1
Figure 2
Who designs them? In a school, the teacher who is setting the test is
the designer, with help and a critical perspective being given where
appropriate by a senior teacher or subject leader. If the test is to be
given to more than one class taken by a number of teachers, each
teacher should participate in the design until agreement is reached
that the test would be fair or valid for each class.
Figure 3 (detailed content)

Figure 3a (writing skills, reading ability, computation)
Figure 4
The time weighting for the test now becomes easier to define. The
specification for the whole test (Figure 2) shows that 45 minutes is
available. Figure 5 shows the matrix with appropriate time values
inserted.
Figure 5
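One straightforward way to produce such time values is to share the total testing time across the cells of the matrix in proportion to their mark weights. The topics below follow Figure 6a; the weights are illustrative only:

```python
# Allocate 45 minutes of testing time across blueprint topics in
# proportion to their mark weights (weights are illustrative).
weights = {"Addition": 10, "Subtraction": 10,
           "Multiplication": 15, "Division": 10}
total_minutes = 45

total_marks = sum(weights.values())
allocation = {topic: total_minutes * marks / total_marks
              for topic, marks in weights.items()}

for topic, minutes in allocation.items():
    print(f"{topic}: {minutes:.1f} minutes")
```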
Developing the detailed matrix
Figure 6
Figures 6 and 6a also raise two other important issues with regard
to test specification. The first is: ‘Just how long should a test be?’
There are, of course, no hard and fast rules, but common practice
suggests that 45 minutes is a maximum for any sort of formal
testing in middle primary school. This maximum might rise to an
hour for upper primary and junior secondary, two hours for mid-
secondary, and three hours for the most senior school students. The
test-writer might comfortably achieve a satisfactory coverage of the
topics and objectives to be tested in less time, and should aim to do
so where possible. Two tests a few days apart will work better than
one large, over-long one.
The second issue relates to the fact that it is very easy to write
items to test simple knowledge, and too often that is all that test-
writers do. They forget application, analysis, synthesis and so on.
Putting these categories clearly into a test specification reminds us
that they are there to be tested, and they need to be tested. They
may also require different mark weightings: one mark for a simple
factual recall item is fine, but an application or detailed analysis of
some learning may deserve more, as in Figure 6b. The number of
items per cell will depend on the available time, and your estimate
of a good balance to cover all the learning to be tested. Remember
too the test writer’s responsibilities to create a good impression on
teachers, and have a positive effect on learning. If the instrument
does no more than test factual recall, often that is all that will be
taught – teachers are great ones for scanning past papers to see
what is expected, in order to give their students the best chance. If
they find nothing but lower-order skills, then learning and teaching
in the whole education system may become the poorer for it.
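The differential mark weightings suggested above can be built straight into a specification matrix. A sketch, with illustrative item counts and marks per item (the topics follow Figure 6a):

```python
# Each blueprint cell holds (number of items, marks per item);
# higher-order objectives carry more marks per item, as suggested
# above. The counts and weights are illustrative only.
blueprint = {
    "Addition":    {"Knowledge": (4, 1), "Comprehension": (2, 2), "Application": (1, 3)},
    "Subtraction": {"Knowledge": (4, 1), "Comprehension": (2, 2), "Application": (1, 3)},
}

totals = {}
for topic, cells in blueprint.items():
    totals[topic] = sum(n_items * marks for n_items, marks in cells.values())
    print(f"{topic}: {totals[topic]} marks")  # 4 + 4 + 3 = 11 marks each
```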
Figure 6a

Classroom topics      Objective or behaviour
or content            Knowledge   Comprehension   Application   TOTALS
1. Addition
2. Subtraction
3. Multiplication
4. Division
TOTALS

Figure 6b

Content               Objective
                      Knowledge   Comprehension   Application   TOTALS
• how much paper will be available for printing papers for the
total number of candidates to be tested? Or will that not be an
issue?
• will students answer on the test book (which means it is not re-
usable) or on a separate answer sheet?
the broad parameters ought to be ready for the item-writers to have
in mind once they begin work.
Figure 7
…
2. All items have at least one ‘right’ answer or response,
in the sense that it would earn full credit from an
assessor if offered by a candidate.
• Sometimes the choice is 'closed', as in multiple-choice
items. The right answers are actual: they are printed on
the paper among the options.
• Sometimes, as in essay tests, these responses are potential
– not realised until someone reads the stimulus and makes
the response. (Sometimes, of course, in reality nobody
does get full credit – but all items should be written in
such a way that somebody might.)
• Some items are ‘open’. They have more than one ‘right’
answer, such as two essays which each score full marks
even though they are different. (Multiple-choice items
never do.)
Figure 8
informative
• a passage in a multiple-choice test to which a set of items
refers
• a map in a Geography test to which candidates are expected
to refer when answering some short-answer questions
• a photograph in an Art test about which candidates are
expected to write an essay
• a diagram in a maths test which forms the basis for the
solution of a problem
directive
• the leading sentence or ‘stem’ of a multiple-choice item,
such as: “In the passage, who ate the cake?”
• a short-answer item such as: “Use the scale of the map to
calculate the distance from the tower to the bridge”.
• a specific essay topic, such as: “Compare and contrast
the painting by Picasso in the photograph with two other
paintings you have studied by the same artist”.
• an extended response item such as: “Write a critical
review of one novel you have studied this semester”.
• a statement of a problem to be solved, such as: “What
is the area, in square metres, of the shaded part of the
diagram above? Show all your calculations”.
Item types or formats
Figure 9

DIRECTIVE stimulus material

* the NUMBER and STEM     1  At this stage of the game, who had lost the
                             most money?
* OPTIONS for answering   A  Bob
                          B  Carol
                          C  Ted
                          E  There is not enough information to say.

DISTRACTORS (or wrong answers): A, B, C and E
a. True-false
b. Matching items
CHEERFUL . . . . . . . . . . . . . . . . . . . . . . . . . . .
SOBER . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
WAKEFUL . . . . . . . . . . . . . . . . . . . . . . . . . . .
CALM . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
c. Classification items

. . . . . . . . . .     KENYA              AFRICA
. . . . . . . . . .     BOGOTA             . . . . . . . . . .
SRI LANKA               . . . . . . . . .  . . . . . . . . . .
d. Multiple-choice items
See Figure 9 for an example. It should be noted that multiple-choice
items can come in two styles. One might be called stimulus-related,
and the item in Figure 9 is an example of this kind. The other
might be called ‘stimulus-free’ in that options are printed, but the
candidate is expected to draw on prior knowledge, gained during
classwork, to select the keyed answer.
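Either style can be represented by the same small record, with the stimulus left empty for 'stimulus-free' items. The class and the example content here are illustrative, not taken from an actual test:

```python
from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    """Minimal multiple-choice item record. 'stimulus' stays empty
    for stimulus-free items, where candidates rely on prior knowledge."""
    stem: str
    options: dict        # option label -> option text
    key: str             # label of the keyed (correct) answer
    stimulus: str = ""   # passage, map, diagram, ... if any

def score(item, response):
    """1 mark for the keyed answer, 0 otherwise."""
    return 1 if response == item.key else 0

# A stimulus-free example (content illustrative):
item = MultipleChoiceItem(
    stem="Which of these cities is the capital of Kenya?",
    options={"A": "Mombasa", "B": "Nairobi", "C": "Kisumu", "D": "Nakuru"},
    key="B",
)
print(score(item, "B"), score(item, "C"))  # 1 0
```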
a. Short-answer items
Write a good title for the story you have just read.
16 x 4 = . . . . . . . . or . . . . . . . . . . .
c. Cloze items
One subset of completion items might be especially mentioned.
These are so-called ‘cloze items’, where candidates’ word knowledge
is tested by asking them to insert appropriate words in regularly
spaced gaps in a passage of prose (e.g. where every fifth word
has been deleted, and a space left). The format began in reading
research as a test of readability of prose, but has been adapted for
educational testing purposes. Here is a sample of a pure cloze task:
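A task of this kind is also easy to prepare mechanically: every fifth word is deleted and kept aside as the scoring key. A sketch of that preparation, with an illustrative passage:

```python
def make_cloze(text, n=5):
    """Replace every nth word of a passage with a numbered blank,
    as in a pure cloze test. Returns the gapped passage and the
    list of deleted words (the scoring key)."""
    words = text.split()
    key = []
    for i in range(n - 1, len(words), n):
        key.append(words[i])
        words[i] = f"({len(key)}) ______"
    return " ".join(words), key

passage = "The cat sat on the mat and looked at the open door"
gapped, answers = make_cloze(passage)
print(gapped)
print(answers)
```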
These formats often bring into sharp focus the problem of choice
between different tasks on differing aspects of course work. If it is
at all possible, choice should be avoided and all candidates asked to
John’s boss has told him to set up six jars as a display for a jelly
bean promotion. John sets up six jars on a shelf, three full ones then
three empty ones.
‘Well, I’d like it better if you alternated full and empty jars,’ said
the boss.
Look at the photograph and plan of the room. You will notice none
of the corners is a right angle (90 degrees).
Selecting item types for test purposes 6
As stated earlier, the basis for any selection from within the wide
range of possible formats broadly categorised in the previous
section lies in the learning to be covered. The test writer must
achieve the best possible match between this learning and the
potential instrument, as revealed in the specification.
Before we take fright at all these problems, and allow our testing
to degenerate into mere assessment of general knowledge or the
most simplistic kinds of understanding and cognitive processing,
we should remember that testing of abilities in analysis, synthesis
and higher-order evaluations is possible, and should be done (as
an earlier section pointed out). It is just that one format can’t do
them all. We might never completely solve the guessing problem
(‘did the student really know this?’) but a range of formats in
Figure 10
B. Matching items
Advantages:
• useful for testing relationships
• useful for testing factual information
• easy to construct a large number
Disadvantages:
• the cluster approach destroys item independence
• difficult to word instructions
Figure 11
These simple distinctions will help improve the test quality overall.
They will help us to remember to develop items which do not
merely test factual knowledge and simple understandings but tap
into higher order skills which students develop as they learn.
beyond this level an important opportunity has been missed: there
is more to learning than this.
well on the item. By all means choose material or items which focus
on substantial misunderstandings or mistakes which you know
students might have or make, or which reflect on what you know
to be a common and important difficulty for students doing the
course. But make sure that it is common and important, not merely
a devious or slippery misinterpretation you have invented to trap
the unwary.
The last two points in Figure 12 will assist in avoiding ‘test writer
bias’. A single view of the curriculum to be tested might be narrow,
or even faulty, even if the test constructor is experienced and an
acknowledged expert. In school-based test development, where only
one person really knows the course as taught, it is often impossible
to achieve this variety. Elsewhere it often is possible, and it is a
good principle to follow.
Assuming that all that has been done, the team has been set up,
curriculum has been clarified and formats decided upon by the
group, the next stage of the work is individual – drafting and
revising the item material. As an example, we might look at an
individual working on development of items in one particular
format – perhaps the most difficult of all – multiple-choice.
Item writing as a creative act
Figure 12
• Once the format is chosen, select the material and give yourself
time to ponder it: familiarise yourself with its main points and
other minor ones which might form the basis for good items
and distractors.
Figure 13
…
8. Write these sketches at the top of individual sheets of paper
or cards (one per item), and just under them some preliminary
sketches for possible distractors might be made as they occur
to you. Do these in pencil, not ball-point.
9. It is not necessary to ‘finish’ one item (complete with stem
and distractors) before going on to the next. Ideas will emerge
at different times (sometimes quite inconvenient ones),
especially if enough time has been allowed and the process
isn’t rushed.
10. Once an item has been sketched, underneath it write a draft in
the correct format. If you’ve used separate pieces of paper the
items can be drafted in any order (or even left incomplete for
the time being, if a third or fourth distractor just won’t come
to mind). Do all this in pencil, too.
11. Assemble the various pieces of paper (or cards) into what
seems a reasonable order in terms of the difficulty of the
items – the general rule is “easiest to hardest”, but there are
often exceptions to this. Place the stimulus material on top of
the pile and have the sequence typed out.
At point 2 you will need to consider the appeal of the material for
various groups of test-takers: will it appeal to both girls and boys?
Urban and rural dwellers? Is there some significant sub-group who
At point 5, you might ask yourself: ‘Why does this piece appeal to
me? What would I hope my candidate-readers would learn from
the experience of reading this piece or looking at this picture?’ In
these ways you might develop for yourself a fuller understanding
of what makes the stimulus tick, as it were. Simultaneously, it will
ensure that you see the central importance of the piece, as well as
exploring some of the details that later will come in handy for item
or distractor development.
During the final assembly of your draft items (point 11), you will
also have an opportunity to review the content and quality of the
items you have written. You will be looking for a range of difficulty.
You will also be able to see the spread of the types of questions
you have prepared: are there enough global questions? Too many
particular questions focusing on no more than vocabulary? Is
inferential reasoning well represented? If the mood of a passage
is important, is it in an item in your pile? All these considerations
will help you develop an effective sequence for the material as you
present it during the panel meeting which is the next stage.
• Are there words in the passage which are too hard for the
candidates?
This might also be the place to consider just how many options
(four or five) in a multiple-choice item one is going to have in the
final test. Professional opinion amongst item-writers varies about
this issue. Five options do cut down the correct-guessing rate, but
the extra option makes life harder for both the candidate and the
item writer. The feeling of the present writer is that four options
will do – a larger number of elegant, straightforward items might
then appear on our test-papers, without the fifth, strained or
irrelevant distractor that is all we can think of. However, by all
means present
five (or six, even) possibilities to the panel who will review your
drafts.
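The arithmetic behind this judgement is easy to verify. The sketch below (the function name is our own, purely for illustration) assumes blind guessing, where each item with n options is answered correctly with probability 1/n:

```python
# Expected marks from blind guessing on a multiple-choice test:
# each item with num_options choices is guessed right with probability 1/num_options.

def expected_guess_score(num_items: int, num_options: int) -> float:
    """Expected number of items answered correctly by pure guessing."""
    return num_items / num_options

# On a 40-item test, moving from four to five options only lowers
# the expected chance score from 10 items to 8 items.
print(expected_guess_score(40, 4))  # 10.0
print(expected_guess_score(40, 5))  # 8.0
```

On this reckoning the fifth option buys only a modest reduction in chance marks, which is part of the case for settling on four well-crafted options.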
8. Panelling or moderating drafted items
Critical review of items is essential. And it is essential before the
items and the item-writer get locked into a situation where they
have to make do with what they’ve got. Two words are sometimes
used for this part of the process – ‘panelling’ draws attention to the
need for a panel of reviewers, or more than one critic; ‘moderating’
draws attention to the actual process of the meeting – moderating
the more extreme efforts of the writer by exposing the work to other
views.
The standard for multiple-choice items is as follows:
Figure 14
• deleting the less successful draft distractors, or ones the panel
feels are implausible;
• establishing a final order for the items in the set or the whole
test.
STAGE ONE – editing or vetting
Next, we might take a cold, hard look at the language used. The
stem, for example: is it explicit enough? We might try a slightly
more elaborate wording, such as “What is the main point the
author is trying to make in the passage?” If we used that (or even
the original), A doesn’t make much sense – what is “it” that is
taking time? Both A and B would need a verb: ‘indicating’ for A
and ‘following’ for B would help us understand things more readily
– D and F both have such verbs. G doesn’t, so we might insert
‘recognising’. What does our item look like now?
However this is still not right – the five options need something to
hold them together. We could do this by extending the stem to point
out something all the options have in common. The passage is being
written for item writers, so we might adjust all our wording yet
again to include them, in what is called a run-on stem:
Now, what’s the right answer? B, D and F are all mentioned in the
passage so they are reasonably plausible as options. But is “the”
answer to be A or G? Back to the stem: we might ask ourselves
‘Is there only one main point in the passage?’ Time is certainly a
recurring problem – it gets mentioned or implied several times – but
beyond that it is the overall difficulty of the whole process which is
being indicated, with ‘not rushing’ being a major factor. If we only
need four options for our final item, we could collapse A and G into
a single option, such as: ‘preparing items slowly and methodically’,
and put the word ‘difficulty’ (or something like it) into the run-on
stem. Now we might have:
A final check: does the run-on stem agree syntactically with each
of the options? No: we’ve forgotten to put back the participle-form
into F: it needs to be ‘avoiding’. And since the run-on stem forms
four different sentences, we’ll need full-stops for each option. We
also need to adjust the option letters now that the item is finished, with
the key now being D. Also note one other last-minute change:
options should all be about the same length in terms of words, so an
addition (‘all faulty’) to the new option C has been made from the
passage. The item now looks like this:
It should be pointed out that although the discussion in this and the
previous section has focused on the multiple-choice format as the
basis for the process description, the same sequence of procedures
should be followed with regard to other item formats. They too need
intense and critical review about curricular relevance, the wording
used, and their position within the pattern of activities which runs
through the whole test.
The other activity is to design the test paper and answer sheet layout
and format, so that the field test version is as close as possible to the
real design envisaged for the final test. Layouts and instructions to
candidates need to be field-tested as well as items.
Figure 15
…
3. Each of the distractors will be plausible: that is, they will
represent a possibly relevant view of the matters raised in the
stimulus and the stem.
4. The item will be independent. Finding the keyed answer will
not depend on successful answering to any other item in the
test. No clues as to the keyed answer will be given anywhere
else in the test.
5. A successful response to the item-stem will depend on the
test-taker understanding a key issue in the stimulus, not
eliminating distractors to find a ‘best answer’ or merely
recognising a stated fact.
6. The question stated or implied in the item will be positively
worded. Where an important issue in the material unavoidably
requires negative wording if it is to be tested at all, this will be
in the stem, printed in bold capitals (‘NOT’; ’EXCEPT’). No
additional negatives will be used in any of the options.
7. The item will contain four or five options for answering, and
be laid out in standard form.
8. Each of the options will be roughly the same length. If this is
impossible, then two groups of options will be of similar length
(e.g. two short and three longer).
9. The item will have been trial-tested and found to have a facility
between 20 and 80 percent.
10. In trial-testing, the keyed answer to the item will have been
found to discriminate positively, and distractors to discriminate
negatively.
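Points 9 and 10 can be checked with simple counting. The sketch below is only one of several possible approaches: the names are our own, it assumes item responses scored 1 (right) or 0 (wrong), and it uses a crude high-low grouping rather than the point bi-serial or phi coefficients a larger program might prefer.

```python
def facility(item_scores):
    """Facility: the percentage of candidates who answered the item correctly.
    `item_scores` holds 1 (right) or 0 (wrong) for each candidate."""
    return 100 * sum(item_scores) / len(item_scores)

def discrimination(item_scores, total_scores, fraction=0.27):
    """Crude high-low discrimination index: facility among the top `fraction`
    of candidates (ranked by whole-test score) minus facility among the
    bottom `fraction`. A sound keyed answer should give a positive value."""
    ranked = sorted(zip(item_scores, total_scores), key=lambda pair: pair[1])
    n = max(1, int(len(ranked) * fraction))
    top = sum(score for score, _ in ranked[-n:])
    bottom = sum(score for score, _ in ranked[:n])
    return (top - bottom) / n

item = [0, 0, 1, 1, 1, 0, 1, 1]            # 1 = correct, 0 = wrong
totals = [12, 15, 30, 35, 38, 10, 33, 20]  # whole-test scores
print(facility(item))                       # 62.5: within the 20-80 band
print(discrimination(item, totals))         # 1.0: discriminates positively
```

An item passing both checks has a facility between 20 and 80 percent and a positive discrimination value for its keyed answer.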
A first review of keyed answer order should also take place for
multiple-choice tests. Two consecutive items may have the same key
letter, but not more than two. Also, an approximately equal number
of items should be assigned to each option letter (A, B, C, D) over
the whole test. There is often a tendency amongst item-writers to try
to ‘bury’ keyed answers by assigning them to C or D. If these two
letters are overused, the candidate who guesses using them has a
more than 25 percent chance of getting each item right.
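A small routine can police both rules: no more than two consecutive identical key letters, and a roughly even share of items for each letter. This is an illustrative sketch only; the function name and the tolerance are our own choices.

```python
from collections import Counter

def check_key_order(keys, num_options=4):
    """Flag two key-order problems in a multiple-choice test: a run of
    more than two consecutive items sharing a key letter, and a letter
    keyed much more (or less) often than an even share."""
    problems = []
    run = 1
    for prev, cur in zip(keys, keys[1:]):
        run = run + 1 if cur == prev else 1
        if run == 3:  # report each over-long run once, as it reaches three
            problems.append(f"more than two consecutive items keyed '{cur}'")
    expected = len(keys) / num_options
    for letter, count in sorted(Counter(keys).items()):
        if abs(count - expected) > expected / 2:  # tolerance chosen arbitrarily
            problems.append(f"'{letter}' keyed {count} times, expected about {expected:.0f}")
    return problems

print(check_key_order("DCCCABDABDAB"))  # flags the run of three consecutive Cs
print(check_key_order("ABCDDABCBDAC"))  # no problems: an empty list
```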
Incidentally, wise item writers don’t throw away the successive
drafts and re-workings of the items they have prepared. For example,
the final version of item 1 above may not ‘work’ in trial testing,
and may need to be later revised. The rejected wordings and extra
options may make that task easier if that happens.
11. Field or trial testing
In a school setting, a trial test of items on a population which will
not do the final test is often impossible – the only students who
could do the trial are the ones who have been taught the course, and
for whom the test has been written. However in larger programs,
field testing is indispensable, for four reasons:
The sizes of the trial population needed for various tests will vary
considerably – the rule-of-thumb might be ‘the largest possible
population, given the available resources’.
12. Item analysis
There is not room in a paper of this size to canvass and describe
all the different options which exist for the analysis of item data.
Papers for other modules will explore the matter in some detail.
School-based tests will use simple rather than complex strategies
to obtain (and indices to express) this information. Larger test
programs will probably have the resources to engage in quite
lengthy and complex processing.
At a minimum, the analysis should reveal:
• the response level for each item: how many actually attempted
it, right or wrong;
• the criterion score on each item: the mean score of all those who
did attempt it;
• whether any distractors did not function well: attracted too few
candidates, or a preponderance of those of high ability.
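These tallies can be produced directly from the raw responses. The following sketch is illustrative only (the names are invented, and None stands for an omitted item):

```python
from collections import Counter

def item_report(responses, key):
    """Summarise trial responses to one multiple-choice item.
    `responses` holds the option letter chosen by each candidate,
    or None where the candidate omitted the item."""
    attempts = [r for r in responses if r is not None]
    return {
        "response_level": len(attempts),          # how many attempted the item
        "criterion_score": sum(r == key for r in attempts) / len(attempts),
        "option_counts": dict(Counter(attempts)), # how each option functioned
    }

report = item_report(["A", "C", None, "C", "B", "C", None, "D"], key="C")
print(report["response_level"])   # 6 of the 8 candidates attempted the item
print(report["criterion_score"])  # 0.5: half of the attempters chose the key
print(report["option_counts"])    # a distractor chosen by almost nobody is suspect
```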
Once every item is clean, making up the final form actually begins.
As in so many matters to do with test development, there is a
sequence of activity which should be followed, in order to ensure
that new bugs don’t appear and that the test meets the original
specification as completely as possible. Figure 16 suggests an appropriate
sequence.
Figure 16
1. Read the item analysis and sort the items (or groups of items)
into three piles:
a) ready to go;
b) needing editing;
c) possible rejects.
2. Edit items and establish a final pool of items for the test.
Check the omit rate to establish optimum test length.
3. Check the specification against this pool for the number and
qualities of the items available. Reinstate any usable ‘rejects’ if
all the objectives are not satisfactorily covered.
4. Assign the items to a tentative order for the whole test and
enter the scoring scheme at the end of each section of the
test.
5. Check this order for:
a) order of difficulty (for example, in a set of multiple-choice
items, make sure some easy ones occur early, to give the
candidate some confidence);
b) keyed answer order and distribution;
c) balance and variety of item type.
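The omit-rate check in step 2 can be sketched in a few lines. This is an illustration only, with None again marking an unanswered item:

```python
def omit_rates(response_matrix):
    """Fraction of candidates omitting each item, where each row is one
    candidate's responses and None marks an omitted item. A steep rise
    in omits towards the end of a trial test suggests it is too long."""
    candidates = len(response_matrix)
    items = len(response_matrix[0])
    return [sum(row[i] is None for row in response_matrix) / candidates
            for i in range(items)]

# Four candidates, five items; omits pile up on the last two items.
rates = omit_rates([
    ["A", "B", "C", "D", None],
    ["B", "B", "A", None, None],
    ["A", "C", "C", "D", "B"],
    ["C", "B", "C", None, None],
])
print(rates)  # [0.0, 0.0, 0.0, 0.5, 0.75]
```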
Balance
1. Are the items selected for the test representative of the
achievement (content and behaviours) which is to be assessed?
2. Are there enough items on the test to adequately sample the
content which has been covered and the behaviours as spelled
out by the objectives?
Specificity
3. Do the test items require knowledge in the content or subject
area covered by the test?
4. Can general knowledge be used to respond to the test items?
Objectivity
5. Is it clear to the test taker what is expected?
6. Is the correct response definite?
As a test-writer you rely on your test panel to verify positive
answers to many of these questions, but you will need to keep
them in mind as you go through the developmental process. The
specification will help you to meet the demands of Question 1.
Distinguishing content and objectives as horizontal and vertical
dimensions of your matrix (as in Figures 6, 6a and 6b) is an easy
way to start. Not every cell has to be filled, but ‘Balance’ requires
that every row and every column has something in it.
The panel (or your own judgement) will help you meet the demands of
Question 2. Where testing-time is insufficient to cover everything
taught (and there rarely is enough time), achieving the best
‘Balance’ means that the sample should consist of the most
important content and the most important behaviours.
poor
Write all you know about the Spanish Civil War.
poor
Some people say that the Spanish Civil War was really a chance
for certain European countries to try out tactics and use new
weaponry in advance of the outbreak of a larger European War.
If you agree with this view, find some evidence for its truth and
write this in about a page. If you don’t agree, say why you don’t in
a piece of writing of about the same length.
better
“The Spanish Civil War was a dress rehearsal for World War II.”
Do you agree? In a short essay of about 500 words, support your
view with evidence.
But is it a good test?
poor
Write a few lines about the floating markets of Bangkok.
better
Give two reasons why the floating markets of Bangkok are
important to the city’s economy.
1 ......................................................................................................................
.........................................................................................................................
2 ......................................................................................................................
.........................................................................................................................
poor
1. Find a word in the second paragraph of the second story that
means ‘briefly’.
2. Write a short character-sketch of one of the players in the first
story.
3. Why didn’t the captain perform very well?
4. Is it true that the Blue Team won the game?
better PASSAGE 1
1. The Blue Team won the game: true or false?
2. Write a five-line character-sketch of one of the football players
mentioned in this story.
better PASSAGE 2
3. Write the word in Paragraph 2 that means ‘briefly’.
4. In three sentences, explain why the captain did not perform
very well.
poor
Do you know what the word prescient means?
better
The word “prescient” means ...........................................................
. . . . . . . . . . . . . . . . . . . . . . .
poor
How did Tina feel in your own words?
better
Write two sentences which show, in your own words, how Tina
probably felt during the hold-up.
. . . . . . . . . . . . . . . . . . . . . . .
poor
Don’t attempt the essay until after you have written it out first.
better
Draft your piece of writing on the blank page, then write out a
good copy on the lined page.
poor
What is the total elapsed time?
better
The journey took ................. hours, ................... minutes.
. . . . . . . . . . . . . . . . . . . . . . .
poor
Since John walked right round the court once, how far did he
walk?
better
In walking around the perimeter of the court, John travelled
................... meters.
. . . . . . . . . . . . . . . . . . . . . . .
poor
Solve the following problems.
better
Solve the following problems. Express your answers correct to
two decimal places.
samples
Up to five marks may be deducted if you do not show all your
working.
. . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . .
samples
In 750-1000 words write a critical review of the various changes in
the overall direction of Pablo Picasso’s painting style from 1905 to
1945.
You might mention:
a. the early work;
b. the impact of Cubism;
c. post-Cubist developments.
. . . . . . . . . . . . . . . . . . . . . . .
Choose any two poems by Schiller which you have studied this
year. Write a critical comparison of the two poems, showing what
each reflects about Schiller’s poetic achievements, any essential
differences in language or tone between the two, and your
personal assessment of their qualities.
samples
“The recent economic history of Argentina might be well
described as the nation staggering from one crisis to another.”
In a piece of writing of 750-1000 words, give your view of the
country’s economic history since 1960, emphasising one or more
of the following:
• international trade;
• domestic monetary policy;
• the impact of the war in the Malvinas (Falkland) Islands.
Write about the man in the photograph above. You may write in
any form you like: for example, a story, a letter, or a conversation.
poor
List the names of the characters in the story you have just heard.
better
Name the character in the story you have just heard whose
actions contribute most to the build-up of suspense. In your own
words, tell what he or she did that was decisive in achieving that
build-up.
. . . . . . . . . . . . . . . . . . . . . . .
poor
Which of the towns on the map is the third-largest in terms of its
population?
better
Town X is the market town for the region shown on the map. One
reason is that it is a railway junction. Use the map and its legend to
identify three other reasons why it has become so important.
poor
Candidates must do the questions in order.
better
You will give yourself your best chance if you work through the
questions in the order of presentation. However, you may need to
leave any particularly hard questions to come back to later.
. . . . . . . . . . . . . . . . . . . . . . .
poor
TIME: One hour.
better
TIME: One hour.
Leave a few minutes at the end of the test-time in order to check
your work thoroughly.
This team needs to be fully trained. And the training will need
the prior production of a set of criteria to be applied by all team-
members to the products of the testing, whether these be essays,
painted or sculpted works of art, diagrams, plans or computer-
output. To some extent all assessment is criterion-based in this
way. Someone exercises a judgement with some criteria or other in
mind. The need here is for the assessors to have common or shared
criteria, as far as such a thing is possible. The best assessment is
also criterion-referenced, in that the criteria not only determine the
award of credit by markers, but also underlie the reports of student
achievement which result from the assessment process. This is true
even if those results have to be norm-referenced or standardised
later for other purposes.
If, as should have happened, the extended response items were
pre-tested, samples on each of the finally-chosen topics or questions will
be available. In addition, the item constructor will have had a clear
idea of what was intended or foreseen as a good or medium or poor
response to the item – these foresights should yield a basic list of
assessment criteria with which to begin the training sessions.
A simple set of base criteria with which to start any session, in any
subject, might be the following:
The words in bold can be ‘translated’ to fit just about any criterion
set. In language tests, mechanics might mean accuracy of spelling
and punctuation and organisation might yield insight into
paragraphing skills. In mathematics tests, style might mean the
economy or elegance of the thinking which went into proposing
a particular solution to a problem. What this set (or a similar one)
might do is lead the assessors towards a fruitful discussion of
the important criteria for their particular task. It might also help
them to avoid a common assessment problem: concentrating on
the surface features, to the exclusion of deeper, more important
qualities of a student’s work.
Figure 17
TRAINING A SCORING TEAM
8. Marking commences.
Training scoring teams
16. Further reading
Below are citations to six books. Reference to them will take your
understanding further. As the publication dates indicate, some are
quite elderly: nevertheless they are mentioned because this may
improve the chances of accessing them through libraries. Page
numbers are in bold.
17. Exercises
18. Glossary of terms
answer sheet or booklet
a piece of test stationery separate from the question booklet, on
which candidates record personal details and the answers to the test
items.
criterion score
the mean facility of an item taking into account only the
performance of those candidates who actually attempted to answer:
can also be calculated for whole tests or groups of items, using the
same rule.
discrimination
the ability of an option to distinguish between those groups of
candidates who had greater and lesser ability as indicated by
their performance on the whole test. The indices used are usually
expressed as positive or negative fractions of 1.0 and can be derived
using a number of different formulae (e.g. point bi-serial or phi
coefficients).
distractor
in a multiple-choice item, an option for choice which is not the
keyed answer, but which has been written in such a way as to
distract weaker candidates from selecting that key.
editing
preparation of refined versions of tests or items after other key
stages in the development process, such as panelling or trial-testing.
It is usually performed by the original item-writer(s).
facility
the index obtained by a multiple-choice item during testing which
indicates the number of candidates who got it right: expressed as a
percentage of the total number of candidates who sat the test. The
index for an item to be used in a final test should always lie within
the range 20-80 percent.
final form
the test instrument after it has been trial-tested, analysed, edited
and prepared for publication.
instructions
• to candidates
information printed on the question paper or answer sheet which
candidates need to be able to complete the test satisfactorily,
but which does not actually form part of the stimulus material
or questions: in some cases, these may also be read aloud by a
supervisor.
• to supervisors
information provided for test supervisors or invigilators on how
to conduct the test session correctly: includes a script of any
instructions to candidates which are required to be read aloud.
item
an individual task which forms one component of a test instrument:
usually applied in the context of a multiple-choice test to indicate a
single question, but can be used more broadly.
key order
in a multiple-choice test, the sequence of letters attached to keyed
answers, as in 1 D, 2 B, 3 C, etc. The same key letter should be used
for no more than two consecutive items: viz. 3 C, 4 C, 5 A.
moderation
sometimes used to describe the process whereby expert panels meet
to discuss and offer critical comment on test materials.
multiple-choice
an item format whereby a restricted number of optional responses
is offered to candidates, from which they must select one as their
answer.
omit rate
a tally of the number of candidates who did not answer a test item:
especially important in estimating the performance of trial test
candidates in the later items of the test, with a view to establishing
an acceptable test length.
option
in a multiple-choice item, one of a set of responses (usually four or
five) from which the candidates select their answer.
panel, panelling
a group of experts called together to discuss and evaluate draft
items proposed for use in a test instrument.
question book(let)
a printed test instrument which contains instructions, stimulus
material and test items for students to work through during the test
session. Answers may be recorded in this book, or on a separate
answer sheet.
specification
a document which specifies in some detail the nature and
composition of a test program or instrument: sometimes called a
‘blueprint’.
stem
in a multiple-choice item, the sentence(s) or part-sentence which
indicate the testing point or question, which candidates use to select
their answer from the options which follow.
stimulus
• directive
any information in a test which candidates need to understand the
specific task which they are being asked to perform (e.g. the stem of
a multiple-choice item, or a detailed essay topic).
• instructive
any information printed in a question booklet which candidates are
expected to refer to when answering the specific questions which
relate to it.
trial form
the test instrument after it has been developed, panelled and edited
ready for administration to a trial population in the field.
vetting
the process of editing and arriving at a draft test form, using the
discussions and evaluations of draft items by a panel as a guide.
Since 1992 UNESCO’s International Institute for Educational Planning (IIEP) has been
working with Ministries of Education in Southern and Eastern Africa in order to undertake
integrated research and training activities that will expand opportunities for educational
planners to gain the technical skills required for monitoring and evaluating the quality
of basic education, and to generate information that can be used by decision-makers to
plan and improve the quality of education. These activities have been conducted under
the auspices of the Southern and Eastern Africa Consortium for Monitoring Educational
Quality (SACMEQ).
In 2004 SACMEQ was awarded the prestigious Jan Amos Comenius Medal in recognition
of its “outstanding achievements in the field of educational research, capacity building,
and innovation”.
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible for
SACMEQ’s educational policy research programme. All modules may be found on two
Internet Websites: http://www.sacmeq.org and http://www.unesco.org/iiep.
Quantitative research methods
in educational planning
Series editor: Kenneth N.Ross
6
Module
John Izard
Overview of
test construction
Content
2. What is a test? 5
10. References 50
11. Exercises 52
© UNESCO III
1. Assessment needs at different levels of an education system
Those who are taking action should also know the likely direct and
indirect effects of various action options, and the costs associated
with those options. They will include politicians, high level advisors,
senior administrators, and those responsible for curriculum,
assessment, teacher training (pre-service and in-service), and other
educational planners.
That is, those taking action need to be able to provide evidence that
their actions do ‘pay off’. For example, politicians have to be able to
convince their constituents that the actions taken were wise, and
senior administrators need to be able to show that programmes
have been implemented as intended and to show the effectiveness
of those programmes. It is important for such officials to realise
that effecting change requires more than issuing new regulations.
At the national level, action will probably be needed to train those
responsible for implementing change.
2. What is a test?
One valid approach to assessment is to observe everything that is
taught. In most situations this is not possible, because there is so
much information to be recorded. Instead, one has to select a valid
sample from the achievements of interest. Since school learning
programmes are expected to provide students with the capability
to complete various tasks successfully, one way of assessing each
student’s learning is to give a number of these tasks to be done
under specified conditions. Conventional pencil-and-paper test
items (which may be posed as questions) are examples of these
specially selected tasks. However, other tasks may be necessary as
well to give a comprehensive, valid and meaningful picture of the
learning. For example, in the learning of science subjects, practical
skills are generally considered to be important, so the assessment
of science subjects should include some practical tasks.
Similarly, the student learning music may be required to give a
musical performance to demonstrate what has been learned. Test
items or tasks are samples of intended achievement, and a test is a
collection of such assessment tasks or items.
Clearly, the answer for one item should not depend on information
in another item or on the answer to another item.
The students and their parents will focus on the total scores at
the foot of the columns. High scores will be taken as evidence
of high achievement, and low scores will be taken as evidence of
low achievement. However, in summarising achievement, these
scores have lost their meaning in terms of particular strengths and
weaknesses. They give no information about which aspects of the
curriculum students knew and which they did not understand.
Teachers, subject specialists, curriculum planners, and national
policy advisors need to focus on the total scores shown to the right
of the matrix. These scores show how well the various content areas
have been covered. Low scores show substantial gaps in knowledge
where the intentions of the curriculum have not been met. High
scores show where curriculum intentions have been met (at least for
those questions that appeared on the test).
Figure 1. Matrix of student data on a twenty-item test
Students
Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 15
2 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 14
3 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 14
4 0 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 11
5 1 0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 13
6 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 11
7 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 0 1 1 8
8 0 0 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 13
9 0 0 0 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1 9
10 0 1 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 1 7
11 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 0 7
12 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 6
13 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 7
14 0 0 1 0 0 0 1 1 1 1 1 1 0 1 0 1 1 1 11
15 1 0 1 1 0 0 0 0 1 1 1 1 1 0 1 1 1 1 12
16 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 1 1 1 8
17 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 1 1 1 8
18 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 6
19 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 4
20 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 5
21 1 1 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0 0 10
3 4 5 7 7 9 10 10 12 12 14 14 14 14 15 15 17 17 199
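The two kinds of marginal totals in such a matrix can be computed mechanically. A minimal Python sketch, using a small invented 0/1 matrix rather than the Figure 1 data:

```python
# Rows = items, columns = students; 1 = correct, 0 = incorrect.
# Invented 3-item, 4-student example (not the Figure 1 data).
matrix = [
    [1, 0, 1, 1],  # item 1
    [0, 0, 1, 1],  # item 2
    [1, 1, 1, 0],  # item 3
]

# Item totals (the right-hand margin): how many students answered each item correctly.
item_totals = [sum(row) for row in matrix]

# Student totals (the foot of each column): each student's score on the whole test.
student_totals = [sum(col) for col in zip(*matrix)]

print(item_totals)     # [3, 2, 3]
print(student_totals)  # [2, 1, 3, 2]
```

Students and parents read the student totals; teachers and curriculum planners read the item totals, where low counts flag content areas the cohort has not mastered.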
[Figure: blank items-by-students matrix showing the layout only; no data recovered]
Interpreting test data
For example, we could weigh the students and give the highest
scores to those with the largest mass. This assessment could be very
consistent, particularly if our scales were accurate. However, this
assessment information is not meaningful when trying to judge
whether learning has occurred. To be meaningful in this context,
our assessment tasks (items) have to relate to what students learn.
The choice of what to assess, the strategies of assessment, and the
modes of reporting depend upon the intentions of the curriculum,
the importance of different parts of the curriculum, and the
audiences needing the information that assessment provides.
• all important parts of the curriculum are addressed;
• achievement over the full range is assessed (not just the narrow
band where a particular selection decision might be required on
a single occasion).
Tasks may also be too easy. If all students can do all of the tasks
then the most able students will not be able to provide evidence of
their advanced achievements. If two such assessments are made
it will appear that these able students have not learned anything
(because their scores cannot improve) even though they may have
learned a great deal of important knowledge and skills. Such
assessments are also faulty in that they fail to recognise learning
that has occurred. (A test with many easy items which do not
allow more able students to show evidence of their learning may be
referred to by saying that the test has a ‘ceiling’ effect.)
Inferring range of achievement from samples of tasks
When assessment results have high stakes (as in the case where
results are used to select a small proportion for the next stage
of schooling or for employment), the chosen assessment tasks
have a high degree of influence on curriculum, teaching practice,
and student behaviour. When public examination papers are
published, teachers and students expend a great deal of effort
in analysing these papers, to practice test-taking skills, and
to attempt to predict what topics will be examined so that the
whole curriculum does not have to be studied. These practices of
restricting learning to examinable topics may lead to high scores
being obtained without the associated (and expected) coverage of
the intended curriculum.
This may cost more at first because the levels of skill in writing
such tests are much higher. Generally teams of item-writers are
required rather than depending upon a very limited number of
individuals to write all of the questions. The pool of experienced
teachers with such skills will not increase in size if teachers are
not encouraged by the assessment procedures to prepare students
for higher quality assessments. Further, item writing skills develop
in part from extensive experience in writing items (especially
those that are improvements on previous items). Such experience
of item writing, of exploring the ways that students think, and
of considering the ways in which students interpret a variety
of evidence, is gained gradually. Many good item writers are
experienced and sensitive classroom teachers who have developed
the capacity to construct items which reveal (correct and incorrect)
thought processes.
Results used to compare students
Norm-referenced tests: These tests provide the results for
a reference group on a representative test and therefore scores
on the test are normally presented in terms of comparisons with
this reference group. If the reference group serves as a baseline
group, norm-referenced scores can provide evidence of learning
improvement (or decline) for the student population although this
is in terms of a change in score rather than an indication of what
students can now do (or not do) compared with what they could do
before. If changes in score are reported (for example, a difference
in average score), administrators have little evidence about the
strengths and weaknesses reflected in the results for particular
topics and may rely on rather limited experience (such as their
own experiences as a student) to interpret the changes. This could
result in increased expenditure on the wrong topic, wasting scarce
resources, and not addressing the real problems.
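To make "comparisons with this reference group" concrete, here is a hedged sketch: all scores are invented, and the z-score and percentile below are just two common ways such comparisons are expressed, not procedures prescribed by this module.

```python
import statistics

# Invented raw scores for the reference (baseline) group, and one candidate.
reference_scores = [12, 15, 9, 14, 11, 13, 10, 16]
candidate = 15

mean = statistics.mean(reference_scores)
sd = statistics.pstdev(reference_scores)  # population SD of the reference group

# Norm-referenced summaries: where the candidate stands relative to the group.
z_score = (candidate - mean) / sd
percentile = sum(s <= candidate for s in reference_scores) / len(reference_scores)
```

Both numbers describe relative standing only; as noted above, they say nothing about which topics the candidate can or cannot do.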
Very few tests provide the user with a strategy for making such
adjustments for themselves, although some tests prepared using
Item Response Theory or Latent Trait Theory do enable qualified
and experienced users to estimate new norm tables for particular
sub-sets of items.
What purposes will the test serve?
Finally, the term can refer (loosely) to a published test which was
prepared by standard (or conventional) procedures. The usage of
‘standardised’ has become somewhat confused because published
tests often present scores interpreted in terms of deviation from the
mean (or average) and have a standard procedure for administering
tests and interpreting results.
[Figure: diagram labelled ‘5 metres’ and ‘3 metres’; image not recovered]
There are potential difficulties in scoring such prose, oral, drawn
and manipulative responses. An expert judge is required because
each response requires interpretation to be scored. Judges vary
in their expertise, vary over time in the way they score responses
(due to fatigue, difficulty in making an objective judgment without
being influenced by the previous candidate’s response, or by
giving varying credit for some correct responses over other correct
responses), and vary in the notice they take of handwriting,
neatness, grammatical usage and spelling.
What types of tasks?
[Example multiple-choice response options (item stem not recovered):
A. Platypus   B. Marmosets   C. Flatworms   D. Plankton]
Figure 4. Stages in test construction

Decision to allocate resources
  ↓
Content analysis and test blueprint
  ↓
Item writing
  ↓
Item review 1
  ↓
Planning item scoring
  ↓
Production of trial tests
  ↓
Trial testing
  ↓
Item review 2
  ↓
Amendment (revise/replace/discard)
  ↓
More items needed?  Yes → return to Item writing
  ↓ No
Assembly of final tests
Test specification grid: content areas form the rows, objectives the columns.
(Only the column headings and the totals row were recovered.)

                 Recall of facts   Computational skills   Understanding   Total
Total            14                12                     28              54
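A grid of this kind can be kept as a small data structure so that cell counts can be checked against the intended totals before item writing starts. In this sketch the content areas and per-cell counts are invented for illustration; only the column totals (14, 12, 28, and the grand total 54) come from the grid above.

```python
# Invented cell counts for a (content area, objective) blueprint grid.
# Only the column totals 14 / 12 / 28 / 54 come from the grid in the text.
blueprint = {
    ("Descriptive statistics", "Recall of facts"): 6,
    ("Descriptive statistics", "Computational skills"): 7,
    ("Descriptive statistics", "Understanding"): 15,
    ("Probability", "Recall of facts"): 8,
    ("Probability", "Computational skills"): 5,
    ("Probability", "Understanding"): 13,
}

def column_total(objective):
    """Number of items planned for one objective across all content areas."""
    return sum(n for (_, obj), n in blueprint.items() if obj == objective)

totals = {obj: column_total(obj)
          for obj in ("Recall of facts", "Computational skills", "Understanding")}
grand_total = sum(totals.values())
```

Checking the computed column totals against the published marginals catches blueprint arithmetic errors before any items are written.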
The test construction steps
Item writing
Item writing is the preparation of assessment tasks which can reveal
the knowledge and skill of students when their responses to these
tasks are inspected. Tasks which confuse, which do not engage the
students, or which offend, always obscure important evidence by
either failing to gather appropriate information or by distracting
the student from the intended task. Sound assessment tasks
will be those which students want to tackle, those which make
clear what is required of the students, and those which provide
evidence of the intellectual capabilities of the students. Remember,
items are needed for each important aspect as reflected in the test
specification. Some item writers fall into the trap of measuring what
is easy to measure rather than what is important to measure. This
enables superficial question quotas to be met but at the expense
of validity – using questions that are easy to write rather than
those which are important distorts the assessment process, and
therefore conveys inappropriate information about the curriculum
to students, teachers, and school communities.
Item review
The first form of item analysis: Checking intended
against actual
Writing assessment tasks for use in tests requires skill. Sometimes
the item seems clear to the person who wrote it but may not
necessarily be clear to others. Before empirical trial, assessment
tasks need to be reviewed by a review panel (with a number of
people) with questions like:
• Is there a single clearly correct (or best) answer for each item?
This review, before the items are tried, should ensure that we avoid
tasks expressed in language too complex for the idea being tested,
redundant words, multiple negatives, and distractors which are not
plausible. The review should also identify
items with no correct (or best) answer and items with multiple
correct answers. Such items may be discarded or re-written.
• Will the students be told how the items are to be scored? Will
they be told the relative importance of each item? Will they be
given advice on how to do their best on the test?
• Has the layout of the test (and answer sheet if appropriate) been
arranged for efficient scoring of responses? Are distractors for
multiple-choice tests shown as capital letters (easier to score
than lower case letters)?
as informing those attempting the trial tests that their results will
be used to validate the items and will not have any effect on their
current course work), and gather all test materials before candidates
leave the room.
Item analysis
The second form of item analysis: responses from real candidates
Empirical trial can identify instances of confused meaning,
alternative explanations not already considered by the test
constructors, and (for multiple-choice questions) options which are
popular amongst those lacking knowledge, and ‘incorrect’ options
which are chosen for some reason by very able students.
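One way to detect the two warning signs just mentioned (options popular amongst those lacking knowledge, and ‘incorrect’ options chosen by very able students) is to tabulate option choices separately for high- and low-scoring candidates. A minimal sketch with invented trial data:

```python
from collections import Counter

# Invented trial responses to one item: (candidate's total score, option chosen).
# 'B' is the keyed answer; A, C and D are distractors.
responses = [
    (18, "B"), (17, "B"), (16, "C"), (15, "B"),
    (7, "A"), (6, "D"), (5, "A"), (4, "B"),
]

responses.sort(reverse=True)                         # best candidates first
half = len(responses) // 2
high = Counter(opt for _, opt in responses[:half])   # top half of scorers
low = Counter(opt for _, opt in responses[half:])    # bottom half

# A distractor chosen by the high group (here 'C'), or a keyed answer chosen
# mainly by the low group, flags the item for review and rewriting.
```

Operational item-analysis programmes do this per option with finer score bands, but the principle is the same.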
A test with high content validity for one curriculum may not be
as valid for another curriculum. This is an issue which bedevils
international comparisons where the same test is administered in
several countries. Interpretation of the results in each country has
to take account of the extent to which the comparison test is content
valid for each country. If two countries have curricula that are only
partly represented in the test, then comparisons between the results
of those countries are only valid for part of the data.
When tests have high construct validity we may argue that this
is evidence of dimensionality. When we add scores on different
parts of a test to give a score on the whole test, we are assuming
dimensionality without checking whether our assumption is
justified. Similarly, when item analysis is done using the total score
on the same test as the criterion, we are assuming that the test
as a whole is measuring a single dimension or construct, and the
analysis seeks to identify items which contradict this assumption.
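The item analysis described here, using the total score on the same test as the criterion, can be sketched as an item-total correlation. The data below are invented; a low or negative value marks an item that contradicts the single-dimension assumption.

```python
# Invented trial data: 0/1 scores on one item, and each candidate's total score.
item = [1, 1, 0, 1, 0, 0, 1, 0]
totals = [19, 17, 8, 15, 6, 9, 14, 7]

def pearson(x, y):
    """Pearson correlation; with a 0/1 item this is the point-biserial index."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

discrimination = pearson(item, totals)  # near +1: the item agrees with the test
```

In practice the item itself is usually removed from the total before correlating, to avoid inflating the index on short tests.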
8. Resources required to construct and produce a test
teachers (either set aside from classroom work to join the test-
construction team, or paid additional fees to work outside school
hours), class teacher and student time for trials, the paper on
which the test (and answer sheet if appropriate) is to be printed
or photocopied, the production of copies of the test materials,
distribution to schools, retrieval from schools (if teachers are
not to score the tests), and scoring and analysis costs. Figure 6
shows a possible time scale for developing two parallel forms of
an achievement test of 50 items for use during the sixth year of
schooling. Figure 6 also shows the resources that would need to be
assembled to ensure that the tests were produced on time. (Note
that this schedule assumes that the test construction team has
had test development training prior to commencing work on the
project.)
[Figure 6. Development schedule with columns Task, Time (weeks), Resources; table body not recovered]
Preparation of final forms of a test is not the end of the work. The
data gathered from the use of final versions should be monitored
as a quality control check on their performance. Such analyses
can also be used to fix a standard by which the performance of
future candidates may be compared. It is important to do this as
candidates in one year may vary in quality from those in another
year.
It is customary to develop more trial forms so that some forms of
the final test can be retired from use (where there is a possibility of
candidates having prior knowledge of the items through continued
use of the same test).
The trial forms should include acceptable items from the original
trials (not necessarily items which were used on the final forms
but similar in design to the pattern of item types used in the final
forms) to serve as a link between the new items and the old items.
The process of linking tests using such items is referred to as
anchoring. Surplus items can be retained for future use in similar
test papers.
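Anchoring can be illustrated with a deliberately simple mean-offset link. This is an invented classical sketch, not the Item Response Theory procedure an operational programme would normally use, and all the statistics are made up.

```python
# Proportion correct on four anchor items that appeared in both trials
# (all values invented).
old_trial = [0.62, 0.55, 0.71, 0.48]   # anchor items, earlier trial group
new_trial = [0.58, 0.50, 0.66, 0.42]   # the same items, new trial group

# The new group found the common items harder by this average amount, so its
# results on the new items can be shifted by `link` before comparison.
link = sum(o - n for o, n in zip(old_trial, new_trial)) / len(old_trial)
```

Because the anchor items are identical in both forms, any difference in performance on them estimates the difference between the groups rather than between the tests, which is what allows the remaining items to be placed on a common scale.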
10. References
General measurement and evaluation
1. Hopkins, C.D. and Antes, R.L. (1990). Classroom measurement
and evaluation. Itasca, Illinois: Peacock.
Item writing
1. Withers, G. (1997). Item Writing for Tests and Examinations.
Paris: International Institute for Educational Planning.
Testing applications
1. Adams, R.J., Doig, B.A. & Rosier, M.J. (1991). Science learning
in Victorian schools: 1990. (ACER Research Monograph No. 41).
Hawthorn, Vic.: Australian Council for Educational Research.
11. Exercises
1. CONSTRUCTION OF A TEST PLAN
2. TEXTBOOK ANALYSIS AND ITEM WRITING
(b) Choose one cell of the test plan and write some items for this
cell.
Quantitative research methods in educational planning
Series editor: Kenneth N. Ross
Module 7
John Izard
Trial testing and item analysis in test construction
Content
1. Introduction 1
7. Acknowledging co-operation 24
14. Exercises 77
1. Introduction
[Figure: stages in test construction]

Decision to allocate resources
  ↓
Content analysis and test blueprint
  ↓
Item writing
  ↓
Item review 1
  ↓
Planning item scoring
  ↓
Production of trial tests
  ↓
Trials
  ↓
Item review 2
  ↓
Amendment (revise/replace/discard)
  ↓
More items needed?  Yes → return to Item writing
  ↓ No
Assembly of final tests
2. Preparing for trial testing
Before undertaking a trial test project, we need to make some
important checks. Trial testing uses time and resources so we must
be sure that the proposed trial test is as sound as possible so that
time and resources are not wasted. The team preparing the trial
tests should have prepared a content analysis and test blueprint. A
panel should review the trial test in terms of the content analysis
and test blueprint to make sure that the trial test meets the intended
test specifications. It is also necessary to review each test item
before trial testing commences.
Content analysis
A content analysis provides a summary of the intentions of the
curriculum expressed in content terms. Which content is supposed
to be covered in the curriculum? Are there significant sections of
this content? Are there significant subdivisions within any of the
sections? Which of these content areas should a representative test
include?
Test blueprint
A test blueprint is a specification of what the test should cover
rather than a description of what the curriculum covers. A test
blueprint should include the test title, the fundamental purpose
of the test, the aspects of the curriculum covered by the test, an
indication of the students for whom the test will be used, the types
of task that will be used in the test (and how these tasks will fit in
with other relevant evidence to be collected), the uses to be made of
the evidence provided by the test, the conditions under which the
test will be given (time, place, who will administer the test, who will
score the responses, how the accuracy of scoring will be checked,
whether students will be able to consult books (or use calculators)
while attempting the test), any precautions to ensure that the
responses are only the work of the student attempting the test, and
the balance of the questions.
Item review
Why should the proposed trial test be reviewed before trial? The
choice of what to assess, the strategies of assessment, and the modes
of reporting depend upon the intentions of the curriculum, the
importance of different parts of the curriculum, and the audiences
needing the information that assessment provides. If we do not
select an appropriate sample of evidence, then the conclusions we
draw will be suspect, regardless of how accurately we make the
assessments. Tasks chosen have to be representative so that:
• Is there a single, clearly correct (or best) answer for each item?
This part of the review before the items are tried should help
avoid tasks which are expressed in language too complex for
the idea being tested, and/or contain redundant words, multiple
negatives, and distracters which are not plausible. The review
should also identify items with no correct (or best) answer and
items with multiple correct answers. Such items may be discarded
or re-written. Only good items should be used in a trial test. (The
subsequent item analysis helps choose the items with the best
statistical properties from the items that were good enough for
trial).
Will the students be told how the items are to be scored? Will they be
told the relative importance of each item? Will they be given advice
on how to do their best on the test?
Will there be practice items? Do students need advice on how they are
to record their responses? If practice items are to be used for this
purpose, what types of response do they cover? How many practice
items will be necessary?
Has the layout been arranged for efficient scoring (or coding) of
responses? Are distracters for multiple-choice tests shown as capital
letters (less confusing to score than lower case letters)? One long
column of answers is generally easier to score by hand than several
short columns.
How much time will students have to do the actual test? What time
will be set aside to give instructions to those students attempting
the test? Will the final number of items be too large for the test to
be given in a single session? Will there be a break between testing
sessions when there is more than one session?
What type of score key will be used? Complex scoring has to be done
by experienced scorers, and they usually write a code for the mark
next to the test answer or on a separate coding sheet. Multiple-
choice items are usually coded by number or letter and the scoring
is done by a test analysis computer programme.
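The letter-coded scoring that such a test analysis programme performs can be sketched in a few lines; the key and responses below are invented.

```python
# Invented score key for a five-item multiple-choice test, coded by letter.
KEY = ["B", "D", "A", "C", "B"]

def score_candidate(responses, key=KEY):
    """Return the 0/1 mark for each item and the candidate's total score."""
    marks = [1 if r == k else 0 for r, k in zip(responses, key)]
    return marks, sum(marks)

marks, total = score_candidate(["B", "D", "C", "C", "A"])
# marks == [1, 1, 0, 1, 0]; total == 3
```

Keeping the letter codes (rather than only the 0/1 marks) is what later makes distractor analysis possible.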
• how they are to show their answers (whether on the test paper,
or on a separate answer sheet); and
When several trial tests are being given at the same time (and this is
usually the case) it is important to have some visible distinguishing
mark on the front of each version of the test. Then the test
supervisor can see at a glance that the tests have been alternated.
If distinguishing marks for each version cannot be used, then a
different colour of cover page for each version is essential.
The trial test pages should not be sent for reproduction of copies
until the whole team is satisfied that all possible errors have been
found and corrected. All corrections must be checked carefully to
be sure that everything is correct! [Experience has shown that
sometimes a person making the corrections may think it better to
retype a page rather than make the changes. If only the ‘corrections’
are checked, a (new) mistake that may have been introduced will
not be detected.]
3. Planning the trial testing
Empirical trial testing provides an opportunity to identify
questionable items which have not been recognised in the process of
item writing and review. At the same time, the test administration
instructions are able to be refined to ensure that the tasks presented
in the test are as identical as possible for each candidate. (If the
test administration instructions vary then some candidates may
be advantaged over others on a basis unrelated to the required
knowledge which is being assessed by the test).
The target audience for the final form of the test should guide
the selection of a trial sample. If the target audience is to be a
whole nation or region within a nation, then the sample should
approximate the urban/rural mix, the sizes and types of schools, and
the age levels in the target audience. This type of sample is called a judgment
sample, because we depend on experience to choose a sufficiently
varied sample for trial purposes. The choice of sample also has to
consider two competing issues: the costs of undertaking the trial
testing and the need to restrict the influence of particular schools.
The more schools are involved in the trial testing and the more
diverse their location, the greater the travel and accommodation
costs. The smaller the number of schools the greater the influence of
a single school on the results.
• Government/Private
• Co-educational/Boys/Girls
• Primary/Secondary/Vocational
• Selective/Non-selective
The test specification grid (part of the test blueprint) will help in the
preparation of this documentation (see Figure 2). For example, the
content and skill objectives of a basic statistics test are shown in the
grid below.
Figure 2. Test specification grid for a basic statistics test
(only the column headings and the totals row were recovered)

                 Recall of facts   Computational skills   Understanding   Total
Total            14                12                     28              54
Choosing a sample of candidates for the test trials
The code book should show which items appear in each cell. One
way of doing this is to show the specification grid with the item
numbers in place and show the score key below the grid (see
Figures 3 and 4).
These instructions assume the candidates can read. The tester should have a
stopwatch, a digital watch showing minutes and seconds, or a clock with a
sweep-second hand. Make sure each candidate has a pen, ballpoint pen, or
pencil. All other materials should be put away before the test is started.
Give each candidate a copy of the test, drawing attention to the instruction
on the cover.
do not open this book or write anything until you are told.
Instruct the candidates to complete the information on the front cover of
the test, assisting as necessary. Check that each candidate has completed
the information correctly. (Year of birth should not be this year; number
of months in the age should be 11 or less; first name should be shown in
full rather than as an initial.) Ensure that the test booklet remains closed.
Read these instructions (which are shown on the cover of the test), asking
candidates to follow while you read.
Say:
Work out the answer to each question in your head if you can. You
can use the margins for calculations if you need to. You will receive
one mark for each correct answer.
Work as quickly and accurately as you can so that you get as many
questions right as possible. You are not expected to do all of the
questions. If you cannot do a question do not waste time. Go on to
the next question. If there is time go back to the questions you left
out.
Write your answer on the line next to the question. If you change
an answer, make sure that your new answer can be read easily.
Check that everybody is ready to start the test. Tell candidates that they
have 30 (thirty) minutes from the time they are told to start to answer the
questions. Note the time and tell candidates to turn the page and start
question one.
After 30 (thirty) minutes tell candidates to stop work and to close their
booklets.
Collect the tests, making sure that there is one test for each candidate, and
thank the candidates for their efforts.
The supervisor makes sure that all candidates are seated, introduces
him/herself, explains briefly what will happen in the testing session
and answers queries, distributes the test and associated papers to
each person according to the agreed plan, and ensures that each
candidate has a fair chance of completing the trial test without
interruption. The supervisor must enforce the test time limits so
that candidates in each testing room have essentially the same time
to attempt the items.
After the test has been attempted, it is usual for all test materials to
be placed in an envelope (or several if need be) with identification
information about the trial group and the location where the
tests were completed. If there is time, the trial tests can be sorted
into the different test forms before being placed in the envelope.
The envelope should be sealed. The test supervisor for a room is
responsible for ensuring that all the test papers (used and unused)
are returned to those who will process the information.
6. Processing test responses after a trial testing session
When the trial tests arrive back at the trial testing office they should
still be in their sealed envelopes or packages. Only one envelope is
opened at a time, as it is important to know the source of every test
paper. When an envelope is opened, the trial tests are sorted into
stacks according to the test version.
Scoring procedures
• Multiple-choice
• Constructed response
A more subtle difference occurs when some judges see more “shades
of grey” or see fewer such gradations (as in the tendency to award
full-marks or no marks). Scorers should make use of similar ranges
of the scale.
When this sorting has been finished the essays in each group are
checked quickly to ensure that they are in the correct group. The
essays in each group are then sorted into two further groups and
checked again. For both approaches essays should be assessed as
anonymously as possible.
When all items have been marked, the scores are entered into a
computer file. If the test is multiple-choice in format, the responses
may be entered into a computer file directly. (The scoring of the
correct answers is done by the test analysis computer programme).
The next envelope of tests is not opened until the processing of
the first package has been completed. This is to ensure that tests do
not get interchanged between packages. [Sending the wrong results
to an institution reflects very badly on those in charge of the test trials
and analysis.] Data entry can be done in parallel provided that each
package is the responsibility of one person (who works on that
package until all work on the tests it contains is completed). The
tests are then returned to their package until the analysis has been
completed, and the wrapping is annotated to show which range
of candidate numbers is in the envelope and the tests for which
the data have been entered. (If a query arises in an analysis, the
actual test items for that candidate must be accessed quickly and
efficiently).
7. Acknowledging co-operation
If the results from the trial tests are to be sent back to the
institutions which co-operated in the trials, the results should be
accompanied by some advice on interpretation, indicating what the
trial results can, and cannot, be taken to show.
8. Analysis in terms of candidate responses
When candidate responses are available for analysis, trial test
items can be considered in terms of their psychometric properties.
Although this sounds very technical and specialized, the ideas
behind such analyses are relatively simple. We expect a test to
measure the skills that we want to measure. Each item should
contribute to identifying the more able candidates. We can see
which items are consistent with the test as a whole. In effect, we are
asking whether an item identifies the able candidates as well as can
be achieved by using the scores on the test as a whole.
• Item difficulty
• Item discrimination
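Both of these indices can be sketched in a few lines of code. The helper names below are illustrative, and the calculation assumes that candidates are listed in order of total test score, from lowest to highest:

```python
# Illustrative calculation of item difficulty (facility) and a simple
# discrimination index from one item's 0/1 scores, with candidates
# ordered from lowest to highest total test score.

def difficulty(item_row):
    """Proportion of candidates answering the item correctly."""
    return sum(item_row) / len(item_row)

def discrimination(item_row):
    """Success rate of the top third minus that of the bottom third."""
    size = len(item_row) // 3
    low, high = item_row[:size], item_row[-size:]
    return sum(high) / size - sum(low) / size

# Item 1 of the trial data shown below (18 candidates, 15 correct).
item_1 = [0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(round(difficulty(item_1), 2), round(discrimination(item_1), 2))  # 0.83 0.33
```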
Item-by-student matrix of scored responses (columns are students 1 to 18, ordered from lowest to highest total score; the final column gives the number of correct answers for each item; the final row gives each student's total score)
Students
Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 15
2 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 14
3 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 14
4 0 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 11
5 1 0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 13
6 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 11
7 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 0 1 1 8
8 0 0 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 13
9 0 0 0 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1 9
10 0 1 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 1 7
11 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 0 7
12 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 6
13 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 7
14 0 0 1 0 0 0 1 1 1 1 1 1 0 1 0 1 1 1 11
15 1 0 1 1 0 0 0 0 1 1 1 1 1 0 1 1 1 1 12
16 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 1 1 1 8
17 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 1 1 1 8
18 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 6
19 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 4
20 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 5
21 1 1 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0 0 10
3 4 5 7 7 9 10 10 12 12 14 14 14 14 15 15 17 17 199
When the data were entered into the table, the data for the student
with the lowest score were entered first, then those for the student
with the next lowest score, and so on.
In Figure 6, the position of the rows (item scores) has been altered so
that the easiest item is at the top of the matrix and the other rows
are arranged in descending order. Notice that the top right corner
of the matrix has mostly entries of 1s, and the lower left corner has
mostly entries of 0s.
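The re-ordering shown in Figure 6 can be sketched in code. The helper below is hypothetical, not part of any program mentioned in this module:

```python
# Re-order a response matrix as in Figure 6: students sorted by total
# score (lowest first), items sorted by facility (easiest item first).

def order_matrix(matrix):
    """matrix maps item number -> list of 0/1 responses, one per student."""
    n_students = len(next(iter(matrix.values())))
    # Student totals determine the column order (lowest scorer first).
    totals = [sum(row[s] for row in matrix.values()) for s in range(n_students)]
    col_order = sorted(range(n_students), key=lambda s: totals[s])
    # Item totals determine the row order (easiest item first).
    row_order = sorted(matrix, key=lambda i: sum(matrix[i]), reverse=True)
    return {i: [matrix[i][s] for s in col_order] for i in row_order}

# A small invented example: item 2 is hardest, so it moves to the bottom.
demo = {1: [1, 0, 1, 1], 2: [0, 0, 1, 0], 3: [1, 0, 1, 1]}
print(list(order_matrix(demo)))  # [1, 3, 2]
```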
Figure 6. The same matrix with rows re-ordered so that the easiest item is at the top and the remaining items follow in descending order of facility
Students
Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 15
2 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 14
3 0 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 14
5 1 0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 13
8 0 0 0 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 13
15 1 0 1 1 0 0 0 0 1 1 1 1 1 0 1 1 1 1 12
4 0 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 11
6 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 1 1 1 11
14 0 0 1 0 0 0 1 1 1 1 1 1 0 1 0 1 1 1 11
21 1 1 1 1 1 0 1 0 0 0 1 1 1 1 0 0 0 0 10
9 0 0 0 0 1 0 0 1 0 1 1 0 0 1 1 1 1 1 9
16 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 1 1 1 8
17 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 1 1 1 8
7 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 0 1 1 8
13 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 7
10 0 1 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 1 7
11 0 0 0 0 1 0 0 1 0 1 0 0 1 1 1 0 1 0 7
12 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 6
18 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 6
20 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 1 5
19 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 4
3 4 5 7 7 9 10 10 12 12 14 14 14 14 15 15 17 17 199
Students
Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 15
The Low group has 4 successes; the High group has 6 successes.
You can draw a graph like the one shown in Figure 8 for item 1.
Figure 8. Successes on item 1 for the Low, Middle and High groups (vertical scale 1 to 6)
Students
Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
4 0 0 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 11
The Low group has 2 successes; the High group has 4 successes.
You can draw the graph like the one shown in Figure 10.
Figure 10. Successes on item 4 for the Low, Middle and High groups (vertical scale 1 to 6)
Note that in each case, although the actual numbers differ, the low
group had less success than the high group. This is the expected
pattern for correct answers if the item measures the same skills as
the whole test. Now look at the pattern for item 19 (Figure 11).
Students
Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 4
The Low group has 1 success; the High group has 1 success. You
can draw a graph like the one shown in Figure 12.
Figure 12. Successes on item 19 for the Low, Middle and High groups (vertical scale 1 to 6)
In this case the columns are equal. If these data were from a larger
sample and gave this pattern, we could conclude that item 19 was
not consistent with the rest of the test. Further, if the low group
did better than the high group we would think that there was
something wrong with the item, or that it was measuring something
different, or that the answer key was wrong. Test analysis can
identify a problem with an item but the person doing the analysis
has to work out why this is so.
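The group counts behind Figures 8, 10 and 12 can be reproduced directly from the ordered matrix in Figure 6. The sketch below assumes, as in the figures, that the 18 students are ordered from lowest to highest total score and split into thirds of six:

```python
# Count correct answers (1s) within each third of the ordered student list.

def group_successes(item_row, n_groups=3):
    size = len(item_row) // n_groups
    return [sum(item_row[g * size:(g + 1) * size]) for g in range(n_groups)]

# Rows for items 1, 4 and 19, copied from the ordered matrix in Figure 6.
rows = {
    1:  [0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    4:  [0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    19: [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
}
for item, row in rows.items():
    low, middle, high = group_successes(row)
    print(item, low, middle, high)
# Item 1: Low 4, High 6; item 4: Low 2, High 4; item 19: Low 1, High 1.
```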
Look again at the graphs for correct answers for items 1, 4, and 19
(as shown in Figure 13 below). Trend lines have been added. Items
performing as expected have a rising slope from left to right for
the correct answers. Item 19 does not show a rise; the data for this
item show no evidence that the item distinguishes between those
who are able and those who are not (where the criterion groups are
determined from scores on the test as a whole). For item 21 (Figure
13 below) there is evidence that this item distinguishes between
those who are able and those who are not (as determined from the
test as a whole) but not in the expected direction. Those who are less
able are better on this item than those who are more able. It may
be that the score key has the wrong ‘correct’ answer, that the item
is testing something different from the other items, that the better
candidates were taught the wrong information, and/or only the
weaker candidates were taught the topic because it was assumed
(incorrectly) that able students already knew the work. Item analysis
does not tell you which fault applies. You have to speculate on
possible reasons and then make an informed judgment.
Figure 13. Success counts for items 1, 4, 19 and 21 by Low, Middle and High groups, with trend lines added (vertical scale 1 to 6)
[Figure 17: correct-response graphs for seven items, annotated OK, OK? or ? according to whether each item performs as expected]
The data for analysis are shown below (Figure 18). In this figure
the candidates are listed in the left column. Each row shows
the responses to the items. Acceptable responses (correct and
incorrect) are 1, 2, 3, 4, 5 and 6. The first five acceptable responses
are multiple-choice options for each item. (In this example the
responses have been entered as numerals, but they could have been
entered as letters such as A, B, C, D, E and F). The key (the list of
correct answers in the correct order for this test) is supplied at the
bottom of the response data. The 6 indicates that the question was
omitted, but candidates had sufficient time to attempt all items. To
help line up columns, the last two lines show the item numbers.
X03, X08, X19, X21, X22, X24, X26, X14, and X15 will be in the high
group; X05, X25, X09, X06, X12, X13, X20, X11, and X23 will be in
the middle group; and X16, X27, X07, X01, X02, X04, X10, X17, and
X18 will be in the lower group.
Make some tables like Figure 19. Use one table for each item. Taking
each item in turn, count how many from the High group chose 1,
how many chose 2, how many chose 3, and so on. As you complete
each count, write the result in your table for that item.
Figure 18. Candidate responses to the 30 items (the Key row lists the correct answers)
X01 6 2 4 2 3 6 5 4 3 5 1 3 2 3 2 2 3 3 5 2 4 2 1 4 3 2 1 2 2 2
X02 5 2 4 2 2 5 5 1 3 4 1 4 2 5 1 1 3 2 9 9 5 5 1 9 3 1 3 2 2 2
X03 5 2 4 2 5 5 5 4 3 1 1 4 5 5 5 2 3 3 3 5 4 5 4 1 4 3 9 2 1 2
X04 2 2 4 1 2 5 1 5 3 1 1 1 2 5 5 2 3 5 5 4 2 5 5 4 3 2 2 2 1 2
X05 5 2 4 3 4 5 5 5 3 5 1 4 2 5 3 2 3 3 5 5 4 5 4 2 4 2 3 2 3 3
X06 5 2 4 2 3 9 1 1 3 1 1 4 2 5 4 2 3 3 5 4 4 5 4 1 3 2 2 2 1 2
X07 5 2 4 1 2 1 5 1 3 1 1 4 2 5 5 3 3 3 3 1 1 5 4 2 1 4 3 2 4 2
X08 5 2 4 2 5 5 5 4 3 5 2 4 2 5 5 2 3 3 3 5 4 3 4 1 5 4 5 2 1 2
X09 5 2 4 1 5 1 2 4 3 2 1 4 2 5 5 2 1 1 5 5 4 5 4 3 4 3 3 2 1 2
X10 5 1 4 1 5 1 2 4 3 1 3 4 2 3 2 1 3 3 4 1 5 5 2 1 3 2 4 3 1 2
X11 5 2 4 1 5 1 5 2 3 2 5 3 2 2 5 2 1 3 3 5 4 5 5 1 1 2 3 2 1 2
X12 5 2 4 2 5 5 1 4 3 1 1 4 2 5 5 1 3 3 5 5 4 2 4 2 1 4 1 4 1 2
X13 5 2 4 2 2 5 5 5 9 1 1 4 2 5 5 2 3 3 3 9 9 5 9 9 9 2 9 2 1 2
X14 5 2 4 2 2 5 5 4 3 5 1 4 2 5 5 2 3 3 4 9 4 5 4 9 9 9 1 2 1 3
X15 5 2 4 2 2 5 5 2 3 5 1 4 2 5 5 2 3 3 5 9 4 5 1 4 1 2 2 2 1 2
X16 5 2 4 2 9 5 5 5 3 4 1 4 2 5 5 9 3 5 5 9 4 5 4 9 1 9 9 2 1 2
X17 5 2 4 5 3 2 5 2 3 5 5 1 2 5 5 5 1 9 5 5 4 5 5 3 9 9 9 2 1 2
X18 5 2 4 1 1 5 1 4 4 5 1 4 2 3 2 1 3 3 5 9 9 2 5 1 2 2 2 4 1 1
X19 5 2 4 2 5 5 5 3 3 1 1 4 2 5 5 2 3 3 3 9 4 5 4 1 9 9 4 2 1 3
X20 5 2 4 2 4 5 5 2 5 9 1 4 2 5 5 1 3 3 3 9 4 5 4 9 9 3 9 2 1 2
X21 5 2 4 2 5 1 5 9 3 5 1 4 2 5 5 2 9 3 3 5 4 9 4 9 4 9 9 2 1 2
X22 2 2 4 2 5 5 5 4 3 5 1 4 2 5 5 2 3 3 3 1 4 2 1 1 5 2 2 2 1 2
X23 5 2 4 2 5 5 5 4 3 9 1 4 2 2 4 1 1 3 2 4 4 5 3 1 4 3 3 2 2 2
X24 5 2 4 2 5 5 5 4 3 1 1 4 2 1 5 2 3 3 5 2 4 5 4 1 1 3 5 2 1 2
X25 5 2 4 2 2 5 2 9 3 5 2 4 2 5 5 2 3 3 3 9 4 5 4 9 3 9 5 2 1 2
X26 5 2 4 2 5 3 5 4 5 5 1 4 2 5 5 2 2 3 2 3 4 5 4 3 4 3 2 2 1 2
X27 5 1 4 2 2 5 2 4 2 3 1 4 2 1 5 2 3 3 1 3 2 5 4 3 4 4 4 2 2 2
Key 5 2 4 2 5 5 5 4 3 5 1 4 2 5 5 2 3 3 3 5 4 5 4 1 4 2 4 2 1 2
Item                  1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3
Num 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0
Item No. ___              Option
           1     2     3     4     5     Other   Total
H          —     —     —     —     —     —       —
M          —     —     —     —     —     —       —
L          —     —     —     —     —     —       —
Total      —     —     —     —     —     —       —
Item No. 1                Option
           1     2     3     4     5*    Other   Total
H          0     1     0     0     8     0       9
M          0     0     0     0     9     0       9
L          0     1     0     0     7     1       9
Total      0     2     0     0     24    1       27
Item 1 has been completed (Figure 20) to show you how the results
are recorded. The * indicates the option that was keyed as correct.
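The tallying just described can also be done with a short script. The sketch below reads the item 1 responses from Figure 18 and uses the group membership listed earlier; `option_table` is an illustrative helper, not an existing program:

```python
from collections import Counter

# For one item, count how many candidates in each group chose each option.

def option_table(responses, groups, options=(1, 2, 3, 4, 5)):
    """responses: candidate id -> option chosen for this item.
    groups: group label -> list of candidate ids."""
    table = {}
    for label, members in groups.items():
        counts = Counter(responses[c] for c in members)
        row = {opt: counts.get(opt, 0) for opt in options}
        row['Other'] = len(members) - sum(row.values())  # omits, etc.
        row['Total'] = len(members)
        table[label] = row
    return table

# Item 1 responses read from Figure 18, with the groups listed in the text.
item_1 = {'X03': 5, 'X08': 5, 'X19': 5, 'X21': 5, 'X22': 2, 'X24': 5,
          'X26': 5, 'X14': 5, 'X15': 5, 'X05': 5, 'X25': 5, 'X09': 5,
          'X06': 5, 'X12': 5, 'X13': 5, 'X20': 5, 'X11': 5, 'X23': 5,
          'X16': 5, 'X27': 5, 'X07': 5, 'X01': 6, 'X02': 5, 'X04': 2,
          'X10': 5, 'X17': 5, 'X18': 5}
groups = {'H': ['X03', 'X08', 'X19', 'X21', 'X22', 'X24', 'X26', 'X14', 'X15'],
          'M': ['X05', 'X25', 'X09', 'X06', 'X12', 'X13', 'X20', 'X11', 'X23'],
          'L': ['X16', 'X27', 'X07', 'X01', 'X02', 'X04', 'X10', 'X17', 'X18']}
table = option_table(item_1, groups)
print(table['M'][5], table['M']['Total'])  # 9 9
```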
When all items have a completed table of data, the information for
the keyed responses can be graphed. A graph for item 1 is shown
below (Figure 21), together with a blank graph (Figure 22). These
graphs can be compared with those in Figure 17.
Figure 21. Number of candidates choosing the keyed option for item 1, by Low, Middle and High groups (vertical scale 1 to 10)
Figure 22. Blank graph for recording keyed-option counts for other items (Low, Middle, High; vertical scale 1 to 10)
9. Item analysis approaches using the computer
There are two main approaches to item analysis used extensively
in test research and development organizations.
Some use one approach, some use the other, and some use both
approaches in conjunction with each other. In this module the
earlier approach will be called the Classical (or traditional) item
analysis, and the more recent approach will be called Item Response
Modelling.
The first part of the computer output from a traditional test analysis
report for a multiple-choice test might look like Figure 23.
The analysis for item 23 is shown in Figure 24. This item has
many good qualities. It is in an appropriate range of difficulty (the
proportion correct was 0.593) and those who were incorrect are
spread over each of the other options. The ‘correct’ option has a
substantial positive agreement (0.549) with the test as a whole.
All of the ‘incorrect’ options have negative agreements: 1 -0.072;
2 -0.287; 3 -0.049; and 5 -0.515 with the test as a whole.
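The 'agreement' between an option and the test as a whole is typically a point-biserial correlation: the correlation between endorsing the option (scored 0/1) and the total test score. A minimal sketch, with invented data, is:

```python
import math

def point_biserial(chose_option, totals):
    """Correlation between endorsing an option (0/1) and total score."""
    n = len(totals)
    mean = sum(totals) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)
    p = sum(chose_option) / n
    if p in (0.0, 1.0) or sd == 0.0:
        return None  # not computable (printed as -9.000 in some reports)
    mean_chosers = (sum(t for c, t in zip(chose_option, totals) if c)
                    / sum(chose_option))
    return (mean_chosers - mean) / sd * math.sqrt(p / (1 - p))

chose = [1, 0, 1, 1, 0, 0]   # invented: who endorsed this option
totals = [9, 4, 8, 7, 5, 3]  # invented total test scores
print(round(point_biserial(chose, totals), 3))  # 0.926
```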
Item 25 (Figure 26) is similar to item 27, but identifying the problem
with the item may be difficult. The keyed option does have a
positive agreement (0.265) with the test as a whole. However, other
options also have positive agreements (0.030 and 0.403).
Sometimes an item has some options which work and some which
contribute nothing to distinguishing between those who have
knowledge and those who do not. In Item 7 (Figure 27), options 3
and 4 were not endorsed by any person, and no index of agreement
with the test as a whole could be calculated (as shown by -9.000). In
effect, only part of this item has worked; those who constructed the
item need to provide two more attractive options.
In some cases, the item may have more than one fault. For example,
item 13 (Figure 28) appears to be mis-keyed (or the better candidates
are mis-informed) and some of the options do not attract.
The other test items are considered in the same way. The final page
of the ITEMAN test analysis looks like the information in Figure 30.
Comments on the printout have been added.
Scale Statistics
Scale: 0 <-- This is the scale identification code.
N of items 30 <-- The number of items on this scale.
N of Examinees 27 <-- The number of candidates.
Mean 19.815 <-- The mean (or average) for this group of 27 persons (on 30 questions).
Variance 10.818 <-- A measure of spread of test scores for these candidates.
Std. Dev. 3.289 <-- Another measure of spread of test scores for these candidates.
(The standard deviation is the square root of the variance.)
Skew -0.111 <-- This index summarises the extent of symmetry in the distribution of
candidates' scores. A symmetrical distribution has a skewness of 0;
negative values indicate more high scores than low scores and positive
values indicate more low scores than high scores.
Kurtosis -0.893 <-- This index compares the distribution of candidate scores with a
particular mathematical distribution of scores known as the
Normal or Gaussian distribution. Positive values indicate a more
peaked distribution than the specified distribution; negative
values indicate a flatter distribution.
Minimum 14.000 <-- This is the lowest candidate score in this group.
Maximum 26.000 <-- This is the highest candidate score in this group.
Median 20.000 <-- This is the middle score when all candidates' scores in this group
are arranged in order.
Alpha 0.543 <-- This index indicates how similar the questions are to each other.
The lowest value is 0.0 and the highest is 1.0. Provided that
candidates had ample time to complete each item, higher values
indicate greater internal consistency in the items.
(See Test Reliability below).
SEM 2.224 <-- We use this index to estimate how much the scores might change
if we gave the same test to the same candidates on several occasions
(See Test Reliability below).
Mean P 0.660 <-- This is the average proportion correct for these items with these
candidates.
Mean Item-Tot. 0.254 <-- This is the average point biserial correlation for these items.
Mean Biserial 0.338 <-- This is the average biserial correlation for these items.
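Several of these statistics can be computed directly from a matrix of scored (0/1) responses. The sketch below uses the KR-20 form of alpha for dichotomous items and SEM = SD × √(1 − alpha); it is illustrative only and will not reproduce ITEMAN's output exactly (for example, the treatment of omitted responses may differ):

```python
import math

def scale_stats(scored):
    """scored: one list of 0/1 item scores per candidate."""
    n, k = len(scored), len(scored[0])
    totals = [sum(row) for row in scored]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    sd = math.sqrt(variance)
    # Item facilities (proportion correct) feed the KR-20 form of alpha.
    facilities = [sum(row[i] for row in scored) / n for i in range(k)]
    alpha = (k / (k - 1)) * (1 - sum(p * (1 - p) for p in facilities) / variance)
    sem = sd * math.sqrt(1 - alpha)
    return mean, variance, sd, alpha, sem

# A tiny invented data set: four candidates, three items.
mean, variance, sd, alpha, sem = scale_stats([[1, 1, 0], [1, 0, 0],
                                              [1, 1, 1], [0, 0, 0]])
print(mean, round(alpha, 2))  # 1.5 0.75
```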
Test reliability
The term validity refers to usefulness for a specified purpose and
can only be interpreted in relation to that purpose. In contrast,
reliability refers to the consistency of measurement regardless of
what is measured. Clearly, if a test is valid for a purpose it must also
be reliable (otherwise it would not satisfy the usefulness criterion).
But a test can be reliable (consistent) without meeting its intended
purpose. Test reliability is influenced by the similarity of the test
items, the length of the test, and the group on which the test is
tried. When we add scores on different parts of a test to give a score
on the whole test, we assume that the test as a whole is measuring
on a single dimension or construct, and the analysis seeks to
identify items which contradict this assumption. In the context of
test analysis, removing items which contradict the single-dimension
assumption should contribute to a more reliable test. Where trial
tests vary in length, the reliability index for one test cannot be
compared directly with another. An adjustment to a common-length
test of 100 items can be made using the Spearman-Brown formula:

    r(adjusted) = n × r / (1 + (n − 1) × r)

where r is the reliability of the trial test and n is the ratio of the
common length (here 100 items) to the length of the trial test.
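As a sketch, the adjustment can be computed as follows (the example uses the 30-item test and the alpha of 0.543 from the scale statistics shown earlier):

```python
# Spearman-Brown adjustment to a common test length. r is the reliability
# of the trial test; n is the ratio of the target length to the current one.

def spearman_brown(r, current_items, target_items=100):
    n = target_items / current_items
    return n * r / (1 + (n - 1) * r)

# The 30-item trial test with alpha 0.543, adjusted to 100 items:
print(round(spearman_brown(0.543, 30), 3))  # 0.798
```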
Figure 31 also shows how the development of trial tests can result in
more items in some difficulty ranges and fewer items in others. Most
of the candidates have attainments (as judged by this test) higher
than the average difficulty for the items. In other words, most items
have difficulties below the attainment levels of the candidates.
Figure 33 shows, for each item, the raw score and the maximum score,
together with the ability level on the continuum at which the
probability of success changes from 'less likely to be correct' to
'more likely to be correct'. This point is called the threshold for the
item. Underneath each threshold value there is another value indicating
the error associated with the threshold estimate.
Figure 33. Item estimates for test data in Figure 18 (part only)
3. Are the fit t-values in the last two columns of the item
estimates table larger than 3?
If No, keep the item and continue to 4; If Yes, probably change
or reject the item.
10. Maintenance of security
When trial tests are developed for secure purposes it is important
that the secure nature of the tests be preserved. File copies of the
tests must be kept under lock and key. The
computer control files for the test analysis include the score key for
each trial test so there has to be restricted access to the computers
where the test processing is done.
The analysis reports (such as the item analysis, and the summary
tables) will include the score keys and therefore those reports must
be kept secure.
11. Test review after trials
provide a ‘correct’ answer for each question. This enables the test
constructor’s (new) list of correct answers to be checked.
Preparation of final forms of a test is not the end of the work. The
data from use of final versions should be monitored as a quality
control check on their performance. Such analyses can also be used
to fix a standard by which the performance of future candidates
may be compared. It is important to do this as candidates in one
year may vary in quality from those in another year. In some
instances such checks may detect whether there has been a breach
of test security.
The trial forms should include acceptable items from the original
trials (not necessarily items which were used on the final forms
but in similar design to the pattern of item types used in the final
forms) to serve as a link between the new items and the old items.
The process of linking tests using such items is referred to as
anchoring. Surplus items can be retained for future use in similar
test papers.
12. Confidential disposal of trial tests
It is usual to dispose of the used copies of trial tests by confidential
destruction after a suitable time. [The ‘suitable’ time is difficult to
define. Usually, trial tests are destroyed about one month after all
analyses have been concluded and when the likelihood of further
queries about the analyses is very low.]
Computer software
The QUEST computer program is published by The Australian
Council for Educational Research Limited (ACER). Information
can be obtained from ACER, 19 Prospect Hill Road, Camberwell,
Melbourne, Victoria 3124, Australia.
The BIGSTEPS program is published by MESA Press. Information
can be obtained from MESA Press, 5835 S. Kimbark Avenue,
Chicago, Illinois 60637, United States of America.
References
Adams, R.J.; Khoo, S.T. (1993). QUEST: The interactive test analysis
system. Hawthorn, Vic.: Australian Council for Educational Research.
Haines, C.R.; Izard, J.F.; Berry, J.S. et al. (1993). 'Rewarding student
achievement in mathematics projects'. Research Memorandum 1/93.
London: Department of Mathematics, City University. (54pp.)
Masters, G.N. et al. (1990). Profiles of learning: The basic skills testing
program in New South Wales, 1989. Hawthorn, Vic.: Australian
Council for Educational Research.
14. Exercises
1. Choose an important curriculum topic or teaching subject
(either because you know a lot about it or because it is
important in your country’s education programme).
Since 1992 UNESCO’s International Institute for Educational Planning (IIEP) has been
working with Ministries of Education in Southern and Eastern Africa in order to undertake
integrated research and training activities that will expand opportunities for educational
planners to gain the technical skills required for monitoring and evaluating the quality
of basic education, and to generate information that can be used by decision-makers to
plan and improve the quality of education. These activities have been conducted under
the auspices of the Southern and Eastern Africa Consortium for Monitoring Educational
Quality (SACMEQ).
In 2004 SACMEQ was awarded the prestigious Jan Amos Comenius Medal in recognition
of its “outstanding achievements in the field of educational research, capacity building,
and innovation”.
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible for
SACMEQ’s educational policy research programme. All modules may be found on two
Internet Websites: http://www.sacmeq.org and http://www.unesco.org/iiep.
Quantitative research methods
in educational planning
Series editor: Kenneth N.Ross
8
Module
Maria Teresa Siniscalco
and Nadia Auriat
Questionnaire
design
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible
for the educational policy research programme conducted by the Southern and
Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ).
Content
1. Introduction 1
2. Initial planning 3
Why a new questionnaire – and when? 3
Relationships between research problems,
research hypotheses, and variable construction 5
Characteristics of research hypotheses 6
Specifying variables and indicators 6
Operationalization of research questions 8
The problem of the cross-national validity of educational
concepts, definitions, and data collection instruments 12
1. What is formal education? 12
2. Distinguishing public and private service providers 14
3. What is a school? 16
4. What is a student? 17
5. What is a teacher? 18
4. Examples of questions 38
Student background 38
1. Gender and age 39
2. Socio-economic background: occupation, education, and possessions 41
Teacher characteristics 47
School location 49
Learning, teaching, and school activities 51
1. Student reading activity 51
2. Teacher activities 52
3. School head activities 53
The codebook 79
6. Further reading 84
1. Introduction
Module 8 Questionnaire design
2. Initial planning
This section reviews the steps required to determine the need for
a new questionnaire, and looks at how a general research problem
needs to be translated into a number of specific research questions
and hypotheses. It examines the problem of valid cross-national
instruments and provides helpful hints and recommendations for
using comprehensive and precise definitions of key educational
concepts.
• Be value free in the sense that they exclude the personal biases
of the researcher.
• Dependent variables
Variables that the researcher is trying to explain (for example,
student achievement).
• Control variables
Variables that are used to test for a spurious relationship
between dependent and independent variables. That is, to test
whether an observed relationship between dependent and
independent variables may be explained by the presence of
another variable.
• Continuous variables
Variables that take all values within a particular range.
• Discrete variables
Variables that take a number of specific values.
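When planning a questionnaire it can help to keep an explicit inventory of each variable's role and type. A minimal sketch, with invented variable names (not taken from any study):

```python
# An invented, minimal variable inventory applying the definitions above.
variables = {
    'student_achievement': {'role': 'dependent',   'type': 'continuous'},
    'hours_of_homework':   {'role': 'independent', 'type': 'continuous'},
    'class_size':          {'role': 'independent', 'type': 'discrete'},
    'home_language':       {'role': 'control',     'type': 'discrete'},
}

def by_role(role):
    """List the variable names assigned a given role."""
    return [name for name, v in variables.items() if v['role'] == role]

print(by_role('control'))  # ['home_language']
```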
Operationalization of research questions
When operationalizing a specific research question to identify
an appropriate indicator it is necessary to specify the indicator
according to the following components.
[Table: components of the final indicator for 'teacher stability']
b. Helpful hints
Some guidance for answering these questions can be drawn
from the following comprehensive definition of education
proposed within the International Standard Classification of
Education (ISCED).
b. Helpful hints
An approach often adopted is to distinguish between three
categories of schools: schools controlled by public authorities;
schools controlled by private authorities but dependent on
substantial government funding; and schools controlled and
funded by private authorities.
3. What is a school?
a. Problem/issues to be resolved
A school is often difficult to define in a manner that is
consistent for a cross-national data collection. In some cases
a school consists of several buildings, managed by the same
head-teacher. In other cases, the same building hosts different
schools in different shifts at different times of the day. In
some cases a school has a well-defined structure, consisting of
separate classrooms, each equipped with a teacher's table and
chair, a desk and chair for each student, and a chalkboard. In
other cases the school
is in the open air, perhaps under a tree, where teachers and
students sit on the ground, and the students use their knees as
writing places. When collecting comparative information on
schools, these different scenarios have to be taken into account.
b. Helpful hints
Suppose, for example, that ‘school crowdedness’ – expressed as
square metres of classroom space per pupil – is being measured.
The result obtained by dividing the number of square metres
by the total enrolment will be correct (and comparable across
schools) only in a situation where all schools have one shift. But
if some schools operate more than one shift, then the results
will be misleading.
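A small worked example (with invented numbers) shows how ignoring shifts distorts the indicator:

```python
# Dividing floor space by total enrolment understates the space available
# per pupil when a school runs more than one shift.

def space_per_pupil(total_m2, enrolment, shifts=1):
    """Classroom space per pupil, counting each shift separately."""
    pupils_per_shift = enrolment / shifts
    return total_m2 / pupils_per_shift

print(space_per_pupil(400, 200))     # 2.0 m2 per pupil, single shift
print(space_per_pupil(400, 200, 2))  # 4.0 m2 per pupil, two shifts
```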
4. What is a student?
a. Problem/issues to be resolved
Suppose that student enrolment figures are being investigated.
How will the corresponding statistics be calculated and
reported? When the focus of the analysis is on rates of
participation, what should be done with repeaters, and how
should they be distinguished from students enrolling regularly
for the first time in a grade or year of study? All these issues
need to be taken into account when designing questions on
student enrolment figures for an education system.
b. Helpful hints
A distinction should be made between the number of students
and the number of registrations. The number of students
enrolled refers to the number of individuals who are enrolled
within a specific reference period, while the number of
registrations refers to the count of enrolments within a specific
reference period for a particular programme of study. The two
measures are the same if each individual is only enrolled in
one programme during the reference period, but the measures
differ if some students are enrolled in multiple programmes. Each
measure is important: the number of students is used to assess
participation rates (compared to population numbers) and to
establish descriptive profiles of the student body. The number
of registrations is used to assess total education activities for
different areas of study.
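The distinction between the two counts can be made concrete with a small sketch (the record format and identifiers are hypothetical): each registration is a student-programme pair recorded within the reference period, and the two measures diverge as soon as a student enrols in more than one programme.

```python
def enrolment_counts(registrations):
    """registrations: (student_id, programme) pairs recorded within
    the reference period.  Returns (number of distinct students,
    number of registrations)."""
    number_of_students = len({student for student, _ in registrations})
    return number_of_students, len(registrations)

# One student enrolled in two programmes: 2 students, 3 registrations.
records = [("S001", "mathematics"), ("S001", "reading"),
           ("S002", "mathematics")]
```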
5. What is a teacher?
a. Problem/issues to be resolved
How can teachers be defined in order to distinguish them from
other educational personnel? One approach would be to base
the definition on qualifications. However, this could result in an
overestimation of the number of teachers because a number of
personnel employed in schools may have a teacher qualification
but do not actually teach. Another approach would be to define
teachers on the basis of their activities within schools, but this
alone would not be sufficient to distinguish professionals from
those who may act as teachers occasionally or on a voluntary
basis. A further issue is the reduction of head-counts to full-
time equivalents (if part-time employment applies). How can
part-time teachers be converted into full-time equivalents?
No questionnaire concerning teacher characteristics can be
designed before these issues have been clarified.
b. Helpful hints
The following definition of a teacher provides a useful
framework for overcoming ambiguities:
In some cases the solutions found will be used to define the target
population for which data will be collected – as shown in the
examples given in the paragraph on formal education. In other
cases the definitional work will contribute directly to the design
of questionnaire items – as shown in the examples given in the
previous section on service provider. In yet other cases definitions
and explanations will be used to prepare accompanying notes
that provide instructions on how to answer specific questions
– as shown in the examples given for the conversion of part-time
teachers and students into full-time equivalents.
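One common convention for the head-count to full-time-equivalent conversion mentioned above can be sketched as follows. This is an illustration only: the full-load figure is a national convention, and other conventions (for example, contract fractions rather than teaching hours) are equally possible.

```python
def full_time_equivalents(weekly_teaching_hours, full_load_hours):
    """Convert a head-count of teachers to full-time equivalents:
    each teacher contributes the fraction of a full teaching load
    that he or she actually works."""
    return sum(hours / full_load_hours for hours in weekly_teaching_hours)

# Three teachers (one full-time at 20 hours, two half-time at 10)
# count as 2.0 full-time equivalents rather than a head-count of 3.
```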
EXERCISES
Question structure
Two important aspects of questionnaire design are the structure of
the questions and the decisions on the types of response formats for
each question. Broadly speaking, survey questions can be classified
into three structures: closed, open-ended, and contingency
questions.
1. Closed questions
Closed (or multiple choice) questions ask the respondent to choose,
among a possible set of answers, the response that most closely
represents his/her viewpoint. The respondent is usually asked to
tick or circle the chosen answer. Questions of this kind may offer
simple alternatives such as ‘Yes’ or ‘No’. They may also require that
the respondent chooses among several answer categories, or that
he/she uses a frequency scale, an importance scale, or an agreement
scale.
Never . . . . . . . . . . . . . . . . . . 1
1 or 2 times a week . . . . . . . 2
3 or 4 times a week . . . . . . 3
Nearly every day . . . . . . . . 4
The response format for closed questions can range from a simple
yes/no response, to an approve/disapprove alternative, to asking
the respondent to choose one alternative from 3 or more response
options.
The design of questions
□ Male
□ Female
2. Open-ended questions
Open-ended or free-response questions are not followed by any
choices and the respondent must answer by supplying a response,
usually by entering a number, a word, or a short text. Answers are
recorded in full, either by the interviewer or, in the case of a self-
administered survey, the respondent records his or her own entire
response.
• they are less likely to suggest or guide the answer than closed
questions because they are free from the format effects
associated with closed questions, and
• they can add new information when there is very little existing
information available about a topic.
3. Contingency questions
A contingency question is a special case of a closed question
in that it applies only to a subgroup of respondents.
The relevance of the question for a subgroup is determined by
asking a filter question. The filter question directs the subgroup to
answer a relevant set of specialized questions and instructs other
respondents to skip to a later section of the questionnaire.
The formats for filter and contingency questions can vary. One
option is to write directions next to the response category of the
filter question.
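In data processing, the skip directions written next to the response categories of a filter question can be represented as an explicit routing table. The sketch below is purely illustrative (the question names and answer categories are invented):

```python
# Routing table for a hypothetical filter question Q10: respondents
# answering "Yes" continue to the contingency items, all others skip.
ROUTING = {
    ("Q10", "Yes"): "Q11",  # subgroup answers the specialized questions
    ("Q10", "No"): "Q15",   # other respondents skip to a later section
}

def next_question(question, answer, default):
    """Return the next question to administer, following the skip
    rule if one is defined, otherwise proceeding to `default`."""
    return ROUTING.get((question, answer), default)
```

A table of this kind also documents the questionnaire's branching structure for later data-cleaning checks.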
Narrative: texts that tell a story or give the order in which things
happen.
Expository: texts that provide a factual description of things or
people or explain how things work or why things happen.
Documents: tables, charts, diagrams, lists, maps.
usually on time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
sometimes a week late . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
more than a week late . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
under 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
20-30 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
30-40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
40-50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
50-60 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
60 or more . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
under 20 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
20-30 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
31-40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
41-50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
51-60 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
61 or more . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Research has shown that response rates are usually higher for
homogeneous or select groups (for example, high school teachers,
university professors, physicians) because they are more likely to
identify with the goals of the study. Beyond this distinction, it is
known that interest in, and familiarity with, the topic have a
positive effect on response rates.
EXERCISES
Question 1
How many teachers are there in your school who have been
at the school for at least five years and who are involved in
special initiatives outside the normal class activities at least
once per week?
. . . . . . . . . . . . . . . teachers
Question 2
Do you enjoy studying English and Mathematics?
Yes . . . . . . . . . . . 1
No . . . . . . . . . . . 2
Question 3
If you could attend university which subjects would you like to
study?
..................................................
Question 4
In the last six months how many times did you teach your
students to read expository materials?
..................................................
Question 5
Sometimes teachers do not give me sufficient attention.
Definitely disagree . . . . 1
Mostly disagree . . . . . . 2
Mostly agree . . . . . . . . 3
Definitely agree . . . . . . 4
Question 6
What is the condition of each of the following in your school?
Bad Good
Lighting 1 2
Water 1 2
Canteen 1 2
Water taps 1 2
4. Examples of questions
The following discussion presents examples of the main types of
questions that are often used in educational planning and
educational research data collections. These questions cover the
areas of student background, teacher characteristics, school
location, learning/teaching activities, and attitudes. The examples
related to attitude scales include a discussion of the principles of
Likert scaling and of the method used to calculate the
discrimination power of attitude scale items.
Student background
Demographic questions are designed to elicit information from
respondents concerning their personal characteristics and social
background. This type of information is important for explaining
variations in educational outcomes and behavioural patterns. The
most frequently used demographic questions focus on gender, age,
level of education, income level, marital status, level of parents’
education, religion, and ethnic background. A number of these
areas cover sensitive and personal issues and therefore need to be
handled carefully.
1. Gender and age
Data on student gender is critically important for examining issues
of gender equity in all school systems. This information can be
gathered from the class attendance register or can be asked as part
of a student questionnaire.
□ Boy
□ Girl

Your sex:
□ Male
□ Female
Age: . . . . . . . . . . . . . . . . .
2. Socio-economic background:
occupation, education, and possessions
Another type of background characteristic concerns the socio-
economic background of the student. Indicators can be developed
using information obtained directly from the individual, or by using
either objective or subjective responses. Some common indicators
of student socio-economic background are the parents’ level of
income, their occupational status, their level of education, and the
personal possessions in the home. The measurement of parent
income is always a difficult task in all countries of the world – and
for many different reasons. Most school-aged children cannot
answer such questions accurately. Similarly, adults sometimes have
difficulty answering income questions because they are not in an
occupation with a regular salary or because questions in this area
represent an invasion of personal privacy. It is usually not useful
to include a question on parent income when the respondent is
under 15 years of age. For this reason, parents’ level of education,
occupational status, and home possessions are the most frequently-
used proxy indicators of household wealth.
a. Parent occupations
Parent occupations are usually grouped into ‘occupational status’
categories based on levels of education, skill, and income. The
categories are then ranked from lowest occupational status to
highest. The categories used to group the occupations must reflect
the range of occupations existing in society, and they must also be
comprehensible to the respondent. Terms such as white-collar, blue-
collar, professional, skilled, semi-skilled, unskilled are not easily
understood by younger children if left undefined.
Response:
My father works in a factory
Probe:
What kind of machine does he operate?
Response:
My father is a teacher
Probe:
What level does he teach (or, alternatively, what age students does
he teach?)
Response:
My father works in a shop
Probes:
Does he own the shop? Or does he manage the shop?
Or does he work for someone else in the shop?
b. Parent’s education
Open-ended questions that ask directly for the number of years of
a parent’s education are very difficult to answer because they imply
that students remember not only the level of education completed
by their parents, but also the sequence of levels of education,
and that they know the duration of each of these levels. For these
reasons, questions on parent’s education should be given in multiple
choice format.
The items included in the list must relate to the context in which
the questionnaire is administered, and to the level of development
and characteristics of the society. It is important that the list include
possessions that denote high, medium, and low economic status in
order to discriminate among students with different socio-economic
backgrounds.
The data gathered using this kind of question are at best very
approximate. However, experience has shown that these data are
generally highly correlated with educational outcomes.
Teacher characteristics
Among the teacher characteristics of interest in educational
data collections are gender, age, education, and years of teaching
experience. At the school level this information can be collected
either from teachers themselves or from school heads. However,
asking teachers to answer a question such as ‘How many years of
education have you completed?’ provides very little information.
Such a question neither specifies whether pre-service training is to
be included in the ‘years of education’, nor provides information on
years of grade repetitions (if any) or on whether part-time years of
attendance were converted into full-time years equivalent.
School location
The location of a school is often a key issue in data collections
because physical location is often strongly related to the
sociocultural environment of the school. In addition, the degree
of physical isolation of a school can have important impacts on
decisions related to staffing and infrastructure costs.
The list of items for which the distance in kilometres is asked can
vary according to the focus of the survey and the characteristics of
the country. Whatever items are used, the number of kilometres can
be summed over all items and then divided by the number of items
to give a measure of the school's degree of isolation.
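The summed-and-divided measure described above is simply a mean distance; a minimal sketch (the facility list is survey-specific and the function name is invented):

```python
def isolation_index(distances_km):
    """Mean distance in kilometres from the school to a set of
    reference facilities (for example, a paved road, a health
    clinic, a district office): larger values indicate a more
    isolated school."""
    return sum(distances_km) / len(distances_km)

# Distances of 10, 20, and 30 km yield an index of 20 km.
```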
How often do you read these types of publications for personal interest
and leisure?
Mystery 1 2 3 4 5 6
Romance 1 2 3 4 5 6
Sport 1 2 3 4 5 6
Adventure 1 2 3 4 5 6
Music 1 2 3 4 5 6
Nature 1 2 3 4 5 6
2. Teacher activities
During the school year, how often do you teach comprehension of each
of the following kinds of text?
(Circle one number per line)
a. Narrative text 1 2 3 4 5
(that tells a story or gives the
order in which things happen)
b. Expository text 1 2 3 4 5
(that describes things or
people, or explains how things
work or why things happen)
c. Documents 1 2 3 4 5
(that contain tables, charts,
diagrams, lists, maps)
Importance ranking
(a) evaluating the staff ............
(b) discussing educational objectives with teachers ............
(c) pursuing administrative tasks ............
(d) organizing in-service teacher training courses ............
(e) organizing extra-class special programs ............
(f) talking with students in case of problems ............
1. Likert scaling
Likert scaling is the most frequently applied attitude scaling
technique in educational research. It consists of six main steps.
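One computational part of Likert scaling, the scoring of responses, can be sketched as follows. This is only an illustration of the standard scoring convention (item names and the 1-5 coding are assumed): negatively worded statements are reverse-coded so that a high total always indicates a favourable attitude.

```python
def likert_total(responses, reverse_items=frozenset()):
    """Total attitude score on a 5-point agreement scale
    (5 = strongly agree ... 1 = strongly disagree).  Items listed in
    `reverse_items` are negatively worded statements and are
    reverse-coded as (6 - value)."""
    return sum((6 - value) if item in reverse_items else value
               for item, value in responses.items())

# "Strongly agree" (5) with a negative statement counts as 1 point.
```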
Strongly agree . . . . . . . 5
Agree . . . . . . . . . . . . . 4
Uncertain . . . . . . . . . . 3
Disagree . . . . . . . . . . . 2
Strongly disagree . . . . 1
EXERCISES
5. Moving from initial draft to final version of the questionnaire
This section looks at the ordering of questions in the questionnaire,
the training of interviewers and administrators, pilot testing, and
the preparation of a codebook. It gives advice on how to design the
layout of the questionnaire, including instructions to respondents,
interviewer instructions and introductory and concluding remarks.
Guidance is provided on how to trial the questionnaire and then
use the results to improve its final form.
The results of this study will determine the reading literacy levels
of primary school students and this information will be used as
part of a review of teacher pre-service and in-service training
programmes.
Yours sincerely,
xxxxxxx
INTERVIEW FORMAT
1. Thinking about government facilities provided for schools, do
you think your neighborhood gets better, about the same, or
worse facilities than most other parts of the city?
Better (ASK A) 1
About the same 2
Worse (ASK A) 3
Don’t know 8
SELF-ADMINISTERED FORMAT
1. Thinking about the government facilities provided for schools,
do you think your neighborhood gets better, about the same,
or worse facilities than most other parts of the city?
Better 1 (answer 1A below)
About the same 2
Worse 3 (answer 1A below)
Don’t know 8
Training of interviewers or
questionnaire administrators
Frequently, the testing of a questionnaire is undertaken by
interviewing respondents – even if the final version of the
questionnaire is to be self-administered. This implies, however,
that the questionnaire administrators and the interviewers receive
an appropriate level of basic training before setting out to pilot the
questionnaire.
2. Reliability
Reliability concerns the consistency of a measure: that is, the
tendency to obtain the same results if the measure were to be
repeated with the same subjects under the same conditions.
Item Kappa
Are you a boy or a girl? 0.98
Do you speak Swedish at home? 0.77
How often do you read for somebody at home? 0.41
Although the kappa for the question on gender seems high (0.98),
one would expect the value for such a question to be 1. On a
question like this, agreement can be increased through more careful
supervision by the person who administers the questionnaire.
The relatively low coefficients for the second two questions suggest
that multiple data sources on many questions may be required for
children at this age.
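The kappa coefficient reported in the table above measures agreement between two administrations of the same question, corrected for the agreement expected by chance. A minimal computation (the response data below are invented for illustration):

```python
from collections import Counter

def cohen_kappa(first_round, second_round):
    """Chance-corrected agreement between two administrations of
    the same question to the same respondents.  Undefined when the
    expected agreement equals 1."""
    n = len(first_round)
    # Observed proportion of identical answers.
    observed = sum(a == b for a, b in zip(first_round, second_round)) / n
    # Expected agreement from the marginal answer frequencies.
    freq_a, freq_b = Counter(first_round), Counter(second_round)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

Identical answers on both occasions give a kappa of 1; agreement no better than chance gives 0.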
The codebook
A codebook should be prepared in order to enter the data into a
computer. The codebook is a computer-based structure file designed
to guide data entry. It contains a field for every piece of information
which is to be extracted from the questionnaire – starting from the
identification code which allows each respondent in the sample to
be uniquely identified.
The coding scheme for the above question will be ‘1’ for ‘Boy’, ‘2’ for
‘Girl’, ‘8’ for ‘Not Applicable’ and ‘9’ for ‘Missing’. It is customary to
assign missing data to the highest possible value. That is ‘9’ for one-
digit questions, ‘99’ for two-digit questions, etc. The values of ‘8’,
‘88’ etc. can be used to code ‘Not Applicable’ data.
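At analysis time these conventional codes must be translated back, so that 'missing' and 'not applicable' are never mistaken for real data. A sketch of such a translation (the function is illustrative, not part of any particular software package):

```python
# Conventional codes by field width, as described above:
# the highest value means 'missing', one below it 'not applicable'.
MISSING = {1: 9, 2: 99, 3: 999}
NOT_APPLICABLE = {1: 8, 2: 88, 3: 888}

def decode(value, width):
    """Map a stored code back to its meaning for a field of the
    given width (in digits)."""
    if value == MISSING[width]:
        return None
    if value == NOT_APPLICABLE[width]:
        return "not applicable"
    return value
```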
CODEBOOK FORMAT

Variable   Description                        Codes                    Column   Missing   N/A
IDNUMBER   Respondent identification number   001-528                  1-3
Q1         Highest grade completed            1=1-8   2=9-11   3=12    4        9         8
                                              4=13-15 5=16     6=17+
Q2         Gender of teacher                  1=Male  2=Female         5        9         8
EXERCISES
Since 1992 UNESCO’s International Institute for Educational Planning (IIEP) has been
working with Ministries of Education in Southern and Eastern Africa in order to undertake
integrated research and training activities that will expand opportunities for educational
planners to gain the technical skills required for monitoring and evaluating the quality
of basic education, and to generate information that can be used by decision-makers to
plan and improve the quality of education. These activities have been conducted under
the auspices of the Southern and Eastern Africa Consortium for Monitoring Educational
Quality (SACMEQ).
In 2004 SACMEQ was awarded the prestigious Jan Amos Comenius Medal in recognition
of its “outstanding achievements in the field of educational research, capacity building,
and innovation”.
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible for
SACMEQ’s educational policy research programme. All modules may be found on two
Internet Websites: http://www.sacmeq.org and http://www.unesco.org/iiep.
Quantitative research methods in educational planning
Series editor: Kenneth N. Ross

Module 10
Andreas Schleicher and Mioko Saito

Data preparation and management
Content

1. Introduction
   Professional data management as an essential component of educational surveys
   Other related documentation

Data entry
   1. Basic approaches to data entry
   2. Using a text editor for data entry
   3. Using a computer-controlled approach for data entry: the Data Entry Manager programme

WinDEM
   1. Entering data
   2. Reviewing your data

6. Data verification
   Data verification steps
   1. Verification of file integrity
   2. Special recodings
   3. Value validation
   4. Treatment of duplicate identification codes
   5. Internal validation of a hierarchical identification system
   6. Verification of the linkages between datafiles
   7. Verification of participation indicator variables against data variables
   8. Verification of exclusions of respondents
   9. Checking for inconsistencies in the data

8. Conclusion
1. Introduction
field can be ruined when: (a) coding and data entry teams are
insufficiently trained or supervised; (b) coding instructions and
codebook specifications are incomplete or inadequate; or (c) the
database management is inappropriate so that information is
lost, composite variables are created incorrectly, data are used at
the wrong level of analysis, or no attention is given to “adjusting”
estimates for the structure of the sample design used.
The issues presented above illustrate the need for a great deal
of thought to be given to the management of data prior to the
commencement of an educational survey research study. In
particular, close attention must be given to: (a) the type of data
collected, (b) the data collection methods, (c) the design of data
collection instruments, and (d) the administrative procedures and
field operations.
2. An overview of data management for educational survey research
First, the resources required for field operations, data entry, and
data processing generally depend on the sample size that is to be
used. In situations where there are severe constraints on resources,
this will often require trade-offs to be made concerning various
factors which influence the quality with which the survey can be
carried out.
Second, the procedures for coding and data entry will depend to a
great extent on the types of response required of the questions in
the data collection instruments.
The largest part of the data collection costs is often caused by
the coding, entry, and verification of the data. Careful thought
must therefore be given to the establishment of consistent coding
schemes that are easy to apply and that cover the potential
responses and different instances of missing data in an exhaustive
and mutually exclusive way. It is important that there are enough
personnel and enough technical resources in order to complete
the entering and cleaning of the data in a timely fashion. What is
especially important is that coders are well trained and that there is
a head-coder to whom queries can be directed and who can decide
what to do when there are problems with the coding.
• The development of procedures for the analytical treatment and
reporting of deviations from the quality standards.
Data management and quality control
Preparation of a codebook
1. Datafiles, records, and variables
Data are stored in computers in the form of units called datafiles.
In general terms, a datafile can be described as a collection of
related information. For example, a datafile can contain a number
that identifies each member of a sample of students and gives the
student responses for each item of an achievement test, and, in
addition, provides descriptive background information for each
student. Each datafile is referenced by a unique filename.
Most statistical data analysis systems can read and process raw
datafiles. The user of these systems must “tell the system” in
which location and in which format the data have been written. To
simplify this process, many statistical data analysis systems employ
their own system file format in which the data and all the technical
information concerning the file structure, the data format, and the
coding schemes are integrated. However, these system files can
usually only be used with a specific software system and therefore
are often not suitable for data transfer between different software
systems.
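"Telling the system" where and how the data are written amounts to supplying a column layout for each fixed-format record. The sketch below is illustrative only (the variable names and column positions are hypothetical, not taken from an actual SACMEQ codebook):

```python
# Hypothetical layout: variable name with 1-based start and end
# columns in each fixed-format record, as a codebook would specify.
LAYOUT = [("IDSCHOOL", 1, 3), ("IDSTUD", 4, 6), ("Q1", 7, 7)]

def parse_record(line):
    """Cut one raw-data line into named fields according to LAYOUT."""
    return {name: line[start - 1:end] for name, start, end in LAYOUT}
```

Statistical packages perform exactly this step internally when reading a raw datafile with a supplied format description.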
Errors often occur during the entry of data into a computer when
a number of variables have possible values within the same range
and, at the same time, appear in a sequence or are coded in a
continuous string. Because such variables take values within the
same range, there is a greater potential for undetected column
shifts.
To guard against this type of error it is often useful to insert, at
certain positions in the datafile, variables for which a certain fixed
value (for example, a blank space) must be specified. Similarly, it is
often useful to introduce variables that indicate the participation
status of the respondent or that indicate reasons for excluding a
respondent from the assessment. Variables that do not represent
data from the respondents but that are introduced for checking
purposes are usually referred to as control variables.
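A simple check based on such control variables can be sketched as follows (the positions and the fixed blank value are invented for illustration): if a column shift occurs in the surrounding data, one of the fixed positions will almost always be disturbed.

```python
# Hypothetical control fields: 1-based record positions that must
# contain a fixed value (here a blank space).
CONTROL_POSITIONS = {4: " ", 8: " "}

def record_aligned(record):
    """True if every control position holds its required value,
    i.e. no column shift is detected in this record."""
    return all(record[position - 1] == expected
               for position, expected in CONTROL_POSITIONS.items())
```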
4. The preparation of a codebook
The following discussion illustrates the preparation of a codebook
for a short hypothetical questionnaire.
Elements in codebook
1. Codebook information for the school
identification code
• Variable Name: Each variable must be identified by a unique
variable name. In this example the school identification
variable has been given the name IDSCHOOL.
• Variable Type: The type of coding that is used for the variable
must now be defined. Usually a distinction is made between
alphanumeric variables which are treated as categorical
data and open-ended numerical codes which are treated as
numbers. Sometimes a distinction is also made between different
types of numerical codes. Identification variables always have
categorical codes, but we can choose between an alpha or a
numeric data representation.
• Variable Length and Recording Positions: The number of
digits (including decimal places) which are required to code
the data values of this variable and the positions in the datafile
must then be specified. Since the datafile starts with the school
identification code, this code will be placed in columns 1-3 of the
raw datafile.
• Number of Decimal Places: Where decimals are used in data
codes it is necessary to specify how many decimal places
are used. For the school identification code there will be no
decimal places.
• Instrument Location: The codebook should also tell the
coders about the location of information in the data collection
instruments. For example, the coders should be informed that
they will find school identification codes in the headers of
assigned questionnaires.
• Variable Label: A brief descriptive label should be assigned
to the variable that can help later users of the programme to
remember what the short variable name stands for.
• Coding Scheme: For categorical variables it is necessary to
specify the code for each possible category. In addition, for
all types of variables it is necessary to specify the codes
associated with frequently occurring special values (such as
missing, not administered, not reached, etc.).
• Range Validation Criteria: It is often useful to specify a valid
range for the variable that determines which data values
the user is allowed to enter into the computer. Such range
validation criteria may take the form of a simple set of allowed
codes or they may have a complex structure, relating the codes
to responses to other questions or the responses of other
respondents.
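For readers who want a concrete picture, the elements listed above can be gathered into a single record per variable. The following Python sketch is purely illustrative (WinDEM keeps this information in its electronic codebook); the field names, and the missing codes 999/998 for a three-digit variable, are our own assumptions.

```python
# A hedged sketch: one codebook entry per variable, mirroring the
# elements described above. Field names and missing codes are
# illustrative assumptions, not the WinDEM internal format.
IDSCHOOL = {
    "name": "IDSCHOOL",                  # unique variable name
    "type": "N",                         # "C" categorical, "N" numeric
    "length": 3,                         # digits, including decimal places
    "decimals": 0,                       # no decimal places for an ID
    "positions": (1, 3),                 # columns 1-3 of the raw datafile
    "location": "Questionnaire header",  # where coders find the value
    "label": "School identification code",
    "missing": 999,                      # assumed missing/non-response code
    "not_administered": 998,             # assumed "not administered" code
    "valid_range": (1, 150),             # range validation criterion
}

def describe(entry):
    """Return a one-line summary of a codebook entry."""
    return f"{entry['name']} ({entry['type']}, len {entry['length']}): {entry['label']}"
```

A data entry program can read such an entry to know how many columns to reserve, which codes to accept, and what label to show later users.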
Module 10 Data preparation and management
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2 3 4
(b) Lunch 1 2 3 4
(c) Evening meal 1 2 3 4
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3
Sometimes................................... 2
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2
11 to 50 books ........................... 3
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1
(b) TV 0 1
(c) Table to Write on 0 1
(d) Bicycle 0 1
(e) Electricity 0 1
(f) Running Water 0 1
(g) Daily Newspaper 0 1
• The fifth column (Code R: Recode) presents the codes for the
responses, and the recodes for variables where recoding is
necessary and not covered by the general notes on recoding.
Whenever actual numerical data are supplied in response to a
question, this is indicated by the keyword “VALUE”. The missing
code presented in the codebook indicates “missing/non-response”
values; the “not administered” code indicates “not
administered” values;
For each data file which you create with the WinDEM programme,
the programme maintains an electronic codebook which contains
all technical information required to define the file structure, the
coding scheme, the data verification rules, and quality standards
for the datafile. Whenever variables are modified, the programme
updates the electronic codebook automatically.
File construction
1. Specifying a filename
In order to create a new datafile, the programme will first ask you
to give your datafile an alphanumeric name with a length of up to 8
characters, for example, SAMPLE1.
A display as shown in Figure 3 will appear where you can fill in the
variable definitions in the codebook fields:
a. Essential information
The following pieces of information are essential for the
definition of a variable.
Unique Variable Name: Each variable must be identified
by a unique variable name. We will start with the school
identification code which is presented in the header of the
questionnaire. We have given it the name “IDSCHOOL”, so
you would enter “IDSCHOOL” into the first blank field.
Variable Type: The next question asks about the type of coding
that is used for the variable. The letter “C” indicates categorical
variables with a fixed set of alphanumeric or numeric
categories. The letter “N” indicates non-categorical variables
with open-ended numerical codes. Although there is a fixed
number of schools, and therefore only a fixed set of possible
school identification values, the number of possible values is
very large and can be treated as quasi-open-ended, so you
should enter “N” into the second blank field.
Variable Length: Afterwards you need to specify the number
of digits (including decimal places) which are required to code
the data values of this variable. Assuming that, in our example,
there are 150 schools the identification codes of which are the
numbers 1 to 150, we can use a three-digit code to identify the
schools, so you would enter “3” into the codebook field for the
length.
Decimals: Afterwards you can specify the number of decimal
places to be used in the codes. In the school identification code
there are no decimal places, so you would leave the “0” in this
codebook field which is the default value and go to the next
codebook field.
Location in Instrument: The next piece of information tells
the coders where, in the data collection instruments, they will
find the question used as the source of information.
The data entry manager software system
c. Adding variables
Having completed the definition of the variable, the
programme will bring you back to the tabular display where
you can review your definitions or add new variables. In the
following discussion you will find two more examples of
the preparation of variables in the electronic codebook. The
definition of the student identification code is similar to the
definition of the school identification code, except that a five-
digit number will be used. You would enter “IDSTUD” for the
variable name, “N” for the variable type, “5” for the length, “0”
for the number of decimals, “Student ID” for the instrument
location, and “Student identification code” for the variable
label. Then you could enter “99999” for the “missing” data code
and “99998” for the “not administered” data code.
For each valid code, you will find one row displayed in a
small window. In the blank fields on the left hand side of this
window you should enter the codes, and in the blank fields
on the right hand side you should enter the meaning of the
codes, which are referred to as the value labels. For the code
“1” you would enter “boy” and for the code “2” you would
enter “girl” (the codes and value labels should be based on the
questionnaire presented in Figure 1).
For the variable class you should select “D” to indicate that
this question refers to the student’s description. Since the
remaining questions in this questionnaire also refer to the
students’ description, you should select “D” for the variable
class of the remaining variables as well.
EXERCISE 1:
Complete defining all the variables in SAMPLE1 based on the
questionnaire (Figure 1) and the printed codebook (Figure 2). You
should have the following screen (Figure 6) when you finish:
If you define no categories of missing data, or too few, you may
end up with severe problems in the data analyses. For example, to
calculate the percentage of correct answers for an item in a reading
test you may want to assume that students who omitted an item
could not answer it, and therefore score the omission as wrong.
However, it would be unfair to score as wrong those items which
were not administered to the student because they were, for
example, misprinted in the student booklet. If the coders do not
assign different codes to each of these instances, then you will not
be able to make that distinction in the data analyses.
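The consequence of keeping these codes distinct can be shown with a small Python sketch. The codes are hypothetical (1 = correct, 0 = wrong, 9 = omitted, 8 = not administered), chosen only to illustrate the scoring rule described above.

```python
# Sketch of why distinct missing codes matter when scoring a test item.
# Hypothetical single-digit codes:
#   1 = correct, 0 = wrong, 9 = omitted, 8 = not administered.
OMITTED, NOT_ADMINISTERED = 9, 8

def percent_correct(responses):
    """Score omitted items as wrong; exclude not-administered items."""
    administered = [r for r in responses if r != NOT_ADMINISTERED]
    if not administered:
        return None
    correct = sum(1 for r in administered if r == 1)
    return 100.0 * correct / len(administered)

# Six students; one student's booklet was misprinted (code 8).
item = [1, 0, 9, 1, 8, 9]
```

Here the not-administered response is excluded from the denominator, while the omitted responses are scored as wrong; collapsing the two codes into one would change the result.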
1. Key requirements
The codes for missing data need to represent the different instances
of missing data exhaustively. This means that each code in the
datafiles should either represent a valid data value or one of the
missing codes. There should never be a situation where a position in
the datafile is just left blank. There should also never be a situation
where there is no data from the respondent but none of the missing
codes applies.
Finally, it should be clear how the missing codes are coded in the
datafile and how the different instances of missing data are treated
in the data analyses.
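A minimal sketch of this exhaustiveness rule in Python (the column values and codes are hypothetical):

```python
# Sketch of the exhaustiveness requirement described above: every entry
# in a column must be either a valid value or one of the missing codes,
# and a position must never simply be left blank.
def check_exhaustive(values, valid, missing_codes):
    """Return the positions whose value is neither valid nor a missing code."""
    problems = []
    for i, v in enumerate(values):
        if v is None or v == "":  # blank positions are never allowed
            problems.append(i)
        elif v not in valid and v not in missing_codes:
            problems.append(i)
    return problems

# A hypothetical one-digit column: 1 = boy, 2 = girl, 9 missing, 8 not administered.
sex = [1, 2, "", 3, 9]
bad = check_exhaustive(sex, valid={1, 2}, missing_codes={8, 9})
```

Position 2 is blank and position 3 holds an undefined code, so both would be reported.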
a. Missing/omit
“Missing/omit” codes refer to questions/items which a
respondent should have answered but either did not answer or
answered in an invalid way (though sometimes a finer
distinction between these categories may be required). Some
obvious reasons for assigning this code:
No Response: Where there was no response to a question or an
item where there should be one.
Two or More Responses: Where there were two or more
responses when only one answer was allowed.
Response Unreadable: Where the response was unreadable or
uninterpretable. Often the codes “9”, “99”, or “999” (depending
on the length of the variable) are assigned to this type of
missing data to distinguish it from valid and “not applicable”
data.
Sometimes a further distinction is required between questions
that were omitted by a respondent and questions that were
answered in an invalid way, but the analytical distinctions then
become very complicated.
b. Not administered
“Not administered” codes are assigned when data were not
collected for an observation on a specific variable. There are
some obvious cases when this code should be used:
Respondent Not Present: For example, if a student was not
present in a particular testing session, then all variables
referring to that session should be coded as “not
administered”. However, if the student received the instrument
but did not answer particular questions, then those questions
must be coded as “missing”.
Data entry
Once the data have been returned from the respondents the data
need to be recorded in computer readable form. This section
provides an overview of different approaches to data entry and then
discusses two approaches to data entry in a more detailed way.
103103042 83941991019110
103103051124232130110110
103104063 92221241000110
As you can see, each record starts with the School ID (103),
followed by the Student ID (10304), the student’s sex (the 2
indicates a girl), the student’s age (8 years), and so on until all
variables in the codebook have been coded.
If, by mistake, a coder skips a code or enters a code twice, then all
subsequent codes in the datafile will be shifted and thus change
their implied meaning in the datafile:
Incorrect: 10310304283941991019110
Correct: 103103042 83941991019110
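The effect of such a shift can be sketched in Python. The column layout used here (school in columns 1-3, student in 4-8, sex in 9, age in 10-11) follows the record just described; the interpretation of the remaining columns is an assumption.

```python
# Sketch of fixed-width decoding for the record described above.
# Column layout (assumed): school 1-3, student 4-8, sex 9, age 10-11.
def decode(record):
    return {
        "IDSCHOOL": record[0:3],
        "IDSTUD":   record[3:8],
        "SSEX":     record[8:9],
        "AGE":      record[9:11],
    }

correct   = "103103042 83941991019110"
incorrect = "10310304283941991019110"   # the blank before the age was skipped
```

In the incorrect record the skipped blank makes the age read as “83”, and every subsequent code is shifted by one position, silently changing its meaning.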
This approach also does not allow you to verify, during data
entry, whether the entered data values indeed conform to the
specifications in the codebook:
In this example the position for the student’s sex contains the value
“3”, which is outside the set of permitted values (“1” for “boy”, “2”
for “girl”, and “8” and “9” for the missing codes) and is obviously a
coding error. Besides losing the information for this student, such
an error, if undetected, also affects the results of statistical analyses.
Furthermore, such an approach does not allow the data to be verified
for internal consistency while they are entered:
In this example, the school identification code does not match the
first three digits of the student identification code, even though a
hierarchical identification system was used. When entering the data
into a text file, the coder might not notice this mistake. While this
problem would be impossible to resolve once the original data
collection instruments are no longer available, a computer-controlled
data entry programme could verify the student and school
identification codes for internal consistency during data entry,
alerting the coder that either the student or the school identification
code contains an error and asking the coder to immediately check
this information against the original data collection instruments.
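The internal-consistency rule described here amounts to a one-line check. A sketch in Python, assuming the hierarchical scheme in which the first three digits of IDSTUD repeat IDSCHOOL:

```python
# Sketch of the hierarchical-identification consistency check: with a
# hierarchical ID system, the first three digits of the five-digit
# student code must equal the three-digit school code.
def ids_consistent(idschool, idstud):
    """Return True when the student code embeds the given school code."""
    return idstud[:3] == idschool
```

A data entry program can run this check as soon as both codes have been typed, rather than after the instruments are out of reach.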
000 respondents. All datafiles created are fully compatible with the
dBASE IV™ standard.
With the FILE menu you can open, create, delete or sort a datafile.
As you have seen earlier in this module, you can use this menu
also to edit the electronic codebook which is associated with
each datafile and which contains all information about the file
structure and the coding schemes employed. Furthermore, you
can use this menu to print the electronic codebook or to transform
the information in the electronic codebook into SAS™, SPSS™,
or OSIRIS/IDAMS™ control statements which you can use later
in order to convert the datafiles into SAS™, SPSS™, or OSIRIS/
IDAMS™ system files. Finally this menu allows you to exit the
WinDEM programme.
With the EDIT menu you can enter, modify, or delete data in
datafiles. You can look at a datafile in two different ways: (a) in
record view, you can view the data for one record at a time with
detailed information on each of the variables; (b) in table view, you
can view a datafile as a whole in tabular form with records shown
as rows and variables shown as columns. The programme will
control the processing of the data entered, interrupting and alerting
you when data values fail to meet the range validation criteria which
are specified in the electronic codebook.
With the SEARCH menu you can search for specific records using
your own search criteria or locate a record with a known record
number.
With the SUBSET menu, you can define a subset of specific records
using your own criteria. This will then restrict your view of the data
to the records which match these criteria.
You can use the TOOLS menu to back-up data from the hard disk
onto diskettes or to restore data from the backup diskettes in case
the data on your hard disk has been damaged. You can further use
this menu to configure the programme to your specific hardware
environment.
WinDEM
This would require the following steps:
After selecting the datafile, the programme will bring you to the
EDIT menu, where you can choose to look at the datafile in two
different ways, in record view or in table view. The difference
between the two displays is that record view will provide you
with a detailed display of one record at a time, whereas table view
will provide you with a tabular overview of several records of the
datafile at the same time.
1. Entering data
The record view: Suppose we choose the Record view item
from the EDIT menu.
When you start editing in Record view, you will see some useful
information on the screen (Figure 7).
• fields filled with default codes (“999” and “99999”) for the
identification variables (IDSCHOOL and IDSTUD respectively).
You may enter data in the fields filled with default codes. You can go
to the previous variable with the [↑] key or to the next variable
with the [↓] key, provided that the variable in which the cursor is
currently positioned has a valid code.
Note that the programme will only allow you to enter values
which match the criteria that you have specified in the
codebook. For example, if you enter the code “3” for the variable
SSEX (student’s sex), the programme will reject this value. This is
because we have defined only the codes “1” for “boy”, “2” for
“girl”, “9” for “missing”, and “8” for “not administered”.
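A computer-controlled entry check of this kind reduces to a membership test against the defined codes. A minimal Python sketch using the SSEX codes given above:

```python
# Sketch of an entry-time range check: only the codes defined in the
# codebook for SSEX are accepted (1 boy, 2 girl, 9 missing,
# 8 not administered); anything else is rejected at the keyboard.
VALID_SSEX = {"1", "2", "8", "9"}

def accept_ssex(value):
    """Return True if the entered value may be stored, False to reject it."""
    return value in VALID_SSEX
```

On rejection, a real entry program would keep the cursor in the field until a permitted code is typed or confirmed.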
EXERCISE 2:
Enter the data from the questionnaires filled in by 10 students,
shown in Figure 9.
Case 1
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2 3v 4
(b) Lunch 1 2 3 4
(c) Evening meal 1 2 3 4v
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3
Sometimes................................... 2
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2
11 to 50 books ........................... 3
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1v
(b) TV 0v 1
(c) Table to Write on 0 1v
(d) Bicycle 0 1
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0v 1
Case 2
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2v 3 4
(b) Lunch 1 2 3 4v
(c) Evening meal 1 2 3 4v
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3 v
Sometimes................................... 2
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2 v
11 to 50 books ........................... 3
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1v
(b) TV 0 1v
(c) Table to Write on 0 1v
(d) Bicycle 0v 1
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0 1v
Case 3
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2 3 4v
(b) Lunch 1 2 3 4v
(c) Evening meal 1 2 3 4v
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3
Sometimes................................... 2 v
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1 v
1 to 10 books ............................. 2
11 to 50 books ........................... 3
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1v
(b) TV 0v 1v
(c) Table to Write on 0 1v
(d) Bicycle 0v 1
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0v 1
Case 4
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2 3v 4
(b) Lunch 1 2 3 4v
(c) Evening meal 1 2 3 4v
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1
No .............................................. 2 v If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3
Sometimes................................... 2 v
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1 v
1 to 10 books ............................. 2
11 to 50 books ........................... 3
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0v 1
(b) TV 0v 1
(c) Table to Write on 0v 1
(d) Bicycle 0v 1
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0v 1
Case 5
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2v 3 4
(b) Lunch 1 2 3 4v
(c) Evening meal 1 2 3v 4
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3
Sometimes................................... 2 v
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2
11 to 50 books ........................... 3
More than 50 books ................. 4 v
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1v
(b) TV 0 1v
(c) Table to Write on 0 1v
(d) Bicycle 0 1v
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0 1v
Case 6
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2v 3 4
(b) Lunch 1 2 3 4v
(c) Evening meal 1 2v 3 4
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3
Sometimes................................... 2
Never............................................ 1 v
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2 v
11 to 50 books ........................... 3
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1v
(b) TV 0v 1
(c) Table to Write on 0v 1
(d) Bicycle 0v 1
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0v 1
Case 7
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2 3v 4
(b) Lunch 1 2 3v 4
(c) Evening meal 1 2 3v 4
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3
Sometimes................................... 2 v
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2 v
11 to 50 books ........................... 3
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0v 1
(b) TV 0v 1
(c) Table to Write on 0v 1
(d) Bicycle 0v 1
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0v 1
Case 8
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2 3v 4
(b) Lunch 1 2 3 4v
(c) Evening meal 1 2 3v 4
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3 v
Sometimes................................... 2
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2
11 to 50 books ........................... 3
More than 50 books ................. 4 v
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1v
(b) TV 0v 1
(c) Table to Write on 0 1v
(d) Bicycle 0v 1
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0 1v
Case 9
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2 3 4v
(b) Lunch 1 2 3 4v
(c) Evening meal 1 2 3 4v
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3 v
Sometimes................................... 2
Never............................................ 1
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2
11 to 50 books ........................... 3 v
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1v
(b) TV 0 1v
(c) Table to Write on 0 1v
(d) Bicycle 0 1
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0 1v
Case 10
3. How often do you eat each of the following meals? (Tick one number on each line)
Not 1 or 2 times 3 or 4 times Every
at all a week a week day
(a) Morning meal 1 2v 3 4
(b) Lunch 1 2 3 4v
(c) Evening meal 1 2 3 4v
4. Are there any books where you live that you could read which are not your
school books? (Tick one number)
Yes .............................................. 1 v
No .............................................. 2 If “No”, go to question 6.
5. If “Yes”, how often do you read these books? (Tick one number)
Always .......................................... 3
Sometimes................................... 2
Never............................................ 1 v
6. If “Yes”, how many books are there in your home? (Tick one number)
None............................................. 1
1 to 10 books ............................. 2
11 to 50 books ........................... 3 v
More than 50 books ................. 4
7. Do you have the following things in your home? (Tick one number on each line)
Do not have this Have one or more
(a) Radio 0 1v
(b) TV 0 1v
(c) Table to Write on 0 1v
(d) Bicycle 0 1v
(e) Electricity 0 1v
(f) Running Water 0 1v
(g) Daily Newspaper 0 1v
The amount of work involved in resolving these problems, often
called “data cleaning”, can be greatly reduced by using well-designed
instruments, qualified field administration and coding personnel,
and appropriate transcription mechanisms. The steps that must
be undertaken to verify the data are implied in the quality
standards that have been defined for the corresponding survey.
Procedures must be implemented for checking invalid, incorrect
and inconsistent data; these may range from simple deterministic
univariate range checks to multivariate contingency tests between
different variables and different respondents. The criteria on which
these checks are based depend, on the one hand, on the variable type
(i.e. different checks may apply to data variables, control variables,
and identification variables) and, on the other hand, on the manner
and sequence in which questions are asked. For some questions a
certain number of responses is required, or responses must be
given in a particular way, owing to a dependency or logical
relationship between questions.
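The two ends of this spectrum can be sketched in Python: a univariate range check, and a simple between-question consistency check based on the skip pattern of the Figure 1 questionnaire (the code 2 for “No” on question 4 and the single-digit “not administered” code 8 follow this module; treating the skip this way is our assumption).

```python
# Sketch of two kinds of data verification checks mentioned above.

def range_check(value, allowed):
    """Univariate check: is the value among the permitted codes?"""
    return value in allowed

def skip_pattern_check(q4_books, q5_reads):
    """Between-question check for the Figure 1 skip pattern.

    If Q4 ("any books at home?") is 2 ("No"), the respondent was told to
    skip Q5, so Q5 should carry the "not administered" code 8 (assumed).
    """
    if q4_books == 2:
        return q5_reads == 8
    return True
```

Multivariate contingency tests generalise the second function to arbitrary combinations of variables and respondents.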
Data verification
2. Special recodings
Sometimes it is necessary to recode certain variables before they
can be used in the data analyses. Examples of such situations are:
3. Value validation
Background questions and test items, for which a fixed set of codes
rather than open-ended values applies, need to be checked against
the range validation criteria defined in the codebook. Variables with
open-ended values (e.g. “Student age”) need to be checked against
theoretical ranges.
Some questions are asked and/or coded in terms of more than one
variable. The data verification rules applicable to such variables then
depend on whether the variables are related to each other and on
whether open-ended codes or a fixed set of codes were used.
Further problems arise when missing data are not properly
distinguished from “zero” values. For example, suppose that a
questionnaire for school principals contains a question asking for the
enrolment of boys and girls. If the school principal of an all-girls
school leaves out the question asking for the boys’ enrolment,
implying that the omission means “zero”, then the coder might enter
a missing code for this question into the datafile, which is
misleading. An extra data verification step (in this case, one that
cross-checks the variables for boys’ and girls’ enrolments) then
needs to be applied in order to detect these problems and to avoid a
distortion of the corresponding sample estimates.
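A sketch of such a cross-check in Python, with a hypothetical missing code for an open-ended enrolment count:

```python
# Sketch of the cross-check suggested above: when one enrolment figure
# is missing while the other is valid, the missing value may actually
# mean "zero", so the record should be flagged for review.
MISSING = -9   # hypothetical missing code for an open-ended count

def flag_enrolment(boys, girls):
    """Flag records where exactly one of the two figures is missing."""
    return (boys == MISSING) != (girls == MISSING)
```

A flagged record would then be checked back against the original questionnaire before any sample estimates are computed.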
Such response formats make it impossible to distinguish between
cases where a student did not check any of the response options
because none of the options applied and cases where a student
omitted the whole question. However, in the analysis it is important
to know whether a respondent omitted the whole question or simply
did not check a particular response option. It is best to avoid such
problems by not using such response formats.
1. Unique ID check
There must be only one record within a file for each unit of analysis
surveyed. This verification procedure checks whether each record
has been allocated a unique identification code.
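A unique-ID check of this kind can be sketched in a few lines of Python:

```python
# Sketch of the unique-ID check: every record must carry an
# identification code that occurs exactly once in the file.
from collections import Counter

def duplicate_ids(ids):
    """Return, sorted, the identification codes that occur more than once."""
    return sorted(code for code, n in Counter(ids).items() if n > 1)
```

Any code returned here points at records that must be reconciled against the original instruments before analysis.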
2. Column check
When a series of similar variables exists in a file, it is possible
that the data enterer skips a variable or enters a variable twice, so
that a column shift occurs. This can be avoided by introducing
variables at regular positions in the codebook, into which the data
entry personnel must enter a blank value. In order to be recognized
by the automatic checking routines of the WinDEM program, the
names of these variables must have the prefix “CHECK”. A column
shift should not occur if the data enterers followed these directions
for entering the blank values. You can also verify that the data entry
proceeded correctly by looking at the “Table entry” option in the
“View” menu.
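The logic behind the “CHECK” variables can be illustrated as follows. This is a sketch of the idea, not WinDEM’s own routine: any non-blank value in a variable whose name starts with “CHECK” signals that the entered values may have shifted by a column.

```python
def column_shift_suspects(records, blank=""):
    """Flag records in which a CHECK-prefixed variable contains a
    non-blank value, indicating a possible column shift during entry."""
    suspects = []
    for i, rec in enumerate(records):
        for name, value in rec.items():
            if name.startswith("CHECK") and value != blank:
                suspects.append((i, name))
    return suspects

rows = [
    {"VAR1": "3", "CHECK1": "", "VAR2": "7"},
    {"VAR1": "3", "CHECK1": "7", "VAR2": ""},  # values shifted one column
]
print(column_shift_suspects(rows))  # -> [(1, 'CHECK1')]
```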
3. Validation check
As mentioned before, WinDEM ensures that values are within
the range specified in the structure file unless the data enterer
explicitly confirms the out-of-range values entered. This validation
check will show all the variables, for all the cases, that have
been “confirmed” to contain out-of-range values. This can be
especially useful when many data enterers are involved in the survey study.
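A range check of this kind can be sketched as below, with a hypothetical codebook that maps each variable name to its permitted range (the names “GRADE” and “AGE” are made up for illustration):

```python
def out_of_range(records, ranges):
    """List (record index, variable, value) triples for values that fall
    outside the range specified in the codebook."""
    problems = []
    for i, rec in enumerate(records):
        for name, (low, high) in ranges.items():
            value = rec.get(name)
            # Missing values are handled by a separate check.
            if value is not None and not low <= value <= high:
                problems.append((i, name, value))
    return problems

ranges = {"GRADE": (1, 7), "AGE": (5, 20)}
data = [{"GRADE": 6, "AGE": 12}, {"GRADE": 9, "AGE": 14}]
print(out_of_range(data, ranges))  # -> [(1, 'GRADE', 9)]
```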
4. Merge check
WinDEM allows you to check the consistency between related
datafiles. This check detects records in a datafile that do not have
matches in a related datafile for a higher level of data aggregation.
For example, consider a survey in which data are collected from
students and from the principals of the schools in which the
students are enrolled. In such a case, the student data could be
recorded in a student datafile with the name “student.DBF”, and the
data from the school principals could be recorded in a school
datafile with the name “school.DBF”. To check whether each student
in the student datafile has a matching school principal in the school
datafile, the school identification code “IDSCHOOL” must exist in
both the student datafile and the school datafile.
Using “Merge check” from the “Verify” menu, you can select the
variables (or variable combinations) by which the records in the
selected data file are matched against the records in the higher-
level aggregated data file. The software will ask you to specify the
datafile against which to check the merge of the current datafile in
the “File Open Dialog”.
The program will notify you if any errors are found and ask whether
you want to open the data verification report for further details.
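The merge check described above boils down to verifying that every key in the lower-level file has a match in the higher-level file. A minimal sketch of that idea (not WinDEM itself), using the “IDSCHOOL” code from the example:

```python
def merge_check(lower, higher, key="IDSCHOOL"):
    """Return records of the lower-level file (e.g. students) whose key
    has no match in the higher-level file (e.g. schools)."""
    valid = {rec[key] for rec in higher}
    return [rec for rec in lower if rec[key] not in valid]

schools = [{"IDSCHOOL": 1}, {"IDSCHOOL": 2}]
students = [
    {"IDSTUD": 101, "IDSCHOOL": 1},
    {"IDSTUD": 102, "IDSCHOOL": 3},  # no matching school record
]
print(merge_check(students, schools))
# -> [{'IDSTUD': 102, 'IDSCHOOL': 3}]
```

Unmatched records usually point either to a miskeyed identification code or to a missing questionnaire at the higher level.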
The format of the data after the data entry and data verification
processes have been completed is often not the best format for
use in data analyses. In order to manipulate, analyze and report
the information collected in a convenient and efficient way, the
data needs to be organized in a database system. Such a database
system is a structured aggregation of data-elements which can be
related and linked to each other through specified relationships.
The data-elements can then be accessed through specified criteria
and through a set of transaction operations, which are usually
implemented through a data-retrieval language. In such a database
system the links between the physical data stored in the computer,
their conceptual representation, and the views of the users on the
data are implemented through a database management system.
The database system ensures that: (a) information is stored with as
little redundancy as possible; (b) data are stored in a way which is
independent of the application and the storage is independent of
the users’ view on the data; (c) inconsistencies between different
datafiles are avoided; and (d) data can be stored centrally and be
shared and controlled by a single security system.
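The idea of storing each level once and linking levels through a data-retrieval language can be sketched with SQLite (the table layout, school name, and scores below are invented for illustration): school information is stored a single time and joined to student records through “IDSCHOOL”, avoiding redundancy and inconsistency between files.

```python
import sqlite3

# Two logical entities, one per level of data aggregation,
# linked through the school identification code.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE school (IDSCHOOL INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE student (
    IDSTUD INTEGER PRIMARY KEY,
    IDSCHOOL INTEGER REFERENCES school(IDSCHOOL),
    score REAL)""")
con.execute("INSERT INTO school VALUES (1, 'Hillside Primary')")
con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(101, 1, 512.0), (102, 1, 498.5)])

# The data-retrieval language (here SQL) links the levels on demand;
# the school name is stored once, not repeated in every student record.
rows = con.execute("""SELECT s.IDSTUD, sc.name, s.score
                      FROM student AS s
                      JOIN school AS sc USING (IDSCHOOL)""").fetchall()
print(rows)
# -> [(101, 'Hillside Primary', 512.0), (102, 'Hillside Primary', 498.5)]
```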
For the purpose of database construction all data need first to
be organized in logical entities such that the data-elements are
logically de-coupled especially with respect to different levels of
data aggregation. Different conceptual data-models are used in
the design of database systems which are associated with different
functional tasks.
If the primary focus of the researcher is data analysis, then the use
of statistical packages with integrated data management capabilities
is often valuable. For this purpose the Statistical Analysis System
(SAS) is currently the most promising candidate. It allows the
researcher to generate and link data structures and to programme
data analysis requests without requiring software development
expertise.
8. Conclusion
The careful planning and implementation of data management are
essential to obtain accurate and valid survey results and to avoid
delays in survey administration. Computing staff should therefore
be consulted from the very beginning of a research project.
This module has shown that data management issues are relevant,
and must be planned, during almost all phases of a research project:
starting from the design of the data collection instruments and
the development of the coding schemes, through the design of the
data collection methods and field administration procedures, the
setting of quality standards, and the data entry and data verification,
and finishing with database construction. To implement each of
these steps, various technologies are available, and it is the task of
the researcher to decide which procedures are most appropriate for
the survey design given administrative, logistical, and economic
constraints.
Since 1992 UNESCO’s International Institute for Educational Planning (IIEP) has been
working with Ministries of Education in Southern and Eastern Africa in order to undertake
integrated research and training activities that will expand opportunities for educational
planners to gain the technical skills required for monitoring and evaluating the quality
of basic education, and to generate information that can be used by decision-makers to
plan and improve the quality of education. These activities have been conducted under
the auspices of the Southern and Eastern Africa Consortium for Monitoring Educational
Quality (SACMEQ).
In 2004 SACMEQ was awarded the prestigious Jan Amos Comenius Medal in recognition
of its “outstanding achievements in the field of educational research, capacity building,
and innovation”.
These modules were prepared by IIEP staff and consultants to be used in training
workshops presented for the National Research Coordinators who are responsible for
SACMEQ’s educational policy research programme. All modules may be found on two
Internet Websites: http://www.sacmeq.org and http://www.unesco.org/iiep.