
DESIGNING LANGUAGE TEST

Testing is a universal feature of social life. Throughout history people have been put to the test to prove their capabilities or to establish their credentials; this is the stuff of Homeric epic, of Arthurian legend.
There are many reasons for developing a critical understanding of the
principles and practice of language assessment. Obviously you will need to do so
if you are actually responsible for language test development and claim expertise
in this field. But many other people working in the field of language study more
generally will want to be able to participate as necessary in the discourse of this
field, for a number of reasons.
First, language tests play a powerful role in many people’s lives, acting as
gateways at important transitional moments in education, in employment, and in
moving from one country to another. Since language tests are devices for the
institutional control of individuals, it is clearly important that they should be
understood, and subjected to scrutiny. Secondly, you may be working with
language tests in your professional life as a teacher or administrator, teaching to a
test, administering tests, or relying on information from tests to make decisions on
the placement of students on particular courses.
Finally, if you are conducting research in language study you may need to
have measures of the language proficiency of your subjects. For this you need
either to choose an appropriate existing language test or design your own.
Thus, an understanding of language testing is relevant both for those
actually involved in creating language tests, and also more generally for those
involved in using tests or the information they provide, in practical and research
contexts.
TEST TYPES
The Team System testing tools provide several test types that you can use for
your specific software testing purposes. This section describes those test types. It
also describes how to create and customize tests of each type, because these
processes are specific to the type of test.
However, common among the various test types are many test-related
tasks such as managing tests and working with test results. These common tasks
are described in the section Testing Tools Tasks.

In This Section
Working with Unit Tests
Provides links to topics that describe unit tests and how to create them.
Working with Web Tests
Describes how to create, edit, run, and view Web tests.
Working with Load Tests
Describes the uses of load tests, how to edit and run them, how to collect
and store load test performance data, and how to analyze load test runs.
Working with Manual Tests
Describes how to create and run manual tests, the only non-automated test
type.
Working with Generic Tests
Describes how to create and run generic tests. Generic tests wrap external
programs and tests that were not originally developed for use in the Team
System testing tools.
Working with Ordered Tests
Describes how to create ordered tests, which contain other tests that are
meant to be run in a specified order.
TEST SPECIFICATION
The test case specifications should be developed from the test plan and are
the second phase of the test development life cycle. The test specification should
explain "how" to implement the test cases described in the test plan.

Test Specification Items


Each test specification should contain the following items (a minimal example in code follows the list):
• Case No.: The test case number should be a three-digit identifier of the form c.s.t, where c is the chapter number, s is the section number, and t is the test case number.
• Title: is the title of the test.
• ProgName: is the program name containing the test.
• Author: is the person who wrote the test specification.
• Date: is the date of the last revision to the test case.
• Background: (Objectives, Assumptions, References, Success Criteria):
Describes in words how to conduct the test.
• Expected Error(s): Describes any errors expected.
• Reference(s): Lists the reference documentation used to design the specification.
• Data: (Tx Data, Predicted Rx Data): Describes the data flows between
the Implementation Under Test (IUT) and the test engine.
• Script: (Pseudo Code for Coding Tests): Pseudo code (or real code) used
to conduct the test.
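
To make this structure concrete, the sketch below models a specification as a small Python data structure. The class name, field names, and every sample value (case 2.3.1, the program name auth_service, the expected error code, and so on) are invented for illustration and do not come from any actual test plan.

    from dataclasses import dataclass, field

    @dataclass
    class TestSpecification:
        """One test case specification; fields mirror the items listed above."""
        case_no: str                 # c.s.t identifier: chapter.section.test
        title: str
        prog_name: str               # program containing the test
        author: str
        date: str                    # date of the last revision
        background: str              # objectives, assumptions, success criteria
        expected_errors: list = field(default_factory=list)
        references: list = field(default_factory=list)
        tx_data: str = ""            # data sent to the Implementation Under Test (IUT)
        predicted_rx_data: str = ""  # data expected back from the IUT
        script: str = ""             # pseudo code (or real code) used to conduct the test

    # Purely illustrative values, not taken from a real test plan.
    spec = TestSpecification(
        case_no="2.3.1",
        title="Login rejects an empty password",
        prog_name="auth_service",
        author="J. Smith",
        date="2024-01-15",
        background="Verify that the IUT refuses authentication when the password is empty.",
        expected_errors=["ERR_EMPTY_PASSWORD"],
        references=["Test plan, chapter 2, section 3"],
        tx_data="username='demo', password=''",
        predicted_rx_data="401 Unauthorized",
        script="send a login request with an empty password; assert the response code is 401",
    )
    print(spec.case_no, "-", spec.title)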
TEST ITEMS
The table below presents both pros and cons for various test item types. Your selection of item types should be based on the types of outcomes you are trying to assess (see the analysis of your learning situation). Certain item types such as true/false, supplied response, and matching work well for assessing lower-order outcomes (i.e., knowledge or comprehension goals), while other item types such as essays, performance assessments, and some multiple choice questions are better for assessing higher-order outcomes (i.e., analysis, synthesis, or evaluation goals). The bullets below will help you determine the types of outcomes the various items assess.
With your objectives in hand, it may be useful to create a test blueprint
that specifies your outcomes and the types of items you plan to use to assess those
outcomes. Further, test items are often weighted by difficulty. On your test
blueprint, you may wish to assign lower point values to items that assess lower-
order skills (knowledge, comprehension) and higher point values to items that
assess higher-order skills (synthesis, evaluation).
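
As a rough sketch of what such a blueprint might look like in code (the outcomes, item types, and point values below are invented for illustration), each planned item can be paired with the outcome level it targets and a point weight, which makes the deliberate skew toward higher-order skills easy to check:

    # A hypothetical test blueprint: each entry lists an outcome, the cognitive
    # level it targets, the item type chosen to assess it, and a point weight.
    blueprint = [
        {"outcome": "Recall key terminology", "level": "knowledge", "item_type": "true/false", "points": 1},
        {"outcome": "Explain a cause-effect relationship", "level": "comprehension", "item_type": "multiple choice", "points": 2},
        {"outcome": "Apply a formula to new data", "level": "application", "item_type": "supplied response", "points": 3},
        {"outcome": "Evaluate a written argument", "level": "evaluation", "item_type": "essay", "points": 5},
    ]

    # Show how much of the total score each item contributes.
    total = sum(entry["points"] for entry in blueprint)
    for entry in blueprint:
        share = 100 * entry["points"] / total
        print(f"{entry['item_type']:<18} {entry['level']:<14} {entry['points']} pts ({share:.0f}% of test)")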

Multiple Choice (see tips for writing multiple choice questions below)
Pros:
• more answer options (4-5) reduce the chance of guessing that an item is correct
• many items can aid in student comparison and reduce ambiguity
• greatest flexibility in the type of outcome assessed: knowledge goals, application goals, analysis goals, etc.
Cons:
• reading time increased with more answers
• reduces the number of questions that can be presented
• difficult to write four or five reasonable choices
• takes more time to write questions

True/False (see tips for writing true/false questions below)
Pros:
• can present many items at once
• easy to score
• used to assess popular misconceptions, cause-effect reactions
Cons:
• most difficult question to write objectively
• ambiguous terms can confuse many
• few answer options (2) increase the chance of guessing that an item is correct; need many items to overcome this effect

Matching
Pros:
• efficient
• used to assess student understanding of associations, relationships, definitions
Cons:
• difficult to assess higher-order outcomes (i.e., analysis, synthesis, evaluation goals)

Interpretive Exercise (the above three item types are often criticized for assessing only lower-order skills; the interpretive exercise is a way to assess higher-order skills with multiple choice, T/F, and matching items)
Pros:
• a variation on multiple choice, true/false, or matching, the interpretive exercise presents a new map, short reading, or other introductory material that the student must analyze
• tests student ability to apply and transfer prior knowledge to new material
• useful for assessing higher-order skills such as application, analysis, synthesis, and evaluation
Cons:
• hard to design; must locate appropriate introductory material
• students with good reading skills are often at an advantage

Supplied Response
Pros:
• chances of guessing reduced
• measures knowledge and fact outcomes well, including terminology and formulas
Cons:
• scoring is not objective
• can cause difficulty for computer scoring

Essay
Pros:
• less construction time, easier to write
• encourages more appropriate study habits
• measures higher-order outcomes (i.e., analysis, synthesis, or evaluation goals), creative thinking, writing ability
Cons:
• more grading time, hard to score
• can yield a great variety of responses
• not efficient to test large bodies of content
• if you give the student the choice of three or four essay options, you can find out what they know, but not what they don't know

Performance Assessments (includes essays above, along with speeches, demonstrations, presentations, etc.)
Pros:
• measures higher-order outcomes (i.e., analysis, synthesis, or evaluation goals)
Cons:
• labor- and time-intensive
• need to obtain inter-rater reliability when using more than one rater

The table below presents tips for designing two popular item types: multiple
choice questions and true/false questions.

Tips for Writing Multiple Choice Questions
• Avoid responses that are interrelated. One answer should not be similar to others.
• Avoid negatively stated items: "Which of the following is not a method of food irradiation?" It is easy to miss the negative word "not." If you use negatives, bold-face the negative qualifier to ensure people see it.
• Avoid making your correct response different from the other responses, grammatically, in length, or otherwise.
• Avoid the use of "none of the above." When a student guesses "none of the above," you still do not know if they know the correct answer.
• Avoid repeating words from the question stem in your responses. For example, if you use the word "purpose" in the question stem, do not use that same word in only one of the answers, as it will lead people to select that specific response.
• Use plausible, realistic responses.
• Create grammatically parallel items to avoid giving away the correct response. For example, if you have four responses, do not start three of them with verbs and one of them with a noun.
• Always place the "term" in your question stem and the "definition" as one of the response options.

Tips for Writing True/False Questions
• Do not use definitive words such as "only," "none," and "always," which lead people to choose false, or uncertain words such as "might," "can," or "may," which lead people to choose true.
• Do not write negatively stated items, as they are confusing to interpret: "Thomas Jefferson did not write the Declaration of Independence." True or False?
• People have a tendency to choose "true," so design at least 60% of your T/F items to be "false" to further minimize guessing effects.
• Use precise words (100, 20%, half), rather than vague or qualitative language (young, small, many).
• Avoid making the correct answer longer than the others.
SCORING
Scoring rubrics have become a common method for evaluating student work in both K-12 and college classrooms. The purpose of this paper is to describe
the different types of scoring rubrics, explain why scoring rubrics are useful and
provide a process for developing scoring rubrics. This paper concludes with a
description of resources that contain examples of the different types of scoring
rubrics and further guidance in the development process.
What is a scoring rubric?
Scoring rubrics are descriptive scoring schemes that are developed by teachers or
other evaluators to guide the analysis of the products or processes of students'
efforts (Brookhart, 1999). Scoring rubrics are typically employed when a
judgement of quality is required and may be used to evaluate a broad range of
subjects and activities. One common use of scoring rubrics is to guide the
evaluation of writing samples. Judgements concerning the quality of a given
writing sample may vary depending upon the criteria established by the individual
evaluator. One evaluator may heavily weigh the evaluation process upon the
linguistic structure, while another evaluator may be more interested in the
persuasiveness of the argument. A high quality essay is likely to have a
combination of these and other factors. By developing a pre-defined scheme for the evaluation process, the subjectivity involved in evaluating an essay is reduced.
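
As a minimal sketch of such a pre-defined scheme (the criteria, weights, and ratings below are invented for illustration and are not drawn from Brookhart), an analytic rubric can be written as a set of weighted criteria, so that every evaluator applies the same breakdown instead of weighing linguistic structure or persuasiveness ad hoc:

    # Hypothetical analytic rubric for a writing sample: each criterion is rated
    # on a 1-4 scale and carries a weight; the weighted sum is the overall score.
    rubric_weights = {
        "linguistic structure": 0.3,
        "persuasiveness of the argument": 0.4,
        "organization": 0.3,
    }

    def score_essay(ratings):
        """Combine per-criterion ratings (1-4) into a single weighted score."""
        return sum(rubric_weights[criterion] * ratings[criterion] for criterion in rubric_weights)

    # One evaluator's ratings for a single essay (illustrative values).
    print(score_essay({
        "linguistic structure": 3,
        "persuasiveness of the argument": 4,
        "organization": 2,
    }))  # 0.3*3 + 0.4*4 + 0.3*2 = 3.1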
GRADING
Grading is a major concern to both new and experienced instructors. Some
are quite strict at the beginning to prove that they are not pushovers. Others, who
may know their students personally, are quite lenient. Grades cause a lot of stress
for undergraduates; this concern often seems to inhibit enthusiasm for learning for
its own sake (“Do we have to know this for the exam?”), but grades are a fact of
life. They need not be counterproductive educationally if students know what to
expect.
Grades reflect personal philosophy and human psychology, as well as
efforts to measure intellectual progress with objective criteria. Whatever your
personal philosophy about grades, their importance to your students means that
you must make a constant effort to be fair and reasonable and to maintain grading
standards you can defend if challenged.
College courses are supposed to change students; that is, in some way the
students should be different after taking your course. In the grading process you
have to quantify what they have learned and give them feedback, according to some metric, on how much they learned.
The following four philosophies of grading are from instructors at FSU.
Which is closest to your philosophy?

Philosophy 1
Grades are indicators of relative knowledge and skill; that is, a student’s
performance can and should be compared to the performance of other students in
that course. The standard to be used for the grade is the mean or average score of
the class on a test, paper, or project. The grade distribution can be objectively set
by determining the percentage of A’s, B’s, C’s, and D’s that will be awarded.
Outliers (really high or really low) can be awarded grades as seems fit.
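
A minimal sketch of this norm-referenced approach, assuming letter grades are assigned by each score's distance from the class mean (the standard-deviation cutoffs here are invented for illustration; outliers at the tails can still be adjusted by hand, as the philosophy allows):

    from statistics import mean, stdev

    def curve_grades(scores):
        """Assign letter grades by distance from the class mean, in standard deviations."""
        m, s = mean(scores), stdev(scores)
        grades = []
        for score in scores:
            z = (score - m) / s
            if z >= 1.0:
                grades.append("A")
            elif z >= 0.0:
                grades.append("B")
            elif z >= -1.0:
                grades.append("C")
            else:
                grades.append("D")
        return grades

    print(curve_grades([95, 88, 76, 70, 62]))  # ['A', 'B', 'C', 'C', 'D']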

Philosophy 2
Grades are based on preset expectations or criteria. In theory, every
student in the course could get an A if each met the preset expectations. The
grades are usually expressed as the percentage of success achieved (e.g., 90% and above is an A, 80-90% is a B, 70-80% is a C, 60-70% is a D, and below 60% is an F). Pluses and minuses can be worked into this range.
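
This philosophy translates directly into a fixed lookup against the preset cutoffs quoted above; the sketch below simply renders those cutoffs in Python (pluses and minuses are omitted):

    def letter_grade(percentage):
        """Map a percentage score to a letter grade using the preset criteria."""
        if percentage >= 90:
            return "A"
        elif percentage >= 80:
            return "B"
        elif percentage >= 70:
            return "C"
        elif percentage >= 60:
            return "D"
        else:
            return "F"

    # Every student who meets a criterion earns that grade, regardless of the class average.
    print([letter_grade(p) for p in (93, 85, 71, 58)])  # ['A', 'B', 'C', 'F']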

Philosophy 3
Students come into the course with an A, and it is theirs to lose through
poor performance or absence, late papers, etc. With this philosophy the teacher
takes away points, rather than adding them.

Philosophy 4
Grades are subjective assessments of how a student is performing
according to his or her potential. Students who plan to major in a subject should be graded harder than students just taking a course out of general interest.
Therefore, the standard set depends upon student variables and should not be set
in stone.

Instructor’s Personal Philosophy


Florida State University does not have a suggested grading philosophy.
Such decisions remain with the instructors and their departments. However, the
grading system employed ought to be defensible in terms of alignment with the
course objectives, the teaching materials and methods, and departmental policies,
if any.
The grading system as well as the actual evaluation are closely tied to an
instructor’s own personal philosophy regarding teaching. Consistent with this, it may be useful to consider in advance the factors that will influence an instructor’s evaluation of students.
• Some instructors make use of the threat of unannounced quizzes to motivate students, while others do not.
• Some instructors weigh content more heavily than style. It has been suggested that lower (or higher) evaluations should be used as a tool to motivate students.
• Other instructors may use tests diagnostically, administering them during the semester without grades and using them to plan future class activities.
• Extra credit options are sometimes offered when requested by students.
• Some instructors negotiate with students about the method(s) of evaluation, while others do not.
• Class participation may be valued more highly in some classes than in others.
These and other issues directly affect the instructor’s evaluation of students’
performance.
As personal preference is so much a part of the grading and evaluating of
students, a thoughtful examination of one’s own personal philosophy concerning
these issues will be very useful.
