
Introducing Validity

Questions of validity affect our daily lives and how we interact with people and the world around us.
This view of validity presupposes that when we write a test we intend to measure something that
is real, and that validity enquiry is concerned with finding out whether the test actually measures what it is intended to measure.
Three types of validity in early theory:
1. Criterion-oriented validity
The relationship between a particular test and a criterion about which we wish to make predictions.
a. Predictive validity: the term used when test scores are used to predict some future
criterion, such as academic success.
b. Concurrent validity: the term used when test scores are used to predict a criterion measured at the same time as the test is given.
2. Content validity: any attempt to show that the content of the test is a representative sample of
the domain that is to be tested.
3. Construct validity: this requires us to define what a ‘construct’ is. Perhaps the easiest way to understand the term
‘construct’ is to think of the many abstract nouns that we use on a daily basis, but for which it would be
extremely hard to point to an example. Consider these, the first of which we have already touched on.
1. Love
2. Intelligence
3. Anxiety
4. Thoughtfulness
5. Fluency
6. Aptitude
7. Extroversion
8. Timidity
9. Persuasiveness
10. Empathy.
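Criterion-oriented validity (both predictive and concurrent) is conventionally summarized as a validity coefficient: the correlation between test scores and the criterion measure. The following is a minimal Python sketch, assuming a Pearson correlation and paired scores for the same test takers; the function name `validity_coefficient` and the sample scores are hypothetical and do not come from the source.

```python
from math import sqrt
from statistics import mean


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure.

    Assumes the lists are paired: position i in each list belongs to
    the same test taker.
    """
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two paired lists with at least two scores each")
    mean_x, mean_y = mean(test_scores), mean(criterion_scores)
    # Deviation of each score from its own mean
    dx = [x - mean_x for x in test_scores]
    dy = [y - mean_y for y in criterion_scores]
    covariation = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return covariation / spread


# Hypothetical data: entrance test scores and later course grades for five students
test = [12, 15, 20, 22, 25]
later_grades = [50, 55, 65, 70, 80]
r = validity_coefficient(test, later_grades)
print(f"validity coefficient: {r:.2f}")
```

The same calculation serves both sub-types: for predictive validity the criterion (here, the later grades) is collected after the test, while for concurrent validity it is collected at about the same time. A high coefficient is one piece of evidence, not proof, that the test measures what is intended.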

Construct validity and truth


In language teaching and testing, ‘fluency’ and ‘accuracy’ are two well-known constructs. A nomological
network contains, firstly, constructs such as these and, secondly, the observable variables: those things
that we can see and measure directly. We cannot see ‘fluency’ and ‘accuracy’ directly.

Here we will set out Stendhal’s theory of love as if it were a nomological network. Constructs:
1. Passionate Love, ‘like that of Heloïse for Abelard’
2. Mannered Love, ‘where there is no place for anything at all unpleasant – for that would be a breach of
etiquette, of good taste, of delicacy, and so forth’
3. Physical Love, ‘where your love life begins at sixteen’

Write down a number of hypotheses. Stendhal went on to attach certain observable behaviours to each
‘type’ of love. Here are some of them. Which of these observable behaviours do you think Stendhal
thought characterized each type of love?
■ Behaviour always predictable
■ Lack of concentration
■ Always trying to be witty in public
■ Staring at girls
■ Following habits and routines carefully
■ Always very money-conscious
■ Engaging in acts of cruelty
■ Touching.

Validity theory occupies an uncomfortable philosophical space in which the relationship between theory
and evidence is sometimes unclear and messy, because theory is always evolving, and new evidence is
continually collected.

CUTTING THE VALIDITY CAKE


In this view, ‘validity’ is not a property of a test or assessment but the degree to which we are justified in
making an inference to a construct from a test score (for example, whether a score of ‘20’ on a reading test
indicates ‘ability to read first-year business studies texts’), and whether any decisions we might make on
the basis of the score are justifiable (for example, if a student scores below 20, we deny admission to the programme).

Table A1.1 Facets of validity (Messick, 1989: 20)

                       Test interpretation    Test use
Evidential basis       Construct validity     Construct validity + Relevance/utility
Consequential basis    Value implications     Social consequences

There are other ways of cutting the validity cake. For example, Cronbach (1988) includes categories such
as the ‘political perspective’, which looks at the role played by stakeholders in the activity of testing.
Stakeholders include the test designers, teachers, students, score users, governments or any other individual or group
that has an interest in how the scores are used and whether they are useful for a given context.
From the list below, which pieces of information would be most useful for your evaluation of this test?
Rank-order their importance and try to write down how the information would help you to evaluate
validity:
■ analysis of test content
■ teacher assessments of students after placement
■ relationship to end-of-course test
■ analysis of task types
■ spread of scores
■ students’ affective reactions to the test
■ analysis of the syllabus at different class levels
■ test scores for different students already at the school

Test usefulness
Bachman and Palmer (1996: 18) have used the term ‘usefulness’ as a superordinate in place of construct
validity, to include reliability, construct validity, authenticity, interactiveness and practicality. Reliability
is the consistency of test scores across facets of the test. Authenticity is defined as the relationship
between test task characteristics and the characteristics of tasks in the real world. Interactiveness is the
degree to which the individual test taker’s characteristics (language ability, background knowledge and
motivations) are engaged when taking a test. Practicality is concerned with test implementation rather
than the meaning of test scores.
Pragmatic validity
Pragmatic validity depends upon the view that in language testing there is no such thing as
an ‘absolute’ answer to the validity question. The role of the language tester is to collect evidence in
support of a test use and interpretation that the wider community of stakeholders (students, testers,
teachers and society) will accept.

In a pragmatic theory of validity, how would we decide whether an argument was adequate to support
an intended use of a test? Following Peirce (undated: 4–5), the kinds of arguments we construct
in language testing may be evaluated through abduction, or what he later called retroduction: reasoning
to the best available explanation of the observed facts.

In language testing, the most adequate explanation is the one that is most satisfying to the community of
stakeholders, not because of taste or proclivity, but because the argument put forward has the same
characteristics as a successful Sherlock Holmes case. Just as Holmes solves a case by ruling out competing
explanations, the validity method in language testing involves the successful elimination of alternative explanations of the facts.

In order to conduct this kind of validity investigation, a number of criteria have been established by
which we might decide which is the most satisfying explanation of the facts:
1. Simplicity, otherwise known as Ockham’s Razor, which states: ‘Pluralitas non est ponenda sine
necessitate’, translated as: ‘Do not multiply entities unnecessarily.’ In practice this means: the least
complicated explanation of the facts is to be preferred, which means the argument that needs the
fewest causal links, the fewest claims about things existing that we cannot investigate directly, and
that does not require us to speculate well beyond the evidence available.
2. Coherence, or the principle that we prefer an argument that is more in keeping with what we already
know.
3. Testability, or the principle that we prefer an argument that allows us to make predictions about future
actions, behaviour, or relationships between variables that we could investigate.
4. Comprehensiveness, which urges us to prefer the argument that takes account of the most facts and
leaves as little unexplained as possible.

When you have completed this task, you will have discovered why the argument is not adequate,
whereas in the story the argument is adequate, because it meets accepted criteria for the evaluation of
arguments.
We conclude this section by reviewing the key elements of a pragmatic theory of validity:
1. An adequate argument to support the use of a test for a given purpose, and the interpretation of
scores, is ‘true’ if it is acceptable to the community of language testers and stakeholders in open
discussion, through a process of dialogue and disagreement.
2. Disagreement is an essential part of the process in investigating alternative hypotheses and
arguments that would count against an adequate argument.
3. There are criteria for deciding which of many alternative arguments is likely to be the most adequate.
4. The most convincing arguments should start from the end point, the consequences of
testing, and work backwards to test design.
