Introduction
Philosophy is concerned with "rational thinking about ... the general nature of
the world (metaphysics or theory of existence), the justification of belief (epistemology
or theory of knowledge) and the conduct of life (ethics or theory of value)"
(Honderich, 1995, p. 666). In education and language testing we are concerned
with questions of ontology (what we believe to be true), epistemology (how we
discover what is true), and the consequences of testing (the nature of ethical prac-
tice). This chapter will focus primarily on questions of ontology and epistemology,
as ethics is dealt with separately in Chapter 93. Furthermore, while general agreement
among language testers exists on key ethical principles to guide our practice,
there are radical differences of views regarding ontological and epistemological
questions.
As far as epistemology is concerned, the question usually boils down to: "Should
the human sciences emulate the methods of the natural sciences or should they
develop their own?" (Polkinghorne, 1983, p. 15). Realists (heirs to Hobbes, Mill,
and Comte, who believe in the existence of what we observe and test independently
of the observer or tester) give special place to the scientific method. Antirealists,
on the other hand, usually hold that the constructs we claim to test are not
independent of the language tester or the act of testing. The so-called objects of
our observation exist only in relation to our interpretations of them as they are
locally constructed. They would argue with Dilthey (1883/2008) that the richness
of human experience and culture cannot be captured by methods developed for
the natural sciences. Of particular importance in language testing is the social
turn, which brings critical analysis to test use and impact. There is much room
for disagreement here. Paradigm clashes are not unusual in the social sciences, but
in language testing the fault lines are more pronounced because, for most of its
history, it has been firmly grounded in the scientific realism of early quantitative
approaches: "One of the most important objects of measurement ... is to obtain a
general knowledge of the capacities of man by sinking shafts, as it were, at a few
critical points" (Cattell & Galton, 1890, p. 380). In this chapter I set out the realist
and antirealist positions, realizing that there are many gradations between the two.
I argue that extreme positions on the cline are untenable. I make a case for realism
in the pragmatist tradition, which is not to be associated with the naive realism
that is the target of constructivism. I also recognize the role for critical research,
especially where language testing is misused or abused. I conclude by proposing
an optimistic view of the future within an Enlightenment-inspired framework.
The Companion to Language Assessment, First Edition. Edited by Antony John Kunnan.
2014 John Wiley & Sons, Inc. Published 2014 by John Wiley & Sons, Inc.
DOI: 10.1002/9781118411360.wbcla032
Interdisciplinary Themes
I begin by describing the realist position, and then move on to antirealist
stances. With Bachman (2006, pp. 196–7), I distinguish two kinds of antirealist
stance, the constructivist and the operationalist, although I prefer to call the
latter instrumentalist for reasons that will become clear, and because Kane (2006b,
p. 442) explicitly distances his approach to validation from the operationalist posi-
tion. I then discuss two key issues upon which language testers are in fundamental
disagreement because of their philosophical positions. I then briefly indicate the
research each position generates, and outline the challenges they face. Finally, I
suggest a way forward based on classical pragmatism.
Conceptualizations
Realism
Realists hold to the Enlightenment view that the scientific method is the most
productive in empirical research (whether quantitative or qualitative), as expressed
by Popper (1959, p. 3):
The applicability of realism to social sciences has also been championed by edu-
cationalists such as Dewey, for whom
the scientific method is simply the method of experimental enquiry combined with
free and full discussion, which means, in the case of social problems, the maximum
use of the capacities of citizens for proposing courses of action, for testing them, and
for evaluating the results. (Putnam, 1990, p. 190)
Theories and evidence that provide the basis for decision making need to be
assessed using generally accepted criteria. In language testing, four have been
suggested (Fulcher & Davidson, 2007, p. 20):
It is argued that these criteria are paradigm free and can be used in theory
and model evaluation of any kind. However, the logic of the key criterion of
testability assumes an evidential approach to validation, which in turn presupposes
that the evidence exists. It seems reasonable that a researcher in any evidence-based
discipline must subscribe to this notion, encapsulated in this
summary of Hume's position: "He holds that objects that have real existence
must have duration and must be independent of what we individually think
about them" (Meyers, 2006, p. 63). In order to test theories we must have experiences
of enduring objects, events, or states that co-occur to a degree that would
minimally allow us to make statements about the likelihood of, and possible
reasons for, co-occurrence.
In language testing this leads to two claims. First, that individuals have a
stable language competence and capacity for use that endures for some time
even though it is subject to change (through learning or attrition), and that
responses to test items or tasks can be translated into numbers that are indexical
of that competence. This is not to deny that communication is a social act, but
recognizes that, unless an individual has an enduring performable competence,
they cannot engage in anything like the co-construction of discourse (Fulcher,
2003, pp. 19–20). Second, that score meaning can be generalized and extrapolated
to relevant domains for a reasonable period of time, and with a known
degree of probability: our theory makes predictions about the likelihood of
future events.
Language testing has, for the most part, relied on realist assumptions
throughout its history, partly because it has been largely dependent upon the
normative practices in measurement that Quetelet imported into social science
research from astronomy in the creation of his social physics (1842/1962, p.
9); and, as Hamp-Lyons (2000, p. 582) has argued, The early history of lan-
guage testing on the American side of the Atlantic is part of the larger story of
intelligence testing, which was firmly grounded in positivism. This observa-
tion is largely correct, even if the geographical claim and the reference to posi-
tivism are not. First, there had always been an interest in measurement in the
United Kingdom (Edgeworth, 1888, 1890), and in 1923 Ballard (1923, p. 29)
could write
The British Press refers to mental tests as though they were new things invented by
Americans. In point of fact they are neither new nor American. They have been the
common property of the race since the dawn of history.
Ballard cites research by Cyril Burt, as well as the adaptation of the Binet tests.
Second, the label positivism is now typically used pejoratively, and with less
specificity than it deserves. Most researchers who hold a realist position do not
hold positivist views or espouse the verifiability principle (Jordan, 2004, p. 32).
Such a position is nominalist, and therefore profoundly antirealist. In arguing that
only verifiable statements are meaningful, and that only words which refer to
observables are capable of verification, all theoretical words are rendered unintelligible
(Devitt & Sterelny, 1987, pp. 189–90). Without theoretical language, scientific
research programs are unattainable; this is why positivism is referred to as
the linguistic turn in philosophy.
Constructivism
Constructivism (or social constructionism) is a postmodern approach that does
not ask about truth, but wishes to uncover the historical and cultural reasons that
led to the currently dominant version of truth. This may take the form of decon-
structing text where no form (particularly scientific) has any special status
(Derrida), or uncovering the power structures that are claimed to marginalize
people while legitimizing the power of the elite (Foucault). Constructivists hold
that our tests and what they measure are contingent upon the social context in
which they are designed and used.
All shades of constructivism are therefore critical, and the basic assumptions
are laid out by Hacking (1999):
seen as the mechanism through which the elite exercise power and maintain their
position (Foucault, 1975, pp. 184–94). Questions of inductive inference are irrelevant,
because all knowledges are equal in value; facts do not help to build,
support, or undermine theories, for the facts emerge only in the context of some
point of view (Fish, 1995, p. 253). The ultimate statement of this extreme position
was provided by Nietzsche (1888, §604):
Interpretation, the introduction of meaning not explanation (in most cases a new
interpretation over an old interpretation that has become incomprehensible, that is
now itself only a sign). There are no facts, everything is in flux, incomprehensible,
elusive; what is relatively most enduring is our opinions.
Instrumentalism
Although I have classed instrumentalism as antirealist, it may be more appropri-
ate to call it nonrealist, because instrumentalists hold that, if a test assists in useful
decision making, that is really all that matters. For instrumentalists the issue of
whether the terms of theories refer to any real entity is simply irrelevant. They
accept Hume's fork, and hold that nondeductive (subjective) inference is always
subject to question and error. One argument for instrumentalism is provided by
Laudan (1981a) in his critique of realism, in which he uses historical evidence to
undermine the premise that successful theories have terms that refer. For example,
atomic theory failed to be empirically successful for hundreds of years, while the
miasmatic theory of disease transmission was: it led to policies of moving people
away from ports and introducing quarantine. Thus, theories are evaluated prima-
rily on the grounds of the degree to which they enable us to predict phenomena
and manipulate our environment in useful ways, as we can never be certain that
our terms refer.
Each of the three positions described above has had an impact on
language testing, leading to incommensurable stances that are explored in the next
section.
I have selected two themes for discussion. My rationale is that these best illustrate
fault lines that are directly related to philosophical beliefs.
Constructs/Theoretical Terms
Bachman (2006, pp. 182–3) writes: When a researcher observes some phenome-
non in the real world, he generally does this because he wants to describe, induce
or explain something on the basis of this observation. That something is what can
be called a construct. These are nonobservable abstract nouns that are opera-
tionalized in such a way that we may make inferences about them from our
observations (Fulcher & Davidson, 2007, p. 7). Realists minimally subscribe to the
reality of these nonobservables.
This is very close to a correspondence theory of truth, the natural home of the
realist. Models of communicative competence/language ability, from Oller's use
of Spearman's g to modern componential approaches, rest on an assumption
that the terms of the theory refer to real competences that are not merely useful
fictions.
Some researchers explicitly work within this paradigm rather than just assume
it to be the case:
We argue that the validity of any given teaching, learning, and assessment task
(whether it is representative, authentic, and generalizable) is just a more complex
version of the problem of determining whether a representation of a given state of
affairs is true or not. We provide two logical arguments. Both of them show the
construal (production and interpretation) of surface forms of discourse in order to
represent faithfully (and truthfully) certain changing states of affairs in the real world
is the necessary and sufficient basis for any validity to be found in any teaching,
learning, and assessment tasks whatever. (Badon, Oller, Yan, & Oller, 2005, p. 2)
Badon et al. argue that the validity of a test of aviation English can be evaluated
on the grounds of whether or not language used by pilots, air traffic controllers,
and test takers represents a true state of affairs in the real world. The facts of real
world events must be encoded into recognized conventional signs (linguistic
realizations). Based on Oller's theory of pragmatic mapping, the validity question
becomes whether the construct to be measured exists, and whether variation in
scores is causally linked to variations in the construct. It is therefore necessary to
develop tasks which require test takers to refer to objects and events in the real
world, and use language to control and change events.
The data-based approach to scale development, with its careful analysis of
language use in context, but relating observable variables to constructs such as
discourse management and pragmatics, would sit comfortably within this
kind of interpretation (Fulcher, Davidson, & Kemp, 2011). For this reason we add
the further observation that realist approaches do not abandon context. Rather,
social actions, and realia found in actual contexts of discourse. While codes, contexts,
and interactions must be distinguished in theory, in practice they interact holistically.
(Badon et al., 2005, p. 1)
For realists, context is real, not constructed, and so, while it is important to maintain
a connection between the world and conventional signs, realists must also take
seriously implicature and illocutionary intent.
Some would go further and argue that the term construct needs to be distin-
guished from trait, as the former implies that the theoretical term is a construc-
tion of the researcher: It may be part of a nomological net, but does not refer. That
is, construct theorists are said to really be constructivists with a scientific air about
them. For example, they may admit that a number of models could fit their data,
and the theoretical terms could vary by model. In contrast, Blackburn (2005, p.
118) describes a real realist, an industrial strength, meat-eating realist as someone
who holds that (a) there are no such things as constructs, only traits, which refer
to properties that exist in the real world, are discovered not created, and exist
independently of the researcher or theories, and (b) the terms define the properties
in ways that are not contingent. This position is best represented by Borsboom
and colleagues, who argue:
Validity in this formulation is equivalent to the existence of what the test meas-
ures, and goes back to the strongest scientific claims for testing made in the 19th
and early 20th centuries. The argument is that the measurement procedure can be
used to find out about the attributes to which it refers only if this ontological
claim holds (Borsboom, 2005, p. 152).
Constructivism is incommensurable with all shades of realism. Constructivists
challenge the primary claim that there are facts or traits in the real world that exist
independently of the mind of the researcher or test taker. The world itself is con-
structed. The trail of the human serpent is everywhere.
Do language testers deal with facts or things that exist? McNamara argues
that they do not. He represents a trend in language-testing research that focuses
upon the social nature of language testing, and the dependency of all concepts
and communication on locally situated interaction:
The constructs have no existence in the external world, and their conventional
names are signs constructed for social (primarily political) purposes. More specifically,
tests play a critical role in the power struggles that constitute identity-forming
social life, and may be deconstructed using Foucaultian insights (Shohamy,
2001, pp. 204, 548). The proper focus of attention is the social construction of
tests, their social impact, and role in policy. Construct labels no longer refer, reduc-
ing them to the embodiment of the values and ideologies at play in the power
struggles of the day.
As a direct consequence, the role of cognition is downplayed in critiques of
validity theories, and the link between performance (observation) and compe-
tence (construct) abolished. Using the notion of performativity from feminist
poststructuralism, McNamara also suggests:
So
to claims using the Toulmin model as the basis for an interpretive argument (see
Figure 85.1).
The evidence leads to a score generated by scoring rules (the application of a
scoring rubric), and an inference is made from the score to the claim. It is impor-
tant to note that this is done without the need for a construct inference such as
the student's fluency.
The procedures for constructing and evaluating interpretative arguments are
generic, but adapted to the specific claims of each assessment context (Kane, 2010,
p. 79). Constructing and challenging arguments has an analogy in the courtroom
where, "If the procedures have not been followed correctly or if the procedures
themselves are clearly inadequate, the interpretive argument would be effectively
overturned" (Kane, 2006a, p. 29). The role of the prosecution is to undermine the
defence's argument with alternative explanations of the data. The argument of
utility for an intended purpose is all that we are able to evaluate.
Neither the real realists nor the constructivists are keen on instrumentalism.
For the former it does away with the all-important traits (Borsboom, 2006a,
p. 431). For the latter it is too concerned with individual cognition (McNamara
& Roever, 2006). But this does not matter to instrumentalists, because they
accept both critiques: we need pluralism so that we have a range of approaches
to solve different problems (Kane, 2006b). If it seems useful, instrumentalists go
with it.
realists wish to abolish social impact and consequences from validity discussions
completely:
However, other realists do not agree. Badon et al. (2005, pp. 9–10) argue that, if a
test can be shown to measure a trait that is critical to aviation communication,
and if teaching this trait reduces miscommunication and hence aviation accidents,
this would (a) constitute evidence of validity, and (b) have a positive social
consequence.
Clearly, this is not likely to be enough for constructivists. McNamara and
Roever (2006, pp. 2050251), for example, describe Borsboom's version of realism
as an attempt to strip validity theory of its concern for values and consequences
and to take the field back 80 years to the view that a test is valid if it measures
what it purports to measure. They quote Shohamy with approval:
The ease with which tests have become so accepted and admired by all those who
are affected by them is remarkable. How can tests persist in being so powerful, so
influential, so domineering and play such enormous roles in our society? One answer
to this question is that tests have become symbols of power for both individuals and
society. Based on Bourdieu's . . . notion of symbolic power, [we] will examine the
symbolic power and ideology of tests and the specific mechanisms that society
invented to enhance such symbolic power. (Shohamy, 2001, p. 117)
Current Research
Realism
Much of the research in designing assessments for specific purposes is generally
realist. We have seen that this is the case with aviation English, arguably one of
the highest stakes uses of tests. It seems unlikely that stakeholders would wish to
use a test that the designers claimed did not measure constructs/traits of interest
because they did not exist. Similarly, the growth of interest in diagnostic testing
(Jang, 2009) and the assessment of language disorders (Oller, 2012) has a strongly
realist flavor. Approaches that employ factor-analytic techniques, particularly
structural equation modeling, make strong realist assumptions about traits (e.g.,
Song, 2008). Work into the design of scoring models also assumes that perform-
ance in domains of interest can be described in terms of relevant generalizable
traits. For example, Fulcher et al. (2011) arrange observable variables from the
analysis of service encounters into clusters under the trait headings of discourse
competence and pragmatic competence. It is assumed that these competen-
cies exist, and that they are manifested through their associated observable vari-
ables. Most current test development activity also takes place within a realist
framework (Mislevy & Yin, 2012).
Constructivism
Constructivist research takes a number of forms. One trend is the description
of language use, particularly investigating locally co-constructed interaction
between participants in speaking tests (e.g., Brooks, 2009). Another area of interest
is the description and assessment of second language pragmatics (Roever, 2011).
There is always a strong fairness agenda in constructivist writing, with advocacy
for those who are marginalized. This can be combined with test analysis tech-
niques such as differential item functioning to discover if tests discriminate against
subgroups (McNamara & Roever, 2006). Where constructivists excel is in carrying
out case studies of the social use of tests, unmasking policy agendas behind test
use, and investigating the construction of identities through competing discourses
(Shohamy, 2001). Constructivist research in this vein helps maintain the conscience
of the field by asking difficult questions about contingent constructed ideas.
As constructivists are inherently distrustful of tests and the motivations of their
developers, there is little research into constructivist test development. The one
exception is dynamic assessment (DA). Set within a sociocultural theoretical
framework, DA uses assessment to scaffold language acquisition, and so is con-
cerned with change (Fulcher, 2010, pp. 727). As each use of DA is considered a
unique encounter, the preferred method of research is the individual case study,
which cannot be generalized to any other case (Lantolf & Poehner, 2011).
Instrumentalism
Research within this tradition is concerned with establishing and following appro-
priate procedures, because reports of what was done count as validity evidence
Challenges
Realism
Realism needs strong testable theories, which, as is generally acknowledged even by
real realists, do not exist in psychology or language testing (Borsboom, 2006b,
pp. 464–5). Closely related to this problem is the fact that traits in language
testing are not separate from the individuals in whom we posit their existence;
even if we can claim that traits like discourse competence or fluency really
exist, separating out their effect on measures is simply not as easy as in the natural
sciences. Perhaps the most intransigent problem in all social science research is
that the researcher interacts with and changes the subjects of the research, both as
a result of the research methods, and by naming traits (value labels in Messick's
terms). In short, there is a genuine problem not only with reference but also with
defining and operationalizing traits (Fulcher, 2010, pp. 324), and this may be the
most significant reason why social science theories have not led to research programs
that are as successful as those in the natural sciences.
Constructivism
The first problem is that constructivist research is ideologically driven. Those
committed to a Foucaultian reading of the use of tests will see evidence of struggle
and marginalization in any data they collect. In principle, there is no data that
could falsify a priori beliefs. The second problem is concerned with what is con-
structed. Hacking (1999) argues that constructivism is useful as a tool to investi-
gate ideas that are abstractions of observables and reified within a matrix of
facts and relations. In language testing, such an idea would be the native
speaker (Davies, 2003). Individual native speakers exist, and are not problematic.
We manage to classify them accurately despite dialects and idiolects. But once we
extract the idea of the native speaker it becomes a political, social, and prob-
lematic thing; and we know that it is used for political purposes, including in
some cases weaving it into a matrix that relates it to territory and citizenship.
However, critical social tools are not appropriate for the analysis of objects in the
real world, theoretical terms, or elevator words like knowledge or reality.
We do not construct people, trees, quarks, or (in the case of elevator words) eve-
rything. That would be to reduce the world to mere mental states (without indi-
viduals in which to reside).
Philosophy and Language Testing 13
Instrumentalism
The only test of success in instrumentalism is the utility of a belief, practice, or
test to improving life and furthering our projects. While engagement with data
is important, it is accepted that all our theories are underdetermined, and hence
no single explanation is true. This does not matter, however, as long as we
have an assessment process that proves to be useful for making decisions with
reasonable accuracy. Perhaps the major criticism to be directed at instrumental-
ism is its lack of ambition. It has given up on the larger questions of truth (just
what is the nature and structure of language knowledge and ability for use in a
specified domain?) in return for a purely epistemological solution to a practical
problem.
This is not a new problem for instrumentalism, and neither is the standard
response. Dewey (1912) argues that truth is wrapped up with the notion of social
credit, or what works to improve the human condition:
I should say that as method for philosophy it indicated a more severe intellectual
conscience; less free and easy use of the concept of Truth in general and more careful
use of truths in particular to designate such conceptions and propositions as have
emerged successfully from the test conditions that are practically appropriate.
(Dewey, 1912, p. 80)
Future Directions
Bachman (2006, p. 200) correctly suggests that many studies do not succeed in
clearly combining philosophical approaches. We should add that frequently they
do not articulate their own philosophical assumptions, and some are internally
incoherent. Even when they do articulate assumptions there can be less clarity
than is sometimes required. This is the case, for example, in Fulcher and Davidson
(2007), where there is some sliding between classical and modern pragmatism,
which has led some readers to (mistakenly) assume that the text has a postmodern
agenda. Researchers also need to be aware that while some combining is possible
there are areas where assumptions are incommensurable. It is a disservice to the
field to paper over the fault lines, for it is only in disagreement and healthy debate
that progress is made (Mill, 1859/1998, p. 25).
The first important question for the future relates to the nature of our con-
structs/traits. Unless there is some general consensus, it appears that the field will
follow three separate agendas. I will start by making explicit what is implicit in
the preceding discussion: that the constructivist position is both confused and
untenable in this respect. If everything is constructed and contingent, from proc-
esses to traits, our project is lost from the start.
The rest of the problem may be tackled by recourse to classical pragmatism.
Pragmatism was defined by Peirce in Baldwin's dictionary (1902/1998, p. 300) as:
What kind of being has it? What does its reality consist in? Why it consists in some-
thing being true of something else that has a more primary mode of substantiality.
Here we have, I believe, the materials for a good definition of abstraction. (1903,
p. 134)
In the case of fluency, the abstraction consists of a set of primary substances (in
Peirce's terms), which may include features such as speed of delivery, pausing
(for content planning at syntactically appropriate slots), hesitating (causing syntactic
disjunct), and so on. Peirce continues to a definition: "An abstraction is a
substance whose being consists in the truth of some proposition concerning
a more primary substance" (1903, p. 135). If the categories of fluency described
in Fulcher (1996) can be observed, and if they vary in ways predicted (North, 2007,
p. 657, found independently that the fluency descriptors were the only consistent
set capable of acting as anchors in the construction of the CEFR), the abstraction
is true, even though its name is conventional. Finally, Peirce (1903, p. 134) insists
that "reality can mean nothing except the truth of statements in which the real thing is
asserted." According to this treatment it is arguably the case that fluency is a
trait that has the property of being real (although it is questionable how real it
remains if reductionist strategies are employed for the sake of automated scoring
or research, as in the case of Bernstein, Van Moere, & Cheng, 2010, p. 362), just as
hardness and weight are real because of their practical consequences.
The pragmatist strategy therefore avoids the need for a strong correspondence
theory of truth that is required by the real realists on the one hand, while incor-
porating the instrumentalist arguments supported by relevant empirical data on
the other. It steers a course between extremes, incorporating the advantages of
each, while mitigating the challenges.
Research agendas within such a framework could lead to substantive validation
programs. This would have practical consequences; as Laudan (1981b, p. 145)
says: "the aim of science is to secure theories with a high problem-solving effectiveness,"
and language testing is a problem-solving activity.
The second way forward is to re-engage with a progressive Enlightenment
agenda that incorporates consideration of consequences, but without ideological
baggage. All fields evolve, and for the most part advances are made through incre-
mental theory building, empirical research, and conceptual development. Theory
in natural sciences evolves as well, and each stage has allowed humans to manipu-
late their environment in predictable and successful ways in order to achieve more
than had previously been possible. This is also true of language testing and the
validation process. Karl Popper referred to this as verisimilitude, or the approxima-
tion of a theory to truth. Peirce (1877/1998, p. 155) held a similar view:
This great law is embodied in the conception of truth and reality. The opinion that
is fated to be ultimately agreed to by all who investigate, is what we mean by the
truth, and the object represented in this opinion is the real. That is the way I would
explain reality.
progress would be endless. Scientific inquiry does not lead to the discovery of
Truth with a capital T, but makes genuine progress by not being wrong. A better
language-testing future cannot be built on a static or ideological view of society,
individuals, or trait definitions. It needs an optimistic agenda of expanding our
knowledge, and learning how to build better tests in the service of meritocratic
and just decision making.
SEE ALSO: Chapter 31, Assessing Test Takers with Communication Disorders;
Chapter 46, Defining Constructs and Assessment Design; Chapter 86, Cognition
and Language Assessment; Chapter 93, The Influence of Ethics in Language
Assessment
References
Lantolf, J. P., & Poehner, M. E. (2011). Dynamic assessment in the classroom: Vygotskian
praxis for second language development. Language Teaching Research, 15(1), 11–33.
Laudan, L. (1981a). A confutation of convergent realism. Philosophy of Science, 48(1),
19–49.
Laudan, L. (1981b). A problem-solving approach to scientific progress. In I. Hacking (Ed.),
Scientific revolutions (pp. 144–55). Oxford, England: Oxford University Press.
McNamara, T. (2001). Language assessment as social practice: Challenges for research.
Language Testing, 18(4), 333–49.
McNamara, T. (2006). Validity and values: Inferences and generalizability in language
testing. In M. Chalhoub-Deville (Ed.), Inference and generalizability in applied linguistics:
Multiple perspectives (pp. 27–45). Amsterdam, Netherlands: John Benjamins.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. London:
Blackwell.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–103). New
York, NY: Macmillan/American Council on Education.
Meyers, R. G. (2006). Understanding empiricism. Chesham, England: Acumen.
Mill, J. S. (1859/1998). On liberty. In J. Gray (Ed.), John Stuart Mill's On liberty and other
essays (pp. 5–128). Oxford, England: Oxford University Press.
Mislevy, R., & Yin, C. (2012). Evidence-centered design in language testing. In G. Fulcher
& F. Davidson (Eds.), The Routledge handbook of language testing (pp. 208–22). London,
England: Routledge.
Nietzsche, F. (1888). The will to power. Book 3: Principles of a new evaluation. Retrieved October
25, 2012 from http://evans-experientialism.freewebspace.com/nietzsche_wtp03.htm
North, B. (2007). The CEFR illustrative descriptive scales. Modern Language Journal, 91,
656–9.
Oller, J. W. (2012). Language assessment for communication disorders. In G. Fulcher & F.
Davidson (Eds.), The Routledge handbook of language testing (pp. 150–61). London,
England: Routledge.
Peirce, C. S. (1877/1998). The fixation of belief. In E. C. Moore (Ed.), The essential writings
of Charles S. Peirce (pp. 120–36). New York, NY: Prometheus Books.
Peirce, C. S. (1902/1998). Some contributions to Baldwin's dictionary. In E. C. Moore (Ed.),
The essential writings of Charles S. Peirce (pp. 300–13). New York, NY: Prometheus Books.
Peirce, C. S. (1903). Pragmatism as a principle and method of right thinking: The 1903 Harvard
Lectures on Pragmatism (P. A. Turrisi, Ed.). New York, NY: State University of New York
Press.
Pennycook, A. (2001). Critical applied linguistics: An introduction. Mahwah, NJ: Erlbaum.
Phillipson, R. (1988). Linguicism: Structures and ideologies in linguistic imperialism. In
J. Cummins & T. Skutnabb-Kangas (Eds.), Minority education: From shame to struggle (pp.
339–58). Clevedon, England: Multilingual Matters.
Polkinghorne, D. (1983). Methodology for the human sciences: Systems of inquiry. Albany, NY:
State University of New York Press.
Popper, K. (1959). The logic of scientific discovery. London, England: Routledge.
Putnam, H. (1990). A reconsideration of Deweyan democracy. Southern California Law
Review, 63, 1671–97. (Reprinted in Goodman, R. B. (Ed.). (1995). Pragmatism: A
contemporary reader [pp. 183–204]. London, England: Routledge).
Quetelet, A. (1842/1962). A treatise on man and the development of his faculties. New York, NY:
Burt Franklin.
Roever, C. (2011). Testing of second language pragmatics: Past and future. Language Testing,
28(4), 463–81.
Rorty, R. (1989). The contingency of language. In R. B. Goodman (Ed.), Pragmatism (pp.
107–23). New York, NY: Routledge.
Shohamy, E. (2001). The power of tests: A critical perspective on the uses of language tests.
London, England: Longman.
Song, M.-Y. (2008). Do divisible subskills exist in second language (L2) comprehension?
A structural equation modeling approach. Language Testing, 25(3), 435–64.
Toulmin, S. E. (2003). The uses of argument (2nd ed.). Cambridge, England: Cambridge
University Press.
Suggested Readings
Baggini, J., & Fosl, P. S. (2003). The philosopher's toolkit: A compendium of philosophical concepts
and methods. Malden, MA: Blackwell.
Blackburn, S., & Simmons, K. (Eds.). (1999). Truth. Oxford Readings in Philosophy. Oxford,
England: Oxford University Press.
Kenny, A. (2006). An illustrated history of Western philosophy (2nd ed.). London, England:
Blackwell.
Philosophy Bites (n.d.). Home page. Retrieved October 25, 2012 from
http://www.philosophybites.com/