Tests are used in almost every country for counseling, selection, and placement purposes. They are administered in settings as diverse as schools, the civil service, industry, medical clinics, and counseling centers. Most people have taken dozens of tests and think little of it. Yet by the time the typical individual reaches retirement age, the results of psychological tests have probably helped shape his or her destiny. The life changes brought about by psychological test results may be subtle, as when a future mathematician is admitted to an advanced calculus course on the basis of achievement scores from the first year of high school. More commonly, psychological test results alter an individual's destiny profoundly. Whether a person is accepted at one university and not another, offered one job but rejected for another, or diagnosed as depressed or not: all of these determinations depend, at least in part, on the interpretation of test results by individuals in positions of authority. Put simply, psychological test results change lives. For this reason it is prudent, indeed almost obligatory, for psychology students to learn about the current uses and occasional abuses of testing. Case Example 1-1 illustrates the life changes that result from psychological testing through several samples of real case histories.
reason well: these are the wellsprings of intelligence (Binet and Simon, 1905; as translated in Fancher, 1985).
4. The items were arranged by approximate level of difficulty rather than by content. A preliminary standardization was carried out with 50 normal children aged 3 to 11 years, and also with several subnormal and mentally retarded children.
The 30 tests on the 1905 scale ranged from openly simple sensory tests to fairly complex verbal abstractions. Thus, the scale was appropriate for evaluating the full range of intelligence, from severe mental retardation to the upper levels of giftedness. The complete scale is summarized in Table 1-1.
Except for the very simple tests designed to classify idiots of the lowest grade (a most unfortunate diagnostic term that has since been abandoned), the tests were heavily loaded toward verbal abilities, reflecting Binet's departure from the Galtonian tradition.
An interesting point often overlooked by today's psychology students is that Binet and Simon did not offer a precise method for arriving at a total score on their 1905 scale. It is worth remembering that their purpose was classification, not measurement, and that their motivation was entirely humanitarian: to identify the children who needed placement in special education. By contemporary standards it is difficult to accept the vagueness inherent in such an approach, but this may reflect a modern bias toward quantification more than a weakness of the 1905 scale. In fact, the scale was popular among educators in Paris. Even without precise quantification, the approach succeeded in selecting candidates for special classes.
The idea of a test is, therefore, a dominant element of our culture, a feature we take for granted. However, the layperson's concept of tests does not necessarily coincide with the narrower perspective of a psychometrician (a specialist in psychology or education who develops and evaluates psychological tests). Because of widespread misconceptions about the nature of tests, it is appropriate to begin this topic with a fundamental question that defines the scope of the entire book: What is a test?
The picture we wish to convey applies especially to norm-referenced tests, which use a well-defined population of persons as their interpretive framework. However, the defining characteristics of a test differ somewhat in the special case of criterion-referenced tests, which measure what a person can do rather than comparing results with the performance levels of others. For this reason, criterion-referenced tests are treated separately.
A standardized procedure is an essential feature of any psychological test. A test is considered standardized if the procedures for administering it are uniform from one examiner to another and from one setting to another. Of course, standardization depends to some degree on the competence of the examiner. Even the best test can be useless in the hands of a careless, poorly trained, or ill-informed examiner, as the reader will discover in Topic 2B, The Administration Process. Most examiners, however, are competent. Standardization therefore depends largely on the administration directions found in the instruction manual that typically accompanies a test.
Formulating the directions is an essential step in standardizing a test. To guarantee uniform administration procedures, the test developer must provide comparable stimulus materials to all examinees, must specify with considerable precision the oral instructions for each item or subtest, and must advise the examiner how to handle a variety of questions from the examinee.
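The standardization requirements just listed (identical stimulus materials, scripted instructions, a fixed procedure) can be made concrete with a digit-span test, the memory task commonly used to illustrate them. The Python sketch below is hypothetical throughout: the digit sequences, the instruction wording, the one-digit-per-second pace, and the stop-at-first-failure rule are illustrative assumptions, not the rules of any published test. What it shows is that every examinee receives the same fixed items, in the same order, at the same pace, and is scored by one fixed rule.

```python
from dataclasses import dataclass

# Hypothetical fixed item list: every examinee hears exactly these sequences,
# in this order. The digits are illustrative, not taken from any published test.
ITEMS = ["5-8-2", "6-9-4-1", "4-2-7-3-1", "6-1-9-4-7-3", "5-9-1-7-4-2-8"]

# Hypothetical scripted instruction, read verbatim to every examinee.
INSTRUCTIONS = ("I am going to say some numbers. Listen carefully, and when "
                "I am through, say them right after me.")

# Assumed presentation rate: one digit per second for every examinee.
SECONDS_PER_DIGIT = 1.0


@dataclass
class DigitSpanResult:
    span: int         # length of the longest sequence repeated correctly
    items_given: int  # number of items administered before stopping


def presentation_seconds(item: str) -> float:
    """Time needed to read one item aloud at the fixed rate."""
    return len(item.split("-")) * SECONDS_PER_DIGIT


def score_administration(responses: list[str]) -> DigitSpanResult:
    """Score the examinee's replies to ITEMS, given in order.

    Assumed discontinue rule: testing stops at the first incorrect reply.
    """
    span = 0
    given = 0
    for item, reply in zip(ITEMS, responses):
        given += 1
        if reply.replace(" ", "") == item:
            span = len(item.split("-"))
        else:
            break  # discontinue after the first failed item
    return DigitSpanResult(span=span, items_given=given)
```

For example, `score_administration(["5-8-2", "6-9-4-1", "4-2-7-1-3"])` yields a span of 4 after 3 items, because the third reply transposes two digits. Fixing the items in advance is what rules out an improvised, easy-to-remember sequence.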
To illustrate these points, consider the different ways in which a test developer might approach the assessment of digit span, the maximum number of orally presented digits that a subject can repeat from memory. A nonstandardized digit-span test might simply suggest that the examiner orally present increasingly long series of numbers until the subject fails. The number of digits in the longest series recalled would then be the subject's digit span. Most readers will realize that a test defined so loosely will lack uniformity from one examiner to another. If the examiner is free to improvise any series of digits, what would prevent him or her from presenting, with the familiar inflection of a television announcer, 1-800-325-3535? Such a series would be much easier to remember than a more random set, for example, 7-2-8-1-9-4-6-3-7-4-2. The speed of presentation can also have a crucial effect on the uniformity of a digit-span test. For purposes of
found deficient. This is a lesson for test users: the fact that a test exists and claims to measure a certain characteristic is no guarantee that its claims are true. A test may have an attractive title, precise instructions, elaborate norms, appealing packaging, and preliminary findings, but if, under dispassionate study by independent investigators, it cannot predict the appropriate external behaviors, then it is useless.
The situational tests included group tasks such as carrying equipment across a stream and scaling a 3-m wall, as well as individual tests of the ability to survive a realistic interrogation and to command two uncooperative subordinates in a construction task.
On the basis of behavioral observations and test results, OSS staff rated the candidates on dozens of specific traits, in categories as broad as leadership, social relations, emotional stability, effective intelligence, and physical ability. These ratings were used to select OSS military personnel.
TYPES OF TESTS
their design and purpose, must be administered to one person at a time. An important advantage of individual tests is that the examiner can gauge the subject's level of motivation and assess the influence of other factors (e.g., impulsivity or anxiety) on the test results.
For convenience, tests will be classified into the eight categories shown in Table 2-1. Each category includes norm-referenced, criterion-referenced, individual, and group tests. The reader will note that any typology of tests is a purely arbitrary choice. For example, yet another dichotomy could be posited: tests that seek to measure maximum performance (e.g., an intelligence test) versus those that seek to estimate typical responses (e.g., a personality inventory).
In a strict sense, there are hundreds, perhaps thousands, of different types of tests, each measuring a slightly different aspect of the individual. For example, it could be argued that even two intelligence tests constitute different types of measures. One test might reflect the assumption that intelligence is a biological construct best measured through brain waves, while another might rest on the traditional view that intelligence shows itself in the capacity to learn acculturated skills such as vocabulary. Grouping both measures under the category of intelligence tests is surely an oversimplification, but it is nevertheless a useful starting point.
As seen in the first chapter, intelligence tests were originally designed to sample a wide variety of abilities in order to estimate the individual's general intellectual level. The Binet-Simon scales succeeded in part because they incorporated heterogeneous tasks, including word definitions, memory for designs, comprehension questions, and spatial visualization tasks. The group intelligence tests that flourished so profusely during and after World War I also measured diverse abilities, as shown by the Army Alpha, whose eight sections measured practical judgment, information, arithmetic, and reasoning, among other abilities.
educational institutions frequently use tests to determine students' placement levels, and universities decide whom to admit based partly on test scores. State, federal, and local civil service systems also rely heavily on tests for personnel selection.
Even the independent practitioner uses tests primarily for decision making. Examples include the consulting psychologist who uses a personality test to determine whether a police department hires one candidate rather than another, and the neuropsychologist who uses tests to conclude that a client has suffered brain damage.
But decision making is not the only function of psychological testing. It is useful to distinguish five uses of tests:
Classification.
Diagnosis and treatment planning.
Self-knowledge.
Program evaluation.
Research.
These applications frequently overlap, and at times it is difficult to distinguish one from another. For example, a test that helps establish a psychiatric diagnosis might also provide a form of self-knowledge. These applications are discussed in more detail below.
The term classification encompasses a variety of procedures that share a common purpose: assigning a person to one category rather than another. Of course, assignment to a category is not an end in itself but the basis for some kind of differential treatment. Thus, classification can have important consequences, such as granting or restricting access to a particular university or determining whether a person is hired for a specific job. There are many forms of classification, each emphasizing a particular purpose in assigning people to categories. We will distinguish among placement, screening, certification, and selection.
Placement is the sorting of persons into the different programs appropriate to their needs or abilities. For example, universities often use a mathematics placement exam to determine
Up to this point we have discussed the practical applications of psychological tests to everyday problems such as personnel selection, diagnosis, and program evaluation. In each of these cases, tests serve an immediate, practical purpose: helping the examiner make decisions about people or programs. But tests also play an important role in both the applied and the theoretical branches of behavioral research. As an example of tests in applied research, consider the problem facing neuropsychologists who wish to investigate the hypothesis that low-level lead absorption causes behavioral deficits in children. The only feasible way to explore this hypothesis is to test normal children with lead exposure using a battery of psychological tests. Needleman, Gunnoe, Leviton, Reed, Peresie, Maher, and Barrett (1979) used a combination of traditional and innovative tests to conclude that low-level lead absorption causes decreases in IQ, alterations in reaction time, and progressive increases in undesirable classroom behavior. Their conclusions sparked a tumultuous and bitter exchange of opinions that will not be reviewed here (Needleman et al., 1990). Nevertheless, the passions aroused by this study epitomize an important point: academics and public policymakers respect psychological tests. Why else would they engage in long and acrimonious debates about the validity of test-based research findings?
At times, tests serve a less mundane role by helping scientists investigate theoretical questions that have no immediate or obvious practical application. For example, to study perceptual field dependence, Witkin (1949) devised the Tilting-Room-Tilting-Chair Test (TRTC). The apparatus consists of a boxlike room suspended on ball-bearing pivots so that it can be tilted to any degree to the left or right. Inside the room is a chair for the subject, which can also be tilted independently of the room. The subject's task is to bring his or her body to a position perceived as upright. Field-dependent subjects tend to align their bodies with the room rather than relying on perceived gravity. Field-independent subjects are less affected by the misaligned room and more attuned to their internal perceptual cues; that is, their perceptual judgments are relatively independent of distorted visual information. The TRTC inspired a lifetime of research on personality development but was rarely applied to any practical testing problem.
SUMMARY
NORMS AND RELIABILITY
Typically, the initial result of testing is a raw score, such as the total number of personality statements endorsed in a particular direction or the total number of problems solved correctly, perhaps with bonus points added for quick solutions. In most cases, this initial score is useless by itself. For test results to be meaningful, examiners must be able to convert the raw score into some form of derived score based on comparison with a normative or standardization group. Most tests are interpreted by comparing individual results with the performance of the normative group; criterion-referenced tests, discussed later, are the exception.
A norm group consists of a sample of examinees who are representative of the population for whom the test is intended. Consider a vocabulary test designed for use with prospective first-year college students. In this case, performance data could be collected from a large, heterogeneous, national sample of such persons for standardization purposes. The essential goal of test standardization is to determine the distribution of raw scores in the norm group, so that the test developers can publish derived scores known as norms. As discussed later, norms come in many varieties, for example, percentile ranks, age equivalents, grade equivalents, or standard scores. In general, norms indicate an individual's standing on the test relative to the performance of other persons of the same age, grade, sex, or other characteristics.
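Percentile ranks, the first kind of norm listed above, express a raw score as the percentage of the norm group that scored lower. The Python sketch below assumes one common convention, counting half of any tied scores as "below"; other conventions exist, and the norm-group scores in the usage note are invented for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(raw: float, norm_scores: list[float]) -> float:
    """Percentage of the norm group scoring below `raw`.

    Assumed convention: half of the scores tied with `raw` count as below.
    """
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw)              # scores strictly lower
    ties = bisect_right(ordered, raw) - below      # scores equal to raw
    return 100.0 * (below + 0.5 * ties) / len(ordered)
```

With a hypothetical norm group of `[10, 12, 12, 15, 18, 20, 22, 25, 30, 31]`, a raw score of 15 has 3 scores below it and 1 tie, giving a percentile rank of 35.0.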
To be effective, norms must be obtained with great care and constructed according to well-established precepts discussed later; moreover, they can become outdated in just a few years, so periodic renorming should be the rule rather than the exception (Case Example 3-1). The topic of norms is approached indirectly: the reader is first given an analysis of raw scores, and then the statistical concepts essential to understanding norms are reviewed.
RAW SCORES
FREQUENCY DISTRIBUTIONS
A simple and useful way to summarize data is to tabulate a frequency distribution (Table 3-2), which is prepared by specifying a small number of class intervals of equal size and then counting how many scores fall within each interval. The sum of the frequencies across all intervals equals N, the total number of scores in the sample. There is no simple rule for determining the size of the intervals; obviously, it depends on the number of intervals desired. A frequency distribution commonly has between 5 and 15 class intervals. In Table 3-2 there are 9 class intervals, each spanning 3 score points. The table shows that one professor obtained a score of 4, 5, or 6; 8 professors obtained 7, 8, or 9; and so on.
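The tabulation procedure can be sketched in Python as follows. The function assumes integer scores and equal-width intervals starting at a chosen lower limit (a lower limit of 4 and a width of 3 mirror the layout of Table 3-2); the scores in the usage note are invented, not the data from the table.

```python
from collections import Counter


def frequency_distribution(scores, lowest, width, n_intervals):
    """Count how many integer scores fall in each equal-width class interval.

    Intervals run lowest..lowest+width-1, then the next `width` points, and so
    on. Returns (lower_limit, upper_limit, frequency) tuples, lowest first.
    """
    highest = lowest + width * n_intervals - 1
    for s in scores:
        if not lowest <= s <= highest:
            raise ValueError(f"score {s} falls outside {lowest}-{highest}")
    # Interval index of each score: 0 for the lowest interval, 1 for the next...
    counts = Counter((s - lowest) // width for s in scores)
    table = []
    for i in range(n_intervals):
        lower = lowest + i * width
        table.append((lower, lower + width - 1, counts[i]))
    return table
```

For the hypothetical scores `[5, 7, 8, 9, 9, 11, 12, 13, 14, 14]` with `lowest=4, width=3, n_intervals=4`, the result is `[(4, 6, 1), (7, 9, 4), (10, 12, 2), (13, 15, 3)]`, and the frequencies sum to N = 10. Rejecting out-of-range scores guarantees that this sum always equals N.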
MEASURES OF VARIABILITY
$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$
where $\sum$ means "the sum of," $X$ represents each individual score, $\bar{X}$ is the mean of the scores, and $N$ is the total number of scores. As its name suggests, the variance is a measure of variability. In general, however, psychologists prefer to report the standard deviation, which is computed by taking the square root of the variance. Of course, the variance and the standard deviation convey interchangeable information: each can be calculated from the other, by squaring the standard deviation to obtain the variance or by taking the square root of the variance to obtain the standard deviation.
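The variance formula and its square-root relationship to the standard deviation can be checked numerically with a minimal Python sketch. It divides by N, matching the definition of N as the total number of scores; note that estimates of a population variance from a sample commonly divide by N − 1 instead. The scores in the usage note are arbitrary.

```python
import math


def variance(scores):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n


def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))
```

For `[2, 4, 4, 4, 5, 5, 7, 9]`, the mean is 5, the squared deviations sum to 32, so the variance is 4.0 and the standard deviation is 2.0; squaring the standard deviation recovers the variance.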
NORMAL DISTRIBUTION
A third reason for preferring a normal distribution of test scores is that the normal curve often arises spontaneously in nature. In fact, early investigators were so impressed by the universality of the normal distribution that they enshrined the normal curve as a law of nature. Galton (1888) wrote:
It is the supreme law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshalled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along.
Surely there is no law of nature governing the shape that frequency distributions must take. Nonetheless, many important human characteristics, both physical and mental, closely approximate the normal curve when measurements from large, heterogeneous samples are graphed. For example, a well-known finding is a nearly normal distribution for physical characteristics such as weight, height, and brain size at birth (Jensen, 1980). An approximately normal distribution is also found for numerous mental tests, even those constructed entirely without reference to the normal curve. To illustrate this point, we will
SKEWNESS
Making sense of test results is largely a matter of transforming raw scores into more interpretable and useful forms of information. The preceding discussion of normal distributions hinted at such transformations by showing how knowledge of the mean and standard deviation of these distributions can help determine the relative standing of an individual score. This section continues that theme more directly by presenting the formal requirements for several types of raw-score transformations.
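As a hedged illustration of using the mean and standard deviation to locate an individual score, here is a minimal Python sketch of the z-score transformation and, assuming the scores are normally distributed, its conversion to the percentage of the distribution falling below that score. The mean of 100 and standard deviation of 15 in the usage note are hypothetical values; the specific transformations the text goes on to present may differ in detail.

```python
import math


def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard-deviation units."""
    return (raw - mean) / sd


def normal_percentile(z):
    """Percentage of a normal distribution lying below z (standard normal CDF)."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With a hypothetical mean of 100 and standard deviation of 15, a raw score of 115 gives `z_score(115, 100, 15) == 1.0`, and `normal_percentile(1.0)` is about 84.13, so roughly 84 percent of a normally distributed group would score lower. The conversion to a percentage is only as good as the normality assumption.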
The history of psychological testing is fascinating and highly relevant to current practice. After all, contemporary tests did not emerge from a vacuum; they evolved slowly from a multitude of precursors introduced over the last 100 years. In view of this, this chapter reviews the historical roots of current psychological tests. Topic 1A, The Origins of Psychological Testing, focuses largely on the efforts of European psychologists to measure intelligence during the late 19th century and the era before World War I. These early intelligence tests and their successors often had powerful effects on the individuals who took them, so the first topic also includes a brief digression documenting the importance of psychological test results. Topic 1B, Early Testing in the United States, surveys the numerous tests developed by American psychologists in the first half of the 20th century.
Psychological testing in its current form originated more than 100 years ago in laboratory studies of sensory discrimination, motor skills, and reaction time. The British genius Francis Galton (1822-1911) invented the first battery of tests, a peculiar assortment of sensory and motor measures that will be reviewed later. The American psychologist James McKeen Cattell (1860-1944) studied with Galton and later, in 1890, set out the essential themes of modern testing in his classic article entitled "Mental Tests and Measurements." He was cautious and modest in describing the purposes and applications of his instruments:
Psychology cannot attain the certainty and exactness of the physical sciences, unless it rests on a foundation of experiment and measurement. A step in this direction could be made by applying a series of mental tests and measurements to a large number of individuals. The results would be of considerable scientific value in discovering the constancy of mental processes, their interdependence, and their variation under different circumstances. Individuals, besides, would find their tests interesting, and, perhaps, useful in regard to training, mode of life or indication of disease. The scientific and practical value of such tests would be much increased should a uniform system be adopted, so that determinations made at different times and places could be compared and combined (Cattell, 1890).
Cattell's conjecture that tests might "perhaps" be useful in regard to "training, mode of life or indication of disease" must surely rank as one of the most prophetic understatements of all time. Anyone raised in the Western world knows that psychological testing has emerged from its tentative
IMPORTANCE OF TESTING
A historical review makes the importance of testing clear as well. In general, psychology students regard historical topics as dull, dry, and difficult, and at times these prejudices are well justified. After all, many textbooks fail to explain the relevance of historical issues and offer only vague sketches of the beginnings of mental testing. As a result, it is common for psychology students to conclude mistakenly, in their first semesters, that historical topics are boring and irrelevant. In fact, the history of psychological testing is fascinating and has substantial relevance to current practice. The historical evolution of testing is relevant to contemporary testing for the following reasons:
1. A review of the origins of psychological testing helps explain current practices that might otherwise seem arbitrary or even peculiar. For example, why do many current intelligence tests include nonintellectual components such as short-term memory for digits? The answer is, in part, historical inertia: intelligence tests have always included a measure of digit span.
2. The strengths and limitations of testing also stand out more clearly when such methods are viewed in historical context. For example, the reader will discover that modern intelligence tests are exceptionally good at predicting school failure, precisely because that was the original and sole purpose of the first of these instruments, developed in Paris, France, at the beginning of the 20th century.
3. Finally, the history of psychological testing contains some sad and unfortunate episodes that remind us not to be overzealous in our current uses of tests. For example, on the basis of the mindless and prejudiced application of intelligence test results, several prominent psychologists helped bring about passage of the Immigration Restriction Act of 1924.
Later chapters will explore the principles of psychological testing, investigate applications in specific fields (e.g., personality, intelligence, neuropsychology), and reflect on the social and legal implications of testing. However, the reader will find these issues easier to understand when they are analyzed in historical context. Thus, for the moment, we begin with a review of the rudimentary forms of testing that existed more than 4,000 years ago in imperial China.
In 1908, Binet and Simon published a revision of the 1905 scale. In the earlier scale, more than half of the items had been designed for individuals with very marked retardation; however, the main diagnostic decisions involved older children and persons of borderline intellect. To remedy this imbalance, most of the very simple items were dropped and new ones were added at the upper end of the scale. The 1908 scale had 58 tests, almost double the number in 1905. New tests were added, many of which are still used today: rearranging scrambled sentences, copying a diamond, and carrying out a sequence of three commands. Some items consisted of absurdities that children had to detect and explain. One such item amused French children: "The body of an unfortunate girl was found, cut into 18 pieces. It is thought that she killed herself." This item, however, was very disturbing to some American examinees, which demonstrates the importance of cultural factors in intelligence testing (Fancher, 1985).
The main innovation of the 1908 scale was the introduction of the concept of mental level. The tests had been standardized on nearly 300 normal children between 3 and 13 years of age. This allowed Binet and Simon to order the tests according to the age level at which they were usually passed. Any item passed by 80 to 90 percent of 3-year-olds was assigned to the 3-year level, and so on up to age 13. Binet and Simon also devised an approximate scoring system in which a basal age was first determined from the age level at which no more than one test was failed. For every five tests passed at levels above the basal level, a full year of mental level was credited.
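The age-level scoring just described can be expressed as a short procedure. This Python sketch is an approximation under stated assumptions: the basal age is taken to be the highest age level at which the child failed no more than one test, and only whole years are credited for passes above the basal level (one year per five passes). Binet and Simon's actual handling of edge cases may have differed, and the results in the usage note are invented.

```python
def mental_level(results: dict[int, list[bool]]) -> int:
    """Approximate 1908 Binet-Simon mental level.

    `results` maps each age level (3 to 13) to pass/fail outcomes for the
    tests at that level. Assumptions: the basal age is the highest level with
    at most one failure; each five passes above it add one whole year.
    Raises ValueError (from max) if no level has at most one failure.
    """
    basal = max(age for age, outcomes in results.items()
                if outcomes.count(False) <= 1)
    passes_above = sum(outcomes.count(True)
                       for age, outcomes in results.items() if age > basal)
    return basal + passes_above // 5
```

For a hypothetical child who passes everything at age 3, fails one test at age 4, and then passes 3, 2, and 1 tests at ages 5, 6, and 7, the basal age is 4 and the 6 passes above it add one year, for a mental level of 5.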
Psychological testing is used predominantly for two purposes: measuring intelligence and detecting personality disorders. It is therefore understandable that the average citizen equates psychological tests with IQ scores, inkblots, and personality inventories. Certainly there is more than a grain of truth in this view: measures of personality and intelligence are still the essential pillars of psychological testing. However, psychometricians have developed many other types of tests, for diverse and imaginative purposes that the pioneers could never have anticipated. This chapter provides an overview of psychological tests and their many applications. Topic 2A, Nature and Uses of Psychological Tests, summarizes the different types of tests and their various applications. Topic 2B, The Administration Process, emphasizes that test administration is a transaction between the examiner and the examinee, not a sterile measurement process.
From birth to old age, we encounter tests at almost every transition in life. A
baby's first test, given immediately after birth, is the Apgar test, a rapid
multivariate assessment of heart rate, respiration, muscle tone, reflex irritability,
and color (Clarke-Stewart and Friedman, 1987). The total Apgar score (0 to 10)
helps determine whether any immediate medical attention is needed. Later, an
infant who received a low Apgar score might become a candidate for a
developmental disabilities assessment. The preschool child may take school-
readiness tests. Once school begins, each student takes hundreds, perhaps
thousands, of academic tests before graduating (not to mention tests of learning
disabilities, giftedness, vocational interests, and college admission). After
graduating, adults may face tests for employment, a driver's license, security
clearance, personality functioning, marital compatibility, disability, and brain
dysfunction; the list is almost endless. Some people even face a final indignity in
the twilight of their lives: a test to determine their capacity to manage their
financial affairs.
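The Apgar total mentioned above is a simple sum. A minimal sketch, assuming the standard convention that each of the five signs is rated 0, 1, or 2 (which is what makes the total run from 0 to 10); the key names and the function `apgar_total` are hypothetical labels chosen for illustration.

```python
APGAR_SIGNS = ("heart_rate", "respiration", "muscle_tone", "reflex_irritability", "color")


def apgar_total(ratings: dict[str, int]) -> int:
    """Sum the five Apgar sign ratings (each 0, 1, or 2) into a 0-10 total."""
    missing = set(APGAR_SIGNS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    for sign in APGAR_SIGNS:
        if ratings[sign] not in (0, 1, 2):
            raise ValueError(f"{sign} must be rated 0, 1, or 2, got {ratings[sign]}")
    return sum(ratings[sign] for sign in APGAR_SIGNS)


newborn = {"heart_rate": 2, "respiration": 2, "muscle_tone": 1,
           "reflex_irritability": 2, "color": 1}
print(apgar_total(newborn))  # 8
```

Validating each rating before summing keeps an out-of-range entry from silently producing an impossible total.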
Testing is thus a pervasive feature of our culture, one that we take for granted.
However, the layperson's concept of a test does not necessarily match the
narrower view of a psychometrician (a specialist in psychology or education who
develops and evaluates psychological tests). Because the nature of tests is so
widely misunderstood, it is appropriate to begin this topic with a fundamental
question that defines the scope of the entire book: What is a test?
DEFINITION OF A TEST
discover. However, most examiners are competent. Standardization therefore
depends largely on the administration directions in the instruction manual that
usually accompanies a test.
Writing the directions is an essential step in standardizing a test. To ensure
uniform administration procedures, the test developer must provide comparable
stimulus materials for all examinees, must specify with considerable precision the
oral instructions for each item or subtest, and should advise the examiner how to
handle a wide range of queries from the examinee.
To illustrate these points, consider the many ways a test developer might
approach the assessment of digit span, the maximum number of orally presented
digits a subject can recall from memory. A nonstandardized digit-span test might
merely suggest that the examiner read aloud increasingly long series of numbers
until the subject fails. The number of digits in the longest series recalled correctly
would then be the subject's digit span. Most readers will realize that a test defined
so loosely will not be uniform from one examiner to another. If the examiner is
free to improvise any series of digits, what would prevent him from presenting,
with the familiar inflection of a television announcer, "1-800-325-3535"? This
series would be far easier to remember than a more random set of the same
length, for example, "7-2-8-1-9-4-6-3-7-4-2". The speed of presentation can also
have a crucial effect on the uniformity of a digit-span test. For standardization
purposes, it is essential that all examiners present each series at a constant rate,
for example, one digit per second. Finally, the examiner needs to know how to
respond to unexpected queries, such as a subject asking, "Could you repeat
them?" For obvious reasons, the usual advice is "no."
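The standardization requirements above (a fixed set of series, a constant presentation rate, no repetitions, and scoring by the longest series recalled) can be made concrete in a short sketch. The series in `DIGIT_SERIES`, the function names, and the rule of stopping at the first failed series are illustrative assumptions, not the contents of any published digit-span test.

```python
# Hypothetical standardized item set: every examinee hears exactly these series, in this order.
DIGIT_SERIES = ["5-8-2", "6-9-4-1", "4-2-7-3-1", "6-1-9-4-7-3"]
SECONDS_PER_DIGIT = 1.0  # constant presentation rate shared by all examiners


def presentation_schedule(series: str,
                          seconds_per_digit: float = SECONDS_PER_DIGIT) -> list[tuple[float, str]]:
    """Return (onset time in seconds, digit) pairs for reading one series aloud."""
    return [(i * seconds_per_digit, digit) for i, digit in enumerate(series.split("-"))]


def digit_span(series_list: list[str], responses: list[str]) -> int:
    """Length of the longest series recalled exactly, stopping at the first failure.

    Each series is presented once; requests to repeat are not honored, so a
    response is simply compared with the series as it was read.
    """
    span = 0
    for series, response in zip(series_list, responses):
        digits = series.split("-")
        if response.split("-") != digits:
            break  # discontinue at the first failed series
        span = len(digits)
    return span


print(presentation_schedule(DIGIT_SERIES[0]))  # [(0.0, '5'), (1.0, '8'), (2.0, '2')]
print(digit_span(DIGIT_SERIES, ["5-8-2", "6-9-4-1", "4-2-7-1-3"]))  # 4
```

Fixing the item list and the rate in module-level constants mirrors the point of the passage: the examiner no longer chooses the digits or the pace.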
Test developers may even go so far as to specify desirable examiner behavior,
such as maintaining a neutral facial expression while recording a subject's
responses. Such seemingly subtle influences can seriously affect the uniformity of
testing procedures. For example, an examiner who smirks while recording
responses could make the subject anxious, leading to failure on an easy task. The
next topic, the administration process, analyzes the potential influence of the
examiner on test results.
A psychological test is also a limited sample of behavior. Neither the subject nor
the examiner has enough time for truly comprehensive testing, even when the test
targets a well-defined, finite domain of behavior. Thus practical constraints dictate
that a test is only a sample of behavior. Yet the behavior sample is of interest only
insofar as it allows the examiner to make inferences about the
$$X = T + e$$
where X is the observed score, T the true score, and e the positive or negative
error component. The best a test developer can do is ensure that e is very small.
It can never be eliminated completely, because its precise impact on an individual
case cannot be known. The concept of measurement error is discussed in Topic
3B, Concepts of Reliability.
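The decomposition of an observed score X into a true score T plus an error e can be illustrated by simulation. This is a sketch under assumptions the text does not state: the error is drawn independently from a normal distribution with mean zero, and the true score (50) and error spread (3) are arbitrary illustrative values.

```python
import random
import statistics


def simulate_observed_scores(true_score: float, error_sd: float, n: int,
                             seed: int = 0) -> list[float]:
    """Draw n observed scores X = T + e for one examinee with true score T.

    Assumption for illustration: e is independent, normally distributed error
    with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n)]


observed = simulate_observed_scores(true_score=50.0, error_sd=3.0, n=10_000)
print(observed[:3])                          # single administrations scatter around 50
print(round(statistics.mean(observed), 1))   # the mean of many administrations lies close to 50
```

Any single observed score misses T by some unknown amount, which is why e cannot be removed from an individual result even though it tends to average out over many hypothetical administrations.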
The second caveat is that test users must avoid reifying the characteristics being
measured. Test results do not represent a "thing" with physical reality; typically,
they represent an abstraction that has proved useful for predicting behavior
outside the test. For example, when psychologists discuss a person's IQ, they are
referring to an abstraction that has no direct, material existence but is nonetheless
useful for predicting educational achievement and other outcomes.
A psychological test must also have norms. In general, an individual's test score
is interpreted by comparing it with the scores obtained by other individuals on the
same test. For this purpose, test developers typically provide norms, a summary of
test results for a large, representative group of people (Petersen, Kolen and
Hoover, 1989). The norm group is known as the standardization sample.
The selection and testing of the standardization sample is crucial to the usefulness
of a test. This group must be representative of the population for whom the test is
intended; otherwise, it will not be possible to determine the relative standing of the
examinee. In the extreme case where no norms are provided, the examiner can
make no use whatsoever of the test results. An exception to this point is the case of
criterion-referenced tests, which are discussed below.
Norms not only establish average performance but also indicate how frequently
different high and low scores occur. Norms thus allow the examiner to determine
how far a score deviates from expectations. Such information can be very
important in predicting the examinee's nontest behavior. Norms are so important
in the interpretation of test results that they are discussed at length in a separate
section of the text.
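One common way a norm group turns a raw score into a statement of relative standing is the percentile rank. A minimal sketch, assuming the "percentage of the norm group scoring strictly below" convention; some publishers also count half of the tied scores, so this is one possible definition, not the one every test uses. The ten-score norm group is hypothetical.

```python
def percentile_rank(score: float, norm_scores: list[float]) -> float:
    """Percentage of the norm group scoring below `score`.

    Convention assumed here: strictly-below only. Some publishers also count
    half of the tied scores; the test manual states which rule applies.
    """
    if not norm_scores:
        raise ValueError("norm group is empty")
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


# Hypothetical norm group of ten raw scores (real norm groups are far larger).
norms = [10, 12, 14, 15, 15, 16, 18, 20, 21, 25]
print(percentile_rank(16, norms))  # 50.0
print(percentile_rank(22, norms))  # 90.0
```

Without `norms` the function has nothing to compare against, which matches the point that a norm-referenced score is uninterpretable when no norms are provided.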
Finally, tests are not ends in themselves. In general, the ultimate purpose of a
test is to predict behavior other than the behavior sampled directly by the test.
Thus the examiner may be more interested in the nontest behavior predicted by
the test responses than in the responses themselves. A concrete example may
clarify this point. Suppose an examiner administers an inkblot test to a patient in a
psychiatric hospital, and the patient describes one inkblot as "eyes silently
watching." On the basis of established norms, the examiner might then predict that
the subject is extremely suspicious and will benefit little from individual
psychotherapy. The purpose of the test is to make this and similar predictions, not to determine whether the person perceives eyes looking at him or her from inkblots.
The ability of a test to predict nontest behavior is established by an extensive
body of validation research, most of it conducted after the test has been
published. But there are no guarantees in the world of psychometric research. It is
common for a researcher to publish a promising test, only to read later that other
researchers have found it deficient. There is a lesson here for test users: the fact
that a test exists and claims to measure some characteristic is no guarantee that
its claims are true. A test can have an appealing title, precise instructions,
elaborate norms, attractive packaging, and promising preliminary findings, but if
dispassionate studies by independent researchers show that it cannot predict
appropriate nontest behaviors, then it is useless.
The main features of a test summarized above apply especially to norm-
referenced tests, which make up most of the tests in use. On a norm-referenced
test, each examinee's performance is interpreted with reference to a relevant
standardization sample (Petersen, Kolen and Hoover, 1989). However, these
features are less important in the special case of criterion-referenced tests,
because these instruments do not require comparing the individual with a
reference group. With this type of instrument, the aim is to determine where the
examinee stands with respect to very narrowly defined educational objectives
(Berk, 1984). For example, one part of an arithmetic test for 10-year-olds might
measure accuracy in adding pairs of two-digit numbers. On an untimed test of 20
such problems, the expected accuracy would be nearly perfect. For this kind of test
it really does not matter how the examinee compares with others of the same age;
what matters is whether the examinee meets an appropriate, specific criterion, for
example, 95% accuracy. Because there is no comparison with the normative
performance of others, this kind of instrument is aptly called a criterion-referenced
test, which, unlike a norm-referenced test, can be interpreted meaningfully without
reference to norms. Topic 3A, Norms and Standardization, explores these tests in
greater detail.
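The criterion-referenced decision described above needs only the examinee's own accuracy and a fixed standard; no norm group appears anywhere. A minimal sketch using the 95% criterion and 20-problem example from the text; the function name `meets_criterion` is hypothetical.

```python
def meets_criterion(correct: int, total: int, criterion_percent: int = 95) -> bool:
    """Criterion-referenced decision: compare accuracy with a fixed standard.

    No norm group is consulted; only the examinee's own accuracy matters.
    Integer arithmetic avoids floating-point rounding at the cutoff.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    return correct * 100 >= criterion_percent * total


print(meets_criterion(19, 20))  # True: 95% accuracy
print(meets_criterion(18, 20))  # False: 90% accuracy
```

Comparing `correct * 100` with `criterion_percent * total` keeps the decision exact when the accuracy lands precisely on the cutoff, as 19 of 20 does.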
Another important distinction is between the terms test and assessment, which
are often treated as equivalent but do not mean exactly the same thing.
Assessment is a broader term that refers to the process of gathering information
about a person and using it to predict behavior. It can be defined as the appraisal
or estimation of the magnitude of one or more attributes of a person. The
assessment of human characteristics involves observations, interviews, checklists,
inventories, projective techniques, and other psychological tests. In short, tests are
only one source of information used in the assessment process, in which the
examiner must compare and combine data from different sources. This is an
inherently subjective process that requires the examiner to choose among
conflicting information and make predictions based on the overall picture formed
by the data.
The term assessment was coined during World War II to describe a program for
selecting men who would excel in the secret service, the Office of Strategic
Services (OSS Assessment Staff, 1948). The OSS staff of psychologists and
psychiatrists amassed a huge amount of information about the candidates during
four grueling days of interviews, written tests, and personality tests. In addition, the
assessment process included a variety of real-life situational tests, based on the
recognition that there is a difference between knowing and being able to do:
We had the candidates actually attempt the tasks, whether physical or verbal,
rather than merely indicate in writing how they would perform them. We were prompted
to introduce realistic tests of ability by findings such as the following: men who score
high on a paper-and-pencil test of mechanical comprehension may turn out to be below
average when it comes to solving mechanical problems with their hands (OSS
Assessment Staff, 1948).
The situational tests included group tasks such as transporting equipment across
a stream and scaling a 3-m wall, as well as individual tests of the ability to
withstand a realistic interrogation and to command two uncooperative
subordinates in a construction task.
On the basis of behavioral observations and test results, the OSS staff rated
candidates on dozens of specific traits within broad categories such as leadership,
social relations, emotional stability, effective intelligence, and physical ability.
These ratings were used to select military personnel for the OSS.
TYPES OF TESTS
Tests can be grouped broadly into two camps: group tests and individual tests.
Group tests are largely paper-and-pencil measures suitable for testing large groups
of people at the same time. Individual tests are instruments that, by their design
and purpose, must be administered to one person at a time. An important
advantage of individual tests is that the examiner can gauge the subject's level of
motivation and assess the influence of other factors (e.g., impulsivity or anxiety)
on the test results.
For convenience, tests are classified here into the eight categories shown in
Table 2-1. Each category includes norm-referenced, criterion-referenced,
individual, and group tests. The reader will notice that any typology of tests is a
purely arbitrary determination. For example, yet another dichotomy could be
applied: tests that measure maximal performance (e.g., an intelligence test) versus
tests that estimate typical responses (e.g., a personality inventory).
Strictly speaking, there are hundreds, perhaps thousands, of different types of
tests, each measuring a slightly different aspect of the individual. For example, one
could argue that even two intelligence tests constitute different types of measures.
One test might reflect the assumption that intelligence is a biological construct
best measured through brain waves, whereas another might rest on the
traditional view that intelligence is shown in the capacity to learn acculturated skills
such as vocabulary. Grouping both measures under the category of intelligence
tests is certainly an oversimplification, but it is nonetheless a useful starting point.
As seen in the first chapter, intelligence tests were originally designed to sample a
wide variety of skills in order to estimate the individual's overall intellectual level.
The Binet-Simon scales were successful in part because they incorporated
heterogeneous tasks, including word definitions, designs, comprehension
questions, and spatial visualization tasks. The group intelligence tests that
flourished in such profusion during and after World War I also measured diverse
abilities, as exemplified by the Army Alpha, whose eight sections measured
information, arithmetic, practical judgment, and reasoning, among other skills.
job satisfaction. For example, if the examinee has the same interests as successful
and satisfied accountants, it is considered likely that he or she will enjoy working as
an accountant. The assumption that interest patterns predict job satisfaction is
largely borne out by empirical studies, as will be reviewed in Topic 12, Assessment
of Interests and Work Values.
There are many types of behavioral procedures for assessing the antecedents and
consequences of behavior, including checklists, rating scales, interviews, and
structured observations. These methods share a common assumption: behavior
is best understood in terms of clearly defined characteristics such as frequency,
duration, antecedents, and consequences. Behavioral procedures tend to be
highly pragmatic in that they are usually interwoven with treatment approaches.
Neuropsychological tests are used to assess persons suspected or known to have
brain dysfunction. Neuropsychology is the study of brain-behavior relationships.
Over the years, neuropsychologists have discovered that certain tests and
procedures are highly sensitive to the effects of brain damage, and they use these
specialized tests and procedures to make inferences about the location, extent,
and consequences of such damage.
Although neuropsychological tests and procedures are useful in reaching a
neurological diagnosis, their main purpose is to assess sensory, motor, cognitive,
and behavioral strengths and weaknesses. Examiners need wide-ranging
advanced training to make sense of the large amount of resulting test data.
USES OF TESTS
In general terms, the most common use of psychological tests is to make decisions
about people. For example, educational institutions frequently use tests to
determine students' placement levels, and universities decide whom to admit
partly on the basis of test scores. Federal, state, and local civil service systems also
rely heavily on tests for personnel selection.
Even individual practitioners use tests mainly for decision making. Examples
include the consulting psychologist who uses a personality test to help a police
department decide whether to hire one candidate rather than another, and the
neuropsychologist who uses tests to conclude that a client has suffered brain
damage.
But simple decision making is not the only function of psychological tests. It is
useful to distinguish five uses of tests:
1. Classification.
2. Diagnosis and treatment planning.
3. Self-knowledge.
4. Program evaluation.
5. Research.
These applications often overlap and are occasionally difficult to distinguish from
one another. For example, a test that helps determine a psychiatric diagnosis can
also provide a form of self-knowledge. These applications are analyzed in greater
detail below.
The term classification covers a variety of procedures that share a common
purpose: assigning a person to one category rather than another. Of course,
assignment to categories is not an end in itself but the basis for some kind of
differential treatment. Classification can therefore have important consequences,
such as granting or restricting access to a particular university or determining
whether a person is hired for a particular job. There are many forms of
classification, each emphasizing a particular purpose in assigning people to
categories. We distinguish among placement, screening, certification, and
selection.
Placement is the sorting of persons into different programs appropriate to their
needs or abilities. For example, universities often use a mathematics placement
exam to determine whether students should enroll in calculus, algebra, or a
remedial course.
Screening refers to quick and simple tests or procedures for identifying persons
who might have special characteristics or needs. Psychometricians generally
recognize that screening tests produce many misclassifications. Examiners are
therefore advised to follow up with additional testing before making important
decisions based on screening tests. For example, to identify children with highly
exceptional spatial-thinking talent, a psychologist might administer a 10-minute
paper-and-pencil test to every child in a school system. Children scoring in the top
10% could then be selected for more comprehensive testing.
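The screening step in this example (keep the top 10% of a quick test and refer them for fuller testing) can be sketched as follows. The tie rule, the function name `screen_top_percent`, and the child identifiers are assumptions made for illustration; the output is a referral list for follow-up, not a final decision.

```python
def screen_top_percent(scores: dict[str, float], percent: int = 10) -> list[str]:
    """Names of examinees scoring in the top `percent` of a screening test.

    Assumption: the cutoff is the k-th highest score, with k = ceil(n * percent / 100);
    anyone tied with the cutoff score is also referred, so ties can push the
    referred group slightly above `percent`.
    """
    if not scores:
        return []
    k = max(1, (len(scores) * percent + 99) // 100)  # integer ceiling of n * percent / 100
    cutoff = sorted(scores.values(), reverse=True)[k - 1]
    return sorted(name for name, score in scores.items() if score >= cutoff)


# Hypothetical 10-minute screening scores for 20 children.
screening = {f"child{i:02d}": i for i in range(1, 21)}
print(screen_top_percent(screening))  # ['child19', 'child20']
```

Referring ties rather than excluding them errs toward more follow-up testing, which fits the advice that screening results should not drive important decisions on their own.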
Certification and selection both have a pass/fail quality. Passing a certification
exam confers privileges, such as the right to practice psychology or to drive a car.
Certification therefore typically implies that a person has at least a minimum level
of proficiency in some discipline or activity. Selection is similar to certification in
that it confers privileges, such as the opportunity to attend a university or obtain a
job.
Another use of psychological tests is diagnosis and treatment planning. Diagnosis
consists of two interrelated tasks: identifying the nature and source of a person's
abnormal behavior, and classifying the behavior pattern within an accepted
diagnostic system. Diagnosis is usually a precursor to the remediation or
treatment of personal distress or impaired performance.
Psychological tests often play an important role in diagnosis and treatment
planning. For example, intelligence tests are absolutely essential in the diagnosis
of mental retardation. Personality tests are useful for diagnosing the nature and
degree of emotional disorders. In fact, some tests, such as the MMPI, were
designed with the explicit purpose of improving the accuracy of psychiatric
diagnosis.
Diagnosis should be more than mere classification, more than the assignment of a
label. A proper diagnosis conveys information about strengths, weaknesses,
etiology, and the best options for remediation or treatment. Knowing that a child
has been diagnosed with a learning disability is, in itself, of little use; but also
knowing that the same child performs at a much lower level in reading
comprehension, is easily distracted, and needs help with basic phonics can
provide an indispensable basis for treatment planning.
Psychological tests can also be a powerful source of self-knowledge. In some
cases, the feedback a person receives from psychological testing can change a
career choice or alter the course of a life. Of course, not every testing situation
yields self-knowledge. Perhaps in most cases the client already knows what the
test results will reveal. A high-functioning college student is rarely surprised to
discover that his or her IQ is in the upper range. An architect is not disconcerted to
hear that he or she has excellent spatial reasoning skills. A student with limited
reading ability is not usually surprised to receive a diagnosis of "learning
disability."
Another use of psychological tests is the evaluation of educational and social
programs. More will be said about evaluating educational programs when
achievement tests are discussed in a later chapter. Here we confine ourselves to
the use of tests to evaluate social programs, which are designed to provide
services that improve social conditions and community life. For example, Project
Head Start is a federally funded program that supports preschool education
projects nationwide for disadvantaged children (Cicirelli, 1969; McKey et al.,
1985). Launched in 1965 as an attempt to set
which can also be tilted independently of the room. The subject's task is to bring
his or her body to a position perceived as upright. Field-dependent subjects align
their bodies to some degree with the room rather than with perceived gravity.
Field-independent subjects are less affected by the tilted room and are more
attuned to their internal perceptual cues; in other words, their perceptual
judgments are relatively independent of distorted visual information. The HISI
inspired a lifetime of research on personality development but was rarely applied
to practical testing problems.
SUMMARY
Generally speaking, the initial outcome of testing is a raw score, such as the total
number of personality statements endorsed in a particular direction or the total
number of problems solved correctly, perhaps with bonus points added for quick
solutions. In most cases this initial score is useless in itself. For test results to be
meaningful, examiners must be able to convert the initial score into some form of
derived score based on comparison with a normative or standardization group.
Most tests are interpreted by comparing individual results with the performance of
the norm group; criterion-referenced tests, discussed later, are an exception.
A normative group consists of a sample of examinees who are representative of
the population for whom the test is intended. Consider a vocabulary test designed
for use with prospective first-year college students. For standardization purposes,
one could collect performance results from a large, heterogeneous, nationwide
sample of such persons. The essential objective of test standardization is to
determine the distribution of raw scores in the norm group, so that the test
developers can publish derived scores known as norms. As discussed further
below, norms come in many varieties, for example, percentile ranks, age
equivalents, grade equivalents, or standard scores. In general, norms indicate an
individual's standing on the test relative to the performance of other persons of the
same age, grade, sex, or other relevant variables.
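Standard scores, one of the derived scores listed above, express a raw score as its distance from the norm-group mean in standard deviation units. A minimal sketch computing a z score from a hypothetical eight-score norm group, assuming the population standard deviation (dividing by N); published tests usually rescale z onto other metrics, which this sketch does not attempt.

```python
import statistics


def z_score(raw: float, norm_scores: list[float]) -> float:
    """Distance of a raw score from the norm-group mean, in standard deviation units."""
    mean = statistics.mean(norm_scores)
    sd = statistics.pstdev(norm_scores)  # population SD (divides by N)
    if sd == 0:
        raise ValueError("all norm scores are identical; z is undefined")
    return (raw - mean) / sd


norms = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical norm group: mean 5, SD 2
print(z_score(9, norms))  # 2.0
print(z_score(4, norms))  # -0.5
```

A positive z places the examinee above the norm-group average and a negative z below it, regardless of how many items the test has.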
To be effective, norms must be obtained with great care and constructed according
to well-established principles, which are discussed later. Moreover, norms can
become outdated in just a few years, so periodic renorming should be the rule
rather than the exception (see Case Example 3-1). The topic of norms is
approached indirectly: the reader is first given a discussion of raw scores, and then
the statistical concepts essential to understanding norms are reviewed.
RAW SCORES
The most basic level of information provided by a psychological test is the raw
score. For example, on a personality test, the raw score is often the number of
items answered in the keyed direction for a specific scale. On ability tests, the raw
score typically consists of the number of problems answered correctly, often with
bonus points added for fast performance. Thus the initial outcome of testing is
almost always a numerical sum: 17 of 44 items answered in the keyed direction on
a depression scale, or 29 of 55 possible raw-score points on the block design
subtest of an intelligence test.
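Scoring a personality scale in the keyed direction amounts to counting matches between the examinee's answers and the scoring key. A minimal sketch; the six-item true/false key and the answers are hypothetical, and real scales may key items in more complex ways.

```python
def keyed_raw_score(responses: list[bool], key: list[bool]) -> int:
    """Count the items answered in the keyed direction (True = 'true', False = 'false')."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for answer, keyed in zip(responses, key) if answer == keyed)


# Hypothetical 6-item true/false scale and one examinee's answers.
scale_key = [True, False, True, True, False, True]
answers = [True, True, True, False, False, True]
print(keyed_raw_score(answers, scale_key))  # 4
```

The result, 4 of 6, is exactly the kind of bare numerical sum that still needs norms before it means anything.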
However, it should be obvious to the reader that raw scores by themselves are
absolutely meaningless. For example, what does it mean to know that a person
solved 12 of 20 abstract reasoning problems correctly? What does it mean that an
examinee answered 19 of 33 true-false items in the keyed direction on a scale of
psychological disposition?
It is difficult even to think about these questions without resorting to comparisons
of one kind or another. We want to know how other people have performed on
these tests, and whether the observed scores are high or low compared with a
representative group of subjects. For ability tests, we want to know whether the
questions were easy or difficult, particularly relative to the subject's age.
Indeed, it seems almost self-evident that a raw score acquires meaning mainly in
relation to norms, an independently established frame of reference derived from
a standardization sample. The derivation and use of norms are discussed more
fully later. For now it is enough to know that norms are established empirically by
administering the test to a large, representative sample of people. The examinee's
score is then compared with the distribution of scores in the standardization
sample. In this way, norms show whether a score is low, average, or high.
Most psychological tests are interpreted by consulting norms; as already noted,
these instruments are called norm-referenced tests. However, the reader is
reminded that other kinds of instruments exist. In particular, criterion-referenced
tests help determine whether a person can meet an objectively defined criterion,
such as adding pairs of two-digit numbers with 97% accuracy. For criterion-
referenced tests, norms are not essential.
There are different kinds of norms, but they share one thing: each is a statistical
summary of a large set of scores. To understand norms, therefore, the reader
needs to master elementary descriptive statistics. We pause here for a brief
review of key statistical concepts.
Suppose for the moment that you have access to a high-level vocabulary test
suitable for assessing the verbal skills of university professors and other
professionals (Gregory and Gernert, 1990). The test is a 30-item multiple-choice
questionnaire of difficult words such as firmament, paradisiacal, and mellifluous. A
professor takes the test and chooses the correct option for 17 of the 30 words. The
professor asks how this score compares with those of others at the same
academic level. How might we answer the question?
One way to answer would be to provide the list of raw scores from a preliminary
standardization sample of 100 professors representative of the same university
(Table 3-1). However, even with this relatively small normative sample (norm
samples typically include thousands of individuals), a list of test scores is an
overwhelming display.
When confronted with a set of quantitative data, the natural human tendency is to
summarize, condense, and organize the data into meaningful patterns. For
example, in assessing the meaning of the professor's vocabulary score, the reader
could calculate the mean score of the entire sample or determine the relative
standing of the professor's score (17 correct) among the 100 scores in Table 3-1.
These and other approaches to organizing and summarizing quantitative data are
reviewed in the following sections.
FREQUENCY DISTRIBUTIONS
The graphs in Figure 3-1 are visual summaries of the 100 raw scores from the
sample of professors. In addition to such visual summaries, numerical summaries
can be produced by calculating indices of central tendency and dispersion.
Can a single score represent the 100 vocabulary scores in our sample? The
arithmetic mean, or average (X̄), is one such score. It is calculated by summing all
the scores and dividing by N, the number of scores. Another useful index of central
tendency is the median, the middle score once all the scores have been ranked. If
the number of scores is even, the median is the average of the two middle scores.
In either case, the median is the point that divides the distribution in two, so that
half the cases fall above it and half below. Finally, the mode is simply the most
frequently occurring score. If two scores tie for the highest frequency, the
distribution is said to be bimodal.
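The three indices of central tendency defined above can be written directly from their definitions. This sketch computes them by hand on small hypothetical score lists (Table 3-1 is not reproduced here); Python's standard `statistics` module offers `mean`, `median`, and `multimode` for the same purpose.

```python
from collections import Counter


def mean(scores: list[float]) -> float:
    """Arithmetic mean: sum of the scores divided by N."""
    return sum(scores) / len(scores)


def median(scores: list[float]) -> float:
    """Middle score after ranking; average of the two middle scores when N is even."""
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(scores: list[float]) -> list[float]:
    """All scores tied for the highest frequency (two modes = bimodal)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, count in counts.items() if count == top)


sample = [15, 16, 17, 17, 18, 19]  # hypothetical scores
print(mean(sample), median(sample), modes(sample))  # 17.0 17.0 [17]
print(modes([3, 3, 4, 5, 5]))  # [3, 5]: a bimodal distribution
```

Returning every tied mode, rather than just one, makes a bimodal distribution visible instead of hiding it.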
The scores listed in Table 3-1 have a mean of 16.8; the median and mode are both
17. In this case the three measures of central tendency agree very closely.
However, this is not always so. The mean is sensitive to outliers and can be
misleading if a distribution has a few unusually high or low scores. Consider the
extreme case in which nine people earn $10,000 and a tenth person earns
$910,000. The mean income for this group would be $100,000; however, this
income level is not typical of anyone in the group. The median income of $10,000
is far more representative. Of course, this is an extreme example, but it illustrates
a general point: if a distribution is skewed (i.e., asymmetric), the median is a better
index of central tendency than the mean.
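The income example can be checked with Python's standard `statistics` module: nine incomes of $10,000 and one of $910,000 sum to $1,000,000, so the mean is $100,000 while the median stays at $10,000.

```python
import statistics

# Nine people earn $10,000 and a tenth earns $910,000.
incomes = [10_000] * 9 + [910_000]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)
print(f"mean = ${mean_income:,.0f}")      # mean = $100,000
print(f"median = ${median_income:,.0f}")  # median = $10,000
```

The single outlier pulls the mean tenfold above what nine of the ten people earn, while the median is unaffected.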
MEASURES OF VARIABILITY
Two or more distributions of test scores may have the same mean and yet differ
greatly in how widely the scores are dispersed around it (Figure 3-2). To describe
the degree of dispersion, we need a statistical index that expresses the variability
of the scores in a distribution.
The statistical index of variability used most frequently for a group of scores is
the standard deviation, symbolized by σ and abbreviated SD. From a conceptual point
of view, the reader needs to know that the SD reflects the degree of dispersion in a
group of scores. If the scores are grouped closely around a central value, the SD is
small. In fact, in the extreme case where all scores are identical, the SD is exactly
zero. As a group of scores becomes more dispersed, the SD becomes larger. For
example, in Figure 3-2, distribution (a) would have the largest SD and distribution
(c) the smallest.
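A small sketch makes the point concrete. The three data sets are hypothetical stand-ins (the actual Figure 3-2 data are not given): all share a mean of 50, yet their standard deviations differ, and the set of identical scores has an SD of exactly zero. `statistics.pstdev` divides by N, matching the population formula used in this chapter.

```python
import statistics

# Hypothetical distributions with the same mean (50) but different spread
wide = [20, 35, 50, 65, 80]
medium = [40, 45, 50, 55, 60]
identical = [50, 50, 50, 50, 50]

for name, data in [("wide", wide), ("medium", medium), ("identical", identical)]:
    # pstdev = population SD (divides by N)
    print(name, statistics.mean(data), round(statistics.pstdev(data), 2))
# wide 50 21.21
# medium 50 7.07
# identical 50 0.0
```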
The standard deviation is, in simple terms, the square root of the variance, which is
symbolized σ². The formula for the variance is
$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$
where Σ means "sum of," X represents each individual score, X̄ is the mean of
the scores, and N is the total number of scores. As the name suggests, the variance
is a measure of variability. In general, however, psychologists prefer to report the
standard deviation, which is obtained by taking the square root of the variance. Of
course, the variance and the standard deviation convey interchangeable information:
either can be calculated from the other by squaring (the standard deviation, to obtain
the variance) or by taking the square root (of the variance, to obtain the standard
deviation).
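The variance formula translates almost word for word into code. This is a minimal sketch on hypothetical scores: it sums the squared deviations from the mean, divides by N, takes the square root to get the SD, and checks the result against the standard library's `statistics.pvariance`.

```python
import math
import statistics

def variance(scores):
    """Population variance: sum of squared deviations from the mean, divided by N."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5
var = variance(scores)
sd = math.sqrt(var)                # SD is the square root of the variance

print(var, sd)                     # 4.0 2.0
assert math.isclose(var, statistics.pvariance(scores))
assert math.isclose(sd ** 2, var)  # squaring the SD recovers the variance
```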
NORMAL DISTRIBUTION
to reduce its size? It is possible that, as new individuals are added to the sample,
the distribution of scores will increasingly resemble a symmetrical, mathematically
defined, bell-shaped curve called the normal distribution (Figure 3-3).
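The text describes the normal curve as mathematically defined without stating the definition. For reference, the standard density of a normal distribution is given below, where μ denotes the population mean and σ the standard deviation (the symbol μ is our addition, used to distinguish the population mean from the sample mean X̄). The curve is symmetric about μ, and its spread is governed entirely by σ.

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$$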
Psychologists prefer a normal distribution of test scores, even though many other
distributions are theoretically possible. For example, one possibility is a rectangular
distribution of test scores, with an equal number of scores in each class interval. In
fact, many laypeople might even prefer a rectangular distribution of test scores, on
the egalitarian premise that individual differences would then be less pronounced.
For example, a greater proportion of people would obtain scores in the upper range
if psychological tests yielded a rectangular rather than a normal distribution of
scores.
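A quick calculation shows why a rectangular distribution places more people in the upper range. Everything here is an assumption for illustration: a score scale from 0 to 100, an "upper range" of 80 and above, and a normal distribution centred at 50 with an SD of 15. `statistics.NormalDist` supplies the normal cumulative distribution function.

```python
from statistics import NormalDist

# Hypothetical scale 0-100; "upper range" taken as 80 and above
LOW, HIGH, CUTOFF = 0, 100, 80

# Rectangular (uniform): equal numbers of scores in every interval,
# so the share above the cutoff equals the share of the scale it covers
rectangular_share = (HIGH - CUTOFF) / (HIGH - LOW)

# Normal centred on the middle of the scale, with an assumed SD of 15
normal = NormalDist(mu=50, sigma=15)
normal_share = 1 - normal.cdf(CUTOFF)

print(f"rectangular: {rectangular_share:.1%}, normal: {normal_share:.1%}")
# rectangular: 20.0%, normal: 2.3%
```

Under these assumptions the cutoff sits two standard deviations above the mean, so only a small tail of the normal curve lies beyond it.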
Why, then, do psychologists prefer a normal distribution of scores, even to the point
of selecting test items that help produce this type of distribution in the
standardization sample? There are several reasons, including both statistical
considerations and empirical evidence. A brief digression follows to explain the
psychometric fascination with normal distributions.
One reason psychologists prefer normal distributions is that the normal curve has
mathematical properties that form the basis of various kinds of statistical
investigation. Suppose we want to determine whether the mean IQs of two groups of
people differ significantly. It would be appropriate to use t, an inferential statistic,
to test the difference between the means. However, many inferential statistics are
based on the assumption that the underlying population of scores is normally
distributed, or very nearly so. Thus, to facilitate the use of inferential statistics,
psychologists prefer that test scores in the general population follow a normal or
nearly normal distribution.
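As an illustration of the t statistic mentioned above, here is a minimal sketch of the pooled-variance (Student's) t for two independent groups. The IQ values are hypothetical, and the sketch computes only t and its degrees of freedom; turning t into a significance level would additionally require the t distribution, whose validity rests on the normality assumption discussed in the text.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """Student's t for two independent groups, assuming equal population variances."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance divides by N - 1 (sample variance)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))  # standard error of the difference
    return (mean_a - mean_b) / se, n_a + n_b - 2

# Hypothetical IQs for two small groups
t, df = pooled_t([100, 102, 104], [95, 97, 99])
print(round(t, 3), df)   # 3.062 4
```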
Another reason for preferring the normal distribution is its mathematical precision.
Because the normal distribution is defined precisely in mathematical terms, the area
under different regions of the curve can be calculated with great accuracy. Thus, a
useful property of normal distributions is that the percentage of cases falling within
a given range, or beyond a given value, is known exactly. For example, in a normal
distribution only about 2.3% of scores exceed the mean by two standard deviations or
more: 2.14% fall between two and three standard deviations above the mean, and a
further 0.13% lie beyond three (Figure 3-3). In the same way, one can determine that
most of the scores, slightly more than 68%, fall within one standard deviation of the
mean in either direction.
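These areas can be computed from the standard normal cumulative distribution function, available in Python as `statistics.NormalDist`. The sketch distinguishes the area beyond +2 SD (about 2.28%) from the band between +2 and +3 SD (about 2.14%), and confirms that roughly 68% of cases lie within one SD of the mean.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

above_2sd = 1 - z.cdf(2)               # 2 SD or more above the mean
between_2_and_3 = z.cdf(3) - z.cdf(2)  # band from +2 SD to +3 SD
within_1sd = z.cdf(1) - z.cdf(-1)      # within 1 SD of the mean, either direction

print(f"{above_2sd:.2%} {between_2_and_3:.2%} {within_1sd:.2%}")
# 2.28% 2.14% 68.27%
```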
A third reason for preferring a normal distribution of test scores is that the normal
curve often arises spontaneously in nature. In fact, early researchers were so
impressed with the universality of the normal distribution that they regarded the
normal curve as a law of nature. Galton (1888) wrote:
It is the supreme law of Unreason. Whenever a large sample of chaotic elements are
taken in hand and marshalled in the order of their magnitude, an unsuspected and
most beautiful form of regularity proves to have been latent all along.
There is certainly no "law of nature" dictating the shape that frequency distributions
must take. However, it is true that many important human characteristics, both
physical and mental, closely approximate the normal curve when measurements from
large, heterogeneous samples are plotted. For example, a well-known finding is a
nearly normal distribution for physical characteristics such as weight, height, and
brain size at birth (Jensen, 1980).
An approximately normal distribution is also found for numerous mental tests, even
those constructed entirely without reference to the normal curve. To illustrate this
point, consider early tests designed before the current psychometric fixation on the
normal distribution. Wechsler (1944) selected the items for the original
Wechsler-Bellevue Intelligence Scale mainly on the basis of variety in item types,
without regard to the resulting distribution of scores. In fact, he considered
"erroneous" the belief that mental measures must, of themselves, be distributed
according to the normal curve. Nonetheless, when he plotted the distribution of total
IQs on his test, the predictable nearly normal distribution emerged (Figure 3-4).
Lindvall (1967) found the same when he graphed data from the Pintner Ability Test
collected in 1923. Thus, even in the absence of psychometric adjustments, the
distribution of scores on a mental test in standardization samples typically
approximates a normal curve.
SKEWNESS
grouped at the top (negative skew), it is likely that the test contains too few
difficult items to discriminate effectively at this end of the scale.
When initial research indicates that an instrument produces skewed results in the
standardization sample, test authors usually revise the test at the item level. The
most direct solution is to add items or modify existing ones so that the test has
more easy items (to reduce positive skew) or more difficult items (to reduce
negative skew). If it is too late to revise the instrument, the test author can use a
statistical transformation to help produce a more normal distribution of scores (see
later). However, the preferred strategy is to revise the test so that skew is minimal
or nonexistent.
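The text does not give a numerical index of skew, but one common choice is the moment coefficient of skewness: the mean of the cubed z-scores. The sketch below uses that coefficient (our choice, not the author's) on two hypothetical score sets. Scores bunched at the bottom with a long upper tail, as a test with too few easy items would produce, give a positive value; scores bunched at the top give a negative value.

```python
import statistics

def skewness(scores):
    """Moment coefficient of skewness (population form): mean of cubed z-scores."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return sum(((x - mean) / sd) ** 3 for x in scores) / len(scores)

# Hypothetical score sets
bunched_low = [1, 1, 2, 2, 2, 3, 3, 4, 6, 9]   # long upper tail: positive skew
bunched_high = [1, 4, 6, 7, 7, 8, 8, 8, 9, 9]  # long lower tail: negative skew

print(round(skewness(bunched_low), 2), round(skewness(bunched_high), 2))
```

A value near zero indicates a roughly symmetric distribution; the sign tells the author whether easier or harder items are needed.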
Giving meaning to test results is largely a matter of transforming raw scores into
more interpretable and useful information. The preceding discussion of normal
distributions hinted at such transformations by showing how knowledge of the mean
and standard deviation of these distributions can help determine the relative
position of an individual score. This section continues with this topic more directly,
presenting the formal requirements for various types of raw-score transformations.
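One simple way the mean and standard deviation locate an individual score is to express the score as a distance from the mean in SD units (a z-score) and, if the distribution is normal, convert that distance into a percentile. This is a sketch under stated assumptions: the mean of 100 and SD of 15 are hypothetical values, and the percentile step is valid only for normally distributed scores.

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Express a raw score as its distance from the mean in SD units."""
    return (raw - mean) / sd

# Hypothetical scale: mean 100, SD 15 (assumed values, not from the text)
z = z_score(130, mean=100, sd=15)
percentile = NormalDist().cdf(z) * 100   # assumes normally distributed scores

print(z, round(percentile, 1))   # 2.0 97.7
```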