1. EVALUATION
1.1. Definition
1. https://en.oxforddictionaries.com/definition/evaluation
2. https://evaluationcanada.ca/what-is-evaluation
3. http://www.qualityresearchinternational.com/glossary/evaluation.htm
1.2. Purpose of Evaluation
Generally, we need to be able to make good decisions in every aspect of our lives. For
example, when you are torn between two universities for continuing your study, you
need to do an evaluation: you start by analyzing, searching for, and collecting data
about both universities, and then make an informed decision about which university you
will attend. From this example, we can see that the general purpose of evaluation is to
make good, accurate decisions. Ultimately, every student is expected to be able to make
such decisions.
The purposes of evaluation in the teaching and learning process are:
To ensure the teaching is meeting students' learning needs
To identify areas where teaching can be modified or improved
To provide feedback and encouragement to the teacher and the faculty
To support applications for promotion and career development.4
According to Doni, Sindu, and Bg Phalguna in their book Evaluasi Pendidikan, the
purposes of evaluation are divided into two kinds: general purposes and special purposes.
a. General purposes
To obtain evidence that indicates how far students' abilities and success extend
with respect to the curricular objectives after they have gone through the learning
process in the time that has been given.
To measure and assess the effectiveness of the teaching methods the teacher has
been using and of the learning process the students have been carrying out.
b. Special purposes
To make students willing to improve and raise their achievement.
To find the causes of students' effectiveness or ineffectiveness in the learning
process, in order to find ways to remedy it.
4. http://www.meddent.uwa.edu.au/teaching/faculty-evaluation/why-evaluate
1.3. Types of Evaluation5
5. https://cyfar.org/different-types-evaluation
Outcomes Evaluation
Focus: changes in comprehension, attitudes, behaviors, and practices that result from
program activities; can include both short- and long-term results.
Purpose: to decide whether the program/activity affects participants' outcomes, and to
establish and measure clear benefits of the program.
Sample questions: Did your participants report the desired changes after completing a
program cycle? What are the short- or long-term results observed among (or reported
by) participants?
Following Vedung (1997) and Foss Hansen (2005) we can schematize the theoretical
mainstream in the following way:
Source: Vedung (1997), “Public Policy and Program Evaluation”, Transaction Publisher.
Each of these models obviously has a different purpose and presents advantages
and disadvantages according to the object of evaluation.
Result models focus on the results of a given performance, program or
organization and they inform on whether the goals have been realized or
not and on all the possible effects of the program, both foreseen and
unforeseen. There are at least two distinct methodologies, reflecting distinct
methodological principles: goal-bound and goal-free procedures. Broadly
speaking, goal-bound evaluation is focused on the relative degree to
which a given product effectively meets a previously specified goal, while
goal-free evaluation measures the effectiveness of a given product
exclusively in terms of its actual effects; the goals and motivations of the
producer are ignored. Each approach has relative advantages and
disadvantages. On the one hand, goal-bound evaluation is ordinarily more
cost-effective than goal-free evaluation; on the other hand, measuring
effectiveness entirely in terms of the degree to which stated goals are met
can have at least two undesirable consequences: (a) since effectiveness is,
on this model, inversely proportional to expectations, effectiveness can be
raised simply by lowering expectations, and (b) deleterious or otherwise un-
wanted effects, if any, are left out of account, while unintended benefits, if
any, go unnoticed.
https://www.tillvaxtanalys.se/download/18.1af15a1f152a3475a818975/1454505626167/Evaluation+definitions+methods+and+models-06.pdf
Economic models, on the other hand, test whether a program's
productivity, effectiveness and utility have been satisfactory in terms of
expenses. Cost analysis is currently a somewhat controversial set of
methods in program evaluation. One reason for the controversy is that
these terms cover a wide range of methods, but are often used
interchangeably. Whatever position an evaluator takes in this controversy, it
is good to have some understanding of the concepts involved, because the
cost and effort involved in producing change is a concern in most impact
evaluations (Rossi & Freeman, 1993).
With all of these strategies to choose from, how can an evaluator decide?
Debates that rage within the evaluation profession are generally battles
between these different strategists, each claiming the superiority of
their position; but most recent developments in the debate have
focused on the recognition that there is no inherent incompatibility between
these broad strategies and that each of them brings something valuable to the
evaluation table. Attention has therefore increasingly turned to how one
might integrate results from evaluations that use different strategies, carried
out from different perspectives, and using different methods. Clearly, there
are no simple answers here. The problems are complex, and the
methodologies needed will, and should, be varied.
1.5. Steps of Evaluation
According to Buchori (1972), as cited in Zalili Sailan (2016), there are five steps of evaluation6:
1. Planning: in this step the teacher determines the purpose of the evaluation, the
aspects that will be assessed, the method that will be used, the preparation of
assessment tools, and the timing.
2. Collecting data: the teacher collects the data by conducting an assessment,
examining the results, and scoring them.
3. Processing the data: the assessment results are processed with statistical or
non-statistical techniques, depending on whether the data obtained are
quantitative or qualitative.
4. Interpretation: the teacher interprets the results of the data processing, basing
the interpretation on certain norms.
5. Using the assessment results: the teacher uses the interpreted assessment
results according to the purpose of the evaluation, for example to improve the
learning process, to remedy students' learning difficulties, to improve the
evaluation instruments, and to make an evaluation report (rapor).
6. Zalili Sailan, Teknik Evaluasi Hasil Belajar Bahasa dan Sastra Indonesia (Kendari: Metro Grapha, 2016), pp. 14-15.
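Steps 2-4 above (collecting, processing, and interpreting scores) can be sketched in a few lines of Python. The student names, scores, and the 75-point passing norm are purely illustrative assumptions, not data from the source:

```python
# Minimal sketch of evaluation steps 2-4: collect scores, process them
# statistically, and interpret each result against a fixed norm.
# Names, scores, and the 75-point norm are illustrative assumptions.
from statistics import mean, stdev

scores = {"Ani": 82, "Budi": 68, "Citra": 91, "Dewi": 74}  # step 2: collected data

avg = mean(scores.values())      # step 3: statistical processing
spread = stdev(scores.values())

PASSING_NORM = 75                # step 4: interpretation against a norm
results = {name: ("pass" if s >= PASSING_NORM else "needs remediation")
           for name, s in scores.items()}

print(f"class mean = {avg:.2f}, stdev = {spread:.2f}")
for name, verdict in results.items():
    print(f"{name}: {verdict}")
```

In step 5, results like these would feed back into teaching decisions and the evaluation report.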
2. ASSESSMENT
2.1. Definition
Assessment is a cyclic process used to identify areas for improvement of student learning
and to facilitate and validate institutional effectiveness. The Higher Learning Commission
offers the following formal definition: Assessment is the systematic collection,
examination, and interpretation of qualitative and quantitative data about student learning,
and the use of that information to document and improve student learning. Assessment is
not an administrative activity, a means of punishment, an intrusion into a faculty member’s
classrooms, or an infringement of academic freedom8.
Assessment has a different meaning from evaluation. The Task Group on Assessment and
Testing (TGAT) described assessment as all the methods used to assess the performance
of an individual or group9. Popham (1995: 3) defines assessment in the context of
education as a formal attempt to determine a student's status with regard to the interests of
education10. Boyer & Ewell define assessment as a process that provides information about
individual students, about a curriculum or program, about the institution, or about everything
related to the institutional system11: "processes that provide information about individual
students, about curricula or programs, about institutions, or about entire systems of
institutions" (Stark & Thomas, 1994: 46)12. "Assessment is the action or an instance of
making a judgment about something: the act of assessing something" (Merriam-Webster).
Assessment is the process of gathering and discussing information from multiple
and diverse sources in order to develop a deep understanding of what students
know, understand, and can do with their knowledge as a result of their educational
experiences; the process culminates when assessment results are used to improve
subsequent learning (Weimer, 2002). Based on the various descriptions above, it can be
concluded that assessment can be defined as the activity of interpreting the data
presented.
7. Handbook of Assessment from Stark State College of Technology, Revision 2010, p. 3.
8. Handbook of Assessment, Stark State College of Technology, 1960, p. 6.
9. National Curriculum: Task Group on Assessment and Testing, 1998, p. 3.
10. http://makalahlaporanterbaru1.blogspot.co.id/2012/11/makalah-language-testing.html
11. NASPAonline: Assessment Tips for Student Affairs Professionals, 2001, p. 1.
12. http://makalahlaporanterbaru1.blogspot.co.id/2012/11/makalah-language-testing.html
2.2. Purpose of Assessment
Purpose One: Communication
Assessment can be seen as an effective medium for communication between the
teacher and the learner. It is a way for the student to communicate their learning to
their teacher and for the teacher to communicate back to the student a commentary
on their learning. But to what end? To answer this, we offer the metaphor of
navigation. In order for navigation to take place—that is, the systematic and
deliberate effort to reach a specific place—two things need to be known: (1)
where you are and (2) where you are going. This metaphor offers us the
framework to discuss assessment as communication—students need to know
where they are in their learning and where they are supposed to be going with their
learning. Each of these will be dealt with in turn.
Purpose Two: Valuing What We Teach
http://peterliljedahl.com/wp-content/uploads/Four-Purposes-of-Assessment1.pdf
Purpose Three: Reporting Out
There exists a significant societal assumption that one of the primary purposes of
assessment is to sort, or rank, our students. Most evident in this regard, is the
requirement to assign an aggregated letter grade (sorting) and/or a percentage
(ranking) to represent the whole of a student's learning. However, there is a
much subtler and more damaging indicator of this assumption—equitability. That
is, there is an expectation that all students are to be assessed equally. Otherwise,
how can any sorting and/or ranking be considered accurate?
a. The assessment of student learning begins with educational values.
Assessment is not an end in itself but a vehicle for educational improvement. Its effective
practice, then, begins with and enacts a vision of the kinds of learning we most value for
students and strive to help them achieve. Educational values should drive not only what we
choose to assess but also how we do so. Where questions about educational mission and
values are skipped over, assessment threatens to be an exercise in measuring what’s easy,
rather than a process of improving what we really care about.
b. Assessment is most effective when it reflects an understanding of learning as
multidimensional, integrated, and revealed in performance over time.
Learning is a complex process. It entails not only what students know but what they can do
with what they know; it involves not only knowledge and abilities but values, attitudes, and
habits of mind that affect both academic success and performance beyond the classroom.
Assessment should reflect these understandings by employing a diverse array of methods,
including those that call for actual performance, using them over time so as to reveal
change, growth, and increasing degrees of integration. Such an approach aims for a more
complete and accurate picture of learning, and therefore firmer bases for improving our
students’ educational experience.
c. Assessment works best when the programs it seeks to improve have clear,
explicitly stated purposes.
d. Assessment requires attention to outcomes but also and equally to the experiences
that lead to those outcomes.
Information about outcomes is of high importance; where students “end up” matters greatly.
But to improve outcomes, we need to know about student experience along the way. We
need to know about the curricula, teaching, and the kind of student effort that led to
particular outcomes. Assessment can help us understand which students learn best under
what conditions; with such knowledge comes the capacity to improve the whole of their
learning.
e. Assessment works best when it is ongoing, not episodic.
Assessment is a process whose power is cumulative. Though isolated, “one‐shot”
assessment can be better than none, improvement over time is best fostered when
assessment entails a linked series of cohorts of students; it may mean collecting the same
examples of student performance or using the same instrument semester after semester. The
point is to monitor progress toward intended goals in a spirit of continuous improvement.
Along the way, the assessment process itself should be evaluated and refined in light of
emerging insights.
g. Assessment makes a difference when it begins with issues of use and illuminates
questions that people really care about.
Assessment alone changes little. Its greatest contribution comes on campuses where the
quality of teaching and learning is visibly valued and worked at. On such campuses, the
push to improve educational performance is a visible and primary goal of leadership;
improving the quality of undergraduate education is central to the institution’s planning,
budgeting, and personnel decisions. On such campuses, information about learning
outcomes is seen as an integral part of decision making, and avidly sought.
13. These principles were developed under the auspices of the AAHE Assessment Forum, with support from the Fund for the Improvement of Postsecondary Education and additional support for publication and dissemination from the Exxon Education Foundation. Copies may be made without restriction. The authors are Alexander W. Astin, Trudy W. Banta, K. Patricia Cross, Elaine El‐Khawas, Peter T. Ewell, Pat Hutchings, Theodore J. Marchese, Kay M. McClenney, Marcia Mentkowski, Margaret A. Miller, E. Thomas Moran, and Barbara D. Wright.
2.4. Types of Assessment
The term assessment is generally used to refer to all activities teachers use to help students
learn and to gauge student progress. Though the notion of assessment is generally more
complicated than the following categories suggest, assessment is often divided for the sake
of convenience using the following distinctions:
a. Summative assessment
Summative assessments are typically given at the end of an instructional period to
evaluate student learning against a benchmark. Examples include:
• State assessments
• Scores that are used for accountability of schools (AYP) and students (report card
grades).
b. Formative assessment
Formative assessment is generally carried out throughout a course or project.
Formative assessment, also referred to as "educative assessment," is used to aid
learning. In an educational setting, formative assessment might be a teacher (or
peer) or the learner providing feedback on a student's work, and it would not
necessarily be used for grading purposes. Formative assessments can also take the
form of diagnostic, standardized tests.
Formative assessments are generally low stakes, which means that they have
a low point value or none at all. Examples of formative assessments include asking
students to:
draw a concept map in class to represent their understanding of a topic
submit one or two sentences identifying the main point of a lecture
turn in a research proposal for early feedback
https://www.amle.org/BrowsebyTopic/WhatsNew/WNDet/TabId/270/ArtMID/888/ArticleID/286/Formative-and-Summative-Assessments-in-the-Classroom.aspx
c. Informal and formal assessment
Assessment can be either formal or informal. Formal assessment usually implies a
written document, such as a test, quiz, or paper. A formal assessment is given a
numerical score or grade based on student performance, whereas an informal
assessment does not contribute to a student's final grade. An informal assessment
usually occurs in a more casual manner
and may include observation, inventories, checklists, rating scales, rubrics, performance
and portfolio assessments, participation, peer and self-evaluation, and discussion.
d. Interim assessments
Interim assessments are used to evaluate where students are in their learning
progress and determine whether they are on track to performing well on future
assessments, such as standardized tests, end-of-course exams, and other forms of
“summative” assessment. Interim assessments are usually administered periodically
during a course or school year (for example, every six or eight weeks) and separately
from the process of instructing students (i.e., unlike formative assessments, which are
integrated into the instructional process).
e. Placement assessments
Placement assessments are used to “place” students into a course, course level, or
academic program. For example, an assessment may be used to determine whether a
student is ready for Algebra I or a higher-level algebra course, such as an honors-level
course. For this reason, placement assessments are administered before a course or
program begins, and the basic intent is to match students with appropriate learning
experiences that address their distinct learning needs.
f. Screening assessments
Screening assessments are used to determine whether students may need specialized
assistance or services, or whether they are ready to begin a course, grade level, or
academic program. Screening assessments may take a wide variety of forms in
educational settings, and they may be developmental, physical, cognitive, or academic.
A preschool screening test, for example, may be used to determine whether a young
child is physically, emotionally, socially, and intellectually ready to begin preschool,
while other screening tests may be used to evaluate health, potential learning
disabilities, and other student attributes.
https://www.edglossary.org/assessment/
2.5. Assessment Methods
Method (type of data) and description:
Alumni Survey (Indirect)
Surveying program alumni can provide information about program satisfaction,
preparation (transfer or workforce), employment status, and skills for success.
Surveys can ask alumni to identify what should be changed, altered, maintained,
improved, or expanded.
Capstone Project or Course (Direct)
A capstone project or course integrates the knowledge, concepts and skills that
students are to have acquired during the course of their study. Capstones provide a
means to assess student achievement across a discipline.
Reflective Student Essays (Direct/Indirect)
Reflective essays can be used as an assessment method to determine student
understanding of course content and/or issues, as well as students' opinions and
perceptions.
Rubrics/Scoring Guides
Rubrics/scoring guides outline identified criteria for
https://www.wssu.edu/about/assessment-and-research/niloa/_files/documents/assessmentmethods.pdf
3. MEASUREMENT
3.1. Definition
Measurement, beyond its general definition, refers to the set of procedures and the
principles for how to use the procedures in educational tests and assessments. Some of the
basic principles of measurement in educational evaluations would be raw scores, percentile
ranks, derived scores, standard scores, etc.
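The derived scores named above can be illustrated with a short sketch. The raw score list is an invented example, and the percentile rank here uses one common convention (percentage of scores strictly below):

```python
# Sketch of two basic derived scores: percentile rank and the standard
# (z) score. The raw score list is an illustrative assumption.
from statistics import mean, pstdev

raw = [55, 60, 65, 70, 70, 80, 85, 90]

def percentile_rank(x, scores):
    """Percentage of scores strictly below x (one common convention)."""
    below = sum(1 for s in scores if s < x)
    return 100.0 * below / len(scores)

mu = mean(raw)       # 71.875
sigma = pstdev(raw)  # population standard deviation

def z_score(x):
    """Standard score: distance from the mean in standard-deviation units."""
    return (x - mu) / sigma

print(percentile_rank(70, raw))   # 37.5: three of eight scores fall below 70
print(round(z_score(85), 2))      # about 1.15 standard deviations above the mean
```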
Allen & Yen define measurement as the systematic assignment of numbers to describe an
individual's attributes18. Thus, the essence of measurement is quantification: the
determination of the number of characteristics or states of an individual according to
certain rules. These states may be cognitive, affective, or psychomotor.
Measurement is a broader concept than testing: we can measure the characteristics of an
object without using tests, for example through observation, rating scales, or other ways
of obtaining quantitative information.
14. (Cangelosi, 1995: 21)
15. (Oriondo, 1998: 2)
16. (Griffin & Nix, 1991: 3)
17. (Ebel & Frisbie, 1986: 14)
18. (Djemari Mardapi, 2000: 1)
3.2 Measurement Scales: Traditional Classification
3.2.1. Nominal Scales
The word nominal is derived from nomen, the Latin word for name. Nominal
scales merely name differences and are used most often for qualitative variables in which
observations are classified into discrete groups. The key attribute for a nominal scale is
that there is no inherent quantitative difference among the categories. Sex, religion, and
race are three classic nominal scales used in the behavioral sciences. Taxonomic
categories (rodent, primate, canine) are nominal scales in biology. Variables on a
nominal scale are often called categorical variables.
3.2.2. Ordinal Scales
Ordinal scales rank-order observations. Class rank and horse race results are
examples. There are two salient attributes of an ordinal scale. First, there is an
underlying quantitative measure on which the observations differ. For class rank, this
underlying quantitative attribute might be composite grade point average, and for horse
race results it would be time to the finish line. The second attribute is that individual
differences on the underlying quantitative measure are either unavailable or
ignored. As a result, ranking the horses in a race as 1st, 2nd, 3rd, etc. hides the information
about whether the first-place horse won by several lengths or by a nose.
There are a few occasions in which ordinal scales may be preferred to using a
quantitative index of the underlying scale. College admission officers, for example, favor
class rank to overcome the problem of the different criteria used by school districts in
calculating GPA. In general, however, measurement of the underlying quantitative
dimension is preferred to rank-ordering observations because the resulting scale has
greater statistical power than the ordinal scale.
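The information loss described above can be seen in a tiny sketch; the horse names and finishing times are invented for illustration:

```python
# Sketch: converting finishing times (a quantitative measure) into ranks
# (an ordinal scale) discards the size of the gaps between observations.
# Horse names and times are illustrative assumptions.
times = {"Alpha": 119.4, "Bravo": 119.5, "Charlie": 125.0}  # seconds

order = sorted(times, key=times.get)                 # fastest first
ranks = {horse: place + 1 for place, horse in enumerate(order)}

print(ranks)  # Alpha 1st, Bravo 2nd, Charlie 3rd
# The ranks alone no longer show that Alpha beat Bravo by a nose (0.1 s)
# while Bravo beat Charlie by several lengths (5.5 s).
```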
3.2.3. Interval Scales
In ordinal scales, the interval between adjacent values is not constant. For example,
the difference in finishing time between the 1st place horse and the 2nd place horse need not
be the same as that between the 2nd and 3rd place horses. An interval scale has a constant interval
but lacks a true 0 point. As a result, one can add and subtract values on an interval scale, but
one cannot multiply or divide units.
Temperature as used in day-to-day weather reports is the classic example of an interval
scale. The assignment of the number 0 to a particular height in a column of mercury is an
arbitrary convenience, apparent to anyone familiar with the difference between the
Celsius and Fahrenheit scales. As a result, one cannot say that 30 °C is twice as warm as 15 °C,
because that statement involves implicit multiplication. To convince yourself, translate these
two into Fahrenheit and ask whether 86 °F is twice as warm as 59 °F.
Nevertheless, temperature has constant intervals between numbers, permitting one to
add and subtract. The difference between 28 °C and 21 °C is 7 Celsius units, as is the
difference between 53 °C and 46 °C. Again, convert these to Fahrenheit and ask whether the
difference between 82.4 °F and 69.8 °F is the same in Fahrenheit units as the
difference between 127.4 °F and 114.8 °F.
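A short sketch makes the two properties concrete, using the standard Celsius-to-Fahrenheit conversion:

```python
# Sketch of interval-scale behavior: equal differences survive the
# Celsius-to-Fahrenheit conversion, but ratios do not, because the
# zero point of each scale is arbitrary.
def c_to_f(c):
    return c * 9 / 5 + 32

# Equal intervals: 28-21 and 53-46 are both 7 C units, i.e. 12.6 F units.
d1 = c_to_f(28) - c_to_f(21)
d2 = c_to_f(53) - c_to_f(46)
print(d1, d2)  # both 12.6 (up to floating-point rounding)

# Ratios are not preserved: 30 C is numerically twice 15 C, yet ...
print(c_to_f(30) / c_to_f(15))  # 86/59, about 1.46 -- not 2
```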
3.2.4. Ratio Scales
A ratio scale has the property of equal intervals but also has a true 0 point. As a result,
one can multiply and divide as well as add and subtract using ratio scales. Units of time
(msec, hours), distance and length (cm, kilometers), weight (mg, kilos), and volume (cc) are
all ratio scales. Scales involving division of two ratio scales are also themselves ratio scales.
Hence, rates (miles per hour) and adjusted volumetric measures (mg/dL) are ratio scales. Note
that even though a ratio scale has a true 0 point, it is possible that the nature of the variable is
such that a value of 0 will never be observed. Human height is measured on a ratio scale, but
every human has a height greater than 0. Because of the multiplicative property of ratio
scales, it is possible to make statements such as: 60 mg of fluoxetine is three times as great as
20 mg.
http://psych.colorado.edu/~carey/Courses/PSYC5741/handouts/Measurement%20Scales.pdf
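The multiplicative property of ratio scales described above can be shown in a brief sketch; the doses, distance, and time values are illustrative assumptions:

```python
# Sketch: on a ratio scale (true zero point) multiplication and division
# are meaningful, so "three times as great" is a valid statement.
# All numbers are illustrative assumptions.
dose_a, dose_b = 60, 20           # mg: a ratio scale
print(dose_a / dose_b)            # 3.0 -- 60 mg really is three times 20 mg

# A quotient of two ratio scales is itself a ratio scale, e.g. speed:
distance_km, hours = 150.0, 2.5
speed = distance_km / hours
print(speed)                      # 60.0 km/h
```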
3.3. The function of Measurement
a. Instructional
1. Principal (basic purpose)
To determine what knowledge, skills, abilities, habits and attitudes have been
acquired.
To determine what progress or extent of learning attained.
To determine strengths, weaknesses, difficulties and needs of students.
2. Secondary (auxiliary functions for effective teaching and learning)
To help in study habits formation
To develop the effort-making capacity of students
To serve as aid for guidance, counselling, and prognosis
b. Administrative/supervisory
To maintain standards
To classify or select for special purposes
To determine teachers' efficiency and the effectiveness of the methods and strategies used
(strengths, weaknesses, needs), as well as standards of instruction
To serve as a basis or guide for curriculum making and development
The Differences among Test, Measurement, and Evaluation:
William Wiersma and Stephen G. Jurs (1990), in their book “Educational Measurement and
Testing”, remark that the terms testing, measurement, assessment and evaluation are used with
similar meanings, but they are not synonymous.
Test:
Major aspects of their definition are: i) the presentation of a standard set of tasks; ii) the student
performs the tasks; iii) the test is to be taken independently; iv) a measure of the learner's
characteristics; v) a quantitative comparison of the performance; vi) a technique of verbal
description; vii) test classification yields quantitative results.
Measurement
Measurement in this manner assesses the extent or quantity of something. It has an intimate
relationship with human beings: it is so closely related that it is rather difficult to say in which
aspect of our life it does not exist. We measure the height, weight and age of a child. Examiners
measure the intelligence and abilities of examinees in various fields. Some of these measurements
are physical. Physical measurement is direct, simple and very accurate; psychological and
educational measurements are complex, for they cannot be made through the system of
physical measurement. The measurement of intelligence is expressed in terms of the Intelligence
Quotient (IQ), and that of scholastic achievement in marks or in grades. Generally, there are three
types of measurement: (i) direct; (ii) indirect; and (iii) relative.
Evaluation
Evaluation includes measurement. It contains the notion of a value judgment. The important
stage in the process of gathering and using all the relevant and correct information is that of
evaluation. ‘Evaluation is a process of making a value judgement.’
(http://shodhganga.inflibnet.ac.in/jspui/bitstream/10603/134363/13/10_chapter3.pdf)
For the purpose of schematic representation, the three concepts of evaluation, measurement
and testing have traditionally been demonstrated in three concentric circles of varying sizes.
This is what Lynch (2001) has followed in depicting the relationship among these concepts.
The purpose of this representation is to show the relationship between superordinate and
subordinate concepts and the area of overlap between them. Thus, evaluation includes
measurement when decisions are made on the basis of information from quantitative methods.
And measurement includes testing when decision-making is done through the use of “a
specific sample of behavior” (Bachman 1990). However, the process of decision-making is by
no means restricted to the use of quantitative methods as the area not covered by measurement
circle shows. Also, tests are not the only means to measure individuals’ characteristics
as there are other types of measurement than tests, for example, measuring an individual’s
language proficiency by living with him for a long time.
http://drjj.uitm.edu.my/DRJJ/OBE%20FSG%20Dec07/OBEJan2010/DrJJ-Measure-assess-evaluate-ADPRIMA-n-more-17052012.pdf