
DEVELOPMENT OF ASSESSMENT INSTRUMENT OF COGNITIVE PROCESS AND PRODUCT DIMENSIONS FOR BIOLOGY IN GRADE XI OF THE SENIOR HIGH SCHOOL

Paidi 1)*, Siti Yulaikah 2), Isti'farin 3), Dessy Alfindasari 4), Rabiatul Adawiyah 5)

1) Faculty of Mathematics and Sciences, State University of Yogyakarta
2), 3), 4), 5) Biology Education Graduate Program, Yogyakarta State University
*) Corresponding author. Email: paidi@uny.ac.id, Tel: +081228306051

Abstract

The purpose of this study was to develop an assessment instrument for Grade XI biology
based on the learning outcomes of the Anderson-Krathwohl cognitive process and product
dimensions that (1) fits the Rasch model and (2) meets the test item difficulty criteria.
Respondents were students of Grade XI of the science program in the academic year
2015/2016 in four locations, namely Padang (West Sumatra), South Jakarta, Madiun (East
Java), and Tenggarong (East Kalimantan). Research steps included: (1) determining the
subjects of the trial, (2) pilot testing, and (3) analysis of the test items. Results show the
following findings. First, the reliability of the multiple-choice test is 0.98 and that of the
description (essay) test is 0.93. Second, for the multiple-choice test, the dimension with the
highest difficulty level is applying (C3) procedural, whereas the dimension with the lowest
difficulty level is remembering (C1) procedural. Third, for the essay-type test, the dimension
with the highest difficulty level is creating (C6) factual, while the dimension with the lowest
difficulty level is creating (C6) conceptual.

Keywords: instrument development, cognitive process and product dimensions, biology
learning, senior high school education

1. INTRODUCTION

The 21st century as a global era is marked by the development of science and
technology. This development is a momentum to further improve the quality of human
resources, including those in Indonesia. Based on a study by the United Nations
Development Programme in 2013, the Human Development Index of Indonesia ranked
108th out of 187 countries. This indicates that the life expectancy index, education index,
and gender balance index are still low. These results show that Indonesian human
resources need strengthening to face the global era, which is understood as an era of
competition.
One of the competences that needs to be developed for the global era is associated with
cognitive or thinking skills. The theory of cognitive or thinking skills that is still used as
a reference in the practice of national education is that of Benjamin Samuel Bloom.
Bloom's taxonomy was later revised by Anderson and Krathwohl, in whose version the
cognitive domain is divided into two dimensions, i.e. cognitive processes and cognitive
products. The cognitive process dimension consists of six categories (C1-C6) expressed in
the verbs remembering, understanding, applying, analyzing, evaluating, and creating.
Meanwhile, the cognitive product dimension consists of four categories, namely factual
knowledge, conceptual knowledge, procedural knowledge, and meta-cognitive knowledge.
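The two dimensions together form a six-by-four grid in which any learning objective or test item can be placed. A minimal illustrative sketch of such tagging (the labels follow the taxonomy; the function itself is hypothetical, not part of the study):

```python
# Anderson-Krathwohl two-dimensional taxonomy: six cognitive processes
# crossed with four knowledge (product) types. Illustrative sketch only.
PROCESSES = ["C1 Remember", "C2 Understand", "C3 Apply",
             "C4 Analyze", "C5 Evaluate", "C6 Create"]
KNOWLEDGE = ["Factual", "Conceptual", "Procedural", "Metacognitive"]

def tag(process, knowledge):
    """Label a test item with a cell of the grid, e.g. tag(3, 3)."""
    return f"{PROCESSES[process - 1]} / {KNOWLEDGE[knowledge - 1]}"

print(tag(3, 3))  # → C3 Apply / Procedural
```

A label such as "C3P" used later in this paper is shorthand for exactly such a cell of the grid.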
Suggestions to develop thinking skills based on the cognitive process and product
dimensions are mentioned in the graduate competency standards in the 2013 Decree of
the Ministry of Education No. 54. Biology, one of the subjects studied in upper
secondary education, has a role in developing students' ability to think through the
learning processes. The Ministry of Education Decree No. 22 of 2006, regarding
biology subject content standards, states that the group of subjects in science and
technology in the senior high school is intended to promote students' further competences
in science and technology and to cultivate the ability to think scientifically in a critical,
creative, and independent manner.
However, in reality, the orientation of biology teaching toward developing students'
thinking skills has not been achieved optimally. In her 2014 study analyzing national
examination items in biology based on the revised Bloom's taxonomy of the cognitive
domain and the achievement profile of students of Grade XII of the senior high school,
Nuryani Rustaman states that, for the 2010, 2011, and 2012 examinations, the test items
were developed mostly on the C2 (Understanding) cognitive process category with
conceptual knowledge, in addition to the C1 (Remembering) and C3 (Applying) cognitive
process categories. The average percentage of students who answered each item correctly
was 37.08%, with C3 items occupying the highest score. The researcher suggested that the
learners' mastery of the material tested was still quite good since the material was
relatively new: it had just been studied in class, as C3 material is generally taught in
Grade XII.
As already noted, massive efforts have been made to improve learners' mastery of the
cognitive process and product dimensions through a variety of socialization activities and
practices imposed by the government. However, in these processes, no analysis has been
carried out concerning the students' levels of achievement on the cognitive process and
product dimensions. Therefore, such an analysis is required. Through it, information on
students' achievement of the cognitive dimensions can be obtained that will help teachers
design more effective and efficient learning.
Development of a test instrument is needed in order to obtain optimum results in
instructional processes related to students' cognitive process and product dimensions.
This is because no such test instrument has been used to analyze these students'
cognitive dimensions.
Thus, on the basis of the descriptions presented above, the need is felt to study the
dimensions of students' cognitive process and product in biology based on Anderson and
Krathwohl. As an initial step, the development of this test instrument is carried out in a
limited number of regions in Indonesia, taking regional characteristics into consideration.
Results of such analysis are expected to improve the evaluation of classroom instructional
material, in a narrow scope, and that of education policies, in a broader scope.

2. MATERIAL AND METHOD

The study is developmental research that developed a test instrument with reference
to, and modification of, the Oriondo and Wilson test development methods. The test items
cover the eleventh grade, second semester, instructional materials, which include the
excretory system, the coordination system, the reproductive system, and the immune
system.
The basic competencies in the curriculum for these materials read: (1) (3.9) to analyze
the relationship between the structure of the tissues constituting the organs of the
excretory system and link it with the process of excretion, in order to be able to explain
the mechanism as well as malfunctions that may occur in the human excretory system,
through the study of literature, observation, experimentation, and simulation; (2) (3.10) to
analyze the relationship between the structure of the tissues constituting the organs of the
coordination system and associate it with the coordination process, so as to explain the
role of the nerves and hormones in the mechanisms of coordination and regulation as well
as malfunctions that may occur in the human coordination system, through the study of
literature, observation, experimentation, and simulation; (3) (3.11) to evaluate self-
understanding of the dangers of the use of psychotropic substances and their impacts on
personal health, the environment, and society; (4) (3.12) to analyze the relationship
between tissue structures and reproductive organ functions in the process of human
reproduction through the study of literature, observation, experimentation, and simulation;
(5) (3.13) to apply the understanding of the principles of human reproduction to face
population growth through family planning programs and exclusive breastfeeding; and
(6) (3.14) to apply the principles of the understanding of the immune system to improve
the quality of human life through immunization programs to maintain physiological
processes in the body.
The test instrument consisted of written objective tests: a multiple-choice test and an
objective description (essay) test. The tests were compiled into two packages, namely
Package I and Package II. Each package consisted of 30 multiple-choice items and 6
description items. Each package contained 20% anchor items, i.e. 7 equalizer items. In
addition, a non-test metacognitive assessment instrument was also developed to analyze
the students' metacognitive abilities.
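The anchor-item design described above can be sketched as a small consistency check. The item identifiers below are invented for illustration; the paper does not list which items served as anchors:

```python
# Hypothetical sketch of the two-package design with shared anchor
# (equalizer) items; item identifiers are invented for illustration.
package_1 = set(range(1, 37))             # 36 items: 30 MC + 6 essay
anchors   = {5, 11, 17, 23, 29, 33, 36}   # 7 anchor items (about 20%)
package_2 = anchors | set(range(37, 66))  # 29 new items + the 7 anchors

assert package_1 & package_2 == anchors   # overlap is exactly the anchors
print(len(anchors) / len(package_1))      # → 0.1944..., roughly 20%
```

The anchor items are what allow the two packages to be placed on a common scale during equating.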
The study was conducted in four regions in Indonesia, namely Padang (West
Sumatra), South Jakarta, Madiun (East Java), and Tenggarong (East Kalimantan). In each
area, three senior high schools (SHS) were selected, representing high, medium, and low
reputation respectively as seen from the results of the national examination in the
previous year. All the schools were state-owned (SSHS). The schools were (1) SSHS 1
Padang, SSHS 2 Padang, and SSHS 15 Padang; (2) SSHS 8 Jakarta, SSHS 55 Jakarta,
and SSHS 97 Jakarta; (3) SSHS Nglames, SSHS 1 Mejayan, and SSHS 2 Mejayan; and
(4) SSHS 1 Tenggarong, SSHS 2 Tenggarong, and SSHS 3 Tenggarong. From the 12
schools, a total of 1,116 students were obtained as the research respondents. The research
procedure included: (1) the initial development of the test, (2) the test try-out, and (3)
analyses. The initial development of the test included the following steps: (1) determining
the test objectives, (2) determining the competencies to be tested, (3) determining the
test material, (4) preparing the test blueprint, (5) writing the items based on the Anderson
and Krathwohl dimensions, (6) developing the scoring guidelines, (7) validating the test
items, and (8) revising and assembling the test items. Meanwhile, the test piloting
consisted of the following steps: (1) determining the subjects of the try-out, (2)
administering the try-out, and (3) analyzing the test results.
Data were obtained in the forms of subjects' responses on the cognitive process and
product dimensions and on the meta-cognitive non-test instrument. For the instrument
face validity, the items were subjected to expert judgment concerning the aspects of
material, construction, and language. Based on the inputs from the experts, the test was
finalized to be ready for piloting.

3. RESULTS AND DISCUSSION

The QUEST program was used to find the validity and reliability measures of the test.
The test validity under the Rasch model can be seen from the item fit to the model. Using
the 5% error limit, an item is said to fit the model if the INFIT MNSQ score is between
0.77 and 1.30 and the INFIT t is between -2.0 and 2.0. In addition, item characteristic
curves (ICC) and information function graphs are presented using the Bilog and Parscale
programs. Meanwhile, for the meta-cognitive non-test, validity and reliability measures
were obtained by way of the SPSS program.
The Rasch analyses show that 57 test items fit the model, since they satisfy the
criterion of an INFIT MNSQ between 0.77 and 1.30, as shown in Figure 1. This shows
that each of these items is empirically valid for measuring students' competencies in the
cognitive process and product dimensions.
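The fit criterion can be expressed as a simple filter. A sketch with invented item statistics (not the study's actual QUEST output):

```python
# Flag items that fit the Rasch model using the criteria stated in the
# text: 0.77 <= INFIT MNSQ <= 1.30 and -2.0 <= INFIT t <= 2.0.
# The item statistics below are invented for illustration.
def fits_rasch(infit_mnsq, infit_t):
    return 0.77 <= infit_mnsq <= 1.30 and -2.0 <= infit_t <= 2.0

items = {"item_01": (0.98, 0.4), "item_02": (1.45, 2.6), "item_03": (0.80, -1.1)}
fitting = [name for name, (mnsq, t) in items.items() if fits_rasch(mnsq, t)]
print(fitting)  # → ['item_01', 'item_03']
```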

Figure 1. Diagram of INFIT MNSQ for the multiple choice items in the try-out phase

From Figure 1, it can be seen that the difficulty level of the instrument ranges
between -2 and +2, indicating that the test is good for use. Results of the analyses for the
essay-type test items can be seen in Figure 2.

Figure 2. Diagram of INFIT MNSQ for the description items


In this figure, the average score of the INFIT MNSQ is 0.99 with a standard deviation of
0.13. It can be stated that the essay-type test items fit the Rasch model, since an item fits
the model when its INFIT MNSQ is between 0.77 and 1.30 and its INFIT t is between
-2.0 and 2.0. Thus, it can be stated that the test items have fulfilled the criteria for
goodness of fit.
The reliability measure of the multiple-choice test items was obtained by using the
QUEST program. The analyses show a reliability coefficient of 0.98. The estimation
results are presented in Table 1 below.
Table 1. Estimation Results for the Multiple-Choice Test Items

Aspect                                    Item estimate    Case estimate
Reliability                               0.98
Mean (SD) of the INFIT MNSQ               1.00 (0.02)      1.00 (0.05)
Mean (SD) of the OUTFIT MNSQ              1.01 (0.12)      1.00 (0.16)

For the description test items, the QUEST program gives a reliability estimate of
0.93. The estimates are shown in Table 2 below.

Table 2. Estimation Results for the Essay Items

Aspect                                    Item estimate    Case estimate
Reliability                               0.93
Mean (SD) of the INFIT MNSQ               1.00 (0.24)      1.32 (0.64)
Mean (SD) of the OUTFIT MNSQ              1.09 (0.39)      1.09 (1.19)

For the measures of the item difficulty levels, the QUEST program gives the scores
shown in Figure 3. A test item is said to be good if its difficulty index is between -2.0 and
+2.0. The most difficult item is found in the C3P aspect: the cognitive process of applying
combined with procedural knowledge. The easiest item is found in the C1P aspect: the
cognitive process of remembering combined with procedural knowledge.
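Reading off the hardest and easiest cells from such estimates is a one-line reduction. A sketch with invented difficulty values chosen to mirror the reported pattern (the actual estimates are those plotted in Figure 3):

```python
# Sketch: pick the hardest and easiest taxonomy cells from difficulty
# estimates. The values below are invented for illustration only.
difficulty = {"C1P": -0.6, "C1F": -0.3, "C2K": 0.1, "C5K": 0.4, "C3P": 0.7}
assert all(-2.0 < b < 2.0 for b in difficulty.values())  # all acceptable
hardest = max(difficulty, key=difficulty.get)
easiest = min(difficulty, key=difficulty.get)
print(hardest, easiest)  # → C3P C1P
```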

[Bar chart not reproduced: item difficulty estimates per category (C1F, C1K, C1P, C2F,
C2K, C2P, C3F, C3K, C3P, C4F, C4K, C4P, C5F, C5K, C5P), with values ranging from
about -0.8 to +0.8.]

Figure 3. Distribution of difficulty levels for the multiple-choice items

Notes
C1 F : Remember Factual C4 F : Analyze Factual
C1 K : Remember Conceptual C4 K : Analyze Conceptual
C1 P : Remember Procedural C4 P : Analyze Procedural
C2 F : Understand Factual C5 F : Evaluate Factual
C2 K : Understand Conceptual C5 K : Evaluate Conceptual
C2 P : Understand Procedural C5 P : Evaluate Procedural
C3 F : Apply Factual
C3 K : Apply Conceptual
C3 P : Apply Procedural

For the description test items, the distribution of the difficulty levels of the individual
categories can be seen in Figure 4.

[Bar chart not reproduced: item difficulty estimates for the categories C6F, C6K, and
C6P, with values ranging from about -0.3 to +0.4.]

Figure 4. Distribution of difficulty levels for the description items


Notes:
C6F : Create Factual
C6K : Create Conceptual
C6P : Create Procedural

Furthermore, based on the results of the analysis using the BilogMG program, the
item characteristic curve for each item was obtained. Figure 5 shows an example: the ICC
of item number 18.

Figure 5. Item Characteristic curve for Item Number 18

From Figure 5 above, it can be understood that item number 18 is suited to learners
with high ability, since the curve indicates an item difficulty (b) of 1.3, a high value.
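The ICC in Figure 5 follows the Rasch model, in which the probability of a correct response depends only on the difference between ability θ and item difficulty b. A minimal sketch:

```python
import math

# Rasch item characteristic curve: probability of a correct answer as a
# function of ability theta and item difficulty b.
def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# For an item of difficulty b = 1.3 (the value read from Figure 5),
# a learner at theta = 1.3 has a 50% chance of answering correctly,
# while a learner at theta = -1.0 has under a 10% chance.
print(round(rasch_p(1.3, 1.3), 2))   # → 0.5
print(round(rasch_p(-1.0, 1.3), 2))  # → 0.09
```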
Then, by using the Parscale program, six category response curves were obtained for
each description item. Figure 6 shows the curves for item number 10, which is number 4
in Package II. From this figure, the curves for item number 10 can be explained as
follows: (1) a score of 1 (category 1) is mostly obtained by learners with low ability
(θ = -3); (2) a score of 2 (category 2) is mostly obtained by learners with low ability
(θ = -0.5); (3) a score of 3 (category 3) is mostly obtained by learners with high ability
(θ = 0.9); and (4) a score of 4 (category 4) is mostly obtained by learners with high ability
(θ = 3).

Figure 6. Category Characteristic Curves for Description Item Number 10

Furthermore, the information function graph and the standard error of measurement
(SEM) of the multiple-choice test are presented in Figure 7 below.

Figure 7. Information Function and Standard Error of Measurement (SEM) of the
Multiple-Choice Questions (MCQ)

Based on this figure, it can be seen that the multiple-choice instrument is suitable for
learners with low to moderate abilities, namely -1.4 < θ < 2.8.
Then, the information function graph and the SEM for the description questions are
presented in Figure 8.

Figure 8. Information Function and Standard Error of Measurement (SEM) of the
Description Questions

This figure shows two peaks of information, which means that the test yields optimal
information at two ability levels, i.e. for low- and high-ability individuals simultaneously.
Thus, it can be stated that the test of the cognitive process and knowledge dimensions is
appropriate for learners in the medium- and high-ability range of -0.7 < θ < 0.8.
Finally, from the results of the reliability and validity analyses of the non-test meta-
cognitive instrument through SPSS, the reliability of the non-test instrument can be seen
in Figure 9.

Reliability Statistics

Cronbach's Alpha N of Items


.910 29

Figure 9. Reliability Coefficient for the Non-Test Meta-cognitive Instrument

From Figure 9, it can be seen that the reliability measure of the meta-cognitive
instrument is 0.91. Meanwhile, the validity of the meta-cognitive instrument can be seen
in Table 3.

Table 3. Results of the Validation of the Meta-cognitive Instrument

In this table, it can be seen that, based on the try-out results, out of the 29 items tested,
28 items are valid and one is not valid. Thus, based on the Rasch analyses of the
multiple-choice and description test questions in the try-out, it is found that the developed
instrument is suitable for measuring the students' cognitive process and knowledge
dimensions in biology. However, it is also found, based on the analyses of the difficulty
levels of the multiple-choice and description tests, that there is some inconsistency within
the items: the test instrument is not yet able to demonstrate the hierarchy of the levels of
the learners' cognitive abilities. Therefore, the test items need to go through stages of
revision so that the instrument has stronger power for testing students' cognitive process
and product dimensions in biology. Finally, the non-test meta-cognitive instrument is
found to have high measures of reliability and validity that meet the requirements for
non-test instrument development, so it can be used to analyze students' meta-cognitive
abilities in biology.

4. CONCLUSION
Based on the description and discussion of the research findings, the study has
produced the following results. First, the multiple-choice test, whose INFIT MNSQ has a
mean and standard deviation of 1.00 and 0.02, and the description test both fit the Rasch
model. Second, judged against the INFIT MNSQ lower and upper bounds of 0.77 and
1.30, there are some items that do not fit the model. Third, based on the analysis of the
item difficulty levels, the items representing the various aspects do not show the hierarchy
of the cognitive capability dimensions. Finally, some of the items need revision before
being used in the subsequent stages of implementation.

REFERENCES
Anderson, L. W., Krathwohl, D. R., et al. (2001). A taxonomy for learning, teaching, and
assessing: A revision of Bloom's taxonomy of educational objectives (Abridged edition).
New York: Longman.

Bambang Subali. (2010). Free practicum assessment, evaluation and remediation of
learning outcomes in biology. Yogyakarta: Yogyakarta State University.

Bambang Subali. (2009). Measurement test of divergent-pattern science process skills in
biology subjects [Electronic version]. Yogyakarta State University.

Bambang Sumintono & Wahyu Widhiarso. (2015). Rasch modeling applications in
educational assessment. Cimahi: Trim Komunikata.

Bloom, B. S. (1956). Taxonomy of educational objectives: The classification of
educational goals, Handbook 1: Cognitive domain. New York: Longmans, Green and
Co.

Bond, T. G. & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement
in the human sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

BNSP. (2006). Content standard for primary and secondary education: Competence
standard and basic competence for SMA/MA.

DeGallow. (2001). What is problem-based learning? (Http://www.pbl.uci.edu/-
whatispbl/html.htm, accessed on March 26, 2015).

Edi Istiyono. (2014). Measurement of high-level thinking skills of high school physics
students in DIY. Doctoral dissertation, unpublished, Yogyakarta State University,
Yogyakarta.

Edi Istiyono, Djemari Mardapi, & Suparno. (2014). Effectiveness of reasoned objective
choice test to measure higher order thinking skills in implementing the physics
curriculum 2013. Proceedings of the International Conference on Educational Research
and Evaluation, Yogyakarta State University.

Eggen, P. D. & Kauchak, D. P. (1996). Strategies for teachers: Teaching content and
thinking skills (3rd ed.). Boston: Allyn and Bacon.

Falk, D. F. (1980). Biology teaching methods. Florida: John Wiley & Sons, Inc.

Harminto, Sundowo. (2004). General biology. Jakarta: Open University Publishing Center.

Heri Retnawati. (2014). Item response theory and its application. Yogyakarta: Nuha
Medika.

Heri Retnawati. (2016). Validity, reliability, and item characteristics. Yogyakarta: Parama
Publishing.

IEA. (2011). TIMSS & PIRLS. IEA Sites. Accessed on October 26, 2016 from
http://timssandpirls.bc.edu/data-release-2011/pdf/Overview-TIMSS-and-PIRLS-2011-
Achievement.pdf.

Litbang Kemdikbud. (2016). PISA (Programme for International Student Assessment).
Accessed on October 26, 2016 from http://litbang.kemdikbud.go.id/index.php/survei-
internasional-pisa.

Mardapi, Djemari. (2004). The preparation of learning outcomes tests. Yogyakarta:
Yogyakarta State University.

Mardapi, Djemari. (2008). Techniques of test and non-test instrument preparation.
Yogyakarta: Mitra Cendikia Press.

Minister of National Education. (2006). National Education Minister Regulation No. 22 of
2006, Competence Standard and Basic Competence of the Biology Subject.

Nitko, A. J. & Brookhart, S. M. (2011). Educational assessment of students (6th ed.). New
Jersey: Pearson Education Inc.

Oriondo, L. L. & Dallo-Antonio, E. M. (2008). Evaluation of educational outcomes.
Manila: Rex Printing Company, Inc.

Permendikbud. (2013). Regulation of the Minister of Education and Culture No. 54 of
2013, on Graduate Competency Standards (SKL) for SMA.

Rustaman, Nuryani. (2005). Biology teaching and learning strategies. Malang: UM Press.

Saifuddin Azwar. (2015). Reliability and validity. Yogyakarta: Pustaka Pelajar.

Suwarto. (2013). Development of diagnostic tests in learning. Yogyakarta: Pustaka Pelajar.

Trilling, B. & Hood, P. (1999). Learning, technology, and education reform in the
knowledge age ("We're wired, webbed, and windowed, now what?").
(Www.wested.org/cs/we/view/rs/654, accessed on July 9, 2015).

Wilson, M. (2005). Constructing measures: An item response modeling approach.
Mahwah: Lawrence Erlbaum Associates, Inc., Publishers.
