
A Study of Image Classification for Rare Animal

Constantino Geovany O.L, Johan Aristo Wibowo, Lewi Junardi T., Lucia Adilla Maribeth Palanda, Octovianus Pabubung
Department of Informatics, Universitas Atma Jaya Yogyakarta, Yogyakarta, Indonesia
orlandolana09@gmail.com, aristo.yohan@yahoo.com, lewijunardi46@gmail.com, pandadi479@gmail.com, octovianuspabubung10@gmail.com

Abstract – This paper presents a study of image classification for rare animals. The research explores several machine learning algorithms, such as logistic regression, k-nearest neighbors (KNeighbors), and random forest. After three experiments with these algorithms, the results show that random forest is suitable for solving this problem, with an accuracy reaching 93%.

Keywords – image classification, rare animal, animal affect, machine learning classification for animal, random forest.

I. INTRODUCTION

As time goes on, technology keeps growing. New discoveries, as well as further development of technologies that already existed, have had a positive impact on human life.

One of the technologies developing at this time is image classification, an area of artificial intelligence.

A great deal of information can be obtained from a picture. A collection of images is unprocessed raw data; once the data is processed, we can obtain information from it and explore the knowledge it contains.

We chose to classify endangered species because we want to better educate the public, especially the next generation, about the types of animals on this earth. There are so many types of animals that we emphasize introducing them based on their conservation status, especially those that have been driven to the brink of extinction. Our goal is to invite the public to jointly protect and care for endangered animal populations.

It is often debated whether killing a whale is illegal, whether cultivating snakes and crocodiles is prohibited, or whether the birds we keep could be animals with an endangered status.

II. RELATED WORK

A fairly large amount of animal data circulates in cyberspace, and much of it is in the form of images. The research conducted by Slavomir Matuska, Robert Hudec, Patrik Kamencay, Miroslav Benco, and Martina Zachariasova (2014) aimed at recognizing objects based on local hybrid descriptors. The dataset consists of classes such as wolves, foxes, brown bears, deer, and wild boar, using images of large animals from Slovakia. The method used is SVM, with the aim of measuring the testing speed and the accuracy obtained [1].

A study conducted by Tibor Trnovszky, Patrik Kamencay, Richard Orjesek, Miroslav Benco, and Peter Sykora (2017) compares methods for recognizing all types of animals, namely a Convolutional Neural Network (CNN) against several other methods: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Local Binary Patterns Histograms (LBPH), and Support Vector Machine (SVM). The results show that CNN is the most appropriate method, because it gives positive results and outperforms the other methods [2].

Another study used an image dataset from the Wildlife Spotter project, collected with camera traps by Australian scientists. The research was carried out by Hung Nguyen, Sarah J. Maclagan, Tu Dinh Nguyen, Thin Nguyen, Paul Flemons, Kylie Andrews, Euan G. Ritchie, and Dinh Phung in 2017. The study aimed to classify images as animal or non-animal; if an image is detected as containing an animal, it is then classified further [3].

III. DATASET PREPARATION

A. Data Collection
We collected images through Google's search engine, selecting and saving the images manually one by one. The images are grouped into 5 classes, and each class contains 5 animal species. The collected images are of medium quality and size, in .jpg, .jpeg, and .png formats. They are stored in separate folders according to their label, using file names we specified.

B. Data Characteristics
Near Threatened (NT) species are considered likely to become threatened in the near future, although they do not currently qualify for threatened status. NT is used to label species such as the jaguar, beluga, greater sage-grouse, albacore tuna, and plains bison. Least Concern (LC) species are not considered a focus of conservation because, based on their population and extinction risk, they do not qualify as threatened or near threatened. LC is used to label species such as the brown bear, tree kangaroo, swift fox, macaw, and pronghorn.

Critically Endangered (CE) species are considered to have a very high risk of extinction in the wild. CE is used to label species such as the Amur leopard, Yangtze finless porpoise, black rhino, hawksbill turtle, and Sumatran orangutan. Endangered (E) species have been categorized as very likely to become extinct in the near future. E is used to label species such as the African wild dog, chimpanzee, sea lion, whale shark, and red panda. Extinct (EX) species are those considered extinct because the last individual of the species has died.

EX is used to label species such as the Canadian cougar, Galapagos tortoise, Iberian ibex, Spix's macaw, and quagga.

In the dataset that we collected, there are five labels: Near Threatened (NT), Least Concern (LC), Critically Endangered (CE), Endangered (E), and Extinct (EX). The distribution for each label is shown in Table 1.

Table 1. Dataset Distribution for Each Label.
Label   Number of Data   Data Train   Data Test
NT      498              488          10
LC      496              486          10
CE      474              464          10
E       550              540          10
EX      500              490          10

C. Dataset Preprocessing
1. Dataset Reading
To read the dataset, we use the cv2 (OpenCV) library to open every .jpg and .png image.

2. Normalisation
This process cleans the raw data before further processing. Normalization includes several techniques:

a. Data Scaling
This technique normalizes the range of the independent variables (features) of the data.

b. Label Encoder
This technique converts the labels into numeric form so that they are machine-readable.

c. Augmentation
This technique generates new training samples from the original ones by applying random jitters and perturbations such that the class labels are not changed.

3. Feature Extraction
In this study, the authors used Mahotas, a computer vision and image processing library for Python.

IV. MODEL EVALUATION

After preprocessing the data and extracting the features of each image, we build and test models using the following algorithms:

1. Logistic Regression
Logistic regression (LR) is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.

2. Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification.

3. KNeighbors Classifier
The KNeighbors Classifier (KNN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. For classification, the predicted class is the most common class among these k neighbors.

4. Decision Tree Classifier
The Decision Tree Classifier uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining, and machine learning.

5. Random Forest
Random forests are an ensemble learning method for classification, regression, and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.

6. Gaussian Naïve Bayes
Naïve Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naive) independence assumptions between the features. The Gaussian variant additionally assumes that the continuous features of each class follow a normal (Gaussian) distribution.

To choose a suitable algorithm for this problem, we compare the accuracy of each algorithm. We then conduct three experiments: data scaling, augmentation, and parameter tuning.

A. Data Scaling

In the data scaling experiment, Figure 1 shows the accuracy of the various algorithms after the images are scaled.

Figure 1. Results of the model evaluation for 7 algorithms, plotted with pyplot.
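The paper does not say which scaler was used in the data scaling experiment. The following is a hedged, minimal sketch that assumes min-max scaling to [0, 1] and uses pure Python in place of a machine learning library. It shows why scaling can change the output of a distance-based classifier such as KNN: a feature with a large numeric range dominates the Euclidean distance until every feature is rescaled. The data points and the labels "A" and "B" are hypothetical toy values, not taken from the paper's dataset. The helpers `fit_min_max`, `transform`, and `knn_predict` are illustrative names.

```python
import math
from collections import Counter

def fit_min_max(rows):
    """Return per-feature minimums and maximums computed on the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform(rows, mins, maxs):
    """Min-max scale each feature to [0, 1] using previously fitted ranges."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

def knn_predict(train_x, train_y, query, k=3):
    """Return the most common label among the k training rows nearest to query."""
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    return Counter(y for _, y in neighbours).most_common(1)[0][0]

# Hypothetical toy data: feature 0 separates the classes, while feature 1 has
# a much larger numeric range but carries little class information.
train_x = [[0.1, 500.0], [0.2, 100.0], [0.9, 520.0], [0.8, 120.0]]
train_y = ["A", "A", "B", "B"]
query = [0.15, 520.0]

# Without scaling, the large-range feature 1 decides the nearest neighbour.
unscaled_pred = knn_predict(train_x, train_y, query, k=1)

# Fit the ranges on the training rows only, then apply them to the query.
mins, maxs = fit_min_max(train_x)
scaled_pred = knn_predict(
    transform(train_x, mins, maxs), train_y, transform([query], mins, maxs)[0], k=1
)

print(unscaled_pred, scaled_pred)  # B A
```

The scaling ranges are computed from the training rows only and then reused for the query, so no information from test data leaks into the scaler.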

As Figure 1 shows, the three best algorithms in the first experiment are RF (50.1%), LR (39.6%), and LDA (34%). The first experiment showed unsatisfactory results because no algorithm exceeded 80% accuracy.

Figure 2. Testing data images in the first trial.

B. Augmentation
In the second experiment, the images in the dataset are augmented, producing a larger dataset.

Figure 3. Results of the model evaluation for 7 algorithms, plotted with pyplot.

As Figure 3 shows, the three best algorithms in the second experiment are RF (92.9%), KNN (74.5%), and LDA (57.5%). The second experiment shows quite satisfactory results because the accuracy of the random forest algorithm exceeds 90%.

Figure 4. Testing data images in the second trial.

C. Parameter Tuning
In the third experiment, we set the bins parameter to 10 and the number of trees (num tree) to 100. The accuracy of some algorithms increases by around 0.05%, so the parameters tested in this third experiment have some effect on accuracy.
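The paper does not define the bins parameter or show its feature-extraction code. This sketch rests on a common reading, stated here as an assumption: bins is the number of histogram bins per colour channel of a global colour-histogram feature, and num tree is the number of trees in the random forest. The pure-Python code below makes no OpenCV or Mahotas calls. It computes a normalized 3D colour histogram, so bins=10 gives a 10 × 10 × 10 = 1000-value feature vector. It then combines per-tree predictions by taking their mode, as described for Random Forest in Section IV. The pixel values and the 100 tree votes are hypothetical, and `color_histogram` and `forest_vote` are illustrative names.

```python
from collections import Counter

def color_histogram(pixels, bins=10):
    """Normalized 3D colour histogram with `bins` bins per channel (assumed feature).

    `pixels` is a list of (c1, c2, c3) tuples with channel values in 0..255.
    Returns a flat list of bins**3 relative frequencies.
    """
    hist = [0] * (bins ** 3)
    for c1, c2, c3 in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        i, j, k = (c * bins // 256 for c in (c1, c2, c3))
        hist[(i * bins + j) * bins + k] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else [0.0] * len(hist)

def forest_vote(tree_predictions):
    """Combine per-tree class predictions by taking their mode (majority vote)."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical 4-pixel image: two reddish pixels, one blue, one mid-grey.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (128, 128, 128)]
features = color_histogram(pixels, bins=10)
print(len(features))  # 1000

# Hypothetical votes from a forest with 100 trees.
votes = ["EX"] * 60 + ["CE"] * 40
print(forest_vote(votes))  # EX
```

The mode-of-votes rule follows the paper's description. For comparison, scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities instead of counting hard votes.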
Figure 5. Results of the model evaluation for 7 algorithms, plotted with pyplot.

As Figure 5 shows, the three best algorithms in the third experiment are RF (93.04%), KNN (74.8%), and LDA (62.3%).

Figure 6. Testing data images in the third trial.

V. CONCLUSION

Efficient animal classification can be very useful in helping the next generation get to know the various types of animals that exist in the wild and understand which animals are nearly extinct. The next generation can then take the steps deemed necessary to maintain the populations of these animals.

By recognizing the status of these animals, we can determine the next actions to preserve them, such as providing treatment or protection. Without knowing an animal's status, humans may end up hunting it for consumption. Several types of whales, for example, are still debated as to whether they are nearly extinct, and in some regions of the world a tradition of killing whales draws worldwide condemnation.

With technology like this, we hope to become more aware of the existence of these animals.

REFERENCES
[1] Slavomir Matuska, Robert Hudec, Patrik Kamencay, Miroslav Benco, Martina Zachariasova, "Classification of Wild Animals Based on SVM and Local Descriptors", University of Zilina, Slovakia, 2014.

[2] Tibor Trnovszky, Patrik Kamencay, Richard Orjesek, Miroslav Benco, Peter Sykora, "Animal Recognition System Based on Convolutional Neural Network", University of Zilina, Slovakia, 2017.

[3] Hung Nguyen, Sarah J. Maclagan, Tu Dinh Nguyen, Thin Nguyen, Paul Flemons, Kylie Andrews, Euan G. Ritchie, Dinh Phung, "Animal Recognition and Identification with Deep Convolutional Neural Networks for Automated Wildlife Monitoring", Deakin University, Australia, 2017.
Universitas Atma Jaya
Yogyakarta
