
USE OF GENETIC ALGORITHMS FOR CLASSIFICATION OF DATASETS

{NITIN KUMAR, 216CS1140} CSE DEPARTMENT, NIT ROURKELA

INTRODUCTION

Genetic Algorithms (GA) are iterative, search-based optimization strategies built on the concepts of natural selection and biological genetics. In a genetic algorithm we maintain a set of candidate solutions to the problem at hand. These solutions are transformed into new solutions using mutation and recombination, producing child solutions, and the process is repeated for several iterations. This study compares the performance of several popular classification algorithms against GA with respect to classification accuracy.

OBJECTIVE

The aim of this project is to create a classification engine, based on a genetic algorithm (a type of machine learning algorithm), that trains itself from the training data provided and is then able to classify unseen instances. The engine refines itself to the maximum possible level, with adequate attention paid to ensuring it is not over-fitted, meaning it can classify any input instance precisely rather than merely reproducing the training data.

GENETIC ALGORITHM

Genetic algorithms are useful for search and optimization problems. A GA uses genetics as its model of problem solving: each solution is represented by a chromosome, chromosomes are made up of genes, which are the individual elements (alleles) that represent the problem, and the collection of all chromosomes is called the population. Generally, three popular operators are used in a GA; they are described under Methods of GA below. Two further notions are central:

• Population: The evolution usually starts from a population of randomly generated individuals; in other words, the population is a group of chromosomes.

• Fitness Function: Individual solutions are selected through a fitness process. The fitness function, in simple words, depicts how "fit" or how "bad" a solution is for a given problem.

FLOW CHART

Figure 1: Flow Chart
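The poster does not specify its chromosome encoding or fitness function, so the sketch below is only an illustration of the concepts above, assuming a fixed-length binary chromosome (for example, one gene per candidate feature) and a fitness defined as classification accuracy; every name here is illustrative rather than taken from the original implementation.

import random

# Assumed encoding: a chromosome is a fixed-length list of 0/1 genes.
def random_chromosome(n_genes):
    """One randomly generated individual."""
    return [random.randint(0, 1) for _ in range(n_genes)]

def initial_population(pop_size, n_genes):
    """Population = a group of randomly generated chromosomes."""
    return [random_chromosome(n_genes) for _ in range(pop_size)]

def fitness(chromosome, evaluate_accuracy):
    """Fitness depicts how 'fit' a solution is; here it is assumed to be the
    classification accuracy obtained when the chromosome is decoded into a
    classifier (evaluate_accuracy is a problem-specific callable)."""
    return evaluate_accuracy(chromosome)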

METHODS OF GA

1. Selection: This operator is used to select individuals for reproduction. Common selection methods are listed below; two of them are sketched after the list.

• Roulette wheel selection
• Random selection
• Rank selection
• Tournament selection
• Boltzmann selection
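The poster only names the selection schemes, so the following is a minimal sketch of two of them, roulette wheel and tournament selection, for the binary chromosomes assumed earlier; the tournament size k is an assumed parameter.

import random

def roulette_wheel_selection(population, fitnesses):
    """Select one individual with probability proportional to its fitness
    (assumes non-negative fitness values)."""
    pick = random.uniform(0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]

def tournament_selection(population, fitnesses, k=3):
    """Draw k individuals at random and return the fittest of them."""
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]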
2. Crossover: This is the process of taking two parent chromosomes and producing a child from them; the operator is applied to create better strings. Common crossover operators are listed below, with a sketch of two of them after the list.

• one-point crossover
• multi-point crossover
• uniform crossover
• whole arithmetic recombination
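Again the poster names the operators without showing them; a sketch of one-point and uniform crossover for equal-length chromosomes, under the same assumed binary encoding, follows.

import random

def one_point_crossover(parent1, parent2):
    """Cut both parents at the same random point and swap the tails,
    producing two children."""
    point = random.randint(1, len(parent1) - 1)  # avoid empty segments
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def uniform_crossover(parent1, parent2):
    """Copy each gene from either parent with equal probability."""
    return [g1 if random.random() < 0.5 else g2
            for g1, g2 in zip(parent1, parent2)]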
3. Mutation: Randomly inverting a gene in the parent chromosome produces a child chromosome. Mutation is essential for maintaining diversity in the population. Commonly used mutation techniques are listed below; two are sketched after the list.

• bit flip mutation
• random resetting
• swap mutation
• scramble mutation
• inversion mutation
4. Survivor selection: The policy that determines which chromosomes are retained for the next generation and which are discarded is called the survivor selection policy. The most commonly used survivor selection schemes are

• Age based selection
• Fitness based selection

5. Termination condition: The termination condition of a genetic algorithm must be designed very carefully, since it determines when the genetic process ends; if it is not designed correctly, the search may loop indefinitely.
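Putting the pieces together, the loop below is a minimal sketch (reusing the helper functions sketched earlier) of fitness-based survivor selection together with a simple termination condition, here a fixed generation budget, which is one common way to avoid the infinite-loop problem mentioned above; population size and generation count are assumed values.

def run_ga(pop_size, n_genes, evaluate_accuracy, max_generations=100):
    """Evaluate, select parents, recombine, mutate, then keep the fittest
    pop_size individuals from parents + offspring (fitness-based survivor
    selection). Terminates after a fixed number of generations."""
    population = initial_population(pop_size, n_genes)
    for _ in range(max_generations):
        fitnesses = [fitness(c, evaluate_accuracy) for c in population]
        offspring = []
        while len(offspring) < pop_size:
            p1 = tournament_selection(population, fitnesses)
            p2 = tournament_selection(population, fitnesses)
            c1, c2 = one_point_crossover(p1, p2)
            offspring += [bit_flip_mutation(c1), bit_flip_mutation(c2)]
        combined = population + offspring
        combined.sort(key=lambda c: fitness(c, evaluate_accuracy), reverse=True)
        population = combined[:pop_size]  # survivors for the next generation
    return population[0]  # fittest chromosome found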
RESULTS

• ZeroR classifier: Applied to our training dataset, the ZeroR classifier achieves an accuracy of 59.90 percent with 10-fold cross-validation and 63.16 percent with a 66 percent split of the training dataset.

• J48 classifier: Applied to our training dataset, the J48 classifier achieves an accuracy of 76.46 percent with 10-fold cross-validation and 76.56 percent with a 66 percent split of the training dataset.

• IBk classifier: Applied to our training dataset, the IBk classifier achieves an accuracy of 75.16 percent with 10-fold cross-validation and 74.16 percent with a 66 percent split of the training dataset.

• Genetic Algorithm: Applied to our training dataset, the GA classifier achieves an accuracy of 81.33 percent with 10-fold cross-validation and 80.95 percent with a 66 percent split of the training dataset.

Figure 2: Classification Accuracy
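ZeroR, J48 and IBk are the names of Weka's majority-class baseline, C4.5 decision tree and k-nearest-neighbour classifiers. The snippet below is only a hedged sketch of the two evaluation protocols used above (10-fold cross-validation and a 66 percent training split) using rough scikit-learn analogues of those classifiers; the random data is a placeholder for the actual UCI dataset, so the printed numbers will not match the figures reported here.

import numpy as np
from sklearn.dummy import DummyClassifier            # majority-class baseline, like ZeroR
from sklearn.tree import DecisionTreeClassifier      # rough analogue of J48 (C4.5)
from sklearn.neighbors import KNeighborsClassifier   # analogue of IBk (k-NN)
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data: substitute the actual UCI training dataset used in the poster.
rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = rng.integers(0, 2, size=200)

models = {
    "ZeroR-like": DummyClassifier(strategy="most_frequent"),
    "J48-like": DecisionTreeClassifier(),
    "IBk-like": KNeighborsClassifier(n_neighbors=1),
}

for name, model in models.items():
    cv_acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()     # 10-fold CV
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.66, random_state=0)
    split_acc = model.fit(X_tr, y_tr).score(X_te, y_te)                         # 66 percent split
    print(f"{name}: 10-fold CV = {cv_acc:.4f}, 66% split = {split_acc:.4f}")

The GA classifier itself would correspond to wrapping a procedure like the run_ga sketch above inside the same two evaluation protocols.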

REFERENCES

[1] Shanabog CS Nandish and UM Ashwinkumar. Use of genetic algorithms for classification of datasets. In Recent Trends in Electronics, Information & Communication Technology (RTEICT), 2017 2nd IEEE International Conference on, pages 2016–2020, 2017.

FUTURE WORK

In this poster we have presented an approach for the classification of datasets. To validate the proposed method, we tested it on machine learning datasets taken from the UCI repository. Future work may include different datasets and a larger number of classifiers; evaluating on further datasets would also help to test the performance of the proposed system.

CONCLUSION

The genetic algorithm provided the maximum classification accuracy: 81.33 percent with 10-fold cross-validation and 80.95 percent with a 66 percent split of the training dataset.
