Dr. Yossra Hussain Ali, Dr. Nuha Jameel Ibrahim, Mohammed Abdul Jaleel
Computer Science Department, University of Technology, Baghdad, Iraq
Abstract
Our time is characterized by tremendous progress in communication technology and by social networking pages of various types and forms. These tools have emerged as a cultural achievement created by creative minds through advanced technology. Social media are modern web-based applications for communication and interaction between people through audio, written, and video messages. They build and activate living communities around the world, in which people share their interests and activities. Twitter is a social media site where people communicate through tweets: a service that enables friends, family, and co-workers to communicate and stay in touch by exchanging quick and frequent tweets. People publish tweets on their profiles and send them to their followers to express their thoughts and opinions about events in the world, so it is important to study and categorize these tweets. In this research, an evolving intelligent system is used to address the problem of text classification. The inputs of this classification system are a set of features extracted from each tweet, and its output is a classification decision for each tweet: the degree of relevance of the tweet to a given event, which determines whether the tweet is relevant or irrelevant to the desired event. The results are compared with a keyword search method and a fuzzy logic based method in terms of incremental rate and correction rate. The results show that the evolving intelligent system is more suitable for tweet classification than the fuzzy logic method and the keyword search method.
Keywords- Social media; Text classification; Evolving Intelligent System
Abstract (Arabic version, translated into English)
Our era is characterized by tremendous progress in communication technology and in social networking pages of various types and forms. These tools have emerged as an achievement created by creative minds through advanced technology. Social media are modern web-based applications for communication and interaction between people through audio, written, and video messages. These media activate communication between communities around the world. People share their interests and activities on social networking sites, where they communicate through tweets. Twitter is a service that enables friends, family, and co-workers to communicate and stay in touch by exchanging quick and frequent tweets. People publish tweets on their profiles and send them to their followers to express their thoughts and opinions about events in the world. It is important to study and classify these tweets. In this research, an evolving intelligent system was used. The inputs of this system are a set of features extracted from each tweet. The output of the system is the classification decision for each tweet, namely the degree of relevance of each message to a specific event, which determines whether the tweet is relevant or irrelevant to the desired event. The results are compared with a keyword search method and a method based on fuzzy logic in terms of correction rate and incremental rate. The results show that this system is more suitable for tweet classification than the keyword method and the fuzzy logic method.
1. INTRODUCTION
Social media is a convenient space for people to express their opinions about certain events and to communicate with each other. Tweets on Twitter contain features related to users' thoughts and opinions concerning certain events, and it is important to categorize and select them. Twitter data also contain irrelevant content that reduces the usefulness of the extracted data and thereby affects the classification process [1].
Text classification is an essential problem for numerous applications, such as spam detection, smart replies and sentiment analysis. It has been studied widely in the past few years, and various methods have been used to solve it. Text classification aims to assign documents to one or more categories. If a document is assigned to only one class, the task is called "single-label"; if a document can be assigned to more than one class, it is called "multi-label" [2]. Most methods represent the text as a vector in order to classify it. This vector contains the frequency of each word in the text, although it can be more sophisticated and represent several features extracted from the text [3].
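As an illustration of the word-frequency representation described above, the following minimal Python sketch builds a term-frequency vector for a short text over a fixed vocabulary. The whitespace tokenizer, the lower-casing and the function name `term_frequency_vector` are assumptions made for illustration; they are not taken from the cited methods.

```python
from collections import Counter


def term_frequency_vector(text, vocabulary):
    """Return the count of each vocabulary word in `text` (bag-of-words)."""
    # Naive tokenization: lower-case and split on whitespace (an assumption).
    counts = Counter(text.lower().split())
    # One entry per vocabulary word, in vocabulary order; absent words count 0.
    return [counts[word] for word in vocabulary]


vocab = ["storm", "flood", "power", "safe"]
print(term_frequency_vector("Storm hits coast storm surge flood", vocab))  # [2, 1, 0, 0]
```

Each position in the vector corresponds to one vocabulary word, so the vectors of different texts are directly comparable.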
Evolving intelligent systems (EISs) are used to develop online algorithms that come close to the theoretical optimum, work in real time, and are appropriate for unpredictable environments and unstable problems. EISs are characterized by evolving their structure and adapting their parameters gradually, in real time and online. Evolving intelligent systems are based on adaptive algorithms that contribute to raising the 'intelligence quotient' of a system [4].
In this work, an evolving intelligent system is designed to classify tweets from Twitter data. A set of features is extracted from each tweet. These features are the inputs to a classification procedure based on fuzzy logic and a genetic algorithm, which classifies the tweets according to their relevance.
2. RELATED WORKS
In 2014, Caragea et al. [5] used a Naïve Bayes classifier and a Support Vector Machine (SVM) combined with the SentiStrength algorithm. They proposed a sentiment classification method for tweets posted by users during Hurricane Sandy and visualized these sentiments on a geographical map centred on the area affected by the hurricane. In 2014, Salari et al. [6] proposed a classification procedure combining an artificial neural network, a genetic algorithm and the k-Nearest Neighbor algorithm, with the aim of obtaining the best feature vector. First, feature-ranking methods such as the class separability criterion and the feature discriminant ratio were used to rank the features. Second, the resulting arrays of best-ranked features were used to produce near-optimal feature arrays as the initial population of a genetic algorithm. Third, a classification process based on genetic algorithms selected the optimal feature arrays using a modified k-Nearest Neighbor method and an improved back-propagation neural network.

In 2016, Spielhofer et al. [7] suggested that removing irrelevant data and reducing noise are problems similar to e-mail spam filtering, and they trained a Naïve Bayes classifier to detect relevant data. In 2016, Jiang et al. [8] used maximum likelihood estimation in an enhanced approach called deep feature weighting Naïve Bayes to estimate the prior and conditional probabilities. In 2016, Prusa et al. [9] applied Convolutional Neural Networks (CNNs) with a new encoding approach to text classification. CNNs are primarily used in image processing; the new encoding strategy converts text data into an image-like representation so that a CNN can be used as a text classifier.

Because text data have a high feature dimensionality, feature selection is applied in text classification. All feature selection methods aim to find the smallest subset of the original features that reduces computation time and improves text classification. In 2016, Bidi et al. [10] used a Genetic Algorithm (GA) to perform feature selection with two objectives: first, to search for a feature subset for which classifier performance is optimal; second, to find the feature subset with the smallest dimensionality that achieves higher classification accuracy. To evaluate performance, three classifiers were chosen: Nearest Neighbors, Support Vector Machine and Naïve Bayes. In 2017, Sathe et al. [11] proposed a sentiment classification algorithm using fuzzy logic combined with a neural network.
3. PROPOSED SYSTEM
This section describes the details of the evolving intelligent system based on fuzzy logic and a genetic algorithm. First, the collected data are labelled as training data, and the initial phase is preprocessing. In this phase, each tweet is processed to remove elements that affect the classification procedure; then, in the feature extraction phase, seven features are extracted from each tweet. These features are used as the input to the classification procedure, which goes through three stages. Fuzzification, the first stage, converts crisp inputs into fuzzy inputs with membership degrees using membership functions. The trapezoidal membership function is used because it is accurate, widely used and simple. The evolving procedure takes place in this step: genetic algorithms are used to generate new membership degrees from the previous membership degrees. The inference step, the second stage, maps inputs to outputs using IF-THEN rules that transform the fuzzy input into a fuzzy output. The last stage is defuzzification, which produces a crisp output. There are numerous defuzzification functions, for example, the centroid (centre of gravity), the bisector (median), the mean of maximum (MOM) and the largest of maximum (LOM). Figure (1) shows the block diagram of this evolving intelligent system.
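Because the trapezoidal membership function and the fuzzification step are central to this procedure, the following minimal Python sketch shows one way to compute them. The three linguistic terms ("low", "medium", "high") and their parameter values are hypothetical; the paper's actual terms and parameters are not given in this section.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership degree of x for parameters a <= b <= c <= d."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0  # plateau (also covers shoulders where a == b or c == d)
    if x < b:
        return (x - a) / (b - a)  # rising edge; here a <= x < b, so b > a
    return (d - x) / (d - c)  # falling edge; here c < x <= d, so d > c


# Hypothetical linguistic terms for one feature normalised to [0, 1].
TERMS = {
    "low": (0.0, 0.0, 0.2, 0.4),
    "medium": (0.2, 0.4, 0.6, 0.8),
    "high": (0.6, 0.8, 1.0, 1.0),
}


def fuzzify(x, terms=TERMS):
    """Map a crisp feature value to its membership degree in every term."""
    return {name: trapezoid(x, *params) for name, params in terms.items()}


print(fuzzify(0.3))  # roughly 0.5 "low", 0.5 "medium", 0.0 "high"
```

A crisp value can belong to two overlapping terms at once, which is what lets the later IF-THEN rules fire with partial strength.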
Figure (1): Block diagram of the evolving intelligent system (Data Collection; Preprocessing: Hashtag Process, Tokenization, Stop Word Removal, Stemming, Lemmatization, POS; Feature Extraction: extract the 50 most frequently used words, extract other features; Fuzzy Rules; Classification procedure).
$$K_j = \sum_{i} S_i$$
where Kj is the accumulated score of the words in the tweet (Si is the score of the ith word).
3. Length of the jth tweet (Nj)
$$N_j = n$$
where n is the number of words in the tweet.
4. Number of frequently used words in the jth tweet (Mj)
5. Mean word score of the jth tweet (Wj)
$$W_j = K_j / N_j$$
where Wj is the mean score of the words.
6. Weight of frequently used words in the jth tweet (Xj)
$$X_j = M_j / N_j$$
where Xj is the proportion of frequently used words among all words in the tweet.
7. Number of patterns in the jth tweet (Vj)
After the list is obtained, useful patterns are found in the training data in addition to the 50 significant words; these patterns stand on their own but are not on the list. For example, the terms 'not protected' and 'not expected' are more informative than a single term such as 'not' or 'safe'. Accordingly, Vj denotes the number of such patterns in a tweet [12].
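The per-tweet features listed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the word scores Si, the set of frequently used words and the pattern list are hypothetical inputs, because the paper's scoring scheme is not given in this section, and the names `extract_features` and `TweetFeatures` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class TweetFeatures:
    K: float  # accumulated word score, K_j = sum of S_i
    N: int    # tweet length in words, N_j = n
    M: int    # number of frequently used words in the tweet
    W: float  # mean word score, W_j = K_j / N_j
    X: float  # share of frequently used words, X_j = M_j / N_j
    V: int    # number of multi-word patterns such as ('not', 'protected')


def extract_features(tokens, word_scores, frequent_words, patterns):
    """Compute the features for one tokenised tweet (hypothetical inputs)."""
    n = len(tokens)
    k = sum(word_scores.get(t, 0.0) for t in tokens)  # unknown words score 0
    m = sum(1 for t in tokens if t in frequent_words)
    # Count each pattern as a contiguous token sequence anywhere in the tweet.
    v = 0
    for pattern in patterns:
        size = len(pattern)
        v += sum(1 for i in range(n - size + 1) if tuple(tokens[i:i + size]) == pattern)
    # Guard against empty tweets to avoid division by zero.
    w = k / n if n else 0.0
    x = m / n if n else 0.0
    return TweetFeatures(K=k, N=n, M=m, W=w, X=x, V=v)


tokens = "flood water not protected power out".split()
scores = {"flood": 3.0, "power": 2.0, "water": 1.0}
features = extract_features(tokens, scores, {"flood", "power", "water"},
                            [("not", "protected"), ("not", "expected")])
print(features)
```

For this tweet the sketch yields K = 6.0, N = 6, M = 3, W = 1.0, X = 0.5 and V = 1.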
We extracted four additional features to obtain more precise results and to classify more tweets. The list D is partitioned into three nearly equal subgroups, Z1, Z2 and Z3, with different weights Θ1, Θ2 and Θ3, respectively, where k is the position (rank) of a word in D. These features are defined as:
8. Most frequently used words in the list D (Z1)
9. Regularly used words in the list D (Z2)
10. Less commonly used words in the list D (Z3)
$$\Theta_k = \begin{cases} \Theta_1 & \text{for } Z_1,\; k \in [1, 17) \\ \Theta_2 & \text{for } Z_2,\; k \in [17, 33) \\ \Theta_3 & \text{for } Z_3,\; k \in [33, 50] \end{cases}$$
11. Number of words that are not in D but are used frequently in the training data related to Hurricane Sandy (SW)
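A minimal Python sketch of features 8 to 11 follows. The paper does not state in this section how Z1, Z2 and Z3 are turned into numbers, so this sketch assumes each one is the sum of the weights Θ of the tweet words whose rank k in D falls in that subgroup; the weight values, the `frequent_non_d_words` set and the function names are hypothetical.

```python
def group_of_rank(k):
    """Map a 1-based rank k in list D (1..50) to subgroup 1, 2 or 3."""
    if 1 <= k < 17:
        return 1
    if 17 <= k < 33:
        return 2
    if 33 <= k <= 50:
        return 3
    raise ValueError(f"rank {k} is outside list D")


def weighted_group_features(tokens, ranked_d, thetas, frequent_non_d_words):
    """Return (Z1, Z2, Z3, SW) for one tokenised tweet.

    ranked_d: the 50 words of list D, most frequent first (rank k = index + 1).
    thetas: hypothetical weights (Theta1, Theta2, Theta3) for the subgroups.
    frequent_non_d_words: hypothetical set of frequent training words not in D.
    """
    rank = {word: i + 1 for i, word in enumerate(ranked_d)}
    z = [0.0, 0.0, 0.0]
    sw = 0
    for t in tokens:
        if t in rank:
            g = group_of_rank(rank[t])
            z[g - 1] += thetas[g - 1]  # assumed: accumulate the subgroup weight
        elif t in frequent_non_d_words:
            sw += 1  # feature 11 counts frequent words outside D
    return z[0], z[1], z[2], sw


ranked_d = [f"w{i}" for i in range(1, 51)]  # placeholder words for list D
print(weighted_group_features(["w1", "w20", "w50", "w50", "surge", "other"],
                              ranked_d, (3.0, 2.0, 1.0), {"surge"}))
```

Giving the most frequent subgroup the largest weight is one plausible reading of the design; the actual ordering of Θ1, Θ2 and Θ3 is not stated here.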
3.4 CLASSIFICATION PROCEDURE
Figure (2) shows the framework of the classification procedure. After feature extraction, the feature vector contains eleven values for every tweet, and these eleven features are used as the input to the procedure. The classification procedure passes through three steps, the fuzzification process, the inference process and the defuzzification process, as shown in Algorithm (1).
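The inference and defuzzification steps can be illustrated with the following minimal Python sketch. It assumes Mamdani-style min/max inference, which this section does not specify, and uses only two of the eleven features (Xj and Wj, normalised to [0, 1]) with two hypothetical rules and hypothetical membership parameters. Centroid, MOM and LOM defuzzifiers are included because the paper compares defuzzification functions.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership degree of x for parameters a <= b <= c <= d."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical terms shared by the inputs and the output, all on [0, 1].
LOW = (0.0, 0.0, 0.3, 0.5)
HIGH = (0.5, 0.7, 1.0, 1.0)

# Hypothetical rules: (term for X_j, term for W_j, term for relevance).
RULES = [
    (HIGH, HIGH, HIGH),  # IF X_j is high AND W_j is high THEN relevance is high
    (LOW, LOW, LOW),     # IF X_j is low AND W_j is low THEN relevance is low
]


def infer(x_val, w_val, steps=101):
    """Mamdani-style inference: AND = min, implication = min, aggregation = max."""
    ys = [i / (steps - 1) for i in range(steps)]  # discretised output universe
    agg = [0.0] * steps
    for x_term, w_term, out_term in RULES:
        strength = min(trapezoid(x_val, *x_term), trapezoid(w_val, *w_term))
        for i, y in enumerate(ys):
            agg[i] = max(agg[i], min(strength, trapezoid(y, *out_term)))
    return ys, agg


def centroid(ys, mu):
    """Centre of gravity of the aggregated output set."""
    total = sum(mu)
    if total == 0:
        raise ValueError("no rule fired")
    return sum(y * m for y, m in zip(ys, mu)) / total


def mean_of_maximum(ys, mu):
    """MOM: average of the output values where membership is maximal."""
    peak = max(mu)
    at_peak = [y for y, m in zip(ys, mu) if m == peak]
    return sum(at_peak) / len(at_peak)


def largest_of_maximum(ys, mu):
    """LOM: largest output value where membership is maximal."""
    peak = max(mu)
    return max(y for y, m in zip(ys, mu) if m == peak)


ys, mu = infer(0.9, 0.8)
print(centroid(ys, mu), mean_of_maximum(ys, mu), largest_of_maximum(ys, mu))
```

For these inputs only the first rule fires, giving a centroid near 0.80, MOM of 0.85 and LOM of 1.0; a threshold on the crisp output (for example 0.5, also an assumption) would then label the tweet relevant or irrelevant.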
Figure (2): Framework of the classification procedure (genetic algorithm operations: Reproduction, Crossover, Mutation).
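The evolving step, in which a genetic algorithm generates new membership degrees from previous ones through reproduction, crossover and mutation, can be sketched in Python as below. The encoding, fitness function and operators are not given in this section, so this sketch assumes that each individual encodes the four parameters of one trapezoidal membership function for a "relevant" term, that fitness is the accuracy of a 0.5 membership cut on hypothetical labelled samples, and that reproduction uses binary tournament selection with elitism.

```python
import random


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership degree of x for parameters a <= b <= c <= d."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def repair(genes):
    """Clip genes to [0, 1] and sort them so that a <= b <= c <= d."""
    return tuple(sorted(min(1.0, max(0.0, g)) for g in genes))


def fitness(genes, samples):
    """Share of (feature value, is_relevant) samples matched by a 0.5 membership cut."""
    correct = sum((trapezoid(x, *genes) >= 0.5) == label for x, label in samples)
    return correct / len(samples)


def evolve(samples, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Evolve the four trapezoid parameters with a simple generational GA."""
    rng = random.Random(seed)
    population = [repair([rng.random() for _ in range(4)]) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g, samples) for g in population]

        def select():
            # Reproduction: binary tournament selection by fitness.
            i, j = rng.sample(range(pop_size), 2)
            return population[i] if scores[i] >= scores[j] else population[j]

        children = []
        while len(children) < pop_size - 1:
            p1, p2 = select(), select()
            # Crossover: single cut point between gene positions.
            cut = rng.randint(1, 3)
            child = list(p1[:cut] + p2[cut:])
            # Mutation: Gaussian perturbation of each gene with some probability.
            for k in range(4):
                if rng.random() < mutation_rate:
                    child[k] += rng.gauss(0.0, 0.1)
            children.append(repair(child))
        # Elitism: keep this generation's best individual unchanged.
        best = population[max(range(pop_size), key=scores.__getitem__)]
        population = [best] + children
    return max(population, key=lambda g: fitness(g, samples))


# Hypothetical labelled data: one normalised feature, relevant when >= 0.6.
samples = [(i / 20, i / 20 >= 0.6) for i in range(21)]
best = evolve(samples)
print(best, fitness(best, samples))
```

The `repair` step keeps every offspring a valid trapezoid, and elitism guarantees that the best fitness in the population never decreases between generations.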
In summary, the evolving intelligent system is able to extract more tweets than the fuzzy logic method and the keyword search method. In terms of incremental rate, the evolving intelligent system is more powerful than both the fuzzy logic based method and the keyword search method. In terms of correctness rate, the keyword search method performs slightly better than the fuzzy logic method, while the evolving intelligent system outperforms the fuzzy logic method and comes close to the keyword search method. Considering both criteria, the evolving intelligent system is chosen in this research, because relevant tweets are highly required for the analysis step [14-15]. A high correctness rate together with a high quantity of tweets ensures more informative and useful results. We find that the evolving intelligent system is superior to the fuzzy logic based method, as it ensures both a high correctness rate and a high quantity of relevant, accurately classified tweets.
5. CONCLUSION
In this research, an evolving intelligent system is proposed for text classification of Twitter data. Using a set of training data and test data, eleven features were extracted from every tweet as inputs to the classification procedure. We compared this evolving intelligent system with two methods: a keyword search method and a fuzzy logic method for text classification. The results demonstrate that this system classifies relevant and irrelevant tweets better than the fuzzy logic method and the keyword search method. Additionally, by comparing commonly used defuzzification functions, we conclude that the centroid function is more effective than the other functions. In future work, we aim to find better ways to classify text, for example, using neural networks with fuzzy logic and using evolutionary algorithms to generate new membership degrees in the fuzzification process or additional rules in the inference process.
REFERENCES
[1] C. Chen, D. Neal, and M. Zhou, “Understanding the Evolution of a Disaster: A Framework for Assessing Crisis in a System Environment (FACSE)”, Natural Hazards, vol. 65, no. 1, pp. 407-422, January 2013.
[2] R. Jindal, R. Malhotra, and A. Jain, “Techniques for text classification: Literature review and current trends”, Webology, vol. 12, no. 2, December 2015.
[3] M. Nogueira, O. Rezende, and A. Camargo, “On the Use of Fuzzy Rules to Text Document Classification”, International Conference on Hybrid Intelligent Systems, USA, August 2010.
[4] P. Angelov, D. P. Filev, and N. Kasabov, “Evolving Intelligent Systems: Methodology and Applications”, IEEE, Ambleside, UK, September 2006.
[5] C. Caragea, A. Squicciarini, S. Stehle, K. Neppalli, and A. Tapia, “Mapping moods: geo-mapped sentiment analysis during Hurricane Sandy”, International Conference on Information Systems for Crisis Response and Management (ISCRAM), pp. 642-651, May 2014.
[7] T. Spielhofer, R. Greenlaw, D. Markham, and A. Hahne, “Data mining Twitter during the UK floods: Investigating the potential use of social media in emergency management”, 3rd International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), Vienna, Austria, pp. 1-6, December 2016.
[8] Q. Jiang, W. Wang, X. Han, S. Zhang, X. Wang, and C. Wang, “Deep feature weighting in Naive Bayes for Chinese text classification”, International Conference on Cloud Computing and Intelligence Systems (CCIS), Beijing, China, pp. 160-164, August 2016.
[9] J. D. Prusa and T. M. Khoshgoftaar, “Designing a better data representation for deep neural networks and text classification”, International Conference on Information Reuse and Integration (IRI), IEEE, USA, pp. 411-416, July 2016.
[10] N. Bidi and Z. Elberrichi, “Feature selection for text classification using genetic algorithms”, International Conference on Modelling, Identification and Control (ICMIC), IEEE, Algiers, Algeria, pp. 806-810, November 2016.
[11] J. B. Sathe and M. P. Mali, “A hybrid Sentiment Classification method using Neural Network and Fuzzy Logic”, IEEE, India, pp. 93-96, January 2017.
[12] K. Wu, M. Zhou, X. S. Lu, and L. Huang, “A Fuzzy Logic-Based Text Classification Method for Social Media Data”, International Conference on Systems, IEEE, October 2017.
[13] A. Kasun, M. Manic, and R. Hruska, “Optimal stop word selection for text mining in critical infrastructure domain”, Resilience Week (RWS), Philadelphia, pp. 1-6, August 2015.
[14] H. Hellendoorn and C. Thomas, “Defuzzification in fuzzy controllers”, Journal of Intelligent & Fuzzy Systems, vol. 1, no. 2, pp. 109-123, 1993.