CHIEF PATRON
Prof. M. N. Navale
Founder President, Sinhgad Institutes
PATRON
Dr. (Mrs.) S. M. Navale
Founder Secretary, Sinhgad Institutes
PATRON
Mr. R. M. Navale
Vice-President (HR), Sinhgad Institutes
PATRON
Mrs. Rachana Navale-Ashtekar
Vice-President (Admin), Sinhgad Institutes
CONVENOR
Dr. P. N. Mahalle
Professor & Head,
Member, BoS Computer Engineering, SPPU,
Ex-Chairman, BoS Information Technology, SPPU, Pune
ORGANIZING SECRETARY
Dr. G. R. Shinde
Prof. J. N. Nandimath
CORE TECHNICAL COMMITTEE
Prof. S. K. Pathan
Prof. S. P. Pingat
Prof. R. A. Satao
Prof. V. S. Deshmukh
Prof. V. V. Kimbahune
Prof. A. A. Deshmukh
Prof. V. R. Ghule
Prof. P. S. Desai
Prof. P. N. Railkar
Prof. P. S. Raskar
Prof. S. R. Pavshere
Prof. P. A. Sonewar
Prof. P. R. Chandre
Prof. A. B. Kalamkar
Prof. S. A. Kahate
Prof. B. D. Thorat
Prof. P. S. Teli
Prof. P. P. Patil
Prof. D. T. Bodake
Prof. G. S. Pise
Prof. S. P. Patil
Prof. M. Tamboli
Message from Principal
Dr. A. V. Deshpande
Principal,
Smt Kashibai Navale College of Engineering,
Pune.
With the advent of high-speed communication, tremendous impetus has been felt in various core-sector technologies related to computer networking. These include next generation networks, advanced database technologies such as data mining and information retrieval, image and signal processing, etc. There has also been tremendous advancement in soft-computing solution systems such as cloud computing, grid computing, neural networks, and network and cyber security. The Internet, web and other service sectors have gone through a sea change in the last decade.
A need was therefore felt to organize this International Conference on "Internet of Things, Next Generation Networks and Cloud Computing 2019" (ICINC 2019) to acquaint researchers, faculty and students of this college with the latest trends and developments in this direction. This conference indeed provides a very useful platform for a close congregation of industry and academia. The conference addresses the trends, challenges and future roadmaps within a conglomerate of existing and novel wireless technologies and recent advances in information theory and its applications. To make the event more meaningful we interacted with premier institutes, organizations and leading industries across the country in the field of computer networking and requested them to demonstrate and share the latest technology with participants. I am sure this close interaction with them will enrich us all with knowledge of the latest developments.
Message from Vice Principal
Dr. K. R. Borole
Vice Principal,
Smt Kashibai Navale College of Engineering,
Pune.
Warm and happy greetings to all. I am immensely happy that the Department of Computer Engineering of Smt. Kashibai Navale College of Engineering, Vadgaon (Bk), Pune is organizing the International Conference on "Internet of Things, Next Generation Networks and Cloud Computing 2019" (ICINC 2019) on February 15-16, 2019. The conference addresses the trends, challenges and future roadmaps within a conglomerate of existing and novel wireless technologies and recent advances in information theory and its applications. The conference features a comprehensive technical program including special sessions and short courses.
The dedicated Head of the Department of Computer Engineering, Dr. P. N. Mahalle (Convener), Dr. G. R. Shinde and Prof. J. N. Nandimath (Organizing Secretaries), the staff members, and the disciplined undergraduate students, postgraduate students and research scholars of Smt. Kashibai Navale College of Engineering, Vadgaon (Bk), Pune are the added strengths of our college. On this occasion I would like to express my best wishes for this event.
I congratulate the Head of Department, the staff members and students of the Computer Engineering Department, and the participants from colleges all over India and abroad, for organizing and participating in this conference.
I express my sincere thanks to all the authors, invited speakers, session chairpersons, participants and the proceedings publication team, who took painstaking efforts in reviewing the research papers and technical manuscripts included in this proceeding.
Message from Convener & Head of Department
Dr. Parikshit N. Mahalle
Professor & Head,
Dept. of Computer Engineering,
Smt Kashibai Navale College of Engineering,
Pune.
Dr. G. R. Shinde
Organizing Secretary,
Smt Kashibai Navale College of Engineering,
Pune.
Dear friends,
Adding a new chapter to the tradition of international conference proceedings at our college, I am very happy to place before you the proceedings of the 4th International Conference, ICINC 2019. As an Organizing Secretary, allow me to introduce this proceeding. It consists of 96 papers spread across six domains. I laud my editorial team, which has brought out this copy with beautiful and research-rich presentations; it was indeed a herculean task. It has been my pleasure to guide and coordinate them in bringing out this proceeding.
My sincere thanks to Prof. M. N. Navale, Founder President, STE Society, Pune; Dr. (Mrs.) S. M. Navale, Secretary, STE Society, Pune; Ms. Rachana Navale-Ashtekar, Vice-President (Admin), STE Society, Pune; and Mr. Rohit M. Navale, Vice-President (HR), STE Society, Pune, for their encouragement and support. I would also like to thank my Principal, Dr. A. V. Deshpande, for his unstinted help and guidance. Dr. K. R. Borole, Vice Principal, and Dr. P. N. Mahalle, Head of the Computer Department, have been kind enough to advise me in carrying out this onerous responsibility of managing the functions of Organizing Secretary. I would also like to thank Savitribai Phule Pune University for its association with us.
I hope the research community will enjoy reading this proceeding during their research.
Message from Organizing Secretary
Prof. J. N. Nandimath
Organizing Secretary,
Smt Kashibai Navale College of Engineering,
Pune.
Dear Friends,
Research is an important activity of human civilization. It is crucial for improving the economy of our country and achieving sustainable development. The outcomes of research should not be confined to research laboratories; effort must be made so that humanity can benefit from new developments in research. At the same time, research education should also be given due importance, in order to attract young talented persons to research and equip them with the knowledge, information and wisdom suitable for industry.
The 4th International Conference on "Internet of Things, Next Generation Networks and Cloud Computing 2019" (ICINC 2019) aims to provide a common platform for the research community, industry and academia. It is also expected to be a wonderful gathering of senior and young professionals of the Department of Computer Engineering carrying out research.
We wish to thank all the authors, reviewers, sponsors and invited speakers, the members of the advisory board and organizing team, the student volunteers, and all others who have contributed to the successful organization of this conference. I am very grateful to Prof. M. N. Navale, Founder President, STE Society, Pune; Dr. (Mrs.) S. M. Navale, Secretary, STE Society, Pune; Ms. Rachana Navale-Ashtekar, Vice-President (Admin), STE Society, Pune; and Mr. Rohit M. Navale, Vice-President (HR), STE Society, Pune, for their encouragement and support. I would also like to thank Principal Dr. A. V. Deshpande for his generous help and guidance. Dr. K. R. Borole, Vice Principal, and Dr. P. N. Mahalle, Head of the Computer Department, have been kind enough to advise me in carrying out this arduous responsibility of managing the functions of Organizing Secretary.
I would also like to thank Savitribai Phule Pune University for its association and for providing necessary funding.
Index
Sr. No    Title    Page No
Internet of Things
1 Automated Toll Collection System And Theft Detection Using RFID 1
Samruddhi S. Patil, Priti Y. Holkar, Kiran A. Pote, Shubhashri K. Chavan,
Asmita Kalamkar
2 WI-FI Based Home Surveillance Bot Using PI Camera & Accessing Live 7
Streaming Using Youtube To Improve Home Security
Ritik Jain, Varshun Tiku, Rinisha Bhaykar, Rishi Ahuja, Prof. S.P.Pingat
3 Smart Dustbin With Metal Detector 12
Dhiraj Jain, Vaidehi Kale, Raksha Sisodiya, Sujata Mahajan, Dr. Mrs.
Gitanjali R. Shinde
4 Improvement In Personal Assistant 17
Ashik Raj, Sreeja Singh, Deepak Kumar, Deshpande Shivani Shripad
5 IoT Based Home Automation System For Senior Citizens 20
Ashwathi Sreekumar, Divyanshi Shah, Himanshi Varshney
6 Smart Traffic Control System Using Time Management 25
Gaikwad Kavita Pitambar, More Sunita Vitthal, Nalge Bhagyashree Muktaji
7 The Pothole Detection: Using A Mobile Sensor Network For Road 29
Surface Monitoring
Sanket Deotarse, Nate Pratiksha, Shaikh Kash, Sonnis Poonam
8 IoT Based Agricultural Soil Prediction For Crops With Precautions 33
Prof.Yashanjali Sisodia, Pooja Gahile, Chaitali Meher
9 IoMT Healthcare: Security Measures 36
Ms. Swati Subhash Nikam, Ms. Ranjita Balu Pandhare
10 Smart Wearable Gadget For Industrial Safety 42
Ketki Apte, Rani Khandagle, Rijwana Shaikh, Rani Ohal
11 Smart Solar Remote Monitoring and Forecasting System 45
Niranjan Kale, Akshay Bondarde, Nitin Kale, Shailesh Kore,
Prof.D.H.Kulkarni
12 Smart Agriculture Using Internet of Things 50
Akshay Kudale, Yogesh Bhavsar, Ashutosh Auti, Mahesh Raykar,
Prof. V. R. Ghule
13 Area-Wise Bike Pooling - "BikeUp" 54
Mayur Chavhan, Sagar Tambe, Amol Kharat, Prof. S. P. Kosbatwar
14 Smart Water Quality Management System 58
Prof. Rachana Satao, Rutuja Padavkar, Rachana Gade, Snehal Aher, Vaibhavi
Dangat
15 Intelligent Water Regulation Using IoT 62
Shahapurkar Shreya Somnath, Kardile Prajakta Sudam, Shipalkar Gayatri
Satish, Satav Varsha Subhash
16 Smart Notice Board 65
Shaikh Tahura Anjum Vazir, Shaikh Fiza Shaukat, Kale Akshay Ashok
17 Vehicle Identification Using IOT 68
Miss Yashanjali Sisodia, Mr. Sudarshan R. Diwate
18 Wireless Communication System Within Campus 72
Mrs. Shilpa S. Jahagirdar, Mrs. Kanchan A. Pujari
19 License Plate Recognition Using RFID 77
Vaibhavi Bhosale, Monali Deoghare, Dynanda Kulkarni, Prof. S. A. Kahate
Cloud Computing
74 Cloud Stress Distribution And De-Duplication Check Of Cloud Data 384
With Secure Data Sharing Via Cloud Computing
Amruta Deshmukh, Rajeshri Besekar, Raveena Gone, Roshan Wakode, Prof. D. S. Lavhkare
75 Efficient Client-Side Deduplication Of Encrypted Data With Improved 389
Data Availability And Public Auditing In Cloud Storage
Akash Reddy, Karishma Sarode, Pruthviraj Kanade, Sneha M. Patil
76 A Novel Methodology Used To Store Big Data Securely In Cloud 397
Kale Piyusha Balasaheb, Ukande Monika Prakash
77 Survey Paper on Secure Heterogeneous Data Storage Management with 402
Deduplication in Cloud Computing
Miss. Arati Gaikwad, Prof. S. P. Patil
78 Survey on A Ranked Multi-Keyword Search in Cloud Computing 411
Mr. Swaranjeet Singh, Prof. D. H. Kulkarni
79 Private Secure Scalable Cloud Computing 417
Himanshu Jaiswal, Sankalp Kumar, Janhvi Charthankar, Sushma Ahuja
environment was built, and then, for each location of the label attached on the container, the distance between the container and the antenna along a fixed direction was changed. Finally, they concluded on how to determine the preferred location of an RFID tag.

3. GAP ANALYSIS
In India, almost all toll collection at toll plazas is done manually. Due to the large population and heavy road transportation, this is time consuming and causes traffic congestion at toll plazas. Some toll plazas in India have started to implement electronic toll collection, but it has not yet been implemented on a large scale. Though many systems have been proposed for automated toll collection, the issue of theft detection has not been addressed so far. So, to enhance the current systems, we propose automated toll collection with theft detection, to overcome time consumption, long queues and fuel wastage, and to identify stolen vehicles.

4. PROPOSED SYSTEM
In this proposed system we use RFID (Radio Frequency Identification) technology, which uses radio frequency to identify objects. RFID thus enables automatic toll collection, which conserves time and energy and provides an efficient system for automated transactions.
In the proposed system, RFID tags are used. They can be attached to the front portion of the vehicle, i.e. the windshield, or to its side. Passive tags are used because of their feasibility: they do not have their own battery. When a vehicle enters the toll gate, the active device, i.e. the reader, emits radio waves; as soon as these waves make contact with a tag, a magnetic field is produced, from which the tag draws power and sends its data to the controller.
The reader is connected to a microcontroller; an Arduino with the ATmega328 is used as the microcontroller here. The reader scans the tag and sends the ID to the main system, the Arduino, which checks it against the database for that unique ID. There is a user interface on the desktop at the toll plaza; after the information is checked against the database, the details are displayed on this interface. If the details match, the amount is deducted and a command is issued to the servo motor to lift the barricade. A central database is maintained, so as soon as a vehicle enters the toll plaza its RFID tag is scanned and information regarding the vehicle is displayed. The toll is deducted automatically, and a message is sent to the registered mobile number using GSM technology. If the RFID number is not matched, the barricade is not lifted and the vehicle is blocked there; this is the theft detection. A servo motor is used for the movement of the barricade.
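The scan-check-actuate flow described above can be sketched as follows. This is a minimal illustration only: the tag ID, database contents, toll amount and function names are our own assumptions, and the actual servo and GSM actions are reduced to return values.

```python
# Hypothetical sketch of the plaza-side logic: look up the scanned tag ID in the
# central database; a match deducts the toll and raises the barricade, a
# mismatch keeps the barricade down (theft detection).
VEHICLE_DB = {
    "3F00215A7B": {"owner": "registered user", "balance": 250.0},
}

def on_tag_scanned(tag_id: str, toll: float = 50.0) -> str:
    record = VEHICLE_DB.get(tag_id)
    if record is None:
        return "blocked"       # barricade stays down; vehicle flagged as possibly stolen
    record["balance"] -= toll  # deduct toll; an SMS would then be sent via GSM
    return "barrier_up"        # command the servo motor to lift the barricade
```

A real deployment would replace the dictionary with the plaza's central database and drive the servo and GSM modem from the Arduino side.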
    deduct_money(tollamt * 0.75);
else:
    deduct_amount(tollamt);
else:
    if (node.amount > 100):
        send_warning_msg();
    else:
        send_redalert_msg();
        send_msg_to_user_to_add_money();
end;
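The pseudocode fragment above begins mid-branch (its opening condition falls outside this extract). A runnable Python sketch of the same deduction logic follows, under stated assumptions: we assume the missing outer condition checks that the balance covers the toll, and that `discount_eligible` models the branch charging `tollamt * 0.75`; all names here are illustrative.

```python
def process_toll(balance: float, toll_amt: float, discount_eligible: bool):
    """Return (new_balance, action) following the paper's pseudocode branches."""
    if balance >= toll_amt:                      # assumed outer condition: balance covers toll
        charged = toll_amt * 0.75 if discount_eligible else toll_amt
        return balance - charged, "deducted"
    if balance > 100:
        return balance, "warning"                # send_warning_msg()
    return balance, "redalert_add_money"         # send_redalert_msg(); ask user to add money
```

For example, a discount-eligible vehicle with a balance of 500 paying a toll of 100 is charged 75.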
Using Embedded Linux", 2015 International Conference on Circuit, Power and Computing Technologies [ICCPCT].
[2] Hanit Karwal, Akshay Girdhar, "Vehicle Number Plate Detection System for Indian Vehicles", 2015 IEEE International Conference on Computational Intelligence & Communication Technology.
[3] Sana Said Al-Ghawi, Muna Abdullah Al Rahbi, Dr. S. Asif Hussain, S. Zahid Hussain, "Automatic Toll E-Ticketing System for Transportation Systems", 2016 3rd MEC International Conference on Big Data and Smart City.
[4] Renata Rampim de Freitas Dias, Hugo E. Hernandez-Figueroa, Luiz Renata Costa, "Analysis of impacts on the change of frequency band for RFID system in Brazil", Proceedings of the 2013 IEEE International Conference on RFID Technologies and Applications, 4-5 September 2013, Johor Bahru, Malaysia.
[5] Pinaki Ghosh, Dr. Mahesh T R, "A Privacy Preserving Mutual Authentication Protocol for RFID based Automated Toll Collection System", November 2016.
[6] A. A. Pandit, Jyot Talreja, Ankit Kumar Mundra, "RFID Tracking System for Vehicles (RTSV)", 2009 First International Conference on Computational Intelligence, Communication Systems and Networks.
[7] K. Gowrisubadra, Jeevitha S., Selvarasi N., "A Survey on RFID Based Automatic Toll Gate Management", 2017 4th International Conference on Signal Processing, Communications and Networking (ICSCN 2017), March 16-18, 2017, Chennai, India.
[8] Alfonso Gutierrez, F. Daniel Nicolalde, Atul Ingle, Clive Hohberger, Rodeina Davis, William Hochschild and Raj Veeramani, "High-Frequency RFID Tag Survivability in Harsh Environments: Use of RFID in Transfusion Medicine", 2013 IEEE International Conference on RFID.
[9] Rudy Hermawan Karsaman, Yudo Adi Nugraha, Sri Hendarto, Febri Zukhruf, "A Comparative Study on Three Electronics Toll Collection Systems in Surabaya", 2015 International Conference on Information Technology Systems and Innovation (ICITSI), Bandung-Bali, November 16-19, 2015, ISBN: 978-1-4673-6664-9.
[10] Janani Krishnamurthy, Nitin Mohan, Rajeshwari Hegde, "Automation of Toll Gate and Vehicle Tracking", International Conference on Computer Science and Information Technology, 2008.
[11] Shoaib Rehman Soomro, Mohammad Arslan Javed, Fahad Ahmed Memon, "Vehicle Number Recognition System for Automatic Toll Tax Collection", 7 December 2012.
[12] Jin Yeong Tan, Pin Jern Ker, Dineis Mani and Puvanesan Arumugam, "Development of a GPS-based Highway Toll Collection System", 2016 6th IEEE International Conference on Control System, Computing and Engineering, 25-27 November 2016, Penang, Malaysia.
[13] G. Srivatsa Vardhan, Naveen Sivadasan, Ashudeb Dutta, "QR-Code based Chipless RFID System for Unique Identification", 2016 IEEE International Conference on RFID Technology and Applications (RFID-TA).
[14] P. Vijayalakshmi, M. Sumathi, "Design of Algorithm for Vehicle Identification by Number Plate Recognition", IEEE Fourth International Conference on Advanced Computing, ICoAC 2012, MIT, Anna University, Chennai, December 13-15, 2012.
[15] Zhu Zhi-yuan, Ren He, Tan Jie, "A Method for Optimizing the Position of Passive UHF RFID Tags", IEEE International Conference on RFID-Technology and Applications, 17-19 June 2010, Guangzhou, China.
Ritik Jain, Varshun Tiku, Rinisha Bhaykar, Rishi Ahuja, Prof. S. P. Pingat
Department of Computer Engineering, Smt Kashibai Navale College of Engineering, Vadgaon (Bk), Pune, India.
ABSTRACT
There are various surveillance systems available, such as cameras, CCTV, etc. In these types of surveillance systems, only a person who is stationary and located in that particular area can view what is happening in that place. We propose a system for real-time live streaming and monitoring using a Raspberry Pi with Wi-Fi connectivity, with which movements can be monitored in 360 degrees, accomplished with the help of motors. We also detect gas leakage. By using video cameras and analyzing in real time the information returned by the robot, the computation effort, cost and resource requirements are significantly decreased.
1. INTRODUCTION
Traditionally, [1] surveillance systems are installed in security-critical areas. These systems generally consist of high-quality cameras, multiple computers for monitoring, servers for storing the videos, and many security personnel for watching these videos. Taken as a whole, such systems are highly complex to install as well as to maintain. CCTV camera feeds are only visible at certain locations, and they have a limited range within which they can be viewed. Above all, the cost of implementing these systems is so high that they cannot be installed in every household.
The Raspberry Pi is a credit-card-sized computer; it functions almost like a full computer. In existing surveillance systems such as camera and CCTV setups, only a person who is stationary and located in that particular area can view what is happening in that place, whereas here the feed can be viewed even if a person is moving from place to place. The main advantages of this system are that it can be used for security purposes and that it offers privacy on both sides, since the feed is viewed only by an authorized person. The Raspberry Pi is a simple circuit, and the operating system used is Raspbian OS. Gas leakage is one of the most frequently monitored parameters, and is extremely harmful, so the proposed system is capable of monitoring this value continuously without any delay. Our proposed system is implemented on a Raspberry Pi interfaced with a gas sensor for controlling the device, and live video streaming is implemented for quick action. Mobile video surveillance has been envisioned in the literature as classical video streaming extended over wired and wireless networks under the control of a human operator. Remote monitoring is becoming an important network-based maintenance method. There are two units, a Raspberry Pi unit and a process unit, with a wireless link between them. The sensor unit sends its readings to the Raspberry Pi unit, which uploads them to the server. The Pi camera is connected to the Raspberry Pi CSI camera port.

2. MOTIVATION
A robot is generally an electro-mechanical machine that can perform tasks automatically. Security is one of the
4. GAP ANALYSIS
Table 1: Gap Analysis
5.1 ARCHITECTURE
Fig 1: Architecture
Fig 2: Flow Chart
5.2 MATHEMATICAL MODEL
The mathematical model for this system is as follows:
Input = {in1, in2, in3, in4}
Forward = {in1=1, in2=0, in3=1, in4=0}
Backward = {in1=0, in2=1, in3=0, in4=1}
Right = {in1=1, in2=0, in3=0, in4=0}
Left = {in1=0, in2=0, in3=1, in4=0}
Stop = {in1=0, in2=0, in3=0, in4=0}
where in1 & in2 denote the inputs of the left motor, and in3 & in4 denote the inputs of the right motor.

5.3 ALGORITHM
1. Result = get data from the Firebase database
2. If Result is equal to 'F', move robot FORWARD
3. If Result is equal to 'B', move robot BACKWARD
4. If Result is equal to 'R', move robot RIGHT

6. CONCLUSION
The smart supervisor system we have built is a surveillance and real-time video streaming system in which authentication is required for access. The smart supervisor system displays the gas sensor value; this message is based on the response received from the smart supervisor system server and the smartphone. Whenever a gas leakage is detected, a mail is sent to the registered mobile number. If the correct IP address is provided, the app proceeds to display the various device operations and video streaming operations. According to the instructions provided by the app on our Android mobile, we can operate the movement of the robot, which can move in the forward, backward, left and right directions. The command used for live streaming is as follows:
raspivid -o - -t 0 -vf -hf -fps 10 -b 500000 | ffmpeg -re -ar 44100 -ac 2 -acodec pcm_s16le -f s16le -ac 2 -i /dev/zero -f h264 -i - -vcodec copy -acodec aac -ab 128k -g 50 -strict experimental -f flv rtmp://a.rtmp.youtube.com/live2/j1s8-d349-9536-8d6r
Surveillance systems are available with various features, and selection is based on factors such as cost and video quality. The proposed system is cost effective as well as user friendly. It has applications in different fields like the military, defence, homes, offices and environment monitoring. The system can be enhanced with face detection and recognition to follow a particular person, such as children below 4 years, so that they are continuously in front of our eyes.

7. FUTURE SCOPE
1. Major improvements in the system's processor speed are needed in order to process large files (e.g. video) for effective motion detection and tracking.
2. The designed security system can be used in homes to monitor the facility at any given time.
3. The system needs to be remotely controlled; future explorations should focus on this.

REFERENCES
[1] R, H., & Safwat Hussain, M. H. (2018). Surveillance Robot Using Raspberry Pi and IoT. 2018 International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C). doi:10.1109/icdi3c.2018.00018
[2] Oza, N., & Gohil, N. B. (2016). Implementation of cloud based live streaming for surveillance. 2016 International Conference on Communication and Signal Processing (ICCSP). doi:10.1109/iccsp.2016.7754297
[3] Nadvornik, J., & Smutny, P. (2014). Remote control robot using Android mobile device. Proceedings of the 2014 15th International Carpathian Control Conference (ICCC). doi:10.1109/carpathiancc.2014.6843630
[4] Bokade, A. U., & Ratnaparkhe, V. R. (2016). Video surveillance robot control using smartphone and Raspberry Pi. 2016 International Conference on Communication and Signal Processing (ICCSP). doi:10.1109/iccsp.2016.7754547
[5] Aneiba, A., & Hormos, K. (2014). A Model for Remote Controlled Mobile Robotic over Wi-Fi Network Using Arduino Technology. International Conference on Frontiers of Communications, Networks and Applications (ICFCNA 2014 - Malaysia). doi:10.1049/cp.2014.1429
[6] Abdalla, G. O. E., & Veeramanikandasamy, T. (2017). Implementation of spy robot for a surveillance system using Internet protocol of Raspberry Pi. 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT). doi:10.1109/rteict.2017.8256563
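Stepping back to Sections 5.2 and 5.3, the direction encoding and command dispatch can be sketched in Python. This is a sketch only: the tuple layout and the name `command_to_pins` are our own, the Firebase fetch is stubbed out, and the actual GPIO writes to the motor driver are omitted.

```python
# Direction encodings from Section 5.2: (in1, in2) drive the left motor,
# (in3, in4) drive the right motor.
DIRECTIONS = {
    "F": (1, 0, 1, 0),  # forward: both motors forward
    "B": (0, 1, 0, 1),  # backward: both motors reversed
    "R": (1, 0, 0, 0),  # right: only the left motor runs
    "L": (0, 0, 1, 0),  # left: only the right motor runs
    "S": (0, 0, 0, 0),  # stop
}

def command_to_pins(result: str):
    """Map a command fetched from the Firebase database (Section 5.3) to pin states."""
    return DIRECTIONS.get(result, DIRECTIONS["S"])  # unknown commands stop the robot
```

Defaulting unknown commands to the stop state is a safety choice: a dropped or garbled Firebase value halts the robot rather than continuing the last motion.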
waste management system, i.e. the dustbin. The main focus of our project is to create an automatic waste management system across the whole city, monitored efficiently by a single system, and to separate the metal in the garbage at its origin, reducing the separation of metals and garbage at the dumping place. It will also help to reduce the cost of separating metals from garbage. This can prove to be a new revolution in smart city implementation.

2. MOTIVATION
Malodorous rotten wastes that remain untreated for a long time, due to the negligence of authorities and the carelessness of the public, may lead to long-term problems. Breeding of insects and mosquitoes can cause dreadful diseases. The garbage also contains various metals that can be recycled; these are currently separated from the garbage at the dumping place, where the cost of separation is high. Garbage contains many types of metal items, like tin cans and metal containers, and this increases the cost of separating metal from garbage at the dumping place.

3. LITERATURE SURVEY
[1] Dharna Kaushik and Sumit Yadav in "Multipurpose Street-Smart Garbage bin based on IoT" proposed a system in which multiple smart garbage trash bins on a microcontroller board platform (Arduino board) are located throughout a city, campus or hospital. The Arduino board is interfaced with a GSM modem and an ultrasonic sensor. Once the threshold level is crossed, the ultrasonic sensor triggers the GSM module, which in turn continuously alerts the authorized person by sending SMS reminders until the dustbin is cleaned. Besides this, a central system keeps showing the current status of the garbage on a mobile web browser as an HTML page via Wi-Fi, and the shortest path for garbage collection vehicles is computed using Dijkstra's algorithm. This is real-time waste management using smart trash bins that can be accessed anytime, anywhere by the concerned person.
[2] Bikramjit Singh and Manpreet Kaur in "Smart Dustbins for Smart Cities" argued that the garbage collection system has to be smarter; in addition, people need easy accessibility to the garbage disposal points, and the garbage collection process has to be efficient in terms of time and fuel cost. The paper covers a GPS- and internet-enabled smart dustbin, garbage collection and disposal, garbage collection scheduling, and finding the nearest dustbin.
[3] Ahmed Omara, Damla Gulen, Burak Kantarci and Sema F. Oktug in "Trajectory-Assisted Municipal Agent Mobility: A Sensor-Driven Smart Waste Management System" proposed a WSN-driven system for smart waste management in urban areas. In the proposed framework, the waste bins are equipped with sensors that continuously monitor the waste level and trigger alarms that are wirelessly communicated to a cloud platform to actuate the municipal agents, i.e., waste collection trucks. They formulate an Integer Linear Programming (ILP) model to find the best set of truck trajectories with the objectives of minimum cost or minimum delay. In order for the trajectory assistance to work in real time, they propose three heuristics, one of which is greedy. Through simulations, they have shown that the ILP formulation can provide a baseline reference for the heuristics, whereas the non-greedy heuristics can significantly outperform the greedy approach in cost and delay under moderate waste accumulation scenarios.
[4] Minthu Ram Chiary, Sripathi SaiCharan, Abdul Rashath R. and Dhikhi T. in "DUSTBIN MANAGEMENT SYSTEM USING IOT" proposed a system in which the smart dustbins are connected to the internet to obtain their real-time information. In recent years there has been rapid population growth, which leads to more waste disposal, so a proper waste management system is necessary to avoid the spread of diseases, by managing the smart bins, monitoring their status, and taking decisions accordingly. There are multiple dustbins located in the city or campus (educational institutions, companies, hospitals, etc.). These dustbins are interfaced with a microcontroller-based system with ultrasonic sensors and Wi-Fi modules. The ultrasonic sensor detects the level of waste in the dustbin and sends a signal to the microcontroller; the signal is encoded, sent through Wi-Fi (ESP8266), and received by the end user. The data is sent to the user through e-mail, i.e., a mail is sent as notification that the dustbin is full, so that the municipality van can come and empty it.
[5] N. Sathish Kumar, B. Vuayalakshmi et al., in "IOT based smart garbage alert system using Arduino UNO" proposed a smart alert system for garbage clearance that gives an alert signal to the municipal web server for instant cleaning of the dustbin, with proper verification based on the level of garbage filling. This process is aided by an ultrasonic sensor interfaced with an Arduino UNO, which checks the level of garbage in the dustbin and sends the alert to the municipal web server once the dustbin is filled. After cleaning the dustbin, the driver confirms the task of emptying the garbage with the aid of an RFID tag. RFID is a computing technology used for the verification process; in addition, it enhances the smart garbage alert system by providing automatic identification of the garbage filled in the dustbin and sending the clean-up status to the server, affirming that the work is done. The whole process is upheld by an embedded module integrating RFID and IoT facilitation. An Android application is developed and linked to a web server to communicate the alerts from the microcontroller to the urban office and to perform remote monitoring of the cleaning process done by the workers, thereby reducing the manual process of monitoring and verification. The notifications are sent to the Android application using the Wi-Fi module.

4. GAP ANALYSIS
Table: Gap Analysis

System: Multipurpose Street-Smart Garbage bin based on IOT
  Benefits: Continuously alerts the authorized person by sending SMS reminders.
  Limitations: Access to the status is via a web browser as an HTML page; there is no application.

System: Smart Dustbins for Smart Cities
  Benefits: Provides the location of the nearest dustbin for disposing garbage.
  Limitations: Garbage collection scheduling is done only when many of the dustbins are full.

System: Dustbin Management System Using IOT
  Benefits: Microcontroller-based system with ultrasonic sensors and Wi-Fi modules.
  Limitations: The status of the dustbin is sent to the user only through e-mail.

System: Trajectory-Assisted Municipal Agent Mobility: A Sensor-Driven Smart Waste Management System
  Benefits: Waste collection trucks follow an Integer Linear Programming (ILP) model that finds the best set of trajectories with the objectives of minimum cost or minimum delay.
  Limitations: It has no metal detector to detect metal.

5. PROPOSED WORK
A. System Architecture
where its voltage can be read by Arduino analog pin A5.

D. Mathematical Model
The server collects the fill-up status and location of the dustbins. It processes the client's query and responds with the nearest dustbin location and directions to access the dustbin.
C - current fill-up status
T - time duration between generation of the ultrasonic wave and its reception by the receiver
S - the speed of sound
(L denotes the total depth of the dustbin, with the sensor mounted at the top.)
We calculate the value of C using the formula given below:
C = L - (S*T)/2
Similarly, the percentage of fill-up is calculated using the formula given below:
P = (C/L) * 100
where P is the % fill-up. Here we assume the wave path is almost vertical.

6. CONCLUSION AND FUTURE WORK
This project was developed with the intention of making cities smarter; however, there is a lot of scope to improve the performance of the proposed system in the areas of user interface, new features, and query processing time. The future enhancements possible in the project are as follows: if the system is sponsored, additional sensors can be added for wet and dry waste segregation.

REFERENCES
[1] Dharna Kaushik, Computer Science and Engineering, Indira Gandhi Delhi Technical University for Women, Delhi, India, and Sumit Yadav, Computer Science and Engineering, Indira Gandhi Technical University for Women, Delhi, India, "Multipurpose Street-Smart Garbage Bin based on IoT", Volume 8, No. 3, March-April 2017.
[2] Bikramjit Singh, Manpreet Kaur, "Smart Dustbins for Smart Cities", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 7 (2), 2016, pp. 610-611.
[3] Ahmed Omara, Damla Gulen, Burak Kantarci and Sema F. Oktug, "Trajectory-Assisted Municipal Agent Mobility: A Sensor-Driven Smart Waste Management System", published 21 July 2018.
[4] Minthu Ram Chiary, Sripathi SaiCharan, Abdul Rashath. R, Dhikhi. T, Computer Science and Engineering, Saveetha School of Engineering, Saveetha University, "Dustbin Management System Using IoT", Volume 115, No. 8, 2017, pp. 463-468, ISSN: 1311-8080.
[5] N. Sathish Kumar, B. Vuayalakshmi, R. Jenifer Prarthana, A. Shankar, Sri Ramakrishna Engineering College, Coimbatore, TamilNadu, India, "IOT based smart garbage alert system using Arduino UNO", IEEE 978-1-5090-2597-8.
[6] Narayan Sharma, Nirman Singha, Tanmoy Dutta, "Smart Bin Implementation for Smart Cities", International Journal of Scientific & Engineering Research, Volume 6, Issue 9, September 2015, ISSN 2229-5518.
[7] "Smart Cities", available at www.smartcities.gov.in/
[8] "GSM Module Interface", https://circuits4you.com/2016/06/15/gsm-modem-interfacing-arduino/
[9] "GSM", https://www.arduino.cc/en/Guide/ArduinoGSMShield
[10] "GSM Module", http://www.circuitstoday.com/interface-gsm-module-with-arduino
[11] "Arduino", https://www.arduino.cc/
[12] "Android", https://developer.android.com/studio; "GSM Module", www.electronicwings.com/arduino/sim900a-gsm-module-interfacingwith-arduino-uno.
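The fill-level formulas from the Mathematical Model section above, C = L - (S*T)/2 and P = (C/L)*100, can be checked numerically with a short sketch. Note that for an ultrasonic sensor S is the speed of sound; the bin depth and timing values below are illustrative assumptions, not values from the paper.

```python
S = 343.0  # assumed speed of sound in air (m/s) at room temperature

def fill_level(L, T, speed=S):
    """Return (C, P): filled depth in metres and fill percentage.

    L: total bin depth in metres (ultrasonic sensor mounted at the top)
    T: round-trip time of the ultrasonic pulse in seconds
    """
    distance_to_surface = (speed * T) / 2.0  # one-way sensor-to-garbage distance
    C = L - distance_to_surface              # filled depth: C = L - (S*T)/2
    P = (C / L) * 100.0                      # fill percentage: P = (C/L) * 100
    return C, P
```

For example, in a 1 m deep bin an echo returning after about 3.5 ms corresponds to a garbage surface 0.6 m below the sensor, i.e. roughly 40% full.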
Also, there are many different architectures for dialog systems. Which sets of components are included in a dialog system, and how those components divide up responsibilities, differs from system to system. A dialogue system has mainly seven components: Input Decoder, Natural Language Understanding, Dialogue Manager, Domain Specific Component, Response Generator, and Output Renderer. However, there are six main components in the general dialogue system, which include Speech Recognition (ASR), Spoken Language Understanding (SLU), the Dialog Manager (DM), Natural Language Generation (NLG), Text to Speech Synthesis (TTS), and the knowledge base. The following is the structure of the general dialogue system.

and body data sets for the gesture model, speech recognition knowledge bases, a dictionary and spoken dialog knowledge base for the ASR model, video and image body data sets for the Graph Model, and some of the user's information and the setting system.

B. Graph Model
The Graph Model analyzes video and images in real time: it extracts frames from the video collected by the camera and the input model, then sends those frames and images to the Graph Model and applications in cloud servers for analysis and returns the result.

1.2 Comparison on features of popular VPA in the market
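As a rough illustration of how the six components listed above hand off to one another, here is a minimal pipeline sketch. All component functions are hypothetical stand-ins, not a real dialog framework: a production system would plug in an actual ASR engine, NLU model, and so on.

```python
# Toy wiring of the six classic dialog-system components (ASR -> SLU -> DM
# -> NLG -> TTS, with the DM consulting a knowledge base). Every stage here
# is a placeholder to show the data flow only.

def asr(audio: bytes) -> str:              # Speech Recognition
    return "what time is it"               # pretend transcription

def slu(text: str) -> dict:                # Spoken Language Understanding
    return {"intent": "ask_time", "slots": {}}

def dialog_manager(frame: dict, kb: dict) -> dict:  # Dialog Manager + KB lookup
    return {"act": "inform", "value": kb.get(frame["intent"], "unknown")}

def nlg(act: dict) -> str:                 # Natural Language Generation
    return f"It is {act['value']}."

def tts(text: str) -> bytes:               # Text-to-Speech Synthesis
    return text.encode("utf-8")            # stand-in for synthesized audio

def run_turn(audio: bytes, kb: dict) -> bytes:
    """One user turn: audio in, synthesized reply out."""
    text = asr(audio)
    frame = slu(text)
    act = dialog_manager(frame, kb)
    return tts(nlg(act))
```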
Launched in 2012, Google Now is an intelligent personal assistant made by Google. It was first included in Android 4.1, which launched on July 9, 2012, and was first supported on the Google Nexus smart-phone. Found within the Google search option, Google Now can be used in numerous helpful ways. Yes, it can set reminders or answer basic questions like the weather of the day or the names of the movies that won Oscars last year. But more than that, Google Now is a virtual assistant that shows relevant and timely information once it learns more about you and how you use the phone. Google Now also displays different sections called Now cards that pull information from your Gmail account and put it on the screen. For example, if you have just bought a red bag from Amazon, the card shows you your recent purchase. Similarly, it also has a weather card where you can check the weather, and a sports card where you can follow any match that is on.

Amazon Alexa:
Amazon Alexa, known simply as Alexa, is a virtual assistant developed by Amazon, first used in the Amazon Echo and the Amazon Echo Dot smart speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such as news. Alexa can also control several smart devices, acting as a home automation system. Users are able to extend Alexa's capabilities by installing "skills" (additional functionality developed by third-party vendors, in other settings more commonly called apps, such as weather programs and audio features).

Cortana:
Cortana is the name of the interactive personal assistant built into Windows 10. You can give her instructions and talk with her by using your voice or by typing. Cortana, named after her fictional counterpart in the video game series Halo, takes notes, dictates messages and offers up calendar alerts and reminders. But her real standout characteristic, and the one Microsoft is betting heavily on, is the ability to strike up casual conversations with users; what Microsoft calls "chitchat".

4. CONCLUSION
In this paper we have seen the working of a personal virtual assistant using Natural Language Processing and the Internet of Things, and also seen the implementation of an intrusion detection system with the help of a passive infrared (PIR) sensor for detecting motion.

REFERENCES
[1] S. Arora, K. Batra, and S. Singh, "Dialogue System: A Brief Review", Punjab Technical University.
[2] Ding, W. and Marchionini, G. 1997. "A Study on Video Browsing Strategies". Technical Report. University of Maryland at College Park.
[3] R. Mead. 2017. "Semio: Developing a Cloud-based Platform for Multimodal Conversational AI in Social Robotics". 2017 IEEE International Conference on Consumer Electronics (ICCE).
[4] R. Pieraccini, K. Dayanidhi, J. Bloom, J. Dahan, M. Phillips. 2003. "A Multimodal Conversational Interface for a Concept Vehicle". Eurospeech 2003.
[5] G. Bohouta and V. Z. Këpuska. 2017. "Comparing Speech Recognition Systems (Microsoft API, Google API and CMU Sphinx)". Int. Journal of Engineering Research.
[6] M. McTear. 2016. The Dawn of the Conversational Interface. Springer International Publishing Switzerland, 2016.
[7] Amazon. "Amazon Lex is a service for building conversational interfaces". https://aws.amazon.com.
[8] B. Marr. "The Amazing Ways Google Uses Deep Learning AI". https://www.forbes.com
[9] K. Wagner. "Facebook's Virtual Assistant 'M' Is Super Smart. It's Also Probably a Human". https://www.recode.com.
5. SYSTEM ARCHITECTURE

android and Wi-Fi", International Journal of Engineering and Computer Science, 2014.
[5] B. R. Pavithra, D., "IoT based monitoring and control system for home automation," April 2015.
[6] B. S. S. Tharaniya Soundhari, M., "Intelligent interface-based speech recognition for home automation using android application," March 2015.
[7] R. A. Ramlee, M. A. Othman, M. H. Leong, M. M. Ismail and S. S. S. Ranjit, "Smart home system using android application", International Conference of Information and Communication Technology, 2013.
costly sensors, the economic situation calls for using available video cameras in an efficient way for effective traffic congestion estimation. Researchers may focus on one or more of these tasks, and they may also choose different measures for traffic structure or add measures.

For a more comprehensive review on vision-based traffic light control: due to the massive growth in urbanization and traffic congestion, an intelligent vision-based traffic light controller is needed to reduce traffic delay and travel time, especially in developing countries, as the current automatic time-based control is not realistic, while sensor-based traffic light controllers are not reliable in developing countries. Traffic congestion is now considered one of the biggest problems in urban environments. Traffic problems will also increase much more widely as an expected result of the growing number of means of transportation and the current low-quality road infrastructure. In addition, many studies and statistics generated in developing countries have shown that most road accidents happen because of very narrow roads and because of the destructive increase in the means of transportation.

A Raspberry Pi microcomputer and multiple ultrasonic sensors are used in each lane to calculate the density of traffic and operate the lane based on that calculation. This idea of controlling the traffic light efficiently in real time has also attracted many researchers to work in this field, with the goal of creating an automatic tool that can estimate the traffic congestion; based on this variable, the traffic signal time interval is forecast.

2. WORKING
In this proposed system, the supply is given to the step-down transformer. The output of the transformer is connected to the input of the full-wave bridge rectifier. The output of the bridge rectifier is given to the regulator. The output of the regulator gives a +5 V positive supply, which powers all the electronic components of the system. The Raspberry Pi uses this information to set the signal timer according to the level of traffic.

3. BLOCK DIAGRAM

Fig. 1 Block Diagram

A 16x2 alphanumeric LCD display is used, which shows real-time information about the traffic signal. Four sensors are used here; when any sensor is triggered, its signal goes to the Raspberry Pi, the Raspberry Pi output goes to the relay driver, the relay switches on, the LED turns on, and the LCD displays the time.

4. SYSTEM DESIGN
The figure shows the overall design of the system. In this intersection, each outgoing lane has four photoelectric sensors that calculate and report the traffic conditions of each lane to the Raspberry Pi. The Raspberry Pi uses this information to set the signal timer according to the level of traffic.
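The density-to-timer mapping described in WORKING and SYSTEM DESIGN can be sketched as pure logic: occupied sensor counts in, green-light duration out. The thresholds, durations, and the densest-lane-first policy below are invented for illustration and are not taken from the paper.

```python
# Map the number of occupied sensors in a lane (0-4, matching the
# four-sensor-per-lane design above) to a green-signal duration in seconds.

def green_time(occupied_sensors: int) -> int:
    """More occupied sensors => denser traffic => longer green phase."""
    if not 0 <= occupied_sensors <= 4:
        raise ValueError("expected a count from 4 sensors")
    durations = {0: 10, 1: 15, 2: 20, 3: 30, 4: 45}  # illustrative values
    return durations[occupied_sensors]

def schedule(lanes):
    """Order lanes for service, densest first (a simple priority policy).

    `lanes` maps a lane name to its occupied-sensor count.
    """
    return sorted(lanes.items(), key=lambda kv: kv[1], reverse=True)
```

On real hardware the sensor counts would come from GPIO reads on the Raspberry Pi; here they are plain integers so the timing logic can be tested in isolation.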
6. ASSEMBLY
The methods used to assemble all the components are discussed in this section. Table I shows the number of I/O pins used in the design and how they are distributed among the components. It also shows how the number of I/O pins was reduced to increase the efficiency of the system.

Fig: Flowchart of the system.

7. FUTURE WORK
More sensors can be used in each lane to make the system more accurate and sensitive to small changes in traffic density. Driverless cars can access the website to view the intensity of traffic at an intersection and choose the fastest route accordingly. Data mining techniques such as classification can be applied to traffic data collected over the long term to study the patterns of traffic in each lane at different times of the day. Using this information, different timing algorithms can be applied at different points of the day according to the traffic pattern.

8. CONCLUSION
Nowadays, traffic congestion is a major problem in big cities, since the traffic signal lights are programmed for fixed time intervals. However, sometimes the demand for a longer green light arises on one side of the junction due to high traffic density. Thus, the traffic signal light system is enhanced to generate traffic-light signals based on the traffic on the roads at that particular instant. Advanced technologies and sensors have given us the capability to build smart and intelligent embedded systems that solve human problems and improve lifestyles. Our system is capable of estimating traffic density using IR sensors placed on either side of the roads. Based on this, the time delay for the green light can be increased and unnecessary waiting time can be reduced. The whole system is controlled by the Raspberry Pi. The designed system has been implemented and tested to ensure its performance and other design factors.

REFERENCES
[1] R. Dhakad and M. Jain, "GPS based road traffic congestion reporting system," 2014 IEEE International Conference on Computational Intelligence and Computing Research, Coimbatore, 2014, pp. 1-6. doi: 10.1109/ICCIC.2014.7238547.
[2] Q. Xinyun and X. Xiao, "The design and simulation of traffic monitoring system based on RFID," The 26th Chinese Control and Decision Conference (2014 CCDC), Changsha, 2014, pp. 4319-4322. doi: 10.1109/CCDC.2014.6852939.
[3] M. F. Rachmadi et al., "Adaptive traffic signal control system using camera sensor and embedded system," TENCON 2011 - 2011 IEEE Region 10 Conference, Bali, 2011, pp. 1261-1265. doi: 10.1109/TENCON.2011.6129009.
[4] X. Jiang and D. H. C. Du, "BUS-VANET: A BUS Vehicular Network Integrated with Traffic Infrastructure," IEEE Intelligent Transportation Systems Magazine, vol. 7, no. 2, pp. 47-57, Summer 2015. doi: 10.1109/MITS.2015.2408137.
[5] I. Septiana, Y. Setiowati and A. Fariza, "Road condition monitoring application based on social media with text mining system: Case Study:
2. MOTIVATION
This research work is helpful for improving smart city applications. The authorities can be alerted to take preventive actions, and preventive actions can save money.
5. PROPOSED SYSTEM
The proposed system consists of entities such as an ultrasonic sensor and a microcontroller for pothole detection. We are going to develop an effective road surface monitoring system for automated pothole detection. This is a low-cost solution for road safety.

Fig 2. System Architecture

6. IMPLEMENTATION MODULE
6.1 Mobile Application Module:
The user can receive pothole notifications from the system for a safe journey.
6.2 Server Module:
7. METHODOLOGY
We implement this system to avoid obstacles on our route for a safe journey and to keep the vehicle in proper condition. In this paper we use the following algorithm for implementing the detection system.
Algorithm details:
Input: Sensor Value
Output: The system output is positive (one) when the proposed pothole detection system encounters a pothole during the car journey. The following code shows how operations are performed within the system and the sequence in which they are performed.

9. ACKNOWLEDGMENT
We express our sincere thanks to our project guide Prof. Lagad J. U., whose constant presence and constructive criticism helped us write this paper. We would also like to thank all the staff of the Computer Department for their valuable guidance, suggestions, support, and personal attention throughout the project work. Above all, we express our deepest gratitude to all of them for their kind-hearted support, which helped us a lot during the project work. Finally, we thank our friends and colleagues for the inspirational help they provided throughout the project work.

REFERENCES
[1] S. S. Rode, S. Vijay, P. Goyal, P. Kulkarni, and K. Arya, "detection and warning system: Infrastructure support and system design," in Proc. Int. Conf. Electron. Comput. Technol., Feb. 2009, pp. 286-290.
[2] R. Sundar, S. Hebbar, and V. Golla, "Intelligent traffic control system for congestion control, ambulance clearance, and stolen vehicle detection," IEEE Sensors J., vol. 15, no. 2, pp. 1109-1113, Feb. 2015.
[3] Samyak Kathane, Vaibhav Kambli, Tanil Patel and Rohan Kapadia, "Time Potholes Detection and Vehicle Accident Detection and Reporting System and Anti-theft (Wireless)", IJETT, Vol. 21, No. 4, March 2015.
[4] J. Lin and Y. Liu, "Potholes detection based on SVM in the pavement distress image," in Proc. 9th Int. Symp. Distrib. Comput. Appl. Bus. Eng. Sci., Aug. 2010, pp. 544-547.
[5] I. Moazzam, K. Kamal, S. Mathavan, S. Usman, and M. Rahman, "Metrology and visualization of potholes using the Microsoft Kinect sensor," in Proc. 16th Int. IEEE Conf. Intell. Transp. Syst., Oct. 2013, pp. 1284-1291.
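The code that Section 7's "Algorithm details" refers to does not survive in the extracted text. The following is a minimal, hypothetical sketch of the threshold test it describes: an ultrasonic sensor pointed at the road reads a distance noticeably larger than the nominal mounting height when it passes over a pothole. The nominal height, threshold, and readings below are assumptions, not values from the paper.

```python
# A pothole shows up as a distance reading noticeably larger than the
# sensor's nominal height above flat asphalt. All numbers are illustrative.

NOMINAL_HEIGHT_CM = 30.0   # assumed sensor-to-road distance on flat road
THRESHOLD_CM = 5.0         # assumed extra depth that counts as a pothole

def detect(sensor_value_cm: float) -> int:
    """Return 1 (positive) when a pothole is detected, else 0."""
    return 1 if sensor_value_cm - NOMINAL_HEIGHT_CM > THRESHOLD_CM else 0

def scan(readings_cm):
    """Run the detector over a sequence of readings, as during a car journey."""
    return [detect(r) for r in readings_cm]
```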
shoot apex correlated with phenological events, and the response to soil water availability for winter and spring wheat (Triticum aestivum L.), winter and spring barley (Hordeum vulgare L.), corn (Zea mays L.), sorghum (Sorghum bicolor L.), proso millet (Panicum milaceum L.), hay/foxtail millet [Setaria italica (L.) P. Beauv.], and sunflower (Helianthus annuus L.) was created based on experimental data and the literature. Model evaluation consisted of testing algorithms using "generic" default phenology parameters for wheat (i.e., no calibration for specific cultivars was used) for a variety of field experiments to predict developmental events. Results demonstrated that the program has general applicability for predicting crop phenology and can aid in crop management.

3. SYSTEM ARCHITECTURE
The preparation of soil is the first step before growing a crop. One of the most vital tasks in agriculture is to penetrate deep into the soil and loosen it. The loosened soil allows the roots to breathe easily even when they go deep into the soil.

1.1. Title and Author
IOT Based Agricultural Soil Prediction for Crops With Precautions.
the cloud servers. The privacy-preserving approaches ensure confidentiality, integrity, authenticity, accountability, and audit trail. Confidentiality ensures that the health information is entirely concealed from unsanctioned parties, whereas integrity deals with maintaining the originality of the data, whether in transit or in cloud storage. Authenticity guarantees that the health data is accessed by authorized entities only, whereas accountability refers to the fact that the data access policies must comply with the agreed-upon procedures.

2. RELATED WORK
A Review on the State-of-the-Art Privacy Preserving Approaches in the e-Health Clouds [16]
This paper aimed to encompass the state-of-the-art privacy-preserving approaches employed in the e-Health clouds. Automated PHRs are exposed to possible abuse and require security measures based on identity management, access control, policy integration, and compliance management. The privacy-preserving approaches are classified into cryptographic and non-cryptographic approaches, and a taxonomy of the approaches is also presented. Furthermore, the strengths and weaknesses of the presented approaches are reported and some open issues are highlighted. The cryptographic approaches reduce the privacy risks by utilizing certain encryption schemes and cryptographic primitives. This includes public-key encryption, symmetric-key encryption, and alternative primitives such as attribute-based encryption, identity-based encryption, and proxy re-encryption.

A General Framework for Secure Sharing of Personal Health Records in Cloud System [17]
In this paper, the authors provided an affirmative answer to the problem of sharing by presenting a general framework for secure sharing of PHRs. This system enables patients to securely store and share their PHR in the cloud server (for example, with their care-givers), and furthermore the treating doctors can refer the patients' medical records to specialists for research purposes, whenever required, while ensuring that the patients' information remains private. This system also supports cross-domain operations (e.g., with different countries' regulations).

Electronic Personal Health Record Systems: A Brief Review of Privacy, Security, and Architectural Issues [18]
This paper addressed design and architectural issues of PHR systems, and focused on privacy and security issues which must be addressed carefully if PHRs are to become generally acceptable to consumers. In conclusion, the general indications are that there are significant benefits to PHR use, although there are architecturally specific risks to their adoption that must be considered. Some of these relate directly to consumer concerns about security and privacy, and the authors have attempted to discuss these in the context of several different PHR system architectures that have been proposed or are in trial. In Germany, the choice of the standalone smartcard PHR is close to national implementation. In the United States, implementations and/or tests of all the suggested architectures except the standalone smartcard are underway. In the United Kingdom, the National Health Service (NHS) appears to have settled on an integrated architecture for PHRs.

Achieving Secure, Scalable and Fine-grained Data Access Control in Cloud Computing [19]
This paper addressed this challenging open issue by, on one hand, defining and enforcing access policies based on data attributes, and, on the other hand, allowing the data owner to delegate most of the computation tasks involved in fine-grained data access control to untrusted cloud servers without disclosing the underlying data contents. It achieved this goal by exploiting and uniquely combining techniques of attribute-based encryption (ABE), proxy re-encryption, and lazy re-encryption. This scheme also has salient properties of user access privilege confidentiality and user secret key accountability. Extensive analysis shows that this scheme is highly efficient and provably secure under existing security models.

3. PHASES IN IOMT
Phase I: Data Collection, Data Acquisition
Physical devices such as sensors play an important role in enhancing safety and improving the quality of life in the healthcare arena. They have inherent accuracy, intelligence, capability, reliability, small size and low power consumption.

Figure 1: Phases in IOMT [4]

Phase II: Storage
The data collected in Phase I should be stored. Generally, IoT components are installed with low memory and low processing capabilities. The cloud is the best solution, taking over the responsibility for storing the data in the case of stateless devices.
Phase III: Intelligent Processing
The IoT analyzes the data stored in the cloud DCs and provides intelligent services for work and life in hard real time. Analyzing and responding to queries, the IoT also controls things. Intelligent processing involves making data useful through machine learning algorithms.
Phase IV: Data Transmission
Data transmission occurs through all parts, from cloud to user. The user may be a doctor, nurse, pharmacist, or the patient himself.
Phase V: Data Delivery
Delivery of information takes place through a user interface, which may be mobile, desktop, or tablet. Delivered data corresponds to the role of the person requesting it: doctor-related data and pharma-related data will be different.

4. ATTACKS ON PHASES
Phase I: Data Loss
Data loss refers to losing work accidentally due to hardware or software failure and natural disasters. Data can be duplicated by intruders. It must be ensured that perceived data are received from intended sensors only. Data authentication could provide integrity and originality.
Phase II: Denial of Service, Access Control
The main objective of a DoS attack is to overload the target machine with many service requests to prevent it from responding to legitimate requests. Unable to handle all the service requests on its own, it delegates the work load to other similar service instances which ultimately
Technology for Competitive Strategies. Lecture Notes in Networks and Systems, vol 40. Springer, Singapore.
[2] Jin-cui Yang, Bin-xing Fang, "Security model and key technologies for the Internet of Things," The Journal of China Universities of Posts and Telecommunications, Volume 18, Supplement 2, 2011, Pages 109-112, ISSN 1005-8885, https://doi.org/10.1016/S1005-8885(10)60159-8.
[3] Lo-Yao Yeh, Woei-Jiunn Tsaur, and Hsin-Han Huang. 2017. "Secure IoT-Based, Incentive-Aware Emergency Personnel Dispatching Scheme with Weighted Fine-Grained Access Control." ACM Trans. Intell. Syst. Technol. 9, 1, Article 10 (September 2017), 23 pages. DOI: https://doi.org/10.1145/3063716.
[4] Fei Hu, Security and Privacy in Internet of Things (IoT): Models, Algorithms, and Implementations, CRC Press, 2016.
[5] Arjona, R.; Prada-Delgado, M.Á.; Arcenegui, J.; Baturone, I. "A PUF- and Biometric-Based Lightweight Hardware Solution to Increase Security at Sensor Nodes." Sensors 2018, 18, 2429.
[6] S. Venugopalan, "Attribute Based Cryptology," PhD Dissertation, Indian Institute of Technology Madras, April 2011.
[7] Sumitra B, Pethuru CR & Misbahuddin M, "A survey of cloud authentication attacks and solution approaches", International Journal of Innovative Research in Computer and Communication Engineering, Vol. 2, No. 10, (2014), pp. 6245-6253.
[8] Sankar Mukherjee, G.P. Biswas, "Networking for IoT and applications using existing communication technology," Egyptian Informatics Journal, Volume 19, Issue 2, 2018, Pages 107-127, ISSN 1110-8665, https://doi.org/10.1016/j.eij.2017.11.002.
[9] https://www.controlcase.com/services/log-monitoring/
[10] Babar, Sachin & Stango, Antonietta & Prasad, Neeli & Sen, Jaydip & Prasad, Ramjee. (2011). "Proposed Embedded Security Framework for Internet of Things (IoT)." 10.1109/WIRELESSVITAE.2011.5940923.
[11] Weber, Rolf. (2010). "Internet of Things – New security and privacy challenges." Computer Law & Security Review, 26, 23-30. 10.1016/j.clsr.2009.11.008.
[12] K. Zhao and L. Ge, "A Survey on the Internet of Things Security," 2013 Ninth International Conference on Computational Intelligence and Security (CIS), Emeishan, China, 2013, pp. 663-667. doi: 10.1109/CIS.2013.145.
[13] "Security Issues and Challenges for the IoT-based Smart Grid," Procedia Computer Science, ISSN: 1877-0509, Vol. 34, pp. 532-537.
[14] V. Alagar, A. Alsaig, O. Ormandjiva and K. Wan, "Context-Based Security and Privacy for Healthcare IoT," 2018 IEEE International Conference on Smart Internet of Things (SmartIoT), Xi'an, 2018, pp. 122-128. doi: 10.1109/SmartIoT.2018.00-14.
[15] Arbia Riahi Sfar, Enrico Natalizio, Yacine Challal, Zied Chtourou, "A roadmap for security challenges in the Internet of Things," Digital Communications and Networks, Volume 4, Issue 2, 2018, Pages 118-137, ISSN 2352-8648, https://doi.org/10.1016/j.dcan.2017.04.003.
[16] A. Abbas and S. U. Khan, "A Review on the State-of-the-Art Privacy-Preserving Approaches in the e-Health Clouds," IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 4, pp. 1431-1441, July 2014. doi: 10.1109/JBHI.2014.2300846.
[17] M. H. Au, T. H. Yuen, J. K. Liu, W. Susilo, X. Huang, Y. Xiang, and Z. L. Jiang, "A general framework for secure sharing of personal health records in cloud system", Journal of Computer and System Sciences, 2017.
[18] David Daglish and Norm Archer, "Electronic Personal Health Record Systems: A Brief Review of Privacy, Security, and Architectural Issues", IEEE 2009.
[19] S. Yu, C. Wang, K. Ren and W. Lou, "Achieving Secure, Scalable, and Fine-grained Data Access Control in Cloud Computing," 2010 Proceedings IEEE INFOCOM, San Diego, CA, 2010, pp. 1-9. doi: 10.1109/INFCOM.2010.5462174.
5. CONCLUSION
This system is IoT based. The wearable glove is fitted with different sensors — temperature, LDR and a 3-axis accelerometer — together with the Arduino IDE, Arduino Nano and Arduino Uno. A 5 V battery powers the system. In a small-scale industry, the smart glove can connect to any type of machinery and provides access to it based on whether proper safety of that machine is ensured.

6. ACKNOWLEDGEMENT
We take this opportunity to express our hearty thanks to all those who helped us in the completion of the project on Smart Wearable Gadget for Industrial Safety. We would especially like to express our sincere gratitude to our guide Prof. Pauras Bangar and HOD Prof. J. U. Lagad, Department of Computer Engineering, who extended their moral support, inspiring guidance and encouragement of independence throughout this task. We would also like to thank our Principal, Dr. R. S. Deshpande, for his great insight and motivation. Last but not least, we would like to thank our fellow colleagues for their valuable suggestions.

REFERENCES
[1] Chirag Mahaveer Parmar, Projjal Gupta, K Shashank Bhardwaj (Members, IEEE), "Smart Work-Assisting Gear", Next Tech Lab (IoT Division), SRM University, Kattankulathur, 2018.
[2] Aditya C, Siddharth T, Karan K, Priya G, "Meri Awaz - Smart Glove Learning Assistant for Mute Students and Teachers", IJIRCCE, 2017.
[3] Umesh V. Nikam, Harshal D. Misalkar, Anup W. Burange, "Securing MQTT Protocol in IoT by Payload Encryption Technique and Digital Signature", IAESTD, 2018.
[4] Dhawal L. Patel, Harshal S. Tapse, Praful A. Landge, Parmeshwar P. More and Prof. A. P. Bagade, "Smart Hand Gloves for Disable Peoples", IRJET, 2018.
[5] Suman Thakur, Mr. Manish Varma, Mr. Lumesh Sahu, "Security System Using Arduino Microcontroller", IIJET, 2018.
[6] Radhika Munoli, Prof. Sankar Dasiga, "Secured Data Transmission for IoT Application", IJARCCE, 2016.
[7] Ashton K., "That 'Internet of Things' Thing", RFID Journal, 2009.
[8] Vincent C. Conzola, Michael S. Wogalter, "Consumer Product Warnings: Effects of Injury Statistics on Recall and Subjective Evaluation", HFES, 1998.
solar radiation forecasting. Systems focusing on short-term prediction of solar radiation are reviewed, and an alternative method and model is suggested. The method assumes that solar radiation data repeats itself in history. Following this preliminary assumption, a novel Mycielski-based model is proposed. This model treats the recorded hourly solar radiation data as an array and, starting from the last recorded value, searches for the most similar sub-array pattern in the history. This sub-array pattern corresponds to the longest matching solar radiation data array in the history. The data observed after this longest array in history is taken as the forecast. In case numerous sub-arrays are obtained, the model makes its choice according to the probabilistic relations between the sub-patterns' last values and the following value. To model the probabilistic relations of the data, a Markov chain model is adopted and used. In this way the historical search model is strengthened.

Yu Jiang [2] proposed Day-ahead Forecast of Bi-hourly Solar Radiance with a Markov Switch Approach. The system uses a regime-switching process to describe the evolution of the solar radiance time series. The optimal number of regimes and the regime-specific parameters are determined by Bayesian inference. The Markov regime-switching model offers both point and interval forecasts of solar radiance based on the posterior distribution derived from historical data by Bayesian inference. Four solar radiance forecasting models — the persistence model, the autoregressive (AR) model, the Gaussian process regression (GPR) model, and the neural network model — are used as baseline models for validating the Markov switching model. The comparative analysis based on numerical experiment results shows that overall the Markov regime-switching model performs better than the associated models in the day-ahead point and interval prediction of solar radiance.

Ali Chikh and Ambrish Chandra [3] proposed An Optimal Maximum Power Point Tracking Algorithm for PV Systems With Climatic Parameters Estimation. The system suggests a Maximum Power Point Tracking (MPPT) method for photovoltaic (PV) systems with a reduced hardware setup. It works by computing the instantaneous conductance and the incremental conductance of the array. The first is computed from the array voltage and current, whereas the second, which is a function of the array junction current, is estimated by means of an adaptive neuro-fuzzy (ANFIS) solar cell model. Given the difficulty of measuring solar radiation and cell temperature — since those need two extra sensors that would increase the hardware circuitry and measurement noise — an analogical model is proposed to estimate them with a de-noising wavelet algorithm. This method helps to reduce the hardware setup, using only one voltage sensor, while increasing the array power efficiency and the MPPT response time.

4. PROPOSED WORK
4.1 PROJECT SCOPE
The product is an Android application used to manage daily mess attendance along with streamlining rebate and menu selection processes. The objective of the system is to provide a user-friendly daily attendance system that is easy to manage, maintain and query. Our primary focus is to develop a paperless system that provides the management a way to facilitate smoother functioning of the mess system.
7. ACKNOWLEDGMENTS
With due respect and gratitude, we take the opportunity to thank all those who have helped us directly and indirectly. We convey our sincere thanks to Prof. P. N. Mahalle, HoD, Computer Dept., and Prof. D. H. Kulkarni for their help in selecting the project topic and for their support. Our guide, Prof. D. H. Kulkarni, has always encouraged us and given us the motivation to move ahead. He has put a lot of time and effort into this seminar along with us and given us a lot of confidence, and we wish to extend a big thank you to him. We also wish to thank all the other people who have helped, in even the smallest way, in the successful completion of this project.

REFERENCES
[1] Day-ahead Prediction of Bi-hourly Solar Radiance with a Markov Switch Approach, Yu Jiang, Huan Long, Zijun Zhang, and Zhe Song, IEEE Transactions on Sustainable Energy, 2017, DOI 10.1109
[2] An Optimal Maximum Power Point Tracking Algorithm for PV Systems With Climatic Parameters Estimation, Ali Chikh and Ambrish Chandra, IEEE Transactions on Sustainable Energy, 2015, DOI 10.1109
[3] Critical weather situations for renewable energies - Part B: Low stratus risk for solar power, Carmen Köhler, Andrea Steiner, Yves-Marie Saint-Drenan, Dominique Ernst, Anja Bergmann-Dick, Mathias Zirkelbach, Zied Ben Bouallegue, Isabel Metzinger, Bodo Ritter, Elsevier, Renewable Energy (2017), http://dx.doi.org/10.1016/j.renene.2016.09.002
[4] Sentinella: Smart Monitoring of Photovoltaic Systems at Panel Level, Bruno Andò, Salvatore Baglio, Antonio Pistorio, Giuseppe Marco Tina, and Cristina Ventura, 0018-9456 © 2015 IEEE, DOI 10.110
[5] Monitoring system for photovoltaic plants: A review, Siva Ramakrishna Madeti, S. N. Singh, Alternate Hydro Energy Centre, Indian Institute of Technology Roorkee, Uttarakhand 247667, India, Renewable and Sustainable Energy Reviews 67 (2017), pp. 1180-1207, http://dx.doi.org/10.1016/j.rser.2016.09.088
[6] Design and implementation of a solar plant and irrigation system with remote monitoring and remote control infrastructures, Yasin Kabalci, Ersan Kabalci, Ridvan Canbaz, Ayberk Calpbinici, Elsevier, Solar Energy 139 (2016).
[7] Forecasting of solar energy with application for a growing economy like India: Survey and

8. CONCLUSION
The monitoring of the solar PV PCU using the Internet of Things has been experimentally shown to work satisfactorily, with the parameters monitored effectively through the internet. The proposed system not only monitors the parameters of the solar PV PCU, but also processes the data and generates reports according to the requirement, for example plotting the estimated units and computing the total units produced per month. It also stores all the parameters in the cloud in a timely manner.
sensors should be correct. Data collected by the sensors is assumed to be the same in all areas of the field. The arrangement of the whole system is fixed and secured. The warehouse security system will differentiate between rodents and humans based on their size.
The proposed system assumes that users have a good internet connection and that the local system has a power supply. The user should also have the mobile application installed, through which alerts will be provided.

System Design
Fig. 2. DFD level 0
Requirements
The functional requirements of the system include the data gathered by the sensors and the decisions taken on the basis of this data. The data provided by the sensors can contain some noise, so that data must be refined. The processing model installed on the cloud platform takes this refined data as input and makes decisions based on the dataset values. Accordingly, alerts are provided to farmers through the mobile application.
The user of this system is a farmer, so we have to design the application accordingly. The system must provide reliable alerts to the user, which will help him in making decisions and taking actions in the field.

Fig. 3. Data Flow Diagram level 1
Fig. 2 and Fig. 3 show the data flow diagrams of the system: the graphical representation of the flow of data through an information system, and a preliminary step in creating an overview of the system. DFD level 0 shows three components, farmers, the local system and the administrator, which interact with the model. DFD level 1 describes the functions through which the farmers, the local system and the administrator interact with the system. The local system collects data using sensors, farmers can request and view their data on the system, and the administrator manages the stored data.

Steps Involved
Fig. 1. Steps Involved
As shown in Fig. 1, the model proceeds in three steps: collecting the data from the field using sensors, processing the collected data on the cloud platform, and providing suggestions to farmers through the mobile application.

Other Specification
The proposed system provides advantages in terms of increasing the quality and quantity of yield and reducing the risk of damage caused by natural calamities. This system will also help in improving soil fertility and soil nutrients, increasing the net profit of farmers and reducing the farmers' effort, and it will promote smart farming techniques.
This system has some limitations: it requires a constant power supply and a stable internet connection, the farmer should be able to use a smartphone, and the farmer must be able to afford the cost of the proposed system.
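The refine-then-decide flow set out in the Requirements section can be sketched as follows; the moving-average window and the soil-moisture threshold are illustrative assumptions, not values from the paper:

```python
def refine(readings, window=3):
    """Smooth raw sensor readings with a simple moving average
    to reduce measurement noise before any decision is taken."""
    smoothed = []
    for i in range(len(readings)):
        lo = max(0, i - window + 1)
        smoothed.append(sum(readings[lo:i + 1]) / (i + 1 - lo))
    return smoothed

def soil_moisture_alerts(readings, low=20.0):
    """Return alert messages for the farmer wherever the smoothed
    soil-moisture value drops below the `low` threshold (in %)."""
    return [f"Alert: low soil moisture ({v:.1f}%) at sample {i}"
            for i, v in enumerate(refine(readings)) if v < low]
```

In a deployment, the alert strings would be pushed to the farmer's mobile application rather than returned to the caller.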
6. CONCLUSION AND FUTURE WORK
The Internet of Things is widely used for connecting devices and collecting information. All the sensors are successfully interfaced with the Raspberry Pi, and wireless communication is achieved between the various nodes. All observations and experimental tests prove that the project is a complete solution for field activities, environmental problems and storage problems, using a smart irrigation system and a smart warehouse management system. Implementation of such a system in the field can definitely help to improve the yield of the crops and the overall production.
The device can incorporate pattern recognition techniques for machine learning, to identify objects and categorize them into humans, rodents and other mammals; sensor fusion can also be done to increase the functionality of the device. By improving these aspects of the device, it can be used in different areas, and the project can undergo further research to improve its functionality and its applicable areas. We have opted to implement this system as a security solution in the agricultural sector, i.e. farms, cold stores and grain stores.

REFERENCES
[1] Nikesh Gondchawar, Prof. Dr. R. S. Kawitkar, "IoT based Smart Agriculture", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 6, ISSN (Online) 2278-1021, ISSN (Print) 2319-5940, June 2016.
[2] Tanmay Baranwal, Nitika, Pushpendra Kumar Pateriya, "Development of IoT based Smart Security and Monitoring Devices for Agriculture", 6th International Conference on Cloud System and Big Data Engineering, 978-1-4673-8203-8/16, 2016 IEEE.
[3] Nelson Sales, Artur Arsenio, "Wireless Sensor and Actuator System for Smart Irrigation on the Cloud", 978-1-5090-0366-2/15, 2nd World Forum on Internet of Things (WF-IoT), Dec 2015, published in IEEE Xplore, Jan 2016.
[4] Prathibha S R, Anupama Hongal, Jyothi M P, "IoT based Monitoring System In Smart Agriculture", 2017 International Conference on Recent Advances in Electronics and Communication Technology.
more relevant principles to mobile application development. This paper shows that React Native exhibits the best results in all the analyzed principles, while still retaining the benefits of hybrid development relative to native development. With the emergence of frameworks for mobile development, some of them with little more than a year of existence, it is difficult to perceive which are the most advantageous for a given business objective; this article shows the best options among the frameworks used, always comparing them with native development.

Paper Title: Among the various impacts caused by high-penetration distributed generation (DG) in medium- and low-voltage distribution networks, the issues of interaction between the DG and feeder equipment, such as step voltage regulators (SVRs), have been increasingly brought into the focus of computational analyses and real-life case studies. In particular, the SVR's runaway condition has been a major concern in recent years due to the overvoltage problem and the SVR maintenance costs it entails. This paper aims to assess the accuracy of the quasi-static time series (QSTS) method in detailing this phenomenon when compared to the classical load-flow formulation. To this end, simulations were performed using the OpenDSS software for two different test feeders and helped to demonstrate the effectiveness of the QSTS approach in investigating the SVR's runaway condition.

Paper Title: Autonomous Bidding Agents in the Trading Agent Competition
Abstract: Designing agents that can bid in online simultaneous auctions is a complex task. The authors describe task-specific details and strategies of agents in a trading agent competition. More specifically, the article describes the task-specific details of, and the general motivations behind, the four top-scoring agents. First, we discuss general strategies used by most of the participating agents. We then report on the strategies of the four top-placing agents. We conclude with suggestions for improving the design of future trading agent competitions.

Paper Title: The opportunistic large array (OLA) with transmission threshold (OLA-T) is a simple form of cooperative transmission that limits node participation in broadcasts. The performance of OLA-T has been studied for disc-shaped networks. This paper analyzes OLA-T for strip-shaped networks. The results also apply to arbitrarily shaped networks that have previously limited node participation to a strip. The analytical results include a condition for sustained propagation, which implies a bound on the transmission threshold. OLA transmission on a strip network with and without a transmission threshold is compared in terms of total energy consumption.

2. GAP ANALYSIS
Standard Platform:
It is a standard Android application or iOS application.
All the APIs are purely platform dependent for Ola and Uber.
There is no algorithm that supports cross-platform use; for each platform there are different algorithms.
No current system is available for two-wheeler transportation.
Some companies provide such services, but they do not have a proper implementation of these systems.
In rural areas, transportation services are negligible.
BikeUp:
It is a cross-platform algorithm which is used on many platforms.
The APIs are platform independent for various devices, such as web applications, Android applications and iOS applications.
This system will increase employability in rural areas by
ISSN:0975-887 Department of Computer Engineering, SKNCOE,Vadgaon(Bk),Pune. Page 55
Proceeding of International Conference on Internet of Things,Next Generation Network & Cloud Computing 2019
the future work may lay its emphasis on exploring the various methods and applications of blockchain in auctions by overcoming its limitations. More layers of hybrid functions can be included to further increase data integrity and security.

REFERENCES
[1] J.-N. Meier, A. Kailas, O. Abuchaar et al., "On augmenting adaptive cruise control systems with vehicular communication for smoother automated following", Proc. TRB Annual Meeting, Jan. 2018.
[2] Dan Ariely (2003), Buying, Bidding, Playing, or Competing? Value Assessment.
[3] Amy Greenwald (2001), Autonomous Bidding Agents in the Trading Agent Competition.
[4] Chia-Hui Yen (2008), Effects of e-service quality on loyalty intention: an empirical study in online auction.
[5] A. Kailas, L. Thanayankizil, M. A. Ingram, "A simple cooperative transmission protocol for energy-efficient broadcasting over multi-hop wireless networks", KICS/IEEE Journal of Communications and Networks (Special Issue on Wireless Cooperative Transmission and Its Applications), vol. 10, no. 2, pp. 213-220, June 2008.
[6] Y. J. Chang, M. A. Ingram, "Packet arrival time estimation techniques in software defined radio", in preparation.
[7] B. Sirkeci-Mergen, A. Scaglione, "On the power efficiency of cooperative broadcast in dense wireless networks", IEEE J. Sel. Areas Commun., vol. 25, no. 2, pp. 497-507, Feb. 2007.
8. FUTURE WORK
Water is a key element for human survival, yet current patterns of water consumption are unsustainable, and such usage is still evident in our practical life. There is a strong need to change this pattern toward sustainability; the world would indeed cease to exist without the availability of water.
Fig: System Implementation Plan

REFERENCES
[1] Vijay, Mahak, S. A. Akbar, and S. C. Jain. "Chlorine decay modelling with contamination simulation for water quality in smart water grid." In 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp. 3336-3341. IEEE, 2017.
[2] Pawara, Sona, Siddhi Nalam, Saurabh Mirajkar, Shruti Gujar, and Vaishali Nagmoti. "Remote monitoring of waters quality from reservoirs." In Convergence in Technology (I2CT), 2017 2nd International Conference for, pp. 503-506. IEEE, 2017.
[3] Putra, Dito Adhi, and Tri Harsono. "Smart sensor device for detection of water quality as anticipation of disaster environment pollution." In Electronics Symposium (IES), 2016 International, pp. 87-92. IEEE, 2016.
[4] Saab, Christine, Isam Shahrour, and Fadi Hage Chehade. "Smart technology for water quality control: Feedback about use of water quality sensors." In Sensors Networks Smart and Emerging Technologies (SENSET), 2017, pp. 1-4. 2017.
[5] Borawake-Satao, Rachana, and Rajesh Prasad. "Mobility Aware Path Discovery for Efficient Routing in Wireless Multimedia Sensor Network." In Proceedings of the International Conference on Data Engineering and Communication Technology, pp. 673-681. Springer, Singapore, 2017.
[6] Borawake-Satao, Rachana, and Rajesh Prasad. "Comprehensive survey on effect of mobility over routing issues in wireless multimedia sensor networks." International Journal of Pervasive Computing and Communications 12, no. 4 (2016): 447-465.
[7] Jalal, Dziri, and Tahar Ezzedine. "Towards a water quality monitoring system based on wireless sensor networks." In Internet of Things, Embedded Systems and Communications (IINTEC), 2017 International Conference on, pp. 38-41. IEEE, 2017.
[8] Shirode, Mourvika, Monika Adaling, Jyoti Biradar, and Trupti Mate. "IOT Based Water Quality Monitoring System." (2018).
[9] Getu, Beza Negash, and Hussain A. Attia. "Electricity audit and reduction of consumption: campus case study." International Journal of Applied Engineering Research 11, no. 6 (2016): 4423-4427.
[10] Attia, Hussain A., and Beza N. Getu. "Authorized Timer for Reduction of Electricity Consumption and Energy saving in Classrooms." IJAER 11, no. 15 (2016): 8436-8441.
[11] Getu, Beza Negash, and Hussain A. Attia. "Automatic control of agricultural pumps based on soil moisture sensing." In AFRICON, 2015, pp. 1-5. IEEE, 2015.
[12] Bhardwaj, R. M. "Overview of Ganga River Pollution." Report: Central Pollution Control Board, Delhi (2011).
[13] Nivit Yadav, "CPCB Real time Water Quality Monitoring", Report: Center for Science and Environment, 2012.
[14] Faruq, Md Omar, Injamamul Hoque Emu, Md Nazmul Haque, Maitry Dey, N. K. Das, and Mrinmoy Dey. "Design and implementation of cost effective water quality evaluation system." In Humanitarian Technology Conference (R10-HTC), 2017 IEEE Region 10, pp. 860-863. IEEE, 2017.
[15] Le Dinh, Tuan, Wen Hu, Pavan Sikka, Peter Corke, Leslie Overs, and Stephen Brosnan. "Design and deployment of a remote robust sensor network: Experiences from an outdoor water quality monitoring network." In 32nd IEEE Conference on Local Computer Networks (LCN 2007). IEEE, 2007.
Population Growth", Science, Vol. 289, no. 5477, 14 July 2000, pp. 284-288.
[5] I. Podnar, M. Hauswirth, and M. Jazayeri, "Mobile push: delivering content to mobile users," Proceedings of the 22nd International Conference on Distributed Computing Systems Workshops, pp. 563-568, 2002.
2. EXISTING SYSTEM
One of the existing systems is implemented using the Global System for Mobile Communication (GSM), where the Short Message Service (SMS) is used to send notices to the controller, which limits the data size. Another existing system uses Bluetooth as the mode of data transfer between the microcontroller and the

Fig 1: Block diagram of the system
The notice to be displayed is sent from the android application using socket programming in Java. As wireless transmission is used, a large amount of data can be transferred over the network.
4. IMPLEMENTATION
This section explains the execution flow, from establishing communication between the Android application and the Raspberry Pi to displaying the notices on the screen.
As shown in Fig. 3, first the message is sent from the application and stored at the Raspberry Pi. The message is retrieved, and the contents are updated and stored on the SD card. The text message is then read from the SD card. The fetched text is wrapped in a template and displayed on the screen using a browser which is open in kiosk mode.
For the communication to take place, both the Raspberry Pi and the android application must be connected to the same WiFi network. This can be achieved using server side coding in

Fig 2: Mathematical Model
M1 sends the notice to M2.
M2 is the access point which provides the network for M1 to connect.
After receiving the notice from M1, M3 processes it and includes it in M4, which is a PHP template.
This processed data is sent from M3 to M5.
M5 displays the message on the LCD screen.
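The notice path described in this section (application to socket to storage to display) can be sketched with Python's standard socket module; the port number, the storage file name and the single-notice loop are assumptions for illustration, not details from the paper:

```python
import socket

HOST, PORT = "0.0.0.0", 50007   # hypothetical port for the notice service

def run_notice_server(storage_path="notice.txt", max_notices=1):
    """Listen for notices sent over a TCP socket (e.g. from the Android
    app) and store each one on disk, where the display script reads it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)                      # queue at most one pending client
        for _ in range(max_notices):
            conn, addr = srv.accept()      # blocks until the app connects
            with conn:
                notice = conn.recv(4096).decode("utf-8")
                with open(storage_path, "w", encoding="utf-8") as f:
                    f.write(notice)        # the kiosk browser page reads this file
```

On the Raspberry Pi, the kiosk-mode browser would then render whatever the display template wraps around the stored text.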
some key points can be concluded for further RFID application system implementations. As information systems play a crucial role in RFID implementation, information system development is essential for RFID project success, and an RFID information system should be developed as an open system that can be easily integrated with other systems in supply chains. Security is a critical issue for RFID systems, since they manage cargo information that must be protected from theft, modification or destruction. As a new wireless technology that often links to the Internet, RFID presents additional security challenges that must be factored into any installation of RFID systems.

anonymity requirements, the transformed one inherits these properties. The result improves the prior best bound on worst-case key-lookup cost of O(log n), by Molnar, Soppera and Wagner (2006). They also show that any RFID authentication protocol that simultaneously provides guarantees of privacy protection and of worst-case constant-cost key-lookup must also imply "public-key obfuscation", at least when the number of tags is asymptotically large. They also consider relaxations of the privacy requirements and show that, if limited linkability is to be tolerated, then simpler approaches can be pursued to achieve constant key-lookup cost.
Kashif Ali, Hossam Hassanein [4] presented a system that successfully merges the RFID readers and their tags with a central database, such that all the parking lots in the university can work in a fast and efficient manner. The RFID tag provides a secure and robust method for holding the vehicle identity, and the web-based database allows for the centralization of all vehicle and owner records.
Ivan Muller, Renato Machado de Brito [5]: Vehicle tracking systems are popular, as they provide travel security and theft prevention. The main benefit of vehicle tracking systems is security, obtained by monitoring the vehicle's location, which can be used as a protection approach for stolen vehicles by sending the position coordinates to the police center as an alert. When a police center receives an alert for a stolen vehicle, it can take action to prevent the theft.
Muhammad Tahir Qadri, Muhammad [6] introduced a new approach that leads to a reconciliation of privacy and availability requirements in anonymous RFID authentication: a generic compiler that maps each challenge-response RFID authentication protocol into another that supports key-lookup operations at constant cost. If the original protocol were to satisfy

3. DESIGNING OF SYSTEM
Objective
Vehicle tracking has increased in use over the past few years and, based on current trends, this rise should continue. Tracking offers benefits to both private and public sector individuals, allowing for real-time visibility of vehicles and the ability to receive advanced information regarding legal existence and security status. The monitoring system of a vehicle is an integration of RFID technology and a device tracking system using IoT.
Theme
In this paper, an Arduino is used for controlling all peripherals and activities. The Arduino does not require an external power supply circuit, because it has an inbuilt power supply circuit, and it provides additional functionality compared to microcontrollers such as the PIC or the 8051; the Arduino is more sophisticated than other microcontrollers. The RFID reader can identify the data from any recognized RFID tag, and the collected data is shown in a terminal on the PC. An RFID tag is provided to every vehicle, and its data moves to the RFID reader via radio frequency at 13.56 MHz. This data helps to determine which vehicle is authorized or unauthorized. This
whole data goes via the ESP8266 Wi-Fi module, over the internet, to the mobile.
Design
The figure shows the RFID reader and the relay connected to the Arduino Uno. All data is gathered and then stored on the Arduino Uno, so we can easily access it any time, anywhere, and the system can respond according to the data. It is efficient and reliable for data storage, and many things can be analyzed in the system. The RFID reader reads information from the RFID tag, and the relay controls the motor. The motor is connected to a circular rod which acts as a gate, so according to the data the system can respond very quickly. All data goes to the PC, and for the mobile, the ESP8266 is connected to the Arduino. On the mobile, the authorized and unauthorized vehicle ID numbers are sent via the ESP8266 Wi-Fi module, which is connected to the internet.

The Arduino Uno provides UART TTL (5V) serial communication, which can be done using digital pin 0 (RX) and digital pin 1 (TX). An ATmega16U2 on the board channels this serial communication over USB and appears as a virtual COM port to software on the computer. The ATmega16U2 firmware uses the standard USB COM drivers, and no external driver is needed; however, on Windows, an .inf file is required. The Arduino software includes a serial monitor which allows simple textual data to be sent to and from the Arduino board. There are two RX and TX LEDs on the Arduino board which flash when data is being transmitted via the USB-to-serial chip and the USB connection to the computer (not for serial communication on pins 0 and 1). A SoftwareSerial library allows serial communication on any of the Uno's digital pins. The ATmega328P also supports I2C (TWI) and SPI communication, and the Arduino software includes a Wire library to simplify use of the I2C bus.
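The authorized/unauthorized decision described in the Design subsection can be sketched as a small check on the PC side; the tag IDs, the registry set and the returned fields are illustrative assumptions, with the gate field standing in for the relay/motor action:

```python
# Hypothetical tag registry: in the described system this data is
# gathered from the RFID reader and stored with the Arduino / PC.
AUTHORIZED_TAGS = {"04A3B2C1", "04D4E5F6"}

def check_vehicle(tag_id):
    """Classify a scanned RFID tag as authorized or unauthorized and
    decide the gate action, mirroring the relay/motor logic described."""
    if tag_id in AUTHORIZED_TAGS:
        return {"tag": tag_id, "status": "authorized", "gate": "open"}
    return {"tag": tag_id, "status": "unauthorized", "gate": "closed"}
```

The same status string is what would be forwarded to the mobile via the ESP8266 module.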
and unauthorized vehicle ID numbers are sent via the ESP8266 Wi-Fi module, which is connected to the internet.

REFERENCES
[1] Prof. Kumthekar A. V., Ms. Sayali Owhal, Ms. Snehal Supekar, Ms. Bhagyashri Tupe, International Research Journal of Engineering and Technology (IRJET), (Volume: 05), April 2018.
[2] Liu Bin, Lu Xiaobo and Gao Chaohui, "Comparing and testing of ETC modes in Chinese freeway", Journal of Transportation Engineering and Information, 5(2), 2007, pp. 31-35.
[3] "A Novel Chipless RFID System Based on Planar Multiresonators for Barcode Replacement", Stevan Preradovic, Isaac Balbin, Nemai C. Karmakar and Gerry Swiegers, 2008.
[4] Kashif Ali, Hossam Hassanein, "Passive RFID for Intelligent Transportation Systems", 2009 6th IEEE Consumer Communications and Networking Conference.
[5] Ivan Muller, Renato Machado de Brito, Carlos Eduardo Pereira, and Valner Brusamarello, "Load cells in force sensing analysis - theory and a novel application", IEEE Instrumentation & Measurement Magazine, Volume 13, Issue 1.
[6] Muhammad Tahir Qadri, Muhammad Asif, "Automatic Number Plate Recognition System for Vehicle Identification Using Optical Character Recognition", 2009 International Conference on Education Technology and Computer.
2. MOTIVATION
The present-day smart campus systems propose applications for knowing or measuring the area of a building or of classrooms in a college. Another smart campus system proposes an application in which the location of a user of the android app within the college or campus area can be known through the app. The major issue in a college or campus is the difficulty of data sharing among the students and the staff. Many of the users are not connected to the internet during college hours, and the important notices have to be displayed on the notice board or shared from class to class, increasing the manual effort. This process can be time consuming and cause manual errors. There is also the problem of controlling the electric appliances in the classes, where one has to go and manually switch the appliances on or off. In this paper, efforts are made to solve these issues by using an android app and a raspberry pi module, where a student can access the data sent by the teacher over the WI-FI module; power control is also added, so that the electric appliances can be controlled within the range of the raspberry pi.

3. METHODOLOGY
The system works as a storage medium and is WI-FI enabled, uploading the information to a web server designed for this application. The uploaded file is stored and can be viewed or downloaded using an android application. Faculties are able to turn the electric appliances, such as fans or lights, ON/OFF remotely from the server with the help of the raspberry Pi and a relay assembly.

Figure 1: Block Diagram of the System
The system uses a raspberry pi 3 as the heart of the system, which looks after all the communication in the system. The WI-FI of the raspberry pi is used as the medium to connect the android apps. Socket programming is used for the communication, and the app is designed in such a way that it can be accessed by an authorized person only. If a student has to access the app, they are given a separate password and ID (USER ID), and if a faculty member has to access the app, they have a different password and ID (ADMIN ID). Thus the system also preserves the privacy of the users and prevents miscommunication.
The memory of the raspberry pi is used as a storage unit for the data being uploaded; thus it works as a cloud memory for the android application (App). The application has options such as upload, download and view. The GUI design is different for teachers and students, based on their respective login as a faculty (ADMIN) or as a student (USER). This GUI is created using the Eclipse software. The faculty can also control the electrical appliances of the department using their android application, whereas the student login is not provided with this extra feature; this option is provided only in the faculty login GUI.

4. ALGORITHM
Start
Change the directory path to the predefined location.
Set the direction of the GPIO pin to output.
Open the socket with a fixed port number.
To accept connections, the following steps are performed:
1. A socket is created with socket().
2. The socket is bound to a local address using bind(), so that other sockets may be connected to it.
3. A willingness to accept incoming connections and a queue limit for incoming connections are specified with listen().
4. Connect the socket with the connect() method.
5. Connections are accepted with accept().
6. Read 1 byte of data on the socket.
7. Convert that byte from ASCII to int with the atoi() function.
8. Check the byte and pass it into the switch case.
9. if switch case 1:
i. Read file data from the client and save it on the server
ii. First read the file size
iii. Allocate memory to read the filename using malloc()
iv. Now read the actual file data
v. Now read the name of the file; for that, first read the size of the filename
vi. Allocate memory to read the filename
vii. Now read the actual file data
viii. Write the file data into the file
ix. Free the memory allocated by malloc()
10. if switch case 2:
i. Now read the pathname; for that, first read the size of the pathname
ii. Allocate memory to read the pathname using malloc()
iii. Now read the actual path data
iv. Now pass the directory path to the list_dir() function
v. This function returns the file and directory listing and its length
vi. Write the file and directory listing and its length on the socket
vii. Free the memory allocated by malloc()
11. if switch case 3:
i. Now read the file; for that, first read the size of the filename
ii. Allocate memory to read the filename using malloc()
iii. Now read the actual filename
iv. Now read the file with the fread() function, which returns the file content and the length of the file
v. Write the file length and file content on the socket
vi. Free the memory allocated by malloc()
12. if switch case 4:
i. Device 1 will be turned ON
13. if switch case 5:
i. Device 1 will be turned OFF
14. if switch case 6:
i. Device 2 will be turned ON
15. if switch case 7:
i. Device 2 will be turned OFF
16. if switch case 0:
i. All devices are turned OFF

5. RESULTS
Screenshots of various pages of the application are as follows:
Figure 2: Screenshot 1 of the Android Application (LOGIN PAGE)
Figure 3: Screenshot 2 of the Android Application (CONFIGURATION)
RFID tags with cryptographic capabilities and a slight modification of the digital signature calculation procedure make it possible to prevent obtaining digital signatures for fraudulent documents. A further evolution of the proposed scheme is permanent monitoring, by periodically checking the user's RFID tag to confirm whether the authenticated user is still present at the computer with restricted access.

used that carries the family member details, and the customer needs to show this tag to the RFID reader. The microcontroller connected to the reader checks for user authentication. If the user is found authentic, then the quantity of ration to be given to the customer, according to the total number of family members, is displayed on the display device.
II. Proposed Work
Here we conclude: the automatic vehicle identification system using the vehicle license plate and RFID technology has been presented. The system identifies the vehicle from the database stored on the PC. The objective of this project is to design an efficient automatic authorized-vehicle identification system by using the vehicle number plate and RFID. The Automatic Number Plate Recognition (ANPR) system is an important technique used in Intelligent Transportation Systems. ANPR is an advanced machine vision technology used to identify vehicles by their number plates without direct human intervention. The decisive portion of an ANPR system is the software model. We also implemented a further process: if any vehicle breaks the signal, our system can detect that vehicle's number tag and check the details of that vehicle in order to apply a fine to it.
REFERENCES
[1] Hsiao-Ying Huang, Privacy by Region:
Evaluation Online Users‘ Privacy Perceptions
by Geographical Region, FTC 2016 - Future
Technologies Conference 2016,6-7 December
2016.
[2] Hyoung shick Kim, Design of a secure digital
recording protection system with network
connected devices, 2017 31st International
Conference on Advanced Information
Networking and Applications Workshops.
[3] Chao-Hsien Lee and Yu-Lin Zheng, SQL-to-
NoSQL Schema Denormalization and
Migration: A Study on Content Management
DATA ANALYTICS AND MACHINE LEARNING
consumer decision making during online shopping experiences. The recommender system recommends products to users, and to what extent these recommendations affect consumer decisions about buying products is analyzed in this paper. A comparison with the state of the art for opinion mining is done by Horacio Saggion et al., 2009. Ana-Maria Popescu and Oren Etzioni introduce an unsupervised information extraction system which mines reviews in order to build a model of important product features, their evaluation by reviewers, and their relative quality across products (Oren et al., 2005).

Early Adopter Detection
An early adopter could refer to a trendsetter, e.g., an early customer of a given company, product and technology. The importance of early adopters has been widely studied in sociology and economics. It has been shown that early adopters are important in trend prediction, viral marketing, product promotion, and so on. The analysis and detection of early adopters in the diffusion of innovations have attracted much attention from the research community. Generally speaking, three elements of a diffusion process have been studied: attributes of an innovation, communication channels, and social network structures.

Modeling Comparison-Based Preference
By modeling comparison-based preference, we can essentially perform any ranking task. For example, in information retrieval (IR), learning to rank aims to learn the ranking for a list of candidate items with manually selected features.

Distributed Representation Learning
Since its seminal work, distributed representation learning has been successfully used in various application areas including Natural Language Processing (NLP), speech recognition and computer vision. In NLP several semantic embedding models have been proposed, including word embedding and phrase embedding, such as word2vec. In this paper we use natural language processing for sentiment analysis of users' reviews. Whether the user is giving a negative, positive or neutral review is characterized by this sentiment analysis.

The Use Case Diagram:
Fig 1: Use case
The Sequence Diagram:
Fig 2: Sequence Diagram
Fig: Activity Diagram
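The sentiment analysis step described above (labelling a review as positive, negative or neutral) can be sketched with a toy lexicon-based scorer. The word lists below are illustrative assumptions; a real system would use a trained model rather than hand-picked keywords.

```python
# Toy lexicons -- an assumption for illustration, not the paper's model.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "disappointed"}

def review_polarity(text):
    """Label a review 'positive', 'negative' or 'neutral' by keyword counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `review_polarity("a great phone, love it")` yields "positive", while a review with only negative lexicon hits yields "negative".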
4. GAP ANALYSIS
Sr. Year Author Paper Name Paper Description
no Name
3. 2012 Manuela Models for Paired There are other situations that
Cattelan Comparison Data: A may be regarded as comparisons
Review with Emphasis from which a winner and a loser
on Dependent Data can be identified without the
presence of a judge
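The paired-comparison data surveyed in the gap analysis (Cattelan, 2012) can be illustrated with the classic Bradley-Terry model, where each item has a latent skill and the probability that one item beats another depends only on the skill difference. This is a generic sketch of paired-comparison modelling, not the reviewed paper's method.

```python
import math

def bt_win_prob(skill_i, skill_j):
    # Bradley-Terry: P(i beats j) = exp(s_i) / (exp(s_i) + exp(s_j))
    return math.exp(skill_i) / (math.exp(skill_i) + math.exp(skill_j))

def rank_by_wins(comparisons):
    """Rank items by win counts from (winner, loser) pairs -- the simplest
    consistent estimate when no judge or covariates are modelled."""
    wins = {}
    for winner, loser in comparisons:
        wins[winner] = wins.get(winner, 0) + 1
        wins.setdefault(loser, 0)
    return sorted(wins, key=wins.get, reverse=True)
```

With equal skills the win probability is 0.5, and any positive skill gap tilts the probability toward the stronger item.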
[6] B. W. O, "Reference group influence on product and brand purchase decisions," Journal of Consumer Research, vol. 9, pp. 183–194, 1982.
[7] J. J. McAuley, C. Targett, Q. Shi, and A. van den Hengel, "Image-based recommendations on styles and substitutes," in SIGIR, 2015, pp. 43–52.
[8] E. M. Rogers, Diffusion of Innovations. New York: The Rise of High-Technology Culture, 1983.
[9] K. Sarkar and H. Sundaram, "How do we find early adopters who will guide a resource constrained network towards a desired distribution of behaviors?" in CoRR, 2013, p. 1303.
[10] D. Imamori and K. Tajima, "Predicting popularity of twitter accounts through the discovery of link-propagating early adopters," in CoRR, 2015, p. 1512.
[11] X. Rong and Q. Mei, "Diffusion of innovations revisited: from social network to innovation network," in CIKM, 2013, pp. 499–508.
[12] I. Mele, F. Bonchi, and A. Gionis, "The early-adopter graph and its application to web-page recommendation," in CIKM, 2012, pp. 1682–1686.
[13] Y.-F. Chen, "Herd behavior in purchasing books online," Computers in Human Behavior, vol. 24(5), pp. 1977–1992, 2008; Banerjee, "A simple model of herd behaviour," Quarterly Journal of Economics, vol. 107, pp. 797–817, 1992.
[14] A. S. E, "Studies of independence and conformity: I. a minority of one against a unanimous majority," Psychological Monographs: General and Applied, vol. 70(9), p. 1, 1956.
nearly impossible for a layman to be well versed in SQL querying, as they may be unaware of the structure of the database, namely tables, their corresponding fields and types, primary keys and so on. There is a need to overcome this gap in knowledge and allow users who have no prior knowledge of SQL to query a database using a query posed in a natural language such as English. Providing a solution to this problem, this system has been proposed that takes natural language speech through voice recognition, converts it to an SQL query and displays the results from the database.

2. MOTIVATION
One of the most important aims of Artificial Intelligence is to make things easily and quickly accessible to humans. Access to information is invaluable and it should be available to everyone. Logically formulating the information a person needs is quite easy, and we do it frequently. However, one needs knowledge of formal languages to access information from current systems, and this hinders non-technical people from obtaining the information they want. It is crucial for systems to be user-friendly in order to obtain the highest benefits. These systems try to make information accessible to everyone who knows a natural language. The main motivation of the proposed system is to break the barriers for non-technical users and make information easily accessible to them. Making a user-friendly and more conversationally intelligent system will help users, even naive ones, to perform queries without actual knowledge of SQL or the database schema. We aim to introduce a modular system to query a database at any time without the hassle of logically forming the SQL constructs. For instance, consider the scenario of a hospital. Information of the patient is stored in the database. A doctor may not be well acquainted with databases. Information retrieval hence becomes difficult for the doctor. The system also acts as a learning tool for students, which helps in the assessment of SQL queries and learning through experience. The proposed system takes such problems into consideration and provides a solution to them. It makes access to data easier. With natural language as input and conversion of natural language to SQL queries, even naive users can access the data in the database. Advances in machine learning have progressively increased the reliability, usage, and efficiency of voice-to-text models. NLP has also seen major breakthroughs due to the growth of the Internet and Business Intelligence needs. Many toolkits and libraries exist for the sole purpose of performing NLP; this makes developing such a system easier and achievable.

3. STATE OF ART
For the proposed system, Intelligent Querying System using Natural Language Processing, various papers have been reviewed, whose survey report is given below. In [1] the author has proposed an interactive natural language query interface for relational databases. Given a natural language query, the system first translates it to an SQL statement and then evaluates it against an RDBMS. To achieve high reliability, the system explains to the user how the query is actually processed. When ambiguities exist, for each ambiguity the system generates multiple likely interpretations for the user to choose from, which resolves ambiguities interactively with the user. "The Rule based domain specific semantic analysis Natural Language Interface for Database" [2] converts a wide range of text queries (English questions) into formal (SQL query) ones that can then be run against a database by employing
generic and simpler processing techniques and methods. This paper defines the relations involving the ambiguous terms and domain-specific rules, and with this approach makes the NLIDB system portable and generic for a small as well as a large number of applications. The paper only focuses on context-based interaction along with the SELECT, FROM, WHERE and JOIN clauses of the SQL query, and also handles the complex queries that result from ambiguous natural language queries. In "Natural Language to SQL Generation for Semantic Knowledge Extraction in Social Web Sources" [3], a system is developed that can execute both DDL and DML queries, input by the user in natural language. A limited data dictionary is used where all possible words related to a particular system are included. Ambiguity among the words is taken care of while processing the natural language. The system is developed in the Java programming language and various Java tools are used to build it. An Oracle database is used to store the information. The author has proposed a system in [4] which provides a convenient as well as reliable means of querying access, and hence a realistic potential for bridging the gap between the computer and casual end users. The system employs a CFG-based approach which makes it easy to search the terminals, as the target terminals become separated into many non-terminals. To get the maximum performance, the data dictionary of the system has to be regularly updated with words that are specific to the particular system. The paper "An Algorithm for Solving Natural Language Query Execution Problems on Relational Databases" [5] showed how a modelled algorithm can be used to create a user-friendly, non-expert search process. The modularity of SQL conversion is also shown. The proposed model has been able to intelligently process user requests in a reasonable, human-usable format. The limitations of the developed NLIDB are as follows: 1. Domain dependent. 2. Limited in query domain. In "System and Methods for Converting Speech to SQL" [6], the author proposes a system which uses speech recognition models in association with a classical rule-based technique and semantic knowledge of the underlying database to translate the user's speech query into SQL. To find the join of tables, the system uses the underlying database schema by converting it into a graph structure. The system is checked for single tables and multiple tables, and it gives correct results if the input query is syntactically consistent with the syntactic rules. The system is also database independent, i.e. it can be configured automatically for different databases.

4. PROPOSED WORK
There are many NLIDBs proposed in different papers, but the interaction between the user and the system is missing. The proposed system tries to construct a natural language interface to databases in which the user can interact with the system, confirm whether the interpretation done by the system is correct or not, and make any manual changes required. The proposed system tries to build a bridge between linguistics and artificial intelligence, aiming at developing computer programs capable of human-like activity such as understanding and producing text or speech in a natural language such as English, or conversion of natural language in text or speech form to a language like SQL. The proposed system mainly works in three important steps: 1. Speech-to-text conversion, 2. SQL query generation, 3. Result generation, as displayed in Fig. 1 (flowchart). In the proposed system, that is, the interactive query system using natural language processing, the very first challenge is to convert the speech to text
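As a toy illustration of step 2 (SQL query generation), a single rule-based pattern can map a simple English question to a SELECT statement, in the spirit of the rule-based systems surveyed above. The grammar and the table and column names (`patients`, `city`) are hypothetical examples, not the paper's schema or implementation.

```python
import re

# One toy rule: "show <columns> of <table> [where <col> is <value>]".
PATTERN = re.compile(
    r"show (?P<cols>[\w, ]+) of (?P<table>\w+)(?: where (?P<col>\w+) is (?P<val>\w+))?",
    re.IGNORECASE,
)

def to_sql(question):
    """Translate a simple English question into a SELECT statement."""
    m = PATTERN.match(question.strip())
    if not m:
        raise ValueError("question not understood")
    cols = ", ".join(c.strip() for c in m.group("cols").split(","))
    sql = f"SELECT {cols} FROM {m.group('table')}"
    if m.group("col"):
        sql += f" WHERE {m.group('col')} = '{m.group('val')}'"
    return sql + ";"
```

A full NLIDB replaces this single pattern with a grammar, a data dictionary and schema-aware join resolution, but the translate-then-evaluate pipeline is the same.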
3. LITERATURE SURVEY
"Emotion Based Mood Enhancing Music Recommendation, 2017" proposed

Sr. No | Paper | Advantages | Disadvantages
[4] Paul Viola and Michael J. Jones, "Robust real-time object detection," International Journal of Computer Vision, Vol. 57, No. 2, pp. 137–154, 2004.
[5] Sayali Chavan, Ekta Malkan, Dipali Bhatt, Prakash H. Paranjape, "XBeats-An Emotion Based Music Player," International Journal for Advance Research in Engineering and Technology, Vol. 2, pp. 79-84, 2014.
[6] Xuan Zhu, Yuan-Yuan Shi, Hyoung-Gook Kim and Ki-Wan Eom, "An Integrated Music Recommendation System," IEEE Transactions on Consumer Electronics, Vol. 52, No. 3, pp. 917-925, 2006.
[7] Dolly Reney and Dr. Neeta Tripathi, "An Efficient Method to Face and Emotion Detection," Fifth International Conference on Communication Systems and Network Technologies, 2015.
[8] Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The Extended Cohn-Kanade Dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010), San Francisco, USA, 94-101.
[9] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, "Emotion Recognition in Human Computer Interaction," IEEE Signal Processing Magazine 18(1), 32-80, 2001.
[10] O. Martin, I. Kotsia, B. Macq, I. Pitas, "The eNTERFACE'05 Audio-Visual Emotion Database," in: 22nd International Conference on Data Engineering Workshops, Atlanta, GA, USA, 2006.
imaging conditions. Images are labelled with a subject ID as well as their orientation [11]. The images have been labelled on a scale of 0 to 4, where 0 is no DR and 4 is proliferative DR. The dataset consists of 35,126 training images divided into 5 category labels and 10,715 test images, which are 20 percent of the total test dataset. The dataset is a collection of images with different illumination, size and resolution, so every image needs to be standardized. Initially all images are resized to standard dimensions. Dataset images are RGB colour images consisting of red, green and blue channels, out of which the green channel is used as it gives the best contrast of blood vessels. This is depicted in Fig. 1.

channels of the output equal to the number of features. The correspondence of the feature detectors with the required output is very large, which may lead to overfitting. To avoid this, the parameters to be trained in the network can be fixed by computing the dimensions of the filters and the bias, such that they do not depend on the size of the input image. Each layer outputs certain values by convolving the input with the filter. Non-linear activation functions are applied to the output to achieve the final computations.
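The preprocessing described above (green-channel extraction, resizing to standard dimensions, standardization) can be sketched with NumPy. The 128x128 target size and the nearest-neighbour resampling are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def preprocess(image, out_h=128, out_w=128):
    """Extract the green channel of an RGB image, resize it by
    nearest-neighbour sampling, and standardize it."""
    green = image[:, :, 1].astype(np.float32)   # channel 1 = green in RGB order
    h, w = green.shape
    rows = np.arange(out_h) * h // out_h        # nearest source row per output row
    cols = np.arange(out_w) * w // out_w        # nearest source column per output column
    resized = green[rows[:, None], cols[None, :]]
    # Zero mean, unit variance so images with different illumination are comparable.
    return (resized - resized.mean()) / (resized.std() + 1e-8)
```

Every image then has the same shape and value range before it is fed to the convolutional network.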
Method
the scope of the diagnosis. It will also allow the model to learn better during the training phase. Diabetic Retinopathy is one of the diseases that affect people all over the world. The success of such a system for the classification of Diabetic Retinopathy provides the scope for building similar systems for various other diseases that need accurate results within a short period of time. A Convolutional Neural Network is a very powerful network which can be further used for extended analysis of various other diseases.

REFERENCES
[1] C. P. Wilkinson, F. L. Ferris, R. E. Klein, P. P. Lee, C. D. Agardh, M. Davis, D. Dills, A. Kampik, R. Pararajasegaram, J. T. Verdaguer, and the Global Diabetic Retinopathy Project Group, "Proposed international clinical diabetic retinopathy and diabetic macular edema disease severity scales," Ophthalmology, vol. 110, issue 9, Sep. 2003, pp. 1677-1682.
[1] T. Y. Wong, C. M. G. Cheung, M. Larsen, S. Sharma, and R. Sim, "Diabetic retinopathy," Nature Reviews Disease Primers, vol. 2, Mar. 2016, pp. 1-16.
[2] Gadkari SS, "Diabetic retinopathy screening: Telemedicine, the way to go!," Indian J Ophthalmol 2018;66:187-8.
[3] Gulshan V, Peng L, Coram M, et al., "Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs," JAMA. 2016;316(22):2402-2410. doi:10.1001/jama.2016.17216
[4] Y. Kanungo, B. Srinivasan and S. Choudhary, "Detecting diabetic retinopathy using deep learning," 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT), Bangalore, 2017, pp. 801-804. doi: 10.1109/RTEICT.2017.8256708
[5] D. Fitriati and A. Murtako, "Implementation of Diabetic Retinopathy screening using realtime data," 2016 International Conference on Informatics and Computing (ICIC), Mataram, 2016, pp. 198-203. doi: 10.1109/IAC.2016.7905715
[6] S. Yu, D. Xiao and Y. Kanagasingam, "Exudate detection for diabetic retinopathy with convolutional neural networks," 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Seogwipo, 2017, pp. 1744-1747. doi: 10.1109/EMBC.2017.8037180
[7] T. Bui, N. Maneerat and U. Watchareeruetai, "Detection of cotton wool for diabetic retinopathy analysis using neural network," 2017 IEEE 10th International Workshop on Computational Intelligence and Applications (IWCIA), Hiroshima, 2017, pp. 203-206. doi: 10.1109/IWCIA.2017.8203585
[8] A. G. A. Padmanabha, M. A. Appaji, M. Prasad, H. Lu and S. Joshi, "Classification of diabetic retinopathy using textural features in retinal color fundus image," 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), Nanjing, 2017, pp. 1-5. doi: 10.1109/ISKE.2017.8258754
[9] X. Wang, Y. Lu, Y. Wang and W. Chen, "Diabetic Retinopathy Stage Classification Using Convolutional Neural Networks," 2018 IEEE International Conference on Information Reuse and Integration (IRI), Salt Lake City, UT, 2018, pp. 465-471. doi: 10.1109/IRI.2018.00074
[10] https://www.kaggle.com/c/diabetic-retinopathy-detection/dat.
improve their quality by being better prepared for similar anomalies in the future.

2. MOTIVATION
● Every day almost 2.2 million people willingly board commercial airlines despite the fact that around 850,000 of them will not get to their desired destination on time [9].
● Roughly 40 percent of all air travelers have arrived late consistently for most of the last 35 years [10]. And unless things change dramatically, about 40 percent of all air travelers will continue to arrive late every year, perhaps forever.
● A 40 percent failure rate would be unacceptable for the global commercial passenger flight network and acts as a bottleneck for various business and travel related activities along with air cargo delivery operations.
● Using historic flight data and meteorological data of the source and destination airports as the major attributes, this paper addresses this problem using various machine learning algorithms in order to gauge the feasibility of different algorithms and choose the most accurate one for prediction.

3. LITERATURE SURVEY
This section provides information about the previous work done on the problem of flight delay prediction.
● Airline Delay Predictions using Supervised Machine Learning
Pranalli Chandraa, Prabakaran.N and Kannadasan.R, VIT University, Vellore.
This paper uses preliminary data analysis techniques and data cleaning to remove noise and inconsistencies. The machine learning techniques used are multiple linear regression and polynomial regression, which allow for various metrics of bias and variance in order to pinpoint the best fitting parameters for the respective models. The K-fold method is used for cross validation of the intermediate models, and RMSE and Ecart metrics gauge their performance. The implementation is carried out in Python 3.
● Review on Flight Delay Prediction
Alice Sternberg, Jorge Soares, Diego Carvalho, Eduardo Ogasawara.
This paper proposes a taxonomy and consolidates the methodologies used to address the flight delay prediction problem, with respect to scope, data, and computing methods, specifically focusing on the increased usage of machine learning methods. It also presents a timeline of significant works that represent the interrelationships between research trends and flight delay prediction problems.
● A Deep Learning Approach to Flight Delay Prediction
Young Jin Kim, Sun Choi, Simon Briceno and Dimitri Mavris.
This paper uses deep learning models like Recurrent Neural Networks (RNNs) with long short-term memory units. Deep learning is suitable for learning from labelled as well as unlabelled data. It uses multiple hidden layers to improve the learning process and can be accelerated using modern GPUs. Deep learning tries to mimic the learning methodologies of the biological brain (mainly the human brain). This paper comments on the effectiveness of various deep learning models for predicting airline delays.
● A statistical approach to predict flight delay using gradient boosted decision tree
Suvojit Manna, Sanket Biswas, Riyanka Kundu, Somnath Rakshit, Priti Gupta.
This paper investigates the effectiveness of the Gradient Boosted Decision Tree algorithm, one of the famous machine learning tools, to analyse air traffic data. The authors built an accurate and robust prediction model which enables an elaborated analysis of the patterns in air traffic delays.
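The K-fold cross-validation and RMSE evaluation mentioned in the survey above can be sketched as follows. The mean predictor used as the "model" here is a stand-in assumption so the sketch stays self-contained; the surveyed papers fit regression models instead.

```python
import math

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds, as even as possible."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def kfold_rmse(y, k=5):
    """Cross-validated RMSE of a mean predictor (a stand-in model)."""
    scores = []
    for fold in kfold_indices(len(y), k):
        held_out = set(fold)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        mean = sum(train) / len(train)          # "train" the stand-in model
        scores.append(rmse([y[i] for i in fold], [mean] * len(fold)))
    return sum(scores) / len(scores)
```

Each fold is held out once, the model is fit on the rest, and the per-fold RMSE scores are averaged into one performance estimate.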
Fig. 2: Boosting

performance using additive learning, which learns basically through improving the previously built models. The main methodology used here is to build a model from the training data and create a second model that corrects the errors of the previously built model.

5.4.2 AdaBoost Method
AdaBoost puts more weight on the instances that are difficult to classify, rather than instances that are easily classified. AdaBoost is less susceptible to over-fitting the training data. A strong classifier can be built by combining the individual weak learners.
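The reweighting behaviour described in this section (heavier weights on hard-to-classify instances, weak learners combined into a strong classifier) can be sketched as one round of the standard AdaBoost update. This is an illustrative sketch of that update, not the paper's implementation.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost reweighting step.

    weights: current instance weights (must sum to 1)
    correct: whether the weak learner classified each instance correctly
    Returns (new_weights, alpha), where alpha is the learner's vote weight.
    Misclassified instances become heavier, correct ones lighter.
    """
    err = sum(w for w, c in zip(weights, correct) if not c)
    alpha = 0.5 * math.log((1 - err) / max(err, 1e-12))
    new = [w * math.exp(-alpha if c else alpha) for w, c in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha
```

After normalization the misclassified instances always carry exactly half of the total weight, which forces the next weak learner to focus on them; the final strong classifier is the alpha-weighted vote of all rounds.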
better performance than the previously implemented AdaBoost and Gradient Boosting.

7. CONCLUSION
In a generalized manner, this paper has shown that prediction of delays in commercial flights is tractable and that local weather data at the origin airport is indeed essential for the prediction of delays.
In the case of flight delays or cancellation, the most significant real-world factors are a combination of technical and logistical issues. The datasets considered in the paper do not provide this aspect of the data; thus the accuracy of the model is restrained by this limitation.

REFERENCES
[1] Belcastro, Loris, et al. "Using Scalable Data Mining for Predicting Flight Delays." ACM Transactions on Intelligent Systems and Technology (TIST) 8.1 (2016).
[2] Khanmohammadi, Sina, Salih Tutun, and Yunus Kucuk. "A New Multilevel Input Layer Artificial Neural Network for Predicting Flight Delays at JFK Airport." Procedia Computer Science 95 (2016): 237-244.
[3] Hensman, James, Nicolo Fusi, and Neil D. Lawrence. "Gaussian processes for big data." CoRR, arXiv:1309.6835 (2013).
[4] Bandyopadhyay, Raj, and Guerrero, Rafael. "Predicting airline delays." CS229 Final Projects (2012).
[5] Gilbo, Eugene P. "Airport capacity: Representation, estimation, optimization." IEEE Transactions on Control Systems Technology 1.3 (1993): 144-154.
[6] Tierney, Sean, and Michael Kuby. "Airline and airport choice by passengers in multi airport regions: The effect of Southwest Airlines." The Professional Geographer 60.1 (2008): 15-32.
[7] Schaefer, Lisa, and David Millner. "Flight delay propagation analysis with the detailed policy assessment tool." Systems, Man, and Cybernetics, 2001 IEEE International Conference on. Vol. 2. IEEE, 2001.
[8] Guy, Ann Brody. "Flight delays cost $32.9 billion." http://news.berkeley.edu/2010/10/18/flight_delays.
[9] "Airlines' 40% Failure Rate: 850,000 Passengers Will Arrive Late Today -- And Every Day," https://www.forbes.com/sites/danielreed/2015/07/06/airlines-40-failure-rate-850000-passengers-will-arrive-late-today-and-every-day/#2d077c1074bd
[10] Hansen, Mark, and Chieh Hsiao. "Going south?: Econometric analysis of US airline flight delays from 2000 to 2004." Transportation Research Record: Journal of the Transportation Research Board 1915 (2005): 85-94.
[11] Robert E. Schapire. "Explaining AdaBoost." Princeton University, Dept. of Computer Science, 35 Olden Street, Princeton, NJ 08540 USA, e-mail: schapire@cs.princeton.edu.
[12] Jerome H. Friedman. "Stochastic gradient boosting." Department of Statistics and Stanford Linear Accelerator Center, Stanford University, Stanford, CA 94305, USA.
[13] Suvojit Manna, Sanket Biswas. "A Statistical approach to predict Flight Delay using Gradient Boosted Decision Tree."
heart, also known as Heart Attack. A blockage can develop due to a buildup of plaque, a substance mostly made of fat, cholesterol and cellular waste products. Due to an insufficient blood supply, some of the heart muscles begin to die. Without early medical treatment this damage can be permanent.

Motivation
The medical sector is rich with information, but the major issues with medical data mining are the data's volume and complexity, poor mathematical categorization and canonical form. We have used advanced data mining techniques to discover knowledge from the collected medical datasets. Reducing the delay between the onset of a heart attack and seeking treatment is a major issue. Individuals who are busy in their homes or offices with their regular work, and rural people with no knowledge of the symptoms of heart attack, may neglect chest discomfort. They may not intend to neglect it, but they may pass the time and decide to go to a doctor or hospital after a while. But for a heart attack, time matters most. There are many Mobile Health (mHealth) tools available to the consumer for the prevention of CVD, such as self-monitoring mobile apps. Current science shows the evidence on the use of the vast array of mobile devices, such as the use of mobile phones for communication and feedback, and smartphone apps. As medical diagnosis of heart attack is an important but complicated and costly task, we propose a system for medical diagnosis that would enhance medical care and reduce cost. Our aim is to provide a ubiquitous service that is feasible and sustainable, and which also enables people to assess their risk of heart attack at that point of time or later.

Problem Statement
Reliable identification and classification of cardiovascular diseases requires pathological tests, namely blood tests and ECG, and analysis by experienced pathologists. As it involves human judgment of several factors and a combination of experiences, a decision support system is desirable in this case. The proposed problem statement is "Risk Assessment in Heart Attack using machine learning".

Acute myocardial infarction, commonly referred to as Heart Attack, is the most common cause of sudden deaths in city and village areas. It is one of the most dangerous diseases among men and women, and early identification and treatment is the best available option for the people.

2. RELATED WORK
Nearest neighbor (KNN) is a very simple, popular, highly efficient and effective technique for pattern recognition. KNN is a straightforward classifier, where samples are classified based on the class of their nearest neighbor. Medical databases are big in volume. If the data set contains excessive and irrelevant attributes, classification may produce less accurate results. Heart disease is the leading cause of death in India. In Andhra Pradesh heart disease was the leading cause of mortality, accounting for 32% of all deaths, a rate as high as Canada (35%) and the USA. Hence there is a need to define a decision support system that helps clinicians take precautionary steps. This work proposed a new technique which combines KNN with a genetic technique for effective classification. The genetic technique performs a global search in complex, large and multimodal landscapes and provides an optimal solution [1].

This work focuses on a new approach for applying association rules in the medical domain to discover Heart Disease Prediction. The health care industry collects a huge amount of health care data which, unfortunately, is not mined to discover hidden information for effective decision making. Discovery of hidden
ISSN:0975-887 Department of Computer Engineering, SKNCOE,Vadgaon(Bk),Pune. Page 110
Proceeding of International Conference on Internet of Things,Next Generation Network & Cloud Computing 2019
patterns and relationships often goes unexploited. Data mining techniques can help remedy this situation. Data mining has found numerous applications in business and scientific domains. Association rules, classification and clustering are major areas of interest in data mining [2].

This work has analyzed prediction systems for heart disease using a larger number of input attributes. The work uses medical attributes such as sex, blood pressure and cholesterol, 13 attributes in all, to predict the likelihood of a patient getting a heart disease. Until now, 13 attributes were used for prediction. This research work added two more attributes, i.e. obesity and smoking. The data mining classification algorithms, namely Decision Trees, Naive Bayes, and Neural Networks, are analyzed on the heart disease database [3].

Medical Diagnosis Systems play an important role in medical practice and are used by medical practitioners for diagnosis and treatment. In this work, a medical diagnosis system is defined for predicting the risk of cardiovascular disease. This system is built by combining the relative advantages of the genetic technique and a neural network. Multilayered feed-forward neural networks are particularly adapted to complex classification problems. The weights of the neural network are determined using the genetic technique because it finds an acceptably good set of weights in a small number of iterations [4].

A wide range of heart conditions is defined by thorough examination of the features of the ECG report. Automatic extraction of time-plane features is valuable for identification of vital cardiac diseases. This work presents a multi-resolution wavelet transform based system for detection of the 'P', 'Q', 'R', 'S', 'T' peak complex from the original ECG signal. The 'R-R' time lapse is an important minutia of the ECG signal that corresponds to the heartbeat of the related person. An abrupt increase in the height of the 'R' wave or changes in the measurement of the 'R-R' interval denote various anomalies of the human heart. Similarly the 'P-P', 'Q-Q', 'S-S' and 'T-T' intervals also correspond to various anomalies of the heart, and their peak amplitudes also indicate other cardiac diseases. In this proposed method the 'PQRST' peaks are marked and stored over the entire signal, and the time interval between two consecutive 'R' peaks, along with the other peak intervals, is measured to find anomalies in the behavior of the heart, if any [5].

The ECG signal is well known for its nonlinear changing behavior, and a key characteristic utilized in this research is that the nonlinear component of its dynamics changes more between normal and abnormal conditions than does the linear one. As higher-order statistics (HOS) maintain phase information, this work makes use of one-dimensional slices from the higher-order spectral region of normal and ischemic subjects. A feed-forward multilayer neural network (NN) with error back propagation (BP) learning was used as an automated ECG classifier to find the possibility of recognizing ischemic heart disease from normal ECG signals [6].

Automatic ECG classification is a promising tool for cardiologists in medical diagnosis for effective treatments. This work proposes efficient techniques to automatically classify ECG signals into normal and arrhythmia-affected (abnormal) parts. For these categories, morphological features are extracted to illustrate the ECG signal. A probabilistic neural network (PNN) is the modeling technique used to capture the distribution of the feature vectors for classification, and the performance is calculated. The ECG time series signals in this work are taken from the MIT-BIH arrhythmia database [7].

Heart diseases are the most extensive cause of human death. Every
attributes to predict the actual heart education programs will decline in the
disease. heart disease mortality.
5. ALGORITHM
The Naive Bayes algorithm learns the probability that an object with certain features belongs to a particular group or class. In short, it is a probabilistic classifier.

The Naive Bayes algorithm is called "naive" because it makes the assumption that the occurrence of a certain feature is independent of the occurrence of other features. Here we classify heart disease based on heart check-up attributes.

Naive Bayes, or Bayes' rule, is the basis for many machine learning and data mining methods. The rule is used to create models with predictive capabilities, and it provides new ways of exploring and understanding data. A Naive Bayes implementation is preferable:
1) when the volume of data is high;
2) when the attributes are independent of each other;
3) when more efficient output is expected compared to other methods.
Based on this information and these steps, we classify and predict heart disease from the heart check-up attributes.

6. CONCLUSION
In this work we have presented a novel approach for classifying heart disease. To validate the proposed method, we will add the patient's heart test result details and predict the type of heart disease using machine learning. The training data sets are taken from the UCI repository. Our approach uses the Naive Bayes technique, which is a competitive method for classification. This prediction model helps doctors carry out an efficient heart disease diagnosis process with fewer attributes. Heart disease is the most common contributor to mortality in India and in Andhra Pradesh. Identification of major risk factors, development of decision support systems, and effective control measures and health education programs will decline the heart disease mortality.

REFERENCES
[1] M. Akhil Jabbar, B. L. Deekshatulu, Priti Chandra, "Classification of Heart Disease Using K-Nearest Neighbor and Genetic Algorithm", International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA), 2013.
[2] M. A. Jabbar, B. L. Deekshatulu, Priti Chandra, "An evolutionary algorithm for heart disease prediction", CCIS, pp. 378-389, Springer, 2012.
[3] Chaitrali S. Dangare, "Improved Study of Heart Disease Prediction System Using Data Mining Classification Techniques", International Journal of Computer Applications, Vol. 47, No. 10, June 2012.
[4] N. G. B. Amma, "Cardio Vascular Disease Prediction System using Genetic Algorithm", IEEE International Conference on Computing, Communication and Applications, 2012.
[5] Sayantan Mukhopadhyay, Shouvik Biswas, Anamitra Bardhan Roy, Nilanjan Dey, "Wavelet Based QRS Complex Detection of ECG Signal", International Journal of Engineering Research and Applications (IJERA), Vol. 2, Issue 3, May-Jun 2012, pp. 2361-2365.
[6] Sahar H. El-Khafif and Mohamed A. El-Brawany, "Artificial Neural Network-Based Automated ECG Signal Classifier", 29 May 2013.
[7] M. Vijayavanan, V. Rathikarani, P. Dhanalakshmi, "Automatic Classification of ECG Signal for Heart Disease Diagnosis using morphological features", ISSN 2229-3345, Vol. 5, No. 04, Apr 2014.
[8] I. S. Siva Rao, T. Srinivasa Rao, "Performance Identification of Different Heart Diseases Based On Neural Network Classification", ISSN 0973-4562, Vol. 11, No. 6, 2016, pp. 3859-3864.
[9] J. R. Quinlan, "Induction of decision trees", Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[10] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, Elsevier, 2011.
[11] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2016.
[12] L. Breiman, "Random forests", Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[13] A. S. Mullasari, P. Balaji, T. Khando, "Managing complications in acute myocardial infarction", J Assoc Physicians India, Dec 2011, 59 Suppl(1), pp. 43-48.
[14] C. Alexander and L. Wang, "Big data analytics in heart attack prediction", J Nurs Care, vol. 6, no. 393, 2017.
[15] J. W. Wallis, "Use of artificial intelligence in cardiac imaging", J Nucl Med, Aug 2001, 42(8), pp. 1192-1194.
inhumane, which led to severe post-traumatic stress disorder. Thus, manual moderation of abusive content is harmful for the person moderating it. Therefore there is a need for an efficient technique to monitor hate speech and offensive words on social networking sites.

2. LITERATURE SURVEY
In [7], the paper covers moderation of multimodal subtleties such as images and text. The authors develop a deep learning classifier that jointly models textual and visual characteristics of pro-eating disorder content that violates community guidelines. For analysis, they used a million photo posts from Tumblr. The classifier discovers deviant content efficiently while also maintaining high recall (85%). They also discuss how automation might impact community moderation and the ethical and social obligations of this area.

In [8], the proposed system is designed for the open source operating systems Windows and Linux. The implementation of the system is based on a PHP framework. A MySQL database is used for storing the datasets, by configuring the LAMP server in Ubuntu and the WAMP server in Windows, together with phpMyAdmin. Ubuntu helps to perform various tasks such as creating, modifying or deleting databases through a web browser. Dreamweaver is used for the system development, and the latest version of Apache is used for recommendation generation. To make the web environment scalable it is integrated with PHP and WAMP; initially, for testing purposes, a phase-one deployment is established on localhost.

In [9], various data processing techniques are applied, such as term weighting and dimensionality reduction. All these techniques were studied in order to build algorithms able to mimic human decisions regarding the comments. The results indicate the ability to mimic expert decisions on 96.78% of the data set used. The classifiers used for comparison of the results were K-Nearest Neighbors and the Covalent Bond Classification. For dimensionality reduction, term extraction techniques were also used to best characterize the categories within the data set.

As SNSs have become of paramount relevance nowadays, many people refuse to participate in or join them because of how easy it is to publish and spread content that might be considered offensive. In [4], the approach accurately identifies inappropriate content based on accusers' reputations, analyzing the reporting systems used to assess content as harmless or offensive in SNSs.

3. GAP ANALYSIS
Not all the data generated on SNSs can be considered normal; a considerable amount of it can be considered offensive and hateful. Manual content moderation is effective but requires a considerable amount of manpower, and it can be traumatic for humans to examine such inappropriate content. Hence, in recent days some organizations have come up with effective techniques which can automate this process.
Tweets contain unnecessary data such as stop words, emojis and usernames. This kind of data does not contribute much to classification, and hence we need to filter it out and normalize the rest into a suitable format so that it can be used for training the classifier to classify unknown text data. An individual tweet is taken and is then tokenized into words. These tokens are then used to detect unnecessary data such as emojis and usernames. Furthermore, unnecessary symbols and stopwords are removed in order to reduce the data volume.

The main task is to normalize the data. Hence the aim is to infer a grammar-independent representation of a given tweet. Lemmatization is used to find the lemma of each token. After this, all the filtered tokens for one tweet are collected together for further processing.

The vectorization algorithm used in the proposed model is TF-IDF vectorization. The reason to choose this particular technique is that the dataset used for the experimentation contains a large number of tweets with offensive words, which dominate the small number of regular tweets. As TF-IDF assigns a score depending upon the occurrence of a term in a document, it seems to be the best choice.

The classifier model is then trained on a collection of pairs containing vectorized tweets and whether or not they are offensive. Supervised classification is used: the proposed system learns from these tweets and can then classify a new tweet. After training, when a new tweet is given to the model, it repeats all the above steps.
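The preprocessing pipeline described above (tokenization, removal of usernames, links, emojis and stopwords, then lemmatization) can be sketched as follows. The stopword list and the crude suffix-stripping "lemmatizer" are simplified stand-ins for a real lexical resource such as NLTK's WordNet lemmatizer:

```python
import re

STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "you"}

def lemma(token):
    # crude normalization stand-in: lowercase and strip a plural "s"
    t = token.lower()
    return t[:-1] if t.endswith("s") and len(t) > 3 else t

def preprocess(tweet):
    tweet = re.sub(r"@\w+", " ", tweet)        # drop usernames
    tweet = re.sub(r"http\S+", " ", tweet)     # drop links
    tokens = re.findall(r"[A-Za-z]+", tweet)   # keep words; emoji/symbols fall away
    return [lemma(t) for t in tokens if t.lower() not in STOPWORDS]

print(preprocess("@user You are the worst trolls!!! \U0001F621 http://t.co/x"))
# prints ['worst', 'troll']
```

The output token list is what gets handed to the vectorizer in the next step.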
5. MATHEMATICAL MODEL
The proposed model can be represented mathematically as follows. Here, we used two classifier models (Bernoulli Naive Bayes and Bagged SVM) for performance comparison.

The term frequency-inverse document frequency (TF-IDF) of the words in a given corpus is calculated by

tfidf(t, a, D) = tf(t, a) x log( |D| / |{a in D : t in a}| )    ...(1)

where
t - a term;
a - an individual document;
D - the collection of documents;
tf - the term frequency, i.e. the number of times a word appears in each document.

Using (1), all tweets are vectorized.

1.) Naive Bayes - the predicted class is the one that maximizes the posterior probability:

c* = argmax over classes c of P(c) x product over terms t of P(t | c)

2.) Bagged Support Vector Machines - as given in [12], Support Vector Machines can be bagged by training a sequence of base classifiers on bootstrap samples of the training set and combining their predictions by majority vote, where

H_m - the sequence of classifiers;
m - 1, ..., M;
M - the number of classifiers in the bagging ensemble;
alpha - the learning parameter of the base classifiers.
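Equation (1) can be checked on a toy corpus. A minimal plain-Python sketch (the example documents are made up; a raw term count is used for tf):

```python
import math

def tfidf(term, doc, corpus):
    """tf(t, a) * log(|D| / df(t)), with df = documents containing the term."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    return tf * math.log(len(corpus) / df) if df else 0.0

corpus = [
    ["you", "are", "offensive"],
    ["have", "a", "nice", "day"],
    ["offensive", "offensive", "words"],
]
print(tfidf("offensive", corpus[2], corpus))  # 2 * log(3/2)
```

Terms concentrated in few documents score high, which is why offensive words that dominate a subset of tweets stand out after vectorization.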
Figure 2: Bar chart comparing the two models on different metrics

From Figure 2, it can be inferred that both models yield almost the same accuracy, but on the other metrics Bagged SVM performs better than Bernoulli Naive Bayes.

7. FUTURE WORK
Traditionally, content moderation is done manually. This manual work can be reduced using the proposed system. Currently, the proposed system is for textual data, but in the future it can be extended to images, videos, and audio. Further, a model with higher efficiency can be used to classify text data more effectively. Additionally, an algorithm to find out what exactly is wrong with the content can also be designed. Manual moderators will be less exposed to hate speech and offensive content if such models are implemented at large scale.

8. CONCLUSION
This system mainly focuses on categorizing text data into two categories, namely offensive and normal. This will help content moderators by leaving them less offensive data to review. The content moderation process will be automated by the use of a machine learning technique.

REFERENCES
[1] Facebook, https://www.facebook.com/ [Access Date: 19 Dec 2018].
[2] Twitter, https://twitter.com/ [Access Date: 19 Dec 2018].
[3] LinkedIn, https://in.linkedin.com/ [Access Date: 19 Dec 2018].
[4] Marcos Rodrigues Saúde, Marcelo de Medeiros Soares, Henrique Gomes Basoni, Patrick Marques Ciarelli, Elias Oliveira, "A Strategy for Automatic Moderation of a Large Data Set of Users Comments", 2014 XL Latin American Computing Conference (CLEI), September 2014.
[5] "Facebook's 7,500 Moderators Protect You From the Internet's Most Horrifying Content. But Who's Protecting Them?", https://www.inc.com/christine-lagorio/facebook-content-moderator-lawsuit.html [Access Date: 19 Dec 2018].
[6] "Moderators who had to view child abuse content sue Microsoft, claiming PTSD", https://www.theguardian.com/technology/2017/jan/11/microsoft-employees-child-abuse-lawsuit-ptsd [Access Date: 19 Dec 2018].
[7] Stevie Chancellor, Yannis Kalantidis, Jessica A. Pater, Munmun De Choudhury, David A. Shamma, "Multimodal Classification of Moderated Online Pro-Eating Disorder Content", Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, ACM, May 2017, pp. 3213-3226.
[8] Sanafarin Mulla, Avinash Palave, "Moderation Technique For Sexually Explicit Content", 2016 International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT), International Institute of Information Technology (I2IT), Pune, September 2016.
[9] Félix Gómez Mármol, Manuel Gil Pérez, Gregorio Martínez Pérez, "Reporting Offensive Content in Social Networks: Toward a Reputation-Based Assessment Approach", IEEE Internet Computing, Vol. 18, Issue 2, Mar.-Apr. 2014.
[10] Thomas Davidson, Dana Warmsley, Michael Macy, Ingmar Weber, "Automated Hate Speech Detection and the Problem of Offensive Language", Proceedings of the 11th International AAAI Conference on Web and Social Media, 2017, pp. 512-515.
[11] Scikit-learn: A module for machine learning, https://scikit-learn.org [Access Date: 19 Dec 2018].
[12] Kristína Machová, František Barčák, Peter Bednár, "A Bagging Method Using Decision Trees in the Role of Base Classifiers", Acta Polytechnica Hungarica, Vol. 3, No. 2, 2006, pp. 121-132, ISSN 1785-8860.
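The bagging scheme behind the Bagged SVM model of Section 5 can be sketched generically: train M base classifiers H_1..H_M on bootstrap samples and combine them by majority vote. In the minimal plain-Python illustration below, a trivial "predict the value of one feature" rule stands in for an SVM, and the tiny data set is made up:

```python
import random
from collections import Counter

def train_stump(sample):
    """Pick the feature whose value agrees with the label most often."""
    n_feat = len(sample[0][0])
    return max(range(n_feat),
               key=lambda i: sum(x[i] == y for x, y in sample))

def bagged_fit(data, M=5, seed=0):
    """Train M base classifiers, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in range(len(data))])
            for _ in range(M)]

def bagged_predict(stumps, x):
    """Majority vote over the M base classifiers."""
    return Counter(x[i] for i in stumps).most_common(1)[0][0]

# made-up binary data: feature 0 carries the label, feature 1 is noise
data = [((1, 0), 1), ((1, 1), 1), ((0, 1), 0), ((0, 0), 0)]
stumps = bagged_fit(data)
print(bagged_predict(stumps, (1, 0)))   # prints 1
```

Swapping the stump for a real SVM trained on each bootstrap sample gives the bagged SVM of the paper; the vote-combining step is unchanged.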
ABSTRACT
Location recommendation plays an essential role in helping people find interesting places. Although recent research has studied how to recommend locations with social and geographical information, few works have dealt with the cold-start problem of new users. Because mobility records are often shared on social networks, semantic information can be exploited to address this challenge. The typical method is to feed it into explicit-feedback content-aware collaborative filtering, but this requires drawing negative samples for better learning performance, since negative user preference is not observable in human mobility; moreover, previous studies have demonstrated empirically that sampling-based methods do not work well. To this end, we propose a scalable Implicit-feedback based Content-aware Collaborative Filtering framework (ICCF) to incorporate semantic content and avoid negative sampling. We then develop an efficient optimization algorithm, scaling linearly with the data size and the feature size, and quadratically with the dimension of the latent space. We also establish its relationship with weighted matrix factorization. Finally, we evaluate ICCF on a large-scale LBSN data set in which users have text and content profiles. The results show that ICCF outperforms several competing baselines, and that user information is not only effective for improving recommendations but also valuable for managing cold-start scenarios.
Keywords- Content-aware, implicit feedback, location recommendation, social network, weighted matrix factorization.
1. INTRODUCTION
The title of this paper relates to recommender systems, which are part of data mining. Recommendation systems use different technologies, but they can be classified into two categories: collaborative and content-based filtering systems. Content-based systems examine the properties of items and recommend items similar to those the user has preferred in the past. They model the taste of a user by building a user profile based on the properties of the items the user likes, and use the profile to calculate the similarity with new items; we recommend locations that are more similar to the user's profile. Collaborative filtering systems, on the other hand, ignore the properties of the items and base their recommendations on community preferences: they recommend items that users with similar tastes and preferences have liked in the past. Two users are considered similar if they have many items in common. One of the main problems of recommendation systems is the problem of
cold start, i.e., when a new item or user is introduced into the system. In this study we focus on the problem of producing effective recommendations for new items: the item cold-start problem. Collaborative filtering systems suffer from this problem because they depend on previous user ratings. Content-based approaches, on the other hand, can still produce recommendations using item descriptions and are the default solution for item cold start. However, they tend to achieve lower accuracy and, in practice, are rarely the only option.

The item cold-start problem is of great practical importance for two main reasons. First, modern online platforms have hundreds of new items every day, and actively recommending them is essential to keep users continuously engaged. Second, collaborative filtering methods are at the core of most recommendation engines, since they tend to achieve state-of-the-art accuracy. However, to produce recommendations with the expected accuracy they require that items be rated by a sufficient number of users. Therefore, it is essential for any collaborative recommender to reach this state as soon as possible. Methods that produce precise recommendations for new items allow enough feedback to be collected in a short period of time, making effective collaborative recommendations possible.

In this paper, we focus on providing location recommendations through the novel scalable Implicit-feedback based Content-aware Collaborative Filtering (ICCF) framework. It avoids sampling negative locations by considering all unvisited locations as negative and proposing a low-weight configuration, with a classification, for the preference confidence model. This sparse weighting configuration not only assigns a larger amount of confidence to visited than to unvisited locations, but also includes three different weighting schemes previously developed for locations.

A. Motivation
From the introductory study of recommendation systems, their applications, the algorithms used and the different types of models, I decided to work on recommendation applications as used in e-commerce, online shopping, location recommendation and product recommendation; a lot of work has been done on such applications, and the technique used is a recommendation system based on traditional data mining algorithms. State-of-the-art approaches to generating recommendations from only positive evaluations are often based on the content-aware collaborative filtering algorithm. However, they suffer from low accuracy.

2. RELATED WORK
Shuhui Jiang, Xueming Qian, Member, IEEE, Tao Mei, Senior Member, IEEE, and Yun Fu, Senior Member, IEEE, describe Personalized Travel Sequence Recommendation on Multi-Source Big Social Media. In this paper, we proposed a personalized travel sequence recommendation system by learning a topical package model from big multi-source social media: travelogues and community-contributed photos. The advantages of our work are: 1) the system automatically mined users' and routes' travel topical preferences, including the
topical interest, cost, time and season; 2) we recommended not only POIs but also travel sequences, considering both popularity and the user's travel preferences at the same time. We mined and ranked famous routes based on the similarity between the user package and the route package [1].

Shuyao Qi, Dingming Wu, and Nikos Mamoulis describe "Location Aware Keyword Query Suggestion Based on Document Proximity". In this paper, we proposed an LKS framework providing keyword suggestions that are relevant to the user's information needs and at the same time can retrieve relevant documents near the user's location [2].

X. Liu, Y. Liu, and X. Li describe "Exploring the context of locations for personalized location recommendations". In this paper, we decouple the process of jointly learning latent representations of users and locations into two separate components: learning location latent representations using the Skip-gram model, and learning user latent representations using the C-WARP loss [3].

H. Li, R. Hong, D. Lian, Z. Wu, M. Wang, and Y. Ge describe "A relaxed ranking-based factor model for recommender system from implicit feedback". In this paper, we propose a relaxed ranking-based algorithm for item recommendation with implicit feedback, and design a smooth and scalable optimization method for estimating the model's parameters [4].

D. Lian, Y. Ge, N. J. Yuan, X. Xie, and H. Xiong describe "Sparse Bayesian collaborative filtering for implicit feedback". In this paper, we proposed a sparse Bayesian collaborative filtering algorithm best tailored to implicit feedback, and developed a scalable optimization algorithm for jointly learning latent factors and hyperparameters [5].

X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua describe "Fast matrix factorization for online recommendation with implicit feedback". We study the problem of learning MF models from implicit feedback. In contrast to previous work that applied a uniform weight on missing data, we propose to weight missing data based on the popularity of items. To address the key efficiency challenge in optimization, we develop a new learning algorithm which effectively learns parameters by performing coordinate descent with memoization [6].

F. Yuan, G. Guo, J. M. Jose, L. Chen, H. Yu, and W. Zhang describe "LambdaFM: learning optimal ranking with factorization machines using lambda surrogates". In this paper, we have presented a novel ranking predictor, Lambda Factorization Machines. Inheriting advantages from both LtR and FM, LambdaFM (i) is capable of optimizing various top-N item ranking metrics in implicit feedback settings; (ii) is very flexible in incorporating context information for context-aware recommendations [7].

Yiding Liu, TuanAnh Nguyen Pham, Gao Cong, and Quan Yuan describe "An Experimental Evaluation of Point-of-Interest Recommendation in Location-based Social Networks" (2017). In this paper, we provide an all-around evaluation of 12 state-of-the-art POI recommendation models. From the evaluation, we obtain several important findings, based on which we can better understand and utilize POI recommendation models [8].
negatively. Only the preferred samples are implicitly provided, in a positive way, in the feedback data, while it is not practical to treat all unvisited locations as negative when feeding the mobility data, together with user and location information, into these explicit-feedback frameworks, which require drawing pseudo-negative samples from unvisited places. The sampling, and the lack of different levels of confidence, do not allow them to achieve comparable top-k recommendation.

5. System Architecture:
Fig. System Architecture

6. CONCLUSION
In this paper, we propose the ICCF framework for content-aware collaborative filtering on implicit feedback data sets, and develop a coordinate descent algorithm for efficient parameter learning. We establish the close relationship of ICCF with matrix factorization and show that user features really improve mobility similarity between users. We then apply ICCF to location recommendation on a large-scale LBSN data set. The results of our experiments indicate that ICCF outperforms five competing baselines, including two leading ranking-based location recommendation and factorization algorithms. When comparing different weighting schemes for the negative preference of unvisited places, we observe that the user-oriented scheme is superior to the item-oriented scheme, and that the sparse, rank-one configuration significantly improves the performance of the recommendation.

REFERENCES
[1] Shuhui Jiang, Xueming Qian, Tao Mei, Yun Fu, "Personalized Travel Sequence Recommendation on Multi-Source Big Social Media", IEEE Transactions on Big Data, Vol. X, No. X.
[2] Shuyao Qi, Dingming Wu, Nikos Mamoulis, "Location Aware Keyword Query Suggestion Based on Document Proximity", Vol. 28, No. 1, January 2016.
[3] X. Liu, Y. Liu, and X. Li, "Exploring the context of locations for personalized location recommendations", in Proceedings of IJCAI'16, AAAI, 2016.
[4] H. Li, R. Hong, D. Lian, Z. Wu, M. Wang, and Y. Ge, "A relaxed ranking-based factor model for recommender system from implicit feedback", in Proceedings of IJCAI'16, 2016, pp. 1683-1689.
[5] D. Lian, Y. Ge, N. J. Yuan, X. Xie, and H. Xiong, "Sparse Bayesian collaborative filtering for implicit feedback", in Proceedings of IJCAI'16, AAAI, 2016.
[6] X. He, H. Zhang, M.-Y. Kan, and T.-S. Chua, "Fast matrix factorization for online recommendation with implicit feedback", in Proceedings of SIGIR'16, 2016.
[7] F. Yuan, G. Guo, J. M. Jose, L. Chen, H. Yu, and W. Zhang, "LambdaFM: learning optimal ranking with factorization machines using lambda surrogates", in Proceedings of the 25th ACM International Conference on Information and Knowledge Management, ACM, 2016, pp. 227-236.
[8] Yiding Liu, TuanAnh Nguyen Pham, Gao Cong, Quan Yuan, "An Experimental Evaluation of Point-of-Interest Recommendation in Location-based Social Networks", 2017.
[9] Salman Salamatian, Amy Zhang, Flavio du Pin Calmon, Sandilya Bhamidipati, Nadia Fawaz, Branislav Kveton, Pedro Oliveira, Nina Taft, "Managing your Private and Public Data: Bringing down Inference Attacks against your Privacy", 2015.
[10] Zhiwen Yu, Huang Xu, Zhe Yang, Bin Guo, "Personalized Travel Package With Multi-Point-of-Interest Recommendation Based on Crowdsourced User Footprints", 2016.
ABSTRACT
In this proposed system, a genetic algorithm is applied to an automatic schedule generation system to generate a course timetable that best suits student and teacher needs. Preparing a schedule that satisfies different constraints is a very difficult task for colleges and institutes, and the conventional process of scheduling is a very basic way of generating a schedule for any educational organization. This study develops a practical system for schedule generation by taking complicated constraints into consideration to avoid conflicts in the schedule; conflicts are the problems that arise after the allocation of time slots.
Keywords
Genetic Algorithm (GA), Constraints, Chromosomes, Genetic Operators.
1. INTRODUCTION
Preparing a timetable is a most complicated and conflict-prone process. The traditional way of generating a timetable still produces error-prone output, even when it is prepared repeatedly to get a suitable result. The aim of our application is to make the process simple, easily understandable and efficient, with lower time requirements; therefore there is a great need for this kind of application in educational institutes.

Timetable generation has long been a human requirement, and it is most widely needed in educational institutes such as schools and colleges, where we need planning of courses, subjects and hours. In earlier days timetable scheduling was a manual process in which one person or a group of people created the timetable by hand, which takes more effort and still gives inappropriate output.

The course scheduling problem can be specified as a constraint satisfaction problem (CSP). Constraints in the scheduling process can be categorized into two kinds: hardware constraints and software constraints. Common hardware constraints include: [1] each time slot should be scheduled at a specified time; [2] each teacher or student can be allocated only one classroom at a time; [3] all students must fit into the particular allocated classroom. Some of the software constraints include: [1] neither faculty nor students should have unconnected timeslots in the timetable; [2] classrooms have limited capacity.

2. ALGORITHM
Step 1: Partition the training set Tr into m subsets through random sampling;

Step 2: Apply the decision tree algorithm to each subset S1, ..., Sm;

Step 3: Apply each induced tree from Step 2 (Tree1, Tree2, ..., Treem) to the test set Te;

Step 4: Use the fitness function to evaluate the performance of all trees, and rank the trees with their related subsets according to the trees' performance;

Step 5: Perform GA operations:
Selection: select the top (1 - c)m subsets and keep them intact for the next operation;
Crossover: for the remaining cm/2 pairs, perform two-point crossover;

Mutation: randomly select mu subsets to perform the mutation operation: randomly replace one instance in the selected subset by one instance randomly selected from the original training data set.

Step 6: The subsets created in Step 5 form the next generation; repeat Step 2 to Step 6 until a subset and a related tree with ideal performance are identified.

1. Input data:
The first step in the functioning of a GA is the generation of initial input data; each individual is evaluated and assigned a fitness value according to the fitness function.

2. Selection:
This operator selects chromosomes from the population for reproduction. The fitter the chromosome, the more times it is likely to be selected to reproduce.

3. Crossover:
This genetic operator is used to vary the coding of a chromosome from one generation to the next. The crossover process takes one or more parent solutions and derives child solutions from them.

4. Mutation:
In mutation, a solution may change from the previous solution: mutation is the process in which the data can be interchanged to reach the best solution. When the given solution is not reliable or conflicts are present, the mutation and crossover techniques are very important; they decide which result is best for the given input data.

5. Fitness Function:
The fitness function is used to find the quality of a represented solution. This function is problem dependent. In the field of genetic algorithms a design solution is represented as a string referred to as a chromosome. In each phase of testing, the 'n' worst results or conditions are deleted and 'n' new ones are created from the best design solutions, and the final result is obtained from those solutions.

3. PROPOSED SYSTEM
The proposed system is based on a customer-centric strategy for designing the scheduling system. First, a data mining algorithm is designed for mining student preferences in selecting different courses from historical data. Then, based on the selection patterns obtained from the mining algorithm, the scheduling is designed, which leads to an integrative, automatic course scheduling system. This system helps to increase student satisfaction with the course scheduling result.

The proposed system adopts the user's perspective and applies different types of techniques to automatic scheduling, also considering teacher preferences and student needs in their schedule, so that the final output fulfills the expectations of each and every user. The algorithm is used for exchanging the courses that are given to the system as input, so as to find an optimal solution to the timetabling problem.

4. SYSTEM ARCHITECTURE
Input data:
1. Courses
2. Labs
3. Lectures
4. Sems
5. Students
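The GA operators described above (selection, two-point crossover, mutation, fitness) can be sketched on a toy timetabling instance. The lectures, teachers and slot count below are made-up examples; fitness is taken to be the number of hard-constraint violations (a teacher allocated twice in one slot):

```python
import random

LECTURES = [("Math", "T1"), ("Physics", "T1"), ("DBMS", "T2"),
            ("Networks", "T2"), ("AI", "T3"), ("ML", "T3")]
SLOTS = 3
rng = random.Random(1)

def conflicts(chrom):
    """Count hard-constraint violations: a teacher in two lectures at one time."""
    seen, bad = set(), 0
    for (course, teacher), slot in zip(LECTURES, chrom):
        if (teacher, slot) in seen:
            bad += 1
        seen.add((teacher, slot))
    return bad

def crossover(a, b):
    i, j = sorted(rng.sample(range(len(a)), 2))   # two-point crossover
    return a[:i] + b[i:j] + a[j:]

def mutate(chrom):
    c = list(chrom)
    c[rng.randrange(len(c))] = rng.randrange(SLOTS)
    return c

# chromosome = one timeslot per lecture; evolve until no conflicts remain
pop = [[rng.randrange(SLOTS) for _ in LECTURES] for _ in range(30)]
for _ in range(100):
    pop.sort(key=conflicts)
    if conflicts(pop[0]) == 0:
        break
    elite = pop[:10]                              # selection: keep the fittest
    children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                for _ in range(20)]
    pop = elite + children
print(conflicts(pop[0]))   # expected 0: a clash-free assignment was found
```

Real constraints (room capacity, unconnected timeslots) would simply add further penalty terms to the fitness function.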
ABSTRACT
Redundant and irrelevant features in data have caused a long-standing problem in network traffic classification. Lately, one of the fundamental focuses within Network Intrusion Detection System (NIDS) research has been the use of machine learning and shallow learning strategies. This paper proposes a novel deep learning model to enable NIDS operation within modern networks. The model is a combination of deep and shallow learning, capable of accurately analyzing a wide range of network traffic. The approach proposes a Non-symmetric Deep Auto-Encoder (NDAE) for unsupervised feature learning, and furthermore proposes a novel deep learning classification model constructed using stacked NDAEs. Our proposed classifier has been implemented in GPU-enabled TensorFlow and evaluated using the benchmark KDD Cup '99 and NSL-KDD network intrusion detection datasets. However, to cover the limitations of the KDD dataset, a WSN trace dataset has also been used in the proposed system. The contribution of this work is to implement an intrusion prevention system (IPS), which contains IDS functionality but is a more complex system capable of taking quick action in order to prevent or diminish malicious behavior.
General Terms
Non-symmetric Deep Auto-Encoder, Restricted Boltzmann Machine, Deep Belief Network.
Keywords
Deep learning, Anomaly detection, Autoencoders, KDD, Network security
analysis of network data and anomalies. In this paper, we propose a new deep learning model for NIDPS, for faster identification of anomalies in modern networks.

2. MOTIVATION
A new NDAE technique for unsupervised feature learning which, unlike typical autoencoder approaches, provides non-symmetric data dimensionality reduction. Hence, our technique is able to yield improved classification results when compared with leading methods such as Deep Belief Networks (DBNs).
A novel classifier model that utilizes stacked NDAEs and the RF classification algorithm, combining deep and shallow learning techniques to exploit their strengths and decrease analytical overheads. We should be able to obtain better results than similar research, while significantly reducing the training time.

3. REVIEW OF LITERATURE
The paper [1] focuses on deep learning methods, which are inspired by the structural depth of the human brain and learn from lower-level features up to higher-level concepts. It is because of this abstraction over multiple levels that the Deep Belief Network (DBN) can learn functions mapping input to output. The learning process does not depend on human-crafted features. The DBN uses an unsupervised learning algorithm, a Restricted Boltzmann Machine (RBM), for each layer. Advantages: deep coding can adapt to changing contexts in the data, which ensures that the technique conducts exhaustive data analysis; it detects abnormalities in the system, covering anomaly detection and traffic identification. Disadvantages: the demand for faster and more efficient data assessment.

The main purpose of paper [2] is to review and summarize the work on deep learning for machine health monitoring. The applications of deep learning in machine health monitoring systems are reviewed mainly from the following aspects: the Auto-encoder (AE) and its variants; Restricted Boltzmann Machines and their variants, including the Deep Belief Network (DBN) and Deep Boltzmann Machines (DBM); Convolutional Neural Networks (CNN); and Recurrent Neural Networks (RNN). Advantages: DL-based MHMS do not require extensive human labor and expert knowledge, and the applications of deep learning models are not restricted to specific kinds of machines. Disadvantages: the performance of DL-based MHMS depends heavily on the scale and quality of the datasets.

Paper [3] proposes the use of a stacked denoising autoencoder (SdA), a deep learning algorithm, to establish an FDC model for simultaneous feature extraction and classification. The SdA model [3] can identify global and invariant features in the sensor signals for fault monitoring and is robust against measurement noise. An SdA consists of denoising autoencoders stacked layer by layer. This multilayered architecture is capable of learning global features from complex input data, such as multivariate time-series datasets and high-resolution images. Advantages: the SdA model is useful in real applications, and it can effectively learn normal and fault-related features from sensor signals without preprocessing. Disadvantages: a trained SdA still needs to be investigated to identify the process parameters that most significantly impact the classification results.

Paper [4] proposes a novel deep learning-based recurrent neural network (RNN) model for the automatic security audit of short messages from prisons, which can classify short messages (secure and insecure). In this paper, the feature
The 12 attributes of the WSN dataset are: Event, Time, from_node, to_node, hopcount, packet_size, protocol_used, port_number, transmission_rate_kbps, received_rate_kbps, drop_rate_kbps, and Class.

Fig. 1 Proposed System Architecture

Fig. 1 shows the proposed system architecture of the Network Intrusion Detection and Prevention System (NIDPS). The input traffic data uses the WSN dataset with 12 features. The training dataset undergoes data preprocessing, which includes two steps: data transformation and data normalization. Next, two NDAEs arranged in a stack are used to select the number of features. After that, the Random Forest classifier is applied for attack detection. Intrusion Prevention Systems (IPS) contain IDS functionality but are more sophisticated systems, capable of taking immediate action in order to prevent or reduce the malicious behavior.

Advantages:
Due to the deep learning technique, it improves the accuracy of the intrusion detection system.
The network or computer is constantly monitored for any invasion or attack.
The system can be modified and changed according to the needs of a specific client, and can help against outside as well as inner threats to the system and network.
It effectively prevents any damage to the network.
It provides a user-friendly interface which allows easy security management.
Any alterations to files and directories on the system can be easily detected and reported.

6. ALGORITHM
A Deep Belief Network (DBN) [11] is a complex type of generative neural network
that uses an unsupervised machine learning model to produce results. This kind of network reflects some of the recent work on using largely unlabeled data to build unsupervised models. Some researchers describe the Deep Belief Network as a set of Restricted Boltzmann Machines (RBMs) stacked on top of one another. In general, deep belief networks are composed of multiple smaller unsupervised neural networks. One of the common properties of a DBN is that, although the layers have connections between them, the network does not include connections between units within a single layer. It uses a stacked Restricted Boltzmann Machine, which has two layers, called the hidden layer and the visible layer.

The rule status monitoring algorithm is used to recognize and detect the attack. We define a rule set as a file consisting of a set (or category) of rules that share a common set of characteristics. Our goal is to develop an algorithm that monitors the collection of rule sets so as to identify the state of each rule in each rule set, in terms of whether it is enabled or disabled, and to build useful statistics based on these findings. The algorithm should also provide periodic updates of this information. This may be accomplished by running it as a daemon with an appropriately selected polling period.

7. Mathematical Model
7.1. Preprocessing:
In this step, the training data source (T) is normalized to be ready for processing, using the following steps:

x'_ij = (x_ij − μ_j) / σ_j    (1)

The test dataset is normalized using the same values, as follows:

ts'_ij = (ts_ij − μ_j) / σ_j    (2)

2. Feature Selection:
The NDAE is an auto-encoder featuring non-symmetrical multiple hidden layers. The proposed NDAE takes an input vector x and maps it step-by-step to the latent representations h_1, ..., h_n (here d represents the dimension of the input vector x ∈ R^d), using the deterministic function shown in (3) below:

h_i = s(W_i · h_(i−1) + b_i),  i = 1, ..., n,  with h_0 = x    (3)

Here, s is an activation function (in this work we use the sigmoid function s(t) = 1/(1 + e^(−t))) and n is the number of hidden layers. Unlike a conventional auto-encoder and deep auto-encoder, the proposed NDAE does not contain a decoder, and its output vector y is calculated from the latent representation by a formula similar to (3), as in (4):

y = s(W_(n+1) · h_n + b_(n+1))    (4)

The estimator θ = (W_i, b_i) of the model can be obtained by minimizing the squared reconstruction error over the m training samples, as shown in (5):

θ̂ = arg min_θ Σ_(i=1..m) ‖x_i − y_i‖²    (5)
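Equations (1)-(5) can be illustrated with a short, self-contained sketch (pure Python; the two-attribute toy data, single identity-weight layer, and the zero-variance guard are illustrative assumptions, not the paper's configuration):

```python
import math

def normalize(train, test):
    """Z-score each attribute with the training mean/std; the same
    mu and sigma are reused for the test set (Eqs. 1-2)."""
    n = len(train[0])
    mu = [sum(row[j] for row in train) / len(train) for j in range(n)]
    sigma = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in train) / len(train)) or 1.0
             for j in range(n)]  # guard: constant attribute -> divide by 1
    def scale(data):
        return [[(row[j] - mu[j]) / sigma[j] for j in range(n)] for row in data]
    return scale(train), scale(test)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def ndae_forward(x, layers):
    """Eq. (3): h_i = s(W_i h_{i-1} + b_i); there is no decoder, so the
    final layer's output (Eq. 4) doubles as the reconstruction."""
    h = x
    for W, b in layers:
        h = [sigmoid(sum(w * v for w, v in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
    return h

def reconstruction_error(xs, ys):
    """Eq. (5): squared reconstruction error summed over m samples."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, y)) for x, y in zip(xs, ys))
```

In the full system, the features learned by two stacked NDAEs would then be handed to the Random Forest classifier for attack detection.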
Here, T consists of m samples with n column attributes; x_ij is the jth column attribute in the ith sample; and μ and σ are 1×n vectors holding the training data mean and standard deviation, respectively, for each of the n attributes. The test dataset (TS), which is used to measure detection accuracy, is normalized using the same μ and σ.

8. CONCLUSION AND FUTURE WORK
In this paper, we have discussed the problems faced by existing NIDS techniques. In response, we have proposed our novel NDAE method for unsupervised feature learning. We have then built upon this by proposing a novel classification model constructed from stacked NDAEs and the RF classification algorithm, and we have also implemented the intrusion prevention system. The results show that our approach offers high levels of accuracy, precision and recall, together with reduced training time. The proposed NIDS improves accuracy by only 5%, so further improvement in accuracy is needed, along with further work on real-time network traffic and on handling zero-day attacks.

REFERENCES
[1] B. Dong and X. Wang, "Comparison deep learning method to traditional methods using for network intrusion detection," in Proc. 8th IEEE Int. Conf. Commun. Softw. Netw., Beijing, China, Jun. 2016, pp. 581–585.
[2] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, "Deep learning and its applications to machine health monitoring: A survey," submitted to IEEE Trans. Neural Netw. Learn. Syst., 2016. [Online]. Available: http://arxiv.org/abs/1612.07640
[3] H. Lee, Y. Kim, and C. O. Kim, "A deep learning model for robust wafer fault monitoring with sensor measurement noise," IEEE Trans. Semicond. Manuf., vol. 30, no. 1, pp. 23–31, Feb. 2017.
[4] L. You, Y. Li, Y. Wang, J. Zhang, and Y. Yang, "A deep learning based RNNs model for automatic security audit of short messages," in Proc. 16th Int. Symp. Commun. Inf. Technol., Qingdao, China, Sep. 2016, pp. 225–229.
[5] R. Polishetty, M. Roopaei, and P. Rad, "A next-generation secure cloud based deep learning license plate recognition for smart cities," in Proc. 15th IEEE Int. Conf. Mach. Learn. Appl., Anaheim, CA, USA, Dec. 2016, pp. 286–293.
[7] K. Alrawashdeh and C. Purdy, "Toward an online anomaly intrusion detection system based on deep learning," in Proc. 15th IEEE Int. Conf. Mach. Learn. Appl., Anaheim, CA, USA, Dec. 2016, pp. 195–200.
[8] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, "A deep learning approach for network intrusion detection system," in Proc. 9th EAI Int. Conf. Bio-Inspired Inf. Commun. Technol., 2016, pp. 21–26. [Online]. Available: http://dx.doi.org/10.4108/eai.3-12-2015.2262516
[9] S. Potluri and C. Diedrich, "Accelerated deep neural networks for enhanced intrusion detection system," in Proc. IEEE 21st Int. Conf. Emerg. Technol. Factory Autom., Berlin, Germany, Sep. 2016, pp. 1–8.
[10] C. Garcia Cordero, S. Hauke, M. Muhlhauser, and M. Fischer, "Analyzing flow-based anomaly intrusion detection using replicator neural networks," in Proc. 14th Annu. Conf. Privacy, Security, Trust, Auckland, New Zealand, Dec. 2016, pp. 317–324.
[11] T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, "Deep learning approach for network intrusion detection in software defined networking," in Proc. Int. Conf. Wireless Netw. Mobile Commun., Oct. 2016, pp. 258–26.
[12] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, "A deep learning approach to network intrusion detection," IEEE Trans. Emerging Topics in Computational Intelligence, vol. 2, no. 1, Feb. 2018.
3. TURING MACHINE
A Turing machine is a mathematical model of a machine or computer that describes an abstract machine for any problem. The machine manipulates finite symbols on a tape according to rules [13]. A Turing machine can compute any constructed algorithm, despite the model's simplicity. The machine contains a tape of infinite length in both directions, divided into small squares known as cells. Each cell contains only one symbol from a finite alphabet, and empty cells are filled with the blank symbol. A head is used to read and write symbols on the tape, with its movement set initially to the leftmost symbol. The machine can move one cell at a time to the left or right, or make no movement. The finite states are stored in the Turing machine's state register, which is initialized with a special start state. A finite table of rules is used to read the current input symbol from the tape and modify it, moving the tape head left, right, or not at all [14-15]. The Turing machine mechanically operates on a tape, as shown in Fig. 2.

1) Input Tape: The tape is infinite in both directions and divided into cells. Each cell contains only one symbol from the finite alphabet. The alphabet contains a special symbol known as the blank, written as 'B'. The tape is implicitly assumed to be arbitrarily extendable to both the left and the right for computation.

2) Read/Write Head: The head can read and write only one symbol at a time on the tape, and can move to the left, to the right, or make no movement.

3) Finite State Control: The state control stores the state of the Turing machine, from the initial state to a halting state. If, after reading the last symbol, the Turing machine reaches a final state, the input string is accepted; otherwise the input string is rejected.

Fig. 2: Turing Machine Model (tape cells ... a a b b B ...; read/write head; finite state control; moves L, N, R)

3.1 Mathematical Representation
A real machine can handle, with intelligence, all the operations handled by a Turing machine, but a real machine has only a limited, finite number of configurations; an actual real machine is a linear bounded automaton [16]. Due to the tape being infinite in both directions, Turing machines have an unconstrained amount of storage space for their computations.

A Turing machine is represented by a 7-tuple [23], i.e.

M = (Q, ∑, Γ, δ, q0, B, F)

where:
Q is the finite set of states
∑ is the finite set of input alphabets
Γ is the finite set of tape alphabets
δ is a transition function, δ: Q × Γ → Q × Γ × {L, R, N}
where
L: move to the left
R: move to the right
N: no movement
q0 is the initial state
B is the blank symbol
F is the set of final states, or set of halting states.
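The 7-tuple definition above can be exercised with a small simulator (an illustrative Python sketch; the example machine below, which rewrites every 'a' to 'b' and accepts on the blank, is an assumption chosen for demonstration):

```python
def run_turing_machine(delta, q0, B, F, tape, max_steps=10_000):
    """Simulate M = (Q, Sigma, Gamma, delta, q0, B, F) on `tape`.
    delta maps (state, symbol) -> (state, symbol, move), move in {L, R, N}.
    Returns (accepted, cells): accepted is True iff a final state is reached."""
    state, head, cells = q0, 0, dict(enumerate(tape))
    for _ in range(max_steps):
        if state in F:                      # halting state reached: accept
            break
        key = (state, cells.get(head, B))   # unwritten cells read as blank
        if key not in delta:                # no applicable rule: reject
            return False, cells
        state, symbol, move = delta[key]
        cells[head] = symbol                # write, then move the head
        head += {"L": -1, "R": 1, "N": 0}[move]
    return state in F, cells
```

For example, the machine delta = {("q0","a"): ("q0","b","R"), ("q0","b"): ("q0","b","R"), ("q0","B"): ("qf","B","N")} with F = {"qf"} scans right, rewriting the input, and halts at the first blank.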
ABSTRACT
The analysis of social networks is a very challenging research area, and a fundamental element of it concerns the detection of user communities. Existing work on emotion recognition on Twitter relies mainly on lexicons and simple classifiers over bag-of-words models. The central question of our study is whether we can increase their performance using machine learning algorithms. The novel Profile of Mood States (POMS) algorithm represents a twelve-dimensional mood state using 65 adjectives, combining Ekman's and Plutchik's emotion categories: joy, anger, depression, fatigue, vigour, tension, confusion, disgust, fear, trust, surprise and anticipation. These emotions are recognized with the help of text-based bag-of-words and LSI algorithms. The contribution of this work is to apply a machine learning algorithm to emotion classification, which takes less time and does not require human labeling. The Gaussian Naïve Bayes classifier works on a testing dataset with the help of a large training dataset. We measure the performance of the POMS and Gaussian Naïve Bayes algorithms on the Twitter API. The experimental outcome shows emotion recognition from tweet contents with the help of emojis.
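The Gaussian Naïve Bayes step mentioned above can be sketched as follows (a minimal pure-Python sketch over numeric feature vectors, e.g. per-category adjective counts; the feature choice and the variance floor are illustrative assumptions, not the authors' implementation):

```python
import math
from collections import defaultdict

def fit_gnb(samples, labels, var_floor=1e-9):
    """Estimate per-class feature means/variances and class priors."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n, d = len(rows), len(rows[0])
        mean = [sum(r[j] for r in rows) / n for j in range(d)]
        var = [max(sum((r[j] - mean[j]) ** 2 for r in rows) / n, var_floor)
               for j in range(d)]  # floor keeps the Gaussian well-defined
        model[y] = (mean, var, n / len(samples))
    return model

def predict_gnb(model, x):
    """Pick the class maximizing the log posterior under independent
    Gaussian feature likelihoods (the naive Bayes assumption)."""
    best, best_score = None, -math.inf
    for y, (mean, var, prior) in model.items():
        score = math.log(prior)
        for xj, m, v in zip(x, mean, var):
            score += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        if score > best_score:
            best, best_score = y, score
    return best
```

Trained on labeled tweets, such a classifier assigns each new tweet the mood-state category whose Gaussian model best explains its features.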
(NLP). Paul Ekman defined six basic emotions by studying facial expressions. Robert Plutchik extended Ekman's categorization with two additional emotions and presented his categorization in a wheel of emotions. Finally, the Profile of Mood States (POMS) is a psychological instrument that defines a six-dimensional mood state representation using text mining. The novel POMS algorithm generates a twelve-dimensional mood state representation using 65 adjectives, combining Ekman's and Plutchik's emotion categories: anger, depression, fatigue, vigour, tension, confusion, joy, disgust, fear, trust, surprise and anticipation. Previous work generally studied only one emotion classification. Working with multiple classifications simultaneously not only enables performance comparisons between different emotion categorizations on the same type of data, but also allows us to develop a single model for predicting multiple classifications at the same time.

Motivation
The system developed based on our proposed approach would be able to automatically detect what people feel about their lives from Twitter messages. For example, the system can recognize:
the percentage of people expressing higher levels of life satisfaction in one group versus another group,
the percentage of people who feel happy and cheerful,
the percentage of people who feel calm and peaceful, and
the percentage of people expressing higher levels of anxiety or depression.

2. RELATED WORK
Paper [1] investigates whether public mood, as measured from a large-scale collection of tweets posted on twitter.com, is correlated with or even predictive of DJIA values. The results show that changes in the public mood state can indeed be tracked from the content of large-scale Twitter feeds by means of rather simple text processing techniques, and that such changes respond to a variety of socio-cultural drivers in a highly differentiated manner. Advantages: it increases performance; public mood analysis from Twitter feeds offers an automatic, fast, free and large-scale addition to this toolkit that may be optimized to measure a variety of dimensions of the public mood state. Disadvantages: it avoids geographical and cultural sampling errors.

Paper [2] explored an application of deep recurrent neural networks to the task of sentence-level opinion expression extraction. DSEs (direct subjective expressions) consist of explicit mentions of private states or speech events expressing private states, and ESEs (expressive subjective expressions) consist of expressions that indicate sentiment, emotion, etc., without explicitly conveying them. Advantages: deep RNNs outperformed previous (semi-)CRF baselines, achieving new state-of-the-art results for fine-grained opinion expression extraction. Disadvantages: RNNs do not have access to any features other than word vectors.

Paper [3] analyzes electoral tweets for more subtly expressed information such as the sentiment (positive or negative), the emotion (joy, sadness, anger, etc.), the purpose or intent behind the tweet (to point out a mistake, to support, to ridicule, etc.), and the style of the tweet (simple statement, sarcasm, hyperbole, etc.). There are two parts: annotating text for sentiment, emotion, style, and categories such as purpose; and automatic classifiers for detecting these categories. Advantages: using a multitude of custom-engineered features, such as those concerning emoticons, punctuation, elongated words and negation, along with unigrams, bigrams and emotion lexicon features, the SVM classifier achieved a higher accuracy, and it automatically classifies tweets into eleven categories of emotions. Disadvantages: it does not
summarize tweets. It also does not automatically identify other semantic roles of emotions, such as degree, reason, and empathy target.

Article [4] shows that emotion-word hashtags are good manual labels of emotions in tweets, and proposes a method to generate a large lexicon of word–emotion associations from this emotion-labeled tweet corpus. This is the first lexicon with real-valued word–emotion association scores. Advantages: using hashtagged tweets, large amounts of labeled data can be collected for any emotion that is used as a hashtag by tweeters; the hashtag emotion lexicon performed significantly better than those that used the manually created WordNet-Affect lexicon; and it automatically detects personality from text. Disadvantages: this paper works only on the given text, not on synonyms of that text.

Paper [5] develops a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations that help tasks in new domains. It is a multi-task deep neural network for representation learning, focusing in particular on semantic classification (query classification) and semantic information retrieval (ranking for web search), and it demonstrates strong results on query classification and web search. Advantages: the MT-DNN robustly outperforms strong baselines across all web search and query classification tasks, and the multi-task DNN model successfully combines tasks as disparate as classification and ranking. Disadvantages: query classification is incorporated either as a classification or as a ranking task, not as comprehensive exploration work.

In paper [6], we i) demonstrate how large amounts of social media data can be used for large-scale open-vocabulary personality detection; ii) analyze which features are predictive of which personality dimension; and iii) present a novel corpus of 1.2M English tweets (1,500 authors) annotated for gender and MBTI. Advantages: the personality distinctions, namely INTROVERT–EXTROVERT (I–E) and THINKING–FEELING (T–F), can be predicted from social media data with high reliability, and the large-scale, open-vocabulary analysis of user attributes can help improve classification accuracy.

The paper [7] focuses on studying two fundamental NLP tasks, discourse parsing and sentiment analysis, developing three independent recursive neural nets: two for the key sub-tasks of discourse parsing, namely structure prediction and relation prediction, and a third net for sentiment prediction. Advantages: the latent discourse features can help boost the performance of a neural sentiment analyzer, and pre-training and the individual models are an order of magnitude faster than the multi-tasking model. Disadvantages: predictions are difficult for multi-sentential text.

3. EXISTING SYSTEM
The ability of the human face to communicate emotional states via facial expressions is well known, and past research has established the importance and universality of emotional facial expressions. However, recent evidence has revealed that facial expressions of emotion are most accurately recognized when the perceiver and expresser are from the same cultural in-group. Paul Ekman uses facial expressions to define a set of six universally recognizable basic emotions: anger, disgust, fear, joy, sadness and surprise. Robert Plutchik defined a wheel-like diagram with a set of eight basic, pairwise contrasting emotions: joy – sadness, trust – disgust, fear – anger and surprise – anticipation. We consider each of these emotions as a separate category, and disregard the different levels of intensity that Plutchik defines in his wheel of emotions.
Disadvantages:
Twitter —or— How to Get 1,500 Personality Tests in a Week," in Proc. of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2015, pp. 92–98.
[7] B. Nejat, G. Carenini, and R. Ng, "Exploring Joint Neural Model for Sentence Level Discourse Parsing and Sentiment Analysis," in Proc. of the SIGDIAL 2017 Conf., Aug. 2017, pp. 289–298.
1. INTRODUCTION
The steps taken by students in their earlier learning years shape their future. There is a lot of pressure on them from their parents and peers to perform well. This can lead to extreme levels of depression, which can take a toll on their health. So, we decided to design a web app to help students cope with the stress, and to make a better app than those previously available. This chatbot helps students within the range of 14 to 22 years cope with the pressure of studies. The bot can determine the stress or depression level using a simple questionnaire at the start, and advances to assess the situation better in later stages.

General Terms
Depression, Depression level, Stanford CoreNLP

Keywords
Chatbot

2. MOTIVATION
The steps taken by students in their earlier learning years shape their future. There is a lot of pressure on them from their parents and peers to perform well, which can lead to extreme levels of depression and take a toll on their health. So, we decided to design a web app to help students cope with the stress, and to make a better app than those previously available.

3. PROBLEM STATEMENT
Create a chatbot to help students within the range of 14 to 22 years cope with the pressure of studies. The bot can determine the stress or depression level using a simple questionnaire at the start and advances to assess the situation better in later stages. The app should also help sports people to balance their play and studies.
4. STATE OF ART
Table 1: State of art

2. Speech Analysis and Depression: Formant and jitter frequencies in speech are calculated, based upon which a depression level is determined.

3. Affective and Content Analysis of Online Depression Communities: Linguistic Inquiry and Word Count (LIWC) is used for depression recognition. A survey of various clinical and control communities is conducted for a better understanding of depression patterns.

4. Detection of Depression in Speech: Survey-based paper; volunteers are required to speak on certain questions, stories and visual images, and feature selection is used to facilitate depression recognition.

5. A Model for Prediction of Human Depression Using Apriori Algorithm: About 500 records were taken as test data for the model. The model was tested on 500 individuals and successfully predicted the percentage of individuals suffering from depression. The following factors of depression are considered: lifestyle, life events, non-psychiatric illness, acquired infection, medical treatments, professional activities, stress, and relationship status. The questions were based on Family problem (FA), Financial problem (FP), Unemployed (UE), Remuneration (REM), Addiction (ADD), Workplace (ORG), Relationship (RL), Congenital diseases (CD), Apprehension (AP), Hallucination (HL), and Sleeping problem (SLP).

7. Internet Improves Health Outcomes in Depression: The paper suggests some websites where solutions to the users' problems can be found; it is a kind of self-help. The model uses the theory of behavior change.

8. Detecting Depression Using Multimodal Approach of Emotion Recognition: There are various ways to take input, viz. speech input, textual input, etc. Eight emotions are considered and, accordingly, an alert is sent to the doctor.

9. Classification of depression state based on articulatory precision: Given that neurophysiological changes due to major depressive disorder influence the articulatory precision of speech production, vocal tract formant frequencies and their velocity and acceleration were investigated toward automatic classification of depression state.

10. Predicting anxiety and depression in elderly patients using machine learning technology: The model uses ten machine learning algorithms, such as Naïve Bayes, Random Forest, Bayesian Network, K-star, etc., to classify whether patients have depression or not. Of these ten algorithms, the best one is chosen using the confusion matrix.
5. GAP ANALYSIS

7. Internet Improves Health Outcomes in Depression: The paper suggests some websites where solutions to the users' problems can be found; it is a kind of self-help, and the model uses the theory of behavior change. Gap: the websites provide only a generalized solution, not a specific solution to the problem. We give a specific solution to the problem.

8. Detecting Depression Using Multimodal Approach of Emotion Recognition: There are various ways to take input, viz. speech input, textual input, etc. Eight emotions are considered and, accordingly, an alert is sent to the doctor. Gap: the model is not useful once someone goes into depression; it only suggests preventive measures, while our app suggests preventive measures as well as solutions once the person has gone into depression.

9. Classification of depression state based on articulatory precision: Given that neurophysiological changes due to major depressive disorder influence the articulatory precision of speech production, vocal tract formant frequencies and their velocity and acceleration were investigated toward automatic classification of depression state. Gap: if the person has depression, an immediate alert is sent to the doctor, but if the user is not comfortable talking with the doctor, his/her depression will not get treated. In our app we provide the solution, and if the person is in severe depression, we encourage the user to seek help from the doctor.

10. Predicting anxiety and depression in elderly patients using machine learning technology: The model uses ten machine learning algorithms, such as Naïve Bayes, Random Forest, Bayesian Network, K-star, etc., to classify whether patients have depression or not; of these ten, the best one is chosen using the confusion matrix. Gap: a lot of time is spent determining the best algorithm, and no solution is provided. Our application is fast and also provides the solution.
6. PROPOSED WORK
1. First, if the user is not already registered in the system, he/she has to sign up. The signup stage is foolproof and is secured with an OTP verification step.
2. After the signup step, the user is taken to the login page. After login, on the first attempt he/she is given a text area to describe his/her mental state, upon which a specialized questionnaire with respect to his/her depression level is provided.
3. There are basically 3 levels of depression, going from 1 to 3 with increasing severity.
4. The first two levels are considered curable with our app itself. Here an option for the chatbot is provided, which is available 24/7. There are two types of students who can use the app (sports and regular). The chatbot is provided for a regular student. A messenger is created for the sports student, where he/she will be provided with a token and can contact the admin, who has experience in dealing with sports and study stress.
5. In case of a very severe condition, the contact details of a renowned psychiatrist will be provided. The app generates reminders after specific intervals to check the progress of the student after some remedies have been incorporated by them.
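The steps above can be sketched as a small routing function (an illustrative Python sketch; the questionnaire-score thresholds and the action labels are hypothetical, not the app's actual values):

```python
def route_user(score, sport_student=False):
    """Map a questionnaire score to a depression level (1-3) and the
    corresponding action from the proposed workflow."""
    level = 1 if score < 10 else 2 if score < 20 else 3
    if level <= 2:                      # levels 1-2: considered curable in-app
        action = "messenger+token" if sport_student else "chatbot"
    else:                               # level 3: refer to a psychiatrist
        action = "psychiatrist_contact"
    return level, action
```

A regular student with a mild score is routed to the 24/7 chatbot, a sports student to the messenger/admin channel, and a severe score to the psychiatrist referral with follow-up reminders.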
The sentiment classification follows the recursive neural tensor model of [11]. For a pair of child vectors a, b ∈ R^d:

h = [a; b]ᵀ V[1:d] [a; b]    ...I

where h ∈ R^d is the output of the tensor product, V[1:d] ∈ R^(2d×2d×d) is a tensor that defines multiple bilinear forms, and V[i] ∈ R^(2d×2d) is each slice of V[1:d]:

h_i = [a; b]ᵀ V[i] [a; b]    ...II

p = f([a; b]ᵀ V[1:d] [a; b] + W[a; b])    ...III

y = softmax(Ws · p)    ...IV

where Ws ∈ R^(5×d) is the sentiment classification matrix.

2. The error function of a sentence is:

E(θ) = Σ_i Σ_j t_j^(i) log y_j^(i) + λ‖θ‖²    ...V

where θ = (V, W, Ws, L).

5.1.3 Working
First, we provide a text area in which the user has to express his/her condition. Then a function is executed on this text area which splits all the sentences present in it; this function returns the number of sentences and an array of sentences. Stanford CoreNLP is then applied to this array of sentences to compute the sentiment level of each sentence. If any one sentence's sentiment level returns 1 (negative), then the sentiment level of the complete text area is 1.
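The working described above can be sketched as follows (a minimal Python sketch; the regex-based sentence splitter and the stub classifier stand in for Stanford CoreNLP, and the assumption that a per-sentence score of 1 means negative follows the text):

```python
import re

def split_sentences(text):
    """Split the text area's contents into sentences; returns the
    count and the array of sentences, as in the Working section."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return len(sentences), sentences

def area_sentiment(text, sentence_sentiment):
    """If any sentence's sentiment is 1 (negative), the whole
    text area is scored 1; otherwise 0."""
    _, sentences = split_sentences(text)
    return 1 if any(sentence_sentiment(s) == 1 for s in sentences) else 0
```

Here `sentence_sentiment` would be the per-sentence classifier (Stanford CoreNLP in the proposed system); any negative sentence marks the whole entry as negative, triggering the depression-level questionnaire.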
these anti-depression chatbots would become more efficient.

The anti-depression chatbot can also be used by professional sports players. There is a lot of pressure on sports players, particularly when they fail; they need to find some way, some path to the top again, and these chatbots can help a lot. As we keep improving the underlying technology through trial and error, NLP will grow more efficient, capable of handling more complex commands and delivering more poignant outputs. Chatbots will also be able to hold multilingual conversations, not only understanding hybrid languages like 'Hinglish' (Hindi crossed with English) with NLU, but, with advanced NLG, also being able to reply in kind. In a conversational space, users enjoy the freedom to input their thoughts seamlessly. Whether it is an enquiry related to a service being provided or a query for help, users receive an instant reply which gives them a sense of direction inside the app. This app is the best line of defense against a varying range of depression, and for a wide range of ages. The app can detect, measure and treat depression, and it will help a huge population cope with the increasing stress that is gripping society. Thus, the contribution of this app towards society is immense.

REFERENCES
[1] Culjak, M. Spranca. Internet Improves Health Outcomes in Depression. Proceedings of the 39th Annual Hawaii International Conference on System Sciences, 2006, pp. 1–9.
[2] Imen Tayari Meftah, Nhan Le Thanh, Chokri Ben Amar. Detecting Depression Using Multimodal Approach of Emotion Recognition. GLOBAL HEALTH 2012: The First International Conference on Global Health Challenges.
[4] Brian S. Helfer, Thomas F. Quatieri, James R. Williamson, Daryush D. Mehta, Rachelle Horwitz, Bea Yu. Classification of Depression State Based on Articulatory Precision. Interspeech 2013.
[5] Lambodar Jena, Narendra K. Kamila. A Model for Prediction of Human Depression Using Apriori Algorithm. 2014 International Conference on Information Technology.
[6] Thin Nguyen, Dinh Phung, Bo Dao, Svetha Venkatesh, Michael Berk. Affective and Content Analysis of Online Depression Communities. IEEE Transactions on Affective Computing, vol. 5, no. 3, July–Sept. 2014.
[7] Zhenyu Liu, Bin Hu, Lihua Yan, Tianyang Wang, Fei Liu, Xiaoyu Li, Huanyu Kang. Detection of Depression in Speech. 2015 International Conference on Affective Computing and Intelligent Interaction (ACII).
[8] Tan Tze Ern Shannon, Dai Jingwen Annie, See Swee Lan. Speech Analysis and Depression. 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[9] Dongkeon Lee, Kyo-Joong Oh, Ho-Jin Choi. The Chatbot Feels You – A Counseling Service Using Emotional Response Generation. 2017 IEEE International Conference on Big Data and Smart Computing (BigComp).
[10] Arkaprabha Sau, Ishita Bhakta. Predicting Anxiety and Depression in Elderly Patients Using Machine Learning Technology. vol. 4, no. 6, 2017.
[11] Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, Christopher Potts. Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank. 2013, Stanford University.
[3] Shamla Mantri, Dr. Pankaj Agrawal, Dr.
S.S.Dorle, Dipti Patil, Dr. V.M.Wadhai.
Clinical Depression analysis Using Speech
Features. 2013 Sixth International
Conference on Emerging Trends in
Engineering and Technology
of steps to finally classify the tweets into two labels, positive and negative, thus expressing the mood of the public regarding the given topic. The most interesting part of our work is the use of the ensemble approach in the process of classifying the tweets. But before applying the machine learning algorithms it is important that proper pre-processing of the data is done. Pre-processing is followed by feature extraction, which is used for generating the feature vectors that are then used for classification. Using graphs and statistics we will also provide a comparison between the results obtained using the techniques individually and the results obtained using the ensemble approach.

Pre-processing
Tweets are usually composed of incomplete expressions, or expressions containing emoticons, acronyms or special symbols. Such irregular Twitter data will affect the performance of sentiment classification, so prior to feature selection a series of preprocessing steps is performed on the tweets to reduce the noise and the irregularities:
- Removal of all non-ASCII and non-English characters in the tweets.
- Removal of URL links. The URLs do not contain any useful information for our analysis, so they are deleted from the tweets.
- Removal of numbers. Numbers generally do not convey any sentiment, are thus useless during sentiment analysis, and are deleted from the tweets.
- Expansion of acronyms and slang to their full word forms. Acronyms and slang are common in tweets, but are ill-formed words; it is essential to expand them to their original complete word forms for sentiment analysis.
- Replacement of emoticons and emojis. An emoticon expresses the mood of the writer. We replace emoticons and emojis with their original text form by looking them up in an emoticon dictionary.

NLP and Feature Selection
Natural language processing here basically includes removal of stop words and stemming of the words after pre-processing.
- Stop word removal: Stop words usually refer to the most common words in a language, such as "the", "an", and "than". The classic method is based on removing the stop words obtained from precompiled lists; multiple stop word lists exist in the literature.
- Stemming: This refers to replacing multiple word forms with the same root word. Example: "played", "playing" and "play" are all replaced with "play".
The algorithms used for these purposes are described in further sections of the paper.
Finally, feature selection is done. Vectors of words are created after pre-processing and NLP have been applied on the tweets. These vectors are given to the classifiers for the purpose of classification.

Ensemble Approach for Classification
In our work we use the ensemble approach for classification, that is, labelling the tweets with different polarities. This is the most important part of our work: most previous works have used only single machine learning algorithms for classification, but here we use an ensemble of three different algorithms to obtain better prediction results than could be obtained from any of the learning algorithms alone. The advantage of the ensemble approach is that it significantly increases the efficiency of classification. One more important aspect of using the ensemble approach is choosing the right combination of algorithms. In our work we consider Naïve Bayes, Random Forest and Support Vector Machine for the ensemble classifier. These algorithms have been selected as they have proven to give the best results when used individually, and thus using them in the ensemble will also yield efficient results. The algorithms are discussed briefly in a further section.

4. SYSTEM ARCHITECTURE
The following figure shows the proposed architecture of the system, which includes three main parts: pre-processing, feature selection, and applying the ensemble classifier to perform sentiment analysis on social media big data, with visualization of the results obtained using graphs.
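The pipeline just outlined (pre-process, build word features, then classify by majority vote of the three base classifiers) can be sketched like this. The stop-word set and emoticon dictionary entries are illustrative assumptions, as is the crude suffix-stripping stemmer; a real system would use precompiled lists and trained Naive Bayes, Random Forest and SVM models in place of the stub classifiers.

```python
import re
from collections import Counter

# Illustrative resources (assumptions, not the paper's actual lists).
STOP_WORDS = {"the", "an", "a", "than", "is", "are"}
EMOTICONS = {":)": "happy", ":(": "sad"}

def preprocess(tweet):
    """Pre-processing steps described above: replace emoticons, then
    remove URLs, numbers and non-English characters."""
    for emo, word in EMOTICONS.items():
        tweet = tweet.replace(emo, " " + word + " ")
    tweet = re.sub(r"https?://\S+", " ", tweet)   # remove URL links
    tweet = re.sub(r"\d+", " ", tweet)            # remove numbers
    tweet = re.sub(r"[^A-Za-z\s]", " ", tweet)    # remove non-ASCII / non-English chars
    return tweet.lower()

def stem(word):
    """Crude suffix stripping: 'played', 'playing' -> 'play'."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def features(tweet):
    """NLP stage: stop-word removal and stemming give the word vector."""
    return [stem(w) for w in preprocess(tweet).split() if w not in STOP_WORDS]

def ensemble_predict(classifiers, feature_vector):
    """Hard majority vote over the base classifiers
    (Naive Bayes, Random Forest and SVM in the proposed system)."""
    votes = [clf(feature_vector) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```

With trained models, the same hard-voting combination is available off the shelf, e.g. as scikit-learn's `VotingClassifier(voting="hard")`.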
Step 4: If the word matches, it is removed from the array, and the comparison is continued till all the words from D have been compared successfully.
Step 5: After successful removal of the first stop-word, another stop-word is read from the stop-word list and we again continue from Step 2. The algorithm runs till all the stop-words have been compared successfully.

Stemming Algorithm
Input: comments after stop-word removal.
Output: stemmed comment data.
Step 1: A single comment is read from the output of the stop-word removal algorithm.
Step 2: It is then written into another file at the given location, from which it is read during the stemming process.
Step 3: Tokenization is applied on the selected comment.
Step 4: Each word is processed in a loop during stemming and checked for whether the word or character is null or not.
Step 5: The word is then converted into lower case and compared with the other words in the comments.
Step 6: If words with similar meaning are found, they are stemmed, that is, reduced to their base word.

After the pre-processing is done, the next step is building the classifier based on the ensemble approach. The following algorithms are considered for that purpose.

Machine Learning Algorithms
This section briefly describes the machine learning algorithms that will be used in our work.

Naïve Bayes Algorithm
Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. It is a probabilistic classification technique which finds the probability of a label belonging to a certain class (in our case the classes are positive and negative). The algorithm uses Bayes' theorem, which describes the probability of a feature based on prior knowledge of conditions that might be related to that feature, for the purpose of finding the probabilities. The theorem assumes that the value of any particular feature is independent of the value of any other feature. It is given as:

P(A|B) = (P(A) * P(B|A)) / P(B)

Support Vector Machine
A support vector machine is a supervised machine learning technique. Every data item is represented as a point in an n-dimensional space, and a hyperplane is constructed that separates the data points into different classes; this hyperplane is then used for classification. In our work the hyperplane divides the dataset into two classes, positive and negative. The hyperplane having the maximum distance to the nearest training data items of both classes is considered the most appropriate hyperplane. This distance is called the margin; in general, the larger the margin, the lesser the classification error.

Random Forest
Random Forest is built as an ensemble of many decision trees. In the classification procedure, each decision tree in the Random Forest classifies an instance, and the Random Forest classifier assigns it to the class with the most votes from the individual decision trees. So basically, each decision tree in the random forest performs classification
on random parts of the dataset, and the predictions of all these different trees are aggregated to generate the final result.

6. PERFORMANCE MEASUREMENTS
The classification performance will be evaluated in terms of accuracy, recall and precision, as defined below. A confusion matrix is used for this.

Accuracy = (true positive reviews + true negative reviews) / (total number of documents)

Recall = (true positive reviews) / (true positive reviews + false negative reviews)

Precision = (true positive reviews) / (true positive reviews + false positive reviews)

7. CONCLUSION AND FUTURE WORK
A framework is being built that will enhance the existing techniques of sentiment analysis: previous techniques mostly focused on classification of single sentences, but the framework we are building works on huge amounts of data using machine learning techniques. The use of machine learning instead of a lexicon-based approach is a big plus-point of this work, and the framework has the potential to outdo the existing systems because of the use of the ensemble approach. It will do the classification on the basis of polarities, i.e. positive and negative. Future work can include developing better techniques for visualizing the results, classifying the tweets on a range of emotions, and using larger datasets to train the classifiers so as to improve the efficiency of the analysis process.

REFERENCES
[1] Tapasy Rabeya, Sanjida Ferdous. "A Survey on Emotion Detection". 2017 20th International Conference of Computer and Information Technology (ICCIT).
[2] Sonia Xylina Mashal, Kavita Asnani. "Emotion Intensity Detection for Social Media Data". 2017 International Conference on Computing Methodologies and Communication (ICCMC).
[3] Kudakwashe Zvarevashe, Oludayo O. Olugbara. "A Framework for Sentiment Analysis with Opinion Mining of Hotel Reviews". 2018 Conference on Information Communications Technology and Society (ICTAS).
[4] M. Trupthi et al. "Improved Feature Extraction and Classification - Sentiment Analysis". International Conference on Advances in Human Machine Interaction (HMI-2016), March 03-05, 2016, R. L. Jalappa Institute of Technology, Doddaballapur, Bangalore, India.
[5] Orestes Appel et al. "A Hybrid Approach to Sentiment Analysis". IEEE, 2016.
[6] S. Brindha et al. "A Survey on Classification Techniques for Text Mining". 3rd International Conference on Advanced Computing and Communication Systems (ICACCS-2016), Jan. 22-23, 2016, Coimbatore, India.
[7] Yuling Chen, Zhi Zhang. "Research on Text Sentiment Analysis Based on CNNs and SVM". 2018 Conference on Information Communications Technology and Society (ICTAS).
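The three measures defined in Section 6 reduce to simple functions of the confusion-matrix counts; written out (tp, tn, fp, fn being the four cells of the matrix):

```python
def accuracy(tp, tn, total_documents):
    """(true positives + true negatives) / total number of documents."""
    return (tp + tn) / total_documents

def recall(tp, fn):
    """true positives / (true positives + false negatives)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """true positives / (true positives + false positives)."""
    return tp / (tp + fp)
```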
These are some examples of popular stock exchanges:

The project goal is to build a system where the machine learning algorithms try to predict the prices of stocks based on their previous closing prices and other attributes that influence the price, like interest rates, foreign exchange and commodity prices[4].

2. MOTIVATION
Stock market movements make headlines every day. In India, 3.23 crore individual investors trade stocks; Maharashtra alone accounts for one-fifth of these investors. However, a report from Trade Brains shows that 90% of these investors lose money due to various reasons like insufficient research, speculation, and trading with emotions.

A higher inflation rate and lower interest rates make it ineffective to put one's money into a savings account or fixed deposits[5][6]. Thus, many people look to the stock market to keep up with inflation. In this process of multiplying their money, many investors have made a fortune, while some have lost a lot of money due to unawareness or lack of time to research a stock.

There are lots of contradicting opinions in the news, and an individual may not have the time or may not know how to research a stock. Most importantly, it is very difficult to manually predict stock prices based on the previous performance of a stock. Due to these factors, many investors lose a lot of money every year[6].

A system that could predict stock prices accurately is highly in demand. Individuals could know the predicted stock prices upfront, which may prevent them from investing in a bad stock. This would also mean a lot of saved time for the many investors who are figuring out whether a particular stock is good or not.

3. LITERATURE SURVEY
1. Comparative analysis of data mining techniques for financial data using parallel processing[1]
[2014] [IEEE] Performs a comparative analysis of several data mining classification techniques on the basis of the parameters accuracy, execution time, types of datasets and applications. Simple regression and multivariate analysis are used; regression analysis is applied on attributes. No use of machine learning; does not provide the algorithm used.

2. Stock market prices do not follow random walks: Evidence from a simple specification test[2]
[2015] [IEEE] Tests the random walk hypothesis for weekly stock market returns by comparing variance estimators derived from data sampled at different frequencies. Simple trading rules extraction and extraction of trading rules from charts. No alternative is provided for human investing; it shows only the flaws of manual investments.

3. A Machine Learning Model for Stock Market Prediction[3]
[2017] [IJAERD] Support Vector Machine with Regression Technology (SVR) and Recurrent Neural Networks (RNN). Regression analysis on attributes using simple regression and multivariate analysis. It is not tested in the real market. Shows how social media affects share prices, but does not account for other factors.

4. Twitter mood predicts the stock market[4]
[2010] [IEEE] Analyzes the text content of daily Twitter feeds with two mood tracking tools, namely Opinion Finder, which measures positive vs. negative mood, and Google-Profile of Mood States. The results are strongly indicative of a predictive correlation between measurements of public mood states from Twitter feeds. Difficult to scan each
and every text extraction from a large set of data; difficult text mining.

5. Stock Market Prediction on High-Frequency Data Using Generative Adversarial Nets[5]
[2017] [Research] Proposes a generic framework employing Long Short-Term Memory (LSTM) and a convolutional neural network (CNN) for adversarial training to forecast the high-frequency stock market. This model achieves prediction ability superior to other benchmark methods by means of adversarial training, minimizing direction prediction loss and forecast error loss. It cannot predict multi-scale conditions and live data.

6. Stock Market Prediction Using Machine Learning[6]
[2016] [IEEE] Uses different modules, gives different models, and gives the best accuracy using live streaming data. Predicts real market data and calculates live data using single and multilevel perceptrons, SVM, and radial basis functions. It could not work with textual data from different browsing data (web crawling).

7. Stock Market Prediction by Using Artificial Neural Networks[7]
[2014] [IEEE] This model takes the help of Artificial Intelligence and uses only neural networks to predict the data, predicting with single and multi-level perceptrons. It uses 10 hidden layers with a learning rate of 0.4, a momentum constant of 0.75 and a maximum of 1000 epochs. This model doesn't use machine learning algorithms like SVM and radial basis functions to determine their accuracy.

8. Price trend prediction Using Data Mining Algorithm[8]
[2015] [IEEE] This paper presented a data mining approach to predict the long-term trend of the stock market. The proposed model detects anomalies in the data according to the volume of a stock to accurately predict the trend of the stock. This paper only provides long-term predictions and does not give predictions for immediate trends.

5. PROPOSED WORK
Stock market prediction using machine learning can be a challenging task. The process of determining which indicators and input data will be used, and gathering enough training data to train the system appropriately, is not obvious. The input data may be raw data on volume, price, or daily change, but it may also include derived data such as technical indicators (moving average, trend-line indicators, etc.)[5] or fundamental indicators (intrinsic share value, economic environment, etc.). It is crucial to understand what data can be useful to capture the underlying patterns and integrate it into the machine learning system. The methodology used in this work consists of applying machine learning systems, with special emphasis on Genetic Programming (GP). GP has been considered one of the most successful existing computational intelligence methods, capable of obtaining competitive results on a very large set of real-life applications against other methods; the different algorithms used are described in [1].

Tools and Technologies Used
- Python
- Libraries such as OpenCV, scikit, pandas, numpy
- Machine learning techniques: classifiers
- Linear regression techniques
- Jupyter IDE
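As a minimal instance of the "linear regression techniques" listed among the tools, here is a closed-form least-squares fit of the next closing price on the previous day's close. This is an illustrative sketch of the basic idea, not the proposed system, and the one-lag model is an assumption made for brevity:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, in closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # (a, b)

def predict_next_close(closes):
    """Regress next-day close on previous close over the history,
    then extrapolate one step from the latest close."""
    prev, nxt = closes[:-1], closes[1:]
    a, b = fit_line(prev, nxt)
    return a * closes[-1] + b
```

In practice the same fit would be done with `sklearn.linear_model.LinearRegression` over many lagged features (interest rates, volume, etc.) rather than the single lag used here.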
4. GAP ANALYSIS

Sr. | Paper [Year] [Venue] | Objective / Method | Result | Gap
2 | Stock market prices do not follow random walks: Evidence from a simple specification test [2015] [IEEE] | Test the random walk hypothesis for weekly stock market returns by comparing variance estimators derived from data sampled at different frequencies | Simple trading rules extraction and extraction of trading rules from charts | No alternative provided for human investing; shows only the flaws of manual investments
3 | A Machine Learning Model for Stock Market Prediction [2017] [IJAERD] | Support Vector Machine with Regression Technology (SVR), Recurrent Neural Networks (RNN) | Regression analysis on attributes using simple regression and multivariate analysis | Not tested in the real market; shows how social media affects share prices but does not account for other factors
4 | Twitter mood predicts the stock market [2010] [IEEE] | Analysed the text content of daily Twitter feeds by two mood tracking tools, namely Opinion Finder (positive vs. negative mood) and Google-Profile of Mood States | Results strongly indicative of a predictive correlation between measurements of public mood states from Twitter feeds | Difficult to scan each and every text extraction from a large data set; difficult text mining
5 | Stock Market Prediction on High-Frequency Data Using Generative Adversarial Nets [2017] [Research] | Generic framework employing Long Short-Term Memory (LSTM) and a convolutional neural network (CNN) for adversarial training to forecast the high-frequency stock market | Prediction ability superior to other benchmark methods by means of adversarial training, minimizing direction prediction loss and forecast error loss | Cannot predict multi-scale conditions and live data
6 | Stock Market Prediction Using Machine Learning [2016] [IEEE] | Uses different modules, gives different models and gives best accuracy using live streaming data | Predicts real market data and calculates live data using single and multilevel perceptrons, SVM, radial basis | Could not work with textual data from different browsing data (web crawling)
Although a substantial volume of research exists on the topic, very little is aimed at long-term forecasting while making use of machine learning methods and textual data sources. We prepared over ten years' worth of stock data and proposed a solution which combines features from textual yearly and quarterly filings with fundamental factors for long-term stock performance forecasting. Additionally, we developed a new method of extracting features from text for the purpose of performance forecasting and applied feature selection aided by a novel evaluation function.

Problems overcome[5]: To produce effective models, there were two main sets of problems we were faced with and had to overcome. The first was that of market efficiency, which places theoretical limits on how patterns can be found in the stock markets for the purpose of forecasting. This property can become a concrete problem through patterns being exhibited in the data which are useless or even detrimental for predicting future values. The first way we tried to deal with this was by carefully splitting our data into training, validation, and testing data with expanding windows, so as to make maximum use of it while trying to avoid accidental overfitting. The second way we dealt with this was by using tailored model performance metrics, which aimed to ensure good test performance not only by maximizing model validation, but also by minimizing the variation of this value across validation years[7]. The third way we dealt with market efficiency was by performing feature selection using the Algorithm, so as to remove those features which performed poorly or unreliably. The second set of problems came from putting together a dataset to use for experimentation and testing. Due to the large volume of the data, care had to be taken when cleaning and preparing it, and the inevitable mistakes along the way required reprocessing of the data[4]. Using expert knowledge, we determined how to deal with the various problems in the data and ended up using mean substitution and feature deletion.

9. FUTURE WORK

1. Model Updating Frequency:
The models are trained once and then used for predicting stock performances over the span of a year. Since we use a return duration of 120 trading days, there is a necessary wait of half a year before data can be used to train models, which means that models end up making predictions using data which is over a year old. One way to make use of data as soon as it becomes available is to completely retrain the model every week (or less). A faster way to improve model performance may be through updating using incremental machine learning algorithms, which can update model parameters without re-training on all data[6].

2. Explore More Algorithms:
Although many different models were considered in this thesis, including various linear regression methods, gradient boosting, random forests, and neural networks, there is always more room to explore.

3. Improve Feature Extraction:
In this thesis, a few methods for extracting features from filings with textual data were explored. The problems of extracting features from text, and of determining text sentiment in particular, are well studied, and other natural language processing methods may perform better. Our approach of using autoencoders to extract features may also benefit from further exploration. In particular, when using the auxiliary loss, a more accurate method for estimating the financial effect corresponding to a given filing would be useful.

4. Utilize Time Series Information:
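The expanding-window splitting strategy described above (a training window that grows while validation and test periods roll forward) can be sketched as follows; the yearly granularity and window sizes are illustrative assumptions:

```python
def expanding_window_splits(years, n_test=1, n_val=1, min_train=2):
    """Yield (train, validation, test) period lists in which the
    training window expands while validation and test roll forward."""
    splits = []
    for end in range(min_train, len(years) - n_val - n_test + 1):
        train = years[:end]                       # expanding window
        val = years[end:end + n_val]              # rolling validation period
        test = years[end + n_val:end + n_val + n_test]  # rolling test period
        splits.append((train, val, test))
    return splits
```

Each successive split trains on all data seen so far, which is the property the text relies on to make maximum use of the data while avoiding accidental overfitting.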
REFERENCES
[1] Raut Sushrut Deepak, Shinde Isha Uday, Dr. D. Malathi, "Machine Learning Approach in Stock Market Prediction", International Journal of Pure and Applied Mathematics, vol. 115, no. 8, 2017, pp. 71-77.
[2] Tao Xing, Yuan Sun, Qian Wang, Guo Yu, "The Analysis and Prediction of Stock Price", 2013 IEEE International Conference on Granular Computing.
[3] A. W. Lo, A. C. MacKinlay, "Stock market prices do not follow random walks: Evidence from a simple specification test", Review of Financial Studies, vol. 1, no. 1, pp. 41-66, 1988.
[4] Yash Omer, Nitesh Kumar Singh, "Stock Prediction using Machine Learning", 2018 International Journal on Future Revolution in Computer Science & Communication Engineering.
[5] Ritu Sharma, Mr. Shiv Kumar, Mr. Rohit Maheshwari, "Comparative Analysis of Classification Techniques in Data Mining Using Different Datasets", 2015 International Journal of Computer Science and Mobile Computing.
[6] Osma Hegazy, Omar S. Soliman, "A Machine Learning Model for Stock Market Prediction", International Journal of Computer Science and Telecommunications, vol. 4, no. 12, December 2013.
[7] S. P. Pimpalkar, Jenish Karia, Muskaan Khan, Satyam Anand, Tushar Mukherjee, "Stock Market Prediction using Machine Learning", International Journal of Advance Engineering and Research Development, vol. 4, 2017.
[8] Xingyu Zhou, Zhisong Pan, Guyu Hu, Siqi Tang, Cheng Zhao, "Stock Market Prediction on High-Frequency Data Using Generative Adversarial Nets", Mathematical Problems in Engineering, Volume 2018.
[9] J. Bollen, H. Mao, X. Zeng, "Twitter mood predicts the stock market", Journal of Computational Science, vol. 2, no. 1, pp. 1-8, 2011.
people in this area of work to take smart, calculated and informed decisions which will add to the advancement of the field and the economy of the country. The design model will work on data from the past several years and will be able to improve itself according to the real-time data that comes along the way. The model will aim to achieve a higher level of accuracy in its prediction range and will be adaptable to any kind of data that is given to it. A number of alternate measures of forecast performance, having to do with statistical as well as directional accuracy, are employed. The stock recommendation system will be based on data already known to us; we focus on raw-material and dependency variation, and artificial intelligence and machine learning form the path for our project.

2. MOTIVATION
Stock recommendation and prediction is a very tricky business, and forecasting commodity prices relying exclusively on historic price data is a challenge of its own. Spot prices and future prices are nonstationary; they form a co-integrating relation. Spot prices tend to move towards future prices over the long run, hence predicting this path has become more useful than ever. Fluctuations in commodity prices affect global economic activity. For many countries, especially developing countries, primary commodities remain an important source of export earnings, and commodity price movements have a major impact on overall performance; therefore commodity-price forecasts are a key input to policy planning and formulation. Sales is a very crucial aspect for any developing nation, but managing those sales within the country and estimating their future prospects is also very important. A recommendation and prediction system will lead us to a standing where estimating the area of maximum outcome will ultimately benefit all business providers, and will bring us to a position where we can invest smartly, knowingly, and have the maximum outcome. For efficient manufacturing the actual real-time consumption is necessary, but it is not always possible to analyze real-time data; hence stock recommendation will give manufacturers an overview of stock consumption, leading towards lower production cost, and as a result the end consumer will be benefited.

3. STATE OF ART
Stock prices are considered to be chaotic and unpredictable. Predicting the future stock prices of financial commodities or forecasting upcoming stock market trends can enable investors to garner profit from their trading by taking calculated risks based on reliable trading strategies. The stock market is characterized by high risk and high yield; hence investors are concerned about the analysis of the stock market and are trying to forecast its trend. To accurately predict the stock market, various prediction algorithms and models have been proposed in the literature. In the paper proposed by A. Rao and S. Hule, Stock Market Prediction Using Statistical Computational Methodologies and Artificial Neural Networks, the focus is on the technical approaches that have been proposed and/or implemented with varying levels of accuracy and success rates. It surveys mainly two approaches: the Statistical Computational Approach and the Artificial Neural Networks Approach. It also describes the attempts that have gone into combining the two approaches in order to achieve higher-accuracy predictions.
In another work, by K. K. Sreshkumar and Dr. N. M. Elango, An Efficient Approach to Forecast Indian Stock Market Price and their Performance Analysis, the paper reveals the use of prediction algorithms and functions to predict future share prices and
specific, and cannot be readily used for comparison across commodities.

Forecasting Agricultural Commodity Prices Using Multivariate Bayesian Machine Learning Regression (Andres M. Ticlavilca, Dillon M. Feuz and Mac McKee): The dependency between input and output target is learned using MVRVM to make accurate predictions. The potential benefit of these predictions lies in assisting producers in making better-informed decisions and managing price risk, but the sparse property (low complexity) of the MVRVM cannot be analyzed for the small dataset.

Forecasting Model for Crude Oil Price Using Artificial Neural Networks and Commodity Futures Prices (Siddhivinayak Kulkarni, Imad Haidar): In this paper, ANN is selected as a mapping model and is viewed as a nonparametric, nonlinear, assumption-free model, which means it does not make a priori assumptions about the problem; but if the assumptions in an econometrical model are not correct, it could generate misleading results.

The system architecture starts with the client using any web browser to access the server and add his data. This data is further observed and is used to generate alpha 1 and alpha 2 with respect to current and historic data, which is necessary for the further prediction process. It is tested whether the newly acquired data possesses any abnormalities; if it does, the data is sent for noise removal and then goes to the section which combines new data and historic data. If there is no noise in the new data, it goes directly for combination with the historic data. The history is updated after combining the newly acquired data, and it is sent for training of the system. This system keeps on training as new data keeps on being added, and after a certain point of time the system becomes capable of training itself.

5. PROPOSED WORK
This paper proposes an artificial intelligent system for prediction and recommendation, as this is the heart and brain of the entire process; here the dataset noise elimination, learning and prediction stages occur. The data provided to the system should be relevant and labelled, in order to identify the parameters and predict the patterns it has learned. The system must understand the pattern between the data parameters at a fast rate, because that is important to speed up the calculation process for predicting future values. The artificial intelligence is based on a machine learning technique known as a decision learning tree, so it must select the ideal parameters in order to understand the pattern and predict the values.

Fig 1: System Architecture

5.1 Training Stage
This is the starting phase of the software cycle. This is where the system starts to learn and understand the patterns of the commodity, and then it starts to predict the prices of the commodity. This stage is divided into two stages: one where the system learns the dependency of the factors for the commodity, and a second concerning the external factors affecting the prices of the commodity. In the first stage, the
ISSN:0975-887 Department of Computer Engineering, SKNCOE,Vadgaon(Bk),Pune. Page 175
Proceeding of International Conference on Internet of Things,Next Generation Network & Cloud Computing 2019
system creates the cluster of algorithms. Then it identifies the dependency of the commodity on its raw materials by studying the factors of the raw material, and after learning the dependency it progresses to choosing the initial algorithm based on the factor it chose for predicting. After the selection of the algorithm, the AI tries to construct a sequence to train the machine, and finally trains the machine to understand the raw-material dependency of the commodity. In the second stage, the system creates a cluster of the raw-material data collected from the first stage. Then it learns the external factors affecting raw-material price fluctuation, for example inflation and import/export factors, and after learning the external factors it progresses to choosing the initial state of probability based on the commodity pattern. After the selection of the state of probability, the AI tries to construct a sequence to train the machine, and finally trains the machine.

Fig 2: Training Stage

5.2 Prediction Stage
In the prediction stage, our system will generate a pattern based on the historical data. The discovered pattern will then be added to the existing sequence of patterns. Using the combination of the discovered pattern and the existing sequence of patterns, the system will predict a value which we call alpha. A test then takes place to check the behavior of alpha: if, according to the test, alpha has a normal value, it is added to the existing sequence of values; otherwise it is considered an anomaly and the model is retrained.

Fig 3: Prediction Stage

5.3 Recommendation Stage
This is the final stage, where clients can access and get recommendations based on the commodity they want to buy. In this stage, the system first identifies the inventory management of the business owner and then chooses the probability state based on their sales, purchases and prices. The AI constructs a sequence from the data it was fed with and tries to implement it on the machine. After learning the inventory management, it tries to make recommendations to the business owner on the basis of the pattern. It tracks the inventory; a proprietary GIBS algorithm helps it understand the flow of the inventory, and finally the recommendation is tested. The test can have two outputs. Normal: if the test condition is satisfied, the inventory pattern is added to the system in order to recommend it in future. Anomaly: this is the demerit of the system in correctly identifying the inventory pattern, so it is sent back to the beginning, that is, to identification of inventory management.
Sr. | Year | Author(s)          | Approach                                                                                                              | Limitation
1.  | 2009 | Dawei Yin et al.   | Supervised learning was used for detecting harassment.                                                                | The experiments were done using supervised methods; the temporal ... approaches.
5.  | 2015 | Kansara & Shekokar | A framework detects abusive text messages or images from social network sites by applying SVM and Naive Bayes classifiers. | Not able to detect audio and video which are offensive.
Web Application Programming Interface (API). This makes it easy for the clients, as they are provided with a uniform interface irrespective of the client. The clients are expected to simply pass the comments made by users on their platform, in JSON format, to the proposed system's web API for evaluation. The comment then progresses through the three components of the Toxic Comment Classifier, and the response is sent back to the respective client via the web API.

System Parameters
1. Response Time - Since, typically, any online discussion platform will have several active users posting and updating comments, the evaluation of comments and the corresponding response generation must be quick, to ensure that users are not forced to wait for an unsatisfactorily long period of time. Thus, the proposed system is expected to provide a response to its clients in less than 4 seconds (assuming good network connectivity).
2. Cost - The cost associated with the proposed system is only for 'training' the machine learning model, which varies from platform to platform depending on various factors like GPU specifications, memory size, training time etc.
3. Scalability - During peak online traffic, it is important to make sure that the proposed system's response does not slow down. Thus, as the system is designed in the form of an API, it can easily be scaled up by replicating and deploying it on multiple servers so as to satisfy a larger number of incoming requests efficiently.
4. Accessibility - The proposed system is easily accessible in the form of an API to all its clients through a uniform interface.

6. CONCLUSION AND FUTURE WORK
To tackle the severe issue of abuse and harassment on social media platforms, and to improve the quality of online discussions, mitigating harmful online experiences is the need of the hour. The proposed system thus provides online social media utilities and other such discussion platforms the ability to assess the quality of users' comments by classifying them into various kinds of toxicity using techniques like Natural Language Processing and machine learning algorithms. Based on the results provided by the system, the communication platforms can decide the suitable course of action to be taken on such comments and hence ensure that their users have a better, safer and harmless online experience.
The goals of future work on toxic comment classification are to make initial admission decisions reliable, decrease the number of false calls, and make the QoS guarantees more robust in the face of network dynamics. There are users from various backgrounds and cultures who read and write in their native languages apart from English, so it may be difficult to identify toxic comments in their local languages. This problem can be countered using CNNs or deep learning in future. The system can also be improved with advancements in the fields of NLP, ML, AI, speech synthesis etc.

REFERENCES
[1] Hitesh Kumar Sharma, K Kshitiz, Shailendra. "NLP and Machine Learning Techniques for Detecting Insulting Comments on Social Networking Platforms," 2018.
[2] Pooja Parekh, Hetal Patel. "Toxic Comment Tools: A Case Study," 2017.
[3] Theodora Chu, Kylie Jue. "Comment Abuse Classification with Deep Learning."
[4] Manikandan R, Sneha Mani. "Toxic Comment Classification - An Empirical Study," 2018.
[5] Aksel Wester, Lilja Ovrelid, Erik Velldal, Hugo Lewi Hammer. "Threat Detection in Online Discussions."
[6] S. Bird, E. Klein, and E. Loper. "Natural Language Processing with Python," 2014. http://www.nltk.org/book/ch02.html
[7] J. Pennington, R. Socher, and C. D. Manning. "GloVe: Global Vectors for Word Representation," 2018. https://nlp.stanford.edu/projects/glove/
1. INTRODUCTION
With the AI assistants currently available, we can face a problem: if the microphone of the device fails, we are unable to interact with the assistant, which may interrupt the interaction. Also, while using the current assistants we are not able to visualize them; they are only virtually present, so we cannot see them. Moreover, when kids are using them, there are a few concepts that need to be visualized for better understanding.
The proposed system combines a multi-modal system with a holographic view. This draws on advancements in computer graphics and multimedia technologies for the way humans view and interact with the virtual world, such as augmented reality (AR) and hologram displays. The use of AR display devices, such as smartphones and smart glasses, allows the user to receive additional information in the form of informative graphics based on his or her field of view through the device, for example a street's name, a navigation arrow leading the user to the destination, etc. On the other hand, the use of a holographic pyramid prism can produce holographic results that display 3D objects in the real-world environment, letting the user look at different perspectives of these holograms when viewing from different angles.
This system can also be used in the education system to improve the learning experience, creating a better understanding in the minds of students. It can also be used in malls for demonstration of material: if the material is not available but will soon arrive, the customer can still view it using this Holographic AI Assistant.

2. EXISTING SYSTEM
The current existing systems are as shown below:

Fig 1: Existing Virtual AI Assistance System

As shown in fig. 1, these are the currently existing virtual AI assistance systems. They are systems which do not show the assistant in front of you. They also accept only simple input modes, that is, speech or text; they are not able to take input in the form of video frames, images, gestures, etc., and they are not very interactive.

3. PROPOSED MODEL
This proposed model gives an advanced version of the present existing system. It combines two concepts: holographic projection and an artificial intelligence assistant.

Fig 2: Architecture of Proposed System

Fig. 2 shows the architecture of the proposed system. The system consists of a transparent box, with a monitor placed in the top part of the box. Inside the box, a glass prism is set at an angle; this helps in displaying the projection. The inside projection will consist of the simple
b. Output Module:
The output module will be in the given form:

Fig.5 Output module with object

c. Interaction Module:
As mentioned by Veton Kepuska, this module consists of the way the interaction is made; it describes how the interaction is going to take place. Fig. 6 shows it [1].

Fig.6 Interaction Module

d. Natural Language Processing (NLP):
This module gives a proper understanding of NLP, which is the basic concept behind speech recognition in a multimodal system. Fig. 7 shows the NLP structure.

Fig.7 NLP

e. Knowledge Base:
The proposed system consists of two knowledge bases, one online and one offline, where all the data and facts will be stored, such as the facial and body datasets for the gesture module, the speech recognition knowledge base, the image and video datasets, and some user information related to the modules.

4. EXPERIMENTAL RESULTS
While researching the results generated when using single-modal AI assistants, we considered efficiency and correctness as important measures. With increasing functionality, the user experience concerns regarding voice recognition, visualization, and fast tracking of hand gestures, which we have introduced in the Holographic Assistant, have been a challenge to overcome.
Efficiency: In comparison with the old AI assistants, the Holographic Assistant will prove to be more accurate while using advanced technologies such as Natural Language Processing.
Accuracy: The accuracy of the holographic assistant would also be better, handling challenges like noise and accents, whereas the existing models were more error-prone.
Cost: One of the profitable things about this AI assistant is that it is almost free of cost. The overall prerequisites, apart from the available software, are a transparent glass and a monitor screen. Hence, this system would be affordable for all kinds of vendors in the market who are ready to take innovation to new levels in their businesses.

REFERENCES
[1] Veton Kepuska, Gamal Bohouta. "Next-Generation of Virtual Personal Assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home)," 2018 IEEE.
[2] Mrs. Paul Jasmin Rani, Jason Baktha Kumar, Praveen Kumaar B., Praveen Kumaar U. and Santhosh Kumar. "Voice Controlled Home Automation System Using Natural Language Processing (NLP) and Internet of Things (IoT)," 2017 Third International Conference on Science Technology Engineering & Management (ICONSTEM).
[3] Chan Vei Siang, Muhammad Ismail Mat Isham, Farhan Mohamed, Yusman Azimi Yusoff, Mohd Khalid Mokhtar, Bazli Tomi, Ali Selamat. "Interactive Holographic Application using Augmented Reality Edu Card and 3D Holographic Pyramid for Interactive and Immersive Learning," 2017 IEEE Conference on e-Learning, e-Management and e-Services (IC3e).
[4] R. Mead. "Semio: Developing a Cloud-based Platform for Multimodal Conversational AI in Social Robotics," 2017 IEEE International Conference on Consumer Electronics (ICCE).
[5] Chuk Yau and Abdul Sattar. "Developing Expert System with Soft Systems Concept," 1994 IEEE.
[6] Inchul Hwang, Jinhe Jung, Jaedeok Kim, Youngbin Shin and Jeong-Su Seol. "Architecture for Automatic Generation of User Interaction Guides with Intelligent Assistant," 2017 31st International Conference on Advanced Information.
rg881403@gmail.com4, akashkadam985@gmail.com5
ABSTRACT
The development of information technology and communication has made the implementation of artificial intelligence systems complex. These systems approach human activities such as decision support systems, robotics, natural language processing, expert systems, etc. In the modern era of technology, chatbots are the next big thing in conversational services. A chatbot is a virtual person that can effectively talk to any human being using interactive textual skills.
GENERAL TERMS
NLP - Natural Language Processing
NLU - Natural Language Understanding
NLG - Natural Language Generation
NLTK- Natural Language Toolkit
1. INTRODUCTION
Chatbots are "online human-computer dialog systems with natural language." The first conceptualization of the chatbot is attributed to Alan Turing, who asked "Can machines think?" in 1950. Since Turing, chatbot technology has improved with advances in natural language processing and machine learning. Likewise, chatbot adoption has also increased, especially with the launch of chatbot platforms by Facebook, Slack, Skype, WeChat, Line, and Telegram.
Not only that, but nowadays there are also hybrids of natural language and intelligent systems that can understand human natural language. These systems can learn by themselves and renew their knowledge by reading all the electronic articles that exist on the Internet. A human user can ask the systems questions as they usually would of another human.

2. SYSTEM ARCHITECTURE
The system architecture is the conceptual model that defines the structure, behavior, and other views of a system. An architecture description is a formal description and representation of a system, organized in a way that supports reasoning about the structures and behaviors of the system. The system architecture consists of the following blocks:

3. OVERALL DESCRIPTION
Product Perspective
Most of the search engines today, like Google, use a system (the PageRank algorithm) to rank different web pages. When a user enters a query, the query is interpreted as keywords and the system returns a list of the highest-ranked web pages, which may have the answer to the query. Then the user must go through the list of webpages to find the answer they are looking for.
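The ranking idea mentioned above can be sketched with a small power-iteration PageRank example over a hypothetical four-page link graph; the graph and damping value follow the standard textbook formulation, not any specific search engine's implementation:

```python
# Minimal power-iteration PageRank sketch over a made-up 4-page link graph.
links = {            # page -> pages it links to (illustrative graph)
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}       # start from a uniform rank
    for _ in range(iters):
        # Each page keeps the teleport share, then receives link shares.
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs)
            for q in outs:
                new[q] += damping * share
        rank = new
    return rank

rank = pagerank(links)
best = max(rank, key=rank.get)
print(best)  # "C" collects the most link weight in this graph
```

Because every page here has outgoing links, the total rank mass stays at 1.0 across iterations; the page with the most incoming link weight ends up ranked highest.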
chatbot is not a challenging task as compared to complex chatbots, and developers should understand and consider the stability, scalability and flexibility issues, along with a high level of attention to human language. In short, the chatbot field is moving quite fast, and with the passage of time new features are added to the existing platforms. Recent advancements in machine learning techniques may be able to handle complex conversation issues, such as payments, correctly.

6. FUTURE SCOPE
The future scope of our application is extending the knowledge database with more advanced datasets and including support for more languages as well, and providing users with more detailed reports of their previous performance, so as to improve the pace of users' skill development. We also plan to extend the web application into native mobile apps.

7. ACKNOWLEDGMENTS
We would like to take this opportunity to thank our internal guide Prof. G. Y. Gunjal for giving us all the help and guidance we needed. We are really grateful to him for his kind support; his valuable suggestions were very helpful.
We are also grateful to Dr. P. N. Mahalle, Head of the Computer Engineering Department, STES' Smt. Kashibai Navale College of Engineering, for his indispensable guidance, support and suggestions.

REFERENCES
[1] A. M. Rahman, Abdullah Al Mamun, Alma Islam. "Programming Challenges of Chatbot: Current and Future Prospective," Region 10 Humanitarian Technology Conference (2017).
[2] Bayu Setiaji, Ferry Wahyu Wibowo. "Chatbot Using a Knowledge in Database," 7th International Conference on Intelligent Systems, Modelling and Simulation (2016).
[3] Anirudh Khanna, Bishwajeet Pandey, Kushagra Vashishta, Kartik Kalia, Bhale Pradeepkumar,
[4] Teerath Das. "A Study of Today's A.I. through Chatbots and Rediscovery of Machine Intelligence," International Journal of u- and e-Service, Science and Technology, Vol. 8, No. 7 (2015).
[5] Sameera A. Abdul-Kader, Dr. John Woods. "Survey on Chatbot Design Techniques in Speech Conversation Systems," International Journal of Advanced Computer Science and Applications, Vol. 6, No. 7 (2015).
[6] https://www.altoros.com/blog/how-tensorflow-can-help-to-perform-natural-language-processing-taksk
https://media.readthedocs.org/pdf/nltk/latest/nltk.pdf
The second stage is padding the tokens of variable length; for this, the pad_sequences() function in the Keras deep learning library can be used to pad variable-length sequences. The default pad value is 0.0, which is suitable for almost every application, although this can be changed by specifying the preferred value via the "value" argument. Whether padding is applied at the start or at the end of the sequence, called pre- or post-sequence padding, can be specified via the "padding" argument.
Text data requires special preparation before you can start using it for predictive modelling. The text must first be parsed into words, called tokenisation. Then the words need to be encoded as integers or floating-point values for use as input to a machine learning algorithm, called text encoding. Once this encoding process is completed, the text tokens are ready for the embedding process.
One way is to create a co-occurrence matrix. A co-occurrence matrix is a matrix that consists of the counts of each word appearing next to all the other words in the corpus (or training set). Consider the following matrix.

Table 2 - Word Embedding Table

We are able to gain useful insights. For example, take the words 'love' and 'like': both contain 1 for their counts with nouns like NLP and dogs. They also have 1's for each occurrence of "I", which indicates that the words must be some sort of verb. These features are learnt by the NN, as this is an unsupervised method of learning. Each of the vectors has several sets of characteristics. For example, let us take V(King) -
automatically uses the GPU wherever and whenever possible with the help of CuDNNLSTM, a high-level Keras/TensorFlow neural network layer which runs the model on the GPU (an Nvidia GPU) using CUDA technology. CUDA is NVIDIA's parallel computing architecture, which enables dramatic increases in computing performance by harnessing the power of the GPU (graphics processing unit). CuDNNLSTM is a fast LSTM implementation backed by cuDNN; the execution of model training gets faster by 12 to 15% depending on the data.

5.1 FIGURES/CAPTIONS
This diagram depicts the actual working of the proposed system and all the functionalities it will perform. Model formation for fake news detection makes use of the training and test datasets and some other parameters, such as the dimensions of the vector space in which it holds the relation between two or more news entities. All these data are passed into the main function, which is meant to generate the confusion matrix and present the result in terms of percentage.

Fig 5 - Working of proposed model

Initially the system stores the gathered news in a database, which is then retrieved by the model; the model processes the training data and produces the classifier. The user is supposed to enter a news item manually, which is assumed to be unverified. Once the input is given via the web portal, it reaches the model in the backend, which processes it and gives the output. The news given by the user is taken as a test set or test case and is sent to the classifier, which classifies it.

6. CONCLUSION
The circulation of fake news online not only jeopardises the news industry but has been negatively impacting users' minds, and they tend to believe all the information they read online. It has the power to dictate the fate of a country or even the whole world, and the daily decisions of the public are also affected. Applying the proposed model would definitely help in differentiating between fake and real news.

REFERENCES
[1] Sadia Afroz, Michael Brennan, and Rachel Greenstadt. Detecting hoaxes, frauds, and deception in writing style online. In ISSP'12.
[2] Hunt Allcott and Matthew Gentzkow. Social media and fake news in the 2016 election. Technical report, National Bureau of Economic Research, 2017.
[3] Meital Balmas. When fake news becomes real: Combined exposure to multiple news sources and political attitudes of inefficacy, alienation, and cynicism. Communication Research, 41(3):430-454, 2014.
[4] Alessandro Bessi and Emilio Ferrara. Social bots distort the 2016 US presidential election online discussion. First Monday, 21(11), 2016.
[5] Prakhar Biyani, Kostas Tsioutsiouliklis, and John Blackmer. "8 amazing secrets for getting more clicks": Detecting clickbaits in news streams using article informality. In AAAI'16.
[6] Thomas G. Dietterich et al. Ensemble methods in machine learning. Multiple Classifier Systems, 1857:1-15, 2000.
[7] Kaggle, Fake News NLP Stuff. https://www.kaggle.com/rksriram312/fake-news-nlp-stuff/notebook
[8] Kaggle, All the News. https://www.kaggle.com/snapcrack/all-the-news
[9] Mykhailo Granik, Volodymyr Mesyura. "Fake News Detection Using Naive Bayes," 2017.
[10] Sohan Mone, Devyani Choudhary, Ayush Singhania. "Fake News Identification," 2017.
ABSTRACT
Big data can play an important role in data science and the healthcare industry, to manage data and easily utilize all of it in a proper way with the help of the "V6s" (Velocity, Volume, Variety, Value, Variability, and Veracity). The main goal of this paper is to provide an in-depth analysis of the field of medical science and healthcare data analysis; it also focuses on previous strategies in healthcare as well as medical science. The digitization process has reached medical science (MS) and the Healthcare Industry (HI), and hence produces massive amounts of patient-related data for analysis, giving a 360-degree view of the patient for analysis and prediction. This helps to improve healthcare activities such as clinical practice, new drug development, and the financial processes of healthcare, and it brings many benefits to healthcare activities such as early disease detection, fraud detection, and better healthcare quality and efficiency. This paper introduces big data analytics techniques, their challenges in healthcare, and their benefits, applications and opportunities in medical science and healthcare.
General Terms
Hadoop, Map-Reduce, Healthcare Big-Data, Medicals, Pathologist.
Keywords
Healthcare Industry (HI), R, Data Analytics (DA), Smart-Health (SH).
1. INTRODUCTION
The main goal of this paper is to provide the best predictive analysis solutions to researchers, academicians, healthcare industries and medical science industries who have an interest in big data analytics for the specific healthcare and medical science industries.
We know that all healthcare industries and medical science researchers depend on data for analysis and processing, and that this data is generated from government hospitals' and private clinics' collaborative records of every old and new patient, in differently structured forms known as big data. Big data can thus be processed and identified with the help of the big data characteristics: the V6's (Volume, Velocity, Variety, Value, Variability, Veracity), used to achieve the dedicated outcomes.
1. Volume: The data size is big/huge, e.g. terabytes (TB), petabytes (PB), zettabytes (ZB), etc.
2. Velocity: Data can be generated at high speed, e.g. data generated per day, per hour, per minute, per second, etc.
3. Variety: Data can be represented in different types, i.e. structured, unstructured and semi-structured data; for example, data from email messages, articles, streamed videos and audio, etc.
4. Value: Data has some valuable information or insight within it; there will be useful information somewhere within the data for outcomes.
5. Variability: Data can change during processing; it may produce some unexpected, hidden and valuable information.
6. Veracity: This focuses on two terms, data trustworthiness and data consistency; we can also say the data is in doubt, meaning ambiguity, incompleteness and uncertainty due to data inconsistency.
scheduling; biometric data can also be considered, like fingerprints, handwriting and iris scans, etc. [1].

3. HEALTHCARE PATIENT RECORD CHALLENGES
In any hospital or private clinic, a big challenge is managing and analyzing the big data of any new or existing patient. The electronic record of a patient can be composed of structured and semi-structured data and instrumental recordings of health tests, while unstructured data consists of handwritten notes, patients' admission and relieving records, prescription records, etc. The data may also be web-based, machine-based, biometric-based, or generated by humans (e.g. Twitter, Facebook, sensors, remote devices, fingerprints, X-rays, scans, EMRs, mails, etc.). These conventional records and digital data are combined into Healthcare Big Data (HBD).
The execution of big data is the most challenging task; hence, most researchers have suggested installing big data tools on a standalone system. Big data is generally considered voluminous data, and its processing and execution should be on distributed nodes. Hence we need some knowledge of data analysis techniques to make healthcare decisions in a better way, which will help with active enhancement. For processing and analysis we have some open-source tools for distributed data processing [6].
Big data in healthcare science and industry is changing the patient and doctor healthcare system: because voluminous data is involved, it leads to more efficient and scalable healthcare, so it can be useful for every patient and hospital to handle each patient's record easily. Big data generally consists of huge, voluminous data, and its processing and execution are carried out on distributed nodes. We know that for the processing and execution of any voluminous data from a distributed system, Big-Data Analytic (BDA) tools are mostly recommended; without any doubt, these analysis tools are beneficial and useful for healthcare.

4. BIG-DATA ANALYTIC TOOL
In the healthcare industry, the big problem is the processing and execution of data; every hospital and clinic suffers in managing the big data of patients, and its processing and execution is a difficult task. Big data analytics tools therefore play an important role in processing it easily, in two different ways: centralized and distributed [1].
The BDA tools are naturally complex, with widespread programming and multi-skill applications combined under one roof, so they are not user-friendly, and the complexity of the process grows with the data itself. For this system, different types of data need to be combined, and then the raw data is transformed for multiple availability points.
Regarding how big data is supporting the entire healthcare industry, the industry actually benefits from these initiatives. In this paper we have focused on three areas of big data analytics, intended to provide a perspective on broad and popular research areas where the concept of big data analytics is currently being applied. These areas are:
1. Healthcare industry aspect with BDA,
2. Impact of Big-Data in Healthcare,
3. Opportunities and Applications of Big data in Healthcare.

4.1 Healthcare Industry Aspect with BDA
The healthcare industry is not only one of the largest industries, it is also one of the most complex in nature, with many patients constantly demanding better care and management. Big data in the healthcare industry, along with industry analytics, has made a mark on healthcare, but one important point to be noted here is the security concern, and it requires better programming skills, as end-user skills are not assumed. The healthcare industry has
some limitations with big data: security, privacy, ownership, and its standards are not yet proposed.

4.2 Impact of Big-Data in the Healthcare Industry
In the healthcare industry, big data has changed everything with respect to data processing and execution, including in hospitals and clinics. Here we have focused on some relevant connections to the information [1].

4.2.1 High Risk Patient Care
We know that healthcare costs and complications keep increasing for many patients in emergency care. Due to the higher cost it is not beneficial for poor patients, and many patients do not take advantage of it, so implementing change in this department will be an advantage and the hospital will work properly [1]. If all records are digitized, patient patterns can be identified more effectively and quickly; this will directly help reduce the time of check-ups and of applying the proper treatment, and will also help in monitoring patients at high risk of problems and ensuring more effective, customized treatment. Lack of data makes the creation of patient-centric care programs more difficult, so one can clearly understand why big data utilization is important in the healthcare industry. It clearly identifies and processes, with zero error in the execution flow, the patient check-up and the maintenance of the patient's record with all treatment details; hence big data analytics tools are needed in the healthcare industry [3].

4.2.2 Cost Reduction
Generally, we know that various hospitals, clinics and medical institutions face high levels of financial waste due to improper financial management; it happens because of overbooking of staff. They will also gain a financial advantage by backing health trackers as well as wearables, to make sure patients don't actually exceed their hospital stay. Patients could also benefit from this change, lowering their waiting time by having immediate access to staff and beds. The analysis will reduce staffing needs and bed shortages [4].

4.2.3 Patient Health Tracking
Identifying potential health problems before they develop and turn into aggravating issues is an important goal for all organizations functioning in the industry. Due to lack of data, the system has not always been able to avoid situations that could easily have been prevented otherwise. Patient health tracking is another strong benefit that comes with big data, as well as with Internet of Things tech resources [2].

4.2.4 Patient Engagement Could Be Enhanced
Through big data and analytics, an increase in patient engagement could also be obtained. Drawing consumers' interest towards wearables and various health-tracking devices would certainly bring a positive change in the healthcare industry, with a noticeable decrease in emergency cases potentially being reached. With more patients understanding the importance of these devices, physicians' jobs will be simplified, and an engagement boost could be obtained through big data initiatives, once again [2, 3].

4.3 Opportunities and Applications of Big-Data in Healthcare and Medical Industry
We have mentioned big data's role in the first and second sections of this paper; big data
Through predictive analysis, this specific can provide major support with all
problem can be solved, being far easier to different aspect in healthcare. We know
access help for effective allocation of staff that big data analytics (BDA) has gained
together with admission rate prediction [7, traction in genomics, clinical outcomes,
8]. Hospital investments will thus be fraud detection, personalized patient care
optimized, reducing the investment rate and pharmaceutical development; likewise
when necessary. The insurance industry there are so many potential applications in
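The admission-rate prediction mentioned in Section 4.2.2 can be illustrated with a minimal sketch: a least-squares trend line fitted to a week of (hypothetical) daily admission counts, with the predicted load converted into a staffing estimate. The numbers and the `patients_per_nurse` ratio are illustrative assumptions, not from the paper.

```python
import math

# Hypothetical daily admission counts for the past week.
ADMISSIONS = [40, 42, 41, 45, 47, 46, 50]

def fit_trend(values):
    """Ordinary least-squares line through (day index, count) points."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x  # (slope, intercept)

def predict_next(values):
    """Extrapolate the fitted line one day ahead."""
    slope, intercept = fit_trend(values)
    return intercept + slope * len(values)

def staff_needed(expected_admissions, patients_per_nurse=5):
    """Convert a predicted admission count into a nurse headcount."""
    return math.ceil(expected_admissions / patients_per_nurse)

expected = predict_next(ADMISSIONS)  # about 50.7 admissions tomorrow
nurses = staff_needed(expected)
```

A production system would of course use richer predictors (seasonality, department-level data), but the idea of turning a prediction into an allocation decision is the same.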
healthcare and medical science areas; some of these applications are given in Section 4.2 on the impact of big data in the healthcare industry. The following table shows some of the important application areas of big data in the healthcare industry and medical science.

Application Areas | Business Problems | Big Data Types
Healthcare | Fraud detection | Machine-generated, transaction data, human-generated
Healthcare | Genomics | Electronic health record, personal health record
Healthcare | Behavioral and patient sentiment data | Facebook, Twitter, LinkedIn, blogs, smartphones
Science and Technology | Utilities: predict power consumption | Machine-generated data

Table 1: Big data Applications in Healthcare

For healthcare-system big data, Hadoop with the MapReduce framework is most suitable for storing a wide range of healthcare data types, including electronic medical records, genomic data, financial data, claims data, etc. It has higher scalability, reliability and availability than a traditional database management system. The Hadoop MapReduce system will increase system throughput and can process huge amounts of data with proper execution, so it is helpful for the healthcare industry and medical science [5].
Big data analytics tools are widely considered for complex applications and are widely used in the healthcare industry to manage all types of data under one roof with a distributed architecture. In the following architecture we give a basic idea of the different incoming sources of big data, which can be considered as raw data: external, internal, multiple locations, multiple formats, and applications [5, 6]. Raw data from different sources can be transformed on middleware with Extract, Transform, Load (ETL) into a traditional format. With the transformed data, we use big data platforms and tools to process and analyze it. Then we use the actual big data analytics applications [2].
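The MapReduce processing model described above can be sketched in plain Python. This is a hedged illustration of the map/shuffle/reduce idea, not actual Hadoop: hypothetical claim records spread over two "nodes" are mapped to (diagnosis, 1) pairs and reduced to per-diagnosis totals.

```python
from collections import defaultdict

# Hypothetical claim records on two "nodes"; the (claim_id, diagnosis)
# layout is illustrative, not from the paper.
NODE_DATA = [
    [("C001", "diabetes"), ("C002", "cardiac")],   # node 1
    [("C003", "diabetes"), ("C004", "diabetes")],  # node 2
]

def map_phase(records):
    """Map: emit a (diagnosis, 1) pair for every claim record."""
    return [(diagnosis, 1) for _claim_id, diagnosis in records]

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each diagnosis key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# The shuffle step simply gathers the mapper outputs from all nodes.
pairs = [p for node in NODE_DATA for p in map_phase(node)]
claim_counts = reduce_phase(pairs)
```

In real Hadoop, the map and reduce functions run on distributed nodes and the framework performs the shuffle, which is where the throughput gains described above come from.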
5. TECHNOLOGY AND METHODOLOGY PROGRESS IN BIG DATA
Big data plays an important role in every field through big data analytics tools, but here we have focused on the healthcare/medical science field. In the medical and healthcare field, a large amount of data is generated about patients' medical histories, symptoms, diagnoses and responses to treatments and therapies. Data mining is sometimes used here for finding interesting patterns in healthcare data with analytics tools, with the help of the Electronic Patient Record (EPR) of each patient [1].
In this architecture we have shown all the different areas of application covered by big data analytics tools, but here we have focused more on healthcare industry and medical science applications.

6. BIG DATA CHALLENGES IN HEALTHCARE
Because of the big data characteristics, i.e. the six Vs, it is difficult to store big amounts of data, and also difficult to search, visualize, retrieve and curate them. There are many challenges in healthcare applications; some of the major challenges in healthcare are listed below [4].
1. It is difficult to analyze and aggregate unstructured data from different hospitals and clinics, e.g. from EMRs, notes, scans, etc.
2. The data provided by many hospitals and clinics are not accurate in terms of quality factors, so they are sometimes difficult to analyze with BDA.
3. Analyzing genomic data is a computationally difficult task.
4. Data hackers can damage big data.
5. Information security is a big challenge in big data.
treatment and hospital activity with all doctor management, with prior appointments of every patient, department-wise. These challenges are mostly considered for future research on the role of Big Data Analytics tools in the healthcare industry and medical science, such as privacy-preserving data mining over sensor data and electronic patient records. In healthcare this type of change is necessary for sentiment analysis of big data in healthcare science with patient personalized data and behavioral data. From the researcher's point of view, big data is the best solution for the healthcare industry and medical science. We know that in future, data will be generated rapidly, so future-generation healthcare big data will apply to a vast range of applications in the healthcare industry and society. In this paper we have given many BDA tools for the healthcare industry as a solution, and it will establish efficient and cost-effective quality management using a data cluster manager.
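Challenges 4 and 5 above, and the privacy-preserving data mining mentioned as future work, can be illustrated with a minimal de-identification sketch: patient IDs are replaced with salted hashes and direct identifiers are dropped before analysis. The record fields and the salt are hypothetical, and this is only one small piece of privacy-preserving processing, not the paper's method.

```python
import hashlib

SALT = "example-salt"  # hypothetical; a real deployment keeps this secret

DIRECT_IDENTIFIERS = {"name", "phone", "address"}

def deidentify(record, salt=SALT):
    """Replace the patient ID with a salted hash and drop direct
    identifiers, keeping only the analysis-relevant fields."""
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS and k != "patient_id"}
    digest = hashlib.sha256((salt + record["patient_id"]).encode()).hexdigest()
    cleaned["pseudonym"] = digest[:16]  # shortened for readability
    return cleaned

record = {"patient_id": "P42", "name": "A. Patient",
          "phone": "555-0100", "diagnosis": "diabetes"}
safe = deidentify(record)
```

The same salt maps the same patient to the same pseudonym, so records can still be linked for analysis without exposing the original identifier.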
Access Control for Big Data Applications," in IEEE Cloud Computing, vol. 1, no. 3, pp. 65–71, Sept. 2014.
[8] A. McAfee, E. Brynjolfsson, T. H. Davenport, D. J. Patil, and D. Barton, "Big data: the management revolution," Harvard Business Review, vol. 90, no. 10, pp. 60–68, 2012.
conversation, which forces the user to type the same thing again and again. This can be cumbersome for customers and annoy them because of the effort required.
3) Due to fixed programs, chatbots can get stuck if an unknown query is presented to them. This can lead to customer dissatisfaction and result in losses. The multiple messaging can also be taxing for users and deteriorate the overall experience on the website.
Chatbots are installed with the motive of speeding up responses and improving customer interaction. However, due to limited data availability and the time required for self-updating, this process appears more time-consuming and expensive. Therefore, instead of attending to several customers at a time, chatbots appear confused about how to communicate with people.
Starbucks
Starbucks has developed an Android and iOS application to place an order for a favorite drink or snack. The order can be placed with the help of voice commands or text messaging.
Spotify
Spotify chatbots allow users to search for and listen to their favorite music. They also allow users to share music.
Whole Foods
Whole Foods is related to groceries and food material. It allows users to search for grocery items to shop for. It also provides interesting recipes for users to try.
Sephora
Sephora is associated with makeup material such as foundation, face primer, concealer, blush, highlighter, etc. Sephora chatbots also suggest makeup tutorials in which the user is interested.
Pizza Hut
The Pizza Hut chatbot can help a customer to order pizza with favorite toppings and carryout delivery. A customer can reorder a favorite pizza based on previous orders and can ask questions about current deals.
SnapTravel
SnapTravel helps users to book hotels according to their convenient location and timings. A customer can also get to know about current deals available at various hotels and resorts.
1-800 Flowers
1-800 Flowers helps customers to gift flowers and gifts to someone for events like a birthday, an anniversary or any special occasion. It also offers gift suggestions to customers.
These available chatbots are related to only a few categories. Our aim is to combine all the categories together in a single place and integrate them with chatbots for customer service.

3. PROPOSED WORK
The proposed system, E-commerce with Chatbots, will permit consolidation of customer login, browsing and purchasing of the available products, managing orders and payments, engaging customers with personalized marketing, and qualifying recommendations based on history. The main users of the project are customers who want to shop for various products and services online.
From the end-user perspective, the proposed system consists of these functional elements: a login module to access online products and services, browsing and searching products, purchasing and paying for products, and communicating with the chatbot for better product and offer recommendations.
According to the back-end logic, Natural Language Processing (NLP) will be used to understand messages sent by the user through the messaging platform. The chatbot will launch an action as an answer, with real-time information, based on machine learning algorithms such as supervised and unsupervised learning. The bot will improve with the increasing number of messages received.
The important features of the system are handling thousands of customers simultaneously, which will provide better satisfaction to customers. Also, it will be a virtual but personal
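The supervised back-end logic described above (classifying a user's message so the bot can launch an action) can be sketched with a tiny pure-Python Naive Bayes intent classifier. The training phrases and intent names are hypothetical, and a real system would use a proper NLP pipeline rather than whitespace tokenization.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training messages: (text, intent).
TRAINING = [
    ("where is my order", "track_order"),
    ("track my package", "track_order"),
    ("has my order shipped", "track_order"),
    ("show me current deals", "deals"),
    ("any discounts today", "deals"),
    ("what offers are available", "deals"),
    ("recommend a gift", "recommend"),
    ("suggest something to buy", "recommend"),
]

def train(examples):
    """Count word frequencies per intent and intent frequencies."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for text, intent in examples:
        intent_counts[intent] += 1
        for w in text.lower().split():
            word_counts[intent][w] += 1
            vocab.add(w)
    return word_counts, intent_counts, vocab

def classify(text, word_counts, intent_counts, vocab):
    """Pick the intent with the highest Laplace-smoothed log probability."""
    words = [w for w in text.lower().split() if w in vocab]
    total = sum(intent_counts.values())
    best, best_lp = None, float("-inf")
    for intent, n in intent_counts.items():
        lp = math.log(n / total)
        denom = sum(word_counts[intent].values()) + len(vocab)
        for w in words:
            lp += math.log((word_counts[intent][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = intent, lp
    return best

word_counts, intent_counts, vocab = train(TRAINING)
```

The predicted intent would then be mapped to a concrete action (a tracking lookup, a deals page, a recommendation call), and retraining on new messages is what lets the bot "improve with the increasing number of messages received".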
assistant for each customer. Similar to chatbots, e-commerce has become one of the preferred ways of shopping, as customers enjoy shopping online because of its easiness and convenience. The combination of an e-commerce site with AI-assisted chatbots will provide better customer service and profitable sales through personalized marketing.
The associated risks, such as privacy issues, can be handled with the help of authentication and authorization to provide strong access-control measures. Intellectual-property-related risks can be avoided by proper instructions for uploading data, with restrictions. Online security is the most important risk to be considered while developing the system, with regard to customers' credentials and the online products and services available. Data storage could be a risk associated with chatbots, as they store information to interact with the users. The best solution in this situation is to store the data in a secure place for a certain amount of time and to delete it after that.

3.1 User Classes and Characteristics
There are essentially three classes of users of the proposed system: the general users, the customers and the administrators. General users will be able to see and browse through the products available for purchase, but they cannot buy the products and services. Customers are the users of the e-commerce system who will be able to browse, purchase, pay and add products and services to the cart with the available functionality. Chatbots will help them to make a purchase decision based on various criteria and suggestions by the chatbot algorithms. Customers can also write reviews or feedback on the products and services they purchased.
The administrators will have advanced functionality to add, edit, update and delete products available in the inventory. The administrator will also be able to authorize and authenticate the users logged into the system. The administrator will be able to see daily sales and details about deliveries. He will be able to see the feedback or reviews given by the customers.

3.2 Assumptions and Dependencies
A few assumptions can be made while developing the proposed system:
A user has an active Internet connection or has access to view the website.
A user runs an operating system which supports Internet browsing.
The website will not violate any Internet ethics or cultural rules and won't be blocked by the telecom companies.
A user must have basic knowledge of English and computer functionalities.

3.3 Communication Interface
The system should use the HTTPS protocol for communication over the Internet, and the intranet communication will be through the TCP/IP protocol suite, as the users are connected to the system through an Internet interface. The user must have a web browser with a registered SSL certificate.

3.4 System Architecture
Systems design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems theory to product development. There is some overlap with the disciplines of systems analysis, systems architecture and systems engineering.
The system architecture includes the modules used in the project and the relationships between them based on data flow and processing. AI-Assisted Chatbots for E-Commerce System consists of the following components:
General User
Customer
Administrator
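The three user classes of Section 3.1 and their differing capabilities can be sketched as a simple role-permission check. The role names come from the paper; the permission names are hypothetical labels for the actions it describes.

```python
# Role -> allowed actions, following Section 3.1: general users only browse,
# customers also purchase and review, administrators manage the catalog.
PERMISSIONS = {
    "general_user": {"browse"},
    "customer": {"browse", "purchase", "review", "chat"},
    "administrator": {"browse", "manage_inventory", "view_sales", "view_reviews"},
}

def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())
```

Authorization checks like this would run after authentication (the username/password login described above), so that, for example, only an authenticated administrator can reach inventory management.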
inventories which are to be purchased. The chatbot will use the search and purchase history, if the user is authenticated by the system, for product suggestions and recommendations.
A customer can modify his/her profile or account created in the proposed system. Before updating the profile, the user needs to authenticate that he/she is the original user of the account or profile by providing login credentials such as username and password. After this, the customer needs to provide the attributes to be updated, such as address, phone number, mail ID, credit card details, etc.
A customer can interact with the chatbot to make a purchase decision. The chatbot will interact with a customer based on the customer's browse or search history and purchase history. The chatbot will make use of the customer's record to suggest products from the inventory for the customer to purchase. Initially, chatbots will interact with a basic set of rules designed with machine learning algorithms. With more interaction with customers, chatbots will also improve at recommending products to customers based on their search and purchase history.
The e-commerce website's home page is designed with important features such as deals on specific products, best-seller products, discounts on products, various categories of products to choose from, a feature to log in to the system, and a way to interact with the chatbot. A user can browse through these categories to view various products such as clothes, accessories, beauty products, shoes, bags, etc.
The inventory is a collection of all categories of products. The administrator is allowed to add products to the inventory, separated by category. For example, Clothes is a category of products such as shirts, tops, t-shirts, jeans, skirts, party-wear dresses, etc. Similarly, various products of different categories can be added to the inventory by the administrator. The administrator needs to log in to the system with the help of login credentials such as username and password before managing the inventory.
The administrator can add new products with new or existing categories, along with descriptions and images, and add quantities of products already listed in the inventory. A customer who has logged into the system can search for and add these products to the shopping cart. The customer can also purchase the products in the inventory or the products added to the shopping cart.
A shopping cart is temporary storage to save the products which a customer may want to purchase in the future. The shopping cart is separate storage for each individual customer who has logged into the proposed system. The products added to the shopping cart can be purchased by the customer. To purchase a product, the customer needs to provide related information such as name, address, phone number, date of delivery, shipping type, payment method, and credit card details in the case of card payment. A customer can modify the shopping cart items: the customer can either purchase the products in the shopping cart or remove products from it.
The purchase history is recorded in the form of invoice reports, order reports and transaction reports. An invoice is generated after the customer purchases products from the inventory; it includes all the details of the purchase and the transactions made by the customer. It includes the details of the products purchased, such as price, quantity, product ID and product category, along with customer details such as name, delivery address, shipping type, date of delivery, phone number and payment method. All the details regarding purchase history are used by the chatbot to interact with the customer, based on his/her history, to suggest or recommend products. A customer can track the shipment of an order based on the invoices recorded or transactions saved to
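The shopping-cart behaviour described above (per-customer temporary storage, adding and removing items, and converting the cart into a purchase) can be sketched as a minimal class. The class and field names are illustrative, not from the paper.

```python
class ShoppingCart:
    """Temporary per-customer storage for products the customer may buy later."""

    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.items = {}  # product_id -> quantity

    def add(self, product_id, quantity=1):
        """Add a product (or more of an existing one) to the cart."""
        self.items[product_id] = self.items.get(product_id, 0) + quantity

    def remove(self, product_id):
        """Drop a product from the cart if present."""
        self.items.pop(product_id, None)

    def purchase(self, price_list):
        """Compute the order total from a price table and empty the cart."""
        total = sum(price_list[p] * q for p, q in self.items.items())
        self.items.clear()
        return total

cart = ShoppingCart("c123")
cart.add("shirt", 2)
cart.add("jeans")
total = cart.purchase({"shirt": 10.0, "jeans": 25.0})
```

In the full system the `purchase` step would additionally collect the delivery and payment details listed above and emit an invoice record for the purchase history.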
his/her profile. The online tracking of an order can help the customer to locate his/her product.
Sales and marketing involve the techniques used to suggest, through advertisements, that a customer purchase particular products. This is done based on the keywords searched by customers for the products they want to purchase. Advertising products is a way of marketing to increase product sales. All these things are managed by the administrator to maximize sales of products with the help of marketing.

4. PERFORMANCE ANALYSIS
The performance of the proposed system can be analyzed based on a few parameters. These parameters can be used to measure the performance of the system in comparison with the existing system. The parameters that can be used for this analysis are:
Human-Machine Interaction
To provide better interaction between human and machine, AI concepts such as Artificial Neural Networks (ANN), Natural Language Processing (NLP) and machine learning algorithms are used in the proposed system. Human users will interact with the system through the chatbot, which is a software program designed to communicate with the user. These machine learning algorithms will help the chatbot to generate responses using supervised and unsupervised algorithms. This will enhance the performance of the proposed system as compared to the existing system, since fixed programs are used for the chatbot in the existing system.
Better Recommendations to the User
Recommendations are the suggestions provided to the user based on the search or browse history and purchase history of that particular user. The recommendations provided by the chatbot can be in the form of product recommendations with links and updates on the latest products. To facilitate customer service and support, recommendations will play a more important role by providing personalized suggestions. This will help customers to make purchase decisions, which will increase profitable sales through personalized marketing. It will improve the performance of the proposed system due to personal recommendations.
Use of AI Concepts
AI concepts such as Artificial Neural Networks (ANN), Natural Language Processing (NLP), and Machine Learning (ML) algorithms are used in the proposed system. Machine learning algorithms such as supervised and unsupervised algorithms will improve the performance of the proposed system. Linear regression is capable of predictive modelling and of minimizing the risk of failure; it is used for predicting responses with better accuracy and makes use of the relationship between input values and output values. The Naïve Bayes algorithm is used on large data sets for ranking or indexing purposes; it will help to rank the products based on customer reviews. Semi-supervised algorithms will help to handle the combination of both labelled and unlabelled data. NLP is useful for letting a machine understand human language and generate responses in human language. It will make use of elements of Named Entity Recognition, Speech Recognition, Sentiment Analysis and OCR. All these concepts will help to enhance the performance of the proposed system.

5. CONCLUSION
The Internet has become a major resource in modern business; thus electronic shopping has gained significance not only from the entrepreneur's but also from the customer's point of view. For the entrepreneur, electronic shopping generates new business opportunities and, for the customer, it makes comparative shopping possible. As per a survey, most consumers of online stores are impulsive and usually make a decision to stay on a site within the first few seconds. Hence we have designed the project
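The personalized recommendations based on purchase history can be sketched with a minimal "customers who bought X also bought Y" co-occurrence scorer. This is only one illustrative way to use purchase history, not the paper's algorithm, and the purchase data is hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories: customer -> set of product IDs.
HISTORY = {
    "alice": {"shirt", "jeans", "shoes"},
    "bob": {"jeans", "shoes", "bag"},
    "carol": {"shirt", "bag"},
    "dave": {"jeans", "shoes"},
}

def recommend(customer, history, top_n=2):
    """Rank products the customer has not bought by how often they co-occur
    with the customer's own purchases in other customers' baskets."""
    owned = history[customer]
    scores = Counter()
    for other, basket in history.items():
        if other == customer:
            continue
        overlap = len(owned & basket)
        if overlap:
            for product in basket - owned:
                scores[product] += overlap
    return [p for p, _ in scores.most_common(top_n)]

suggestions = recommend("carol", HISTORY)
```

The chatbot would surface such suggestions during a conversation; richer signals (browse history, review sentiment) would be added as further scoring terms.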
to provide the user with easy navigation, retrieval of data and the necessary feedback as much as possible.
As we have seen in this project, the process of creating a user-friendly and straightforward platform that facilitates the administrator's job is one filled with complexity. From understanding user requirements to system design and finally system prototyping and finalization, every step requires in-depth understanding and commitment towards achieving the objective of the project.
So this is an efficient and effective way for customers to purchase products online, with the help of a chatbot, within a few steps. With the help of the e-commerce website, sellers can reach a larger audience, and with the help of chatbots, sales can be increased by personal interaction with the users. In this way, this application provides an optimized solution with better availability, maintainability and usability.

REFERENCES
[1] Adhitya Bhawiyuga, M. Ali Fauzi, Eko Sakti Pramukantoro, Widhi Yahya, "Design of E-Commerce Chat Robot for Automatically Answering Customer Question", University of Brawijaya, Malang, Republic of Indonesia, 2017.
[2] Anwesh Marwade, Nakul Kumar, Shubham Mundada, and Jagannath Aghav, "Augmenting E-Commerce Product Recommendations by Analyzing Customer Personality", 2017.
[3] Bayu Setiaji, Ferry Wahyu Wibowo, "Chatbot Using A Knowledge in Database", 2017.
[4] Abdul-Kader, S. A., & Woods, J., "Survey on chatbot design techniques in speech conversation systems", International J. Adv. Computer Science Application, 2015.
[5] Godson Michael D'silva, Sanket Thakare, Sharddha More, and Jeril Kuriakose, "Real World Smart Chatbot for Customer Care using a Software as a Service (SaaS) Architecture", 2017.
[6] S. J. du Preez, M. Lall, S. Sinha, "An Intelligent Web-Based Voice Chat Bot", 2009.
[7] Cyril Joe Baby, Faizan Ayyub Khan, Swathi J. N., "Home Automation using IoT and a Chatbot using Natural Language Processing", 2017.
[8] Ellis Pratt, "Artificial Intelligence and Chatbots in Technical Communication", 2017.
[9] Bayan Abu Shawar, Arab Open University, Information Technology Department, Jordan, "Integrating Computer Assisted Learning Language Systems with Chatbots as Conversational Partners", 2017.
[10] Aditya Deshpande, Alisha Shahane, Darshana Gadre, Mrunmayi Deshpande, Prof. Dr. Prachi M. Joshi, "A Survey Of Various Chatbot Implementation Techniques", International Journal of Computer Engineering and Applications, Volume XI, May 2017.
[11] Sameera A. Abdul-Kader, Dr. John Woods, "Survey on Chatbot Design Techniques in Speech Conversation Systems", International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 6, 2015.
[12] M. J. Pereira and L. Coheur, "Just.Chat - a platform for processing information to be used in chatbots", 2013.
[13] A. S. Lokman and J. M. Zain, "One-Match and All-Match Categories for Keywords Matching in Chatbot", American Journal of Applied Sciences, vol. 7, 2010.
[14] S. Ghose and J. J. Barua, "Toward The Implementation of A Topic Specific Dialogue Based Natural Language Chatbot As An Undergraduate Advisor", Proc. IEEE 2013 International Conference on Informatics, Electronics & Vision (ICIEV), 2013.
[15] R. Kar and R. Haldar, "Applying Chatbots to the Internet of Things: Opportunities and Architectural Elements".
[16] McTear, Michael, Zoraida Callejas, and David Griol, "Creating a Conversational Interface Using Chatbot Technology", Springer International Publishing, 2016.
DATA MINING AND INFORMATION RETRIEVAL
2. MOTIVATION
Generally, the cost of obtaining such social
media data is trivial. But processing such
massive databases to extract travel
3. LITERATURE SURVEY
through the spreading of risk and, possibly, pooling of funds.

4. PROPOSED WORK
In the proposed system, we address the limitations of previous methods of vague marketing in tourism and make targeting efficient, with a focused approach for data acquisition and marketing using Twitter data. With the help of Twitter data, we can find tourists more efficiently. The use of dynamic Twitter data makes marketing optimal in terms of time and targeting. The system uses recently updated data for client search, which makes it better.

DATA FLOW DIAGRAM - LEVEL 0
Fig. Data Flow Diagram - Level 0

5. SUMMARY AND CONCLUSION
This paper focuses on how Twitter data can be used in analysing the individual-level travel behaviour of users. This framework enables more applications of Twitter and other social media data for client search for travel-industry, management, sales and operations purposes. With the help of Twitter posts, it becomes easy to track tourists around the needed location. It was found that tweets are mainly associated with the ease of the tourists and the facilities provided to them. This shows the usefulness of Twitter data for analysing the behaviour of tourists in cities. The data we obtain from Twitter comes in huge amounts at a time, which yields broad insight. The approach is more time-efficient, since we can target a highly scalable area as required in a single go. Twitter data provides various pieces of information about its users which are difficult to obtain otherwise. This social media data enables efficient target marketing with variable parameters as the need arises.

6. FUTURE WORKS
1. In-home activity data: If an activity is scheduled to happen at home, one out-of-home activity is cancelled, which results in fewer trips on the transport network; this is of great importance to travel-demand modellers and planners.
2. Tour formation: Tour formation requires collecting information about trips. Twitter users often provide information about their daily activities, which helps to extract information about the location, time and purpose of different activities. Using Twitter data for modelling tour-formation behaviour can significantly complement the models that are developed using household travel surveys.
3. Future activities: When Twitter data is extracted using different techniques, it becomes possible to recognize potential future activities. In other words, based on his/her tweet about the place he/she wants to visit, a user is likely to be at that location at a time to be determined. This helps to manage future tours and their activities.

7. ACKNOWLEDGEMENTS
With due respect and gratitude we would like to take this opportunity to thank our internal guide, PROF. G. S. PISE, for giving us all the help and guidance we needed. We are really grateful for his kind support. He has always encouraged us and
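The targeting step of the proposed work (selecting Twitter users whose posts suggest travel to a given place) can be sketched as a simple filter. The tweet records, keyword list and matching rule are hypothetical simplifications; a real system would pull tweets from the Twitter API and use proper NLP rather than keyword matching.

```python
# Hypothetical, simplified tweet records.
TWEETS = [
    {"user": "u1", "text": "Loving the beaches in Goa this week!", "place": "Goa"},
    {"user": "u2", "text": "Traffic is terrible today", "place": "Pune"},
    {"user": "u3", "text": "Planning a trip to Goa next month", "place": "Mumbai"},
]

TRAVEL_KEYWORDS = {"trip", "travel", "visit", "vacation", "beaches", "hotel"}

def target_tourists(tweets, place):
    """Select users whose tweets mention travel keywords and the target place,
    either as the tagged location or in the tweet text."""
    matches = []
    for t in tweets:
        words = set(t["text"].lower().replace("!", "").split())
        mentions_place = t["place"] == place or place.lower() in words
        if mentions_place and words & TRAVEL_KEYWORDS:
            matches.append(t["user"])
    return matches

users = target_tourists(TWEETS, "Goa")
```

The resulting user list is what the marketing side of the system would act on, and the same filter re-run on fresh tweets is what makes the targeting "dynamic".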
remove the surface element. The spectral and texture features are used as two kinds of low-level features, based on which the high-level visual words are built by the proposed technique. We use the entropy-rate superpixel segmentation method to segment the hyperspectral image into patches that preserve the homogeneity of regions. The patches are viewed as documents in the BOV (bag-of-visual-words) model. Then k-means clustering is executed to group pixels and build the codebook. Finally, the BOV representation is constructed from the statistics of the occurrences of visual words in each patch. Experiments on real data demonstrate that the proposed method is comparable to several state-of-the-art methods.
2. Automated Patent Classification Using Word Embedding
Patent classification is the task of assigning a unique code to a patent, where the assigned code is used to group patents with a similar latent topic into the same class. This paper presents a patent classification technique based on word embeddings and a long short-term memory network to classify patents down to the subgroup IPC level. The experimental results show that the classification technique achieves 63% accuracy at the subgroup level.
3. Deep Visual-Semantic Alignments for Generating Image Descriptions
A model that generates natural-language descriptions of images and their regions. The approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. The alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding. The authors then describe a Multimodal Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions. They show that the alignment model produces state-of-the-art results in retrieval experiments on the Flickr8K, Flickr30K and MSCOCO datasets, and that the generated descriptions significantly outperform retrieval baselines on both full images and on a new dataset of region-level annotations.
4. Latent Semantic Sparse Hashing for Cross-Modal Similarity Search
A novel hashing technique, referred to as Latent Semantic Sparse Hashing (LSSH), for large-scale cross-modal similarity search between images and texts. Specifically, it uses Sparse Coding to capture the high-level salient structures of images, and Matrix Factorization to extract latent concepts from texts. These high-level semantic features are then mapped to a joint abstraction space. The search performance can be improved by merging multiple latent semantic representations from heterogeneous data. The authors propose an iterative procedure which is highly efficient at exploring the relationship between multi-modal representations and bridging the semantic gap between heterogeneous data in the latent semantic space. They conduct extensive experiments on three multi-modal datasets consisting of images and texts. The superior and stable performance of LSSH verifies its effectiveness compared against several state-of-the-art cross-modal hashing techniques.
5. Click-Through-Based Cross-View Learning for Image Search
This work explores the problem of directly learning the multi-view distance between a textual query and an image by using both click data and subspace learning techniques. The click data represents the click relations between queries and images, while the subspace learning aims to learn a latent common subspace between various
perspectives. We have proposed a novel learning results and long haul collected
navigate based cross-see figuring out how information into the goal work.
to take care of the issue in a guideline way. Examinations on picture sound dataset
In particular, we utilize two diverse direct have shown the prevalence of our strategy
mappings to extend printed inquiries and more than a few existing calculations.
visual pictures into an idle subspace. The
mappings are found out by together 3. EXISTING SYSTEM APPROACH
limiting the separation of the watched Alongside the expanding necessities,
question picture matches on the navigate lately, cross-media look errands have
bipartite chart and safeguarding the inborn gotten extensive consideration. Since,
structure in unique single view. In every methodology having diverse
addition, we make symmetrical portrayal strategies and correlational
presumptions on the mapping frameworks. structures, an assortment of techniques
At that point the mappings can be gotten examined the issue from the part of
productively through curvilinear inquiry. learning relationships between various
We take l2 standard between the modalities. The effectiveness of hashing-
projections of inquiry and picture in the based strategies, there likewise exists a
inactive subspace as the separation rich profession cantering the issue of
capacity to quantify the importance of a mapping multi-modular high-dimensional
combine of (inquiry, picture). information to low-dimensional hash
codes, for example, Latent semantic
6. Boosting cross-media retrieval via inadequate hashing (LSSH), discriminative
visual-auditory feature analysis and coupled word reference hashing (DCDH),
relevance feedback Cross-see Hashing (CVH, etc. In the
existing system, user can search the data
0Diverse kinds of media information on flicker, user can get result the relevant
express abnormal state semantics from as well as irrelevant images.Irrelelent data
various angles. Step by step instructions to is main drawback of existing system as
learn far reaching abnormal state well as in existing data search the images
semantics from various sorts of using normal text only so time require for
information and empower effective cross- searching is more.
media recovery turns into a rising hot
issue. There are rich relationships among 4. PROPOSED SYSTEM APPROACH
heterogeneous low-level media content,
which makes it trying to inquiry cross-
media information adequately. In this
paper, we propose another cross-media
recovery strategy dependent on present
moment and long-haul significance input.
Our technique for the most part centres
around two run of the mill kinds of media
information, i.e. picture and sound.
Initially, we assemble multimodal
portrayal by means of factual authoritative Fig.1 Block Diagram of Proposed System
connection amongst picture and sound We propose a novel hashing strategy,
element frameworks, and characterize called semantic cross-media hashing
cross-media separate measurement for (SCMH), to play out the close copy
likeness measure; at that point we propose recognition and cross media recovery
advancement technique dependent on assignment. We propose to utilize a lot of
importance input, which melds momentary word embeddings to speak to printed data.
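As an illustration of how a text query can be matched against tagged images via word embeddings, here is a minimal sketch. The three-dimensional vectors, tag lists and file names below are invented for illustration; real Skip-gram embeddings would be learned from a corpus.

```python
import math

# Toy word-embedding table (in practice, Skip-gram vectors would be
# learned from a corpus; these 3-d vectors are made up for illustration).
EMBEDDINGS = {
    "dog":     [0.9, 0.1, 0.0],
    "puppy":   [0.8, 0.2, 0.1],
    "car":     [0.0, 0.9, 0.3],
    "vehicle": [0.1, 0.8, 0.4],
}

def text_vector(text):
    """Average the embeddings of the known words in a text."""
    vecs = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_images(query, tagged_images):
    """Rank images by similarity between the query vector and tag vectors."""
    qv = text_vector(query)
    scored = [(cosine(qv, text_vector(tags)), name) for name, tags in tagged_images]
    return [name for _, name in sorted(scored, reverse=True)]

images = [("img1.jpg", "dog puppy"), ("img2.jpg", "car vehicle")]
print(rank_images("puppy", images))  # the dog image ranks first
```

Averaging word vectors is the simplest composition choice; the Fisher-kernel representation described in the text is a richer fixed-length encoding of the same idea.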
The proposed framework consists of two modules, client and administrator. The administrator can add images and perform other functions, and the client can search for an image using text as well as another image. For text-based search a word embedding algorithm is used, and for image-based search a feature descriptor algorithm is used. The main drawback of the existing system is that a search over Flickr data returns irrelevant as well as relevant results; our system removes this drawback and returns only relevant images. The Fisher kernel framework is incorporated to represent both textual and visual data with fixed-length vectors, and a deep belief network is proposed to map the Fisher vectors of the different modalities. We evaluate the proposed SCMH technique on three commonly used data sets. The proposed system can also search data by hash value; the MD5 algorithm is used to compute the hashes. Different approaches are proposed to capture the similarities between text and images: the Skip-gram algorithm is used for word embedding, the SIFT descriptor to extract key points from images, and the MD5 algorithm for hash-code generation. SCMH achieves better results than state-of-the-art methods for various lengths of hash codes. For a text query, the word embedding algorithm computes a feature vector; the images are then mapped and ranked, and the user sees accurate results. For an image query, the feature descriptor algorithm computes the feature vector, followed by the same mapping and ranking. Images can also be searched by MD5 hash value. Thus several methods are used for image search, and the ranking of the retrieved images is displayed to the user.

5. CONCLUSION
In this work, we propose a new hashing method, SCMH, for near-duplicate detection and cross-media retrieval. We propose to use a series of word embeddings to represent textual information. The proposed system removes the drawback of Flickr search: the user can search data using text as well as images, and can also search images using hash values. The Fisher kernel framework is built to represent both textual and visual information with fixed-length vectors, and a deep belief network maps the Fisher vectors of the different modalities. We evaluate SCMH on three commonly used data sets, where it outperforms state-of-the-art methods with different lengths of hash codes. On the MIR Flickr data set, SCMH's improvements over LSSH, which achieves the best results among the compared methods, are 10.0 and 18.5 percent for the Text-to-Image and Image-to-Text tasks, respectively. Experimental results demonstrate the effectiveness of the proposed cross-media retrieval method. The user can also see ranked images according to the search. In future work, image-based search could be extended to other social media such as Facebook and Twitter.

6. ACKNOWLEDGMENT
Authors are thankful to Faculty of Engineering and Technology (FET), Savitribai Phule Pune University, Pune for providing the facility to carry out the research work.

REFERENCES
[1] Liangrong Zhang, Kai Jiang, Yaoguo Zheng, Jinliang An, Yanning Hu, Licheng Jiao,
"Spatially Constrained Bag-of-Visual-Words for Hyperspectral Image Classification," International Research Center for Intelligent Perception and Computation, Xidian University, Xi'an 710071, China, 2016.
[2] Mattyws F. Grawe, Claudia A. Martins, Andreia G. Bonfante, "Automated Patent Classification Using Word Embedding," 16th IEEE International Conference on Machine Learning and Applications, Federal University of Mato Grosso, Cuiaba, Brazil, 2017.
[3] A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Boston, MA, USA, Jun. 2015, pp. 3128-3137.
[4] J. Zhou, G. Ding, and Y. Guo, "Latent semantic sparse hashing for cross-modal similarity search," in Proc. 37th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2014, pp. 415-424.
[5] Y. Pan, T. Yao, T. Mei, H. Li, C.-W. Ngo, and Y. Rui, "Click-through-based cross-view learning for image search," in Proc. 37th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2014, pp. 717-726.
[6] H. Zhang, J. Yuan, X. Gao, and Z. Chen, "Boosting cross-media retrieval via visual-auditory feature analysis and relevance feedback," in Proc. ACM Int. Conf. Multimedia, 2014, pp. 953-956.
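The MD5-based hash lookup used for duplicate search in the proposed system can be sketched as follows. Note that an MD5 index only finds byte-identical duplicates, which is why the feature-descriptor path is still needed for near-duplicates. The index layout and image names here are assumptions.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Compute the MD5 digest used as the lookup key."""
    return hashlib.md5(data).hexdigest()

# Hypothetical in-memory index: hash -> image identifier.
index = {}

def add_image(name: str, content: bytes):
    index[md5_hex(content)] = name

def find_duplicate(content: bytes):
    """Return the stored image with identical byte content, if any."""
    return index.get(md5_hex(content))

add_image("sunset.jpg", b"\x89PNG...sunset-bytes")
print(find_duplicate(b"\x89PNG...sunset-bytes"))  # sunset.jpg
print(find_duplicate(b"other-bytes"))             # None
```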
memory; consequently, the more HUIs the algorithms generate, the more resources they consume. Conversely, if the threshold is too high, no HUI will be found.
1.1 Background
Such algorithms frequently generate a huge set of HUIs, and their mining performance degrades as a consequence. Furthermore, if the dataset contains long transactions or low thresholds are set, this condition may become worse. The huge number of HUIs poses a challenging problem for mining performance: the more HUIs the algorithm generates, the higher the processing time it consumes. Efficient algorithms are therefore needed to overcome these challenges. Top-k mining also does not work with parallel mining.
1.2 Motivation
1. Setting the value of k is more intuitive than setting the threshold, because k represents the number of itemsets that users want to find, whereas choosing the threshold depends primarily on database characteristics, which are often unknown to users.
2. The min-utility value is not given in advance in top-k HUI mining. In traditional HUI mining the search space can be pruned efficiently by using a given min-utility threshold value. In the scenario of the TKO and TKU algorithms, the min-utility threshold value is provided in advance.
1.3 Aim & Objective
1. The execution time of the TKO algorithm is low and it is an efficient algorithm, but its result can be incorrect and contain garbage values. The execution time of the TKU algorithm is higher, but its result is correct. It is a very challenging issue to make the hybrid algorithm (TKO with TKU) more efficient than the TKU algorithm alone; the time factor is very important here.
2. Need to achieve significantly better performance.
3. The hybrid algorithm obtains HUIs with fixed parameters of rating, views and number of buys.

2. LITERATURE SURVEY
1. "Efficient tree structures for high-utility pattern mining in incremental databases"
Recently, high utility pattern (HUP) mining has become one of the most important research issues in data mining due to its ability to consider the non-binary frequency values of items in transactions and different profit values for every item. On the other hand, incremental and interactive data mining provide the ability to use previous data structures and mining results in order to reduce unnecessary calculations when a database is updated, or when the minimum threshold is changed. In this paper, we propose three novel tree structures to efficiently perform incremental and interactive HUP mining. The first tree structure, the Incremental HUP Lexicographic Tree (IHUPL-Tree), is arranged according to an item's lexicographic order. It can capture the incremental data without any restructuring operation. The second tree structure is the IHUP Transaction Frequency Tree (IHUPTF-Tree), which obtains a compact size by arranging items according to their transaction frequency (descending order). To reduce the mining time, the third tree, the IHUP Transaction-Weighted Utilization Tree (IHUPTWU-Tree), is designed based on the TWU value of items in descending order. Extensive performance analyses show that our tree structures are very efficient and scalable for incremental and interactive HUP mining.
2. "Mining high-utility item sets"
Traditional association rule mining algorithms only generate a large number of highly frequent rules, but these rules do not provide useful answers for what the high utility rules are. We develop a novel idea of top-K objective-directed data mining, which focuses on mining the top-K high utility closed patterns that directly support a given business objective. To association mining, we add the concept of utility to capture highly desirable statistical patterns and present a level-wise item-set mining algorithm. With both positive and negative utilities, the anti-monotone
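The notion of itemset utility that HUI mining builds on can be made concrete with a small sketch; the profit table, transactions and threshold below are toy values.

```python
# Each transaction maps item -> purchased quantity; PROFIT gives unit profit.
PROFIT = {"a": 5, "b": 2, "c": 1}
TRANSACTIONS = [
    {"a": 1, "b": 2},   # utility of {a} here is 5, of {b} is 4
    {"a": 2, "c": 3},   # utility of {a} here is 10, of {c} is 3
    {"b": 4, "c": 1},
]

def utility(itemset, txn):
    """Utility of an itemset in one transaction (0 if not fully contained)."""
    if not all(i in txn for i in itemset):
        return 0
    return sum(PROFIT[i] * txn[i] for i in itemset)

def total_utility(itemset):
    """Sum the itemset's utility over all transactions containing it."""
    return sum(utility(itemset, t) for t in TRANSACTIONS)

def is_hui(itemset, min_util):
    """An itemset is a high-utility itemset if its total utility meets min_util."""
    return total_utility(itemset) >= min_util

print(total_utility({"a"}))     # 15
print(is_hui({"a"}, 10))        # True
print(is_hui({"b", "c"}, 12))   # total utility is 9, so False
```

The trade-off described above falls out directly: lowering `min_util` lets many more itemsets pass the `is_hui` test, inflating memory and runtime, while raising it too far filters everything out.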
add products or items, update a product, and view the stock details.
Module 2 - User (Customer)
The customer can purchase a number of items. The history of all purchased items is stored in the transaction database.
Module 3 - Construction of UP-Tree
In the UP-Tree, a dynamic table is generated by the algorithms. Mainly, UP-Growth is used to obtain the PHUI (potential high-utility itemset) set.
Module 4 - TKO and TKU Algorithms
This module combines the TKO and TKU algorithms: first the TKO (top-k in one phase) algorithm is called, then the output of TKO is given as the input to TKU (top-k in multiple utility phases), and the actual result is the TKU result.
[Figure: block diagram with components Data Base, Parallel Pattern Algorithms, TKO Algorithm with k value, and Result with k value]
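The two-phase pipeline of Module 4, with TKO output feeding TKU, can be sketched with simplified stand-ins for the two phases (these are not the full TKO/TKU algorithms; the candidate itemsets and their estimated/exact utilities are toy values):

```python
# Simplified stand-ins: phase one keeps likely top-k candidates by an
# estimated utility, phase two re-ranks survivors by exact utility.

def tko_phase(candidates, k):
    """Keep the k itemsets with the highest *estimated* utility."""
    ranked = sorted(candidates, key=lambda c: c["estimated"], reverse=True)
    return ranked[:k]

def tku_phase(candidates, k):
    """Re-rank the surviving candidates by their *exact* utility."""
    ranked = sorted(candidates, key=lambda c: c["exact"], reverse=True)
    return [c["itemset"] for c in ranked[:k]]

def hybrid_top_k(candidates, k):
    # TKO output (with a wider margin, here 2*k) feeds directly into TKU.
    return tku_phase(tko_phase(candidates, 2 * k), k)

candidates = [
    {"itemset": ("a",),     "estimated": 20, "exact": 15},
    {"itemset": ("a", "b"), "estimated": 18, "exact": 17},
    {"itemset": ("b", "c"), "estimated": 9,  "exact": 9},
]
print(hybrid_top_k(candidates, 1))  # [('a', 'b')]
```

The point of the composition is that the cheap first phase prunes the candidate set so the exact (and slower) second phase only verifies a short list.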
classifier, and in the proposed model they used a seeding algorithm and a pragmatic classifier to detect emoticon-based sarcasm. Edwin Lunando and Ayu Purwarianti [6] analyzed the following: to solve the high computational overhead and low classification efficiency of the KNN algorithm, a text feature vector representation method based on information gain and non-negative matrix factorization is proposed.

4. GAP ANALYSIS
Previously, sarcasm detection was done on the basis of a fixed dataset. The dataset was saved and then further processing started. Because it was stored, the dataset could be manipulated easily. In our process of detecting sarcasm, by contrast, the detection is done on real-time data. The data is not saved permanently: the moment you refresh, the data is refreshed from memory, i.e. new data is shown. The data is saved only temporarily; temporary storage is done through MongoDB. As the data is not being saved, manipulation of the data is impossible, and the results are more accurate and unbiased.

paradigms are at their utmost. To support our motivations, we have described some areas where Big Data can play an important role. In healthcare scenarios, medical practitioners gather massive volumes of data about patients: medical history, medications, and other details. The above-mentioned data are also accumulated in drug-manufacturing companies. The nature of these data is very complex, and sometimes the practitioners are unable to establish a relationship with other information, which results in the loss of important information. Employing advanced analytic techniques for organizing and extracting useful information from Big Data results in personalized medication, and advanced Big Data analytic techniques give insight into the hereditary causes of disease. In the same way, data is also generated for product reviews across various services, but sometimes we have to differentiate between fake reviews and genuine reviews as input to our decision-making process in business.

6. CONCLUSION AND FUTURE WORK
Text mining is not used for unstructured data.

B. "Grand challenges in clinical decision support" Author-Sittig D, Wright A, Osheroff J, et al.
There is a pressing need for high-quality, effective means of designing, developing, presenting, implementing, evaluating, and maintaining all types of clinical decision support capabilities for clinicians, patients and consumers. Using an iterative, consensus-building process we identified a rank-ordered list of the top 10 grand challenges in clinical decision support. This list was created to educate and inspire researchers, developers, funders, and policy-makers. The list of challenges, in order of the importance that they be solved if patients and organizations are to begin realizing the fullest benefits possible of these systems, consists of: improve the human-computer interface; disseminate best practices in CDS design, development, and implementation; summarize patient-level information; prioritize and filter recommendations to the user; create an architecture for sharing executable CDS modules and services; combine recommendations for patients with co-morbidities; prioritize CDS content development and implementation; create internet-accessible clinical decision support repositories; use free text information to drive clinical decision support; mine large clinical databases to create new CDS.
Disadvantage: Identification of solutions to these challenges is critical if clinical decision support is to achieve its potential and improve the quality, safety, and efficiency of healthcare services.

C. "Using Electronic Health Records for Surgical Quality Improvement in the Era of Big Data" Author-Anderson J E, Chang D C.

D. Many healthcare facilities enforce security on their electronic health records (EHRs) through a corrective mechanism: some staff nominally have almost unrestricted access to the records, but there is a strict ex post facto audit process for inappropriate accesses, i.e., accesses that violate the facility's security and privacy policies. This process is inefficient, as each suspicious access has to be reviewed by a security expert, and is purely retrospective, as it occurs after damage may have been incurred. This motivates automated approaches based on machine learning using historical data. Previous attempts at such a system have successfully applied supervised learning models to this end, such as SVMs and logistic regression. While providing benefits over manual auditing, these approaches ignore the identity of the users and patients involved in record access. Therefore, they cannot exploit the fact that a patient whose record was previously involved in a violation has an increased risk of being involved in a future violation. Motivated by this, in this paper, we propose a collaborative filtering inspired approach to predicting inappropriate accesses. Our solution integrates both explicit and latent features for staff and patients, the latter acting as a personalized "fingerprint" based on historical access patterns. The proposed method, when applied to real EHR access data from two tertiary hospitals and a file-access dataset from Amazon, shows not only significantly improved performance compared to existing methods, but also provides insights as to what indicates inappropriate access.

E. "Data Mining Techniques into Telemedicine Systems" Author-Gheorghe M, Petre R
Providing care services through telemedicine has become an important part of the medical development process, due to the latest innovation in the information and
computer technologies. Meanwhile, data mining, a dynamic and fast-expanding domain, has improved many fields of human life by offering the possibility of predicting future trends and helping with decision making, based on the patterns and trends discovered. The diversity of data and the multitude of data mining techniques provide various applications for data mining, including in the healthcare organization. Integrating data mining techniques into telemedicine systems would help improve the efficiency and effectiveness of healthcare organizations' activity, contributing to the development and refinement of the healthcare services offered as part of the medical development process.

F. "Query recommendation using query logs in search engines" Author-R. Baeza-Yates, C. Hurtado, and M. Mendoza
In this paper we propose a method that, given a query submitted to a search engine, suggests a list of related queries. The related queries are based on previously issued queries and can be issued by the user to the search engine to tune or redirect the search process. The proposed method is based on a query clustering process in which groups of semantically similar queries are identified. The clustering process uses the content of the historical preferences of users registered in the query log of the search engine. The method not only discovers the related queries but also ranks them according to a relevance criterion. Finally, we show with experiments over the query log of a search engine the effectiveness of the method.

G. "Data Mining Applications In Healthcare Sector: A Study" Author-M. Durairaj, V.
In this paper, our system focuses on comparing a variety of techniques, approaches and different tools and their impact on the healthcare sector. The goal of a data mining application is to turn data, that is, facts, numbers, or text which can be processed by a computer, into knowledge or information. The main purpose of data mining applications in healthcare systems is to develop an automated tool for identifying and disseminating relevant healthcare information. This paper aims to make a detailed study report of different types of data mining applications in the healthcare sector and to reduce the complexity of the study of healthcare data transactions. It also presents a comparative study of different data mining applications, techniques and different methodologies applied for extracting knowledge from databases generated in the healthcare industry. Finally, the existing data mining techniques with data mining algorithms and their application tools which are most valuable for healthcare services are discussed in detail.

H. "Detecting Inappropriate Access to Electronic Health Records Using Collaborative Filtering" Author-Aditya Krishna Menon
Examinees can know the symptoms occurring in their body (and the potential health risks accordingly), while doctors can get a set of examinees with potential risk.

4. PROPOSED SYSTEM
The main concept is to determine medical diseases according to the given symptoms and daily routine; when the user searches for a hospital, the hospital nearest to their current location is returned. The system provides a user-friendly interface for examinees and doctors. Examinees can know the symptoms occurring in their body, while doctors can get a set of examinees with potential risk. A feedback mechanism could save manpower and improve the performance of the system automatically. The doctor could fix a prediction result through an interface, which will collect doctors' input as new training data. An extra training process will be triggered every day using these data. Thus, our system could improve the performance of the prediction model automatically.
[Figure: block diagram of the proposed system; components: registration, view symptoms, disease prediction; actors: doctor and admin]
By referring to these papers, we have tried to develop this proposed system.
Comparison between existing technologies and the proposed system:
In the existing women-safety module, a woman needs to click a button in the app, and then a help message is sent to the emergency contact number; this message is sent continuously until she presses a stop button. In the proposed system, the woman can press the power button 3 to 4 times, and then a single help message is sent to her emergency contact number. In the existing user module of the Android application, the user can see the crime rate in the form of maps or graphs. In the proposed system, the user can view the crime status in the form of a pie chart based on crime type. The crime capture module is included solely in the proposed system.

4. GAP ANALYSIS
Table: Gap Analysis

                       | Manual Verification | Govt. services | Proposed DVS
Validity               | Medium              | High           | Unlimited
Confidentiality        | Moderate            | Medium         | High
Cost of verification   | Medium              | Medium         | Low
Security               | Moderate            | Medium         | High
Energy Consumption     | High                | High           | Moderate

5. PROPOSED SYSTEM
The developed model will help to reduce crimes and will help the crime detection field in many ways, that is, in reducing crimes by carrying out various necessary measures. In this system there are three modules, namely the user module, the woman safety module and the crime capturing module. In the first module, the user module, a person will come to know whether the place to which he is travelling is safe or not. This module is basically an Android application where the user can register himself. After registration, whenever the user logs in, he will see three options: view crime rate, crime capture, and logout. With the first option he will be able to view the crime status of any area he wishes from the available list; this will be displayed in graphical format. With the second option, crime capture (which is also the second module), if a user finds a crime happening in the surroundings, he can capture it and send it to the nearest police station from the available list, so that the police will be notified and can take immediate necessary action. The last one is the logout option.
The third module is the woman safety module. This is also an Android application where the woman must be registered first. If a woman feels insecure, she can press the power button of her Android mobile 4-5 times, so that a notification is sent to the emergency contact number which she provided during the registration process.
Along with the Android application there will be a webpage available for both user and admin. Police officers will act as admin. The admin can add and update data in the database area-wise.

6. ALGORITHMS
1. K-means clustering
We use the clustering technique of data mining. Here clustering is used for grouping similar patterns based on crime type [7]. K-means clustering, an unsupervised learning algorithm, is used. Clustering will help us to display the crime rate graphically using a pie chart.
The K-means algorithm can be executed in the following steps:
1) Specify the value of k, that is, the number of clusters.
Juan Guevara, Joana Costa, Jorge Arroba, Catarina Silva [5]: One of the most popular social networks for microblogging that has shown great growth is Twitter, which allows people to express their opinions using short, simple sentences. These texts are generated daily, and for this reason it is common for people to want to know the trending topics and their drifts. In this paper we propose to deploy a mobile app that provides information focusing on areas such as Politics, Social, Tourism, and Marketing using a statistical lexicon approach. The application shows the polarity of each theme as positive, negative, or neutral.

S. Rajalakshmi, S. Asha, N. Pazhaniraja [1]: In this case, sentiment analysis or opinion mining is useful for mining facts from those data. The text data obtained from the social network primarily undergoes emotion mining to examine the sentiment of the user message. Most sentiment or emotional mining uses machine learning approaches for better results. The principal idea behind this article is to bring out the process involved in sentiment analysis. Further, the investigation is about the various methods or techniques existing for performing sentiment analysis. It also presents the

Anusha K S, Radhika A D [4]: In this paper we discuss the levels and approaches of sentiment analysis, sentiment analysis of Twitter data, the existing tools available for sentiment analysis and the steps involved in the same. Two approaches are discussed with an example, which work on machine learning and lexicon-based methods respectively.

Ms. Farha Nausheen, Ms. Sayyada Hajera Begum [6]: The opinion of the public for a candidate will impact the potential leader of the country. Twitter is used to acquire a large, diverse data set representing the current public opinions of the candidates. The collected tweets are analyzed using a lexicon-based approach to determine the sentiments of the public. In this paper, we determine the polarity and subjectivity measures for the collected tweets, which help in understanding the user opinion for a particular candidate. Further, a comparison is made among the candidates over the type of sentiment.

Sentiment analysis can be classified into the lexicon-based approach, the machine learning approach and the hybrid approach. Sentiment analysis approaches are listed in Table 1.
Table 1: Sentiment analysis approaches

TYPES          | APPROACHES                       | MERITS AND DEMERITS
Lexicon based  | Novel Machine Learning Approach; | MERITS: broader term analysis.
Approaches     | Dictionary based approach;       | DEMERITS: limited number of words in
               | Ensemble;                        | lexicons and assigning a fixed score
               | Corpus based approach            | to opinion words
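A minimal lexicon-based polarity scorer along the lines discussed above might look like this; the lexicon words and their scores are invented for illustration (real systems use resources such as SentiWordNet).

```python
# Tiny hand-made sentiment lexicon (illustrative scores only).
LEXICON = {"good": 1, "great": 2, "happy": 1,
           "bad": -1, "terrible": -2, "sad": -1}

def polarity(tweet):
    """Sum the lexicon scores of the words; the sign gives the class."""
    score = sum(LEXICON.get(w, 0) for w in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("What a great and happy day"))  # positive
print(polarity("terrible service, very sad"))  # negative
print(polarity("the candidate spoke today"))   # neutral
```

The demerit noted in Table 1 is visible here: words outside the fixed lexicon contribute nothing, and each word always carries the same score regardless of context.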
7. CONCLUSION
In this paper, we have presented a way in which automated summarization can be performed and used for various fundamental purposes. The procedure that we proposed consists of a few steps. First, we take an input from the user in the form of a digital document. Our system scans that document and preprocesses it for the later steps. The next step is applying different algorithms for the purpose of summarization and information retrieval. This may include scanning, parsing, POS tagging and different technical measures. The original document is then saved in the database along with the output of the preprocessing, i.e., the summary. After this, the user gives a second input to the system: the query that needs to be looked up. The system takes that query and looks for the appropriate solution. At last, an output in the form of text is generated and provided to the user.

REFERENCES
[1] Review On Natural Language Processing - Prajakta Pawar, Prof. Alpa Reshamwala and Prof. Dhirendra Mishra. Cite As: https://www.researchgate.net/publication/235788362 | Published in An International Journal (ESTIJ), ISSN: 2250-3498, Vol. 3, No. 1, February 2013.
[2] Comparative Study of Text Summarization Methods - Nikita Munot and Sharvari S. Govilkar. Cite As: https://pdfs.semanticscholar.org/0c95/0bc8f234ecb6cf57f13bca7edd118809d0ca.pdf | Published in International Journal of Computer Applications (0975-8887), Volume 102, No. 12, September 2014.
[3] Automatic Text Summarization and its Methods - Neelima Bhatiya and Arunima Jaiswal. Cite As: https://ieeexplore.ieee.org/abstract/document/7508049/ | Published in 2016 6th International Conference.
[4] Graph Based Approach For Automatic Text Summarization - Akash, Somaiah, Annapurna. Cite As: https://ijarcce.com/wp-content/uploads/2016/11/IJARCCE-ICRITCSA-2.pdf | Published in International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Special Issue 2, October 2016.
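The pipeline described in the conclusion (preprocess a document, summarize it, then answer a user query against the stored summary) can be sketched as follows. Everything here, including the stopword list, the frequency-based sentence scoring and the overlap-based query lookup, is a simplified stand-in for the unspecified algorithms, not the authors' implementation:

```python
import re
from collections import Counter

# a tiny illustrative stopword list; a real system would use a fuller one
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "for",
             "on", "that", "this", "be", "it"}

def sentences(text):
    # naive sentence splitter (the preprocessing step)
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokens(sentence):
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]

def summarize(text, k=2):
    """Extractive summary: keep the k sentences whose content words
    are most frequent in the document, preserving original order."""
    sents = sentences(text)
    freq = Counter(w for s in sents for w in tokens(s))
    scored = sorted(sents, key=lambda s: sum(freq[w] for w in tokens(s)),
                    reverse=True)
    chosen = set(scored[:k])
    return [s for s in sents if s in chosen]

def answer(query, summary_sents):
    # return the stored sentence with the largest word overlap with the query
    q = set(tokens(query))
    return max(summary_sents, key=lambda s: len(q & set(tokens(s))))
```

The stored summary plays the role of the database in the description above; a second user input (the query) is then matched against it.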
and enhance accessibility irrespective of geographical location, provided there is internet access.

3. PROPOSED WORK
The proposed system will automate the current manual tailoring system and maintain a searchable customer and product database, while maintaining data security and user rights. The system will enable customers to send their measurements to their tailors for their clothes to be made. It will also record the cost, the fabric type, the urgency with which a customer wants the dress finished, the type of material to be used, and the quantity in terms of pairs needed. The system computes the total cost depending on the selected fabric, type of material, quantity and duration, and makes that information available to the customer. This enables report generation: the system can give a report of the finished garments ready for the clients to collect and of the bookings made, and the administrator is able to view all the customers and their details, the finished garments and all the bookings made. A data bank is created for easy access and retrieval of customer details, orders placed, and the users who register to the system. The registration process for the customers is provided online by the system, which helps them successfully submit their measurements. The system has an inbuilt validation mechanism to validate the entered data. The customer can log in to the system to check on the status of the clothes for collection. The system will show the already completed garments for the client to collect. The system also provides information about the cost of each garment the customer intends to get knitted. The data will be stored in the database for further reference or audit.
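The total-cost computation described above (fabric, material, quantity and urgency) might look like the following sketch. All rates and surcharges are hypothetical, since the paper gives no price table:

```python
# hypothetical rates; the paper does not specify actual prices
FABRIC_RATE = {"cotton": 8.0, "silk": 20.0, "wool": 15.0}   # per metre
MATERIAL_RATE = {"standard": 1.0, "premium": 1.5}           # multiplier
URGENCY_SURCHARGE = {7: 0.25, 14: 0.10}                     # due within N days -> extra fraction

def total_cost(fabric, material, quantity_pairs, metres_per_pair, days_until_due):
    """Cost from selected fabric, material type, quantity and urgency."""
    base = FABRIC_RATE[fabric] * metres_per_pair * quantity_pairs
    base *= MATERIAL_RATE[material]
    # apply the tightest urgency surcharge whose deadline covers the order
    for deadline in sorted(URGENCY_SURCHARGE):
        if days_until_due <= deadline:
            base *= 1 + URGENCY_SURCHARGE[deadline]
            break
    return round(base, 2)
```

Orders with a comfortable deadline pay no surcharge; rush orders pay proportionally more.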
support; mine large clinical databases to create new CDS [2]. It takes only a few researchers to analyze data from hospital information. Knowledge discovery and data mining have found numerous applications in business and scientific domains. [3] The main concept is to determine medical diseases according to the given symptoms and daily routine; when the user searches for a hospital, the nearest hospital to their current location is given. Data mining techniques used in the prediction of heart attacks are rule-based methods, decision trees and artificial neural networks. [4] The related queries are based on previously issued queries, and can be issued by the user to the search engine to tune or redirect the search process. The method proposed is based on a query clustering process in which groups of semantically similar queries are identified [5]. The clustering process uses the content of historical preferences of users registered in the query log of the search engine. The system provides a user-friendly interface for examinees and doctors. Examinees can know the symptoms which occurred in their body, while doctors can get a set of examinees with potential risk. A feedback mechanism could save manpower and improve the performance of the system automatically.

2. MOTIVATION
Previous medical examiners used only basic symptoms of particular diseases, but in this application the examiner examines word counts, laboratory results and diagnostic data. A feedback mechanism could save manpower and improve the performance of the system automatically. The doctor could fix a prediction result through an interface, which will collect doctors' input as new training data. An extra training process will be triggered every day using these data. Thus, this system could improve the performance of the prediction model automatically. When the user visits the hospital physically, the user's personal record is saved and then that record is added to the examiner data set. This consumes a lot of time.

3. REVIEW OF LITERATURE
Sittig D, Wright A, Osheroff J, et al. [1]: There is a pressing need for high-quality, effective means of designing, developing, presenting, implementing, evaluating, and maintaining all types of clinical decision support capabilities for clinicians, patients and consumers. Using an iterative, consensus-building process we identified a rank-ordered list of the top 10 grand challenges in clinical decision support. This list was created to educate and inspire researchers, developers, funders, and policy-makers. The list of challenges, in order of the importance that they be solved if patients and organizations are to begin realizing the fullest benefits possible of these systems, consists of: improve the human-computer interface; disseminate best practices in CDS design, development, and implementation; summarize patient-level information; prioritize and filter recommendations to the user; create an architecture for sharing executable CDS modules and services; combine recommendations for patients with co-morbidities; prioritize CDS content development and implementation; create internet-accessible clinical decision support repositories; use free-text information to drive clinical decision support; and mine large clinical databases to create new CDS. Identification of solutions to these challenges is critical if clinical decision support is to achieve its potential and improve the quality, safety and efficiency of healthcare.

Anderson J E, Chang D C, et al. [2]: Many healthcare facilities enforce security on their electronic health records (EHRs) through a corrective mechanism: some staff nominally have almost unrestricted access to the records, but there is a strict ex post facto audit process for inappropriate accesses, i.e., accesses that violate the facility's security and privacy
policies. This process is inefficient, as each suspicious access has to be reviewed by a security expert, and is purely retrospective, as it occurs after damage may have been incurred. This motivates automated approaches based on machine learning using historical data. Previous attempts at such a system have successfully applied supervised learning models to this end, such as SVMs and logistic regression. While providing benefits over manual auditing, these approaches ignore the identity of the users and patients involved in a record access. Therefore, they cannot exploit the fact that a patient whose record was previously involved in a violation has an increased risk of being involved in a future violation. Motivated by this, a collaborative filtering inspired approach to predict inappropriate accesses is proposed. The solution integrates both explicit and latent features for staff and patients, the latter acting as a personalized "fingerprint" based on historical access patterns. The proposed method, when applied to real EHR access data from two tertiary hospitals and a file-access dataset from Amazon, shows not only significantly improved performance compared to existing methods, but also provides insights as to what indicates an inappropriate access.

ZhaoqianLan, Guopeng Zhou, YichunDuan, Wei Yan, et al. [3]: The healthcare environment is generally perceived as being "information rich" yet "knowledge poor". There is a wealth of data available within the healthcare systems. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. Knowledge discovery and data mining have found numerous applications in business and scientific domains. Valuable knowledge can be discovered from the application of data mining techniques in the healthcare system. In this study, the potential use of classification-based data mining techniques such as rule-based, decision tree, naïve Bayes and artificial neural network classifiers on massive volumes of healthcare data is briefly examined. The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not "mined" to discover hidden information. For data preprocessing and effective decision making, a One Dependency Augmented Naïve Bayes classifier (ODANB) and naive creedal classifier 2 (NCC2) are used. The latter is an extension of naïve Bayes to imprecise probabilities that aims at delivering robust classifications even when dealing with small or incomplete data sets. Discovery of hidden patterns and relationships often goes unexploited. Using medical profiles such as age, sex, blood pressure and blood sugar, it can predict the likelihood of patients getting a heart disease. It enables significant knowledge, e.g. patterns and relationships between medical factors related to heart disease, to be established.

Srinivas K, Rani B K, Govrdhan A, et al. [4]: In this paper, care services through telemedicine are provided, and telemedicine has become an important part of the medical development process due to the latest innovations in information and computer technologies. Meanwhile, data mining, a dynamic and fast-expanding domain, has improved many fields of human life by offering the possibility of predicting future trends and helping with healthcare decision making, based on the patterns and trends discovered. The diversity of data and the multitude of data mining techniques provide various applications for data mining, including in the healthcare organization. Integrating data mining techniques into telemedicine systems would help improve the efficiency and effectiveness of the healthcare organizations' activity, contributing to the development and refinement of the healthcare services offered as part of the medical development process.
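As an illustration of the naive Bayes family of classifiers discussed above (ODANB and NCC2 are refinements of it), here is a minimal categorical naive Bayes with Laplace smoothing applied to toy medical profiles. It is not the ODANB/NCC2 algorithm itself, and the profile data are invented:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Categorical naive Bayes training: count label frequencies and
    per-feature value frequencies conditioned on each label."""
    label_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (feature_index, label) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feat_counts[(i, y)][v] += 1
    return label_counts, feat_counts

def predict_nb(model, row):
    """Pick the label maximizing P(label) * prod P(value | label)."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    best, best_p = None, -1.0
    for y, n in label_counts.items():
        p = n / total
        for i, v in enumerate(row):
            c = feat_counts[(i, y)]
            # Laplace smoothing so unseen values do not zero out the product
            p *= (c[v] + 1) / (sum(c.values()) + len(c) + 1)
        if p > best_p:
            best, best_p = y, p
    return best
```

With profiles such as (age group, blood pressure, blood sugar), the classifier estimates the more likely risk label for a new patient.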
Gheorghe M, Petre R, et al. [5]: In this paper a method is proposed that, given a query submitted to a search engine, suggests a list of related queries. The related queries are based on previously issued queries, and can be issued by the user to the search engine to tune or redirect the search process. The method proposed is based on a query clustering process in which groups of semantically similar queries are identified. The clustering process uses the content of historical preferences of users registered in the query log of the search engine. The method not only discovers the related queries, but also ranks them according to a relevance criterion. Finally, the effectiveness of the method is shown with experiments over the query log of a search engine.

R. Baeza-Yates, C. Hurtado, and M. Mendoza, et al. [6]: The system has focused on comparing a variety of techniques, approaches and tools and their impact on the healthcare sector. The goal of a data mining application is to turn data (facts, numbers, or text which can be processed by a computer) into knowledge or information. The main purpose of data mining applications in healthcare systems is to develop an automated tool for identifying and disseminating relevant healthcare information. This paper aims to make a detailed study report of the different types of data mining applications in the healthcare sector and to reduce the complexity of the study of healthcare data transactions. It also presents a comparative study of different data mining applications, techniques and methodologies applied for extracting knowledge from databases generated in the healthcare industry. Finally, the existing data mining techniques, data mining algorithms and their application tools which are most valuable for healthcare services are discussed in detail.

Koh H C, Tan G, et al. [7]: Many healthcare facilities enforce security on their electronic health records (EHRs) through a corrective mechanism: some staff nominally have almost unrestricted access to the records, but there is a strict ex post facto audit process for inappropriate accesses, i.e., accesses that violate the facility's security and privacy policies. This process is inefficient, as each suspicious access has to be reviewed by a security expert, and is purely retrospective, as it occurs after damage may have been incurred. This motivates automated approaches based on machine learning using historical data. Previous attempts at such a system have successfully applied supervised learning models to this end, such as SVMs and logistic regression. While providing benefits over manual auditing, these approaches ignore the identity of the users and patients involved in a record access. Therefore, they cannot exploit the fact that a patient whose record was previously involved in a violation has an increased risk of being involved in a future violation. Motivated by this, a collaborative filtering inspired approach to predict inappropriate accesses is proposed. The solution integrates both explicit and latent features for staff and patients, the latter acting as a personalized "fingerprint" based on historical access patterns. The proposed method, when applied to real EHR access data from two tertiary hospitals and a file-access dataset from Amazon, shows not only significantly improved performance compared to existing methods, but also provides insights as to what indicates an inappropriate access.

Tao Jiang & Siyu Qian, et al. [8]: The study aimed to identify risk factors in medication management in Australian residential aged care (RAC) homes. Only 18 out of 3,607 RAC homes failed the aged care accreditation standard in medication management between 7th March 2011 and 25th March 2015. Text data mining methods were used
to analyse the reasons for failure. This led to the identification of 21 risk indicators for an RAC home to fail in medication management. These indicators were further grouped into ten themes: overall medication management, medication assessment, ordering, dispensing, storage, stock and disposal, administration, incident report, monitoring, and staff and resident satisfaction. The top three risk factors are: "ineffective monitoring process" (18 homes), "noncompliance with professional standards and guidelines" (15 homes), and "resident dissatisfaction with overall medication management" (10 homes).

Song J H, Venkatesh S S, Conant E A, et al. [9]: k-means clustering and self-organizing maps (SOM) are applied to analyze the signal structure in terms of visualization. k-nearest neighbor (k-NN) classifiers, support vector machines (SVM) and decision trees (DT) are employed to classify features using a computer-aided diagnosis (CAD) approach.

Song J H, Venkatesh S S, Conant E A, et al. [10]: Breast cancer is one of the most common cancers in women. Sonography is now commonly used in combination with other modalities for imaging breasts. Although ultrasound can diagnose simple cysts in the breast with an accuracy of 96%-100%, its use for unequivocal differentiation between solid benign and malignant masses has proven to be more difficult. Despite considerable efforts toward improving imaging techniques, including sonography, the final confirmation of whether a solid breast lesion is malignant or benign is still made by biopsy.

V. Akgün, E. Erkut, and R. Batta, et al. [11] consider the problem of finding a number of spatially dissimilar paths between an origin and a destination. A number of dissimilar paths can be useful in solving capacitated flow problems or in selecting routes for hazardous materials. A critical discussion of three existing methods for the generation of spatially dissimilar paths is offered, and computational experience using these methods is reported. As an alternative method, the generation of a large set of candidate paths and the selection of a subset using a dispersion model which maximizes the minimum dissimilarity in the selected subset is proposed.

T. Akiba, T. Hayashi, N. Nori, Y. Iwata, and Y. Yoshida, et al. [12]: An indexing scheme for top-k shortest path distance queries on graphs is proposed, which is useful in a wide range of important applications such as network-aware searches and link prediction. While many efficient methods for answering standard (top-1) distance queries have been developed, none of these methods are directly extensible to top-k distance queries. A new framework for top-k distance queries based on 2-hop cover is developed, and an efficient indexing algorithm based on the recently proposed pruned landmark labeling scheme is then presented. The scalability, efficiency and robustness of the method are demonstrated in extensive experimental results.

A. Angel and N. Koudas, et al. [13]: Diversity-aware search is studied in a setting that captures and extends established approaches, focusing on content-based result diversification. DIVGEN, an efficient threshold algorithm for diversity-aware search, is presented. DIVGEN utilizes novel data access primitives, offering the potential for significant performance benefits. The choice of data accesses to be performed is crucial to performance, and a hard problem in its own right. Thus a low-overhead, intelligent data access prioritization scheme is proposed, with theoretical quality guarantees and good performance in practice.
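The top-k shortest path distances discussed in [12] can be answered, without any index, by a simple Dijkstra variant that allows each node to be settled up to k times. This is a baseline sketch for intuition, not the pruned landmark labeling scheme from the paper:

```python
import heapq

def topk_distances(graph, s, t, k):
    """Return up to k shortest path distances from s to t.

    graph: dict mapping node -> list of (neighbor, edge_weight).
    Standard Dijkstra settles each node once; allowing up to k
    settlements per node yields the k shortest (possibly non-simple)
    path lengths in increasing order.
    """
    counts = {v: 0 for v in graph}  # how many times each node was settled
    dists, heap = [], [(0, s)]
    while heap and len(dists) < k:
        d, u = heapq.heappop(heap)
        if counts[u] >= k:
            continue
        counts[u] += 1
        if u == t:
            dists.append(d)
        for v, w in graph[u]:
            heapq.heappush(heap, (d + w, v))
    return dists
```

This runs in roughly k times the cost of one Dijkstra pass; the indexed 2-hop-cover approach in [12] exists precisely because this baseline is too slow for repeated queries on large networks.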
(Figure: proposed prediction system architecture. The user's given symptoms and given results are preprocessed, passed to the logical part, and machine learning prediction is performed using Random Forest and a baseline algorithm to produce the learning result.)
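The Random Forest block in the architecture can be sketched as a majority vote over decision stumps trained on bootstrap samples. This toy version (single-feature stumps, hypothetical symptom data) only illustrates the idea, not the actual model used by the system:

```python
import random

def train_stump(rows, labels, feat):
    """One-feature decision stump: majority label per observed value."""
    table = {}
    for row, y in zip(rows, labels):
        table.setdefault(row[feat], []).append(y)
    return feat, {v: max(set(ys), key=ys.count) for v, ys in table.items()}

def train_forest(rows, labels, n_trees=5, seed=0):
    """Each 'tree' is a stump trained on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feat = rng.randrange(len(rows[0]))              # random feature choice
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feat))
    return forest

def predict_forest(forest, row, default="unknown"):
    # majority vote across all stumps; unseen values vote the default label
    votes = [table.get(row[feat], default) for feat, table in forest]
    return max(set(votes), key=votes.count)
```

A real Random Forest grows full decision trees with random feature subsets at every split; the bootstrap-plus-vote structure is the same.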
training process to improve performance every day.

8. ACKNOWLEDGEMENT
The authors would like to thank the researchers as well as the publishers for making their resources available, and our teachers for their guidance. We are thankful to the authorities of Savitribai Phule University of Pune and the concerned members of the ICINC 2019 conference for their constant guidance and support. We are also thankful to the reviewers for their valuable suggestions. We also thank the college authorities for providing the required infrastructure and support. Finally, we would like to extend heartfelt gratitude to our friends and family members.

REFERENCES
[1] Sittig D, Wright A, Osheroff J, et al. "Grand challenges in clinical decision support". Journal of Biomedical Informatics, 2008.
[2] Anderson J E, Chang D C. "Using Electronic Health Records for Surgical Quality Improvement in the Era of Big Data" [J]. JAMA Surgery, 2015.
[3] ZhaoqianLan, Guopeng Zhou, YichunDuan, Wei Yan, "AI-assisted Prediction on Potential Health Risks with Regular Physical Examination Records", IEEE Transactions on Knowledge and Data Science, 2018.
[4] Srinivas K, Rani B K, Govrdhan A. "Applications of Data Mining Techniques in Healthcare and Prediction of Heart Attacks". International Journal on Computer Science & Engineering, 2010.
[5] Gheorghe M, Petre R. "Integrating Data Mining Techniques into Telemedicine Systems". Informatica Economica Journal, 2014.
[6] R. Baeza-Yates, C. Hurtado, and M. Mendoza, "Query recommendation using query logs in search engines," in Proc. Int. Conf. Current Trends Database Technol., 2004, pp. 588–596.
[7] Koh H C, Tan G. Data mining applications in healthcare [J]. Journal of Healthcare Information Management (JHIM), 2005, 19(2):64–72.
[8] Menon A K, Jiang X, Kim J, et al. Detecting Inappropriate Access to Electronic Health Records Using Collaborative Filtering [J]. Machine Learning, 2014, 95(1):87–101.
[9] Tao Jiang & Siyu Qian, et al. Accreditation Reports to Identify Risk Factors in Medication Management in Australian Residential Aged Care Homes [J]. Studies in Health Technology & Informatics, 2017.
[10] Song J H, Venkatesh S S, Conant E A, et al. Comparative analysis of logistic regression and artificial neural network for computer-aided diagnosis of breast masses [J]. Academic Radiology, 2005, 12(4):487–495.
[11] V. Akgün, E. Erkut, and R. Batta. On finding dissimilar paths. European Journal of Operational Research, 121(2):232–246, 2000.
[12] T. Akiba, T. Hayashi, N. Nori, Y. Iwata, and Y. Yoshida. Efficient top-k shortest-path distance queries on large networks by pruned landmark labeling. In Proc. AAAI, pages 2–8, 2015.
[13] A. Angel and N. Koudas. Efficient diversity-aware search. In Proc. SIGMOD, pages 781–792, 2011.
[14] H. Bast, D. Delling, A. V. Goldberg, M. Müller-Hannemann, T. Pajor, P. Sanders, D. Wagner, and R. F. Werneck. Route planning in transportation networks. In Algorithm Engineering, pages 19–80, 2016.
[15] Borodin, Allan, Lee, H. Chul, Ye, and Yuli. Max-sum diversification, monotone submodular functions and dynamic updates. Computer Science, pages 155–166, 2012.
have been important advances in semantic search and question answering over RDF data. In particular, natural language interfaces to online semantic data have the advantage that they can exploit the expressive power of Semantic Web data models and query languages, while at the same time hiding their complexity from the user.

Remarks: There are no evaluations so far that systematically evaluate this kind of system, in contrast to traditional question answering and search interfaces to document spaces.

[5] Question answering on Freebase via relation extraction and textual evidence: Existing knowledge-based question answering systems often rely on small annotated training data. While shallow methods like relation extraction are robust to data scarcity, they are less expressive than deep meaning representation methods like semantic parsing, thereby failing at answering questions involving multiple constraints.

Remarks: While shallow methods like relation extraction are robust to data scarcity, they are less expressive than deep meaning representation methods like semantic parsing, thereby failing at answering questions involving multiple constraints. This paper is very useful for our system; deep learning is used in it.

4. CURRENT SYSTEM
In the existing system, the hardness of RDF Q/A lies in the ambiguity of unstructured natural language question sentences. Generally, there are two main challenges.

Phrase Linking: a natural language phrase wsi may have several meanings, i.e., wsi corresponds to several semantic items in the RDF graph G. As shown in Figure 1(b), the entity phrase "Paul Anderson" can map to three persons: ⟨Paul Anderson (actor)⟩, ⟨Paul S. Anderson⟩ and ⟨Paul W. S. Anderson⟩. For a relation phrase, "directed by" also refers to two possible predicates, ⟨director⟩ and ⟨writer⟩. Sometimes a phrase needs to be mapped to a non-atomic structure in the knowledge graph; for example, "uncle of" refers to a predicate path (see Table 4). In RDF Q/A systems, we should eliminate "the ambiguity of phrase linking".

Composition: the task of composition is to construct the corresponding query or query graph by assembling the identified phrases. In the running example, we know the predicate ⟨director⟩ is to connect subject ⟨film⟩ and object ⟨Paul W. S. Anderson⟩; consequently, we generate a triple ⟨film, director, Paul W. S. Anderson⟩. However, in some cases it is difficult to determine the correct subject and object for a given predicate, or there may exist several possible query graph structures for a given question sentence. We call this "the ambiguity of query graph structure".

5. PROPOSED SYSTEM APPROACH
This system uses a framework to answer natural language questions over an RDF repository with a graph data-driven technique. A semantic query graph is used to model the query knowledge in the natural language question in a structural way, and Resource Description Framework question answering is mainly reduced to a subgraph matching problem. More importantly, the system resolves the ambiguity of natural language questions at the time when matches of the query are found. The cost of disambiguation is saved if no match is found. The system uses this semantic query information together with the knowledge graph to find the best answer; because of this, it overcomes problems that occur in real-time applications such as Quora and Stack Overflow.

6. SYSTEM ARCHITECTURE
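The match-time disambiguation just described can be illustrated with the running "Paul Anderson" example: candidate mappings for the entity and relation phrases are kept only when they actually match a triple in the graph, so no separate disambiguation pass is needed. The tiny knowledge graph and candidate dictionaries below are illustrative data, not the system's actual store:

```python
# toy knowledge graph of (subject, predicate, object) triples -- illustrative data
KG = {
    ("Resident_Evil", "director", "Paul_W_S_Anderson"),
    ("Resident_Evil", "writer", "Paul_W_S_Anderson"),
    ("Magnolia", "director", "Paul_T_Anderson"),
}

# phrase-linking dictionaries: each phrase maps to several candidate items
ENTITY_CANDIDATES = {"Paul Anderson": ["Paul_W_S_Anderson", "Paul_T_Anderson"]}
PREDICATE_CANDIDATES = {"directed by": ["director", "writer"]}

def match_films(entity_phrase, relation_phrase):
    """Resolve ambiguity at query-evaluation time: a candidate mapping
    survives only if it matches a triple in the knowledge graph."""
    answers = set()
    for obj in ENTITY_CANDIDATES[entity_phrase]:
        for pred in PREDICATE_CANDIDATES[relation_phrase]:
            for s, p, o in KG:
                if p == pred and o == obj:
                    answers.add(s)
    return answers
```

Candidate pairs that match nothing in the graph simply produce no answers, which is the sense in which the cost of disambiguation is saved when no match is found.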
Fig. 6.1. System Architecture

7. FLOW DIAGRAM

Fig. 7.1. Flow Diagram

The above diagram shows the actual flow of the system.

8. CONCLUSION
In our system, a graph data-driven framework to answer natural language questions over Resource Description Framework graphs is presented. Different from existing work, the ambiguity both of phrases and of structure is removed in the question understanding stage. The system pushes the disambiguation down into the query evaluation stage. Based on the query results over Resource Description Framework graphs, we can address the ambiguity issue efficiently.

REFERENCES
[1] Junwei Bao, Nan Duan, Ming Zhou, Tiejun Zhao, "Knowledge-based question answering as machine translation", Baltimore, Maryland, USA, June 23-25, 2014.
[2] Mohamed Yahya, Klaus Berberich, Shady Elbassuoni, Gerhard Weikum, "Robust question answering over the web of linked data".
[3] Dong Deng, Guoliang Li, Jianhua Feng, Yi Duan, "A unified framework for approximate dictionary-based entity extraction". Received: 12 November 2013 / Revised: 28 April 2014 / Accepted: 11 July 2014. © Springer-Verlag Berlin Heidelberg 2014.
[4] Vanessa Lopez, Christina Unger, Philipp Cimiano, Enrico Motta, "Evaluating question answering over linked data."
[5] Kun Xu, Siva Reddy, Yansong Feng, Songfang Huang and Dongyan Zhao, "Question answering on freebase via relation extraction and textual evidence", Berlin, Germany, August 7-12, 2016.
[6] W. M. Soon, H. T. Ng, and D. C. Y. Lim, "A machine learning approach to coreference resolution of noun phrases," Comput. Linguist., vol. 27, no. 4, pp. 521–544, 2001.
[7] L. Androutsopoulos, Natural Language Interfaces to Databases – An Introduction, Journal of Natural Language Engineering 1 (1995), 29–81.
[8] V. I. Spitkovsky and A. X. Chang, "A cross-lingual dictionary for English Wikipedia concepts," in Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, May 23-25, 2012, pp. 3168–3175.
[9] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York: Cambridge University Press, 2008.
[10] N. Nakashole, G. Weikum, and F. M. Suchanek, "Discovering and exploring relations on the web," PVLDB, vol. 5, no. 12, pp. 1982–1985, 2012.
number of features was developed for the analysis of audio content.

A large amount of work has been dedicated to the modeling of relationships between music and emotions, spanning psychology, musicology and music information retrieval. Proposed emotion models follow either the categorical approach or the dimensional approach. Categorical approaches represent emotions as a set of categories that are clearly distinct from each other; for example, six basic emotion categories based on the human facial expressions of anger, fear, happiness, sadness, disgust and surprise. Another famous categorical approach is Hevner's affective checklist, where eight clusters of affective adjectives were discovered and laid out in a circle, as shown in Fig. 1. Each cluster includes similar adjectives, and the meaning of neighboring clusters varies in a cumulative way until reaching a contrast at the opposite position.

In this paper we study the related work in Section II; the proposed approach, module descriptions, mathematical modeling, algorithm and experimental setup in Section III; and finally we provide a conclusion in Section IV.

2. LITERATURE REVIEW
This section discusses in detail the literature on the recommendation system for online social networks. Many short-term and long-term modulation and timbre features have been developed for content-based music classification. Two operations in modulation analysis are useful for extracting modulation information, but this degrades classification performance. To deal with this problem, Ren et al. [1] proposed a two-dimensional representation of acoustic frequency and modulation frequency which extracts joint acoustic frequency and modulation frequency features. Long-term joint frequency features like acoustic-modulation spectral contrast/valley (AMSC/AMSV), acoustic-modulation spectral flatness measure (AMSFM), and acoustic-modulation spectral crest measure (AMSCM) are afterwards computed from the spectra of each joint frequency sub-band. The prominent status of music in human culture and everyday life is due in large part to its striking ability to elicit emotions, and mood may vary slightly with changes in our physical condition and actions.

M. Barthet et al. [2] describe a study of music and emotions from different disciplines including psychology, musicology and music information retrieval, and propose new insights to enhance automated music emotion recognition models.

C.-H. Lee et al. [3] proposed an automatic music genre classification approach based on long-term modulation spectral analysis of spectral (OSC and MPEG-7 NASE) and cepstral (MFCC) features. Modulation spectral analysis of each feature generates a modulation spectrum, and all the modulation spectra are collected to form a modulation spectrogram, which exhibits the time-varying or rhythmic information of music signals. Each modulation spectrum is then decomposed into several logarithmically-spaced modulation sub-bands. The MSC and MSV are then computed from each modulation sub-band.

Y. Song et al. [4] collected a ground truth data set of 2904 songs, each tagged with one of the four words "happy", "sad", "angry" and "relaxed". Audio is then retrieved from 7Digital.com, and sets of audio features are extracted using standard algorithms. Two classifiers are trained using support vector machines with polynomial and radial basis function kernels, and these are tested with 10-fold cross validation. Results show that spectral
Y. Panagakis et al. [5] addressed the automatic mood classification problem by resorting to the low-rank representation (LRR) of slow auditory spectro-temporal modulations. If each data class is linearly spanned by a subspace of unknown dimensions and the data are noiseless, the lowest-rank representation of a set of test vector samples with respect to a set of training vector samples has the nature of being both dense for within-class affinities and almost zero for between-class affinities. LRR thus enables the classification of the data; the result is Low-Rank Representation-based Classification (LRRC). The LRRC is compared against three well-known classifiers, namely the Sparse Representations-based Classifier, SVM and Nearest Neighbor classifiers, for music mood classification by conducting experiments on the MTV and Soundtracks180 datasets.

In paper [6] the authors proposed a method using cell mixture models to automate the task of music emotion classification. The designed system has potential application to both unsupervised and supervised classification learning, and is acceptable for music mood classification. The ICMM is suitable for music emotion classification.

In paper [7] the authors give a technical solution for automated slideshow generation by extracting a set of high-level features from music, like beat grid, mood and genre, and intelligently combining this set with high-level image features. For example, the user requests the system to automatically create a slideshow which plays soft music and shows pictures with sunsets from the last 10 years of his own photo collection. The high-level feature extraction uses the audio and visual information which is based on the same

In paper [8] the authors proposed a way in which music can be displayed for the user based on the similarity of the acoustic features. All songs in the music library are mapped onto a 2D feature space. The user can better understand the relationships between the songs, with the distance between each pair of songs reflecting their acoustic similarity. Low-level acoustic features are extracted from the raw audio signals, and dimension reduction is performed using PCA on the feature space. The proposed approach avoids dependence on contextual data (metadata) and collaborative filtering methods. Using the song space visualizer, the user can choose songs or allow the system to automate the song selection process given a seed song.

In paper [9] the authors proposed a method which considers various kinds of audio features. A bin histogram is computed from each feature's frames to save all the needed data related to it. The histogram bins are used for calculating the similarity matrix, and the number of similarity matrices depends on the number of audio features; there are 59 similarity matrices. To compute the intra-inter similarity ratio, the intra- and inter-similarity matrices are utilized. These similarity ratios are sorted in descending order for each feature, and some of the selected similarity ratios are ultimately used as prototypes for each feature, which are further used for classification by designing the nearest multi-prototype classifier.

In paper [10] the authors proposed self-colored music mood segmentation and a hierarchical framework based on a new mood taxonomy model to automate the task of multi-label music mood classification. The taxonomy model combines Thayer's 2D model with Schubert's Updated Hevner adjective Model (UHM).
3. PROPOSED APPROACH
Proposed System Overview
We implement a feature set for music mood classification which combines modulation spectral analysis of MFCC, OSC, and SFM/SCM with statistical descriptors of short-term timbre features. By employing these features with SVMs, our submission to the audio mood classification task was ranked #1. In fact, the submission outperformed all the other submissions of the task from 2008 to 2014, indicating the superiority of the proposed feature sets. Moreover, based on a part of the aforementioned feature sets, we have also proposed another new feature set that combines the newly proposed joint frequency features (including AMSP and AMSC/AMSV and AMSFM/AMSCM) together with the modulation spectral analysis of MFCC and statistical descriptors of short-term timbre features. Experiments are conducted on the Raga Music Dataset. We also explore the possibility of using dimensionality reduction techniques to extract a compact feature set that can achieve equal or better performance.

Figure 1. Proposed System Architecture

Mathematical Model
For a joint acoustic-modulation spectrogram, we can compute four joint frequency features, namely AMSC, AMSV, AMSFM, and AMSCM, and each of them is a matrix of size A x B.

AMSP and AMSV
For each joint acoustic-modulation frequency sub-band, we compute the acoustic-modulation spectral peak (AMSP) and the acoustic-modulation spectral valley (AMSV), i.e., the maximum and the minimum of the modulation spectra within the sub-band.
The difference between AMSP and AMSV, denoted as AMSC (acoustic-modulation spectral contrast), can be used to reflect the spectral contrast over a joint frequency sub-band:

AMSC(a, b) = AMSP(a, b) - AMSV(a, b)

AMSFM
To measure the noisiness and sinusoidality of the modulation spectra, we further define the acoustic-modulation spectral flatness measure (AMSFM) as the ratio of the geometric mean to the arithmetic mean of the modulation spectra within a joint frequency sub-band.

AMSCM
The acoustic-modulation spectral crest measure (AMSCM) can be defined as the ratio of the maximum to the arithmetic mean of the modulation spectra within a joint frequency sub-band.

Figure 2 shows the time comparison graph of the proposed system against the existing system; the graph is plotted using the above table.

Fig. 2: Time Graph

Table 2 shows the memory required by the proposed system using C4.5 and by the existing system using KNN classification: the memory consumed by the existing system is more than the memory consumed by the proposed system.

Table 2: Memory Comparison for clustering
System                      Memory Required
Existing system with KNN    2500 kb
Proposed system with C4.5   1800 kb
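The four joint-frequency features defined above can be sketched numerically as follows. This is a minimal illustration, assuming the joint acoustic-modulation spectrogram is available as a non-negative matrix, that each joint frequency sub-band is a rectangular block of it, and that the spectral peak and valley are the maximum and minimum within each sub-band; the tiling scheme and all names are illustrative, not the exact formulation of the proposed system.

```python
import numpy as np

def joint_frequency_features(M, n_acoustic=4, n_modulation=2):
    """Sketch of the AMSP/AMSV/AMSC/AMSFM/AMSCM features.

    M: non-negative joint acoustic-modulation spectrogram.
    The matrix is tiled into n_acoustic x n_modulation rectangular
    sub-bands; one value per feature is computed for each sub-band,
    giving A x B feature matrices.
    """
    eps = 1e-12  # guard against log(0) / division by zero
    rows = np.array_split(M, n_acoustic, axis=0)
    amsp = np.empty((n_acoustic, n_modulation))
    amsv = np.empty_like(amsp)
    amsfm = np.empty_like(amsp)
    amscm = np.empty_like(amsp)
    for a, r in enumerate(rows):
        for b, block in enumerate(np.array_split(r, n_modulation, axis=1)):
            x = block.ravel() + eps
            amsp[a, b] = x.max()                                 # spectral peak
            amsv[a, b] = x.min()                                 # spectral valley
            amsfm[a, b] = np.exp(np.mean(np.log(x))) / x.mean()  # geometric / arithmetic mean
            amscm[a, b] = x.max() / x.mean()                     # maximum / arithmetic mean
    amsc = amsp - amsv                                           # spectral contrast
    return amsp, amsv, amsc, amsfm, amscm
```

A quick sanity check: for a flat (constant) spectrogram, AMSFM and AMSCM are both 1 and AMSC is 0, matching the intuition that a contrast-free sub-band has maximal flatness and minimal crest.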
spectrogram and the mean and standard deviation of the MSC/MSV matrices) in the modulation spectral analysis of short-term timbre features are likely to smooth out useful modulation information, so we propose the use of a joint frequency representation of an entire music clip to extract joint frequency features. These joint frequency features, including acoustic-modulation spectral contrast/valley, acoustic-modulation spectral flatness measure and acoustic-modulation spectral crest measure, outperform the modulation spectral analysis of OSC and SFM/SCM on the Raga Music datasets by small margins. The advantage of the proposed features is that they can have a better discriminative power due to their operation on the entire music clip, with no averaging over the local modulation features. The extracted features are used for classification of test music files according to the mood of the test files. For classification the C4.5 classifier is used. The system can be enhanced with mood classification in music videos. We will also apply these features to multi-label tasks such as auto-tagging and tag-based retrieval.

REFERENCES
[1] J.-M. Ren, M.-J. Wu, and J.-S. R. Jang, "Automatic music mood classification based on timbre and modulation features," IEEE Transactions on Affective Computing, vol. 6, no. 3, pp. 236-246, 2015.
[2] M. Barthet, G. Fazekas, and M. Sandler, "Multidisciplinary perspectives on music emotion recognition: recommendations for content- and context-based models," Proc. CMMR, pp. 492-507, 2012.
[3] C.-H. Lee, J.-L. Shih, K.-M. Yu, and H.-S. Lin, "Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features," IEEE Transactions on Multimedia, vol. 11, no. 4, pp. 670-682, 2009.
[4] Y. Song, S. Dixon, and M. Pearce, "Evaluation of musical features for emotion classification," in Proceedings of the 13th International Society for Music Information Retrieval Conference, Porto, Portugal, October 8-12, 2012, pp. 523-528.
[5] Y. Panagakis and C. Kotropoulos, "Automatic music mood classification via low-rank representation," in Proc., 2011, pp. 689-693.
[6] X. Sun and Y. Tang, "Automatic music emotion classification using a new classification algorithm," in Second International Symposium on Computational Intelligence and Design (ISCID '09), Changsha, 2009, pp. 540-542.
[7] P. Dunker, C. Dittmar, A. Begau, S. Nowak, and M. Gruhne, "Semantic high-level features for automated cross-modal slideshow generation," in 2009 Seventh International Workshop on Content-Based Multimedia Indexing, Chania, 2009, pp. 144-149.
[8] M. S. Y. Aw, C. S. Lim, and A. W. H. Khong, "SmartDJ: An interactive music player for music discovery by similarity comparison," in Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific, Kaohsiung, 2013, pp. 1-5.
[9] B. K. Baniya, C. S. Hong, and J. Lee, "Nearest multi-prototype based music mood classification," in Computer and Information Science (ICIS), 2015 IEEE/ACIS 14th International Conference on, Las Vegas, NV, 2015, pp. 303-306.
[10] E. E. P. Myint and M. Pwint, "An approach for multi-label music mood classification," in Signal Processing Systems (ICSPS), 2010 2nd International Conference on, Dalian, 2010, pp. V1-290-V1-294.
4. GAP ANALYSIS
Apriori Algorithm:
It is an array-based algorithm.
It uses the Join and Prune technique.
Apriori uses breadth-first search.
Apriori utilizes a level-wise approach, where it generates patterns containing 1 item, then 2 items, then 3 items, and so on.
Candidate generation is extremely slow; runtime increases exponentially with the number of different items.
Candidate generation is very parallelizable.
It requires large memory space due to the large number of candidates generated.
It scans the database multiple times for generating candidate sets.

FP Growth Algorithm:
It is a tree-based algorithm.
When the FP-tree contains a single prefix-path, the complete set of frequent patterns can be generated in three parts: the single prefix-path P, the multipath Q, and their combinations (lines 01 to 03 and 14). The resulting patterns for a single prefix path are the enumerations of its sub-paths that have the minimum support (lines 04 to 06). Thereafter, the multipath Q is defined (line 03 or 07) and the resulting patterns from it are processed (lines 08 to 13). Finally, in line 14 the combined results are returned as the frequent patterns found.

Light Gradient Boosting Machine:
Light GBM is a gradient boosting framework that uses a tree-based learning algorithm. The tree in Light GBM grows vertically (leaf-wise), as compared to other algorithms in which it grows horizontally (level-wise).

Figure 1: System Architecture

OCR:
OCR (optical character recognition) is the recognition of printed or written text characters by a computer. This involves photo-scanning of the text character by character, analysis of the scanned-in image, and then translation of the character image into character codes, such as ASCII, commonly used in data processing.

We are hoping to build a payment gateway into the application so that it will be convenient for the user to buy the commodities.
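The level-wise join-and-prune loop described for Apriori above can be sketched as follows. This is a minimal in-memory illustration, not the array-based implementation the text refers to; the function and variable names are ours.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent itemset mining: 1-itemsets, then 2-itemsets, ..."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: candidate single items.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # One scan over the database per level to count support.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join step: merge frequent k-itemsets into (k+1)-candidates.
        # Prune step: keep only candidates whose k-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(survivors), 2):
            u = a | b
            if len(u) == k + 1 and all(frozenset(s) in survivors
                                       for s in combinations(u, k)):
                candidates.add(u)
        current = candidates
        k += 1
    return frequent
```

For example, `apriori([{'a','b'}, {'a','c'}, {'a','b','c'}], 2)` keeps {a}, {b}, {c}, {a,b} and {a,c}, while {b,c} (support 1) is rejected, so {a,b,c} is never even generated as a candidate.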
systems; these depend on the type of data gathered by the sensor devices. Event Detection and Spatial Process Estimation are the two categories into which applications are classified. Initially, the sensor devices are deployed in the environment to detect the parameters (e.g., temperature, humidity, pressure, LDR, noise, CO and radiation levels, etc.) during data acquisition, computation and controlling action (e.g., the variations in the noise and CO levels with respect to the specified levels). Sensor devices are placed at different locations to collect the data to predict the behavior of a particular area of interest. The main aim of this paper is to design and implement an effective monitoring system through which the required parameters are monitored remotely using the internet, the data gathered from the sensors is stored in the cloud, and the estimated trend is projected on the web browser [1].

With the progression of advancements in technology, several innovations have been made in the field of communications that are transiting to the Internet of Things. In this domain, Wireless Sensor Networks (WSN) are among those independent sensing devices that monitor physical and environmental conditions, with thousands of applications in other fields. Air pollution is a major environmental change that causes many hazardous effects on human beings and needs to be controlled. Hence, we deployed WSN nodes for constant monitoring of air pollution around the city and on the moving public transport buses and cars. This methodology gave us monitoring data from the stationary nodes deployed in the city through to the mobile nodes on public transport buses and cars. The data on air pollution particles such as gases, smoke, and other pollutants is collected via sensors on the public transport buses, and the data is analyzed when the buses and cars reach back to the source destination after passing through the stationary nodes around the city. Our proposed architecture, having an innovative mesh network, will be a more efficient way of gathering data from the nodes of a WSN. It will have lots of benefits with respect to the future concept of Smart Cities, which will have the new technologies related to the Internet of Things. [2]

Temperature and relative humidity play an important role in the lifecycle of plants. When plants have the right humidity they thrive, because they open their pores completely and so breathe deeply without the threat of excessive water loss. The wireless sensor network (WSN) has revolutionized the field of monitoring and remote sensing. Wireless sensor networks, or wireless sensor & actuator networks (WSAN), are spatially distributed sensors that monitor physical or environmental conditions such as temperature, humidity, fire, etc., and cooperatively pass their data through the network to the main location. The aim of this paper is to design and develop a system which fulfills all the above requirements. In this paper, a digital humidity-temperature composite (DHT11) sensor is used to sense the environmental temperature and relative humidity. An Arduino microcontroller is used to perform the complex computation of the parameters and then to transmit the data wirelessly by using a ZigBee S2 module to the receiver. At the receiver section, a ZigBee S2 module is used to capture the serial data transmitted by the transmitter, and using Digi's XCTU software the data is logged onto a PC. [3]

This paper uses the ZigBee CC2530 development platform, applied to various types of sensors developed for environmental monitoring systems, to enhance multi-sensor wireless signal aggregation via multi-bit decision fusion. ZigBee is a short-range wireless transmission standard, IEEE 802.15.4-based, formulated by the ZigBee Alliance as the ZigBee protocol. It has low cost, low power consumption, and short-distance transmission at a transmission rate of 250 kbps for wireless sensor networks. Its main applications include temperature, humidity and other types of data monitoring, factory automation, home automation, remote monitoring, and home device control. [4]

The concern of consumers for better quality agricultural products made the farmers adapt to the latest agricultural techniques by implementing modern technologies for producing better agricultural products. Among the important things taken into consideration by the farmers are the quality of the agricultural land, weather conditions, etc. Traditional farming involves human labor. With proper data, the farmer will be able to deliver a quality product to the consumer. In this paper, we have discussed monitoring of agriculture parameters using a soil moisture level sensor and wireless technology. The parameter results from the sensor node are transferred via the wireless transceiver to a server PC at the other end. On the PC, the values are then analyzed and a predicate is applied to them. If they give a positive response, there will be continuous monitoring, but if negative, the system will provide a total farming solution and cultivation plan. It also sends all these solutions to farmers or users via SMS in their regional languages [5].

The environment monitoring system, in general, is used to monitor various environmental parameters with the help of sensors. Some communication medium, like wireless communication, is needed to transfer sensor data. An environment parameter can be temperature, pressure, humidity, GPS location, or an image. We can design a system to monitor all or any of these parameters as and when required. For monitoring purposes, we need to install some sensors on each node. A node will interact with the sensor and will transfer that data to the controlling unit. A controller will receive data from each node and can take action depending on the programming done. The user can use a Graphical User Interface (GUI) to manage all activities or to check data at any time. The GUI can be designed using Python, HTML, CSS or any other language. Depending on sensor types, various monitoring services can be designed. To monitor and control services or actions we can use the Internet. Data acquired by sensors can be transferred over the network by using a web server or by using some SMS service. To provide energy, battery cells can be used [6].

Wireless sensor networks have been a big promise during the last few years, but a lack of real applications makes the establishment of this technology difficult. This paper reviews wireless sensor network applications, focusing mainly on environmental monitoring systems. These systems have low power consumption and low cost and are a convenient way to perform real-time monitoring. Moreover, they can also be applied to indoor living monitoring, greenhouse monitoring, climate monitoring, and forest monitoring. These approaches have been proved to be an alternative way to replace the conventional method that uses manpower to monitor the environment, and they improve the performance, robustness, and efficiency of the monitoring system.

Monitoring the museum's environment for preventive conservation of art is one major concern of all museums. In order to properly conserve the artwork, it is critical to continuously measure some parameters, such as temperature, relative humidity, light and also pollutants, either in storage or exhibition rooms. The deployment of a Wireless Sensor Network in a museum can help to implement these measurements in real time, continuously, and in a much easier and cheaper way. In this paper, we present the first testbed deployed in a Contemporary Art Museum, located in Madeira Island, Portugal, and the preliminary results of these experiments. On the other hand, we propose a new wireless sensor node that offers some advantages when compared with several commercially available
data mining technologies are integrated with IoT technologies for decision-making support and system optimization. Data mining involves discovering novel, interesting, and potentially useful patterns from data and applying algorithms to the extraction of hidden information. In this paper, we survey data mining from 3 different views: the knowledge view, the technique view, and the application view. In the knowledge view, we review classification, clustering, association analysis, time series analysis, and outlier analysis. In the application view, we review typical data mining applications, including e-commerce, industry, health care, and public service. The technique view is discussed together with the knowledge view and application view. Nowadays, big data is a hot topic for data mining and IoT; we also discuss the new characteristics of big data and analyze the challenges in data extraction, data mining algorithms, and the data mining system area. Based on the survey of the current research, a suggested big data mining system is proposed.

REFERENCES
[1] Edward N. Lorenz, "Dynamical and Empirical Methods of Weather Forecasting," Massachusetts Institute of Technology, pp. 423-429, 2014.
[2] Mathur, S., and A. Paras, "Simple weather forecasting model using mathematical regression," Indian Res J Exten Educ: Special 1 (2012).
[3] Monika Sharma, Lini Mathew, and S. Chatterji, "Weather Forecasting using Soft Computing and Statistical Techniques," IJAREEIE, Vol. 3, Issue 7, pp. 122-131.
[4] Sohn T., Lee J.H., Lee S.H., and Ryu, "Statistical prediction of heavy rain in South Korea," Advances in Atmospheric Sciences, Vol. 22, 2015, pp. 365-372.
[5] Kannan, M., Prabhakaran, S., and Ramachandran, P., "Rainfall forecasting using data mining technique," International Journal of Engineering and Technology, Vol. 2, No. 6, pp. 397-401, 2014.
Figure: System Architecture

6. PROPOSED APPROACH
To motivate our approach based on retweets, we consider a small example based on some data extracted from our dataset on the presidential election. Consider a pro-Republican media source A and a pro-Democrat media source B. We observe the number of retweets they received during two consecutive events. During the "Romney 47 percent comment" event (event 6 in Table 1), source A received 791 retweets, while source B received a significantly higher number of 2,311 retweets. It is not difficult to imagine what happened: source B published tweets bashing the Republican candidate, and Democrat supporters enthusiastically retweeted them. Then consider the first presidential debate. It is generally viewed as an event where Romney outperformed Obama. This time source A received 3,393 retweets, while

7. CONCLUSION
Motivated by the election prediction problem, we study in this paper the problem of quantifying the political leaning of prominent members of the Twitter sphere. By taking a new point of view on the consistency relationship between tweeting and retweeting behavior, we formulate political leaning quantification as an ill-posed linear inverse problem solved with regularization techniques. The result is an automated method that is simple, efficient and has an intuitive interpretation of the computed scores. Compared to existing manual and Twitter network-based approaches, our approach is able to operate at much faster timescales, and does not require explicit knowledge of the Twitter network, which is difficult to obtain in practice.

ACKNOWLEDGMENTS
The volume of the work would not have been possible without contributions, in one form or the other, by a few names to mention. We welcome this opportunity to express our heartfelt gratitude and regards to our project guide Prof. V. V. Kimbahune, Department of Computer Engineering, STES Smt. Kashibai Navale College of Engineering, for her unconditional guidance. She always bestowed parental care upon us and evinced keen interest in solving our problems.
repeated drive cycle, which can be utilized in an optimization algorithm to minimize the amount of fossil-fuel energy used during the trip.

Youness Riouali et al. [2] state that traffic flow modeling is a fundamental step for planning and controlling transportation systems. It is not only vital for enhancing safety and transportation efficiency, but it can additionally yield financial and ecological benefits. Considering the discrete and continuous aspects of traffic flow dynamics, hybrid Petri nets have turned out to be a powerful tool for approaching these dynamics, and they portray vehicle behavior precisely since they incorporate both perspectives. An extension of hybrid Petri nets is presented in this paper for generalizing traffic flow modeling by considering state conditions on external rules, which can be scheduled as well as nondeterministic in time, for example stop signs or priority roads. Also, a segmentation of roads is proposed to manage the exact localization of events.

Leyre Azpilicueta et al. [3] note that intelligent transportation systems (ITSs) are currently under intense research and development for making transportation more secure and more efficient. The development of such vehicular communication systems requires exact models for the propagation channel. A key characteristic of these channels is their transient fluctuation and inherent time-evolving statistics, which majorly affect electromagnetic propagation prediction. This article investigates the channel properties of a wireless communication system in a vehicular communication domain with deterministic modeling. An investigation of the physical radio channel propagation of an ultra-high-frequency (UHF) radio-frequency identification (RFID) system for a vehicle-to-infrastructure (V2I) scattering environment is presented. Another module was implemented in the proposed site-specific tool that considers the movement of the vehicles, leading to space-time scattering models which can capture the variability of traffic flow in a cross-sectional traffic detection environment. The dynamic models are applied to anticipate the evolution of traffic flow, and further used to create signal timing designs that account not just for the present condition of the system but in addition for the expected transient changes in rush-hour traffic flows. Factors influencing model accuracy are explored, including time-zone length, position of upstream traffic detection equipment, road section length, traffic volume, turning rates, and computation time. The effect of these factors on the model's performance is delineated through a simulation study, and the computational performance of the models is discussed. The outcomes demonstrate that both the dynamic speed-truncated normal distribution model and the dynamic Robertson model beat their respective static versions, and that they can additionally be applied for dynamic control.

Graf R. et al. [7] propose that future driving assistance systems will require an extended capacity to deal with complex driving situations and to respond properly as indicated by situation criticality and requirements for risk minimization. People driving on motorways can judge, for instance, cut-in situations of vehicles due to their experience. The thought displayed in this paper is to adapt these human capacities to technical systems and learn distinctive situations over time. Case-Based Reasoning is applied to foresee the conduct of road participants since it incorporates a learning aspect, in light of knowledge obtained from the driving history. This idea facilitates recognition by matching genuine driving situations against stored ones. In the first instance, the idea is assessed on action prediction of vehicles in neighbouring lanes on motorways and focuses on the part of the

frequency models. The strong reliance on the environment due to multipath propagation is demonstrated. These outcomes can help in the identification of the ideal location of the transceivers to limit power utilization and increase service performance, enhancing vehicular communications in ITS.

DAI Lei-lei et al. [4], introducing a view of characterizing and grouping extensive special events, investigate the passenger flow distribution qualities of substantial special events and consider the spatial and temporal dispersion of road traffic flow surrounding the event regions. By summarizing the traffic organization and management experiences of models at home and abroad, joined with the planning results, the paper structures a basic procedure of traffic organization and management for various vast special events, proposes static and dynamic traffic organization techniques and management methodologies, and plans the activity steps, which give a reference and direction to the traffic organization practice for expansive special events.

Thomas Liebig et al. [5] state that situation-dependent route planning gathers expanding interest as urban areas end up crowded and congested. They present a framework for individual trip planning that consolidates future traffic hazards in routing. Future traffic conditions are computed by a Spatio-Temporal Random Field dependent on a stream of sensor readings. Furthermore, their methodology estimates traffic flow in territories with low sensor coverage utilizing a Gaussian Process Regression. The conditioning of spatial regression on intermediate forecasts of a discrete probabilistic graphical model permits joining historical information, streamed online information and a rich dependency structure at the same time. They exhibit the framework with a genuine use-case from Dublin city, Ireland.

Shen, L. et al. [6] present work that centers around examining dynamic vehicles cutting into the path of the host vehicle.

Shen, L. et al. [8] state that lacking adequate temporal variation characteristic analysis and spatial correlation estimations leads to constrained completion accuracy, and represents a noteworthy test for an ITS. Utilizing the low-rank nature and the spatial-temporal correlation of traffic network information, this paper proposes a novel way to reconstruct the missing traffic information dependent on low-rank matrix factorization, which expounds the potential ramifications of the traffic network by decomposed factor matrices. To additionally exploit the temporal evolvement attributes and the spatial similarity of road links, the authors plan a time-series constraint and a versatile Laplacian regularization spatial requirement to investigate the neighborhood association of road links. The exploratory outcomes on six real-world traffic data sets demonstrate that their methodology beats alternate techniques and can effectively reconstruct the road traffic information exactly for different basic loss modes.

3. EXISTING SYSTEM APPROACH
In the existing framework, traffic jams in metropolitan cities are greater than in other urban city zones as well as rural areas, so traffic jams are the most common issue in current days. A traffic jam occurs when the movement of vehicles is hampered at a particular place for some reason over a certain period of time. If the number of vehicles plying on a street or road increases beyond the maximum capacity it is built to sustain, it results in traffic jams. Traffic jam, or traffic congestion, is an everyday affair in big cities. It is the result of the growing population and the increase in use of personal, public as well as commercial transport vehicles. The loss of profitable time brought about by traffic jams is not at all useful for
[1]Classification of Heart Diseases using two more attributes i.e. obesity and
K Nearest Neighbor and Genetic smoking. The data mining classification
Algorithm(2013) algorithms, namely Decision Trees, Naive
Nearest neighbor (KNN) is very Bayes, and Neural Networks are analyzed
simple, most popular, highly efficient and on Heart disease database.
effective technique for pattern recognition. [4]Cardio Vascular Disease Prediction
KNN is a straight forward classifier, where System using Genetic Algorithm(2012)
parts are classified based on the class of Medical Diagnosis Systems play
their nearest neighbor. Medical databases important role in medical practice and are
have big volume in nature. If the data set used by medical practitioners for diagnosis
contains excessive and irrelevant and treatment. In this work, a medical
attributes, classification may create less diagnosis system is defined for predicting
accurate result. Heart disease is the main the risk of cardiovascular disease. This
cause of death in INDIA. In Andhra system is built by combining the relative
Pradesh heart disease was the prime cause advantages of genetic technique and neural
of mortality accounting for 32% of all network. Multilayered feed forward neural
deaths, a rate as high as in Canada (35%) and the USA. Hence there is a need to define a decision support system that helps clinicians to take precautionary steps. This work proposed a new technique which combines KNN with a genetic technique for effective classification. The genetic technique performs a global search in complex, large and multimodal landscapes and provides an optimal solution.
[2] A Survey of Non-Local Means based Filters for Image De-noising (2013)
Image de-noising includes the manipulation of the image data to produce a visually high-quality image. The Non-Local Means filter was originally designed for Gaussian noise removal, and the filter is changed to adapt to speckle noise reduction. Speckle noise is the primary source of noise in medical ultrasound imaging and should be filtered out. This work reviews the existing Non-Local Means based filters for image de-noising.
[3] Improved Study of Heart Disease Prediction System Using Data Mining Classification Techniques (2012)
This work has analyzed prediction systems for heart disease using a larger number of input attributes. The work uses medical attributes such as sex, blood pressure and cholesterol (13 attributes in all) to predict the likelihood of a patient getting heart disease. Until now, 13 attributes have been used for prediction. This research work added
networks are particularly adapted to complex classification problems. The weights of the neural network are determined using a genetic technique because it finds an acceptably good set of weights in fewer iterations.
[5] Wavelet Based QRS Complex Detection of ECG Signal (2012)
A wide range of heart conditions is identified by thorough examination of the features of the ECG report. Automatic extraction of time-plane features is valuable for identification of vital cardiac diseases. This work presents a multi-resolution wavelet transform based system for detecting the 'P', 'Q', 'R', 'S', 'T' peak complex from the original ECG signal. The 'R-R' time lapse is an important minutia of the ECG signal that corresponds to the heartbeat of the person concerned. An abrupt increase in the height of the 'R' wave or changes in the measurement of the 'R-R' interval denote various anomalies of the human heart. Similarly, the 'P-P', 'Q-Q', 'S-S' and 'T-T' intervals also correspond to various anomalies of the heart, and their peak amplitudes also indicate other cardiac diseases. In this proposed method the 'PQRST' peaks are marked and stored over the entire signal, and the time interval between two consecutive 'R' peaks and the other peak intervals are measured to find anomalies in the behavior of the heart, if any.
[6] Heart Disease Diagnosis using Data Mining Technique - Decision Tree
It has
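The k-nearest-neighbour vote at the heart of the approach surveyed in [1] can be sketched as follows. This is an illustrative, stdlib-only sketch with a toy two-feature dataset and k = 3 (both assumptions); the genetic search over attribute weights described in the survey is omitted:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    by_distance = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy records: (blood pressure, cholesterol) -> 0 = healthy, 1 = at risk.
train = [((120, 180), 0), ((125, 190), 0), ((160, 260), 1), ((155, 240), 1)]
print(knn_predict(train, (158, 250)))  # → 1 (both nearest neighbours are at risk)
```

A genetic step would sit on top of this: candidate attribute-weight vectors are evolved and scored by the classification accuracy the weighted distance achieves.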
4. GAP ANALYSIS

Sr no | Author | Title | Publisher | Conclusion | Limitations
1 | M. Akhil Jabbar, B. L. Deekshatulu, Priti Chandra | Classification of Heart Diseases using K Nearest Neighbor and Genetic Algorithm | | Use KNN and genetic algorithm for heart disease detection | 1. Low accuracy 2. Limited dataset used
Metric | Existing System | Proposed System
Precision | 0.825 | 0.9
Recall | 0.825 | 0.9
F-Measure | 0.825 | 0.9

This work can be enhanced by increasing the number of attributes used for disease prediction, making the system more accurate.
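The precision, recall and F-measure values above follow from standard confusion-matrix counts; a minimal sketch (the counts 90/10/10 are illustrative, not the paper's data):

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 90 true positives, 10 false positives, 10 false negatives.
p, r, f = prf(90, 10, 10)
print(round(p, 3), round(r, 3), round(f, 3))  # → 0.9 0.9 0.9
```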
Keywords
Semantic cross media hashing Method, SIFT Descriptor, Word Embedding, Ranking,
Mapping
1. INTRODUCTION
With the fast development of the internet and multimedia, information in various forms has become easy to access, modify and duplicate. Information in various forms may have semantic correlation: for example, a microblog on Facebook often carries tags, and a video on YouTube is always associated with a related description or tag as semantic information. Data of different modalities thus inherently co-occur, which creates a great emerging demand for applications like cross-media retrieval, image annotation and recommendation systems. Therefore, hashing-based similarity methods, which perform exact or approximate search, have been suggested and have received remarkable attention in the last few years. The core problem of hash learning is how to formulate the underlying correlation between multiple modalities and retain/protect the similarity relation within each respective modality. Generally, hashing methods are divided into two categories: matrix decomposition methods and vector based methods. Matrix decomposition based hashing methods search low-dimensional spaces to reconstruct data and quantify the reconstruction coefficients to obtain binary codes. Such methods avoid graph construction and eigendecomposition. Their drawback is that they cause large quantization errors, which deteriorate performance for large code lengths. We have designed a multi-modal hashing model, SCMH, which focuses on image and text data with binary-representation hashing. This method processes text data using a Skip-gram model and image data using the SIFT descriptor.
After that, it generates hash codes using a deep neural network, avoiding duplicates.

Motivation
Existing approaches use Canonical Correlation Analysis (CCA), manifold learning, dual-wing harmoniums, deep autoencoders, and deep Boltzmann machines to approach the task.
Due to the efficiency of hashing-based methods, there also exists a rich line of work focusing on the problem of mapping multi-modal high-dimensional data to low-dimensional hash codes, such as Latent Semantic Sparse Hashing (LSSH), Discriminative Coupled Dictionary Hashing (DCDH), Cross-view Hashing (CVH), and so on.

2. RELATED WORK
A literature survey is the most important step in any kind of research. Before starting development we need to study the previous papers of the domain we are working in, and on the basis of that study we can identify the drawbacks and start working with reference to the previous papers.
In this section, we briefly review the related work on tag search and image search and their different techniques.
Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin: This paper addresses the problem of learning similarity-preserving binary codes for efficient similarity search in large-scale image collections. We formulate this problem in terms of finding a rotation of zero-centered data so as to minimize the quantization error of mapping this data to the vertices of a zero-centered binary hypercube, and propose a simple and efficient alternating minimization algorithm to accomplish this task [1].
Y. Pan, T. Yao, T. Mei, H. Li, C.-W. Ngo, and Y. Rui: We demonstrate in this paper that the above two fundamental challenges can be mitigated by jointly exploring cross-view learning and the use of click-through data. The former aims to create a latent subspace with the ability to compare information from the originally incomparable views (i.e., textual and visual views), while the latter explores the largely available and freely accessible click-through data (i.e., "crowdsourced" human intelligence) for understanding the query [2].
D. Zhai, H. Chang, Y. Zhen, X. Liu, X. Chen, and W. Gao: In this paper, we study HFL in the context of multimodal data for cross-view similarity search. We present a novel multimodal HFL method, called Parametric Local Multimodal Hashing (PLMH), which learns a set of hash functions to locally adapt to the data structure of each modality [3].
G. Ding, Y. Guo, and J. Zhou: In this paper, we study the problems of learning hash functions in the context of multimodal data for cross-view similarity search. We put forward a novel hashing method, which is referred to as Collective Matrix Factorization Hashing (CMFH) [4].
H. Jégou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez, and C. Schmid: This paper addresses the problem of large-scale image search. Three constraints have to be taken into account: search accuracy, efficiency, and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual-words approach for any given vector dimension [5].
J. Zhou, G. Ding, and Y. Guo: In this paper, we propose a novel Latent Semantic Sparse Hashing (LSSH) to perform cross-modal similarity search by employing Sparse Coding and Matrix Factorization. In particular, LSSH uses Sparse Coding to capture the salient structures of images, and Matrix Factorization to learn the latent concepts from text [6].
Z. Yu, F. Wu, Y. Yang, Q. Tian, J. Luo, and Y. Zhuang: In DCDH, the
coupled dictionary for each modality is learned with side information (e.g., categories). As a result, the coupled dictionaries not only preserve the intra-similarity and inter-correlation among multi-modal data, but also contain dictionary atoms that are semantically discriminative (i.e., data from the same category is reconstructed by similar dictionary atoms) [7].
H. Zhang, J. Yuan, X. Gao, and Z. Chen: In this paper, we propose a new cross-media retrieval method based on short-term and long-term relevance feedback. Our method mainly focuses on two typical types of media data, i.e. image and audio. First, we build a multimodal representation via statistical canonical correlation between image and audio feature matrices, and define a cross-media distance metric for similarity measure; then we propose an optimization strategy based on relevance feedback, which fuses short-term learning results and long-term accumulated knowledge into the objective function [8].
A. Karpathy and L. Fei-Fei: We present a model that generates natural language descriptions of images and their regions. Our approach leverages datasets of images and their sentence descriptions to learn about the inter-modal correspondences between language and visual data. Our alignment model is based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding [9].
J. Song, Y. Yang, Y. Yang, Z. Huang, and H. T. Shen: In this paper, we present a new multimedia retrieval paradigm to innovate large-scale search of heterogeneous multimedia data. It is able to return results of different media types from heterogeneous data sources, e.g., using a query image to retrieve relevant text documents or images from different data sources [10].

3. EXISTING SYSTEM
A lot of work has been done in this field because of its extensive usage and applications. In this section, some of the approaches which have been implemented to achieve the same purpose are mentioned. These works are differentiated mainly by the algorithm used for multimedia retrieval.
In another line of research, the training-set images were divided into blobs, each with an associated keyword. For any input test image, the image is first divided into blobs, and then the probability of a label describing a blob is found using the information that was used to annotate the blobs in the training set.
From our study of these papers, the open issues relate to tag-based search and image search. The challenge is to rank the top-viewed images while ensuring the diversity of those images; existing search suffers from this diversity problem, so the open issue is diversity.

4. PROPOSED SYSTEM
We propose a novel hashing method, called semantic cross-media hashing (SCMH), to perform the near-duplicate detection and cross-media retrieval task. We propose to use a set of word embeddings to represent textual information. The Fisher kernel framework is incorporated to represent both textual and visual information with fixed-length vectors. For mapping the Fisher vectors of different modalities, a deep belief network is proposed to perform the task. We evaluate the proposed method SCMH on two commonly used data sets. SCMH achieves better results than state-of-the-art methods for different lengths of hash codes, and also displays query results in ranked order.

Advantages:
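SCMH itself builds on Fisher vectors and a deep belief network. As a stand-in illustration of the general idea of mapping feature vectors of different modalities to binary codes and ranking by Hamming distance, here is a random-hyperplane hashing sketch; all vectors, dimensions and the bit length are made up for the example:

```python
import random

random.seed(0)  # fixed seed so the sketch is deterministic

def make_hasher(dim, bits):
    """Random-hyperplane hashing: each bit is the sign of a projection
    onto one random direction."""
    planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
    def code(vec):
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                     for plane in planes)
    return code

def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(x != y for x, y in zip(a, b))

code = make_hasher(dim=4, bits=16)
img = [0.9, 0.1, 0.4, 0.7]        # stand-in for an image feature vector
txt_near = [0.8, 0.2, 0.5, 0.6]   # stand-in for a semantically close text vector
txt_far = [-v for v in img]       # stand-in for an unrelated (opposite) vector

# Vectors that are close in feature space receive nearby hash codes.
assert hamming(code(img), code(txt_near)) < hamming(code(img), code(txt_far))
```

Ranking query results then reduces to sorting candidates by Hamming distance to the query's code, which is far cheaper than comparing real-valued vectors directly.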
REFERENCES
that are used to meet customer demands through the planning, control and implementation of the effective movement and storage of related information and services from origin to destination, and also maintains user information in the form of a QR code. The proposed system focuses on delivery of goods and raw materials, and on shifting home appliances and furniture during relocation.

3. LITERATURE SURVEY
1. An Automated Taxi Booking and Scheduling System
This work presents an Automated Taxi Booking and Scheduling System with safe booking. The system gives a convenient, assured and safe booking for both taxi drivers and registered customers through PDAs. When many customers arrive at the same time, problems occur, since there is no taxi parking, central working environment or booking structure for the large number of taxis.
2. Autonomous vehicle logistic system: Joint routing and charging strategy
The principal aim of this framework is to make the inevitable changes more tangible. It begins from the general agreement that the business is changing and goes further to indicate and measure the extent of change. Within a more complex and expanded mobility industry landscape, incumbent players will be compelled to simultaneously compete on different fronts and cooperate with organizations. City type will replace country or region as the most significant segmentation dimension that determines mobility behavior.
3. Integration of vehicle routing and resource allocation in a dynamic logistics network
This work presents a multi-period, integrated vehicle routing and resource allocation problem. Ignoring the interdependencies between vehicle routing and resource allocation appears to be mediocre; a combination of the two problems overcomes this inadequacy. The two sub-problems can be solved sequentially (SP), by means of hierarchical decision making (FI), or by model update (DI). The last two approaches are derived from Geoffrion's idea of model integration. A stochastic programming approach to the transportation problem is not addressed.
4. Product allocation to different types of distribution center in retail logistics networks
In this work, a novel solution approach is developed and applied to a real-life case of a leading European grocery retail chain. City type will replace country or region as the most significant segmentation dimension that determines mobility behavior. A further aspect arises from assuming identical store delivery frequencies in outbound transportation from all DC types.
5. The dynamic vehicle allocation problem with application in trucking companies in Brazil
This paper deals with the dynamic vehicle allocation problem (DVAP) in road transportation of full truckloads between terminals. The DVAP involves multi-period asset allocation and consists of defining the movements of a fleet of vehicles that transport products between terminals with a wide geographical distribution. The results of a practical validation of the proposed model and solution strategies are not clearly specified.
6. Road-based goods transportation: A survey of real-world logistics applications from 2000 to 2015
This paper gives a review of the main real-world applications of road-based goods transportation over the previous 15 years. It reviews papers in the areas of oil, gas and fuel transportation, retail, waste collection and management, mail and parcel delivery, and food distribution. It addresses the integration of routing problems with other parts of the supply chain.
customer, and also the customer is unable to trace the current location of the transported material. The primary concern in our framework is that we need to give end-to-end security for client and supplier information by utilizing the QR code concept. In the QR code binary image we need to hide the client and supplier information; only an authorized client can see the information. For customer interest mining we used the collaborative filtering method. The fundamental principle of this strategy is recommendation of a vehicle according to supplier service. The recommendation is used to discover client interest and suggest related options. Client advice is a term used in the sense of interest mining: one can give guidance for the problem or can simply give an answer. Guidance is, in effect, an opinion accompanied by a request or instruction. A recommendation follows: a customer's interest rating of an organization is used by a new customer when choosing that organization's vehicle service. We need to give end-to-end security for client and supplier information by utilizing the QR code idea.
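The collaborative filtering step mentioned above can be sketched as user-based filtering with cosine similarity. The rating data and item names below are hypothetical, not from the paper:

```python
import math

# Hypothetical customer -> {vehicle type: rating} data.
ratings = {
    "alice": {"mini_truck": 5, "tempo": 3, "container": 1},
    "bob":   {"mini_truck": 4, "tempo": 3, "container": 1},
    "carol": {"mini_truck": 1, "tempo": 2, "container": 5},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

# Recommend to a customer with tastes like alice's: find the most similar profile,
# whose highly rated vehicles would then be suggested.
target = ratings["alice"]
best = max((name for name in ratings if name != "alice"),
           key=lambda name: cosine(target, ratings[name]))
print(best)  # → bob
```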
7. ACKNOWLEDGMENTS
It gives us great pleasure to present the preliminary project report on the modern logistics vehicle system using tracking and security. We would like to take this opportunity to thank our internal guide Prof. G. Gunjal for giving us all the help and guidance we needed. We are really grateful for their kind support; their valuable suggestions were very helpful. We are also grateful to Dr. P. N. Mahalle, Head of the Computer Engineering Department.

REFERENCES
[1] Albara Awajan, "An Automated Taxi Booking and Scheduling System," Conference on Automation Engineering, 12 January 2015.
[2] A. Holzapfel, H. Kuhn, and M. G. Sternbeck, "Product allocation to different types of distribution center in retail logistics networks," European Journal of Operational Research, February 2016.
[3] J. Q. Yu and A. Y. S. Lam, "Autonomous vehicle logistic system: Joint routing and charging strategy," IEEE Transactions on Intelligent Transportation Systems, 2016.
[4] R. A. Vasco and R. Morabito, "The dynamic vehicle allocation problem with application in trucking companies in Brazil," Computers and Operations Research, 24 April 2016.
[5] L. C. Coelho, J. Renaud, and G. Laporte, "Road-based goods transportation: A survey of real-world logistics applications from 2000 to
NETWORK AND
CYBER SECURITY
regulator. People of all age groups must willingly exercise their right to vote without feeling any sort of dissatisfaction. Currently 42% of internet users in India have an average internet connection speed above 4 Mbit/s, 19% have a speed over 10 Mbit/s, and 10% enjoy speeds over 15 Mbit/s. The average internet connection speed on mobile networks in India was 4.9 Mbit/s. With so many people connected to the internet, the idea of using an OVS is very much feasible, and it also overcomes various other problems faced during the election process, such as creating awareness in rural areas and among youth, cost reduction, security, etc.

4. SOME IMPORTANT POINTS FROM REVIEW OF OVS [6] -

1) Trust in Internet Voting –
Trust in the electoral process is essential for a successful democracy. However, trust is a complex concept, which requires that individuals make rational decisions based on the facts to accept the integrity of Internet voting. Technical institutions and experts can play an important role in this process, with voters trusting the procedural role played by independent institutions and experts in ensuring the overall integrity of the system.
One of the fundamental ways to enable trust is to ensure that information about the Internet voting system is made publicly available.
A vital aspect of integrity is ensured through testing, certification and audit mechanisms. These mechanisms will need to demonstrate that the security concerns presented by Internet voting have been adequately dealt with.

2) The Secrecy and Freedom of the Vote –
Ensuring the secrecy of the ballot is a significant concern in every voting situation. In the case of Internet voting from unsupervised environments, this principle may easily become the main challenge. Given that an Internet voting system cannot ensure that voters are casting their ballots alone, the validity of Internet voting must be demonstrated on other grounds.

3) Accessibility of Internet Voting –
Improving accessibility to the voting process is often cited as a reason for introducing Internet voting. The accessibility of online voting systems, closely linked to usability, is relevant not only for voters with disabilities and linguistic minorities, but also for the average voter.
The way in which voters are identified and authenticated can have a significant impact on the usability of the system, but a balance needs to be found between accessibility and integrity. Different groups in society have different levels of access to the Internet. Therefore, the provision of online voting in societies where there is very unequal access to the Internet will have a different impact on accessibility for various communities.

4) Electoral Stakeholders and Their Roles –
The introduction of Internet voting significantly changes the role that stakeholders play in the electoral process. Not only do new stakeholders, such as voting technology suppliers, assume prominence in the Internet voting process, but existing stakeholders must adapt their roles in order to fulfill their existing functions. Central to this new network of stakeholder relationships is public administration, especially the role of the EC. Public administration and the EC will establish the legal
for only one login session or transaction, on a computer system or other digital device. OTPs avoid a number of shortcomings that are associated with traditional password-based authentication; a number of implementations also incorporate two-factor authentication by ensuring that the one-time password requires access to something a person has (such as a smartcard or a specific cellphone) as well as something a person knows (such as a PIN).
This ensures that an individual can vote only for himself/herself, thus reducing fraudulent votes. Only when the user enters the correct Aadhar card number and mobile number and sets a password will the website give the option to generate an OTP. On clicking it, an OTP will be sent to the user's mobile number within 2 minutes. On entering the correct OTP, the user will be able to log in and cast a vote. Once the user selects the candidate he/she wants to vote for, the system will pop up a confirmation message. Once the user confirms the vote, he/she will be automatically logged out from the website, thus preventing the user from voting again.
Additionally, the website will have another option for admin login. The admins are officers selected by the Election Commission who will monitor the voting as it progresses and will have their profiles created by the Election Commission. Their main tasks will be to start/stop the election on time, make sure it progresses without any issues, and generate local ward results once elections are finished and send them to the Election Commission.
On the information front, the website will have details of all candidates selected by the respective parties for different wards. On selecting any candidate name, complete information about that candidate will be displayed. Various awareness programs that the Election Commission is conducting will also be displayed. This will help voters gain more knowledge of the voting process.
The results, which the admins will send to the ECI, will be further analyzed. The results will be broken down into the result of each state, the overall winner of the election, and by how much a certain party has beaten the other competitors.

ALGORITHM FOR PROPOSED OVS –
Algorithm: Successful online voting
Input: Biodata of voter & candidate, various wards' details.
Output: Successful voting for voters and declaration of results.
Steps:
1. Person must be 18 years of age or above.
2. Fill Form 6 for first-time registration in the respective ward office.
3. For changes in details, contact the respective ward office.
4. Necessary documents must be submitted while doing steps 2 & 3. Failing to do so will result in rejection of the form.
5. Once forms and documents are verified, a data entry operator will enter the person's details in the database and a default password will be sent to the user.
6. On receiving the password, the user must log in with it and must select a new password to access the website for further use.
7. Once the new password is set, the user can view their profile and election-related information.
8. If any discrepancies are found in the profile, step 3 must be followed.
9. To cast a vote, the user must enter an OTP, which will be sent to the registered mobile number and is active for 1 minute.
10. If the OTP is not received, repeat step 9.
11. Once the user enters the correct OTP, the vote can be cast.
12. On successful voting, a confirmation message will be displayed and the user will be logged out.
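The OTP handling described in the steps above (single use, valid for 1 minute) can be sketched as follows. This is an illustrative sketch, not the system's actual implementation; the store and function names are invented for the example:

```python
import secrets
import time

OTP_TTL_SECONDS = 60  # the proposal keeps an OTP valid for 1 minute

_pending = {}  # mobile number -> (otp, issued_at)

def generate_otp(mobile):
    """Issue a fresh 6-digit OTP and record when it was issued."""
    otp = f"{secrets.randbelow(10**6):06d}"
    _pending[mobile] = (otp, time.time())
    return otp

def verify_otp(mobile, attempt):
    """Accept only the latest OTP, only once, and only within its window."""
    if mobile not in _pending:
        return False
    otp, issued = _pending.pop(mobile)  # single use: removed after one check
    return attempt == otp and time.time() - issued <= OTP_TTL_SECONDS

otp = generate_otp("9999999999")
assert verify_otp("9999999999", otp)      # correct and fresh: vote may be cast
assert not verify_otp("9999999999", otp)  # already consumed: voting again fails
```

A production system would additionally deliver the OTP over SMS, rate-limit attempts, and hold the pending codes in a server-side store rather than process memory.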
6. PROPOSED WORK
Today, almost everyone in the world has a smartphone in hand. In this project we present an Android application that is lightweight, flexible and power-efficient
It is easy to read by any device and any user,
It has a high encoding capacity enhanced by error correction facilities,
It is small in size and robust to geometrical distortion.
Visual cryptography is a new secret-sharing technology. It uses secret share images to conceal the secret, relying on human visual decryption to restore it. Compared with traditional cryptography, it has the advantages of concealment, security, and simplicity of secret recovery. The visual cryptography method provides the high security required by users and protects them against various security attacks. It is easy to generate value in business applications. In this paper, we propose a standard multi-color QR code using textured patterns, with data hiding by text steganography and security provided by a visual secret sharing scheme.

2. MOTIVATION
The motivation of the work is to show that the storage capacity can be significantly improved by increasing the code alphabet q or by increasing the textured pattern size. This increases the storage capacity of the classical QR code. It provides security for a private message using a visual secret sharing scheme.

3. STATE OF ART
The paper [1] proves that the contrast of XVCS is several times greater than that of OVCS. The monotone property of the OR operation degrades the visual quality of the reconstructed image for OR-based VCS (OVCS). Accordingly, XOR-based VCS (XVCS), which uses the XOR operation for decoding, was proposed to enhance the contrast. Advantages are: the secret image is easily decoded by a stacking operation; XVCS gives a better reconstructed image than OVCS. Disadvantages are: the proposed algorithm is more complicated.
Paper [2] presents a blind, key-based watermarking technique, which embeds a transformed binary form of the watermark data into the DWT domain of the cover image and uses a unique image code for the detection of image distortion. The QR code is embedded into the attack-resistant HH component of the 1st-level DWT domain of the cover image to detect malicious interference by an attacker. Advantages are: more information representation per bit change, combined with error-correction capabilities; it increases the usability of the watermark data and maintains robustness against visually invariant data removal attacks. Disadvantages are: it is limited to an LSB bit in the spatial domain of the image intensity values; since the spatial domain is more susceptible to attacks, this cannot be used.
Paper [3] designs a secret QR sharing approach to protect private QR data with a secure and reliable distributed system. The proposed approach differs from related QR code schemes in that it uses the QR characteristics to achieve secret sharing and can resist the print-and-scan operation. Advantages are: it reduces the security risk of the secret; the approach is feasible; it provides content readability, cheater detectability, and an adjustable secret payload of the QR barcode. Disadvantages are: the security of the QR barcode needs improvement; the QR technique requires reducing the modifications.
The two-level QR code (2LQR) has two storage levels, public and private, and can be used for document authentication [4]. The public level is the same as the standard QR code storage level; therefore it is readable by any classical QR code application. The private level is constructed by replacing the black modules with specific textured patterns. It consists of information encoded using a q-ary code with an error-correction capacity. Advantages are: it increases the storage capacity of the classical QR code; the textured patterns used in 2LQR are sensitive to the P&S process. Disadvantages are: the pattern recognition method needs improvement. Need to
increase the storage capacity of 2LQR by replacing the white modules with textured patterns.
To protect sensitive data, paper [5] explores the characteristics of QR barcodes to design a secret hiding mechanism for the QR barcode with a higher payload compared to past ones. With a normal scanner, a browser can only reveal the formal information from the marked QR code. Advantages are: the designed scheme is feasible for hiding secrets in a tiny QR tag for the purpose of steganography; only the authorized user with the private key can reveal the concealed secret. Disadvantages are: the security needs to be increased.

4. GAP ANALYSIS

TABLE: GAP ANALYSIS
Sr. No. | Author, Title and Journal Name | Technique Used | Advantages
1 | C. N. Yang, D. S. Wang, "Property Analysis of XOR-Based Visual Cryptography," IEEE Transactions on Circuits & Systems for Video Technology, vol. 24, no. 12, pp. 189-197, 2014. | XOR-based VCS (XVCS) | 1. Easily decode the secret image by stacking operation. 2. XVCS has better reconstructed image than OVCS.
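The XOR-based decoding that [1] credits with lossless reconstruction can be illustrated with a minimal (2, 2) sharing sketch. A bit list stands in for one row of a binary image; this illustrates only the XVCS principle, not the paper's (k, n) construction:

```python
import secrets

def make_shares(secret_bits):
    """(2, 2) XOR-based sharing: share1 is uniformly random and
    share2 = secret XOR share1, so either share alone reveals nothing."""
    share1 = [secrets.randbits(1) for _ in secret_bits]
    share2 = [s ^ r for s, r in zip(secret_bits, share1)]
    return share1, share2

def reconstruct(share1, share2):
    """XOR decoding recovers the secret exactly, with no contrast loss
    (unlike OR-based stacking, whose monotone OR degrades quality)."""
    return [a ^ b for a, b in zip(share1, share2)]

secret = [1, 0, 1, 1, 0, 0, 1, 0]  # e.g. one row of a binary QR module matrix
s1, s2 = make_shares(secret)
assert reconstruct(s1, s2) == secret
```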
Define two blocks as belonging to an identical group G if condition (3) is satisfied. With the above definition, we can divide the blocks into several groups. For example, to determine whether two blocks are of the same group, we evaluate the grouping condition; if it holds, we conclude that the two blocks are of an identical group, and vice versa. A block different from all other blocks will not be contained in any group. One block is said to be responsible for another if it is reversed to share it; one symbol denotes the case that a block is responsible for another, and another symbol represents the opposite. A matrix X is constructed by solving (1). (4) If n satisfies the stated condition, there must be a solution to (1). In addition, we can adjust the parameter value to balance errors between the covers and the reconstructed secret. Based on X, we design a new sharing algorithm.

7. CONCLUSION
In this paper, we proposed a visual secret sharing scheme for QR code applications, which makes improvements mainly in two aspects: higher security and more flexible access structures. In addition, we extended the access structure from (n, n) to (k, n) by further investigating the error correction mechanism of QR codes. Two division approaches are provided, effectively improving the sharing efficiency of the (k, n) method. Therefore, the computational cost of our work is much smaller than that of the previous studies which can also achieve the (k, n) sharing method. Future work will make the QR code reader scan a QR code within a fraction of a second.

REFERENCES
[1] C. N. Yang, D. S. Wang, "Property Analysis of XOR-Based Visual Cryptography," IEEE Transactions on Circuits & Systems for Video Technology, vol. 24, no. 12, pp. 189-197, 2014.
[2] P. P. Thulasidharan, M. S. Nair, "QR code based blind digital image watermarking with attack detection code," AEU - International Journal of Electronics and Communications, vol. 69, no. 7, pp. 1074-1084, 2015.
[3] P. Y. Lin, "Distributed Secret Sharing Approach with Cheater Prevention Based on QR Code," IEEE Transactions on Industrial Informatics, vol. 12, no. 1, pp. 384-392, 2016.
[4] I. Tkachenko, W. Puech, C. Destruel, et al., "Two-Level QR Code for Private Message Sharing and Document Authentication," IEEE Transactions on Information Forensics & Security, vol. 11, no. 13, pp. 571-583, 2016.
[5] P. Y. Lin, Y. H. Chen, "High payload secret hiding technology for QR codes," EURASIP Journal on Image & Video Processing, vol. 2017, no. 1, p. 14, 2017.
[6] https://en.wikipedia.org/wiki/QR_code
[7] F. Liu, T. Guo, "Privacy protection display implementation method based on visual passwords," CN Patent App. CN 201410542752, 2015.
[8] S. J. Shyu, M. C. Chen, "Minimizing Pixel Expansion in Visual Cryptographic Scheme for General Access Structures," IEEE Transactions on Circuits & Systems for Video Technology, vol. 25, no. 9, 2015.
[9] H. D. Yuan, "Secret sharing with multi-cover adaptive steganography," Information Sciences, vol. 254, pp. 197-212, 2014.
[10] J. Weir, W. Q. Yan, "Authenticating Visual Cryptography Shares Using 2D Barcodes," in Digital Forensics and Watermarking. Berlin, Germany: Springer Berlin Heidelberg, 2011, pp. 196-210.
IPFS provides a high-throughput, content-addressed block storage model, with content-addressed hyperlinks. This forms a generalized Merkle directed acyclic graph (DAG). IPFS combines a distributed hash table, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other not to tamper with data in transit. Distributed content delivery saves bandwidth and prevents DDoS attacks, which HTTP struggles with.
Section 2 contains the State of Art, Section 3 contains the Gap Analysis, Section 4 contains User Classes and Characteristics, Section 5 contains the Proposed Work, Section 6 contains the Conclusion and Future Work, and Section 7 contains the References.
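The content-addressing principle that lets IPFS nodes distrust one another can be sketched in a few lines. Plain SHA-256 hex stands in for IPFS's multihash format, and the in-memory dictionary stands in for a real peer-to-peer block store; both are assumptions for the example:

```python
import hashlib

def content_address(data: bytes) -> str:
    """A block's address is the hash of its contents (IPFS uses
    multihash-encoded digests; plain SHA-256 hex stands in here)."""
    return hashlib.sha256(data).hexdigest()

store = {}  # stand-in for a distributed block store

def put(data: bytes) -> str:
    addr = content_address(data)
    store[addr] = data
    return addr

def get(addr: str) -> bytes:
    data = store[addr]
    # Any peer can serve the block: the address itself verifies integrity,
    # so the serving node need not be trusted.
    if content_address(data) != addr:
        raise ValueError("block was tampered with in transit")
    return data

addr = put(b"hello merkle dag")
assert get(addr) == b"hello merkle dag"
```

Because identical content always hashes to the same address, duplicates are stored once, and a Merkle DAG falls out naturally by storing child addresses inside parent blocks.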
rates (80%) but the guessing rate was also high (39.5%). Word associations produced low guessing rates (7%), but response words were poorly recalled (39%). Nevertheless, both cognitive items and word associations showed sufficient promise as password techniques to warrant further investigation.
Priyanka Sonawane and Archana Augustine, "Enhancing the security of secondary authentication system based on event logger" [5]: the web application provides secondary authentication when a user forgets their password. For that, the user has to select a question from pre-defined lists of questions, which include long-term history questions such as "What was your first school?" or "What is your birthplace?". The answers to such questions will not change over a decade, but they can easily be broken using social networking sites such as Facebook, and they can also be guessed by brute-force attack. To overcome this problem, the authors present a secondary authentication system based on the user's mobile data. Today's smartphones come with built-in features such as GPS. The authors used the data for calls, SMS history, the calendar, and application installs, created questions based on this data, and categorized them as multiple-choice, blank-filling, and true/false. The SVM algorithm is used to fetch the user's mobile activity, and the RSA algorithm is used to keep the answers to the questions secure.
Ariel Rabkin, "Personal knowledge questions for fallback authentication: security questions in the era of Facebook" [4]: security questions (or challenge questions) are commonly used to authenticate users who have lost their passwords. The author examined the password retrieval mechanisms for a number of personal banking websites and found that many of them rely in part on security questions with serious usability and security weaknesses. The author discusses patterns in the observed security questions and argues that today's personal security questions owe their strength to the hardness of an information-retrieval problem. However, as personal information becomes ubiquitously available online, the hardness of this problem, and the security provided by such questions, will likely diminish over time. The author supplements the survey of bank security questions with a small user study that supplies some context for how such questions are used in practice.

3. STUDY RECRUITMENT
To study the reliability and security of personal questions, we ran a laboratory study over four separate days between March 22 and June 23, 2008, with a follow-up study in September and October. The cohorts assigned to each day are shown in Table 2a. The study encompassed both the personal questions used by Windows Live's password-reset workflow and the

3.1 Participant recruitment
Our recruiting team selected participants from a larger pool of potential participants they maintain for all studies at Microsoft. The pool contains members of the general public who had been recruited via public events, lotteries, and our website. We required that participants speak English as their primary language and not be employed by Microsoft. Our recruiters selected a balance of men and women; 64 participants were male and 66 female. The recruiters also selected participants with a diversity of ages and professions. Participants in the first three cohorts were required to be Hotmail users for at least three months and to access their account at least three times a week. The great majority of participants (83%) had been using their Hotmail account for at least four years, as detailed in Table 2d. After recruiting one qualified participant, our recruiters would ask if the participant had a coworker, friend, or family member who might also be qualified for the study. Recruiters then
interviewed potential partners to ensure they met our requirements.

3.2 Initial laboratory visit
We scheduled participants for a two-hour visit to perform the tasks summarized in Table 1. Participants in each session were split into groups and placed into different rooms such that no two partners were in the same room. Each partner was placed at a computer. We seated participants sufficiently far from each other to ensure that their screens, on which their answers might appear while being typed, could not be seen by others. All questions were asked using web survey software, though participants were required to be on-site to prevent collusion.

Table 1. Order of laboratory visit tasks
1) Move to room separate from partner
2) Answer demographic questions
3) Authenticate to Hotmail using personal question (cohorts 1-3)
4) Answer personal questions for top four webmail services
5) Describe relationship with partner
6) Guess partner's answers to personal questions
7) Attempt to recall answers to own personal questions
8) Second chance to guess partner's questions using online research (cohorts 2-4)

Authentication to Hotmail. We explained to participants how personal questions could be used to reset the passwords participants used to log in to Hotmail. We asked the 116 participants in the first three cohorts (those selected to be Hotmail users) to attempt to answer their personal question. We asked them only to authenticate (provide the answer to their question) and not to actually reset their password if successful.
Initial answers to personal questions. We then asked all 130 participants to answer all of the personal questions in use by the top four webmail services. We told participants that we would ask the same questions later to determine how well they remembered the answers. We offered two prizes (an XBOX 360 and a Zune digital music player) and gave participants a virtual lottery ticket for each question they both answered and later recalled. We randomized the question order for each participant. We asked participants to mark questions they were either unable or unwilling to answer. We instructed participants that capitalization, punctuation, and spaces would be ignored when comparing answers. We anticipated that participants might try to increase their chance of recalling their answers by providing the same answer for all questions, so we added a rule that eliminated rewards for recalling the same answer numerous times. We also feared that if participants

3.3 Guessing by acquaintances
We asked participants to describe their relationship with their partner and asked them whether they would trust their partner with their Hotmail password. Then we asked them to guess their partners' answers. As before, we presented the questions in random order and rewarded success with an increased opportunity to win one of our prizes, though we could not tell participants which answers were correct. We allowed participants to guess up to five times by placing guesses on separate lines. We restricted participants from communicating answers to each other by asking them to turn off their mobile devices ("as a courtesy to others"), isolating them in separate rooms, and monitoring their behavior. After running the first cohort of the study (40 participants), we discovered that many participants weren't guessing as hard as we had hoped. Most were providing at most one guess per answer, and none appeared to be performing any online research. We thus gave the 90 participants in the three remaining cohorts (cohorts 2-4) a second opportunity to guess their partners' answers. In this second guessing
round, we encouraged them to use search engines and social networking sites to research the answers to their partners' questions. We also told them that this was the last task of the study, in hopes that they might feel less rushed.

3.4 Limitations
We design a user authentication system with a set of secret questions created based on data about the user's daily activity and short-term smartphone usage. We evaluate the reliability and security by using true/false type secret questions. These questions are easy to answer and need not be memorized, because they are based on the user's personal life and events. Because of this, application security will be enhanced, since only the user knows the events and things he/she did recently.

4. ANSWER COMPARISON ALGORITHMS
In total, 130 participants initially provided 2,874 answers, and 49 participated in the follow-up study and tried to recall 1,074 of those answers. We needed an algorithm for determining whether a recollection, or a partner's guess, sufficiently matched the original. We tested three different algorithms. For all algorithms, we removed all nonalphanumeric characters and forced letters into lower case. When counting the number of attempts to recall an answer, we did not count repetitions of the same guess. Attackers learn nothing by being able to repeat a guess, whereas account holders, who may repeat the same answer thinking they previously mistyped it, will not be penalized for this mistake.
The first algorithm, simple equality, compares the resulting simplified strings character for character. This is the algorithm that was used, during the memorability follow-up study, to provide participants with feedback as to whether they had recalled their answers correctly. Unfortunately, we could not use the equality algorithm for examining partners' guesses due to an artifact of our study. The Illume survey software we used to collect the guesses participants provided for their partners' answers fails to store carriage returns, which we had asked participants to use to separate their guesses. To address this problem, our second algorithm, the substring algorithm, treated a guess as valid if it contained a substring that matched the original answer, as suggested by Toomim et al. [16]. The final algorithm we tested was the Levenshtein edit distance algorithm with two modifications. First, we reduced the cost of transpositions of two characters ('swapped' to 'sawpped') from two to one. This reduces the cost of this very common typo to be equal to that of a single mistyped character. Second, we removed the cost of extra characters at the beginning or end of the guess, to adjust for the artifact that all guess strings were concatenated together.

5. RESULTS
In a world of social media, it is very easy for hackers to guess the answer to such a question; users need an effective system to address this problem, and to resolve it we can take the help of a smartphone device. It is a very difficult task to remember an alphanumeric and symbolic password, and a single character change will result in a wrong password. To reset a password, the user must answer a question that was set at the time of registration; this question is known as a secret question. Users must keep these questions in mind for a very long time. Studies show that the answers to these questions are not changed or used for months or years, which can cause users to forget the answer to the question.

5.1 Real-world memorability results
While we asked all 116 participants in the first three cohorts to try to reset their password using their personal question, not all accounts had a question configured.
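The three answer-comparison algorithms of Section 4 (simple equality, substring matching, and the Levenshtein distance modified to charge one for transpositions and nothing for extra leading/trailing guess characters) can be sketched as follows. This is an illustrative reimplementation, not the study's original code.

```python
import re

def normalize(s: str) -> str:
    # Drop nonalphanumeric characters and force lower case.
    return re.sub(r'[^a-z0-9]', '', s.lower())

def simple_equality(guess: str, answer: str) -> bool:
    return normalize(guess) == normalize(answer)

def substring_match(guess: str, answer: str) -> bool:
    # A guess is valid if it contains the answer as a substring.
    return normalize(answer) in normalize(guess)

def modified_levenshtein(guess: str, answer: str) -> int:
    g, a = normalize(guess), normalize(answer)
    m, n = len(g), len(a)
    # d[i][j] = cost of matching answer[:j] against a portion of guess[:i];
    # the first column stays 0 so extra leading guess characters are free.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if g[i - 1] == a[j - 1] else 1
            best = min(d[i - 1][j] + 1,          # extra char in guess
                       d[i][j - 1] + 1,          # missing char from answer
                       d[i - 1][j - 1] + cost)   # match / substitution
            if i > 1 and j > 1 and g[i - 1] == a[j - 2] and g[i - 2] == a[j - 1]:
                best = min(best, d[i - 2][j - 2] + 1)  # transposition costs 1
            d[i][j] = best
    # Extra trailing guess characters are free: take the best row end.
    return min(d[i][n] for i in range(m + 1))
```

For example, modified_levenshtein("sawpped", "swapped") is 1 (one cheap transposition), and a guess that merely embeds the answer, such as "xx hello xx" against "hello", scores 0 because leading and trailing extra characters carry no cost.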
Furthermore, an answer alone was not sufficient to authenticate: a zip code previously associated with the account was also required. A total of 99 participants reported being asked to provide the answer to their personal question. Only 43 (43%) reported being able to successfully provide the correct answer and their zip code. The majority, 56 (57%), could not reset their password and reported being unable to remember either the answer or the zip code they had provided when they set up the account. When asked why they had trouble authenticating, 75% of participants suspected they may have been unable to answer their personal question, and 31% reported that they may have been unable to recall the zip code they had previously provided. A surprising 13% of participants suspected that the reason they could not answer their personal question was because they had intentionally provided a bogus answer when setting up their account.

6. SYSTEM ARCHITECTURE
Understanding Smartphone Sensor and App Data for Enhancing the Security of Secret Questions is an Android-based project which collects user activity data such as user location and call log history. This data is used to generate questions for resetting the password. The user installs our third-party application, which helps to generate and ask questions based on daily activity. These questions are based on short time durations such as a week or a month. At the beginning, the user installs the application on his/her mobile phone. The application continuously captures events; this event data is extracted and sent back to the application. The application generates questions and answers from this data, and these questions and answers are stored in the database. The question generation process is executed continuously in the background, and old questions and answers are replaced with new ones. When the user accesses a social media application and requests a password reset, a question is fetched and asked to the user, and the response from the user is caught and matched with the answer. If the answer given by the user is correct, then the password can be reset; otherwise a photo is captured automatically and sent to the registered email id.

6.1 Response Protocol
We create three types of secret questions: a "True/False" question is also called a "Yes/No" question because it usually expects a binary answer of "Yes" or "No"; a "multiple-choice" question or a "blank-filling" question typically starts with a letter "W", e.g., Who/Which/When/What (and thus we call these two types of questions "W" questions). We have two ways of creating
IEEE. IEEE, 2009, pp. 375-390.
[5] S. Schechter, C. Herley, and M. Mitzenmacher, "Popularity is everything: A new approach to protecting passwords from statistical-guessing attacks," in USENIX Hot Topics in Security, 2010, pp. 1-8.
[6] M. Just and D. Aspinall, "Personal choice and challenge questions: A security and usability assessment," in SOUPS, 2009.
[7] A. Rabkin, "Personal knowledge questions for fallback authentication: Security questions in the era of Facebook," in SOUPS. ACM, 2008, pp. 13-23.
[8] J. C. Read and B. Cassidy, "Designing textual password systems for children," in IDC, ser. IDC '12. New York, NY, USA: ACM, 2012, pp. 200-203.
[9] H. Ebbinghaus, Memory: A contribution to experimental psychology. Teachers College, Columbia University, 1913, no. 3.
[10] F. I. Craik and R. S. Lockhart, "Levels of processing: A framework for memory research," Journal of Verbal Learning and Verbal Behavior, vol. 11, no. 6, pp. 671-684, 1972.
[11] T. M. Wolf and J. C. Jahnke, "Effects of intraserial repetition on short-term recognition and recall," Journal of Experimental Psychology, vol. 77, no. 4, p. 572, 1968.
[12] H. Kim, J. Tang, and R. Anderson, "Social authentication: harder than it looks," in Financial Cryptography and Data Security. Springer, 2012, pp. 1-15.
[13] S. Hemminki, P. Nurmi, and S. Tarkoma, "Accelerometer-based transportation mode detection on smartphones," in Proceedings of the 11th ACM Conference on Embedded Networked Sensor Systems, ser. SenSys. New York, NY, USA: ACM, 2013, pp. 13:1-13:14. [Online]. Available: http://doi.acm.org/10.1145/2517351.2517367
[15] J. Clark and P. van Oorschot, "SoK: SSL and HTTPS: Revisiting past challenges and evaluating certificate trust model enhancements," in Security and Privacy (SP), 2013 IEEE Symposium on, May 2013, pp. 511-525.
[16] J. Whipple, W. Arensman, and M. S. Boler, "A public safety application of GPS-enabled smartphones and the Android operating system," in SMC. IEEE, 2009, pp. 2059-2061.
[17] S. Kumar, M. A. Qadeer, and A. Gupta, "Location based services using Android (LBSOID)," in IMSAA. IEEE, 2009, pp. 1-5.
[18] M. Oner, J. A. Pulcifer-Stump, P. Seeling, and T. Kaya, "Towards the run and walk activity classification through step detection: an Android application," in EMBC. IEEE, 2012, pp. 1980-1983.
[19] W. Luo, Q. Xie, and U. Hengartner, "FaceCloak: An architecture for user privacy on social networking sites," in CSE, vol. 3. IEEE, 2009, pp. 26-33.
[20] H. Falaki, R. Mahajan, S. Kandula, D. Lymberopoulos, R. Govindan, and D. Estrin, "Diversity in smartphone usage," in MobiSys. New York, NY, USA: ACM, 2010, pp. 179-194.
[21] "Understanding Smartphone Sensor and App Data for Enhancing the Security of Secret Questions."
[22] L. Nyberg, L. Backman, K. Erngrund, U. Olofsson, and L.-G. Nilsson, "Age differences in episodic memory, semantic memory, and priming: Relationships to demographic, intellectual, and biological factors," The Journals of Gerontology Series B: Psychological Sciences and Social Sciences, vol. 51, no. 4, pp. P234-P240, 1996.
[23] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-preserving public auditing for data storage security in cloud computing," in INFOCOM, 2010 Proceedings IEEE, March 2010, pp. 1-9.
[24] S. Yu, C. Wang, K. Ren, and W. Lou, "Achieving secure, scalable, and fine-grained data access control in cloud computing," in INFOCOM, 2010 Proceedings IEEE, March 2010, pp. 1-9.
[25] R. Faragher and P. Duffett-Smith, "Measurements of the effects of multipath interference on timing accuracy in a cellular radio positioning system," Radar, Sonar & Navigation, IET, vol. 4, no. 6, pp. 818-824, December 2010.
[26] M. Dong, T. Lan, and L. Zhong, "Rethink energy accounting with cooperative game theory," in Proceedings of the 20th Annual International Conference on Mobile Computing and Networking, ser. MobiCom '14. New York, NY, USA: ACM, 2014, pp. 531-542. [Online]. Available: http://doi.acm.org/10.1145/2639108.2639128
contain any significant data. Each of the cloud nodes (here the technique uses the term node to represent computing, storage, physical, and virtual machines) contains a distinct fragment to increase the data security. A successful attack on a single node should not reveal the locations of the other fragments within the cloud. To keep an attacker uncertain about the locations of the file fragments and to further improve the security, the nodes are selected in a manner such that they are not adjacent and are at a certain distance from each other. The node separation is ensured by means of T-coloring.

2. MOTIVATION
The level of security required for a device varies dramatically depending upon the function of the device. Rather than asking whether the device is secure, one should ask whether the device is secure enough. Cloud-assisted cyber-physical systems (Cloud-CPSs; also known as cyber-physical cloud systems) have broad applications, ranging from healthcare to the smart electricity grid to smart cities to battlefields to the military, and so on. In such systems, client devices (e.g., Android and iOS devices, or resource-constrained devices such as sensors) can be used to access the relevant services (e.g., in the context of a smart electricity grid, this may include utility usage data analyzed and stored in the cloud) from/via the cloud. However, client devices generally have less computing capability and hence are unlikely to have adequate security (technical) measures in comparison to conventional personal computers (PCs). So file cryptographic storage is an effective method to prevent private data from being stolen or tampered with. Data integrity is also maintained: if an attack tampers with the data, it should be detected and prevented. By this we are able to perform secure communication between two or more devices.
From the survey of existing work, we can deduce that both security and performance are critical for next-generation large-scale systems, such as clouds. Therefore, in this paper, a collective approach to the issues of security and performance is defined as a secure data replication problem.

3. REVIEW OF LITERATURE
Paper [1] presents a multi-client searchable encryption scheme, which has various advantages over the known approaches. The related model and security requirements are also formulated. It further discusses extending the given scheme in several ways in order to achieve different search capabilities.
Paper [2] proposes a secure data access scheme based on identity-based encryption and biometric authentication for cloud computing. The system describes the security concerns of cloud computing and then proposes an integrated data access scheme for distributed cloud computing; the strategy of the proposed scheme includes parameter setup, key distribution, feature template creation, cloud data processing, and secure data access control.
The third paper proposes an identity-based data storage scheme in which both intra-domain and inter-domain queries are considered and collusion attacks can be resisted. Moreover, access permission can be controlled by the owner independently [3].
The fourth paper focuses on the critical issue of identity revocation. The system brings outsourced computation into IBE and proposes a revocable scheme in which the revocation operations are delegated to the CSP. With the aid of the KU-CSP, the proposed scheme is full-featured:
A. It achieves constant efficiency for both the computation at the PKG and the private key size at the client;
B. The user does not need to contact the PKG during key update; in other words, the PKG is permitted to be offline after sending the revocation list to the KU-CSP;
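The T-coloring-based fragment placement described in Section 1 can be sketched with a simple greedy routine. This is an illustrative version under assumed inputs, not the surveyed paper's exact algorithm: a candidate node is accepted only if its distance to every node already holding a fragment falls outside a forbidden set T (e.g., T = {0, 1} forbids reusing a node or placing fragments on adjacent nodes).

```python
# Greedy fragment placement in the spirit of T-coloring: a candidate node
# is acceptable only if its distance to every node that already holds a
# fragment is outside the forbidden distance set T.

def place_fragments(num_fragments, nodes, dist, forbidden):
    chosen = []
    for node in nodes:
        if all(dist[node][c] not in forbidden for c in chosen):
            chosen.append(node)
            if len(chosen) == num_fragments:
                return chosen
    raise ValueError("not enough sufficiently separated nodes")

# Tiny example: 5 nodes on a line, distance = index difference.
nodes = [0, 1, 2, 3, 4]
dist = {u: {v: abs(u - v) for v in nodes} for u in nodes}
print(place_fragments(3, nodes, dist, forbidden={0, 1}))  # -> [0, 2, 4]
```

Keeping selected nodes non-adjacent means that compromising one node, or one neighborhood of nodes, reveals neither the other fragments nor their locations.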
5.1 Proposed system architecture
Figure 1 above shows the architectural flow of the proposed system. Here the user sends a request to the browser and the browser accepts the request; the file is then uploaded through the browser. While uploading, the file is encrypted according to the defined policy attributes. The file's integrity and the user's authentication are checked via the server, and then the file is uploaded to the cloud/server.
When the user wants the uploaded file again, the cloud/server checks the integrity of the user. After the user is verified, the file is accessible to the end user, but it is given in encrypted format. To decrypt this file, the user needs to get the intended key from the authenticated user; after getting the key, the user decrypts the file by using the sender key plus the self key. Then the original content of the file is downloaded by the user.

5.2 System overview
In the proposed system, the owner will get the data, and the file will be allocated to users according to the users' position, location, and experience. The owner/distributor will assign the file to a user by generating access policies that consider user attributes such as date and time stamp. After the encryption key is entered, the file will be divided into fragments, and the fragments and their replicas are stored on the server/cloud. When an authenticated user logs in, he will get the files whose policy attributes match his. He can then request the file key and download the file after entering the secret key. A third-party auditor will check the data integrity of the stored fragments, i.e., whether the content of a placed fragment has been changed; if it has changed, the integrity checker will inform the owner about that file. The integrity checker will then replace the tampered fragment with the original fragment and provide security to the file.

6. MATHEMATICAL MODEL

6.1 File fragmentation
Fragment size = File size / Number of fragments. (1)
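Equation (1) can be sketched in code: the file bytes are split into equally sized fragments (the last one may be shorter), each of which would then be placed on a separate node together with its replica. The function name is illustrative, not from the paper.

```python
def fragment(data: bytes, num_fragments: int) -> list:
    """Split data into num_fragments pieces per Eq. (1):
    fragment size = file size / number of fragments (rounded up)."""
    size = -(-len(data) // num_fragments)          # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

pieces = fragment(b"confidential-file-contents", 4)
assert b"".join(pieces) == b"confidential-file-contents"  # lossless split
```

Rounding the fragment size up guarantees exactly the requested number of fragments while keeping concatenation lossless, so the auditor can recompute and compare fragment hashes independently.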
a backup path inside the MPLS network. Advantages: it is reliable and also faster; it allocates a free path for the packets and reduces congestion. Disadvantages: the congestion in the MPLS network still needs to be overcome.
Paper [4] proposed constructing a table for fast packet forwarding. In binary search on prefix (BSP), the construction of the forwarding table consists of two steps: the first is a sorting operation and the second is a stack operation. It also solves the problem of ambiguous lookup caused by duplicate entries. Advantages: much faster; this improves the router performance significantly. Disadvantages: it stores the updated data only in the corresponding subtree.
Paper [5] describes an MPLS network in which the IGP selects the best path and each LSP is created over the best path selected towards the destination network. To decide the best path to specific destination networks, an IGP is used to spread routing data to all routers in an MPLS domain. MPLS has the capability to classify and manage the traffic in the network to offer higher utilization of resources. Advantages: compared to other networks, it provides much better traffic engineering capability; MPLS VPN (Virtual Private Network) provides the advantages that service providers urgently want of their networks, including manageability, reliability, and scalability. Disadvantages: security is the major issue in this technology.
The MPLS-TP (Multiprotocol Label Switching - Transport Profile) ring protection system defined in the standards includes the capability to restore traffic delivery following failure of network resources, to meet applicable carrier-class transport network requirements. Paper [6] proposes a new protection mechanism which combines the advantages of both the steering and wrapping approaches, and which minimizes packet loss significantly in the case of in-order delivery. Advantages: it achieves fast protection switching and consequently less packet loss; it highly reduces delay and improves the efficiency of the network. Disadvantages: the MPLS-TP architecture still needs to mature.
Paper [7] proposes high-speed routing, providing an IP switch architecture as an alternative to a gigabit router. The IP switch architecture provides higher-speed routing than a gigabit router. It uses low-level switching flows and contains a protocol to allow explicit use and management of the cached information through an IP switching network.
Paper [8] includes a connectionless approach for integrated IP and also provides fast ATM (Asynchronous Transfer Mode) switching hardware. In the ATM switch, the IP routing decision is cached as a soft state such that future packets are handled by hardware rather than software. It provides IP with a simple and robust way to use the speed and capacity of ATM.
Paper [9] proposes a mobility framework based on MPLS (Multiprotocol Label Switching). The Optimized Integrated Multi-Protocol Label Switching (Optimized I-MPLS) framework combines MPLS with MIFA (the Mobile IP Fast Authentication protocol). It solves the problem of duplicate resources and reduces the number of dropped packets.

Fig. 1: Delay for TCP/IP and ROACM.
Fig. 2: Throughput for TCP/IP and ROACM.
Figures 1 and 2 show the delay graph in seconds and the throughput graph in kbps for the TCP/IP and ROACM protocols for all scenarios.

4. GAP ANALYSIS
TCP/IP contains four layers: Application, Transport, Internet, and Data Link. TCP/IP is the suite of communication protocols used to interconnect network devices on the internet. OSPF (Open Shortest Path First) is an interior routing protocol used to create the routing table for routers. TCP/IP uses that routing table for forwarding packets.
The ROACM (Route Once and Cross Connect Many) protocol cross-connects IP packets and provides faster data transmission. The performance of the ROACM protocol is much better than that of TCP/IP, but in terms of security the ROACM protocol is less secure than TCP/IP.
To overcome this problem of the ROACM protocol, in this paper we provide security for the ROACM protocol. For the security approach, we apply the AES (Advanced Encryption Standard) algorithm with a 128-bit key size to the ROACM protocol.
The network is created using a Java simulation tool, and all features of the ROACM protocol are introduced into that network. Then the AES algorithm with a 128-bit key size is applied in that network. Finally, we analyze the performance of TCP/IP and the ROACM protocol, for which we take the variable average delay and throughput.

5. PROPOSED SYSTEM
The ROACM protocol provides many features to intelligent routers/switches. In this paper, the new protocol, ROACM, has been included, where the IP packet contains an extra header that allows a dynamic virtual circuit to be created. This header contains all the relevant information to cross-connect the IP packets at the second layer, i.e., the data link layer. In the call set-up stage, the information is attached to the network layer header, and in the data transmission stage the information is stored in the ROACM header (in the frame). Propagation of ROACM information can occur below the IP level in networks that contain routers that all agree on and support ROACM. Since this information is appended to the end of an IP packet, routers that do not employ ROACM are still able to forward packets using the regular routing protocol. ROACM itself maps virtual circuit links (indexes), which are provided from the local interface tables at each router, where each index corresponds to a next-hop interface address. This allows interoperability on a wide range of networks.
The ROACM protocol consists of four major tasks.
1. Call Set Up: At the call set-up stage, a control field value of 01 means it is a call set-up stage in the ROACM protocol. In this stage the initial IP packet is sent to establish the connection.
2. Data Transmission: At the data transmission stage, the control field value changes to 11, meaning it is a data transmission stage.
3. Path Update: It provides the facility of path updates. For example, at a particular time a path is optimal, but not the same path at a
different time, so periodic messages are sent in the network.
4. Recovery Plan: It provides the facility of a recovery plan if any port is malicious. In this phase the source station stops sending packets and re-establishes the connection.

Fig. 3: ROACM Forward Header
As shown in Figure 3, the ROACM protocol provides an extra header for the IP packet; that header includes dynamic and static fields. The control field is a dynamic field, and the index port numbers towards the first hop and second hop are static fields.

A. Architecture:

Fig. 4: Proposed System Architecture
As shown in Figure 4, the contribution of this work is to implement the network in which the ROACM protocol features are introduced by using a Java simulation tool, to provide security to the data using the AES algorithm with a 128-bit key size, and to analyze ROACM and TCP/IP on the basis of the variable average delay and throughput.

B. Algorithms:
The following are the steps to secure the ROACM protocol:
1. Create a network using the Java simulation tool.
2. Construct a packet for the ROACM protocol in the network.
3. Deploy the ROACM protocol packet into the network.
4. Generate a key using the 128-bit key size AES (Advanced Encryption Standard) algorithm.
5. Assign that 128-bit AES key in the network.
6. Verify packet transmission in the network.

Advantages are:
- The performance of the ROACM protocol is generally faster than the TCP/IP protocol.
- No need for high-power processing in the routers.
- In ROACM, there is no need to search the routing table except for the call set-up packet.
- The ROACM protocol transfers a large number of packets in less time.
- The delay ratio is minimal compared to the TCP/IP protocol.
- Throughput is maximized.

6. CONCLUSION
The ROACM protocol cross-connects IP packets by using an index of port numbers and provides an extra header for the IP packet. In this paper we provide security to the ROACM protocol by using the AES 128-bit key size algorithm, introducing all features of the ROACM protocol in a network created in a Java simulator. Hence we can provide security against various malicious attacks on the ROACM protocol.
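The ROACM forward header of Fig. 3 (a small control field for call set-up vs. data transmission, plus static index port numbers towards the first and second hop) can be sketched as a packed structure. The field widths below are assumptions for illustration only, since the paper does not specify exact sizes.

```python
import struct

# Assumed layout: 1 byte for the control field (01 = call set-up,
# 11 = data transmission) followed by two 2-byte next-hop interface
# indexes, in network byte order.
HEADER_FMT = "!BHH"

def pack_roacm_header(control: int, hop1_index: int, hop2_index: int) -> bytes:
    return struct.pack(HEADER_FMT, control, hop1_index, hop2_index)

def unpack_roacm_header(raw: bytes):
    return struct.unpack(HEADER_FMT, raw)

CALL_SETUP, DATA_TX = 0b01, 0b11
hdr = pack_roacm_header(CALL_SETUP, 7, 12)
# The header is appended to the end of the IP packet, so routers that
# do not employ ROACM still forward the packet with their regular
# routing protocol, as described in Section 5.
packet = b"<ip-packet-bytes>" + hdr
assert unpack_roacm_header(packet[-5:]) == (CALL_SETUP, 7, 12)
```

Appending rather than prepending the header is the design choice that gives the claimed interoperability: a legacy router parses only the ordinary IP header, while a ROACM router reads the trailing indexes to cross-connect at the data link layer.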
ABSTRACT
Access control is one of the earliest security issues and remains a constant challenge. Its component determines whether a request to access a resource is granted. Its domain covers the various mechanisms by which a system grants or revokes the right to access data and services. This paper presents a trust-based service management technique using a fuzzy approach. The innovation lies in the use of distributed collaborative filtering to select trust feedback from owners of IoT nodes sharing similar social interests. The system is scalable to large IoT systems in terms of storage and computational costs. This adaptive IoT trust system detects malicious IPs and keywords from the system and files, respectively. This paper also presents how to manage trust protocol parameters dynamically to minimize trust estimation bias and maximize application performance.
Keywords:
Access Control, Fuzzy approach, Authentication, Capability, Adaptive, Internet of Things, Trust.
1. INTRODUCTION

Access control is one of the most important concepts for protecting resources and has been used in a variety of network environments. In this paper, we consider the connected smart objects as the node resource users. Users connect to and disconnect from the IoT system randomly according to requirements, and there may be malicious node users who provide fake information via files or spread offensive data or services. For example, a hotel management system provides various services via a mobile application (identity, check-in/out, table/room availability, air-conditioner handling, and parking). These services are applicable only within a certain networking area (the hotel's private Wi-Fi). If a user disconnects from this network, or the admin (the owner of the system) rejects the user's request, that node's IP will be blocked by the admin. These dynamic and distributed characteristics of IoT systems impose harsh requirements on access control technology. In this paper, we propose an access control model based on attributes and trust to meet the requirements of fine-grained, dynamic, secure access control in the IoT environment.

Motivation:

Nowadays IoT is a popular technology used everywhere for automation services, but the heavy usage of IoT increases security issues. There are various aspects to securing IoT data and activities, but until now trust-based access control has not been applied to secure IoT. This motivated us to implement a module by which IoT devices can be secured using various trust factors and methods.

IoT provides interconnection between uniquely identifiable devices. By integrating several technologies, such as actuators and sensor networks, identification and tracking technology, enhanced communication protocols, and the distributed intelligence of smart objects, IoT enables communication between the
real-time objects present around us. The effectiveness of IoT can be seen in both domestic (e.g. assisted living, e-health, enhanced learning) and business (automation, intelligent transportation) fields. While various issues are related to the implementation of IoT, the security of IoT has a significant impact on the performance of IoT applications. Trust is an important aspect of secure systems: a system can behave in an untrustworthy manner even with security and privacy measures in place. Behavior-based analysis of devices is required to predict device performance over time. Trust management provides behavior-based analysis of entities using their past behavior, reputation in the network, or recommendations. A trustworthy system is needed to prevent unwanted activities conducted by malicious devices. Our research work is to design a dynamic trust management system for IoT devices.

Machine-to-machine communication enables direct communication between devices over a wireless channel. More recently, machine-to-machine communication has evolved into a system of networks that transmits data to individual files or services. The expansion of IP networks has made machine-to-machine communication faster and easier while consuming less power. File sharing and access control is a serious issue in networks if there is no trust factor between sender and receiver. Any kind of file can be restricted by access controls, but such files still cannot be trusted, as they may contain suspicious or malicious data. Applications in the network cannot be trusted to execute, as they can harm machines. In this scenario there is a strong need for a trust-maintaining mechanism, as well as trust-defining factors combined with rules.

This paper is organized as follows: Section II presents the scope of the concept. Section III presents background and related work. Section IV presents the existing system. Section V presents the proposed system approach. Finally, Section VI summarizes the research and discusses future work.

2. SCOPE

Our eventual goal is to develop an authoritative family of foundational models for attribute-based access control. We believe this goal is best pursued by means of incremental steps that advance our understanding. ABAC (attribute-based access control) is a rich platform; addressing it in its full scope from the beginning is infeasible, as there are simply too many moving parts. A reasonable first step is to develop a formal ABAC model, which we call ABACα, that is just sufficiently expressive to capture DAC (discretionary access control), MAC (mandatory access control) and RBAC (role-based access control). This gives us a well-defined scope while ensuring that the resulting model has practical relevance. There have been informal demonstrations of the classical models using attributes; our goal is to develop more complete and formal constructions.

Standard Permission Types:
1. Read:
View the file names and subfolder names.
Navigate to subfolders.
Open files.
Copy and view data in the folder's files.
2. Write:
Create folders.
Add new files.
Delete files.
3. Operate Devices in IoT:
Request a service from an IoT node/device.
Perform actions.

Mathematical Model

Let S be the closed system defined as
S = {Ip, Op, A, Ss, Su, Fi}
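The permission types and trust factors described above can be combined into a toy access-decision sketch. The factor weights, the 0.5 threshold, and the function names below are illustrative assumptions, not the paper's actual model:

```python
# Toy sketch of a trust-based access decision. Weights, threshold, and names
# are assumptions for illustration only.

PERMISSIONS = {"read", "write", "operate"}  # the three standard permission types

def trust_score(past_behavior: float, reputation: float, recommendation: float) -> float:
    """Combine the three trust factors named in the text into a score in [0, 1]."""
    weights = (0.5, 0.3, 0.2)  # assumed relative importance of the factors
    factors = (past_behavior, reputation, recommendation)
    return sum(w * f for w, f in zip(weights, factors))

def grant(requested: str, node_trust: float, threshold: float = 0.5) -> bool:
    """Grant only known permission types, and only to sufficiently trusted nodes."""
    return requested in PERMISSIONS and node_trust >= threshold
```

A node with good past behavior, reputation, and recommendations is granted a read request, while an unknown permission type or a low-trust node is refused.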
6. CONCLUSION

Trust-based access control applies to file transfers in networks, along with application access communication, especially where there are no rules for mutual trust and no knowledge base or recorded past experience. This project presents a novel approach to maintaining trust between machines in a network that share files under various access controls: trust is calculated from predefined or gained knowledge, combined with experience and access rules, to decide, and warn the user, whether files from a sender machine should be trusted. It also decides whether a service request from a device should be performed, based on a third-party centralized trust calculation. This approach can avoid much of the harm caused by blindly trusting any file or request, and the authenticity of files is preserved by access rights. In future work we would like to extend the system with multiple layers and more rules for trust calculation, so that if the centralized server fails there is a backup option, and so that trust becomes more precise as it passes through multiple layers.

7. ACKNOWLEDGMENT

I would like to express my appreciation to all those who made it possible to complete this report. Special gratitude goes to my seminar guide, Prof. V. V. Kimbahune, whose stimulating suggestions and encouragement helped me present this seminar. I also appreciate the guidance of the other supervisors and of the panels, especially during my seminar presentation; their comments and advice improved my presentation skills.
for a particular time, after which the system will close his access.

5. Additional Features

Some researchers are worried about the situation in which the patient is not able to handle his own account and hence cannot provide the hash key to the doctor, e.g. during an operation or after a critical injury. Our system solves this problem by accepting emergency contact information for certain close people whom the patient can trust with his information. They will then provide the doctor with the hash key whenever the patient is in a critical condition.

7. CONCLUSION

The examples described show that Blockchain offers numerous opportunities for use in the healthcare sector, e.g. in public health management, user-oriented medical research based on personal patient data, and combating drug counterfeiting [11]. The immense potential of this technology shows up wherever, until now, a trusted third party was necessary for the settlement of market services. With Blockchain, direct transactions become possible, and a central actor who controlled the data, earned commission, or even intervened in a censoring fashion can be eliminated.

8. FUTURE SCOPE

The scope of our application can in future be extended to sectors like insurance, where hospitals can check whether a patient is covered by a particular policy and thereby greatly improve risk management. It can be used to record a patient's gestures and on-site data and secure their integrity. Also, the application may be extended to add a second factor of authentication by tracking a patient's internal movements, predicting the type of disease affecting him, and sending the data to the doctor on the blockchain network.

REFERENCES

[1] Steward, "Electronic Medical Records," Journal of Legal Medicine, vol. 26, 2005, pp. 491–506.
[2] R. Haux, "Health Information Systems—Past, Present, Future," Int'l Journal of Medical Informatics, vol. 75, no. 3–4, 2006, pp. 268–281.
[3] K. Häyrinen et al., "Definition, Structure, Content, Use and Impacts of Electronic Health Records: A Review of the Research Literature," Int'l Journal of Medical Informatics, vol. 77, no. 5, 2008, pp. 291–304.
[4] M. Ciampi et al., "A Federated Interoperability Architecture for Health Information Systems," Int'l Journal of Internet Protocol Technology, vol. 7, no. 4, 2013, pp. 189–202.
[5] M. Moharra et al., "Implementation of a Cross-Border Health Service: Physician and Pharmacists' Opinions from the epSOS Project," Family Practice, vol. 32, no. 5, 2015.
[6] P. B. Nichol (2016, March), Blockchain applications for healthcare. [Online]. Available: http://www.cio.com/article/3042603/innovation/blockchain-applications-for-healthcare.html
[7] G. Prisco (2016, April), The Blockchain for Healthcare: Gem Launches Gem Health Network With Philips Blockchain Lab. [Online]. Available: https://bitcoinmagazine.com/articles/the-blockchain-for-heathcare-gemlaunches-gem-health-network-with-philips-blockchain-lab-1461674938
[8] P. Taylor (2016, April), Applying blockchain technology to medicine traceability. [Online]. Available: https://www.securingindustry.com/pharmaceuticals/applying-blockchain-technology-to-medicinetraceability/s40/a2766/#.V5mxL_mLTIV
Description: The main reasons lie in spreading viruses by sharing other players' accounts via illegal plugins, Trojans, exploiting security vulnerabilities in the game, or other virtual property illegally acquired from players. In this study, we propose a new way to scan for and resist bots. The app we developed can scan, detect, and filter out the bots that need to be shut down, so as to remove online game bots effectively. With the growing popularity of online games, the potential risks are also increasing. Attacks take various forms to defraud players in order to obtain the players' virtual property or personal data. Currently, online game bot detection is performed in many ways. However, related studies are still unable to provide a truly effective and comprehensive prevention mechanism, especially given that criminal behavior in online games is difficult to curb effectively.

Minecraft servers to control CPU load; we primarily explored manually setting the CPU affinity of the Minecraft server thread to run on specific virtual cores.

3. PROPOSED SYSTEM:

We propose a multimodal framework for detecting game bots in order to reduce the damage to online game service providers and legitimate users. We observed the behavioral characteristics of game bots and found several unique and discriminative characteristics: game bots execute repetitive tasks associated with earning unfair profits, while legitimate players do not.

Advantages:
A bot will kill monsters, loot money, mine, or gain levels automatically without the player having to be in front of the computer.
A bot is a player who runs a third-party program to control their character.
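The observation that bots repeat the same profit-earning actions while legitimate players vary theirs suggests a simple behavioral check: low Shannon entropy of the action sequence. The sketch below is illustrative only; the 1.0-bit threshold is an assumed value, not part of the framework described in the text:

```python
import math
from collections import Counter

# Illustrative behavioral check: repetitive (bot-like) action sequences have
# low Shannon entropy. The threshold is an assumption for illustration.

def action_entropy(actions) -> float:
    """Shannon entropy (in bits) of the distribution of action types."""
    counts = Counter(actions)
    n = len(actions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_bot(actions, threshold: float = 1.0) -> bool:
    return action_entropy(actions) < threshold
```

A character that only grinds the same action is flagged, while a varied human-like session is not; a real detector would combine several such behavioral features.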
identification number (PIN), i.e., what you know.

2. MOTIVATION

Security is the main concern for all users and organizations, and securing sensitive data becomes more critical on the internet. Every day, new types of attacks are introduced in cyberspace to break authentication. We need a proper authentication method to secure critical business infrastructure, avoid the loss of a user's intellectual property, and secure sensitive information on the internet.

3. STATE OF THE ART

Authentication is the use of one or more mechanisms to prove that you are who you claim to be. Once the identity of the human or machine is validated, access is granted. Authentication is generally required to access secure data or enter a protected area. The requester for access or entry must authenticate by proving his identity using:
- What the requestor individually knows as a secret, such as a password or a Personal Identification Number (PIN), or
- What the requesting owner uniquely has, such as a passport, physical token, or an ID card, or
- What the requesting bearer individually is, such as biometric data, like a fingerprint or face geometry.

4. GAP ANALYSIS

1. Factors can get lost
There is no certainty that your authentication factors will be available when you need them; typically, you are locked out of your account after one mistake. If you lose power or your phone is damaged by water, you won't be able to get your SMS codes as the second authentication factor. Relying on a USB key as a second factor is also risky: it can easily be misplaced or accidentally run through the laundry. If you rely on factors like PINs, there is always the chance that you forget them, and biometric factors like eyes and fingers can be lost in accidents.

2. False security
Two-factor authentication provides a level of security, but it is typically exaggerated. For example, if you were locked out of a service because you lost a factor, you are basically in the same predicament as a hacker attempting to gain access to your account. If you can reset your account without an access factor, then a hacker can, too. Recovery options typically contradict the point of two-factor authentication, which is why companies like Apple have done away with them. However, without recovery options, your account may be lost forever.

3. It can be turned against users
While two-factor authentication is intended to keep hackers out of your account, the opposite can happen: hackers can set up or reconfigure two-factor authentication to keep you out of your own accounts. Two-factor authentication may not be effective enough to secure your accounts, but it can also be too effective if you are not careful. As services improve their two-factor practices and make account recovery more difficult, it is prudent to set up the authentication on your important accounts before a hacker does.

5. PROPOSED WORK

In the run-up to the 2016 U.S. presidential elections, Democratic candidate Hillary Clinton received a serious blow from a series of leaks coming from the email account of her campaign chairman, John Podesta. Hackers were able to access the contents of Podesta's account by staging a successful phishing attack and stealing his credentials. Podesta is one of the millions of people whose passwords get stolen as a result of social engineering attacks or data breaches.
CLOUD COMPUTING
compared to the stored copy, and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency depends on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.

Load Balancing Concept:

Cloud load balancing is the process of distributing workloads across multiple computing resources. Cloud load balancing reduces costs associated with document management systems and maximizes the availability of resources. It is a type of load balancing and is not to be confused with Domain Name System (DNS) load balancing. While DNS load balancing uses software or hardware to perform the function, cloud load balancing uses services offered by various computer network companies.
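The chunk-replacement idea above can be sketched in a few lines of Python. Fixed-size 8-byte chunks and SHA-256 fingerprints are illustrative assumptions; real systems use much larger or content-defined chunks:

```python
import hashlib

# Sketch of chunk-level deduplication: each unique chunk is stored once, and
# every occurrence is replaced by a small reference (its fingerprint).

CHUNK_SIZE = 8  # bytes; assumed small value for illustration

def deduplicate(data: bytes):
    """Return (chunk store, reference list) for the byte stream."""
    store, refs = {}, []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()  # fingerprint used for matching
        store.setdefault(fp, chunk)             # keep only the first copy
        refs.append(fp)                         # reference replaces the chunk
    return store, refs

def restore(store: dict, refs: list) -> bytes:
    """Rebuild the original stream from the references."""
    return b"".join(store[fp] for fp in refs)
```

For a stream containing the same 8-byte pattern three times, the store holds that chunk once while the reference list records all three occurrences, which is exactly the space saving described above.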
2. LITERATURE SURVEY

Table: Literature Survey

Sr. No. | Paper Name | Concept | Author and Year
1 | A Study on De-duplication Techniques over Encrypted Data | Introduces de-duplication techniques, securing data by encryption, and some challenges related to it. | Akhila K., Amal Ganesh, Sunitha C., 2016, Elsevier.
3. GAP ANALYSIS

Table: Gap Analysis

Criterion | Existing System | Proposed System
Sharing | Less secured | More secured
De-duplication | File name is checked | Content is checked
Security | Moderate | High
Efficiency | Medium | High
Time Consumption | Low | Moderate

4. SCHEME

Our scheme contains the following main aspects:

Encrypted Data Upload. In this process, the hash code of the data is generated first. The hash code is then matched against the database; if it exists, the user is simply linked to the respective data file. If not, the data is encrypted using AES, divided into chunks, and uploaded onto the respective data server.

Data De-duplication. This is a sub-part of the upload process: the hash code of the given file is generated and checked against the database. In this way, data that is already present is not uploaded again; instead, the user is only linked to the respective file.

Data Deletion. When the user wants to delete data from his respective cloud portion (account), the system, rather than deleting the file from the cloud, simply delinks the user from the given data. Since the file may have been uploaded by many users but is stored in the cloud only once, deleting the file itself would lead to an ambiguous situation.

Data Owner Management. The real data owner (the user) can use the cloud to store and retrieve data, and can use functionality such as upload, delete, and share.

Encrypted Data Update. If the user updates existing data, i.e. the data is changed, the system treats it as a new file, and the upload process for this new or updated file repeats.

5. PROPOSED WORK

We propose a scheme to de-duplicate data by applying techniques such as hashing and encryption. We also try to reduce the load on the data server by using a de-dup server, which reduces the response time and increases the processing speed. This is applicable in scenarios where data holders are not available for de-duplication control.

5.1 Procedures

5.1.1 Data upload
Step 1: For the file to be uploaded, its hash code (H1) is generated using SHA-1.
Step 2: The hash code is then matched against the metadata in the database, M(H).
Step 3: If M(H) == H1, only link the data, and then stop.
Step 4: If M(H) != H1, i.e. the data does not exist on the file server, encrypt the given data using AES.
Step 5: Split the encrypted data and then upload the file.

5.1.2 Data download
When the data has to be downloaded, after the request has been sent by the user to the server, the following happens:
Step 1: The de-dup server traces the chunks of the file to wherever they were uploaded, using the metadata stored in the database.
Step 2: The data chunks are randomly downloaded and combined to reconstruct the original data.
Step 3: The given file is decrypted using AES.
Step 4: The file is downloaded.

6. SYSTEM ARCHITECTURE

Our system proposes the given architecture, in which there are two main categories of system use: existing user (login) and new user (registration). Every user has a username, a password, and a unique private key, and can use functionalities such as data upload, download, sharing, and delete functionalities along with the main
for years. From the perspective of cloud storage security, many data integrity checking schemes have been proposed. From the perspective of cloud storage efficiency, the client-side deduplication technique has been adopted to save disk space and network bandwidth. More specifically, the cloud server may keep only one or a few copies of duplicated files, regardless of how many data owners want to store that file. If the cloud server already stores a copy of the file, then owners do not need to upload it again to the cloud; thus bandwidth as well as storage can be saved. However, client-side deduplication may cause new security problems: malicious owners who do not have the file may obtain the exact same file by cheating the cloud server. For secure client-side deduplication, the notion of Proof of Ownership (PoW) has been introduced [4], which lets an owner efficiently prove to the cloud server that the owner indeed holds the whole file.

To achieve both data integrity auditing and storage deduplication within the same framework, researchers have tried to combine an existing integrity checking scheme with a PoW scheme. Zheng et al. proposed a scheme named POSD [5] and Yuan et al. proposed a scheme named PCAD [6]. However, these schemes are no longer applicable to cloud storage systems, for the following reasons:
1) Zheng's POSD scheme has been proved insecure, and the storage overhead of its tags is linear in the number of owners.
2) Yuan's PCAD suffers a high communication cost on the owner side during deduplication, linear in the number of challenged blocks.
3) Both Zheng's POSD and Yuan's PCAD schemes cannot support encrypted data, while data confidentiality is the basic security requirement for storing data in an untrusted cloud.

To achieve deduplication of encrypted data, convergent encryption has been proposed [7]. It uses the hash of the file as the encryption key, so the same file will result in the same ciphertext. This technique is useful, but the encryption key has nothing to do with the client's will. Moreover, using the hash of a file as the encryption key is not secure [7]. Therefore, an efficient solution is still needed that supports data integrity auditing together with storage deduplication for encrypted data in cloud storage. To solve this open problem, the following major challenges exist:
1) Client-side deduplication of encrypted data. In real-world scenarios, owners may encrypt their data with their own keys; thus, identical data copies of different owners will lead to different ciphertexts. When a new owner wants to become an owner of the encrypted file, he needs to prove to the cloud server that he indeed holds the whole file. Since the data stored in the cloud may be encrypted by another owner, the new owner does not possess the encryption key, which makes client-side deduplication of encrypted data more challenging.
2) Deduplication of data tags. Lacking mutual trust, the owners need to separately store their own data tags in the cloud. Due to the large number of owners, the storage overhead of tags may be very large, which contradicts the objective of deduplication, namely saving storage.
3) Public auditing for de-duplicated and encrypted data. Any owner can delegate the data integrity auditing task to the auditor. In our scheme, the cloud server stores only one copy of the encrypted data and the product of the de-duplicated data tags of all owners. In such a case, the challenge is how to guarantee that the integrity of the de-duplicated data can still be correctly checked.

In this paper, we address the above challenges and propose an efficient public auditing scheme for encrypted data with client-side deduplication. Our contributions can be summarized as follows:
line archives. Existing cryptographic techniques let users ensure the privacy and integrity of the files they retrieve. It is also natural, however, for users to want to verify that archives do not delete or modify files before retrieval. The goal of a POR (proof of retrievability) is to accomplish these checks without users having to transfer the files themselves. A POR may also give quality-of-service guarantees, i.e., show that a file is retrievable within an explicit time bound.

The proposed scheme improves compression effectiveness by 11 to 105 percent compared to traditional compressors. The deduplication process is used to eliminate duplicates in data, thus improving the effective capacity of storage systems. Single-node raw capacity is still mostly limited to tens or a few hundreds of terabytes, forcing users to resort to complex. In [11], the authors proposed new mechanisms, called progressive sampled indexing and grouped mark-and-sweep, to address dedupe challenges and to improve single-node scalability; progressive sampled indexing removes scalability limitations through its indexing technique. The advantages of the proposed scheme are improved scalability, good deduplication efficiency, and improved throughput. H. L. Goh, K. K. Tan, S. Huang, and C. W. de Silva [19] proposed a three-fold approach: first, they discuss sanitization requirements in the context of de-duplicated storage; second, they implement a memory-efficient technique for managing data based on perfect hashing; third, they design sanitizing de-duplicated storage for the EMC Data Domain. The approach minimizes memory and I/O requirements, although perfect hashing requires a static fingerprint space, which conflicts with the scheme's desire to support host writes during sanitization.

Data de-duplication has recently gained importance in most secondary storage, and even in some primary storage, and the read performance of the deduplication storage has been gaining great significance.

3. IMPLEMENTATION DETAILS

System Overview

Figure 1 shows the proposed system architecture. For integrity auditing and secure deduplication, our scheme uses the BLS signature-based Homomorphic Linear Authenticator (HLA), proposed in. We also introduce a TPA to support public integrity auditing. The proposed scheme consists of the following.

Client (or user). The client outsources data to cloud storage. CE-encrypted data is first generated and then uploaded to the cloud storage to preserve confidentiality. The client also needs to verify the integrity of the outsourced data; for this, the client delegates integrity auditing to the TPA.

Cloud Storage Server (CSS). The CSS provides data storage services to users. The deduplication technique is applied to save storage space and cost. We assume that the CSS may act maliciously because of attacks, software/hardware malfunctions, intentional saving of computing resources, etc. During the deduplication process, the CSS applies the PoW protocol to verify that the client owns the file. Moreover, in the integrity audit process, it must generate and respond with a proof corresponding to the TPA's request.

TPA (Third-Party Auditor). The TPA performs auditing on behalf of the client to decrease the client's processing cost. Instead of the client, the auditor sends a challenge to the storage server to periodically perform the integrity audit protocol. The TPA is assumed to follow a semi-trusted, that is, honest, model.
algorithm that is run by the client in order to tag a file. It takes as input a secret key sk and a file ∈ [B]^n, and outputs a vector of tags t and state information st.

3. τ := Auth_pk( , , ) is a deterministic algorithm that is run by the server to generate a tag. It takes as input a public key pk, a file ∈ [B]^n, a tag vector, and a challenge vector ∈ Z_p^n; it outputs a tag τ.

4. b := Vrfy_pk(st, µ, , τ) is a deterministic algorithm that is used to verify a tag. It takes as input a public key pk, state information st, an element µ ∈ N, a challenge vector ∈ Z_p^n, and a tag τ. It outputs a bit, where '1' indicates acceptance and '0' indicates rejection. For correctness, we require that for all k ∈ N, all (pk, sk)

Paper | TPA | Data Encryption | De-duplication Check | Data Dynamics | Regenerating Codes
[1] | ✔ | ✖ | ✖ | ✖ | ✖
[12] | ✔ | ✔ | ✔ | ✖ | ✖
[18] | ✔ | ✔ | ✔ | ✖ | ✖
[2] | ✔ | ✔ | ✖ | ✖ | ✖
[15] | ✔ | ✔ | ✖ | ✔ | ✖
[6] | ✔ | ✔ | ✖ | ✔ | ✖
[19] | ✔ | ✔ | ✔ | ✖ | ✖
Proposed System | ✔ | ✔ | ✔ | ✔ | ✔
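The tag/verify interface above can be illustrated with a deliberately simplified, non-homomorphic MAC-based stand-in. A real HLA scheme is public-key based and homomorphic; this sketch only mirrors the Tag/Vrfy interface and its 1/0 accept/reject output:

```python
import hmac
import hashlib

# Simplified, NON-homomorphic stand-in for the tag/verify pattern: the client
# tags each file block under a secret key, and Vrfy outputs 1 (accept) or
# 0 (reject). Illustrative only; not the HLA construction itself.

def tag_blocks(sk: bytes, blocks):
    """Tag every block of the file under the secret key sk."""
    return [hmac.new(sk, block, hashlib.sha256).digest() for block in blocks]

def vrfy(sk: bytes, block: bytes, tag: bytes) -> int:
    """Return 1 if the tag matches the block under sk, else 0."""
    expected = hmac.new(sk, block, hashlib.sha256).digest()
    return 1 if hmac.compare_digest(expected, tag) else 0
```

Any modification of a tagged block causes verification to output 0, which is the property the auditing protocol relies on.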
5. CONCLUSION
When storing data on remote cloud storage, users want to be assured that their outsourced data are maintained accurately in the remote storage without being corrupted. In addition, cloud servers want to use their storage more efficiently. To satisfy both requirements, this system proposes a scheme to achieve both secure deduplication and integrity auditing in a cloud environment. To prevent leakage of important information about user data, the proposed scheme supports client-side deduplication of encrypted data, while simultaneously supporting public auditing of encrypted data. The proposed system also supports high data availability through the use of erasure codes.

6. ACKNOWLEDGMENT
The authors would like to thank the researchers and publishers for making their resources available, and our teachers for their guidance. We are thankful to the authorities of Savitribai Phule Pune University and the concerned members of the ICINC 2019 conference, organized by Smt. Kashibai Navale College of Engineering, Pune, for their constant guidance and support. We are also thankful to the reviewers for their valuable suggestions. We also thank the college authorities for providing the required infrastructure and support. Finally, we would like to extend heartfelt gratitude to our friends and family members.
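Client-side deduplication of encrypted data, as mentioned in the conclusion, is commonly built on convergent encryption, where the key is derived from the file content so that identical plaintexts yield identical ciphertexts. The sketch below is a generic illustration (toy XOR stream cipher, hypothetical names), not necessarily the exact scheme proposed here:

```python
import hashlib

def keystream(key, n):
    # Expand the key into n pseudo-random bytes (toy hash-counter stream).
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def convergent_encrypt(plaintext):
    key = hashlib.sha256(plaintext).digest()          # key derived from content
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))
    fingerprint = hashlib.sha256(ct).hexdigest()      # dedup handle on ciphertext
    return key, ct, fingerprint

# Two users encrypting the same file independently produce identical
# ciphertexts, so the server can deduplicate without seeing the plaintext.
k1, c1, f1 = convergent_encrypt(b"same document")
k2, c2, f2 = convergent_encrypt(b"same document")
assert c1 == c2 and f1 == f2
# Decryption: XOR with the same keystream.
assert bytes(c ^ k for c, k in zip(c1, keystream(k1, len(c1)))) == b"same document"
```

A production system would use AES rather than this toy stream, and would add protections against the known dictionary attacks on plain convergent encryption.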
REFERENCES
[1] Ateniese, R. Burns, R. Curtmola, J. Herring, L.
Kissner, Z. Peterson, and D. Song, ―Provable
the information proprietor, and how to check the authenticity of a client who intends to access the information, are still of great concern.
This paper is divided as follows: Section I, Introduction; Section II, Literature Survey, related to the NTRU algorithm; Section III, Motivation and related work; Section IV, Proposed System; Section V, Implementation; Section VI, the novel methodology to store big data in the cloud.

2. LITERATURE SURVEY
The main focus of the literature survey is to study and contrast the existing models for secure access to big data in the cloud. This section highlights the most succinct research contributions on the various access control and encryption techniques.

Surajkumar Singh, Niraj Chaudhary, Sreenu M, Manjunath B M, "Secure Accessibility for Big Data in Cloud", International Journal of Innovative Research in Science, Engineering and Technology, Volume 7, Special Issue 6, May 2018. This work improves NTRU and then presents a secure and verifiable access control scheme based on the improved NTRU to protect outsourced big data stored in a cloud. The scheme allows the data owner to dynamically update the data access policy, and the cloud server to update the corresponding outsourced ciphertext, to enable efficient access control over the big data in the cloud. The security of the proposed scheme is guaranteed by those of the NTRU cryptosystem and (t,n)-threshold secret sharing. The authors rigorously analysed the correctness, security strength, and computational complexity of the proposed scheme.

Kai Fan, Junxiong Wang, Xin Wang, Hui Li and Yintang Yang, "A Secure and Verifiable Outsourced Access Control Scheme in Fog-Cloud Computing", Sensors 2017, 17, 1695; doi:10.3390/s17071695. With the rapid development of big data and the Internet of Things (IoT), the number of networked devices and the data volume are increasing dramatically. Fog computing, which extends cloud computing to the edge of the network, can effectively solve the bottleneck problems of data transmission and data storage. However, security and privacy challenges also arise in the fog-cloud computing environment. Ciphertext-policy attribute-based encryption (CP-ABE) can be adopted to realize data access control in fog-cloud computing systems. The authors propose a verifiable outsourced multi-authority access control scheme, named VO-MAACS. In their construction, most encryption and decryption computations are outsourced to fog devices, and the computation results can be verified using their verification method. Meanwhile, to address the revocation issue, they design an efficient user and attribute revocation method. Finally, analysis and simulation results show that the scheme is both secure and highly efficient.

Roslin Dayana K., Vigilson Prem M., "Review of the Various Optimized Access Control Techniques for Big Data in Cloud Environment", International Journal of Computer Applications (0975-8887), Volume 179, No. 11, January 2018. Cloud computing is an information technology (IT) domain that enables efficient access to shared and private collections of configurable system resources. It provides higher-level services that can be quickly provisioned with a minimum amount of management effort, mostly over the Internet. Due to the high complexity and huge volume, outsourcing ciphertexts to a cloud is deemed one of the most effective approaches for big data storage and access. Verifying the access legitimacy of a user, and securely updating a ciphertext in the cloud based on a new access policy designated by the data owner, are two critical challenges. The access policy
AA has incorrectly or maliciously verified a user and has granted illegitimate attribute sets [9].
B. The attribute authorities (AAs): AAs are responsible for performing user legitimacy verification and generating intermediate keys for legitimacy-verified users. Unlike most existing multi-authority schemes, where each AA manages a disjoint attribute set, the proposed scheme involves multiple authorities that share the responsibility of user legitimacy verification, and each AA can perform this process for any user independently. When an AA is selected, it verifies the user's legitimate attributes by manual labour or authentication protocols, and generates an intermediate key associated with the attributes it has legitimacy-verified. The intermediate key is a new concept that assists the CA in generating keys [10].
C. The data owner (Owner): The data owner defines the access policy about who can access each file, and encrypts the file under the defined policy. First, each owner encrypts his/her data with a symmetric encryption algorithm. Then, the owner formulates an access policy over an attribute set and encrypts the symmetric key under the policy, according to public keys obtained from the CA. After that, the owner sends the whole encrypted data and the encrypted symmetric key (denoted as ciphertext CT) to the cloud server to be stored in the cloud.
D. The data consumer (User): Each user is assigned a global user identity Uid by the CA. The user possesses a set of attributes and is equipped with a secret key associated with his/her attribute set. The user can freely fetch any encrypted data of interest from the cloud server. However, the user can decrypt the encrypted data if and only if his/her attribute set satisfies the access policy embedded in the encrypted data [11].
E. The cloud server: The cloud server provides a public platform for owners to store and share their encrypted data. The cloud server does not conduct data access control for owners; the encrypted data stored in the cloud server can be downloaded freely by any user.

6. METHODOLOGY
The NTRU cryptosystem is based on the shortest vector problem (SVP) in a lattice, which makes it very fast and resistant to quantum computing attacks. It has been proved to be faster than RSA. NTRU implements the following three basic functions [12]:
1. Key Generation: to create the user's public and private keys.
2. Encryption: to send a message, we first encrypt it.
3. Decryption: the encrypted message is decrypted using the private key.

7. CONCLUSION AND FUTURE SCOPE
In this system, we first propose an improved NTRU cryptosystem to overcome the decryption failures of the original NTRU, and then present a secure and verifiable access control scheme based on the improved NTRU to protect outsourced big data stored in a cloud. Our scheme allows the data owner to dynamically update the data access policy, and the cloud server to update the corresponding outsourced ciphertext, to enable efficient access control over the big data in the cloud. It also provides a verification process for a user to validate both its legitimacy of accessing the data, to the data owner and the t-1 other legitimate users, and the correctness of the information provided by the t-1 other users for plaintext recovery.

8. ACKNOWLEDGMENT
We express our sincere thanks to our project guide Prof. Lagad J. U., who was always present with constant, constructive criticism in the making of this paper. We would also like to thank all the staff of the Computer Department for their valuable guidance, suggestions and support
throughout the project work, and who co-operated on the project with personal attention. Above all, we express our deepest gratitude to all of them for their kind-hearted support, which helped us a lot during the project work. Finally, we are thankful to our friends and colleagues for the inspirational help provided to us throughout the project work.

REFERENCES
[1] Surajkumar Singh, Niraj Chaudhary, Sreenu M, Manjunath B M, "Secure Accessibility for Big Data in Cloud", International Journal of Innovative Research in Science, Engineering and Technology, Volume 7, Special Issue 6, May 2018.
[2] Kai Fan, Junxiong Wang, Xin Wang, Hui Li and Yintang Yang, "A Secure and Verifiable Outsourced Access Control Scheme in Fog-Cloud Computing", Sensors 2017, 17, 1695; doi:10.3390/s17071695.
[3] Roslin Dayana K., Vigilson Prem M., "Review of the Various Optimized Access Control Techniques for Big Data in Cloud Environment", International Journal of Computer Applications (0975-8887), Volume 179, No. 11, January 2018.
[4] Dr. S. Prayla Shyry, Dhrupad Kumar Das, "A Secure and Verifiable Access Control Scheme for Big Data Storage in Clouds", International Journal of Pure and Applied Mathematics, Volume 119, No. 12, 2018, pp. 14147-14153.
[5] Chunqiang Hu, Wei Li, Xiuzhen Cheng, Jiguo Yu, Shenling Wang, Rongfang Bie, "A Secure and Verifiable Access Control Scheme for Big Data Storage in Cloud", IEEE Transactions on Big Data, Vol. PP, Issue 99, Feb 2017.
[6] Zheng Yan, Xueyun Li, Mingjun Wang, Athanasios V. Vasilakos, "Flexible Data Access Control Based on Trust and Reputation in Cloud Computing", IEEE Transactions on Cloud Computing, Vol. 5, Issue 3, July-Sept. 2017.
[7] E. Goh, H. Shacham, N. Modadugu, D. Boneh, "Sirius: Securing untrusted storage", Proc. of NDSS, 2003, pp. 131-145.
[8] L. Zhou, V. Varadharajan, M. Hitchens, "Achieving secure role-based access control on encrypted data in cloud storage", IEEE Trans. on Information Forensics and Security, vol. 8, no. 12, pp. 1947-1960, 2013.
[9] S. Yu, C. Wang, K. Ren, W. Lou, "Achieving secure, scalable, and fine-grained data access control in cloud computing", Proc. of IEEE INFOCOM, 2010, pp. 534-542.
[10] G. Wang, Q. Liu, J. Wu, M. Guo, "Hierarchical attribute-based encryption and scalable user revocation for sharing data in cloud servers", Computers & Security, vol. 30, no. 5, pp. 320-331, 2011.
[11] A. Lewko and B. Waters, "Decentralizing attribute-based encryption", Advances in Cryptology - EUROCRYPT 2011, pp. 568-588, 2011.
[12] C. Hu, X. Cheng, Z. Tian, J. Yu, K. Akkaya, and L. Sun, "An attribute based signcryption scheme to secure attribute-defined multicast communications", SecureComm 2015, Springer, 2015, pp. 418-435.
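The (t,n)-threshold secret sharing that the NTRU-based scheme above relies on can be illustrated with Shamir's classic construction. This is a generic sketch over a small prime field, not the paper's exact instantiation:

```python
import random

P = 2**61 - 1  # a Mersenne prime field for the toy example

def make_shares(secret, t, n):
    # Random polynomial of degree t-1 with constant term = secret.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

secret = 123456789
shares = make_shares(secret, t=3, n=5)
assert reconstruct(shares[:3]) == secret   # any t shares suffice
assert reconstruct(shares[2:5]) == secret
```

Any t of the n shares recover the secret, while fewer than t reveal essentially nothing; this is what lets t-1 other legitimate users assist in plaintext recovery in the surveyed scheme.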
but fast network connections are expensive or impractical in many remote locations, so better compression is needed to make WAN replication practical. The authors present a new technique for replicating backup data sets across a WAN that not only removes duplicate file regions (deduplication) but also compresses similar file regions with delta compression, which is available as a feature of EMC Data Domain systems [10].

OPEN ISSUES:
Existing solutions for deduplication suffer from many attacks. They cannot conveniently support data access control and revocation at the same time. Most existing solutions cannot ensure reliability, security and privacy with sound performance. First, data holders may not always be online or available for management, which can cause storage delays. Second, deduplication can become too complicated, in terms of communication and computation, to involve the data holder in the deduplication process. Third, it may intrude on the privacy of the data holder in the process of discovering duplicated data. Fourth, a data holder may have no idea how to issue data access rights or deduplication keys to users in situations where it does not know the other data holders due to data super-distribution. Therefore, the CSP cannot cooperate with data holders on data storage deduplication in many situations.

3. PROPOSED SYSTEM:
In this paper, the authors propose a scheme that addresses the challenges of data ownership and cryptography to manage the storage of encrypted data with deduplication. The goal is to solve the problem of deduplication in situations where the data owner is not available or is difficult to involve. Meanwhile, the data size does not affect the performance of data deduplication in the scheme. The authors are motivated to save space in the cloud and to preserve the privacy of data owners by proposing a scheme to manage the storage of encrypted data with deduplication. They test the security and evaluate the performance of the proposed scheme through analysis and simulation. The results show its efficiency, effectiveness and applicability.

Objectives:
To improve integrity.
To increase storage utilization.
To remove duplicate copies of data and improve reliability.
To improve security.

4. System Architecture:

Fig. System Architecture

CSP: The CSP provides data storage services to data owners. It cannot be completely trusted: it may be curious about the content of the stored data, but it is assumed to preserve the data honestly for profit.
Data Holder: A data holder can upload and save his/her data and files in the CSP. In this system, a number of data holders can store their files in encrypted form in the CSP. The data holder that produces or creates a file is considered the owner of the data, and normally the owner has the highest priority.
1) Secure indexes are a natural extension of the problem of constructing data structures with privacy guarantees, such as those provided by oblivious and history-independent data structures. This work develops an efficient IND-CKA secure index construction called Z-IDX using pseudo-random functions and Bloom filters, and shows how to use Z-IDX to implement searches on encrypted data. This search scheme is among the most efficient encrypted-data search schemes currently known [2].

3) Survey the ways in which Bloom filters have been used and modified for a variety of network problems, with the aim of providing a unified mathematical and practical framework for them and stimulating their use in future applications [3].

2) Define SSE in the multi-user setting, and present an efficient construction that achieves better performance than simply using access control mechanisms [4].

6) Public-key encryption algorithm for encrypting the data, invoking ranked keyword search over the encrypted data to retrieve files from the cloud. This aims to achieve an efficient system for data encryption without sacrificing data privacy. Further, ranked keyword search greatly improves system usability by enabling ranking based on relevance scores for search results, sending the top most relevant files instead of sending all files back, and ensuring file retrieval accuracy [12].

7) Present a privacy-preserving multi-keyword text search (MTS) scheme with similarity-based ranking to address this problem. To support multi-keyword search and search result ranking, this work proposes to build the search index based on term frequency and the vector space model with a cosine similarity measure to achieve higher search result accuracy [13].

8) An algorithm to provide efficient multi-keyword ranked search, with different keys for different data owners. This scheme provides a resolution for secure data sharing on the cloud or any public resource. Data contents are also secured, as no authority can access user data. A user can differentiate between the other users with whom he/she shares his/her own data [15].

2. The proposed scheme allows new data owners to enter the system without affecting other data owners or data users.
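The relevance-score-based ranking referred to in the surveyed schemes is typically a TF-IDF style measure. A small illustrative sketch follows; the corpus and names are hypothetical, and the exact scoring formula varies by scheme:

```python
import math
from collections import Counter

def relevance_score(file_words, keyword, N, df):
    # TF-IDF style score: (1/|F|) * (1 + ln tf) * ln(1 + N/df),
    # a classic form used for ranked search over (encrypted) indexes.
    tf = Counter(file_words)[keyword]
    if tf == 0 or df.get(keyword, 0) == 0:
        return 0.0
    return (1 / len(file_words)) * (1 + math.log(tf)) * math.log(1 + N / df[keyword])

files = {
    "f1": "cloud storage security cloud".split(),
    "f2": "image processing pipeline".split(),
    "f3": "secure cloud search".split(),
}
N = len(files)
# Document frequency: in how many files each word appears.
df = Counter(w for words in files.values() for w in set(words))

# Rank files for the query keyword "cloud"; only matching files score > 0.
ranked = sorted(files, key=lambda f: relevance_score(files[f], "cloud", N, df),
                reverse=True)
assert ranked == ["f1", "f3", "f2"]
assert relevance_score(files["f2"], "cloud", N, df) == 0.0
```

In a searchable-encryption setting these scores are computed over an encrypted index, so the server can rank results without learning the keywords themselves.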
3. SYSTEM ARCHITECTURE/
SYSTEM OVERVIEW
calculate the relevance score between a file Fb (b ∈ [1, d]) and a keyword wj:

    Score(Fb, wj) = (1/|Fb|) · (1 + ln F_{Fb,wj}) · ln(1 + N/f_{wj})

where |Fb| denotes the length of file Fb, F_{Fb,wj} denotes the frequency of the keyword wj in the file Fb, f_{wj} denotes the number of files containing keyword wj, and N denotes the number of files.

Each node in the index tree stores a vector D whose elements are the relevance scores. We define a node in the index tree as

    unode = <ID, FID, D, Pl, Pr>

where ID, FID, and OID denote the id of the node, the file, and the data owner, respectively; Pr denotes the pointer to the right child of the unode, and Pl denotes the pointer to the left child.

5. GAP ANALYSIS

Table I: Number of searched files for a user-entered string

Index Number | No. of words in string | Existing system | Proposed system
1            | 3                      | 18              | 7
2            | 2                      | 12              | 6
3            | 1                      | 18              | 10
4            | 2                      | 22              | 16
5            | 3                      | 11              | 4

Table I shows the number of searched files for a user-entered string in the existing system and the proposed system. In the existing system, for a multi-keyword string, each word is considered separately and the documents are searched for each word separately. In the proposed system, only the ranked files are displayed to the user.

6. CONCLUSION
This paper presents secure searching techniques over cloud-stored data. It also surveys different techniques to search over encrypted data and solves the problem of ranked search over encrypted cloud data. The data is retrieved in less time by secure index searching. The cloud server performs searching over the encrypted data, but the server does not learn the sensitive information behind the data collection. The TPA checks the integrity of the data stored on the cloud.

REFERENCES
[1] D. Song, D. Wagner, A. Perrig, "Practical techniques for searches on encrypted data", in: SP'00, Berkeley, CA, 2000.
[2] E. Goh, "Secure indexes", Cryptology ePrint Archive, Report 2003/216, 2003.
[3] A. Broder, M. Mitzenmacher, "Network applications of Bloom filters: A survey", Internet Math., vol. 1, no. 4, pp. 485-509, 2002.
[4] R. Curtmola, J. Garay, S. Kamara, R. Ostrovsky, "Searchable symmetric encryption: improved definitions and efficient constructions", Journal of Computer Security, vol. 19, no. 5, pp. 895-934, 2011.
[5] Q. Liu, G. Wang, J. Wu, "Secure and privacy preserving keyword searching for cloud storage services", J NETW COMPUT APPL., vol. 35, no. 3, pp. 927-933, 2012.
[6] C. Wang, N. Cao, J. Li, K. Ren, W. Lou, "Secure ranked keyword search over encrypted cloud data", in: ICDCS'10, Genoa, Italy, 2010.
[7] Liu, L. Zhu, J. Chen, "Efficient searchable symmetric encryption for storing multiple source dynamic social data on cloud", J NETW COMPUT APPL., vol. 86, pp. 3-14, 2017.
[8] N. Cao, C. Wang, M. Li, K. Ren, W. Lou, "Privacy-preserving multi-keyword ranked search over encrypted cloud data", in: INFOCOM'11, Shanghai, China, 2011.
[9] Ibrahim, H. Jin, A. Yassin, D. Zou, "Secure rank-ordered search of multi-keyword trapdoor over encrypted cloud data", in: APSCC'12, Guilin, China, 2012.
[10] Z. Shen, J. Shu, W. Xue, "Preferred keyword search over encrypted data in cloud computing", in: IWQoS'13, Montreal, Canada, 2013.
[11] Wang, S. Yu, W. Lou, Y. Hou, "Privacy-preserving multi-keyword fuzzy search over encrypted data in the cloud", in: INFOCOM'14, Toronto, Canada, 2014.
[12] S. Pasupuleti, S. Ramalingam, R. Buyya, An efficient and secure privacy-preserving approach for outsourced data of resource
that the user is not some malicious hacker, and the user can be assured of data consistency, of data storage, and that the instance he/she is running is not malicious. The storage capacity on the cloud can be altered according to the user's needs, so scalability is an important factor, and the system being developed is more reliable because of the security provided at different layers. It is also quite affordable.

3. STATE OF ART
3.1 Fine-Grained Two-Factor Protection Mechanism for Data Sharing in Cloud Storage
In this paper, the proposed system focuses on data protection for cloud storage. The proposed system focuses on the following points:
1) A cryptographic key is used.
2) The cryptographic key can be revoked efficiently by integrating the proxy re-encryption and key separation techniques.
3) The data is protected in a fine-grained way by adopting the attribute-based encryption technique.
3.2 Privacy Preserving Model
The data privacy-preserving issues are analysed by identifying unique privacy requirements and presenting a supportable solution that eliminates the possible threats towards data privacy. The proposed system also gives a privacy-preserving model (PPM) to audit all the stakeholders, in order to provide a relatively secure cloud computing environment.
3.3 Applying Encryption Algorithm for Data Security in Cloud Storage
This paper proposes a simple, secure, and privacy-preserving architecture for inter-cloud data sharing based on an encryption/decryption algorithm, which aims to protect the data stored in the cloud from unauthorised access [4].

4. PROPOSED WORK
The proposed predictive model initially undergoes the following techniques:
Proof of Ownership
Proof of Authentication
File Upload with Digital Signature
Upload File & Grant Permissions
Data preprocessing is followed by designing the prediction engine and building the learning model using different boosting techniques to produce learned parameters, which are ultimately used for prediction calculation.

Proof of Ownership
The data owner uploads the document and its metadata to the cloud after encryption, using keys from the data owner and the cloud service provider. Each and every document carries a digital e-signature, and all text documents can be modified only by an authorized user.
Proof of Authentication
Each user has one unique username and password, which is used for authentication. Each user also has a unique digital e-sign, because these are used to upload documents.
File Upload with Digital Signature
Prior to uploading a document, every individual document is digitally signed. Digital signatures can provide added assurance of the origin, identity and status of an electronic document, transaction or message, as well as acknowledging informed consent by the signer. So we use digital signatures for file upload.
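The sign-before-upload flow described above can be sketched as follows. Python's standard library has no public-key signature primitive, so this toy uses an HMAC as a stand-in for the user's digital signature; a real deployment would sign with an RSA or ECDSA private key, and the key and function names here are hypothetical:

```python
import hashlib, hmac, time

SIGNING_KEY = b"users-private-signing-key"  # stand-in; really an RSA/ECDSA key

def sign_document(doc, user):
    # Hash the document, then "sign" the digest before upload.
    digest = hashlib.sha256(doc).hexdigest()
    sig = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return {"user": user, "digest": digest, "signature": sig,
            "uploaded_at": time.time()}

def verify_upload(doc, record):
    # Cloud side: recompute the digest and check the signature,
    # establishing origin and that the file was not altered in transit.
    digest = hashlib.sha256(doc).hexdigest()
    expected = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return digest == record["digest"] and \
        hmac.compare_digest(expected, record["signature"])

doc = b"quarterly report"
record = sign_document(doc, "alice")
assert verify_upload(doc, record)              # authentic upload verifies
assert not verify_upload(b"tampered", record)  # a modified file is rejected
```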
model for describing the data or information aspects of a software system. The main components of ER models are entities and the relationships that exist among them. The various entities of the synchronization system are the data owner and the data user.
REFERENCES
[1] B. Prabavathy, P. Ramya, Chitra Babu, "Optimized private cloud storage for heterogeneous files in an university scenario", International Conference on Recent Trends in Information Technology (ICRTIT), 2013.
[2] Cong Zuo, Jun Shao, Joseph K. Liu, Guiyi Wei and Yun Ling, "Fine-Grained Two-Factor Protection Mechanism for Data Sharing in Cloud Storage", IEEE Transactions on Information Forensics and Security.
[3] Kaiping Xue, Weikeng Chen, Wei Li, Jianan Hong, Peilin Hong, "Combining Data Owner-side and Cloud-side Access Control for Encrypted Cloud Storage", IEEE Transactions on Information Forensics and Security.
[4] Zaid Kartit, Mohamed EL Marraki, "Applying Encryption Algorithm to Enhance Data Security in Cloud Storage", Advances in Ubiquitous Networking, Lecture Notes in Electrical Engineering, vol. 366, Springer, Singapore, 2015.
[5] Boeui Hong, Han-Yee Kim, Minsu Kim, Lei Xu, Weidong Shi, and Taeweon Suh, "FASTEN: An FPGA-based Secure System for Big Data Processing", IEEE Design & Test, Hardware Accelerators for Data Centers.
[6] Hui Cui, Yingjiu Li, "Attribute-based cloud storage with secure provenance over encrypted data", Future Generation Computer Systems, February 2018, Volume 26, Issue 4, pp. 461-472.
[7] Nesrine Kaaniche, Aymen Boudguiga, Maryline Laurent, "ID-Based Cryptography for Secure Cloud Data Storage", ACM, 2013.
IMAGE AND SIGNAL PROCESSING
path is calculated. The system architecture is shown in Figure 3.1.

Fig. 3.1 System architecture

The layered architecture gives a brief idea of how the different components of the system work together to produce the desired output. The first layer is the App Layer, which shows the UI on the screen. Using the UI, a user can interact with the app. The next layer is the SDK Layer. The SDK includes all the APIs which help the user interact with the hardware. The final layer is the Sensor/Hardware Layer. The gyroscope and the accelerometer keep the app running by calculating GPS coordinates, thereby allowing places to be located.

show indoor applications. It does not support internal routes.

4.2 Proposed System
To overcome the existing problems, we propose a system. In the proposed system we create an indoor map which is linked to augmented reality. In this system we provide a navigation guide which guides the user to their respective destination. It uses augmented reality, where we require a camera-generated live-action video. Using the on-screen navigation guide, a user can easily reach his/her destination without any confusion and without asking someone for help.

5. DATA FLOW DIAGRAM
5.1 DFD Level 0

Fig. 5.1 DFD (Level 0)

5.2 DFD Level 1
achieved. Further development effort was expected. There were differences in image quality; uniform image quality should be used to achieve higher accuracy [6].
Guan, Haiyan, Yongtao Yu, and Jonathan Li developed a tensor voting approach to dark spot detection in RADARSAT-1 ScanSAR narrow beam mode images. The proposed method was developed using C++ running on an HP Z820 workstation. Quantitative evaluations demonstrated the average commission error the proposed method achieves [7].
Hsieh, Chen-Chiung, and Meng-Kai Jiang developed a facial expression classification system based on the Active Shape Model and a Support Vector Machine. This system utilized facial components to locate dynamic facial textures, such as frown lines, nose wrinkle patterns, and nasolabial folds, to classify facial expressions. A Support Vector Machine is deployed to classify the six facial expression types: neutral, happiness, surprise, anger, disgust, and fear. The results showed that the proposed method classified these six human expressions effectively [8].
Ohchi, Shuji, Shinichiro Sumi, and Kaoru Arakawa developed a nonlinear image processing system for beautifying human facial images using contrast enhancement, which effects highlighting and shading. This system can realize highlighting and shading in the face, which makes the face look deeply chiseled, as well as removing undesirable skin roughness such as wrinkles and spots. The parameters in this system are optimized with IEC. One-point crossover is applied, where the crossover point is randomly determined, and a single bit is reversed in the mutation, where the locus is also determined randomly [9].
Wang, YuanHui, and LiQian Xia developed feature-based face detection in complicated backgrounds. The first stage adopts skin colour-based segmentation to search potential face regions. The second stage detects the lip in potential face regions using a lip colour model and searches for the eyes using geometry textures. The last stage clips the face region using an optimization ellipse. In the future, they plan to use this face detection system as pre-processing for face tracking and face recognition [10].

4. GAP ANALYSIS
The comparison drawn between the previously published papers addressing lesion detection can be understood from Table 1.

Table 1 - Comparison between content in papers.

Author            | Proposed System                                        | Pitfall
Nasim Alamdari [2]| HSV model, K-means and support vector machine          | Limited number of images were used.
Kittigul [1]      | Haar Cascade classifier                                |
Chin et al. [4]   | Law's mask filter and Gabor wavelets                   | Eigenvalue calculation method was complex.
Jain [12]         | AAM and LOG                                            | Face marks were not explicitly defined.
Douglas Chai [3]  | Bayesian classifier and multilayer perceptron classifier | Segmentation performance degraded.

5. PROPOSED SYSTEM
The proposed model consists of three subsystems, which are as follows:
Acne Detection System
Wrinkle Detection System
Remedial System
An image is taken from the user, and the preprocessing steps are performed as mentioned in Section 5.1. After the preprocessing steps are done, the system checks whether acne or wrinkles are present on the person's face or not. The respective results are then given for that image. After detection of a lesion is done, the corresponding remedies are also given as output by the system. This automated lesion detection system detects lesions and suggests measures to cure or prevent them in the future.
After the transformation, a connected component labeling algorithm can detect connected regions in the wrinkles' binary digital images.
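The binarize-then-label wrinkle detection mentioned above can be sketched with a plain 4-connected flood-fill labeling on a toy image. The helper names are hypothetical, and real systems operate on preprocessed face images rather than a hand-written grid:

```python
from collections import deque

def binarize(img, threshold):
    # Binarization: pixels above the threshold become 1.
    return [[1 if p > threshold else 0 for p in row] for row in img]

def connected_components(binary):
    # 4-connected component labeling via BFS flood fill.
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    regions, next_label = [], 1
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                queue, pixels = deque([(y, x)]), []
                labels[y][x] = next_label
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] \
                                and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                regions.append(pixels)
                next_label += 1
    return regions

# Position and size features (area, bounding-box height/width) per region.
img = [[0, 200, 200, 0],
       [0, 200,   0, 0],
       [0,   0,   0, 180]]
regions = connected_components(binarize(img, 128))
features = [(len(r),
             max(y for y, _ in r) - min(y for y, _ in r) + 1,
             max(x for _, x in r) - min(x for _, x in r) + 1) for r in regions]
assert len(regions) == 2          # two candidate wrinkle regions
assert features[0] == (3, 2, 2)   # area 3, height 2, width 2
```

Libraries such as OpenCV provide the same operation directly (`connectedComponentsWithStats`); the sketch only shows what the labeling step computes.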
Step 4: Image Binarization – to get clear receive some useful suggestions and
wrinkle contour. remedies to cure their skin problems in an
effective manner.
Step 5: Connected Component Labeling –
to mark wrinkles‘ number 7. FUTURE SCOPE
In the future, we want to collect more and
Step 6: Detect wrinkle position using more users‘ facial images and their
area, length, width feature on the CCL detection results for analyzing in order to
map. improve the accuracy of our system and
also put forward the superior proposal for
6. CONCLUSION our users. One more idea can be applying
Acne detection and classification is one of various methods to acne images and
the most important processes in acne comparing the results and combining the
treatment. In this work it has been successful algorithms to get higher
proposed that for acne detection and accuracy to segment and distinguish acne
counting it,use and process the images types and also identifying other skin
taken by a webcam. Haar Cascade detectors have been used to classify the facial portion of the images. Mouth and ear detectors are used to obscure critical parts of the face that could be wrongly classified as acne lesions. Segmentation of skin pixels has been performed by combining several color, texture, shape, spatial and unsupervised descriptors. The proposed unsupervised features improved the performance of the skin segmentation model, which is an ensemble of 10 Random Forest models, and achieved high accuracy at a reasonable computation time on the FSD dataset. The a* channel of the CIELab model has been proven suitable for enhancing discrimination between acne lesions and healthy skin, and Adaptive Threshold performed on this channel is able to separate acne lesions from healthy skin with acceptable results. The Laplacian of Gaussian filter is the algorithm selected to detect acne spots and mark them in the image. Finally, reports are produced containing the number, location and ray dimension of the detected acne spots. The system also worked on detecting skin conditions such as wrinkles and skin pores, using the Laws Mask filters followed by a connected component labeling algorithm for the detection of wrinkles. Users can comprehend what kinds of difficulties their skin has encountered and they will … conditions. The scope of the project can be extended to detect dark circles below the eyes and dark spots on the skin.

REFERENCES
[1] Kittigul, Natchapol, and Bunyarit Uyyanonvara. "Automatic acne detection system for medical treatment progress report." In Information and Communication Technology for Embedded Systems (IC-ICTES), 2016 7th International Conference of, pp. 41-44. IEEE, 2016.
[2] Chin, Chiun-Li, Ho-Feng Chen, Bing-Jhang Lin, Ming Chieh Chi, Wei-En Chen, and Zih-Yi Yang. "Facial wrinkle detection with texture feature." In Awareness Science and Technology (iCAST), 2017 IEEE 8th International Conference on, pp. 343-347. IEEE, 2017.
[3] Phung, Son Lam, Abdesselam Bouzerdoum, and Douglas Chai. "Skin segmentation using color pixel classification: analysis and comparison." IEEE Transactions on Pattern Analysis and Machine Intelligence 27, no. 1 (2005): 148-154.
[4] Alamdari, Nasim, Kouhyar Tavakolian, Minhal Alhashim, and Reza Fazel-Rezai. "Detection and classification of acne lesions in acne patients: A mobile application." In Electro Information Technology (EIT), 2016 IEEE International Conference on, pp. 0739-0743. IEEE, 2016.
[5] Guan, Haiyan, Yongtao Yu, and Jonathan Li. "A tensor voting approach to dark spot detection in RADARSAT-1 intensity imagery." In Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, pp. 3160-3163. IEEE, 2015.
[6] Hsieh, Chen-Chiung, and Meng-Kai Jiang. "A facial expression classification system based on active shape model and support vector machine." In Computer Science and Society (ISCCS), 2011 International Symposium on, pp. 311-314. IEEE, 2011.
[7] Ohchi, Shuji, Shinichiro Sumi, and Kaoru Arakawa. "A nonlinear filter system for beautifying facial images with contrast enhancement." In Communications and Information Technologies (ISCIT), 2010 International Symposium on, pp. 13-17. IEEE, 2010.
[8] Wang, YuanHui, and LiQian Xia. "Skin color and feature-based face detection in complicated backgrounds." In Image Analysis and Signal Processing (IASP), 2011 International Conference on, pp. 78-83. IEEE, 2011.
[9] Maroni, Gabriele, Michele Ermidoro, Fabio Previdi, and Glauco Bigini. "Automated detection, extraction and counting of acne lesions for automatic evaluation and tracking of acne severity." In Computational Intelligence (SSCI), 2017 IEEE Symposium Series on, pp. 1-6. IEEE, 2017.
[10] Chin, Chiun-Li, Guei-Ru Wu, Tzu-Chieh Weng, Yun-Yun Kang, Bing-Jhang Lin, and Ho-Feng Chen. "Skin condition detection of smartphone face image using multi-feature decision method." In Awareness Science and Technology (iCAST), 2017 IEEE 8th International Conference on, pp. 379-382. IEEE, 2017.
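The Laplacian-of-Gaussian spot detection summarized in the conclusion above can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the kernel size, sigma and the synthetic test image are assumptions chosen only to show the mechanism (a zero-mean LoG kernel responds strongly to blob-like dark spots at roughly its own scale).

```python
import numpy as np

def log_kernel(size=9, sigma=1.5):
    """Zero-mean Laplacian-of-Gaussian kernel; responds to blob-like
    structures (such as acne spots) at roughly sigma scale."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    log = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return log - log.mean()        # zero mean: flat skin gives no response

def convolve2d(img, kernel):
    """Naive 'valid' correlation; the kernel is symmetric, so this
    equals convolution."""
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic test image: bright 'skin' with one dark circular 'spot' at (20, 20).
img = np.ones((40, 40))
yy, xx = np.mgrid[0:40, 0:40]
img[(yy - 20) ** 2 + (xx - 20) ** 2 <= 9] = 0.0

resp = convolve2d(img, log_kernel())
i, j = np.unravel_index(np.argmax(np.abs(resp)), resp.shape)
spot = (i + 4, j + 4)              # +4 offset from the 9x9 'valid' window
```

The strongest response lands on the spot center, which is the property the paper's detection step relies on before counting and locating the lesions.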
based systems such as skin color, lighting variations and hand orientation relative to the device.
The remainder of the paper is organized as follows:
Section II gives a detailed literature survey.
Section III presents the proposed system of the work.
Section IV presents the implementation of the system.
Section V concludes the paper.

2. LITERATURE SURVEY
The Leap Motion controller is a small USB peripheral device which consists of three IR LEDs, a sensor, a black glass and two IR monochromatic cameras. It uses infrared (IR) rays to determine the position of predefined objects in a limited space in real time. The device range is a rough hemispherical area extending to a distance of about 1 meter. The Leap Motion sensor is small, easy to use and low cost, as shown in Figure 1. The physical dimensions of the device are: length 80 mm, width 30 mm, height 13 mm.[1] … between the Leap and the devices, to make the devices work properly.[1]

Arabic Sign Language Recognition Using Leap Motion Sensor: The Leap Motion's 3D digital data is used in an ANN-based recognition system for Arabic sign language. The Leap tackles issues in vision-based systems such as skin colors, lighting, etc., and captures hands and fingers in 3D format. An MLP is used in which spatial features are stored. A few disadvantages of this system: it used depth cameras, Kinect and a digital camera; though these achieved high accuracy, they suffered from instability in realistic environments, and the Leap Motion does not track non-manual features.[2]

Bulb Control in Virtual Reality by Using Leap Motion Somatosensory Controlled Switches: A four-channel Leap Motion somatosensory controlled switching module was implemented for bulb switching and control, to aid persons whose hands are damaged and improve the quality of their living. Cost will be reduced by mass production, and the non-touch VR interface avoids the possibility of infections; the work also provides practical training in software design. The relay module served as electrically controlled switches which received their signal from an Arduino Mega 2560, which in turn was controlled by the Leap Motion somatosensory input.[7]
4. PROPOSED SYSTEM
ALGORITHM
Distance: Euclidean

Figure 4. Palm points considered for leap gesture (palm centre P1 at (x1, y1, z1) and fingertip points P2-P6)
B. Feature Extraction
The extracted key points are the coordinates of finger positions from the input gesture. The points are the center of the palm (say P) and the tips of the thumb (say T), index finger (say I), middle finger (say M), ring finger (say R) and pinky finger (say K) for each hand. The coordinates are P(x1, y1, z1), T(x2, y2, z2), I(x3, y3, z3), M(x4, y4, z4), R(x5, y5, z5), K(x6, y6, z6) for each hand.[2]

Figure 5. Distance vectors calculated using Euclidean distance

C. Gesture Classification
Figure 5 represents the distance vectors. Whenever a gesture is performed, the points are extracted and stored using serialization. At run time, distances are calculated from the extracted feature points using the Euclidean distance formula:

    di = sqrt((xi - xi+1)^2 + (yi - yi+1)^2 + (zi - zi+1)^2), where i = 1 to 15 for single-handed gestures

The cosine distance measure algorithm gives the fastest and most efficient measure.[2]

D. Gesture Database
The extracted points are stored in a serializable database, and the distance between each fingertip and the palm center is calculated for each finger.[3]

6. CONCLUSION
To overcome the issues of earlier vision-based systems, such as skin color, lighting variations and hand orientation relative to traditional gesture detection devices, this paper presents a preliminary solution: a hybrid interface consisting of software and hardware components.
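The fingertip-to-palm distance features described above can be sketched in a few lines of Python. The coordinate values here are purely illustrative placeholders (a real system would read them from the sensor frame); only the distance computation itself follows the formula in the text.

```python
import math

# Hypothetical palm-centre and fingertip coordinates for one hand, in the
# order P (palm), T (thumb), I (index), M (middle), R (ring), K (pinky).
points = {
    "P": (0.0, 0.0, 0.0),
    "T": (-40.0, 55.0, 10.0),
    "I": (-15.0, 80.0, 5.0),
    "M": (0.0, 85.0, 0.0),
    "R": (15.0, 80.0, 5.0),
    "K": (35.0, 60.0, 10.0),
}

def euclidean(a, b):
    """d = sqrt((x1-x2)^2 + (y1-y2)^2 + (z1-z2)^2)"""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Distance from each fingertip to the palm centre -> one feature per finger
features = {name: euclidean(xyz, points["P"])
            for name, xyz in points.items() if name != "P"}
```

Serializing the `features` dictionary per gesture would give the kind of distance records the gesture database section stores.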
The paper [1] presented by Zahid Ahmed, Sabina Yasmin, Md Nahidul Islam and Raihan Uddin focuses on building a software tool for automated counterfeit currency detection for Bangladeshi notes. The software detects fake currency by extracting existing features of banknotes, such as micro-printing, optically variable ink (OVI), watermark, iridescent ink, security thread and ultraviolet lines, using OCR (Optical Character Recognition), contour analysis, face recognition, Speeded Up Robust Features (SURF) and the Canny edge and Hough transformation algorithms of OpenCV.

The paper [7] presented by Mohammad Shorif Uddin, Pronaya Prosun Das and Md. Shamim Ahmed Roney describes an automated image-based technique for the detection of fake banknotes of Bangladesh. An SVM (Support Vector Machine) classifier has been used after extracting three security features, namely the watermark, latent image and micro-text, from the acquired images of the banknotes. All the algorithms used in this proposed system have been implemented in MATLAB.

The paper [6] presented by Sahana Murthy, Jayanta Kurumathur and B Roja Reddy focuses on a software system that uses image processing techniques to identify fake Indian currency notes. The problem with current existing systems is the trade-off between complexity and speed. The selected security features for each denomination are analysed, and the expected values for real notes are compared: by comparing the values of the input to the reference values, the denomination with the highest amount of significant similarity is selected.

The paper [2] presented by Sonali R. Darade and Prof. G.R. Gidveer makes use of image processing techniques to identify fake currency notes. The automatic system is designed for identification of Indian currency notes and checks whether a note is fake or original; it is very useful in the banking system and other fields as well. In India there is an increase in counterfeit notes of 100, 500 and 1000 rupees; with advances in technology such as scanning, colour printing and duplicating, the counterfeit problem keeps growing.

The paper [3] presented by P. Ponishjino, Kennet Antony, Sathish Kumar and Syam JebaKumar focuses on a system in which the strip lines or continuous lines are detected in real and fake notes by using edge detection techniques. HSV techniques are used to saturate the value of an input image; the image processing converts the input image from RGB to HSV. The various characteristics of the paper currency are cropped and segmented using an ROI algorithm.

The paper [4] presented by Pradeepa Samarasinghe, L.K.P Lakmal, Weilkala A.V., W.A.N.P.C Wickramarachchi and E.R.S Niroshana focuses on image processing to detect forgery in driving licences of Sri Lanka.
Table 1: Literature Survey

Parameter     | Image Processing Based Feature Extraction of Bangladeshi Bank Notes | Automatic Recognition of Fake Indian Currency Notes | Bogus Currency Authorization Using HSV Techniques | Design and Implementation of Paper Currency Recognition with Counterfeit | Our Solution
Accuracy      | High      | Low   | Average   | High      | High
Ease of Use   | Difficult | Easy  | Difficult | Difficult | Easy
Required Cost | High      | High  | High      | Medium    | Low
3. SYSTEM IMPLEMENTATION PLAN

Figure 1 - Features of 2000₹ Note

The system starts its performance based on a training data set consisting of high-quality images of genuine notes. A total of 5 security features out of the available 7 will be tested. The two most important security features are the Security Thread and the Watermark. The result of Security Thread and Watermark identification has to be positive in order for the note to proceed to the next steps; if the result is negative, the note is declared a fake note immediately. As the system is capable of detecting five security features of bank notes, the final state of this system will declare the note as genuine only when it gains at least 3 success points, that is, an accuracy of greater than or equal to 66.67%. This is because each of the five features is strong enough to fight against counterfeiting, but sometimes printing quality and rough usage can make the security features of genuine banknotes fade, so some selected features may not be detected accurately. The implemented system proves the control logic of the whole project, and the robustness of this project is clearly plausible later on.

Security features of genuine banknotes:

1. Security Thread
The security thread is present in the 2000₹ and 500₹ notes and appears to the right of Mahatma Gandhi's portrait. The security thread carries the visible text "RBI" and "BHARAT". When the note is held against the light, the security thread can be seen as one continuous line.

2. Watermark
The Mahatma Gandhi watermark is present on the bank notes, with a shade effect and multidirectional lines in the watermark.

3. Optically Variable Ink
Optically variable ink is used as a security feature in the 2000₹ and 500₹ bank notes; it was introduced as a banknote security feature in November 2000. The denomination value is printed with optically variable ink: the numeral 2000 or 500 appears green when the note is flat but changes to blue when the note is held at an angle.

4. Latent Image
The latent image shows the respective denomination value in numerals. On the obverse side of the notes, the latent image is present on the vertical band to the right of Mahatma Gandhi's portrait. The latent image becomes visible when the note is held horizontally at eye level.

5. Bleed Lines
There are angular bleed lines on the left and right sides of the note in raised print. Bleed lines help visually impaired people to identify the denomination of the notes.

6. See-Through Register
A small floral design is printed in the middle of the vertical band, next to the watermark. The floral design on the front is hollow and on the back is filled up; the design has back-to-back registration, so it appears as one floral design when seen against the light.

3) Gray Scale Conversion
The image obtained is in RGB format. It is transformed into gray scale because gray scale carries only the intensity information, which is easier to process than the three components of RGB.

4) Edge Detection
Edge detection is a basic tool in image analysis, image processing, image pattern recognition and computer vision, particularly in the area of feature detection and feature extraction.
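The gray scale conversion step can be sketched with the common luminance weighting. The paper does not specify which formula it uses; the 0.299/0.587/0.114 coefficients below are the standard ITU-R BT.601 weights, assumed here for illustration.

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an H x W x 3 RGB image to a single intensity channel
    using the standard BT.601 luminance weights."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

# A pure red, pure green and pure blue pixel map to their respective weights:
img = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=float)
gray = rgb_to_gray(img)        # shape (1, 3), one intensity per pixel
```

Working on this single channel is what makes the subsequent edge detection cheaper than processing three RGB components.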
Figure 2 - System Flow Diagram (fc = feature counter; result ≥ 66.67% → genuine note, result < 66.67% → fake note)

5. CONTROL FLOW

Figure 3 - Control Flow Diagram

The above figure represents the control flow of the proposed system. The system starts by capturing a high-quality image of the note to be tested. The first and most important feature tested is the Security Thread; if the system fails to verify this feature, the note is directly declared a fake one. On the other hand, if the security thread test is passed, the system starts a counter for the features of the note. The counter is increased by 1 if a test is passed and kept as it is if a test is failed. At the end, the counter value is divided by 5 (the total number of features tested) and multiplied by 100 to calculate the percentage. A minimum of 66.67% is required for the note to be declared genuine.

6. PROBLEM IDENTIFICATION
Identification of fake notes is very useful, as it can be used by banks to distinguish between original and fake notes, but there are certain issues regarding image processing:
1) Motion blur
2) Noise imposed by the image capture instrument
3) Variety of notes
4) Less efficient feature extraction

7. CONCLUSION
The proposed software system will be very useful for identifying fake Indian currency notes. The system will use advanced image processing algorithms and be made available free of cost to everyone. Users will require minimal hardware in order to access and use the system. The results are in the form of a Boolean value which indicates whether the note is fake or original.

8. FUTURE SCOPE
The system uses 5 distinct features to check the validity of notes. In the future, the number of features can be increased to make the system more robust. The speed of the system can also be increased by using advanced image processing technologies, so that users can scan more notes in less time.

REFERENCES
[1] Zahid Ahmed, Sabina Yasmin, Md Nahidul Islam, Raihan Uddin Ahmed, "Image Processing Based Feature Extraction of Bangladeshi Banknotes", 2014.
[2] Sonali R. Darade, Prof. G.R. Gidveer, "Automatic Recognition of Fake Indian Currency Note", 2016 International Conference
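The feature-counter logic of the control flow above can be sketched as follows. The feature names and boolean results are hypothetical placeholders for the outcome of each image-processing check; note that with five features the paper's 66.67% threshold effectively requires four passes, since 3/5 is only 60%.

```python
# Hypothetical feature-test results for one note; in the real system each
# value would come from an image-processing check of that security feature.
tests = {
    "security_thread": True,   # prerequisite: failing it rejects the note outright
    "watermark": True,
    "optically_variable_ink": True,
    "latent_image": False,     # faded features may fail even on genuine notes
    "bleed_lines": True,
}

def classify_note(tests, threshold=66.67):
    """Mirror the control flow: reject immediately if the security thread
    fails, otherwise count passed features out of 5, convert to a
    percentage and compare against the 66.67% threshold."""
    if not tests["security_thread"]:
        return "fake"
    score = sum(tests.values()) / 5 * 100   # e.g. 4 passes -> 80.0
    return "genuine" if score >= threshold else "fake"
```

Here `classify_note(tests)` accepts the note (4 of 5 features pass, 80% ≥ 66.67%), while a failed security thread short-circuits to "fake".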
authority members. The user interface consists of a website where the user needs to log in using his credentials to gain access to the real-time CCTV footage. Once logged in, a notification will be sent to the user so that he can review all the recordings and take necessary actions as required.

2. LITERATURE REVIEW
In 2009, Thombre D.V. et al. proposed 'Human Detection and Tracking using Image Segmentation and Kalman Filter'.[1] For human detection, an image segmentation technique was used, and for human tracking, a Kalman filter with a two-dimensional constant velocity model. The Kalman filter is a set of mathematical equations that provides an efficient computational means to estimate the state of a process in a way that minimizes the mean square error. This method tracks individual pedestrians as they pass through the field of vision of the camera, and uses vision algorithms to classify the motion and activities of each pedestrian. The tracking is achieved through the development of a position and velocity path characteristic for each human using a Kalman filter. By making use of this information, the system can bring an incident to the attention of human security personnel.

The paper titled 'Human Detection using HOG-SVM, Mixture of Gaussian and Background Contours Subtraction' was given by Houssein Ahmed et al.[2] In Mixture of Gaussian (MOG) modelling, a statistical process independent of all other pixels is applied to each pixel by comparing it with the set of models existing at that location to find a match. The parameters of the matching model are updated based on a learning factor; if no match is found, the least probable model is eliminated and replaced by a new Gaussian with the current pixel values. Background contours subtraction is less sensitive to light changes, so it can also be helpful in human detection. The paper also noted the use of Histogram of Oriented Gradients. HOG is based on the principle that the local appearance and shape of an object can be described by the intensity distribution of the gradients, or the direction of the contours. The gradient of an image is a vector quantity that indicates how the intensity of a pixel varies in space, and it is computed by convolving the image with a first-derivative mask. The function of the Support Vector Machine (SVM) is to give a decision about the candidate's membership of the class sought. Learning is done from a database of positive examples (a class containing the characteristics of examples of humans) and negative examples (a class containing the characteristics of examples of non-humans). Taking the characteristics of the candidate image as input, the classifier must determine the class closest to the candidate image. In most cases this is the last step of the process, since once the candidate is recognized by the classifier it is enough to display the detection windows.

In 2014, Resmi R et al. proposed 'Video Image Processing for Moving Object Detection and Segmentation using Background Subtraction'.[3] The key concept was using the background subtraction method for moving object detection in videos, while using segmentation for extracting various features of the moving objects for further video/image processing. Background subtraction generates a foreground mask for every frame; this step is simply performed by subtracting the background image from the current frame. When the background view excluding the foreground objects is available, the foreground objects can evidently be obtained by comparing the background image with the current video frame. Segmentation is a significant part of image processing: image segmentation is the division of an image into regions or categories which correspond to different objects or parts of objects.
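The gradient computation that HOG builds on, as described above, can be sketched in NumPy. The [-1, 0, 1] first-derivative mask and the tiny step-edge test image are illustrative assumptions; a full HOG descriptor would additionally bin the orientations into cell histograms and normalize over blocks.

```python
import numpy as np

def gradients(img):
    """Apply first-derivative masks [-1, 0, 1] in x and y, then derive
    the gradient magnitude and orientation used by HOG."""
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # horizontal derivative
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # vertical derivative
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx))
    return magnitude, orientation

# A vertical step edge: intensity jumps from 0 to 1 at column 3.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
magnitude, orientation = gradients(img)
```

On this edge the magnitude peaks along the transition and the orientation is 0° (a purely horizontal gradient), which is exactly the directional information HOG accumulates into its histograms.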
We also aim to detect human behaviour in the restricted area using additional algorithms. The algorithm will initially detect the human being, then try to track him and keep a record of the time period for which he is present in the restricted area. The system can be made more robust by using sensors for motion detection alongside the time-tracking system.

4. PROPOSED WORK
To overcome the gaps present in existing systems, the proposed security system will identify suspicious activity using the movement of a human and the presence of a human after some predefined time. As Figure No. 1 illustrates, real-world video is fed to the system. The video is made up of a number of frames, and these frames are given for pre-processing. The pre-processing stage provides a foreground image by use of background subtraction. The pre-processed image is then given to the HOG feature descriptor, which identifies the target area in the foreground image. The identified target area is given to the SVM classifier for human or non-human classification. If human presence is detected, the system identifies suspicious activity using motion tracking and time tracking: motion tracking uses optical flow for detecting movement, and time tracking is carried out by applying HOG repeatedly to the present human until the predefined time is exceeded. If the activity is classified as suspicious, the responsible authority is notified by sending a notification; if the activity is not suspicious, there is no output from the system.

[1] Pre-processing:
The real-world video is initially split into a number of frames. These frames are given to the system for initial human detection. There are various techniques available for human detection, of which background subtraction is the more computationally efficient and robust. In background subtraction, the background is removed from the frame and a foreground image is generated; the background is removed by use of Gaussian filtering. The foreground image may consist of different objects, and it is given to the HOG feature descriptor for target area classification.

[2] HOG Feature Descriptor:
A feature descriptor is a representation of an image which simplifies the image by extracting the information useful for classification. HOG initially calculates gradients, i.e. pixel intensity changes, over the entire image for determining background and foreground. But calculating the gradients of the entire image increases the computational time and makes it difficult to use HOG in real-life problems. To overcome this drawback, the feature descriptor is provided with the foreground image as input. HOG identifies the features of all the objects present in the foreground image, and the information collected by this technique is given for further classification to the SVM classifier.

[3] SVM Classifier:
Support Vector Machines are supervised learning models. These models have algorithms associated with them for the purpose of analysing data for classification or regression. An SVM classifier is trained on a set of examples to classify the objects in those examples into different categories. In the proposed system, the SVM classifier will be trained to categorize the detected object as human or non-human.

[4] Motion Tracking:
If the object detected by the SVM classifier is categorized as human, then motion tracking using optical flow is applied to determine suspicious activity. Optical flow is the flow of pixels that generates a pattern from the movement of objects in the visuals. The sequence of frames obtained from video surveillance is used to estimate the motion of the detected human figure. Optical flow can estimate motion between two image frames by
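The background-subtraction step of the pre-processing stage described above can be sketched as simple frame differencing with a threshold. The threshold value and the synthetic frames are illustrative assumptions; the proposed system additionally applies Gaussian filtering, which is omitted here for brevity.

```python
import numpy as np

def foreground_mask(background, frame, thresh=25):
    """Subtract the known background from the current frame and keep
    pixels whose absolute intensity difference exceeds the threshold."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return (diff > thresh).astype(np.uint8)   # 1 = foreground, 0 = background

# Synthetic example: a flat background and a frame with a bright 'object'.
background = np.full((6, 6), 50, dtype=np.uint8)
frame = background.copy()
frame[2:4, 2:4] = 200          # an object enters the scene
mask = foreground_mask(background, frame)
```

Only the four pixels covered by the object survive in the mask, which is the foreground image that would then be handed to the HOG descriptor.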
bapatil@aissmscoe.com5
ABSTRACT
When you work at a computer, your eyes have to focus and refocus all the time. They move back and forth as you read, and they react to changing images on the screen so your brain can process what you are seeing. This requires a lot of effort from your eye muscles. Computer work gets harder as you age and the lenses in your eyes become less flexible. There is no proof that computer use causes permanent eye damage, but regular use can cause eye strain and discomfort. This condition is called Computer Vision Syndrome. Computer screens are causing increasing damage to human eyes day by day; exhaustive use of computers is causing various conditions, including Computer Vision Syndrome, which can cause eye strain, headaches, blurry vision, dry eyes, shoulder and neck pain, etc.[2] In this paper we propose a system that tries to reduce the impact of Computer Vision Syndrome on human eyes by dealing with the light that is emitted through electronic displays.

General Terms
Computer Vision Syndrome, Eye blinks.

Keywords
Face Detection, Eye detection, Blink detection.
6. FUTURE SCOPE
In future iterations, we can try to implement this system for a wide range of devices such as mobile phones, smart TVs, tablets, etc. The system performs poorly in lighting-constrained environments, so we can try to improve its low-light performance. We can also improve the performance of the system for users who wear spectacles, and head orientation can be taken into account for eye detection in future iterations of the product. The current version only adjusts the screen brightness, but future versions could adjust other light settings, such as reducing or blocking blue light from the display, or other environment-oriented light settings.

REFERENCES
[1] Sofia Jennifer, Sree Sharmila, "Edge Based Eye-Blink Detection for Computer Vision Syndrome", 2017 IEEE.
[2] Richa Mehta, Manish Shrivastava, "An Automatic Approach for Eye Tracking and Blink Detection in Real Time", 2012 IJEIT.
[3] Mai K. Galab, H.M. Abdalkader and Hala H. Zayed, "Adaptive Real Time Eye-Blink Detection System", 2014 International Journal of Computer Applications.
[4] Tereza Soukupova, Jan Cech, "Real Time Eye Blink Detection using Facial Landmarks".
[5] http://www.sankaranethralaya.org/patient-care-cvc.html
[6] https://www.aoa.org/documents/infographics/SYVM2016Infographics.pdf
[7] Cai-Xia Deng, Gui-Bin Wang, Xin-Rui Yang, "Image Edge Detection Algorithm Based on Improved Canny Operator", 2013 IEEE.
[8] Seongwon Han, Sungwon Yang, Jihyoung Kim and Mario Gerla, "EyeGuardian: A Framework of Eye Tracking and Blink Detection for Mobile Device Users".
[9] Xun Wang, Jianqiu Jin, "An Edge Detection Algorithm Based on Improved Canny Operator", Seventh International Conference on Intelligent Systems Design and Applications, pp. 623-628, 2007.
photo while maintaining its accuracy, resulting in lower fault rates. This application can be further extended to areas such as educational organizations and conference attendance, making the records of attendance available irrespective of place and time. The basic idea is to train a manually developed/modified regressor and classifier to learn the labeling of data automatically and to reduce error via an RNN. The product will work in a client-server environment, the client being teachers or supervisors in the class or other educational organizations. The collected data will be stored on the server and cross-verified with the datasets already there, after which the attendance of the students will be updated accordingly.

A. Motivation
Attendance is one of the important factors in the assessment of a student. The current attendance system is managed manually in most places, which wastes the time of teachers, lecturers, etc. Managing attendance manually on paper or non-electronically is tedious, and accessing it on demand without time and place barriers is near to impossible. With the help of face recognition, by processing images taken via mobile devices, automation of the whole system is possible, which will greatly reduce time wastage and human errors in attendance. The attendance data can be stored on a web server and accessed remotely through a website, so accessing the data becomes easy.

2. LITERATURE SURVEY
K.P.M. Basheer et al.: The fingerprint attendance system was one of the attempts to introduce automation into the attendance system. This system consists of two sections: a portable device and a host computer. The fingerprint module, containing a fingerprint sensor, is the heart of the portable device. Attendance is registered by rotating the portable finger recognition device in the classroom: students place their fingers on the sensor to mark attendance, and a GUI application on the host computer helps the teacher manage it. The portable device needs to be handled with care and requires technical knowledge to operate; it is also a time-consuming process, and rotating the device in the classroom creates a disturbance in the regular lecture.[1]

S. Konatham et al.: Here, the proposed model makes use of RFID tags and GSM for automated attendance management in an institute. Every student has a unique RFID card, just like an identity card. RFID card readers are installed at the entrance of the classroom. These readers have a built-in microcontroller which matches the student's RFID card with the RFID registered in the database; if a match is found, the door is opened. This is done using a GSM module. The main disadvantage is that every student has to carry an RFID card to enter the classroom; if a student forgets his card, there must be some system to mark attendance. There is also a chance of students adopting fraudulent methods to mark some other person's attendance, and if a card is swiped more than once, there is a chance of marking attendance twice. Students need to stand in a queue, which is as time consuming as the traditional roll-call system.[2]

S. Noguchi et al.: This is similar to the RFID access card system, only the cards are accessed on students' Android phones themselves. It makes use of a Bluetooth Low Energy (BLE) beacon device to transmit a magic number necessary for proper registration. Only the Android devices of the students present inside the classroom receive the signals of the beacon device carrying the magic number. The students then run the application and register their cards using a Near Field Communication (NFC) reader to mark attendance. The main disadvantage is that it requires an NFC device for registration of access cards onto the system, which
demands technical support. Any student who is not inside the class but falls within the Bluetooth range can also mark his/her attendance.[3]

S. Kadry et al.: In this paper, a wireless iris attendance management system is designed and implemented using Daugman's algorithm. This system consists of iris verification and identification, management of users' irises, system settings and wireless communication management. The shortcomings of this system are that there must be a managing PC nearby, and it is difficult to lay the transmission lines where the topography is bad.[4]

S. Chintalapati et al.: This system, based on face detection and recognition algorithms, automatically detects a student when he enters the classroom and marks the attendance by recognizing him. The real-time face recognition used here is reliable and fast. The face detector is installed at the entrance of the classroom; it detects the face and recognizes it by matching it against existing faces in the database. Again, fraudulent methods can be adopted by students, for example by appearing in front of the camera but not entering the class; in that case the system is not useful, and it can become costly too.[5]

E. Varadharajan et al.: In this method the camera is fixed in the classroom; it captures the image, the faces are detected and recognized against the database, and finally the attendance is marked. If a student is marked absent, a message about the absence is sent to the parents. It uses the Eigenfaces method for face detection. Eigenfaces is one of the simplest but least accurate methods, and the efficiency it provides is poor.[6]

Anissa Lintang Ramadhani et al.: Here, Principal Component Analysis (PCA) and Eigenfaces algorithms are used for face detection, and the Ry-UJI robot is used to implement face recognition. This research uses primary data in the form of Red Green Blue (RGB) images obtained by capturing images with a built-in webcam, and secondary data in the form of an XML classifier file for the face detection and recognition process. The robot has costly hardware specifications and requirements; the RGB pre-processing adds a time-consuming overhead, and using RGB data as input for face detection and recognition reduces efficiency. Eigenfaces is one of the simplest but least accurate methods, and the efficiency it provides is poor.[7]

Monica Chillaron et al.: This work describes the development of a face recognition and detection application that connects with a Raspberry Pi over Bluetooth. It uses the Eigenfaces method for recognition, whereas detection is based on a boosted cascade. From the results, the average hit rate of face detection is 84.4%. The use of the Raspberry Pi hardware base increases the cost of the system, and the system is Bluetooth dependent, which can be easily tampered with, resulting in system failure. The efficiency provided by Eigenfaces is poor.[8]

Balika Hinge et al.: This system is an automated system for human face recognition against a real-time background, built for a company to mark the attendance of its employees. To detect human faces in real time, a Haar cascade is used, and a simple, fast Principal Component Analysis is used to recognize the detected faces with a high accuracy rate. The matched face is then used to mark the attendance of the employee. The system is real time, hence it needs full-time vigilance and a power source. The Haar cascade has limitations, such as being unable to detect faces in dark environments, resulting in poor face detection.[9]

3. PROPOSED WORK
Hence a system with expected results is being developed, but there is still some room for improvement.

where I(x, y) signifies the value at the location (x, y).

Pose Correction: T is a homogeneous matrix defined by the rotation angle θ, the scaling factor s and the translation vector [tx; ty].

5. CONCLUSION AND FUTURE WORK
There may be various types of lighting conditions, seating arrangements and environments in different classrooms. Most of these conditions have been tested on the system, and the system is expected to show 90% accuracy in most cases. There may also be students with varying facial expressions, hair styles, beards, spectacles, etc. All of these cases will be considered and tested to obtain a high level of accuracy and efficiency. Thus, it can be concluded from the above discussion that a reliable, secure, fast and efficient system is being developed to replace a manual and unreliable one. This system can be implemented for better management of attendance and leaves. The system will save time, reduce the amount of work the administration has to do, replace stationery material with electronic apparatus, and reduce the amount of human resource required for the purpose.

REFERENCES
[1] K.P.M. Basheer and C.V. Raghu, "Fingerprint attendance system for classroom needs," Annual IEEE India Conference (INDICON), pp. 433-438, 2012.
[2] S. Konatham, B.S. Chalasani, N. Kulkarni, and T.E. Taeib, "Attendance generating system using RFID and GSM," IEEE Long Island Systems, Applications and Technology Conference (LISAT), 2016.
[3] S. Noguchi, M. Niibori, E. Zhou, and M. Kamada, "Student Attendance Management System with Bluetooth Low Energy Beacon and Android Devices," 18th International Conference on Network-Based Information Systems, pp. 710-713, 2015.
[4] S. Kadry and K. Smaili, "A design and implementation of a wireless iris recognition attendance management system," Information Technology and Control, vol. 36, no. 3, pp. 323-329, 2007.
[5] S. Chintalapati and M.V. Raghunadh, "Automated attendance management system based on face recognition algorithms," IEEE Int. Conference on Computational Intelligence and Computing Research, 2013.
[6] E. Varadharajan, R. Dharani, S. Jeevitha, B. Kavinmathi, S. Hemalatha, "Automatic attendance management system using face detection," Online International Conference on Green Engineering and Technologies (IC-GET), 2016.
[7] Anissa LintangRamadhani, Purnawarman Musa, Eri PrasetyoWibowo, "Human Face Recognition Application Using PCA and Eigenface Approach".
[8] Monica Chillaron, Larisa Dunai, Guillermo Peris Fajarnes, Ismael LenguaLengua, "Face detection and recognition application for Android," IECON 2015, Yokohama, November 9-12, 2015.
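The pose-correction step above describes a homogeneous 2D similarity transform. A minimal sketch of the standard matrix built from a rotation angle θ, a scale factor s and a translation [tx; ty] follows; the concrete values are illustrative, not taken from the paper:

```python
import math

def similarity_transform(theta, s, tx, ty):
    """Homogeneous 2D similarity matrix T: rotation theta, uniform scale s,
    translation (tx, ty)."""
    a, b = s * math.cos(theta), s * math.sin(theta)
    return [[a, -b, tx],
            [b,  a, ty],
            [0.0, 0.0, 1.0]]

def transform_point(T, x, y):
    """Apply T to the point (x, y) given in homogeneous coordinates."""
    return (T[0][0] * x + T[0][1] * y + T[0][2],
            T[1][0] * x + T[1][1] * y + T[1][2])

# scale by 2 and translate by (1, 0): the point (1, 0) maps to (3, 0)
pt = transform_point(similarity_transform(0.0, 2.0, 1.0, 0.0), 1.0, 0.0)
```

Applying the inverse of such a T to a detected face patch is one common way to normalize pose before recognition.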
related to any farming crisis. Farmers face different types of issues, such as:
Insufficient knowledge about the soil.
Lack of weather prediction.
Illiteracy about crop diseases.
Unawareness about how to increase the fertility of the soil.
Extra effort needed to obtain government scheme support.

In such a scenario, a system can help the farmer decide which crop to grow by analyzing different essential factors, guiding farmers towards efficient use of water, fertilizers and the needed nutrition. A capable system can predict weather and take decisions from previous data logs regarding the crops. It will also help farmers register complaints regarding crops under various government schemes.

2. MOTIVATION
Due to a lack of knowledge about which crop should be grown considering different factors such as weather, soil and water, the farmer faces crises related to crop quality and productivity, and has to accept losses every year. Soil detection, and suggestions based on it, will help the farmer improve crops and increase productivity.

3. LITERATURE SURVEY
Inceptisol soil has low fertility and relatively low to moderate levels of organic matter content. Application of organic fertilizer on inceptisol soil of lowland swamp is expected to be capable of increasing N, P and K nutrients as well as the yield of sweet corn. This study's objective was to determine the dose of organic and inorganic fertilizers which can increase N, P and K nutrient uptake as well as the growth and yield of sweet corn on inceptisol soil of lowland swamp [1].
In the fertilizer purchase system, the user will be able to purchase the recommended fertilizers for the soil from the shopping portal. The fertilizers will be suggested to the users based on their past purchases; the user will get suggestions of fertilizers that are usually purchased together. For these suggestions, we are using the Apriori algorithm, which is used for obtaining frequently purchased item sets [2].
This work designs fertilization decision support algorithms from the perspective of a decision support system with a model of agricultural fertilization principles. These integrated and optimized algorithms can provide an accurate fertilization scheme for users. The fertilization decision support system was designed and implemented in accordance with the B/S structure using the ASP.NET platform and a SQL 2000 database [3].
Several different philosophies are used in Kentucky depending on who is making the recommendation. Different farm supply dealers, agricultural consultants, and soil test laboratories use different approaches. Because of this, farmers often wonder why they receive such contrasting fertilizer recommendations and what these differences mean in a farming operation [4].
A commercial fertilizer may contain one or all of the essential elements, but the percentage of each will be listed on the fertilizer label. Micronutrients may or may not be included in the formulation [5].
This work explains support vector machine based classification of soil types. Soil classification includes steps like image acquisition, image preprocessing, feature extraction and classification. The texture features of soil images are extracted using a low-pass filter, a Gabor filter and a color quantization technique [6].
This work presents an image segmentation approach for detecting soil pore structures that have been studied by way of soil tomography sections. A research study was conducted using a density-based clustering method and, in turn, nonparametric kernel estimation. This overcomes the rigidity
of arbitrary assumptions concerning the number or shape of clusters among data, and lets the researcher detect inherent data structures [7].
The objective of this study was to develop a flexible and free image processing and analysis solution, based on the Public Domain Image platform, for the segmentation and analysis of complex biological plant root systems in soil from X-ray tomography 3D images. Contrasting root architectures from wheat, barley and chickpea root systems were grown in soil and scanned using a high-resolution micro-tomography system [8].
This paper investigates the development of a digital image analysis approach for estimating physical properties of soil in lieu of the conventional laboratory approach. The present research deals with collecting soil samples from trial pits at a designated site as per the IS code procedure. A digital image database is prepared for the collected soil samples in the laboratory and the physical properties (Y) are determined [9].
This work presents a satellite image classification system which can classify between vegetation, soil and water bodies. The objective of this work is met by subdividing the work into three important phases: satellite image preprocessing, feature extraction and classification. The image preprocessing phase denoises the image with a median filter, and the contrast is improved by the Contrast Limited Adaptive Histogram Equalization (CLAHE) technique [10].
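The Apriori-style frequent-itemset mining used for the fertilizer suggestions in [2] can be sketched in a few lines; the transactions and the support threshold below are illustrative assumptions, not data from the cited work:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return frequent itemsets (as frozensets) with their support counts."""
    transactions = [set(t) for t in transactions]
    items = {i for t in transactions for i in t}
    frequent, current, k = {}, [frozenset([i]) for i in items], 1
    while current:
        # count the support of each candidate itemset
        counts = {c: sum(c <= t for t in transactions) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # generate size-(k+1) candidates from unions of surviving k-itemsets
        keys = list(survivors)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return frequent

# hypothetical purchase baskets from the fertilizer portal
baskets = [{"urea", "DAP"}, {"urea", "DAP", "potash"}, {"urea", "potash"}]
freq = apriori(baskets, min_support=2)
```

Itemsets that survive the support threshold (e.g. fertilizers frequently bought together) are what the portal would surface as purchase suggestions.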
Fig 1: Proposed System Architecture

REFERENCES
[1] Lida Xu, Ning Liang, and Qiong Gao, "An Integrated Approach for Agricultural Ecosystem Management," IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 38, no. 4, July 2008.
[2] Jharna Majumdar, Sneha Naraseeyappa and Shilpa Ankalaki, "Analysis of agriculture data using data mining techniques: application of big data," SpringerOpen Journal, 2017.
[3] Ramesh Babu Palepu and Rajesh Reddy Muley, "An Analysis of Agricultural Soils by using Data Mining Techniques," International Journal of Engineering Science and Computing, October 2017.
[4] Dasika P. Rao, "A Remote Sensing-Based Integrated Approach for Sustainable Development of Land Water Resources," IEEE Transactions on Systems, Man and Cybernetics—Part C: Applications and Reviews, vol. 31, no. 2, May 2001.
[5] Francisco Yandun, Giulio Reina, Miguel Torres-Torriti, George Kantor, and Fernando Auat Cheein, "A Survey of Ranging and Imaging Techniques for Precision Agriculture Phenotyping," IEEE Transactions, 2017.
[6] Mengzhen Kang and Fei-Yue Wang, "From Parallel Plants to Smart Plants: Intelligent Control and Management for Plant Growth," IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 2, April 2017.
[7] Małgorzata Charytanowicz and Piotr Kulczycki, "An Image Analysis Algorithm for Soil Structure Identification," Springer International Publishing Switzerland, 2015.
[8] Richard J. Flavel, Chris N. Guppy, Sheikh M. R. Rabbi, Iain M. Young, Dragan Perovic, "An image processing and analysis tool for identifying and analyzing complex plant root systems in 3D soil using nondestructive analysis: Root1," Institute for Resistance Research and Stress Tolerance, Germany, May 3, 2017.
[9] Karisiddappa, Ramegowda and S. Shridhara, "Soil Characterization Based on Digital Image Analysis," Indian Geotechnical Conference, 2010.
[10] Anita Dixit, Nagaratna Hedge and B. Eswar Reddy, "Texture Feature Based Satellite Image Classification Scheme Using SVM," International Journal of Applied Engineering Research, ISSN 0973-4562, vol. 12, no. 13 (2017), pp. 3996-4003, Research India Publications, http://www.ripublication.com.
Fig: Flowchart

8. ADVANTAGES
This software is freely available.
Low cost and easy to use.

9. FUTURE SCOPE
Reduce the complexity of the system.
Reduce excessive delay.
Reduce the high cost.

10. CONCLUSION
We have designed a system that can monitor parameters like temperature, humidity, gas levels and light. All these parameters are monitored locally, and the system connects to the Internet via a Wi-Fi module. All the data collected by the system is then uploaded to a server, where it is displayed using graphs and made available for analysis. Hence, we have designed an IoT-based system for monitoring the parameters of a polyhouse.

REFERENCES
[1] Vamil B. Sangoi, "Smart security solutions," International Journal of Current Engineering and Technology, vol. 4, no. 5, Oct. 2014.
[2] Simon L. Cotton and William G. Scanlon, "Millimeter-wave soldier-to-soldier communications for covert battlefield operation," IEEE Communications Magazine, October 2009.
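The monitoring flow described in the conclusion above (sensor readings collected locally, then pushed over Wi-Fi to a server for graphing) can be sketched as follows. The field names and the endpoint URL are illustrative assumptions, not from the paper:

```python
import json
import urllib.request

def build_payload(temperature, humidity, gas_level, light):
    """Package one round of polyhouse sensor readings for upload."""
    return {"temperature": temperature, "humidity": humidity,
            "gas_level": gas_level, "light": light}

def upload(payload, url="http://example.com/api/readings"):  # hypothetical endpoint
    """POST the readings as JSON to the monitoring server (requires network)."""
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)

payload = build_payload(temperature=28.5, humidity=61.0, gas_level=0.2, light=540)
body = json.dumps(payload)  # what would be sent over the wire
```

On the server side, each such JSON record can be timestamped and stored, and the stored series plotted as the graphs mentioned in the conclusion.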
and ITU (2010) asserts that adoption of e-Learning requires not only development of plans, connecting schools with infrastructure and capacity building, but also measuring the degree of availability and accessibility of those resources. This calls for an assessment of preparedness to provide key quantifiable indicators of a country's situation (McConnell International, 2001; ITU, 2010). Some of the instruments developed to measure e-Learning readiness, as presented by UNESCO-UIS (2009), are indicated in Table 1.
Table 1: Readiness indicators for adoption of e-learning

Concept: Description
Infrastructure: Availability of ICT hardware (such as desktop computers, laptops, interactive white boards) and availability of ICT software.
Vision: The vision of an institution regarding e-learning in relation to pedagogy transformation and lifelong learning.
Staff development plan: Motivating instructors/teachers to acquire ICT skills for pedagogical practices; training of instructors for the acquisition of skills for ICT utilization in teaching and learning.
ICT support: ICT support; vision, time and financial allocation in the institutional strategic plans; pedagogical support for instructors; technical support for both educators and students.
Jones (2004) argues that for successful adoption of new technologies, the process of adoption should focus on training of teachers, instituting educational reform activities, training of technology support staff, training of students, implementing technological resources and preparing digital content. Furthermore, the shift to an e-learning strategy requires creation of a clear vision and mission for the institution, to align digital content with the mandated curriculum while considering the diversity of learners' needs. Wagner et al. (2005) recommend training for pre-service and in-service teachers as a crucial input component, pointing out that the level of e-learning adoption is determined by the percentage of trained teachers, the quality of ICT training and the technical support. Tinio (2002) asserts that for learners to participate fully in e-learning activities, they should be equipped with three foundational skills. Since technology becomes obsolete fast, there is a need to plan for technological sustainability in schools (Anderson, 2010; Ministry of Education [MoE], 2009).
A survey carried out by Tinio (2002) on ICT utilization in public high schools in the Philippines recommended a comprehensive assessment of the ICT environment to establish institutional infrastructure and a competency skill inventory as prerequisites for the adoption of e-learning. In some countries, such as the United States, Canada, Singapore, Sweden, Japan, Finland, Britain, Norway and Australia, heavy investment has been directed to technology in education. In Singapore, for instance, teachers are required to complete over 10 core modules within 30 to 50 hours of training to enable use of e-learning in the teaching process (Farrell et al., 2007).
In Chile, internet-connected computers serve over 90% of the school population, and 80% of the teachers have been trained and have acquired pedagogical skills for the adoption of e-learning (Garrison, 2011). Teachers at all levels in Chile received two years of face-to-face training amounting to 100 hours. Consequently, teachers regularly make use of computers for professional, managerial and out-of-classroom tasks, searching for educational content on the
of ICT as well as Open and Distance Education (ODE) at all levels of education and training (RoK, 2005), and the plan is to make education the platform to equip Indian citizens with ICT skills to create dynamic and sustainable economic growth through enhanced learning. The mission of ICT in education is "to integrate ICT in education and training to prepare learners and staff of today for the Indian economy of tomorrow and enhance the nation's ICT skills" (RoK, 2006, p. 25), with a vision to adopt ICT as a universal tool for education and training (MoE, 2006). To achieve the vision, "…every educational institution, teachers, learners and the respective community will be equipped with appropriate ICT infrastructure, competencies and policies for usage and progress" (MoE, 2006, p. 14; RoK, 2005). This is further reflected in India's Master Plan of 2014, which lays out strategies for mainstreaming e-Learning, targeting 100% use of e-Learning as an alternative curriculum delivery strategy in teacher training institutions by 2017 (RoK, 2014).
From earlier research, however (Kiilu, Nyerere & Ogeta, 2016; Kiilu & Muema, 2012; Republic of India, 2012), the use of ICT and e-Learning in teaching in public institutions in India is still patchy. A desktop review carried out by Kiilu and Muema (2012) on the implications of e-readiness for the adoption of an e-learning approach in secondary schools in India established that although the country advocates the use of education as a platform for 21st-century skills development, less than 10% of secondary schools in India offered computer studies as a specialty subject at the time. It has been established that most higher education institutions in Africa have not yet assessed their level of preparedness, as the leadership is yet to be convinced of the role of ICT in education (Kashorda and Waema, 2009). The dearth of assessment of the level of preparedness results in duplication of efforts and inefficient use of scarce resources (RoK, 2014).

2. MATERIALS AND METHODS
The study adopted a descriptive survey design using both quantitative and qualitative techniques. Survey design was preferred as it enables researchers to describe, explain and explore phenomena to establish the status quo (Saunders et al., 2007). The study sampled five (5) PTTCs out of the 22 colleges. Simple random sampling was used to obtain 287 respondents from the five colleges. The data were analyzed using descriptive statistics such as frequencies, mean and standard deviation, aided by the Statistical Package for the Social Sciences (SPSS version 20) software programme.

3. RESULTS AND DISCUSSIONS
The study sought to establish the institutional and teacher-trainee level of preparedness for the adoption of e-learning in teacher training colleges using the UNESCO Institute for Statistics [UIS] 2009 institutional e-readiness indicators, which include availability of and accessibility to infrastructure, internet connectivity and competency (UNESCO-UIS, 2009). From the study, the pre-service teacher trainees' responses regarding infrastructural facilities are presented in Table 3.
Table 3: Availability of Resources for e-learning.
ICT infrastructure Mean Standard Dev. N
Internet connectivity 3.5842 1.29280 287
Desktop computers 3.8750 1.10036 287
Interactive white boards 3.801 1.48719 287
LCD projectors 3.6915 1.21110 287
Database repositories 2.768 0.34625 287
College website and password 2.6795 1.10130 287
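The figures in Table 3 are sample means and standard deviations of Likert-scale responses, as produced by SPSS. The same statistics can be reproduced in a few lines; the response list below is illustrative, not the study's raw data:

```python
from statistics import mean, stdev

# hypothetical 1-5 Likert responses for one ICT-infrastructure item (n = 10)
responses = [4, 3, 5, 4, 2, 4, 3, 5, 4, 3]

m = mean(responses)   # sample mean, as reported in the "Mean" column
s = stdev(responses)  # sample standard deviation (n - 1 in the denominator)
```

Note that `stdev` uses the n − 1 (sample) denominator, which is what SPSS reports by default for survey data like this.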
linkup on the Internet. Another example is a student in a remote location doing an entire course of study offered by a university via the Internet (i.e. distance education).
In the context of rural areas, e-Learning presents both opportunities and challenges. For example, rural areas are often geographically isolated from developed towns and cities where there are better opportunities for education and employment. E-Learning, if implemented in the right way in rural areas, has the potential to overcome these geographical barriers. From this point of view, e-Learning is possibly more beneficial for rural areas than any other area (e.g. towns and cities) because it helps people to overcome resource limitations (e.g. lack of libraries and books) which other areas do not necessarily encounter. However, the challenges of implementing e-Learning in rural areas are usually far more extreme than those faced in developed areas. For example, rural areas usually have poorer infrastructure (e.g. poor electricity supply and roads), fewer finances, lower levels of general literacy, lower accessibility/higher cost of Internet access, and limited understanding or appreciation of the potential of e-Learning.

2. TECHNOLOGY
In this paper, technology refers to both the hardware and software that provide the basic infrastructure for e-Learning. This includes components for networking (e.g. access points and links to the Internet) as well as client computers and software for basic services (e.g. e-mail, file sharing, web pages etc.). Technology also refers to servers that could be used for centralized data/program storage. It does not include specific e-Learning software intended purely for the purposes of pedagogy, which is covered under 'applications'. However, the underlying technology is intended to have the capabilities to support e-Learning applications.
In the context of rural areas, the following factors are important:
• Technology (both hardware and software) must be cheap but robust enough for rural conditions. In essence, it must have an excellent cost/benefit ratio.
• Open-source software is most suitable, as it is free for use under the GNU public license.
• Given the harsh conditions (e.g. dusty environment) in rural areas, it is necessary to develop a programme/policy for the type of equipment used, how best to protect equipment, and how to monitor breakdowns and associated costs, with a desire to continuously improve the utilization/lifespan of equipment.
• Given limitations in cost, it is impossible to ensure a 1:1 student-to-computer ratio. Indeed, this is not even done in well-funded public schools in developed countries. Instead, given the requirement to minimize costs, it is best to maximize technology utilization to ensure a good cost/benefit ratio, e.g. by having a computer lab.
• Bandwidth in rural areas is often very expensive. OSS is chosen not primarily to reduce costs, but to increase the flexibility to modify, test and develop appropriate materials. The flexibility also makes it possible to adjust to small bandwidth.

Network-side – Server
The server is a freely available Linux-based server intended to meet the ICT requirements of an e-learning application. It can be used to drive networks that have in excess of 100 client computers.
• Documents: Staff can work with their own documents and share them among each other in workgroups. Staff can simply copy relevant files to student folders. There are hourly backups of all documents on the server, and it is easy to restore lost documents.
• Web: Internet access is provided on every computer with 'safe' access. The school has its own website, which is easy
entire depth of the input volume. Such an architecture ensures that the learnt filters produce the strongest response to a spatially local input pattern. Three hyperparameters control the size of the output volume of the convolutional layer: the depth, the stride and the zero-padding.
1. The depth of the output volume controls the number of neurons in the layer that connect to the same region of the input volume. All of these neurons will learn to activate for different features in the input. For example, if the first convolutional layer takes the raw image as input, then different neurons along the depth dimension may activate in the presence of various oriented edges, or blobs of color.
2. The stride controls how depth columns around the spatial dimensions (width and height) are allocated. When the stride is 1, a new depth column of neurons is allocated to spatial positions only 1 spatial unit apart. This leads to heavily overlapping receptive fields between the columns, and also to large output volumes. Conversely, if higher strides are used then the receptive fields will overlap less and the resulting output volume will have smaller spatial dimensions.
3. Zero-padding pads the input volume with zeros around its border. It provides control over the spatial size of the output volume; in particular, it is sometimes used to exactly preserve the spatial size of the input volume.
A parameter sharing scheme is used in convolutional layers to control the number of free parameters. It relies on one reasonable assumption: that if one patch feature is useful to compute at some spatial position, then it should also be useful to compute at a different position. In other words, denoting a single 2-dimensional slice of depth as a depth slice, we constrain the neurons in each depth slice to use the same weights and bias. Since all neurons in a single depth slice share the same parameterization, the forward pass in each depth slice of the CONV layer can be computed as a convolution of the neurons' weights with the input volume (hence the name: convolutional layer). Therefore, it is common to refer to the sets of weights as a filter (or a kernel), which is convolved with the input. The result of this convolution is an activation map, and the set of activation maps for each different filter are stacked together along the depth dimension to produce the output volume. Parameter sharing contributes to the translation invariance of the CNN architecture.
It is important to notice that sometimes the parameter sharing assumption may not make sense. This is especially the case when the input images to a CNN have some specific centred structure, in which we expect completely different features to be learned at different spatial locations. One practical example is when the input is faces that have been centred in the image: we might expect different eye-specific or hair-specific features to be learned in different parts of the image. In that case it is common to relax the parameter sharing scheme, and instead simply call the layer a locally connected layer.
Another important layer of CNNs is the pooling layer, which is a form of nonlinear down-sampling. In the context of artificial neural networks, the rectifier is an activation function defined as f(x) = max(0, x), where x is the input to a neuron. This is also known as a ramp function and is analogous to half-wave rectification in electrical engineering. This activation function was first introduced to a dynamical network by Hahnloser et al. in a 2000 paper in Nature, with strong biological motivations and mathematical justifications. It has been used in convolutional networks more effectively than the widely used logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more
is selected. This approach guarantees that the end user will receive advice for the problem faced. A basic suggestion model is implemented.

Suggestion Model based on k-NN Classification
The idea is to assign a rating or score to each candidate POI based on the ratings of its k semantically nearest POIs (neighbors) in the user profile. Then all candidate POIs are ranked in decreasing order of their assigned scores. The model is implemented in three main steps:
1. Indexing the rated POIs. In order to be able to find the k semantically nearest (rated) neighbor POIs of a candidate (unrated) POI, we create an index of the POIs that are part of the user profiles and have been evaluated and rated by the users. For each rated POI we index its title, description, place types and the text of its website. The place types of the rated POIs are not provided by the track, but we retrieve them from the three place search engines that we use in context processing (as described in Section 2). For indexing, we use Indri v5.5 (http://www.lemurproject.org/) with the default settings of this version, except that we enable the Krovetz stemmer [5].
2. Generating queries from candidate POIs. We generate a query per candidate POI in a context. The query consists of the POI title, place types and the description of the POI that we retrieved in the context processing. From the query, we remove all punctuation and special characters.
3. Scoring candidate POIs based on their k-NNs. We submit the queries (per context) that are generated in Step 2 to the index that is created in Step 1 in order to rank the rated POIs in increasing semantical distance. In a standard k-NN [1, 6], a candidate POI (represented by its corresponding generated query) would be assigned the majority rating of the top-k retrieved POIs. In initial experiments, however, we found that taking into account the ranks or retrieval scores of the top-k results is beneficial. We experimented with several formulas using cross-validation, such as linear (e.g. Borda Count) or exponential weights decreasing with the rank, and we settled for the following best-performing formula for scoring each candidate POI P:

    P = ( Σ_{i=1}^{k} s_i · R_i ) / ( Σ_{i=1}^{k} s_i ),  where R_i = (R_i^D + R_i^W) / 2    (1)

where s_i is the Indri tf-idf score of the i-th ranked POI. This formula assigns to a candidate POI a score equal to the weighted average of the ratings of the k-nearest-neighbor POIs in a user profile, where the weights are given by tf-idf similarity. As the POI's rating R_i we use the average rating of the description (R_i^D) and the website (R_i^W), because in Step 1 we index both the description and the text of the website. The value of k that we use in our suggestions was optimized to k = 23 by using 5-fold cross-validation [8] on the example places. The scored candidate places are then ranked in decreasing order of their scores.

4. EXPECTED RESULTS
The proposed system is expected to serve as a replacement for the uneducated use of pesticides and chemical compounds. A controlled, better understanding of the crop in the absence of expert help is expected to come from the proposed system. The system will take a high-quality, high-resolution image of the affected area of a crop of the user's selection. The captured image will be processed for feature extraction. A neural network and machine learning model will further help draw conclusions based on the input image. The conclusions thus drawn will help in the suggestion of a solution for the posed problem.

5. CONCLUSIONS & FUTURE WORK
The result of this analysis will help easy access to expertise. The system will improve with the influx of new data. A neural network at the tips of a farmer's fingers will enhance the quality of crop production. We propose that this new system, with the help of expert domain
knowledge, can be helpful in reducing the usage of pesticides and insecticides. Organic farming can be promoted. The model used can be further scaled to other crops and plants, as it is highly scalable. By increasing the number of features and the number of inputs to the neural network, the algorithms can be enhanced. If this technique is developed into a sophisticated interface in the form of a website or Android application, it may prove to be a great asset to the agricultural sector. In the future this methodology can be integrated with other, yet to be developed, methods for disease identification and classification. The use of other algorithms can be explored to enhance the efficiency of the system. This application will serve as an aid to farmers (regardless of their level of experience), enabling fast and efficient recognition of plant diseases and facilitating the decision-making process when it comes to the use of chemical pesticides. Furthermore, future work will involve spreading the usage of the model by training it for plant disease recognition on wider land areas, combining aerial photos of orchards and vineyards captured by drones with convolutional neural networks for object detection. By extending this research, the authors hope to achieve a valuable impact on sustainable development, affecting crop quality for future generations. The main goal of the future work will be developing a complete system consisting of server-side components containing a trained model and an application for smart mobile devices, with features such as displaying recognized diseases in fruits, vegetables and other plants, based on leaf images captured by the mobile phone camera.

6. ACKNOWLEDGMENTS
The authors would like to thank our professor and guide, Prof. J. N. Nandimath, for her constant support and motivation. Her encouragement and belief in our work has got us this far. We would also like to thank our college for providing us with the education we deserve. Needless to say, without them, this wouldn't have been possible. Our constant well-wishers, our family and friends, always had our back and contributed through healthy discussions.

REFERENCES
[1] Sanjay Mirchandani, Mihir Pendse, Prathamesh Rane, Ashwini Vedula, "Plant Disease Detection and Classification using Image Processing and Artificial Neural Networks," International Research Journal of Engineering and Technology (IRJET), e-ISSN: 2395-0056.
[2] Anandhakrishnan MG, Joel Hanson, Annette Joy, Jerin Francis, "Plant Leaf Disease Detection using Deep Learning and Convolutional Neural Network," International Journal of Engineering Science and Computing, March 2017.
[3] Garima Tripathi, Jagruti Save, "An Image Processing and Neural Network Based Approach for Detection and Classification of Plant Leaf Diseases," International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367.
[4] George Drosatos, Giorgos Stamatelatos, Avi Arampatzis and Pavlos S. Efraimidis, "DUTH at TREC 2013 Contextual Suggestion Track," The Twenty-Second Text Retrieval Conference (TREC 2013), NIST, Gaithersburg, Maryland, Special Publication 500-302.
chain of high suicide rate of farmers in India.

2. LITERATURE SURVEY
The research by Shanning Bao et al. [1] observes that rural administration and crop cultivars have clearly improved, but that the resulting trend in crop yield variation remains unclear. To assess the yield trends of the staple crops (maize, soybean and rice) from 2007 to 2016, the MODIS product (MCD12Q2) was used to extract the maturity date of the various crops. A two-band variant of the enhanced vegetation index at the maturity date was applied to build an accurate yield estimation model, coupled with statistical crop yield data. The average maize and soybean yields in the study area showed an increasing trend, while rice yield declined. However, maize yield in 22 cities and soybean yield in 19 cities actually showed a decreasing trend. Through statistical analysis, the crop yield distribution pattern proved to be nearly fixed, with most cities occupying approximately the same position in the ranking of major crop yields. It was shown that a few cities, for instance Chifeng city, were suitable for developing a specific agricultural economy. This paper can be used to give suggestions for agricultural planning and management.

The research by Shruti Kulkarni et al. [2] states that farming is the foundation of the Indian economy. The yield obtained depends principally upon climate conditions, as precipitation patterns largely influence growth. In this situation, farmers and agriculturalists require timely advice predicting future harvests to maximize crop yield. Because of insufficient adoption of technology, the throughput of farming is yet to reach its full potential. Each farmer is keen on knowing the yield he/she can expect in the harvest time frame, and consequently yield prediction is an essential concern for them. Over the years, farmers have formed an idea of the pattern in yield through intrinsic human instinct. However, precipitation, as a major driver of crop growth, can broadly unsettle intuitive yield prediction by controlling some of the soil and environmental parameters tied to crop development. Additionally, the correct type of soil to be used for a crop is known to the farmer only through on-paper advice, making it difficult for him/her to trial and test crop choices.

The research by Michael D. Johnson et al. [3] concerns crop yield forecast models for grain, canola and spring wheat grown on the Canadian Prairies, created using vegetation indices derived from satellite data and machine learning techniques. Hierarchical clustering was used to group the crop yield data from 40 Census Agricultural Regions (CARs) into a few larger regions for building the forecast models. The Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) derived from the Moderate-resolution Imaging Spectroradiometer (MODIS), and NDVI derived from the Advanced Very High Resolution Radiometer (AVHRR), were considered as predictors for crop yields. Multiple linear regression (MLR) and two nonlinear machine learning models – Bayesian neural networks (BNN) and model-based recursive partitioning (MOB) – were used to estimate crop yields, with different combinations of MODIS-NDVI, MODIS-EVI and NOAA-NDVI as predictors.

The research by X.E. Pantazi et al. [4] notes that understanding yield-limiting factors requires high-resolution, multi-layer data about the factors influencing crop growth and yield. Consequently, on-line proximal soil sensing for estimation of soil properties is required, because of the capacity of these sensors to gather high
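The NDVI-based yield regression idea surveyed above can be illustrated with a minimal sketch: compute NDVI = (NIR − Red)/(NIR + Red) per region and fit a one-variable least-squares line of yield on NDVI. All numbers below are toy values, not data from the cited studies.

```python
# Illustrative sketch: NDVI from red/NIR reflectance and a one-variable
# least-squares fit of yield on NDVI. All numbers are invented.

def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one observation."""
    return (nir - red) / (nir + red)

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# (NIR, Red) reflectance per region and observed yields in t/ha (toy values).
regions = [((0.60, 0.10), 3.1), ((0.55, 0.15), 2.6), ((0.70, 0.08), 3.8)]
xs = [ndvi(nir, red) for (nir, red), _ in regions]
ys = [y for _, y in regions]
a, b = fit_line(xs, ys)
pred = a * ndvi(0.65, 0.12) + b   # predicted yield for a new region
```

In the surveyed work this single predictor is replaced by combinations of MODIS-NDVI, MODIS-EVI and NOAA-NDVI, and the linear fit by MLR, BNN or MOB models.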
Valley of the Bheema River. The rivers Velu and Ghod are on the left side of the Bheema, and the Indrayani, Bhama, Mula-Mutha etc. are on the right side. Each tahsil of the district has at least one river; therefore, the agro-climatic condition of the district is favorable.

economic and agricultural welfare of the countries across the world. In future work we are going to focus on a more detailed study of Indian crops to uncover emerging trends in the digital agricultural field.
Figure 1. Existing System (Security-enhanced and trustworthy cloud service Broker (STCSB)
architecture)
The above figure explains the security-enhanced and trustworthy cloud service broker architecture. There are three main modules in the existing architecture:
1. Communication and agent management module
2. Trust computing module
3. Cloud resource management module
In the figure there are three types of agents: the monitoring agent, the trust agent, and the service agent. Every trust agent has direct access to the agent publish and data perceiving block. This block includes the security agent, the QoS agent, and the agent publish sub-block. The service agent performs cloud service connection and adaptation tasks. Monitoring agents are connected to both the trust agent and the service agent.

Users obtain services from the selected cloud broker, i.e. STCSB, which provides fast, trustworthy and secure services. This architecture provides a security-enhanced cloud service broker. The monitoring agent is used to enhance the user's experience. Two different technologies are introduced in this architecture: the first is cloud monitoring and the second is trust-based cloud service. These two technologies are integrated to enhance the security of cloud computing and the QoS of the service provider. Compared with other traditional collaborative cloud computing frameworks, the STCSB architecture includes several security-enhanced functional modules. These are: (1) identification of abnormal service behavior, (2) an untrusted resource list, (3) trust computing based on big data analysis, (4) trust-based security measures (access control, authorization, and resource matchmaking), and (5) a security agent.

The agent publish and data perceiving module handles the real-time service data. A verification mechanism is introduced between agents; it prevents a trust agent from being hacked or hijacked by a malicious user and eliminates data tampering issues.

Cloud resource management module: the federated service catalog stores all the available and trustworthy services and automatically chooses highly trustworthy services to meet the user's requirements. This module creates a service catalog that links to highly trusted resources and then provides this catalog as a trusted resource to the user through the unified cloud service portal [1].

Trust computing module: an administrator manages the virtual servers on the unified cloud management portal [1]. This portal creates the templates for virtual servers. Cloud users open the unified cloud service portal and select a trusted service catalog when they would like to use a provider's services [1].
static analysis to fail over to the cryptographic ones when the non-cryptographic ones would be more expensive, and (c) incorporates this core into a built system that includes a compiler for a high-level language, a distributed server, and GPU acceleration. Experimental results indicate that their system performs better and applies more widely than the best in the literature.

S. Pearson and A. Benameur, 2010 [6]: the authors point out that cloud computing is an emerging paradigm for large-scale infrastructures. It has the advantage of reducing cost by sharing computing and storage resources, combined with an on-demand provisioning mechanism relying on a pay-per-use business model. These new features have a direct impact on IT budgeting but also affect traditional security, trust and privacy mechanisms. Many of these mechanisms are no longer adequate and need to be rethought to fit this new paradigm. In this paper the authors assess how security, trust and privacy issues arise in the context of cloud computing and discuss ways in which they may be addressed.

Tian Li-qin and Lin Chuang, 2016, in paper [7], mainly discuss the importance of evaluating user behavior trust and the evaluation strategy in cloud computing, including trust object analysis, principles for evaluating user behavior trust, the basic idea of evaluating user behavior trust, and evaluation strategies of behavior trust for each access and for long access, which lay the theoretical foundation of trust for practical cloud computing applications.

Xiaoyong Li, Huadong Ma, Feng Zhou, and Wenbin Yao, 2015: in this paper [8], the authors present T-broker, a trust-aware service brokering system for efficiently matching multiple cloud services to satisfy various user requests. Experimental results show that T-broker yields very good results in many typical cases, and the proposed mechanism is robust in dealing with varying numbers of service resources.

Haiying Shen and Guoxin Liu, 2014: in this paper [9], the authors present an integrated resource/reputation management platform, called Harmony, for collaborative cloud computing. Recognizing the interdependencies between resource management and reputation management, Harmony incorporates three innovative components to enhance their mutual interactions for efficient and trustworthy resource sharing among clouds.

Ismail Butun, Melike Erol-Kantarci, Burak Kantarci, and Houbing Song, 2016: in this paper [10], the ultimate goal is to design a cloud-centric public safety network that is not only resilient but also reliable. Such a network is a cyber-physical system that requires seamless integration of the cyber and physical elements (i.e., computing, control, sensing, and networking). Security and privacy have to be built in by design when developing a reliable public safety network.

P. Muralikrishna, S. Srinivasan, and N. Chandramowliswaran, 2015: key distribution is a very critical problem; in cryptography, secret sharing of a key was invented by Adi Shamir and George Blakley in 1979. Secret sharing is a very important concept for storing secret or very sensitive information. When users of a group wish to communicate using symmetric encryption, they must share a common key [12]. A secure secret sharing scheme distributes shares so that anyone with fewer than t shares has no more information about the secret than someone with 0 shares. Recently, in [13], the authors discussed a secure secret key sharing algorithm using a non-homogeneous equation.
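The threshold scheme referenced above can be sketched concretely. A minimal implementation of Shamir's (t, n) secret sharing over a prime field: any t shares reconstruct the secret by Lagrange interpolation, while fewer reveal nothing.

```python
# Minimal Shamir (t, n) secret sharing over GF(P). Illustrative sketch only,
# not the construction from [12] or [13].
import random

P = 2 ** 127 - 1  # a Mersenne prime, large enough for small integer secrets

def split(secret, t, n):
    """Create n shares; any t of them recover the secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0; modular inverse via Fermat's little theorem."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
```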
4. GAP ANALYSIS
Cloud computing is a buzzword not only in the IT industry but in other fields as well. The word "cloud" came from network diagrams, in which network engineers represented the location and interconnection of network devices with a cloud-like shape. The cloud is normally used for storage; nowadays it is possible to upload large amounts of data to the cloud, which is one of its advantages. But it is also important to provide security for the cloud as well as the data, and any number of algorithms are used to secure the cloud and its data.

The existing system provides the cloud service broker architecture. That system does not provide proper security for data, and there is also a problem of trust calculation; key distribution becomes an important problem. The proposed system should provide security to data: verification is done through the MD5 algorithm, while encryption and decryption are done through the AES algorithm. The key generation implemented with the AES algorithm provides the security to the data.

6. PROPOSED SYSTEM
A. Architecture
This is just like a client-server architecture. It includes:
a. Client layer
b. Manager layer
c. Auditor layer
d. Server

Figure 2. System architecture
A client makes a request for service to the server, and the server responds to that request. The manager layer and auditor layer are the mediators between client and server. Here the AES algorithm is used for encryption and decryption of files. AES is a symmetric-key encryption algorithm, so it shares a common key for both encryption and decryption.

A client is nothing but a PC or workstation. In the client layer, any number of users can make requests to the server to upload, download or access files.

The manager layer includes two main components: job scheduling and key distribution. This layer manages the incoming requests from multiple users. Key distribution is done using the AES algorithm, and job scheduling is done on a first-come-first-serve basis.

The next layer is the auditor layer, which contains the TPA and the trust factor. The TPA, i.e. third-party auditor, is one who can audit the information of the knowledge owner or consumer. Third-party auditing is an accepted method for establishing trust between a business and its data.

The trust factor is used for verification of the user's profile or account information. Whenever a user wants to download or upload a file, verification is done through the trust factor.
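The manager layer's first-come-first-serve scheduling described above can be sketched with a simple FIFO queue; the method names are illustrative, not from the paper.

```python
# Sketch of the manager layer's FCFS job scheduling: incoming upload/download
# requests are queued and dispatched strictly in arrival order.
from collections import deque

class ManagerLayer:
    def __init__(self):
        self.jobs = deque()          # FIFO queue: first come, first served

    def submit(self, user_id, action, filename):
        self.jobs.append((user_id, action, filename))

    def serve_next(self):
        """Dispatch the oldest pending request, or None if the queue is empty."""
        if not self.jobs:
            return None
        user_id, action, filename = self.jobs.popleft()
        return f"{action} {filename} for {user_id}"

mgr = ManagerLayer()
mgr.submit("u1", "upload", "a.txt")
mgr.submit("u2", "download", "b.txt")
first = mgr.serve_next()   # "upload a.txt for u1"
```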
The server hosts, delivers and manages most of the resources and services consumed by the clients. It is backed by a database, which stores the files, incoming requests and encrypted data on the server. The server can manage several clients simultaneously.

In this paper, multiple users can access, upload or download data, but before that every user must register. After registering, the user can log in and then upload, download or access data through their id and password. When a user registers, their personal information is stored in the database, after which a user id and password are generated; through these the user can use the cloud services. Only a registered user is able to log in to the cloud server and upload, download or access files. A log is generated every time; the log is managed by the log generator and stored at the server.

When a user wants to download a file, a secret key is required. The key is generated at the time of file upload; this key is required to upload and download the file, and every user has the key. The user id is used every time for security purposes. The user id is generated by the administrator, and the admin gives access to the user.

Here tamper analysis is also done. Data tampering is the act of deliberately modifying (destroying, manipulating or editing) data through an unauthorized channel. The system analyses the data by keeping a log of all activities that happen in the system. Verification is done at the time of download: when a user uploads a file, the file is converted into an unreadable format, and to make it readable the user must follow the decryption flow, using the key to download the file.

There are three modules:
1. User module
2. Admin module
3. Tamper analysis module

Methodology and Algorithms Used:
1. Encryption with signature algorithm
The signature concept is used to hide the identity of the signer; each file is encrypted so that the private and sensitive information of the user remains secure. Here the AES algorithm is used for encryption and decryption.
2. Data integrity verification algorithm
To overcome this issue, we provide a public auditing process for cloud storage so that users can check the integrity of their data. The work that has been done in this line lacks data dynamics and true public auditability. The MD5 algorithm is used for verification of the user: only a registered user can download or upload a file.
3. Hash key generation
It also uses a random masking operation and index hash values in order to support dynamic operations like insert, delete and update over the shared data for a dynamic group. Hash key generation works like key generation.

7. MATHEMATICAL MODEL
S = {I, P, R, O}
Where S = system, I = input, R = rules/constraints, O = output.
I = {I1}, where I1 = a file which contains text.
P = {P1, P2, P3, P4, P5, P6, P7, P8, P9, P10}
P1 = User Registration, P2 = Secret Key Generation, P3 = File Upload & Download, P4 = Encryption & Decryption of File, P5 = Tamper Analysis, P6 = Trust Factor, P7 = Audit Checking, P8 = Log Generation, P9 = Verification of User, P10 = OTP Generation.
R = {R1, R2}
R1 = The user must register first.
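The MD5-based integrity verification step described above (algorithm 2, process P9) can be sketched as: record a digest when a file is uploaded and recompute it on download, where a mismatch signals tampering. Note that MD5 is cryptographically broken; it appears here only because the paper specifies it.

```python
# Sketch of MD5-based integrity verification: digest recorded at upload,
# recomputed at download. Function and variable names are illustrative.
import hashlib

def md5_digest(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

stored = {}  # filename -> (data, digest recorded at upload time)

def upload(name, data: bytes):
    stored[name] = (data, md5_digest(data))

def verify_on_download(name) -> bool:
    data, digest = stored[name]
    return md5_digest(data) == digest

upload("report.txt", b"quarterly numbers")
ok = verify_on_download("report.txt")        # True: data unchanged
stored["report.txt"] = (b"tampered!", stored["report.txt"][1])
bad = verify_on_download("report.txt")       # False: tampering detected
```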
A. Architecture
platform justify the concern with security. For example, a botnet attack on Amazon's cloud infrastructure was reported in 2009.

3. LITERATURE SURVEY
In this section, we discuss the different papers referred to, covering cloud computing as well as how cloud logs can be secured and preserved. X. Liu et al. [1] proposed a new, efficient and privacy-preserving outsourced calculation framework with the help of multiple keys. The proposed design allows different service providers to outsource their data. To reduce the risk of private-key exposure, they use a cryptographic primitive, the Distributed Two Trapdoors Public-Key Cryptosystem (DT-PKC), which helps them split a strong private key into several different shares.

The authors of [2] drafted Secure Logging-as-a-Service (SecLaaS), which stores virtual machines' logs and permits legal access for forensic examiners while guaranteeing the privacy of the cloud customers. In addition, SecLaaS maintains proof of past logs and accordingly protects the confidentiality of the cloud logs from invalid investigators or CSPs. The authors demonstrated the feasibility of the work by implementing SecLaaS for network logs in an OpenStack cloud.

Zhihua Xia et al. proposed a scheme for image retrieval that helps the data owner outsource the image database. Locality-sensitive hashing is utilized to improve the search efficiency, and two stages were designed to improve it: in the first stage, unique images are filtered out by pre-filter tables, and in the second stage, the remaining images are compared one by one using the EMD metric for refined search results.

The authors of [4] highlight the state of the art in digital forensics of cloud computing. They pinpointed when the term was used as a keyword in the literature with the aid of the search engine SUMMON. The keyword "cloud forensics" was used, and the literature was categorized in three main dimensions: (1) survey, (2) technology and (3) forensics-procedural. The aim of the paper is not just to survey the related work along the discussed dimensions but to analyze those dimensions and identify research gaps with the help of a generated map.

In [5] Indrajit Ray et al. drafted a comprehensive scheme which addresses security and integrity issues not just during the log generation phase, but also during other stages in the log management process, including log collection, transmission, storage, and retrieval. Outsourcing log management to the cloud raises the challenge of log privacy: storage and retrieval of logs should not be traceable, so that anonymous protocols can be applied to logs in the cloud. The developed protocol has the potential for usage in various areas.

Ben Martini et al. [6] proposed an integrated conceptual digital forensic framework which gives particular importance to the preservation of forensic data and the collection of cloud computing data for forensics. It is an overarching framework for conducting digital forensic investigations in the cloud computing environment; they also stated that further research is needed to develop a library of digital forensic methodologies that would best suit the various cloud platforms and deployment models.

In another work, Alecsandru Patrascu et al. [7] drafted a novel solution which provides digital forensic investigators a reliable and secure method for monitoring the activities of users in a cloud infrastructure. They mainly focused on increasing the security, safety and reliability of the cloud. The authors also proposed a model which allows investigators to seamlessly analyze workloads and virtual machines while preserving the scalability of large-scale distributed systems.

A lightweight hypervisor is introduced in [8] to acquire and preserve data for reliable live forensics. Reliability is improved in three ways: the lightweight architecture, the data acquisition mechanism, and the evidence protection mechanism. Unused device drivers are eliminated to reduce the TCB size, thereby decreasing the vulnerability of the hypervisor.

In [9] the authors highlight various issues and challenges involved in the investigation of data in cloud logs, and survey the state of the art of Cloud Log Forensics (CLF). A case study related to CLF was highlighted for the analysis of malicious behavior in cloud log investigation. To tolerate the susceptibilities of cloud logs, the CLF security requirements, vulnerability points and challenges were identified. The paper identifies and introduces challenges and future directions to highlight open research areas of CLF, motivating investigators, academicians, and researchers to investigate them.

The authors of [10] discuss a proposed scheme to protect the privacy of the data and of the query user from the cloud, and even to resist attackers learning the data that the data owner shares with the query user in addition to the encrypted data. To achieve secure outsourced storage and k-NN queries, they improved a dot-product protocol and merged it with the k-NN query system.
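The k-NN query core that [10] wraps in a privacy-preserving dot-product protocol can be sketched in plaintext; the secure dot-product step itself is omitted here, so this is only the underlying similarity search, not the cited scheme.

```python
# Plaintext core of a k-NN query using dot-product similarity. The secure
# (encrypted) dot-product protocol of [10] is not reproduced here.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def knn_query(database, query, k):
    """Return the k database vectors most similar to the query vector."""
    ranked = sorted(database, key=lambda item: dot(item, query), reverse=True)
    return ranked[:k]

db = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
top2 = knn_query(db, (1.0, 0.0), k=2)   # the two vectors closest to the query
```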
4. PROPOSED METHODOLOGY
A dishonest cloud user can attack a system outside the cloud. They can also attack any application deployed in the same cloud, or an attack can be launched against a node controller which controls all the cloud activities. For a virtual machine (VM), the CLASS scheme (Fig. 3.1) takes the log from the node controller (NC), hides its content, and stores it in a database. This storage allows logs to remain available for further investigation despite VM shutdown. Moreover, CLASS publishes its proof so that log integrity is protected and admissibility is ensured. The essential terms of our proposed system are defined first; then the attacker's capabilities, possible attacks on logs, and the security properties of a secure cloud log service are provided.

Fig. 3.1 Proposed Scheme (architecture showing cloud users, the Cloud Service Provider (CSP) holding encrypted logs and publishing proof of past logs over the Internet, and the investigator)
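The published-proof idea above (an auditor detecting later tampering with stored logs) can be sketched with a simple hash chain; this is an illustrative stand-in, not the exact CLASS or Proof of Past Log construction.

```python
# Sketch of a hash-chained log proof: each entry extends the chain, and
# publishing the chain head lets an auditor detect modification of any
# earlier entry. Log entries below are invented.
import hashlib

def extend(proof: str, entry: str) -> str:
    return hashlib.sha256((proof + entry).encode()).hexdigest()

def publish_proof(entries):
    proof = ""
    for e in entries:
        proof = extend(proof, e)
    return proof

logs = ["vm1: login u1", "vm1: read /etc/passwd", "vm1: logout u1"]
ppl = publish_proof(logs)

# Auditor recomputes the chain from the allegedly unmodified logs:
assert publish_proof(logs) == ppl
# Tampering with any earlier entry changes the published proof:
assert publish_proof(["vm1: login u2"] + logs[1:]) != ppl
```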
Fig 4.1: Use Case Diagram
The above figure shows that the user can get access to files on the cloud by providing his public key, and can access log files and other files from the cloud. The investigator sends a request to the cloud service provider to check user activities; after approval from the cloud service provider, the investigator receives the file with encrypted logs, and can obtain the decrypted file once the key he provides matches.

Sequence diagram
The figure below represents the sequence diagram, which describes an interaction arranged in sequence. It depicts the objects involved in the scenario and the sequence of messages exchanged between them to carry out the functionality.
Fig 4.2: Sequence Diagram

5. CONCLUSION
To execute a successful forensic investigation in clouds, the proposed system uses CSPs to collect logs from different sources. The system uses secure logs for the cloud, a solution to store and provide logs securely for forensic purposes. It also provides privacy for cloud users by encrypting cloud logs with the public key of the respective user, while facilitating log retrieval in the event of an investigation. This scheme allows CSPs to store logs while preserving the confidentiality of cloud users. Additionally, an auditor can check the integrity of the logs using the Proof of Past Log (PPL). These cloud logs can then be securely used for cyber forensics.

REFERENCES
[1] X. Liu, R. H. Deng, K.-K. R. Choo, and J. Weng, "An efficient privacy-preserving outsourced calculation toolkit with multiple keys," IEEE Transactions on Information Forensics and Security, vol. 11, pp. 2401-2414, 2016.
[2] Shams Zawoad, Amit Kumar Dutta, and Ragib Hasan, "Towards Building Forensics Enabled Cloud Through Secure Logging-as-a-Service," IEEE Transactions on Dependable and Secure Computing, 2015.
[3] Zhihua Xia, Xingming Sun, Zhan Qin, and Kui Ren, "Towards Privacy-preserving Content-based Image Retrieval in Cloud Computing," IEEE Transactions on Cloud Computing, September 2015.
[4] Sameera Almulla, Youssef Iraqi, and Andrew Jones, "A State-of-The-Art Review of Cloud Forensics," ResearchGate, December 2014.
[5] Indrajit Ray, Kirill Belyaev, Mikhail Strizhov, Dieudonne Mulamba, and Mariappan Rajaram, "Secure Logging As a Service—Delegating Log Management to the Cloud," IEEE Systems Journal, 2013.
[6] Ben Martini and Kim-Kwang Raymond Choo, "An integrated conceptual digital forensic framework for cloud computing," Digital Investigation, vol. 9, pp. 71-80, 2012.
[7] Alecsandru Patrascu and Victor-Valeriu Patriciu, "Logging System for Cloud Computing Forensic Environments," Journal of Control Engineering and Applied Informatics, vol. 16, pp. 80-88, 2014.
[8] Zhengwei Qi, Chengcheng Xiang, Ruhui Ma, Jian Li, Haibing Guan, and David S. L. Wei,
Amazon had revenues close to $178 billion in 2017. Flipkart has 10 million active Internet users in India, while Amazon has 310 million active customers worldwide. The number of online shoppers in India crossed the 100 million mark by the end of 2016. IRCTC alone issues 13 lakh tickets a day through its online train seat reservation portal. This means that today, more than ever before, hundreds of millions of Indians are directly affected by poor privacy policies and data misuse. In fact, all Android devices (more than 2 billion) have Google Search installed by default, with no option to uninstall it; the best you can do is reject its privacy policy and not use it.

These large companies (Google, Microsoft, Amazon, Facebook, Apple etc.) have the largest collection of human-generated data ever in history, while also leaving users with only the option to take all or reject all when it comes to sharing this information. As the Cambridge Analytica scandal that engulfed Facebook in 2018 showed, even these large companies are sometimes unaware of the way in which data related to their users is being used by third parties.

In fact, the German Supreme Court directed Facebook in February 2019 to curb data collection, in response to how Facebook integrates user data from WhatsApp, Instagram and Facebook for mining and analysis. Facebook was also found in violation of the General Data Protection Regulation (GDPR) by tracking non-users through like/share buttons.

But Facebook is no outlier. Most Internet giants' entire business was created on user data: search data for Google ads, Windows usage and crash analytics for Microsoft, or buyer shopping and search data that powers Amazon's recommendation engine. Google recently found itself in hot water when the US Senate called into question data collection by its research apps and issued a show-cause notice to Google Inc.

Recently, Big Data too changed the entire privacy policy framework and data use. Big data refers to data sets that are too large or complex for traditional data-processing application software to adequately deal with. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data sources. Big data was originally associated with three key concepts: volume, variety, and velocity. Other concepts later attributed to big data are veracity (i.e., how much noise is in the data) and value. Big Data processing frameworks like Hadoop empowered corporations to use (or misuse) data at unprecedented scales. This further eased the road to data misuse and blatant sharing. Even the sharing of anonymous data and results without user permission leads to a perception that the user is no more than a pawn in the business of information retrieval and analysis.

Big Data is characterized by the 4 V's – Volume, Velocity, Veracity, and Variety. The most troubling of these aspects is the speed (velocity) of data collection and processing. Real-time or near real-time processing of data means that the user doesn't even get the opportunity to refuse the use of his data for such purposes.
This leads to privacy issues and risks like:
Right to be let alone
Limited access
Control over information
States of privacy
Secrecy
Personhood and autonomy
Self-identity and personal growth
Intimacy

2. RELATED WORKS:
and the application demands of cross-cloud and cross-grade data strategies of all types are satisfied.

3. ISSUES AND CHALLENGES
1) User Awareness
Users in developing countries like India are still unaware of the risks that their data can pose, and are mostly oblivious to the need for data privacy and security.
2) Privacy Policies are Difficult to Understand
Privacy policies are usually filled with technical and legal jargon, making it difficult for the average user to completely understand and comprehend them. This leads to users blindly accepting a policy without being aware of the data control they just surrendered to a company.
3) Privacy Protection
The user may not understand the data he/she is surrendering and the implications of their actions as clearly as a computer professional or data scientist. So the onus of privacy protection, in spirit, should lie with the data collectors/aggregators.
4) Notification
A more granular and streamlined method of data control can be the use of notifications (desktop or mobile) to alert the user about data-sharing agreements with third parties or data breaches. These notifications can also be in the form of an e-mail.
5) Security
Even if the user consents to providing data to a specific company, the risk of unauthorized access remains if someone hacks the company's data. These hacks and security breaches are generally outside the control of the company responsible for data collection and control.

4. PROPOSED METHODOLOGY
We propose a Machine Learning based solution to categorize privacy policies. Our Naïve Bayes based algorithm uses the ratings of 50 different policies in 8 different categories (collect, choice, cookies, access, purpose, security, share, and retention [2]) to learn a classification for all future unseen policies.

The main aim of this tool is to help the user understand the privacy policy in a better way. For this, the tool focuses on two result factors: one is the score, and the other is the presence of details about the different privacy classes. The definitions of all the categories are also displayed to help the user choose the classes which are important for him/her.

The components of our proposed architecture for the Trust Score generator tool are a browser extension, a word pre-processor, a classifier, a corpus, a database, and a score generator.

The user first opens the policy webpage of the service provider whose privacy policy he/she wishes to understand. On clicking the extension, it fetches the source code of the privacy policy webpage. This code is then separated from the HTML tags to generate the privacy policy text. The policy text is then cleaned with the help of different pre-processing techniques, which reduces the overhead on the algorithm. This policy is given to the classifier. The Naïve Bayes classifier then classifies the policy using words as features. The algorithm labels the user's given policy as one of the policies in the corpus. [1]
notifications can also be in the form of an
e-mail. 5. CONCLUSION
5) Security The Trust Score serves as a medium of
Even if the user consents to providing data creating an understanding between the user
to a specific company, risks of and the service provider. It tries to put the
unauthorized access remains if someone user in control when the decisions
hacks the former‘s data. These hacks and regarding his/her privacy are concerned.
security breaches are generally outside the Our tool works dynamically on most
control of the company responsible for websites, but the structure of each website
data collection and control. is different. This makes it difficult to
scrape the policy text from this source
4. PROPOSED METHODOLOGY code. All in all, Trust Score generator can
We propose a Machine Learning based serve as a great foundation for judging the
solution to categorize privacy policies. Our privacy policy in a short time and take safe
Naïve-Bayes based algorithm will use the and unforced decisions about their online
ratings of 50 different policies in 8 privacy.
different categories (collect, choice,
cookies, access, purpose, security, share, 6. ACKNOWLEDGEMENTS
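The pipeline described in Section 4 (fetch page source, strip HTML tags, pre-process the text, classify with Naïve Bayes over word features) can be sketched in a few lines of Python. This is a minimal illustration only: the helper names, the toy two-policy training corpus, and the Laplace-smoothed bag-of-words classifier below are our assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter, defaultdict

def strip_html(source):
    """Crudely remove HTML tags from fetched page source."""
    return re.sub(r"<[^>]+>", " ", source)

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop very short words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]

class NaiveBayesPolicyClassifier:
    """Multinomial Naive Bayes over word counts, one class per privacy category."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.class_totals = Counter()            # class -> total word count
        self.class_docs = Counter()              # class -> number of policies
        self.vocab = set()

    def fit(self, labelled_policies):
        for text, label in labelled_policies:
            words = preprocess(text)
            self.word_counts[label].update(words)
            self.class_totals[label] += len(words)
            self.class_docs[label] += 1
            self.vocab.update(words)

    def predict(self, text):
        words = preprocess(text)
        n_docs = sum(self.class_docs.values())
        best, best_lp = None, float("-inf")
        for label in self.class_docs:
            lp = math.log(self.class_docs[label] / n_docs)  # class prior
            denom = self.class_totals[label] + len(self.vocab)
            for w in words:  # Laplace-smoothed word likelihoods
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

In use, the extension would pass the fetched page source through `strip_html` and `preprocess`, then `predict` would return the closest-matching category or corpus policy.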
REFERENCES
[1] Assessment of Privacy Policies using Machine
Learning by Ritav Doshi, Aditya Ahale,
Gaurav Gharti, Prakhar Pathrikar, Dr. P.S.
Dhotre
[2] Organization for Economic Co-operation and
Development (http://oecdprivacy.org/)
[3] Privee: An architecture for automatically
analyzing web privacy policies, USENIX
Security, 2014 by S. Zimmeck and S.M.
Bellovin
[4] A Machine Learning Solution to Assess
Privacy Policy Completeness by Elisa,
Yuanhao, Milan et al.
[5] The Creation and Analysis of a Website
Privacy Policy Corpus by Shomir, Florian,
Aswarth et al.
[6] https://www.freeprivacypolicy.com/ : an online English privacy policy generator by FreePrivacyPolicy.
[7] OneTrust
(https://www.onetrust.com/products/assessmen
t-automation/)
[8] US patent - US20160164915A1 by Michael
Cook
[9] US patent - US20120072991A1 by Rohyt
Belani and Aaron Higbee
[10] Chinese patent - CN107465681A by Liu Ying
by utilizing the tags as query. However, weakly relevant tags, noisy tags and duplicated information make the search results unsatisfactory. Most of the literature focuses on tag processing, image relevance ranking and diversity enhancement of the retrieval results. The following subsections present the existing works related to these three aspects respectively.
A. Tag Processing Strategy
It has long been acknowledged that tag ranking and refinement play a significant role in the re-ranking of tag-based image retrieval (TBIR), for they lay a firm foundation for the development of re-ranking in TBIR. For example, Liu et al. [1] proposed a tag ranking method to rank the tags of a given image, in which probability density estimation is used to obtain the initial relevance scores and a random walk over a tag similarity graph is proposed to refine these scores. Similar to [1], [26] sorts the tag list by tag relevance scores that are learned by counting votes from visually similar neighbors. Applications in tag-based image retrieval have also been conducted. Based on these initial efforts, Lee and Neve [66] proposed to learn the relevance of tag and image by visually weighted neighbor voting, a variant of the popular baseline neighbor voting algorithm. Agrawal and Chaudhary [17] proposed a relevance tag ranking algorithm, which can automatically rank tags according to their relevance to the image content. A modified probabilistic relevance estimation method is proposed by taking the size of the object into account, and random walk based refinement is further utilized to improve the final retrieval results. Li [24] presented a tag fusion method for tag relevance estimation to overcome the limitations of a single measurement of tag relevance; in addition, early and late fusion schemes for a neighbor voting based tag relevance estimator are conducted. Zhu et al. [34] proposed an adaptive teleportation random walk model on a voting graph, constructed from the relationships between images, to estimate tag relevance. Moreover, many research efforts on tag refinement have emerged. Wu et al. [19] raised a tag completion algorithm to complete the missing tags and correct the erroneous tags of a given image. Qian et al. proposed a retagging approach to cover a wide range of semantics, in which both the relevance of a tag to the image and its semantic compensation to the already determined tags are fused to determine the final tag list of the given image. Gu et al. [45] proposed an image tagging approach based on latent community classification and multi-kernel learning. Yang et al. proposed a tag refinement module which leverages the abundant user-generated images and their associated tags as "social assistance" to learn classifiers that directly refine the noisy tags of web images. Qi et al. proposed a collective intelligence mining method to correct erroneous tags [50].
B. Relevance Ranking Approach
To directly rank the raw photos without undergoing any intermediate tag processing, Liu et al. [3] utilized an optimization framework to automatically rank images based on their relevance scores to a given tag; both the visual consistency among pictures and the semantic information of the tags are considered. Gao et al. [7] proposed a hypergraph learning approach which aims to estimate the relevance of images. They investigate the bag-of-words and bag-of-visual-words representations of images, extracted from the visual and textual information respectively. Chen et al. [21] proposed a support vector machine classifier per query to learn the relevance scores of its associated photos. Wu et al. [15] proposed a two-step similarity ranking scheme that aims to preserve both visual and semantic resemblance in the similarity ranking; to achieve this, a self-tuned manifold ranking solution focused on visual-based similarity ranking and a semantic-oriented similarity re-ranking method are included. Hu et al. [27] proposed an image ranking method which represents an image by sets of regions and applies these representations to multiple-instance learning based on the max-margin framework. Yu et al. [35] proposed a learning based ranking model in which both click and visual features are adopted simultaneously in the learning process. Notably, Haruechaiyasak and Damrongrat [33] proposed a content-based image retrieval method to improve the search results returned by tag-based image retrieval. In order to give users better visual enjoyment, Chen et al. [18] proposed a relevance-quality re-ranking approach to boost the quality of the retrieved images.
C. Diversity Enhancement
Relevance based image retrieval approaches can boost relevance performance, but the diversity of the search results is also very important. Many researchers have dedicated extensive efforts to diversifying the top ranked results. Leuken et al. studied three visually diverse ranking methods to re-rank the search results [10]. Different from clustering, Song et al. [9] proposed a re-ranking method to meet users' ambiguous needs by analyzing topic richness. A diverse relevance ranking algorithm that maximizes average diverse precision in an optimization framework, by mining the semantic similarities of social images based on their visual features and tags, is proposed in [5]. Sun et al. [28] proposed a social image ranking scheme to retrieve images that meet the relevance, typicality and diversity criteria; building on [5], they explored both the semantic and the visual information of images. Ksibi et al. [31] proposed to assign a dynamic trade-off between relevance and diversity performance according to the ambiguity level of the given query. Based on [31], they further proposed a query expansion approach [6] to select the most representative concept weights by aggregating the weights of concepts from different views. Wang et al. [29] proposed a duplicate detection algorithm that represents images with hash codes, so that large image databases can be grouped quickly by similar hash codes. Qian et al. [48] proposed an approach for diversifying landmark summarization from diverse viewpoints based on the relative viewpoint of each image, represented as a 4-dimensional viewpoint vector; they select the relevant images with large viewpoint variations as the top ranked images. Tong et al. achieved diversity by introducing a diversity term into their model whose function is to penalize the visual similarity between images [61-62]. However, most of the above works view the diversity problem as promoting visual diversity rather than topic coverage. As reported in [14], most people said they preferred retrieval results with broad and interesting topics, so many works on topic coverage have emerged [23, 30, 49, 54]. For instance, Agrawal et al. [23] classify a taxonomy over queries to represent the different aspects of a query. This approach promotes documents that share a high number of classes with the query, while demoting those with classes already well represented in the ranking.

Figure 1. Ranking approaches

3. SYSTEM OVERVIEW
Our system includes five main parts: 1) Tag graph construction based on the tag information of the image dataset; the tag graph is constructed to mine the topic communities. 2) Community detection; the affinity propagation clustering method is employed to detect topic communities. 3) Image community mapping; we assign each image to a single community
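Steps 1) and 2) above, building a tag graph and detecting topic communities with affinity propagation, can be sketched as follows. This is an illustrative sketch under our own assumptions, not the paper's implementation: tag co-occurrence counts serve as the similarity (edge weight) between tags, the median similarity is used as the affinity propagation preference, and all function names are ours.

```python
from collections import Counter
from itertools import combinations

def tag_similarity_matrix(image_tags):
    """Step 1: build a tag graph as a similarity matrix, with edge weights
    given by how often two tags co-occur on the same image."""
    tags = sorted({t for ts in image_tags for t in ts})
    idx = {t: j for j, t in enumerate(tags)}
    n = len(tags)
    S = [[0.0] * n for _ in range(n)]
    for ts in image_tags:
        for a, b in combinations(sorted(set(ts)), 2):
            S[idx[a]][idx[b]] += 1.0
            S[idx[b]][idx[a]] += 1.0
    # Preference (diagonal): median off-diagonal similarity, a common default.
    off = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
    pref = off[len(off) // 2] if off else 0.0
    for i in range(n):
        S[i][i] = pref
    return tags, S

def affinity_propagation(S, damping=0.5, iters=100):
    """Step 2: plain affinity propagation (Frey and Dueck [52]) on a dense
    similarity matrix; returns the exemplar index chosen for each point."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):  # responsibility updates
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(vals)
            k1 = vals.index(best)
            second = max(v for k, v in enumerate(vals) if k != k1)
            for k in range(n):
                rival = second if k == k1 else best
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        for k in range(n):  # availability updates
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos)
            for i in range(n):
                if i == k:
                    A[i][k] = damping * A[i][k] + (1 - damping) * (total - pos[k])
                else:
                    A[i][k] = damping * A[i][k] + (1 - damping) * min(
                        0.0, R[k][k] + total - pos[i] - pos[k])
    return [max(range(n), key=lambda k: R[i][k] + A[i][k]) for i in range(n)]
```

Feeding `S` to `affinity_propagation` yields one exemplar tag per detected community; each image can then be mapped to the community its tags most belong to, which corresponds to step 3) above.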
[12]. … "Descriptors and Their Applications in Scene Categorization and Semantic Concept Retrieval". Multimedia Tools and Applications, May 2012.
[13]. X. Lu, Y. Yuan, X. Zheng, Jointly Dictionary Learning for Change Detection in Multispectral Imagery, IEEE Trans. Cybernetics, vol. 47, no. 4, pp. 884-897, 2017.
[14]. J. Carbonell, and J. Goldstein, "The use of MMR, diversity based re-ranking for reordering documents and producing summaries". SIGIR 1998.
[15]. Wu, J. Wu, and M. Lu, "A Two-Step Similarity Ranking Scheme for Image Retrieval". In Parallel Architectures, Algorithms and Programming, pp. 191-196, IEEE, 2014.
[16]. G. Ding, Y. Guo, J. Zhou, et al., Large-Scale Cross-Modality Search via Collective Matrix Factorization Hashing. IEEE Transactions on Image Processing, 2016, 25(11): 5427-5440.
[17]. G. Agrawal, and R. Chaudhary, "Relevancy tag ranking". In Computer and Communication Technology, pp. 169-173, IEEE, 2011.
[18]. L. Chen, S. Zhu, and Z. Li, "Image retrieval via improved relevance ranking". In Control Conference, pp. 4620-4625, IEEE, 2014.
[19]. L. Wu, and R. Jin, "Tag completion for image retrieval". Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(3), 716-727, 2013.
[20]. Y. Yang, Y. Gao, H. Zhang, and J. Shao, "Image Tagging with Social Assistance". ICMR, 2014.
[21]. L. Chen, D. Xua, and I. Tsang, "Tag-based image retrieval improved by augmented features and group-based refinement". Multimedia, IEEE Transactions on, 14(4), 1057-1067, 2012.
[22]. Z. Lin, G. Ding, J. Han, et al., Cross-View Retrieval via Probability-Based Semantics-Preserving Hashing, IEEE Transactions on Cybernetics, vol. PP, no. 99, pp. 1-14, doi: 10.1109/TCYB.2016.2608906.
[23]. R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong, "Diversifying search results". In WSDM, pages 5-14, 2009.
[24]. X. Li, "Tag relevance fusion for social image retrieval". CoRR abs/1410.3462, 2014.
[25]. X. Qian, X. Liu, and C. Zheng, "Tagging photos using users' vocabularies". Neurocomputing, 111(111), 144-153, 2013.
[26]. D. Mishra, "Tag Relevance for Social Image Retrieval in Accordance with Neighbor Voting Algorithm". IJCSNS, 14(7), 50, 2014.
[27]. Y. Hu, M. Li, and N. Yu, "Multiple-instance ranking: Learning to rank images for image retrieval". In Computer Vision and Pattern Recognition, CVPR 2008. IEEE Conference on (pp. 1-8).
[28]. F. Sun, M. Wang, and D. Wang, "Optimizing social image search with multiple criteria: Relevance, diversity, and typicality". Neurocomputing, 95, 40-47, 2012.
[29]. B. Wang, Z. Li, and M. Li, "Large-scale duplicate detection for web image search". ICME 2006, pp. 353-356.
[30]. R. Santos, C. Macdonald, and I. Ounis, "Exploiting query reformulations for Web search result diversification". In WWW, pages 881-890, 2010.
[31]. A. Ksibi, G. Feki, and A. Ammar, "Effective Diversification for Ambiguous Queries in Social Image Retrieval". In Computer Analysis of Images and Patterns (pp. 571-578), 2013.
[32]. Y. Guo, G. Ding, L. Liu, J. Han, and L. Shao, "Learning to hash with optimized anchor embedding for scalable retrieval," IEEE Trans. Image Processing, vol. 26, no. 3, pp. 1344-1354, 2017.
[33]. C. Haruechaiyasak, and C. Damrongrat, "Improving social tag-based image retrieval with CBIR technique". Springer Berlin Heidelberg, 2010, pp. 212-215.
[34]. X. Zhu, W. Nejdl, "An adaptive teleportation random walk model for learning social tag relevance". ACM SIGIR, pp. 223-232, 2014.
[35]. J. Yu, D. Tao, and M. Wang, "Learning to Rank Using User Clicks and Visual Features for Image Retrieval". IEEE Trans. Cybern., 2014.
[36]. S. Ji, K. Zhou, C. Liao, Z. Zheng, and G. Xue, "Global ranking by exploiting user clicks". ACM SIGIR, 2009, pp. 35-42.
[37]. G. Dupret, "A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine". ACM international conference on Web search and data mining (pp. 181-190), 2010.
[38]. X. Lu, X. Li, and L. Mou, Semi-Supervised Multi-task Learning for Scene Recognition, IEEE Trans. Cybernetics, vol. 45, no. 9, pp. 1967-1976, 2015.
[39]. X. Hua, and M. Ye, "Mining knowledge from clicks: MSR-Bing image retrieval challenge". In Multimedia and Expo Workshops, 2014.
[40]. X. Lu, X. Li, Multiresolution Imaging, IEEE Transactions on Cybernetics, vol. 44, no. 1, pp. 149-160, 2014.
[41]. X. Qian, X. Hua, Y. Tang, and T. Mei, "Social image tagging with diverse semantics". IEEE Trans. Cybernetics, vol. 44, no. 12, 2014, pp. 2493-2508.
[42]. X. Qian, D. Lu, X. Liu, "Tag based image retrieval by user-oriented ranking". ICMR, 2015.
[43]. Y. Zhang, X. Qian, X. Tan, J. Han, Y. Tang, Sketch-Based Image Retrieval by Salient Contour Reinforcement. IEEE Trans. Multimedia 18(8): 1604-1615 (2016).
[44]. Y. Gu, X. Qian, Q. Li, et al., "Image Annotation by Latent Community Detection and Multikernel Learning". IEEE Transactions on Image Processing 24(11): 3450-3463 (2015).
[45]. X. Yang, X. Qian, and Y. Xue, "Scalable Mobile Image Retrieval by Exploring Contextual Saliency". IEEE Trans. Image Processing 24(6): 1709-1721 (2015).
[46]. D. Lu, X. Liu, and X. Qian, "Tag based image search by social re-ranking". IEEE Transactions on Multimedia, vol. 18, no. 8, 2016, pp. 1628-1639.
[47]. X. Qian, Y. Xue, Y. Tang, and X. Hou, "Landmark Summarization with Diverse Viewpoints". IEEE Trans. Circuits and Systems for Video Technology, vol. 25, no. 11, 2015, pp. 1857-1869.
[48]. R. Santos, C. Macdonald, and I. Ounis, "Selectively diversifying web search results". ACM CIKM, 2010: 1179-1188.
[49]. G. Qi, C. Aggarwal, and J. Han, "Mining Collective Intelligence in Diverse Groups", in Proc. WWW, 2013.
[50]. X. Qian, X. Tan, Y. Zhang, R. Hong, and M. Wang, "Enhancing Sketch-Based Image Retrieval by Re-ranking and Relevance Feedback". IEEE Trans. Image Processing, vol. 25, no. 1, 2016, pp. 195-208.
[51]. https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2.
[52]. B. Frey, and D. Dueck, "Clustering by passing messages between data points". Science, 2007, 315(5814): 972-976.
[53]. K. Song, Y. Tian, W. Gao, and T. Huang, "Diversifying the image retrieval results". ACM MM, 2006: 707-710.
[54]. Y. Yan, G. Liu, S. Wang, et al., "Graph-based clustering and ranking for diversified image search". Multimedia Systems, 2014.
[55]. X. Tian, et al., "Image search reranking with hierarchical topic awareness". IEEE Transactions on Cybernetics, 2015.
[56]. D. Dang-Nguyen, et al., "Retrieval of Diversity Images by Pre-filtering and Hierarchical Clustering". MediaEval, 2014.
[57]. X. Qian, Y. Xue, Y. Tang, X. Hou, and T. Mei, "Landmark Summarization with Diverse Viewpoints". IEEE Trans. Circuits and Systems for Video Technology, vol. 25, no. 11, 2015, pp. 1857-1869.
[58]. H. Hou, X. Xu, G. Wang, and X. Wang, "Joint-Rerank: a novel method for image search reranking". Multimedia Tools and Applications, 2015, 74(4): 1423-1442.
[59]. S. Liu, et al., "Social visual image reranking for web image search". MMM, 2013.
[60]. J. He, H. Tong, Q. Mei, and B. Szymanski, "GenDeR: A generic diversified ranking algorithm," Advances in Neural Information Processing Systems, 2012, 2: 1142-1150.
[61]. H. Tong, J. He, Z. Wen, R. Konuru, and C. Lin, "Diversified ranking on large graphs: an optimization viewpoint", SIGKDD, 2011, 1028-1036.
[62]. X. Li, S. Liao, W. Lan, X. Du, and G. Yang, "Zero-shot Image Tagging by Hierarchical semantic embedding," ACM SIGIR, 2015: 879-882.
[63]. D. Zhang, J. Han, C. Li, J. Wang, and X. Li, Detection of Co-salient Objects by Looking Deep and Wide, International Journal of Computer Vision, 120(2): 215-232, 2016.
[64]. D. Zhang, J. Han, J. Han, L. Shao, Cosaliency Detection Based on Intrasaliency Prior Transfer and Deep Intersaliency Mining, IEEE Trans. on Neural Networks and Learning Systems, 27(6): 1163-1176, 2016.
[65]. S. Lee, and W. Neve, "Visually weighted neighbor voting for image tag relevance learning". Multimedia Tools and Applications, 1-24, 2013.
Live your life each day as you would climb a mountain. An occasional glance towards
the summit keeps the goal in mind, but many beautiful scenes are to be observed from
each new vantage point. - Harold B. Melchart.
VISION:
We are committed to produce not only good engineers but good
human beings also.
MISSION:
OUR MISSION is to do WHAT it takes to foster, sustain and upgrade
the quality of Education by way of harnessing Talent, Potential and
optimizing meaningful Learning Facilities.
Our ENDEAVOUR is to provide the best and most conducive learning
environment & equip the students with effective Learning Strategies.
The Vadgaon(Bk) campus of Sinhgad Institutes has an ideal
environment with lush green surroundings & panoramic views.
Vadgaon(Bk) campus is situated on a delightful hillock of the
beautiful Sahyadri ranges. It provides quietude to stimulate the brain
to enhance the learning capabilities. The institutes on this campus
boast of independent infrastructure. Also, facilities to cover the
necessities of life are available on the campus.
Postal Address
Smt. Kashibai Navale College of Engineering
Sr. No. 44/1, Vadgaon (Bk), off Sinhgad Road, Pune-411041 Maharashtra, INDIA
Tele. (020)24354938, Telefax: (020) 24354938
Email: principal.skncoe@sinhgad.edu