Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Mrs.Pranamita Nanda1,B.Sandhiya2,R.Sandhiya3,A.S.Vanaja4
1Assistant Professor,2,3,4Students
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract: Big data is a challenging functionality for analyzing query. The keyword inpath or externalpath is used for
the large volume of data in the IT deployment in a different importing data from internal device and external device.
dimension. To make that analysis process in more efficient Then the data is extracted from the database using test data
manner we use Hive tool for query processing and providing and trained data. The trained data is already existing datas
statistical report using RStudio. The processing load in data which is just a predicted one. With the trained data the
stream mining has been reduced by the technique know as testing is done for analyzing. Both the test data and trained
Feature Selection. However, when it comes to mining over high data are used for classification algorithm known as Support
dimensional data the search space from which an optimal Vector Machine. The SVM classifier is the classification
feature subset is derived grows exponentially in size, leading to algorithm. For a dataset consisting of feature s set and label
an intractable demand in computation. To reduce the set an classifier build a model to predict classes. The
complexity of using accelerated particle swarm parameter used for this process is accuracy. The SVM
optimization.(APSO), we connect the data by using Hadoop classifier evaluate the predicted data and provides the
technology. Hadoop technology is easier to store and retrieve accuracy. Thus the efficient accuracy is taken into
the data in a big data environment. With the dataset the consideration.
datas are analysed and the statistical report is produced using
SVM algorithm in R software where R language is used. This R- EXISTING SYSTEM:
software environment is used to provide a statisical computing
and graphics. This statistical report compares the accuracy
between the linear and non linear grid where the higher The light weight feature selection technique known as
accuracy dataset is efficient. The final graph provides swarm search is used for classfing the dataset. There are
combination of the linear and nonlinear with respect to cost many feature selection technique like CCV, Improved PSO
and sigma which is the userdefined value. PSO with SVM etc.,The amount of data feed is potentially infinite and the
algorithm increases the performance of analysing the data. data delivery is continuous like a high speed train of
information.The processing hence is expected to be real time
INTRODUCTION: and instantly responsive. The retrieval of data from large
volume of data and maintaining them is difficult and the
The process of handling large volume of data, storing and accuracy of the data is little lower which is been overcomed
retrieval of data is challenging factor. Data stream mining is using best classifier algorithm. The complication on top of
the process of extracting knowledge structures from quantitatively computing the non-linear relations between
continuous, rapid data records. A data stream is an ordered the feature value and target classes is the temporal nature of
sequence of instances that in many application of data such data stream, One must crunch on the data stream long
stream mining can be read only once or a small number of enough for accurately modeling seasonal cycles or regular
times using limited computing and storage capabilities. Thus pattern if they ever exist. There are no straight-forward
for retrieval of data we use data stream mining technique. To relations that can easily map the attribute data into a specific
make the retrieval of data in efficient manner we use class without a long-term observation. This impacts
hadoop-hive tool for query processing. It takes less time to considerately on the data mining algorithm design that
process. Process such as converting the unstructured data should be capable of just reading and forgetting the data
into structured data by creating schema. Then in hadoop stream.
environment there is a data storage place known as hadoop
distributed file system where our database is imported from
the external device or internal device such as server or
system that we are working in to the HDFS using the hive
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1341
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 03 | Mar -2017 www.irjet.net p-ISSN: 2395-0072
ARCHITECTURAL DIAGRAM AND EXPLANATION: increases the efficiency and takes less time for anaysing and
for retrieving the data. It improves the data processing
DATA STORAGE speed. It can be able to analyse the large volume of data in a
small time compare to another tools. It provides large scale
integration of data.
MODULES:
STATISTICAL
REPORT
Create schema in data warehouse
Performance evolution
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1343
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 03 | Mar -2017 www.irjet.net p-ISSN: 2395-0072
E) STATISTICAL REPORT:
SCREENSHOTS:
FUTURE ENHANCEMENT:
REFERENCES:
2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1345