
DATA MINING

16 January 2013 1

WHY DATA MINING?


* Data explosion problem: automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories.
* We are drowning in data, but starving for knowledge!
* Solution: data mining.


DATA MINING
Data mining (knowledge discovery in databases) [3]: the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases.


DATA MINING TASKS


Data mining includes the following tasks:
* Classification: Classifies a data item into one of several predefined categories.
* Regression: Maps a data item to a real-valued prediction variable.
* Clustering: Maps a data item into a cluster, where clusters are natural groupings of data items based on similarity metrics.
* Association rules: Describe association relationships among different attributes.
* Summarization: Provides a compact description for a subset of data.
* Dependency modeling: Describes significant dependencies among variables.
* Sequence analysis: Models sequential patterns, as in time-series analysis. The goal is to model the state of the process generating the sequence or to extract and report deviations and trends over time.


DATA MINING CLASSIFICATION TECHNIQUES


Classification techniques:
* Decision tree based methods
* Rule-based methods
* Memory based reasoning
* Genetic algorithms
* Bayesian belief networks
* Support vector machines
* Neural networks


INTRODUCTION TO NEURAL NETWORKS



WHAT IS A NEURAL NETWORK?


A neural network is a biologically motivated approach to machine learning. It is a powerful data modeling tool that is able to capture and represent complex input/output relationships. Neural networks resemble the human brain in the following two ways:
* A neural network acquires knowledge through learning.
* A neural network's knowledge is stored within interneuron connection strengths known as synaptic weights.

BIOLOGICAL NEURON


ARTIFICIAL NEURON


ANN ARCHITECTURE


DATA MINING CLASSIFICATION USING ARTIFICIAL NEURAL NETWORKS


DATA MINING CLASSIFICATION USING ARTIFICIAL NEURAL NETWORKS


Data mining classification using artificial neural networks has eight steps:
Step 1: (Data collection) The data to be used for classification is collected.
Step 2: (Training and testing data separation) The available data are divided into training and testing data sets of size 80% and 20% respectively.
Step 3: (Network architecture) A network architecture and a learning method are selected. Important considerations are the number of perceptrons and the number of layers.
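The 80%/20% separation in Step 2 could be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function name `split_train_test`, the `seed` parameter, and the choice to shuffle before splitting are all assumptions.

```python
import random

def split_train_test(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into training and testing sets.

    Shuffling first is an assumption; the source only states the 80/20 sizes.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Usage: ten dummy records give eight for training and two for testing.
records = list(range(10))
train, test = split_train_test(records)
```

A fixed seed keeps the split reproducible, so that a trained network can later be tested on exactly the held-out records.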


DATA MINING CLASSIFICATION USING ARTIFICIAL NEURAL NETWORKS


Step 4: (Parameter tuning and weight initialization) There are parameters for tuning the network to the desired learning performance level. Part of this step is the initialization of the network weights and parameters, followed by modification of the parameters as training performance feedback is received. Initialize the weights and biases to random numbers distributed over a small range of values:
[-a/sqrt(Ni), +a/sqrt(Ni)]
where Ni is the number of inputs to the i-th unit and a is an integer between 1 and 3.
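The initialization range in Step 4 can be illustrated with a short sketch. It assumes a uniform distribution over the range (the source only says "distributed over a small range") and calls the integer scale factor `a`; the function name `init_weights` is hypothetical.

```python
import math
import random

def init_weights(n_inputs, a=1, rng=None):
    """Draw weights and a bias for one unit from [-a/sqrt(Ni), +a/sqrt(Ni)].

    n_inputs is Ni, the number of inputs to the unit; a is an integer
    between 1 and 3. Uniform sampling is an assumption of this sketch.
    """
    rng = rng or random.Random(0)
    limit = a / math.sqrt(n_inputs)
    weights = [rng.uniform(-limit, limit) for _ in range(n_inputs)]
    bias = rng.uniform(-limit, limit)
    return weights, bias

# Usage: a unit with 4 inputs and a = 2 gets values in [-1, 1].
weights, bias = init_weights(4, a=2)
```

Dividing by sqrt(Ni) shrinks the range for units with many inputs, which keeps the initial net input of each unit small.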


DATA MINING CLASSIFICATION USING ARTIFICIAL NEURAL NETWORKS


Step 5: (Data normalization) Transforms the application data into the type and format required by the ANN. All data must be normalized, i.e. all attribute values in the database are rescaled to lie in the interval [0,1] or [-1,1]. Two normalization techniques are used:
1. Max-min normalization
2. Decimal scaling normalization
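Max-min normalization can be sketched as below, using the commonly stated form v' = (v - min) / (max - min) * (new_max - new_min) + new_min. The function name `min_max_normalize` and the handling of a constant attribute (mapped to `new_min`) are assumptions of this sketch.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no information, so map it to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]

# Usage: rescale one attribute to [0, 1] and to [-1, 1].
unit_range = min_max_normalize([10, 20, 30])
signed_range = min_max_normalize([10, 20, 30], new_min=-1.0, new_max=1.0)
```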

DATA NORMALIZATION


DATA NORMALIZATION
Decimal scaling normalization: Normalization by decimal scaling normalizes by moving the decimal point of the values of attribute A:
v' = v / 10^j
where j is the smallest integer such that max|v'| < 1.
Example: Let A's values range from -986 to 917. Then max|v| = 986, so j = 3, and v = -986 normalizes to v' = -986/1000 = -0.986.
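The decimal scaling rule can be checked with a small sketch that searches for the smallest j and then divides every value by 10^j. The function name `decimal_scale` is hypothetical, and j is assumed to be non-negative.

```python
def decimal_scale(values):
    """Divide values by 10**j, with j the smallest non-negative integer giving max|v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

# Usage: the example range from the slide, -986 to 917.
scaled, j = decimal_scale([-986, 917])
```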

DATA MINING CLASSIFICATION USING ARTIFICIAL NEURAL NETWORKS


Step 6: (Training) Training is conducted iteratively by presenting input data and the desired (known) output data to the ANN. The ANN computes the outputs and adjusts the weights until the computed outputs are within an acceptable tolerance of the known outputs for the input cases.


DATA MINING CLASSIFICATION USING ARTIFICIAL NEURAL NETWORKS


Step 7: (Testing) Testing examines the performance of the network using the derived weights by measuring the ability of the network to classify the testing data correctly.
Step 8: (Implementation) A stable set of weights has now been obtained, and the network can reproduce the desired outputs for inputs like those in the training set. The network is ready to use as a stand-alone system or as part of another software system, where new input data will be presented to it and its output will be a recommended decision.
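One simple way to measure the testing performance in Step 7 is classification accuracy, the fraction of testing records whose predicted class matches the known class. The source does not name a specific measure, so accuracy and the function name `classification_accuracy` are assumptions here.

```python
def classification_accuracy(predicted, actual):
    """Return the fraction of test records whose predicted label equals the known label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Usage: three of four hypothetical test records are classified correctly.
accuracy = classification_accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```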

NEURAL NETWORK CLASSIFICATION USING BACKPROPAGATION ALGORITHM


1. Initialize the weights and biases.
2. Feed the training sample.
3. Propagate the inputs forward: compute the net input and output of each unit in the hidden and output layers.
4. Backpropagate the error.
5. Update the weights and biases to reflect the propagated errors.
6. Check the terminating conditions.
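The six steps can be sketched for a network with one hidden layer and a single output unit. This is a minimal illustration under assumptions the source does not state: sigmoid units, per-sample updates, a squared-error stopping test, and hypothetical names and defaults (`train_backprop`, `rate`, `tolerance`, the [-0.5, 0.5] initial range).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_backprop(samples, n_hidden=2, rate=0.5, max_epochs=2000,
                   tolerance=0.01, seed=0):
    """Train a one-hidden-layer network; return the summed squared error per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # 1. Initialize weights and biases to small random values.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = rng.uniform(-0.5, 0.5)
    history = []
    for _ in range(max_epochs):
        sse = 0.0
        # 2. Feed each training sample in turn.
        for x, target in samples:
            # 3. Propagate the inputs forward through hidden and output layers.
            hidden = [sigmoid(sum(w * xi for w, xi in zip(w_h[j], x)) + b_h[j])
                      for j in range(n_hidden)]
            out = sigmoid(sum(w * h for w, h in zip(w_o, hidden)) + b_o)
            sse += (target - out) ** 2
            # 4. Backpropagate the error (derivative of the sigmoid is o * (1 - o)).
            err_o = out * (1 - out) * (target - out)
            err_h = [hidden[j] * (1 - hidden[j]) * err_o * w_o[j]
                     for j in range(n_hidden)]
            # 5. Update weights and biases to reflect the propagated errors.
            for j in range(n_hidden):
                w_o[j] += rate * err_o * hidden[j]
                b_h[j] += rate * err_h[j]
                for i in range(n_in):
                    w_h[j][i] += rate * err_h[j] * x[i]
            b_o += rate * err_o
        history.append(sse)
        # 6. Terminating condition: error below tolerance (or max_epochs reached).
        if sse < tolerance:
            break
    return history

# Usage: learn the logical OR of two inputs.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
history = train_backprop(or_samples)
```

The hidden-layer errors are computed before any output weight is changed, so each update uses the weights that produced the forward pass.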

BACKPROPAGATION FORMULAS


Applications, Benefits & Limitations



SOME ANN APPLICATIONS


ANN application areas:
* Tax form processing to identify tax fraud
* Enhancing auditing by finding irregularities
* Bankruptcy prediction
* Customer credit scoring
* Loan approvals
* Credit card approval and fraud detection
* Financial prediction
* Energy forecasting
* Computer access security (intrusion detection and classification of attacks)
* Fraud detection in mobile telecommunication networks

BENEFITS AND LIMITATIONS OF NEURAL NETWORKS

Benefits of ANNs:
* Usefulness for pattern recognition, classification, generalization, abstraction and interpretation of incomplete and noisy inputs (e.g. handwriting recognition, image recognition, voice and speech recognition, weather forecasting).
* Resemblance to the functioning of the human brain.
* Ability to solve new kinds of problems. ANNs are particularly effective at solving problems whose solutions are difficult, if not impossible, to define. This opened up a new range of decision support applications formerly either difficult or impossible to computerize.

BENEFITS AND LIMITATIONS OF NEURAL NETWORKS (contd.)


Benefits of ANNs:
* Robustness. ANNs tend to be more robust than their conventional counterparts. They have the ability to cope with incomplete or fuzzy data. ANNs can be very tolerant of faults if properly implemented.
* Fast processing speed. Because they consist of a large number of massively interconnected processing units, all operating in parallel on the same problem, ANNs can potentially operate at considerable speed (when implemented on parallel processors).
* Flexibility and ease of maintenance. ANNs are very flexible in adapting their behavior to new and changing environments. They are also easier to maintain, with some having the ability to learn from experience to improve their own performance.

BENEFITS AND LIMITATIONS OF NEURAL NETWORKS (contd.)


Limitations of ANNs:
* ANNs lack explanation capabilities. Justifications for results are difficult to obtain because the connection weights usually do not have obvious interpretations.


Future Scope & Conclusion



FUTURE SCOPE
As fast, parallel processing networks, neural networks may further be used for attribute selection and dimensionality reduction problems.


CONCLUSION
Although ANNs lack explanation capabilities, their robustness, fast parallel processing and flexible nature make neural networks a highly useful tool for classification.


REFERENCES
[1] Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press.
[2] A. Verikas, M. Bacauskiene, Feature selection with neural networks, Pattern Recognition Letters 23 (2002), pp. 1323-1335.
[3] Ernst Haselsteiner and Gert Pfurtscheller, Using Time-Dependent Neural Networks for EEG Classification, IEEE Transactions on Rehabilitation Engineering, vol. 8, no. 4, December 2000.
[4] E. Hosseini Aria, J. Amini, M. R. Saradjian, Back Propagation Neural Network for Classification of IRS-1D Satellite Images.
[5] Donald F. Specht, A General Regression Neural Network, IEEE Transactions on Neural Networks, vol. 2, no. 6, November 1991.
[6] Shivajirao M. Jadhav, Sanjay L. Nalbalwar, Ashok A. Ghatol, Artificial Neural Network Models based Cardiac Arrhythmia Disease Diagnosis from ECG Signal Data, International Journal of Computer Applications, vol. 44, no. 15, 2012.
[7] Patrick K. Simpson, Fuzzy Min-Max Neural Networks: Part I Classification, IEEE Transactions on Neural Networks, vol. 3, no. 5, September 1992.

Thank you !!!
