
A

Minor Project Report


on

Credit Card Fraud Detection System

Submitted in partial fulfillment of the requirements for the award of degree of

Bachelor of Technology

by
Ragini Singh
(CE-4112-2K16)
Smriti Singh
(CE-4124-2K16)

Under supervision of
Dr. Neelam Duhan

Department of Computer Engineering

J. C. BOSE UNIVERSITY OF SCIENCE & TECHNOLOGY, YMCA


FARIDABAD-121006

May 2019

CANDIDATE’S DECLARATION

We hereby certify that the work which is being presented in this project report titled “Credit
Card Fraud Detection System” in fulfillment of the requirement for the degree of Bachelor
of Technology and submitted to “J. C. Bose University of Science and Technology,
YMCA, Faridabad”, is an authentic record of our own work carried out under the
supervision of Dr. Neelam Duhan.

The work contained in this report has not been submitted to any other University or Institute
for the award of any other degree by us.

Ragini Singh
(CE-4112-2K16)
Smriti Singh
(CE-4124-2K16)

CERTIFICATE

This is to certify that the project report titled “Credit Card Fraud Detection System”
submitted by Ragini Singh and Smriti Singh to “J. C. Bose University of Science and
Technology, YMCA, Faridabad” for the award of the degree of Bachelor of Technology is
a record of bona fide work carried out by them under my supervision. In my opinion, the work
has reached the standard required to fulfil the regulations relating to the degree.

Dr. Neelam Duhan


(Supervisor)
Assistant Professor,
Department of Computer Engg
J. C. Bose University of Science and Technology, YMCA, Faridabad

LIST OF FIGURES

Fig. 1 Numpy
Fig. 2 Scikit-learn
Fig. 3 Theano
Fig. 4 Keras
Fig. 5 Tensorflow
Fig. 6 Pandas
Fig. 7 Matplotlib
Fig. 8 Scipy
Fig. 9 Pytorch
Fig. 10 Posterior Probability
Fig. 11 Neural networks
Fig. 12 Neural network
Fig. 13 Flow chart
Fig. 14 Dataflow Diagram
Fig. 15 Histogram

TABLE OF CONTENTS

S. No. Content

1. INTRODUCTION
2. PROBLEM STATEMENT
3. MOTIVATION
4. OBJECTIVES & SCOPE
5. LITERATURE REVIEW
6. PROPOSED METHODOLOGY (USE CASE DIAGRAM, DATAFLOW DIAGRAM, FLOWCHARTS)
7. HARDWARE AND SOFTWARE REQUIREMENTS
8. RESULTS (SCREENSHOTS, GRAPHS, ETC.)
9. CONCLUSION

1. INTRODUCTION

Credit card fraud is increasing considerably with the development of modern technology and
the global superhighways of communication. It costs consumers and financial companies
billions of dollars annually, and fraudsters continuously look for new rules and tactics to
commit illegal actions. Fraud detection systems have therefore become essential for banks
and financial institutions to minimize their losses. However, there is a lack of published
literature on credit card fraud detection techniques, because credit card transaction
datasets are rarely available to researchers. The most commonly used fraud detection methods
are Naïve Bayes (NB), Support Vector Machines (SVM) and K-Nearest Neighbor (KNN) algorithms.
These techniques can be used alone or in combination, using ensemble or meta-learning
techniques, to build classifiers. Among the existing methods, ensemble learning methods are
identified as popular and common, not only because of their quite straightforward
implementation but also because of their exceptional predictive performance on practical
problems. In this work we trained various data mining techniques used in credit card fraud
detection and evaluated each methodology against certain design criteria. After several
trials and comparisons, we introduced the bagging classifier based on decision trees as the
best classifier for constructing the fraud detection model. The performance evaluation is
carried out on a real-life credit card transactions dataset to demonstrate the benefit of the
bagging ensemble algorithm.

The popularity of online shopping is growing day by day, and credit cards are an easy way to
pay online. According to an ACNielsen study conducted in 2005, one-tenth of the world’s
population shops online; the same study also mentions that credit cards are the most popular
mode of online payment. In the US, the total number of credit cards from the four major
credit card networks (MasterCard, VISA, Discover, and American Express) is 609 million, and
1.28 billion when cards from some other networks (store, oil company and others) are
included. In India, the total number of credit cards at the end of December 2012 was about
18 to 18.9 million. For multinational banks, the usage, or average balance per credit card
borrower, rose from Rs. 61,758 in 2011 to Rs. 82,455 in 2012; in the same period, the usage
of private bank customers rose from Rs. 39,368 to Rs. 47,370. As the number of credit card
users increases worldwide, the opportunities for fraudsters to steal credit card details
and, subsequently, commit fraud also grow.

2. PROBLEM STATEMENT

The credit card fraud detection problem involves modeling past credit card transactions with
the knowledge of the ones that turned out to be fraudulent. This model is then used to
identify whether a new transaction is fraudulent or not. Our aim is to detect 100% of the
fraudulent transactions while minimizing incorrect fraud classifications. The major problem
with fraudulent transactions is that they result in a loss of funds for the customer as well
as the organization. Whenever a fraud happens, the customer loses faith in the organization;
catching the fraud helps restore that faith. Credit card fraud is a major problem for
financial institutions worldwide, and the annual losses it causes run to billions of dollars.
This can be observed in many financial reports. For example (Bhattacharyya et al., 2011), the
10th annual online fraud report by CyberSource estimates the loss due to online fraud at
$4 billion for 2008, an 11% increase over the $3.6 billion loss in 2007. Fraud in the United
Kingdom alone was estimated at £535 million in 2007 and now costs around 13.9 billion a year.
From 2006 to 2008, the UK alone lost £427.0 million to £609.90 million due to credit and
debit card fraud (Woolsey & Schilz, 2011). Although such losses have decreased somewhat since
the government and banks implemented detection and prevention systems, card-not-present fraud
losses are increasing at a higher rate because of online transactions. Worse still, this
fraud keeps growing in unprotected and undetected ways.

The major problem is that online payments do not require a physical card. Anyone who knows
the details of the card can make a fraudulent transaction. Currently, the cardholder gets to
know about the transaction only after the fraudulent transaction has occurred. There is no
mechanism to check these transactions. These issues serve as further motivation for
developing a credit card fraud detection system.

Over the years, governments and banks have taken some steps to subdue these frauds, but as
fraud detection and control methods evolve, perpetrators also evolve their methods and
practices to avoid detection. Thus, effective and innovative methods need to be developed
that evolve according to the need.

3. MOTIVATION

The real motivation for detecting credit card fraud comes from the fact that, after a
fraudulent transaction, the customer may lose faith in online transactions, and this may cost
the company its customer. The company’s image may also be tarnished in the market, which may
cause it to lose further customers. The popularity of online shopping is growing day by day,
and credit cards are an easy way to shop online. According to an ACNielsen study conducted in
2005, one-tenth of the world’s population shops online; the same study also mentions that
credit cards are the most popular mode of online payment.

Nowadays customers prefer credit cards, the most widely accepted payment mode, as a
convenient way of paying bills and shopping online. At the same time, the risk of fraudulent
credit card transactions is a major problem that should be avoided. Many data mining
techniques are available to reduce these risks effectively.

4. OBJECTIVE AND SCOPE

The major objective of this credit card fraud detection system is to detect fraudulent
transactions among the genuine transactions. This will help eradicate cyber crimes to a large
extent and thus provide customers with a safe and healthy environment in which to carry out
online transactions.

Although incidences of credit card fraud are limited to about 0.1% of all card transactions,
they have resulted in huge financial losses as the fraudulent transactions have been large
value transactions. In 1999, out of 12 billion transactions made annually, approximately 10
million—or one out of every 1200 transactions—turned out to be fraudulent. Also, 0.04% (4
out of every 10,000) of all monthly active accounts were fraudulent. Even with tremendous
volume and value increase in credit card transactions since then, these proportions have
stayed the same or have decreased due to sophisticated fraud detection and prevention
systems. Today's fraud detection systems are designed to prevent one-twelfth of one percent
of all transactions processed which still translates into billions of dollars in losses. To prevent
being "charged back" for fraud transactions, merchants can sign up for services offered by
Visa and MasterCard called Verified by Visa and MasterCard SecureCode, under the
umbrella term 3-D Secure. This requires consumers to add additional information to confirm
a transaction. Often enough online merchants do not take adequate measures to protect their
websites from fraud attacks, for example by being blind to sequencing. In contrast to more
automated product transactions, a clerk overseeing "card present" authorization requests must
approve the customer's removal of the goods from the premises in real time. Credit card
merchant associations, like Visa and MasterCard, receive profits from transaction fees,
charging between 0% and 3.25% of the purchase price plus a per transaction fee of between
0.00 USD and 40.00 USD. Cash costs more to bank up, so it is worthwhile for merchants to
take cards. Issuers are thus motivated to pursue policies which increase the money transferred
by their systems. Many merchants believe this pursuit of revenue reduces the incentive for
credit card issuers to adopt procedures to reduce crime, particularly because the cost of
investigating a fraud is usually higher than the cost of just writing it off. These costs are
passed on to the merchants as "chargebacks". This can result in substantial additional costs:
not only has the merchant been defrauded for the amount of the transaction, he is also obliged
to pay the chargeback fee, and to add insult to injury the transaction fees still stand.
Additionally, merchants may lose their merchant account if their percent of chargeback to
overall turnover exceeds some value related to their type of product or service sold.

Merchants have started to request changes in state and federal laws to protect themselves and
their consumers from fraud, but the credit card industry has opposed many of the requests. In
many cases, merchants have little ability to fight fraud, and must simply accept a proportion
of fraud as a cost of doing business.

The main objective of this project is to observe and implement an optimization method for
the artificial neural network (ANN) and to compare and evaluate the results. Since ANNs are
mainly used in developing artificial intelligence and expert systems, optimization is a
necessity. Thus, our main aim in this project is to find an efficient method of optimizing a
neural network for the classification of good and bad credit transactions.

5. LITERATURE REVIEW
In this section we describe the Python libraries used, the algorithms that have been
implemented in the system, and other algorithms that could have been used, such as Random
Forest, Decision Tree, Naïve Bayes, K-Nearest Neighbor, Logistic Regression, Artificial
Neural Networks and Support Vector Machine (SVM).

5.1 Libraries Used


Some of the main libraries used are Numpy, Pandas, Tensorflow, Keras, Matplotlib, Scikit-
learn, Theano etc.

5.1.1 Numpy

Figure 1 Numpy

NumPy is a very popular Python library for processing large multi-dimensional arrays and
matrices, supported by a large collection of high-level mathematical functions. It is very
useful for fundamental scientific computations in machine learning, particularly for its
linear algebra, Fourier transform and random number capabilities. High-end libraries such as
TensorFlow rely on NumPy arrays in their Python interface for exchanging tensor data.

5.1.2 Scikit-learn

Figure 2 Scikit-learn

Scikit-learn is one of the most popular ML libraries for classical ML algorithms. It is built
on top of two basic Python libraries, NumPy and SciPy. Scikit-learn supports most of the
supervised and unsupervised learning algorithms. It can also be used for data mining and data
analysis, which makes it a great tool for anyone who is starting out with ML.

5.1.3 Theano
Figure 3 Theano

Machine learning is essentially mathematics and statistics. Theano is a popular Python
library that is used to define, evaluate and optimize mathematical expressions involving
multi-dimensional arrays in an efficient manner. It achieves this by optimizing the
utilization of the CPU and GPU. It also provides extensive unit-testing and self-verification
to detect and diagnose different types of errors. Theano is a very powerful library that has
been used in large-scale, computationally intensive scientific projects for a long time, yet
it is simple and approachable enough to be used by individuals for their own projects.

5.1.4 Keras

Figure 4 Keras

Keras is a very popular machine learning library for Python. It is a high-level neural
networks API capable of running on top of TensorFlow, CNTK, or Theano. It can run seamlessly
on both CPU and GPU. Keras makes it really easy for ML beginners to build and design a neural
network. One of the best things about Keras is that it allows for easy and fast prototyping.

5.1.5 Tensorflow

Figure 5 Tensorflow

TensorFlow is a very popular open-source library for high performance numerical computation
developed by the Google Brain team at Google. As the name suggests, TensorFlow is a framework
for defining and running computations involving tensors. It can train and run deep neural
networks that can be used to develop several AI applications. TensorFlow is widely used in
the field of deep learning research and application.

5.1.6 Pandas

Figure 6 Pandas

Pandas is a popular Python library for data analysis. It is not directly related to machine
learning, but a dataset must be prepared before training, and Pandas comes in handy here
because it was developed specifically for data extraction and preparation.

5.1.7 Matplotlib

Figure 7 Matplotlib

Matplotlib is a very popular Python library for data visualization. Like Pandas, it is not
directly related to machine learning. It comes in particularly handy when a programmer wants
to visualize the patterns in the data. It is a 2D plotting library used for creating 2D
graphs and plots. A module named pyplot makes plotting easy for programmers, as it provides
features to control line styles, font properties, axis formatting, etc. It provides various
kinds of graphs and plots for data visualization, such as histograms, error charts and bar
charts.

5.1.8 Scipy

Figure 8 Scipy

SciPy is a very popular library among machine learning enthusiasts, as it contains different
modules for optimization, linear algebra, integration and statistics. There is a difference
between the SciPy library and the SciPy stack: the SciPy library is one of the core packages
that make up the SciPy stack. SciPy is also very useful for image manipulation.

5.1.9 Pytorch

Figure 9 Pytorch

PyTorch is a popular open-source machine learning library for Python based on Torch, an
open-source machine learning library implemented in C with a wrapper in Lua. It has an
extensive choice of tools and libraries that support computer vision, natural language
processing (NLP) and many more ML programs. It allows developers to perform computations on
tensors with GPU acceleration and also helps in creating computational graphs.

5.2 Machine Learning Algorithms
Various machine learning algorithms can be used for this task, such as Decision Trees, Naïve
Bayes and K Nearest Neighbor, as well as the implemented algorithms, Logistic Regression and
Neural Networks.

5.2.1 Decision Trees


The decision tree algorithm is a data mining induction technique that recursively partitions
a data set of records, using a depth-first greedy approach or a breadth-first approach, until
all the data items in a partition belong to a single class. A decision tree is made up of
root, internal and leaf nodes, and the tree structure is used to classify unknown data
records. At each internal node of the tree, the best split is chosen using an impurity
measure. The leaves of the tree hold the class labels into which the data items have been
grouped. In this method, credit card fraud detection uses a decision tree learning algorithm.
The focus is on information-gain-based decision tree learning, in which the best split is
estimated with purity measures such as Gini, entropy and information gain ratio in order to
select the best classifying attribute. In this technique, a fraudulent customer or merchant
is found by tracing fake mail and IP addresses: a customer or merchant is treated as
suspicious if the mail is fake, and all information about the owner or sender is then traced
through the IP address, which can reveal the location of the customer. The decision tree is a
powerful data mining technique and a vital part of credit card fraud detection.
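
As an illustration of the impurity measures mentioned above, the following minimal sketch
computes Gini impurity, entropy and information gain for one candidate split. The function
names and the toy labels (0 for genuine, 1 for fraud) are assumptions made for this example
and are not taken from the project code.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical labels at one node and after a candidate split.
parent = [0, 0, 0, 0, 1, 1]
left, right = [0, 0, 0, 0], [1, 1]

print("Gini of parent:", gini(parent))
print("Information gain of split:", information_gain(parent, left, right))
```

A split that separates the classes perfectly, as in this toy case, leaves both children with
zero entropy, so the information gain equals the entropy of the parent node.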

5.2.2 Naïve Bayes

Figure 10 Posterior probability

The Naïve Bayes classifier is a simple and powerful algorithm for the classification task.
Even when working on a data set with millions of accounts and several attributes, it is
worth trying the Naïve Bayes approach. Naïve Bayes is a supervised machine learning algorithm
that uses a training dataset with known target classes to predict the class of future
instances. In general terms, the Naïve Bayes technique assumes that the presence or absence
of a particular attribute does not depend on the presence or absence of the other attributes
in the same set. The technique is called naïve because it assumes the independence of the
attributes given the class. Classification is then done by using Bayes’ rule to compute the
probability of each class. It estimates membership probabilities for every class, that is,
the probability that a given record or data point belongs to a particular class. The class
with the maximum probability is considered the most likely class. This is also known as
Maximum a Posteriori (MAP).
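
The MAP decision described here can be sketched in a few lines. The priors and per-attribute
likelihoods below are made-up numbers chosen only to illustrate the calculation; they are
assumptions and do not come from the project's dataset.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule under the naive independence assumption.

    priors:      {class_name: P(class)}
    likelihoods: {class_name: [P(attribute_i | class), ...]}
    Returns {class_name: P(class | attributes)}.
    """
    joint = {}
    for cls, prior in priors.items():
        p = prior
        for likelihood in likelihoods[cls]:
            p *= likelihood  # attributes are treated as independent given the class
        joint[cls] = p
    evidence = sum(joint.values())
    return {cls: p / evidence for cls, p in joint.items()}


# Hypothetical values for a transaction with two observed attributes.
priors = {"genuine": 0.99, "fraud": 0.01}
likelihoods = {"genuine": [0.05, 0.10], "fraud": [0.60, 0.70]}

post = posterior(priors, likelihoods)
map_class = max(post, key=post.get)  # class with the maximum posterior probability
print(post, map_class)
```

With these illustrative numbers the strong prior for genuine transactions still outweighs the
fraud likelihoods, which shows how class imbalance influences the MAP decision.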

5.2.3 K Nearest Neighbor


The concept of nearest neighbor analysis has been used in several anomaly detection
techniques. One of the best classifier algorithms used in credit card fraud detection is the
k-nearest neighbor algorithm, a supervised learning algorithm in which a new query instance
is classified according to the majority category of its k nearest neighbors. The performance
of the KNN algorithm is influenced by three main factors:

• The distance metric used to locate the nearest neighbors.

• The distance rule used to derive a classification from k-nearest neighbor.

• The number of neighbors used to classify the new sample.

Among the various credit card fraud detection methods of supervised statistical pattern
recognition, the K Nearest Neighbor rule achieves consistently high performance, without a
priori assumptions about the distributions from which the training examples are drawn.
K-nearest-neighbor-based credit card fraud detection techniques require a distance or
similarity measure defined between two data instances. In KNN, an incoming transaction is
classified by finding the points nearest to it: if the nearest neighbors are fraudulent, the
transaction is flagged as fraud. K is usually chosen to be small and odd (typically 1, 3 or
5) so that ties are broken. Larger values of K can help reduce the effect of a noisy data
set. The distance between two data instances can be calculated in different ways. For
continuous attributes, Euclidean distance is a good choice. For categorical attributes, a
simple matching coefficient is often used. For multivariate data, the distance is usually
calculated for each attribute and then combined. The performance of the KNN algorithm can be
improved by optimizing the distance metric. This technique requires legitimate as well as
fraudulent samples of data for training. It is a fast technique, but it produces a high rate
of false alerts.
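
A minimal sketch of this procedure, assuming continuous attributes, Euclidean distance and
majority voting, is given below. The two-feature toy transactions and the function names are
hypothetical and serve only to show the steps.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted((euclidean(x, query), y) for x, y in zip(train_X, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: label 0 = genuine, label 1 = fraud.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
train_y = [0, 0, 0, 1, 1]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))
print(knn_predict(train_X, train_y, (5.1, 5.0), k=3))
```

An odd k, as suggested in the text, avoids tied votes in this two-class setting.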

5.2.4 Logistic Regression

Logistic regression is the appropriate regression analysis to conduct when the dependent
variable is dichotomous (binary). Like all regression analyses, the logistic regression is a
predictive analysis. Logistic regression is used to describe data and to explain the
relationship between one dependent binary variable and one or more nominal, ordinal,
interval or ratio-level independent variables.
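
To make the relationship concrete, the sketch below shows how a logistic regression model
turns a weighted sum of independent variables (the log odds) into a probability for the
binary outcome. The weights, bias and feature values are illustrative assumptions, not fitted
parameters from this project.

```python
import math


def log_odds(p):
    """Logit: natural log of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of the logit; maps any real-valued log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))  # linear log odds
    return sigmoid(z)


# Hypothetical model with two independent variables.
weights, bias = [0.8, -1.5], -2.0
p = predict_proba([3.0, 0.2], weights, bias)
print("P(fraud) =", p, "-> class", int(p >= 0.5))
```

The model is linear in the log odds, which is why each coefficient is usually interpreted as
the change in log odds per unit change of its variable.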

Sometimes logistic regressions are difficult to interpret; the Intellectus Statistics tool
allows you to conduct the analysis easily and then interprets the output in plain English.
Logistic regression makes several assumptions. The dependent variable should be dichotomous
in nature (e.g., present vs. absent). There should be no outliers in the data, which can be
assessed by converting the continuous predictors to standardized scores. There should be no
high correlations (multicollinearity) among the predictors. This can be assessed with a
correlation matrix of the predictors: as long as the correlation coefficients among the
independent variables are less than 0.90, the assumption is met. At the center of logistic
regression analysis is the task of estimating the log odds of an event.
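
The multicollinearity check described above can be sketched as follows, using the Pearson
correlation coefficient and the 0.90 limit quoted in the text. The column names and values
are hypothetical, and the sketch assumes that no column is constant (a constant column would
make the correlation undefined).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def collinear_pairs(columns, limit=0.90):
    """Return the pairs of predictors whose absolute correlation reaches `limit`."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if abs(r) >= limit:
                flagged.append((names[i], names[j]))
    return flagged


# Hypothetical predictors; "amount_x2" is almost a multiple of "amount".
columns = {
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0],
    "amount_x2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "hour": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(collinear_pairs(columns))
```

Any pair reported by such a check would violate the assumption, and one of the two predictors
would typically be dropped or combined with the other before fitting.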

When selecting the model for the logistic regression analysis, another important consideration
is the model fit. Adding independent variables to a logistic regression model will always
increase the amount of variance explained in the log odds (typically expressed as R²).
However, adding more and more variables to the model can result in overfitting, which
reduces the generalizability of the model beyond the data on which the model is fit.

Numerous pseudo-R² values have been developed for binary logistic regression. These
should be interpreted with extreme caution as they have many computational issues which
cause them to be artificially high or low. A better approach is to present any of the goodness
of fit tests available; Hosmer-Lemeshow is a commonly used measure of goodness of fit
based on the Chi-square test.

5.2.5 Neural Networks


Neural network based fraud detection is modeled on the working principle of the human brain.
Neural network technology has made computers capable of a form of thinking. Just as the human
brain learns from past experience and uses that knowledge to make decisions in daily life,
the same technique is applied in credit card fraud detection. When a particular consumer uses
a credit card, the way the card is used forms a fixed pattern. Using the data of the last one
or two years, the neural network is trained on the particular pattern of credit card use of a
particular consumer. As shown in the figure, the neural network is trained on several
categories of information about the cardholder. The occupation and income of the cardholder
may fall into one category, while another category holds information about large purchases,
such as the number of large purchases, their frequency and the locations where they take
place within a fixed time period. In addition to the pattern of credit card use, the neural
network is also trained on the various credit card frauds previously faced by a particular
bank. Based on the pattern of credit card use, the neural network applies a prediction
algorithm to this pattern data to classify whether a particular transaction is fraudulent or
genuine. When a credit card is used by an unauthorized user, the neural network based fraud
detection system compares the pattern used by the fraudster with the pattern of the original
cardholder on which the network has been trained; if the patterns match, the neural network
declares the transaction okay.

When a transaction arrives for authorization, it is characterized by a stream of
authorization data fields that carry information identifying the cardholder (account number)
and characteristics of the transaction (e.g., amount, merchant code). There are additional
data fields that can be taken in a feed from the authorization system (e.g., time of day). In
most cases, banks do not archive logs of their authorization files. Only transactions that
are forwarded by the merchant for settlement are archived by the bank’s credit card
processing system. Thus, a data set of transactions was composed from an extract of data
stored in the bank’s settlement file. In this extract, only the authorization information
that was archived to the settlement file was available for model development.

Figure 11 neural networks

Matching the pattern does not mean that the transaction must match the pattern exactly.
Rather, the neural network checks how large the difference is: if the transaction is close to
the pattern, the transaction is ok; if there is a big difference, the chance that the
transaction is illegal increases and the neural network declares it a fraudulent transaction.
The neural network is designed to produce a real-valued output between 0 and 1. If the output
is below 0.6 or 0.7, the transaction is ok; if the output is above 0.7, the chance that the
transaction is illegal increases. There are occasions when a transaction made by a legal user
is quite different from the usual pattern, and it is also possible for an illegal person to
use the card in a way that fits the pattern on which the neural network was trained. Although
it is rare, if a legal user cannot complete a transaction because of these limitations, it is
not much to worry about. As for the illegal person, human tendency also plays a part: someone
who gets hold of a credit card is not going to use it again and again for a number of small
transactions; rather, he will try to make purchases that are as large and as quick as
possible, which may totally mismatch the pattern on which the neural network was trained. In
the design of neural network-based pattern recognition systems, there is always a process of
[...] business (e.g., jewelry store, consumer electronics, restaurant, hotel, etc.). History
descriptors contain features characterizing the use of the card for transactions and the
payments made to the account over some immediately prior time interval. Other descriptors can
include such factors as the date of issue (or most recent reissue) of the card. This can be
important for the detection of NRI (non-receipt of issue) fraud.

Figure 12 neural network

The neural network used in this fraud detection is a three-layer, feed-forward network that
uses two training passes through the data set. The first training pass involves a process of
prototype cell commitment in which exemplars from the training set are stored in the weights
between the first and second (middle) layer cells of the network. A final training pass
determines local a posteriori probabilities associated with each of these prototype cells.
P-RCE training is not subject to the problems of convergence that can afflict
gradient-descent training algorithms. The P-RCE network and networks like it have been
applied to a variety of pattern recognition problems both within and beyond the field of
financial services, from character recognition to mortgage underwriting and risk assessment.
The output layer consisted of a single cell that outputs a numeric response that can be
considered as a “fraud score”. This is analogous to credit scoring systems that produce a
score, as opposed to a strict probability. The objective of the neural network training
process is to arrive at a trained network that produces a fraud score that gives the best
ranking.
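
The idea of a three-layer network whose single output cell produces a fraud score between 0
and 1 can be sketched with a generic feed-forward pass. This is not the P-RCE network
described above: it is a plain sigmoid network with hand-picked, hypothetical weights, and
the 0.7 cut-off is the threshold mentioned earlier in this section.

```python
import math


def sigmoid(z):
    """Squash a real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the incoming weights of cell j."""
    return [sigmoid(b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]


def fraud_score(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden (middle) layer -> single output cell."""
    hidden = dense(features, hidden_w, hidden_b)
    return dense(hidden, [out_w], [out_b])[0]


def classify(score, threshold=0.7):
    """Flag a transaction when its score exceeds the threshold."""
    return "suspicious" if score > threshold else "ok"


# Hypothetical, untrained weights for two input features and two hidden cells.
hidden_w, hidden_b = [[1.5, -2.0], [0.5, 3.0]], [0.0, -1.0]
out_w, out_b = [-1.0, 2.5], -0.5

score = fraud_score([0.2, 0.9], hidden_w, hidden_b, out_w, out_b)
print(score, classify(score))
```

In practice the weights would be learned from labeled transactions, and the threshold would be
tuned to balance missed frauds against false alerts.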

6. PROPOSED METHODOLOGY
Use case diagrams model the functionality of a system using actors and use cases. Use cases
are a set of actions, services and functions that the system needs to perform. A data-flow
diagram (DFD) is a way of representing the flow of data through a process or a system
(usually an information system). The DFD also provides information about the outputs and
inputs of each entity and of the process itself. Data-flow diagrams can be regarded as
inverted Petri nets, because places in such networks correspond to the semantics of data
memories. Analogously, the semantics of transitions from Petri nets and of data flows and
functions from data-flow diagrams should be considered equivalent.

6.1 Flow Chart

Figure 13 Flow chart

A flowchart is a type of diagram that represents a workflow or process. A flowchart can also
be defined as a diagrammatic representation of an algorithm, a step-by-step approach to
solving a task. Flowcharts are used in analyzing, designing, documenting or managing a
process or program in various fields, and in particular for designing and documenting simple
processes or programs. Like other types of diagrams, they help visualize what is going on and
thereby help in understanding a process, and perhaps also in finding less obvious features
within the process.

6.2 Dataflow Diagram

Figure 14 Data flow diagram

A data-flow diagram is a way of representing the flow of data through a process or a system.
The DFD also provides information about the outputs and inputs of each entity and of the
process itself. A data-flow diagram has no control flow: there are no decision rules and no
loops.

7. HARDWARE AND SOFTWARE REQUIREMENTS
Hardware requirements are:
• System : Pentium V 2.4 GHz
• Hard Disk : 40 GB
• Floppy Drive : 1.44 MB
• Monitor : 15 VGA Colour
• RAM : 256 MB

Software requirements are:


• Anaconda Navigator
• PyCharm
• Spyder
• HTML and Flask
• Processor i7
• Jupyter
• Coding Language – Python

8. RESULTS
8.1 Observations
1. Total/Net Accuracy: One approach to gauging the correctness of the computed model is to
use accuracy as the deciding parameter. But, as stated earlier, in a highly skewed data set
like this one, even if we predict all values as non-fraudulent, the accuracy will still be
very high, so accuracy alone is not a reliable measure.

2. Confusion Matrix: In machine learning, and specifically in statistical classification, a
confusion matrix, also known as an error matrix, is a specific table layout that allows
visualization of the performance of an algorithm, typically a supervised learning one. Merely
tabulating the confusion matrix will not provide a clear understanding of the performance of
the model. This is because the total number of fraud cases is very small, so the variation in
the confusion matrix will be so small that it would be equivalent to an acceptable error in a
balanced dataset. So, this measure is also ruled out.
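
The effect of class imbalance on these two measures can be illustrated with a small sketch.
The labels below are synthetic (998 genuine and 2 fraudulent transactions, an assumed ratio
chosen for illustration), and the "model" simply predicts every transaction as non-fraudulent.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 means fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Synthetic, highly skewed labels and a trivial all-negative predictor.
y_true = [0] * 998 + [1] * 2
y_pred = [0] * 1000

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("tp, tn, fp, fn =", tp, tn, fp, fn)
print("accuracy =", accuracy(tp, tn, fp, fn))
```

The accuracy is close to 1 even though every fraud case is missed, and the only sign of the
failure in the confusion matrix is a very small false-negative count.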

8.2 Histogram of Parameters

Figure 15 Histogram

8.3 Code for Logistic Regression

8.4 Code for Neural Networks

Running the Epochs for Neural Networks

8.5 Screenshot of UI

8.6 Setting the Route

8.7 Front-end of the System

9. CONCLUSION

In the present scenario, where online payments and the fraud associated with them are
increasing drastically, a security system that can detect fraud is highly necessary. In this
report we have seen various algorithms that can be used for this purpose and thus make
transactions more secure. Machine learning, when implemented in Python, can provide us with
the necessary modules. The data set can be classified into class 0 and class 1, which
determine whether a transaction is a fraud or not.
This model has a lot of future scope: it can be used whenever a transaction is made and thus
prevent cyber crimes to a certain degree. Credit card fraud has become more and more rampant
in recent years. Improving merchants’ risk management in an automatic and efficient way, and
building an accurate and easy-to-handle credit card risk monitoring system, is one of the key
tasks for merchant banks. One aim of this study is to identify the user model that best
identifies fraud cases. There are many ways of detecting credit card fraud. If one of these
algorithms, or a combination of them, is applied in a bank’s credit card fraud detection
system, the probability of fraudulent transactions can be predicted by the bank soon after
the credit card transactions take place. This report contributes towards effective ways of
detecting credit card fraud. Fraud detection is a complex issue that requires a substantial
amount of planning before throwing machine learning algorithms at it. Nonetheless, it is also
an application of data science and machine learning for the good, which makes sure that the
customer’s money is safe and not easily tampered with.
Credit card fraud detection is an important application of outlier detection. Due to the
drastic increase in digital fraud, billions of dollars are lost, and various techniques have
therefore evolved for fraud detection and been applied to diverse business fields.
