
“CREDIT CARD FRAUD DETECTION IN MACHINE LEARNING”
A project report submitted to
Chhattisgarh Swami Vivekanand Technical University, Bhilai (C.G.), India

For partial fulfillment of the award of the Degree of
Bachelor of Technology in Computer Science & Engineering
By
GAURAV RAJ (BA3578)
Under the Guidance of
Asst. Prof. Abhishek Saw
Asst. Prof., Computer Science & Engineering
RITEE, Raipur (C.G.)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


RAIPUR INSTITUTE OF TECHNOLOGY, RAIPUR
Chhatauna, Mandir Hasaud, Raipur, Chhattisgarh, India
Phone: 0771-3208842 FAX: 0771-2537634
Email: contactus@rit.edu.in, Website: www.rit.edu.in
Session: 2019-20

DECLARATION BY THE CANDIDATE

We, the undersigned, solemnly declare that the Minor Project work entitled
“CREDIT CARD FRAUD DETECTION SYSTEM USING MACHINE
LEARNING” is based on our own work carried out during the course of our study
under the supervision of Mr. Abhishek Saw.

We assert that the statements made and conclusions drawn are an outcome of the
project work. We further declare that, to the best of our knowledge and belief,
the report does not contain any part of any work which has been submitted for the
award of any other degree/diploma/certificate in this University/deemed
University of India or any other country.

.…………………………
(Signature of the Candidate)

.…………………………
(Signature of the Candidate)

……………………………
(Signature of the Candidate)

CERTIFICATE BY THE EXAMINERS

The project entitled “CREDIT CARD FRAUD DETECTION SYSTEM”
submitted by GAURAV RAJ, Er. No. BA3578, has been examined by the
undersigned as a part of the examination and is hereby recommended for the award
of the degree of Bachelor of Technology in Computer Science & Engineering of
Chhattisgarh Swami Vivekanand Technical University, Bhilai (C.G.).

__________________ __________________

Internal Examiner External Examiner


Date: Date:

CERTIFICATE BY THE SUPERVISOR


This is to certify that the thesis entitled “CREDIT CARD FRAUD DETECTION
SYSTEM USING MACHINE LEARNING” is a record of research work carried
out by GAURAV RAJ under my guidance and supervision for the award of the degree
of Bachelor of Engineering in the faculty of Computer Science & Engineering of
Chhattisgarh Swami Vivekanand Technical University, Bhilai (C.G.), India.
To the best of my knowledge and belief, the thesis
1. Embodies the work of the candidate themselves.
2. Has duly been completed.
3. Fulfils the requirement of the ordinance relating to the B.E. degree of the
university.
4. Is up to the desired standard, both in respect of contents and language, for
being referred to the examiners.

Forwarded to Chhattisgarh Swami Vivekanand Technical University, Bhilai (C.G.)

Signature of Guide

……………………………
ACKNOWLEDGEMENT
The pleasure, the achievement, the glory, the satisfaction, the reward, the appreciation
and the construction of our project cannot be thought of without the few who,
apart from their regular schedule, spared their valuable time. A number of persons
contributed, either directly or indirectly, to shaping and achieving the desired
outcome.

We express our sincere thanks to our supervisor, Asst. Prof. Abhishek Saw,
Department of Computer Science & Engineering, Raipur Institute of
Technology, Raipur, for his valuable guidance, suggestions and the help required for
executing the project work from time to time. Without his direction and motivation, it
would have been nearly impossible for us to achieve the level of target planned. We
also thank him for providing us with an opportunity to develop this project. Through
his timely advice, constructive criticism and supervision, he was an inspiration for us.

Last but not least, we are really thankful to our parents for always
encouraging us in our studies, and also to our friends who directly or indirectly
helped us in this work.
ABSTRACT
Financial fraud is an ever-growing menace with far-reaching consequences in the financial industry.
Data mining has played an imperative role in the detection of credit card fraud in online
transactions. Credit card fraud detection, which is a data mining problem, is
challenging for two major reasons: first, the profiles of normal and fraudulent behaviours
change constantly, and secondly, credit card fraud data sets are highly skewed. The
performance of fraud detection in credit card transactions is greatly affected by the sampling
approach applied to the dataset, the selection of variables and the detection technique(s) used. This work
investigates the performance of naïve Bayes, k-nearest neighbor and logistic regression on
highly skewed credit card fraud data. The dataset of credit card transactions is sourced from
European cardholders and contains 284,807 transactions. A hybrid technique of under-sampling
and oversampling is carried out on the skewed data. The three techniques are
applied to the raw and preprocessed data. The work is implemented in Python. The
performance of the techniques is evaluated based on accuracy, sensitivity, specificity,
precision, Matthews correlation coefficient and balanced classification rate. The results
show optimal accuracies for the naïve Bayes, k-nearest neighbor and logistic regression
classifiers of 97.92%, 97.69% and 54.86% respectively. The comparative results show that
k-nearest neighbor performs better than the naïve Bayes and logistic regression techniques.

CHAPTER 1
Introduction

The PwC global economic crime survey of 2016 suggests that approximately 36% of
organizations experienced economic crime. There is therefore a clear need to
solve the problem of credit card fraud detection. The task of fraud detection often
boils down to outlier detection, in which a dataset is scanned to find
potential anomalies in the data. In the past, this was done by employees who
checked all transactions manually. With the rise of machine learning, artificial
intelligence, deep learning and other relevant fields of information technology, it
has become feasible to automate this process and to save some of the intensive
labor that is put into detecting credit card fraud. In the following sections, my
machine learning based approach in Python is explained.

CHAPTER -2
Introduction to Project and Working :
Due to the rise and acceleration of e-commerce, there has been tremendous use of
credit cards for online shopping, which has led to a high number of credit card
frauds. In the era of digitalization, identifying credit card fraud is a necessity.
Fraud detection involves monitoring and analyzing the behavior of various users in
order to estimate, detect or avoid undesirable behavior. In order to detect credit
card fraud effectively, we need to understand the various technologies,
algorithms and types involved in detecting credit card fraud. An algorithm can
differentiate between fraudulent and non-fraudulent transactions. To find fraud, the
algorithms need to be given a dataset along with knowledge of fraudulent transactions.
They analyze the dataset and classify all transactions.

Outlier detection is an important problem with several applications. The goal in outlier
detection is to find those data points that contain useful information on abnormal
behavior of the system described by the data. Such data points are a small
percentage of the total population, and identifying and understanding them accurately
is critical for the health of the system. Credit card fraud detection is one such problem
that is often formulated as an outlier detection problem. Credit card fraud is one
of the common types of fraud that occur in e-commerce marketplaces, and it is
important to have robust mechanisms in place to detect it. [...] dataset of the UCI machine
learning repository, the modified version of the ann-thyroid dataset of the UCI machine
learning repository and the credit card fraud detection dataset available in Kaggle.

The Isolation Forest algorithm isolates observations by randomly selecting a feature
and then randomly selecting a split value between the maximum and minimum values
of the selected feature. The argument goes as follows: isolating anomalous observations is
easier because only a few conditions are needed to separate those cases from the
normal observations. On the other hand, isolating normal observations requires more
conditions. Therefore, an anomaly score can be calculated from the number of
conditions required to separate a given observation: the fewer conditions needed,
the more anomalous the observation.
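
The isolation idea described above can be illustrated with a minimal sketch in plain Python. It is not the code used in this project: the function names, the synthetic two-feature data and the parameter values are assumptions made for illustration, and the sub-sampling and score normalisation of the full Isolation Forest algorithm are left out for brevity.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` within `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))          # randomly select a feature
    values = [row[feature] for row in data]
    low, high = min(values), max(values)
    if low == high:                              # nothing left to split on
        return depth
    split = rng.uniform(low, high)               # random split between min and max
    # Keep only the rows that fall on the same side of the split as the point.
    if point[feature] < split:
        side = [row for row in data if row[feature] < split]
    else:
        side = [row for row in data if row[feature] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)


def mean_path_length(point, data, n_trees=100, seed=0):
    """Average number of splits over many random trees (lower = more anomalous)."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


# Synthetic data (an assumption): 200 "normal" points plus one far-away outlier.
gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data.append(outlier)
center = min(data, key=lambda p: p[0] ** 2 + p[1] ** 2)

outlier_depth = mean_path_length(outlier, data)
center_depth = mean_path_length(center, data)
print("outlier:", outlier_depth, "central point:", center_depth)
```

In this sketch the far-away point should need noticeably fewer splits than a point in the middle of the cluster, which is the behaviour the anomaly score relies on.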
• The selected dataset contains records of cardholders who made transactions using
credit cards in September 2013. Of the 284,807 transactions in the dataset, 492 are
fraudulent. The dataset is in comma-separated values (CSV) format, which
stores data in tabular form. The values
are numerical because a PCA (Principal Component Analysis) transformation has been
applied to the input values. This transformation is done so that the users' personal
details remain hidden and their security is maintained. The columns
headed V1 to V28 hold the PCA-transformed numeric values, while the Time, Amount
and Class features show their genuine values. When dealing with
huge databases it is sometimes not possible to inspect each value in detail,
hence a graphical representation of the data makes observation easier. For this
dataset, Time, Amount, Class and columns V1 to V28, 31 features in total, are
represented in the form of histograms. A histogram is an accurate representation
of the distribution of numerical data. The Time feature shows the time elapsed
between each transaction and the first transaction in the dataset, while Amount shows
the actual transaction amount. Class is
the result variable, which takes the values 0 and 1: 1 for fraudulent
transactions and 0 for valid transactions.
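
To make the class imbalance concrete, the following sketch counts the Class values in a CSV and computes the share of fraudulent transactions from the figures quoted above. The column names follow the description in the text; the three sample rows are invented for illustration (with only two of the V columns shown), and the helper names are assumptions.

```python
import csv
import io
from collections import Counter

# Invented sample rows with the column layout described above (not real data).
SAMPLE_CSV = """Time,V1,V2,Amount,Class
0,-0.52,0.11,49.90,0
3,1.07,-0.34,12.50,0
410,-2.80,1.65,1.00,1
"""


def class_counts(csv_text):
    """Count how many rows fall into each Class value (0 = valid, 1 = fraud)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row["Class"]) for row in reader)


def fraud_share(n_fraud, n_total):
    """Fraction of transactions that are fraudulent."""
    return n_fraud / n_total


counts = class_counts(SAMPLE_CSV)
print(counts)

# Figures quoted in the text: 492 frauds among 284,807 transactions.
share = fraud_share(492, 284807)
print(f"{share:.3%}")
```

With the quoted figures the fraudulent share is well under one percent, which is why the dataset is described as highly skewed.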
In the random forest algorithm, decision trees are the main components. A decision
tree can be used for both classification and regression. The decision tree is one of the
most powerful and popular methods for classification and prediction. It is a tree-like
structure in which each internal node denotes a test on an attribute, each branch
represents an outcome of that test (in binary
classification the answer takes the form true or false, 1 or 0, yes or no) and each leaf
node (terminal node) holds a decision or classification. To construct a
decision tree, the source set is split into subsets based on an attribute value test.
This process is then repeated on each derived subset, which is called recursive
partitioning. When splitting no longer adds value to the predictions, the recursion is
complete. Example of a decision tree: whether Mark will play cricket today or not.
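
Recursive partitioning can be sketched in a few lines of Python. This is an illustrative sketch, not the project's implementation: the Gini impurity criterion, the two hypothetical features (transaction amount and hour of day), the toy rows and all function names are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, weighted_gini) of the best test, or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a node is pure or splitting stops adding value."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) == 0.0:
        return majority                              # leaf node holds the decision
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):
        return majority                              # no split improves purity
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the attribute tests from the root down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Toy rows: [amount, hour of day]; label 1 = fraudulent, 0 = valid (invented values).
rows = [[20, 14], [35, 10], [900, 3], [15, 16], [1200, 2], [40, 12]]
labels = [0, 0, 1, 0, 1, 0]
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, [1000, 4]), predict(tree, [25, 13]))
```

On this toy data a single test on the amount already separates the two classes, so the recursion stops after one split.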

• Advantages of Decision Tree :
a) It clearly indicates the important fields for classification.
b) It performs classification without much complex computation.
c) It handles both continuous and categorical variables.
d) It generates simple and understandable rules.

Random Forest Algorithm is a supervised learning
algorithm. It is capable of doing both classification and regression. Random
forest is a method that operates by constructing multiple decision trees during
training of the model. The decision voted for by the maximum number of trees is
taken as the output of the random forest algorithm. The number of trees in the forest
and the results are directly related to each other, as a higher number of trees in the
forest generally leads to higher efficiency. For the implementation of the random forest
algorithm, the decision tree is the support tool. We have already discussed the decision
tree. We input a training dataset with labels and pass it to the decision tree module,
which formulates some rules. These rules can then be used to perform predictions.
B. Random Forest Creation :
1) Randomly select ‘r’ features from the total features, where r << total features
2) Among the r features, calculate the node using the best split point
3) Split the node into child nodes using the splitting method
4) Repeat the process for further nodes
5) Follow the above steps ‘n’ times to create ‘n’ trees in the forest
C. Random Forest Predictions :
1) Take the features and use the rules of each randomly created decision tree to
predict the outcome, and store it for further use
2) Calculate the votes for each predicted outcome
3) Consider the highest voted answer as the final prediction from the random
forest algorithm.
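
The creation and prediction steps above can be sketched as follows. This is a simplified illustration rather than the project's code: each tree is limited to a single split (a decision stump), each tree is trained on a bootstrap sample, and the toy data, the Gini criterion and all names are assumptions.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, labels, features):
    """Find the best split point among the chosen features; return a predictor."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2                      # candidate split point
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:                                  # no usable split: constant leaf
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, r=1, seed=0):
    """Create n_trees stumps, each from r random features and a bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # sample with replacement
        features = rng.sample(range(len(rows[0])), r)         # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Collect one vote per tree and return the highest voted answer."""
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Toy rows: [amount, hour of day]; label 1 = fraudulent, 0 = valid (invented values).
rows = [[20, 14], [35, 10], [900, 3], [15, 16], [1200, 2], [40, 12], [1100, 1], [30, 15]]
labels = [0, 0, 1, 0, 1, 0, 1, 0]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [1000, 2]), predict_forest(forest, [25, 13]))
```

A full random forest would repeat the split search at every node of deeper trees (step 4); the single-split version keeps the example short while still showing the random feature selection and the voting.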
CHAPTER -3
1. Software and Hardware Requirement

1.1. Software Requirement


• Jupyter Notebook

• Android IDLE

• Django Framework

• PyCharm IDE

1.2. Hardware Requirement

• Laptop or PC

• Wi-Fi
CHAPTER- 4
2. Flow Diagram

Screen Shot :
CHAPTER-5
Conclusion and Future Enhancement:

Method Used For Calculating Efficiency :
In this project, two algorithms, the Random Forest Algorithm and the Local Outlier
Factor, are compared for detecting fraudulent transactions in the given dataset.
The Random Forest Algorithm is better at detecting frauds than the Local Outlier
Factor. Efficiency is 96% for the Random Forest Algorithm and 99.7% for the
Local Outlier Factor. Credit card fraud can be detected efficiently by both of these
algorithms, but every algorithm has its own specific advantages and disadvantages.
Combining more than one algorithm may give higher efficiency.
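
The report does not state how the efficiency figures were computed. As a hedged illustration, the sketch below derives the evaluation measures named in the abstract (accuracy, sensitivity, specificity, precision, Matthews correlation coefficient and balanced classification rate) from confusion-matrix counts. The function names and the example counts are assumptions, not results from this project.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = fraudulent."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(tp, tn, fp, fn):
    """Compute the evaluation measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # share of frauds caught
    specificity = tn / (tn + fp) if tn + fp else 0.0   # share of valid kept valid
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "bcr": (sensitivity + specificity) / 2,
    }


# Hypothetical case: 1000 transactions, 2 of them fraudulent, and a model that
# labels every transaction as valid.
y_true = [1, 1] + [0] * 998
y_pred = [0] * 1000
scores = evaluate(*confusion_counts(y_true, y_pred))
print(scores)
```

In this hypothetical case the accuracy is 99.8% even though no fraud is detected (sensitivity 0), which is why the other measures matter on highly skewed data.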

Methodology :
We use the Random Forest Algorithm and the Local Outlier Factor to detect
fraudulent credit card transactions in the dataset. The given dataset is in labelled
format. To analyse the efficiency of the algorithms, we apply a split function to the dataset.
The split function divides the dataset into training data and testing data. The proportion of data
assigned to training and to testing is up to the user, who can decide how
much data to use for each purpose as needed. Training
data is the data that is passed to the module for building its logic. After the model
is trained with the training data, the testing data is passed to the model to check
the efficiency of the algorithms. Here we have used 80% of the total credit card transactions
for training and the remaining 20% of the transactions for testing.
The selected 80% of the data is used to train the fraud detection module, which
defines its logic for dealing with further transactions. The algorithm used can be the
Random Forest Algorithm or the Local Outlier Factor. Once training of the module is
complete, the testing data is passed to the module.
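
The 80/20 split described above can be sketched in plain Python. The report does not say which split function was used, so `split_dataset`, its parameters and the toy data below are assumptions made for illustration.

```python
import random


def split_dataset(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle the row indices and split them into training and testing parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Toy dataset: 100 single-feature rows, 2 of them labelled as fraudulent.
rows = [[i] for i in range(100)]
labels = [1 if i % 50 == 0 else 0 for i in range(100)]

x_train, x_test, y_train, y_test = split_dataset(rows, labels, test_fraction=0.2)
print(len(x_train), len(x_test))
```

With so few fraudulent rows, a purely random split can leave the test set with no frauds at all; scikit-learn's `train_test_split` in `sklearn.model_selection` offers a `stratify` argument for keeping the class ratio in both parts, should that library be used.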
