
THEFT ANALYTICS
Team 3

INTRODUCTION

Overview

[Figure: pipeline overview. Dataset sources: Vigilance Data, Billing Data, Consumer Data, Complaints Data. Stages: Data Preprocessing, Feature Engineering, Dimensionality Reduction, Model Fitting, Model Training. The Training Dataset and Test Dataset feed the model, which outputs Predict Theft.]

List of features

Feature               Description
SDBM_BU_ID            Code used by the lineman/meter reader to identify houses (Billing Unit)
SDBM_PC_ID            Code used by the lineman/meter reader to identify houses (Process Cycle)
SDMB_MR_ID            Meter Reader Code
MONTH_YR              The month and year in which the meter reading was recorded
SDBM_BILLING_DAYS     The number of days over which the meter reading is taken
SDBM_PHASE            Number of phases connected to the load
SDBM_DISCONN_TAG_ID   0 and 2: Live; 4: Permanent disconnect
SDBM_LO_ID            Left-out ID (bill basis status)
SDBM_TOTAL_ARREAR     Arrear amount
SDBM_SD_AMT           Total bill amount

List of features (continued)

Feature                 Description
SDBM_LAST_RECEIPT_AMT   Last amount paid against the electric bill
SDBM_CONSUMPTION        Power consumption in units
SDBM_BILLMONTH1         The month and year in which the meter reading was recorded
TARRIF_NAME             Tariff name, based on residential or commercial use
GOVT_EMPLOYEE           Whether the consumer is a govt. employee (-1: not a govt. employee; 1: govt. employee)
NET_BILL                Net bill amount to be paid by the customer
BILL                    Current bill amount
Last_Pay_Date           Last date of payment

Features Created
Excess Load for a Customer: derived from Connected Load and Sanctioned Load
Mean Consumption for a Customer: average consumption over 12 months
Variance in Consumption for a Customer: variance in consumption over 12 months
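A minimal sketch of how these three features could be computed for one customer. It assumes that excess load is the ratio of connected load to sanctioned load (the slide only shows the two quantities stacked, so a difference is also possible) and that the variance is the population variance. The function name and sample values are hypothetical.

```python
from statistics import mean, pvariance

def engineer_features(connected_load, sanctioned_load, monthly_consumption):
    """Return the three derived features for one customer.

    Assumption: excess load = connected load / sanctioned load.
    Assumption: variance is the population variance of the 12 readings.
    """
    if len(monthly_consumption) != 12:
        raise ValueError("expected 12 monthly consumption readings")
    return {
        "excess_load": connected_load / sanctioned_load,
        "mean_consumption": mean(monthly_consumption),
        "var_consumption": pvariance(monthly_consumption),
    }

# Hypothetical customer: 12 monthly consumption readings in units.
readings = [100, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 200]
features = engineer_features(connected_load=6.0, sanctioned_load=4.0,
                             monthly_consumption=readings)
```

An excess load above 1 under this assumption would mean the customer has more load connected than was sanctioned.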

Categorical Feature Encoding
One-hot encoding was used for categorical feature encoding
Total number of features after encoding: 148
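To illustrate one-hot encoding, the sketch below replaces a categorical column with one 0/1 column per distinct value. It is not the team's code: the helper name, the generated column names, and the sample tariff values are assumptions made for the example.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one indicator column per category.
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Hypothetical records; the tariff values are illustrative only.
rows = [
    {"TARRIF_NAME": "RESIDENTIAL", "SDBM_PHASE": 1},
    {"TARRIF_NAME": "COMMERCIAL", "SDBM_PHASE": 3},
]
encoded = one_hot_encode(rows, "TARRIF_NAME")
```

Each categorical feature expands into as many columns as it has distinct values, which is why the feature count grows after encoding.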

Dimensionality Reduction using PCA
137 dimensions were reduced to 108
Retained 98.6% of variability after dimensionality reduction
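With PCA, the retained variability is the share of total variance carried by the kept components. The sketch below picks the smallest number of components that reaches a target share, given the eigenvalues of the covariance matrix. The function name and the eigenvalues are illustrative assumptions, not values from this dataset.

```python
def components_for_variance(eigenvalues, target):
    """Smallest number of principal components whose eigenvalues
    account for at least `target` (between 0 and 1) of total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)

# Illustrative eigenvalues of a covariance matrix (not from the deck's data).
eigenvalues = [5.0, 3.0, 1.5, 0.4, 0.1]
k = components_for_variance(eigenvalues, target=0.9)
```

Applied to the real eigenvalues, the same calculation would relate a target such as 98.6% to the number of dimensions kept.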

Model Fitting - Transduction
Theft analysis here is a one-class classification problem
There are some positively labeled data points and many unlabeled ones
The goal is to identify the positives in the unlabeled dataset

Transductive Modeling
Transductive modeling is better suited to this theft analysis than inductive modeling
Inductive models generalize, whereas transductive models specialize
One-class classification models include:
  Gaussian Mixture Model (GMM)
  One-class Transductive SVM (T-SVM)
  Transductive Random Forests (T-RF)

One-class Transductive Support Vector Machines
Why one-class SVM?
GMM can be used when the number of features is small
SVM performs better when there are many features
In general, an SVM builds a large margin between the classes to be differentiated
With T-SVM, we train on the labeled data and predict on the unlabeled data

Model Parameter Selection
Selection of C (the gamma parameter) of the radial basis kernel for the SVM cannot be done using ordinary cross-validation
Intuition of parameter selection:
  Start with a small C
  Add labels to some test data
  Slowly increase C
  Keep on labeling
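The loop described above can be sketched as follows. This is only an illustration of the control flow, not the team's T-SVM. As a stand-in for the SVM decision function it scores each unlabeled point by its mean RBF-kernel similarity to the points already labeled as theft. The names `threshold`, `batch`, `step`, and `gamma_max` are hypothetical, and so are the stopping rules.

```python
import math

def rbf_similarity(x, positives, gamma):
    """Mean RBF-kernel similarity of x to the points labeled as theft."""
    total = 0.0
    for p in positives:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-gamma * sq_dist)
    return total / len(positives)

def grow_labels(positives, unlabeled, gamma=0.1, step=0.1,
                threshold=0.5, batch=1, gamma_max=2.0):
    """Label a few confident points per round while slowly raising gamma.

    Stops when no unlabeled point clears the threshold, when nothing is
    left to label, or when gamma exceeds gamma_max.
    Returns (positives, still_unlabeled, final_gamma).
    """
    positives = list(positives)
    remaining = list(unlabeled)
    while remaining and gamma <= gamma_max:
        # Rank the unlabeled points by similarity, highest first.
        scored = sorted(
            ((rbf_similarity(x, positives, gamma), x) for x in remaining),
            reverse=True,
        )
        # Label at most `batch` points per round, and only confident ones.
        confident = [x for score, x in scored[:batch] if score >= threshold]
        if not confident:
            break
        positives.extend(confident)
        remaining = [x for x in remaining if x not in confident]
        gamma += step  # slowly increase the kernel parameter
    return positives, remaining, gamma

# Toy 2-D data: one known theft case, two nearby points, one far away.
known = [(0.0, 0.0)]
pool = [(1.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
labeled, left_over, final_gamma = grow_labels(known, pool)
```

In this toy run the two nearby points are labeled in successive rounds and the distant point stays unlabeled. The parameter value at which the loop stops plays the role of the final parameter after termination.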

Results
Training time: 2769 seconds
Final gamma parameter C after termination: 0.68
Total number of thefts identified: 2456/10265

THANK YOU
