
Crime Analysis and Prediction Using Data Mining

Agaba Joab Ezra, Erio Crucecia, Binga Ivan, Amoo Brenda Ayoo, Alaroker
Juliet Olwedo.
College of Computing and Informatics Sciences, Makerere University Kampala Uganda.
Department of Computer Science

Crime analysis and prediction is a systematic approach to identifying and analyzing patterns and
trends in crime, and it is of vital importance to police departments. Studying crime data can help
police analyze crime patterns, inter-related clues and important hidden relations between crimes,
which is why data mining can be of great aid in analyzing, visualizing and predicting crime from
crime data sets [1]. Classification and correlation of data sets make it easier to understand
similarities and differences among data objects. The focus is on the criminality of places rather
than on tracing individual criminals. The main users of the system will be the police force, who
will be able to predict the likelihood of crime occurring in the near future as well as the time at
which it is likely to occur. In this paper we look at the k-means clustering algorithm for data
mining and use it to generate hotspots of criminal activity and to predict the chance of crime
occurring in the near future. This analysis may help the country's law enforcement make
better-informed decisions, for example allocating more resources such as police officers to
crime-prone areas [2].

Key words: crime analysis, data mining, classification, correlation, visualization, prediction,
k-means algorithm, modus operandi.
1. INTRODUCTION

For a long time, research in crime analysis has been used to mitigate crime and ensure public
safety. Improvements in technology in recent years have made it practical to mine large volumes
of data. Data mining helps in processing large amounts of data and discovering hidden
information [1].

Crime is a social nuisance with a direct effect on society. Governments spend large sums of money
through law enforcement agencies to try to stop crimes from taking place. Today, many law
enforcement bodies hold large volumes of crime-related data, which must be processed to turn it
into useful information. Crime data are complex because they have many dimensions and come in
different formats; for example, most of them contain string records and narrative records. This
diversity makes them difficult to mine.

The research problem that this project addresses is the development of a software platform for
descriptive and predictive analysis of diverse crime data. Descriptive analysis focuses on
identifying spatial and temporal relationships in crime data. Predictive analytics methods are
mainly used to predict the category of crime that may occur at a given place and time. Crime
cannot be predicted with certainty, since it is neither systematic nor random [3]. Although we
cannot predict who the victims of a crime may be, we can predict the places where it is likely to
occur. Our goal is to design an effective system with an accuracy of at least 70%. To build such a
crime analytics tool we have to collect crime records and evaluate them. The main challenge
before us is developing a better, more efficient crime pattern detection tool. The challenges we
faced included:

• Analysis of the data is difficult because the data is incomplete and inconsistent.
• Limited access to crime data from law enforcement bodies.
• Accuracy, which depends heavily on the accuracy of the training set.

2. OVERVIEW OF CRIME

According to the Uganda Police annual crime report of 2014 [11], a crime is an act committed, or
omitted, in violation of a law either forbidding or commanding it. Crime can also be regarded as
a comprehensive concept that is defined in both a legal and a non-legal sense [7]. From the legal
point of view, crime is the breach of the criminal law (penal code) that governs a geographical
area (jurisdiction) and that is aimed at protecting the lives, property and rights of the citizens
belonging to that jurisdiction.

Crime occurs in a variety of forms, which police informally categorize as major or volume crimes.
Major crimes are high-profile crimes such as murder and armed robbery. These crimes can be
either one-offs or serial. Serial crimes are relatively easy to link together because of clear
similarities in modus operandi or in descriptions of the offenders. This linking is possible
because of the comparatively low volume of such crimes. Major crimes usually have a team of
detectives allocated to the investigation. In contrast, volume crimes such as burglary and
shoplifting are far more prevalent. They are usually serial in nature, as offenders go on to
commit many such crimes. Property crimes such as domestic burglaries committed by individuals
are highly similar to one another, and it is rare to have a description of the offenders.

Table 1. Classification of crime. Source: Chen et al., 2003 [10].

Crime                 Description
Traffic violations    Driving under the influence of alcohol, fatal/personal injury/property
                      damage traffic accident, road rage.
Sex crime             Sexual offences.
Fraud                 Forgery and counterfeiting, frauds, embezzlement, identity deception.
Arson                 Arson on buildings.
Gang / drug offences  Narcotic drug offences (sales or possession).
Violent crime         Criminal homicide, armed robbery, aggravated assault, other assaults.
Cyber crime           Internet frauds, illegal trading, network intrusion/hacking, virus
                      spreading, hate crimes, cyber piracy, cyber pornography,
                      cyber-terrorism, theft of confidential information.

Figure 1. Steps in crime analysis

3. RELATED WORK

3.1 Criminology Theories

According to John and David, theories of crime can be divided into two categories: those that
seek to explain the development of criminal offenders and those that seek to explain the
development of criminal events. Criminology has mainly been developed through theories and
research on offenders. Only recently has it begun to explain crimes rather than the criminality
of the people involved in them. Criminology consists of many theories that explain how and why
some offenders act in the way they do. The following are some of the theories that explain how
places are associated with crimes [3].

I. Rational Choice. Rational choice theory suggests that offenders select targets and define
   means to achieve their goals in a manner that can be explained. It holds that human actions
   are based on rational decisions, that is, they are informed by the probable consequences of
   the action.

II. Routine Activity Theory. This theory explains the occurrence of crimes as the result of
   several circumstances. Namely, there must be a motivated offender and a desirable target, the
   target and the offender must be at the same place at the same time, and other types of
   controllers (intimate handlers, guardians and place managers) must be absent.

III. Crime Pattern Theory. This theory combines the above two theories and goes on to say that
   how targets come to the attention of offenders is influenced by the distribution of crime
   events over time, over space and among targets. An offender comes to know of criminal
   opportunities while engaging in day-to-day legitimate work, so a given offender will only
   know about a subset of the available targets. The concept of place is essential to crime
   pattern theory.

Understanding criminology theories is essential when trying to create crime analysis tools or
platforms using modern technologies.

3.2 Data mining

Data mining deals with the discovery of unexpected patterns and new rules that are hidden in
large databases. It serves as an automated tool that uses multiple advanced computational
techniques, including artificial intelligence, to fully explore and characterize large datasets
involving one or more data sources, identifying significant recognizable patterns, trends and
relationships not easily detected through traditional analytical techniques [7]. This information
may then help with various purposes such as the prediction of future events or behavior. The
development of new intelligent tools for automated data mining and knowledge discovery has led to
the design and construction of successful systems that show early promise in their ability to
scale up to the handling of voluminous datasets.
3.3 Data visualization

Visual methods are powerful tools in data exploration because they use the power of the human
eye and brain to detect structures. A number of data mining tools for visualization exist;
histograms and kernel plots are the most basic and are used for displaying single variables.

Scatter plots display two variables at a time and reveal any correlation between them. For more
than two variables, scatter plot matrices are often used [9]. GIS also provides a powerful
visualization tool through maps that allow spatial patterns to be explored interactively [10].
It is important to visualize crime data in order to get a clear idea of its distribution and to
display the analyzed results in a user-friendly manner. In doing so, it is important to display
the time dimension within the GIS visualization system, although more attention has been given to
displaying the spatial distribution of high-crime areas, or crime hot spots. An example is pin
maps, which have been used since the beginning of modern police systems. The utility of crime
maps has increased significantly with the growth of Geographic Information Systems (GIS) and
geo-archives that link criminal data with socio-economic and environmental data, such as the
location of high schools, liquor shops or metro routes, that potentially impact crime.

3.4 Crime analysis

The main purpose of crime analysis is to provide investigators or police with the data they
need, drawn from the huge amount of information stored in the department, to point them in the
right direction so that they can prevent crimes and control criminal activities that may occur in
future. Criminal analysis is not an easy task, because many factors may contribute to a crime:
emotional imbalance, anger, jealousy, revenge, inferiority complex, ego, gain, the wish to damage
another's property or name, shortcuts to becoming rich, lust and so on. To analyze a crime one
must therefore know about external factors apart from the reports. When a machine has to do this
kind of intelligent task, the most suitable algorithms need to be considered. From the literature
review, various frameworks have been proposed and designed, which are discussed in the next
section [2].

Crime analysis is generally difficult because it requires both the collection and the analysis
of large volumes of data. For example, according to the Uganda Police annual crime report of
2014 [10], over 300,000 complaints and reports were made to the police. Such a volume of data
forbids manual analysis, so there is a need to apply different algorithmic techniques. An
automated analysis of such a rich dataset could identify complex crime patterns and assist in
solving crimes faster.

Data mining techniques can be used in law enforcement for crime data analysis, criminal analysis
and the analysis of other critical problems [1]. Traditional data mining techniques include
association analysis, classification and prediction, cluster analysis and outlier analysis,
which identify patterns in structured data.

3.5 Existing vs proposed platform

Several applications have already been developed for crime analysis. Most of these tools were
developed to help police forces identify different crime patterns and even predict criminal
activities. More recent applications aim at adopting data mining techniques.

The proposed platform faces the following challenges:

• The ever-increasing size of the crime information that has to be stored and analyzed.
• The problem of identifying techniques to analyze this data.
• The methods and structures to be used for storing crime data.
• Inconsistent and incomplete data, which complicates analysis.

a. Existing System

• Systems like COPLINK are complex and need user training. The current version of COPLINK does
  not support temporal reasoning or visualization and does not support data mining [4].
• The Regional Crime Analysis Program (ReCAP) is a system designed to aid local police forces
  through crime analysis and prevention. The system is quite old: it is based on Windows 95 and
  works only on a local area network. It is limited to descriptive analysis of data [5].
• Another existing platform is the Crime Prediction Model (CPM), which predicts the offenders
  behind terrorist events based on location, date and modus operandi attributes. It focuses only
  on terrorist activities and groups [6].

Several other platforms and models for crime data analysis are described in the literature. Each
of the above platforms assists law enforcement bodies in analyzing and identifying different
crime patterns.

b. Proposed system

After going through these solutions, it is clear that each platform is specific to a given task.
It would be very useful to have all of the above analytical techniques in one platform, that is,
a platform that can perform both descriptive and predictive analysis of data.

In the proposed system, we introduce an application that will predict crime-prone areas so that
government agencies can allocate resources in those areas. This prediction is based on
attributes such as location, time, day of the week and crime category.

The descriptive analyzer uses both quantitative and qualitative data and analytical techniques.
Qualitative data and analytical techniques refer to non-numerical data and to the examination
and interpretation of observations for the purpose of discovering underlying meanings and
patterns of relationships. This is most typical of field research, content analysis and
historical research. Quantitative data are data primarily in numerical or categorical format.
Quantitative analysis consists of manipulations of observations for the purpose of describing
and explaining the phenomena that those observations reflect, and it is primarily statistical.
Descriptive crime analysis employs both types of data and techniques, depending on the
analytical and practical need. For example, crime data can be used in various ways both
quantitatively and qualitatively. Information such as the date, time, location and type of crime
is quantitative in that statistics can be used to analyze these variables. Predictive analytics
methods are mainly used to predict the category of crime that may occur at a given place and
time. To integrate predictive analytics features, a machine learning component is also needed.

4. METHODOLOGY

During data collection, we conducted interviews at four major police stations around Kampala:
Naguru police headquarters, Kibuli police division, Wandegeya police division and Central police
station. The collected data is stored in a database for further processing. Crime data is
unstructured, since the number of fields, the content and the size of a document can differ from
one document to another. Other benefits of using an unstructured database are:
• It handles large volumes of structured, semi-structured and unstructured data.
• It supports object-oriented programming that is easy to use and flexible.
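As a rough illustration of why a schema-flexible store suits this data, the sketch below holds crime records as documents whose fields differ from one record to the next and reduces each to the attributes named earlier (location, time, day of the week, crime category). The field names, the sample records and the `to_features` helper are all hypothetical and are not taken from the collected data or from any particular database product.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical records: field names and values are illustrative, not collected data.
records = [
    {"category": "theft", "location": "Wandegeya", "reported_at": "2014-03-07T22:15"},
    {"category": "burglary", "location": "Kibuli", "reported_at": "2014-03-09T02:40",
     "narrative": "Entry through a rear window."},
    {"category": "assault", "reported_at": "2014-03-09T19:05", "suspects": 2},
]


def to_features(record):
    """Reduce a free-form record to the attributes used for prediction."""
    ts = datetime.fromisoformat(record["reported_at"])
    return {
        "location": record.get("location", "unknown"),  # field may be absent
        "hour": ts.hour,
        "day_of_week": DAYS[ts.weekday()],
        "category": record.get("category", "unknown"),
    }


features = [to_features(r) for r in records]
```

Missing fields are mapped to "unknown" rather than rejected, which is one simple way to cope with the incomplete records described above.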
Interview guides were used to carry out this survey. Other materials included pens and audio
recorders. The purpose of the study was stated in the interview guide. We also used document
analysis to broaden our research, analyzing several documents including the Uganda annual crime
reports of 2008 and 2014.

Table 2. Variables from the data collected

ii. Data clustering/classification

We use the k-means algorithm for prediction analysis. This algorithm partitions the objects into
clusters based on their means. First, the number of clusters K is specified and K objects are
selected as the initial cluster centers. Each object is then assigned to the cluster with the
nearest center, and the cluster means are updated. Assignment and update are repeated until the
clusters no longer change. This algorithm is used as a base for most other clustering
algorithms [8].

Figure 2. Kmeans Pseudocode

iii. Pattern Identification

The third phase is pattern identification, in which we identify trends and patterns in crime. We
use the k-medoid clustering algorithm to identify patterns. A medoid is the object of a cluster
whose average dissimilarity to all the objects in the cluster is minimal, i.e. it is the most
centrally located point in the cluster [5].

iv. Prediction

For prediction we use the decision tree concept. A decision tree is a tree-shaped graph in which
each internal node represents a test on an attribute and each branch represents an outcome of
the test. The main advantage of a decision tree is that it is simple to understand and
interpret. Other advantages include its robustness and the fact that it works well with large
data sets. This helps the algorithm make better decisions about variables [4]. We build a model
for each place. To obtain the crime-prone areas, we pass the current date and current attributes
into the prediction software. The result is shown using visualization mechanisms.

v. Visualization

The crime-prone areas can be represented graphically using clusters with different colors that
indicate the level of activity, usually darker colors for low activity and brighter colors for
high activity.
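The clustering and pattern-identification steps described above can be sketched in a few lines. This is a minimal illustration rather than the system's implementation: it assumes incidents are plain 2-D coordinates, uses Euclidean distance, and picks evenly spaced points as the starting centers so that the run is repeatable (a real run would more commonly pick them at random). The names `assign`, `kmeans`, `medoid` and the sample coordinates are hypothetical.

```python
import math


def assign(points, centers):
    """Group each point with its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_iter=100):
    """Plain k-means: pick k starting centers, then alternate assign/update."""
    # Assumption: evenly spaced picks from the input list, for repeatability.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centers:  # means stopped moving, so the clusters are stable
            break
        centers = updated
    return centers, assign(points, centers)


def medoid(cluster):
    """Return the member with the smallest total distance to the other members."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))


# Hypothetical incident coordinates forming two obvious groups.
incidents = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = kmeans(incidents, k=2)
hotspots = [medoid(c) for c in clusters]
```

A cluster mean need not coincide with any recorded incident, whereas a medoid is always one of the recorded points, which can make it a more natural label for a hotspot on a map.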
Figure 3. Crime clusters in Chicago City

Results of the study showed that a lot still needs to be done when it comes to crime analysis.
The documents that we reviewed indicated that weak methods are used for analyzing data. Data
held at the different police stations are still poorly kept and are still captured manually, and
a lot of data is missing from files, which makes data analysis very difficult. From the data
collection we conducted, it was evident that crime has been increasing, with an estimated 2%
increase from 2013 to 2014. In 2013 the crime rate was 273, going by the population projection
of 36,000,000. However, the estimates were rebased on the 2014 population census figure of
34,856,813, giving a crime rate of 287. The crime rate at the end of 2014 was:

(103,720 / 34,856,813) × 100,000 = 298

This means that out of every 100,000 people, 298 were victims of crime.

Graph showing crime distribution by time of the day.

Figure 4. Most frequent crimes committed

Both graphs indicate that theft is the most rampant crime and that, compared with other crimes,
it occurs mostly at night [1].

SCOPE

We looked at the use of data mining for identifying crime patterns using clustering. Our
contribution was to formulate crime pattern detection as a machine learning task and thereby to
use data mining to support detectives in solving crimes. Beyond the present scope of our
project, which is the prediction of crime-prone areas, future work can predict the estimated
time at which a crime will take place. Along with this, one can try to predict the location of
the crime. We will test the accuracy of frequent itemsets and of prediction on different test
sets, so that the system automatically learns the changing patterns in crime by examining the
crime patterns. Crime factors also change over time. By sifting through the crime data we have
to identify new factors that lead to crime. Since we consider only a limited set of factors,
full accuracy cannot be achieved. To get better prediction results we have to find more crime
attributes. Our software is also intended to predict the crime an individual criminal is likely
to commit. We will use the Apriori algorithm with association rule mining for this purpose. This
is intended to determine the next crime a criminal is likely to commit.
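To make the planned use of Apriori more concrete, the following is a small sketch of frequent-itemset mining and rule confidence. It is an illustration under stated assumptions, not the project's code: each incident is assumed to be a set of attribute labels, `min_support` is an absolute count, and the function names and sample incidents are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    size = 2
    while current:
        # Join frequent (size-1)-itemsets, keeping only candidates whose
        # every (size-1)-subset is itself frequent (the Apriori pruning step).
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        size += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from the frequent-itemset counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical incident attributes, for illustration only.
incidents = [
    {"theft", "night", "market"},
    {"theft", "night", "residence"},
    {"assault", "night", "bar"},
    {"theft", "day", "market"},
]
frequent = apriori(incidents, min_support=2)
theft_at_night = confidence(frequent, {"theft"}, {"night"})
```

Rules with high confidence, such as "theft implies night" in this toy data, are the kind of pattern that association rule mining would surface for further testing on held-out sets.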

[1] F. U. M, "Knowledge Discovery and Data Mining: Towards a unifying framework," in Proc. 2nd
Int. Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, AAAI Press, Menlo Park,
California.

[2] B. a. B. .L, "Learning organisations in the public sector: A study of police agencies
employing information and technology to advance knowledge," Public Administration Review,
vol. 63, no. 1, pp. 30-43, 2003.

[3] B. M. M. and B. J. L, Learning organisations in the Public Sector, 2003.

[4] a. W. D. M. Morabito, "Crime forecasting using data mining techniques," 11th IEEE
International Journal, vol. 29, no. 10, pp. 779-786, 2011.

[5] F.-T. and M. Green, "Crime and society," International Journal of Social Economics, vol. 29,
no. 6, pp. 781-795, 2002.

[6] A. Murray, "Exploratory spatial data analysis techniques for examining urban crime:
implications for evaluating treatment," British Journal of Criminology, vol. 41, no. 2,
pp. 309-329, 2001.

[7] A. E. and U. N, "Geographic Information Systems Technologies In Crime Analysis And Mapping,"
International Journal of Data Mining Techniques and Applications, vol. 1, no. 2, pp. 117-120,
2012.

[8] D. E. Brown, "The Regional crime analysis program," IEEE international journal on systems
and man, vol. 3, no. 3, pp. 2848-2853, 2005.

[9] W. C. H. Chen, "A general framework and some examples," in IEEE international conference,
2004.

[10] A. R. W. and M. Peter, "Police crime recording and investigation systems," An international
journal of police strategies and management, vol. 24, no. 1, pp. 100-114, 2001.

[11] O. F. R, "Survey of Data mining methods for crime analysis and visualization," Journal of
ICT and legal applications, vol. 4