
SCORE PREDICTION AND ANALYSIS IN CRICKET

Team Members:
18BCE0348 (NISHANT SHUKLA)

Report submitted for the Third Project Review of

Course Code: CSE3013 – Artificial Intelligence

Slot: F1 + TF1

Professor: Dr. W.B. Vasantha Kandasamy


1. Introduction:-

Cricket is played in many countries around the world, and the International
Cricket Council (ICC) organises a large number of domestic and international
matches. Cricket is also popular in the statistical science community, but its
inconsistent nature makes it difficult to predict with common probability
models.

The main reason cricket is unpredictable is the continuous shift of momentum
between the two teams. Sometimes the result of a match is decided on its last
ball. Because of this unpredictability, spectators take a keen interest in
predicting a team's final score from the current state of the game. This has
also led to the rise of a betting industry, which is now illegal in our
country. There are 11 players on a team, and the pitch lies in the middle of an
oval-shaped ground where most of the game action occurs. Batsmen play in pairs.
Bowlers are not allowed to throw the ball; instead, they must use a
“stiff-arm” action to deliver it. As is typical in sport, winning is the
ultimate goal. The team that scores more than the other in the same number of
overs is the winner.

In this paper, a method is proposed to predict the final score of the first
innings and to estimate the winning probability of the batting team in the
second innings. Linear regression is used for the former and random forest
regression for the latter. Unlike the current procedure for projecting the
score, the estimate considers factors such as the venue of the match, the
number of wickets fallen and the batting team. For the probability estimate in
the second innings, the target given to the batting team is included along with
the factors used in the first innings.

2. Literature Review Summary Table:-

| Authors and Year (Reference) | Title (Study) | Concept / Theoretical model / Framework | Methodology used / Implementation | Dataset details / Analysis | Relevant Finding | Limitations / Future Research / Gaps identified |
|---|---|---|---|---|---|---|
| R. Lokhande and P. Chawan (2018) | Prediction of cricket score and winning | Dataset analysis and tabular files algorithm | Comparing two algorithms | 577 different matches from cricsheet.org | The user can be rewarded if predicted correctly | Only for T20 matches. Future work will be for all formats |
| Sonu Kumar, Sneha Roy (2017) | Score prediction in cricket | Considering all factors affecting the match | Regression analysis | Matches between 2006-17 from cricinfo | More accuracy if deep neural network is used | Limited dataset, accuracy |
| Stylianos Kampakis, William Thomas (2015) | Using ML to predict outcome of cricket match | With team and player data | First only team data, then both team and player | Data of English T20 county cricket | Team and player combined was more accurate | Only for 20-20, including weather as a factor |
| Ananda Bandulasiri (2016) | Predicting the winner in ODI cricket | Linear regression | Comparing run rate model with regression model | Last 12 years ODI data | “Home advantage” and benefit of winning toss | Only for ODI matches |
| Vishal Singla, Parteek Bhatia (2015) | Score and winning prediction in cricket through data mining | Naïve Bayes classifier | Comparing Bayes classifier with regression model | 2002-2014, all matches played | Higher accuracy for Bayes classifier | Future work will be to increase accuracy and include toss as a parameter |

3. Objective of the project:-


The aim of this project is to cover the factors that affect the run-scoring
ability of a batsman and to determine how well the final score of a team and
the outcome of the match can be predicted. A separate dataset is used for each
format of the sport, each with more than 5 parameters, to improve the
reliability of the system. The model attempts to predict the innings total in
any form of cricket match.

4. Innovation component in the project:-

This project will take all factors into consideration when predicting scores,
including the weather for day and night matches, and it will cover all formats
of the game.

5. Proposed work and implementation

Methodology adopted: Regression (mainly two methods, linear regression and
random forest regression) and decision tree algorithms

Hardware and software requirements: Python 3.0, with the pandas, NumPy and
scikit-learn libraries

Linear regression: It is a machine learning algorithm based on supervised
learning. It performs a regression task, that is, it models a target prediction
value based on independent variables. It is mostly used for finding the
relationship between variables and for forecasting. Regression models differ in
the kind of relationship they assume between the dependent and independent
variables and in the number of independent variables they use.

In linear regression, we predict a dependent variable from an independent
variable. The model finds a linear relationship between input and output, hence
the name linear regression.

The function for linear regression is:-

y = θ1 + θ2·x

While training the model we are given:-

x: input training data (univariate: one input variable, i.e. one parameter)
y: labels for the data (supervised learning)

When training, the model fits the best line to predict the value of y for a
given value of x. It obtains the best regression fit line by finding the best
θ1 and θ2 values.
θ1: intercept
θ2: coefficient of x
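
As an illustration of how the best θ1 and θ2 can be found, the sketch below uses the ordinary least-squares formulas for the one-variable case. The function name `fit_line` and the toy overs/runs numbers are assumptions made for this example and are not taken from the project dataset.

```python
def fit_line(xs, ys):
    """Return (theta1, theta2) that minimise the squared error of y = theta1 + theta2 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta2 = sxy / sxx                  # coefficient of x (slope)
    theta1 = mean_y - theta2 * mean_x   # intercept
    return theta1, theta2


# Hypothetical toy data: overs bowled vs. runs scored so far
overs = [5, 10, 15, 20]
runs = [30, 55, 80, 105]

theta1, theta2 = fit_line(overs, runs)
print("intercept:", theta1, "slope:", theta2)
print("predicted runs after 50 overs:", theta1 + theta2 * 50)
```

The toy data lie exactly on a line, so the fit recovers it. Real match data are noisy, and the project code uses several input variables rather than one.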

Random forest regression: The basic concept behind random forest is that it
combines multiple decision trees to determine the final output.

It is also a supervised learning algorithm. It uses the ensemble learning
technique, in which the predictions of many decision trees are combined, to get
a more accurate and stable prediction than a single tree would give.
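
The idea of combining trees can be sketched in a few lines. The example below is a simplified illustration, not the scikit-learn implementation used in this project. Each "tree" is only a one-split regression stump trained on a bootstrap sample, and the ensemble prediction is the average of the stumps. A full random forest grows deeper trees and also samples features at each split. All names and the toy data are assumptions made for this example.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree: choose the threshold with the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest x so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # the sample held a single distinct x: predict the mean everywhere
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_forest(trees, x):
    """The ensemble prediction is the average of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)


# Hypothetical toy data: overs bowled vs. runs scored so far
overs = [5, 10, 15, 20, 25, 30]
runs = [28, 57, 81, 110, 140, 165]

trees = fit_forest(overs, runs)
prediction = predict_forest(trees, 12)
print("ensemble prediction at 12 overs:", prediction)
```

Averaging over trees trained on different samples is what makes the ensemble more stable than any single tree.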

Decision trees: A decision tree is a simple representation for classifying
examples. It is a tree in which each internal node is labelled with an input
feature. The branches coming from a node are labelled with each of the possible
values of that feature. Each leaf of the tree is labelled with a class or a
probability distribution over the classes. A deterministic decision tree, in
which all of the leaves are classes, can be mapped into a set of rules, with
each leaf of the tree corresponding to a rule.
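
The mapping from a deterministic tree to a set of rules can be sketched as follows. The nested-dictionary layout, the feature names and the class labels are hypothetical and chosen only for illustration. They do not come from the project dataset.

```python
def tree_to_rules(node, conditions=()):
    """Walk the tree depth-first and return one (conditions, label) rule per leaf."""
    if not isinstance(node, dict):  # a leaf holds a class label
        return [(list(conditions), node)]
    rules = []
    for value, child in node["branches"].items():
        # Extend the path with the test taken on this branch
        rules.extend(tree_to_rules(child, conditions + ((node["feature"], value),)))
    return rules


# Hypothetical toy tree: internal nodes name a feature, branches are its values
tree = {
    "feature": "wickets_in_hand",
    "branches": {
        "many": "high score",
        "few": {
            "feature": "run_rate",
            "branches": {"fast": "medium score", "slow": "low score"},
        },
    },
}

rules = tree_to_rules(tree)
for conds, label in rules:
    tests = " AND ".join(f"{feature} = {value}" for feature, value in conds)
    print(f"IF {tests} THEN {label}")
```

The tree has three leaves, so the function returns three rules, one per path from the root to a leaf.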

Below are two examples of decision trees:-

5. Dataset used / Tools used:


a. Where are you taking your dataset from?

GitHub:-

- 1188 ODI matches
- 1474 T-20 matches
- 617 IPL matches

b. Is your project based on any other reference project (Stanford Univ. or MIT)?
- Yes: "Using Machine Learning to Predict the Outcome of English County twenty
over Cricket Matches", Stylianos Kampakis, University College London.

c. How does your project differ from the reference project?
- The reference project does not make predictions for all formats of the game,
whereas my project covers each format along with various factors affecting the
game.

6. Screenshots and demo:

Code for linear regression:-


def custom_accuracy(y_test, y_pred, threshold):
    # Percentage of predictions within `threshold` runs of the actual score
    right = 0
    l = len(y_pred)
    for i in range(0, l):
        if abs(y_pred[i] - y_test[i]) <= threshold:
            right += 1
    return (right / l) * 100

import pandas as pd
# Importing the dataset
dataset = pd.read_csv('data/odi.csv')
# Features: [runs, wickets, overs, striker, non-striker]
X = dataset.iloc[:, [7, 8, 9, 12, 13]].values
# Target: final score
y = dataset.iloc[:, 14].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Feature Scaling (fit on the training set only, then applied to the test set)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the model
from sklearn.linear_model import LinearRegression
lin = LinearRegression()
lin.fit(X_train, y_train)

# Testing the trained model on the test set
y_pred = lin.predict(X_test)
score = lin.score(X_test, y_test) * 100  # R squared, as a percentage
print("R square value:", score)
print("Custom accuracy:", custom_accuracy(y_test, y_pred, 20))

# Testing with a custom input: [runs, wickets, overs, striker, non-striker]
import numpy as np
new_prediction = lin.predict(sc.transform(np.array([[100, 0, 13, 50, 50]])))
print("Prediction score:", new_prediction)

Output screen:-

Here the final score was predicted after we entered the input values, which are:-
Features: [runs, wickets, overs, striker, non-striker]

We input the current runs, wickets and overs, and the individual scores of the
striker and the non-striker.

R-squared is a statistical measure of how close the data are to the fitted regression
line.
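
As a minimal sketch of what this measure computes, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. This is the same coefficient of determination that the `score` method of a scikit-learn regressor returns. The function name `r_squared` and the four actual/predicted scores below are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot


# Hypothetical final scores and model predictions
actual = [250, 300, 280, 320]
predicted = [260, 290, 285, 310]

print("R squared:", r_squared(actual, predicted))
```

A value of 1 means the predictions match the actual scores exactly. A value near 0 means the model does no better than always predicting the mean score.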

The model learns patterns from the dataset and predicts the final score of a
team. Its accuracy is measured on the test set, and it is then tried on custom
values that describe the current state of the game.

Code for Random Forest Regression:-


def custom_accuracy(y_test, y_pred, threshold):
    # Percentage of predictions within `threshold` runs of the actual score
    right = 0
    l = len(y_pred)
    for i in range(0, l):
        if abs(y_pred[i] - y_test[i]) <= threshold:
            right += 1
    return (right / l) * 100

# Importing the dataset
import pandas as pd
dataset = pd.read_csv('data/odi.csv')
# Features: [runs, wickets, overs, striker, non-striker]
X = dataset.iloc[:, [7, 8, 9, 12, 13]].values
# Target: final score
y = dataset.iloc[:, 14].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Feature Scaling (not required for tree-based models; kept so that both
# scripts use the same preprocessing)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the model
from sklearn.ensemble import RandomForestRegressor
reg = RandomForestRegressor(n_estimators=100, max_features=None)
reg.fit(X_train, y_train)

# Testing the trained model on the test set
y_pred = reg.predict(X_test)
score = reg.score(X_test, y_test) * 100  # R squared, as a percentage
print("R square value:", score)
print("Custom accuracy:", custom_accuracy(y_test, y_pred, 20))

# Testing with a custom input: [runs, wickets, overs, striker, non-striker]
import numpy as np
new_prediction = reg.predict(sc.transform(np.array([[100, 0, 13, 50, 50]])))
print("Prediction score:", new_prediction)

Output Screen:-

Here the predicted score is 310, compared to 322 with linear regression.

Random forest is able to discover more complex dependencies than linear
regression, at the cost of more fitting time. Random forest is therefore more
likely to predict scores accurately.

7. References
1- Rameshwari A. Lokhande and Pramila M. Chawan, "Prediction of Live Cricket Score and
Winning", International Journal of Trend in Research and Development, Volume 5(4),
ISSN: 2394-9333, July – Aug 2018, www.ijtrd.com. Computer and IT Dept, Veermata Jeejabai
Technological Institute, Mumbai, India.

2- Stylianos Kampakis and William Thomas, "Using Machine Learning to Predict the Outcome of
English County twenty over Cricket Matches", University College London.

3- Tejinder Singh, Vishal Singla and Parteek Bhatia, "Score and Winning Prediction in Cricket
through Data Mining", Computer Science & Engineering, Thapar University, Patiala, Punjab,
India.

4- Ananda Bandulasiri, Ph.D., "Predicting the Winner in One Day International Cricket".

***************************************************************************
