
Classification of video games outcome

Problem description:
DOTA (Defense of the Ancients) is a video game in which two teams of five
players fight each other on a virtual battlefield. Before the game starts, each
player picks a virtual hero for the battle out of a pool of 100 heroes. Each
hero has a distinct set of skills and characteristics, with its own strengths and
weaknesses. The hero (or character) represents the player throughout the game.
Each five-player team usually has five distinct heroes.
Task: A training set of 15000 played games is provided. The data contains
the names of the heroes picked by both sides and the outcome of the game. Because
players are evenly skilled, the outcome of the game depends on the choice of
heroes. The challenge is to develop a classifier that, given a new line-up of
heroes for both teams, predicts the outcome of the game.

Search for the right data representation


A. Lazy approach:
Process: Assign a random distinct number to each character. Input all data
points as a table of 1500 observations across 10 variables (one variable per
player). Try different hypothesis classes (decision trees, neural nets) to
predict the classification output, 1 or 2.
Analysis: This relies on strong machine learning algorithms to search for
an underlying function that captures the meaning of the data.
However, the search has two potential drawbacks:
1. A random number may not be a good representation of a character.
2. The model treats the variables (player 1 to player 10) as distinct, although
within a team they are interchangeable. For example, we can swap player 2's
character with player 3's character and the outcome is still the same. However,
training algorithms such as neural nets assume that each input variable is a
separate, independent quantity. Thus we may end up with a hypothesis function
that outputs contradictory results for two observations that have exactly the
same collection of characters but in a different order.
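The order-dependence in drawback 2 can be illustrated with a minimal sketch. The hero names, the seed and the helper names `make_encoding` and `encode_game` are assumptions made for illustration; they are not part of the provided data.

```python
import random

# Hypothetical hero names; the real data set uses the game's hero names.
HEROES = [f"hero_{i}" for i in range(100)]

def make_encoding(heroes, seed=0):
    """Assign a random distinct integer code to each hero."""
    rng = random.Random(seed)
    codes = list(range(1, len(heroes) + 1))
    rng.shuffle(codes)
    return dict(zip(heroes, codes))

def encode_game(team1, team2, encoding):
    """Build the 10-variable row: team 1's five codes, then team 2's."""
    return [encoding[h] for h in team1] + [encoding[h] for h in team2]

encoding = make_encoding(HEROES)
team1, team2 = HEROES[0:5], HEROES[5:10]
row = encode_game(team1, team2, encoding)

# Swap player 2's and player 3's heroes on team 1: same game, different row.
swapped_team1 = [team1[0], team1[2], team1[1], team1[3], team1[4]]
row_swapped = encode_game(swapped_team1, team2, encoding)

print(row)
print(row_swapped)
```

The two rows describe the same line-up, yet a learner that reads the 10 positions as separate variables sees two different inputs.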
B. A more elaborate approach built on A
Process: Assign a random number to each character. Calculate the mean,
median and standard deviation of each team's five numbers. Input these
statistics as training data.

Analysis: This solves the problem of interchangeability. We have a much
better representation of the data than the previous one, because any permutation
of the five heroes results in the same mean, median and standard deviation.
However, there are two questions that need to be addressed:
1. Have we found enough variables to represent the data?
2. (From A) Is a random number a good representation of the character/hero?
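A minimal sketch of these summary features, assuming integer hero codes as in approach A. The helper name `team_features` and all example codes are hypothetical. It checks the permutation invariance and also shows why question 1 matters: two different teams can collapse to the same features.

```python
import statistics
from itertools import permutations

def team_features(codes):
    """Order-independent summary of one team's five hero codes."""
    return (
        statistics.mean(codes),
        statistics.median(codes),
        statistics.pstdev(codes),  # population standard deviation
    )

def game_features(team1_codes, team2_codes):
    """Six input variables per game instead of ten."""
    return team_features(team1_codes) + team_features(team2_codes)

# Any ordering of the same five codes gives the same features.
team = [17, 42, 3, 88, 61]
base = team_features(team)
all_same = all(team_features(list(p)) == base for p in permutations(team))
print(all_same)

# Two different teams with identical mean, median and standard deviation.
team_x = [1, 4, 5, 7, 8]
team_y = [2, 3, 5, 6, 9]
print(team_features(team_x) == team_features(team_y))
```

Both teams in the second example have mean 5, median 5 and the same sum of squared deviations (30), so the three statistics alone cannot tell them apart.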
C. Alternative approach, using a graph
Another approach is to represent heroes as distinct vertices in an undirected
graph. This removes the worry about representing heroes as numbers, but it
cannot take advantage of a library of machine learning algorithms. The
algorithm goes as follows:
1. Set all edge weights to 0.
2. For each training game, decrease the weight of every edge among the losing
team's characters by a fixed amount a, and increase the weight of every edge
among the winning team's characters by the same amount.
3. For the prediction task, calculate the total edge weight within each team and
predict a win for the higher-weighted team.
Analysis: This captures how well two individual heroes collaborate.
However, does it capture the collaboration of a whole team of 5?
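The three steps of approach C can be sketched as follows. The function names, the default a = 1.0 and the tie-breaking rule (ties go to team 1) are assumptions for illustration; the text does not specify them.

```python
from collections import defaultdict
from itertools import combinations

def team_edges(team):
    """All unordered hero pairs within one team (10 pairs for 5 heroes)."""
    return combinations(sorted(team), 2)

def train_edge_weights(games, a=1.0):
    """games: iterable of (winning_team, losing_team) lists of hero names."""
    weights = defaultdict(float)          # step 1: every edge starts at 0
    for winners, losers in games:         # step 2: reward / penalise pairs
        for edge in team_edges(winners):
            weights[edge] += a
        for edge in team_edges(losers):
            weights[edge] -= a
    return weights

def team_weight(team, weights):
    """Total weight of the edges inside one team."""
    return sum(weights.get(edge, 0.0) for edge in team_edges(team))

def predict(team1, team2, weights):
    """Step 3: return 1 or 2. Ties go to team 1 (an arbitrary assumption)."""
    return 1 if team_weight(team1, weights) >= team_weight(team2, weights) else 2

# Toy usage with hypothetical hero names.
games = [(["A", "B", "C", "D", "E"], ["F", "G", "H", "I", "J"])]
w = train_edge_weights(games)
print(predict(["A", "B", "C", "D", "E"], ["F", "G", "H", "I", "J"], w))
```

Only pairwise edges are stored, which is exactly why the sketch cannot express an effect that appears only when three or more specific heroes play together.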
D. An attempt to solve the issue in C of representing collaboration among a team of 5
Algorithm: Represent all heroes as data points in a multidimensional space.
Each time a team wins, we pull the data points of that team's heroes towards
their centroid by a certain amount. Conversely, we push the data points of the
losing team's heroes away from their centroid by a certain amount.
This approach can cure the problem of C, but it still has the problem of A and
B, namely, that we need to assign random vectors to represent each data point.
Another open question is the right number of dimensions with which to
represent our data.
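A minimal sketch of the pull/push update, under several assumptions not fixed by the text: random initial vectors in 3 dimensions, a step that moves each point a fixed fraction of its distance to the team centroid, and a hypothetical `spread` helper used only to inspect how tightly a team is clustered.

```python
import random

def init_vectors(heroes, dim=3, seed=0):
    """Assign a random starting vector to each hero (dim is a free choice)."""
    rng = random.Random(seed)
    return {h: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for h in heroes}

def centroid(team, vectors):
    dim = len(vectors[team[0]])
    return [sum(vectors[h][k] for h in team) / len(team) for k in range(dim)]

def move_team(team, vectors, step):
    """step > 0 pulls heroes towards their centroid; step < 0 pushes away."""
    c = centroid(team, vectors)
    for h in team:
        vectors[h] = [x + step * (ck - x) for x, ck in zip(vectors[h], c)]

def update(winners, losers, vectors, step=0.1):
    """One training game: tighten the winning team, loosen the losing team."""
    move_team(winners, vectors, step)
    move_team(losers, vectors, -step)

def spread(team, vectors):
    """Mean squared distance of a team's heroes from their centroid."""
    c = centroid(team, vectors)
    return sum(
        sum((x - ck) ** 2 for x, ck in zip(vectors[h], c)) for h in team
    ) / len(team)

# Toy usage with hypothetical hero names.
heroes = list("ABCDEFGHIJ")
vectors = init_vectors(heroes)
winners, losers = heroes[:5], heroes[5:]
before = spread(winners, vectors), spread(losers, vectors)
update(winners, losers, vectors)
after = spread(winners, vectors), spread(losers, vectors)
print(before, after)
```

With this update rule the winners' spread shrinks by a factor of (1 - step)^2 and the losers' spread grows by (1 + step)^2 per game. How to turn the learned positions into a prediction is left open here, as it is in the description above.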
Can we do better?
