
2019

A statistical analysis of Roger Federer’s career
FABIZ- ENGLISH SECTION, GROUP 131
MIRUNA POPESCU
Contents
1. Introduction
2. Wins and Losses over the course of his career at each ATP tournament type
2.1. Grand Slams
2.2 Masters 1000
2.3 ATP 500
2.4 ATP 250
2.5 Pie chart representing total wins obtained throughout Roger Federer’s career at all types of tournaments mentioned before
3. A statistical analysis of the prize money earned by Roger Federer throughout his career
4. Evolution of Roger Federer’s abilities until the present moment
4.1 Winners and unforced errors evolution in matches
4.2 Roger Federer’s serve
4.3 Suggestions for improvement
5. Conclusion
Bibliography
References
1. Introduction

Sports have become an integral part of our constantly evolving society. Almost every country in the world can claim a national sport that it is proud of. However, sports have grown to become more than that: probably owing to continuing globalization, some sports have been embraced worldwide.

A perfect example of such a sport is tennis. First played in Birmingham, England, sometime between 1859 and 1865, tennis has amassed a staggering 1 billion fans according to the latest studies. This figure is even more impressive when set against the world’s population (7.7 billion as of November 2018).

What is more, the Australian Open Men’s Final in 2017 drew 15.2 million simultaneous viewers, becoming Eurosport’s most watched tennis match of all time. Furthermore, the ESPN ratings reports revealed that the ratings for this final had grown by over 80% compared with the previous year’s final and that it will most likely remain ESPN’s most watched sports event in that particular time slot (Sawe, 2018).

All of these facts raise the question of what lies behind these impressive figures.

One clear answer is the level of play that the top-ranked players in the world have consistently showcased, together with their longevity and their ability to become icons worthy of people’s admiration.

Nowadays, people are not content with just attending a tennis tournament and witnessing a good
match. No, they want something extraordinary, something special, brilliant even, from the players,
which will make them feel as though the money spent on tickets for major tournaments was not wasted.

What is more, some of them pay incredibly high prices in order to witness history being made. After all, which tennis fan wouldn’t have liked to be in the stands when Arthur Ashe became the first African-American man to win Wimbledon in 1975? Which tennis fan wouldn’t have liked to be there when Roger Federer broke the record for most Grand Slam titles in 2009 after a heavily contested Wimbledon Final against Andy Roddick? Which tennis fan wouldn’t have liked to witness Andy Murray become the first British man to triumph at Wimbledon after almost 80 years? (Tennis View Mag, 2015)

When it comes to events like these, the answer is almost always no one. Moreover, it is not necessarily the big tournaments that draw in the crowds; in most cases it is the players themselves. Some of these players have the magnetic ability to sell out arenas of thousands of people simply by announcing their attendance at a tournament.

One of these players is without a doubt Roger Federer. The Swiss is widely regarded as the greatest tennis player of all time, having achieved such incredible feats, even now at the age of 37, that some are left to wonder just how he does it. Over the years in which he has played professional tennis, Federer has won more Grand Slam tournaments than any other male player, with a total of 20, and he has won more than 1080 matches against only 234 losses.

Furthermore, Roger Federer holds 27 Guinness World Records and 99 career singles titles, and stands alone atop dozens of other tennis records. In light of these facts, there is no doubt that he is worthy of the admiration he receives at each tournament he plays, and his contributions to the sport are remarkable. Not only is he a safe bet for a sold-out night, but he is also, in almost all cases, a guarantee of memorable moments that will keep fans coming back for more.

Moreover, Roger Federer and the other top-ranked players in the world have brought significant improvements to the game itself. Through their fame and the dedication of their fans, they have been one of the factors contributing to the increase in prize money awarded at all competitions. For example, the total prize money at the US Open was $53 million in 2018, and the organizers of the Australian Open committed $42.85 million for the 2019 edition (Total Sportek, 2019). These increases have also benefited other, lower-ranked players, helping them with the expenses incurred by playing tennis and enabling them to participate in more tournaments worldwide and raise their rankings.

If Roger Federer and a handful of others always draw a great crowd and create legions of fans, the question remains whether he has anything left to improve or whether he has given it all and is now on a downward slope towards retirement. This project provides an analysis of Roger Federer’s career: his wins and losses over the years at each tournament type, his career earnings, and his game. In order to provide a thorough view of his career and its progress in these areas, we have gathered secondary data from multiple sources specialized in tennis.

2. Wins and Losses over the course of his career at each ATP tournament type
In the following chapter we statistically analyze Roger Federer’s wins and losses at each type of tournament.

2.1. Grand Slams

We start by analyzing how Roger Federer performed at Grand Slams, the four most important tournaments in tennis. As can be seen in Table 1, in his first year as a professional tennis player he did not enter the main draw at any Grand Slam, while in the next he recorded only 2 losses. However, from 2000 onwards he gradually made his way to the top.

Grand Slam
Year    Wins (xi)   Losses   Total
1998    0           0        0
1999    0           2        2
2000    7           4        11
2001    13          4        17
2002    6           4        10
2003    13          3        16
2004    23          1        24
2005    24          2        26
2006    27          1        28
2007    27          1        28
2008    24          3        27
2009    26          2        28
2010    20          3        23
2011    20          4        24
2012    21          3        24
2013    13          4        17
2014    19          4        23
2015    18          4        22
2016    10          2        12
2017    18          1        19
2018    14          2        16
Table 1-Grand Slam Wins/Losses

a) The arithmetic mean (or mean, or average) is the most commonly used and readily understood measure of central tendency (Serban, et al., 2003). The formula of the arithmetic mean for a simple data series is:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Equation 1-Arithmetic mean formula

Where:

$x_i$ = the number of wins in Grand Slams in year $i$

$n$ = the number of years

Therefore, for the number of Grand Slam wins, with $n = 21$ years, we have:

$$\bar{x} = \frac{\sum_{i=1}^{21} x_i}{21} = \frac{343}{21} = 16.33 \text{ Grand Slam matches won/year}$$

Consequently, 16.33 is the average number of Grand Slam matches per year that Roger Federer has won over the course of his career.

Using the same formula for the losses that Federer sustained in Grand Slam tournaments, we obtain the following arithmetic mean:

$$\bar{x} = \frac{54}{21} = 2.5714 \text{ Grand Slam matches lost/year}$$
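
The two means above can be reproduced with a few lines of Python. This is a minimal sketch: the lists are copied from Table 1, and the names `wins`, `losses` and `arithmetic_mean` are illustrative choices, not part of the original analysis.

```python
# Grand Slam wins and losses per year, 1998-2018, copied from Table 1.
wins = [0, 0, 7, 13, 6, 13, 23, 24, 27, 27, 24, 26, 20, 20, 21, 13, 19, 18, 10, 18, 14]
losses = [0, 2, 4, 4, 4, 3, 1, 2, 1, 1, 3, 2, 3, 4, 3, 4, 4, 4, 2, 1, 2]


def arithmetic_mean(values):
    """Sum of the observations divided by their count (Equation 1)."""
    return sum(values) / len(values)


mean_wins = arithmetic_mean(wins)      # 343 / 21
mean_losses = arithmetic_mean(losses)  # 54 / 21
print(f"{mean_wins:.2f} wins/year, {mean_losses:.4f} losses/year")
```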

b) The mode is a measure of central tendency corresponding to the most frequently occurring value. In our case, the mode for Grand Slam wins is 13, which occurs in three different years. For losses the mode is 4, which occurs in 7 years.
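
As a hedged illustration, the mode and its frequency can be found by counting occurrences with `collections.Counter`. The helper name `mode_with_count` is hypothetical. Note that `most_common(1)` reports a single value even when several values tie for the highest count, which is not the case for these two series.

```python
from collections import Counter

# Grand Slam wins and losses per year, 1998-2018, copied from Table 1.
wins = [0, 0, 7, 13, 6, 13, 23, 24, 27, 27, 24, 26, 20, 20, 21, 13, 19, 18, 10, 18, 14]
losses = [0, 2, 4, 4, 4, 3, 1, 2, 1, 1, 3, 2, 3, 4, 3, 4, 4, 4, 2, 1, 2]


def mode_with_count(values):
    """Return the most frequent value and the number of times it occurs."""
    value, count = Counter(values).most_common(1)[0]
    return value, count


print(mode_with_count(wins))    # (13, 3)
print(mode_with_count(losses))  # (4, 7)
```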
c) The median is an important measure of central tendency. In an ordered array, the median is the middle number, so that half of the observations are smaller and the other half are larger (Curwin, et al., 2013). For a simple data series the median is computed as follows:
• If the data series contains an odd number of observations, the median is the middle number, which is the $\frac{n+1}{2}$-th ordered observation.
• If the data series contains an even number of observations, the median is the average of the two middle observations.
We have an odd number of observations, so we locate the median with the formula $\frac{n+1}{2}$:

$$Me_{place} = \frac{21+1}{2} = 11$$

Therefore, the median is the 11th of the observations in increasing order, which is 18. This means that in roughly half of the years Roger Federer won at most 18 Grand Slam matches, and in the other half he won at least 18.

For the losses, the median place remains 11 because the number of observations is still 21, and the median is 3: in roughly half of the years Roger Federer lost at most 3 Grand Slam matches, and in the other half he lost at least 3.
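
The two rules for locating the median can be sketched as follows. The function name `median_with_place` is an illustrative assumption; it returns both the 1-based median place and the median itself.

```python
# Grand Slam wins and losses per year, 1998-2018, copied from Table 1.
wins = [0, 0, 7, 13, 6, 13, 23, 24, 27, 27, 24, 26, 20, 20, 21, 13, 19, 18, 10, 18, 14]
losses = [0, 2, 4, 4, 4, 3, 1, 2, 1, 1, 3, 2, 3, 4, 3, 4, 4, 4, 2, 1, 2]


def median_with_place(values):
    """Return (median place, median) for a simple data series."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        place = (n + 1) // 2  # 1-based position of the middle observation
        return place, ordered[place - 1]
    # Even n: the median is the average of the two middle observations.
    return (n + 1) / 2, (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median_with_place(wins))    # (11, 18)
print(median_with_place(losses))  # (11, 3)
```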

d) The variance is a measure whose value is affected by every observation in the series, and it therefore reflects the dispersion of all the observations.
It is calculated as the arithmetic mean of the squared deviations of the individual observations from the mean. For a small sample the sum of squared deviations is divided by $n - 1$, and the formula is as follows:

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Equation 2-Variance Formula

In order to find the variance of the number of matches won in Grand Slams over the years, we proceeded as follows:

Step 1: we computed the squared deviation of each observation from the mean and summed these squares.

Step 2: we divided the sum by the number of observations minus 1, because n<31.

Applying this we obtained the following:

$$\sigma^2 = \frac{1350.669}{20} = 67.5333$$

Using the same steps to calculate the variance of the losses in Grand Slams, we obtained:

$$\sigma^2 = \frac{33.14371536}{20} = 1.65719$$
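
The two steps can be written out directly. The sketch below assumes the data of Table 1 and uses a hypothetical helper, `sample_variance`; small differences in the last decimals compared with the figures above come from rounding in the intermediate sums.

```python
# Grand Slam wins and losses per year, 1998-2018, copied from Table 1.
wins = [0, 0, 7, 13, 6, 13, 23, 24, 27, 27, 24, 26, 20, 20, 21, 13, 19, 18, 10, 18, 14]
losses = [0, 2, 4, 4, 4, 3, 1, 2, 1, 1, 3, 2, 3, 4, 3, 4, 4, 4, 2, 1, 2]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (Equation 2)."""
    mean = sum(values) / len(values)
    # Step 1: squared deviation of each observation from the mean, then the total.
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    # Step 2: divide by the number of observations minus 1.
    return sum_of_squares / (len(values) - 1)


var_wins = sample_variance(wins)
var_losses = sample_variance(losses)
print(f"wins: {var_wins:.4f}, losses: {var_losses:.4f}")
```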

e) The standard deviation is one of the most frequently encountered measures of dispersion and is calculated as the square root of the variance:

$$\sigma = \sqrt{\sigma^2}$$
Equation 3-Standard deviation formula

For the matches won in Grand Slams we obtained: $\sigma = \sqrt{67.5333} = 8.2178738126$

For the matches lost in Grand Slams we obtained: $\sigma = \sqrt{1.65719} = 1.28731$
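
For a quick cross-check, Python's standard `statistics.stdev` computes the sample standard deviation, that is, the square root of the variance with the $n - 1$ denominator used in Equation 2. The variable names below are illustrative.

```python
import math
import statistics

# Grand Slam wins and losses per year, 1998-2018, copied from Table 1.
wins = [0, 0, 7, 13, 6, 13, 23, 24, 27, 27, 24, 26, 20, 20, 21, 13, 19, 18, 10, 18, 14]
losses = [0, 2, 4, 4, 4, 3, 1, 2, 1, 1, 3, 2, 3, 4, 3, 4, 4, 4, 2, 1, 2]

# Standard deviation as the square root of the sample variance (Equation 3).
sd_wins = math.sqrt(statistics.variance(wins))
sd_losses = math.sqrt(statistics.variance(losses))

# statistics.stdev performs the same computation in one call.
print(f"wins: {sd_wins:.4f} (stdev: {statistics.stdev(wins):.4f})")
print(f"losses: {sd_losses:.4f} (stdev: {statistics.stdev(losses):.4f})")
```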

f) The coefficient of variation is the standard deviation divided by the mean, expressed as a percentage:

$$\nu = \frac{\sigma}{\bar{x}} \cdot 100$$
Equation 4-Coefficient of variation formula

For the matches won in Grand Slams, the coefficient of variation is: $\nu = \frac{8.2178738126}{16.33} \cdot 100 = 50.32\%$.
As a rule of thumb, a coefficient of variation below 35% means that the data series is homogeneous and the average is representative. Since in our case the coefficient of variation is 50.32%, which is above 35%, the data is not homogeneous and the average is not representative. In other words, the number of Grand Slam matches Roger Federer won varied considerably across the 21 years studied.

For the matches lost in Grand Slams, the coefficient of variation is: $\nu = \frac{1.28731}{2.5714} \cdot 100 = 50.06\%$. This value is also above the 35% threshold, so the data is not homogeneous and the average is not representative. As with the wins, the number of Grand Slam matches Roger Federer lost varied considerably across the 21 years studied.
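
The 35% rule can be expressed as a small check. This is a sketch under the assumption that the threshold is applied as a strict "below 35%" test; `coefficient_of_variation` and `is_homogeneous` are hypothetical helper names.

```python
import statistics

# Grand Slam wins and losses per year, 1998-2018, copied from Table 1.
wins = [0, 0, 7, 13, 6, 13, 23, 24, 27, 27, 24, 26, 20, 20, 21, 13, 19, 18, 10, 18, 14]
losses = [0, 2, 4, 4, 4, 3, 1, 2, 1, 1, 3, 2, 3, 4, 3, 4, 4, 4, 2, 1, 2]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent (Equation 4)."""
    mean = sum(values) / len(values)
    return statistics.stdev(values) / mean * 100


def is_homogeneous(cv_percent, threshold=35.0):
    """Rule of thumb from the text: below the threshold the mean is representative."""
    return cv_percent < threshold


for label, series in (("wins", wins), ("losses", losses)):
    cv = coefficient_of_variation(series)
    print(f"{label}: CV = {cv:.2f}%, homogeneous: {is_homogeneous(cv)}")
```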

g) The coefficient of skewness, denoted Sk, measures the degree of asymmetry of a distribution. Its value usually lies between –3 and +3, and its sign indicates the direction of the skewness. We used the formula based on the median rather than the mode, because the modal value occurred only three times while several other values occurred twice, so the mode was not very representative of the data set.

$$Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$
Equation 5-Coefficient of skewness formula

Applying this formula to the wins recorded in Grand Slams, we obtained:

$$Sk = \frac{3(16.33 - 18)}{8.2178738126} = -0.61$$

Because the result is negative and relatively small in absolute value, the number of Grand Slam wins over the course of his career shows a low negative skewness.

Using the same formula for the losses that Federer recorded in Grand Slams throughout his career, we obtained:

$$Sk = \frac{3(2.5714 - 3)}{1.28731} = -0.998834$$

Because the result is negative and close to –1, the number of Grand Slam losses over the course of his career shows a low to medium negative skewness.
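
Equation 5 translates directly into code. The sketch below assumes the Table 1 data; the function name `median_skewness` is illustrative.

```python
import statistics

# Grand Slam wins and losses per year, 1998-2018, copied from Table 1.
wins = [0, 0, 7, 13, 6, 13, 23, 24, 27, 27, 24, 26, 20, 20, 21, 13, 19, 18, 10, 18, 14]
losses = [0, 2, 4, 4, 4, 3, 1, 2, 1, 1, 3, 2, 3, 4, 3, 4, 4, 4, 2, 1, 2]


def median_skewness(values):
    """Sk = 3 * (mean - median) / sample standard deviation (Equation 5)."""
    mean = sum(values) / len(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


sk_wins = median_skewness(wins)
sk_losses = median_skewness(losses)
print(f"wins: {sk_wins:.2f}, losses: {sk_losses:.2f}")
```

A negative value means the mean lies below the median, which is what both series show.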

h) The range of a set of observations is the difference between the largest and the smallest observations: $R = x_{max} - x_{min}$.

For the Grand Slam wins, the absolute range is: $R_{GrandSlamWins} = 27 - 0 = 27$. However, the usefulness of the absolute range is limited because it considers only the extreme values and ignores all other observations. As a result, we also calculate the relative range:

$$R_{GrandSlamWins}(\%) = \frac{x_{max} - x_{min}}{\bar{x}} \cdot 100 = \frac{27 - 0}{16.33} \cdot 100 = 165.33\%$$

Because the relative range is above the 100-120% threshold, the data on Grand Slam wins is not homogeneous and the average is not representative.

For the Grand Slam losses, the absolute range is: $R_{GrandSlamLosses} = 4 - 0 = 4$. As above, we also calculate the relative range because of the limitations of the absolute range:

$$R_{GrandSlamLosses}(\%) = \frac{x_{max} - x_{min}}{\bar{x}} \cdot 100 = \frac{4 - 0}{2.5714} \cdot 100 = 1.5557 \cdot 100 = 155.57\%$$

Because the relative range is above the 100-120% threshold, the data on Grand Slam losses is not homogeneous and the average is not representative.
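
Both versions of the range are easy to compute side by side. This is a minimal sketch with hypothetical helper names; it uses the unrounded mean, so the last decimals may differ slightly from the figures above.

```python
# Grand Slam wins and losses per year, 1998-2018, copied from Table 1.
wins = [0, 0, 7, 13, 6, 13, 23, 24, 27, 27, 24, 26, 20, 20, 21, 13, 19, 18, 10, 18, 14]
losses = [0, 2, 4, 4, 4, 3, 1, 2, 1, 1, 3, 2, 3, 4, 3, 4, 4, 4, 2, 1, 2]


def absolute_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def relative_range(values):
    """Absolute range divided by the mean, in percent."""
    mean = sum(values) / len(values)
    return absolute_range(values) / mean * 100


for label, series in (("wins", wins), ("losses", losses)):
    print(f"{label}: R = {absolute_range(series)}, R(%) = {relative_range(series):.2f}%")
```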

2.2 Masters 1000

Next we analyze how Roger Federer performed in tournaments of the Masters 1000 category. Table 2 shows how his performance varied over the 21 years in which he played at Masters 1000 tournaments. His progress is clearly visible, from his debut year, when he did not reach the main draw of any tournament at this level, to the following years, when he was a very strong competitor except in seasons affected by injury.

Masters 1000
Year    Wins    Losses   Total
1998    0       0        0
1999    0       2        2
2000    2       8        10
2001    8       7        15
2002    18      8        26
2003    21      8        29
2004    20      3        23
2005    27      1        28
2006    34      3        37
2007    26      7        33
2008    23      9        32
2009    24      6        30
2010    23      7        30
2011    22      7        29
2012    23      3        26
2013    14      6        20
2014    28      6        34
2015    16      6        22
2016    3       2        5
2017    21      1        22
2018    15      5        20
Table 2-Masters 1000 Wins/Losses

a) The arithmetic mean is calculated using the following formula:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Where:
$x_i$ = the number of wins in Masters 1000 tournaments in year $i$
$n$ = the number of years
Therefore, for the number of Masters 1000 wins, with $n = 21$ years, we have:
$$\bar{x} = \frac{\sum_{i=1}^{21} x_i}{21} = \frac{368}{21} = 17.5238095 \text{ Masters 1000 matches won/year}$$
From this we can conclude that Roger Federer won on average 17.5238095 matches per year in Masters 1000 tournaments.
Applying the same formula to the losses that Roger Federer suffered in Masters 1000 tournaments, we obtained:
$$\bar{x} = \frac{105}{21} = 5 \text{ Masters 1000 matches lost/year}$$
Thus Roger Federer lost on average 5 matches per year in Masters 1000 tournaments.
b) The mode for the Masters 1000 wins is the most frequently repeated value, 23 wins, which occurs 3 times. For the Masters 1000 losses the series is bimodal: the values 6 and 7 each occur 4 times.
c) The median place is computed with the formula $\frac{n+1}{2}$ because the number of observations is odd. Therefore, the median place is the 11th of the observations in increasing order, and the median is 21 wins: in roughly half of the years observed Roger Federer won at most 21 matches in Masters 1000 tournaments, and in the other half he won at least 21.
Using the same formula for the Masters 1000 losses, we obtain the same $Me_{place} = 11$, and the median is 6 losses: in roughly half of the years observed Roger Federer lost at most 6 matches in Masters 1000 tournaments, and in the other half he lost at least 6.
d) The variance is computed with the formula $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$:
For the wins in Masters 1000: $\sigma^2 = \frac{1903.24}{20} = 95.162$
For the losses in Masters 1000: $\sigma^2 = \frac{150}{20} = 7.5$
e) The standard deviation has the following formula:
$\sigma = \sqrt{\sigma^2}$
For the number of matches won in Masters 1000 tournaments, the standard deviation is: $\sigma = \sqrt{95.162} = 9.7551$
For the number of matches lost in Masters 1000 tournaments, the standard deviation is: $\sigma = \sqrt{7.5} = 2.7386$
f) The coefficient of variation was calculated by using the following formula:
$$\nu = \frac{\sigma}{\bar{x}} \cdot 100 = \frac{9.7551}{17.5238095} \cdot 100 = 55.67\%$$
Because this value is above 35%, the limit below which the data is considered homogeneous and the average representative, the data is not homogeneous and the average is not representative. Consequently, the number of Masters 1000 matches Roger Federer won varied considerably over the 21 years studied.
For the number of losses in Masters 1000 tournaments we applied the same formula and obtained:
$$\nu = \frac{\sigma}{\bar{x}} \cdot 100 = \frac{2.7386}{5} \cdot 100 = 54.77\%$$
This value is also above 35%, so the data is not homogeneous and the average is not representative. The number of Masters 1000 matches Roger Federer lost likewise varied considerably over the 21 years studied.
g) The coefficient of skewness was calculated with the formula:
$$Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(17.5238095 - 21)}{9.7551} = -1.069$$
Because the result is negative and close to –1, we have a low to medium negative skewness in the number of wins in Masters 1000 tournaments over the 21 years in which he played.
For the losses suffered in Masters 1000 tournaments we applied the same formula and obtained:
$$Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(5 - 6)}{2.7386} = -1.0954$$
Because the result is again negative and close to –1, we have a low to medium negative skewness in the number of losses in Masters 1000 tournaments over the 21 years in which he played.
h) The range:
We calculated the absolute range with the formula $R = x_{max} - x_{min}$: $R_{Masters1000Wins} = 34 - 0 = 34$. Because the absolute range considers only the extreme values and ignores all other observations, we also calculated the relative range:
$$R_{Masters1000Wins}(\%) = \frac{x_{max} - x_{min}}{\bar{x}} \cdot 100 = \frac{34}{17.5238095} \cdot 100 = 194.02\%$$

Because the relative range is above the 100-120% threshold, the data on Masters 1000 wins is not homogeneous and the average is not representative.

For the Masters 1000 losses we applied the same formula: $R_{Masters1000Losses} = 9 - 0 = 9$. Given the limitations of the absolute range, we again calculated the relative range:
$$R_{Masters1000Losses}(\%) = \frac{x_{max} - x_{min}}{\bar{x}} \cdot 100 = \frac{9}{5} \cdot 100 = 180\%$$
Because the relative range is above the 100-120% threshold, the data on Masters 1000 losses is not homogeneous and the average is not representative.
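
All of the Masters 1000 indicators can be recomputed from Table 2 in one pass, which is a convenient way to cross-check the hand calculations. The sketch below relies on Python's standard `statistics` module; the function name `describe` and the dictionary keys are illustrative choices. `statistics.multimode` (Python 3.8 or later) returns every value tied for the highest frequency.

```python
import statistics

# Masters 1000 wins and losses per year, 1998-2018, copied from Table 2.
wins = [0, 0, 2, 8, 18, 21, 20, 27, 34, 26, 23, 24, 23, 22, 23, 14, 28, 16, 3, 21, 15]
losses = [0, 2, 8, 7, 8, 8, 3, 1, 3, 7, 9, 6, 7, 7, 3, 6, 6, 6, 2, 1, 5]


def describe(values):
    """Compute the indicators a) to h) for one simple data series."""
    mean = sum(values) / len(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "modes": sorted(statistics.multimode(values)),
        "median": median,
        "variance": statistics.variance(values),
        "standard deviation": sd,
        "coefficient of variation (%)": sd / mean * 100,
        "skewness": 3 * (mean - median) / sd,
        "relative range (%)": (max(values) - min(values)) / mean * 100,
    }


for label, series in (("wins", wins), ("losses", losses)):
    print(label)
    for name, value in describe(series).items():
        print(f"  {name}: {value}")
```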

2.3 ATP 500


In this subchapter we analyze Roger Federer’s performance throughout his career in tournaments of the ATP 500 category. Table 3 shows how he went from not getting past the qualifiers of any ATP 500 tournament in 1998 to generally rising numbers in the following years of his career.

ATP 500
Year    Wins    Losses   Total
1998    0       0        0
1999    5       3        8
2000    5       4        9
2001    6       2        8
2002    8       2        10
2003    13      1        14
2004    7       1        8
2005    10      0        10
2006    9       1        10
2007    5       0        5
2008    0       1        1
2009    4       1        5
2010    5       0        5
2011    9       1        10
2012    14      1        15
2013    12      4        16
2014    10      0        10
2015    15      0        15
2016    3       1        4
2017    11      1        12
2018    14      1        15
Table 3-ATP 500 Wins/Losses

a) The arithmetic mean is calculated using the following formula:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Where:
$x_i$ = the number of wins in ATP 500 tournaments in year $i$
$n$ = the number of years
Therefore, for the number of wins in ATP 500 tournaments, with $n = 21$ years, we have:
$$\bar{x} = \frac{\sum_{i=1}^{21} x_i}{21} = \frac{165}{21} = 7.8571 \text{ ATP 500 wins/year}$$
Roger Federer therefore won on average 7.8571 matches per year in ATP 500 tournaments.
For the number of losses in ATP 500 tournaments we apply the same formula, again with $n = 21$ years:
$$\bar{x} = \frac{\sum_{i=1}^{21} x_i}{21} = \frac{25}{21} = 1.1904761 \text{ ATP 500 losses/year}$$
Roger Federer therefore lost on average 1.1904761 matches per year in ATP 500 tournaments.
b) The mode for the ATP 500 wins is the most frequently repeated value, 5 wins, which occurs 4 times. For the losses, the mode is 1 loss, which occurs 10 times.
c) The median place is computed with the formula $\frac{n+1}{2}$ because the number of observations is odd. Therefore, the median place is the 11th of the observations in increasing order, and the median is 8 wins: in roughly half of the years observed Roger Federer won at most 8 matches in ATP 500 tournaments, and in the other half he won at least 8. For the ATP 500 losses, the median place is again the 11th of the observations in increasing order, and the median is 1 loss: in roughly half of the years observed Roger Federer lost at most 1 match in ATP 500 tournaments, and in the other half he lost at least 1.
d) The variance is computed with the formula $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$:
For the number of wins in ATP 500 tournaments: $\sigma^2 = \frac{390.571}{20} = 19.5286$
For the number of losses in ATP 500 tournaments: $\sigma^2 = \frac{29.2381}{20} = 1.4619$
e) The standard deviation has the following formula:
$\sigma = \sqrt{\sigma^2}$
For the number of matches won in ATP 500 tournaments, the standard deviation is: $\sigma = \sqrt{19.5286} = 4.4191$
For the number of matches lost in ATP 500 tournaments, the standard deviation is: $\sigma = \sqrt{1.4619} = 1.209$
f) The coefficient of variation was calculated by using the following formula:
For the wins in ATP 500 tournaments: $\nu = \frac{\sigma}{\bar{x}} \cdot 100 = \frac{4.4191}{7.8571} \cdot 100 = 56.2433\%$
Because this value is above 35%, the limit below which the data is considered homogeneous and the average representative, the data is not homogeneous and the average is not representative. Consequently, the number of ATP 500 matches Roger Federer won varied considerably over the 21 years studied.
For the losses in ATP 500 tournaments: $\nu = \frac{\sigma}{\bar{x}} \cdot 100 = \frac{1.209}{1.1904761} \cdot 100 = 101.556\%$
This value is also above 35%, so the data is not homogeneous and the average is not representative. The number of ATP 500 matches Roger Federer lost likewise varied considerably over the 21 years studied.
g) The coefficient of skewness was calculated with the following formula:
For the wins in ATP 500 tournaments: $Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(7.8571 - 8)}{4.4191} = -0.097$
Because the result is negative and very close to zero, we have a very low negative skewness in the number of wins in matches played in ATP 500 tournaments over the course of his career.
For the losses suffered in ATP 500 tournaments we applied the same formula and obtained:
$Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(1.1904761 - 1)}{1.209} = 0.4726$
Because the result is positive and well below 1, we have a low positive skewness in the number of losses in matches played in ATP 500 tournaments over the course of his career.
h) The range: We calculated the absolute range with the formula $R = x_{max} - x_{min}$: $R_{ATP500Wins} = 15 - 0 = 15$. Because the absolute range considers only the extreme values and ignores all other observations, we also calculated the relative range:
$$R_{ATP500Wins}(\%) = \frac{x_{max} - x_{min}}{\bar{x}} \cdot 100 = \frac{15}{7.8571} \cdot 100 = 190.91\%$$
Because the relative range is above the 100-120% threshold, the data on ATP 500 wins is not homogeneous and the average is not representative.
For the ATP 500 losses the absolute range is $R_{ATP500Losses} = 4 - 0 = 4$. Given the limitations of the absolute range, we again calculated the relative range:
$$R_{ATP500Losses}(\%) = \frac{x_{max} - x_{min}}{\bar{x}} \cdot 100 = \frac{4}{1.1904761} \cdot 100 = 336\%$$
Because the relative range is above the 100-120% threshold, the data on ATP 500 losses is not homogeneous and the average is not representative.
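
Since the sign of the skewness coefficient depends only on whether the mean lies below or above the median, it is easy to verify programmatically. The sketch below uses the data of Table 3; `skewness_with_direction` is a hypothetical helper name.

```python
import statistics

# ATP 500 wins and losses per year, 1998-2018, copied from Table 3.
wins = [0, 5, 5, 6, 8, 13, 7, 10, 9, 5, 0, 4, 5, 9, 14, 12, 10, 15, 3, 11, 14]
losses = [0, 3, 4, 2, 2, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 4, 0, 0, 1, 1, 1]


def skewness_with_direction(values):
    """Return Sk = 3 * (mean - median) / sd together with its direction."""
    mean = sum(values) / len(values)
    sk = 3 * (mean - statistics.median(values)) / statistics.stdev(values)
    if sk < 0:
        direction = "negative"  # mean below median
    elif sk > 0:
        direction = "positive"  # mean above median
    else:
        direction = "symmetric"
    return sk, direction


for label, series in (("wins", wins), ("losses", losses)):
    sk, direction = skewness_with_direction(series)
    print(f"{label}: Sk = {sk:.4f} ({direction})")
```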

2.4 ATP 250


Table 4 presents Roger Federer’s career performance at tournaments of the ATP 250 category.

ATP 250
Year    Wins    Losses   Total
1998    2       3        5
1999    7       7        14
2000    16      11       27
2001    19      7        26
2002    20      7        27
2003    22      4        26
2004    15      0        15
2005    15      0        15
2006    15      0        15
2007    5       0        5
2008    15      0        15
2009    3       1        4
2010    13      3        16
2011    5       0        5
2012    6       2        8
2013    4       1        5
2014    7       1        8
2015    8       0        8
2016    5       2        7
2017    0       1        1
2018    4       0        4
Table 4-ATP 250 Wins/Losses

a) The arithmetic mean is calculated using the following formula:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Where:
$x_i$ = the number of wins in ATP 250 tournaments in year $i$
$n$ = the number of years
Therefore, for the number of wins in ATP 250 tournaments, with $n = 21$ years, we have:
$$\bar{x} = \frac{\sum_{i=1}^{21} x_i}{21} = \frac{206}{21} = 9.8095 \text{ ATP 250 wins/year}$$
Roger Federer therefore won on average 9.8095 matches per year in ATP 250 tournaments.
For the number of losses in ATP 250 tournaments we applied the same formula, again with $n = 21$ years:
$$\bar{x} = \frac{\sum_{i=1}^{21} x_i}{21} = \frac{50}{21} = 2.3809 \text{ ATP 250 losses/year}$$

b) The mode for the ATP 250 wins is the most frequently repeated value, 15 wins, which occurs 4 times. For the ATP 250 losses the mode is 0, which occurs 8 times.
c) The median place is computed with the formula $\frac{n+1}{2}$ because the number of observations is odd. Therefore, the median place is the 11th of the observations in increasing order, and the median is 7 wins: in roughly half of the years observed Roger Federer won at most 7 matches in ATP 250 tournaments, and in the other half he won at least 7.
For the ATP 250 losses the median place is again the 11th of the observations in increasing order, and the median is 1 loss: in roughly half of the years observed Roger Federer lost at most 1 match in ATP 250 tournaments, and in the other half he lost at least 1.
d) The variance is computed with the formula $\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$:
For the number of wins in ATP 250: $\sigma^2 = \frac{867.238}{20} = 43.3619$
For the number of losses in ATP 250: $\sigma^2 = \frac{194.952}{20} = 9.7476$
e) The standard deviation has the following formula:
$\sigma = \sqrt{\sigma^2}$
For the number of matches won in ATP 250 tournaments, the standard deviation is: $\sigma = \sqrt{43.3619} = 6.5850$
For the number of matches lost in ATP 250 tournaments, the standard deviation is: $\sigma = \sqrt{9.7476} = 3.1221$
f) The coefficient of variation was calculated by using the following formula:
For the number of wins in ATP 250: $\nu = \frac{\sigma}{\bar{x}} \cdot 100 = \frac{6.5850}{9.8095} \cdot 100 = 67.13\%$
Because this value is above 35%, the limit below which the data is considered homogeneous and the average representative, the data is not homogeneous and the average is not representative. Consequently, the number of ATP 250 matches Roger Federer won varied considerably over the 21 years studied.
For the number of losses in ATP 250: $\nu = \frac{\sigma}{\bar{x}} \cdot 100 = \frac{3.1221}{2.3809} \cdot 100 = 131.13\%$
This value is also above 35%, so the data is not homogeneous and the average is not representative. The number of ATP 250 matches Roger Federer lost likewise varied considerably over the 21 years studied.
g) The coefficient of skewness was calculated with the following formula:
For the wins in ATP 250: $Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(9.8095 - 7)}{6.5850} = 1.2800$
Because the result is positive and above 1, we have a medium positive skewness in the number of wins in matches played in ATP 250 tournaments over the course of his career.
For the losses in ATP 250: $Sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} = \frac{3(2.3809 - 1)}{3.1221} = 1.3269$
Because the result is positive and above 1, we likewise have a medium positive skewness in the number of losses in matches played in ATP 250 tournaments over the course of his career.
i) The range: We calculated the absolute range with the formula $R = x_{max} - x_{min}$. For the
wins in ATP 250, $R_{ATP250Wins} = 22 - 0 = 22$. The use of the absolute range is limited, however,
because it considers only the extreme values and fails to take into consideration all of the
observations. Because of this, we also calculated the relative range, using the following formula:
$R_{ATP250Wins}(\%) = \frac{x_{max} - x_{min}}{\bar{x}} \cdot 100 = \frac{22}{9.8095} \cdot 100 = 224.27\%$
Because our relative range is above 100-120%, we can say that the data regarding the ATP 250
wins is not homogenous and that the mean is not representative.
For the number of losses in ATP 250 we calculated the absolute range in the same manner:
$R_{ATP250Losses} = 11 - 0 = 11$. Since the absolute range again considers only the extreme values,
we also calculated the relative range:
$R_{ATP250Losses}(\%) = \frac{x_{max} - x_{min}}{\bar{x}} \cdot 100 = \frac{11}{2.3809} \cdot 100 = 462.01\%$
This value is also far above 100-120%, so the data regarding the ATP 250 losses is not
homogenous either.
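
The dispersion indicators for the ATP 250 data can be re-derived from the summary values quoted in this section. The following is only a minimal sketch: the function name `dispersion_summary` is our own, and the sum of squared deviations for the wins is not quoted in the text, so we assume it to be 41.6 × 20, back-computed from the reported variance.

```python
import math


def dispersion_summary(sum_sq_dev, n, mean, median, x_max, x_min):
    """Recompute the dispersion indicators from summary values (hypothetical helper)."""
    variance = sum_sq_dev / (n - 1)            # sample variance
    sd = math.sqrt(variance)                   # standard deviation
    cv = sd / mean * 100                       # coefficient of variation, in %
    skewness = 3 * (mean - median) / sd        # Pearson's second skewness coefficient
    relative_range = (x_max - x_min) / mean * 100  # relative range, in %
    return {
        "variance": variance,
        "sd": sd,
        "cv": cv,
        "skewness": skewness,
        "relative_range": relative_range,
    }


# Wins: the sum of squared deviations is an assumption (variance 41.6 times n - 1 = 20).
wins = dispersion_summary(41.6 * 20, 21, 9.8095, 7, 22, 0)
# Losses: the sum of squared deviations 194.952 is the value quoted in the text.
losses = dispersion_summary(194.952, 21, 2.3809, 1, 11, 0)

for label, stats in (("wins", wins), ("losses", losses)):
    print(label, {name: round(value, 4) for name, value in stats.items()})
```

Running the same function on both series makes it easy to check that every indicator was computed with the same formula.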

2.5 Pie chart representing total wins obtained throughout Roger Federer’s career at
all types of tournaments mentioned before.

[Pie chart "Wins": slices for Grand Slam, Masters 1000, ATP 500 and ATP 250; slice labels shown: 34%, 32%, 19%, 15%]

Figure 1- Pie Chart of all Wins

The pie chart above shows the share of each tournament category in the 1082 matches that Roger
Federer has won at these types of events throughout his career. The highest percentage belongs to the
Masters 1000 category, followed closely by the Grand Slams. These are the most important and
prestigious categories of tennis tournaments, which explains why Roger Federer chose to focus on
them as his career progressed. This focus could also help explain why he still holds the Guinness
World Record for the most weeks spent at World No. 1 and the highest count of Grand Slams in the
Open Era.
3. A statistical analysis of the prize money earned by Roger Federer
throughout his career

Year Earnings (USD)
1998 27 955
1999 225 139
2000 623 782
2001 865 425
2002 1 995 027
2003 4 000 680
2004 6 357 547
2005 6 137 018
2006 8 343 885
2007 10 130 620
2008 5 886 879
2009 8 768 110
2010 7 698 289
2011 6 369 576
2012 8 584 842
2013 3 203 637
2014 9 343 988
2015 8 682 892
2016 1 527 269
2017 13 054 856
2018 8 629 233
Total 120 456 649
Table 5- Prize Money earned

The table above gathers the prize money won by Roger Federer in each year of his career, from
1998 until 2018. These figures represent only the money earned through tennis and do not
include his yearly endorsements, which have also grown impressively each year.
To give a clearer picture of his career earnings, we have calculated the arithmetic mean of the
data above. The formula for the mean that we used is:
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$
Where:
x= the earnings/year
n= the number of years
Therefore, for the yearly earnings from tournaments, where n = 21 years, we will have:
$\bar{x} = \frac{\sum_{i=1}^{21} x_i}{21} = \frac{120456649}{21} = 5736030.9$ USD/year
From this number we can observe that the average amount that Roger Federer earned in a
year from tournaments is 5 736 030.9 USD.
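
The mean can be checked directly from the figures in Table 5. The snippet below is a short sketch of that check; the variable names are our own.

```python
# Yearly prize money in USD, copied from Table 5.
earnings = {
    1998: 27_955, 1999: 225_139, 2000: 623_782, 2001: 865_425,
    2002: 1_995_027, 2003: 4_000_680, 2004: 6_357_547, 2005: 6_137_018,
    2006: 8_343_885, 2007: 10_130_620, 2008: 5_886_879, 2009: 8_768_110,
    2010: 7_698_289, 2011: 6_369_576, 2012: 8_584_842, 2013: 3_203_637,
    2014: 9_343_988, 2015: 8_682_892, 2016: 1_527_269, 2017: 13_054_856,
    2018: 8_629_233,
}

total = sum(earnings.values())      # sum of x_i over all years
n_years = len(earnings)             # n = number of years
mean_earnings = total / n_years     # arithmetic mean, USD per year

print(f"Total: {total} USD over {n_years} years")
print(f"Mean:  {mean_earnings:.1f} USD/year")
```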

[Scatter plot of yearly earnings in USD (axis from 0 to 14 000 000) against the year (axis from 1995 to 2020), with linear trend line y = 403236x - 8E+08 and R² = 0.438]

Figure 2- Average USD earned/year in tournaments

The figure above shows that his earnings followed an increasing trend over the 21 years analyzed. The
R² value of 0.438 measures how close the points are to the trend line: the closer R² is to 1, the better
the fit. Here the trend line accounts for about 44% of the variation in his yearly earnings. Performance
in sports can be unpredictable at times, and Roger Federer is no exception, seeing as he was also
plagued by injuries. The points that lie far from the trend line therefore show that forecasting his future
performance can prove to be difficult. What is more, the further a point lies from the trend line, the
easier it is to see in which years he was at peak physical prowess and in which years he was still
learning or was not physically well.
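
The trend line and R² reported in Figure 2 can be reproduced with an ordinary least-squares fit on the Table 5 data. The sketch below assumes that the chart's trend line is a simple linear regression of earnings on the calendar year, which is what the equation in the figure suggests; the variable names are ours.

```python
# Ordinary least-squares fit of yearly earnings (USD) on the calendar year.
years = list(range(1998, 2019))
earnings = [
    27_955, 225_139, 623_782, 865_425, 1_995_027, 4_000_680, 6_357_547,
    6_137_018, 8_343_885, 10_130_620, 5_886_879, 8_768_110, 7_698_289,
    6_369_576, 8_584_842, 3_203_637, 9_343_988, 8_682_892, 1_527_269,
    13_054_856, 8_629_233,
]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(earnings) / n

# Sums of squares and cross-products around the means.
s_xx = sum((x - mean_x) ** 2 for x in years)
s_yy = sum((y - mean_y) ** 2 for y in earnings)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, earnings))

slope = s_xy / s_xx                       # USD gained per additional year
intercept = mean_y - slope * mean_x       # value of the line at year 0
r_squared = s_xy ** 2 / (s_xx * s_yy)     # coefficient of determination

print(f"y = {slope:.0f}x + ({intercept:.3g})")
print(f"R^2 = {r_squared:.3f}")
```

Computing the fit by hand rather than with a library keeps each term of the formula visible, at the cost of a few more lines.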
4. Evolution of Roger Federer’s abilities until the present moment
In this chapter we will provide an analysis of how Roger Federer’s abilities have progressed or regressed
over the course of the years until the present moment. We will see in which areas he has succeeded in
improving and also try to glean what could make his game even better.

4.1 Winners and unforced errors evolution in matches


We will present how the percentages of winners and unforced errors have changed from the past to
the present. For the accuracy of the information we have studied secondary data compiled from 401
matches covering each year of his career.

[Two-level pie chart "Winners in matches": forehand winners, backhand winners, net winners, aces; slice labels shown: 26%, 27%, 34%, 35%, 25%, 13%, 23%, 17%]

Figure 3-Winners in matches

The pie chart above presents the evolution of the winners he hits in matches. We have split the
winners into 4 different shot types. The outer level of the pie chart represents the past, while the inner
level represents the present. We can see that over the years Roger Federer has increased his forehand
winners by 1 percentage point and his net winners by 1.8 percentage points, while his backhand
winners have decreased by 4 percentage points and his aces by 1 percentage point. Looking at his
game, these changes can be explained by his going for net winners more often. Also, in the most recent
years he has tended to attack more on the forehand side, which is why his backhand winners have
decreased the most.
[Two-level pie chart "Unforced errors in matches": forehand unforced errors, backhand unforced errors, net unforced errors, double faults; slice labels shown: 10%, 4%, 6%, 5%, 48%, 46%, 41%, 40%]

Figure 4-Unforced errors in matches

The pie chart above puts together data regarding Roger Federer’s unforced errors. As before, the outer
level is the past and the inner level is the present. The forehand unforced errors grew by almost 2
percentage points, the backhand unforced errors grew by approximately 1 point and the net unforced
errors grew by a little more than 1 point. The most significant change was the decrease in double
faults, which went down by almost 4 points. By correlating both charts, we can see that although he
has increased his forehand winners, his forehand unforced errors also grew due to his more aggressive
style, and the same applies to his net play. However, a problem occurs with his backhand, where his
unforced errors grew while his winners dipped, signifying that he hits his target less often when using
the one-handed backhand. Regarding the serve, his aces are a bit lower, by approximately 1 point, but
his double faults decreased by a much more significant margin of almost 4 points. This can be
explained by his serve, which has improved considerably over the last years and which we will analyze
in the next subchapter.

4.2 Roger Federer’s serve

One of Roger Federer’s main weapons in the matches played against his opponents is his serve. It is
extremely consistent, very well placed and the variation that he uses when he serves always keeps his
opponents guessing. In the table below we have presented some of the career stats regarding his serve
and the percentage of points that he has won over the course of the years.

Federer Serve       1st Serve   2nd Serve
% Serve landed      64          100
Avg. speed (mph)    125         95
% Points won        78          59
Table 6-Serve Analysis
The first value that we calculated is the probability of winning a point on serve, be it with the first serve
or with the second serve. We calculated it as the chance of landing the first serve and winning the
point, plus the chance of faulting on the first serve, landing the second serve and then winning the
point. By transposing this into an equation we obtained the following:

$P(\text{win point on serve}) = P(\text{land 1st serve}) \cdot P(\text{win point on 1st serve}) + P(\text{1st serve fault}) \cdot P(\text{land 2nd serve}) \cdot P(\text{win point on 2nd serve})$
$= 0.64 \cdot 0.78 + 0.36 \cdot 1 \cdot 0.59 = 0.4992 + 0.2124 = 0.7116$

This result tells us that Roger Federer wins 71.16% of the points that he plays on serve.
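
The serve calculation can be expressed as a small function of the four values in Table 6. This is only an illustrative sketch: the function name and parameter names are our own, and it follows the table in treating the second serve as always landing.

```python
def serve_point_win_probability(p_first_in, p_win_first, p_second_in, p_win_second):
    """Probability of winning a service point (hypothetical helper).

    The point is won either on a landed first serve, or after a first-serve
    fault followed by a landed second serve.
    """
    p_first_fault = 1 - p_first_in
    via_first_serve = p_first_in * p_win_first
    via_second_serve = p_first_fault * p_second_in * p_win_second
    return via_first_serve + via_second_serve


# Values from Table 6, converted from percentages to probabilities.
p_serve = serve_point_win_probability(
    p_first_in=0.64, p_win_first=0.78, p_second_in=1.00, p_win_second=0.59
)
print(f"Probability of winning a point on serve: {p_serve:.4f}")
```

Keeping the second-serve landing rate as a parameter makes it possible to see how the result changes once double faults are taken into account.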

The second value we wanted to calculate was the relative speed of his serve, expressed as a
percentage of his average 1st serve speed. In order to calculate this, we used the following formula:
$\text{Relative speed} = \frac{\text{serve speed}}{\text{average 1st serve speed}} \cdot 100$

Applying this formula to both serves we obtained the following:

$\text{Relative 1st serve speed} = \frac{\text{1st serve speed}}{\text{average 1st serve speed}} \cdot 100 = \frac{125}{125} \cdot 100 = 100\%$

$\text{Relative 2nd serve speed} = \frac{\text{2nd serve speed}}{\text{average 1st serve speed}} \cdot 100 = \frac{95}{125} \cdot 100 = 76\%$

The third and final value that we computed concerning Roger Federer’s serve is the relationship that
exists between his serve speed and the percentage of points that he won on serve. The formula that
we used to calculate this ratio is the following:
$\text{Relationship} = \frac{\%\text{ of points won}}{\text{speed of the serve}} \cdot 100$

Applying this for both the first and the second serve we obtained the following results, expressed in
percentage points won per 100 mph of serve speed:
For the 1st serve: $\frac{\%\text{ of points won on 1st serve}}{\text{speed of the 1st serve}} \cdot 100 = \frac{78}{125} \cdot 100 = 0.624 \cdot 100 = 62.4$

For the 2nd serve: $\frac{\%\text{ of points won on 2nd serve}}{\text{speed of the 2nd serve}} \cdot 100 = \frac{59}{95} \cdot 100 = 0.621 \cdot 100 = 62.1$

The comment for these values is that the ratio is around 62 for both the 1st serve and the 2nd serve,
which suggests a close relationship between the speed of Roger Federer’s serve and the percentage of
points that he wins on serve. In other words, the percentage of points Federer wins on serve is
approximately 0.62 times the speed of the serve (mph).
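
The relative speeds and the points-to-speed ratios can be computed together from Table 6. The sketch below uses our own variable names and assumes that the speeds in the table are in miles per hour.

```python
# Values from Table 6; speeds assumed to be in miles per hour.
serve_stats = {
    "1st": {"speed_mph": 125, "points_won_pct": 78},
    "2nd": {"speed_mph": 95, "points_won_pct": 59},
}

# The average 1st serve speed is the reference for the relative speed.
reference_speed = serve_stats["1st"]["speed_mph"]

results = {}
for serve, stats in serve_stats.items():
    relative_speed = stats["speed_mph"] / reference_speed * 100    # in %
    ratio = stats["points_won_pct"] / stats["speed_mph"] * 100     # points won per 100 mph
    results[serve] = {"relative_speed_pct": relative_speed, "points_per_100_mph": ratio}
    print(f"{serve} serve: relative speed {relative_speed:.0f}%, ratio {ratio:.1f}")
```

With only two serve types the comparison rests on two data points, so the similarity of the ratios is an indication rather than a fitted relationship.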

4.3 Suggestions for improvement


After analyzing these results, we can see that the probability of Roger Federer winning points on serve is
high, showcasing once again just how well he is able to serve. What is more, although the relative
speeds for the 1st and 2nd serve differ, the relationship between the serve speed and the percentage
of points he wins on serve is extremely similar for both. This suggests that attempting to serve faster
on his 2nd serve would not be an improvement to his game. Furthermore, after the analysis of the
numbers in the pie charts regarding his abilities, we can say that the winners that he produces on his
forehand and at the net are at a very good level for his aggressive style of tennis. The area in which he
could benefit is the backhand, where he could improve the accuracy of his shots in order to make
fewer unforced errors. This suggestion for improvement also correlates with the manner in which
other players are known to attack him: they go for the backhand side because it is more susceptible to
errors than the forehand.

5. Conclusion

After a statistical analysis of all of the data gathered regarding Roger Federer’s wins/losses, earnings,
winners, unforced errors and serve, we can clearly see the improvements that he made as his career
progressed. He went from winning only 2 matches in ATP 250 level tournaments in his first year of
play to reaching his greatest success at the most important categories of tournaments, Grand Slams
and Masters 1000, towards which his focus clearly shifted in the last years of his career.

Although 2018 was on a slightly lower level than 2017, it is clear that Roger Federer still has the will,
passion and, most importantly, the ability to be one of the main contenders at the big tournaments.
What is more, the statistics we have analyzed in the project reveal what is perhaps his weaker spot,
the backhand. If he improves it, he would stand a strong chance of extending his record tally of Grand
Slams.

All in all, Roger Federer has little left to work on regarding his style of play. Taking his age into
consideration, the areas that he should focus on are his physical fitness, so that he can compete
against the younger players in matches that can last 4-5 hours or even longer, and remaining as
healthy as possible, in order to avoid a repeat of 2016, when he needed to have knee surgery and his
wins and losses dipped considerably.

Bibliography

Sawe, B. E., 2018. The Most Popular Sports In The World. [Online]
Available at: https://www.worldatlas.com/articles/what-are-the-most-popular-sports-in-the-world.html

Curwin, J., Slater, R. & Eadson, D., 2013. Quantitative methods for business decisions. 7th ed. Andover:
Cengage Learning.

Serban, D., Mitrut, C. & Mitrut, C. A., 2003. Statistics for business administration. Bucharest: Editura
ASE.

Tennis View Mag, 2015. Memorable Moments in Tennis History Timeline. [Online]
Available at: http://www.tennisviewmag.com/memorable-moments-tennis-history-timeline

Total Sportek, 2019. Highest Prize Money In Tennis Grand Slams. [Online]
Available at: https://www.totalsportek.com/money/highest-prize-money-in-tennis-grand-slams/

References

Table 1-Grand Slam Wins/Losses
Table 2-Masters 1000 Wins/Losses
Table 3-ATP 500 Wins/Losses
Table 4-ATP 250 Wins/Losses
Table 5- Prize Money earned
Table 6-Serve Analysis

Figure 1- Pie Chart of all Wins
Figure 2- Average USD earned/year in tournaments
Figure 3-Winners in matches
Figure 4-Unforced errors in matches

Equation 1-Arithmetic mean formula
Equation 2-Variance Formula
Equation 3-Standard deviation formula
Equation 4-Coefficient of variation formula
Equation 5-Coefficient of skewness formula
