Nicolas Hohn PHD Thesis

PRODUCED ON ACID-FREE PAPER
MEASURING, UNDERSTANDING
AND MODELLING
INTERNET TRAFFIC
Nicolas Hohn
SUBMITTED IN TOTAL FULFILLMENT OF THE

REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
JULY 2004
DEPARTMENT OF ELECTRICAL AND ELECTRONIC ENGINEERING

THE UNIVERSITY OF MELBOURNE
AUSTRALIA
To my parents, for their love, encouragement and constant support,
without whom nothing would be.
Abstract
This thesis concerns measuring, understanding and modelling Internet traffic. We first study
the origins of the statistical properties of Internet traffic, in particular its scaling behaviour,
and propose a constructive model of packet traffic with physically motivated parameters.
We base our analysis on a large amount of empirical data measured on different networks,
and use a so-called semi-experimental approach to isolate certain features of traffic we seek
to model. These results lead to the choice of a particular Poisson cluster process, known as
the Bartlett-Lewis point process, for a new packet traffic model. This model has a small number
of parameters with a simple networking meaning, and is mathematically tractable. It allows
us to gain valuable insight into the underlying mechanisms creating the observed statistics.
In practice, Internet traffic measurements are limited by the very large amount of data
generated by high bandwidth links. This leads us to also investigate traffic sampling strategies
and their respective inversion methods. We argue that the packet sampling mechanism
currently implemented in Internet routers is not practical when one wants to infer the statistics
of the full traffic from partial measurements. We advocate the use of flow sampling
for many purposes. We show that such a sampling strategy is much easier to invert and can
give reasonable estimates of higher order traffic statistics such as the distribution of the number of
packets per flow and the spectral density of the packet arrival process. This inversion technique
can also be used to fit the Bartlett-Lewis point process model from sampled traffic.
We complete our understanding of Internet traffic by focusing on the small scale behaviour
of packet traffic. To do so, we use data from a fully instrumented Tier-1 router and
measure the delays experienced by all the packets crossing it. We present a simple router
model capable of reproducing the measured packet delays, and propose a scheme to
export router performance information based on busy period statistics. We conclude this
thesis by showing how the Bartlett-Lewis point process can model the splitting and merging
of packet streams in a router.
Declaration
This is to certify that:
(i) the thesis comprises only my original work;
(ii) due acknowledgement has been made in the text to all other material used; and
(iii) the thesis is less than 80000 words in length, exclusive of tables, maps, bibliographies,
appendices and footnotes.
Nicolas Hohn
Preface
The work presented in this thesis is the result of original research conducted by the author.
Parts of it have been published, or submitted for publication, as follows:
Chapters 3 and 4:
[81] N. Hohn, D. Veitch, and P. Abry, “Investigating the scaling behaviour of Internet
flow arrivals”, in Proc. International Conference on Self-Similarity and Applications,
Annales Mathématiques Blaise Pascal, Clermont Ferrand, France, May 2002.
[80] N. Hohn, D. Veitch, and P. Abry, “Does fractal scaling at the IP level depend on TCP
flow arrival processes?”, in Proc. ACM Internet Measurement Workshop, pp. 63–68,
Marseille, France, November 2002.
[83] N. Hohn, D. Veitch, and P. Abry, “The impact of the flow arrival process in Internet
traffic”, in Proc. IEEE ICASSP, pp. VI 37–40, Hong Kong, April 2003.
[3] P. Abry, P. Flandrin, N. Hohn, and D. Veitch, “Invariance d’échelle dans l’Internet”,
in Proc. Colloque Mesure de l’Internet, Nice, France, May 2003.
[82] N. Hohn, D. Veitch, and P. Abry, “Cluster Processes, a Natural Language for Network
Traffic”, IEEE Transactions on Signal Processing, Special Issue on Signal Processing
in Networking, 51(8):2229–2244, August 2003.
[173] D. Veitch, N. Hohn, and P. Abry, “Multifractality in TCP/IP traffic: the case against”,
(submitted).
Chapter 5:
[78] N. Hohn and D. Veitch, “Inverting sampled traffic”, in Proc. ACM Internet Measurement
Conference, pp. 222–233, Miami, USA, October 2003. Best student paper
award.
[79] N. Hohn and D. Veitch, “Inverting sampled traffic”, IEEE/ACM Transactions on Networking,
(fast track submission).
Chapter 6:
[142] K. Papagiannaki, D. Veitch, and N. Hohn, “Origins of microcongestion in an access
router”, in Proc. Passive and Active Measurement Workshop, Antibes, France, April
2004.
[84] N. Hohn, D. Veitch, K. Papagiannaki, and C. Diot, “Bridging router performance and
queueing theory”, in Proc. ACM SIGMETRICS conference, New York, USA, June
2004.
Chapter 7:
[85] N. Hohn, D. Veitch and T. Ye, “Splitting and merging of a traffic model: validation”,
(submitted).
Acknowledgements
If you have an apple and I have an apple and we exchange these apples, then
you and I still each have one apple. But if you have an idea and I have an idea
and we exchange these ideas, then each of us will have two ideas.
George Bernard Shaw
I would like to thank Darryl Veitch, my PhD advisor, for his support, guidance and
availability. I thoroughly enjoyed working and “exchanging ideas” with him. I will look
back at our late night enlightening discussions and desperate moments before deadlines
with fond memories. He made my PhD studies a great experience, as much scientifically
as personally.
Thanks go to Iven Mareels and Stephen Hanly, members of my PhD committee, for
their assistance and suggestions over the course of my work and in the preparation of this
thesis.
The financial support from the Commonwealth government of Australia through an International
Postgraduate Research Scholarship, from the University of Melbourne, Ericsson
and the Australian Research Council Special Research Center for Ultra-Broadband Information
Networks was crucial to the successful completion of this project and is gratefully
acknowledged.
The story that led me to leave the French Alps and complete a PhD in Australia is too
long and too incredible to be fully recounted here. A couple of moments stand out: a job
offer from the Bionic Ear Institute just days before I was due to reluctantly leave Australia to
complete my military duties in France, and a fax from the Vice-Chancellor of the University
of Melbourne to support my visa application when I was about to be deported. I cannot
thank the people involved in these life-changing events enough.
Studying in Australia for my MSc and my PhD has been an amazing journey, not because
of all the miles flown, but because I met some great people along the way.
From a research perspective, I was very lucky to work at Ecole Normale Supérieure de
Lyon (France) with Patrice Abry on multiple occasions. I am grateful to the people at the
Cooperative Association for Internet Data Analysis in San Diego (USA), Ecole Normale
Supérieure de Paris, Intel Research Cambridge and Laboratoire d’Informatique de Paris
VI for their kind hospitality and financial support during my short visits. I would also
like to give special thanks to the folks from the IP group at Sprint Advanced Technology
Laboratories in San Francisco (USA) for making my stay there such a great experience.
On a more personal note, I would like to thank my friend Jean for taking me mountaineering
on Makalu 2 in the Himalayas and thus showing me that one can still have a life
during a PhD. I am also grateful to all the amazing people from the Melbourne University
Mountaineering Club with whom I shared some wonderful adventures and epics.
Being so far from home means that I did not see my family as much as I would have
wished. I thank them all for their support and understanding.
Last but not least, I would like to thank Andrea for coping with my working hours and
my long overseas trips, and for bringing so much to my life over the years.
Melbourne, Australia
May 2004
Nicolas Hohn
Contents
List of Tables xvii
List of Figures xix
Principal Notations xxi
1 Introduction 1
1.1 The Internet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 History and fundamentals . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Philosophy and aims of this thesis . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Teletraffic engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Traffic modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Internet traffic models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.1 Black box traffic models . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.2 Physical models . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Contributions and thesis outline . . . . . . . . . . . . . . . . . . . . . . . 15
1.5.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 How to read this thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2 Mathematical background 19
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Self-similarity and other scaling behaviours . . . . . . . . . . . . . . . . . 19
2.2.1 Self-similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Long-Range Dependence . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.3 Multifractals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.4 Infinitely Divisible Cascades . . . . . . . . . . . . . . . . . . . . . 22
2.3 Point Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.3 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.4 Density spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.5 Operations on point processes . . . . . . . . . . . . . . . . . . . . 31
2.4 Wavelet analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.3 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.4.4 Making sense at small scales . . . . . . . . . . . . . . . . . . . . . 37
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3 Empirical observations and semi-experiments 41

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 The data and data processing . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.1 Passive measurements . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.2 First observations . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2.3 IP flow decomposition . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2.4 Central observations: biscaling and heavy tails . . . . . . . . . . . 46
3.3 Flow arrival process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3.1 Knee tracking algorithm . . . . . . . . . . . . . . . . . . . . . . . 53
3.3.2 Dependence on traffic characteristics . . . . . . . . . . . . . . . . 53
3.3.3 Reconstruction from subsets . . . . . . . . . . . . . . . . . . . . . 57
3.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.4 Packet arrival process and semi-experiments . . . . . . . . . . . . . . . . . 60
3.4.1 Basic manipulations . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.4.2 Advanced manipulations . . . . . . . . . . . . . . . . . . . . . . . 64
3.4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.5 Impact on packet arrival process . . . . . . . . . . . . . . . . . . . . . . . 68
3.5.1 Flow volumes manipulation . . . . . . . . . . . . . . . . . . . . . 68
3.5.2 Knee position manipulation . . . . . . . . . . . . . . . . . . . . . 69
3.5.3 Flow subsets manipulation . . . . . . . . . . . . . . . . . . . . . . 71
3.5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4 Cluster processes 75
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.2 Empirical observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.3 Cluster models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3.1 A black box model: gamma renewal . . . . . . . . . . . . . . . . . 78
4.3.2 A flow based model: Bartlett-Lewis point process . . . . . . . . . . 81
4.4 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.4.1 Marginals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.4.2 Elephants, mice, and a multiclass cluster model . . . . . . . . . . . 90
4.5 Towards understanding traffic evolution . . . . . . . . . . . . . . . . . . . 92
4.6 Higher order statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.6.1 Model fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.6.2 Small scale behaviour: multifractal or not ? . . . . . . . . . . . . . 96
4.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5 Inverting sampled traffic 99

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.1.2 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.1.3 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.1.4 Outline and main contributions . . . . . . . . . . . . . . . . . . . . 102
5.2 Inverting sampling: theory . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.2.1 Packet sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2.2 Flow sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3 Inverting sampling: practice . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.3.1 Packet level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.3.2 Flow level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.4 The Bartlett-Lewis point process . . . . . . . . . . . . . . . . . . . . . . . 118
5.4.1 Thinning Bartlett-Lewis point processes . . . . . . . . . . . . . . . 119
5.4.2 Fitting from thinned data . . . . . . . . . . . . . . . . . . . . . . . 120
5.5 How to sample traffic ? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.5.1 Packet sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5.2 Flow sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6 Bridging router performance and queuing theory 125

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2 Full router monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.2.1 Hardware considerations . . . . . . . . . . . . . . . . . . . . . . . 127
6.2.2 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . 129
6.2.3 Packet matching . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.3 Preliminary delay analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.3.1 System definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.3.2 Delay statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.4 Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.4.1 The fluid queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4.2 A simple router model . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.4.4 Router model summary . . . . . . . . . . . . . . . . . . . . . . . . 145
6.5 Delay performance: understanding and reporting . . . . . . . . . . . . . . 145
6.5.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.5.2 Busy periods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6.5.3 Modelling busy period shape . . . . . . . . . . . . . . . . . . . . . 150
6.5.4 Reporting busy period statistics . . . . . . . . . . . . . . . . . . . 152
6.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
7 Modelling Internet traffic 155

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2 Empirical observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.2.1 Details of traffic streams . . . . . . . . . . . . . . . . . . . . . . . 155
7.2.2 Packet train through a router . . . . . . . . . . . . . . . . . . . . . 158
7.2.3 Modelling consequences . . . . . . . . . . . . . . . . . . . . . . . 160
7.3 Validation of the BLPP . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.3.1 Individual links . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
7.3.2 Splitting and merging of traffic through a router . . . . . . . . . . . 163
7.3.3 Model extension . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
7.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8 Conclusion 169
8.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
A IP Packet structure 171
Index 173
Bibliography 175
List of Tables
3.1 Details of packet traces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.1 Full router trace details over 13 hours . . . . . . . . . . . . . . . . . . . . 129

6.2 Breakdown of packet matching for output link C2-out. . . . . . . . . . . . 132
7.1 Details of 2 hour long packet traces collected at the router . . . . . . . . . . 156
7.2 Router matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.3 Details of 2 hour long packet substreams crossing the router . . . . . . . . 157
A.1 IP Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

A.2 TCP Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
A.3 UDP Header . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
List of Figures
1.1 Sprint North American network . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Aims of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 Illustration of scale invariance . . . . . . . . . . . . . . . . . . . . . . . . 21

2.2 Examples of Logscale Diagrams . . . . . . . . . . . . . . . . . . . . . . . 36
3.1 Packet size distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.2 Flow decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 Illustration of ‘slowly’ decaying variance . . . . . . . . . . . . . . . . . . 47
3.4 Ubiquity of biscaling behaviour . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 Flows characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.6 Packet arrivals in TCP connections . . . . . . . . . . . . . . . . . . . . . . 50
3.7 Analysis of the flow arrival process Y(t) . . . . . . . . . . . . . . . . . . . 51
3.8 Logscale Diagrams for different protocols . . . . . . . . . . . . . . . . . . 52
3.9 Knee dependence on traffic subsets . . . . . . . . . . . . . . . . . . . . . . 54
3.10 Tracking the knee position in Y(t) . . . . . . . . . . . . . . . . . . . . . . 55
3.11 Knee position as a function of RTT and rate . . . . . . . . . . . . . . . . . 57
3.12 LDs of the duration based subsets . . . . . . . . . . . . . . . . . . . . . . 58
3.13 Schematic illustration of semi-experiments . . . . . . . . . . . . . . . . . . 61
3.14 Packet-in-flow manipulations . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.15 Semi-experiments applied to AUCK-c1 . . . . . . . . . . . . . . . . . . . 66
3.15 (continued) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.16 Impact of Y(t) on X(t) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.1 Examining flow variability for AUCK-d1 . . . . . . . . . . . . . . . . . . 77

4.2 Packet inter-arrival process . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3 Pseudo scaling of a renewal process . . . . . . . . . . . . . . . . . . . . . 80
4.4 Schematic representation of a BLPP . . . . . . . . . . . . . . . . . . . . . 82
4.5 Comparison of LDs of AUCK-d1 and BLPP model . . . . . . . . . . . . . 86
4.6 Packet process of AUCK-d1 . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.7 Comparison of data and BLPP model . . . . . . . . . . . . . . . . . . . . 89
4.8 Flow and packet density in Abilene . . . . . . . . . . . . . . . . . . . . . . 91
4.9 Periodicities at small scales . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.10 Multiscaling comparison between model and data . . . . . . . . . . . . . . 96
5.1 Analytic continuation method . . . . . . . . . . . . . . . . . . . . . . . . . 106

5.2 Spectrum reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3 Spectrum reconstruction from flow thinned data . . . . . . . . . . . . . . . 113
5.4 Inversion of p_j, light thinning . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5 Inversion of p_j, heavy thinning . . . . . . . . . . . . . . . . . . . . . . . . 117
5.6 BLPP fitting from flow thinned traffic . . . . . . . . . . . . . . . . . . . . 121
6.1 Experimental setup for full router monitoring . . . . . . . . . . . . . . . . 130

6.2 Link utilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.3 Four snapshots of a packet crossing the router. . . . . . . . . . . . . . . . 135
6.4 Packet delays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
6.5 Minimum router transit time . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.6 Router mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.7 Comparisons of measured and predicted delays . . . . . . . . . . . . . . . 141
6.8 Measured delays and model predictions . . . . . . . . . . . . . . . . . . . 143
6.9 Error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.10 Busy period statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.11 Busy period construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.12 Modelling of busy period shape with a triangle . . . . . . . . . . . . . . . 150
6.13 Average duration of a congestion episode versus link utilization . . . . . . 152
6.14 Joint probability distribution of busy period amplitudes and durations . . . 154
7.1 Router diagram with traffic multiplexing to C2-out . . . . . . . . . . . . . 156

7.2 Second order properties of output stream and contributing inputs . . . . . . 159
7.3 Packets on link C2-out . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
7.4 Semi-experiments [A-Pois] and [A-Pois; P-Uni] on output link C2-out. . . 162
7.5 Semi-experiments [A-Pois] and [A-Pois; P-Uni] on all traffic streams. . . . 164
7.5 (continued) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
7.6 Link utilization over 24 hours . . . . . . . . . . . . . . . . . . . . . . . . . 168
Principal Notations
Where they are given, equation and page numbers indicate the first or significant use of the
notation.
Abbreviations
AS Autonomous System, page 2
BLPP Bartlett-Lewis Point Process, page 26
c.f. characteristic function, page 79
CS Coarse Scale
EM Expectation Maximization, page 117
fBm fractional Brownian motion, page 19
fGn fractional Gaussian noise, page 20
FIFO First In First Out
FS Fine Scale
Gbps Gigabit per second
GR Gamma Renewal, page 79
HDLC High Level Data Link Control, page 128
HSS H Self Similar
i.i.d. independent and identically distributed, page 25
iff if and only if
IHL Internet Header Length, page 171
IP Internet Protocol, page 2
LD Logscale Diagram, page 36
LMD Linear Multiscale Diagram, page 37
LRD Long Range Dependence/Dependent, page 20
Mbps Megabit per second
PC Personal Computer, page 42
PCI Peripheral Component Interconnect, page 42
PCP Poisson Cluster Process, page 26
PoS Packet over SONET
q-LD qth order Logscale Diagram, page 37
r.v. random variable
RTT Round Trip Time, page 14
SLA Service Level Agreement, page 5
SNMP Simple Network Management Protocol, page 121
SONET Synchronous Optical NETwork, page 128
SRD Short Range Dependent, page 10
TCP Transmission Control Protocol, page 2
TTL Time To Live, page 171
VOQ Virtual Output Queue, page 126
WWW World Wide Web
Mathematical symbols
δ(.) Dirac delta function
𝔼(x) Expected value of the r.v. x
G_P(z) Probability generating function of the discrete r.v. P, see equation (4.15), page 83
λ rate of N, especially intensity of stationary N, page 24
ℝ = ℝ^1 real line
ℝ^d d-dimensional Euclidean space
Var(x) Variance of the r.v. x
ζ(·, ·) generalised Riemann Zeta function, page 88
c(u) covariance density of counts, see equation (2.36), page 29
h(u) conditional intensity function, see equation (2.25), page 28
N(A) number of points in A, page 23
N(a, b] = N((a, b]) number of points in half open interval (a, b]
N(t) = N(0, t] = N((0, t]), page 24
U(x) average number of points in [0, x], see equation (2.27), page 28
Traffic modelling parameters

λ_A Mean packet arrival rate within a flow, page 26
λ_X Mean arrival rate of X(t), page 26
λ_F Flow arrival rate, page 81
G_i(t) Arrival process of packets within flow i, page 81
µ_A Mean packet inter-arrival time within a flow, page 26
µ_P Mean number of packets per flow, page 26
t_F(i) Arrival time of flow i, page 45
t_P(k) Arrival time of packet k, page 45
D(i) Duration of flow i, page 46
F_P Distribution of number of packets per flow, page 82
P(i) Number of packets in flow i, page 46
X(t) Packet arrival process, page 45
Y(t) Flow arrival process, page 46
Semi-experiments
[A-Clus] Flows are translated (without permutation) to begin at the points of an LRD Poisson
cluster process sample path, page 70
[A-Perm] Permute flows around the original arrival points, page 62
[A-Pois] Poisson arrival process with randomised flow re-assignments, page 62
[A-Pord] Retain original flow order, but re-position arrival times according to a Poisson
process with the same rate, page 62
[P-ConstR] Rescale the packet inter-arrivals within each flow such that the average flow
rates are moved to a common value, page 64
[P-Pois] Within each flow separately, packet arrival times are replaced by a Poisson process
of the same rate, page 64
[P-ScaledR] Uniformly rescale the packet inter-arrivals within each flow, page 64
[P-Uni] In each flow the first and last packet remain unchanged while the others are uniformly
distributed, page 62
[S-Dur] Flow selection based on durations, page 65
[S-Pkt] Flow selection based on volumes, page 63
[S-Thin] i.i.d. flow thinning, page 65
[T-Pkt] Flow truncation after the first q packets, page 69
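As an informal illustration of how some of these manipulations act on a trace, the sketch below implements [P-Uni], [T-Pkt] and [S-Thin] on flows represented as sorted lists of packet arrival times. The representation, the function names and the parameter keep_prob are assumptions made for this example only, and [P-Uni] is read here as redrawing the interior packets uniformly between the first and last packet of the flow.

```python
import random

Flow = list[float]  # sorted packet arrival times of one flow (assumed representation)

def p_uni(flow: Flow, rng: random.Random) -> Flow:
    """[P-Uni]: keep the first and last packet, redraw the others uniformly in between."""
    if len(flow) <= 2:
        return list(flow)
    first, last = flow[0], flow[-1]
    inner = sorted(rng.uniform(first, last) for _ in flow[1:-1])
    return [first, *inner, last]

def t_pkt(flow: Flow, q: int) -> Flow:
    """[T-Pkt]: truncate the flow after its first q packets."""
    return flow[:q]

def s_thin(flows: list[Flow], keep_prob: float, rng: random.Random) -> list[Flow]:
    """[S-Thin]: keep each flow independently with probability keep_prob (hypothetical name)."""
    return [flow for flow in flows if rng.random() < keep_prob]
```

Each function returns a new trace and leaves its input untouched, so several manipulations can be chained on the same data.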
Router modelling parameters

Δ_{λ_i,Λ_j}(L) Minimum excess system transit time for packets of size L from link λ_i to link Λ_j,
see equation (6.4), page 136
M(Λ_j, m) Packet matching function, see equation (6.1), page 131
τ(λ_i, n) System arrival time of packet n on link λ_i, see equation (6.2), page 134
θ_i Bandwidth of link i, page 134
T̃_L^(T) Estimate of T_L^(T) from the reporting scheme, see equation (6.17), page 153
d_{λ_i,Λ_j}(m) Through-system delay experienced by packet m, see equation (6.3), page 135
t(λ_i, n) DAG timestamp of the nth packet on link λ_i, page 133
T_L Mean length of time during which packet delays are larger than L, see equation (6.14),
page 151
T_L^(T) Approximation of T_L with triangular busy periods, see equation (6.15), page 151
Chapter 1
Introduction
1.1 The Internet

1.1.1 History and fundamentals
The Internet refers to the global information system that is logically linked together by
a globally unique address space based on the Internet Protocol (IP) or its subsequent extensions
[58]. Until the advent of the World Wide Web in 1990, its use was limited to
universities and corporate research departments. It has since been experiencing tremendous
growth.
The networking concept is said to have been first envisioned in 1962 by J. C. R. Licklider
of M.I.T. with his “Galactic Network”, a set of interconnected computers where one
could quickly access data from any site. On the theoretical front, L. Kleinrock published
the first paper on packet switching theory in 1961 [96]. He explained how information
between computers could be exchanged by first breaking the data into packets, and then
transmitting packets independently of each other. This was in sharp contrast to the traditional
circuit switching theory used in telephone networks for instance, where a dedicated
link with a given bandwidth is established between users for the entire duration of the connection.
After the development of the Network Control Protocol (NCP), the first host-to-host
protocol [108], the first computer network, the Advanced Research Projects Agency
Network (ARPANET), was put in place in 1972, soon followed by the first email protocol.
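The packet switching idea described above can be illustrated with a minimal sketch: a message is cut into fixed-size payloads, the packets may arrive in any order, and the receiver rebuilds the message. The sequence number attached to each packet and the function names are assumptions made for this example; they do not describe any particular protocol.

```python
import random

def packetize(data: bytes, payload_size: int) -> list[tuple[int, bytes]]:
    """Split data into (sequence number, payload) packets."""
    return [
        (seq, data[start:start + payload_size])
        for seq, start in enumerate(range(0, len(data), payload_size))
    ]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the message from packets received in any order."""
    return b"".join(payload for _, payload in sorted(packets))

message = b"packets travel independently of each other"
packets = packetize(message, payload_size=8)
random.shuffle(packets)  # independent transmission may reorder packets
assert reassemble(packets) == message
```

In a circuit switched network no such reordering step is needed, since all data follows the single dedicated link set up for the connection.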
The fundamental design principle of the Internet is the idea of open-architecture networking.
It is based on the following four rules [108]:
• Each distinct network has to stand on its own, and no internal changes are required
before it is connected to the Internet.
• Communications are on a best effort basis. If a packet does not reach its destination,
it is quickly retransmitted from the source.
• Black boxes (later called gateways and routers) are used to connect the networks.
No information is retained by the gateways about individual packets passing through
them, keeping them simple and avoiding complicated adaptation and recovery from
various failure modes.
• There is no global control at the operations level.
These rules led to the development of the first communication protocol of the Internet:
the Transmission Control Protocol/Internet Protocol (TCP/IP). It was later divided into two
separate entities: the IP protocol which provides packet addressing and forwarding, and
the TCP protocol which provides best effort service, reliability and flow control. Another
transmission protocol, called the User Datagram Protocol (UDP), was added to provide
direct access to IP services, without TCP reliability.
During the 1970s and 1980s, the Internet grew with funding agencies and private capital,
to reach 50000 networks by 1995. In the meantime, link speeds had increased from
56 Kbps to 45 Mbps. Commercial Internet Service Providers (ISPs) progressively replaced
government bodies as the developers of new links. The Internet has been growing exponentially
for the past 10 years. In [145] the growth of Internet traffic from a single site
between 1991 and 1994 was examined, and the total amount of traffic originating from that
site was found to grow at a rate of 120% per year. It was also shown in [38] that the total
amount of data carried across the Internet doubled every year for the period between 1998
and 2001. Today’s Internet no longer solely accommodates a small research community,
but has become both a major economic tool supporting business critical applications and a
widely used medium enjoyed by a growing portion of the population.
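To compare the two growth figures quoted above, the following sketch converts an annual growth rate into a doubling time. It assumes that "120% per year" means an increase of 120% each year, that is a yearly factor of 2.2, and that the 1998 to 2001 period spans three years; both readings are assumptions of this example.

```python
import math

def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double when traffic grows by annual_growth per year.

    annual_growth = 1.2 stands for +120% per year, a yearly factor of 2.2
    (this reading of the quoted figure is an assumption).
    """
    return math.log(2) / math.log(1 + annual_growth)

def growth_factor(annual_growth: float, years: float) -> float:
    """Total multiplication of the traffic volume after the given number of years."""
    return (1 + annual_growth) ** years

single_site = doubling_time_years(1.2)     # growth rate reported in [145]
whole_internet = doubling_time_years(1.0)  # "doubled every year" in [38]
print(f"+120% per year doubles every {single_site:.2f} years")
print(f"+100% per year doubles every {whole_internet:.2f} years")
print(f"Three years at +100% per year multiply traffic by {growth_factor(1.0, 3):.0f}")
```

Under these assumptions the single site of [145] doubles its traffic in a little under a year, a pace comparable to the yearly doubling reported in [38].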
1.1.2 Organization
At the highest level, the Internet is formed by the interconnection of different Autonomous
Systems (AS), i.e. groups of routers administered with a single routing policy and single
technical administration. These are usually owned by large ISPs or large corporations.
Each level of interconnection is called a Tier. A Tier-1 ISP is typically an ISP with direct
access to the global Internet routing tables and that does not purchase bandwidth from other
providers [12]. Tier-1 networks are often referred to as backbone networks. The second Tier
is composed of smaller ISPs with a national presence that often lease part of their network
from Tier-1 providers. Tier-3 ISPs are local providers with no national backbone.
Current backbone networks are made of IP routers connected together with optical fiber
links, typically OC-48 (2.4 Gbps) and OC-192 (9.9 Gbps) as of 2003. Figure 1.1 illustrates
Figure 1.1: Sprint North American network as of 2003.
the IP North American backbone network of a Tier-1 ISP named Sprint [165]. This type
of network is usually provisioned in such a way that no link is used at more than 50%
of its capacity and virtually no packet is lost. This policy allows for instant re-routing of
traffic in case of a link failure with minimum down time. This situation can only occur in
a context where supplementary bandwidth is available and cheap. On the other hand, an
Internet access point, for instance between a Tier-2 and a Tier-1 ISP, or between an end user
and their local ISP, is often used to its full capacity since one tries to use all the bandwidth
one pays for. Similarly, networks where the bandwidth is fundamentally limited, such as
satellite or wireless networks, are often used to their full capacity. In such situations, packets
can be lost when the buffers of switching elements fill up.
1.2 Philosophy and aims of this thesis
The Internet is a highly complex system which provides a wide scope of research topics,
such as network topology, routing policies or link dimensioning. In this thesis we narrow the
scope of our research to the characterization and understanding of packet streams through a
single router.
In essence, the following events, illustrated in figure 1.2(a), take place at each router:
[Figure 1.2: (a) Schematic of router mechanisms: packets enter the router, are routed to the appropriate output, and leave the router, while the router exports traffic statistics. (b) Measuring, modelling and understanding Internet traffic, linking the three actions Measure, Model and Understand.]
packets enter the router, are then routed to the appropriate output, and exit the router. In
parallel, the router can also generate traffic statistics about the packets that cross it. In other
words, we will focus on the following three questions:
(i) How to characterize the traffic entering a router?
(ii) How to sample packet traffic?
(iii) What happens to packets inside a router?
These are in a nutshell the problems this thesis addresses. We will refer to these as
question (i), question (ii) and question (iii) throughout this thesis. Although question (i)
has received a lot of attention in the field of teletraffic engineering, we believe that there is
still a lot of work to be done to fully answer it. On the other hand question (ii) has only
very recently become a research topic, while question (iii) has received very little empirical
treatment due to the fact that it relies on technically challenging measurements.
We emphasize that this work is not overly concerned with traditional problems such as
link dimensioning and the associated queuing models, and has therefore a rather different
orientation than most teletraffic studies. In particular we do not seek yet another traffic
model amenable to simple queuing analysis, nor do we place ourselves in theoretically
‘interesting’ situations where buffers of switching elements are almost always full. From
section 1.1.2, we assume that bottlenecks are located at the network edges and that no packet
loss occurs in the part of the network we focus on. In particular, we are not concerned with
the details of the transport protocols TCP and UDP.
Instead, for each of the above mentioned questions, our philosophy throughout this
work is to start from empirical measurements of Internet traffic. This is a rather obvious
point from an experimental science perspective, but it has often been overlooked by researchers
in the field. In fact, a vast body of literature on Internet traffic studies is based
entirely on simulations, most of the time with very little connection to the real world. In
particular, numerous studies on TCP mechanisms have been carried out with very strong as-
sumptions, such as infinite sources or very small networks. While such studies might bring
some interesting mathematical problems, they do not describe the ‘real world’ Internet. Our
view is that one cannot even simulate a network without a proper knowledge of realistic
traffic behaviour. Such knowledge can only be obtained from traffic measurements. This
is why the first step of our analysis is to measure what really happens in a network. This
gives us different time series that we then analyze and model. What makes Internet analysis
so interesting is the fact that these measured time-series contain a lot of information, such
as packet source and destination addresses, which makes a thorough understanding of the
underlying mechanisms possible. Moreover, the very large range of time scales available,
from microseconds up to hours or days, is a very valuable asset when compared to other
experimental fields where time series can have at most a few thousand points.
From the measurements, we use the methodology described in the title of this thesis:
measure, model and understand, also illustrated in figure 1.2(b). For instance, we might
first start by getting a better understanding of our measurements, in order to select the part
that we really want to model (see section 3.4 for instance), and then do the modelling work.
On the other hand, one can follow an anti-clockwise path, and use a model based on em-
pirical measurements to get more insight and understanding on the data (see section 4.5 for
instance). A model can also potentially be used to refine the measurement stage and focus
on a particular feature of the data. In other words, the three actions measure, understand
and model are very much linked together and will be used together throughout this thesis.
In the rest of this introductory chapter, we will give some background information on
teletraffic engineering in section 1.3, and present in section 1.4 a summary of Internet traffic
modelling work relevant to question (i). The bibliographies relevant to questions (ii) and (iii)
will be presented in later chapters. We conclude this introductory chapter with a summary of
the main contributions of this thesis and an overview of the chapters to follow in section 1.5.
1.3 Teletraffic engineering
In a fierce competition to attract new customers, ISPs have not only to face difficult in-
vestment decisions, but also technical challenges to provide the best possible service. The
Quality of Service (QoS) is often specified in a Service Level Agreement (SLA), and in-
cludes packet delays through the ISP network, loss rate and availability. SLAs are most of
the time established with ‘rules of thumb’ but could benefit from a more scientific approach
known as teletraffic engineering.
1.3.1 Definition
Teletraffic is concerned with the control and transport of information within telecommu-
nications networks. Teletraffic engineering is a well established field that encompasses
modelling of telecommunication systems, performance evaluation, resource dimensioning,
forecasting and resource management. It was first developed for circuit switched networks,
such as telephone networks, and then extended to encompass packet switched networks and
the Internet.
Telephone networks are usually dimensioned by using the Erlang model [55, 116],
where deriving dimensioning guidelines is quite straightforward once a reasonable estimate
of the traffic level has been obtained. In such models, call arrivals are described by
Markovian models well amenable to queuing analysis. This is why the first Internet traf-
fic studies also used Markovian models, such as the Markov Modulated Poisson Process
(MMPP) [76]. However, when applied to packet switched networks, these models always
predicted better performance than what could be observed in practice and were therefore
unsatisfactory [147].
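To illustrate how direct such dimensioning can be, the following minimal sketch computes a blocking probability and the number of circuits needed for a target blocking level. It assumes that the Erlang model referred to above is the Erlang B loss formula; the function names and the example values are hypothetical and are not taken from [55, 116].

```python
def erlang_b(offered_load, circuits):
    """Blocking probability when `offered_load` Erlangs are offered to `circuits` lines.

    Assumption: Erlang B loss model (Poisson call arrivals, blocked calls cleared).
    """
    blocking = 1.0  # with zero circuits every call is blocked
    for n in range(1, circuits + 1):
        # Standard recursion on the number of circuits, numerically stable.
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def circuits_needed(offered_load, target_blocking):
    """Smallest number of circuits keeping the blocking at or below the target."""
    circuits = 0
    while erlang_b(offered_load, circuits) > target_blocking:
        circuits += 1
    return circuits
```

For instance, `circuits_needed(1.0, 0.25)` searches upwards from zero circuits until the blocking probability for one Erlang of offered traffic falls to 25% or less. The only input is an estimate of the traffic level, which is why such guidelines are straightforward to derive.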
In fact, telephone and data networks are fundamentally different. A telephone voice
call requires a very strict QoS to ensure that the user will perceive a satisfactory voice qual-
ity. Such QoS is met by reserving a constant bandwidth of 64 kbps through the telephone
network for the entire call length. This is why such networks are called circuit switched
networks. On the other hand, data networks only need some minimal long term bandwidth
to achieve a satisfactory perceptual quality for the user. Data applications do not send traf-
fic with a constant bandwidth, but rather in bursts. Such data traffic is often referred to as
‘bursty’. If one were to use a circuit switched approach to dimension a network in this
case, this would lead to enormous wastes of bandwidth. For such traffic, a packet switched
network is much more appropriate since it allows resource requirements to vary over time
by sending data in small packets. Different resource requirements and different user behav-
iours mean very different traffic characteristics.
1.3.2 Traffic modelling
An important aspect of teletraffic engineering is the mathematical modelling of the observed
traffic statistics. Given the highly complex nature of the mechanisms involved in telecommunications
networks and the random aspect of user behaviour, a stochastic analysis of the
data collected is often preferred to a dynamical systems approach. Throughout this thesis
we will focus on rather overprovisioned links, such as the ones found in the Internet back-
bone. On such links, the TCP control loop turns out to play only a minor role, and there
are no significant TCP-induced interactions between flows. In the following, we will distinguish
between two types of models, which we call black box models and physical
models. On the one hand a black box approach will aim at blindly reproducing the statistics
of the data, with a potentially very high number of parameters with no obvious meaning.
On the other hand a physical model goes beyond the black box approach by trying to model
the physical causes of the observed statistics. It will therefore aim at giving a physical inter-
pretation to the parameters of the model in networking terms. In that case the mathematical
tractability of the model is heavily sought after because it will bring some extra insight into
the data and allow one to predict the evolution of its statistics as a function of meaningful
parameters. When mathematical tractability cannot be achieved, numerical simulations can
still be used to compare the model with the data. From there, either the model is judged
acceptable or it is modified until it leads to satisfactory results. Once a model is found, it
can be used to simulate the network under different conditions, evaluate the performance of
the current network, or help dimension the network, to achieve a given quality of service.
A turning point in traffic engineering was the discovery that Internet traffic was richer
than simple Markovian descriptions, and exhibited self-similarity and long-range dependence
(LRD) properties¹ over time scales larger than roughly 1 s. This scaling behaviour,
or scale invariance, in packet data was first observed in the seminal work of Leland et al.
[109] for Ethernet Local Area Network (LAN) traffic, and then for Variable Bit Rate (VBR)
video [21], ATM cell traffic [93], and Wide Area Network (WAN) traffic [147]. This brought
the ‘fractal’ buzz word, and led to a renewed interest in traffic modelling. In particular, it
gave a plausible explanation to the discrepancy in queuing performance observed between
real data and Markovian models [56, 135, 137]. On time scales smaller than 1 s, Riedi and
Véhel [150, 171] showed that wide area traffic was consistent with multifractal behaviour.
Based on these findings, Feldmann et al. [60] speculated that IP networks appear to act as
conservative cascades and are consistent with multifractal scaling. They observed that the
packet arrival patterns were consistent with a multiplicative structure due to Transmission
Control Protocol (TCP) feedback control mechanisms [61, 71].
In order to answer the first question of this thesis, we seek to understand the physical
reasons behind these observed traffic statistics at small and large time scales, and investigate
whether the scaling behaviours described above are genuine or not. For modelling purposes,
we consider any measured data as a sample path of an underlying stochastic process. We
also typically assume that the process is stationary and ergodic, so that its statistics can be
captured by its sample path. However, modelling data by a mathematical process sometimes
reduces to a philosophical issue of modelling choice. We illustrate this point with two
examples corresponding respectively to large and small time scales.
¹ Definitions of these mathematical notions are presented in chapter 2 page 19
Large time scales
In the physical world, many systems exhibit a property of slowly decaying correlation func-
tion referred to as Long Range Dependence (LRD). Some of the better known examples
were presented in hydrology by Hurst [89]. This phenomenon has also been found for
instance in astronomical and biological systems [19].
In practice, it is impossible to decide whether a time series exhibits random fluctuations
at the time scale over which the data is observed (consistent with LRD) or is simply
non-stationary. It is a sample size problem: one sample does not give enough information about
the fluctuations at this time scale [68]. This has led many researchers to argue about the LRD
properties of empirical Internet traffic data [22, 53, 72, 97]. Another problem that arises
when dealing with LRD data is that being a limiting behaviour over large time scales in a
mathematical sense, LRD cannot be truly observed over a finite time interval. This means in
fact that one can always create a multi-state Markov model that will have the same statistical
behaviour as the observed data.
From a modelling perspective, if a physical model is well built, its parameters could
be fitted over a given time interval, and the model should be able to reproduce the data
statistics over larger time intervals. On the other hand, a black box Markovian model will
fail to reproduce the data statistics over any length of time larger than the time period it was
fitted on. Given that there exists a plausible physical explanation for such LRD behaviour in
Internet traffic, as detailed in section 1.4.2, our view is that the observed traffic is genuinely
LRD, as described in [21, 98, 109, 147], and exhibits a scaling behaviour over large time
scales.
However, there may always be some sense in which a given model is correct, or rather,
useful and/or appropriate over some scale range. For instance, when one is interested in the
performance of a buffer with a fixed constant size, one only needs to model the temporal
correlations of the traffic over the time scales corresponding to the buffer length [73, 132].
Therefore a short-range dependent (SRD) process might be a good description of the data
for this purpose, even if the traffic is LRD.
Small time scales
Long Range Dependence is a characteristic of large time scales, and does not specify any
particular behaviour for Internet traffic at small time scales. Indeed, as will become clear
in this thesis, the change of scale means a change in the objects studied, from groups of
packets transmitting a given file to individual packets. Over small time scales the traffic
has non Gaussian marginals and its description therefore requires more than simple second
order statistics, for instance with multifractal models.
However, the interpretation of traffic behaviour over small time scales is subject to discussion.
First, the set of available statistical tools is not powerful enough to clarify all
the related issues: their performance needs to be improved and better understood under
different conditions, and important capabilities such as hypothesis tests
are absent. Second, a widely accepted physical explanation for this potential multifractal
behaviour has yet to be found. As will become clear in later chapters, one could therefore
argue that Internet traffic is not truly multifractal but instead exhibits a pseudo scaling be-
haviour over small time scales. This shows again that the choice of a model is difficult and
depends strongly on the modelling aims.
1.4 Internet traffic models
In this thesis, we do not attempt to give a semantic description of Internet traffic, which
would be based on the actual content of transmitted files. Instead we are interested in
modelling Internet traffic timeseries such as number of IP packets or bytes observed in
time intervals of a given size. In this section we present a summary of some of the most
significant modelling work in this area.
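As a concrete illustration of the kind of time series considered here, the sketch below turns a list of packet arrival timestamps into packet counts per interval. The function name, its arguments and the choice of half-open intervals are assumptions made for illustration only.

```python
def packet_counts(timestamps, interval, start=0.0):
    """Count packets in consecutive intervals [start + i*interval, start + (i+1)*interval).

    `timestamps` are arrival times in seconds; packets earlier than `start` are ignored.
    """
    if not timestamps:
        return []
    n_bins = int((max(timestamps) - start) // interval) + 1
    counts = [0] * n_bins
    for t in timestamps:
        if t >= start:
            counts[int((t - start) // interval)] += 1
    return counts
```

A byte count series can be obtained in the same way by adding each packet size to its bin instead of adding one. Changing `interval` changes the time scale at which the traffic is observed.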
Most of the models for packet switched traffic tend to focus on the network layer, with-
out any knowledge of the higher layers in the protocol stack. This means for instance that
congestion control and flow control mechanisms that may be available at the transport layer
are largely ignored. These models are therefore ‘open loop’ models, as opposed to ‘closed
loop’ models that would take into account feedback mechanisms. Some practitioners have
concerns with mathematical models that do not account for retransmission resulting from
packet losses being detected by higher layers such as TCP, because TCP sources for instance
transmit at rates that are dependent upon the level of congestion of the network among other
things. Attempts at modeling the TCP protocol represent quite a large body of literature,
including [65, 74, 139]. Although closed loop models are certainly closer to the real Inter-
net traffic given that such feedback mechanisms exist in practice, we believe that one should
start by getting the best possible understanding from simple open loop models before taking
into account any feedback mechanism. A lot of work remains to be done in this area.
In the following we are interested in modelling the ‘aggregate’ traffic observed on a
link. We present black box models in section 1.4.1 and physical models in section 1.4.2.
All the fundamental mathematical concepts used in this thesis are gathered in chapter 2.
More specific concepts are briefly introduced in the text when needed.
1.4.1 Black box traffic models

Markov Modulated Models
The first traffic models proposed for packet switched networks were based on Markovian
processes and largely inspired from traffic models used in telephone networks. A brief
summary of the main models is presented here, while a detailed presentation can be found
for instance in [160]. The models vary according to their continuous or discrete nature, and
whether or not batch arrivals of packets are permitted. The most common are
• Markov Modulated Poisson Process (MMPP):
continuous time with single arrivals [76, 133, 183, 184]
• Batch Markovian Arrival Process (BMAP):
continuous time with batch arrivals [94, 134]
• Discrete-time Batch Markovian Arrival Process (D-BMAP):
discrete time with batch arrivals [24]
• Markov Modulated Bernoulli Process (MMBP):
discrete time with single arrivals [24]
Markov modulated processes are specified by the transition probabilities of the embedded
Markov chain and the arrival rate at each state. Different correlation patterns can be obtained
by setting different values for the above quantities. These models have been widely used
due to their relative simplicity and mathematical tractability. However, their fundamental
drawback is that these are all by definition Short Range Dependent (SRD) due to the finite
number of states in the Markov chain. They have nonetheless been used to approximate
LRD processes. It was shown in [15] that a mixture of N two state MMPPs can match the
correlation structure of an LRD process across a range of time scales. By increasing N, this
matching is possible for an arbitrarily large range of time scales, meaning that although this
process is not strictly LRD, it can be used to model LRD for all practical purposes. This
mixture of N MMPPs converges to a fractional Brownian motion in the limit of large N
[161].
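To make this specification concrete, the following minimal sketch simulates a two-state MMPP in continuous time: in each state, packets arrive as a Poisson process with a state-dependent rate, and the state changes after an exponentially distributed sojourn. The function name, the parameter names and the event-by-event simulation method are illustrative assumptions and are not taken from the cited works.

```python
import random


def simulate_mmpp(arrival_rates, switch_rates, duration, seed=0):
    """Return the arrival times of a two-state MMPP observed on [0, duration).

    arrival_rates[s] is the Poisson arrival rate in state s (0 or 1);
    switch_rates[s] is the rate at which the chain leaves state s.
    """
    rng = random.Random(seed)
    state, t, arrivals = 0, 0.0, []
    while True:
        # Arrivals and state changes compete as independent exponential clocks.
        total = arrival_rates[state] + switch_rates[state]
        t += rng.expovariate(total)
        if t >= duration:
            return arrivals
        if rng.random() < arrival_rates[state] / total:
            arrivals.append(t)  # the next event is a packet arrival
        else:
            state = 1 - state   # the next event is a change of state
```

For example, `simulate_mmpp((10.0, 100.0), (1.0, 1.0), 1000.0)` alternates between a low-rate and a high-rate state. The resulting correlations only extend over a few sojourn times, which is the SRD limitation discussed above.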
Autoregressive models
Autoregressive processes form another popular class of timeseries used in Internet traffic
models. The Autoregressive Model of order p is denoted AR(p) and has the form
\[ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + \dots + \phi_p X_{t-p} + b\varepsilon_t, \tag{1.1} \]
where $\varepsilon_t$ is white noise, $\phi_i$ are real numbers and $X_t$ is the value of the process at the discrete
time $t$. Defining a lag operator $B$ as $X_{t-1} = B X_t$ and the polynomial
$\Phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$, the AR(p) can be written as
\[ \Phi(B) X_t = b\varepsilon_t. \tag{1.2} \]
The autocorrelation $\rho_k$ verifies
\[ \rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \dots + \phi_p \rho_{k-p}, \tag{1.3} \]
and decays exponentially. This means that the AR(p) model is unable to capture autocorre-
lation functions that decay at a rate slower than exponential. In particular, it cannot strictly
model a LRD process. However, a pseudo self-similar process, i.e. a process that exhibits a
self-similar behaviour over a finite range of scales only (see [154] for a definition), can be
obtained by mixing AR processes with appropriate coefficients and has been used for traffic
modelling [10, 112].
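The exponential decay implied by recursion (1.3) can be checked numerically. The sketch below is restricted to the AR(2) case, for which the starting value $\rho_1 = \phi_1 / (1 - \phi_2)$ follows from the Yule-Walker equations; the function name and the coefficient values used in the example are hypothetical, and the coefficients are assumed to define a stationary process.

```python
def ar2_autocorrelation(phi1, phi2, max_lag):
    """Autocorrelation of a stationary AR(2) process for lags 0..max_lag."""
    # rho_0 = 1 by definition; rho_1 follows from the Yule-Walker equations.
    rho = [1.0, phi1 / (1.0 - phi2)]
    for k in range(2, max_lag + 1):
        # Recursion (1.3) specialised to p = 2.
        rho.append(phi1 * rho[k - 1] + phi2 * rho[k - 2])
    return rho[: max_lag + 1]
```

With `phi2 = 0` the function reduces to the AR(1) case, where the autocorrelation at lag k is `phi1 ** k`. With for instance `phi1 = 0.5` and `phi2 = 0.3`, the value at lag 50 is already far below a slowly decaying power law such as `50 ** -0.5`, which illustrates why an AR(p) model cannot strictly represent LRD.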
The Autoregressive Moving Average Model of order (p, q) is denoted ARMA(p,q) and
has the form
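\[ X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \phi_3 X_{t-3} + \dots + \phi_p X_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}. \tag{1.4} \]
Defining $\Theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q$, this model can be represented as
\[ \Phi(B) X_t = \Theta(B)\varepsilon_t. \tag{1.5} \]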
The autocorrelation function of the ARMA(p,q) model can be calculated for all lags k. For
k > q, it is identical to the autocorrelation function of the AR(p) model, and is therefore
unable to represent a process with autocorrelation function decaying at a rate slower than
exponential.
The Autoregressive Integrated Moving Average Model of order (p, d, q), denoted ARIMA
(p, d, q), is an extension of the ARMA(p, q) obtained by allowing the polynomial Φ(B) to
have d roots equal to unity, with the rest of the roots lying outside the unit circle. It has the
form
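\[ \Psi(B)\Delta^d X_t = \Theta(B)\varepsilon_t, \tag{1.6} \]
where $\Delta$ is the difference operator defined by $\Delta X_t = X_t - X_{t-1} = (1 - B)X_t$, and $\Psi(B)$ is taken
here to be the factor of $\Phi(B)$ whose roots lie outside the unit circle, so that $\Phi(B) = \Psi(B)(1 - B)^d$. When $d$ is
an integer, the ARIMA(p,d,q) is a strictly SRD process. It can be extended to the Fractional
ARIMA (FARIMA) process by taking $0 < d < 0.5$. For instance, the FARIMA(0,d,0)
process is a stationary process with autocorrelation function given by [87]:
\[ \rho(k) = \frac{\Gamma(1-d)\,\Gamma(k+d)}{\Gamma(d)\,\Gamma(k+1-d)} \sim \frac{\Gamma(1-d)}{\Gamma(d)}\, k^{2d-1} \quad \text{as } k \to +\infty. \tag{1.7} \]
This is a long-range dependent process with Hurst parameter $H = d + 0.5$.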

FARIMA processes can therefore be used successfully to model LRD Internet traffic
[21]. A fast generation method for FARIMA processes can be found in [117]. Simulation
results concerning the queuing performance of a FARIMA(1,d,0) process were given in
[8]. Techniques for fitting the parameters of a FARIMA process to measured traffic were
proposed in [180]. FARIMA processes have also been used to compare different techniques
of estimation of the Hurst parameter [168].
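The slow, power-law decay of the FARIMA(0,d,0) autocorrelation can be evaluated directly. The sketch below computes the exact expression of equation (1.7), a ratio of Gamma functions, together with its asymptote proportional to $k^{2d-1}$. The function names are hypothetical, and log-Gamma values are used only to avoid overflow at large lags.

```python
import math


def farima_autocorrelation(d, k):
    """Exact autocorrelation of FARIMA(0,d,0) at lag k >= 0, for 0 < d < 0.5."""
    log_rho = (math.lgamma(1 - d) + math.lgamma(k + d)
               - math.lgamma(d) - math.lgamma(k + 1 - d))
    return math.exp(log_rho)


def farima_asymptote(d, k):
    """Power-law approximation Gamma(1-d)/Gamma(d) * k**(2d-1), for large lags k."""
    return math.exp(math.lgamma(1 - d) - math.lgamma(d)) * k ** (2 * d - 1)
```

At lag 1 the exact expression simplifies to d / (1 - d), which gives a quick sanity check. At large lags the two functions agree closely, and the exponent 2d - 1 = 2H - 2 makes the link with the Hurst parameter explicit.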
Fractional Brownian motion
Fractional Brownian motion (fBm) has proved very popular as a model of Internet traf-
fic because it is a simple model that exhibits LRD and is amenable to analysis. It is the
only Gaussian self-similar process with stationary increments². Norros gave an expression
for the lower bound of queuing performance of a fractional Gaussian noise process [135],
and presented techniques for the use of fBm in the modelling of telecommunications net-
works [136].
The fBm is also of special interest as a limiting case for many other models of LRD
traffic. For instance Brichet et al. [26] showed that the superposition of N fluid ON/OFF
sources with heavy tailed ON and/or OFF times converges to an LRD Gaussian process as
N tends to infinity, and gave a relationship between the queuing behaviour of this limiting
process and that of fBm.
A large number of queuing results have been derived for fBm traffic: large deviation
results in very large buffers [54], queue length asymptotics using the Fourier decomposition
of fBm [129], or estimates for the queuing behaviour of fBm processes [122].
Point process models
A practical way to generate an LRD process is to use a doubly stochastic process, such as
a compound Poisson process [118, 119]. For instance, the LRD behaviour can be introduced
in the intensity process $\lambda(t)$ by a power law shot noise. More specifically, let us define a power
law shot noise $\lambda(t)$ by
\[ \lambda(t) = \sum_n h(K_n, t - u_n), \tag{1.8} \]
\[ h(K,t) = \begin{cases} K t^{-\beta} & 0 < A \le t < B, \\ 0 & \text{otherwise,} \end{cases} \tag{1.9} \]
where the $\{u_n\}$ stand for arrival times drawn from a homogeneous Poisson process and
the $\{K_n\}$ for a set of independent and identically distributed random variables representing
amplitude. The process $\lambda(t)$ will be long range dependent, with exponent $\alpha = 2(1 - \beta)$,
when $B$ is infinite and $1/2 < \beta < 1$. More details about Long Range Dependent doubly
stochastic point processes can be found in [158] and references therein.
² A formal definition of self-similarity and fBm can be found in section 2.2.1 page 19
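The shot noise intensity defined by equations (1.8) and (1.9) can be evaluated as follows. This is a minimal sketch: the function names are hypothetical, the cut-offs A and B are passed as `lower` and `upper`, and the arrival times and amplitudes are supplied by the caller rather than drawn from a Poisson process.

```python
def shot(amplitude, t, beta, lower, upper):
    """Impulse response h(K, t) = K * t**(-beta) for lower <= t < upper, else 0."""
    # The power is only evaluated inside the support, where t > 0.
    return amplitude * t ** (-beta) if lower <= t < upper else 0.0


def shot_noise_intensity(t, arrivals, amplitudes, beta, lower, upper):
    """Intensity lambda(t): sum of the shots triggered at the given arrival times."""
    return sum(shot(k, t - u, beta, lower, upper)
               for k, u in zip(amplitudes, arrivals))
```

Passing `upper=float("inf")` corresponds to the case where B is infinite. The resulting intensity can then be used to drive a Poisson process, which gives the doubly stochastic point process described above.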
Other models
There is a large variety of mathematical constructions that have been used to describe Inter-
net traffic in addition to the above mentioned examples. For instance the authors in [144]
suggested the M/G/∞ input process as a viable model for network traffic. The Ornstein-Uhlenbeck
process, inspired by physics theory and the Langevin equation, has also proved
to be an interesting approach [99, 162]. Moreover, models have been proposed to improve on
some of the major drawbacks of fBm. First, fBm models only describe the traffic at large
time scales. This is why more elaborate models based for instance on Infinitely Divisible
Cascades (IDC) have been introduced to link large and small scale behaviours in a single
model [156, 172]. Second, fBm models have Gaussian marginals and therefore cannot guarantee
positive marginals, while the processes they seek to model have inherently positive marginals.
Wavelet models have been developed [152] which have positive marginals and reproduce
the scaling of Internet data.
1.4.2 Physical models
In this section we describe modelling work based on physical models, i.e. models for which
the parameters can be related to a networking cause.
ON/OFF processes
An ON/OFF process is a process that can take only two values that define its ON and OFF
state. ON and OFF periods are all mutually independent and identically distributed (i.i.d.).
This process has been very popular with the traffic modelling community because of its
underlying physical meaning. It was first proposed by Mandelbrot in an economics context
[121].
Assume that aggregate traffic is composed of many independent sources, and that each
source is of the ON/OFF type. ON and OFF period durations can follow different distrib-
utions, and during an ON period the source transmits with a certain ‘rate’ or ‘reward’ (one
ON period could correspond to a flow³ of IP packets transmitting a given file). What can be
said about the normalized limit process in the limit of a large number of sources and/or large
time scales? Depending on the nature of the rate distribution function, the aggregate traffic
can have ‘mild marginals’ with finite variance or ‘wild’ marginals with infinite variance.
Depending on the distribution of the ON and OFF period duration, the traffic can be SRD,
with Markovian characteristics, or LRD, i.e. non Markovian. This illustrates two different
kinds of variability: marginals and time correlation, and defines different levels of traffic
‘burstiness’.
It was shown in [170] how the self-similar characteristics of the aggregate traffic are
intrinsically linked to the heavy tailed nature of the ON/OFF duration distributions. If the
distribution of the ON and/or OFF duration is heavy tailed, then the process is LRD [75].
The superposition of such processes is also LRD [115, 170, 178]. This model is constructive
because it can be related to an underlying network cause: the sizes of the files transmitted
on the Internet have a heavy tailed distribution [43]. Each ON period corresponds to the
transmission of a file. If the transmission rate is constant, the durations of the ON periods
will also be heavy tailed [143]. This superposition of ON/OFF processes with heavy tailed
durations, or models that tend to it, is the most widely accepted mechanism to explain LRD
in Internet traffic.
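The construction described above can be sketched in a few lines. In this illustration each source alternates between exponentially distributed OFF periods and Pareto distributed ON periods. The function names, the exponential OFF distribution and the parameter values in the example are assumptions made for illustration, and the transmission rate during an ON period is taken to be one.

```python
import random


def pareto_duration(rng, shape, scale=1.0):
    """Heavy-tailed duration with P(T > t) = (scale / t)**shape for t >= scale."""
    return scale * rng.paretovariate(shape)


def onoff_source(rng, duration, shape_on, mean_off):
    """Return the (start, end) ON periods of one source on [0, duration)."""
    periods = []
    t = rng.expovariate(1.0 / mean_off)  # the source starts in the OFF state
    while t < duration:
        end = t + pareto_duration(rng, shape_on)
        periods.append((t, min(end, duration)))
        t = end + rng.expovariate(1.0 / mean_off)
    return periods


def aggregate_load(sources, t):
    """Number of sources that are ON at time t (unit rate per source)."""
    return sum(any(start <= t < end for start, end in periods)
               for periods in sources)
```

For example, with `rng = random.Random(1)`, the list `[onoff_source(rng, 100.0, 1.5, 2.0) for _ in range(20)]` represents twenty independent sources. A shape parameter between 1 and 2 gives ON durations with finite mean and infinite variance, which is the heavy-tailed regime discussed above.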
Modelling TCP behaviour
The study of TCP performance can be done at three different levels: experimentation on a
real network, simulation using emulating software such as ns [106], or analytical modelling.
TCP modelling can be used for instance to determine the factors impacting the performance
of the protocol or to devise new congestion control algorithms. In the case of a drop-tail
buffer and synchronized flows, the throughput of a TCP connection is inversely proportional
to the square of the average Round Trip Time (RTT) [102]. Models have also shown that
dropping packets randomly in network routers as with active queues (e.g. Random Early
Detection (RED) [66]) improves the fairness of TCP by making the throughput inversely
proportional to the average RTT [13, 139]. However, recent measurement work on the
topic has shown that RED might have a negative impact on the performance of a network
[123]. Most studies on TCP modelling assume long transfers and focus on the congestion
avoidance mode of TCP, ignoring the short duration of the slow start mode [13, 102, 127,
128].
³ An exact definition of a flow will be given in section 3.2.3
Other researchers have tried to give a physical explanation to the apparent scaling behav-
iour observed at small time scales. An obvious candidate for such statistics is the feedback
dynamics of TCP. In [65] the authors first showed how a Markovian model could describe
the correlation structure of both the exponential back-off and congestion avoidance phases
of TCP. In [74], the authors also modelled short TCP connections using a simple Markov
chain, and showed that in the case of high losses, the TCP congestion control algorithm
generates traffic with heavy-tailed OFF periods. Veres and Boda [175] showed that under
severe network conditions, the TCP congestion control protocol exhibits a chaotic nature and generates
self-similar behaviour. In [176] they showed that TCP preserves the LRD created at
the application layer. Indeed, when a TCP connection is mixed with self similar traffic in
a bottleneck buffer, it takes on the second order properties of that traffic and can therefore
propagate them to other parts of the network. They also noted that HTTP/1.1 should adapt
better to changing traffic fluctuations and therefore improve the propagation of the self sim-
ilar behaviour. Baccelli and Huong [16] observed that the sharing of a bottleneck router by
several long lived TCP connections could be reduced to products of random matrices and
showed that the ‘Additive Increase, Multiplicative Decrease’ of the TCP congestion control
could lead to self-similar behaviour, but could not (yet) account for a multifractal scaling
behaviour.
In [159] the authors introduced the notion of alpha and beta traffic: in short, they
showed that, on time scales around 50-500ms, a burst in IP traffic was usually dominated
by a single high-rate flow. Such flows were called alpha and the remainder the beta traffic.
The alpha traffic is extremely bursty and concerns only a very small proportion of flows.
1.5 Contributions and thesis outline
In this thesis, we concentrate exclusively on so called physical models because our aim is
to gain a better understanding of how IP packets flow through a network. Contrary to most
of the studies presented in section 1.4, where so called fluid models were used, we seek to
model individual packets and will therefore use point process models.
1.5.1 Contributions
This thesis makes the following main contributions:
• We base all our analysis on empirical measurements of Internet traffic, so that no
assumption is made on the traffic characteristics. We present the first empirical results
of a fully instrumented router.
• We use extensively a technique we call semi-experiments to make informed decisions
on what aspects of the data have the most impact and should be modelled. In
particular, we explain why the flow arrival process has little influence on the packet
arrival process for current backbone Internet traffic.
• We present a physical model of packet arrival times based on our empirical observations.
It is a Bartlett-Lewis point process which can reproduce packet statistics for
time scales larger than 10ms.
• We show that current practices to report sampled traffic information in routers can be
improved by using a flow sampling technique.
• We present a simple model of a router based on our empirical results and give a
thorough understanding of single hop packet delays.
• We present a new technique to report packet delay information in routers based on
busy period statistics.
• We show that the Bartlett-Lewis point process can model the splitting and merging
of packet streams through a router.
1.5.2 Outline
The rest of the thesis is organized as follows:

Chapter 2 provides some mathematical background on scaling processes, point process
theory and our primary statistical tool: the wavelet analysis. In particular we formally
introduce the fundamental notions of Long-Range Dependence and wavelet based scaling
estimators, and give results on point processes spectral theory used in chapters 4 and 5.
Chapter 3 presents the empirical evidence upon which our modelling effort is based. We
first describe the passive measurements used in the thesis. We then identify the networking
causes of the observed packet trace statistics by selectively modifying several of the com-
ponents comprising the full packet stream. We call this way of investigating our empirical
data the semi-experimental method. Our fundamental result is that for the purpose of mod-
elling the overall process of IP packets, flows can be treated as statistically independent.
Last we study in more detail how the flow arrival process could influence the second order
properties of the packet arrival process should certain circumstances be met.
In chapter 4 we build a packet arrival model based on the empirical findings of chapter 3.
We show how a particular type of Poisson cluster process known as Bartlett-Lewis point
process can very accurately model the packet arrival process at time scales larger than back-
to-back packet arrivals. It only uses a small number of physically meaningful parameters
and provides insight on the role of each of these parameters in the overall traffic statistics.
In chapter 5 we study the problem of traffic sampling, a very timely problem given the
ever increasing link speeds. We compare the performance of packet and flow sampling
techniques, and advocate the use of flow sampling for many purposes. We also derive
sampling results for our traffic model and show how the model parameters can be fitted
from sampled data.
In chapter 6 we study Internet traffic on smaller time scales than what was done in
chapter 4 by focusing on queuing mechanisms inside a router. We present the first empirical
results on a fully monitored router. We use these results to build a mathematical model of
a store and forward router, and show how packet delays through the router can be very
accurately predicted by our model. We also propose a method to directly report router
performance information based on busy period statistics.
In chapter 7 we use the results from all the previous chapters to present a global valida-
tion of our traffic model at a network node. We first validate the main assumptions of the
model on a very large amount of empirical data and then show that the model can account
for the splitting and merging of packet streams in a router.
Last, in chapter 8, we summarize our main contributions and propose possible topics
for future research.
1.6 How to read this thesis
The mathematical background of chapter 2 is presented for completeness only. Its full un-
derstanding is not a prerequisite to the rest of the thesis. However the reader should at least
become familiar with the Logscale Diagram statistical tool because it is used throughout.
Chapters 3 and 4 should be read in succession since our modelling work of chapter 4 relies
heavily on the empirical findings presented in chapter 3. Chapters 5 and 6 are fairly self
contained and could potentially be read independently of the rest. Chapter 7 builds on all
the previous chapters and should therefore be read last.
In order to ease the reading and understanding of the thesis, a summary of the principal
notations is provided (page xxi), as well as an index (page 173). In the bibliography section,
page numbers where each reference is cited are given.
Chapter 2
Mathematical background
2.1 Introduction
This chapter presents some mathematical background used throughout the thesis. First the
notion of scaling behaviour is formally introduced. In particular we give rigorous defin-
itions of concepts mentioned in chapter 1, such as self-similar and long-range dependent
processes. Second, results on point process theory are given. Last, the discrete wavelet
transform and its use for statistical estimation of scaling processes are discussed.
2.2 Self-similarity and other scaling behaviours

2.2.1 Self-similarity
Definition 2.2.1. A stochastic process X(t) is said to be H Self Similar (HSS) with stationary
increments if it has stationary increments and if the following equality in distribution
holds for all scales:
$$X(at) \overset{d}{=} a^{H} X(t), \quad \text{for all } a > 0. \tag{2.1}$$
This definition means that one cannot distinguish between the statistics of the process and those of an
affinely dilated version of the process. There is therefore no reference scale. The process is
in fact characterized by the relation between scales governed by the parameter H, known as
the Hurst parameter. A consequence of equation (2.1) is that the moments of a HSS process
(if they exist) behave as power laws of time:
$$\mathbb{E}|X(t)|^{q} = \mathbb{E}|X(1)|^{q} \, |t|^{qH}. \tag{2.2}$$
A classic example of HSS process is the fractional Brownian motion:
Definition 2.2.2. The fractional Brownian motion (fBm) $B_H(t)$ is defined as the centered
Gaussian process with variance $\sigma^2$ and covariance
$$\mathbb{E}(B_H(s) B_H(t)) = \frac{\sigma^2}{2} \left( s^{2H} + t^{2H} - |t - s|^{2H} \right), \quad H \in [0, 1]. \tag{2.3}$$
It is the only Gaussian self-similar process with stationary increments. While a process
X(t) satisfying equation (2.1) cannot be stationary (this would involve $X(at) \overset{d}{=} X(t)$), it is
assumed in the following to have stationary increments.
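As an illustration of Definition 2.2.2, the covariance (2.3) is enough to draw fBm sample paths on a finite grid of positive times. The sketch below assumes a Cholesky factorisation of the covariance matrix, which is one standard way to sample a Gaussian vector and is not a method taken from this thesis. The function names `fbm_covariance` and `sample_fbm` are hypothetical, and the approach is only practical for short grids and 0 < H < 1.

```python
import numpy as np

def fbm_covariance(times, hurst, sigma2=1.0):
    """Covariance matrix of fBm on a grid of positive times, from equation (2.3)."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    two_h = 2.0 * hurst
    return 0.5 * sigma2 * (s ** two_h + u ** two_h - np.abs(u - s) ** two_h)

def sample_fbm(times, hurst, sigma2=1.0, rng=None):
    """Draw one fBm path: multiply white Gaussian noise by a Cholesky factor."""
    rng = np.random.default_rng() if rng is None else rng
    chol = np.linalg.cholesky(fbm_covariance(times, hurst, sigma2))
    return chol @ rng.standard_normal(len(times))

# Example grid (assumed values): 100 points on (0, 1], H = 0.7.
grid = np.linspace(0.01, 1.0, 100)
path = sample_fbm(grid, hurst=0.7, rng=np.random.default_rng(0))
increments = np.diff(path)  # a finite sample of fractional Gaussian noise
```

The diagonal of the covariance matrix equals $\sigma^2 t^{2H}$, which is the moment scaling of equation (2.2) for q = 2.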
The property of self-similarity is extremely strong since it has the following two conse-
quences:
1. The scaling behaviour applies at all the scales of a process.
2. The scaling of moments is governed by a single exponent H (equation (2.2)).
This is therefore quite restrictive. In actual data, scaling is often found to hold in the limit
of small or large scales, and the scaling of moments can be governed by a collection of
exponents instead of a single one. These two restrictions can be relaxed, respectively, by the
notions of Long Range Dependence and multifractality.
2.2.2 Long-Range Dependence
Long Range Dependence (LRD) models scaling behaviours observed in the limit of large
scales [20]. It is defined in terms of second order statistics as follows.
Definition 2.2.3. A stationary stochastic process X(t) is said to be Long Range Dependent
(LRD) if its autocorrelation function $\gamma_X(k)$ is characterised by a power-law decrease at large
lags:
$$\gamma_X(k) \sim c_\gamma |k|^{-(1-\alpha)} \quad \text{as } |k| \to +\infty, \quad \text{with } \alpha \in (0, 1), \tag{2.4}$$
or, equivalently, if its power spectral density $\Gamma_X(\nu)$ has a power law behaviour at frequencies
close to the origin:
$$\Gamma_X(\nu) \sim c_\Gamma |\nu|^{-\alpha} \quad \text{as } |\nu| \to 0, \quad \text{with } \alpha \in (0, 1). \tag{2.5}$$
The power law decrease of the autocovariance function implies that its sum
diverges, because distant past values remain heavily weighted. LRD models are also called ‘long
memory’ processes for this reason. By contrast, the autocovariance function of a ‘short
memory’ process, such as ARMA processes introduced in section 1.4.1, has an asymptotic
exponential decrease and its sum converges. An example of LRD process is the fractional
Gaussian noise (fGn), defined as the increments of fBm, and illustrated in figure 2.1.
In fact, for any HSS process X(t) with stationary increments, its increments
$Y_\delta(t) = X(t + \delta) - X(t)$ verify
$$\mathbb{E}|Y_\delta(t)|^{q} = \mathbb{E}|X(1)|^{q} \, |\delta|^{qH}. \tag{2.6}$$

Figure 2.1: Illustration of the scale invariance phenomenon for fGn with H=0.7.
It can be shown that the autocovariance of the increments has the following asymptotic
behaviour
$$\gamma_{Y_\delta}(s) \sim \mathbb{E}|X(1)|^{2} \, H(2H - 1) \, \delta^{2} s^{2(H-1)} \quad \text{for } s \gg \delta. \tag{2.7}$$
From equation (2.4), this means that when 0.5 < H < 1, $Y_\delta(t)$ is a LRD process with
$$\alpha = 2H - 1. \tag{2.8}$$
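The relation between H and α can be checked numerically for fGn. The sketch below assumes unit spacing (δ = 1) and uses the exact fGn autocovariance obtained by expanding the fBm covariance (2.3), namely $\frac{\sigma^2}{2}(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H})$. It compares this with the power law of equation (2.7). The function names are hypothetical.

```python
import numpy as np

def fgn_autocovariance(lag, hurst, sigma2=1.0):
    """Exact autocovariance of unit-spacing fGn, derived from equation (2.3)."""
    k = np.abs(np.asarray(lag, dtype=float))
    two_h = 2.0 * hurst
    return 0.5 * sigma2 * ((k + 1) ** two_h - 2 * k ** two_h + np.abs(k - 1) ** two_h)

def asymptotic_autocovariance(lag, hurst, sigma2=1.0):
    """Large-lag power law of equation (2.7) with delta = 1."""
    k = np.abs(np.asarray(lag, dtype=float))
    return sigma2 * hurst * (2 * hurst - 1) * k ** (2 * hurst - 2)

hurst = 0.7
lags = np.array([10.0, 100.0, 1000.0])
exact = fgn_autocovariance(lags, hurst)
ratio = exact / asymptotic_autocovariance(lags, hurst)
# Local log-log slope between successive lags; it should approach -(1 - alpha).
slope = np.diff(np.log(exact)) / np.diff(np.log(lags))
alpha = 2 * hurst - 1
```

For H = 0.7 the ratio tends to 1 and the slope tends to -(1 - α) = -0.6, as equations (2.4) and (2.8) predict.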
Self-similarity and long-range dependence are often practically studied with an aggregation
technique, more suited for time series analysis:
Definition 2.2.4. Let $Y = \{Y_i\}$ be a stationary sequence and
$$Y^{(m)}(k) = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} Y(i), \quad k = 1, 2, \ldots, \tag{2.9}$$
be the corresponding m-aggregated sequence. If Y is the increment process of a self-similar
process defined in (2.1), then
$$Y \overset{d}{=} m^{1-H} \, Y^{(m)}, \quad \text{for all integer } m. \tag{2.10}$$
A stationary sequence $Y = \{Y_i\}$ is said to be exactly self-similar if it satisfies equation (2.10)
for all aggregation levels m. A stationary sequence $Y = \{Y_i\}$ is said to be asymptotically
self-similar or Long Range Dependent if equation (2.10) holds as m tends to infinity.
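The aggregation of equation (2.9) is easy to express in code. The sketch below also fits the slope of log Var(Y^(m)) against log m. By equation (2.10), the variance of the aggregated sequence scales as $m^{2H-2}$, so the slope gives a rough estimate H ≈ 1 + slope/2. The function names are hypothetical, and this variance-based fit is shown only as an illustration of the definition. It is not the wavelet estimator used later in the thesis.

```python
import numpy as np

def aggregate(y, m):
    """m-aggregated sequence of equation (2.9): means over non-overlapping blocks."""
    y = np.asarray(y, dtype=float)
    n_blocks = y.size // m  # an incomplete trailing block is dropped
    return y[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)

def variance_time_slope(y, levels):
    """Least-squares slope of log Var(Y^(m)) against log m."""
    variances = [aggregate(y, m).var() for m in levels]
    return np.polyfit(np.log(levels), np.log(variances), 1)[0]

# White noise has no long memory: the slope should be close to -1 (H = 0.5).
rng = np.random.default_rng(0)
noise = rng.standard_normal(2 ** 16)
slope = variance_time_slope(noise, [1, 2, 4, 8, 16, 32, 64])
hurst_estimate = 1.0 + slope / 2.0
```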
As already mentioned in section 1.4, there are many ways to generate LRD processes,
such as sum of ON/OFF processes with heavy tailed ON and/or OFF periods [170], filtering
of fractional ARIMA processes [20] or use of doubly stochastic point processes [118].
2.2.3 Multifractals
We now focus on scale invariance over small time scales, and introduce the notion of multi-
fractal. Since this concept will not be thoroughly used in this thesis, the presentation is not
always perfectly rigorous. An in-depth definition can be found for instance in [151].
Definition 2.2.5. A process X(t) with stationary increments $Y_\delta(t)$ is said to be of Hölder
regularity $h(t_0)$ in $t_0$, 0 ≤ h ≤ 1, if there exists a constant K > 0 such that
$$|Y_\delta(t_0)| \sim K |\delta|^{h(t_0)} \quad \text{as } \delta \to 0. \tag{2.11}$$
Local Hölder regularity therefore compares a sample path at each point with a power
law. In that respect, it is quite close to the definition of scaling. Indeed equation (2.11) is
reminiscent of equation (2.6) and one can loosely relate a HSS process with a process having
the same local Hölder regularity at every point [148]. The function h(t) characterizes the
smoothness or the sharpness of the graph of the function X at time t. For instance, when
0 ≤ h < 1, the function is not differentiable. One can get a geometrical interpretation of this
concept by defining a Hausdorff spectrum D(h), defined as the Hausdorff dimension of the
set of points t with Hölder regularity h(t) = h. One can also think of it as measuring ‘how
often’ each value h(t) = h is found and get a frequency representation of h. However, this
geometrical approach is not applicable in practice since the Hausdorff dimension of such
sets cannot be estimated from empirical data.
This is why one often prefers a statistical description of the multifractal spectrum, linked
to the above geometrical approach through the multifractal formalism [151]. In essence, one
can show that the qth order moment of the stationary increment process Yδ (t) verifies
$$\mathbb{E}|Y_\delta(t)|^{q} = c_q |\delta|^{H(q)}, \tag{2.12}$$
where H(q) is not necessarily a linear qH behaviour as found for HSS processes. The multi-
fractal Legendre spectrum is obtained from the exponents H(q) with a Legendre transform:
$$D(h) = \inf_{q \in \mathbb{R}} \left( qh - H(q) \right). \tag{2.13}$$
In this thesis, we do not use D(h) to study multifractal properties but instead a wavelet-based
approach described in section 2.4.3.
2.2.4 Infinitely Divisible Cascades
There is a very elegant way to describe all the above statistical behaviours and describe
the different scaling regimes at small and large time scales with a single mathematical ob-
ject called an Infinitely Divisible Cascade (IDC). In a nutshell, one can rewrite the scaling
behaviours of HSS and multifractals as follows:
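$$\text{Self-similarity:} \quad \mathbb{E}|Y_\delta(t)|^{q} = c_q |\delta|^{qH} = c_q \exp(qH \ln \delta), \tag{2.14}$$
$$\text{Multifractal:} \quad \mathbb{E}|Y_\delta(t)|^{q} = c_q |\delta|^{H(q)} = c_q \exp(H(q) \ln \delta). \tag{2.15}$$
One can then introduce the even more general scaling behaviour
$$\text{Infinitely Divisible Cascade:} \quad \mathbb{E}|Y_\delta(t)|^{q} = c_q \exp(H(q) \, n(\delta)), \tag{2.16}$$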
where n(δ ) is a priori an arbitrary function of δ . The two main features of IDCs are that
their moments do not have to behave as power laws of the scales, and the scaling of moments
is governed by a collection of exponents. The concept of IDCs was first introduced in [27]
and has since been widely used to study intermittent phenomena in turbulence (see [29] and
references therein for details). We will not thoroughly study this last topic in this thesis, but
simply mention when it could be used, for further investigation.
2.3 Point Processes

2.3.1 Introduction
The IP traffic on a given link is fully characterized by the arrival times of IP packets and their
respective size. The arrival times of IP packets can be modeled by a point process, while
the full description of the traffic (i.e. packet arrival times plus packet sizes) necessitates a
marked point process. In this section we give some basic properties of point processes that
will be used in chapters 4 and 5. Our presentation follows Cox and Isham [42] and Daley
and Vere-Jones [46].
In mathematical terms, a point process is a random collection of points falling in some
space. When modeling temporal events, the space in which points fall is a portion of the
real line. On the other hand, when modelling fire spread or forest growth for instance, the
space is $\mathbb{R}^2$. A point process can be defined as a random measure N on a space S taking
non negative integer values in $\mathbb{Z}^+$. In this framework, N(A) represents the number of points
falling in the subset A of S. Typically one restricts the definition to random measures that are
finite on any compact subset of S, and to the case where S is a complete separable metric
space such as $\mathbb{R}^d$. In what follows we try to keep the definitions and properties of point
processes as general as possible, but restrict ourselves to $\mathbb{R}$ for simplicity when necessary.
In the particular case of temporal point processes there are other possible definitions,
perhaps more intuitive than the random measure. Consider the times of events falling between
0 and T. The point process N can be defined by the ordered list of event times
$\{t_1, t_2, t_3, \ldots\}$. The equivalent information can be conveyed by the series of inter event times
$\{\tau_1, \tau_2, \tau_3, \ldots\}$ where $t_0 = 0$ and $\tau_i = t_i - t_{i-1}$. Alternatively, N can be defined by the counting
process N(t) where for any t between 0 and T, N(t) is the number of events occurring
at or before t. This process N(t) must take non negative integer values, be non decreasing
and right continuous.
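The three equivalent descriptions of a temporal point process given above (event times, inter event times, counting process) can be converted into one another directly. The sketch below is a minimal illustration with hypothetical function names. It assumes the event times are sorted and stored in a NumPy array.

```python
import numpy as np

def inter_event_times(times):
    """tau_i = t_i - t_{i-1}, with the convention t_0 = 0."""
    return np.diff(np.concatenate(([0.0], np.asarray(times, dtype=float))))

def event_times(taus):
    """Recover the ordered event times from the inter event times."""
    return np.cumsum(taus)

def counting_process(times, t):
    """N(t): number of events at or before t (right continuous in t)."""
    return np.searchsorted(np.asarray(times, dtype=float), t, side="right")

times = np.array([0.5, 1.2, 3.0])
taus = inter_event_times(times)
counts = counting_process(times, [0.0, 0.5, 1.0, 3.0, 10.0])
```

Using `side="right"` in `np.searchsorted` counts an event occurring exactly at t, which is what makes N(t) right continuous.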
A realization of a point process is often written as a sum of Dirac delta measures $\delta_{t_i}$
where for any measurable set A, $\delta_{t_i}(A) = 1$ if A contains $t_i$ and $\delta_{t_i}(A) = 0$ otherwise. The
integral with respect to dN, noted $\int_A dN$, is the number of points in the set A. For any
function f:
$$\int_A f \, dN = \sum_{i:\, t_i \in A} f(t_i). \tag{2.17}$$
A point process is called simple if all its points $\{t_i\}$ are distinct, i.e. $t_i \neq t_j$ for $i \neq j$. A point
process is orderly if for any t:
$$\frac{1}{\Delta t} P(N[t, t + \Delta t] > 1) \to 0 \quad \text{when } \Delta t \to 0. \tag{2.18}$$
Let $A_1, A_2, \ldots$ be disjoint Borel sets on the real line. The joint distributions
$$\Pr\{N(A_i) = n_i;\ i = 1, \ldots, k\}, \tag{2.19}$$
for $n_i = 0, 1, 2, \ldots$, $i = 1, \ldots, k$, $k = 1, 2, \ldots$ fully characterize the point process.

In what follows we will be concerned with stationary point processes only. The qualitative
idea of stationarity is that the structure of the point process is unaffected by translation
of the time axis. The process is strictly stationary if the joint probabilities defined by equation
(2.19) are unchanged by translating the sets $A_1, A_2, \ldots, A_k$. If the distribution of N(I) is
invariant under translation of the arbitrary interval I then the process is simply stationary. If
the mean and the variance of N(I) are invariant under translation of the arbitrary interval I,
the process is weakly stationary.
A temporal point process N is typically described by its conditional rate process λ,
also referred to as conditional intensity. Formally, the conditional rate λ associated with an
orderly point process N is defined via
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(N[t, t + \Delta t] > 0 \mid H_t)}{\Delta t}, \tag{2.20}$$
where $H_t$ is the entire history of the point process up to time t defined as $H_t = \{t_j \mid t_j \le t\}$.
λ (t) represents the expected instantaneous rate of events at time t, given the entire history
up to time t. Since all the finite dimensional distributions of N can be derived from the
conditional rate [46], λ (t) fully characterizes the point process N.
2.3.2 Definitions
Poisson process
The most important type of point processes is the Poisson process. It is defined as a simple
point process for which the number of points in any set follows a Poisson distribution and
the numbers of points in disjoint sets are independent: N is a Poisson process if for any dis-
joint measurable subsets A1 , ..., An of S, N(A1 ), ..., N(An ) are independent Poisson random
variables.
Definition 2.3.1. [46] (p. 18) Let $N(a_i, b_i]$ denote the number of events of a process falling
in the half-open interval $(a_i, b_i]$ with $a_i < b_i \le a_{i+1}$. The stationary Poisson process on the
line with rate λ is completely defined by the following equation:
$$\Pr\{N(a_i, b_i] = n_i,\ i = 1, \ldots, k\} = \prod_{i=1}^{k} \frac{[\lambda (b_i - a_i)]^{n_i}}{n_i!} \, e^{-\lambda (b_i - a_i)}. \tag{2.21}$$
Proposition 2.3.1. For a stationary Poisson process,
(i) the number of points in each finite interval (ai , bi ] has a Poisson distribution,
(ii) the numbers of points in disjoint intervals are independent random variables, and
(iii) the distributions are stationary: they depend only on the lengths bi − ai of the inter-
vals.
Proposition 2.3.2. For any N-tuple $(t_1, \ldots, t_N)$, $0 \le t_1 < \ldots < t_N \le T$, the conditional density
of obtaining points at $(t_1, \ldots, t_N)$ given N points in the interval (0, T] is $N!/T^N$, which
corresponds to a uniform distribution.
This result can also be thought of in the following way: there are in fact N! ways of allocating
the N time points $(t_1, \ldots, t_N)$, with each time point being uniformly and independently
distributed over (0, T].
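Proposition 2.3.2 suggests a direct way to simulate a stationary Poisson process on a finite window: draw the number of points from a Poisson distribution, then place them uniformly and sort. The sketch below follows that recipe. The function name `simulate_poisson` and the parameter values are assumptions made for illustration.

```python
import numpy as np

def simulate_poisson(rate, horizon, rng):
    """Event times of a stationary Poisson process with the given rate on [0, horizon)."""
    n_points = rng.poisson(rate * horizon)          # Proposition 2.3.1 (i)
    return np.sort(rng.uniform(0.0, horizon, size=n_points))  # Proposition 2.3.2

rng = np.random.default_rng(1)
times = simulate_poisson(rate=5.0, horizon=1000.0, rng=rng)
empirical_rate = times.size / 1000.0
mean_gap = np.diff(times).mean()  # close to 1 / rate for a long window
```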
Renewal process
In the Poisson process, the interarrival times are independently exponentially distributed. A
generalization is to allow the intervals to be independent and identically distributed (i.i.d.)
random variables. The resulting series of events is called a renewal process. Let $f_1(x)$ be
the pdf of the first interval $X_1$ and f(x) be the pdf of i.i.d. intervals $X_2, X_3, \ldots$. Depending on
the choice of the time origin, the following situations can occur:
• $f_1(x) = f(x)$, so that all random variables are identically distributed, the process is
then an ordinary renewal process;
• $f_1(x)$ and f(x) are not necessarily the same, the process is a modified renewal process;
• $f_1(x) = \frac{1 - F(x)}{\mu}$, where $\mu = \mathbb{E}X_i$, i = 2, 3, ... and F(x) is the distribution function
corresponding to f(x), the process is an equilibrium or stationary renewal process.
For an ordinary renewal process, the density function f governing each inter event time is
called the renewal density. The conditional rate is given by $\lambda(t) = s(t - \tilde{t})$, where $\tilde{t}$ is the
time of the most recent event prior to time t, and s is the hazard function corresponding to f,
defined as
$$s(t) = \frac{f(t)}{1 - F(t)}, \tag{2.22}$$
with $F(t) = \int_0^t f(u) \, du$ the cumulative distribution function corresponding to f.
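A short numerical sketch of equation (2.22) may help. The ratio f(t)/(1 - F(t)) is usually called the hazard rate, and the conditional rate of an ordinary renewal process evaluates it at the time elapsed since the last event. The function names and the two example distributions below are assumptions chosen for illustration.

```python
import numpy as np

def hazard(pdf, cdf, t):
    """s(t) = f(t) / (1 - F(t)), as in equation (2.22)."""
    return pdf(t) / (1.0 - cdf(t))

def conditional_rate(pdf, cdf, t, last_event):
    """lambda(t) = s(t - t~), where t~ is the most recent event before t."""
    return hazard(pdf, cdf, t - last_event)

# Exponential inter event times with rate 2: the conditional rate is constant.
rate = 2.0
exp_pdf = lambda x: rate * np.exp(-rate * x)
exp_cdf = lambda x: 1.0 - np.exp(-rate * x)
poisson_rate = conditional_rate(exp_pdf, exp_cdf, t=3.7, last_event=3.0)

# Uniform inter event times on (0, 1): the rate grows as the next event becomes overdue.
uniform_rate = hazard(lambda x: 1.0, lambda x: x, 0.5)
```

The exponential case returns the constant rate whatever the elapsed time, which is the memoryless property of the Poisson process.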
Cluster process
Cluster processes are constructed as follows: there is a point process of cluster centres and to
each cluster is associated a random number of points forming a subsidiary process or cluster.
These subsidiary points are distributed around the cluster center in some specified way. The
cluster process consists of the superposition of all the separate clusters. The cluster centers
may or may not be included in the final process. In what follows it is assumed that the
number of points in different clusters is i.i.d.
A case of particular interest is a Poisson process of cluster centers. The resulting cluster
process is then called a Poisson cluster process (PCP). There are two main cluster processes
widely studied in applications: the Neyman-Scott point process and the Bartlett-Lewis point
process (BLPP). In the Neyman-Scott process the points of a cluster are i.i.d. around the
cluster center with some probability density function f . In the Bartlett-Lewis process, the
intervals between successive points in a cluster are i.i.d. with probability density function f
and each cluster therefore forms a finite ordinary renewal process.
Let us consider a Bartlett-Lewis process X(t) with cluster arrival rate $\lambda_F$, a number P of
points in a cluster with discrete density $\Pr\{P = k\} = p_k$, and distribution of inter-arrivals
within a cluster $F_A(x)$. Let $\mu_P$ be the mean number of events per cluster, $\mu_A$ the mean
inter-arrival of events within a cluster and $\lambda_A = 1/\mu_A$ the average rate of arrivals within a cluster.
A necessary condition to have a stationary Bartlett-Lewis process is to have both $\mu_P$ and $\mu_A$
finite [110]. In that case, the average rate of X(t) reads
$$\lambda_X = \lambda_F \mu_P. \tag{2.23}$$
Equilibrium conditions and inter-arrival time distribution can also be derived [110]. More
details on BLPPs will be given in chapter 4 where they will be used to model IP packet
arrival times.
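The construction of a Bartlett-Lewis process can be sketched in a few lines, and the rate relation (2.23) checked empirically. The distributions below are illustrative assumptions, not choices made in the text: P is geometric on {1, 2, ...}, inter-arrivals within a cluster are exponential, and the cluster centre is counted as the first point of its cluster. The function name is hypothetical.

```python
import numpy as np

def simulate_bartlett_lewis(cluster_rate, mean_points, mean_gap, horizon, rng):
    """Points of all clusters whose first point falls in [0, horizon).

    Assumed for illustration: geometric cluster size with mean `mean_points`,
    exponential in-cluster gaps with mean `mean_gap`.
    """
    n_clusters = rng.poisson(cluster_rate * horizon)
    starts = rng.uniform(0.0, horizon, size=n_clusters)
    sizes = rng.geometric(1.0 / mean_points, size=n_clusters)
    clusters = []
    for start, size in zip(starts, sizes):
        gaps = rng.exponential(mean_gap, size=size - 1)
        # Each cluster is a finite renewal process started at its centre.
        clusters.append(start + np.concatenate(([0.0], np.cumsum(gaps))))
    return np.sort(np.concatenate(clusters)) if clusters else np.empty(0)

rng = np.random.default_rng(2)
points = simulate_bartlett_lewis(cluster_rate=50.0, mean_points=5.0,
                                 mean_gap=0.01, horizon=200.0, rng=rng)
empirical_rate = points.size / 200.0   # to compare with lambda_F * mu_P = 250
```

Points that fall beyond the horizon are kept in the count on purpose. They stand in for the points of clusters that started before time 0, which this sketch does not generate.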
Infinitely divisible point process
Definition 2.3.2. [46] (p. 255) A point process is said to be infinitely divisible if, for every
k, it can be represented as the superposition of k independent, identically distributed, point
process components.
Lemma 2.3.3. [46] (p. 256) A point process is infinitely divisible if and only if its finite
dimensional distributions are infinitely divisible.
A Poisson process provides a simple example of such a process. In fact, if X(t) is a
Poisson process with rate λ then, for each integer k > 0, X can be represented as a sum
$X = X_{k1} + \ldots + X_{kk}$ where the $X_{ki}$, i = 1, ..., k, are independent Poisson processes with common
rate λ/k.
The Bartlett-Lewis process described in the previous section is also an infinitely divisible
process since for any integer k > 0 it can be expressed as a sum of k independent Bartlett-Lewis
processes with cluster rate $\lambda_F / k$ and same characteristics for the internal structure of each
cluster. This means that for any arbitrary set A, the distribution of N(A) for a Bartlett-Lewis
process is infinitely divisible (see footnote 1). From [64], an infinitely divisible discrete distribution
is a compound Poisson distribution, i.e. its probability generating function is of the form
$\exp[-\mu(1 - G(z))]$ where G(z) is itself a probability generating function. Therefore N(A)
can be seen as a random sum of independent random variables with common p.g.f. G(z),
where the number of terms in the sum is a Poisson random variable with mean µ.
There are many other interesting types of point processes, such as self exciting point
processes and autoregressive processes (see [70] for an interesting study of infinitely divisi-
ble autoregressive processes). However, we will only be using Poisson, renewal and cluster
processes in the modelling work presented in this thesis, so we focus exclusively on these
models in the following.
2.3.3 Moments
Let us consider $\mathbb{E}\{N(A)\}$, $\mathrm{Var}\{N(A)\}$ and $\mathrm{Cov}\{N(A), N(B)\}$ for arbitrary sets A and B.
First order moment
For a stationary orderly process of finite rate λ ,
$$\mathbb{E}\{N(A)\} = \lambda |A|, \tag{2.24}$$
where |A| is the measure of the set A. In particular if A = (0, t), $\mathbb{E}\{N(t)\} = \lambda t$. Suppose
that an event occurs at time t = 0. Define the conditional intensity function by
$$h(t) = \lim_{\delta_1, \delta_2 \to 0} \frac{\Pr\{N(t, t + \delta_2) > 0 \mid N(-\delta_1, 0) > 0\}}{\delta_2}. \tag{2.25}$$
Footnote 1: more generally, it can be shown that any Poisson cluster process is in fact an infinitely divisible point process.
Let U(x) be the expected number of points in the interval [0, x]. By definition
$$U(x) = \mathbb{E}[N(x)] \tag{2.26}$$
$$\phantom{U(x)} = \int_0^x h(u) \, du. \tag{2.27}$$
Therefore we have the formal relation
$$\dot{U}(x) = h(x). \tag{2.28}$$
U(x) and h(u) will be used in the remainder of this chapter to derive basic properties of point processes.
Second order moment
For disjoint sets A and B:
$$\mathrm{Var}\{N(A \cup B)\} = \mathrm{Var}\{N(A)\} + \mathrm{Var}\{N(B)\} + 2 \, \mathrm{Cov}\{N(A), N(B)\}. \tag{2.29}$$
Let us divide the interval (0, t) into K intervals of length δ and consider N(t) as a sum of
counts in small intervals
$$N(t) = \sum_{k=0}^{K-1} N(k\delta, (k+1)\delta] \tag{2.30}$$
and
$$\mathrm{Var}\{N(t)\} = \sum_{k=0}^{K-1} \mathrm{Var}\{N(k\delta, (k+1)\delta]\} + 2 \sum_{k=0}^{K-1} \sum_{l=1}^{K-k-1} \mathrm{Cov}\{N(k\delta, (k+1)\delta], N((k+l)\delta, (k+l+1)\delta]\}. \tag{2.31}$$
In the limit of large K this leads to
$$N(t) = \int_0^t dN(v), \tag{2.32}$$
and
$$\mathrm{Var}\{N(t)\} = \int_0^t \mathrm{Var}\{dN(v)\} + 2 \int_0^t dz \int_0^{t-z} du \, \mathrm{Cov}\{dN(z), dN(z + u)\}. \tag{2.33}$$
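For an orderly process the probability of at least two points in $(z, z + \delta_1]$ is $o(\delta_1)$, so in
the limit $\delta_1 \to 0$, $N(z, z + \delta_1]$ takes values 0 or 1. Thus
$$\begin{aligned}
\mathrm{Var}\{N(z, z + \delta_1]\} &= \mathbb{E}\{(N(z, z + \delta_1])^2\} - (\mathbb{E}\{N(z, z + \delta_1]\})^2 \\
&= \Pr\{N(z, z + \delta_1] = 1\} - (\Pr\{N(z, z + \delta_1] = 1\})^2 + o(\delta_1) \\
&= \lambda \delta_1 + o(\delta_1).
\end{aligned} \tag{2.34}$$
For u > 0,
$$\begin{aligned}
\mathrm{Cov}\{N(z, z + \delta_1], N(z + u, z + u + \delta_2]\}
&= \mathbb{E}\big[N(z, z + \delta_1] \, \mathbb{E}\{N(z + u, z + u + \delta_2] \mid N(z, z + \delta_1]\}\big] \\
&\quad - \mathbb{E}\{N(z, z + \delta_1]\} \, \mathbb{E}\{N(z + u, z + u + \delta_2]\} \\
&= \Pr\{N(z, z + \delta_1] = 1\} \Pr\{N(z + u, z + u + \delta_2] = 1 \mid N(z, z + \delta_1] = 1\} \\
&\quad - \Pr\{N(z, z + \delta_1] = 1\} \Pr\{N(z + u, z + u + \delta_2] = 1\} + o(\delta_1 \delta_2) \\
&= \lambda h(u) \delta_1 \delta_2 - \lambda^2 \delta_1 \delta_2 + o(\delta_1 \delta_2).
\end{aligned} \tag{2.35}$$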
One can define the covariance density of counts by
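$$c(u) = \lambda \delta(u) + \lambda h(u) - \lambda^2. \tag{2.36}$$
Equation (2.33) becomes
$$\begin{aligned}
\mathrm{Var}\{N(t)\} &= \lambda \int_0^t dv + 2 \int_0^t dz \int_0^{t-z} du \, \big(\lambda h(u) - \lambda^2\big) \\
&= \int_0^t dz \int_0^t du \, c(u - z).
\end{aligned} \tag{2.37}$$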
Because of stationarity h(−u) = h(u) and therefore c(−u) = c(u).
2.3.4 Density spectrum
The spectral density of counts is defined as the Fourier transform of the covariance density
and reads
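$$\begin{aligned}
\psi(\omega) &= \int_{-\infty}^{+\infty} c(u) e^{-j\omega u} \, du \\
&= \lambda + \lambda \int_{-\infty}^{+\infty} (h(u) - \lambda) e^{-j\omega u} \, du.
\end{aligned} \tag{2.38}$$
Let $\tilde{h}(s)$ be the Laplace transform of h(u). For ω > 0 equation (2.38) can be written as
$$\psi(\omega) = \lambda \big(\tilde{h}(j\omega) + \tilde{h}(-j\omega) + 1\big). \tag{2.39}$$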
Renewal process
Let us consider a sequence $X_1, X_2, \ldots$ of i.i.d. non negative random variables with probability
distribution F. Define $S_0 = 0$ and $S_n = S_{n-1} + X_n = X_1 + \ldots + X_n$ for n = 1, 2, .... $\{S_n\}$
represents the arrival times of an ordinary renewal process. In the case of an ordinary
renewal process, the function U(x) defined in section 2.3.3 is called the renewal function
and reads:
$$\begin{aligned}
U(x) &= \mathbb{E}[N(x)] \\
&= 1 + \sum_{k=1}^{\infty} \Pr(S_k \le x) \\
&= 1 + \sum_{k=1}^{\infty} F^{k*}(x).
\end{aligned} \tag{2.40}$$
Conditioning on $X_1$, the time of the first renewal, one can write:
$$U(t) = \int_0^{\infty} \mathbb{E}[N(t) \mid X_1 = x] \, dF(x). \tag{2.41}$$
If x > t the only point in [0, t] is $S_0$ and $\mathbb{E}[N(t) \mid X_1 = x] = 1$. If x ≤ t,
$\mathbb{E}[N(t) \mid X_1 = x] = 1 + U(t - x)$. U(t) therefore reads
$$\begin{aligned}
U(t) &= \int_t^{\infty} dF(x) + \int_0^t (1 + U(t - x)) \, dF(x) \\
&= 1 + \int_0^t U(t - x) \, dF(x).
\end{aligned} \tag{2.42}$$
Equation (2.42) is called the renewal equation. Assuming that F has a density f, one gets
by differentiation
$$\dot{U}(t) = 0 + U(0) f(t) + \int_0^t \dot{U}(t - x) f(x) \, dx. \tag{2.43}$$
From equation (2.28) we have $h(t) = \dot{U}(t)$ and, since U(0) = 1, equation (2.43) reads
$$h(t) = f(t) + \int_0^t h(t - x) f(x) \, dx. \tag{2.44}$$
In the Laplace domain:
$$\tilde{h}(s) = \tilde{f}(s) + \tilde{h}(s) \tilde{f}(s), \tag{2.45}$$
where $\tilde{h} = \mathcal{L}[h]$ and $\tilde{f} = \mathcal{L}[f]$. This leads to
$$\tilde{h}(s) = \frac{\tilde{f}(s)}{1 - \tilde{f}(s)}. \tag{2.46}$$
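The spectrum of the renewal process with inter-arrival density f therefore reads
$$\begin{aligned}
\psi(\omega) &= \lambda \left( \frac{\tilde{f}(j\omega)}{1 - \tilde{f}(j\omega)} + \frac{\tilde{f}(-j\omega)}{1 - \tilde{f}(-j\omega)} + 1 \right) \\
&= \lambda \left( \frac{1}{1 - \tilde{f}(j\omega)} + \frac{1}{1 - \tilde{f}(-j\omega)} - 1 \right).
\end{aligned} \tag{2.47}$$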
For the special case of a Poisson process with rate λ: f(x) = λ exp(−λx), the renewal
density (or hazard function) is h(t) = λ and the density spectrum is
$$\psi(\omega) = \lambda. \tag{2.48}$$
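Equation (2.47) can be evaluated numerically from the Laplace transform of the inter-arrival density alone. The sketch below does so with a hypothetical function `renewal_spectrum`. The two test cases are assumptions chosen for illustration: exponential inter-arrivals, and gamma inter-arrivals with shape 2 and mean 1, whose Laplace transform is $(1 + s/2)^{-2}$.

```python
import numpy as np

def renewal_spectrum(laplace_f, rate, omega):
    """Equation (2.47) at angular frequencies omega > 0.

    laplace_f: callable returning the Laplace transform of the inter-arrival density.
    rate: mean arrival rate, i.e. the inverse of the mean inter-arrival time.
    """
    omega = np.asarray(omega, dtype=float)
    plus = 1.0 / (1.0 - laplace_f(1j * omega))
    minus = 1.0 / (1.0 - laplace_f(-1j * omega))
    # The two terms are complex conjugates, so the imaginary parts cancel.
    return (rate * (plus + minus - 1.0)).real

omega = np.array([1e-3, 0.1, 1.0, 10.0, 1e3])

# Exponential inter-arrivals with rate 3 (Poisson process).
poisson_psi = renewal_spectrum(lambda s: 3.0 / (s + 3.0), 3.0, omega)

# Gamma inter-arrivals, shape 2 and scale 0.5 (mean 1, so rate 1).
gamma_psi = renewal_spectrum(lambda s: (1.0 + 0.5 * s) ** -2, 1.0, omega)
```

With the normalisation of equation (2.38), the exponential case is flat at the level λ, in agreement with the Poisson limit of equation (2.63). The gamma case is more regular than Poisson and shows a dip at low frequencies, rising to the same flat level λ at high frequencies.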

Bartlett-Lewis point process
Details on the spectral density of the BLPP will be presented in section 4.3.2.
2.3.5 Operations on point processes
Point processes are mathematical objects that lend themselves to a large range of operations.
Superposition
The superposition of point processes corresponds mathematically to addition: $N_3$ is the
superposition of $N_1$ and $N_2$ if for any measurable set A of S, $N_3(A) = N_1(A) + N_2(A)$.
Let $X_1(t)$ and $X_2(t)$ be two independent point processes with respective spectra $\psi_1(\omega)$
and $\psi_2(\omega)$. Let $X_3(t) = X_1(t) + X_2(t)$. Its spectrum reads
$$\psi_3(\omega) = \psi_1(\omega) + \psi_2(\omega). \tag{2.49}$$
Thinning
In general terms, the thinning of a point process X with rate λ consists in keeping each point
of X with probability q or rejecting it with probability 1 − q to form a new point process Xq
with rate λq = qλ . This notion will be used extensively in chapter 5.
In what follows, we are only concerned with i.i.d. thinning. A useful quantity when
studying thinning is the average number of points in [0, x] defined by equation (2.26) as
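$$U(x) = \mathbb{E}\{N(x)\}. \tag{2.50}$$
Define $U_q(x)$ as the average number of points in [0, x] for the thinned process $X_q$ and $h_q(u)$ its
conditional intensity. One has the relation
$$U_q(x) - 1 = q(U(x) - 1), \tag{2.51}$$
and therefore
$$h_q(u) = q h(u) \quad \text{and} \quad \tilde{h}_q(s) = q \tilde{h}(s). \tag{2.52}$$
From equation (2.39) the spectrum of $X_q$ reads
$$\begin{aligned}
\psi_q(\omega) &= \lambda_q \big(\tilde{h}_q(j\omega) + \tilde{h}_q(-j\omega) + 1\big) \\
&= q\lambda \big(q \tilde{h}(j\omega) + q \tilde{h}(-j\omega) + 1\big)
\end{aligned} \tag{2.53}$$
and thus
$$\psi_q(\omega) = q^2 \psi(\omega) + q(1 - q)\lambda. \tag{2.54}$$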
Equation (2.54) is valid for the i.i.d. thinning of any point process X which is simple, locally
finite and second order stationary.
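The thinning operation and equation (2.54) translate directly into code. The sketch below uses hypothetical function names. The last function is simply an algebraic rearrangement of (2.54) that recovers the original spectrum from the thinned one when q and λ are known.

```python
import numpy as np

def thin(times, q, rng):
    """i.i.d. thinning: keep each point independently with probability q."""
    times = np.asarray(times, dtype=float)
    return times[rng.random(times.size) < q]

def thinned_spectrum(psi, q, rate):
    """Equation (2.54): spectrum of the thinned process."""
    return q ** 2 * np.asarray(psi, dtype=float) + q * (1.0 - q) * rate

def original_spectrum(psi_q, q, rate):
    """Equation (2.54) solved for the spectrum of the original process."""
    return (np.asarray(psi_q, dtype=float) - q * (1.0 - q) * rate) / q ** 2

rng = np.random.default_rng(3)
kept = thin(np.arange(10000.0), q=0.3, rng=rng)

# A flat (Poisson) spectrum at level 5 stays flat, at level q * 5.
flat = thinned_spectrum(np.full(4, 5.0), q=0.1, rate=5.0)
```

The division by q² in `original_spectrum` amplifies any estimation noise in the thinned spectrum, so a small retention probability q makes the inversion delicate in practice.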
• Poisson process
A fundamental result concerns the i.i.d. thinning of a Poisson process. Let the count-
ing process N(t) be a Poisson process with rate λ . Each time an event occurs it is
classified as a type I event with probability q or a type II event with probability 1 − q
independently of all other events. Let N1 (t) and N2 (t) denote respectively the type I
and type II events. Note that N(t) = N1 (t) + N2 (t).
Theorem 2.3.4. N1 (t) is a Poisson process with rate λ q and N2 (t) is a Poisson
process with rate λ (1 − q). The two Poisson processes are independent.
• Renewal process
Let us now consider a thinning process where each point Sn of a renewal process X
is retained with probability q or omitted with probability 1 − q. The new process Xq
is a renewal process with interarrival density fq (x). From equation (2.51), its renewal
function Uq is given by
$$U_q(x) - 1 = q(U(x) - 1). \tag{2.55}$$
Thus
$$\tilde{h}_q(s) = \frac{\tilde{f}_q(s)}{1 - \tilde{f}_q(s)} = q \tilde{h}(s) = q \frac{\tilde{f}(s)}{1 - \tilde{f}(s)}, \tag{2.56}$$
and
$$\tilde{f}_q(s) = \frac{q \tilde{f}(s)}{1 - (1 - q) \tilde{f}(s)}. \tag{2.57}$$
In the particular case of a Poisson process with rate λ and $\tilde{f}(s) = \frac{\lambda}{s + \lambda}$, equation
(2.57) gives $\tilde{f}_q(s) = q\lambda / (s + q\lambda)$. This is another way to prove that a thinned Poisson
process with rate λ is a Poisson process with rate qλ.
• Cluster process
The more complex case of thinning a BLPP will be detailed in chapter 5.
Random translation
In this operation, each point $t_i$ of a stationary orderly point process X(t) with rate λ is shifted
by a random amount with p.d.f. f to form a new point process $X_f(t)$. Obviously, $X_f(t)$ is
also an orderly stationary process with rate $\lambda_f = \lambda$. The conditional intensity of $X_f(t)$ is
given by
$$h_f(u) = \int_{-\infty}^{+\infty} h(u - v) f_D(v) \, dv, \tag{2.58}$$
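where $f_D$ is the density of the difference D between two independent translations each with
density f. From equation (2.38), the spectrum of $X_f(t)$ satisfies:
$$\begin{aligned}
\psi_f(\omega) &= \lambda + \lambda\int_{-\infty}^{+\infty} (h_f(u)-\lambda)\,e^{-j\omega u}\,du \\
&= \lambda + \lambda\int_{-\infty}^{+\infty} dv\,f_D(v)\,e^{-j\omega v}\int_{-\infty}^{+\infty} du\,(h(u)-\lambda)\,e^{-j\omega u} \\
&= \tilde f_D(\omega)\,\psi(\omega) + (1-\tilde f_D(\omega))\,\lambda,
\end{aligned} \qquad (2.59)$$
where
$$\tilde f_D(\omega) = \int_{-\infty}^{+\infty} f_D(u)\,e^{-j\omega u}\,du. \qquad (2.60)$$
Since $f_D$ is symmetric, $\tilde f_D(\omega)$ is real. In fact $\tilde f_D(\omega) = |\tilde f(\omega)|^2$ and the spectrum of the
translated point process reads
$$\psi_f(\omega) = |\tilde f(\omega)|^2\,\psi(\omega) + (1-|\tilde f(\omega)|^2)\,\lambda. \qquad (2.61)$$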
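The identity between the transform of $f_D$ and the squared modulus of the transform of f can be made explicit by writing $D = T_1 - T_2$, with $T_1$ and $T_2$ independent translations of density f. As a worked illustration, assume Gaussian translations with variance $\sigma^2$ (an assumption made here only for the example, not a choice made in the text):

```latex
\tilde f_D(\omega) = \mathbb{E}\,e^{-j\omega (T_1 - T_2)}
  = \tilde f(\omega)\,\tilde f(-\omega)
  = \tilde f(\omega)\,\overline{\tilde f(\omega)}
  = |\tilde f(\omega)|^2 ,
\qquad
|\tilde f(\omega)|^2 = e^{-\sigma^2\omega^2}
\;\Longrightarrow\;
\psi_f(\omega) = e^{-\sigma^2\omega^2}\,\psi(\omega)
  + \bigl(1 - e^{-\sigma^2\omega^2}\bigr)\lambda .
```

In this example the spectrum is left unchanged at low frequencies ($\sigma\omega \ll 1$) and tends to the flat Poisson level λ at high frequencies: random translation erases structure at time scales shorter than the typical shift.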
Time substitution
In this operation, a point process M is transformed into a point process N by writing
N(t) = M[Λ(t)], (2.62)
for some non-decreasing, possibly random, function Λ. Such an operation may be used to
transform a general point process into a Poisson process [104].
Limits
The Poisson process frequently arises as a limiting process for the above operations. In fact,
the operations of superposing, thinning or random translations on an initial point process are
entropy increasing and tend in the limit to create a Poisson process (process with maximum
entropy) [46]. For instance in the limit of small q equation (2.54) becomes
$$\psi_q(\omega) \sim q\lambda \quad \text{as } q \to 0, \qquad (2.63)$$
which is the spectrum of a Poisson process. We will use this result in chapter 5.
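The small q limit can be read directly from equation (2.54) by factoring out qλ. This one-line rearrangement assumes only that ψ(ω) is finite at the frequency considered:

```latex
\psi_q(\omega) = q^2\psi(\omega) + q(1-q)\lambda
  = q\lambda\left[\,1 + q\left(\frac{\psi(\omega)}{\lambda} - 1\right)\right]
  = q\lambda\,\bigl[1 + O(q)\bigr] \quad \text{as } q \to 0 .
```

The correction term is proportional to q, so the departure of the thinned spectrum from the Poisson level qλ vanishes linearly with the retention probability.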
2.4 Wavelet analysis
The last mathematical topic of interest in this chapter concerns the analysis of scaling
processes. There exist many ways to study the scaling behaviour of a process based on
aggregated time series [169]. However, in what follows, we focus on wavelet based methods
because of their intrinsic scaling properties, which make them particularly suited for this
purpose. For instance they yield an unbiased estimate of the Hurst parameter of an
LRD process, whereas estimates based on a Fourier spectrum are biased [4, 5]. Moreover
wavelet transforms can be calculated with a fast O(n) algorithm. A thorough description of
wavelet analysis can be found in [120]; see [6] for theoretical and practical details.
2.4.1 Definition
Discrete wavelet analysis consists in comparing a signal X(t) with locally oscillating waveforms
known as wavelets. It is in essence similar to a Fourier transform where one compares
a signal X(t) with a family of sinusoids. More precisely, a mother wavelet ψ is a band-pass
function localised both in time and frequency, with central frequency $f_0$. This mother
wavelet can be shifted and scaled to give rise to a family of wavelets $\psi_{j,k}(t) = 2^{-j/2}\psi(2^{-j}t - k)$,
shifted in time by $2^j k$ and with central frequency $2^{-j} f_0$. In a similar way to expressing
X(t) as a sum of weighted sinusoids in a Fourier transform, one can express X(t) as a sum
of weighted wavelets:
$$X(t) = \sum_k c_X(j_0,k)\,\phi_{j_0,k}(t) + \sum_{j \ge j_0}\sum_{k=1}^{n_j} d_X(j,k)\,\psi_{j,k}(t), \qquad (2.64)$$
where φ is a low pass function companion of the mother wavelet ψ. The first term in
equation (2.64) constitutes a coarse approximation of the signal X(t), while the second
term adds details at different scales. The $c_X(j_0,k)$ are known as scaling coefficients while
the $d_X(j,k)$ are the wavelet coefficients.
Wavelet and scaling functions can be constructed to be orthogonal, in which case the
scaling and wavelet coefficients can be obtained by inner products:
$$c_X(j_0,k) = \langle X, \phi_{j_0,k}\rangle, \qquad (2.65)$$
$$d_X(j,k) = \langle X, \psi_{j,k}\rangle. \qquad (2.66)$$
A key practical advantage of wavelets is the fact that the coefficients can be computed from
a fast recursive algorithm with computational complexity O(n).
The mother wavelet ψ is also characterized by an integer N, called the number of vanishing
moments, defined as the largest integer N such that
$$\int t^k\,\psi(t)\,dt = 0, \quad k = 0, 1, \dots, N-1. \qquad (2.67)$$
It can be shown that the number of vanishing moments plays a key role in the analysis of
scaling [6]. In particular wavelets with higher N are smoother and capable of analysing
signals with higher order divergences.
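As a concrete instance of definition (2.67), consider the Haar wavelet, chosen here purely for illustration (the text does not state that it is the wavelet used in the analysis). It equals +1 on [0, 1/2) and −1 on [1/2, 1), and the first two moments give:

```latex
\int \psi(t)\,dt = \tfrac12 - \tfrac12 = 0 ,
\qquad
\int t\,\psi(t)\,dt = \int_0^{1/2} t\,dt - \int_{1/2}^{1} t\,dt
  = \tfrac18 - \tfrac38 = -\tfrac14 \neq 0 ,
```

so the Haar wavelet has exactly N = 1 vanishing moment: it is blind to a constant level but not to a linear trend. Wavelets with larger N cancel polynomial trends up to degree N − 1.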
2.4.2 Properties
The wavelet basis is by definition scale invariant, which is why it is well suited to study
scaling phenomena. More particularly, it can be shown that the discrete wavelet coefficients
$d_X(j,k)$ of a HSS process X(t) with stationary increments have the following properties
(see [6] and references therein):
• P1: $\{d_X(j,k),\ k \in \mathbb{Z}\} \stackrel{d}{=} \{2^{j(H+1/2)}\,d_X(0,k),\ k \in \mathbb{Z}\}$,
• P2: $\{d_X(j,k),\ k \in \mathbb{Z}\}$ is stationary for each j fixed, and short range dependent if
N ≥ H + 1/2,
• P3: $\mathbb{E}|d_X(j,k)|^q = 2^{j(qH+q/2)}\,\mathbb{E}|d_X(0,k)|^q$ for each j fixed.
Property P1 can be compared with equation (2.1) and shows that wavelets form an ‘ideal’
basis to study HSS processes since the relation between wavelet coefficients at different scales
mimics equation (2.1). Property P2 means that the long range correlation in the signal has
been turned to a short range correlation in the wavelet domain. This applies for correlation
between coefficients at a given scale or between scales. Finally property P3 means that there
exists a power law relationship between the qth order moment $\mathbb{E}|d_X(j,k)|^q$ and the scale j. In
the case of an LRD process, the scaling behaviour of the wavelet coefficients reads
$$\mathbb{E}|d_X(j,k)|^2 = 2^{j(2H-1)}\,\mathbb{E}|d_X(0,k)|^2 \quad \text{for large } j. \qquad (2.68)$$
Let X(t) be a continuous time stationary process with power spectral density $\Gamma_X(\nu)$. It
can be shown that the variance of its wavelet coefficients satisfies:
$$\mathbb{E}|d_X(j,k)|^2 = \int \Gamma_X(\nu)\,2^j\,|\Psi(2^j\nu)|^2\,d\nu, \qquad (2.69)$$
where Ψ(ν) denotes the Fourier transform of ψ. Equation (2.69) can be viewed as defining
a kind of wavelet energy spectrum, analogous to a Fourier spectrum, but much better suited
to the study of fractal processes.
2.4.3 Estimation
The fact that the coefficients $d_X(j,k)$ are stationary for j fixed (property P2) implies that
one can use an ergodicity argument to efficiently estimate the statistical average $\mathbb{E}|d_X(j,k)|^q$
with a simple time average
$$S_q(j) = \frac{1}{n_j}\sum_k |d_X(j,k)|^q, \qquad (2.70)$$
where $n_j$ is the number of wavelet coefficients at scale j.
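The time average (2.70) can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the Haar wavelet and starts the recursion directly from sampled counts (no initialisation step), and the function names `haar_details` and `S` are hypothetical. The test signal is a series of independent Poisson counts, for which the detail variance should be the same at every octave.

```python
import numpy as np

def haar_details(x):
    """Return a list of Haar detail coefficient arrays, one per octave j = 1, 2, ...

    Each pass halves the approximation sequence, so the total cost is O(n).
    Starting the pyramid from raw samples is a simplification (no initialisation).
    """
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size >= 2:
        approx = approx[: approx.size - approx.size % 2]  # drop an odd trailing sample
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))       # wavelet (detail) coefficients
        approx = (even + odd) / np.sqrt(2.0)              # scaling coefficients for next octave
    return details

def S(details, q=2):
    """Time average of |d_X(j, k)|^q at every octave, as in equation (2.70)."""
    return np.array([np.mean(np.abs(d) ** q) for d in details])

rng = np.random.default_rng(1)
counts = rng.poisson(4.0, size=2**16)    # independent counts with variance 4 (illustrative)
details = haar_details(counts)
S2 = S(details, q=2)
n_j = np.array([d.size for d in details])
print(np.log2(S2[:6]))                   # roughly flat for uncorrelated data
```

Because the number of coefficients halves at each octave, the estimates at the largest octaves rest on very few coefficients and fluctuate strongly.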

[Figure 2.2: Examples of LDs, plotted as log2 Variance(j) against j = log2(scale), with the
top axis calibrated in time. Stars (lower curve): Poisson process (λ = 1), Diamonds:
Renewal process with gamma inter-arrivals (λ = 1, shape = 1/4), Top plot: fGn
(H = 0.8, α = 0.6).]
In practice, Abry and Veitch [7] showed that second order scaling can be efficiently
studied by plotting log2 (S2 ( j)) against the scale j in a so called Logscale Diagram (LD):
LD : log2 (S2 ( j)) vs j. (2.71)
LDs will be our primary tool to analyze Internet traffic throughout this thesis. Figure 2.2
gives an example with synthetic data, while an example with traffic data can be found in
figure 3.3(d) page 47. In these diagrams, straight lines constitute experimental evidence for
the presence of scaling within the analyzed data over a certain range of scales. For example,
a straight line observed in the range of the largest scales with slope in (0, 1) (see figure 2.2)
betrays long memory. More generally, semi-parametric estimates of scaling exponents with
excellent properties can be formed using weighted regression to measure the slope over the
range of scales where the scaling exists.
For a HSS process, a linear relationship exists over the whole range of scales, whereas
an LRD process is characterized by a straight line at large scales for q = 2. The value of
the Hurst parameter can be obtained from the estimated slope $\alpha_2$ of the line. For instance,
at q = 2, if $0 < \alpha_2 < 1$, the process is LRD with $H = (\alpha_2 + 1)/2$. On the other hand, the
case $\alpha_2 > 1$ corresponds to a non stationary (asymptotically) self-similar process with
$H = (\alpha_2 - 1)/2$. In any case care must be taken in the interpretation of the logscale diagram.
One has to check in particular that potentially LRD data are stationary for a value
1/2 < H < 1 to be valid.
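As a short worked example of the two conversions, take the fGn curve of figure 2.2, whose caption quotes α = 0.6, and a second, purely hypothetical slope of 1.6:

```latex
\alpha_2 = 0.6 \in (0,1): \quad H = \frac{\alpha_2 + 1}{2} = \frac{1.6}{2} = 0.8 \quad \text{(LRD)},
\qquad
\alpha_2 = 1.6 > 1: \quad H = \frac{\alpha_2 - 1}{2} = \frac{0.6}{2} = 0.3 \quad \text{(non stationary, self-similar)} .
```

The first value agrees with the H = 0.8 quoted in the caption of figure 2.2.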
Scaling of higher order moments can be studied in a similar fashion by plotting $\log_2(S_q(j))$
against the scale j in a qth order Logscale Diagram:
q-LD : $\log_2(S_q(j))$ vs j. (2.72)
For q fixed, a behaviour $\mathbb{E}|d_X(j,\cdot)|^q = c\,2^{\alpha_q j}$ over some scale range is seen as a straight line
in the q-LD, and a measurement of its slope is an estimate of the corresponding q-specific
scaling exponent $\alpha_q$.
The question of alignment of points is intrinsically related to the confidence intervals put
on the estimation of $S_q(j)$ at different scales j. By definition of the wavelet coefficients, the
number of coefficients at scale j is twice what it is at scale j + 1. This means that the values
of $S_q(j)$ for small j are estimated with a greater confidence than at large j. The estimation
of the slope is made by a weighted linear regression of $\log_2(S_q(j))$ on j. More details on, and an
exact formulation of, the scaling exponent estimation can be found for instance in [6].
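The weighted regression can be sketched as follows. This is a simplified illustration, not the estimator of [6]: the weights are assumed proportional to the number of coefficients $n_j$ at each octave, as a stand-in for the inverse variance of $\log_2 S_q(j)$, and the function name `weighted_slope` and the synthetic input are hypothetical.

```python
import numpy as np

def weighted_slope(j, log2_S, n_j):
    """Weighted least-squares slope and intercept of log2 S_q(j) against j.

    Assumption: weights proportional to n_j, the number of coefficients at
    octave j, used here as a proxy for the inverse variance of log2 S_q(j).
    """
    j = np.asarray(j, dtype=float)
    y = np.asarray(log2_S, dtype=float)
    w = np.asarray(n_j, dtype=float)
    w = w / w.sum()
    j_bar, y_bar = np.sum(w * j), np.sum(w * y)
    slope = np.sum(w * (j - j_bar) * (y - y_bar)) / np.sum(w * (j - j_bar) ** 2)
    return slope, y_bar - slope * j_bar

octaves = np.arange(3, 11)
n_j = 2 ** (16 - octaves)            # coefficient counts halve at each octave
log2_S = 0.6 * octaves + 1.0         # synthetic, exactly linear with slope 0.6
alpha, intercept = weighted_slope(octaves, log2_S, n_j)
H = (alpha + 1) / 2                  # LRD reading of the slope, valid for 0 < alpha < 1
print(alpha, intercept, H)
```

With real data the small octaves dominate the fit because they carry the largest weights, which is why the choice of the regression range matters.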
Another practical problem when estimating the scaling behaviour of a process is to
know which values of q to choose. By definition, Long Range Dependence is a second
order property, so looking at q = 2 only is enough to detect it. On the other hand, when
looking at the scaling for small values of j, one has to look over a range of q values. In-
deed, recall from property P3 that for a HSS process with stationary increments the local
slope αq relates to the order q via αq = qH − q/2, whereas in the multifractal case one
would have αq = H(q) − q/2. Therefore, if the plot H(q) = αq + q/2 against q is a straight
line, the process is HSS whereas any departure from a straight line indicates a multifractal
behaviour, where a single exponent is insufficient to describe the scaling behaviour. Follow-
ing [6], rather than plotting H(q) against q and looking for linearity, which can be delicate
in marginal cases, we plot hq ≡ H(q)/q against q and check for horizontal alignment. This
plot:
LMD : hq vs q, (2.73)
we call the Linear Multiscale Diagram. Using this approach also has the advantage that the
confidence intervals are approximately of the same size, making it easier to assess align-
ment.
2.4.4 Making sense at small scales
The analysis at small scales is considerably more difficult than at large scales. We address
three relevant issues which are typically ignored.
(1) Confidence intervals often receive little attention, or are based strongly on Gaussian
assumptions. Since at small time scales TCP/IP data is highly non-Gaussian, we use a
non parametric technique based on general wavelet properties to estimate them more
directly from data.
(2) The O(n) algorithm which calculates the $d_X(j,k)$ requires initialisation. For real
data, however, this step is typically either omitted, or only samples X(kτ) are available,
resulting in initialisation errors which are very significant for j = 1, 2. This is important as
3/4 of the data is concentrated at these scales! However, in the case of a point process,
such as X(t), one can do an exact initialisation which alleviates this problem.
(3) For intrinsically discrete data, such as packet inter-arrival times, standard wavelet
analysis does not apply and we use the special initialisation step of [174], without
which, again, significant errors are made for j = 1, 2.
As a guide to interpretation, in figure 2.2 Logscale Diagrams are given of two continuous
time and one discrete time process. In the continuous cases the base resolution,
j = 0, was set to τ = 1/4 as an example. The horizontal axis is calibrated both in octave
j and time $t = \tau 2^j$. The lower curve is for a Poisson process with λ = 1, viewed
as a continuous time process with delta functions at each arrival point, with spectrum
$\Gamma(\nu) = \lambda^2\delta(\nu) + \lambda$. Equation (2.69) predicts $\mathbb{E}|d_X(j,k)|^2 = \lambda$, a flat wavelet spectrum
corresponding to trivial scaling (α = 0), which agrees with the estimate in the figure, as
$\log_2(S_2(j)) = \log_2(\mathrm{variance}(j)) = \log_2\lambda = 0$. It is important to understand that this level
corresponds to variance and not to rate. Means are eliminated by the wavelet analysis, and
multiplication of X(t) by a constant a translates as a level shift in the LD of $\log_2(a)$. A Poisson
process is a simple model of flow or packet arrivals, however real inter-arrival times are
not necessarily exponential. The middle curve shows a point process with i.i.d. gamma
distributed inter-arrivals with shape parameter c = 1/4, also with λ = 1. The spectrum is
no longer flat at small scales, but it is asymptotically flat at a level of $\log_2(\lambda/c) = 2$ which
reflects the higher variance $1/(c\lambda^2)$ of the inter-arrivals. An approximate onset scale for this
trivial scaling at large scale is $\log_2(16/(\lambda\tau)) = 6$. Note the apparent scaling at small scales
with α > 0. The third plot is the familiar near-linear graph of fractional Gaussian noise
(fGn), a discrete time series with an early onset of LRD at j = 3. Note how the confidence
intervals are smallest at small scale.
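The flat Poisson level quoted above follows from equation (2.69) in two steps. The derivation below assumes a mother wavelet of unit energy (consistent with the $2^{-j/2}$ normalisation of the wavelet family) with at least one vanishing moment, so that Ψ(0) = 0:

```latex
\mathbb{E}|d_X(j,k)|^2
  = \int \bigl(\lambda^2\delta(\nu) + \lambda\bigr)\,2^j\,|\Psi(2^j\nu)|^2\,d\nu
  = \lambda^2\,2^j\,|\Psi(0)|^2 + \lambda\int |\Psi(u)|^2\,du
  = 0 + \lambda ,
```

using the change of variable $u = 2^j\nu$ in the second term. With the values used for figure 2.2 (λ = 1, c = 1/4, τ = 1/4), the three levels quoted in the text are $\log_2\lambda = 0$, $\log_2(\lambda/c) = \log_2 4 = 2$ and $\log_2(16/(\lambda\tau)) = \log_2 64 = 6$.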
2.5 Conclusion
In this chapter we have presented the main mathematical tools that will be used in the rest
of this thesis. The fundamental concepts of LRD point processes and Logscale Diagrams
were detailed. Although a thorough understanding of all these notions is not required to
understand the following chapters, this mathematical background is given as a reference to
which the reader can return when needed.
Chapter 3
Empirical observations and
semi-experiments
3.1 Introduction
In this chapter we first describe the empirical measurements used throughout this thesis, and
give some general statistics about Internet traffic. We then present a very thorough analysis
of the origins of these traffic statistics for both flow and packet arrival processes. The insight
we get about Internet traffic is of fundamental importance for the rest of the thesis, and will
directly inspire the choice of traffic model we make in chapter 4.
At the packet level, our approach is based on the idea of ‘shuffling’, the random re-
ordering of blocks of a time series, first proposed in [56]. This is a way of modifying
the correlations of the data whilst preserving the original structure within blocks. We ex-
tend this idea and selectively modify several of the components comprising the full packet
stream. We call this way of virtually investigating ‘what if’ scenarios the semi-experimental
method, and we employ it extensively as a tool to track down the connections and origins
of scaling behaviour. It can also be used to selectively test models for portions of the traffic
structure, without having to postulate a full model from the outset, a difficult task for such
complex data. For example, details of the arrival process of flows can be altered while pre-
serving in full the packet patterns within each flow, and the resulting effect on the scaling
structure noted.
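The basic block shuffling idea of [56] can be sketched as below. This is a minimal illustration on a generic time series, assuming NumPy; the function name `shuffle_blocks` and the block length are hypothetical, and the code does not reproduce the more selective semi-experiments developed in this chapter.

```python
import numpy as np

def shuffle_blocks(series, block_len, rng):
    """Randomly permute consecutive blocks of length block_len.

    Samples inside a block keep their order, so correlations at lags shorter
    than the block survive while those across blocks are destroyed.
    A trailing partial block is left in place at the end.
    """
    series = np.asarray(series)
    n_blocks = series.size // block_len
    head = series[: n_blocks * block_len].reshape(n_blocks, block_len)
    order = rng.permutation(n_blocks)
    return np.concatenate([head[order].ravel(), series[n_blocks * block_len:]])

rng = np.random.default_rng(2)
x = np.arange(20)                     # stand-in for a count series
y = shuffle_blocks(x, block_len=4, rng=rng)
print(y)
```

Comparing a Logscale Diagram before and after shuffling then shows which range of scales depends on correlations that extend beyond the block length.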
At the flow level, our starting point is the somewhat surprising observation that the
scaling seen at the IP level, for instance in packet counts, is roughly similar to that found in the
arrival process of TCP flows: namely, clear LRD at large scales, a second, though less
clear, scaling regime at small scales, and a transition scale at around 1 second separating
them. This is surprising in that the prevailing view on the origins of LRD at the IP level, i.e.
heavy tailed file sizes [179], cannot explain LRD in the flow arrival process. This similarity
immediately raises the question of the link between the two. Are the twin scaling regimes
at the IP level, or aspects of it, due to or influenced by the corresponding features at the flow
level, or are they both the result of some common mechanism, or even two independent
mechanisms?
Answers to such questions will tell us if the fractal structure of flow arrivals is important
to model accurately or not. This is important for hierarchical traffic models where an arrival
process of sessions, and then flows, forms the backbone of the final packet level model.
Although the IP level is of great importance for router throughput, another motivation for
pursuing an understanding of the flow level arrival processes is the direct role they play for
flow level performance, for example processor load in web servers and proxies.
We first introduce the data in section 3.2 and then present some statistics on the flow
arrival process in section 3.3. We then apply the semi-experimental method on the packet
arrival process in section 3.4 and report our findings. We analyze the link between packet
and flow arrival processes in more detail in section 3.5, and present our conclusions in
section 3.6.
3.2 The data and data processing

3.2.1 Passive measurements
Packet collection tools have been developed since the inception of packet switched networks
as a means to debug protocol stacks or network interfaces. There are in fact many ways of
collecting packets on a link, based either on a software or hardware solution, for both online
and offline analysis. For instance one can use hardware equipment such as a line tester or
a protocol analyser to generate real time counts of link layer faults or packet arrivals. One
can also use software tools such as tcpdump to investigate IP packets on a LAN. These
measurement techniques are non intrusive, in the sense that they do not modify the traffic,
and are often referred to as passive measurements. They differ from active measurement
techniques where artificial traffic is injected in the network, for instance to estimate link
bandwidth.
In this thesis, we focus solely on passive measurements, and use packet traces collected
with high precision hardware known as DAG cards [44]. These cards are plugged into the
Peripheral Component Interconnect (PCI) bus of a standard Personal Computer (PC) run-
ning the Linux operating system. They provide loss-less measurements of the link with
GPS synchronized timestamps [125]. The resulting traces gather the timestamp, the phys-
ical layer header and the first 40 bytes of the physical layer payload, which is sufficient in
most cases to extract IP and TCP header information. For privacy reasons, packet addresses
on publicly available traces are systematically anonymized, and TCP and UDP payloads
removed. Specifications of IP, TCP and UDP headers are provided for reference in the
appendix.
We do not give further details on hardware considerations here since we did not collect
any of the traces presented in this section and used in chapters 3, 4 and 5. Further technical
details on DAG cards and physical layer overhead will be given in section 6.2.1, when we
present the data used in chapters 6 and 7, for which we were slightly more involved in the
collection.
The Internet traces we analyze have a range of link speeds and geographic locations.
We mainly study traces recorded by the WAND group at the University of Waikato in New
Zealand. These traces, the Auckland II and Auckland IV data sets, were collected on the
Internet access link of the University of Auckland, and are freely available on the web [177].
In fact we analyze subsets of these datasets, details of which are summarized in table 3.1.
We focus on two three hour periods during week days, 2:00 to 5:00 and 13:00 to 16:00
local time, corresponding to apparently stationary traffic rate for a ‘low’ and ‘high’ activity
period respectively.
We also study traces recorded by the Distributed Real Time Systems group [47] at the
University of North Carolina (UNC-a0 and UNC-a1), from the NLANR repository [130]
(NLANR-SDC and NLANR-TXS), from the Cooperative Association for Internet Data
Analysis [39] (CAIDA-b1) and from the Abilene Internet II [2]. These traces are used
to make sanity checks on our main results as they are from different geographical regions
and have different bit rates. The last three traces included in table 3.1 are from a small In-
ternet provider based in Melbourne, renamed MelbISP, and provide diversity in the packet
rate within individual flows, owing to the speed limitations of modems.
The raw traces are processed with the freely available CAIDA Coralreef tool suite [40]
and C programs, allowing the extraction of each IP packet header together with an accurate
timestamp. We first give a brief overview of simple traffic statistics obtained from IP, TCP
and UDP headers in section 3.2.2, before presenting the concept of IP flow in section 3.2.3
and the central observations of our empirical work in section 3.2.4.
3.2.2 First observations
For most of the traces studied, TCP represents 90% of the packets and up to 97% of the bytes
carried on the link. This shows that TCP is by far the dominant transport protocol on the
Internet. Moreover, when the measurements were taken, World Wide Web (WWW) traffic,
defined here as traffic on port 80 as well as port 8080 and other web proxies,
represented roughly 70% of all the TCP packets. These numbers might fluctuate, depending
on the ‘killer’ application at the time of measurements. For instance, one would expect that
peer-to-peer traffic would, as of 2004, represent a large portion of the TCP traffic, whereas
it was not very significant in 2001. Finally, the proportion of UDP packets might also be
different in 2004 traffic due to the advent of streaming media and voice over IP technology.

Trace        Date         Time (local time)   Rate (Mbps)   Link
AUCK-a0      1999/12/01   13:00 to 16:00      1.4           OC3 (155 Mbps)
AUCK-b0      2001/03/30   13:00 to 16:00      3.5           OC3
AUCK-c0      2001/04/02   02:00 to 05:00      0.3           OC3
AUCK-c1      2001/04/02   02:00 to 05:00      0.5           OC3
AUCK-d0      2001/04/02   13:00 to 16:00      3.6           OC3
AUCK-d1      2001/04/02   13:00 to 16:00      2.4           OC3
UNC-a0       2000/09/27   19:30 to 20:30      179.8         OC12 (622 Mbps)
UNC-a1       2000/09/27   19:30 to 20:30      44.8          OC12
NLANR-SDC    1998/11/26   90s peak period     11.0          OC3c (155 Mbps)
NLANR-TXS    2002/01/10   90s peak period     22.5          OC3c
CAIDA-b1     2002/08/14   10:00 to 10:10      638           OC48 (2.5 Gbps)
Abilene      2002/08/14   10:00 to 10:10      418           OC48c
MelbISP-1    2000/04/25   19:00 to 22:00      0.03          Unknown
Table 3.1: Description of the traces: name, date of the recordings, time of the day analyzed,
utilization, link speed.
Another interesting observation concerns the IP packet sizes. Figure 3.1 shows the
packet size distribution for the UNC-a1 trace. The striking feature is that there are virtually
only three packet sizes on the link: 40 bytes, which corresponds to the minimum IP packet
size and is often an acknowledgment packet sent by the TCP receiver; a size around
600 bytes; and 1500 bytes, the maximum IP packet size for Ethernet traffic. Packet
size distributions for other traces are very similar. This simple empirical observation shows
that the usual assumption made in the field of queuing theory, where one often takes an
exponential distribution to describe packet sizes, has no empirical backing.
3.2.3 IP flow decomposition
The information contained in IP, TCP and UDP headers allows IP packets to be categorized
into different flows, a notion central to our work already introduced in the previous chapter
and that we now develop further.
[Figure 3.1: Packet size distribution for UNC-a1. Axes: F(p) against packet size p.]
In the research community the generally accepted flow definition is a set of packets
with the same 5-tuple {IP protocol; source IP address; destination IP address; source port;
destination port}, and with a maximum nearest neighbour packet inter-arrival time $T_0$ [35].
On the other hand, IP flows are defined slightly differently in a router where a flow can
be terminated due to: (i) timeout, but also (ii) protocol (FIN or RST packet sent by TCP)
or (iii) memory management (the flow is terminated by the router software exporting flow
statistics in order to free resources for new flows). Another definition worth mentioning is
found in [157] where an adaptive timeout based on flow characteristics is used. In the rest
of the thesis we adopt the first definition, i.e. 5-tuple with static timeout. The actual value
of the timeout $T_0$ will be taken to be 64 seconds [40] for all the traces. A discussion on the
value of $T_0$ will be provided in chapter 5.
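The adopted definition (5-tuple with static timeout) can be sketched as follows. This is only an illustration of the rule, not the Coralreef-based processing used for the traces: the `Packet` record, the function name `split_into_flows` and the sample packets are hypothetical, and packets are assumed to be supplied in arrival order.

```python
from collections import namedtuple

# Hypothetical minimal packet record: timestamp plus the 5-tuple fields.
Packet = namedtuple("Packet", "time proto src dst sport dport")

def split_into_flows(packets, timeout=64.0):
    """Group time-ordered packets into flows: same 5-tuple, gaps <= timeout."""
    open_flows = {}   # 5-tuple -> list of timestamps of the flow currently open
    flows = []        # every flow, in order of its first packet
    for p in packets:
        key = (p.proto, p.src, p.dst, p.sport, p.dport)
        current = open_flows.get(key)
        if current is None or p.time - current[-1] > timeout:
            current = []               # start a new flow for this 5-tuple
            open_flows[key] = current
            flows.append(current)
        current.append(p.time)
    return flows

packets = [
    Packet(0.0, 6, "10.0.0.1", "10.0.0.2", 1234, 80),
    Packet(0.5, 6, "10.0.0.1", "10.0.0.2", 1234, 80),
    Packet(1.0, 17, "10.0.0.3", "10.0.0.2", 5000, 53),
    Packet(70.0, 6, "10.0.0.1", "10.0.0.2", 1234, 80),  # gap of 69.5 s > 64 s: new flow
]
flows = split_into_flows(packets)
arrivals = [f[0] for f in flows]            # flow arrival times
sizes = [len(f) for f in flows]             # number of packets per flow
durations = [f[-1] - f[0] for f in flows]   # flow durations (meaningful only for sizes > 1)
print(arrivals, sizes, durations)
```

A production version would also expire idle entries from `open_flows` to bound memory, which is the concern behind criterion (iii) of the router definition.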
This classification only uses IP level information and port information common to TCP
and UDP. It therefore gives a general framework to compare TCP and UDP flows. In the
case of TCP, it was found that the above definition gave a very similar classification to
that provided by tracking TCP connections by monitoring SYN, SYN-ACK, FIN and RST
packets, with the additional advantage of keeping track of late packets transmitted after
connection closure. This technique also captures the many connections that do not terminate
correctly, representing less than 1% of all connections.
Considerable computation is required to perform the packet and flow level analyses. The
UNC-a0 trace, for example, consists of 2 Gigabytes compressed and contains 800,000 flows and
77 million packets, all individually tracked. To run our C and Matlab programs, we used
a dedicated file server delivering compressed data off a RAID over Gigabit Ethernet to a
2.4 GHz workstation running Linux with 2 Gigabytes of fast memory.
From the raw data many different time series can be constructed. At the packet level,
where flows are not individually tracked, the key quantity is the set of arrival times $t_P(k)$ of
packets indexed in arrival order: k = 1, 2, ..., K. This time series defines the continuous time
point process $X(t) = \sum_k \delta(t - t_P(k))$ of packet arrivals we wish to model, or equivalently the
inter-arrival sequence $A(k) = t_P(k) - t_P(k-1)$. At the flow level, statistics of individual
flows are collected. In addition to the set of arrival times $t_F(i)$ of flows defining the flow
arrival process $Y(t) = \sum_i \delta(t - t_F(i))$, the intrinsically discrete series P(i) and D(i), i = 1,
2, ..., I, give the number of packets and durations in seconds respectively of successive flows
(D(i) is only defined if P(i) > 1). We also located and stored, for each flow, a complete list
of packet inter-arrival times, which requires extensive computation and storage space.
[Figure 3.2: Flow decomposition of the packet arrival process X(t): the bottom axis shows
the arrival times of packets. Packets with the same 5 tuple have the same color
and are grouped together to form an IP flow. The flow arrival process Y(t)
illustrated on the top axis is formed by the first packet of each flow.]
Figure 3.2 illustrates the decomposition of Internet traffic into flows. The bottom axis
represents the arrival times of IP packets on a link, i.e. the packet arrival process X(t).
Once the flow analysis is done, packets with the same five tuple are grouped together as
represented by the coloured rectangles. The flow arrival process Y (t) is constructed by
taking the arrival time of the first packet of each flow.
3.2.4 Central observations: biscaling and heavy tails

Scaling behaviour
We first illustrate with empirical data the scaling phenomenon introduced in chapter 2. In
figure 3.3(a), the packet count for trace AUCK-d0 is plotted for different levels of aggregation,
as defined in equation (2.9). An aggregated Poisson process with same arrival rate
as the data is plotted for comparison in figure 3.3(b). The striking observation is that the
Poisson process becomes very smooth as the aggregation level increases, while the empirical
data shows more variations. Figure 3.3(c) shows the variance of the packet count $X^{(m)}$
in bins of size m as a function of the aggregation level m on a log log plot. For large time
scales, the variance decays as a power law:
$$\mathrm{Var}(X^{(m)}) = O(m^{-\beta}), \quad \beta \simeq 0.2. \qquad (3.1)$$
This is consistent with a long range dependent phenomenon with Hurst parameter which
can be roughly estimated as H = 0.8. The variance of $X^{(m)}$ is said to decay ‘slowly’ by
comparison with the much faster $m^{-1}$ decay of the variance of the corresponding Poisson process.
Although this aggregation technique gives a clear illustration of the LRD phenomenon,
and shows convincingly that the variance of the packet counts decays ‘slowly’ for large m,
it does not provide any confidence intervals on the estimation of H. Estimating the Hurst
parameter of a time series is by itself a research topic. A review of different techniques
can be found for instance in [168]. In the rest of this thesis we use wavelet based estimates
called Logscale Diagrams (LD), which were introduced in section 2.4.3.
[Figure 3.3: Packet arrival rate with aggregation level m (panels for log2(m) = 2, 6, 10, 14,
time in seconds) for (a) measured IP packet arrivals and (b) Poisson process.
(c) Corresponding variance-time plot, log2(Var(X^(m))) against log2(m), with ‘slowly’
decaying variance. (d) Wavelet energy spectrum, log2 Var(d_j) against j = log2(a).
Panels (c) and (d) compare the data with a Poisson process.]
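A variance-time estimate of β can be sketched in a few lines. The example assumes that $X^{(m)}$ denotes the series of averages over blocks of m consecutive bins (equation (2.9) is not reproduced here), and it uses synthetic independent Poisson counts rather than trace data; the function name `variance_time` and all parameter values are illustrative.

```python
import numpy as np

def variance_time(counts, levels):
    """Variance of the block-averaged series X^(m) for each aggregation level m."""
    counts = np.asarray(counts, dtype=float)
    out = []
    for m in levels:
        n = counts.size // m                      # number of complete blocks
        out.append(counts[: n * m].reshape(n, m).mean(axis=1).var())
    return np.array(out)

rng = np.random.default_rng(3)
poisson_counts = rng.poisson(10.0, size=2**18)    # independent counts per bin
levels = 2 ** np.arange(0, 11)
v = variance_time(poisson_counts, levels)

# Least-squares slope of log2 Var(X^(m)) against log2 m gives -beta.
beta = -np.polyfit(np.log2(levels), np.log2(v), 1)[0]
print(beta)                                       # close to 1 for uncorrelated counts
```

For uncorrelated counts the fitted β is close to 1, whereas an LRD series gives a markedly smaller value; as noted in the text, this simple fit comes without confidence intervals.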
Figure 3.3(d) shows the LD for the packet arrival process of trace AUCK-d0. The
horizontal axis is labeled both in logarithmic scales (bottom) and in seconds (top). The
vertical axis gives the log energy at a given time scale. Each vertical line gives the 95%
confidence interval on the variance estimation at the corresponding time scale. Slopes are
estimated by a weighted linear regression, and lead to values of the local scaling parameters
with confidence intervals. For instance, a linear regression over the octaves 3 to 7 leads to an
estimate of the Hurst parameter of H = 0.82, with confidence interval [0.78, 0.86]. The
wavelet spectral density of the Poisson process with same rate as the data is the horizontal
line, in accordance with equations (2.48) and (2.69).
[Figure 3.4: Biscaling in (a) packet arrival process X(t), (b) byte arrival process, and (c)
flow arrival process Y(t) across all traces. Each panel plots log2 Variance(j) against
j = log2(scale), with the top axis calibrated in time.]
The data LD exhibits a ‘biscaling’ behaviour, that is dual scaling regimes separated by
a distinct knee. The founding observation underlying our approach is the prevalence of
this biscaling in all the traces we studied, for both packet and byte counts. This has also
been reported by other researchers [62]. Figures 3.4(a) and (b) show the LDs of packet and
byte counts for most of the traces described in table 3.1. For ease of comparison the plot
ordinates have been normalised. At large scales the LRD is clearly seen in each trace, and
the ‘knees’ in the curves are distinctive and all located in a narrow band at about 1s. At
smaller scales evidence for scaling is also present which, although much noisier, recurs
consistently across traces. The fact that LDs of packet and byte counts have a similar shape
intuitively means that packet sizes have little impact on the correlation of the byte count
process. This explains why in the following we will focus more on the timing of packets
rather than on their size. This biscaling behaviour is also found in the flow arrival process,
as illustrated in figure 3.4(c).
[Figure 3.5: Flows characteristics: (a) Heavy tailed body and tail of P (# packets in flows),
log(Pr[P > k]) against log(k); (b) Heavy tailed flow durations D, log(Pr[D > x]) against
log(x sec). Curves: AUCK, UNC, Abilene, Mel ISP.]
Flow characteristics
We now make use of the information contained in each packet header to do a flow decompo-
sition of the traces, as illustrated in figure 3.2, and study flows in more detail. We start with
general characteristics, such as flow size P and flow duration D, and then illustrate packet
dynamics inside a TCP flow.
Figure 3.5(a) shows the remarkable power-law form of the distribution of P across
traces, and similarly for D in plot (b). This heavy tail behaviour is consistent with the
physical explanations of LRD given in section 1.4.2. In chapter 4 we will discuss the con-
sequences of the fact that P, in addition to a power-law tail that contains only around 1%
(depending on the exact definition of ‘tail’) of the mass, also has a distribution body which
is close to power-law, but with different parameters. In all cases results from the same group
(AUCK, UNC, MelbISP) are very consistent.
Let us now illustrate the packet arrival process within a TCP connection obtained from
measurements. Theoretical details of TCP mechanisms are not presented in this thesis since
they are not of primary importance for our work. A presentation of key concepts, such
as three way handshake, slow start phase or retransmission mechanisms can be found for
instance in [166]. Very briefly, a TCP server sends a certain number of data packets,

Figure 3.6: Packet arrivals in a TCP connection: (a) Close up on TCP mechanism in a long
connection (sequence number against time in s). (b) Packet arrival patterns vary wildly
between connections.

corresponding to a window size, to a TCP client that will then send back acknowledgment
packets to the host. When the acknowledgment has been received, the server sends another
group of data packets. Figure 3.6(a) illustrates a rare ‘textbook’ TCP connection taken from
an Auckland trace. Packets sent by the server are plotted with their respective arrival time
and sequence number, and joined by a solid line. Packets sent by the client are plotted with
their respective arrival time and acknowledgment number, and joined by a dotted line. After
a packet loss detection, indicated by a sudden drop in the sequence number of the packet
being transmitted, the TCP connection goes back to a slow start phase with exponential
increase of its window size, followed by a linear increase of its window size in a second
phase. Figure 3.6(b) shows the packet arrival patterns for various measured TCP connec-
tions. Since sequence numbers are ignored, packet arrival times from the same TCP flow
are represented by vertical marks on a given axis. Time scales are different for different
flows.
The main observation is that any pattern of packet arrivals can be found in ‘real life’ TCP
connections: periodic, successive bursts, large periods of inactivity... Given the different
link speeds, paths, bottlenecks and window sizes, there is no obvious universal pattern of
packet arrivals within a measured TCP connection. This is an important point to keep in
mind when doing traffic modelling. Also, as seen in figure 3.5(a), the notion of ‘infinite’
source often used to model TCP behaviour proves to be a mathematical concept with little
empirical backing. Most TCP connections are in fact short and do not transmit enough
packets to exhibit the ‘textbook’ behaviour illustrated in figure 3.6(a), while a few are very
long. Many factors influence TCP connections, such as the type of application using the
connection, the cross traffic encountered through the network or the bandwidth of the access

Figure 3.7: Analysing the Flow Arrival Process Y(t): Logscale Diagrams, log2 Var( dj ) against
j = log2( a ) (top axis in seconds), for the Auck.II (lower set) and Auck.IV traces. Each has
LRD and a similar knee position j∗.

point. The picture that emerges from the observations made in this section is as follows:
• Both the packet arrival process X(t) and the flow arrival process Y (t) have a
‘biscaling’ structure.
• The number of packets per flow has a heavy tailed distribution.
• The structure of packet arrivals within a TCP flow is highly complex, with no
apparent dominant feature.
In the rest of this chapter, our aim is to get a better understanding of these observed
statistics. Keeping in mind that we want to understand the structure of the packet arrival
process in order to model it and answer question (i), we start with the underlying flow
arrival process Y (t) in section 3.3, and then examine X(t) in section 3.4. The exact impact
of the flow arrival process on the packet arrival process is studied in section 3.5.
3.3 Flow arrival process
In all the traces studied in this thesis, interesting structure for Y is consistently found.
Specifically it has LRD, as shown in figure 3.4(c), and has an onset scale or ‘knee’ where
the LRD begins which is very pronounced. In this section we examine Y more closely, in
particular the position of the knee as a function of network parameters.
Figure 3.7 superimposes LDs of Y across many of the Auckland traces: they are very
similar. The prominent features are the LRD at large scales, a clear knee at a characteristic
scale around 1s (top edge shows seconds), and at small scales evidence for another scaling
regime. The precise value of the LRD exponent varies but is typically around α = 0.6,
and will be discussed further in the next section. This biscaling behaviour is also seen

Figure 3.8: Logscale Diagram, log2 (energy) against log2 (scale), for arrival times, for: All
flows, TCP only, UDP only, and HTTP. The scaling for HTTP is extremely similar to the
global scaling.

consistently through subsets of IP flows. Figure 3.8 compares the LDs of Y and of different
subsets: TCP flow arrivals, HTTP flow arrivals and UDP flow arrivals. The plot for UDP
flows shows different behaviour at large scales, but there is still a transition at roughly the
same scale. The other LDs look almost identical.
The origin of the LRD in Y, in contrast to that in X, is at present unknown, and we do
not attempt to fully explain it here. A key issue is the lack of a visible mechanism which
could lend a rich structure to such a sequence of arrivals. The dynamics of TCP connection
generation in WWW browsing sessions is however an obvious candidate. The advent of
persistent TCP connections and connection pipelining allowed by HTTP version 1.1 [100]
suggests that such dynamics could be in the process of changing. To investigate this, we
plotted together in figure 3.7 the LDs for all AUCK2 traces (lower group), and AUCK4
traces, which were collected approximately one year later at a time when HTTP 1.1 was be-
ing deployed. The knee position is unchanged, however the slope at small scales is smaller
for AUCK4. Unfortunately it was not possible to check via the packet level logs whether
HTTP 1.1, persistent connections, and/or pipelining were being extensively employed (see
however [164] for an interesting method of inferring HTTP details from packet header
measurements). The question of the reason for the change in slope therefore remains open.
We do not attempt
to understand the detailed characterisation of the small scale regime of Y either, and for
simplicity we model it by a trivial flat spectrum, which is very accurate in the case of Auck-
land IV.
Instead, in this section we focus on the onset scale of LRD, which we also call the knee
position j∗ , both for its intrinsic importance as a characteristic scale whose origin is also not
understood, and because it has received little attention in the literature. Our aim is to find
networking parameters that will in some way be responsible for this onset scale. We start
our analysis by presenting a simple algorithm to determine the onset scale, before using it
to detect knee movements as a function of networking parameters.
3.3.1 Knee tracking algorithm
On most of the empirical time series studied, we found LDs consisting of two asymptotic
straight lines, separated by a knee. To detect this knee robustly and automatically, we
designed an algorithm based on detecting a consistent departure from a straight line fitted
over the smallest scales.
Since the data is noisy, the local slope is estimated by a 3 point moving average. A
threshold is preset to determine if the slope difference is large enough to be the start of a
new slope. To check that the new slope is meaningful, it was required to be close to constant
over three different octaves. Since we use a discrete wavelet transform, we only get a small
number of points on the logscale diagram and the estimated cutoff scale is an integer. One
way around this problem is to use wavelet interleaving to get discrete values at other scales
by changing the sampling period. However, this is computationally intensive. Instead, we
calculate the intersection of the local slopes on each side of the estimated integer cutoff
scale. This allows a spread of the resulting cutoff scale over the real axis without any
supplementary calculations. Examples of cutoff scale detection are provided in figure 3.10
(a) and (b).
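
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the figures: the function names, the parameters `n_base`, `threshold` and `tolerance`, and their default values are all hypothetical, and an ordinary (unweighted) line fit is used for simplicity.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_knee(octaves, log_energy, n_base=4, threshold=0.3, tolerance=0.1):
    """Return a real-valued knee scale, or None if no consistent departure is found."""
    # Reference line fitted over the n_base smallest scales.
    base_slope, base_icpt = fit_line(octaves[:n_base], log_energy[:n_base])

    # Raw local slopes between consecutive octaves, smoothed by a 3 point moving average.
    raw = [(log_energy[i + 1] - log_energy[i]) / (octaves[i + 1] - octaves[i])
           for i in range(len(octaves) - 1)]
    smooth = []
    for i in range(len(raw)):
        window = raw[max(0, i - 1):i + 2]
        smooth.append(sum(window) / len(window))

    for i in range(n_base - 1, len(smooth) - 2):
        trio = smooth[i:i + 3]
        # The slope must depart from the reference by more than the threshold ...
        departs = all(abs(s - base_slope) > threshold for s in trio)
        # ... and stay close to constant over three consecutive octaves.
        stable = max(trio) - min(trio) < tolerance
        if departs and stable:
            # Line of the new regime, fitted from the integer cutoff onwards.
            new_slope, new_icpt = fit_line(octaves[i:i + 4], log_energy[i:i + 4])
            # The intersection of the two lines gives a knee on the real axis.
            return (base_icpt - new_icpt) / (new_slope - base_slope)
    return None


# Synthetic biscaling diagram (assumed data): flat up to octave 8, slope 0.6 beyond.
octaves = list(range(1, 16))
log_energy = [5.0 if j <= 8 else 5.0 + 0.6 * (j - 8) for j in octaves]
knee = find_knee(octaves, log_energy)
```

On noisy data the threshold and tolerance would need tuning, and the function returns `None` when no stable new slope is found, which corresponds to a failed detection.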
3.3.2 Dependence on traffic characteristics
Our main approach is to study subsets of flows according to various criteria, in an attempt
to observe and quantify the parameters affecting j∗ . When looking at different time series,
the behaviours at large and small time scales show significant variation, i.e. different values
of the slopes are found. In contrast, the existence of the change point, the knee, is very
persistent. Indeed, it is difficult to find time series whose knee position varies from others
of the same trace, or indeed of different traces. For example, figure 3.9 shows the logscale

Figure 3.9: Searching for knee variation: subsets based on (a) protocol (SMTP), (b) random
thinning (10% of HTTP flow arrivals). Both panels show log2 (energy) against log2 (scale).

diagrams for two different subsets of an AUCK4 trace. The left plot shows the behaviour
for mail connection, which although different from HTTP, still shows biscaling. The right
plot selects 10% of HTTP flow arrivals randomly, and shows the same behaviour as the full
set displayed in figure 3.8.
As the clearest biscaling was seen in HTTP connections, which is also the dominant
traffic type, we focus on an analysis of HTTP connection arrivals. We include both alpha
and beta traffic, as defined in section 1.4.2, but exclude connections with fewer than three
packets, which constitute a surprisingly high (≈ 20%) proportion of arrivals. We attribute
these to failed connection attempts.
It has been suggested by Feldmann et al. [61] that the knee position at the IP level is
related to the round trip time of TCP connections. However, given the probable role of
HTTP sessions, where groups of connections are launched by a single download action,
and further groups may have to wait for the completion of the first, connection durations
may have a greater influence. Accordingly, in the next subsections we investigate these
dependencies in detail, although we concentrate primarily on the flow arrival level. We also
investigate a related question, of the dependence on (average) packet rate.
Duration dependence
In this section we investigate the relation between knee position in the LD of aggregated
arrival times, and connection duration. Since the distribution of durations is essentially the
same over all the Auckland traces, computing a single knee value for each trace cannot be
expected to reveal much variability. It is therefore necessary to divide the arrival times of each trace into
subgroups according to the durations of connections, and find the knee for each using our
previously described algorithm. By plotting these values against (for instance) the median

Figure 3.10: Tracking the knee position in Y(t): Examples of knee tracking for (a) first
and (b) 7th deci-quantile range of durations (log2(S2(j)) against j). The star marks the
cutoff scale found by the knee tracking algorithm. The knee position is clearly shifted to
larger values with increasing duration. (c) Knee position j∗ as a function of median flow
duration (in seconds) for the subsets, for the AUCK 2, AUCK 4 and NLANR traces. The
dependence is linear, albeit noisy.

duration of the corresponding subgroup, we can look to see what the relationship is.
More precisely, we group flows (in fact TCP flows carrying HTTP) into 10 equal sized
subsets based on percentiles: the shortest 10% and so on up to the longest 10%. Using
the algorithm previously described, we find the knee position for the LD of the flow ar-
rival times in each of the subsets. This operation is illustrated in figures 3.10(a) and (b)
for two of the subsets. The resulting automatically measured knee values are plotted in
figure 3.10(c) against the median flow duration D̄ of the corresponding subset. A clear and
robust dependency based on flow duration D is found:
\[ t^* \equiv 2^{j^*} \simeq 3\bar{D}, \tag{3.2} \]
where t ∗ is the timescale associated to j∗ . The straight line with slope 1 on the logarithmic
scale is equivalent to equation (3.2).
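
The grouping step can be sketched as follows. This is only an illustration: the representation of a flow as an `(arrival_time, duration)` pair, the function names, and the synthetic data are assumptions, and the knee of each subset would still have to be measured on the LD of its arrival times.

```python
import statistics


def duration_subsets(flows, n_subsets=10):
    """Split (arrival_time, duration) pairs into equal sized duration quantile groups.

    Returns one (sorted arrival times, median duration) pair per subset,
    from the shortest flows to the longest.
    """
    ordered = sorted(flows, key=lambda f: f[1])
    size = len(ordered) // n_subsets
    subsets = []
    for q in range(n_subsets):
        # The last subset absorbs any remainder.
        end = (q + 1) * size if q < n_subsets - 1 else len(ordered)
        group = ordered[q * size:end]
        arrivals = sorted(t for t, _ in group)
        median_duration = statistics.median(d for _, d in group)
        subsets.append((arrivals, median_duration))
    return subsets


def predicted_knee_timescale(median_duration):
    """Timescale t* suggested by equation (3.2): roughly three median durations."""
    return 3.0 * median_duration


# Synthetic flows (assumed data): 100 flows with durations 1, 2, ..., 100 seconds.
flows = [(i * 0.5, float((i * 37) % 100 + 1)) for i in range(100)]
subsets = duration_subsets(flows)
```

Each subset's arrival times can then be passed to a logscale diagram estimator, and the measured knee compared with `predicted_knee_timescale` of the subset median.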
We now comment on the way this interesting result was obtained. From figure 3.10(c),
the cutoff scales of the AUCK2 and AUCK4 datasets line up reasonably well for small
durations, while for larger durations the results are a bit more widely spread. The vertical
dotted line on figure 3.10(c) tries to quantify this phenomenon by separating ‘noisy’ from
‘less noisy’ measurement data points. In fact, recall from section 2.4.3 that the confidence
intervals on the estimation of S2 ( j) increase with the scale j, which means that the points
in the logscale diagram are less reliable at large scales. Therefore, any measurement based
solely on those points, such as the local slope used by the detection algorithm, is also bound
to be less accurate. This is why the values of the cutoff scales for large durations are widely
spread. The LD is sometimes so noisy that the detection algorithm fails entirely, as indicated
by the points at the bottom of the plot at scale −5 for large durations. Moreover, the cutoff
values corresponding to durations larger than 25 seconds are the ones obtained for the 10th
quantile durations and therefore include the ‘heavy tail’ of the durations distribution, which
makes their estimation even more problematic. On the other hand, with the same argument
based on confidence intervals, the values of S2 ( j) at small to medium scales are very well
estimated, and the cutoff scale is found with great precision. In addition, in the Auckland
traces the knees obtained for small durations are also the ones that line up the best.
In order to check the sanity of our results, we conducted similar calculations with the
NLANR traces. Due to their relatively short durations (90 seconds compared to 3 hours for
Auckland traces), a slightly different method had to be used to measure the knees. In fact,
we simply changed the number of subsets from 10 to 3 to obtain enough data in each subset.
Another aspect to consider is that the value of S2 ( j) can only be estimated up to scale 12
due to the limited duration. Given the location of the cutoff scale around 9 for the durations
considered here, this means that the results are inherently noisy. It is therefore quite striking
that the cutoff scales obtained gather around the line previously obtained with AUCK2 and
AUCK4 traces.
Round trip time dependence
The motivation to study the influence of the Round Trip Time (RTT) on the knee of the
flow arrivals LDs is based on the hypothesis, evoked in [61], that the cutoff scale in the IP
biscaling is related to the RTT. Using a method similar to that of the previous section, but
changing the connection selection criterion from duration to RTT, we performed
a cutoff scale analysis on the AUCK2 and AUCK4 data sets. The results are shown
in figure 3.11(a) and indicate that there is in fact no obvious relationship. The RTTs were
calculated for all AUCK2 traces and one AUCK4 trace only.
These results seem to contradict the phenomenon described in [61] where the authors
showed using ns [106] that the RTT could influence the cutoff scale of the IP level traffic
(bytes per bin). They performed a simulation on a small network topology consisting of
a single webserver and 420 clients, modelling a small ISP environment. By changing the
delay of the access link, and therefore the RTT, they obtained a different scaling behav-
iour depending on the RTT. More precisely, they obtained a pronounced dip at the scale
corresponding to the RTT. However, the limited complexity of such a network makes its
conclusions difficult to apply given the extreme richness we observe in real traces. This
could explain why figure 3.11(a) does not show the dependency found in [61]. Another rea-
son could be an estimation issue: it is notoriously difficult to estimate RTTs from passive

Figure 3.11: (a) Knee position (cutoff scale) as a function of the Round Trip Time (sec).
(b) Knee position (cutoff scale) as a function of the rate (packets/second). Data from AUCK 2
and AUCK 4. In each figure the rectangle marks the core of the scatter plot (points lying
between the 0.25 and 0.75 quantiles in both dimensions).

measurements since it involves reconstructing the TCP stack at the end host from measure-
ments taken at an unknown point in the network [164]. To alleviate this difficulty, we simply
evaluated the RTT of TCP connections by measuring the time delay between packets in the
three-way handshake during the connection establishment. However, we believe that it is
unlikely that a more sophisticated RTT estimation would lead to significantly different results
because most connections are so short that network conditions can be considered constant
over the connection duration.
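
A handshake based RTT estimate of this kind can be sketched as below. The packet representation (a time ordered list of `(timestamp, flags)` pairs with the flag strings "SYN", "SYN-ACK" and "ACK") is hypothetical, and the choice of measuring from the first SYN to the ACK that completes the handshake is an assumption about which handshake delay is used; it has the convenient property of spanning one full round trip wherever the monitor sits on the path.

```python
def handshake_rtt(packets):
    """Estimate the RTT of one TCP connection from its three-way handshake.

    packets: time ordered list of (timestamp, flags) pairs for the connection.
    Returns the delay between the first SYN and the ACK completing the
    handshake, or None if the handshake is not fully observed.
    """
    t_syn = None
    t_synack = None
    for timestamp, flags in packets:
        if flags == "SYN" and t_syn is None:
            t_syn = timestamp
        elif flags == "SYN-ACK" and t_syn is not None and t_synack is None:
            t_synack = timestamp
        elif flags == "ACK" and t_synack is not None:
            return timestamp - t_syn
    return None


# Hypothetical connection seen at a monitor between client and server.
connection = [(10.000, "SYN"), (10.030, "SYN-ACK"), (10.080, "ACK"), (10.081, "ACK")]
rtt = handshake_rtt(connection)
```

Connections whose handshake is incomplete, for instance failed connection attempts, yield `None` and would simply be left out of the RTT based grouping.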
Rate dependence
The average rate of connections is another measure that we can use to group connections.
Repeating the same procedure, we plotted the cutoff scale as a function of the connection
rates in figure 3.11(b). Again, no clear dependency could be found between connection
rates and cutoff scale.
3.3.3 Reconstruction from subsets
We showed in the previous section a clear dependence of the knee position of Y on the flow
duration. We now analyze this dependency further by looking at how one can reconstruct Y
from the duration subsets.
Figure 3.12 shows the 10 subset LDs, obtained by averaging over the AUCK4 traces,
the sum of those 10 LDs, as well as the original LD of Y. The fact that the data is very close
to the sum of the subsets indicates that the subsets are roughly independent of each other.
This is in itself an interesting result.
Let us justify the averaging done on the LDs. For all the AUCK4 traces, we observed
such regularity in the LDs that we can consider that the arrival times of HTTP connections

Figure 3.12: The LDs of the duration based subsets of Y, and the LD of their superposition
compared with data, averaged over 8 traces. Axes: log2(Var( dj )) against j = log2( a ) (top
axis in seconds). Curves: all, sum quantiles, and the 0.1 to 1.0 quantile subsets.

recorded for each trace are in fact different realizations of the same stationary stochastic
process. It therefore makes sense to average the LDs obtained for each trace to obtain a
less variable LD, plotted with a thick gray line on the figure. Moreover, if we assume that
the separation of arrival times is made with roughly the same quantile durations in each
trace, we can apply the same reasoning to the LDs obtained for a given subset across all
the traces. From figure 3.12, the resulting expected LDs for each quantile duration have
some nice features. First, they all line up at small scales. This is due to the fact that at
small scales the flow arrival process of each subset tends to a Poisson process, and that the
Poisson limit is the same for all subsets given that the subsets have the same number of
points. Second, the LDs all have roughly the same slope at large scales, meaning that the
LRD behaviour is essentially the same for all subgroups. More precisely, the estimation of
the Hurst parameter for each subset gave results in the range [0.68, 0.74], to be compared
with the Hurst parameter of 0.7 for the total average. The only difference is really the knee
position.
From this we learn that the knee in the data can be understood as a smoothed ‘mixture’
of sharper knees corresponding to independent subsets of flows which, in an idealised limit,
would each have constant flow duration. Note that confidence intervals in the estimated
wavelet spectra grow with scale (not shown on figure 3.12 for clarity) and are such that the
differences between the LDs for j > 8 are not significant.
We now formally detail the quantity labeled ‘sum quantiles’ in figure 3.12. Consider the
decomposition of the arrival times $\{t_F(i)\}$ into $N$ subsets $\{t_F(i)\}^{(l)}$, $1 \le l \le N$, according to
a given criterion. The point process of arrival times $Y(t) = \sum_i \delta(t - t_F(i))$ can therefore be
written as
\[ Y(t) = \sum_{l=1}^{N} Y^{(l)}(t) = \sum_{l=1}^{N} \sum_{i} \delta\left(t - t_F(i)^{(l)}\right). \tag{3.3} \]
Since the discrete wavelet transform is a linear operator, the wavelet coefficients $d_Y(j,k)$
can be written as
\[ d_Y(j,k) = \sum_{l=1}^{N} d_Y^{(l)}(j,k), \tag{3.4} \]
where $d_Y^{(l)}(j,k)$ is the wavelet coefficient at scale $j$ and time $k$ of the time series $Y^{(l)}(t)$.
Recall that by definition we have
\[ S_2(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} |d_Y(j,k)|^2. \tag{3.5} \]
Therefore
\[ S_2(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} \left| \sum_{l=1}^{N} d_Y^{(l)}(j,k) \right|^2. \tag{3.6} \]
The sum of the subset LDs corresponds to a zeroth order approximation of $S_2(j)$ defined as
\[ S_2^{(0)}(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} \sum_{l=1}^{N} \left| d_Y^{(l)}(j,k) \right|^2. \tag{3.7} \]
From further empirical studies of the correlation of arrival times between different subsets,
we found that subsets were mostly independent, with the strongest correlation found
between adjacent subsets. Based on these considerations, we propose a first order approximation
$S_2^{(1)}(j)$ of $S_2(j)$ defined as
\[ S_2^{(1)}(j) = \frac{1}{n_j} \sum_{k=1}^{n_j} \Bigg[ d_Y^{(1)}(j,k)^2 + d_Y^{(1)}(j,k)\, d_Y^{(2)}(j,k) + \sum_{l=2}^{N-1} \sum_{m=l-1}^{l+1} d_Y^{(l)}(j,k)\, d_Y^{(m)}(j,k) + d_Y^{(N)}(j,k)^2 + d_Y^{(N-1)}(j,k)\, d_Y^{(N)}(j,k) \Bigg]. \tag{3.8} \]
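
To make the difference between equations (3.6), (3.7) and (3.8) concrete, the following minimal Python sketch evaluates the three quantities at a single scale j. The data layout (`d[l][k]` holding the real valued coefficient of subset l at time index k) and the function names are assumptions made for illustration; computing the wavelet coefficients themselves is not shown.

```python
def s2_exact(d):
    """Equation (3.6): energy of the summed coefficients, all cross terms included."""
    n = len(d[0])
    return sum(sum(d[l][k] for l in range(len(d))) ** 2 for k in range(n)) / n


def s2_zeroth(d):
    """Equation (3.7): sum of the subset energies, all cross terms dropped."""
    n = len(d[0])
    return sum(d[l][k] ** 2 for l in range(len(d)) for k in range(n)) / n


def s2_first(d):
    """Equation (3.8): keep only the products between identical or adjacent subsets."""
    n_subsets, n = len(d), len(d[0])
    total = 0.0
    for k in range(n):
        for l in range(n_subsets):
            for m in range(max(0, l - 1), min(n_subsets, l + 2)):
                total += d[l][k] * d[m][k]
    return total / n


# Hypothetical coefficients for N = 3 subsets and n_j = 3 time indices.
d = [[1.0, -2.0, 0.5],
     [0.5, 1.0, -1.0],
     [2.0, 0.0, 1.0]]
exact, zeroth, first = s2_exact(d), s2_zeroth(d), s2_first(d)
```

The first order approximation differs from the exact value only by the cross terms between non-adjacent subsets, so for N = 2 the two coincide.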

3.3.4 Summary
In this section we have found that the flow arrival process Y (t) is LRD and that the onset
scale at which LRD ‘begins’ is linked to the flow durations through equation (3.2). A
physical explanation for this phenomenon is the topic of current research. We found no
obvious relationship with any other physical parameter. We also showed that the flow arrival
process Y (t) has a complex structure which can be decomposed as the sum of elementary
subsets based on flow durations. However, the structure of these elementary subsets has not
been explained yet. While the analysis of Y (t) is interesting in its own right, we now turn to
the analysis of the packet arrival process X(t) in section 3.4 since it is what we really seek
to understand to answer question (i).
3.4 Packet arrival process and semi-experiments
In the previous section we transformed Y (t) in selective ways in order to better understand
its internal structure. Here we will use and refine this technique, which we call the semi-
experimental method, to study X(t). It was first proposed in [56], and proves invaluable as a
means to track down the origins of, the connections between, and to selectively test models
of, portions of the traffic structure, without having to postulate a full model from the outset.
Our approach is to begin at the IP level, and progressively modify aspects of it to de-
termine the links to the arrival level and the source(s) of the scaling behaviour. Note that
we are only interested in transformations which have a physical interpretation in terms of
flow arrivals or packet structure within flows. We do not consider ‘black box’ modifications
based solely on bins, such as random shuffling of blocks of a given size as was done in [56].
There is a very large number of manipulations with a ‘physical’ sense one can perform
on the packet arrival process. In this section we restrict ourselves to the following three
categories of manipulations:
A: Flow Arrival manipulation,
P: Packet-in-flow manipulation,
S: Flow Selection manipulation.
We will illustrate the semi-experimental method on these manipulations and draw con-
clusions on the structure of the packet arrival process. We start with some basic manipu-
lations in section 3.4.1, and then present some more advanced ones in section 3.4.2. For
convenience a complete list of all the semi-experiments used in this thesis can be found on
page xxiii.
3.4.1 Basic manipulations
The results presented in this section give the fundamental empirical backing of the modelling
work developed in the next chapter. They are illustrated in figure 3.13, alongside
some schematics corresponding to each manipulation class. Figure 3.13(a) shows the prin-
ciples of the flow decomposition introduced in section 3.2.3 and gives the LD of the original
trace AUCK-c1. A presentation of each manipulation class follows.
Flow arrival manipulation
The results of flow arrival manipulations are described in figure 3.13(b). The arrival process
of flows is modified in three separate ways of increasing severity, whilst maintaining in full
the integrity of the packet arrival patterns within each flow. Specifically:

Figure 3.13: Illustration of semi-experimental manipulations and results for trace AUCK-c1.
(a) Flow decomposition of the original data. (b) [A-Pois]: Flow arrivals follow a Poisson
process with randomized re-assignments. (c) [A-Pois; P-Uni]: [A-Pois] combined with uniform
packet arrivals within flows. (d) [A-Pois; P-Uni; S-Pkt]: [A-Pois; P-Uni] combined with
selection of ‘short’ flows only. Each panel shows a logscale diagram, log2 Variance( j ) against
scale j (top axis in seconds); panel (b) also shows [A-Perm] and [A-Pord].

[A-Perm]: Permute flows around the original arrival points.
[A-Pord]: Retain original flow order, but re-position arrival times according to a
Poisson process with the same rate.
[A-Pois]: Combine the previous two: a Poisson arrival process with randomised flow
re-assignments. In other words, the flow arrival times are replaced by a sample path
of a homogeneous Poisson process (conditional on the observed number of flows),
the flow order is randomly permuted, and the flows themselves are then translated to
the corresponding new arrival times.
Figure 3.13(b) shows that none of these manipulations has any significant effect on the IP
level scaling, even [A-Pois], which completely erases the original flow arrival structure and
inter-flow dependencies. Two important inferences follow from this result:
• The biscaling structure in the arrival process is not responsible for the biscaling
structure at the IP level, and in fact does not influence it at either small or large
scales.
• Dependencies between packet processes across different flows are very weak.
The above inferences have important consequences. The first indicates that, at least in
terms of second order statistics, it is pointless to include properties of the arrival process
beyond the average rate in models of IP level traffic. This is significant as there is considerable
interest in hierarchical modeling approaches where packet level traffic characteristics
are derived beginning from a model of web session arrivals, leading to correlated launching
of TCP connections and so on. The second point indicates strongly that there is no synchro-
nisation (driven by TCP dynamics or anything else) between packet level processes across
flows.
Thus far, in terms of relevance for IP packets, we have an image of traffic as a collection
of entirely independent flows which are laid down in some independent way.
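
The three flow arrival manipulations [A-Perm], [A-Pord] and [A-Pois] described above can be sketched in Python as follows. The representation of a flow as an `(arrival_time, packet_offsets)` pair and the function names are hypothetical. Drawing the Poisson arrival times as sorted uniform variates, conditional on the observed number of flows, follows the description of [A-Pois]; using the same construction for [A-Pord] is an assumption.

```python
import random


def a_perm(flows, rng):
    """[A-Perm]: keep the original arrival points, permute which flow starts at each."""
    arrivals = [t for t, _ in flows]
    bodies = [offsets for _, offsets in flows]
    rng.shuffle(bodies)
    return list(zip(arrivals, bodies))


def poisson_arrivals(n, t_start, t_end, rng):
    """Conditional on n points, a homogeneous Poisson process is n sorted uniforms."""
    return sorted(rng.uniform(t_start, t_end) for _ in range(n))


def a_pord(flows, t_start, t_end, rng):
    """[A-Pord]: keep the flow order, draw new arrival times from a Poisson process."""
    ordered = sorted(flows, key=lambda f: f[0])
    new_times = poisson_arrivals(len(ordered), t_start, t_end, rng)
    return [(t, offsets) for t, (_, offsets) in zip(new_times, ordered)]


def a_pois(flows, t_start, t_end, rng):
    """[A-Pois]: Poisson arrival times combined with a random permutation of flows."""
    return a_perm(a_pord(flows, t_start, t_end, rng), rng)


def packet_times(flows):
    """Rebuild the packet arrival process; in-flow packet offsets are left untouched."""
    return sorted(t + off for t, offsets in flows for off in offsets)


# Hypothetical trace: four flows, packet offsets relative to the flow arrival time.
flows = [(0.0, [0.0, 0.1]), (1.0, [0.0]), (2.5, [0.0, 0.2, 0.9]), (4.0, [0.0, 1.5])]
rng = random.Random(1)
perm = a_perm(flows, rng)
pord = a_pord(flows, 0.0, 5.0, rng)
pois = a_pois(flows, 0.0, 5.0, rng)
```

In all three cases the packet offsets inside each flow are carried along unchanged, which is what maintains the integrity of the in-flow packet arrival patterns.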
Packet-in-flow manipulation
After having ‘randomized’ the flow arrival process, we now show how in-flow packets can
also be ‘randomized’ with the following semi-experiment:
[P-Uni]: In each flow the first and last packet remain unchanged while the others are
uniformly distributed. In other words, if P(i) = 1 for flow i then the sole packet is
simply placed at its surrogate arrival point t'_F(i). If P(i) = 2 then the second point is
placed at t = t'_F(i) + D(i). If P(i) ≥ 3 then the P(i) − 2 internal points are independently
placed according to a uniform distribution over the duration of the flow. In this
way, the flow lengths are left unchanged while the packet dynamics inside flows are
totally randomized.
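
The [P-Uni] definition above translates almost directly into code. The sketch below is illustrative only; the function name and argument names are hypothetical, with `arrival` standing for the surrogate arrival point of the flow.

```python
import random


def p_uni(arrival, duration, n_packets, rng):
    """[P-Uni]: keep the first and last packet, place the others uniformly in between."""
    if n_packets == 1:
        # A single packet sits at the (surrogate) flow arrival point.
        return [arrival]
    # The n_packets - 2 internal packets are independent uniform points over the flow.
    inner = sorted(arrival + rng.uniform(0.0, duration) for _ in range(n_packets - 2))
    # The last packet stays at arrival + duration, so the flow length is unchanged.
    return [arrival] + inner + [arrival + duration]


rng = random.Random(0)
times = p_uni(arrival=12.0, duration=3.0, n_packets=6, rng=rng)
```

A flow with two packets keeps both of them in place, since there are no internal points to randomise.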
As seen in figure 3.13(c), the effect of randomising the packet patterns within flows is clearly
visible, although not overwhelming, and restricted to small scales. It is significant however
that the spectrum has become flat. From figure 2.2 we know that this does not necessarily
indicate that the process has become Poisson at small scales. However, as the level is equal
to the arrival rate, this is the case here. On the other hand the large scale behaviour seems
unaffected. Two tentative conclusions of note emerge from these observations:
• The scaling structure at small scales has its origin in the packet patterns within
flows.
• The LRD structure at large scales is not influenced by the packet level structure
within flows.
Flow selection manipulation
Through exploring the effects of both arrival and packet structure, we have been able to
isolate a source of small scale scaling in IP, however the large scale behaviour has remained
unaffected thus far.
After performing [A-Pois; P-Uni], the only original features of the traffic left, where
the origin of the LRD must lie, are the flow durations D(i) and the flow packet counts P(i).
To narrow down this statistical origin more precisely, we select flow subsets according to
the number of packets per flow. Figure 3.13(d) reports on the following manipulations:
[A-Pois; S-Pkt]: Combining flows with packet volumes below the 70th percentile
with randomised arrival times.
[A-Pois; P-Uni; S-Pkt]: Randomising packet arrivals in flows in addition to [A-Pois;
S-Pkt].
In [A-Pois; S-Pkt] we select only those flows with volume below the 70th percentile. The
result is the removal of the LRD, in keeping with the findings of [179] that show how the
LRD at the IP level can be explained by the heavy tailed distribution of file sizes, as already
mentioned in section 1.4.2.
The main conclusions we can draw from these basic semi-experiments are:
• The LRD in X(t) has origins in the heavy tailed nature of flow durations (a
known result), and does not have a component due to packet processes within
flows.
• When the concern is IP level modeling only, flows can be viewed as arriving as a
Poisson process, with no dependence on other flows.
3.4.2 Advanced manipulations
We now refine the observations made in the previous section by performing more advanced
packet-in-flow and flow selection manipulations.
Packet-in-flow manipulation
Although duration is a natural descriptor of a flow, it is a highly derivative one in that it is
a dependent function of both the traffic source, and the effect of the network. On the other
hand P(i) acts like an independent variable describing the source, and the average rate
R(i) = P(i)/D(i), i ≥ 2, combines source and link characteristics, since the average (and
peak) rate of a flow is conditioned by the bandwidths of links it traversed before reaching
the measurement point. We investigate the role of rate with a new experiment:
P-ConstR: Rescale the packet inter-arrivals within each flow i by a factor s(i) such
that the average flow rates are moved to a common value: R∗ = s(i)R(i), chosen here
to be the median rate.
The result of this manipulation is illustrated in figure 3.14. Despite preserving P(i) as well
as the individuality of packet structures within flows, the impact is notable: the entire large
scale behaviour is translated by a significant amount. In a similar way, one could define
a manipulation [P-ScaledR] where the packet inter-arrivals within flow i are rescaled by
a constant factor s such that the average flow rate of flow i becomes R′(i) = sR(i). The
corresponding LD is a simple time translation of the original LD by −log2(s). These flow
rate manipulations prompt the following comment:
• The packet rate within flows acts as a scale parameter.
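The rate manipulation P-ConstR can be sketched as below. This is a minimal illustration under stated assumptions: flows are sorted lists of timestamps, every flow with two or more packets has a strictly positive duration, and single-packet flows are left alone. The gaps of flow i are multiplied by R(i)/R∗, which is 1/s(i) in the notation R∗ = s(i)R(i) used above. The function names are hypothetical.

```python
import statistics

def flow_rate(flow):
    """Average rate R(i) = P(i) / D(i) of a flow with at least 2 packets."""
    return len(flow) / (flow[-1] - flow[0])

def p_const_r(flows):
    """Hypothetical sketch of P-ConstR: bring every flow to the median rate."""
    target = statistics.median(flow_rate(f) for f in flows if len(f) >= 2)
    rescaled = []
    for flow in flows:
        if len(flow) < 2:
            rescaled.append(list(flow))  # single-packet flows have no gaps
            continue
        stretch = flow_rate(flow) / target  # equals 1/s(i) with R* = s(i) R(i)
        start = flow[0]
        # Flow start time and packet count are preserved; only gaps change.
        rescaled.append([start + stretch * (t - start) for t in flow])
    return rescaled, target

toy = [[0.0, 1.0, 2.0], [5.0, 5.1, 5.2, 5.3], [9.0, 9.5], [20.0]]
rescaled, r_star = p_const_r(toy)
```

Replacing the per-flow factor by one constant applied to all flows would give the [P-ScaledR] variant, where only the time unit within flows changes.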
This suggests that the focus should therefore be on rate rather than duration. One can
then extend the in-flow packet randomization so that D(i) is no longer preserved, but made a
linear function of R(i). A simple way to do this (in an average sense), is to do the following
manipulation
[P-Pois]: Within each flow separately, packet arrival times are replaced by a Poisson
process of the same rate. Flow arrival times, durations and sizes are retained in full.
[Plot: Logscale Diagram, log2 Var(dj) versus j = log2(a). Curves: Original, [A-Pois; P-Uni], [A-Pois; P-Pois], [A-Pois; P-ConstR], [A-Pois; P-Pois; P-ConstR].]
Figure 3.14: Small scales determined by in-flow structure, D can be taken as proportional
to 1/P (Note: [A-Pois; P-Uni] and [A-Pois; P-Pois] are almost indistinguishable).
Flow rate changes translate large scale behaviour.
The two LDs corresponding to [A-Pois; P-Uni] and [A-Pois; P-Pois] are plotted in fig-
ure 3.14 and are almost indistinguishable. This shows that flows for which it would not
be appropriate to slave D(i) to rate (effectively to 1/P(i)), such as those with very large
gaps, have a negligible impact. This is also intuitively in accordance with a result given in
chapter 2 proposition 2.3.2 linking Poisson processes and uniform distributions.
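The closeness of the two in-flow randomisations can be made concrete with a small sketch. It assumes (these are our readings, not definitions quoted from the text) that P-Uni keeps the first and last packet of a flow and places the others uniformly over the duration, and that P-Pois keeps the start time and packet count and draws exponential gaps whose mean equals the original mean gap, so that the duration is preserved only on average. Flows are assumed to have strictly positive duration; all names are hypothetical.

```python
import random

def p_uni(flow, rng):
    """Assumed form of P-Uni: keep first and last packet, place the others
    independently and uniformly over the flow duration."""
    if len(flow) < 3:
        return list(flow)
    start, end = flow[0], flow[-1]
    inner = sorted(rng.uniform(start, end) for _ in range(len(flow) - 2))
    return [start] + inner + [end]

def p_pois(flow, rng):
    """Assumed form of P-Pois: keep the start time and packet count, draw
    exponential gaps whose mean matches the original mean gap."""
    if len(flow) < 2:
        return list(flow)
    mean_gap = (flow[-1] - flow[0]) / (len(flow) - 1)
    times = [flow[0]]
    for _ in range(len(flow) - 1):
        times.append(times[-1] + rng.expovariate(1.0 / mean_gap))
    return times

rng = random.Random(1)
flow = [10.0, 10.2, 10.3, 11.0, 12.0]
uni, pois = p_uni(flow, rng), p_pois(flow, rng)
# Average duration over many P-Pois draws, to compare with the original 2.0 s.
mean_duration = sum(p_pois(flow, rng)[-1] - 10.0 for _ in range(20000)) / 20000
```

Under these assumptions the only difference between the two outputs is that the P-Pois duration fluctuates around D(i) instead of being fixed, which is the sense in which D(i) becomes slaved to the rate.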
Flow selection manipulation
We now report on three new flow selection manipulations:
[S-Thin]: Flow and packet structure is fully retained, flows thinned by rejecting with
probability 0.3.
[A-Pois; S-Dur]: Combining flows with durations below the 70% percentile with
randomised arrival times.
[A-Pois; P-Pois; S-Dur]: Randomising packet arrivals in flows in addition to
[A-Pois; S-Dur].
The LDs resulting from these new manipulations, as well as from some of the previous
semi-experiments, are presented in figure 3.15(a,b,c) for the trace AUCK-c1. Figure 3.15(a)
shows the results of the flow arrival manipulations described in the previous section, while
figure 3.15(b) illustrates the effect of [P-Pois] and [A-Pois; P-Pois]. The fact that these
two manipulations give such similar results simply reinforces the earlier conclusion that the
flow arrival process does not impact on the IP level. The effects of the new flow selection
manipulations presented here can be seen in figure 3.15(c). The random thinning [S-Thin]
[Plots (a), (b), (c): Logscale Diagrams, log2 Variance(j) versus j = log2(scale). (a) Original, A-Perm, A-Pord, A-Pois. (b) Original, P-Pois, [A-Pois; P-Pois]. (c) Original, S-Thin, [A-Pois; S-Dur], [A-Pois; S-Pkt], [A-Pois; P-Pois; S-Dur], [A-Pois; P-Pois; S-Pkt].]
Figure 3.15: Semi-experimental method applied to AUCK-c1 (a,b,c)
leads to a LD with the same shape as the original, with a variance which is approximately
70% of it, consistent with an i.i.d. superposition model, where variances simply add. In
contrast, in [A-Pois; S-Pkt] we select only those flows with number of packets below the
70% percentile. The result is the removal of the LRD, as already shown in figure 3.13. A
similar result is obtained with [A-Pois; S-Dur], when a selection is made based on the 70%
percentile of D. The removal of the LRD in [A-Pois; S-Dur] is a simple consequence of the
observations of [A-Pois; S-Pkt] since we made D(i) a dependent variable.
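The random thinning [S-Thin] and its expected effect on the LD can be sketched as follows. The sketch assumes flows are stored as lists of timestamps and that each flow is rejected independently; the function name is hypothetical. Under the i.i.d. superposition model, keeping a fraction 0.7 of the flows multiplies the variance at every scale by 0.7, which is a uniform vertical shift of log2(0.7) in the LD.

```python
import math
import random

def s_thin(flows, reject_prob=0.3, seed=0):
    """Hypothetical sketch of [S-Thin]: drop each flow independently with
    probability reject_prob; surviving flows are left completely unchanged."""
    rng = random.Random(seed)
    return [f for f in flows if rng.random() >= reject_prob]

toy = [[float(i)] for i in range(10000)]
kept = s_thin(toy)
kept_fraction = len(kept) / len(toy)
# Vertical LD shift predicted when variances simply add (about -0.51).
ld_shift = math.log2(1.0 - 0.3)
```

A shift of about half an octave in variance with no change of shape is what distinguishes thinning from the selection manipulations, which change the shape of the LD.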
Figure 3.15 (d,e,f) shows the same manipulations as figure 3.15(a,b,c) for a higher rate
trace, AUCK-b0. The results are very similar, although we observed two systematic differ-
ences in the outbound Auckland traffic during the peak period of the day: (i) a small flow
arrival dependence at the smallest scales (note the drop on the left in graph (d)), and (ii) a
[Plots (d) to (i): Logscale Diagrams, log2 Variance(j) versus j = log2(scale). (d), (g): Original, A-Perm, A-Pord, A-Pois. (e), (h): Original, P-Pois, [A-Pois; P-Pois]. (f), (i): Original, S-Thin, [A-Pois; S-Dur], [A-Pois; S-Pkt], [A-Pois; P-Pois; S-Dur], [A-Pois; P-Pois; S-Pkt].]
Figure 3.15: (continued) Semi-experimental method applied to AUCK-b0 (d,e,f) and
UNC-a0 (g,h,i).
smaller LRD exponent for flow arrivals (figure 2c). We speculate that this could indicate
some traffic shaping at small scales. The third column in figure 3.15 shows the results of the
same manipulations for trace UNC-a0 which was recorded in a different location and has a
rate 3 orders of magnitude higher than AUCK-c1. The fact that they are again very similar
indicates that the findings presented in this chapter are of wide applicability.
3.4.3 Summary
The main conclusions we can draw are:
• The LRD in X(t) has origins in the heavy tailed nature of flow durations (a
known result), and does not have a component due to packet processes within
flows.
• When the concern is IP level modeling only, flows can be viewed as arriving as a
Poisson process, with no dependence on other flows.
Although further validation from an even wider range of processes is desirable³, we
can tentatively answer the question in the introduction as follows: the fractal scaling at
the IP level does not depend to any significant extent on the TCP arrival process. These
empirical findings, summarised in figure 3.13, constitute the cornerstone of the modelling
work presented in chapter 4. They also have a great impact by themselves since they justify
for the first time a very common assumption of traffic modelling, namely that the flow
arrival process can be modelled by a Poisson process.
3.5 Impact on packet arrival process
In the previous section we examined empirically the impact of the structure of Y on X
using Internet traces from a number of sources, with a focus on the potential dependencies
between their scale invariance properties. Surprisingly, aside from the first order statistic,
the stationary arrival intensity, we found that the influence of Y was negligible.
Because of the divergent growth of low frequency power characteristic of LRD, it will
not necessarily be the case that Y never has an impact on X. In this section, we use the
knowledge on the scaling behaviour of Y gained from section 3.3 to understand its link with
X. More specifically, we use the same methodology as in section 3.4 and develop new semi-
experiments. What we find enables us to clearly explain when and how Y might impact on
X.
3.5.1 Flow volumes manipulation
We start by studying the influence of flow volumes. In each plot in figure 3.16 the upper
grey curve is the LD of X for our chosen trace, whereas the lower dashed grey curve is
for Y. It is natural that the LD for Y lies below that of X. In the LD a uniform drop
of one unit corresponds to halving the variance, and can be understood very roughly as a global
reduction by a factor of 2 in the total number of packets. Although the points of Y have an important
structural significance, at another level Y is simply a subset of X comprising, for Auckland
data, around 6% of its points.
3 Results from semi-experiments on other traces will be presented in chapter 7.
We now introduce a new kind of manipulation to complement the three categories A
(Flow Arrival manipulation), P (Packet-in-flow manipulation) and S (Flow Selection
manipulation) already presented in section 3.4:
T: Flow Truncation manipulation.
This manipulation consists of truncating flows after a number q of packets. If the original
flow has fewer than q packets, it remains unchanged. This is different from the flow selection
semi-experiment since here the flow arrival process Y is preserved.
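A minimal sketch of the truncation manipulation is given below, assuming flows are sorted lists of packet timestamps; the helper names are hypothetical. It also shows the two limiting cases: with no truncation the merged process is X, and with q = 1 only the first packet of each flow survives, which is Y.

```python
def t_pkt(flows, q):
    """Keep at most the first q packets of each flow (q=None: no truncation).
    Every flow keeps its first packet, so the flow arrival process is unchanged."""
    return [list(f) if q is None else f[:q] for f in flows]

def merge(flows):
    """Superpose all flows into a single sorted packet arrival sequence."""
    return sorted(t for f in flows for t in f)

toy = [[0.0, 0.4, 0.9], [0.2], [0.5, 0.6, 0.7, 0.8, 1.5]]
x_proc = merge(t_pkt(toy, None))  # full packet process X
y_proc = merge(t_pkt(toy, 1))     # flow arrival process Y
mid = merge(t_pkt(toy, 2))        # intermediate truncation level
```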
Figure 3.16(a) shows the result of the semi-experiment [A-Pois;T-Pkt], where we have,
in addition to A-Pois, truncated flows after the first q packets, in this case at q = 6, the
60% percentile. The resulting dramatic elimination of the LRD is consistent with what we
observed in the previous section. The considerable drop in level in the LD follows from the
fact that the heavy tailed nature of P results in a very small proportion of flows containing a
notable percentage of total packets.
Thus far the structure of X seems very unproblematic, and the correlation structure
of Y irrelevant to it; however, things are not as simple as they would appear. In the third
experiment, [T-Pkt], the flow arrivals are not altered in any way, but the same packet volume
truncation is made. In apparent contradiction to our previous conclusion, the LRD has
‘returned’ despite the absence of the heavy tail of P. Furthermore, the difference between
[T-Pkt] and [A-Pois;T-Pkt] is dramatic, apparently contradicting our first conclusion that
Y has no influence.
To explain this apparent paradox, the first observation is that since Y is LRD, then
so must be [T-Pkt], as for any truncation level it includes Y as a subset. This LRD was
obscured previously through the ‘noise’ of the dominant LRD generated by the heavy tail
of P. To explore this in more detail, observe that with a truncation level of 100% (q = ∞), the
truncated process [T-Pkt] is simply X, and when q = 1, it is Y . Thus as the truncation level
q drops, the truncated process [T-Pkt] passes from X(t) to Y (t). The evolution toward Y is
particularly easy to see when q is small, and takes an especially simple form at large scale.
There, the packets of a given flow appear co-located compared to the scale of observation, so
that [T-Pkt] is approximately just Y (t) scaled up by some factor, corresponding to a vertical
shift in the LD. This is seen in figure 3.16(a) at scales beyond j = 1.5, corresponding to the
scale of average duration after truncation.
3.5.2 Knee position manipulation
We have seen how Y, although of negligible influence on X over scales up to j = 11 or 1
hour, is present just behind the scenes with a potentially influential LRD. To examine the
[Plots (a), (b), (c): Logscale Diagrams, log2 Var(dj) versus j = log2(a). (a) X(t) Data, Y(t) Data, [A-Pois], [A-Pois; T-Pkt], [T-Pkt]. (b) X(t) Data, Y(t) Data, [A-Clus1], Y1(t), [A-Clus2], Y2(t). (c) X(t) Data, Y(t) Data, [S-Dur1], Y1(t), [S-Dur2], Y2(t).]
Figure 3.16: Semi-experiments: impact of Y(t) on X(t). (a) Manipulating arrivals: [A-Pois],
flow volumes: [T-Pkt], and both: [A-Pois; T-Pkt], (b) Manipulating
the knee j∗ of Y: [A-Clus] - the effect on X is large for small j∗, (c) Looking
at different j∗ using flow subsets: [S-Dur] - the results are weighted by their
‘packet impact’, long durations dominate.
question of when, if ever, this LRD can rise to prominence at the packet level, we consider
the impact on X of the knee movement in Y found in section 3.3.2. In a new type of semi-
experiment, [A-Clus], the original flows are translated (without permutation) to begin at
the points of a LRD Poisson cluster process sample path with matched average intensity.
Poisson cluster processes were introduced in section 2.3.2, and will be used extensively in
chapter 4 to model X. Here they serve simply as a convenient parametric class to model Y
which allows us to easily reproduce, in a black box fashion, a flat spectrum at small scales
and LRD at large scales, with a controllable knee position. A stationary Poisson cluster
process consists of a Poisson process of rate λS defining the locations of ‘seeds’, about
which a group of points are placed according to i.i.d. copies of another process, chosen
here to be a finite Poisson process of rate λA beginning at the seed. In fact 1/λA is a scale
parameter for the process. Increasing λA simply translates the spectrum, and hence the entire
LD, toward smaller scales, a simple way to adjust j∗ .
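A sample path of such a process can be generated with a few lines of Python. This is a sketch under assumptions that go beyond the description above: the number of points per cluster is drawn from a Pareto law truncated to integers (to obtain a heavy tail and hence LRD), the seed itself counts as the first point of its cluster, and points falling beyond the observation horizon are discarded. The function name and the parameter values are hypothetical.

```python
import random

def poisson_cluster(lambda_s, lambda_a, horizon, tail_index=1.5, seed=0):
    """Hypothetical sketch of a Poisson cluster process on [0, horizon).

    Seeds: Poisson process of rate lambda_s.
    Cluster: finite Poisson stream of rate lambda_a starting at the seed,
    with an (assumed) Pareto distributed number of points.
    """
    rng = random.Random(seed)
    seeds, points = [], []
    t = rng.expovariate(lambda_s)
    while t < horizon:
        seeds.append(t)
        size = int(rng.paretovariate(tail_index))  # heavy tailed, always >= 1
        s = t
        points.append(s)
        for _ in range(size - 1):
            s += rng.expovariate(lambda_a)  # mean in-cluster gap is 1/lambda_a
            points.append(s)
        t += rng.expovariate(lambda_s)
    return seeds, sorted(p for p in points if p < horizon)

seeds, points = poisson_cluster(lambda_s=5.0, lambda_a=50.0, horizon=1000.0)
```

Since every in-cluster gap is proportional to 1/lambda_a, changing lambda_a rescales time inside clusters without touching the seed process, which is how the knee position is moved in this kind of experiment.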
Figure 3.16(b) shows two different [A-Clus] experiments in addition to the data. The
Y processes are also plotted to show the very different knee positions chosen for the two
experiments. The knee for Y2 was put at a larger scale than j∗. Not surprisingly, the
corresponding X process, [A-Clus2], shows little change, as the LD for Y2 is below that of Y
and so contains even less energy. In contrast, the knee for Y1 is at a scale which is small
enough so that its LRD in fact does have a significant impact on the overall packet process
[A-Clus1], both in terms of the knee position and the spectrum at scales beyond it.
3.5.3 Flow subsets manipulation
The last observation above illustrates a principle which is in contradiction to the original
[A-Pois] conclusion, that the finer structure of Y plays no role. We therefore also performed
experiments using the Selection of flow subsets method of section 3.4, in order to induce a
change in j∗ without imposing it across all flows in such a uniform manner. Two subsets,
each containing 10% of flows, were selected based on duration ranges designed to give a
wide contrast in j∗ .
To obtain a j∗ value at large scale a subset consisting of the longest 10% of flows was
selected, yielding Y2 as seen in figure 3.16(c). Despite a knee around j = 6.3 for Y, the
reconstructed packet level process [S-Dur2] is very similar to the original X, with a knee
around j = 0.4. This result is in agreement with the corresponding one from figure 3.16(b),
but it also contains an additional important element. In this case Y2 only contains 10% of
flows, yet [S-Dur2] accounts for about half (48%) of the spectrum of X. This is a clear
indication that the tail of P, which strongly influences the flows with the longest durations,
is disproportionately responsible for the form of the LD of X.
To obtain a contrasting Y1 with a j∗ value at small scale we do not select the very shortest
flows, as flows with just a single packet have somewhat different properties which would
complicate the analysis. Instead, a subset totalling 10% of flows is selected by choosing
the shortest flows which have at least 2 packets. In figure 3.16(c) the smaller j∗ = −1 of Y1
translates to an earlier knee in the packet level process [S-Dur1] which looks quite different
from X, again in agreement with the corresponding experiment from figure 3.16(b). A key
difference, however, is that instead of [S-Dur1] being well above [S-Dur2], it is in fact well
below it. The subset corresponding to shorter durations has considerably less energy than
that of the longer durations despite the delayed entry of the former’s LRD.
We can now give a coherent picture explaining the above observations. From sec-
tion 3.3.2 we know that flows of different durations have different knee positions, and from
the experiments of figure 3.16(b) we know that, as a result, for a small enough flow dura-
tion the LRD of the corresponding subset of Y can indeed impact on the spectrum of X.
However, it is essential to consider the impact at the packet level of any given subset of
flows. Although average packet rates within flows vary widely, broadly speaking the flows
with a very large number of packets are naturally also very long. Thus, the subset of X
corresponding to the flow subset with the longest durations contains the strong LRD due
to the heavy tailed packet size distribution, and simultaneously the weakest portion of the
LRD from Y . Conversely, in the case of short durations the LRD from Y is stronger, but the
number of packets corresponding to it is far lower, resulting in a small subset of X with low
energy. The findings here will be complemented by further analysis in chapter 4, where we
will show that the body and the tail of the distribution of P have a strong influence on both
the LRD and the knee position of X, and therefore that the overall behaviour of the wavelet
spectrum is strongly influenced by this ‘packet-level impact’ weighting.
Thus far we have not discussed the role of the comparative values of the LRD exponents
of X and Y . This is because, in the traces we have studied, the exponents for the two are
roughly similar, which leaves the knee position as the key feature to understand. Clearly, if
the exponent for Y were much greater than that of X, then its impact would always show up
for sufficiently large scale, and in practice would make itself felt more often and at a smaller
scale.
3.5.4 Summary
In this section we showed that the flow arrival process Y could impact on the second order
properties of the overall packet process X should certain circumstances be met. Flows of
small duration have onset scales at small enough scales to allow their LRD to impact the
spectrum of (the corresponding subset of) X despite the fact that the packets marking the
beginning of flows constitute only a small proportion of total packets. We were able to
explain why the LRD of Y has little impact despite this observation, by showing that the
heavy tailed nature of the number of packets in flows means that the spectrum of X is very
heavily weighted towards the flows with the most packets, which also have the longest
durations. These flows have the longest onset scales for Y , and so the impact of the LRD of
Y is the weakest precisely for the most important flows. The current balance between the
two sources of LRD, and their impact, could change if flows of smaller duration increased
in importance in terms of their proportion of overall packets.
3.6 Conclusion
In this chapter, we presented the first set of traffic measurements used in this thesis and
we carried out a detailed analysis of the physical causes of the observed statistics. These
results constitute the cornerstone of the thesis since they provide the starting point of our
modelling work. We studied both the flow arrival process Y and the packet arrival process
X. Using mixtures of real data and models we call ‘semi-experiments’, we showed using
a second order wavelet analysis that in current traces the process Y does not impact on the
second order properties of the overall packet process X. One can in fact replace Y by a Pois-
son process and consider that flows are independent as far as the study of X is concerned.
This fact has important implications for traffic modelling and performance analysis. In par-
ticular, these empirical findings contradict modelling approaches which postulate the need
for ‘session level’ structure linking flows, at least for the lightly loaded links studied here.
Chapter 4
Cluster processes, a natural language
for network traffic
4.1 Introduction
In this chapter we propose the use of a particular class of point processes, Poisson cluster
models, to model the IP packet arrival process X. These models are relatively simple, yet
strongly motivated by empirical features of traffic, in particular the role of flows, and their
tractability allows the quantitative investigation of key properties as a function of meaning-
ful network parameters. They are also easily synthesized, and have marginals which are
intrinsically positive. Through these models we are able to give strong answers to several
outstanding questions, and clarify many issues. Cluster processes have been used to model
various phenomena, ranging from computer failure patterns [110] to forest fire spreading
[17] and rainfall events [41]. We are not aware of prior applications to IP packet traf-
fic modeling. Very recent applications of cluster processes in networking have concerned
HTTP request arrivals [103] and TCP packet losses [181].
This chapter builds on the empirical findings presented in chapter 3. The starting point
is the surprising observation that the scaling seen in the point process of packet arrivals
X is broadly similar to that found in the arrival process of flow arrival points Y . Namely,
clear LRD at large scales, evidence for a second, though less clear, scaling regime at small
scales, and a transition scale at around 1 second separating them. This similarity led to the
question, in what way are the twin scaling regimes at the IP level due to or influenced by
the corresponding features at the flow level? Of the conclusions, the following, based on a
second order wavelet analysis, directly inspires the models we investigate here:
• The scaling in the flow arrival process is not responsible for that at the IP level, and
further, it does not influence it significantly at either small or large scales.
• Dependencies between packet arrival processes across different flows are very weak.
• The structure at small scales has its origin in the packet patterns within flows.
• The LRD has its origins in the heavy tailed nature of flow volumes (a known result),
and does not have a component due to packet processes within flows (new result).
• The packet rate within flows is a scale variable.
These findings are consistent with recent work of [182] and have two very strong im-
plications for traffic modelling. They suggest that, for the purpose of modelling the overall
process of IP packets, flows can be treated as statistically independent. Thus, the point
process of packet arrivals is seen as the superposition of independent point processes, one
for each flow. Second, the lack of impact of the detailed nature of the flow arrival statistics
suggests that they can be effectively modelled as a Poisson process. Finally, the isolation of
the LRD as a property of the number of packets per flow, allows them to be modelled using
simple and intuitive heavy tailed ingredients. Cluster models are ideally suited to modelling
the above features.
One of the main goals of this thesis is to explain all forms of scaling present in both
statistical and networking terms in order to answer question (i). We contribute substantially
to this issue in this chapter. Through a model with a firm physical basis, we show that there
are good reasons to believe that there is in fact no true scaling behaviour at second order
over small scales, which in turn implies no true multifractal behaviour over those scales. We
also provide explicit formulae capable of predicting the onset scale of LRD as a function of
meaningful parameters.
Another goal of this chapter is to contribute to a clarification of the meaning and role
of the elephant (large but rare) and mice (small but numerous) flow concept which has
become popular in describing packet traffic. Rather than proposing fixed definitions of these
categories, we let the data speak for itself and point out the orthogonal roles of ‘volume’
versus ‘rate’ based approaches, and the importance of time-scale.
The chapter is structured as follows. In section 4.2 we present the data analysis under-
lying the choice of the models, based on the findings of chapter 3. Section 4.3 is the main
part of the chapter, where the cluster models are introduced, their properties given, and the
fit to the data examined. Further analyses on the data are then performed, leading to sug-
gested refinements to the model in section 4.4. Section 4.5 uses the model to examine in
a well defined context the question, “Does traffic become more bursty or more Poisson as
link rates increase?”, and related issues. In section 4.6 we investigate higher order statistics
of both the data and the model. We conclude in section 4.7.
[Plots (a), (b), (c): log(P) versus log(R), each cell shaded according to its value.]
Figure 4.1: Examining flow variability (AUCK-d1). (a) Flow density plot over (R(i), P(i)),
showing high mass over a distribution of rates, (b) Packet density plot (flow
density weighted by number of packets), (c) Coefficient of variation per flow.
In the main high mass region flows are overdispersed.
4.2 Empirical observations
We start this chapter by presenting more empirical justification for the choice of our traffic
model. We recalled in section 4.1 the main physical reasons behind this choice. Here we
further examine flow variability to find a simple model for in-flow dynamics.
We first consider flow behaviour as a function of the ‘quasi independent’ variables: av-
erage rate and flow volume. This is a direct consequence of section 3.4.2 where we made
D(i) a dependent variable. Because P is discrete, a scatter plot of (R(i), P(i)) hides mass
along discrete lines and is very misleading. We therefore discretise the scatter plot to form
the density plot, figure 4.1(a), where each square in the (R, P) plane is shaded according to
the number of points within it. The mass is highly concentrated (most flows have a small
number of packets), so a logarithmic scale is used to greatly enhance the outer regions. For a
fixed packet volume, the average rates cover a wide range, and similarly a flow with a given
rate may contain many packets, or as few as the minimum of 2. Furthermore, although
the spread of values indicates high variability across flows, we do not see any bimodality
which would suggest a need to classify flows into two or more classes. Simplifying things
somewhat, the picture that emerges is that, in the range of rate values where the density is
highest, the packet volume distribution is approximately independent of rate (and is heavy
tailed). In figure 4.1(b) we give packet density rather than flow density, in effect weighting
plot (a) by the packet impact of each underlying flow. The dark elements at large P(i) cor-
respond to volume-elephant flows, which have an appreciable packet impact despite arising
from a very small percentage of flows – they were invisible in plot (a). Our conclusions are
not altered however, the epicentre of activity is still located at the dark region of plot (a).
We return to the question of elephants in section 4.4.2.
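The construction of the flow density and packet density plots can be sketched as a simple binning of (R(i), P(i)) on a logarithmic grid. The sketch assumes base 10 logarithms and a square cell of 0.25 decades, neither of which is specified in the text; the function name and the toy values are hypothetical.

```python
import math
from collections import Counter

def density_grid(rates, packets, cell=0.25, weight_by_packets=False):
    """Bin flows on a log10(R) x log10(P) grid.

    With weight_by_packets=False each flow counts once (flow density);
    with weight_by_packets=True each flow counts P(i) times (packet density).
    """
    grid = Counter()
    for r, p in zip(rates, packets):
        key = (math.floor(math.log10(r) / cell), math.floor(math.log10(p) / cell))
        grid[key] += p if weight_by_packets else 1
    return grid

# Four small flows and one volume-elephant (toy values).
rates = [0.5, 0.5, 20.0, 20.0, 3.0]
packets = [2, 2, 2, 3, 5000]
flow_density = density_grid(rates, packets)
packet_density = density_grid(rates, packets, weight_by_packets=True)
```

In the toy data the elephant occupies a cell holding a single flow, so it is nearly invisible in the flow density, yet that cell carries almost all of the mass in the packet density.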
Figure 4.1(c) gives the value of the coefficient of variation σ/µ of the inter-arrivals within a
flow, calculated individually for each flow with at least 3 packets, then averaged over squares
in a log-log plot. We see that they cover a wide variety of values, but the most extreme of
these are not in the main region of high mass as revealed in figure 4.1(b), which is of the
same trace and shares the same scale¹. On the contrary, the values in the main high mass
region are reasonably uniform, with a weighted average value around 1.4: over-dispersed
compared to Poisson, but not extremely so.
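The per-flow ratio σ/µ of the inter-arrivals can be computed as in the following sketch. It assumes a flow is a sorted list of timestamps with positive duration and uses the population standard deviation; whether the population or the sample estimator was used for the figure is not stated, so this choice is an assumption, as is the function name.

```python
import statistics

def inter_arrival_cv(flow):
    """Ratio sigma/mu of the inter-arrivals within one flow.
    Returns None for flows with fewer than 3 packets (fewer than 2 gaps)."""
    if len(flow) < 3:
        return None
    gaps = [b - a for a, b in zip(flow, flow[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)

regular = inter_arrival_cv([0.0, 1.0, 2.0, 3.0])  # perfectly regular flow
uneven = inter_arrival_cv([0.0, 1.0, 4.0])        # gaps of 1 and 3
too_short = inter_arrival_cv([0.0, 1.0])
```

A value of 0 corresponds to a perfectly regular flow and a value of 1 to exponential gaps, so values around 1.4 indicate moderate over-dispersion.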
We now disregard flows, and examine the inter-arrival series A(k). Figure 4.2 shows its
histogram for AUCK-d0, which fits well to a Gamma random variable with σ/µ = 1.29.
The autocorrelation in plot (b) is negligible over small lags (small scale). Similar results
apply for other traces, but it should be remembered that the time scale corresponding to a
lag varies inversely as the packet rate. Whilst these results are true as such, they are in fact
misleading. This can be revealed using a multiscale analysis and explained using a cluster
model.
4.3 Cluster models
In this section we define and evaluate two models for the point process X(t) of packet
arrivals, inspired by the observations of section 4.2 .
4.3.1 A black box model: gamma renewal
As already detailed in section 2.3.2, a renewal process is a simple point process where
the inter-arrival variables {A(k)}, k ∈ Z, are i.i.d. We will examine its utility as a direct
model for the inter-packet times. Although we seek meaningful constructive models rather
than those of black box type, there are good reasons to first examine a renewal model.
1 At very small rate, we have a small number of very regular flows. These, due to TCP keepalive packets,
have little impact.
[Plots: (a) Pr[A ≤ x] versus log(x), curves: Data, Gamma distribution. (b) autocorrelation versus lag.]
Figure 4.2: Examining the Inter-Arrival Process (AUCK-d0). (a) The inter-arrival
distribution, with fitted Gamma distribution (shape = 0.6, mean = 1.2 ms). (b) The
autocorrelation of (detrended) inter-arrivals.
First, figure 4.2(b) directly suggests it. The second reason is that a renewal process has
the potential to generate scaling (or apparent scaling) behaviour at small scales, as will be
shown in figure 4.3. The possibility of gaining a statistical understanding of this effect in a
very simple context is worth pursuing. Finally, the spectrum of a renewal process plays a
direct role in the cluster models introduced in section 4.3.2.
From section 2.3.4, the spectrum of the continuous time renewal process X(t) is
$$\Gamma_X(\nu) = \lambda_A \left[ (1 - \Phi_A(\omega))^{-1} + (1 - \Phi_A(-\omega))^{-1} - 1 \right] \qquad (4.1)$$
where $\Phi_A(\omega) = E[\exp(i\omega A)]$ is the characteristic function of the inter-arrival distribution,
and $\omega = 2\pi\nu$ is the unnormalised frequency. Figure 4.2(a) justifies a Gamma distribution
for A, with characteristic function (c.f.) $\Phi_A(\omega; b, c) = (1 - ib\omega)^{-c}$, where c > 0 is the shape
parameter. The exponential case is c = 1, corresponding to the Poisson process. As b is a
scale parameter, $\Phi_A(\omega; b, c) = \Phi_A(b\omega; 1, c)$. The mean and standard deviation are given by
$\mu_A = bc$, $\sigma_A = b\sqrt{c}$, the coefficient of variation by $\sigma_A/\mu_A = 1/\sqrt{c}$, and the arrival intensity
by $\lambda_A = 1/\mu_A$. The following properties of the Gamma Renewal (GR) spectrum hold:
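$$\Gamma_{GR}(\nu) = \lambda_A \left[ \frac{1}{c} + \frac{c^2 - 1}{12c}(b\omega)^2 + O(\omega^4) \right] \xrightarrow{\nu \to 0} \frac{\lambda_A}{c} \qquad (4.2)$$
$$\phantom{\Gamma_{GR}(\nu)} = \lambda_A \left[ 1 + \frac{2\cos(c\pi/2)}{(b\omega)^c} + o(\omega^{-c}) \right] \xrightarrow{\nu \to \infty} \lambda_A. \qquad (4.3)$$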
One can show that, in the over-dispersed case (c < 1) of interest here, $\Re(\Phi_A(\omega))$ is monotonically
decreasing, from which it follows that the spectrum is also. Since a monotonic spectrum
implies a monotonic wavelet spectrum, the Logscale Diagram of GR with c < 1 monotonically
increases from the asymptotic level $\log_2(\lambda_A)$ up to $\log_2(\lambda_A/c)$, as in figure 4.3. The
small scale asymptotic level is that of a Poisson process, also of rate $\lambda_A$. However, this limit
is not specific to Poisson but is due to the general point process property that points do not
coincide.
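The two limits in equations (4.2) and (4.3) can be checked numerically. The following Python
sketch evaluates equation (4.1) for Gamma inter-arrivals. The function names gamma_cf and
gr_spectrum are illustrative choices made here, and the scale is set through b = 1/(λA c),
which follows from µA = bc = 1/λA.

```python
import math


def gamma_cf(omega, b, c):
    """Characteristic function (1 - i*b*omega)**(-c) of a Gamma variable."""
    return (1 - 1j * b * omega) ** (-c)


def gr_spectrum(nu, lam_a, c):
    """Spectrum of a Gamma renewal process, equation (4.1).

    lam_a is the arrival intensity and c the Gamma shape parameter.
    """
    b = 1.0 / (lam_a * c)            # scale parameter, from mu_A = b*c = 1/lam_a
    omega = 2 * math.pi * nu         # unnormalised frequency
    phi = gamma_cf(omega, b, c)
    # Phi_A(-omega) is the complex conjugate of Phi_A(omega).
    bracket = 1 / (1 - phi) + 1 / (1 - phi.conjugate()) - 1
    return lam_a * bracket.real
```

With the values of figure 4.2 (shape 0.6, mean 1.2 ms), gr_spectrum evaluated at a very low
frequency should be close to λA/c, and at a very high frequency close to λA.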
[Figure 4.3: log2 Var(d_j) against octave j = log2(a), for j from −15 to 10; the top axis gives the corresponding time scales.]
Figure 4.3: Pseudo scaling of a gamma renewal process (shape=0.6, mean=1.2)
Figure 4.3 illustrates how, for a range of scales close to the upper asymptotic level, the
LD of a GR process can appear to follow a straight line, a ‘pseudo scaling’. To quantify
this, we define a lower cutoff frequency $\nu^*$ where the spectrum can be said to ‘first’ deviate
from its asymptotic value. Fix a deviation parameter $\varepsilon \in (0, 1)$. Define $\nu^*$ as the smallest
ν such that the second term of equation (4.2) deviates from the first by ε times the distance
$\lambda_A |1/c - 1|$ between the asymptotic levels. The result, which respects the role of the scale
parameter b, is
\[ \nu^* = \frac{1}{2\pi b} \left( \frac{12\varepsilon}{c + 1} \right)^{1/2}, \qquad c > 0. \tag{4.4} \]
The LD equivalent, $j^*_{GR} = -\log_2 \nu^*$, is marked by asterisks in figure 4.3 (ε = 0.1).
Expressions for the centre of the zone where such a pseudo scaling exists, and its slope, can also
be derived, allowing predictive tests of the model. Approximate expressions for c ∈ (0.2, 1]
are given by $j^{c}_{GR}(b, c) \approx 1/b \cdot 1$, and $\alpha_{GR}(c) \approx (1 - c)/4$.
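As a worked illustration of equation (4.4), the cutoff frequency and its LD equivalent can be
computed directly. This is a minimal sketch; the name gr_cutoff is hypothetical and the scale
parameter is again taken as b = 1/(λA c).

```python
import math


def gr_cutoff(lam_a, c, eps=0.1):
    """Return (nu_star, j_star) from equation (4.4) and j* = -log2(nu*)."""
    b = 1.0 / (lam_a * c)                                   # from mu_A = b*c = 1/lam_a
    nu_star = math.sqrt(12 * eps / (c + 1)) / (2 * math.pi * b)
    j_star = -math.log2(nu_star)
    return nu_star, j_star
```

Because b enters only as a prefactor, doubling λA doubles ν* and lowers j* by exactly one octave.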
The model is easily calibrated through the sample mean and variance of the inter-
arrivals. Comparing the resulting GR wavelet spectrum against the AUCK-c1 trace in
figure 4.7(a), we see reasonable agreement at low scales and up to the onset of LRD. In
general however the predictive ability of the GR model fails badly. The reasons for this
become clear when one moves to the cluster model and result in useful insights, as we
presently show.
Our final but important comment relates to the pitfalls in interpretation that ‘pseudo
slopes’ can cause. Since, for realistic values of c, $j^*_{GR}(b, c)$ is the same order of magnitude
as µA , for both practical and physical reasons one is led to focus analysis on scales above
it. This is standard practice in traffic analysis, as it seems inefficient to study time series
which are mostly zeros. We have verified that if one does so, pseudo-slopes exist not only at
second order but also more generally. Consequently if one performs for example a wavelet
multiscaling analysis of the type described in section 2.4.3 over a range of moment orders,
one finds empirical indications of multiscaling (possibly multifractal) behaviour. This can
lead to an erroneous belief that the data is much richer than a mere renewal process when
in fact in this respect it is entirely consistent with it. Indeed, it is likely that in many cases
the evidence for multifractal behaviour over time scales below 1s has been mis-interpreted.
Although we have not yet presented results beyond second order, it is clear that if scaling
(over some given scale range) is only apparent at second order, then the process cannot be
multifractal, as multifractality would imply true scaling over a range of orders including
second order. More details on this issue will be given in section 4.6 (see also [182]).
4.3.2 A flow based model: Bartlett-Lewis point process
The main observations of section 3.4, that flows can be seen as independent entities
arriving according to a Poisson process, fit naturally into a cluster process framework.
Recall from section 2.3.2 that a stationary Poisson cluster process on the real line consists
of a Poisson process defining the locations of ‘seeds’, about which a group of points are
placed according to i.i.d. copies of another process. In a harmless abuse of notation,
symbols already defined for the data will be reused. For convenience, a list of the traffic model
parameters is given on page xxii.
Let the arrival times {tF (i)} of flows (the seeds) follow a Poisson process of rate λF . The
packet arrival process can be written as
\[ X(t) = \sum_i G_i(t - t_F(i)), \tag{4.5} \]
where Gi (t) represents the arrival process of packets within flow i. Let the {Gi } be i.i.d.,
and consider a representative G (t). It is a point process containing a finite number P ≥ 1 of
points (packets), with the first located at t = 0.
Determining an appropriate process for Gi (t), given the complexity of TCP dynamics
(see figure 3.6(b) page 50) and network heterogeneity, is a challenge. An interesting fluid
model approach can be found in [18], but we focus on point processes here. Recall however
from section 3.4.2 that the manipulations [P-Uni] and [P-Pois] showed that simple
‘constant rate’ models accounted for most of the second order properties seen at the packet
level. A (finite) renewal process model is a simple way to obey this finding which has the
advantage of falling within the theoretical framework of Bartlett-Lewis cluster processes
(BLPP) already introduced in section 2.3.2.
We choose the inter-arrival random variable A to be Gamma distributed (with characteristic
function $\Phi_A(\omega; b, c)$), for several reasons. First, it has a scale parameter, making
it consistent (see below) with the observations on rate dependence of figure 3.14 page 65.
Second, we have seen that [P-Pois] failed to reproduce important qualitative behaviour at
small scales. We will see that incorporating burstiness through the variance to mean ratio
is, in many cases, sufficient to reinstate this structure. This is easily and naturally achieved
in the Gamma family, as the second parameter c is equivalent to this ratio, and c = 1
corresponds to [P-Pois]. Thus, finally, although the parameters λA , c of Gamma are not derived
from network ‘first principles’, they do have physical meaning taken directly from data, and
two is clearly the minimum number necessary.
[Figure 4.4: time axis t showing the P(i) packets of flow i, starting at tF (i), with inter-arrival Ai (l) between packet j and packet j+1.]
Figure 4.4: Schematic representation of a BLPP. The solid vertical lines mark the P(i)
packets belonging to flow i, starting at tF (i). The vertical dotted lines mark the
arrival times of other packets.
The number of packets in a flow is a random variable P with density $p_j = \Pr(P = j)$,
probability generating function $G_P(z) = \sum_{j=0}^{\infty} p_j z^j$, $|z| \le 1$, and distribution function $F_P$ (we
take $p_0 = 0$). From figure 3.4(b) it is taken to be heavy tailed, that is $1 - F_P(j) \sim L j^{-\beta}$ as $j \to \infty$,
$\beta \in (1, 2)$, implying finite mean $\mu_P$, but infinite variance.
Assembling these components, the flow model can be written as
\[ G_i(t) = \sum_{j=1}^{P(i)} \delta\Big( t - \sum_{l=1}^{j-1} A_i(l) \Big), \tag{4.6} \]
where δ (t) is a delta function centered at t = 0, Ai (l) denotes the l−th inter-arrival for flow
i, and the inner sum is defined to be zero if j = 1. The average arrival intensity is given
by λX = λF µP . The parameters of the model are λF ; λA , c; and µP , β . This is the smallest
number allowing a packet level description of traffic with physical meaning: one parameter
for flow arrivals, two for in-flow packet arrivals, and two for flow volume. A schematic
representation of the BLPP is given in figure 4.4.
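To make the construction of equations (4.5) and (4.6) concrete, the following Python sketch
generates packet arrival times from the model. It is an illustration under stated assumptions
rather than a reference implementation: the names simulate_blpp and sample_packets are
hypothetical, the flow size P is drawn from a discrete Pareto-like law with
Pr(P > k) = (ak + 1)^(−β), which is only one possible heavy-tailed choice, and the simulation
starts from an empty system, so the early part of the output is a transient.

```python
import math
import random


def sample_packets(rng, a, beta):
    """Draw P >= 1 with Pr(P > k) = (a*k + 1)**(-beta) (assumed tail form)."""
    u = 1.0 - rng.random()                       # uniform on (0, 1]
    # Inverse transform: smallest integer k with (a*k + 1)**(-beta) <= u.
    k = math.ceil((u ** (-1.0 / beta) - 1.0) / a)
    return max(k, 1)


def simulate_blpp(lam_f, lam_a, c, a, beta, horizon, seed=0):
    """Return the sorted packet arrival times in [0, horizon)."""
    rng = random.Random(seed)
    b = 1.0 / (lam_a * c)                        # Gamma scale: mean inter-arrival 1/lam_a
    arrivals = []
    t_flow = rng.expovariate(lam_f)              # Poisson flow arrivals (the seeds)
    while t_flow < horizon:
        t = t_flow                               # first packet sits at the flow start
        for _ in range(sample_packets(rng, a, beta)):
            if t >= horizon:
                break                            # ignore packets beyond the horizon
            arrivals.append(t)
            t += rng.gammavariate(c, b)          # Gamma(shape=c, scale=b) inter-arrival
        t_flow += rng.expovariate(lam_f)
    arrivals.sort()
    return arrivals
```

Binning the returned times at a fixed resolution gives a count series comparable to X(t).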
Apart from its physical motivation, one of the main advantages of the model is that
its second order properties are tractable. The expressions for the spectral density of the
BLPP (found for instance in [46, p.417] and [42, p.79]) can be coerced into the following
instructive real form:
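\[ \Gamma_X(\nu) = \lambda_F \left[ \frac{\mu_P}{\lambda_A} \Gamma_G(\nu) + S_G(\omega) + S_G(-\omega) \right], \tag{4.7} \]
where $\Gamma_G(\nu)$ is the spectrum of the stationary renewal process with the same parameters as
the finite flow renewal process, here $\Gamma_G(\nu) = \Gamma_{GR}(\nu; b, c)$, and
\[ S_G(\omega) = \frac{\Phi_A(\omega)}{(1 - \Phi_A(\omega))^2} \left[ G_P(\Phi_A(\omega)) - 1 \right]. \tag{4.8} \]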
Proof. We use the notations and concepts introduced in section 2.3.4. Assume that there is
a point at t = z being the l-th point of a cluster of size k. A point at t = z + u can either be from
the same cluster or from a different one. The conditional probability that there is a point
from the same cluster in $(z + u, z + u + \delta_2)$ is $\delta_2\, p_k \sum_{j=1}^{k-l} f_A^{*j}(u)$. Taking also into account
the Poisson contribution of points from different clusters and summing over all l and k, one
gets:
\[ \begin{aligned} & \Pr(N(z, z + \delta_1) = N(z + u, z + u + \delta_2) = 1) \\ & \quad = (\lambda_F \mu_P)^2 \delta_1 \delta_2 + \lambda_F \sum_{l=1}^{\infty} \sum_{k=l+1}^{\infty} p_k \sum_{j=1}^{k-l} f_A^{*j}(u)\, \delta_1 \delta_2 + o(\delta_1 \delta_2), \end{aligned} \tag{4.9} \]
and
\[ \begin{aligned} h(u) &= \frac{\Pr(N(z, z + \delta_1) = N(z + u, z + u + \delta_2) = 1)}{\lambda_F \mu_P \delta_1 \delta_2} \\ &= \lambda_F \mu_P + \frac{1}{\mu_P} \sum_{l=1}^{\infty} \sum_{k=l+1}^{\infty} p_k \sum_{j=1}^{k-l} f_A^{*j}(u) \\ &= \lambda_F \mu_P + \frac{1}{\mu_P} \sum_{k=1}^{\infty} f_A^{*k}(u) \sum_{l=k+1}^{\infty} (l - k)\, p_l. \end{aligned} \tag{4.10} \]
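Let $\Phi_A(\omega) = E[\exp(i\omega A)]$ be the characteristic function of the inter-arrival distribution.
From equation (2.39) page 29, the spectrum of the Bartlett-Lewis process reads:
\[ \begin{aligned} \Gamma_X(\nu) &= \lambda_F \mu_P \left( \tilde{h}(j\omega) + \tilde{h}(-j\omega) + 1 \right) \\ &= \lambda_F \mu_P \left[ 1 + \frac{1}{\mu_P} \sum_{k=1}^{\infty} \sum_{l=k+1}^{\infty} (l - k)\, p_l \left( \Phi_A(\omega)^k + \Phi_A(-\omega)^k \right) \right] \end{aligned} \tag{4.11} \]
and after re-ordering of the terms, this leads to
\[ \Gamma_X(\nu) = \lambda_F \left[ \frac{\mu_P}{\lambda_A} \Gamma_G(\nu) + S_G(\omega) + S_G(-\omega) \right], \tag{4.12} \]
where $\Gamma_G(\nu)$ is the spectrum of the stationary renewal process with the same parameters as
the finite renewal process of each cluster (from equation (2.47)):
\[ \Gamma_G(\nu) = \lambda_A \left[ (1 - \Phi_A(\omega))^{-1} + (1 - \Phi_A(-\omega))^{-1} - 1 \right], \tag{4.13} \]
$S_G(\omega)$ is such that
\[ S_G(\omega) = \frac{\Phi_A(\omega)}{(1 - \Phi_A(\omega))^2} \left[ G_P(\Phi_A(\omega)) - 1 \right], \tag{4.14} \]
and $G_P(z)$ is the probability generating function of P defined as
\[ G_P(z) = \sum_{j=0}^{\infty} p_j z^j, \qquad |z| \le 1. \tag{4.15} \]
Since $S_G(\omega)$ and $S_G(-\omega)$ are complex conjugates, their sum $S_G(\omega) + S_G(-\omega)$ is real.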
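The re-ordering that takes the double sum (4.11) to the closed form (4.12) can be verified
numerically for a flow size distribution with finite support. The sketch below is illustrative
only: the function names are hypothetical, the flow size law is passed as a dictionary
{j: p_j}, and Gamma inter-arrivals with b = 1/(λA c) are assumed.

```python
import math


def gamma_cf(omega, b, c):
    """Characteristic function (1 - i*b*omega)**(-c) of a Gamma variable."""
    return (1 - 1j * b * omega) ** (-c)


def blpp_spectrum(nu, lam_f, lam_a, c, pmf):
    """Closed form (4.12); pmf maps packet counts j >= 1 to probabilities."""
    phi = gamma_cf(2 * math.pi * nu, 1.0 / (lam_a * c), c)
    mu_p = sum(j * p for j, p in pmf.items())

    def s_g(z):
        g_p = sum(p * z ** j for j, p in pmf.items())   # generating function G_P(z)
        return z * (g_p - 1) / (1 - z) ** 2             # equation (4.14)

    gamma_g = lam_a * (1 / (1 - phi) + 1 / (1 - phi.conjugate()) - 1)
    total = mu_p / lam_a * gamma_g + s_g(phi) + s_g(phi.conjugate())
    return lam_f * total.real


def blpp_spectrum_double_sum(nu, lam_f, lam_a, c, pmf):
    """Direct evaluation of the double sum in equation (4.11)."""
    phi = gamma_cf(2 * math.pi * nu, 1.0 / (lam_a * c), c)
    mu_p = sum(j * p for j, p in pmf.items())
    acc = 0j
    for l, p_l in pmf.items():
        for k in range(1, l):                           # terms with l >= k + 1
            acc += (l - k) * p_l * (phi ** k + phi.conjugate() ** k)
    return lam_f * mu_p * (1 + acc.real / mu_p)
```

For single-packet flows (P = 1 with probability one) both functions reduce to λF, the flat
spectrum of the Poisson process of flow arrivals.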

One sees immediately that the flow arrivals enter equation (4.7) only through λF . From
equation (2.49), this simple variance prefactor has the interpretation that one independently
superposes ‘λF ’ of the same thing. Furthermore, the parameter 1/λA acts as a scale parameter:
\[ \Gamma_X(\omega; \lambda_F, \lambda_A, c, F_P) = \Gamma_X(\omega/\lambda_A; \lambda_F, 1, c, F_P). \tag{4.16} \]
This is a direct consequence of choosing $\Phi_A$ with a scale parameter obeying $b \propto \lambda_A^{-1}$, and is
in agreement with the empirical results presented in section 3.4. The third striking feature
is that the expression consists of two terms of which the first, $\frac{\lambda_X}{\lambda_A} \Gamma_{GR}(\nu)$, is familiar from
section 4.3.1. To understand the second, we note that:
\[ S_G(\omega) \overset{\omega \to 0}{\sim} L B(\beta) (2\pi\lambda_A)^{2-\beta} \omega^{-(2-\beta)} \to \infty \tag{4.17} \]
\[ \phantom{S_G(\omega)} \overset{\omega \to \infty}{\sim} -\frac{\cos(c\pi/2)}{(b\omega)^c} \to 0 \tag{4.18} \]
where $B(\beta) = \psi(1 - \beta) \cos(\pi\beta/2) / (2\pi)^{(2-\beta)} > 0$, ψ(·) denoting Euler’s Gamma function.
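Proof. Let the distribution of number of packets per flow F(k) be such that:
\[ 1 - F(k) \overset{k \to \infty}{\sim} \frac{L(k)}{k^{\beta}}, \qquad 1 < \beta < 2, \tag{4.19} \]
where L(.) is a slowly varying function (F(k) has a finite mean µP and infinite variance). In
what follows it is assumed that L(k) = L, L > 0. From a Tauberian theorem [23, p.333], this
leads to
\[ f_1(s) \overset{s \to 0}{\sim} -s^{\beta} L \psi(1 - \beta), \tag{4.20} \]
where
\[ f_1(s) = \tilde{F}(s) - 1 + \mu_P s, \tag{4.21} \]
and $\tilde{F}$ is the Laplace-Stieltjes transform of F given by
\[ \begin{aligned} \tilde{F}(s) = E\{\exp(-sX)\} &= \int_0^{+\infty} \exp(-sx)\, dF(x) \\ &= \sum_{k=0}^{\infty} \exp(-sk)\, p_k, \quad \text{for a discrete r.v.} \end{aligned} \tag{4.22} \]
With the change of variable z = exp(−s), one gets:
\[ \tilde{F}(s) = G_P(\exp(-s)), \tag{4.23} \]
which leads to
\[ f_1(s) = \tilde{F}(s) - 1 + \mu_P s = G_P(z) - 1 - \mu_P \log(z), \tag{4.24} \]
and therefore from equation (4.20)
\[ G_P(z) - 1 - \mu_P \log(z) \overset{z \to 1}{\sim} -(-\log(z))^{\beta} L \psi(1 - \beta). \tag{4.25} \]
Replacing z by the characteristic function of packet inter-arrivals within a flow
\[ \Phi_A(\omega) = E[\exp(i\omega A)] = 1 + j\frac{\omega}{\lambda_A} - \frac{1}{2} \left( \sigma_A^2 + \frac{1}{\lambda_A^2} \right) \omega^2 + o(\omega^2) \tag{4.26} \]
one gets:
\[ G_P(\Phi_A(\omega)) - 1 = \mu_P \log(\Phi_A(\omega)) - \left( -\log(\Phi_A(\omega)) \right)^{\beta} L \psi(1 - \beta) + o\left( (-\log(\Phi_A(\omega)))^{\beta} \right). \tag{4.27} \]
The asymptotic behaviour of $S_G(\omega)$ when ω → 0 can therefore be shown to be
\[ S_G(\omega) \overset{\omega \to 0}{\sim} L B(\beta) (2\pi\lambda_A)^{2-\beta} \omega^{-(2-\beta)} \to \infty \tag{4.28} \]
where
\[ B(\beta) = \psi(1 - \beta) \cos(\pi\beta/2) / (2\pi)^{(2-\beta)} > 0. \tag{4.29} \]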
Thus, at high frequency the spectrum is dominated by the scaled GR term, and at low
frequency by the divergent second term. Comparing with equation (2.5), we see that the
model is LRD with parameters $(c_f, \alpha) = (2\lambda_F L B(\beta) \lambda_A^{2-\beta},\ 2 - \beta)$. It is significant that
equation (4.17) depends only on the intensity λA of the GR flow processes, and not on the second
order statistics: at large scale the finer details of the flows cease to matter. This remains true
if the standard deviation σP of P exists, in which case
\[ \Gamma_X(\nu) \xrightarrow{\nu \to 0} \lambda_F (\sigma_P^2 + \mu_P^2). \tag{4.30} \]
Recall that the GR term is monotonic decreasing when c < 1. The second term shares this
property as ν → 0, and was observed to obey it for all ν where it is non-negligible. Carrying
over these observations to the wavelet spectrum, the generic shape of the LD for the model
is similar to that of the dashed curve in figure 2.2(c): a monotonic function with the form of
a (scaled) GR process, saturating at medium scales before crossing over to a LRD behaviour
at large scale.
[Figure 4.5: Logscale Diagram, log2 Var(d_j) against j = log2(a); curves: Data, Model, GR component, Cluster component, Poisson limit; horizontal levels λF µP /c and λF µP .]
Figure 4.5: Comparison of LDs of AUCK-d1 and fitted BLPP model. The asterisk
(resp. square) marks the transition scale $j^*_{GR}$ (resp. $j^*_{PGR}$).
An example of a wavelet spectrum for the model, evaluated using equation (2.69), appears
in figure 4.5 where the magnitude of the (scaled) GR and LRD components are also
plotted. The knee in the LD is now seen as the zone where these two compete. To capture
its position as a function of parameters in a practically useful way, it is essential to
realise that the scale at which the ‘road to LRD’ begins may be very different from where
the asymptotic LRD behaviour of equation (4.17) dominates. We accordingly give two different
definitions of transition scale. The first is the largest scale at which the small scale
effects, represented by the saturation level $\log_2(\lambda_X/c)$ of the GR component, account for
half of the wavelet spectrum. This scale, denoted by $j^*_{PGR}$, is the one we use for comparison
against data, as it includes the important medium scale effects. The second definition looks
for equality between the large scale asymptotic behaviours of the two spectral components
$\Gamma_X(\nu) = \lambda_X/c$ and $\Gamma_X(\nu) = c_f \nu^{-\alpha}$, yielding
\[ j^{**}_{PGR} = -\log_2 \lambda_A + \frac{1}{2 - \beta} \left[ \log_2 \mu_P - \log_2(2 L B(\beta)) - \log_2 c \right]. \tag{4.31} \]
Its greater tractability encourages its use, see section 4.5, in describing the qualitative
parameter dependence of the knee, as the parameter dependencies of the two definitions are
very similar. In order to see whether the GR component saturates before the LRD dominates,
creating a plateau at medium scales as schematised in figure 2.2, one can compare
$j^*_{PGR}$ against $j^*_{GR}$, which, using equation (4.4) with $b = 1/(c\lambda_A)$, can be rewritten as
\[ j^*_{GR} = -\log_2 \lambda_A + \frac{1}{2} \log_2\left( \frac{\pi^2 (c + 1)}{3 \varepsilon c^2} \right). \tag{4.32} \]
In figure 4.7(a) $j^*_{GR} < j^*_{PGR}$, and so the plateau is visible, although just barely. If $j^*_{GR} \approx j^*_{PGR}$
then the plateau will have negligible width, however $j^*_{GR} \gg j^*_{PGR}$ is not possible, since the
departure from small scales leading to LRD can only take effect at scales where there are
many packets in a flow, intuitively the same criterion defining GR saturation.
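The LRD parameters and the second transition scale are simple closed-form functions of the
model parameters, so they are easy to evaluate. The sketch below is a hedged illustration:
the function names are hypothetical, and math.gamma is used for the Gamma function that the
text writes as ψ.

```python
import math


def tail_constant_b(beta):
    """B(beta) = Gamma(1 - beta) * cos(pi*beta/2) / (2*pi)**(2 - beta), 1 < beta < 2."""
    return (math.gamma(1 - beta) * math.cos(math.pi * beta / 2)
            / (2 * math.pi) ** (2 - beta))


def lrd_constants(lam_f, lam_a, tail_l, beta):
    """Return (c_f, alpha, B) with c_f = 2*lam_f*L*B*lam_a**(2-beta), alpha = 2-beta."""
    big_b = tail_constant_b(beta)
    alpha = 2 - beta
    c_f = 2 * lam_f * tail_l * big_b * lam_a ** alpha
    return c_f, alpha, big_b


def j_pgr_double_star(lam_a, c, mu_p, tail_l, beta):
    """Transition scale where lam_X / c equals c_f * nu**(-alpha), with nu = 2**(-j)."""
    big_b = tail_constant_b(beta)
    inner = math.log2(mu_p) - math.log2(2 * tail_l * big_b) - math.log2(c)
    return -math.log2(lam_a) + inner / (2 - beta)
```

By construction, evaluating c_f ν^(−α) at ν = 2^(−j) for the returned j gives back λF µP /c,
whatever the value of λF .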
Another advantage of the model is that the packet inter-arrival time distribution can
be calculated analytically [46], enabling comparisons against data and fitted Gamma
inter-arrivals. Finally, simulation of the model is trivial and fast, apart from the long transient
induced by the LRD. However one can avoid the transient regime by starting the simulation
in equilibrium conditions at time t, which are determined by (i) the distribution of the time
from t to the next flow arrival, (ii) the distribution of the number Z of active flows at time
t, (iii) the distribution of the forward recurrence time from t to the first event in each flow,
and (iv) the distribution of the number Q of packets remaining in each of the Z active flows.
Analytical expressions for the equilibrium BLPP can in fact be derived [111]. They can be
written as:
(i) the time from t to the next flow arrival follows an exponential distribution with
parameter λF ,
(ii) the number Z of active flows at time t follows a Poisson distribution with parameter
λF µA (µP − 1),
(iii) the forward recurrence time from t to the first event in each flow has the survivor
function $1 - \lambda_A \int_0^t (1 - F_A(u))\, du$,
(iv) the probability that there remain Q = j packets in an active flow is [105]:
\[ \Pr\{Q = j\} = \frac{1}{\mu_P - 1} \sum_{i=j+1}^{\infty} p_i. \tag{4.33} \]
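Item (iv) can be illustrated for a flow size distribution with finite support. The sketch
below computes the distribution of the remaining packet count Q from equation (4.33); the
function name and the dictionary representation {i: p_i} are assumptions made for the example.

```python
def residual_packet_pmf(pmf):
    """Return {j: Pr(Q = j)} for j >= 1, following equation (4.33).

    pmf maps packet counts i >= 1 to probabilities and has finite support.
    """
    mu_p = sum(i * p for i, p in pmf.items())
    out = {}
    tail = 0.0
    # Walk downwards so that tail always equals sum_{i >= j+1} p_i.
    for j in range(max(pmf) - 1, 0, -1):
        tail += pmf.get(j + 1, 0.0)
        out[j] = tail / (mu_p - 1)
    return out
```

The probabilities sum to one because the tail sums add up to µP − 1. If every flow has a single
packet, no flow is ever active between packets and the function returns an empty dictionary.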
4.4 Model validation

4.4.1 Marginals
The model works well when fitted to the packet process for the AUCK-d1 trace, as seen in
figure 4.5. The use of GR flows, here with σA /µA = 1.58 (c = 0.4), succeeds in modelling
most of the burstiness which was not reproduced using [P-Pois] in figure 3.14. Here
$j^*_{GR} \approx j^*_{PGR}$, predicting that the plateau is not visible. This is the case, and $j^*_{PGR}$ agrees visually
with the onset of LRD. Furthermore, the visual agreement in the process X(t) itself was
found to be excellent, not only over the scales shown in figure 4.6, but over all scales.
This agreement, essentially in the marginals, goes beyond second order even though the
experiments were judged through the eyes of the wavelet spectrum, and indicates that the
‘physics’ has been captured.
We can now explain the failure of the black box GR model. Simply, a scaled renewal
process is not a renewal process. Thus, the ‘GR term’ $\lambda_F \mu_P \Gamma_{GR}/\lambda_A$ of equation (4.7),
although sharing the general form of a GR process, and the possibility of pseudo scaling with
the same $j^*_{GR}$ value, is not equivalent to one. The cluster model and the black box GR model
can therefore never coincide at small scales unless λF µP /λA = λX /λA = 1. Fortuitously, this
is almost the case in figure 4.7(a) where λX /λA ≈ 2.1, but not in general. For the Abilene
trace λX /λA = 278. This result is significant since, looking solely at the results in figure 4.2,
a GR process seems reasonable at small scales, but such measures cannot resolve important
dependencies in the data, which are captured by the cluster model.
[Figure 4.6: two panels (a) and (b), number of packets against time (sec).]
Figure 4.6: The packet process (AUCK-d1). (a) Synthetic X(t) data binned with τ = 50ms,
(b) Corresponding original process X(t).
We now describe the parameter fitting in detail. The flow arrival parameter λF was
estimated directly from the sample mean of flow inter-arrivals. Determining an appropriate
λA for in-flow packet arrivals is not trivial. Simple choices such as the median of R(i) (see
figure 3.14), or the mean, perform poorly. This is because, as we are modelling the packet
arrival process, it is essential to capture the impact of each value of rate in terms of packets.
We accordingly weight the average rate by P(i) − 1, the number of inter-arrivals in each
flow. This results in values that are generally considerably above a simple mean, which
agree well with semi-experimental comparisons.
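One reading of this weighting is a weighted mean of the per-flow rates R(i) with weights
P(i) − 1. The sketch below follows that reading; the function name is hypothetical and the
exact estimator used for the fits may differ in detail.

```python
def packet_weighted_rate(rates, packets):
    """Mean of per-flow rates R(i), weighted by the P(i) - 1 inter-arrivals of each flow."""
    weights = [p - 1 for p in packets]
    total = sum(weights)
    if total == 0:
        raise ValueError("no flow has more than one packet")
    return sum(w * r for w, r in zip(weights, rates)) / total
```

Single-packet flows receive zero weight, and flows with many packets dominate, which is why
the result typically lies above the simple mean when large flows are also fast.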
The tail parameters (L, β ) are measured via a least squares fit in a log-log plot of 1 −
FP (k). The fit is at logarithmically separated k and begins at small (k = 6) or medium (0.8
quantile) values, rather than just the far tail. The exponents of heavy tails are notoriously
difficult to estimate, and the factor L is even more so. The above procedure includes more
data and thus stabilises the estimation, and in addition is important in the present context
where the distribution body is also power-law like over a range of scales. The resulting
behaviour in the LD is thus a mix of effects which must be appropriately captured when
measuring (L, β ). A measurement from the far tail only would not be consistent with (cf , α)
estimates except at extremely large scales beyond the usual observable range.
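A least squares fit of this kind can be sketched as follows. This is an illustration under
assumptions, not the exact procedure used for the results: the geometric grid ratio and the
min_count cut-off (which drops grid points with too few exceedances) are choices made here,
and fit_tail is a hypothetical name.

```python
import bisect
import math


def fit_tail(samples, k_min=6, ratio=1.5, min_count=10):
    """Fit log(1 - F_P(k)) = log(L) - beta*log(k) at logarithmically spaced k.

    Returns the pair (L, beta). min_count must be at least 1.
    """
    ordered = sorted(samples)
    n = len(ordered)
    xs, ys = [], []
    k = float(k_min)
    while True:
        k_int = int(round(k))
        exceed = n - bisect.bisect_right(ordered, k_int)   # samples strictly above k_int
        if exceed < min_count:
            break
        xs.append(math.log(k_int))
        ys.append(math.log(exceed / n))
        k *= ratio
    if len(xs) < 2:
        raise ValueError("not enough grid points for a fit")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                                      # ordinary least squares
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

Starting the grid at the 0.8 quantile instead of k = 6 only requires passing a different k_min.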
[Figure 4.7: three Logscale Diagrams, log2 Var(d_j) against j = log2(a). Curves in (a): Data, Model, GR component, Black box GR; in (b): Data, Model, Model with empirical P(i), [A-Pois; S-Pkt], upper limit from eq. (15), lower limit; in (c): Data, Model with empirical P(i), GR component, Cluster component.]
Figure 4.7: Comparison of data and BLPP model. (a) The fit to AUCK-c1 is good, whereas
the quality of the black box GR model is fortuitous, (b) The fit to UNC-a1
shows distortion not present when the empirical P histogram is used. A model
using truncated empirical P agrees with the predicted level. (c) With Abilene
deviations remain even with the empirical P. The asterisk (resp. square) marks
the transition scale $j^*_{GR}$ (resp. $j^*_{PGR}$).
An entire distribution FP is required for P to link its physical parameters µP , β , L. The
discrete Pareto-like variable H(k; a, β ):
\[ F_H(k; a, \beta) = 1 - (ak + 1)^{-\beta} \sim 1 - L k^{-\beta}, \qquad k = 1, 2, \cdots, \tag{4.34} \]
where $a = L^{-1/\beta} > 0$ is a scale parameter, has mean $E[H] = a^{-\beta} \zeta(\beta, 1/a)$ for β > 1 (the
generalised Riemann Zeta function ζ (·, ·) can be evaluated to chosen precision). Unfortunately
E[H] can fail to match µP by a large factor. To broaden the family whilst respecting
the power-law tail and/or bodies, we define the mixture distribution
$F_P(k; p, a, \beta) = p F_H(k; a_2, \gamma) + (1 - p) F_H(k; a, \beta)$. For fixed γ > 2 (finite variance) and a2 > 0, the mixture
parameter p ∈ [0, 1] allows the mean µP to be independently matched. For AUCK-d1 the
fitting procedure yields:
(λF , λA , c) = (63, 8, 0.15) (4.35)
(p, a2 , γ, a, β ) = (0.0236, 0.005, 3, 0.2711, 1.3510) (4.36)
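Matching the mean with the mixture amounts to solving a linear equation for p. The sketch
below is a hedged illustration: the generalised Zeta function is approximated by a direct sum
plus an Euler-Maclaurin tail correction (an implementation choice made here, any library
Hurwitz zeta would do), and all function names are hypothetical.

```python
def hurwitz_zeta(s, q, n_terms=1000):
    """Approximate sum_{k>=0} (k + q)**(-s) for s > 1, q > 0."""
    head = sum((k + q) ** (-s) for k in range(n_terms))
    x = n_terms + q
    # Euler-Maclaurin estimate of the remaining terms k >= n_terms.
    tail = x ** (1 - s) / (s - 1) + 0.5 * x ** (-s) + s * x ** (-s - 1) / 12
    return head + tail


def pareto_mean(a, beta):
    """Mean of the discrete Pareto-like variable H: a**(-beta) * zeta(beta, 1/a)."""
    return a ** (-beta) * hurwitz_zeta(beta, 1.0 / a)


def mixture_weight(mu_p, a, beta, a2, gamma):
    """Weight p such that p*E[H(a2, gamma)] + (1 - p)*E[H(a, beta)] equals mu_p."""
    m_tail = pareto_mean(a, beta)
    m_body = pareto_mean(a2, gamma)
    p = (m_tail - mu_p) / (m_tail - m_body)
    if not 0.0 <= p <= 1.0:
        raise ValueError("target mean lies outside the range of the two components")
    return p
```

A valid p exists only when the target mean lies between the means of the two components, which
is why the function raises an error otherwise.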
Finally, c can be tuned to fit the LD over scales below the LRD. Alternatively, a mean-
ingful value of c can be obtained by packet weighting as for λA above. The flows with the
largest packet volume, as they also have higher average dispersion (see figure 4.1(c)), act
disproportionately to decrease the effective c value. This illustrates a more general point.
The detailed parameter fitting procedures above show that meaningful values can be given
to the (meaningful) parameters, thus completing the physical validation of the model. For
some parameters however, notably c and λA , this can be computationally intensive. How-
ever, faster methods more akin to ‘fitting’ could be used for more routine application of the
model.
In figure 4.7(b) the fit to UNC-a1 is not quite as good, although the main features are re-
produced, in particular the knee position prediction is satisfactory. Much of the discrepancy
is due to the more complex form of FP . To see this, we also plot a ‘semi-model’ fit where the
empirical distribution of P is used instead of the fitted model distribution. The improvement
reveals that the body of the distribution of P plays an important role in the shape of the ‘ap-
proach’ to the LRD asymptote. Indeed, we have observed that in many cases the observed
‘LRD’ can be dominated by the shape of FP at ‘medium’ scales, resulting in estimates of the
LRD exponent α which are very misleading. To illustrate the relevance of equation (4.30),
in the lower part of the figure we show a semi-experimental LD where the empirical distri-
bution has been truncated at the 90th percentile, rendering the data short-range dependent.
The LD then saturates at a value (dashed line) which agrees well with equation (4.30).
Finally figure 4.7(c) shows the result for the high rate Abilene trace. As the fit is poorer,
we show only the semi-model fit using the empirical distribution for P. We see that despite
eliminating mismatches in the shape of P, the model fails to account for some of the vari-
ability at medium scales (also reported in [182] for other OC48 traces). Understanding the
reasons for this requires a return to the data as well as an enhancement to the model.
4.4.2 Elephants, mice, and a multiclass cluster model
The term ‘elephants and mice’ has become common parlance. It refers to the fact that
often a small proportion of flows, the ‘elephants’, have a disproportionate impact over the
more numerous ‘mice’. Typically this distinction is made in terms of flow volume (bytes),
but it can also be applied to other quantities. The heavy tailed modelling for P respects
this idea, and the results for the Auckland and UNC traces show that the BLPP model
is capable of naturally modelling both elephants and mice within a single model class.
However, the concept can, and should, also be applied to the orthogonal dimension of traffic
rate (see [159]). An important reason for this is that what constitutes a ‘large impact’ is
scale dependent. Only a small number of packets from volume-elephant flows intersect a
given small interval, so their contribution will be negligible compared to that of volume-
mice. Instead, flows with very high rate, rate-elephants, would make themselves felt at such
small scales. On the other hand at large scales localised high rates are irrelevant, and the
contribution of volume-elephants is significant.
[Figure 4.8: three density plots, panels (a), (b) and (c), with axes log(R) (horizontal) and log(P) (vertical).]
Figure 4.8: Flow and packet density in Abilene. (a) Flow density plot over (R(i), P(i)),
(b) Packet density plot (flow density weighted by number of packets), (c)
Coefficient of variation per flow.
Although we noted in section 4.2 that flow rates vary widely, in the BLPP model they
share a deterministic value λA . This was acceptable as a single value of λA could be found
which represented well the range seen in the high density portions of figures 4.1(a) and (b).
This would not be the case if rate-elephants and rate-mice were present. A cluster model
incorporating two distinct classes would then be needed in order to successfully describe
behaviour at all scales. To calculate the spectrum of a cluster model like BLPP but where
the parameters can fall into two distinct classes: ‘E’ with rate $\lambda_A^E$, shape $c^E$ and flow volume
distribution $F_P^E$, and ‘M’ with parameters $\lambda_A^M$, $c^M$ and $F_P^M$, we proceed as follows. Let B be a
Bernoulli random variable (independent of P etc.) taking value ‘E’ with probability q, else
‘M’. Consider a cluster process where for each flow an independent copy of B determines its
class. By a well known splitting property of Poisson processes (see theorem 2.3.4 page 32),
the set of seeds of clusters of type ‘E’ (resp. ‘M’) is also a Poisson process with rate $\lambda_F^E = q\lambda_F$
(resp. $\lambda_F^M = (1 - q)\lambda_F$). These two new processes, which each have constant rate, shape and
flow volume distribution, are independent BLPP processes. Thus the spectrum ΓX of the
‘multiclass’ cluster model is just the weighted sum of two spectra of BLPP type. This
construction can easily be extended to a countable number of classes.
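Written out, and assuming each class spectrum takes the form of equation (4.7) with its own
parameters, the two-class spectrum would read as follows. The class superscript u and the
per-class symbols $\mu_P^u$, $b^u$ and $S_G^u$ are notation introduced here for illustration.

```latex
\Gamma_X(\nu) = \sum_{u \in \{E, M\}} \lambda_F^{u}
  \left[ \frac{\mu_P^{u}}{\lambda_A^{u}}\, \Gamma_{GR}(\nu; b^{u}, c^{u})
         + S_G^{u}(\omega) + S_G^{u}(-\omega) \right],
\qquad \lambda_F^{E} = q\lambda_F, \quad \lambda_F^{M} = (1 - q)\lambda_F .
```

The sum follows because the two classes are independent, so their spectra add.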
With these additional tools at our disposal, we return to the Abilene trace with the flow
density plot of figure 4.8(a). It tells a similar story to that of figure 4.1(a), albeit with a
shift to higher rate (note that the diagonal boundary across the top is an edge effect due
to the short duration of the trace). However, when we move to the packet density plot
of figure 4.8(b) we see a striking change in the centre of mass which is not found in the
AUCK traces, where the epicentres of ‘packet’ density and flow density coincide (compare
figures 4.1(a) and (b)). The location in (R, P) space of this high density region represents
an empirical definition of ‘elephant’ which is not tied to rate or packet volume alone. It
is characterised by a very small proportion of flows containing a high proportion of to-
tal packets, with a higher average rate and higher average dispersion (lower c values), as
seen from figure 4.8(c). Thus the Abilene trace contains very strong, bursty, and high rate
volume-elephants, and yet by the argument above, the volume-mice must still be impor-
tant for small enough scale, suggesting that a multiclass model may be essential for a full
description of this data.
In future work we will examine the usefulness of the dual class cluster model to explain
the form of the wavelet spectrum shown in figure 4.7(c) (similar spectra have been observed
in OC-48 commercial backbone links [182]). Alternatives to Gamma renewal models will
also be investigated to model more extreme in-flow burstiness. Although the number of
parameters increases when moving to multiclass models, it may be necessary to capture
important network features. Network traffic is complex, and cannot be reproduced accu-
rately, nor meaningfully understood, with just 3 or 4 parameters. As the Abilene trace
is a very recent one and is from a large backbone link, these complexities are exciting to
explore since in many ways they constitute a taste of the future of traffic. However, as
networks evolve, models may have to adapt and a BLPP model might not be universally
applicable.
4.5 Towards understanding traffic evolution
In this section we examine in more detail the nature of the BLPP model as a function of
parameters, and illustrate its use as a tool to speculate on the future shape of traffic. For
convenience we recall that for large j the LD tends to $\log_2(c_f C) + \alpha j$, or
\[ \log_2\big( 2\lambda_F L B(\beta)\, C(2 - \beta) \big) + (2 - \beta)(j + \log_2 \lambda_A). \tag{4.37} \]
The flow arrival parameter λF .
The role of λF is to vary the number of flows, which, through equation (4.7), can be seen
as an i.i.d. superposition leaving the form of the second order structure invariant. The magnitude
of second order dependencies relative to the mean decreases as $(\lambda_F \mu_P)^{-1/2}$, so this
result is not in contradiction with the well known weak convergence of such a superposition
to a Poisson process [46, p.285]. In traffic engineering this relative decrease of variability
is known as statistical multiplexing gain and is a standard yet powerful argument for using
links with higher capacity to enable more flows to mix together, effectively lowering vari-
ability, even for LRD traffic. This argument follows ‘open loop’ model reasoning, where
network feedback is weak. This however is currently valid for backbone links, as network
utilisations are low, and are likely to remain so.
The flow structure parameters λA and c.
Since 1/λA is a scale parameter, increasing λA results simply in translating the wavelet spectrum
toward smaller scales. This can be seen explicitly in the expressions for the transition
scales $j^{**}_{PGR}$ and $j^*_{GR}$, and in (4.37) above. Increasing λA also obviously scales back flow
durations proportionally. At a fixed scale of observation, say at the sampling rate of a particular
measurement infrastructure, one would see the traffic burstiness increase and become de-
cidedly less Poisson as both the in-flow burstiness and scaling behaviour translate to smaller
scale. In network terms, increased λA could correspond to the same traffic passing through
faster access networks before reaching the measured link. This is in agreement with fig-
ure 3.4 (page 48) which shows that the onset of LRD happens at a larger time scale for the
Melbourne ISP traces than for the Abilene trace. One can indeed infer that the customers of
the Melbourne ISP access the Internet through 56kbps modem connections, resulting in a
small λA , while the Abilene network is accessed from high bandwidth university links, with
larger λA .
Equation (4.37) is independent of c. Decreasing c results mainly in an increase in burstiness
at scales below LRD through the plateau height λX /c, and an increase in the pseudo
slope at octaves below $j^*_{GR}$. It also results in a monotonic movement, of approximately the
same speed, of both $j^*_{GR}$ and $j^{**}_{PGR}$ to higher scales. Increased flow burstiness could arise
through lower utilisations on network links, resulting in less queueing and therefore less
traffic smoothing, and also through more aggressive TCP flow control.
The flow volume parameters µP , and (β , L).
We assume that these three can be varied independently, although this can never be entirely
realised in a parametric family. At scales below $j^{**}_{PGR}$ the tail parameters (β , L) have no
impact. The plateau onset scale $j^*_{GR}$ is entirely independent of P, and µP enters only as a
variance factor magnifying the burstiness at scales below $j^{**}_{PGR}$ (thus scaling up the
pseudo-slope). At the other extreme, the LRD is unaffected by µP but strongly influenced by the tail
parameters: the asymptotic line moves up when the tail is made heavier either by increasing
L or by decreasing β . The onset scale $j^{**}_{PGR}$ is the result of competing effects. It is pushed up
when increased µP increases short-range burstiness, grows to a limiting value with increasing
β , but decreases with increasing L. In terms of networks, a smaller β corresponds to an
increased spread of file sizes, whereas L and µP trade off the proportion of ‘small’ versus
‘large’ files.
The parameter dependencies above can be combined according to possible future traffic
scenarios. For example, assume that increased access link rates promote a proportional
increase in network usage according to: λF ↦ ΛλF, λA ↦ ΛλA, and consider the question,
will traffic become more or less bursty? Clearly the answer must be time scale dependent. If
observing at a scale which is in the range [$j^{*}_{GR}$, $j^{**}_{PGR}$] both before and after the increase, then
the multiplexing effect due to λF will apply, reducing (relative) burstiness. At scales above
$j^{**}_{PGR}$ however the increase in λA largely cancels this out, and in addition the LRD invades
lower scales. If the more generous access rates also encourage greater transfer volumes:
µP ↦ ΛµP, then λX ↦ Λ²λX and the multiplexing effect will win out.
Care must be taken when one moves the scale of observation as parameters vary, such as
when studying packet inter-arrivals. There the characteristic timescale, 1/λX = 1/(λF µP),
shrinks with increased flow rate or volume. Since $j^{*}_{GR}$ is invariant with respect to each
of these, as Λ increases the point of observation in fact moves towards the point process
limit of λX, regardless of the actual change(s) in traffic structure. Indeed, if smaller inter-arrivals
occur purely because of greater µP, then absolute burstiness has in fact increased
at scales below $j^{**}_{PGR}$, whereas the change in perspective might suggest that the traffic had
become more Poisson-like. At such small scales one should also be aware of the physical
limitations of the point process model, which breaks down when packet sizes are reached.
At [OC48,OC3] speeds (assuming a large 1500 byte packet), the model breaks down at
around [5, 77]µs, or j = [−15, −11].
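As a rough check of the time scales quoted above, the following Python sketch computes the time needed to transmit one 1500 byte packet at each link speed. The nominal line rates (2488.32 Mbit/s for OC48, 155.52 Mbit/s for OC3) are assumptions introduced here, and framing overhead is ignored, so the figures are only indicative.

```python
# Time to serialise one packet on a link: below this time scale a packet
# can no longer be treated as a point with zero duration.
PACKET_BYTES = 1500  # large packet, as assumed in the text

# Nominal line rates in bit/s (assumed values; framing overhead ignored).
LINE_RATES = {"OC48": 2488.32e6, "OC3": 155.52e6}


def serialisation_time(packet_bytes, rate_bps):
    """Seconds needed to put packet_bytes on a link of rate_bps."""
    return packet_bytes * 8 / rate_bps


times_us = {name: 1e6 * serialisation_time(PACKET_BYTES, rate)
            for name, rate in LINE_RATES.items()}

for name, t in times_us.items():
    print(f"{name}: {t:.1f} microseconds")
```

Under these assumptions the two values round to the 5 µs and 77 µs given in the text.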
To illustrate this point further, figure 4.9 shows the LD and an average periodogram of
the very fine scale regime of the CAIDA-b1 trace. The two can be linked by reinterpreting
equation (2.69) as a spectral estimator and setting ν = 1/a. The Fourier analysis reveals pe-
riodicities in the packet arrival process at scales j ≤ 10 due to physical network effects, such
as back to back packets on upstream bottleneck links, which translate to shaped, roughly
periodic traffic on the observed link. The wavelet analysis averages these out and leads
to a roughly flat spectrum consistent with a Poisson process. The model presented in this
chapter cannot reproduce this behaviour since it does not include the notion of packet size.
[Figure 4.9 plot. Vertical axis: log2 Variance(d_j); horizontal axis: j = log2(a), from −15 to 10; the top axis gives the corresponding time scale, from 30.5 µs at j = −15 to 32 s at j = 5. Curves: Averaged Periodogram, LD, Poisson Spectrum.]
Figure 4.9: Periodicities at small scales
Further details on this topic will be given in chapter 7.
4.6 Higher order statistics
Up to this point we have only considered second order moments when comparing the data
and the BLPP model. This is due to the fact that we were mainly concerned with the LRD
character of the data, which is a second order property. In this section we study higher order
statistics with the following two aims: first check that the model satisfactorily captures
higher order statistics of the data, and then investigate the small scale behaviour of Internet
traffic.
4.6.1 Model fit
Recall from chapter 2 that we use q-LDs defined in equation (2.72) to study higher order
statistics. Similarly to what was observed in the case q = 2, the q-LDs exhibit a biscaling
behaviour, i.e. two straight lines separated by a knee. In order to compare the statistics
of the data and the fitted BLPP, we measure the local slopes αq in the q-LDs, at both fine
scales (FS), i.e. below the knee, and coarse scales (CS), i.e. above the knee. We then form
a Linear Multiscale Diagram (LMD), defined in equation (2.73), for each range of scales.
Figure 4.10 shows the LMDs for both data and fitted model at coarse scales and fine
scales. Given the size of the confidence intervals, there is no statistically significant difference between
the LMDs of the model and the data at CS. At FS, the difference between the LMDs tends to
increase with q, but the absolute difference remains small. Moreover, it is notoriously
difficult to estimate higher order statistics in empirical data due to local non-stationarities.
[Figure 4.10 plot. Vertical axis: hq, from −0.5 to −0.05; horizontal axis: q, from 0 to 6.]
Figure 4.10: Multiscaling comparison between AUCK-d1 (grey) and the fitted BLPP
model (black). Dotted lines represent coarse scale behaviour while solid lines
represent fine scale behaviour.
We conclude that the BLPP captures the higher order statistics of the empirical packet arrival
process. Given the fact that we only used the autocorrelation of X to fit the parameters, this
means that the BLPP model really captures the ‘physics’ of the data.
4.6.2 Small scale behaviour: multifractal or not?
While we simply remarked that the LMDs of the data and the BLPP model were fairly
similar in the previous section, we now investigate their actual values and their statistical
meaning. Our aim is to determine whether a multifractal description of the traffic makes
sense. There is in fact a fair body of literature on multifractal modelling of Internet traffic,
which we do not attempt to summarize here. Useful references can be found for instance
in [150] and [167]. We point out that studying the multifractal nature of a process is an
arduous task, and we will show how our wavelet estimator, arguably the best of its class,
can be fooled.
Recall from section 2.4.3 that we basically have to check whether the points of the LMD
are horizontally aligned (monoscaling behaviour) or not (multiscaling behaviour). The size
of the confidence intervals in figure 4.10 leads us to the conclusion that the data exhibits a
monoscaling behaviour at CS, consistent with LRD. At FS, one could conclude a very weak
multiscaling, or a monoscaling behaviour.
Going beyond the close agreement between the data and the model, which is very sat-
isfying in itself, the point we wish to make here is something quite different. The BLPP
model is not multifractal. Nonetheless it reproduces a non-trivial multiscaling behaviour
(at least to the same extent as the data). It therefore provides another example of pseudo
scaling, in the same spirit as the transition effect between the two asymptotic values of the
Gamma renewal LD illustrated in figure 4.3. From section 2.3.2, the BLPP is an infinitely
divisible point process. However, the exact consequences of this property are still unclear.
From the above, we see that pseudo scaling is responsible for the empirical scaling
within a model which has a strong physical foundation. If we accept this model as preferable
on such grounds, which we do here, then we are led to conclude that the evidence for the
multiscaling itself (whether it be monofractal or not) is misleading. In fact, this shows
effectively a lack of power on the part of the statistical procedure since it cannot distinguish
between the signatures of a true multifractal scaling and a pseudo scaling process. In the
hypothesis test language this corresponds to an ‘error of type II’ 2 .
4.7 Conclusion
Our analysis of the structure of TCP packet arrivals in Internet traffic led to several significant
conclusions. It is based on empirical findings detailed in chapter 3 where we showed
(at least in the context of lightly loaded links) that both the flow arrival process and de-
pendencies between flows have negligible impact, as do higher layer mechanisms grouping
flows such as web browsing sessions. The key element was found to be the concept of
independence between flows. Using wavelet analysis, the second order statistics of packet
arrivals were shown to be determined by in-flow packet arrival burstiness at small scales,
and heavy tailed flow volume at large scale. The scaling-like behaviour at small scales was
clearly linked to the burstiness within flows.
In this chapter, a stationary Poisson cluster process class was proposed as an ideal model
capturing these features. Poisson arrival instants with rate λF denote the arrival of flows.
Packets within flows follow finite Gamma Renewal processes with rate λA and shape c,
flow volume being given by a heavy tailed variable P with infinite variance. The model
has many advantages including a known spectrum, positive marginals, simple synthesis,
and a minimum number of parameters each with direct physical interpretation in terms of
network traffic. Its spectrum can be written as a sum of a scaled spectrum of a renewal
process controlling small scale behaviour, and a term controlling asymptotic large scale
behaviour. A detailed description was given of the behaviour of the spectrum, and the
wavelet spectrum, as a function of parameters, and the corresponding interpretation for
networks. The model offers the possibility of a new, and very simple, alternative explanation
for empirical evidence of multiscaling behaviour at small scales, as a transitional effect over
a narrow range of scales of simple in-flow burstiness, suggesting that such traffic is not truly
multifractal over these time scales. An expression for the onset scale of LRD was given,
analysed as a function of network parameters, and found to be accurate. The model is highly
structural, rather than black box, enabling its use as an investigative tool for the evolution
of traffic properties.
2 More details on the topic can be found in [173].
The model was verified against large quantities of accurate Internet data, and was found
to reproduce the second order statistics well. The parameter fitting was described in detail.
It led to meaningful parameter values, and visually convincing model sample paths, con-
firming that the model actually captures much of the network ‘physics’. Some departures
from the model were found for a recent, very high bit rate traffic trace. Further data analy-
sis revealed some of the underlying reasons, and a multi-class version of the model was
described as a possible means to account for them.
It was shown how the model can naturally incorporate the notion of elephant and mice
flows without the need to explicitly define them and treat them separately. It was also used
to illustrate how a packet volume based definition of elephants is not sufficient, and how
‘rate-elephants’ could be accounted for in the model, should they exist.
Chapter 5
Inverting sampled traffic
5.1 Introduction
The findings presented in this thesis up to this point concern mainly question (i), and can be
summarized by saying that the Bartlett-Lewis point process (BLPP) is a very good model of
Internet packet arrivals (chapter 4), with strong empirical backing (chapter 3). We now turn
to question (ii) and study in this chapter how to sample packet traffic at a router interface.
5.1.1 Motivation
Network traffic measurement is essential for traffic engineering (e.g. link upgrades or traffic
re-routing) and traffic accounting (e.g. usage based pricing). Routers offer tools such as
Cisco’s Netflow [33] or Inmon’s sFlow [91] that give information about the flows of packets
that traverse them. However the generation of detailed traffic statistics does not scale well
with link speed. This is why packet sampling techniques are increasingly being used in
routers [34] to export the statistics of a portion of the traffic only. The problem that then
immediately arises is how to deal with such partial measurements. One can think of this as
a two step process: first recover the statistics of the full traffic from the retained sampled
data through some inversion procedure, and second, take appropriate decisions based on
the characteristics of the full traffic. While the second step is left to traffic engineers and
managers, the first corresponds to an interesting and important task which has only recently
been attracting attention. Our aim in this chapter is to provide theoretical results for the
problem of recovering statistics beyond first order from sampled traffic, and to see how
successfully such results can be applied in practice with real traffic. We focus mainly on
two statistics: the spectral density of the packet arrival process, and the distribution of the
number of packets per flow. This implies that we limit ourselves to portions of traffic that
can be considered stationary. It also means that we do not try to recover sample values, such
as actual number of packets in flows on the measured link, but rather the distribution from
which these samples were drawn.

Traffic statistics commonly considered vary widely depending on user requirements and
the capabilities of the collection mechanism. In this chapter we first place ourselves in a
general framework whereby any raw statistics of the sampled data that we may need are
considered to be available, as we focus primarily on the feasibility of the inversion problem.
In some cases these statistics may not be readily available in today’s routers, or they may be
close to impossible to provide because of real-time constraints. For example few routers can
export packet level statistics such as sizes and timestamps of individual packets. In addition,
currently high-end routers use switched instead of shared backplanes, and therefore not all
packets are seen at any single point of the backplane [90]. Purpose built link monitoring
boxes however, or dedicated passive measurement infrastructures supporting offline studies
based on sampled traffic, will be capable of much finer grained storage and processing.
5.1.2 Terminology
In accordance with the rest of this thesis, we are mainly interested in the point process of
packet arrival times and will not be concerned with packet sizes. In point process theory,
the action of ‘sampling’ points along the real line is called thinning, and we will use these
two terms interchangeably. From a theoretical perspective, one is interested in recovering
as much information as possible about the original point process by observing a thinned
version of it. We will use ‘full’ or ‘original’ to refer to the non-sampled packet traffic, and
‘sampled’ or ‘thinned’ to refer to the sampled traffic. The process of thinning the packet
process is to be understood in general terms as the action of only recording part of the total
traffic according to a certain rule.
In this chapter we will study two different sampling rules: packet thinning, which acts
directly on individual packets and is ignorant of flows, and flow thinning, where entire flows
of packets are retained or discarded at once. Independent and identically distributed (i.i.d.)
packet thinning consists of, for each packet in an independent manner, retaining the packet
with probability q or discarding it with probability 1 − q. Similarly, i.i.d. flow thinning
consists of, for each flow independently, leaving the flow untouched with probability q or
removing it entirely with probability 1 − q.
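The two sampling rules can be sketched in a few lines of Python. This is only an illustration under assumed conventions: a trace is taken to be a list of (timestamp, flow_id) pairs, and the function names `packet_thinning`, `flow_thinning` and `flow_sizes` are hypothetical, not part of any router implementation.

```python
import random


def packet_thinning(packets, q, rng):
    """Keep each packet independently with probability q (flows ignored)."""
    return [pkt for pkt in packets if rng.random() < q]


def flow_thinning(packets, q, rng):
    """Keep or drop whole flows: one independent draw per flow id."""
    keep = {}
    out = []
    for t, flow_id in packets:
        if flow_id not in keep:
            keep[flow_id] = rng.random() < q
        if keep[flow_id]:
            out.append((t, flow_id))
    return out


def flow_sizes(packets):
    """Number of packets seen for each flow id."""
    sizes = {}
    for _, flow_id in packets:
        sizes[flow_id] = sizes.get(flow_id, 0) + 1
    return sizes


rng = random.Random(1)
# Toy trace (made-up data): 200 flows of 5 packets each.
trace = sorted((f + 0.01 * k, f) for f in range(200) for k in range(5))
sampled_pkts = packet_thinning(trace, 0.5, rng)
sampled_flows = flow_thinning(trace, 0.5, rng)
```

In this toy trace every flow retained by flow thinning still has its 5 packets, whereas packet thinning leaves flows with anywhere between 1 and 5 packets and makes some flows disappear altogether.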
We use a hierarchy of descriptors to study the statistics of packet traffic, based on con-
cepts introduced in chapter 3. We refer to packet level when describing statistics which do
not use or refer to any imposed structure or detailed modelling assumptions. Examples of
packet level statistics are the mean packet arrival rate or the spectral density of the packet
arrival process. Flow level is concerned with statistics arising from the grouping of packets
into flows, such as the distribution of the number of packets per flow.
Finally, although of less importance here, we use in-flow level to refer to statistics de-
scribing the placement of packets within a flow, such as the mean arrival rate of packets
belonging to a given flow.
5.1.3 Previous work
In the early 1990’s, data collection on the T1 NSFNET backbone showed that information
was lost during peak periods. Sampling methods were therefore advocated in [36] to reduce
the load on the measurement infrastructure. The aim of this work was to estimate the packet
size distribution from the sizes of sampled packets. Different sampling strategies were
compared: deterministically taking one in every N packets (systematic sampling), taking on
average one in N packets (simple random sampling) or taking one packet in every bucket
of size N (stratified random sampling). In [50] an adaptive sampling rate was proposed to
optimize the resource allocation. An adaptive sampling technique was also used in [32]
where a bound on the sampling error for traffic load measurement was studied. Another
study of sampling techniques can be found in [31] where the mean number of packets and
the packet size distribution are estimated from a sampling where the number of skipped
packets is a Poisson random variable. Sampling strategies were also used in [88] for the
detection of denial of service attacks. In [52], Duffield et al. provided estimates from
sampled traffic of the mean number of bytes or packets of a set of packets with common
properties (e.g. protocol, IP addresses, Autonomous System,...). Because of the heavy tailed
distribution of file sizes, a particular kind of sampling, known as stratified sampling [37], is
used to reduce the variance of the estimators. It basically consists in sampling ‘more’ in the
heavy tail of the distribution and gives different weights to different samples.
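The three packet sampling strategies compared in [36] can be sketched as follows. This is our reading of the one-line descriptions given above, not the implementation used in that work; the function names are hypothetical and `packets` is assumed to be a list in arrival order.

```python
import random


def systematic(packets, n):
    """Deterministically take one in every n packets."""
    return packets[::n]


def simple_random(packets, n, rng):
    """Take each packet with probability 1/n, i.e. one in n on average."""
    return [p for p in packets if rng.random() < 1.0 / n]


def stratified_random(packets, n, rng):
    """Take one packet, chosen uniformly, from each bucket of n packets."""
    out = []
    for start in range(0, len(packets), n):
        bucket = packets[start:start + n]
        out.append(rng.choice(bucket))
    return out


rng = random.Random(0)
packets = list(range(1000))          # packet indices stand in for packets
sys_sample = systematic(packets, 10)
rnd_sample = simple_random(packets, 10, rng)
str_sample = stratified_random(packets, 10, rng)
```

Systematic and stratified sampling return exactly one packet per bucket of N, while simple random sampling only does so on average.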
Each of the aforementioned studies was concerned with a packet level description of
network traffic in the sense described above. Much closer in spirit to this chapter is the work
presented in [131], where it is shown how certain first-order IP flow level statistics can be
recovered from sampled traffic. In particular, an estimator (and its all important variance)
of the mean number of packets per flow is given. The estimation is not blind and makes
strong use of additional information contained in the TCP packet header. More specifically
the recovery scheme requires the knowledge of the number of original flows, and as it is
assumed that this is not measured directly, it must be inferred separately. It is shown how
this can be achieved by looking for TCP SYN packets in the case of ‘ideal’ TCP flows
which all begin with a SYN packet and have infinite timeouts. Recently a technique to
approximate the full distribution of number of packets per flow was proposed in [51]. This
method is based on an expectation maximization technique and gives a smooth estimate of
the original distribution. These results will be discussed further in section 5.3.2.
These previous studies are only concerned with the distribution of number of packets
per flow. We are not aware of any previous study on recovering the spectral density of the
packet arrival process from sampled traffic.
5.1.4 Outline and main contributions
We are interested in inverting sampled traffic in a statistical sense, focusing mainly on two
quantities: the spectral density (packet level), and the distribution of the number of packets
per flow (flow level). In section 5.2 we address this problem from a theoretical perspective.
We first consider the case of packet sampling since it is the method currently implemented
in routers. We show in particular how the theory of point processes can help recover the
original spectral density from the thinned data. We also propose a theoretical scheme to
recover the full distribution of the number of packets per flow. In this respect we extend the
work of [131] where only the mean number of packets was recovered. We then present an
alternative sampling technique named flow sampling which is (almost) as computationally
feasible as packet sampling, but has a more straightforward inversion mechanism both at the
packet and flow level. The inversion methods require different assumptions on the original
traffic depending on the sampling method; these are carefully detailed and justified.
The practical application of the two methods to real traffic and the limitations of their
numerical evaluation are given in section 5.3. Our main contribution is the demonstration
of the fact that inversion is essentially impossible in practice in the case of packet thinning
for any useful thinning probability, whereas flow thinning can be usefully inverted no mat-
ter how high this probability becomes, provided enough traffic is sampled. Section 5.4 is
concerned with the application of the sampling techniques to the BLPP model introduced in
chapter 4. It is shown how the parameters of the model can be fitted from the thinned data
obtained from both sampling techniques in theory, though not always in practice. In section 5.5 we
summarize our findings and discuss the use of different sampling techniques for different
tasks. We conclude in section 5.6.
5.2 Inverting sampling: theory
In this section we study two different sampling techniques, which we call i.i.d. packet sam-
pling and i.i.d. flow sampling. We present theoretical inversion methods to recover the spec-
tral density and the distribution of the number of packets per flow from the observed thinned
traffic. All quantities corresponding to thinned traffic will be written with the superscript (q) ,
where q is the retention probability defined below.
5.2.1 Packet sampling
In general terms, the i.i.d. packet thinning of a stationary point process X with rate λ consists
in independently keeping each point of X with probability q or rejecting it with probability
1 − q to form a new point process $X^{(q)}$ with rate $\lambda^{(q)} = q\lambda$.
Packet level
The original rate can be recovered from that of the thinned process in a straightforward way
via
$$\lambda = \frac{1}{q}\,\lambda^{(q)}. \tag{5.1}$$
A much less intuitive result links the spectral densities of $X$ and $X^{(q)}$. From [25, 46], for
any simple, locally finite and second order stationary point process X with spectral density
$\Gamma_X(\omega)$, the spectral density of $X^{(q)}$ reads
$$\Gamma_X^{(q)}(\omega) = q^2\,\Gamma_X(\omega) + q(1-q)\lambda. \tag{5.2}$$
Proof. See section 2.3.5 page 31.
From equations (5.2) and (5.1) the spectrum $\Gamma_X(\omega)$ of the original process can therefore
be recovered and reads
$$\Gamma_X(\omega) = \frac{1}{q^2}\left[\Gamma_X^{(q)}(\omega) - (1-q)\lambda^{(q)}\right]. \tag{5.3}$$
This powerful result gives readily accessible and very useful information about the original
process without making any assumptions on its detailed structure. In particular no
modelling assumptions are required beyond stationarity.
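The packet level inversion amounts to a rescaling and a constant shift, as the following Python sketch shows. It assumes that estimates of the thinned rate and of the thinned spectrum on some frequency grid are already available; the function names and the numerical values are made up for illustration, and no spectral estimation is performed here.

```python
def invert_rate(rate_thinned, q):
    """Equation (5.1): original rate from the thinned rate."""
    return rate_thinned / q


def thin_spectrum(spectrum, rate, q):
    """Equation (5.2): spectrum of the thinned process (forward direction)."""
    return [q**2 * g + q * (1.0 - q) * rate for g in spectrum]


def invert_spectrum(spectrum_thinned, rate_thinned, q):
    """Equation (5.3): original spectrum from thinned spectrum and rate."""
    return [(g - (1.0 - q) * rate_thinned) / q**2 for g in spectrum_thinned]


# Round trip on a made-up spectrum sampled at four frequencies.
q, rate = 0.1, 1000.0
original = [5000.0, 2500.0, 1200.0, 1000.0]
thinned = thin_spectrum(original, rate, q)
recovered = invert_spectrum(thinned, q * rate, q)
```

Note that the last entry of `original` equals the rate, the value taken by the spectrum of a Poisson process in (5.2); its thinned counterpart equals the thinned rate qλ, as expected for a thinned Poisson process. With real data the division by q² also scales up any estimation error in the thinned spectrum.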
Flow level
Let us assume that the original process is in fact the superposition of identically distributed
groups of points called clusters. In the traffic context these are packets grouped into flows.
Recall the notations of chapter 3: P is the discrete random variable describing the number
of points per cluster, with density pk = Pr(P = k), distribution FP , and finite mean µP . In
practice no flow of length 0 is observed and therefore p0 = 0. Let $P^{(q)}$ be the discrete random
variable describing the number of packets per flow after packet thinning, with density
$p_k^{(q)} = \Pr(P^{(q)} = k)$, and distribution $F_P^{(q)}$. In this subsection our aim is to recover the properties of
the marginal $F_P$ of the original flows from $F_P^{(q)}$. Since we look at the marginal only, there is
no need to assume independence between flows.
Conditioning on the number of packets in a given original flow, the probability $p_k^{(q)}$ of
having a flow of size k ≥ 0 after thinning reads
$$\begin{aligned}
p_k^{(q)} &= \sum_{j=k}^{\infty} \Pr\{k \text{ packets after thinning} \mid j \text{ packets before thinning}\}\, p_j \\
          &= \sum_{j=k}^{\infty} \binom{j}{k} q^k (1-q)^{j-k}\, p_j.
\end{aligned} \tag{5.4}$$
Equation (5.4) gives the densities of the thinned flows as a function of the densities of the
original flows. To invert this relation we use results on probability generating functions and
complex analysis.
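Before turning to the inversion, the forward relation (5.4) can be made concrete with a short Python sketch. It assumes a flow size density with finite support, stored as a list `p` with `p[j] = Pr(P = j)`; the function name and the toy density are ours.

```python
from math import comb


def thin_flow_size_pmf(p, q):
    """Equation (5.4): density of the flow size after i.i.d. packet thinning.

    p[j] is Pr(P = j) for j = 0..len(p)-1 (finite support assumed).
    Returns p_q with p_q[k] = Pr(P^(q) = k), including k = 0.
    """
    n = len(p)
    return [sum(comb(j, k) * q**k * (1 - q)**(j - k) * p[j]
                for j in range(k, n))
            for k in range(n)]


p = [0.0, 0.5, 0.3, 0.2]            # toy density with p0 = 0
p_q = thin_flow_size_pmf(p, q=0.5)  # approx. [0.35, 0.475, 0.15, 0.025]
```

In this toy case a flow loses all of its packets with probability 0.35 when q = 0.5. Such flows leave no trace in the sampled data, so the k = 0 term of (5.4) is not directly observable.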
Let us first introduce some notation. In the following $C(z, r)$, $D(z, r)$ and $\bar{D}(z, r)$ will
denote respectively the circle, the open disk and the closed disk with center z and radius r.
Denote by B the binomial random variable such that Pr(B = 0) = 1 − q and Pr(B = 1) = q.
Let $G_P(z)$, $G_P^{(q)}(z)$ and $G_B(z)$ be the probability generating functions of P, $P^{(q)}$ and B defined
respectively as
$$G_P(z) = \sum_{j=0}^{\infty} p_j z^j, \qquad G_P^{(q)}(z) = \sum_{j=0}^{\infty} p_j^{(q)} z^j,$$
and $G_B(z) = 1 - q + qz$. $G_P(z)$ and $G_P^{(q)}(z)$ are defined on the closed unit disk
$\bar{D}(0, 1) = D(0, 1) \cup C(0, 1)$, but if $F_P$ is heavy tailed they are only analytic on the open unit disk
$D(0, 1)$ due to a singularity at z = 1. $G_B(z)$ is an entire function (analytic for all z ∈ C).
By definition of i.i.d. packet thinning, $P^{(q)}$ can be expressed as a sum of P i.i.d. binomial
random variables, each distributed as B. From results on the generating function of a compound distribution the
following relation holds:
$$G_P^{(q)}(z) = G_P(G_B(z)) \quad \text{for } z \in \bar{D}(0, 1). \tag{5.5}$$
This equation is the transform domain version of equation (5.4). Since $G_B$ maps $\bar{D}(0, 1)$ onto
$\bar{D}(1 - q, q)$, one can obtain $G_P$ from equation (5.5) as
$$G_P(z) = G_P^{(q)}\!\left(\frac{z - (1 - q)}{q}\right) \quad \text{for } z \in \bar{D}(1 - q, q). \tag{5.6}$$
Now, as we see from equation (5.5), the probabilities p j that we wish to calculate can be
obtained by picking out the coefficients of a power series expansion of GP about the origin.
However, equation (5.6) only gives an inversion formula for GP for z ∈ D̄(1 − q, q), a closed
disk which lies within the unit circle and is centered at z0 = 1 − q (see the thick circle in
figure 5.1(a)). It does not give us GP over the full unit disk, nor an expansion about the
origin. We consider how to circumvent these difficulties in a moment.
Using standard results on generating functions, the mean number of packets per flow
can be recovered via
$$\mu_P = \left.\frac{dG_P}{dz}\right|_{z=1} = \frac{1}{q}\left.\frac{dG_P^{(q)}}{dz}\right|_{z=1} = \frac{\mu_P^{(q)}}{q}. \tag{5.7}$$
Let $F_P$ be a heavy tailed distribution such that
$$1 - F_P(x) \sim \frac{L}{x^{\alpha}} \quad \text{as } x \to +\infty, \tag{5.8}$$
where L > 0 and 1 < α < 2. From equations (5.8) and (5.6) one can show by using a
Tauberian theorem [23, p.333] that $F_P^{(q)}$ has tail behaviour
$$1 - F_P^{(q)}(x) \sim \frac{L^{(q)}}{x^{\alpha}} \quad \text{as } x \to +\infty, \tag{5.9}$$
where
$$L^{(q)} = q^{\alpha} L. \tag{5.10}$$
The thinned distribution for the number of packets per flow is therefore also heavy tailed
with the same index but reduced tail mass. In fact the Tauberian theorem used above is
even stronger and gives an equivalence between equations (5.8) and (5.9). This means that
if a heavy tail is observed in the thinned traffic, it must come from the original traffic,
and cannot have been created by the thinning process itself. From equation (5.10) one can
trivially invert the tail prefactor:
$$L = \frac{1}{q^{\alpha}}\,L^{(q)}. \tag{5.11}$$
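Equations (5.7) and (5.11) are simple rescalings, sketched below in Python. The function names and numerical values are illustrative assumptions. In particular `mean_thinned` stands for the mean of $P^{(q)}$ as defined through (5.4), that is with flows thinned to zero packets counted, and `L_thinned` and `alpha` are assumed to have been estimated elsewhere.

```python
def invert_mean(mean_thinned, q):
    """Equation (5.7): mean number of packets per flow before thinning."""
    return mean_thinned / q


def invert_tail_prefactor(L_thinned, q, alpha):
    """Equation (5.11): tail prefactor L from its thinned counterpart."""
    return L_thinned / q**alpha


# Made-up values: 1% packet sampling and tail index 1.5.
q, alpha = 0.01, 1.5
L = 2.0
L_thinned = q**alpha * L          # equation (5.10), about 0.002
L_recovered = invert_tail_prefactor(L_thinned, q, alpha)
mu_recovered = invert_mean(0.17, q)
```

With α = 1.5 a sampling probability of 1% reduces the tail prefactor by a factor of 1000, while the tail index itself is unchanged.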
We now present two different theoretical schemes to recover the original probability
densities.
Scheme 1: analytic continuation
Our aim is to construct a power series expansion of GP about the origin in order to recover
the p j via the expansion on the left in equation (5.5). In principle, since GP is analytic
in D(0, 1) and from equation (5.6) its values are known on D(1 − q, q) which lies inside
D(0, 1), GP is known on D(0, 1) through analytic continuation. The required expansion
about the origin can therefore be found. Carrying this through in practice however is not
straightforward.
We denote by z0 = 1−q the origin of the original analytic domain D0 = D(z0 , q). Within
D0 it is easy to show from equation (5.6) that the following power series expansion holds:
$$G_P(z) = \sum_{n=0}^{\infty} a_n^0 (z - z_0)^n, \quad z \in D_0, \tag{5.12}$$
where the coefficients obey
$$a_n^0 = \frac{p_n^{(q)}}{q^n} \tag{5.13}$$
and the radius of convergence is r0 = q.
[Figure 5.1 plot. Two panels in the complex plane, both axes from −1 to 1: (a) q = 0.6, showing the points z0 and z1; (b) q = 0.1, showing the points z0 to z5.]
Figure 5.1: Analytic continuation method for (a) q = 0.6, and (b) q = 0.1. The thick solid
dark circle represents C0 = C (z0 , q) and the thick dotted grey circle is the unit
circle C (0, 1). For q = 0.6 z1 can be chosen as the origin and an expansion
made there whereas for q = 0.1 a series of analytic continuations are required,
before a point, z5 , can be chosen as the origin.
The basic principle we employ is to choose a point z1 ∈ D0 and to expand GP as a power
series about it. The coefficients of this new series can then be obtained by comparing with
the series of equation (5.12) evaluated at z = z1, and are
$$a_j^1 = \sum_{n=j}^{\infty} \binom{n}{j}\, a_n^0\, (z_1 - z_0)^{n-j}. \tag{5.14}$$
Consider how this works for the simple case of q ∈ ]0.5, 1] where we are able to choose
z1 to be the origin, as illustrated in figure 5.1(a) for q = 0.6. Substituting $a_n^0 = p_n^{(q)}/q^n$ into
equation (5.14) and noting from equation (5.5) that $a_j^1 = p_j$ in this case, we have
$$p_j = \sum_{n=j}^{\infty} \binom{n}{j} \frac{(-1)^{n-j}}{q^n} (1-q)^{n-j}\, p_n^{(q)}, \tag{5.15}$$
which only converges for q ∈ ]0.5, 1] when FP is heavy tailed. An alternative way to derive
this inversion formula is to directly apply a combinatorial identity to invert equation (5.4).
The identity in question states [153, p.49], with no convergence criteria given, that
$B_k = \sum_{j=k}^{\infty} \binom{j}{k} A_j$ and $A_j = \sum_{k=j}^{\infty} \binom{k}{j} (-1)^{k+j} B_k$ are inverses. In the present context this identity
can only help us for q ∈ ]0.5, 1], a very mild degree of thinning.
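The inversion formula (5.15) can be checked on a toy example with the Python sketch below. It assumes a flow size density with finite support, so that all sums are finite and the convergence issue discussed above does not arise; exact rational arithmetic is used to keep rounding out of the picture. The function names and the toy density are ours.

```python
from fractions import Fraction
from math import comb


def thin(p, q):
    """Equation (5.4) for a finite-support density p (forward thinning)."""
    n = len(p)
    return [sum(comb(j, k) * q**k * (1 - q)**(j - k) * p[j]
                for j in range(k, n)) for k in range(n)]


def invert_packet_thinning(p_q, q):
    """Equation (5.15), with the sum stopped at the support of p_q."""
    n_max = len(p_q)
    return [sum(comb(n, j) * (-1)**(n - j) * (1 - q)**(n - j) * p_q[n] / q**n
                for n in range(j, n_max)) for j in range(n_max)]


q = Fraction(3, 5)
p = [Fraction(0), Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
p_q = thin(p, q)
recovered = invert_packet_thinning(p_q, q)   # equals p exactly
```

The alternating signs and the factor 1/qⁿ in (5.15) mean that, once `p_q` is replaced by estimates, the error on the nth term is multiplied by q⁻ⁿ, which hints at the numerical difficulties discussed later in this section.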
For q ∈ [0, 0.5] z1 cannot be chosen at the origin, and we adopt a recursive procedure
involving a sequence {zk}, k = 1, 2, ..., l, of points along the real axis obeying 1 > z0 >
z1 > ··· > zl = 0, zl being the origin itself (figure 5.1(b) illustrates the case where q = 0.1 and
l = 5). At the kth stage, zk will be chosen to lie inside the circle of convergence $C_{k-1}$ from
the previous stage, and GP will be expanded in a power series centered about zk, whose
coefficients $a_j^k$ will be obtained through those of the previous stage:
$$a_j^k = \sum_{n=j}^{\infty} \binom{n}{j}\, a_n^{k-1}\, (z_k - z_{k-1})^{n-j}. \tag{5.16}$$
Since zk lies inside the unit circle where we know GP is analytic, its circle of convergence
Ck will first encounter a singularity at z = 1, and so the corresponding radius of convergence
will be rk = 1 − zk . In this way, as the sequence {zk } marches towards the origin, the radii
of convergence increase monotonically to 1. In fact the zk can be chosen so that the origin
is approached geometrically: a minimum of $\lceil -\log_2(q) \rceil$ iterations is required. As before,
the coefficients of the final power series will be the desired densities, that is $p_j = a_j^l$.
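The recursion (5.16) can be sketched in Python as follows. This is an idealised illustration, not the numerical procedure evaluated later: the flow size density is assumed to have finite support, so GP is a polynomial and every re-expansion is exact (a single step would in fact suffice), and exact rational arithmetic is used. The step rule, which moves each new center 90% of the way towards the edge of the current circle of convergence, is an arbitrary choice of ours.

```python
from fractions import Fraction
from math import ceil, comb, log2


def thin(p, q):
    """Equation (5.4) for a finite-support density p (forward thinning)."""
    n = len(p)
    return [sum(comb(j, k) * q**k * (1 - q)**(j - k) * p[j]
                for j in range(k, n)) for k in range(n)]


def recentre(coeffs, shift):
    """Equation (5.16): re-expand about z_k, where shift = z_k - z_{k-1}."""
    n = len(coeffs)
    return [sum(comb(m, j) * coeffs[m] * shift**(m - j)
                for m in range(j, n)) for j in range(n)]


def continue_to_origin(p_q, q, step=Fraction(9, 10)):
    """March the expansion center from z0 = 1 - q down to the origin."""
    z = 1 - q
    coeffs = [p_q[n] / q**n for n in range(len(p_q))]   # equation (5.13)
    stages = 0
    while z > 0:
        radius = 1 - z                    # distance to the singularity at z = 1
        z_next = max(Fraction(0), z - step * radius)    # strictly inside C_k
        coeffs = recentre(coeffs, z_next - z)
        z, stages = z_next, stages + 1
    return coeffs, stages


q = Fraction(1, 10)
p = [Fraction(0), Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
recovered, stages = continue_to_origin(thin(p, q), q)
```

For q = 0.1 this step rule reaches the origin in 4 stages, which matches the lower bound ⌈−log2(q)⌉ = 4. For a heavy tailed density each stage would instead involve truncating an infinite series inside its circle of convergence.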
Scheme 2: Cauchy integral
A second theoretical scheme to recover the original p j is based on another important result
of complex analysis: the Cauchy integral formula, which for our particular problem reads
$$p_j = \frac{1}{2\pi i}\oint_S \frac{G_P(z)}{z^{j+1}}\,dz, \tag{5.17}$$
where S can be any closed contour containing the origin. Inversion methods based on
equation (5.17), including methods using inverse Fourier transforms and damping tech-
niques, are summarized in [1]. They work well when one can directly evaluate GP on a
contour including the origin1 . Methods have also been developed to remove the aliasing
terms caused by the unavoidable discretization of the integral in the numerical evaluation
of equation (5.17), both when FP is light tailed [45] and heavy tailed [155]. Note that for
q > 0.5, we can choose S = C0 and the Cauchy integral can be directly evaluated along
this contour. However for q < 0.5 we first have to infer the values of GP on some suitable
contour S and then use equation (5.17) to recover the p j .
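When the generating function can be evaluated directly on a contour around the origin, the Cauchy integral can be discretised with the trapezoidal rule, as in the Python sketch below. It illustrates only this favourable case: the contour is taken to be a circle centered on the origin, and the generating function is that of a geometric distribution, chosen by us because its densities are known in closed form. The function names, radius and number of points are arbitrary assumptions.

```python
import cmath


def pmf_from_pgf(G, j_max, radius=0.5, n_points=256):
    """Approximate p_0..p_j_max from a generating function G.

    Uses the Cauchy integral formula on the circle |z| = radius,
    discretised with n_points equally spaced points.
    """
    angles = [2.0 * cmath.pi * m / n_points for m in range(n_points)]
    values = [G(radius * cmath.exp(1j * a)) for a in angles]
    out = []
    for k in range(j_max + 1):
        s = sum(v * cmath.exp(-1j * k * a) for v, a in zip(values, angles))
        out.append((s / (n_points * radius**k)).real)
    return out


def geometric_pgf(z, a=0.5):
    """PGF of Pr(P = k) = (1 - a) a^(k-1), k >= 1 (toy example)."""
    return (1 - a) * z / (1 - a * z)


estimates = pmf_from_pgf(geometric_pgf, 10)
```

The discretisation introduces aliasing terms, which are negligible here because the densities decay geometrically; the references cited above deal with the heavy tailed case.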
A common method to do so is to use Padé approximants. It consists in approximating
GP at the point z = z0 by a quotient of two polynomials P(z) and Q(z), of degree L and
M respectively, and then evaluating P(z)/Q(z) at the desired values of z. Details on the
determination of these polynomials and convergence issues can be found for instance in
[14]. In our case we evaluate the Padé approximants on a contour S chosen to be the unit
circle. The main drawback of the Padé approximation is that there are no general bounds on
the error. The natural bound in this case, that |G p (z)| ≤ 1 on the unit circle, is not of much
use.
¹ In some queuing problems, for instance, one has an explicit expression for the generating function to be
inverted for the corresponding probability densities.
While complex analysis provides elegant theoretical results for the recovery of F_P from
F_P^{(q)}, the procedures are quite involved. Moreover, it is an ill-posed problem in the sense that
small errors in the evaluation of GP at points in the original domain D̄0 become magnified
in the extrapolation [126]. To put this in perspective, in the case of significant thinning, say
q = 0.001, we are trying to extrapolate values from a tiny circle of radius q close to z = 1 up
to the entire unit circle. Note that equations (5.7) and (5.11) do not suffer from this problem
as z = 1 is on the circle for all q. As we will see in section 5.3, the practical limitations of
the two schemes described above are so severe that only a few values for the first iteration
step can be obtained numerically. Given these fundamental difficulties at the flow level,
we do not attempt to investigate the inversion of in-flow statistics for packet thinning. We
now turn to a very different kind of sampling, i.i.d. flow thinning, which has quite different
properties.
5.2.2 Flow sampling
As stated in the introduction, i.i.d. flow sampling consists in selecting flows with probability
q. Flows will be taken to be identically distributed through an assumption of stationarity,
Flow level
Since the flows that are kept by the thinning procedure are identically the same as the origi-
nal flows, all the marginal flow properties, and in particular the distribution P of the number
of packets per flow, can be readily estimated from the observed thinned traffic. There is no
inversion problem as such (beyond estimation issues), and the value of q plays no theoreti-
cal role. The same holds true for in-flow statistics, which are uncorrupted by flow thinning.
This is in marked contrast with the packet thinning scenario and its problematic inversion
requirements. As for packet thinning, no assumption of flow independence is needed at this
point.
Packet level
We now explain how, under reasonable assumptions on the underlying process, packet level
information such as the spectral density of X can be recovered from flow sampled traffic.
We place ourselves in the modelling framework detailed in chapter 4 where we consider
that the flow arrival process Y (t) is a Poisson process and we take flows to be mutually
independent. The packet arrival process X(t) is therefore a Poisson cluster process (PCP),
as defined in section 2.3.2:
X(t) = \sum_i G_i(t - t_F(i)), (5.18)
where the flow arrival times {tF (i)} follow a Poisson process and Gi (t) represents the ar-
rival process of packets within flow i. It is assumed that the subsidiary process Gi (t) has a
finite mean number µP of packets per flow and a finite intensity. These two conditions are
necessary for X(t) to be stationary [110].
Let ΓG (ω) be the ‘spectrum’ of Gi (t), more precisely the expectation of the modulus
squared of the Fourier transform of Gi (t). The spectral density of X(t) can be shown to be
simply [46]
ΓX (ω) = λF ΓG (ω). (5.19)
From [110] the rate of the stationary process X(t) reads
λ = λF µP . (5.20)
Let us now consider the effect of flow thinning a PCP. Using the well known independent
splitting property of a Poisson process (theorem 2.3.4 page 32), one can show that the
i.i.d. sampling with probability q of the Poisson flow arrival process Y(t) with rate λF is
a Poisson process Y^{(q)}(t) with rate λ_F^{(q)} = qλF. This means that flow sampling transforms
X(t) into a PCP X^{(q)}(t) with flow rate λ_F^{(q)} and the same Gi(t). The rate of X^{(q)}(t) reads
λ^{(q)} = λ_F^{(q)} µP = qλF µP = qλ and the original rate can be recovered via
\lambda = \frac{1}{q} \lambda^{(q)}. (5.21)
The spectral density of X^{(q)}(t) reads
\Gamma_X^{(q)}(\omega) = \lambda_F^{(q)} \Gamma_G(\omega) = q \Gamma_X(\omega), (5.22)
from which the original spectral density can be expressed as
\Gamma_X(\omega) = \frac{1}{q} \Gamma_X^{(q)}(\omega). (5.23)
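The rate inversion of equation (5.21) can be checked on a toy Poisson cluster process. The sketch below is only an illustration under stated assumptions: flows arrive as a Poisson process, each flow carries a geometric number of packets (an arbitrary choice made for the example), and packets are attributed to the arrival time of their flow so that only counts matter. The function name `simulate_flow_thinning` and all parameter values are hypothetical.

```python
import random

def simulate_flow_thinning(flow_rate, mean_packets, q, duration, seed=0):
    """Return (packet rate of the full traffic, rate recovered from flow-thinned traffic)."""
    rng = random.Random(seed)
    t = 0.0
    total_packets = 0
    kept_packets = 0
    while True:
        t += rng.expovariate(flow_rate)      # Poisson flow arrivals with rate lambda_F
        if t > duration:
            break
        # Geometric number of packets (>= 1) with the requested mean: an assumption.
        n = 1
        while rng.random() > 1.0 / mean_packets:
            n += 1
        total_packets += n
        if rng.random() < q:                 # i.i.d. flow thinning: keep the whole flow
            kept_packets += n
    full_rate = total_packets / duration
    thinned_rate = kept_packets / duration
    return full_rate, thinned_rate / q       # equation (5.21): lambda = lambda^(q) / q

true_rate, recovered_rate = simulate_flow_thinning(
    flow_rate=100.0, mean_packets=5.0, q=0.1, duration=1000.0)
```

With these values the full traffic has a rate close to λF µP = 500 packets per unit time, and the recovered rate fluctuates around the same value with an error governed by the number of flows that survive thinning.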
5.3 Inverting sampling: practice
The previous section was concerned with theoretical inversion methods for two different
kinds of thinning. In this section we present a numerical evaluation of these inversion tech-
niques. We begin with the packet level statistics in 5.3.1 before tackling the flow level
statistics in 5.3.2. Results concerning the estimates of first order quantities and their con-
fidence intervals for packet sampled traffic can be found in [131] and will not be detailed
here.
The passive measurements used to illustrate the thinning methods are presented in ta-
ble 3.1 page 44. They come from the Auckland-IV [177] and Abilene NLANR [130] trace
repositories. The traffic can be considered stationary for the period of time covered by the
traces.
5.3.1 Packet level
From equations (5.3) and (5.23), the spectrum of the full traffic can be recovered from the
spectrum and the rate of the thinned traffic for both sampling techniques. When estimating
from data however, because of the scaling properties of network traffic we use a wavelet
based estimate of the spectral density. Because of the linearity of the relationship between
the Fourier and wavelet spectra detailed in equation (2.69) page 35, essentially the same
inversion formulae can be used.
Figure 5.2 illustrates the inversion methods in the (log) wavelet domain. The thick
gray line corresponds to the wavelet spectrum of the original traffic, while the vertical lines
mark confidence intervals on the spectrum estimate at the different scales. The straight line
observed over large scales betrays long memory.
When q is relatively large (q = 0.1), the spectrum inferred from the packet thinned
traffic is remarkably close to the ‘true’ spectrum estimated directly from the full traffic, as
illustrated on figure 5.2(a). The fact that fine details of the spectrum can be reproduced is
due to the fact that equation (5.2) is valid for any second order stationary point process.
On the other hand, the spectrum reconstructed from the flow thinned traffic does not match
the true spectrum quite as well. While very good at large scales, the reconstruction fails to
precisely match the small scale behaviour. This is a direct consequence of our assumption
underlying the inversion formula that flows are uncorrelated. The inversion is incapable
of re-inserting the flow dependencies which were weakened by the thinning. Despite this
strong assumption however, the inverted spectrum clearly reproduces the main features of
the true spectrum.
When one moves to much smaller values of q however, as figure 5.2(b) shows for
q = 0.001, the flow based thinning still gives a qualitatively accurate estimate while the
inversion technique based on packet thinning is highly inaccurate. In fact, from the form of
equation (5.3) one can see that the original spectrum ΓX(ω) is recovered by measuring the
difference between Γ_X^{(q)}(ω) and a Poisson noise. Since for small q the confidence intervals
on the estimation become so large that Γ_X^{(q)}(ω) cannot be reliably distinguished from this
noise, the inversion procedure must fail. The problem clearly becomes steadily worse as q
drops. This is significant since as link rates increase a trend to ever more aggressive thinning
seems likely.
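The mechanism can be illustrated on binned packet counts rather than on the spectrum itself. Equation (5.3) is not reproduced here; the sketch below instead uses the analogous identity for a count N thinned i.i.d. with probability q, which follows from the law of total variance: Var(N^{(q)}) = q² Var(N) + q(1 − q) E[N]. Inverting it means subtracting a Poisson-like noise term and dividing by q², so estimation errors are amplified by 1/q². The bursty count model and the names `thin_count` and `inverted_variance` are assumptions made for the example.

```python
import random
import statistics

def thin_count(n, q, rng):
    """Number of packets kept out of n when each is retained with probability q."""
    return sum(1 for _ in range(n) if rng.random() < q)

def inverted_variance(thinned_counts, q):
    """Estimate Var(N) from thinned counts: (Var(N_q) - (1 - q) E[N_q]) / q^2."""
    mean_q = statistics.fmean(thinned_counts)
    var_q = statistics.pvariance(thinned_counts)
    return (var_q - (1.0 - q) * mean_q) / q ** 2

rng = random.Random(1)
# Bursty per-bin packet counts (illustrative): mostly empty bins, occasional bursts.
counts = [rng.choice([0, 0, 0, 40]) for _ in range(20000)]
true_var = statistics.pvariance(counts)

est_light = inverted_variance([thin_count(n, 0.5, rng) for n in counts], 0.5)
est_heavy = inverted_variance([thin_count(n, 0.01, rng) for n in counts], 0.01)
```

Comparing `est_light` and `est_heavy` with `true_var` over several seeds gives a feel for how quickly the noise subtraction degrades as q drops, in line with the argument above.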
[Figure 5.2: logscale diagrams, log2 Var(d_j) against octave j = log2(a), with the timescale marked on the top axis. Panel (a): q = 0.1. Panel (b): packet thinning with q = 0.001, flow thinning with q = 0.0001. Curves in each panel: Original, Packet Thinned, Inferred from Packet Thinned, Flow Thinned, Inferred from Flow Thinned.]
Figure 5.2: Spectrum reconstruction: (a) AUCK-d1: Logscale diagrams of the original
traffic, packet and flow thinned traffic each with q = 0.1, and the two corresponding
inverted estimates for the full traffic (T0 = 64s). The top axis marks
the timescale in seconds. (b) IPLS: Logscale diagrams of the original traffic,
packet thinned traffic with q = 0.001, flow thinned traffic with q = 0.0001, and
the two corresponding inverted estimates for the full traffic. Despite the fact
that the flow thinned traffic is ten times thinner, the estimate recovered from it
is far better. Similar experiments on other traces led to the same conclusions.
In contrast to the above, recovery of the spectrum in the case of flow thinning does
not suffer from the same drawback as it simply involves multiplying Γ_X^{(q)}(ω) by a scale
factor (an upward translation on the logarithmic scale of the LD). In fact, the quality of the
estimation through the flow thinning inversion method depends mainly on the number of
flows N remaining after thinning, the value of q being largely irrelevant. At constant N, the
inversion method will therefore lead to an approximately ‘constant’ error, irrespective of the
thinning probability. This point is illustrated in figure 5.3 where inversion based estimates
are given for two values of q at constant N. In practice however, non-stationarities and edge
effects make it difficult to accurately estimate the spectrum when the number of remaining
flows drops too low (In the case of the traffic used in figure 5.3, although there were 3
million flows, the trace was only 10 minutes long resulting in quite strong edge effects).
The near independence of the inversion method with respect to q is a strong argument in
favour of flow based sampling for spectral estimation².
5.3.2 Flow level
In general, estimating the distribution of the number of packets per flow consists of an
estimation of the densities p_j^{(q)} for the thinned process, followed by an inversion phase.
For flow thinning we have already seen that the inversion is trivial, and the p_j^{(q)} can be
estimated from a histogram. For packet thinning however even the first phase is potentially
problematic, as a knowledge of p_0^{(q)} > 0 is needed. Since the proportion of discarded flows
is not automatically observed as it is in flow thinning, the quantity p_0^{(q)} cannot be estimated
without extra information.
The simplest solution is to supply the total number NF of flows with the measured
sampled traffic, in the spirit of InMon’s sFlow [91]. Another solution proposed by [131]
was already mentioned in the introduction. Assume that each original TCP flow has only
one packet with a SYN flag and that it is the first. Consider the set of such SYN packets.
It is clear that the probability that a given SYN packet is retained is also q, and that this
is therefore the probability that a flow has been retained. Let N_F^{(q)} be the total number of
observed flows, and N_1^{(q)} the number of observed packets with a SYN flag. An estimate of
the number of flows NF before thinning is N̂_F = N_1^{(q)}/q. Since p_0^{(q)} is the probability that a
flow loses all of its packets, one can construct an estimate of
p_0^{(q)} via p̂_0^{(q)} = 1 − N_F^{(q)}/N̂_F.
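The SYN-based estimate can be sketched on synthetic data. Here p_0^{(q)} is taken to be the probability that a flow loses all of its packets, so it is estimated as one minus the observed fraction of flows. Everything in the example is an assumption made for illustration: flow sizes are uniform on 1..10, the first packet of each flow is the one carrying the SYN flag, and the names `simulate_packet_thinning` and `estimate_flow_counts` are hypothetical.

```python
import random

def simulate_packet_thinning(num_flows, q, seed=0):
    """Return (observed flows, observed SYN packets) after i.i.d. packet thinning."""
    rng = random.Random(seed)
    observed_flows = 0
    observed_syns = 0
    for _ in range(num_flows):
        size = rng.randint(1, 10)                  # packets in the flow (illustrative)
        kept = [rng.random() < q for _ in range(size)]
        observed_flows += any(kept)                # flow is seen if any packet survives
        observed_syns += kept[0]                   # first packet carries the SYN flag
    return observed_flows, observed_syns

def estimate_flow_counts(observed_flows, observed_syns, q):
    """Estimate the original number of flows and the evaporation probability."""
    n_flows_hat = observed_syns / q                # each SYN is retained with probability q
    p0_hat = 1.0 - observed_flows / n_flows_hat    # fraction of flows that vanished
    return n_flows_hat, p0_hat

n_obs, n_syn = simulate_packet_thinning(num_flows=50000, q=0.1)
n_flows_hat, p0_hat = estimate_flow_counts(n_obs, n_syn, q=0.1)
```

For this toy size distribution the evaporation probability is the average of 0.9^n over n = 1..10, roughly 0.59, and the estimate should land close to it.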
Another important practical issue with packet thinning concerns the consistency of the
flow definition before and after thinning. For example, in order to prevent the breakup of
flows due to their sparsity after thinning, one should at least replace the timeout value T0
with T0 /q. However this does not eliminate all problems and extra flows can still be created
for some types of applications [131]. It is another advantage of flow thinning that problems
of this type do not arise. The flow definition and timeout value adopted for the full traffic
apply without change after sampling.
² This LD reconstruction from i.i.d. flow thinning also gives a nice semi-experimental result further justifying
the flow independence of the BLPP packet traffic model.
[Figure 5.3: logscale diagram, log2 Var(d_j) against octave j = log2(a), with the timescale marked on the top axis. Curves: Original, Inferred from Flow Thinned with q = 0.1, Inferred from Flow Thinned with q = 0.01.]
Figure 5.3: Reconstruction of the (log wavelet) spectral density from flow thinning when
the number of flows after thinning is constant (N=3000). The quality of the
estimation remains roughly unchanged as q varies. The confidence intervals
for q = 0.1 (omitted for clarity) are similar to those of q = 0.01.
To clearly evaluate the performance of the thinning inversion techniques in isolation
from other issues such as those above, we assume in what follows that p_0^{(q)} is known. In
addition, we will first assume that we know the distribution of P and can therefore evaluate
p_k^{(q)} numerically from equation (5.4). For this purpose we use a simple discrete Pareto-like
variable H with distribution
F_H(k; a, \beta) = 1 - (ak + 1)^{-\beta} \sim 1 - L k^{-\beta}, \quad k = 1, 2, \cdots, (5.24)
where a = L^{−1/β} > 0 is a scale parameter. The mean of H is E[H] = a^{−β} ζ(β, 1/a) for
β > 1, and is therefore fully determined by the tail behaviour. The variance is infinite.
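A minimal sketch of this setup follows. It computes the densities of H from equation (5.24) and then applies i.i.d. packet thinning as a binomial mixture, which we assume is the content of equation (5.4) (not reproduced in this section). The truncation at `k_max` and the function names `pareto_like_pmf` and `thin_pmf` are choices made for the example.

```python
import math

def pareto_like_pmf(k_max, a=1.0, beta=1.5):
    """Densities p_k = F_H(k) - F_H(k - 1), k = 1..k_max, from equation (5.24)."""
    def cdf(k):
        return 1.0 - (a * k + 1.0) ** (-beta)
    return {k: cdf(k) - cdf(k - 1) for k in range(1, k_max + 1)}

def thin_pmf(pmf, q):
    """Densities of the packet-thinned flow size: each packet is kept
    independently with probability q (binomial thinning of each flow size n)."""
    n_max = max(pmf)
    thinned = [0.0] * (n_max + 1)
    for n, p_n in pmf.items():
        for k in range(n + 1):
            thinned[k] += math.comb(n, k) * q ** k * (1.0 - q) ** (n - k) * p_n
    return thinned

pmf = pareto_like_pmf(200)          # truncated support: mass beyond k_max is dropped
thinned = thin_pmf(pmf, 0.6)
evaporation_probability = thinned[0]
```

The first entry of `thinned` is the probability that a flow loses all of its packets; the truncation removes a tail mass of (a·k_max + 1)^{−β}, which should be kept in mind for heavy tails.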
Scheme 1: analytic continuation
We first consider inversion scheme 1 (using analytic continuation) in the case where q > 0.5
for which we can calculate p_j from equation (5.15), which we repeat here:
p_j = \sum_{n=j}^{\infty} (-1)^{n-j} c(n, j, q), (5.25)
with
c(n, j, q) = \binom{n}{j} \frac{(1 - q)^{n-j}}{q^n} p_n^{(q)}. (5.26)
There are three main issues with the numerical evaluation of this sum.
(i) First the evaluation of the coefficients c(n, j, q) is not entirely straightforward since
their magnitudes can become enormous due to the q^{−n} factor. In fact, one can show,
using equation (5.24), that for fixed j and q the function c(n, j, q) is unimodal with a
maximum cmax(j, q) reached for n = nmax(j, q), and decays exponentially fast to zero
for large n. The functions nmax(j, q) and cmax(j, q) can be respectively approximated
by
\tilde{n}_{\max}(j, q) = \frac{qj}{2q - 1}, (5.27)
and
−j β +1
max ( j, q) = (2q − 1) − j
cg + γ, (5.28)
where
\gamma = \beta a^{-\beta} q^{-1} (2q - 1)^{-(\beta+1)}. (5.29)
Values of cmax(j) and c̃max(j, q) are plotted in figure 5.4(a). Given the exponential
increase of cmax(j), it is important to understand how many values can be accurately
calculated before a loss of precision occurs. This can be done by finding the smallest
integer jmax such that c̃max(j, q) > M, where M is the largest floating point number
that can be stored by the calculator. Taking the leading term in equation (5.28) yields
the following approximation for jmax:
j_{\max} = \left\lceil \frac{\log(\tilde{c}_{\max})}{\log \frac{1}{2q-1}} \right\rceil. (5.30)
An exact analytic expression of jmax can also be found from equation (5.28), but it is
rather cumbersome since it involves the Lambert W-function³. In practice, a simple
bisection search will locate jmax very efficiently. Using typical double precision
with 32 bits used for the mantissa of a floating point number (Matlab was used),
jmax = 22 is the smallest value for which cmax(j, q) > 2^32.
(ii) Second, assuming that all the c(n, j, q) can be accurately evaluated, the sum must
be truncated. Since for n > nmax ( j, q) the sequence c(n, j, q) is positive, decreasing
and tends to zero, the absolute error on the partial sum of the alternating series is
bounded by the first neglected term. The probability p j can therefore be evaluated
with precision ε by summing the first n0 terms, where n0 is the smallest integer larger
than nmax ( j, q) such that c(n0 + 1, j, q) ≤ ε. Given that the terms c(n, j, q) decay
exponentially for large n, the convergence of the series is very fast.
(iii) Finally, in practice there are the additional errors from the need to estimate the p_n^{(q)}.
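Issue (i) can be made concrete with a small experiment. The sketch below evaluates the alternating sum of equations (5.25) and (5.26) twice: once in exact rational arithmetic, where the large coefficients cancel exactly, and once in floating point, where they need not. To keep the sum finite it uses a flow size distribution with bounded support (uniform on 1..60), which is an assumption made for the example and not the Pareto-like law of equation (5.24); the forward thinning step is the binomial mixture assumed above for equation (5.4). The names `thin_exact` and `invert` are hypothetical.

```python
from fractions import Fraction
from math import comb

def thin_exact(pmf, q):
    """Binomial packet thinning of a finite-support pmf (list indexed by flow size)."""
    n_max = len(pmf) - 1
    return [sum(comb(n, k) * q ** k * (1 - q) ** (n - k) * pmf[n]
                for n in range(k, n_max + 1))
            for k in range(n_max + 1)]

def invert(thinned, q):
    """Equation (5.25): p_j = sum_{n >= j} (-1)^(n-j) c(n, j, q), with c from (5.26)."""
    n_max = len(thinned) - 1
    return [sum((-1) ** (n - j) * comb(n, j) * (1 - q) ** (n - j) / q ** n * thinned[n]
                for n in range(j, n_max + 1))
            for j in range(n_max + 1)]

q = Fraction(3, 5)
n_max = 60
pmf = [Fraction(0)] + [Fraction(1, n_max)] * n_max    # uniform on 1..60 (illustrative)

thinned = thin_exact(pmf, q)
exact = invert(thinned, q)                             # rational arithmetic
approx = invert([float(x) for x in thinned], float(q)) # double precision
worst_float_error = max(abs(a - float(p)) for a, p in zip(approx, pmf))
```

In rational arithmetic the inversion returns the original densities exactly, whereas `worst_float_error` shows how much is lost to rounding when the same sum is evaluated in double precision; the size of that loss depends on q and on the support.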
Numerical results for scheme 1 are presented in figure 5.4(b) where the truncation issue
has been carefully addressed, the estimation issue does not apply as exact values are used,
but where nonetheless precision limitations create serious problems. They show that the
numerical evaluation of p j fails for j ≥ 22. This is a direct consequence of (i): the calculator
lacks precision to accurately cancel out the very large coefficients appearing in the sum.
Moreover, as q tends to 0.5 from above, other computational problems due to the truncation
issue (ii) will occur since nmax will become very large. Finally, for q < 0.5, the recursion
introduced in equation (5.16) makes these numerical problems dramatically worse.
³ The Lambert W-function is defined as the inverse of the function f(W) = W exp(W).
[Figure 5.4: two log-log panels against j (number of packets per flow). Panel (a): cmax(j, q), cmax(j, q) from (5.28), and the level 2^32. Panel (b): Pr(P = j) for the theoretical original density, flow thinning, packet thinning scheme 1 and packet thinning scheme 2.]
Figure 5.4: Inversion of the p_j, light thinning: Numerical evaluation of the different inversion
schemes for q = 0.6 using FP given by equation (5.24) with a = 1 and
β = 1.5.
(a) cmax(j, q),
(b) Packet thinning inversion: Scheme 1: even with no estimation, the inversion
becomes unstable as soon as cmax(j) cannot be accurately calculated.
Scheme 2: Some improvement at high computational cost (L = M = 200 and
2^15 discretization steps for the evaluation of the Cauchy integral). Flow thinning
inversion and estimation: starting with 10^6 flows, estimates are reliable,
extend to much greater j, and degrade gracefully.
It is important to note that analytic continuation can be successfully applied in practice

[28], and that there are ways of controlling the truncation error at each iteration [77]. This
inversion scheme only fails in this particular case due to the form of the coefficients of the
power series.
Scheme 2: Cauchy integral
The numerical evaluation of scheme 2 (Padé approximants followed by the Cauchy integral
formula) takes us a little further, but at the price of a fairly intensive numerical evaluation.
It was found that increasing the degree of the Padé approximants did not significantly improve
the accuracy of the calculations. In contrast to these packet thinning based inversion
schemes, the ‘inversion’ from flow thinning, including the numerical estimation of the p_n^{(q)},
provided a low cost and reliable estimation of p_j, whose accuracy dropped gracefully as
j increased as seen in figure 5.4. The estimates of the p_n^{(q)} were made according to the
following formula, where p_0^{(q)} was assumed known:
p_j^{(q)} = (1 - p_0^{(q)}) \, o_j^{(q)} \quad \text{for } j \geq 1, (5.31)
where o_1^{(q)}, o_2^{(q)}, ... are the normalized histogram estimates of the number of packets per flow
after flow thinning.
In the general case where q ∈ (0, 1], algorithms to recursively compute the coefficients
a_j^{(k)} based on the analytic continuation idea of scheme 1 can be found in [77] and [28].
However since the particular form of the coefficients (equation (5.13)) prevents a precise
numerical evaluation of even the first step of the recursion, the method is not applicable
here. When the thinning procedure removes more than half of the original packets, the
inversion method for packet thinning cannot be used in practice unless ‘infinite precision’
arithmetic is employed. This is unlikely to be computationally feasible in a router context.
Again, flow based thinning avoids the above problems, and just as for the spectrum, the
quality of the estimation depends essentially on the number of flows N after thinning, and
not on the value of q. This is shown in figure 5.5 where the complementary cumulative
distribution function (CCDF) of the number of packets per flow on an OC-48 link, and
the estimated CCDFs for three different values of q, are plotted at constant N = 3000. As
expected, the quality of the estimation of the CCDF is roughly independent of q. However,
since the estimation of heavy tails is a notoriously difficult problem [11], one should make
sure that N will be large enough to allow a sufficiently precise estimation of the distribution
tail.
[Figure 5.5: log-log plot of Pr(P > j) against j. Curves: Empirical original CCDF, CCDF for q = 0.1, CCDF for q = 0.01, CCDF for q = 0.001.]
Figure 5.5: Inversion of p_j, heavy thinning: Empirical CCDF of the number of packets
per flow for the IPLS trace, and estimated CCDFs obtained from flow thinned
traffic for q ∈ {0.1, 0.01, 0.001} while the number of flows after thinning remains
constant (N = 3000). The quality of the estimation remains unchanged despite
the wide variation in q.
In summary, despite the fact that both sampling types can theoretically be inverted, the
numerical study carried out in this section reveals the following:
• The packet sampling technique leads to an excellent reconstruction of the spectrum
and a fair estimate of the p_j for j up to the order of 50 for q > 0.5. However in the
useful range q ≪ 0.5, the quality of the spectrum estimate is poor and deteriorates
steadily as q gets smaller and it becomes impossible to evaluate the p_j even for small
j as soon as q drops below 0.5 without extended precision arithmetic.
• The flow sampling technique gives a reasonable estimate of the spectrum and an ex-
cellent estimate of the p j for a large range of thinning probabilities. In particular, for
thinning probabilities used in practice, of the order of 1% or less, flow thinning is by
far superior to packet thinning if one is interested in recovering detailed characteris-
tics of the original traffic such as its spectrum or the distribution of flow size.
It is worth noting at this point that if a parametric family was chosen for FP one could
try to estimate its parameters, a much easier task than trying to numerically recover each
p j . Since no family has been identified as valid for all Internet flows, we have not pursued
this here.
We now compare our inversion technique to recover the distribution of flow size from
packet sampled traffic with recently published results [51]. The method proposed in [51]
gives a smooth estimate of the histogram of flow sizes for a given traffic sample by using an
expectation maximization (EM) technique. This means that this method is fundamentally
different from the direct inversion of equation (5.4) presented above for the following two
reasons:
(i) First, we consider the observed flow sizes as i.i.d. copies of the random variable P
and aim to recover the distribution FP . On the other hand, the aim of [51] is to recover
the original frequencies of each observed flow size for a given data set.
(ii) Second, the EM method gives a smooth estimate of flow size frequencies whereas we
aim to recover the exact flow size densities.
As seen in [51, fig. 4], the EM method performs relatively well for ‘large’ values of q such
as q = 0.1, but the results are already less satisfactory for smaller q such as q = 0.01. In
particular, the EM algorithm fails to capture the heavy tailed distribution of the original
traffic. We expect the results for practical values of q, such as q = 0.001, to be even worse.
The EM method therefore gives results for q one order of magnitude smaller than for the
direct inversion, and the value q = 0.5 plays no theoretical role in this approach. This
apparent discrepancy between the results of the two methods comes from the fact that the
direct inversion is much more demanding than the EM method. Indeed the EM method
averages through the values of p j whereas the direct inversion does not. In other words, the
EM does not aim to recover the exact values of the p j , but gives instead a histogram that is
optimal for a specific metric. This means in particular that the EM can give wrong estimates
for even the first values p1 and p2 .
5.4 The Bartlett-Lewis point process
In this section we apply both packet and flow thinning to the Bartlett-Lewis point process
(BLPP) model presented in detail in chapter 4. We simply recall from equation (4.7) page 82
the form of its spectral density ΓX(ν) since it is of particular interest here:
\Gamma_X(\nu) = \lambda_F \left( \frac{\mu_P}{\lambda_A} \Gamma_G(\nu) + S_G(\omega) + S_G(-\omega) \right), (5.32)
where ΓG(ν) is the spectral density of the stationary renewal process with the same parameters
as the finite flow renewal process, namely
\Gamma_G(\nu) = \lambda_A \left[ (1 - \Phi_A(\omega))^{-1} + (1 - \Phi_A(-\omega))^{-1} - 1 \right], (5.33)
and
S_G(\omega) = \frac{\Phi_A(\omega)}{(1 - \Phi_A(\omega))^2} \left( G_P(\Phi_A(\omega)) - 1 \right). (5.34)
As expected equation (5.32) is consistent with the general form for the spectral density of
a PCP given in equation (5.19).
We derive the thinning results of the BLPP in section 5.4.1, and investigate the viability
of fitting the model via measurements on thinned data in section 5.4.2.
5.4.1 Thinning Bartlett-Lewis point processes
If the BLPP model is to be useful in practice, for example for the dimensioning of backbone
links, one needs to be able to measure its parameters from data. It is therefore of interest to
see if it is compatible with either or both of the thinning procedures. In this subsection we
derive the properties of thinned BLPPs.
Theorem 5.4.1. An i.i.d. packet thinned Bartlett-Lewis process X^{(q)} is also a Bartlett-Lewis
process with parameters:
• flow rate: λ_F^{(q)} = λF (1 − p_0^{(q)}),
• density of P^{(q)}: x_j^{(q)} = p_j^{(q)} / (1 − p_0^{(q)}), j > 0, and x_0^{(q)} = 0,
• density of in-flow packet inter-arrivals:
f^{(q)}(x) = \mathcal{L}^{-1} \left[ \frac{q \tilde{f}(s)}{1 - (1 - q) \tilde{f}(s)} \right].
Proof. Let X be a BLPP and X^{(q)} the process resulting from its i.i.d. packet thinning with
probability q. The thinned flows are clearly i.i.d. with marginal distribution F_P^{(q)} given by
equation (5.4). Since p_0^{(q)} = Σ_{j=1}^{∞} (1 − q)^j p_j > 0, in this picture λF is unchanged but some
flows may be empty. To conform to a convention where a BLPP has zero probability of
an empty flow, we must renormalise the p_j^{(q)} from equation (5.4) to obtain a F_P^{(q)} with
densities x_j^{(q)} = p_j^{(q)} / (1 − p_0^{(q)}), j > 0, and x_0^{(q)} = 0. The average flow arrival rate is then reduced to
λ_F^{(q)} = λF (1 − p_0^{(q)}).
From section 2.3.5 page 31 we know that if X is a renewal process with inter-arrival
density f(x), then the i.i.d. thinned process X^{(q)} is another renewal process, with inter-arrival
density f_q(x) whose Laplace transform f̃_q(s) reads
\tilde{f}_q(s) = \frac{q \tilde{f}(s)}{1 - (1 - q) \tilde{f}(s)}. (5.35)
It follows that each finite ordinary renewal process that constitutes a flow of X will become
another ordinary renewal process with the inter-arrival density above provided it has at least
2 points.
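As a quick sanity check of the thinned Laplace transform, consider the special case of exponential in-flow inter-arrivals with rate λA (an assumption made only for this worked example). Substituting into the formula gives

```latex
\tilde{f}(s) = \frac{\lambda_A}{\lambda_A + s}
\quad\Longrightarrow\quad
\tilde{f}_q(s)
  = \frac{q \lambda_A / (\lambda_A + s)}{1 - (1 - q) \lambda_A / (\lambda_A + s)}
  = \frac{q \lambda_A}{s + q \lambda_A},
\qquad
f_q(x) = q \lambda_A \, e^{-q \lambda_A x}.
```

The thinned inter-arrivals are again exponential, with rate qλA, which is the familiar result that i.i.d. thinning of a Poisson process yields a Poisson process of reduced rate.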
The remaining property of X^{(q)} to specify is the arrival process of the non-empty thinned
flows. We now show that this is in fact a Poisson process with rate λ_F^{(q)}. Since the flow evaporation
probability p_0^{(q)} acts independently on flows, by theorem 2.3.4 on Poisson splitting
the original flow starting points (which may have themselves been thinned) of flows which
do not evaporate form a Poisson process O of rate λ_F^{(q)}. Consider such a flow which has survived
thinning. There exists a random variable T ≥ 0 giving the time interval between the
original starting point and the first non-thinned point after thinning in that flow. As this can
be viewed as an i.i.d. translation by T of the points of O, which by a well known theorem
[46] is another Poisson process of the same rate, the result follows.
This pleasant closure property of a BLPP, which is worth mentioning in its own right,
also helps to make the inversion of its parameters analytically tractable.
Theorem 5.4.2. An i.i.d. flow thinned Bartlett-Lewis process X^{(q)} is also a Bartlett-Lewis
process with flow rate λ_F^{(q)} = qλF, x_j^{(q)} = p_j, and f^{(q)}(x) = f(x).
Proof. The result follows from the discussion at the end of section 5.2.2.
We see that the BLPP model has almost ideal theoretical properties with respect to the
interpretation of thinned forms of itself, and the parameter inversion problem. In the next
section we briefly consider the practical side of the question.
5.4.2 Fitting from thinned data
With respect to i.i.d. packet thinning, despite the attractive theoretical properties described
above, most of theorem 5.4.1 cannot be exploited in practice if q < 0.5. The reasons are
the same as those stated in section 5.3.2 concerning the recovery of the p_j from the x_j^{(q)}.
Moreover, even if one could numerically evaluate these, there would be another inversion
problem to recover the in-flow packet inter-arrival density from its Laplace transform, with
similar limitations due, ultimately, to the very small values of all the x_j^{(q)} except at j = 1 if
q is small. For completeness, we note that the relevant inversion techniques are also based
on Cauchy’s integral formula and are similar to the one presented in section 5.2.1. They can
be implemented using the Fast Fourier Transform [1].
We turn then to fitting from i.i.d. flow thinned data, where the simple inversion of theorem 5.4.2
presents no difficulties. Figure 5.6 illustrates the procedure for q = 0.001. The
remarkable thing about this approach is that we do not need to explicitly invert the more
complex in-flow or even flow level statistics. One merely fits the model on the thinned data
as one would normally, and then scales up the value of λF by 1/q. Figure 5.6 shows that the results
can be good even for q = 0.001, and as before, it is to an excellent approximation only the
total number of flows which determines the size of the confidence intervals, not q.
[Figure 5.6: logscale diagram, log2 Var(d_j) against octave j = log2(a), with the timescale marked on the top axis. Curves: Original, BLPP matched to Original, Flow Thinned, BLPP matched to Flow Thinned, BLPP reconstructed from Thinned.]
Figure 5.6: BLPP parameter fitting from flow thinned traffic: AUCK-d1 data, thinned
with q = 0.001, is matched to the BLPP, the theoretical spectrum calculated,
and then inverted by simply shifting it vertically. The inversion compares
well with the original data showing that the model can be successfully fitted
from thinned data. The same fitting procedure applied to the full traffic is also
shown.
5.5 How to sample traffic?
Both the IETF working groups IPFIX (Internet Protocol Flow Information Export) [92]
and PSAMP (Packet Sampling) [138] advocate the use of packet sampling. However, the
results presented in this chapter indicate that in certain circumstances flow sampling is a
much more efficient option. In this section we summarize our main findings and give some
indication on how and when each sampling technique should be used.
5.5.1 Packet sampling

Implementation
Packet sampling is widely implemented in today’s routers, where the sampling method
usually keeps one out of N sequential packets. Some line card engines can also sample
consecutive packets. These sampling techniques are not equivalent to the mathematically
friendly i.i.d. packet thinning presented in section 5.2, but have the advantage of requiring
very little computation.
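The difference between the two mechanisms can be sketched in a few lines. This is only an illustration: `periodic_sample` mimics a one-in-N counter and `iid_sample` mimics the i.i.d. thinning of section 5.2; both names, and the use of integers to stand for packets, are assumptions made for the example and do not describe any router implementation.

```python
import random

def periodic_sample(packets, n):
    """Keep one packet out of every n sequential packets (deterministic counter)."""
    return packets[n - 1::n]

def iid_sample(packets, q, rng):
    """Keep each packet independently with probability q (i.i.d. thinning)."""
    return [p for p in packets if rng.random() < q]

packets = list(range(1000))                 # stand-in for a packet stream
periodic = periodic_sample(packets, 10)     # exactly 100 packets, evenly spaced
bernoulli = iid_sample(packets, 0.1, random.Random(0))  # about 100 packets, random gaps
```

The periodic sampler needs only a counter, which is why it is cheap, but its output size is deterministic and its gaps are constant, so the independence assumptions behind the inversion formulae of section 5.2 do not hold exactly.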
Usage
Packet sampling is very useful when one wants to get traffic information at a higher resolution
than that obtained from the Simple Network Management Protocol (SNMP) coarse-grained
counters or active probe data. For instance, packet sampling will perform very well
to estimate the average packet rate [131]. Packet sampling is also well suited for collecting
other basic statistics, such as source and destination IP addresses, route prefixes and autonomous
system numbers. However we showed in section 5.3 that its performance when
recovering higher level statistics, such as the spectral density of the packet arrival process
or distribution of flow length, is very poor.
5.5.2 Flow sampling

Implementation
Today’s routers do not implement any flow sampling strategy. This is because flow sampling
does require more overhead than packet sampling. In fact, for flow thinning, all packets
have to be grouped into flows before they can be processed or discarded. This involves
more computation and more memory if one uses the traditional hash table approach with
one entry per flow. However this might not be such a drawback if new flow classification
techniques, such as bitmap algorithms [57] or Bloom filters [101], can be applied instead.
The total number of packets stored, for a given q, is essentially the same for both packet and
flow sampling.
Usage
The fundamental point of this chapter is that when one is interested in recovering more
detailed information about packet traffic beyond first order statistics, then packet sampling
can be fundamentally unsuitable, whereas flow sampling can perform very well. The higher
cost of implementing flow sampling has to be compared with the very high cost or near
impossibility of recovering certain statistics, such as the distribution of number of packets
per flow, from packet sampling. For instance, knowing the flow length is very valuable
when deploying web proxies [63], setting up connection thresholds in flow-switched
networks [59], or characterizing certain network attacks [51]. For all these applications, flow
sampling should be the preferred sampling method whenever possible. Flow sampling also
has the combined advantage of avoiding the problem of flow splitting and having a preci-
sion that depends only on the number of remaining flows after thinning, not on the thinning
probability q. In other words, it has ideal scaling with respect to sampling rate.
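A minimal sketch of flow thinning follows, assuming the traditional hash table approach with one entry per flow. The input format (one flow key per packet), the synthetic trace and the name `flow_thin` are assumptions made for illustration only.

```python
import random
from collections import Counter

def flow_thin(packets, q, rng):
    """Retain each flow, with all of its packets, with probability q.

    `packets` is an iterable of flow keys, one per packet (hypothetical format).
    """
    decision = {}        # flow key -> True (keep) or False (drop), one entry per flow
    kept = Counter()     # flow key -> number of packets retained
    for key in packets:
        if key not in decision:          # first packet of a flow: draw once
            decision[key] = rng.random() < q
        if decision[key]:
            kept[key] += 1
    return kept

# Synthetic trace: flow f carries (f % 5) + 1 packets, packets are interleaved.
trace = [f for f in range(20_000) for _ in range(f % 5 + 1)]
rng = random.Random(1)
rng.shuffle(trace)

kept = flow_thin(trace, 0.1, rng)
size_hist = Counter(kept.values())   # packets-per-flow histogram of retained flows
est_flows = len(kept) / 0.1          # flow count estimate, scaled by 1/q
```

Because retained flows are never split, `size_hist` is a direct sample of the packets-per-flow distribution, and its accuracy is governed by `len(kept)` rather than by q itself.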
5.6 Conclusion
We have explored in detail the question of recovering the spectrum and the distribution
of the number of packets per flow of the packet arrival process, from sampled data. Two
kinds of sampling were used, i.i.d. packet sampling, and i.i.d. flow sampling, with a given
probability q of retaining a packet or flow respectively. In each case, exact theoretical
inversion techniques were derived. However, in the case of packet thinning, we showed
how the inversion methods were of little to no use in practice for q small enough to be truly
useful, such as q = 0.01 or smaller, and become much worse as q becomes smaller still.
An exception to this is the asymptotic tail which can be recovered by a different technique
(although in practice it remains a very difficult problem). In sharp contrast, as flow thinning
preserves flows intact but simply reduces their number, it avoids these problems entirely
and the inversion is trivial. The performance of inversion methods based on flow thinning
does not deteriorate with q but depends essentially on the number of retained flows, which
could be set in practice depending on memory and computational limitations. The
inversion step does, however, assume flow independence, and so cannot capture all aspects of
the traffic, whereas packet thinning based methods can, provided q is large enough. For
backbone links, where there is strong evidence that dependence between flows is weak,
this may not be important.
We also investigated the fitting of a useful type of cluster model describing packet ar-
rivals. It was shown that the model class is closed under both kinds of thinning and that
exact inversion is theoretically possible. In practice, however, inversion based on
packet thinned data is again not feasible for realistic values of q, whereas fitting
based on flow thinning performs well.
Although this chapter concludes that packet sampling is not appropriate to recover the
flow length distribution in practice, we have not proven that different inversion techniques
could not lead to better results. One path worth exploring would be to incorporate con-
straints such as a heavy tail flow length distribution in the inversion step. This could be
considered in the maximum likelihood context, as in [51], or in analytic continuation ap-
proaches [49].
Chapter 6
Bridging router performance and
queuing theory
6.1 Introduction
After having investigated questions (i) and (ii) in the previous chapters, we now examine
‘through-router’ delays and investigate the detailed mechanisms of a router to answer
question (iii).
End-to-end packet delay is an important metric to measure in networks, both from the
network operator and application performance points of view. An important component of
this delay is the time for packets to traverse the different forwarding elements along the
path. This is particularly important for network providers, who may have Service Level
Agreements (SLAs) specifying allowable values of delay statistics across the domains they
control. A fundamental building block of the path delay experienced by packets in IP net-
works is the delay incurred when passing through a single IP router.
Although there have been many studies examining delay statistics measured at the edges
of the network, very few have been able to report with any degree of authority on what
actually occurs at switching elements. In [141] an analysis of single hop delay on an IP
backbone network was presented, and different delay components were isolated. However,
since the measurements were limited to a subset of the router interfaces, only samples of the
delays experienced by packets, on some links, were identified. In [177] single hop delays
were also obtained for a router. However since the router only had one input and one output
link, which were of the same speed, the internal queueing was extremely limited. This is
not a typical operating scenario, and in particular it led to the through-router delays being
extremely low. In this chapter we work from a data set recording all IP packets traversing
a Tier-1 access router over a 13 hour period. All input and output links (with one negligible
exception) were monitored, allowing a complete picture of through-router delays to be obtained.
The first aim of this chapter is to exploit the unique certainty provided by the data set by
reporting in detail on the actual magnitudes, and temporal structure, of delays on a subset of
links which experienced significant congestion: mean utilisation levels on the target output
link ranged from ρ = 0.3 to ρ = 0.7. High utilisation scenarios with significant delays are of
the most interest, and yet are rare in today’s backbone IP networks. From a measurement
point of view, this chapter provides the most comprehensive picture of end-to-end router
delay performance that we are aware of. We base all our analysis on empirical results and
do not make any assumptions on traffic statistics or router functionalities.
Our second aim is to use the completeness of the data as a tool to investigate how packet
delays occur inside the router, in other words to provide a physical model of the router delay
performance. For this purpose we first position ourselves in the context of the popular store
& forward router architectures with Virtual Output Queues (VOQs) at the input links [124].
We are able to confirm in a detailed way the prevailing assumption that the bottleneck of
such an architecture is in the output queues, and justify the commonly used fluid output
queue model for the router. We go further to provide two refinements to the simple queue
idea which lead to a model with excellent accuracy, close to the limits of timestamping
precision. We explain why the model should be robust to many details of the architecture.
The model focuses on datapath functions, performed at the hardware level for every IP
datagram. It only imperfectly takes account of the much rarer control functions, performed
in software on a very small subset of packets.
The third contribution of the chapter is to combine the insights from the data, and
simplifications from the model, to address the question of how delay statistics can be most
effectively summarised and reported. Currently, the existing Simple Network Management
Protocol (SNMP) focuses on reporting utilisation statistics rather than delay. Although it is
possible to gain insight into the duration and amplitude of congestion episodes through a
multi-scale approach to utilisation reporting [140], the connection between the two is com-
plex and strongly dependent on the structure of traffic arriving to the router. We explain
why trying to infer delay from utilisation is in fact fundamentally flawed, and propose a
new approach based on direct reporting of queue level statistics. This is practically feasible
as buffer levels are already made available to active queue management schemes imple-
mented in modern routers (note however that active management was switched off in the
router under study). We propose a computationally feasible way of recording the structure
of congestion episodes, and reporting them back via SNMP. The statistics we select are rich
enough to allow detailed metrics of congestion behaviour to be estimated with reasonable
accuracy. A key advantage is that a generically rich description is reported, without the need
for any traffic assumptions.
The chapter is organized as follows. The router measurements are presented in sec-
tion 6.2, and analyzed in section 6.3, where the methodology and sources of error are de-
scribed in detail. In section 6.4 we construct and justify the router model, measure its accu-
racy and discuss the nature of residual errors. In section 6.5 we define congestion episodes
and show how important details of their structure can be captured in a simple way. We then
describe how to report the statistics with low bandwidth requirements, and illustrate how
such measurements can be exploited.
6.2 Full router monitoring
In this section we describe the hardware involved in the passive measurements, present our
experiment setup to monitor a full router, and detail how packets from different traces are
matched.
6.2.1 Hardware considerations
We first give the most pertinent features of the architecture of the router we monitor, and
then recall relevant physical considerations of the SONET link layer, before describing our
passive measurement infrastructure.
Router architecture
As mentioned in the introduction, our router is of the store & forward type, and implements
Virtual Output Queues (VOQ). Details of such an architecture can be found in [124]. The
router is essentially composed of a switching fabric controlled by a centralized scheduler,
and interfaces or linecards. Each linecard controls two links: one input and one output.
A typical datapath followed by a packet crossing the router is as follows. When a packet
arrives at the input link of a linecard, its destination address is looked up in the forwarding
table. This does not occur however until the packet completely leaves the input link and fully
arrives in the linecard’s memory, i.e. the ‘store’ part of store & forward. Virtual Output
Queuing means that each input interface has a separate First In First Out (FIFO) queue
dedicated to each output interface. The packet is stored in the appropriate queue of the input
interface where it is decomposed into fixed length cells. When the packet reaches the head
of line it is transmitted through the switching fabric cell by cell (possibly interleaved with
competing cells from VOQ’s at other input interfaces dedicated to the same output interface)
to its output interface, and reassembled before being handed to the output link scheduler,
i.e. the ‘forward’ part of store & forward. The packet might then experience queuing before
being serialised without interruption onto the output link. In queuing terminology it is
‘served’ at a rate equal to the bandwidth of the output link, and the output process is of fluid
type because the packet flows out gradually instead of leaving in an instant.
In the above description the packet might be queued both at the input interface and
the output link scheduler. However in practice the switch fabric is overprovisioned and
therefore very little queueing should be expected at the input queues.
Layer overheads
Each interface on the router uses the High Level Data Link Control (HDLC) protocol as a
transport layer to carry IP datagrams over a Synchronous Optical NETwork (SONET) phys-
ical layer. Packet over SONET (PoS) is a popular choice to carry IP packets in high speed
networks because it provides a more efficient link layer than IP over ATM, and faster fail-
ure detection than broadcast technologies. We now detail the calculation of the bandwidth
available to IP datagrams encapsulated with HDLC over SONET.
The first level of encapsulation is the SONET framing mechanism. A basic SONET OC-1
frame contains 810 bytes and is repeated at a frequency of 8 kHz. This yields a nominal
bandwidth of 51.84 Mbps. Since each SONET frame is divided into a transport overhead of
27 bytes, a path overhead of 3 bytes and an effective payload of 780 bytes, the bandwidth
accessible to the transport protocol, also called the IP bandwidth, is in fact 49.92 Mbps.
OC-n bandwidth (with n ∈ {3, 12, 48, 192}) is achieved by merging n basic frames into a
single larger frame, and sending it at the same 8 kHz rate. In this case the IP bandwidth is
(49.92 × n) Mbps. For instance the IP bandwidth of an OC-3 link is exactly 149.76 Mbps.
The second level of encapsulation is the HDLC transport layer. This protocol adds 5
bytes before and 4 bytes after each IP datagram, irrespective of the SONET interface speed
[163].
These layer overheads mean that in terms of queuing behaviour, an IP datagram of size
b bytes carried over an OC-3 link should be considered as a b + 9 byte packet transmitted at
149.76 Mbps. The importance of these seemingly technical points will be demonstrated in
section 6.4.
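The arithmetic above can be checked with a short sketch. The constants are those quoted in the text; the function names are hypothetical and chosen only for this illustration.

```python
FRAME_RATE_HZ = 8000        # SONET frames per second
OC1_FRAME_BYTES = 810       # full OC-1 frame
OC1_PAYLOAD_BYTES = 780     # 810 - 27 (transport overhead) - 3 (path overhead)
HDLC_OVERHEAD_BYTES = 9     # 5 bytes before + 4 bytes after each IP datagram

def nominal_bandwidth_mbps(n):
    """Nominal bandwidth of an OC-n link in Mbps."""
    return n * OC1_FRAME_BYTES * 8 * FRAME_RATE_HZ / 1e6

def ip_bandwidth_mbps(n):
    """Bandwidth left for the HDLC-encapsulated IP datagrams, in Mbps."""
    return n * OC1_PAYLOAD_BYTES * 8 * FRAME_RATE_HZ / 1e6

def service_time_us(ip_bytes, n):
    """Time to serialise an IP datagram of ip_bytes on an OC-n link, in microseconds."""
    bits = 8 * (ip_bytes + HDLC_OVERHEAD_BYTES)
    return bits / ip_bandwidth_mbps(n)    # bits / (Mbit/s) gives microseconds

oc1_nominal = nominal_bandwidth_mbps(1)   # 51.84
oc3_ip = ip_bandwidth_mbps(3)             # 149.76
t_1500 = service_time_us(1500, 3)         # about 80.6 microseconds
```

For a 1500 byte datagram on OC-3, ignoring the 9 HDLC bytes and using the nominal 155.52 Mbps instead would give roughly 77.2 µs, a difference of several microseconds that is comparable to the timestamping precision discussed below.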
Timestamping of PoS packets
As already mentioned in section 3.2.1, all measurements presented in this thesis are made
using high performance passive monitoring ‘DAG’ cards [44]. We use DAG 3.2 cards to
monitor OC-3c and OC-12c links, and DAG 4.11 cards to monitor OC-48 links on the router.
Link      # packets     Average rate   Matched packets     Duplicate packets   Router traffic
                        (Mbps)         (% total traffic)   (% total traffic)   (% total traffic)
BB1-in    817883374     83             99.87%              0.045               0.004
BB1-out   808319378     53             99.79%              0.066               0.014
BB2-in    1143729157    80             99.84%              0.038               0.009
BB2-out   882107803     69             99.81%              0.084               0.008
C1-out    103211197     3              99.60%              0.155               0.023
C1-in     133293630     15             99.61%              0.249               0.006
C2-out    735717147     77             99.93%              0.011               0.001
C2-in     1479788404    70             99.84%              0.050               0.001
C3-out    382732458     64             99.98%              0.005               0.001
C3-in     16263         0.003          N/A                 N/A                 N/A
C4-out    480635952     20             99.74%              0.109               0.008
C4-in     342414216     36             99.76%              0.129               0.008
Table 6.1: Trace details. Each trace was collected on Aug. 14 2003, between 03:30 and 16:30
UTC.
The cards use different technologies to timestamp PoS packets.

DAG 3.2 cards are based on a design dedicated to ATM measurement and therefore op-
erate with 53 byte chunks corresponding to the length of an ATM cell. The PoS timestamp-
ing functionality was added at a later stage without altering the original 53 byte process-
ing scheme. However, since PoS frames are not aligned with the 53 byte divisions of the
PoS stream operated by the DAG card, significant timestamping errors occur. In fact, a
timestamp is generated when a new SONET frame is detected within a 53 byte chunk. This
mechanism can cause errors of up to 2.2µs on an OC-3 link [48].
DAG 4.11 cards are dedicated to PoS measurement and do not suffer from the above
limitations. They look past the PoS encapsulation (in this case HDLC) to consistently
timestamp each IP datagram after the first (32 bit) word has arrived.
As a direct consequence of the characteristics of the measurement cards, timestamps
on OC-3 links have a worst case precision of 2.2µs. Adding errors due to potential GPS
synchronization problems between different DAG cards leads to a worst case error of 6µs
[67]. This number should be kept in mind when we assess our router model performance.
6.2.2 Experimental setup
The data analyzed in this chapter was collected in August 2003 at a gateway router of the
Sprint IP backbone network, and constitutes the second set of empirical data used in this
thesis, complementing the traces presented in table 3.1. Six interfaces of the
router were monitored, accounting for more than 99.95% of all traffic flowing through it.
[Figure 6.1: diagram of the gateway router with OC-48 backbone interfaces BB1 and BB2, OC-3
customer interfaces C1, C2 and C3, and OC-12 customer interface C4. The input and output link
of each interface has its own monitor, and all monitors receive the same GPS clock signal.]
Figure 6.1: Experimental setup: gateway router with 12 synchronized DAG cards.
The experimental setup is illustrated in figure 6.1. Two of the interfaces are OC-48 linecards
connecting to two backbone routers (BB1 and BB2), while the other four connect customer
links: two trans-pacific OC-3 linecards to Asia (C2 and C3), one OC-3 (C1) and one OC-12
(C4) linecard to domestic customers. A small link carrying less than 5 packets per second
was not monitored for technical reasons.
Each DAG card is synchronized with the same GPS signal and outputs a fixed length
64 byte record for each packet on the monitored link. The details of the record depend on
the link type (ATM, SONET or Ethernet). In our case all the IP packets are PoS packets,
and each 64 byte record consists of 8 bytes for the timestamp, 12 bytes for control and PoS
headers, 20 bytes for the IP header and the first 24 bytes of the IP payload. We captured
13 hours of mutually synchronized traces, representing more than 7.3 billion IP packets or
3 terabytes of traffic. The DAG cards are located physically close enough to the router so
that the time taken by packets to go between them can be neglected.
6.2.3 Packet matching
The next step after the trace collection is the packet matching procedure. It consists in
identifying, across all the traces, the records corresponding to the same packet appearing at
different interfaces at different times. In our case the records all relate to a single router,
but the packet matching program can also accommodate multi-hop situations. We describe
below the matching procedure, and illustrate it in the specific case of the customer link
C2-out. Our methodology follows [141].
We match identical packets coming in and out of the router by using a hash table. The
hash function is based on the CRC algorithm and uses the IP source and destination ad-
dresses, the IP header identification number, and in most cases the full 24 byte IP header
data part. In fact when a packet size is less than 44 bytes, the DAG card uses a padding
technique to extend the record length to 64 bytes. Since different models of DAG cards
use different padding content, the padded bytes are not included in the hash function. Our
matching algorithm uses a sliding window over all the synchronized traces in parallel to
match packets hashing to the same key. When two packets from two different links are
matched, a record of the input and output timestamps as well as the 44 byte PoS payload is
produced. Sometimes two packets from the same link hash to the same key because they are
identical: these packets are duplicate packets generated by the physical layer [146]. They
can create ambiguities in the matching process and are therefore discarded, although their
frequency is monitored.
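The matching idea can be sketched as follows. This is not the matching program used for the thesis: the record format (a dict with fields `t`, `src`, `dst`, `id`, `payload`), the use of `zlib.crc32` as the CRC, and the fixed maximum delay `window` standing in for the sliding window are all assumptions made for illustration. Whole traces are held in memory, which would not be practical at the scale described here.

```python
import zlib
from collections import defaultdict

def key_of(rec):
    """CRC32 over source, destination, IP identification and captured payload bytes."""
    return zlib.crc32(rec["src"] + rec["dst"] + rec["id"].to_bytes(2, "big") + rec["payload"])

def index_link(records):
    """Hash the records of one link; keys seen more than once are set aside as duplicates."""
    table = defaultdict(list)
    for rec in records:
        table[key_of(rec)].append(rec)
    unique = {k: v[0] for k, v in table.items() if len(v) == 1}
    duplicates = sum(len(v) for v in table.values() if len(v) > 1)
    return unique, duplicates

def match(input_records, output_records, window=1.0):
    """Pair records with equal keys whose output timestamp follows the input within `window` seconds."""
    ins, dup_in = index_link(input_records)
    outs, dup_out = index_link(output_records)
    matched = []
    for k, out in outs.items():
        inp = ins.get(k)
        if inp is not None and 0 <= out["t"] - inp["t"] <= window:
            matched.append((inp["t"], out["t"]))
    return matched, dup_in + dup_out

def rec(t, ident):
    """Build a toy record; addresses and payload are placeholders."""
    return {"t": t, "src": bytes([10, 0, 0, 1]), "dst": bytes([10, 0, 0, 2]),
            "id": ident, "payload": b"x" * 24}

ins = [rec(0.0, 1), rec(0.1, 2), rec(0.2, 3), rec(0.25, 3)]   # id 3 is duplicated
outs = [rec(0.00002, 1), rec(0.10003, 2)]
matched, dups = match(ins, outs)
```

In this sketch a CRC collision between two different packets on the same link would be counted as a duplicate, which is a simplification.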
Matching packets is computationally intensive2 and demanding in terms of storage: the
total size of the result files rivals that of the raw data. For each output link of the router, the
packet matching program creates one file of matched packets per contributing input link. For
instance, for output link C2-out four files are created, corresponding to the packets coming
respectively from BB1-in, BB2-in, C1-in and C4-in (the input link C3-in has virtually no
traffic and is discarded by the matching algorithm). All the packets on a link for which no
match could be found were carefully analyzed. Apart from duplicate packets, unmatched
packets comprise packets going to or coming from the small unmonitored link, or with
source or destination at the router interfaces themselves. There could also be unmatched
packets due to packet drops at the router. Since the router did not drop a single packet over
the 13 hours, no such packets were found.
Assume that the matching algorithm has determined that the mth packet of output link
$\Lambda_j$ corresponds to the nth packet of input link $\lambda_i$. This can be formalized by a matching
function $M$, obeying
$$M(\Lambda_j, m) = (\lambda_i, n). \qquad (6.1)$$
The matching procedure effectively defines this function for all packets over all output
links. Packets that cannot be matched are not considered part of the domain of definition
of $M$.
2 The packet matching code was designed and written by Konstantina Papagiannaki, Gianluca Iannacone
and Tao Ye from Sprint Advanced Technology Laboratories.
Set   Link   # Matched packets   % traffic on C2-out
C4    in     215987              0.03%
C1    in     70376               0.01%
BB1   in     345796622           47.00%
BB2   in     389153772           52.89%
C2    out    735236757           99.93%
Table 6.2: Breakdown of packet matching for output link C2-out.
Table 6.1 summarizes the results of the matching procedure. The percentage of matched
packets is at least 99.6% on each link, and as high as 99.98%, showing convincingly that
almost all packets are matched. In fact, even if there were no duplicate packets and if
absolutely all packets were monitored, 100% could not be attained because of router generated
packets, which represent roughly 0.01% of all traffic.
The packet matching results for the customer link C2-out are detailed in table 6.2. For
this link, 99.93% of the packets can be successfully traced back to packets entering the
router. In fact, C2-out receives most of its packets from the two OC-48 backbone links
BB1-in and BB2-in.
This is illustrated in figure 6.2 where the utilization of C2-out across the full 13 hours
is plotted. The breakdown of traffic according to packet origin shows that the contributions
of the two incoming backbone links are roughly similar. This is the result of the Equal Cost
Multi Path [86] policy deployed in the network when packets may follow more than one
path to the same destination. While the utilization in Mbps in figure 6.2(a) gives an idea
of how congested the link might be, the utilization in packets per second is important from
a packet tracking perspective. Since the matching procedure is a per packet mechanism,
figure 6.2(b) illustrates the fact that roughly all packets are matched: the sum of the input
traffic is almost indistinguishable from the output packet count.
In the remainder of the chapter we focus on link C2-out because it is the most highly
utilized link, and is fed by two higher capacity links. It is therefore the best candidate for
observing queuing behaviour within the router.
6.3 Preliminary delay analysis
In this section we analyze the data obtained from the packet matching procedure. We start
by carefully defining the system under study, and then present the statistics of the delays
experienced by packets crossing it. The point of view is that of looking from the outside of
the router, seen largely as a ‘black box’, and we concentrate on simple statistics. In the next
section we begin to look inside the router, and examine delays in greater detail.
[Figure 6.2: two panels plotted against time of day (HH:MM UTC), each showing four curves:
total output C2-out, input BB1-in to C2-out, input BB2-in to C2-out, and total input.
Panel (a): link utilization (Mbps). Panel (b): link utilization (kpps).]
Figure 6.2: Utilization for link C2-out in (a) megabits per second (Mbps) and (b) kilopackets
per second (kpps).
6.3.1 System definition
Recall the notation from equation (6.1): the mth packet of output link $\Lambda_j$ corresponds to
the nth packet of input link $\lambda_i$. The DAG card timestamps an IP packet on the incoming
interface side at time $t(\lambda_i, n)$, and later on the outgoing interface at time $t(\Lambda_j, m)$. As the DAG
cards are physically close to the router, one might think to define the through-router
delay as $t(\Lambda_j, m) - t(\lambda_i, n)$. However, this would amount to defining the router ‘system’ in a
somewhat arbitrary way, because, as we showed in section 6.2.1, packets are timestamped
differently depending on the measurement hardware involved. Furthermore there are several
other disadvantages to such a definition, leading us to suggest the following alternative.
For self-consistency and extensibility to a multi-hop scenario, where we would like
individual router delays to add, arrival and departure times of a packet should be measured
consistently using the same bit. It is natural to focus on the end of the (IP) packet for two
reasons: (1) in a store & forward router, the output queue is the most important component
to describe. It is therefore appropriate to consider that the packet has left the router when
it completes its service at the output queue, that is when it has completely exited the router.
(2) Again because the router is of the store & forward type, no action (for example the
forwarding decision) is performed until the packet has fully entered the router. Thus the input
buffer can be considered as part of the input link, and not part of the system. With this
convention, a packet has entered the system when its last bit has been serviced by the input queue.
The arrival and departure instants in fact define the ‘system’, which is the part of the
router which we study, and is not exactly the same as the physical router as it excises the
input buffer. This buffer is the place where the packets are stored when they reach the
input linecard. This is a component which is already understood since its service rate is
the same as the bandwidth of the incoming link. Therefore it does not have to be modelled
or measured. Defining the system in this way can be compared with choosing the most
practical coordinate system to solve a given problem.
We now establish the precise relationships between the DAG timestamps defined earlier
and the time instants $\tau(\lambda_i, n)$ of arrival and $\tau(\Lambda_j, m)$ of departure of a given packet to the
system as just defined. Denote by $l_n = L_m$ the size of the packet in bytes when indexed on
links $\lambda_i$ and $\Lambda_j$ respectively, and let $\theta_i$ and $\Theta_j$ be the corresponding link bandwidths in bits
per second. We denote by $H$ the function giving the depth in bytes into the IP packet at which
the DAG card timestamps it. $H$ is a function of the link speed, but not the link direction. For a
given link $\lambda_i$, $H$ is defined as
$$H(\lambda_i) = \begin{cases} 4 & \text{if } \lambda_i \text{ is an OC-48 link,} \\ b & \text{if } \lambda_i \text{ is an OC-3 or OC-12 link,} \end{cases}$$
where we take $b$ to be a uniformly distributed integer between 0 and $\min(l_n, 53)$ to account
for the ATM based discretisation described earlier. We can now derive the desired system
arrival and departure event times as:
$$\begin{aligned} \tau(\lambda_i, n) &= t(\lambda_i, n) + 8(l_n - H(\lambda_i))/\theta_i \\ \tau(\Lambda_j, m) &= t(\Lambda_j, m) + 8(L_m - H(\Lambda_j))/\Theta_j \end{aligned} \qquad (6.2)$$
These definitions are displayed schematically in figure 6.3. The snapshots are: (a) the
packet is timestamped by the DAG card monitoring the input interface at time $t(\lambda_i, n)$, at
which point it has already entered the router, but not yet the system; (b) it has finished
entering the router (arrives at the system) at time $\tau(\lambda_i, n)$; (c) it is timestamped by the
DAG card at the output interface at time $t(\Lambda_j, m)$; and finally (d) it fully exits the router
(and system) at time $\tau(\Lambda_j, m)$.
[Figure 6.3: four panels (a) to (d), each showing the input link (λ_i, θ_i) and the output link
(Λ_j, Θ_j), with the time axis marked at t(λ_i, n), τ(λ_i, n), t(Λ_j, m) and τ(Λ_j, m) respectively.]
Figure 6.3: Four snapshots of a packet crossing the router.
With the above notations, the through-system delay experienced by packet m on link $\Lambda_j$
is defined as
$$d_{\lambda_i,\Lambda_j}(m) = \tau(\Lambda_j, m) - \tau(\lambda_i, n). \qquad (6.3)$$
To simplify notations we shorten this to $d(m)$ in what follows.
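As a rough illustration of equations (6.2) and (6.3), the sketch below converts a pair of DAG timestamps into a through-system delay. The link tuples, the function names and the example timestamps are hypothetical; the IP bandwidths follow the (49.92 × n) Mbps rule of section 6.2.1, and the random draw for b follows the uniform assumption stated above.

```python
import random

OC48 = ("OC-48", 48 * 49.92e6)   # (link type, IP bandwidth in bit/s)
OC3 = ("OC-3", 3 * 49.92e6)

def timestamp_depth(link_type, size_bytes, rng):
    """H: number of bytes into the IP packet at which the DAG card timestamps it."""
    if link_type == "OC-48":
        return 4
    if link_type in ("OC-3", "OC-12"):
        return rng.randint(0, min(size_bytes, 53))   # uniform integer, both ends included
    raise ValueError(f"unknown link type: {link_type}")

def system_time(t, size_bytes, link, rng):
    """tau = t + 8 (l - H) / theta: instant at which the last bit passes the monitor."""
    link_type, bandwidth_bps = link
    h = timestamp_depth(link_type, size_bytes, rng)
    return t + 8 * (size_bytes - h) / bandwidth_bps

def through_system_delay(t_in, t_out, size_bytes, in_link, out_link, rng):
    """d(m) = tau(output) - tau(input), as in equation (6.3)."""
    return (system_time(t_out, size_bytes, out_link, rng)
            - system_time(t_in, size_bytes, in_link, rng))

rng = random.Random(0)
# A 1500 byte packet timestamped at 0 on an OC-48 input and 100 microseconds later on an OC-3 output.
d_mixed = through_system_delay(0.0, 100e-6, 1500, OC48, OC3, rng)
# Same timestamps with OC-48 on both sides: the two corrections cancel.
d_same = through_system_delay(0.0, 100e-6, 1500, OC48, OC48, rng)
```

With different link speeds on each side the correction terms do not cancel, which is why the raw timestamp difference is not used directly as the delay.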
6.3.2 Delay statistics
A thorough analysis of single hop delays was presented in [141]. Here we follow a similar
methodology and obtain comparable results, but with the added certainty gained from not
needing to address the sampling issues caused by unobservable packets on the input side.
[Figure 6.4: minimum, mean and maximum delay from BB1-in to C2-out, plotted as log10 of the
delay in µs against time of day.]
Figure 6.4: Packet delays from BB1-in to C2-out. All delays above 10 ms are due to option
packets.
Figure 6.4 shows the minimum, mean and maximum delay experienced by packets go-
ing from input link BB1-in to output link C2-out over consecutive 1 minute intervals. As
observed in [141], there is a constant minimum delay across time, up to timestamping pre-
cision. The fluctuations in the mean delay follow roughly the changes in the link utilization
presented in figure 6.2. The maximum delay value has a noisy component with similar vari-
ations to the mean, as well as a spiky component. All the spikes above 10 ms have been
individually studied. The analysis revealed that they are caused by IP packets carrying op-
tions, representing less than 0.0001% of all packets. Option packets take different paths
through the router since they are processed through software, while all other packets are
processed with dedicated hardware on the so-called ‘fast path’. This explains why they
take significantly longer to cross the router.
In any router architecture it is likely that many components of delay will be proportional
to packet size. This is certainly the case for store & forward routers, as discussed in [95].
To investigate this here we compute the ‘excess’ minimum delay experienced by packets of
different sizes, that is not including their transmission time on the output link, a packet size
dependent component which is already understood. Formally, for every packet size L we
compute
$$\Delta_{\lambda_i,\Lambda_j}(L) = \min_m \{ d_{\lambda_i,\Lambda_j}(m) - 8 l_m/\Theta_j \mid l_m = L \}. \qquad (6.4)$$
Note that our definition of arrival time to the system conveniently excludes another packet
size dependent component, namely the time interval between beginning and completing
entry to the router at the input interface.
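Equation (6.4) amounts to a per-size minimum, which might be computed as in the sketch below. The function name and the toy delays, sizes and bandwidth are assumptions chosen so that the arithmetic is easy to follow; they are not measured values.

```python
def excess_min_delay(delays, sizes, out_bandwidth_bps):
    """Return {L: min over packets of size L of d(m) - 8 L / Theta_j}.

    delays: through-system delays d(m) in seconds
    sizes:  packet sizes in bytes, aligned with `delays`
    """
    best = {}
    for d, size in zip(delays, sizes):
        excess = d - 8 * size / out_bandwidth_bps   # remove output transmission time
        if size not in best or excess < best[size]:
            best[size] = excess
    return best

# Toy data: with an 8 Mbit/s output link, a packet of L bytes takes L microseconds to send.
delays = [150e-6, 130e-6, 260e-6]
sizes = [100, 100, 200]
delta = excess_min_delay(delays, sizes, 8e6)
```

In this toy case the excess minimum delay is 30 µs for 100 byte packets and 60 µs for 200 byte packets.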
[Figure 6.5: minimum router transit time (µs) plotted against packet size (bytes).]
Figure 6.5: Measured minimum excess system transit times from BB1-in to C2-out.
Figure 6.5 shows the values of $\Delta_{\lambda_i,\Lambda_j}(L)$ for packets going from BB1-in to C2-out. The
IP packet sizes observed varied between 28 and 1500 bytes. We assume (for each size) that
the minimum value found across 13 hours corresponds to the true minimum, i.e. that at least
one packet encountered no contention on its way to the output queue and no packet in the
output queue when it arrived there. In other words, we assume that the system was empty
from the point of view of this input-output pair. This means that the excess minimum delay
corresponds to the time taken to make a forwarding decision (not packet size dependent), to
divide the packet into cells, transmit it across the switch fabric and reassemble it (each being
packet size dependent operations), and finally to deliver it to the appropriate output queue.
The step like curve means that there exist ranges of packet sizes with the same minimum
transit time. This is consistent with the fact that each packet is divided into fixed length
cells, transmitted through the backplane cell by cell, and reassembled. A given number of
cells can therefore correspond to a contiguous range of packet sizes with the same minimum
transit time. Visually, it appears that each step has a downward slope, which means
that a full cell is transmitted faster than a nearly empty one. This could be due to the time
it takes for a non-full cell to be padded with random bytes before being sent through the
switching fabric.
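The origin of the steps can be illustrated with a ceiling division. The cell size used below is purely hypothetical, since the actual cell length of the switch fabric is not given here; only the shape of the mapping matters.

```python
import math

CELL_BYTES = 64   # hypothetical cell payload size, chosen only for illustration

def cells_needed(packet_bytes, cell_bytes=CELL_BYTES):
    """Number of fixed length cells required to carry one packet."""
    return math.ceil(packet_bytes / cell_bytes)

# All observed packet sizes (28 to 1500 bytes) that would occupy exactly three cells.
sizes_with_3_cells = [size for size in range(28, 1501) if cells_needed(size) == 3]
```

Under this assumed cell size, every packet between 129 and 192 bytes maps to three cells, which is the kind of contiguous range that produces a step in the minimum transit time.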
6.4 Modelling
We are now in a position to exploit the completeness of the data set to look inside the
system. This enables us to find a physically meaningful model which can be used both to
understand and predict the end-to-end system delay very accurately.
6.4.1 The fluid queue
We first recall some basic properties of FIFO queues that will be central in what follows.
Consider a FIFO queue with a single server of deterministic service rate µ, and let ti be the
arrival time to the system of packet i of size li bytes. We assume that the entire packet arrives
instantaneously (which models a fast transfer across the switch), but it leaves progressively
as it is served (modelling the output serialisation). Thus it is a fluid queue at the output but
not at the input. Nonetheless we will for convenience refer to it as the ‘fluid queue’.
Let Wi be the length of time packet i waits before being served. The service time of
packet i is simply li /µ, so the system time, that is the total amount of time spent in the
system, is
\[ S_i = W_i + \frac{l_i}{\mu}. \tag{6.5} \]
The waiting time of the next packet (i + 1) to enter the system can be expressed by the
following recursion:
\[ W_{i+1} = \left[ W_i + \frac{l_i}{\mu} - (t_{i+1} - t_i) \right]^+ , \tag{6.6} \]
where $[x]^+ = \max(x, 0)$. The system time of packet i + 1 reads
\[ S_{i+1} = \left[ S_i - (t_{i+1} - t_i) \right]^+ + \frac{l_{i+1}}{\mu}. \tag{6.7} \]
We denote by U(t) the amount of unfinished work at time t, that is the time it would take,
with no further inputs, for the system to completely drain. The unfinished work at the instant
following the arrival of packet i is nothing other than the end-to-end delay that that packet
will experience across the queuing system. It is therefore the natural mathematical quantity
to consider when studying delay. Note that it is defined at all real times t.
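The recursions (6.5) to (6.7) translate directly into a few lines of code. The following is a minimal sketch, assuming that the queue is empty before the first arrival and that arrival times are sorted; the function name and argument names are ours and purely illustrative.

```python
def fluid_queue_times(arrivals, sizes, mu):
    """Return (waits, system_times) for a FIFO queue served at rate mu.

    arrivals: increasing arrival times t_i (seconds)
    sizes:    packet sizes l_i (bytes)
    mu:       service rate (bytes per second)
    """
    waits, system_times = [], []
    for i, (t, l) in enumerate(zip(arrivals, sizes)):
        if i == 0:
            w = 0.0  # assumption: the queue is empty before the first arrival
        else:
            # Equation (6.6): work left by packet i-1, floored at zero.
            w = max(system_times[-1] - (t - arrivals[i - 1]), 0.0)
        waits.append(w)
        system_times.append(w + l / mu)  # equation (6.5)
    return waits, system_times
```

Each system time returned is the unfinished work U(t) just after the corresponding arrival, which is the quantity compared with measured delays in the rest of the chapter.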
6.4.2 A simple router model
The delay analysis of section 6.3 revealed two main features of the system delay which
should be taken into account in a model: the minimum delay experienced by a packet, which
is size, interface, and architecture dependent, and the delay corresponding to the time spent
in the output buffer, which is a function of the rate of the output interface and the occupancy
of the queue. The delay across the output buffer could by itself be modelled by the fluid
queue as described above, however it is not immediately obvious how to incorporate the
minimum delay property in a sensible way.
Assume for instance that the router has N input links λ1 , ..., λN contributing to a given
output link Λ j and that a packet of size l arriving on link λi experiences at least the minimum
possible delay ∆λi ,Λ j (l) before being transferred to the output buffer. A representation of this
[Figure 6.6: Router mechanisms: (a) Simple conceptual picture including VOQs. (b) Actual model with a single common minimum delay.]
situation is given in figure 6.6(a). Our first problem is that given different technologies on
different interfaces, the functions ∆λ1 ,Λ j , ..., ∆λN ,Λ j are not necessarily identical. The second
is that we do not know how to measure, nor to take into account, the potentially complex
interactions between packets which do not experience the minimum excess delay but some
larger value due to contention in the router arising from cross traffic.
We address this by in fact simplifying the picture still further, in two ways. First we
assume that the minimum delays are identical across all input interfaces: a packet of size l
arriving on link λi and leaving the router on link Λ j now experiences an excess minimum
delay
\[ \Delta_{\Lambda_j}(l) = \min_i \{ \Delta_{\lambda_i,\Lambda_j}(l) \}. \tag{6.8} \]
In the following we drop the subscript Λ j to ease the notation. Second, we assume that
the multiplexing of the different input streams takes place before the packets experience
their minimum delay. By this we mean that we preserve the order of their arrival times and
consider them to enter a single FIFO input buffer. In doing so, we effectively ignore all
complex interactions between the input streams. Our highly simplified picture, which is in
fact the model we propose, is shown in figure 6.6(b). We will justify these simplifications a
posteriori in section 6.4.3 where the comparison with measurement shows that the model is
remarkably accurate. We now explain why we can expect this accuracy to be robust.
Suppose that a packet of size l enters the system at time t + and that the amount of
unfinished work in the system at time t − was U(t − ) > ∆(l). The following two scenarios
produce the same total delay:
(i) the packet experiences a delay ∆(l), then reaches the output queue and waits U(t) −
∆(l) > 0 before being served, or
(ii) the packet reaches the output queue straight away and has to wait U(t) before being
served.
In other words, as long as there is more than an amount ∆(l) of work in the queue when a
packet of size l enters the system, the fact that the packet should wait ∆(l) before reaching
the output queue can be neglected. Once the system is busy, it behaves exactly like a simple
fluid queue. This implies that no matter how complicated the front end of the router is, one
can simply neglect it when the output queue is sufficiently busy. The errors made through
this approximation will be strongly concentrated on packets with very small delays, whereas
the more important medium to large delays will be faithfully reproduced. Apart from its
simplicity, this robustness is the main motivation for the model.
A system equation for our two stage model can be derived as follows. Assume that the
system is empty at time t0− and that packet k0 of size l0 enters the system at time t0+ . It waits
∆(l0 ) before reaching the empty output queue where it immediately starts being served. Its
service time is l0 /µ and therefore its total system time is
\[ S_0 = \Delta(l_0) + \frac{l_0}{\mu}. \tag{6.9} \]
Suppose a second packet enters the system at time t1 and reaches the output queue before
the first packet has finished being served, i.e. t1 + ∆(l1 ) < t0 + S0 . It will start being served
when packet k0 leaves the system, i.e. at t0 + S0 . Its system time will therefore be:
\[ S_1 = S_0 - (t_1 - t_0) + \frac{l_1}{\mu}. \]
The same recursion holds for successive packets k and k + 1 as long as the amount of unfin-
ished work in the queue remains above ∆(lk+1 ) when packet k + 1 enters the system:
tk+1 + ∆(lk+1 ) < tk + Sk . (6.10)
Therefore, as long as equation (6.10) is verified, the system times of successive packets are
obtained by the same recursion as for the case of a busy fluid queue:
\[ S_{k+1} = S_k - (t_{k+1} - t_k) + \frac{l_{k+1}}{\mu}. \tag{6.11} \]
Suppose now that packet k + 1 of size $l_{k+1}$ enters the system at time $t_{k+1}^+$ and that the
amount of unfinished work in the system at time $t_{k+1}^-$ is such that $0 < U(t_{k+1}^-) < \Delta(l_{k+1})$.
In this case, the output buffer will be empty by the time packet k + 1 reaches it after having
waited ∆(lk+1 ) in the first stage of the model. The system time of packet k + 1 therefore
reads
\[ S_{k+1} = \Delta(l_{k+1}) + \frac{l_{k+1}}{\mu}. \tag{6.12} \]
[Figure 6.7: Comparisons of measured and predicted delays on link C2-out: Grey line: unfinished work U(t) in the system according to the model, Black dots: measured delay value for each packet. Two panels; axes: queue size (µs) against time (ms).]
A crucial point to note here is that in this situation, the output queue can be empty but the
system still busy with a packet waiting in the front end. This is also true of the actual router.
Once the queue has drained, the system is idle until the arrival of the next packet. The
time between the arrival of a packet to the empty system and the time when the system
becomes empty again defines a system busy period. In this brief analysis, we have assumed
an infinite buffer size. It is a reasonable assumption since it is quite common for a line card
to be able to accommodate up to 500 ms worth of traffic.
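The system equations (6.9) to (6.12) can be collected into a single trace driven loop. The sketch below is one possible reading of them, assuming an initially empty system, sorted arrival times and an infinite buffer; `min_delay` stands for the function ∆(l) and is a hypothetical helper that would in practice come from measurements.

```python
def two_stage_system_times(arrivals, sizes, mu, min_delay):
    """System times S_k for the two stage model of section 6.4.2.

    arrivals:  increasing arrival times t_k (seconds)
    sizes:     packet sizes l_k (bytes)
    mu:        output service rate (bytes per second)
    min_delay: callable l -> Delta(l), minimum excess transit time
               (hypothetical helper, to be tabulated from measurements)
    """
    system_times = []
    for k, (t, l) in enumerate(zip(arrivals, sizes)):
        service = l / mu
        if k == 0:
            leftover = 0.0  # assumption: the system starts empty
        else:
            # Unfinished work in the system just before this arrival.
            leftover = max(system_times[-1] - (t - arrivals[k - 1]), 0.0)
        if leftover > min_delay(l):
            s = leftover + service      # condition (6.10) holds: equation (6.11)
        else:
            s = min_delay(l) + service  # output queue drained: (6.9) and (6.12)
        system_times.append(s)
    return system_times
```

The two branches amount to adding the service time to the larger of the leftover work and ∆(l), which is why the front end only matters when the output queue is nearly empty.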
6.4.3 Evaluation
We now evaluate our model and compare its results with empirical delay measurements.
The model delays are obtained by multiplexing the traffic streams BB1-in to C2-out and
BB2-in to C2-out and feeding the resulting packet train to the model in an exact trace driven
‘simulation’. Figure 6.7 shows two sample paths of the unfinished work U(t) corresponding
to two fragments of real traffic destined to C2-out. The process U(t) is a right continuous
jump process where each jump marks the arrival time of a new packet. The resultant new
local maximum is the time taken by the newly arrived packet to cross the system, that is
its delay. The black dots represent the actual measured delays for the corresponding input
packets. In practice the queue state can only be measured when a packet enters the system.
Thus the black dots can be thought of as samples of U(t) obtained from measurements, and
agreement between the two seems very good.
In order to see the limitations of our model, we focus on a set of busy periods on link
C2-out involving 510 packets altogether. The top plot of figure 6.8 shows the system
times experienced by incoming packets, both from the model and from measurements. The
largest busy period on the figure has a duration of roughly 16 ms and an amplitude of more
than 5 ms. Once again, the model reproduces the measured delays very well. The lower
plot in figure 6.8 shows the error of our model, that is the difference between measured and
modeled delays at each packet arrival time, plotted on the same time axis as the upper plot.
There are three main points one can make about the model accuracy. First, the absolute
error is within 30µs of the measured delays for almost all packets. Second, the error is much
larger for a few packets, as shown by the spiky behaviour of the error plot. These spikes
are due to a local reordering of packets inside the router that is not captured by our model.
Recall from figure 6.6(b) that we made the simplifying assumption that the multiplexing
of the input streams takes place before the packets experience their minimum delay. This
means that packets exit our system in the exact same order as they entered it. However
in practice local reordering can happen when a large packet arrives at the system on one
interface just before a small packet on another interface. Given that the minimum transit
time of a packet depends linearly on its size (see figure 6.5), the small packet can overtake
the large one and reach the output buffer first. Once the two packets have reached the output
buffer, the amount of work in the system is the same, irrespective of their arrival order.
Thus these local errors do not accumulate. Intuitively, local reordering requires that two
packets arrive almost at the same time on two different interfaces. This is much more likely
to happen when the links are busy. This is in agreement with figure 6.8 which shows that
spikes always happen when the queuing delays are increasing, a sign of high local link
utilization.
The last point worth noticing is the systematic linear drift of the error across a busy
period duration. This is due to the fact that our queuing model drains slightly faster than
the real queue. We could not confirm any physical reason why the IP bandwidth of the
link C2-out is smaller than what was predicted in section 6.2.1. However, the important
observation is that this phenomenon is only noticeable for very large busy periods, and is
lost in measurement noise for most busy periods.
The model presented above has some limitations. First it does not take into account
the fact that a small number of option packets will take a ‘slow’ software path through
the router instead of being entirely processed at the hardware level. As a result, option
packets experience a much larger delay before reaching the output buffer, but as far as the
model is concerned, transit times through the router only depend on packet sizes. Second,
the output queue stores not only the packets crossing the router, but also the ‘unmatched’
packets generated by the router itself, as well as control PoS packets. These packets are not
accounted for in the model.
[Figure 6.8: Measured delays and model predictions (top), Absolute error between data and model (bottom). Axes: delay (µs) and error (µs) against time (ms).]
Despite its simplicity, our model is considerably more accurate than other single-hop
delay models. Figure 6.9(a) compares the errors made on the packet delays from the OC-
3 link C2-out presented in figure 6.8 with three different models: our two stage model, a
fluid queue with OC-3 nominal bandwidth, and a fluid queue with OC-3 IP bandwidth. As
expected, with a simple fluid model, i.e. when one does not take into account the minimum
transit time, all the delays are systematically underestimated. If moreover one chooses the
nominal link bandwidth (155.52 Mbps) for the queue instead of a carefully justified IP
bandwidth (149.76 Mbps), the errors inside a busy period build up very quickly because
the queue drains too fast. There is in fact only a 4% difference between the nominal and
effective bandwidths, but this is enough to create errors of up to 800µs inside a moderately large
busy period.
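The size of this effect can be checked with a rough first order calculation. The notation below is ours: $\mu_{\mathrm{nom}}$ and $\mu_{\mathrm{IP}}$ are the nominal and IP bandwidths, and $B(t)$ is the number of bytes that have arrived since the start of the busy period. We assume that both fluid queues remain busy up to time $t$ and we ignore the minimum transit time.

```latex
\[
\frac{\mu_{\mathrm{nom}} - \mu_{\mathrm{IP}}}{\mu_{\mathrm{nom}}}
  = \frac{155.52 - 149.76}{155.52} = \frac{5.76}{155.52} \approx 3.7\%
\]
\[
U_{\mathrm{IP}}(t) - U_{\mathrm{nom}}(t)
  = B(t)\left(\frac{1}{\mu_{\mathrm{IP}}} - \frac{1}{\mu_{\mathrm{nom}}}\right)
  = \frac{B(t)}{\mu_{\mathrm{IP}}}\left(1 - \frac{\mu_{\mathrm{IP}}}{\mu_{\mathrm{nom}}}\right)
  \approx 0.037\,\frac{B(t)}{\mu_{\mathrm{IP}}}
\]
```

Under these assumptions the underestimation grows by roughly 0.37 ms for every 10 ms of work that has entered the busy period, which is the order of magnitude of the errors quoted above.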
Figure 6.9(b) shows the cumulative distribution function of the delay error for a 5 minute
window of C2-out traffic. Of the delays inferred by our model, 90% are within 20µs of the
measured ones. Given the timestamping precision issues described in section 6.2.1, these
results are very satisfactory.
[Figure 6.9: (a) Comparison of error in delay predictions from different models of the sample path from figure 6.8. (b) Cumulative distribution function of model error over a 5 minute window on link C2-out. (c) Relative mean error between delay measurements and model on link C2-out vs link utilization.]
We now evaluate the performance of our model over the entire 13 hours of traffic on
C2-out as follows. We divide the period into 156 intervals of 5 minutes. For each interval,
we plot the average relative delay error against the average link utilization. The results are
presented in figure 6.9(c). The absolute relative error is less than 1.5% for the whole trace,
which confirms the excellent match between the model and the measurements. For large
utilisation levels, the relative error grows due to the fact that large busy periods are more
frequent. The packet delays therefore tend to be underestimated more often due to the
unexplained bandwidth mismatch occurring inside large busy periods. Overall, our model
performs very well for a large range of link utilizations.
6.4.4 Router model summary
Based on the observations and analysis presented above, we propose the following simple
approach for modeling store and forward routers. For each output link Λ j :
(i) measure the minimum excess (i.e. excluding service time) packet transit time ∆λi ,Λ j
between each input λi and the given output Λ j , as defined in equation (6.4). These
depend only on the hardware involved, not the type of traffic, and could potentially be
tabulated. Define the overall minimum packet transit time ∆Λ j as the minimum over
all input links λi , as described in equation (6.8).
(ii) calculate the IP bandwidth of the output link by taking into account the different
levels of packet encapsulation, as described in section 6.2.1.
(iii) obtain packet delays by aggregating the input traffic corresponding to the given output
link, and feeding it to a simple two stage model, illustrated in figure 6.6(b), where
packets are first delayed by an amount ∆Λ j before entering a FIFO queue. System
equations are given in section 6.4.2.
A model of a full router can be obtained by putting together the models obtained for each
output link Λ j .
Although very simple, this model performed remarkably well for our data set, where
the router was lightly loaded and the output buffer was clearly the bottleneck. As explained
above, we expect the model to continue to perform well even under heavier load where
interactions in the front end become more pronounced, but not dominant. The accuracy
would drop off under loads heavy enough to shift the bottleneck to the switching fabric,
when details of the scheduling algorithm could no longer be neglected.
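Steps (i) and (iii) above can be sketched as follows. This is only an illustration under stated assumptions: each per-input minimum delay is represented by a hypothetical Python callable, and each input stream by a list of (arrival time, size) tuples already sorted by arrival time. The function names are ours.

```python
import heapq

def overall_min_delay(per_input_delays):
    """Equation (6.8): pointwise minimum of the per-input delay functions."""
    def delta(l):
        return min(d(l) for d in per_input_delays)
    return delta

def aggregate_inputs(*streams):
    """Merge sorted per-input streams of (arrival_time, size) by arrival time."""
    return list(heapq.merge(*streams))
```

The merged stream and the single function returned by `overall_min_delay` are the only inputs the two stage model needs, together with the IP bandwidth obtained in step (ii).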
6.5 Delay performance: understanding and reporting

6.5.1 Motivation
From the previous section, our router model can accurately predict delays when the input
traffic is fully characterized. However in practice the traffic is unknown, which is why
network operators rely on available simple statistics, such as curves giving upper bounds on
delay as a function of link utilization, when they want to infer packet delays through their
networks. The problem is that these curves are not unique since packet delays depend not
only on the mean traffic rate, but also on more detailed traffic statistics.
In fact, link utilization alone can be very misleading as a way of inferring packet delays.
Suppose for instance that there is a group of back to back packets on link C2-out. This
means that packets follow each other on the link without gaps, i.e. the local link utilization
is 100%. However this does not imply that these packets have experienced large delays
inside the router. They could very well be coming back to back from the input link C1-in
with the same bandwidth as C2-out. In this case they would actually cross the router with
minimum delay in the absence of cross traffic.
Inferring average packet delays from link utilization only is therefore fundamentally
flawed. Instead, we propose to study performance related questions by going back to the
source of large delays: queue build-ups in the output buffer. In this section we use our
understanding of the router mechanisms obtained from our measurements and modelling
work of the previous sections to first describe the statistics and causes of busy periods, and
second to propose a simple mechanism that could be used to report useful delay information
about a router.
6.5.2 Busy periods

Definition
Recall from section 6.4 that we defined busy periods as the time between the arrival of a
packet in the empty system and the time when the system goes back to its empty state. The
equivalent definition in terms of measurements is as follows: a busy period starts when a
packet of size l bytes crosses the system with a delay ∆(l) + l/µ, and it ends with the last
packet before the start of another busy period. This definition, which makes full use of
our measurements, is a lot more robust than an alternate definition based solely on packet
inter-arrival times at the output link. For instance, if one were to detect busy periods by
using timestamps and packet sizes to group together back-to-back packets, the following
two problems would occur. First, timestamping errors could lead to wrong busy period
separations. Second and more importantly, according to our system definition from section
6.4.2, packets belonging to the same busy period are not necessarily back to back on the
output link (see equation 6.12).
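The measurement based definition can be sketched in code as follows. This is a minimal illustration, not the procedure used to produce our figures: `min_delay` is a hypothetical callable for ∆(l), and `tol` is an assumed allowance for timestamping noise whose value would have to be chosen by the user.

```python
def split_busy_periods(arrivals, sizes, delays, mu, min_delay, tol=0.0):
    """Group packets into busy periods from measured per-packet delays.

    A packet starts a busy period when its delay is (within tol) the
    empty-system value Delta(l) + l/mu. Returns a list of dicts holding
    packet indices, amplitude and duration of each busy period.
    """
    periods = []
    for k, (t, l, d) in enumerate(zip(arrivals, sizes, delays)):
        starts_new = d <= min_delay(l) + l / mu + tol
        if starts_new or not periods:
            periods.append({"packets": [], "start": t, "end": t, "amplitude": 0.0})
        p = periods[-1]
        p["packets"].append(k)
        p["amplitude"] = max(p["amplitude"], d)  # largest delay in the period
        p["end"] = max(p["end"], t + d)          # latest departure seen so far
    for p in periods:
        p["duration"] = p["end"] - p["start"]
    return periods
```

Here the duration is taken from the first arrival to the last departure of the busy period, which is our reading of the definition recalled above.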
Statistics
To describe busy periods, we begin by collecting per busy period statistics, such as dura-
tion, number of packets and bytes, and amplitude (maximum delay experienced by a packet
inside the busy period). The cumulative distribution functions (CDF) of busy period ampli-
tudes and durations are plotted in figures 6.10(a) and 6.10(b) for a 5 minute traffic window.
For this traffic window, 90% of busy periods have an amplitude smaller than 200µs, and
80% last less than 500µs. Figure 6.10(c) shows a scatter plot of busy period amplitudes
[Figure 6.10: (a) CDF of busy period amplitudes. (b) CDF of busy period durations. (c) Busy period amplitudes as a function of busy period durations. (d) Busy period amplitudes as a function of median packet delay.]
against busy period durations for amplitudes larger than 2ms on link C2-out (busy periods
containing option packets are not shown). There does not seem to be any clear pattern link-
ing amplitude and duration of a busy period in this data set, although roughly speaking the
longer the busy period the larger its amplitude.
A scatter plot of busy period amplitudes against the median delay experienced by pack-
ets inside the busy period is presented in figure 6.10(d). One can see a linear, albeit noisy,
relationship between maximum and median delay experienced by packets inside a busy pe-
riod. This means intuitively that busy periods have a ‘regular’ shape, i.e. busy periods
where most of the packets experience small delays and only a few packets experience much
larger delays are unlikely.
Origins
Our full router measurements allow us to go further in the characterization of busy periods.
In particular, we can use our knowledge about the input packet streams on each interface
to understand the mechanisms that create the busy periods observed for our router output
links. It is clear that, by definition, busy periods are created by a local aggregate arrival rate
which exceeds the output link service rate. This can be achieved by a single input stream,
the multiplexing of different input streams, or a combination of both phenomena. A detailed
analysis can be found in [142]. We restrict ourselves in this section to an illustration of these
different mechanisms.
To create the busy periods shown in figure 6.11, we store the individual packet streams
BB1-in to C2-out and BB2-in to C2-out, feed them individually to our model and obtain
virtual busy periods. The delays obtained are plotted on figure 6.11(a), together with the
true delays measured on link C2-out for the same time window as in figure 6.8. In the
absence of cross traffic, the maximum delay experienced by packets from each individual
input stream is around 1ms. However, the largest delay for the multiplexed inputs is around
5ms. The large busy period is therefore due to the fact that the delays of the two individual
packet streams peak at the same time. This non linear phenomenon is the cause of all
the large busy periods observed in our traces. A more surprising example is illustrated in
figure 6.11(b) that shows one input stream creating at most a 1ms packet delay by itself and
the other a succession of 200µs delays. The resulting congestion episode for the multiplexed
inputs is again much larger than the individual episodes. A different situation is shown on
figure 6.11(c), where one link contributes almost all the traffic of the output link for a short
time period. In this case, the measured delays are almost the same as the virtual ones caused
by the busy input link.
It is interesting to notice that the three large busy periods plotted in figures 6.11(a),
6.11(b) and 6.11(c) all have a roughly triangular shape. Figures 6.11(d), 6.11(e) and 6.11(f)
show that this is not due to a particular choice of busy periods. They were obtained as
follows. For each 5 min interval, we detect the largest packet delay, store the corresponding
packet arrival time t0 , and plot the delays experienced by packets in a window 10ms before
and 15ms after t0 . The resulting sets of busy periods are grouped according to the largest
packet delay observed: figure 6.11(d) when the largest amplitude is between 5ms and 6ms,
figure 6.11(e) between 4ms and 5ms, and figure 6.11(f) between 2ms and 3ms. For each
of the plots 6.11(d), (e) and (f), the black line highlights the busy period detailed in the
plot directly above it. The striking point is that most busy periods have a roughly triangular
shape. The largest busy periods have slightly less regular shapes, but a triangular assumption
can still hold.
These results are reminiscent of the theory of large deviations, which states that rare
events happen in the most likely way. Some hints on the shape of large busy periods in
(Gaussian) queues can be found in [9] where it is shown that, in the limit of large amplitude,
[Figure 6.11: (a) (b) (c) Illustration of the multiplexing effect leading to a busy period on the output link C2-out. (d) (e) (f) Collection of largest busy periods in each 5 min interval on the output link C2-out. Axes: delay (ms) against time (ms).]
[Figure 6.12: Modelling of busy period shape with a triangle. Curves: measured busy period, theoretical bound, modelled busy period; axes: delay against time, both from 0 to D.]
busy periods tend to be antisymmetric about their midway point, in agreement with what
we see here.
6.5.3 Modelling busy period shape
Although a triangular approximation may seem very crude at first, we now study how useful
such a model could be. To do so, we first illustrate in figure 6.12 a basic principle: any busy
period of duration D seconds is bounded above by the busy period obtained in the case
where the D seconds worth of work arrive in the system at maximum input link speed. The
amount of work then decreases with slope −1 if no more packets enter the system. In the
case of the OC-3 link C2-out fed by the two OC-48 links BB1 and BB2 (each link being
16 times faster than C2-out), it takes at least D/32 seconds for the load to enter the system.
From our measurements, busy periods are quite different from their theoretical bound. The
busy period shown in figures 6.8 and 6.11(a) is again plotted in figure 6.12 for comparison.
One can see that its amplitude A is much lower than the theoretical maximum, in agreement
with the scatter plot of figure 6.10(c).
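The bound can be made explicit with a short calculation, given here as a sketch that treats arrivals as a fluid and ignores the front end delay ∆. Measuring work in seconds of output service, the load enters at rate 32 while the output drains at rate 1, so the backlog grows at net rate 31 for D/32 seconds and then empties with slope −1. The symbol $A_{\max}$ for the largest possible amplitude is ours.

```latex
\[
A_{\max} = (32 - 1)\,\frac{D}{32} = \frac{31}{32}\,D,
\qquad
\frac{D}{32} + \frac{31}{32}\,D = D .
\]
```

The second identity checks that the rise and the drain together last exactly D seconds, so the bounding shape is a triangle whose apex sits almost at the start of the busy period.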
In the rest of the chapter we model the shape of a busy period of duration D and ampli-
tude A by a triangle with base D, height A and same apex position as the busy period. This
is illustrated in figure 6.12 by the triangle superposed over the measured busy period. This
very rough approximation can give surprisingly valuable insight into packet delays. We de-
fine our performance metric as follows. Let L be the delay experienced by a packet crossing
the router. A network operator might be interested in knowing how long a congestion level
larger than L will last, because this gives a direct indication of the performance of the router.
Let $d_{L,A,D}$ be the length of time the workload of the system remains above L during a
busy period of duration D and amplitude A, as obtained from our delay analysis. Let $d^{(T)}_{L,A,D}$
be the approximated duration obtained from the shape model. Both $d_{L,A,D}$ and $d^{(T)}_{L,A,D}$ are
plotted with a dashed line in figure 6.12. From basic geometry one can show that
\[ d^{(T)}_{L,A,D} = \begin{cases} D\left(1 - \frac{L}{A}\right) & \text{if } A \ge L \\ 0 & \text{otherwise.} \end{cases} \tag{6.13} \]
In other words, $d^{(T)}_{L,A,D}$ is a function of L, A and D only. For the metric considered, the two
parameters (A, D) are therefore enough to describe busy periods; the knowledge of the apex
position does not improve our estimate of $d_{L,A,D}$.
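The independence from the apex position follows from similar triangles, and it can be checked numerically. The sketch below evaluates equation (6.13) directly and also from the two crossing times of a triangle with an explicit apex; both function names are ours and purely illustrative.

```python
def triangle_duration(L, A, D):
    """Equation (6.13): time a triangle of base D and height A spends above L."""
    if A <= 0 or A < L:
        return 0.0
    return D * (1 - L / A)

def triangle_duration_from_apex(L, A, D, apex):
    """Same quantity computed from the two edges, for an apex at time apex."""
    if A <= 0 or A < L:
        return 0.0
    up_crossing = apex * L / A               # rising edge reaches level L
    down_crossing = D - (D - apex) * L / A   # falling edge returns to level L
    return down_crossing - up_crossing
```

Whatever value of `apex` is chosen between 0 and D, the second function returns the same duration as the first.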
Denote by $\Pi_{A,D}$ the random process governing {A, D} pairs for successive busy periods
over time. The mean length of time during which packet delays are larger than L reads
\[ T_L = \int d_{L,A,D} \, d\Pi_{A,D}. \tag{6.14} \]
$T_L$ can be approximated by our busy period model with
\[ T_L^{(T)} = \int d^{(T)}_{L,A,D} \, d\Pi_{A,D}. \tag{6.15} \]
We use equation (6.15) to approximate TL on the link C2-out. The results are plotted on
figure 6.13 for two 5 minute windows of traffic with different average utilizations. For both
utilization levels, the measured durations (solid line) and the results from the triangular
approximation (dashed line) are fairly similar. This shows that our very simple triangular
shape approximation captures enough information about busy periods to answer questions
about duration of congestion episodes of a certain level. The small discrepancy between data
and model can be considered insignificant in the context of Internet applications because a
service provider will be realistically only interested in the order of magnitude (1ms, 10ms,
100ms) of a congestion episode greater than L. Our simple approach therefore fulfills that
role very well.
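Given a list of measured (A, D) pairs, an empirical counterpart of equation (6.15) can be sketched as below. Averaging only over the busy periods whose amplitude exceeds L is our assumption about the normalisation (a mean conditional on L); the function name is illustrative.

```python
def mean_congestion_duration(pairs, L):
    """Triangle-based mean time above L over busy periods with A > L.

    pairs: iterable of (A, D) tuples, one per busy period.
    Assumption: the average is taken only over busy periods exceeding L.
    """
    durations = [D * (1 - L / A) for A, D in pairs if A > L]
    return sum(durations) / len(durations) if durations else 0.0
```

Calling this function for a range of thresholds L on the pairs of a 5 minute window gives one curve of the kind discussed here.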
Let us now qualitatively describe the behaviours observed on figure 6.13. For a small
congestion level L, the mean duration of the congestion episode is also small. This is due
to the fact that, although a large number of busy periods have an amplitude larger than L,
as seen for instance from the amplitude CDF in figure 6.10(a), most busy periods do not
exceed L by a large amount, so the mean duration is small. It is also worth noticing that the
results are very similar for the two different link utilizations. This means that busy periods
with small amplitude are roughly similar at this time scale, and do not depend on average
utilization.
[Figure 6.13: Average duration of a congestion episode above L ms defined by equation (6.15), for two different utilization levels (0.3 and 0.7) on link C2-out. Solid lines: data, dashed lines: equation (6.15), dots: equation (6.17). Axes: mean duration (ms) against L (ms).]
As the threshold L increases, the (conditional on L) mean duration first increases as
there are still a large number of busy periods with amplitude greater than L on the link, and
of these, most are considerably larger than L. With even larger values of L however,
fewer and fewer busy periods qualify. The ones that do cross the threshold L do so for a
smaller and smaller amount of time, up to the point where there are no busy periods larger
than L in the trace.
6.5.4 Reporting busy period statistics
The study presented above shows that one can get useful information about delays by jointly
using the amplitude and duration of busy periods. Now we look into ways in which such
statistics could be concisely reported using SNMP.
We start by forming busy periods from the queue size values and collecting (A, D) pairs
during 5 minute intervals. This is feasible in practice since the queue size is already ac-
cessed by other software such as active queue management schemes. Measuring A and D is
easily performed on-line. In principle we need to report the pair (A, D) for each busy period
in order to recreate the process ΠA,D and evaluate equation (6.15). Since this represents a
very large amount of data in practice, we instead assume that busy periods are independent
and therefore that the full process ΠA,D can be described by the joint marginal distribution
FA,D of A and D. Thus, for each busy period we need simply update a sparse 2-D histogram.
The bin sizes should be as fine as possible consistent with available computing power and
memory. We do not consider these details here. They are not critical since at the end of the
5 minute interval a much coarser discretisation is performed in order to limit the volume of
data finally exported via SNMP. We control this directly by choosing N bins for each of the
amplitude and the duration dimensions.
As we do not know a priori what delay values are common, the discretisation scheme
must adapt to the traffic to be useful. A simple and natural way to do this is to select bin
boundaries for D and A separately based on quantiles, i.e. on bin populations. For exam-
ple a simple equal population scheme for D would define bins such that each contained
(100/N)% of the measured values. Denote by M the N × N matrix representing the quan-
tized version of FA,D . The element p(i, j) of M is defined as the probability of observing
a busy period with duration between the (i − 1)th and ith duration quantile, and amplitude
between the ( j − 1)th and jth amplitude quantile. Given that for every busy period A < D,
the matrix is triangular, as shown in figure 6.14. Every 5 minutes, 2N bin boundary values
for amplitude and duration, and $N^2/2$ joint probability values, are exported.
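The equal population discretisation can be sketched as follows. This is an offline illustration over a stored list of (A, D) pairs, whereas the scheme described above would update a fine histogram on-line and coarsen it at the end of the interval; the function names and the exact tie handling at bin boundaries are our assumptions.

```python
from bisect import bisect_left

def quantile_edges(values, n_bins):
    """Upper bin boundaries chosen so that bins hold roughly equal populations."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th boundary is the ceil(k*n/n_bins)-th smallest value.
    return [ordered[max(0, -(-k * n // n_bins) - 1)] for k in range(1, n_bins + 1)]

def joint_histogram(pairs, n_bins):
    """Quantized joint distribution of (A, D) on equal population bins.

    Returns (amp_edges, dur_edges, M) where M[i][j] is the fraction of busy
    periods falling in duration bin i and amplitude bin j.
    """
    amp_edges = quantile_edges([a for a, _ in pairs], n_bins)
    dur_edges = quantile_edges([d for _, d in pairs], n_bins)
    M = [[0.0] * n_bins for _ in range(n_bins)]
    for a, d in pairs:
        i = bisect_left(dur_edges, d)  # first duration boundary >= d
        j = bisect_left(amp_edges, a)  # first amplitude boundary >= a
        M[i][j] += 1.0 / len(pairs)
    return amp_edges, dur_edges, M
```

The two lists of boundaries and the non-zero entries of `M` correspond to the values that would be exported at the end of each interval.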
The 2-D histogram stored in M contains the 1-D marginals for amplitude and duration,
characterizing respectively packet delays and link utilization. In addition however, from
the 2-D histogram we can see at a glance the relative frequencies of different busy period
shapes. Using this richer information, together with a shape model, M can be used to
answer performance related questions. Applying this to the measurement of TL introduced
in section 6.5.3, and assuming independent busy periods, equation (6.15) becomes

\[ T_L^{(T)} = \int d^{(T)}_{L,A,D} \, dF_{A,D} = \int_{A>L} D \left(1 - \frac{L}{A}\right) dF_{A,D}. \qquad (6.16) \]
To evaluate this, we need to determine a single representative amplitude A_i and average
duration D_j for each quantized probability density value p(i, j), (i, j) ∈ {1, ..., N}^2, from M.
One can for instance choose the center of gravity of each of the tiles plotted in figure 6.14.
For a given level L, the average duration TL can then be estimated by
\[ \widetilde{T}_L^{(T)} = \frac{1}{n_L} \sum_{j=1}^{N} \sum_{\substack{i=1 \\ A_i > L}}^{j} d^{(T)}_{L,A_i,D_j} \, p(i,j), \qquad (6.17) \]
where n_L is the number of pairs (A_i, D_j) such that A_i > L. Estimates obtained from equation (6.17)
are plotted in figure 6.13. They are fairly close to the measured durations despite
the strong assumption of independence.
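As a worked illustration of the triangular term appearing in equations (6.16) and (6.17), assume, as in the shape model of section 6.5.3, that a busy period is approximated by a triangle of height A and base D. The numerical values below are hypothetical and chosen only to make the arithmetic easy to follow.

```latex
% Time spent above level L by a triangle of amplitude A and duration D.
% By similar triangles, the part of the triangle above L has height A - L
% and therefore base D (A - L) / A.
\[
  d^{(T)}_{L,A,D} =
  \begin{cases}
    D \left(1 - \dfrac{L}{A}\right) & \text{if } A > L, \\
    0 & \text{otherwise.}
  \end{cases}
\]
% Hypothetical tile: A_i = 800 \mu s, D_j = 1000 \mu s, level L = 200 \mu s
\[
  d^{(T)}_{L,A_i,D_j} = 1000 \left(1 - \frac{200}{800}\right) = 750~\mu\text{s}.
\]
```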
Although very simple and based on a rough approximation of busy period shapes, this
reporting scheme can give some interesting information about the delay performance of a
router. In this preliminary study we have only illustrated how TL could be approximated
with the reported busy period information, but other performance related questions could
be answered in the same way. In any case, our reporting scheme provides a much more
valuable insight about packet delays than presently available statistics based on average link
utilization. Moreover, it is only based on measurements and is therefore traffic independent.
[Figure 6.14: Histogram of the quantized joint probability distribution of busy period amplitudes
and durations with N = 10 equally spaced quantiles along each dimension for a 5 minute window
on link C2-out. Horizontal axis: duration (µs); vertical axis: amplitude (µs).]
6.6 Conclusion
In this chapter we have explored in detail ‘through-router’ delays. We first described a
unique experimental setup where we captured all IP packets crossing a Tier-1 access router
and presented authoritative empirical results about packet delays. Second, we used our
dataset to provide a physical model of router delay performance, and showed that our model
could very accurately infer packet delays. Our third contribution concerns a fundamental
understanding of delay performance. We gave the first measured statistics of router busy
periods that we are aware of, and presented a simple triangular shape model that can capture
useful delay information. We then proposed a scheme to export router delay performance
in a compact way.
Chapter 7
Modelling Internet traffic
7.1 Introduction
In this chapter, we use the knowledge gained by answering the three questions detailed
in section 1.2, and present a complete validation of our traffic model at a network node.
We will in fact put together the empirical findings from chapter 3, the modelling work
from chapter 4, some mathematical results from chapter 5 and the router mechanisms from
chapter 6 to gain a thorough understanding of the problem.
We start by presenting some empirical results showing how a packet train is modified
by the router in section 7.2. Based on these findings, we define the ‘packet’ time scale as
the smallest time scale at which the BLPP can be applied. In section 7.3 we validate the
BLPP model on the data collected in chapter 6 and show how it can be used to model the
splitting and merging of packet streams through a router. We conclude in section 7.4.
7.2 Empirical observations
We now have all the elements in place to understand how a router modifies a packet train.
We first, in section 7.2.1, give the details of the traffic streams on which we base our analy-
sis, and then illustrate how a router modifies a packet train through a queuing mechanism in
section 7.2.2. We present some consequences for traffic modelling in section 7.2.3.
7.2.1 Details of traffic streams
In this section, we give a full description of the virtual paths linking input and output
linecards in our fully instrumented router. Recall that general characteristics of the traf-
fic over the entire 13 hours of the data collection were already given in chapter 6. Here
we focus on a two hour period where the traffic is roughly stationary on all the links, and
over which we aim to validate our modelling results. Figure 7.1 shows a schematic of the
router. The details of the traces collected at each linecard are given in table 7.1. There is a
wide range of link utilizations: backbone links are less than 4% utilized, while utilization
on customer links ranges from 2% on C1-out to 52% on C2-out. Over the 2 hour period
considered, the router routed over 1 billion packets, corresponding to roughly 100 million
IP flows, at an average rate of 575 Mbps. From table 6.2.1 page 129 we know that there is
virtually no traffic on input link C3-in, and we therefore discard this link.
[Figure 7.1: Router diagram illustrating the multiplexing of input streams contributing to
link C2-out. The router has two OC-48 backbone linecards (BB1, BB2), three OC-3 customer
linecards (C1, C2, C3) and one OC-12 customer linecard (C4), each with an input and an
output direction.]
Trace      # Packets     # Flows    Bandwidth (Mbps)
C1-in       21363721     1845783    16.2
C1-out      17384671     2643529     3.2
C2-in      216140434    27320806    71.7
C2-out     108637851     7857864    79.7
C3-out      52998594     3945802    57.6
C4-in       49801794     3655830    39.5
C4-out      67797464     6848361    20.4
BB1-in     119808388     9502484    81.2
BB1-out    120286864    15742387    53.6
BB2-in     126566855    11761474    78.9
BB2-out    166385423    16874143    73.7
Table 7.1: Details of traces collected over a two hour period: trace name, number of packets,
number of flows, average bandwidth.
We use the results of the packet matching analysis presented in section 6.2.3 page 130
to decompose each input packet trace into groups of packets, or substreams, flowing from a
given input to a given output linecard where they exit the router. Similarly, an output trace
is decomposed into substreams corresponding to the different input linecards.
Table 7.2 shows the logical paths inside the router. The router is not ‘fully meshed’, i.e.
not every input contributes to every output. First, it makes sense from a routing perspective
that there is no traffic on the matrix diagonal: there is no point sending traffic to the router
if the traffic is then sent back to where it comes from. Second, because this router is a
gateway router linking clients to a Tier-1 backbone network, most of the traffic goes from
the backbone to the clients or from the clients to the backbone. There is little traffic between
clients, and none between the backbone links. A typical situation for an output link is
illustrated in figure 7.1: packets exiting on link C2-out come from input links C1-in, C4-in,
BB1-in and BB2-in. From table 7.2 we can form 19 substreams between router linecards,
details of which can be found in table 7.3.
           C1-in   C2-in   C3-in   C4-in   BB1-in   BB2-in
C1-out       .       .       .       ✔       ✔        ✔
C2-out       ✔       .       .       ✔       ✔        ✔
C3-out       .       .       .       ✔       ✔        ✔
C4-out       .       ✔       .       .       ✔        ✔
BB1-out      ✔       ✔       .       ✔       .        .
BB2-out      ✔       ✔       .       ✔       .        .
Table 7.2: Router ‘matrix’ showing the packet streams through the router. Empty boxes
(shown as ‘.’) mean that there is no traffic flowing between the specified input and output
linecards.
Substream             # Packets     # Flows    Bandwidth (Mbps)
C1-in to C2-out           12445        1052     0.004
C1-in to BB1-out        9664976      853932     7.0
C1-in to BB2-out       11669672      988326     9.2
C2-in to C4-out          300495       35445     0.05
C2-in to BB1-out       87430709    13294988    28.6
C2-in to BB2-out      127968526    13814708    43.0
C4-in to C1-out           29419        2308     0.003
C4-in to C2-out           39039        4768     0.02
C4-in to C3-out           98359        7087     0.09
C4-in to BB1-out       22955506     1573749    17.9
C4-in to BB2-out       26577591     2056484    21.5
BB1-in to C1-out        9634095     1414227     1.88
BB1-in to C2-out       50210170     3403399    36.6
BB1-in to C3-out       28653178     1966237    32.4
BB1-in to C4-out       31184234     2705399    11.0
BB2-in to C1-out        7709435     1226433     1.27
BB2-in to C2-out       58258428     4423855    43.1
BB2-in to C3-out       24224258     1969708    25.2
BB2-in to C4-out       36207136     4107360     9.3
Table 7.3: Details of each substream obtained with the packet matching procedure: name,
number of packets, number of flows, average bandwidth.
7.2.2 Packet train through a router
From the understanding of packet delay mechanisms in chapter 6, we know that a packet
train is modified by a router since not all packets experience the same delay. In this section,
we seek to quantify the extent of the changes. We could simply study the differences of
packet arrival times for each substream of table 7.3 timestamped both before and after the
router. However this would not take into account the cross traffic and would make any
physical interpretation of the results difficult. Instead, we compare the timing of all the
packets on an output link with the arrival times of these same packets taken together on the
different input interfaces. In this way, we capture all the packets in the output buffer and are
more likely to give a physical explanation of our results.
We place ourselves in a worst case scenario by studying the most congested link, C2-
out, and study the modifications incurred by substreams contributing to this link. In order
to simplify the notation, we call Xout the packet train observed on C2-out. From table 7.3
these packets entered the router on links BB1-in and BB2-in, with a small fraction also
coming from C1-in and C4-in. We can form a virtual input packet train Xin , timestamped
on the input linecards, by multiplexing the packets coming from these four input links and
destined to C2-out.
Figure 7.2 illustrates the second order properties of Xin and Xout , and shows how the
point process of packet arrival times is modified through the router. The thick black line
and the thin grey line represent respectively the LDs of Xin and Xout . The LDs are identical
at scales larger than 20ms, while they are noticeably different at smaller time scales. Recall
that at small scales the confidence intervals on the estimation are very small, which means
that the observed difference is significant. This result can be intuitively understood from the
delay analysis carried out in chapter 6: most packets experience less than 1ms delay in the
router, so the behaviour of Xin over time scales larger than 1ms will remain unchanged
through the router. In very rough terms, one can think of the point process Xout as a point
by point translation of Xin , as introduced in section 2.3.5 page 31. This is of course an
approximation since the translation operation moves each point by an i.i.d. amount whereas
in practice the delay of a packet is conditioned both by its size and the history of the queuing
process.
[Figure 7.2: Fourier spectral density and LD for output traffic C2-out and sum of contributing
input streams. Definition of the ‘packet’ time scale. Curves: input and output Fourier spectra,
input and output LDs. Annotations: the peak for 1500 byte packets at OC-3 speed, the Poisson
level, and the micro, fine, knee transition and coarse time scales, together with the packet
time scale and the range covered by the BLPP model.]
At very small scales, where periodicities due to back-to-back packets might occur,
wavelets are not necessarily the best analysis tool to use since they average out the spectral
estimation on a certain frequency range. We therefore use periodograms to estimate the
power spectral density at small scales. The periodograms are represented by the ‘noisy’
signals in figure 7.2. Although wavelet and periodogram estimates agree for time scales
larger than 200µs, the Fourier spectral density shows strong periodicities in Xout
that were averaged out by the wavelet analysis. A quantitative analysis of the largest peak
exhibited by the power spectral density of Xout reveals that it corresponds to the transmission
time of a 1500 byte packet at the nominal bandwidth of link C2-out, i.e. 1500 byte
packets placed back-to-back on the output link. No such periodicities could be observed
in the incoming packet train Xin. Other peaks are not as easily identifiable, since they mix
other packet sizes and harmonics of lower frequencies.
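The location of the 1500 byte peak discussed above can be checked with a short calculation. It is sketched here under the assumption that the nominal OC-3 line rate of 155.52 Mbps applies and that framing overhead is ignored; neither assumption is stated in the text, so the figures are indicative only.

```latex
\[
  \tau = \frac{1500 \times 8~\text{bits}}{155.52 \times 10^{6}~\text{bits/s}}
       \approx 77.2~\mu\text{s},
  \qquad
  f = \frac{1}{\tau} \approx 13.0~\text{kHz}.
\]
```

Back-to-back 1500 byte packets would thus produce a spectral line near 13 kHz, the exact position depending on the effective payload rate of the link.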
Finally, we illustrate the micro behaviour of Xin and Xout on a 2ms time window. The
jumps in the top plot of figure 7.3 mark the packet arrival times of Xin while the bottom plot
shows Xout . The packet sizes are represented respectively by the height of the jumps in the
top plot and by the grey rectangles in the bottom one. The router busy periods explain how
Xin has been locally modified by the router to become Xout .
[Figure 7.3: Top: Unfinished work U(t) in the queue. Bottom: Corresponding packets on
link C2-out. Horizontal axis: time (ms).]
7.2.3 Modelling consequences
Such router measurements highlight some interesting problems of traffic modelling. For
instance, they show that traffic characteristics at scales larger than 10ms were not altered by
the router we monitored. Keeping in mind that we chose the worst case queuing scenario
in section 7.2.2, we can assume that traffic characteristics at scales larger than 10ms remain
unchanged through a backbone network where all the links are over-dimensioned. Modelling
traffic at scales larger than 10ms from measurements taken at one point of a network is
therefore worth doing. By this we mean that one can draw general conclusions on traffic
characteristics by studying a single link. On the other hand, if traffic statistics were different
over all time scales at every point of a network, every link would need to be analyzed.
These measurements also show that the results of very small scale traffic analysis, below
1 or 10ms, will strongly depend on the point of the network where they were obtained and
should therefore be generalized only with the greatest care. Figure 7.2 illustrates these
different time scales. We call packet time scale the scale below which the BLPP does
not apply. We do not give a quantitative value to this scale, but characterize it instead in a
qualitative manner by saying that it is a few orders of magnitude larger than the transmission
time of a 1500 byte packet on the link considered. This corresponds to roughly 1ms for an
OC-3 link. One can think of it as the scale below which the size of a packet ‘matters’. As
already defined in section 4.6.1, we call coarse scales (CS) the time scales larger than the
knee transition area, and fine scales (FS) the time scales smaller than the knee transition but
larger than the packet time scale. We will solely focus on scales larger than the packet time
scale in the following.
7.3 Validation of the BLPP
The first aim of this section is to validate the BLPP as a versatile traffic model for time scales
larger than the packet time scale. We will show, by using key semi-experiments, that the
underlying assumptions of our model are in fact verified over a large range of link speeds
and link utilizations. We thus complement the preliminary findings presented in chapters 3
and 4 and obtained on the lightly loaded links described in table 3.1. The second aim is to
use the data from table 7.3 to extend the validation of our traffic model from a single link to
a node and then to a network.
We emphasize the fact that the results presented here constitute by far the most thorough
traffic model validation we are aware of. It is a computationally intensive task that involves
the individual manipulation of more than 1.5 billion packets contained in thirty 2 hour
long traces. This represents at least a hundred times more data than most other traffic
modelling studies where one or two relatively short traces are usually deemed a sufficient
empirical check. We start with the traces detailed in table 7.1 and show in section 7.3.1
that our previous empirical findings are valid for most of them. In section 7.3.2, we study
the splitting and merging of traffic with the substreams detailed in table 7.3. We extend the
model to the non stationary case in section 7.3.3.
7.3.1 Individual links
Recall from chapter 4 that the fact that the BLPP model works is a direct consequence of
the results of selected semi-experiments. For instance, the choice of i.i.d. flows following a
Poisson process comes from the fact that the manipulation [A-Pois] has virtually no impact
on the structure of X(t). Moreover, the results of [P-Uni] show that most of the energy at
small time scales comes from in-flow dynamics. In the following we will check the results
of the two key semi-experiments [A-Pois] and [A-Pois; P-Uni] on all the traffic crossing
the router.
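To fix ideas, the two manipulations can be sketched in a few lines. The sketch assumes that [A-Pois] replaces the flow arrival times by those of a Poisson process of the same average rate while leaving the packets inside each flow untouched, and that [P-Uni] spreads the packets of each flow uniformly over the flow duration while keeping the packet count; the precise definitions are those of chapter 3. The representation of a flow as a sorted array of packet timestamps and the function names `a_pois` and `p_uni` are hypothetical.

```python
import numpy as np

def a_pois(flows, rng):
    """Sketch of [A-Pois]: new Poisson flow start times, same in-flow spacing.

    `flows` is a list of sorted 1-D arrays of packet timestamps (assumed layout).
    """
    starts = np.array([f[0] for f in flows], dtype=float)
    t0, t1 = starts.min(), starts.max()
    # Given their number, the points of a homogeneous Poisson process on an
    # interval are i.i.d. uniform, which keeps the average flow rate unchanged.
    new_starts = rng.uniform(t0, t1, size=len(flows))
    return [f - f[0] + s for f, s in zip(flows, new_starts)]

def p_uni(flows, rng):
    """Sketch of [P-Uni]: uniform packet times inside each flow.

    First and last packets stay in place, so flow duration and packet
    count are preserved.
    """
    out = []
    for f in flows:
        if len(f) > 2:
            interior = np.sort(rng.uniform(f[0], f[-1], size=len(f) - 2))
            f = np.concatenate(([f[0]], interior, [f[-1]]))
        out.append(f)
    return out

def merge(flows):
    """Rebuild the aggregate packet arrival process from the flows."""
    return np.sort(np.concatenate(flows))
```

The combined manipulation [A-Pois; P-Uni] then corresponds to `merge(p_uni(a_pois(flows, rng), rng))`, whose LD can be compared with that of `merge(flows)`.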
We first focus on link C2-out since it has the highest load of all the traces detailed in
table 7.1 and is therefore the link where, intuitively, the flow independence assumption is
the most likely to fail. Figure 7.4 shows the results of the semi-experiments on link C2-out.
The thick grey line represents the LD of the original traffic, while the solid black line and the
line with circles represent the LDs of the reconstructed packet arrival process after applying
respectively [A-Pois] and [A-Pois; P-Uni]. At most time scales, the differences due to the
[A-Pois] manipulation are not significant. In agreement with the findings of chapter 3, we
conclude that our flow independence hypothesis is also verified for this fairly heavily loaded link.
[Figure 7.4: Semi-experiments [A-Pois] and [A-Pois; P-Uni] on output link C2-out. LDs of the
original traffic (Orig), [A-Pois] and [A-Pois; P-Uni]. Vertical axis: log2 Var(d_j);
horizontal axis: j = log2(a), with the corresponding scales from 0.004 to 256 on the top axis.]
The manipulation [A-Pois; P-Uni] drastically changes the small scale behaviour of the LD
below 1s (flat spectrum up to the knee), while larger time scales remain mostly unchanged,
as found in chapter 3.
We now present similar results for the traces collected at each router interface. Figure 7.5,
which spans two pages, shows the LDs of the two semi-experiments
[A-Pois] and [A-Pois; P-Uni] on the 30 traces detailed in both tables 7.1 and 7.3. The
legend is the same as in figure 7.4. The plots are presented in a ‘matrix’ organisation that
matches the traffic matrix presented in table 7.2. In this section, we focus on the traffic seen
on each linecard, for which semi-experiment results are illustrated in the top row and left
column of figure 7.5. In most cases, the manipulations [A-Pois] and [A-Pois; P-Uni] have
similar effects to the ones observed in chapter 3. This shows that most traces satisfy the
requirements for a BLPP model to apply, and gives very convincing empirical evidence of
the wide applicability of the model. Two notable exceptions, where [A-Pois] significantly
changes the form of the LD, are C2-in and C4-out which we now detail.
First, C2-in carries traffic from Asia on a transpacific link using up to 50% of its capac-
ity, and one could think that this relatively high utilization is enough to explain the changes
created by [A-Pois]. However, since the same manipulation on link C2-out with similar uti-
lization did not create significant changes, high utilization is not by itself a sufficient reason.
Instead, this has more to do with the way the link is being used, and how packets were put
on it. We can only infer that traffic coming from the end-users is heavily shaped by access
routers, where packets certainly experience large queuing delays resulting in high correla-
tion between the flows, before the traffic gets to link C2-in. On the other hand, most packets
on C2-out would be generated by web servers in the US with very fast network connections.
Our edge router, which is probably the first bottleneck encountered by the packets, does not
shape the traffic enough for [A-Pois] to have a significant impact on flow correlations.
The second exception concerns the link C4-out, which is less than 15% utilized. This
would intuitively tend to indicate that flows on the link are only weakly correlated. Again,
the fact that [A-Pois] causes significant changes means that this reasoning is flawed. What
this shows is that a large proportion of these flows share a common bottleneck upstream,
probably only a few hops away from the sources of these packets. Given the very low
queuing delays in the core, these highly correlated flows have been left virtually unchanged
when they reach our measurement point. We see again that one cannot accurately predict
the results of the semi-experiments based solely on link utilization. This emphasizes the fact
that utilization tells very little about the burstiness of traffic. As shown in chapter 6, a
description of traffic in terms of busy period statistics is a lot more informative.
7.3.2 Splitting and merging of traffic through a router
The second aim of this section is to check that the BLPP can model the splitting and merging
properties of traffic substreams through a router. We first briefly recall some results on the
splitting of BLPPs and then present our empirical validation.
Theory
Recall from chapter 4 that the BLPP is a single class traffic model where all the flows are
considered to have the same dynamics with constant µA and c chosen to be representative
of the measured flow statistics. When this single class approach is not possible because
very different kinds of traffic are mixed [159], one can use a multi class model, where all
the flow characteristics have an extra level of randomization: µA and c are then random
variables while P becomes a doubly stochastic random variable. Technically, this multi
class model is not a BLPP, but we will refer to it as multi class BLPP for convenience.
[Figure 7.5: Semi-experiments [A-Pois] and [A-Pois; P-Uni] on all traffic streams. One LD panel
per trace, arranged as in the traffic matrix of table 7.2: the top row and left column show the
input and output linecard traces, and the remaining panels show the substreams of table 7.3,
with ‘No traffic’ marking empty entries. Axes and legend as in figure 7.4. See text for details.]
In essence, the splitting and merging of a BLPP is done on a flow by flow basis since
all the packets in a flow belong to the same substream. We know from chapters 4 and 5 that
these operations are easily applied to the BLPP: because flow arrival times in a BLPP follow
a Poisson process, one can use results on the splitting and merging of Poisson processes to
study the splitting and merging of BLPPs. An i.i.d. flow based splitting of a single class
BLPP will lead to two independent single class BLPPs. The merging of two independent
single class BLPPs will lead to a multi class or a single class BLPP, depending on whether
the BLPPs being merged have different or similar flow characteristics, respectively. This means
that in the general case where substreams have different flow characteristics, the BLPP does
not quite lend itself to the same linear operations as a Poisson process. However, if flows
are considered to follow the same statistics on every substream, one can split and merge
BLPPs in exactly the same way as Poisson processes.
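The flow level splitting and merging recalled above can be illustrated on the flow arrival process alone. The following is a minimal sketch, assuming each flow is routed independently to the first substream with a fixed probability p; the function names and the use of NumPy are choices made here for illustration.

```python
import numpy as np

def poisson_arrivals(rate, horizon, rng):
    """Flow arrival times of a homogeneous Poisson process on [0, horizon)."""
    n = rng.poisson(rate * horizon)
    return np.sort(rng.uniform(0.0, horizon, size=n))

def split_flows(flow_starts, p, rng):
    """I.i.d. splitting: each flow goes to substream 1 with probability p."""
    flow_starts = np.asarray(flow_starts)
    to_first = rng.random(len(flow_starts)) < p
    return flow_starts[to_first], flow_starts[~to_first]

def merge_flows(*substreams):
    """Superpose substreams by pooling and re-sorting their arrival times."""
    return np.sort(np.concatenate(substreams))
```

For a flow arrival rate λF, the two outputs of `split_flows` are independent Poisson processes with rates pλF and (1 − p)λF, and `merge_flows` applied to independent Poisson substreams gives a Poisson process whose rate is the sum of the individual rates. Since every packet follows its flow, the in-flow structure is untouched by either operation.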
Empirical validation
We know from section 7.2.2 that the router does not introduce any significant non linearities:
the correlation of a packet train is not modified above the packet time scale. This means that
semi-experiments on the substreams defined in table 7.3 will give similar results whether the
substreams are timestamped before or after the router. We therefore only study one set of
semi-experimental results per substream, corresponding to the packets being timestamped
before they enter the router.
The results of the semi-experiments for the substreams presented in table 7.3 are shown
in figure 7.5. Again, in most cases, [A-Pois] does not have a significant impact on the form
of the LD, while [A-Pois; P-Uni] only modifies the small scale behaviour. This means
that each individual substream can be modelled by a single class BLPP process as a first
approximation. A packet train on a given link, seen as the superposition of independent
substreams, can therefore be modelled as a sum of independent single class BLPPs. We
know that this sum is in fact a multi-class BLPP when different substreams have different
flow characteristics, and a single class BLPP when all the substreams have the same flow
characteristics.
There is no contradiction between a single class and a multi class approach to the modelling
of a packet trace. The latter is simply richer than the former, but they both have the
same empirical backing. Although slightly less precise, a single class BLPP is preferable
since it does not require the knowledge of routing tables and has a smaller number of
parameters. One can thus study the splitting and merging properties of traffic substreams
through a node, and then through a network, in exactly the same way as what is done with
Poisson processes, with the added benefit of having a traffic model with strong empirical
backing. We did not pursue the multi class approach further since it would lead to an even
larger number of parameters to be fitted, whereas our aim has been from the beginning to
understand the ‘physics’ and the networking causes of the observed statistics. Our simple
single class BLPP model is sufficient for this purpose.
7.3.3 Model extension
Apart from the above mentioned improvement of a multi class approach to take into account
the fact that flows do not all have the same rate [159], another obvious improvement to the
model concerns the relaxation of the stationarity hypothesis.
Our starting point is, again, an empirical observation. We quantitatively compare in
figure 7.6 the arrival rates of packets, bytes and flows on link C2-out over a 24 hour period.
In agreement with one of our earliest observations made in section 3.2.2, the byte rate
mimics the packet rate. However, for our purpose, the most interesting point is that the
packet arrival rate follows the flow arrival rate.
In the context of our BLPP model, this means that flow characteristics remain roughly
unchanged over time, and that the packet arrival rate can be approximated by a scaled ver-
sion of the time dependent flow arrival rate. In fact recall that in the stationary case the
packet arrival rate is given by equation (2.23) λX = λF µP . From the above observations,
one can write
λX (t) = λF (t)µP , (7.1)
where the constant flow arrival rate λF has been replaced by a time dependent function
λF (t), with a 24 hour periodicity to account for daily cycles, while the flow characteris-
tics are time invariant. In this context, the flow arrival process Y is an inhomogeneous
Poisson process with periodic rate λF (t). It is no longer stationary but is instead cyclostationary
[69]. The resulting packet arrival process X(t) is non stationary as well.
Although such a model does not lend itself to the same analytic treatment as its stationary
counterpart, it could be useful for simulation purposes.
From a practical perspective, there are at least two ways of simulating the arrival times
of an inhomogeneous Poisson process. On the one hand, one can apply a time substitution
operation to a Poisson process, as mentioned in section 2.3.5. However this technique might
have poor efficiency if the inversion of the rate function has to be computed numerically
[107]. On the other hand there exists in fact a very simple and elegant way of simulating an
inhomogeneous Poisson process with rate λF (t) based on a thinning operation. Assuming
that λF (t) has a finite maximum λF*, one can use the following algorithm [113]:
(1) Generate the arrival times {tF(i)*} of a homogeneous Poisson process with rate λF*.
(2) Reject each point tF(i)* with probability 1 − λF(tF(i)*)/λF*.
The remaining points follow an inhomogeneous Poisson process with rate λF (t). Once the
flow arrival times have been determined, one simply has to lay down the packets corre-
sponding to each flow.
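The two steps of the thinning algorithm can be sketched as follows. This is a minimal illustration and not the code used for the thesis: the sinusoidal `daily_rate` function and its parameters are hypothetical placeholders for a measured λF(t). Candidate points are kept with probability λF(t)/λF*, which is equivalent to rejecting them with probability 1 − λF(t)/λF*.

```python
import numpy as np

def thinning(rate_fn, rate_max, horizon, rng):
    """Arrival times of an inhomogeneous Poisson process on [0, horizon).

    rate_fn must accept a NumPy array of times and satisfy
    0 <= rate_fn(t) <= rate_max on the whole interval.
    """
    # Step (1): homogeneous Poisson process with the maximum rate
    n = rng.poisson(rate_max * horizon)
    candidates = np.sort(rng.uniform(0.0, horizon, size=n))
    # Step (2): keep each candidate with probability rate_fn(t) / rate_max
    keep = rng.random(n) < rate_fn(candidates) / rate_max
    return candidates[keep]

def daily_rate(t, mean=10.0, depth=0.5, period=86400.0):
    """Hypothetical flow arrival rate (flows/s) with a 24 hour cycle."""
    return mean * (1.0 + depth * np.sin(2.0 * np.pi * t / period))
```

A call such as `thinning(daily_rate, 15.0, 86400.0, np.random.default_rng())` returns one day of flow arrival times for the default parameters, where 15.0 is the maximum of `daily_rate`. The packets of each flow can then be laid down from each arrival time as in the stationary model.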
[Figure 7.6: Utilization over 24 hours of link C2-out in packets, bytes and flows per second.
The flow arrival rate ‘shapes’ the traffic. Three panels against time of day: link utilization
(kpps), link utilization (Mbps) and flows (1000/s).]
7.4 Conclusion
In this chapter we presented a validation of our BLPP traffic model over a large number of
traces with a wide range of utilizations and link speeds. Using our knowledge of Internet
traffic gained by answering the three questions detailed in section 1.2, we used results from
all the previous chapters to show that the BLPP can model the splitting and merging of
packet streams in a router. This indicates that the model is versatile and could potentially be
used to model packet streams through a network. We also used results from a 24 hour link
monitoring to show how the model could be extended to take into account daily variations
of traffic loads.
Chapter 8
Conclusion
This last chapter presents a brief summary of our main findings, without repeating the detailed
comments made in the previous chapters, and gives a few directions for future work.
8.1 Contributions
In this thesis we studied how IP packets cross a router and answered the three questions
itemized on page 4:
(i) How to characterize the traffic entering a router? (Chapters 3 and 4)
Starting from empirical measurements, we made extensive use of a technique we called
semi-experiments to determine the impact of different traffic characteristics on the
packet arrival process. We showed in particular that packet flows can be considered as
independent entities with Poisson arrival times. These findings led to a new physically
motivated packet model called a Bartlett-Lewis point process (BLPP), with strong
empirical backing and useful analytic properties.
(ii) How to sample packet traffic? (Chapter 5)
Having characterized the traffic entering a router, we then studied how packets are
accounted for in today’s routers using packet sampling methods. We advocated the
use of a new flow sampling technique whenever possible in order to recover detailed
statistics about the traffic crossing the router. We also showed that the BLPP model
has a nice closure property under both forms of thinning.
(iii) What happens to packets inside a router? (Chapter 6)
In order to get a better understanding of Internet traffic at very small scales, we
analysed router mechanisms thanks to a unique experimental setup where all the packets
crossing a router could be captured. We proposed a simple model explaining the
packet delays through the router and presented a proof of concept for a new way of
exporting traffic information based on busy period statistics.
Answering these three questions gave us the deep understanding of network traffic over
all time scales that we summarized in chapter 7. We also showed how the BLPP could
capture the splitting and merging of traffic substreams through a router and could therefore
be extended to a network wide traffic model.
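The structure summarized above (independent flows with Poisson arrival times, each flow carrying a finite sequence of packets) and the two forms of thinning can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the geometric number of packets per flow and the exponential gaps between packets are chosen for brevity and are not the distributions fitted in this thesis, and the names simulate_blpp, flow_sample and packet_sample are hypothetical.

```python
import random


def simulate_blpp(flow_rate, mean_packets, mean_gap, horizon, rng):
    """Simulate flows starting in [0, horizon); return one list of packet times per flow."""
    flows = []
    t = 0.0
    while True:
        # Poisson flow arrivals: exponential gaps between flow start times.
        t += rng.expovariate(flow_rate)
        if t >= horizon:
            return flows
        # Assumption: geometric number of packets per flow (at least one).
        n_packets = 1
        while rng.random() > 1.0 / mean_packets:
            n_packets += 1
        # Assumption: exponential gaps between packets of the same flow,
        # so that each flow is a finite renewal process.
        times = [t]
        for _ in range(n_packets - 1):
            times.append(times[-1] + rng.expovariate(1.0 / mean_gap))
        flows.append(times)


def flow_sample(flows, q, rng):
    """Flow sampling: keep each whole flow independently with probability q."""
    return [f for f in flows if rng.random() < q]


def packet_sample(flows, q, rng):
    """Packet sampling: keep each packet independently with probability q."""
    thinned = ([t for t in f if rng.random() < q] for f in flows)
    return [f for f in thinned if f]  # flows with no sampled packet vanish


rng = random.Random(1)
flows = simulate_blpp(flow_rate=50.0, mean_packets=10.0, mean_gap=0.01,
                      horizon=20.0, rng=rng)
kept_flows = flow_sample(flows, 0.1, rng)
kept_pkts = packet_sample(flows, 0.1, rng)
```

Flow sampling leaves the packet count of every retained flow intact, whereas packet sampling shortens flows and removes many small ones entirely, which is one way to see why the former is easier to invert.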
8.2 Future work
Although this thesis provides considerable insight into Internet traffic, much work
remains to be done.
For our BLPP model, one could develop a method for blind fitting of the
parameters, which would make the model easier to use. One could also try to improve the
model by including the empirically observed LRD flow arrival process. A related unsolved problem
concerns the physical explanation of the empirical findings concerning the dependency of
the knee position on flow durations described in section 3.3.2. Another interesting problem
would be to link recent results on infinitely divisible cascades [30] with BLPPs expressed
as compound Poisson processes [149].
From a queueing theory perspective, the empirical data presented in chapter 6 provides a
unique opportunity to test empirically some of the large deviations principles
often used to solve buffer occupancy problems [68]. Such a study would be of great interest
whether or not these large deviations principles turn out to hold. Another natural
extension of our full router monitoring work concerns the further testing and implementation
of the router performance reporting scheme based on busy period statistics.
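As a rough illustration of what busy period statistics look like, the Python sketch below extracts busy periods from a list of packet arrival times and service times. It assumes a single-server FIFO queue with known per-packet service times, which is a simplification chosen for illustration and not necessarily the router model of chapter 6. The function name busy_periods and the (start, duration, maximum delay) tuple layout are hypothetical.

```python
def busy_periods(arrivals, services):
    """Return (start, duration, max_delay) for each busy period of a FIFO queue.

    arrivals: non-decreasing packet arrival times.
    services: service time of each packet (same length as arrivals).
    """
    periods = []
    depart_prev = float("-inf")  # departure time of the previous packet
    start = None                 # start time of the current busy period
    max_delay = 0.0              # largest packet delay seen in this busy period
    for a, s in zip(arrivals, services):
        if a >= depart_prev:
            # The queue is empty on arrival: close the previous busy period.
            if start is not None:
                periods.append((start, depart_prev - start, max_delay))
            start, max_delay = a, 0.0
        # FIFO recursion: service starts when the packet has arrived
        # and the previous packet has left.
        depart = max(a, depart_prev) + s
        max_delay = max(max_delay, depart - a)
        depart_prev = depart
    if start is not None:
        periods.append((start, depart_prev - start, max_delay))
    return periods


example = busy_periods([0.0, 1.0, 5.0], [2.0, 2.0, 1.0])
# Packets 1 and 2 share a busy period; packet 3 finds the queue empty.
```

A reporting scheme could then export a few summary values per busy period rather than per-packet delays.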
Finally, on a more philosophical level, this work pushes the open-loop approach of
physical traffic models a very long way. One could always add minor improvements to
the model, for instance by using a Markovian description of in-flow dynamics. However,
in order to really make a step forward in our understanding of Internet traffic, we will
probably need a closed-loop system where feedback mechanisms and traffic interactions
will be taken into account. A first step in this direction is to understand the Internet’s router-
level topology [114] in order to add a spatial dimension to the usual temporal analysis of
packet traffic. This spatiotemporal analysis represents the next challenge of Internet traffic
modelling.
Appendix A
IP Packet structure
IP Header
All IP packets are structured the same way: an IP header followed by a variable length data
field, as shown in table A.1. The two most common transport protocols are TCP and UDP,
described respectively in tables A.2 and A.3.
Bit offsets within each 32-bit word are given in parentheses.
Word 1: Version (0-3) | IHL (4-7) | Type of Service (8-15) | Total Length (16-31)
Word 2: Identification (0-15) | Flags (16-18) | Fragment Offset (19-31)
Word 3: Time To Live (0-7) | Protocol (8-15) | Header Checksum (16-31)
Word 4: Source IP Address (0-31)
Word 5: Destination IP Address (0-31)
Then: Options (+ padding)
- Version: IP version number.
- IHL: Internet header length is the length of the Internet header in 32-bit words.
- Type of service: Indicates the quality of service desired.
- Total length: Length of the IP datagram in bytes.
- Identification: Value assigned by the sender to aid in assembling the fragments
of a datagram.
- Flags: 3 control flags.
- Fragment offset: Indicates where this fragment belongs in the datagram.
- Time To Live (TTL): Number of hops.
- Protocol: Indicates the next level protocol used in the data portion of the Internet
datagram.
- Header Checksum: checksum on the IP header.
- Options: IP options.
Table A.1: IP Header
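To make the layout of table A.1 concrete, the following Python sketch unpacks the fixed 20-byte part of an IP (version 4) header with the standard struct module. The function name parse_ipv4_header, the dictionary keys and the sample addresses are our own illustrative choices and do not correspond to any measurement tool used in this thesis.

```python
import socket
import struct


def parse_ipv4_header(data):
    """Decode the fixed 20-byte IPv4 header at the start of data."""
    if len(data) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Network byte order: 2 x 8 bits, 3 x 16 bits, 2 x 8 bits, 16 bits, 2 x 32 bits.
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    ihl = ver_ihl & 0x0F  # header length in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "ihl": ihl,
        "tos": tos,
        "total_length": total_length,            # bytes, header included
        "identification": ident,
        "flags": flags_frag >> 13,               # 3 control flags
        "fragment_offset": flags_frag & 0x1FFF,  # 13 bits
        "ttl": ttl,
        "protocol": proto,                       # e.g. 6 for TCP, 17 for UDP
        "header_checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "options": data[20:ihl * 4],             # empty when IHL is 5
    }


# Hand-built sample header (documentation addresses, checksum left at zero).
sample_ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
                        socket.inet_aton("192.0.2.1"),
                        socket.inet_aton("198.51.100.2"))
ip_hdr = parse_ipv4_header(sample_ip)
```

The sketch does not verify the header checksum; it only splits the fields.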
TCP Header
Bit offsets within each 32-bit word are given in parentheses.
Word 1: Source Port (0-15) | Destination Port (16-31)
Word 2: Sequence Number (0-31)
Word 3: Acknowledgment Number (0-31)
Word 4: Offset (0-3) | Reserved (4-9) | Flags (10-15) | Window (16-31)
Word 5: Checksum (0-15) | Urgent Pointer (16-31)
Then: Options (+ padding)
Then: Data (variable)
- Source Port: Source Port number.
- Destination Port: Destination Port number.
- Sequence Number: sequence number of the first data byte in this segment.
- Acknowledgment Number: value of the next sequence number which the sender of the
segment is expecting to receive.
- Offset: number of 32 bit words in the TCP header, which indicates where the data begins.
- Reserved: 6 bits reserved for future use.
- Flags: 6 bits
  .URG: Urgent pointer field.
  .ACK: Acknowledgment field.
  .PSH: Push function.
  .RST: Reset the connection.
  .SYN: Synchronize sequence numbers.
  .FIN: No more data from sender.
- Window: number of data bytes which the sender of this segment is willing to accept.
- Data: TCP data
Table A.2: TCP Header
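The TCP layout of table A.2 can be decoded in the same spirit. The Python sketch below reads the fixed 20-byte part of a TCP header and expands the 6 flag bits into names. The function name parse_tcp_header, the dictionary keys and the sample values are hypothetical and serve only as an illustration.

```python
import struct

# Flag names from the least significant bit upwards.
TCP_FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")


def parse_tcp_header(data):
    """Decode the fixed 20-byte TCP header at the start of data."""
    if len(data) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    (src_port, dst_port, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", data[:20])
    offset = off_flags >> 12       # header length in 32-bit words
    flag_bits = off_flags & 0x3F   # lowest 6 bits of the 16-bit field
    flags = [name for i, name in enumerate(TCP_FLAG_NAMES) if flag_bits & (1 << i)]
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "sequence_number": seq,
        "acknowledgment_number": ack,
        "offset": offset,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
        "data_start": offset * 4,  # byte index where the TCP data begins
    }


# Hand-built SYN+ACK segment header without options (checksum left at zero).
sample_tcp = struct.pack("!HHIIHHHH", 443, 51000, 1000, 2000,
                         (5 << 12) | 0x12, 65535, 0, 0)
tcp_hdr = parse_tcp_header(sample_tcp)
```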
UDP Header
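Bit offsets within each 32-bit word are given in parentheses.
Word 1: Source Port (0-15) | Destination Port (16-31)
Word 2: Length (0-15) | Checksum (16-31)
Then: Data
- Source Port: Source Port number.
- Destination Port: Destination Port number.
- Length: Length of the datagram in bytes.
- Data: UDP data.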
Table A.3: UDP Header
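For completeness, a corresponding Python sketch for the UDP layout of table A.3 is given below. It relies on the fact that the Length field counts the 8 header bytes as well as the data. The function name parse_udp_datagram and the sample values are hypothetical.

```python
import struct


def parse_udp_datagram(data):
    """Decode the 8-byte UDP header and return the fields with the data."""
    if len(data) < 8:
        raise ValueError("a UDP header is 8 bytes long")
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", data[:8])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,          # header plus data, in bytes
        "checksum": checksum,
        "data": data[8:length],    # UDP data
    }


# Hand-built datagram carrying five bytes of data (checksum left at zero).
udp_payload = b"hello"
sample_udp = struct.pack("!HHHH", 53, 40000, 8 + len(udp_payload), 0) + udp_payload
udp = parse_udp_datagram(sample_udp)
```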

Bibliography
[1] J. Abate and W. Whitt, “The Fourier-series method for inverting transforms of prob-
ability distributions.”, Queueing Systems, 10:5–88, 1992. 107, 120
[2] Abilene Network. http://abilene.internet2.edu. 43
[3] P. Abry, P. Flandrin, N. Hohn, and D. Veitch, “Invariance d’échelle dans l’Internet”,
in Proc. Colloque Mesure de l’Internet, Nice, France, May 2003. ix
[4] P. Abry, P. Gonçalvès, and P. Flandrin, “Wavelet-based spectral analysis of 1/f
processes”, in Proc. IEEE ICASSP, pp. III 237–240, 1993. 34
[5] P. Abry, P. Gonçalvès, and P. Flandrin, Wavelets and Statistics, chapter Wavelets,
spectrum estimation and 1/f processes, p. 103. Springer-Verlag, New York, lecture
notes in statistics edition, 1995. 34
[6] P. Abry, M. S. Taqqu, P. Flandrin, and D. Veitch, Self-Similar Network Traffic and
Performance Evaluation, K. Park and W. Willinger, editors, chapter Wavelets for the
analysis, estimation, and synthesis of scaling data. Wiley, 2000. 34, 35, 37
[7] P. Abry and D. Veitch, “Wavelet analysis of long-range dependent traffic”, IEEE
Transactions on Information Theory, 44(1):2–15, 1998. 36
[8] A. Adas and A. Mukherjee, “On resource management and QoS guarantees for long-
range dependent traffic”, in Proc. IEEE Infocom’95, pp. 779–787, 1995. 12
[9] R. Addie, P. Mannersalo, and I. Norros, “Performance formulae for queues with
Gaussian input”, in Proc. 16th International Teletraffic Congress, 1999. 148
[10] R. G. Addie, M. Zukerman, and T. D. Neame, “Fractal traffic: measurements, modelling
and performance evaluation”, in Proc. IEEE Infocom ’95, pp. 977–984, 1995. 11
[11] R. J. Adler, R. E. Feldman, and M. S. Taqqu, A practical guide to heavy tails.
Birkhäuser, 1998. 116
[12] D. Allen, “The impact of peering on ISP performance: what’s best for you ?”, Net-
work Magazine, November 2001. 2
[13] E. Altman, K. Avrachenkov, and C. Barakat, “A stochastic model for TCP/IP with
stationary random losses”, in Proc. of ACM SIGCOMM, 2000. 14, 15
[14] H. Amindavar and J. Ritchey, “Padé approximations of probability density functions”,
IEEE Transactions on Aerospace and Electronic Systems, 30(2):416–424, 1994. 107
[15] A. T. Andersen and B. F. Nielsen, “A Markovian approach for modelling packet
traffic with long-range dependence”, IEEE Journal on Selected Areas in Communications,
16(5):719–732, 1998. 10
[16] F. Baccelli and D. Hong, “AIMD, fairness and fractal scaling of TCP traffic”, in
Proc. IEEE Infocom ’02, pp. 229 –238, 2002. 15
[17] P. Bak, K. Chen, and C. Tang, “A forest-fire model and some thoughts on turbulence”,
Phys. Lett. A, 147:297–300, 1990. 75
[18] C. Barakat, P. Thiran, G. Iannaccone, C. Diot, and P. Owezarski, “A flow-based
model for Internet backbone traffic”, in ACM SIGCOMM Internet Measurement
Workshop (IMW-2002), pp. 35–48, Marseille, Nov 6–8 2002. 81
[19] J. Beran, “Statistical methods for data with long range dependence”, Statistical Sci-
ence, 7(4):404–427, 1992. 8
[20] J. Beran, Statistics for Long-Memory Processes. Chapman and Hall/CRC, 1994. 20, 21
[21] J. Beran, R. Sherman, M. Taqqu, and W. Willinger, “Variable-bit-rate video traffic
and long range dependence”, IEEE Transactions on Communications, 43:1566–1579,
1995. 7, 8, 12
[22] R. N. Bhattacharya, V. K. Gupta, and E. Waymire, “The Hurst effect under trends”,
Journal of Applied Probability, (20):649–662, 1983. 8
[23] N. Bingham, C. Goldie, and J. Teugels, Regular Variation. Cambridge University
Press, Cambridge England, 1987. 84, 105
[24] C. Blondia, “A discrete-time batch Markovian arrival process as B-ISDN traffic
model”, Belgian Journal of Operations Research, Statistics and Computer Science,
2(32):3–23, 1993. 10
[25] P. Brémaud and L. Massoulié, “Power spectra of general shot noises and Hawkes
processes with a random excitation”, Adv. Appl. Proba., 34:205–222, 2002. 103
[26] F. Brichet, J. Roberts, A. Simonian, and D. Veitch, “Heavy traffic analysis of a storage
model with long range dependent On/Off sources”, Queueing Systems, 23:197–225,
1996. 12
[27] B. Castaing, “The temperature of turbulent flows”, J. Phys. II France, 6:105–114,
1996. 23
[28] C. Chaffy, “The analytic continuation process: from computer algebra to numerical
analysis”, in Proc. ACM International symposium on symbolic and algebraic com-
putation, pp. 216–222, 1994. 115, 116
[29] P. Chainais, Cascades infiniment divisibles et analyse multirésolution. Application à
l’étude des intermittences en turbulence. PhD thesis, ENS Lyon, 2001. 23
[30] P. Chainais, R. Riedi, and P. Abry, “On non scale invariant infinitely divisible cas-
cades”, Technical Report RR-04-06, LIMOS UMR CNRS 6158, 2004. 170
[31] G. Cheng and J. Gong, “Traffic behavior analysis with Poisson sampling on high-
speed network”, in Proc. ICII 2001, pp. 158–163, 2001. 101
[32] B.-Y. Choi, J. Park, and Z.-L. Zhang, “Adaptive random sampling for total load es-
timation”, in Proc. IEEE International Conference on Communications, pp. 1552–
1556, 2003. 101
[33] Cisco Netflow. http://www.cisco.com. 99
[34] Cisco Sampled NetFlow. http://www.cisco.com. 99
[35] K. Claffy, H.-W. Braun, and G. Polyzos, “Parameterizable methodology for Internet
traffic flow profiling”, IEEE Journal on Selected Areas in Communications, 13(8):
1481–1494, 1995. 44
[36] K. Claffy, G. Polyzos, and H.-W. Braun, “Application of sampling methodologies to
network traffic characterization”, in Proc. ACM SIGCOMM, pp. 13–17, 1993. 101
[37] W. Cochran, Sampling Techniques. Wiley, 1987. 101
[38] K. G. Coffman and A. M. Odlyzko, Handbook of Massive Data Sets, chapter Internet
growth: is there a Moore’s Law for data traffic ?, pp. 47–73. Kluwer Academic
Publishing, 2002. 2
[39] Cooperative Association for Internet Data Analysis. http://www.caida.org. 43
[40] Coralreef software. http://www.caida.org/tools/measurement/coralreef/. 43, 45
[41] P. Cowpertwait, “A renewal cluster model for the inter-arrival times of rainfall
events”, Int. J. Climatol., 21:49–61, 2000. 75
[42] D. Cox and V. Isham, Point Processes. Chapman & Hall, 1980. 23, 82
[43] M. Crovella and A. Bestavros, “Self-similarity in world wide web traffic: Evidence
and possible causes”, IEEE/ACM Transactions on Networking, 5(6):835–846, 1997.
14
[44] DAG network measurement card. http://dag.cs.waikato.ac.nz/. 42, 128
[45] J. Daigle, “Queue length distributions from probability generating functions via
Fourier transforms”, Operations Research Letters, (8):229–236, 1989. 107
[46] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes.
Springer-Verlag, 1988. 23, 24, 25, 27, 33, 82, 86, 93, 103, 109, 120
[47] DIstributed Real Time Systems - University of North Carolina.
http://www.cs.unc.edu/Research/dirt/. 43
[48] S. Donnelly, High Precision Timing in Passive Measurements of Data Networks. PhD
thesis, University of Waikato, 2002. 129
[49] D. A. Drabold and J. L. Jones, “Maximum-entropy approach to series extrapolation
and analytic continuation”, J. Phys. A: Math. Gen., 24:4705–4714, 1991. 123
[50] J. Drobisz and K. Christensen, “Adaptive sampling methods to determine network
traffic statistics including the Hurst parameter”, in Proc. IEEE Annual Conference on
Local Computer Networks, pp. 238–247, 1998. 101
[51] N. Duffield, C. Lund, and M. Thorup, “Estimating flow distributions from sampled
flow statistics”, in Proc. ACM/SIGCOMM conference, pp. 325–336. ACM Press,
2003. 101, 117, 118, 122, 123
[52] N. Duffield, C. Lund, and M. Thorup, “Learn more, sample less: control of volume
and variance in network measurement”, submitted, 2003. 101
[53] N. G. Duffield, J. T. Lewis, N. O’Connell, R. Russell, and F. Toomey, “Predicting
quality of service for traffic with long-range dependence”, Proc. IEEE ICC, pp. 473–477,
1995. 8
[54] N. G. Duffield and N. O’Connell, “Large deviations and overflow probabilities for the
general single-server queue, with applications”, Mathematical Proceedings of the
Cambridge Philosophical Society, 118:363–374, 1995. 12
[55] A. K. Erlang, “Solution of some problems in the theory of probabilities of some
significance in automatic telephone exchanges”, Post Office Electrical Engineers’
Journal, 10:189–197, 1918. 6
[56] A. Erramilli, O. Narayan, and W. Willinger, “Experimental queueing analysis with
long-range dependent packet traffic”, IEEE/ACM Transactions on Networking, 4(2):
209–223, April 1996. 7, 41, 60
[57] C. Estan, G. Varghese, and M. Fisk, “Bitmap algorithms for counting active flows
on high speed links”, in Proc. ACM SIGCOMM Internet Measurement Conference,
2003. 122
[58] Federal Networking Council. Internet monthly reports, October 1995. 1
[59] A. Feldmann, R. Cáceres, F. Douglis, G. Glass, and M. Rabinovich, “Performance
of web proxy caching in heterogeneous bandwidth environments”, in Proc. IEEE
INFOCOM’99, pp. 107–116 vol. 1, 1999. 122
[60] A. Feldmann, A. Gilbert, and W. Willinger, “Data networks as cascades: Investigating
the multifractal nature of Internet WAN traffic”, in Proc. ACM SIGCOMM,
Vancouver, Canada, 1998. 7
[61] A. Feldmann, A. C. Gilbert, P. Huang, and W. Willinger, “Dynamics of IP traffic: A
study of the role of variability and the impact of control”, in Proc. ACM SIGCOMM,
pp. 301–313, 1999. 7, 54, 56
[62] A. Feldmann, A. C. Gilbert, W. Willinger, and T. Kurtz, “The changing nature of
network traffic: Scaling phenomena”, Computer Communication Review, April 1998.
48
[63] A. Feldmann, J. Rexford, and R. Cáceres, “Efficient policies for carrying web traffic
over flow-switched networks”, IEEE/ACM Transactions on Networking, 6(6):673–
685, 1998. 122
[64] W. Feller, An Introduction to Probability Theory and Its Applications, volume 2. John
Wiley and Sons, 2nd edition, 1971. 27
[65] D. R. Figueiredo, B. Liu, V. Misra, and D. Towsley, “On the autocorrelation structure
of TCP traffic”, Computer Networks Journal Special Issue on Advances in Modeling
and Engineering of Long-Range Dependent Traffic, 2002. 9, 15
[66] S. Floyd and V. Jacobson, “The NewReno modification to TCP’s fast recovery algo-
rithm”, IEEE/ACM Transactions on Networking, 1(4):397–413, 1993. 14
[67] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely,
and C. Diot, “Packet-level traffic measurements from the Sprint IP backbone”, IEEE
Network, 17(6):6–16, 2003. 129
[68] A. Ganesh, N. O’Connell, and D. Wischik, Big Queues. Lecture Notes in Mathematics.
Springer, 2004. 8, 170
[69] W. A. Gardner, Cyclostationarity in Communication and Signal Processing. IEEE
Press, New York, 1994. 167
[70] D. P. Gaver and P. A. W. Lewis, “First-order autoregressive gamma sequences and
point processes”, Adv. Appl. Probab., 12:727–745, 1980. 27
[71] A. C. Gilbert, W. Willinger, and A. Feldmann, “Scaling analysis of conservative
cascades, with application to network traffic”, IEEE Trans. Information Theory,
45(3):971–991, 1999. 7
[72] M. Grasse, M. R. Frater, and J. F. Arnold, “On the non-stationarity of MPEG-2 video
traffic”, Technical Report COST 242, University of New South Wales, 1995. 8
[73] M. Grossglauser and J.-C. Bolot, “On the relevance of long-range dependence in
network traffic”, IEEE/ACM Transactions on Networking, pp. 629 – 640, 1999. 8
[74] L. Guo, M. Crovella, and I. Matta, “How does TCP generate self-similarity ?”, in
Proceedings of the Ninth International Symposium in Modeling, Analysis and Simu-
lation of Computer and Telecommunication Systems (MASCOTS’01), p. 215, 2001.
9, 15
[75] D. Heath, S. Resnick, and G. Samorodnitsky, “Heavy tails and long-range depen-
dence in ON/OFF processes and associated fluid models”, Mathematics of Opera-
tions Research, 1(23):145–164, 1998. 14
[76] H. Heffes and D. M. Lucantoni, “A Markov modulated characterization of packetized
voice and data traffic and related statistical multiplexer performance”, IEEE Journal
on Selected Areas in Communications, 6(4):856–867, 1986. 6, 10
[77] P. Henrici, Applied and Computational Complex Analysis, Vol 1. Wiley and Sons,
1974. 115, 116
[78] N. Hohn and D. Veitch, “Inverting sampled traffic”, in Proc. ACM Internet Measure-
ment Conference, pp. 222–233, Miami, USA, October 2003. ix
[79] N. Hohn and D. Veitch, “Inverting sampled traffic”, IEEE/ACM Transactions on Net-
working, (fast track submission). ix
[80] N. Hohn, D. Veitch, and P. Abry, “Does fractal scaling at the IP level depend on TCP
flow arrival processes ?”, in Proc. ACM Internet Measurement Workshop, pp. 63–68,
Marseille, France, November 2002. ix
[81] N. Hohn, D. Veitch, and P. Abry, “Investigating the scaling behaviour of Internet
flow arrivals”, in Proc. International Conference on Self-Similarity and Applications,
Annales Mathématiques Blaise Pascal, Clermont Ferrand, France, May 2002, To be
published. ix
[82] N. Hohn, D. Veitch, and P. Abry, “Cluster Processes, a Natural Language for Network
Traffic”, IEEE Transactions on Signal Processing, Special Issue on Signal Process-
ing in Networking, 51(8):2229–2244, 2003. ix
[83] N. Hohn, D. Veitch, and P. Abry, “The impact of the flow arrival process in Internet
traffic”, in Proc. IEEE ICASSP, pp. VI 37–40, Hong Kong, April 2003. ix
[84] N. Hohn, D. Veitch, K. Papagiannaki, and C. Diot, “Bridging router performance and
queueing theory”, in Proc. ACM SIGMETRICS conference, NYC, USA, June 2004.
ix
[85] N. Hohn, D. Veitch, and T. Ye, “Splitting and merging of a traffic model: validation”.
(submitted), 2004. x
[86] C. Hopps, Analysis of an Equal-Cost Multi-Path Algorithm. RFC 2992, 2000. 132
[87] J. R. M. Hosking, “Fractional differencing”, Biometrika, 68(1):165–176, 1981. 12
[88] Y. Huang and J. M. Pullen, “Countering denial-of-service attacks using congestion
triggered packet sampling and filtering”, in Proc. International Conference on Computer
Communications and Networks, pp. 490–494, 2001. 101
[89] H. E. Hurst, “Long-term storage capacity in reservoirs”, Proc. American Society of
Civil Eng., 76(11), 1950. 8
[90] G. Iannaccone, C. Diot, I. Graham, and N. McKeown, “Monitoring very high speed
links”, in Proc. ACM SIGCOMM Internet Measurement Workshop, 2001. 100
[91] Inmon Corporation, sFlow accuracy and billing.
http://www.inmon.com/PDF/sFlowBilling.pdf. 99, 112
[92] Internet Protocol Flow Information eXport (IPFIX) - IETF Working Group.
http://www.ietf.org/html.charters/ipfix-charter.html. 120
[93] J. L. Jerkins and J. L. Wang, “A measurement analysis of ATM cell-level aggregate
traffic”, in Proc. IEEE Globecom ’97, pp. 1589–1595, 1997. 7
[94] S. H. Kang, Y. H. Kim, D. K. Sung, and B. D. Choi, “An application of Markovian ar-
rival process (MAP) to modeling superposed ATM cell streams”, IEEE Transactions
on Communications, 4(50):633–642, 2002. 10
[95] S. Keshav and S. Rosen, “Issues and trends in router design”, IEEE Communication
Magazine, 36(5):144–151, 1998. 136
[96] L. Kleinrock, “Information flow in large communication nets”, RLE Quarterly
Progress Report, 1961. 1
[97] L. Kleinrock, Queueing Systems, Volume 1: Theory. John Wiley and Sons, 1975. 8
[98] S. Klivansky, A. Mukherjee, and C. Song, “On long-range dependence in NSFNET
traffic”, Technical report, Georgia Institute of Technology, 1995. 8
[99] C. Knessl and J. A. Morrison, “Heavy-traffic analysis of a data handling system with
many sources”, SIAM Journal on Applied Mathematics, 51(1):187–213, 1991. 13
[100] B. Krishnamurthy, J. C. Mogul, and D. M. Kristol, “Key differences between
HTTP/1.0 and HTTP/1.1”, in Proceedings of the WWW-8 Conference, Toronto, 1999.
52
[101] A. Kumar, J. Xu, L. Li, and J. Wang, “Space-code bloom filter for efficient traffic
flow measurement”, in Proc. ACM SIGCOMM Internet Measurement Conference,
2003. 122
[102] T. V. Lakshman and U. Madhow, “Performance analysis of TCP/IP for networks
with high bandwidth-delay products and random loss”, IEEE/ACM Transactions on
Networking, 5(3):336–350, 1997. 14, 15
[103] G. Latouche and M.-A. Remiche, “An MAP-based Poisson cluster model for web
traffic”, Performance Evaluation, 49(1-4):359–370, 2002. 75
[104] A. Law and W. Kelton, Simulation Modeling and Analysis. Industrial Engineering
and Management Science. McGraw Hill, 2nd edition, 1991. 33
[105] A. J. Lawrence, “Arbitrary event initial conditions for branching Poisson processes”,
Journal of the Royal Statistical Society, Series B, 34(1):114–123, 1972. 87
[106] LBNL Network Simulator. http://www.isi.edu/nsnam/ns/. 14, 56
[107] S. Lee and J. R. Wilson, “Modeling and simulation of a nonhomogeneous Poisson
process having cycle behaviour”, Commun. Statist. Simula., 20(2):777–809, 1991.
167
[108] B. Leiner, V. Cerf, D. Clark, R. Kahn, L. Kleinrock, D. Lynch, J. Postel, L. Roberts,
and S. Wolff, “The past and future history of the Internet”, Communications of the
ACM, 40(2):102–108, 1997. 1
[109] W. Leland, M. Taqqu, W. Willinger, and D. Wilson, “On the self-similar nature of
ethernet traffic”, IEEE/ACM Transactions on Networking, 2:1–15, 1994. 7, 8
[110] P. A. W. Lewis, “A branching Poisson process model for the analysis of computer
failure patterns”, Journal of the Royal Statistical Society. Series B, 26(3):398–456,
1964. 26, 75, 109
[111] P. A. W. Lewis, “Asymptotic properties and equilibrium conditions for branching
Poisson processes”, J. Appl. Prob., 6:355–371, 1969. 87
[112] P. A. W. Lewis and B. K. Ray, “Modeling long-range dependent, non linearity and
periodic phenomenon in sea surface temperatures using TSMARS”, Journal of the
American Statistical Association, (92):881–893, 1997. 11
[113] P. A. W. Lewis and G. S. Shedler, “Simulation of nonhomogeneous Poisson processes
by thinning”, Naval Res. Logistics Quart., 26(3):403–413, 1979. 167
[114] L. Li, D. Alderson, W. Willinger, and J. C. Doyle, “A first-principles approach to
understanding the Internet’s router-level topology”, in Proc. ACM SIGCOMM, 2004
(to be published). 170
[115] N. Likhanov, B. Tsybakov, and N. D. Georganas, “Analysis of an ATM buffer with
self-similar (fractal) input traffic”, in Proc. IEEE Infocom ’95, 1995. 14
[116] D. V. Lindley, “The theory of queues with a single server”, in Proc. Cambridge Phil.
soc., volume 48, pp. 277–289, 1952. 6
[117] J. C. Lopez-Ardao, C. Lopez-Garcia, A. Suarez-Gonzales, M. Fernandez-Veiga, and
R. Rodriguez-Rubio, “On the use of self-similar processes in network simulation”,
ACM Transactions on Modelling and Computer Simulation, 10(2):125–151, 2000.
12
[118] S. B. Lowen and M. C. Teich, “Doubly stochastic point process driven by fractal shot
noise”, Physical Review A, 43(8):4192–4213, 1991. 12, 21
[119] S. B. Lowen and M. C. Teich, “Fractal renewal processes generate 1/f noise”, Phys.
Rev. E, 47(2):992–1001, 1993. 12
[120] S. Mallat, A Wavelet Tour of Signal Processing. Academic Press, 1998. 34
[121] B. Mandelbrot, “Long-run linearity, locally Gaussian processes, H-spectra and infi-
nite variance”, International Economic Review, 10:82–113, 1969. 13
[122] L. Massoulié and A. Simonian, “Large buffer asymptotics for the queue with fractional
Brownian input”, Journal of Applied Probability, 36:894–906, 1999. 12
[123] M. May, J.-C. Bolot, C. Diot, and B. Lyles, “Reasons not to deploy RED”, in Proc.
7th IEEE/IFIP International Workshop on Quality of Service (IWQoS’99), London,
June 1999. 14
[124] N. McKeown, “iSLIP: A scheduling algorithm for input-queued switches”,
IEEE/ACM Transactions on Networking, 7(2):188–201, 1999. 126, 127
[125] J. Micheel, I. Graham, and N. Brownlee, “The Auckland data set: an access link
observed”, in Proc. 14th ITC Specialist Seminar, 2000. 42
[126] K. Miller, “Stabilized numerical analytic prolongation with poles”, SIAM Journal on
Applied Mathematics, 183(2):346–363, 1970. 108
[127] V. Misra, W. B. Gong, and D. Towsley, “Stochastic differential equation modeling
and analysis of TCP-windowsize behaviour”, Performance, 1999. 15
[128] V. Misra, W. B. Gong, and D. Towsley, “Fluid-based analysis of a network of AQM
routers supporting TCP flows with application to RED”, in Proc. ACM SIGCOMM,
2000. 15
[129] O. Narayan, “Exact asymptotic queue length distribution for fractional Brownian
motion”, Advances in Performance Analysis, 1(1):39–63, 1998. 12
[130] National Laboratory for Applied Network Research. http://www.nlanr.net/.
43, 109
[131] N. Duffield, C. Lund, and M. Thorup, “Properties and prediction of flow statistics
from sampled packet streams”, in Proc. ACM SIGCOMM Internet Measurement
Workshop, 2002. 101, 102, 109, 112, 121
[132] A. L. Neidhardt and J. L. Wang, “The concept of relevant time scales and its appli-
cation to queuing analysis of self-similar traffic (or is Hurst naughty or nice?)”, in
Proc. ACM SIGMETRICS, pp. 222 – 232, 1998. 8
[133] M. F. Neuts, Matrix-Geometric Solutions in Stochastic Models. Johns Hopkins
University Press, 1981. 10
[134] M. F. Neuts, “Models based on the Markovian arrival process”, IEEE Transactions
on Communications, (75):1255–1265, 1992. 10
[135] I. Norros, “A storage model with self-similar input”, Queueing Syst., 16:387–396,
1994. 7, 12
[136] I. Norros, “On the use of fractional Brownian motion in the theory of connectionless
networks.”, IEEE Journal on Selected Areas in Communications, pp. 953–962, Aug.
1995. 12
[137] I. Norros, A. Simonian, D. Veitch, and J. Virtamo, “A Beneš formula for a buffer
with fractional Brownian input”, in Proc. 9th ITC Specialists Seminar’95: Teletraffic
Modelling and Measurement, 1995. 7
[138] Packet Sampling - IETF Working Group.
http://www.ietf.org/html.charters/psamp-charter.html. 120
[139] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, “Modeling TCP throughput: a simple
model and its empirical validation”, in Proc. of ACM SIGCOMM, 1998. 9, 14
[140] K. Papagiannaki, R. Cruz, and C. Diot, “Network performance monitoring at small
time scales”, in Proc. ACM SIGCOMM Internet Measurement Conference, pp. 295–300,
Miami, 2003. 126
[141] K. Papagiannaki, S. Moon, C. Fraleigh, P. Thiran, F. Tobagi, and C. Diot, “Analysis
of measured single-hop delay from an operational backbone network”, in Proc. IEEE
Infocom, New York, 2002. 125, 131, 135, 136
[142] K. Papagiannaki, D. Veitch, and N. Hohn, “Origins of microcongestion in an access
router”, in Proc. Passive and Active Measurement Workshop, Antibes, France, April
2004. ix, 148
[143] K. Park, G. Kim, and M. Crovella, “On the relationship between file sizes, transport
protocols and self-similar network traffic”, in Proc. IEEE International Conference
on Network Protocols, pp. 171–180, 1996. 14
[144] M. Parulekar and A. M. Makowski, “M/G/∞ input processes: A versatile class of
models for network traffic”, in Proc. INFOCOM, p. 419, 1997. 13
[145] V. Paxson, “Growth trends in wide-area TCP connections”, IEEE Networks, 4(8):
8–17, 1994. 2
[146] V. Paxson, Measurements and analysis of end-to-end Internet dynamics. PhD thesis,
University of California, Berkeley, 1997. 131
[147] V. Paxson and S. Floyd, “Wide-area traffic: The failure of Poisson modelling”,
IEEE/ACM Transactions on Networking, 3(3):316–336, 1995. 6, 7, 8
[148] R. F. Peltier and J. Lévy-Véhel, “Multifractional Brownian motion: definition and
preliminary results”, Technical Report RR-2645, INRIA, 1995. 22
[149] C. Philipson, “Lewis’ branching Poisson process model from the point of view of the
theory of compound Poisson processes”, Skand. Aktuartidskr., pp. 183–198, 1966.
170
[150] R. Riedi and J. L. Véhel, “Multifractal properties of TCP traffic: a numerical study”,
Technical Report 3129, INRIA Rocquencourt, France, 1997. 7, 96
[151] R. H. Riedi, “An improved multifractal formalism and self-similar measures”,
J. Math. Anal. Appl., 189:462–490, 1995. 22
[152] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk, “A multifractal wavelet
model with application to network traffic”, IEEE Transactions on Information Theory,
45(3):992–1018, 1999. 13
[153] J. Riordan, Combinatorial Identities. Wiley and Sons, 1968. 106
[154] S. Robert and J.-Y. Le Boudec, “On a Markov modulated chain exhibiting self-
similarities over finite timescale”, Performance Evaluation, 27-28:159–173, 1996.
11
[155] M. Roughan, D. Veitch, and M. Rumsewicz, “Computing queue length distributions
for power-law queues”, in Proc. IEEE Infocom, pp. 356–363, 1998. 107
[156] S. Roux, D. Veitch, P. Abry, L. Huang, P. Flandrin, and J. Micheel, “Statistical scaling
analysis of TCP/IP data”, in Proc. ICASSP 2001, Salt Lake City, USA, May 2001. 13
[157] B. Ryu, D. Cheney, and H. Braun, “Internet flow characterization: Adaptive timeout
strategy and statistical modeling”, in Proc. Passive and Active Measurement Workshop,
2001. 45
[158] B. Ryu and S. B. Lowen, “Point process models for self-similar network traffic,
with applications”, Stochastic Models, 14:735–761, 1998. 13
[159] S. Sarvotham, R. Riedi, and R. Baraniuk, “Connection-level analysis and modeling
of network traffic”, in Proceedings of the ACM SIGCOMM Internet Measurement
Workshop, 2001. 15, 90, 163, 166
[160] M. Schwartz, Broadband Integrated Networks. Prentice-Hall, 1996. 10
[161] B. Sikdar and K. S. Vastola, “On the convergence of MMPP and fractional ARIMA
processes with long-range dependence to fractional Brownian motion”, in Proc. 34th
Conference on Information Sciences and Systems, 2000. 10
[162] A. Simonian and J. T. Virtamo, “Transient and stationary distributions for fluid queues
and input processes with a density”, SIAM Journal on Applied Mathematics, 51, 1991.
13
[163] W. Simpson, PPP in HDLC-like Framing. RFC 1662, 1994. 128
[164] F. D. Smith, F. H. Campos, K. Jeffay, and D. Ott, “What TCP/IP protocol headers
can tell us about the web”, in Proceedings of the Joint International Conference on
Measurement and Modeling of Computer Systems, pp. 245–256, 2001. 52, 57
[165] Sprint corporation. http://www.sprint.com. 3
[166] W. Stevens, TCP/IP Illustrated, Volume 1: The Protocols. Addison-Wesley, 1994.
49
[167] M. Taqqu, V. Teverovsky, and W. Willinger, “Is network traffic self-similar or multi-
fractal?”, Fractals, (5):63–73, 1997. 96
[168] M. S. Taqqu and V. Teverovsky, A Practical Guide to Heavy Tails: Statistical
Techniques and Applications, chapter On estimating the intensity of long-range
dependence in finite and infinite variance time series, pp. 177–217. Birkhäuser, 1998.
12, 47
[169] M. S. Taqqu, V. Teverovsky, and W. Willinger, “Estimators for long-range
dependence: an empirical study”, Fractals, 3(4):785–798, 1995. 33
[170] M. S. Taqqu, W. Willinger, and R. Sherman, “Proof of a fundamental result in
self-similar traffic modeling”, Computer Communication Review, 26:5–23, 1997. 14, 21
[171] J. L. Véhel and R. H. Riedi, in Fractals in Engineering’97, J. Lévy Véhel, E. Lutton,
and C. Tricot, editors, chapter Fractional Brownian motion and data traffic modeling:
The other end of the spectrum. Springer, 1997. 7
[172] D. Veitch, P. Abry, P. Flandrin, and P. Chainais, “Infinitely divisible cascade analysis
of network traffic data”, in Proc. of IEEE ICASSP, 2000. 13
[173] D. Veitch, N. Hohn, and P. Abry, “Multifractality in TCP/IP traffic: the case against”,
(submitted), 2004. ix, 97
[174] D. Veitch, M. Taqqu, and P. Abry, “Meaningful MRA initialisation for discrete time
series”, Signal Processing, 8:1971–1983, 2000. 38
[175] A. Veres and M. Boda, “The chaotic nature of TCP congestion control”, in Proc.
IEEE Infocom, 2000. 15
[176] A. Veres, Z. Kenesi, S. Molnar, and G. Vattay, “On the propagation of long range
dependence in the Internet”, in Proceedings of ACM SIGCOMM, 2000. 15
[177] Waikato Applied Network Dynamics. http://wand.cs.waikato.ac.nz/wand/wits/.
43, 109, 125
[178] W. Willinger, Stochastic Networks, chapter Traffic modeling for high-speed
networks: Theory versus practice, pp. 395–409. Springer-Verlag, 1995. 14
[179] W. Willinger, M. Taqqu, R. Sherman, and D. Wilson, “Self-similarity through high
variability: statistical analysis of Ethernet LAN traffic at the source level”, in
Proceedings of ACM SIGCOMM’95, pp. 100–113, 1995. 41, 63
[180] F. Xue, “Modeling and predicting long-range dependent traffic with FARIMA
process”, in Proc. of 1999 International Symposium on Communication, November
1999. 12
[181] Y. Zhang, N. Duffield, V. Paxson, and S. Shenker, “On the constancy of Internet
path properties”, in Proceedings of ACM SIGCOMM Internet Measurement Workshop
2001, 2001. 75
[182] Z.-L. Zhang, V. Ribeiro, S. Moon, and C. Diot, “Small-time scaling behaviors of
Internet backbone traffic: An empirical study”, in Proc. IEEE Infocom, 2003. 76, 81, 90,
92
[183] M. Zukerman and I. Rubin, “On multi channel queuing systems with fluctuating
parameters”, in Proc. IEEE Infocom ’86, pp. 600–608, 1986. 10
[184] M. Zukerman and I. Rubin, “Queue size and delay analysis for a communication
system subject to traffic activity mode changes”, IEEE Transactions on Communications,
34(6):622–628, 1986. 10

