
Simulation and Statistical Inference of Stochastic

Reaction Networks with Applications to Epidemic


Models

Thesis by
Alvaro Moraes

In Partial Fulfillment of the Requirements

For the Degree of

Doctor of Philosophy

(Applied Mathematics and Computational Science)

King Abdullah University of Science and Technology (KAUST),

Thuwal, Makkah Province,

Kingdom of Saudi Arabia

Copyright January 2015

by Alvaro Moraes

All Rights Reserved


The thesis of Alvaro Moraes is approved by the examination committee

Committee Chairperson: Dr. Raúl Tempone

Committee Member: Dr. Omar Knio

Committee Member: Dr. Marc Genton

Committee Member: Dr. Fabrizio Bisetti

Committee Member: Dr. Boualem Djehiche

Committee Member: Dr. Michael Giles



ABSTRACT

Simulation and Statistical Inference of Stochastic Reaction

Networks with Applications to Epidemic Models

Alvaro Moraes
Epidemics have shaped, sometimes more than wars and natural disasters, demographic aspects of human populations around the world, their health habits and their economies. Ebola and the Middle East Respiratory Syndrome (MERS) are clear and current examples of potential hazards at planetary scale.
During the spread of an epidemic disease, there are phenomena, like the sudden extinction of the epidemic, that cannot be captured by deterministic models. As a consequence, stochastic models have been proposed during the last decades. A typical forward problem in the stochastic setting could be the approximation of the expected number of infected individuals one month from now. On the other hand, a typical inverse problem could be, given a discretely observed set of epidemiological data, to infer the transmission rate of the epidemic or its basic reproduction number.
Markovian epidemic models are stochastic models belonging to a wide class of pure jump processes known as Stochastic Reaction Networks (SRNs), which are intended to describe the time evolution of interacting particle systems where one particle interacts with the others through a finite set of reaction channels. SRNs have been mainly developed to model biochemical reactions, but they also have applications in neural networks, virus kinetics, and dynamics of social networks, among others.
This PhD thesis is focused on novel fast simulation algorithms and statistical
inference methods for SRNs.
Our novel Multi-level Monte Carlo (MLMC) hybrid simulation algorithms provide
accurate estimates of expected values of a given observable of SRNs at a prescribed
final time. They are designed to control the global approximation error up to a
user-selected accuracy and up to a certain confidence level, and with near optimal
computational work. We also present novel dual-weighted residual expansions for fast
estimation of weak and strong errors arising from the MLMC methodology.
Regarding the statistical inference aspect, we first mention an innovative multi-scale approach, where we introduce a deterministic, systematic way of using up-scaled likelihoods for parameter estimation, while the statistical fittings are done in the base model through the use of the Master Equation. In a different approach, we derive a new forward-reverse representation for simulating stochastic bridges between consecutive observations. This allows us to use the well-known EM Algorithm to infer the reaction rates. The forward-reverse methodology is boosted by an initial phase where, using multi-scale approximation techniques, we provide initial values for the EM Algorithm.

ACKNOWLEDGEMENTS

Foremost, I would like to thank my supervisor, Raúl Tempone, for his outstanding scientific advice and his constant care in providing an ideal atmosphere for doing
research. To Pedro Vilanova, for our long-standing collaboration and friendship.
To my collaborators Christian Bayer and Fabrizio Ruggeri. To Anders Szepessy,
Jesper Oppelstrup, Erik von Schwerin, Håkon Hoel and Georgios Zouraris, for many
interesting scientific discussions and support during my PhD studies. To the past and
present members of the KAUST Stochastic Numerics Research Group and the SRI
Uncertainty Quantification Center, especially to Omar Knio, Olivier Le Maître and
Serge Prudhomme. To the members of my PhD thesis committee, for their feedback
and constructive criticism. To Carlos Castillo-Chavez, for his advice and generosity,
and to all his team in the MTBI program. To Boualem Djehiche, for sharing his
deep insights on stochastic analysis. To Petr Plecháč, for hosting me twice at Oak
Ridge National Lab. To Peter Glynn and Gerardo Rubino, for generously sharing
their vision about critical parts of my PhD research.
To my mentors and teachers in probability and statistics, Enrique Cabaña, Marco
Scavino, Gonzalo Perera and Ernesto Mordecki. To my friends and colleagues in the
University of the Republic, Jorge Graneri, Gustavo Guerberoff, Franco Robledo and Claudio Risso, who always encouraged me to pursue my dreams.
To Leticia García and Amal El Euch for their extraordinary efficiency, care and
constant help regarding administrative and logistic issues.
Last but not least, this PhD thesis is dedicated to my wife Estela and my children
Bruno and Juana, for their unconditional patience and love.

TABLE OF CONTENTS

Examination Committee Approval 2

Abstract 3

Acknowledgements 5

List of Abbreviations 13

List of Figures 14

List of Tables 17

I Introductory Chapters 18

1 Stochastic Reaction Networks 19


1.1 Definitions and Terminology . . . . . . . . . . . . . . . . . . . . . . . 20
1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.3 The Master Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4 The Infinitesimal Generator of a SRN . . . . . . . . . . . . . . . . . . 26
1.5 Dynkin’s Formula for SRN . . . . . . . . . . . . . . . . . . . . . . . . 27
1.6 The Backward Kolmogorov Equation for SRNs . . . . . . . . . . . . . 28
1.7 Reaction Rates ODEs and Langevin Diffusion Approximations to SRNs 30
1.8 Simulation of SRNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.8.1 Exact Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.8.2 Approximate Algorithms . . . . . . . . . . . . . . . . . . . . . 35
1.9 Statistical Inference for SRNs . . . . . . . . . . . . . . . . . . . . . . 39
1.9.1 The likelihood function . . . . . . . . . . . . . . . . . . . . . . 39
1.9.2 Methods for Maximum Likelihood Estimation . . . . . . . . . 41
1.10 Bibliographical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
References 43

2 Stochastic Epidemic Models 47


2.1 Markovian Epidemic Models as SRNs . . . . . . . . . . . . . . . . . . 47
2.1.1 The SIR Model . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.1.2 Examples of Epidemic Models Based on the SIR Model . . . . 55
2.2 The Role of SRNs in Epidemic Models . . . . . . . . . . . . . . . . . 58
2.3 Challenges and Opportunities in Stochastic Epidemic Models . . . . . 60

References 63

3 Overview of Articles 66
3.1 Overview of Article I . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.1.1 The Chernoff Tau-Leap Method . . . . . . . . . . . . . . . 67
3.1.2 Our Hybrid Switching Rule . . . . . . . . . . . . . . . . . . . 68
3.1.3 Global Error Control . . . . . . . . . . . . . . . . . . . . . . . 71
3.1.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.2 Overview of Article II . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.2.1 Coupling Two Hybrid Paths . . . . . . . . . . . . . . . . . . . 74
3.2.2 Global Error Control . . . . . . . . . . . . . . . . . . . . . . . 76
3.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.3 Overview of Article III . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.3.1 Optimal-work Splitting Rule . . . . . . . . . . . . . . . . . . . 79
3.3.2 Coupling Two Mixed Paths . . . . . . . . . . . . . . . . . . . 80
3.3.3 A New Control Variate . . . . . . . . . . . . . . . . . . . . . . 81
3.3.4 Global Error Control . . . . . . . . . . . . . . . . . . . . . . . 81
3.3.5 An illuminating example: A holding company . . . . . . . . . 82
3.3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.4 Overview of Article IV . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.4.1 What is wear? . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4.2 The Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4.3 The Base Model . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.4.4 A Gaussian moment expansion . . . . . . . . . . . . . . . . . 91
3.4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 Overview of Article V . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.5.1 A Two-phase Algorithm . . . . . . . . . . . . . . . . . . . . . 98
3.5.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

References 105

4 Concluding Remarks 106


4.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.2 Future Research Work . . . . . . . . . . . . . . . . . . . . . . . . . . 107

References 109

II Included Papers 111

5 Hybrid Chernoff Tau-leap 112


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.1.1 The Pure Jump Process . . . . . . . . . . . . . . . . . . . . . 117
5.1.2 Gillespie’s SSA Method . . . . . . . . . . . . . . . . . . . . . . 119
5.1.3 The Tau-leap Approximation . . . . . . . . . . . . . . . . . . 120
5.1.4 Outline of this Work . . . . . . . . . . . . . . . . . . . . . . . 120
5.2 The Chernoff Bound: One Step Exit Probabilities . . . . . . . . . . 121
5.2.1 The Single-reaction Case . . . . . . . . . . . . . . . . . . . . . 122
5.2.2 The Many-reaction Case . . . . . . . . . . . . . . . . . . . . . 123
5.2.3 Chernoff Bound versus Gaussian Approximation . . . . . . . . 133
5.3 The One-step Switching Rule and Hybrid Trajectories . . . . . . . . . 136
5.3.1 The One-step Switching Rule Algorithm . . . . . . . . . . . . 136
5.3.2 The Hybrid Algorithm . . . . . . . . . . . . . . . . . . . . . . 140
5.3.3 The Global Exit Probability Bound . . . . . . . . . . . . . . . 141
5.4 Error Decomposition, Estimation and Control . . . . . . . . . . . . . 145
5.4.1 Global Error Decomposition . . . . . . . . . . . . . . . . . . . 145
5.4.2 Error Estimation and Control . . . . . . . . . . . . . . . . . . 147
5.5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
5.5.1 A Simple Decay Model . . . . . . . . . . . . . . . . . . . . . . 152
5.5.2 Gene transcription and translation (GTT) . . . . . . . . . . . 158
5.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

Appendices 164
Appendix 5.A Expected number of tau-leap steps of a hybrid trajectory . 164

References 166

6 Multilevel Hybrid Chernoff Tau-leap 169


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.1.1 A Class of Markovian Pure Jump Processes . . . . . . . . . . 174
6.1.2 Description of the Modified Next Reaction Method (MNRM) . 176
6.1.3 The Tau-Leap Approximation . . . . . . . . . . . . . . . . . . 177
6.1.4 The Chernoff-Based Pre-Leap Check . . . . . . . . . . . . . . 179
6.1.5 The Hybrid Algorithm for Single-Path Generation . . . . . . . 180
6.1.6 The Multilevel Monte Carlo Setting . . . . . . . . . . . . . . . 182
6.1.7 The Large Kurtosis Problem . . . . . . . . . . . . . . . . . . . 183
6.1.8 Outline of this Work . . . . . . . . . . . . . . . . . . . . . . . 185
6.2 Generating Coupled Hybrid Paths . . . . . . . . . . . . . . . . . . . . 186
6.2.1 Coupling Two Poisson Random Variables . . . . . . . . . . . . 187
6.2.2 Coupling Two Hybrid Paths . . . . . . . . . . . . . . . . . . . 188
6.3 Multilevel Monte Carlo Estimator and Global Error Decomposition . 191
6.3.1 The MLMC Estimator . . . . . . . . . . . . . . . . . . . . . . 191
6.3.2 Global Error Decomposition . . . . . . . . . . . . . . . . . . . 192
6.3.3 Dual-weighted Residual Estimation of Vℓ . . . . . . . . . . . . 194
6.4 Estimation Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.4.1 Phase I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.4.2 Phase II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
6.4.3 Phase III . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
6.5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
6.5.1 A Simple Decay Model . . . . . . . . . . . . . . . . . . . . . . 214
6.5.2 Gene Transcription and Translation [1] . . . . . . . . . . . . . 220
6.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

Appendices 231
References 241

7 A multilevel adaptive reaction-splitting simulation method for stochastic reaction networks 244
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
7.1.1 A Class of Markovian Pure Jump Processes . . . . . . . . . . 249
7.1.2 The Modified Next Reaction Method (MNRM) . . . . . . . . 249
7.1.3 The Tau-Leap Approximation . . . . . . . . . . . . . . . . . . 250
7.1.4 The Hybrid Chernoff Tau-leap Method . . . . . . . . . . . . . 251
7.2 Generating Mixed Paths . . . . . . . . . . . . . . . . . . . . . . . . . 252
7.2.1 The Splitting Heuristic . . . . . . . . . . . . . . . . . . . . . . 252
7.2.2 On the Work Required by the Splitting Heuristic, Cs = Cs(J) . . 254
7.2.3 The one-step Mixing Rule . . . . . . . . . . . . . . . . . . . . 256
7.2.4 The Mixed-Path Algorithm . . . . . . . . . . . . . . . . . . . 257
7.2.5 Coupled Mixed Paths . . . . . . . . . . . . . . . . . . . . . . . 257
7.3 The Multilevel Estimator and Total Error Decomposition . . . . . . . 260
7.3.1 The MLMC Estimator . . . . . . . . . . . . . . . . . . . . . . 260
7.3.2 Global Error Decomposition . . . . . . . . . . . . . . . . . . . 261
7.3.3 Estimation Procedure . . . . . . . . . . . . . . . . . . . . . . 263
7.4 A Control Variate Based on a Deterministic Time Change . . . . . . 265
7.5 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
7.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

Appendices 278

References 282

8 Multiscale Modeling of Wear Degradation in Cylinder Liners 284


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
8.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
8.2.1 The Pure Jump Process . . . . . . . . . . . . . . . . . . . . . 290
8.2.2 Upscaling: the Mean Field and Langevin equations . . . . . . 291
8.2.3 The second-order moment expansion . . . . . . . . . . . . . . 292
8.3 The thickness measurement process . . . . . . . . . . . . . . . . . . . 294
8.4 Inference for the thickness measurement process . . . . . . . . . . . . 295
8.4.1 Mean Field approximation . . . . . . . . . . . . . . . . . . . . 296
8.4.2 A Gaussian approximation based on moment expansion . . . . 296
8.5 Hitting Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
8.5.1 Conditional Residual Reliability . . . . . . . . . . . . . . . . 299
8.6 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
8.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305

References 308

9 The Forward-Reverse Algorithm for Stochastic Reaction Networks with Applications to Statistical Inference 310
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
9.1.1 Stochastic Reaction Networks . . . . . . . . . . . . . . . . . . 316
9.1.2 Deterministic Approximations of SRN . . . . . . . . . . . . . 317
9.1.3 The Stochastic Simulation Algorithm . . . . . . . . . . . . . . 318
9.1.4 Bridge Simulation for SDEs . . . . . . . . . . . . . . . . . . . 319
9.2 Expectations of Functionals of SRN-Bridges . . . . . . . . . . . . . . 322
9.2.1 The Master Equation . . . . . . . . . . . . . . . . . . . . . . . 322
9.2.2 Derivation of the Reverse Process . . . . . . . . . . . . . . 323
9.2.3 The Forward-reverse Formula for SRN . . . . . . . . . . . . . 325
9.3 The EM Algorithm for SRN . . . . . . . . . . . . . . . . . . . . . . . 329
9.3.1 The EM Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 329
9.3.2 The Log-likelihood Function for Continuously Observed Paths 330
9.3.3 The EM Algorithm for SRNs . . . . . . . . . . . . . . . . . . 332
9.4 The Forward-Reverse Monte Carlo EM Algorithm for SRN . . . . . . 333
9.4.1 Phase I: Using Approximating ODEs . . . . . . . . . . . . . . 333
9.4.2 Phase II: The Monte Carlo EM . . . . . . . . . . . . . . . . . 334
9.5 Computational Details . . . . . . . . . . . . . . . . . . . . . . . . . . 338
9.5.1 On the Selection of the Number of Simulated Forward-Backward
Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
9.5.2 On the Complexity of the Path Joining Algorithm . . . . . . . 340
9.5.3 A Linear Transformation for the Epanechnikov Kernel . . . . 342
9.5.4 On the Stopping Criterion . . . . . . . . . . . . . . . . . . . . 344
9.6 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
9.6.1 Decay Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
9.6.2 Wear in Cylinder Liners . . . . . . . . . . . . . . . . . . . . . 349
9.6.3 Birth-death Process . . . . . . . . . . . . . . . . . . . . . . . . 352
9.6.4 SIR Epidemic Model . . . . . . . . . . . . . . . . . . . . . . . 352
9.6.5 Auto-regulatory Gene Network . . . . . . . . . . . . . . . . . 356
9.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358

Appendices 359

References 362

A Brief Review of Probability and Random Processes 365


A.1 Probability Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
A.2 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368
A.2.1 The Bernoulli random variable . . . . . . . . . . . . . . . . . . 371
A.2.2 Binomial and Geometric random variables . . . . . . . . . . . 372
A.2.3 The Uniform random variable . . . . . . . . . . . . . . . . . . 372
A.2.4 Inverse Transformation Method . . . . . . . . . . . . . . . . . 372
A.2.5 The Exponential random variable . . . . . . . . . . . . . . . . 373
A.2.6 Minimum of independent random variables . . . . . . . . . . . 374
A.2.7 Gaussian random variable . . . . . . . . . . . . . . . . . . . . 374
A.3 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
A.3.1 The Poisson process . . . . . . . . . . . . . . . . . . . . . . . 375
A.3.2 Relation between Poisson and Exponential . . . . . . . . . . . 376
A.4 Convergence Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 377
A.5 Branching Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
A.6 The Monte Carlo Method . . . . . . . . . . . . . . . . . . . . . . . . 379
A.7 Multilevel Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . 380


LIST OF ABBREVIATIONS

CDF Cumulative Distribution Function

EM Expectation Maximization

FREM Forward-Reverse Expectation Maximization

GTT Gene Transcription and Translation

ME Master Equation
MGF Moment Generating Function
MLE Maximum Likelihood Estimator
MLMC Multi-level Monte Carlo
MNRM Modified Next Reaction Method

ODE Ordinary Differential Equation

PDF Probability Density Function

SDE Stochastic Differential Equation


SEIR Susceptible - Exposed - Infective - Removed
SIR Susceptible - Infective - Removed
SIS Susceptible - Infective - Susceptible
SRN Stochastic Reaction Networks
SSA Stochastic Simulation Algorithm

LIST OF FIGURES

2.1 SIR diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2 SIR stencil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.3 SIR 3 paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4 SIR Mean Field and SSA paths . . . . . . . . . . . . . . . . . . . . . 52
2.5 Mean of Langevin paths for the SIR model . . . . . . . . . . . . . . . 55
2.6 SIS diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.7 SEIR diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.8 SIRDEM diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.9 SIR with Demography . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.10 Mixed paths for the SIR model . . . . . . . . . . . . . . . . . . . . . 59
2.11 SIR example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.1 Chernoff Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68


3.2 Regions of the one-step switching rule . . . . . . . . . . . . . . . . . . 69
3.3 Computational work for generating Poisson random variates . . . . . 71
3.4 Predicted work (runtime) versus the estimated error bound GTT . . 73
3.5 Synchronization Horizons . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.6 Predicted work versus error bound for the GTT model . . . . . . . . 78
3.7 One SSA path of the money level of the holding company and five of
the business units paths . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.8 Loglog plot of the average work (runtime) per path over 5 batches. . . 86
3.9 Predicted work (runtime) versus the estimated error bound and the
control variate at level 0 . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.10 Data of cylinder liners . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.11 Global maximum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.12 Left panel: CDF of the hitting-time for B = 1. Right panel: PDF of
the hitting-time to the critical level. . . . . . . . . . . . . . . . . . . . 93
3.13 Conditional residual reliability function . . . . . . . . . . . . . . . . . 94
3.14 The two-phase estimation process . . . . . . . . . . . . . . . . . . . . 96
3.15 Data trajectory for the Birth-death example . . . . . . . . . . . . . . 96
3.16 FREM estimation (phase I and phase II) for the birth-death process . 97
3.17 Illustration of the forward-reverse path simulation in Phase II. . . . 100

5.1 Chernoff Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124


5.2 Sign of Di(si) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.3 The other two cases for τi(s) . . . . . . . . . . . . . . . . . . . . . . . 129
5.4 The approximating parabola . . . . . . . . . . . . . . . . . . . . . . . 131
5.5 The Chernoff bound vs. the Gaussian approximation . . . . . . . . . 135
5.6 Computational work for generating Poisson random variates . . . . . 138
5.7 Regions of the one-step switching rule . . . . . . . . . . . . . . . . . . 140
5.8 Chernoff step size for x0 ∈ {5, 10, 15, 20} . . . . . . . . . . . . . . . . 154
5.9 Chernoff step size as a function of x0 . . . . . . . . . . . . . . . . . . 155
5.10 SSA paths for the simple decay model with X0 = 100 and T = 2 . . . 156
5.11 SSA paths for the simple decay model with X0 = 10^5 and T = 0.5 . . 156
5.12 Predicted work (runtime) versus the estimated error bound . . . . . . 157
5.13 Efficiency index for EI and 95% confidence intervals . . . . . . . . . . 159
5.14 Predicted work (runtime) versus the estimated error bound GTT . . 160
5.15 Efficiency index for EI and 95% confidence intervals . . . . . . . . . . 161

6.1 Synchronization Horizons . . . . . . . . . . . . . . . . . . . . . . . . . 189


6.1 Predicted work versus error bound for the simple decay model . . . . 215
Estimated weak error and estimated variance of the difference between two consecutive levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.3 Percentage of the statistical error over the computational global error,
for the simple decay model . . . . . . . . . . . . . . . . . . . . . . . . 218
6.4 One-step exit probability bound . . . . . . . . . . . . . . . . . . . . . 218
6.5 Strong error estimate and estimated variance of Vℓ . . . . . . . . . . . 219
6.6 QQ-plot for the hybrid Chernoff MLMC estimates . . . . . . . . . . . 219
6.7 Predicted work versus error bound for the GTT model . . . . . . . . 221
6.8 Estimated weak error, ÊI,ℓ, as a function of the time mesh size, Δt . . 222
6.9 Percentage of the statistical error over the computational global error,
for the GTT model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
6.10 The one-step exit probability bound in the GTT model . . . . . . . . 223
6.11 Strong error estimate for the GTT model . . . . . . . . . . . . . . . . 224
6.12 QQ-plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
6.13 Proportion of the number of Chernoff tau-leap steps over the total number of tau-leap steps for the GTT model . . . . . . . . . . . . . . 227
6.14 Total number of tau-leap steps per path for the GTT model . . . . . 228
6.15 Total number of exact steps per path for the GTT model . . . . . . . 229
6.16 The ‘blending’ effect . . . . . . . . . . . . . . . . . . . . . . . . . . . 229

7.1 Predicted work versus the estimated error bound and details for the
ensemble run of the phase II algorithm . . . . . . . . . . . . . . . . . 271
7.2 Estimated weak error, ÊI,ℓ, as a function of the time mesh size, h . . . 272
7.3 Percentage of the statistical error over the total error . . . . . . . . . 273
7.4 The one-step exit probability bound . . . . . . . . . . . . . . . . . . . 273
7.5 TOL versus the actual computational error . . . . . . . . . . . . . . . 274
7.6 Predicted work (runtime) versus the estimated error bound . . . . . . 276

8.1 Data of cylinder liners . . . . . . . . . . . . . . . . . . . . . . . . . . 286


8.1 Residuals of the negative of the log-likelihood . . . . . . . . . . . . . 301
8.2 Residuals of the minus log-likelihood . . . . . . . . . . . . . . . . . . 301
8.3 Exact 90% confidence band from the Master Equation . . . . . . . . 303
8.4 Solution of the Master Equation . . . . . . . . . . . . . . . . . . . . . 303
8.5 CDF of the hitting time for B = 1 . . . . . . . . . . . . . . . . . . . . 304
8.6 Wear data and confidence band . . . . . . . . . . . . . . . . . . . . . 304
8.7 QQ-plot of the normalized thickness data . . . . . . . . . . . . . . . . 305
8.8 The conditional residual reliability function . . . . . . . . . . . . . . . 305

9.1 Illustration of the forward-reverse path simulation in Phase II . . . . 335


9.1 Gaussian clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
9.2 Cloud H(Z). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
9.1 Data trajectory for the Decay example . . . . . . . . . . . . . . . . . 348
9.2 FREM estimation (phase I and phase II) for the decay process. . . . . 348
9.3 Wear data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
9.4 FREM estimation (phase I and phase II) for the wear data set. . . . . 350
9.5 Wear confidence bands . . . . . . . . . . . . . . . . . . . . . . . . . . 351
9.6 Data trajectory for the Birth-death example . . . . . . . . . . . . . . 353
9.7 FREM estimation (phase I and phase II) for the birth-death process. 353
9.8 Data trajectory for the SIR example . . . . . . . . . . . . . . . . . . 355
9.9 FREM estimation (phase I and phase II) for the SIR model . . . . . . 355
9.10 Data trajectory for the auto-regulatory gene network example . . . . 357

LIST OF TABLES

2.1 Values generated by the FREM Algorithm for the SIR model. . . . . 60

3.1 Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81


3.2 FREM algorithm for the birth-death example . . . . . . . . . . . . . 97

5.1 One-step switching rule summary . . . . . . . . . . . . . . . . . . . . 139


5.2 Details for an ensemble for the simple decay model . . . . . . . . . . 158
5.3 Ensemble of five independent runs of Algorithm 9 for GTT model . . 161

6.1 Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188


6.1 Details of the ensemble run of Algorithm 18 for the simple decay model 216
6.2 Ensemble run of Algorithm 18 for the GTT model . . . . . . . . . . . 222

7.1 Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

9.1 Values computed by the FREM Algorithm for the decay example. . . 347
9.2 Values computed by the FREM Algorithm for the wear example. . . . 351
9.3 Values computed by the FREM Algorithm for the birth-death example. 354
9.4 Values computed by the FREM Algorithm for the SIR model. . . . . 354

Part I

Introductory Chapters

Chapter 1

Stochastic Reaction Networks

Stochastic reaction networks (SRNs) are continuous-time Markov chains intended to describe, from the kinetic point of view, the time-evolution of chemical systems in which molecules of different chemical species undergo a finite set of reaction channels. These chemical systems are assumed to be at thermal equilibrium, but not necessarily at chemical equilibrium. Stochastic variability has been observed in chemical systems where one or more species are present in low molecule numbers. In those cases, the classical description by reaction-rate ordinary differential equations (ODEs) cannot capture phenomena such as the sudden extinction of one of the species. The theory of SRNs offers instead a natural framework for incorporating the stochastic variability of such systems.
Among the phenomena usually modeled by SRNs, we mention: the spread of epidemic diseases in homogeneously mixed populations, neural networks, virus kinetics, communication networks, and the dynamics of social networks.
The outline of this chapter is as follows: we first give a mathematical definition of SRNs and mention a few examples addressing typical questions associated with those models. Then, we present a (typically very high-dimensional) linear system of ODEs, known as the Master Equation, that describes the time evolution of the probability distribution of a SRN over the state space of the system. Then, we introduce the infinitesimal generator, Dynkin's formula and the Kolmogorov backward equations. Next, through the notion of the infinitesimal generator of a Markov process and Dynkin's formula, we show how our stochastic description of SRNs can be connected to Langevin diffusions and the classical reaction-rate ODEs. Later, we show how to simulate paths of a SRN through the Stochastic Simulation Algorithm (SSA). Paths generated by the SSA are considered exact because their probability distribution obeys the Master Equation. Depending on the particular case, generating exact paths by the SSA may be computationally demanding. Next, we introduce the tau-leap method, which generates approximate paths using less computational work. Finally, we refer to the inverse problem of inferring the reaction coefficients of the reaction channels from observed data.

1.1 Definitions and Terminology

Given an underlying probability space, $(\Omega, \mathcal{F}, P)$,¹ a SRN is a stochastic process in continuous time, $X : [0, T] \times \Omega \to \mathbb{Z}_+^d$, that describes the time evolution of a homogeneously-mixed chemically reacting system where $d$ different species of molecules, $(S_1, S_2, \ldots, S_d)$, undergo a finite set of reaction channels, $(R_1, R_2, \ldots, R_J)$. Depending on the context, individuals or particles may be used as synonyms of molecules. The $i$-th coordinate of $X$ is a non-negative integer, $X_i(t)$, that keeps track of the abundance of the $i$-th species, $S_i$, at time $t$. The reaction channels are pairs, $R_j = (\nu_j, a_j)$, $j = 1, 2, \ldots, J$, where $\nu_j \in \mathbb{Z}^d$ are known as stoichiometric vectors, and $a_j : \mathbb{Z}_+^d \to \mathbb{R}_+$ are known as propensity functions. More concretely, a SRN is a continuous-time Markov chain defined by the probabilities:

\[
P\left(X(t+dt) = x + \nu_j \mid X(t) = x\right) = a_j(x)\,dt, \quad j = 1, 2, \ldots, J, \quad \text{and} \tag{1.1}
\]
\[
P\left(X(t+dt) = x \mid X(t) = x\right) = 1 - \sum_{j=1}^{J} a_j(x)\,dt.
\]

¹ Appendix A contains a brief review of Probability and Random Processes.
Formula (1.1) means that the probability of observing a jump of the process, $X$, from the state $x$ to the state $x + \nu_j$, caused by the firing of the $j$-th reaction channel, $R_j$, during the infinitesimal time interval, $(t, t+dt]$, is proportional to the length of the time interval, $dt$, with $a_j(x)$ as the constant of proportionality. Moreover, due to the memoryless property of Markov processes, given that $X$ is at state $x$ at time $t$, the time to the next reaction is exponentially distributed with parameter $a_0(x) := \sum_{j=1}^{J} a_j(x)$.
Every reaction channel, $R_j$, must satisfy the non-negativity assumption: if $x \in \mathbb{Z}_+^d$ but $x + \nu_j \notin \mathbb{Z}_+^d$, then $a_j(x) = 0$; i.e., the system can never produce negative population values.
The jargon (e.g. “species” and “reaction channels”) is taken from the theory of chemical kinetics. For instance, the stoichiometric vectors, $\nu_j$, and propensity functions, $a_j$, are known as the transition vectors and the intensity functions, respectively, in the theory of Markov processes. Another very important feature taken from the theory of stochastic chemical kinetics is the so-called stochastic mass-action kinetics principle, which provides a mathematical model for the reaction channels, $R_j$. The stochastic mass-action kinetics principle is usually represented by a diagram like:
\[
\alpha_{j,1} S_1 + \cdots + \alpha_{j,d} S_d \xrightarrow{c_j} \beta_{j,1} S_1 + \cdots + \beta_{j,d} S_d, \tag{1.2}
\]

implying that, when the $j$-th reaction channel, $R_j = (\nu_j, a_j(x))$, fires in the infinitesimal time interval, $(t, t+dt]$, and the process $X$ is at the state $x$ at time $t$, the number of particles of the species $S_i$ changes from $x_i$ to $x_i - \alpha_{j,i} + \beta_{j,i}$. More specifically, the relation (1.2) implies that, when the reaction $R_j$ takes place, $\alpha_{j,i}$ molecules of the species $S_i$ are consumed and $\beta_{j,i}$ are produced. Thus, $\alpha_{j,i} \in \mathbb{Z}_+$ and $\beta_{j,i} \in \mathbb{Z}_+$, but $\beta_{j,i} - \alpha_{j,i}$ can be a negative integer. In this case, the vectors, $\nu_j$, in equation (1.1) are defined by $\nu_j := (\beta_{j,1} - \alpha_{j,1}, \ldots, \beta_{j,d} - \alpha_{j,d}) \in \mathbb{Z}^d$, and the propensities, $a_j$, appearing on the right-hand side of equation (1.1), are defined by
\[
a_j(x) := c_j \prod_{i=1}^{d} \frac{x_i!}{(x_i - \alpha_{j,i})!}\, \mathbf{1}_{\{x_i \geq \alpha_{j,i}\}}, \tag{1.3}
\]

where $c_j$ is a positive constant for $j = 1, 2, \ldots, J$, and $\mathbf{1}_A$ is the indicator function of the set $A$, i.e., $\mathbf{1}_A = 1$ when $A$ is satisfied and $0$ otherwise. In our examples, the propensities, $a_j$, obey the relation (1.3), unless otherwise stated. We observe that, under the stochastic mass-action kinetics principle, the non-negativity assumption is satisfied.

Remark 1.1.1 (Random Time-change Representation). Let $X$ be a SRN defined through its reaction channels, $R_j = (\nu_j, a_j)$, $j = 1, 2, \ldots, J$, with initial state $x_0$. The following random time-change representation of $X$ is due to T.G. Kurtz [1, 2, 3]:
\[
X(t) = x_0 + \sum_{j=1}^{J} \nu_j\, Y_j\!\left(\int_0^t a_j(X(s))\, ds\right), \tag{1.4}
\]
where $Y_j : \mathbb{R}_+ \times \Omega \to \mathbb{Z}_+$ are independent unit-rate Poisson processes. The representation (1.4) can be interpreted as follows: given $\omega \in \Omega$, consider the paths of $J$ independent unit-rate Poisson processes, say $Y_1(t, \omega), Y_2(t, \omega), \ldots, Y_J(t, \omega)$; if $\sup_x a_0(x) < \infty$, there is a unique solution, $X(t, \omega)$, of (1.4) (where, as usual, $\omega$ has been omitted).
A solution of (1.4) can explode in finite time unless we impose additional conditions on the set of reaction channels. For instance, in [4] the authors assume that species can only be transformed into other species or be consumed. In [5], a balance condition on the propensity functions that controls the flow of particles entering and leaving the system is proposed. More recent studies can be found in [6, 7].
1.2 Examples

In this section, we present a few examples of SRNs. For each example, from the chemical-kinetics diagrams, we derive the reaction channels, $R_j = (\nu_j, a_j(x))$.

Example 1.2.1 (Standard Poisson Process). Consider the diagram $\emptyset \xrightarrow{1} I$, which corresponds to a process, $Y$, taking values in $\mathbb{Z}_+$, with a unique reaction channel, $R = (1, 1)$. The process evolves in time by jumps of size 1, and with constant intensity, $a(y) \equiv 1$, $y = 0, 1, 2, \ldots$. Therefore, $Y$ is defined by:
\[
P\left(Y(t+dt) = y + 1 \mid Y(t) = y\right) = dt.
\]
If $Y(0) = 0$, then $Y$ is a standard Poisson process with rate $\lambda = 1$ on the real line. In this case, we know that $P(Y(t) = y) = \exp(-t)\,\frac{t^y}{y!}$, $y = 0, 1, 2, \ldots$, i.e., $Y(t) \sim \text{Poisson}(t)$ (see Section A.3.1).

Example 1.2.2 (Exponential Decay). Let us consider the diagram $I \xrightarrow{\mu} \emptyset$, which corresponds to a SRN with a unique reaction channel, $R = (-1, \mu i)$. If the state of the process $X$ at time $t$ is $X(t) = i$, the holding time to the next reaction is given by an exponential random variable with expected value $(\mu i)^{-1}$, implying that the system slows down as time passes. If we set $X(0) = x_0$, then it can be proved using Dynkin's formula (1.10) that $E[X(t)] = x_0 \exp(-\mu t)$.

Example 1.2.3 (Birth and Death Processes). Let us consider a stochastic process $X$ in the set of non-negative integers, $\mathbb{Z}_+$, such that for the state $i$ we have two possible transitions: $i \to i+1$ (birth) and $i \to i-1$ (death, when $i \geq 1$).
Now, let us list a few cases of birth and death processes derived from SRNs:

1. $\emptyset \xrightarrow{\lambda} I$ and $I \xrightarrow{\mu} \emptyset$. This means that there are two reaction channels, $R_1 = (1, \lambda)$ and $R_2 = (-1, \mu i)$. The first channel, $R_1$, means that newborns are introduced in the population at constant rate $\lambda$. The second reaction channel, $R_2$, corresponds to a death process driven by an exponential decay with rate $\mu$.

2. The diagrams $I \xrightarrow{\lambda} 2I$ and $I \xrightarrow{\mu} \emptyset$ correspond to the reaction channels $R_1 = (1, \lambda i)$ and $R_2 = (-1, \mu i)$. Here both propensities are linear functions of the state of the process, $i$.

3. The diagrams $I \xrightarrow{\lambda} 2I$ and $2I \xrightarrow{\mu} I$ correspond to the reaction channels $R_1 = (1, \lambda i)$ and $R_2 = (-1, \mu i(i-1))$. Observe that the second reaction happens with zero probability when $i < 2$.

In this context, a commonplace problem is the estimation of the expected value of a given functional, $g$, of the SRN, $X$, at a certain time $T$, i.e., $E[g(X(T))]$, with a prescribed accuracy and up to a certain confidence level.

1.3 The Master Equation

The Master Equation, also known as the Chemical Master Equation (CME), is a linear system of ODEs for $P(X(t) = x \mid X(0) = x_0)$, i.e., the probability of the process, $X$, being in the state $x$ at time $t$, having departed from the state $x_0$ at time $0$.
Since $X$ is a Markov chain, it satisfies the Chapman-Kolmogorov equation: for any $s \in (0, t)$,
\[
P\left(X(t) = x \mid X(0) = x_0\right) = \sum_{y \in \mathbb{Z}_+^d} P\left(X(t) = x \mid X(s) = y\right) P\left(X(s) = y \mid X(0) = x_0\right). \tag{1.5}
\]

Consider the probability mass function, $p_x(t) := P(X(t) = x \mid X(0) = x_0)$, defined for $t \in [0, +\infty)$ and $x \in \mathbb{Z}_+^d$. By the Chapman-Kolmogorov equation (1.5) and the definition of $X$ in (1.1), we have
\begin{align*}
p_x(t+h) &= \sum_y P\left(X(t) = y \mid X(0) = x_0\right) P\left(X(t+h) = x \mid X(t) = y\right) \\
&= \sum_j P\left(X(t) = x - \nu_j \mid X(0) = x_0\right) P\left(X(t+h) = x \mid X(t) = x - \nu_j\right) \\
&\quad + P\left(X(t) = x \mid X(0) = x_0\right) P\left(X(t+h) = x \mid X(t) = x\right) \\
&= \sum_j p_{x-\nu_j}(t)\left(a_j(x - \nu_j)h + o(h)\right) + p_x(t)\left(1 - \sum_j a_j(x)h + o(h)\right),
\end{align*}

which gives the Master Equation for SRNs (see [8, 9, 10, 11]):
\[
\begin{cases}
\dfrac{dp_x(t)}{dt} = \lim_{h \to 0} \dfrac{p_x(t+h) - p_x(t)}{h} = \displaystyle\sum_j p_{x-\nu_j}(t)\, a_j(x - \nu_j) - p_x(t) \sum_j a_j(x), \\[1ex]
p_x(0) = \mathbf{1}_{\{x = x_0\}},
\end{cases} \tag{1.6}
\]

which is a linear system of ODEs, $\frac{dP(t)}{dt} = A P(t)$, where: i) the square matrix $A$ is sparse, ii) it has the size of the state space of the process, $X$, and iii) its entries are the propensities, $a_j$, which in most cases are nonlinear functions of the states of $X$. In principle, every question about the time-evolution of a SRN can be answered by solving its corresponding Master Equation, but an analytic solution of (1.6) is rarely available, and a numerical solution is in general too costly in terms of computational work [11]. Since numerical solutions are computationally feasible only for relatively non-stiff, small systems, in most cases those questions can be addressed only by Monte Carlo methods. Any Monte Carlo method applied to SRNs depends on simulating paths of the process, $X$, in the time interval $[0, T]$. Developing fast and accurate methods for simulating paths of SRNs is a major motivation of this PhD thesis.
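When the state space is small, however, (1.6) can be solved numerically. As an illustration, the following minimal sketch assembles the generator matrix $A$ for the exponential decay example (Example 1.2.2), whose state space is the finite set $\{0, 1, \ldots, x_0\}$, and computes $P(T) = \exp(AT)P(0)$; the values of $\mu$, $x_0$ and $T$ are illustrative choices, not taken from the text.

```python
# A minimal sketch: solving the Master Equation (1.6) for the exponential
# decay example (Example 1.2.2). The only reaction is R = (-1, mu*x), so the
# state space is finite: {0, 1, ..., x0}. Parameter values are illustrative.
import numpy as np
from scipy.linalg import expm

mu, x0, T = 0.5, 20, 2.0
n = x0 + 1                       # states 0, 1, ..., x0

# Generator matrix A of the linear system dP/dt = A P.
A = np.zeros((n, n))
for x in range(n):
    A[x, x] = -mu * x            # probability mass leaving state x
    if x >= 1:
        A[x - 1, x] = mu * x     # mass entering state x-1 from state x

p0 = np.zeros(n)
p0[x0] = 1.0                     # initial condition p_x(0) = 1_{x = x0}
pT = expm(A * T) @ p0            # P(T) = exp(A T) P(0)

# Sanity check against the analytic mean E[X(T)] = x0 * exp(-mu * T).
print(pT @ np.arange(n), x0 * np.exp(-mu * T))
```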
1.4 The Infinitesimal Generator of a SRN

A SRN, $\{X(t)\}_{t \geq 0}$, is a continuous-time Markov process and, therefore, it can be characterized by its infinitesimal generator [2]. More precisely, given the filtration generated by the process $X$, $\mathcal{F}_t^X$, we can define the strongly continuous semi-group $\{T(t)\}_{t \geq 0}$ of operators by
\[
T(s)f(X(t)) := E\left[f(X(t+s)) \mid \mathcal{F}_t^X\right].
\]

The infinitesimal generator of $\{T(t)\}_{t \geq 0}$ is defined by:
\[
\mathcal{L}_X(f) := \lim_{t \to 0^+} \frac{1}{t}\left\{T(t)f - f\right\}, \tag{1.7}
\]
where the domain of the operator, $\mathcal{L}_X$, is the set of functions, $f$, for which the limit on the right-hand side of (1.7) exists.
Since $X$ is a Markov process, we know that conditioning on the filtration generated by the process $X$ up to the time $t$, $\mathcal{F}_t^X$, is equivalent to conditioning on the current state of the process, $X(t) = x$. Then, we have
\begin{align*}
\frac{1}{h}\left\{T(h)f(x) - f(x)\right\} &= \frac{1}{h}\left(E\left[f(X(t+h)) \mid X(t) = x\right] - f(x)\right) \\
&= \frac{1}{h}\left(\sum_y f(y)\, P\left(X(t+h) = y \mid X(t) = x\right) - f(x)\right) \\
&= \frac{1}{h}\left(\sum_j f(x + \nu_j)\left(a_j(x)h + o(h)\right) + f(x)\left(1 - h \sum_j a_j(x) + o(h)\right) - f(x)\right) \\
&= \sum_j f(x + \nu_j)\left(a_j(x) + \frac{o(h)}{h}\right) - f(x)\left(\sum_j a_j(x) - \frac{o(h)}{h}\right).
\end{align*}
Taking limits as $h \to 0^+$, we obtain the formula for the infinitesimal generator of $X$:
\[
\mathcal{L}f(x) := \sum_{j=1}^{J} a_j(x)\left(f(x + \nu_j) - f(x)\right). \tag{1.8}
\]
1.5 Dynkin’s Formula for SRN

The Dynkin formula [2, 12] states that
\[
E\left[f(X(t))\right] = f(X(0)) + \int_0^t E\left[\mathcal{L}f(X(s))\right] ds. \tag{1.9}
\]
In the particular case of SRNs, we obtain the following identity:
\[
E\left[f(X(t))\right] = f(X(0)) + \int_0^t E\left[\sum_j a_j(X(s))\left(f(X(s) + \nu_j) - f(X(s))\right)\right] ds. \tag{1.10}
\]

Let us prove this particular case: first, multiply both sides of the Master Equation (1.6) by $f(x)$ and sum over $x \in \mathbb{Z}_+^d$. We obtain (provided all the series converge)
\begin{align*}
\sum_x f(x)\,\frac{dp_x(t)}{dt} &= \sum_x f(x)\left(\sum_j p_{x-\nu_j}(t)\,a_j(x-\nu_j) - p_x(t)\,a_j(x)\right) \\
(*)\quad &= \sum_j \sum_x f(x)\,a_j(x-\nu_j)\,p_{x-\nu_j}(t) - \sum_j \sum_x f(x)\,a_j(x)\,p_x(t) \\
(**)\quad &= \sum_j \sum_{x'} f(x'+\nu_j)\,a_j(x')\,p_{x'}(t) - \sum_j \sum_x f(x)\,a_j(x)\,p_x(t) \\
&= \sum_j \sum_{x'} \left(f(x'+\nu_j) - f(x')\right)a_j(x')\,p_{x'}(t) \\
&= \sum_x p_x(t) \sum_j \left(f(x+\nu_j) - f(x)\right)a_j(x) \\
&= E\left[\mathcal{L}f(X(t))\right].
\end{align*}
Notice that in $\sum_j f(x)\,a_j(x-\nu_j)\,p_{x-\nu_j}(t)$ we are keeping fixed the final state, $x$, while in $\sum_j f(x'+\nu_j)\,a_j(x')\,p_{x'}(t)$ we are keeping fixed the starting state, $x'$, so we conclude that both sums are not equal. But notice that in $(*)$ and $(**)$ every pair of states, $y$ and $z$, connected by one reaction channel (i.e., $y + \nu_j = z$ for some $\nu_j$) is counted once, with $a$ and $p$ computed at $y$ and $f$ computed at $z$; thus $(*) = (**)$.

Since $\sum_x f(x)\, \frac{dp_x(t)}{dt} = \frac{dE[f(X(t))]}{dt}$, we have
\[
\frac{dE\left[f(X(t))\right]}{dt} = E\left[\mathcal{L}f(X(t))\right]. \tag{1.11}
\]
Formula (1.10) follows by integrating both sides of (1.11) over the interval $(0, t]$.

Example 1.5.1. By taking a linear observable, $f(x) = x$, Dynkin's formula (1.10) gives: $E[X(t)] = x_0 + \int_0^t E\left[\sum_{j=1}^J a_j(X(u))\,\nu_j\right] du$. In the exponential decay example (Example 1.2.2), it results in $E[X(t)] = x_0 - \int_0^t \mu\, E[X(u)]\, du$. By differentiating both sides of this equation, we obtain a first-order linear differential equation for $m(t) = E[X(t)]$: $\dot{m}(t) = -\mu m(t)$, with the initial condition $m(0) = x_0$. This gives: $E[X(t)] = x_0 \exp(-\mu t)$.

Remark 1.5.2 (Closure techniques). One could be tempted to use Dynkin's formula to obtain systems of ODEs for $E[f(X(t))]$ or $\text{Var}[f(X(t))]$ (or higher-order moments of $f(X(t))$), but in most cases this is only possible by using moment closure techniques [13, 14].

1.6 The Backward Kolmogorov Equation for SRNs

Let us define the cost-to-go function
\[
u(t, x) := E\left[f(X(T)) \mid X(t) = x\right] = \sum_y f(y)\, P\left(X(T) = y \mid X(t) = x\right).
\]
Here, we derive a system of ODEs satisfied by $u$, known as Kolmogorov's backward equations. First, by the Chapman-Kolmogorov equation and the definition of a SRN, we can write

\begin{align*}
P\left(X(T) = y \mid X(t) = x\right) &= \sum_j P\left(X(T) = y \mid X(t+h) = x + \nu_j\right)\left(a_j(x)h + o(h)\right) \\
&\quad + P\left(X(T) = y \mid X(t+h) = x\right)\left(1 - h \sum_j a_j(x) + o(h)\right).
\end{align*}
As a consequence:
\begin{align*}
P\left(X(T) = y \mid X(t+h) = x\right) &- P\left(X(T) = y \mid X(t) = x\right) \\
&= -\sum_j P\left(X(T) = y \mid X(t+h) = x + \nu_j\right)\left(a_j(x)h + o(h)\right) \\
&\quad + \sum_j P\left(X(T) = y \mid X(t+h) = x\right)\left(a_j(x)h + o(h)\right).
\end{align*}

Therefore,
\[
\frac{\partial P\left(X(T) = y \mid X(t) = x\right)}{\partial t} = -\sum_j a_j(x)\left(P\left(X(T) = y \mid X(t) = x + \nu_j\right) - P\left(X(T) = y \mid X(t) = x\right)\right).
\]

Now, multiply both sides by $f(y)$ and sum over $y \in \mathbb{Z}_+^d$, to obtain
\begin{align*}
\frac{\partial u(t,x)}{\partial t} &= \sum_y f(y)\, \frac{\partial P\left(X(T) = y \mid X(t) = x\right)}{\partial t} \\
&= -\sum_y f(y) \sum_j a_j(x)\left\{P\left(X(T) = y \mid X(t) = x + \nu_j\right) - P\left(X(T) = y \mid X(t) = x\right)\right\} \\
&= -\sum_j a_j(x) \sum_y f(y)\, P\left(X(T) = y \mid X(t) = x + \nu_j\right) \\
&\quad + \sum_j a_j(x) \sum_y f(y)\, P\left(X(T) = y \mid X(t) = x\right) \\
&= -\sum_j a_j(x)\left(u(t, x + \nu_j) - u(t, x)\right) = -\mathcal{L}u(t, x).
\end{align*}
We arrive at the famous system of Kolmogorov’s backward equations:
\[
\begin{cases}
\dfrac{\partial u(t,x)}{\partial t} + \mathcal{L}u(t,x) = 0, & t \in [0, T),\; x \in \mathbb{Z}_+^d, \\[1ex]
u(T, x) = f(x), & x \in \mathbb{Z}_+^d.
\end{cases} \tag{1.12}
\]

In general, it is not possible to solve in closed form the linear system of ODEs (1.12), which can be seen as the dual of the Master Equation. Numerical methods for solving (1.12) are an active area of research [15]. Its unique solution, $u$, admits a stochastic representation, $E\left[f(X(T)) \mid X(t) = x\right]$, which is a particular case of the more general Feynman-Kac formula (see [16]). In this PhD thesis, we are particularly interested in estimating $u(0, x_0)$, through its stochastic representation, $E\left[f(X(T)) \mid X(0) = x_0\right]$, using Monte Carlo methods. This means that we sample independent and identically distributed approximate paths of $X$ departing from $x_0$ at time zero, evaluate $f$ at the final time, $T$, and consider empirical averages of those values.
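As a minimal sketch of this empirical-average estimator (with a 95% normal-approximation confidence interval), consider the exponential decay example, for which the law of $X(T)$ is known in closed form, $X(T) \sim \text{Binomial}(x_0, e^{-\mu T})$; the sampler below is thus only a stand-in for any of the path-simulation algorithms of Section 1.8, and the parameter values are illustrative.

```python
# A sketch of the Monte Carlo estimator of u(0, x0) = E[f(X(T)) | X(0) = x0].
# For the decay example, X(T) ~ Binomial(x0, exp(-mu*T)), which keeps the
# snippet self-contained; in general one would sample X(T) with the SSA,
# the MNRM or a tau-leap method (Section 1.8).
import numpy as np

rng = np.random.default_rng(0)
mu, x0, T = 0.5, 20, 2.0

def sample_X_T():                      # placeholder path sampler
    return rng.binomial(x0, np.exp(-mu * T))

f = lambda x: x                        # the observable
M = 10_000                             # number of i.i.d. sampled paths
samples = np.array([f(sample_X_T()) for _ in range(M)])

mean = samples.mean()
half_width = 1.96 * samples.std(ddof=1) / np.sqrt(M)   # 95% CI half-width
print(f"E[f(X(T))] ~ {mean:.3f} +/- {half_width:.3f}")
```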

1.7 Reaction Rates ODEs and Langevin Diffusion Approximations to SRNs

In this section, we consider two types of approximations to our SRN models. Regarding the scale at which we describe some natural and artificial phenomena, a SRN model corresponds to the microscopic scale, while its reaction-rate ODE and Langevin-Itô diffusion approximations correspond to its macroscopic and mesoscopic scales, respectively [17, 18].
Let $\mathcal{L}_X$ be the generator of a SRN, $X$, defined through its reaction channels, $R_j = (\nu_j, a_j)$, $j = 1, 2, \ldots, J$, and with initial state $x_0$:
\[
\mathcal{L}_X f(x) = \sum_j a_j(x)\left(f(x + \nu_j) - f(x)\right).
\]
A formal Taylor expansion of $f$ of order one, at the state $x$, gives:
\[
\mathcal{L}_Z f(x) = \sum_j a_j(x)\, f'(x)\, \nu_j.
\]
It turns out that $\mathcal{L}_Z$ is the generator of a deterministic system of ODEs known as the reaction-rate ODEs or mean field equations (see [18]):
\[
\begin{cases}
dZ(t) = \nu\, a(Z(t))\, dt, & t \in \mathbb{R}_+, \\
Z(0) = x_0,
\end{cases} \tag{1.13}
\]
where the $j$-th column of the matrix $\nu$ is $\nu_j$, and $a$ is a column vector with components $a_j$.
To see this, define the strongly continuous semi-group $\{T(t)\}_{t \geq 0}$ of operators by
\[
T(s)f(Z(t)) := f(Z(t+s;\, Z(t) = z)),
\]
that is, the value of the solution $Z$ of (1.13) at time $t+s$, knowing that $Z(t) = z$. Consider the quotient
\begin{align*}
\frac{1}{h}\left\{T(h)f(z) - f(z)\right\} &= \frac{1}{h}\left(f(Z(t+h;\, Z(t) = z)) - f(z)\right) \\
&= \frac{1}{h}\left(f(z) + f'(z)\, \frac{dZ(t)}{dt}\, h + o(h) - f(z)\right) \\
&= f'(z)\, \frac{dZ(t)}{dt} + \frac{o(h)}{h}.
\end{align*}
Then, taking limits as $h \to 0^+$, we obtain
\[
\mathcal{L}_Z f(z) = f'(z)\, \frac{dZ(t)}{dt} = f'(z)\, \nu\, a(z) = f'(z) \sum_j \nu_j\, a_j(z) = \sum_j a_j(z)\, f'(z)\, \nu_j.
\]

Remark 1.7.1 (Relation with Dynkin's formula). In the affine case, i.e., when all the propensities $a_j(x)$ are affine functions of $x$, as is $f$, the reaction-rate ODEs coincide with the differential equation for $E[X(t)]$ obtained by Dynkin's formula (see Example 1.5.1).

In the 1970s, T.G. Kurtz [1, 2] proved versions of the law of large numbers and the central limit theorem relating SRNs and the associated reaction-rate ODEs (1.13). These results are intended for density-dependent SRNs, i.e., when $a_j(x)$ can be written as $n\, \tilde{a}_j(x/n)$. The parameter $n$ is a scaling parameter, which could be, for instance, the initial number of particles in the system. Kurtz considers the limit of a scaled family of SRNs indexed by $n$, where the sequence of initial states, $x_{0,n}$, is such that $x_{0,n}/n \to x_0$ as $n \to +\infty$. This means that, at least during the first moments of the evolution of our system, all the species are in abundance. For that reason, the propensity functions obey the power law known as the deterministic mass-action kinetics principle, i.e., $a_j^*(x) := c_j \prod_{i=1}^d x_i^{\alpha_{j,i}} \approx a_j(x)$ for large $n$. In this PhD thesis, we are not particularly interested in asymptotic results; on the contrary, we are interested in the cases when one or more species (but not all) are scarce.
Now, let us consider a generator obtained by a formal Taylor expansion of $f$ of order two, at the state $x$:
\[
\mathcal{L}_Y f(x) = \sum_j a_j(x)\left(f'(x)\, \nu_j + \frac{1}{2}\, \nu_j^T f''(x)\, \nu_j\right),
\]
which is the generator of an Itô diffusion, $Y$, which satisfies the Langevin stochastic differential equation (SDE) (see [18]):
\[
\begin{cases}
dY(t) = \nu\, a(Y(t))\, dt + \nu\, \mathrm{diag}\!\left(\sqrt{a(Y(t))}\right) dW(t), & t \in \mathbb{R}_+, \\
Y(0) = x_0 \in \mathbb{R}_+^d,
\end{cases} \tag{1.14}
\]
where $W(t)$ is a standard $J$-dimensional Wiener process and $\mathrm{diag}(v)$, where $v$ is a vector, denotes a diagonal matrix whose diagonal entries are the elements of $v$. To prove this assertion, we require tools from Itô calculus, which are beyond the scope of this introduction (see [19]).
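As a minimal sketch of these two approximation levels, the following snippet integrates the reaction-rate ODE (1.13) by forward Euler and one path of the Langevin SDE (1.14) by Euler-Maruyama, for the exponential decay example ($\nu = -1$, $a(y) = \mu y$); the parameter values are illustrative, and the Langevin path is clipped at zero so that the square root stays well defined.

```python
# A sketch comparing the macroscopic (ODE) and mesoscopic (Langevin) levels
# of Section 1.7 for the decay example. Illustrative parameters only.
import numpy as np

rng = np.random.default_rng(1)
mu, x0, T, n_steps = 0.5, 100.0, 2.0, 400
dt = T / n_steps

z = x0                  # reaction-rate ODE state, forward Euler
y = x0                  # Langevin state, Euler-Maruyama
for _ in range(n_steps):
    z += -mu * z * dt
    # dY = nu*a(Y) dt + nu*sqrt(a(Y)) dW, with nu = -1 and a(y) = mu*y.
    y += -mu * y * dt - np.sqrt(max(mu * y, 0.0) * dt) * rng.standard_normal()
    y = max(y, 0.0)     # crude clipping; the diffusion can go slightly negative

print(z, y, x0 * np.exp(-mu * T))   # ODE value, one Langevin path, exact mean
```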

1.8 Simulation of SRNs

As we previously mentioned, analytic or numerical solutions of the Master Equation are infeasible in most cases. This is due to nonlinearities in the propensities, $a_j$, as well as the high dimension or large number of states of the state space of $X$. By simulating paths of the process $X$, we can use the Monte Carlo method to approximate quantities of interest like: i) expected values of observables of the process at a certain fixed time $T$, i.e., $E[g(X(T))]$; ii) hitting times of $X$, i.e., the random elapsed time that the process $X$ takes to reach a certain subset $A$ of the state space for the first time; iii) functionals of the paths of $X$; to mention a few. In this section, we review exact and approximate path-simulation algorithms.
1.8.1 Exact Algorithms

Exact algorithms simulate paths of $X$ that satisfy the defining probabilities (1.1) and, consequently, the Master Equation (1.6). Hence, a path simulated by an exact method has the correct statistical distribution and, as a consequence, only the sampling error (Monte Carlo error or statistical error) is relevant when estimating quantities of interest like $E[g(X(T))]$.

The Stochastic Simulation Algorithm (SSA)

In [20], the algorithm known as SSA (or next reaction, Kinetic Monte Carlo, or Feller-Doob algorithm) was popularized by Gillespie for simulating chemical reactions:

1. Set $x \leftarrow x_0$ and $t \leftarrow 0$.

2. In state $x$ at time $t$, compute $(a_j(x))_{j=1}^J$ and the sum $a_0(x) = \sum_j a_j(x)$.

3. Simulate the time to the next reaction, $\tau$, as an exponential random variable with rate $a_0(x)$.

4. Simulate independently the next reaction, $\nu_j$, according to the probability mass function $(a_j(x)/a_0(x))_{j=1}^J$.

5. Update: $t \leftarrow t + \tau$ and $x \leftarrow x + \nu_j$.

6. Record $(t, x)$. Return to step 2 if $t < T$; otherwise, end the simulation.

It is based on the idea that, obeying (1.1), the probability (density) of only the $j$-th reaction firing at $t + h$ is $a_j(x) \exp(-a_0(x)h)$, where $x = X(t)$ and $a_0(x) = \sum_j a_j(x)$. This last expression can be written as the product $\frac{a_j(x)}{a_0(x)} \times a_0(x) \exp(-a_0(x)h)$. So, it can be sampled using two independent random variables: the first factor, $\frac{a_j(x)}{a_0(x)}$, is the probability of choosing the $j$-th reaction, proportional to its current propensity; the second factor is the density of an exponential random variable with rate $a_0(x)$. Summarizing, the SSA can be deduced as follows: since $X$ is a continuous-time Markov chain, we have that: a) the holding times (the times between two consecutive reactions, or inter-arrival times) are exponentially distributed, with rate equal to the sum of the propensities evaluated at the current state, $x$; b) the reaction to occur next should be chosen at random such that the probability of choosing reaction $i$ is proportional to $a_i(x)$.

Modified Next Reaction Method (MNRM)

The MNRM is based on the random time-change representation (1.4) and on the Next Reaction Method (NRM) [21]. It is an exact simulation algorithm like Gillespie's SSA, but it needs only one exponential random variable per iteration. The reaction times are modeled by the firing times of Poisson processes, $Y_j$, with internal times given by the integrated propensity functions. The MNRM can be easily modified to generate paths in the cases where the rate functions depend on time and also when there are reactions delayed in time. The MNRM is also used in [22] to couple exact and approximate steps in the multilevel Monte Carlo setting. A careful description of the MNRM, due to D. Anderson [23], is given in Section 7.1.2.
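For concreteness, the following condensed sketch (for time-homogeneous propensities, and taking the same nu/propensities arguments as the SSA sketch above) shows the bookkeeping of the internal clocks; it is only a rough companion to the careful description in Section 7.1.2.

```python
# A condensed sketch of the MNRM: each channel j carries the internal clock
# T[j] of a unit-rate Poisson process Y_j, and only one Exp(1) variate is
# drawn per iteration (after the initialization).
import numpy as np

def mnrm_path(x0, nu, propensities, T_final, rng):
    J = len(propensities)
    t, x = 0.0, np.array(x0, dtype=int)
    Tj = np.zeros(J)                    # internal times of the Y_j
    Pj = rng.exponential(1.0, J)        # next internal firing times of the Y_j
    while True:
        a = np.array([aj(x) for aj in propensities])
        with np.errstate(divide="ignore"):
            dt = np.where(a > 0, (Pj - Tj) / a, np.inf)  # candidate waits
        k = int(np.argmin(dt))          # channel that fires next
        if t + dt[k] >= T_final:
            break
        t += dt[k]
        Tj += a * dt[k]                 # advance all internal clocks
        Pj[k] += rng.exponential(1.0)   # the single new Exp(1) variate
        x += nu[k]
    return t, x
```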

1.8.2 Approximate Algorithms

At first glance, there is no need for approximate algorithms, since we have simple and easy-to-implement exact algorithms at hand (e.g. SSA and MNRM). Still, observe that the expected holding time is $(a_0(x))^{-1}$, which can be extremely short in some regions of the state space, making simulations computationally very costly. Large values of $a_0(x)$ can be caused, for instance, by a few reaction channels with high propensities at $x$ (channels with high activity) while the others have small propensity values. In this case, if there is variability caused by the reaction channels with smaller propensities, then we cannot rely on the reaction-rate ODE approximation.
Tau-Leap Methods

Approximate algorithms, instead of simulating the time between two consecutive reactions (inter-arrival times), fix a time interval, say $(t, t+\tau]$, and simulate the number of firings during $(t, t+\tau]$, $N_1, N_2, \ldots, N_J$, of each reaction channel; producing a leap from the state $x$ at time $t$ to the state $x + N_1 \nu_1 + \ldots + N_J \nu_J$ at time $t+\tau$. For that reason, these methods are known as tau-leap methods. The differences among them are due to the choice of the method for sampling $N_1, N_2, \ldots, N_J$ and the choice of the time increment $\tau$.
The Explicit Tau-Leap Method: motivated by Kurtz's random time-change representation:
\[
X(t) = X(0) + \sum_{j=1}^{J} Y_j\!\left(\int_0^t a_j(X(s))\, ds\right) \nu_j,
\]
where $Y_j$ are independent unit-rate Poisson processes, we obtain the explicit tau-leap method [24] by approximating the integrals, $\int_0^t a_j(X(s))\, ds$, by forward Euler sums on a partition, $\{s_0, s_1, \ldots, s_N\}$, of the interval $[0, t]$; resulting in
\[
\bar{X}(t) = x_0 + \sum_{j=1}^{J} Y_j\!\left(\sum_{n=0}^{N-1} a_j(\bar{X}(s_n))(s_{n+1} - s_n)\right) \nu_j.
\]

In this case, $N_j = Y_j\!\left(\sum_{n=0}^{N-1} a_j(\bar{X}(s_n))(s_{n+1} - s_n)\right)$. Since each $Y_j$ is a path of a unit-rate Poisson process, to simulate $\bar{X}(t+\tau)$ conditional on having observed $\bar{X}$ until time $t$, we can split $Y_j\!\left(\sum_{n=0}^{N-1} a_j(\bar{X}(s_n))(s_{n+1} - s_n) + a_j(\bar{X}(t))\, \tau\right)$ into two terms: i) $Y_j\!\left(\sum_{n=0}^{N-1} a_j(\bar{X}(s_n))(s_{n+1} - s_n)\right)$, which is known because it depends on $\bar{X}$ up to the time $t$, and ii) a Poisson random variable, $P_j(a_j(\bar{X}(t))\, \tau)$.² In this way, we have an explicit iterative tau-leap method: given $\bar{X}(t) = x \in \mathbb{Z}_+^d$,
\[
\bar{X}(t+\tau) = x + \sum_{j=1}^{J} P_j(a_j(x)\, \tau)\, \nu_j.
\]

² A Poisson random variable with rate $\lambda_1 + \lambda_2$ can be decomposed as the sum of two independent Poisson random variables with respective rates $\lambda_1$ and $\lambda_2$, provided $\lambda_1$ and $\lambda_2$ are non-negative real numbers.

Here, $P_j(\lambda_j)$, $j = 1, 2, \ldots, J$, are independent Poisson random variables with rates $\lambda_j$.
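A minimal sketch of this iteration follows (same conventions as the SSA sketch above); the fixed step size $\tau$ is an arbitrary illustrative choice, and note that nothing here prevents negative populations, which is precisely the issue discussed below (the propensities are clipped at zero only to keep the sketch runnable).

```python
# A sketch of the explicit tau-leap iteration derived above: over each step,
# the number of firings of channel j is a Poisson variate with the propensity
# frozen at the current state.
import numpy as np

def explicit_tau_leap_path(x0, nu, propensities, T, tau, rng):
    t, x = 0.0, np.array(x0, dtype=int)
    while t < T:
        dt = min(tau, T - t)
        a = np.array([aj(x) for aj in propensities])
        a = np.maximum(a, 0.0)       # guard: see the negative-population issue below
        N = rng.poisson(a * dt)      # N_j ~ Poisson(a_j(x) dt)
        x = x + sum(Nj * nuj for Nj, nuj in zip(N, nu))
        t += dt
    return x

rng = np.random.default_rng(3)
nu = [np.array([1]), np.array([-1])]   # birth-death network, as before
propensities = [lambda x: 2.0, lambda x: 0.1 * x[0]]
print(explicit_tau_leap_path([10], nu, propensities, T=5.0, tau=0.05, rng=rng))
```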

An Implicit Tau-Leap Method: the motivation for an implicit scheme is that larger time-steps can be taken when some reactions are very fast and the system is close to a stable manifold. To implement an implicit iterative tau-leap method, we can add and subtract $\tau \sum_{j=1}^{J} a_j(x)\, \nu_j$ in the explicit tau-leap, obtaining
\begin{align*}
\bar{X}(t+\tau) &= x + \sum_{j=1}^{J} P_j(a_j(x)\, \tau)\, \nu_j \\
&= x + \tau \sum_{j=1}^{J} a_j(x)\, \nu_j + \sum_{j=1}^{J} \left(P_j(a_j(x)\, \tau) - a_j(x)\, \tau\right) \nu_j.
\end{align*}

This implicit tau-leap method has two steps, if $x = \bar{X}(t)$:

1. Solve: $y = x + \tau \sum_{j=1}^{J} a_j(y)\, \nu_j$.

2. Simulate: $\bar{X}(t+\tau) = x + \sum_{j=1}^{J} P_j(a_j(y)\, \tau)\, \nu_j$ (observe that $\bar{X}(t+\tau)$ remains integer-valued, even though $y$ is in general not an integer).

Semi-implicit methods have also been proposed in the literature [25].
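The sketch below implements one step of the two-step scheme, using a plain fixed-point iteration for the implicit equation of step 1 (any nonlinear solver, such as scipy.optimize.fsolve, would do equally well); the number of fixed-point iterations is an arbitrary choice, and convergence requires $\tau$ to be small enough.

```python
# A sketch of one step of the two-step implicit tau-leap above.
import numpy as np

def implicit_tau_leap_step(x, nu, propensities, tau, rng, n_fixed_point=50):
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for _ in range(n_fixed_point):   # step 1: solve y = x + tau*sum_j a_j(y) nu_j
        a = [aj(y) for aj in propensities]
        y = x + tau * sum(aj * nuj for aj, nuj in zip(a, nu))
    a = np.maximum([aj(y) for aj in propensities], 0.0)
    N = rng.poisson(a * tau)         # step 2: Poisson variates at the implicit state
    return x.astype(int) + sum(Nj * nuj for Nj, nuj in zip(N, nu))
```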

Negative Populations in the Tau-leap Method

The increments of the tau-leap method are linear combinations of the stoichiometric vectors, i.e., $\Delta\bar{X}(t; \tau) = N_1 \nu_1 + \ldots + N_J \nu_J$. Let us recall that: i) the random variables $N_j$, $j = 1, 2, \ldots, J$, are independent and non-negative integer valued, and ii) the stoichiometric vectors, $\nu_j$, may have negative components. As a consequence, a drawback of the tau-leap method is that it is perfectly possible to jump from one state $x$, with non-negative components, to a state $y$ with at least one negative coordinate; that is, $x = \bar{X}(t) \in \mathbb{Z}_+^d$, but $x + \Delta\bar{X}(t; \tau) \notin \mathbb{Z}_+^d$.
To remedy this feature many methods have been proposed in the literature, in
general they fall in at least one of the following categories:

1. Projection or reflection: if $y \notin \mathbb{Z}^d_+$, set the negative values of $y$ equal to zero or assign a positive number, for instance, taking the absolute value of the negative coordinates (see e.g. [22]).

2. Change the distribution of $N_j$: instead of sampling Poisson random variables, which are unbounded, replace them by binomial random variables, which are bounded, and control their parameters to avoid negative populations (see e.g. [26, 27]).

3. Pre-leap methods: choose $\tau$ to control the probability of $y \notin \mathbb{Z}^d_+$ given $x$ (see e.g. [28, 29]).

4. Post-leap methods: each time $y$ has negative components, record the Poisson random variables and sample new ones, halving $\tau$. Repeat this recursively until all the coordinates are non-negative. Then, use the recorded values of the Poisson variates to sample Poisson bridges (see e.g. [30, 31]).

All the previous categories have their own disadvantages: in the first two, we are introducing a modeling error that should be taken into account when approximating quantities of interest; in the pre-leap category, we are not avoiding the problem of reaching negative populations, just controlling the exit probability; the post-leap methods can be memory consuming and impractical for multilevel Monte Carlo. In this PhD thesis, we propose a pre-leap method for the explicit tau-leap scheme; thus, we do not avoid the possibility of reaching negative populations, but we introduce an efficient way of controlling the exit probabilities. Moreover, when $\tau$ is of the order of the inter-arrival times of the exact method, we switch adaptively to the SSA or the MNRM. Thus, we propose hybrid path-simulation methods.
1.9 Statistical Inference for SRNs

The problem of inferring the coefficients of the propensity functions based on observed data is relevant in applications (e.g., condition-based maintenance, design of chemical reactors, control of epidemic diseases).
Different techniques can be used depending on the data and the prior knowledge about the unknown parameters. Regarding the data, it can be continuously observed (when we observe the time-evolution of the paths of $X$) or discretely observed (when there is a separation between consecutive observations of one path of $X$). Let us remark that SRNs are pure jump processes and, for that reason, they are constant between two consecutive jumps, and the jumps are events of Poisson type. Hence, by knowing the jump times and their respective jump vectors, we have a complete observation of the path. Observe that this is not possible in other types of stochastic processes like Itô diffusions. Finally, the data can be completely or partially observed, depending on whether we observe all the coordinates of $X$ or a fixed subset of them.

1.9.1 The likelihood function

Let us derive the likelihood function of a continuously and completely observed path $X(t,\omega_0)$, $t\in[0,T]$, $\omega_0\in\Omega$. Let us assume that the propensity functions $a_j$ can be written as $a_j(x) = c_j g_j(x)$, for all $j=1,\ldots,J$ and $x\in\mathbb{Z}^d_+$. Assume also that the functions $g_j$ are known, for instance, by the stochastic mass-action kinetics principle. Define $\theta := (c_1,\ldots,c_J)$ as the vector of unknown coefficients that we have to infer from our data.

Let the jump times of $(X(t,\omega_0))_{t\in[0,T]}$ in $(0,T)$ be $\xi_1, \xi_2, \ldots, \xi_{N-1}$. Define $\xi_0 := 0$, $\xi_N := T$ and $\Delta\xi_i := \xi_{i+1}-\xi_i$ for $i=0,1,\ldots,N-1$.


Let us assume that the system is in the state $x_0$ at time 0. We have that $\xi_1$ is the time to the first reaction, or equivalently, the time that the system spends at $x_0$ (sojourn time or holding time at state $x_0$). Let us denote by $\nu_{\xi_1}$ the reaction that took place at $\xi_1$; therefore, the system at time $\xi_1$ is in the state $x_1 := x_0 + \nu_{\xi_1}$. From the SSA algorithm it is easy to see that the probability density corresponding to this transition is the product $a_{\nu_{\xi_1}}(x_0)\exp(-a_0(x_0)\,\Delta\xi_0)$.
By the Markov property, we can see that the density of one path $((\Delta\xi_i, x_i))_{i=0}^{N-1}$ is given by
$$\prod_{i=1}^{N-1} a_{\nu_{\xi_i}}(x_{i-1})\exp(-a_0(x_{i-1})\,\Delta\xi_{i-1}) \times \exp(-a_0(x_{N-1})\,\Delta\xi_{N-1}). \qquad (1.15)$$
The last factor in (1.15) is due to the fact that we know that the system remains in the state $x_{N-1}$ during the time interval $[\xi_{N-1}, T)$.

Rearranging the factors in (1.15), we obtain
$$\exp\!\left(-\sum_{i=0}^{N-1} a_0(x_i)\,\Delta\xi_i\right)\prod_{i=1}^{N-1} a_{\nu_{\xi_i}}(x_{i-1}). \qquad (1.16)$$

Now, taking logarithms in (1.16), we have
$$-\sum_{i=0}^{N-1} a_0(x_i)\,\Delta\xi_i + \sum_{i=1}^{N-1} \log\!\big(a_{\nu_{\xi_i}}(x_{i-1})\big). \qquad (1.17)$$

By the definition of $a_0$ and the assumption $a_j(x) = c_j g_j(x)$, we can write (1.17) as
$$-\sum_{i=0}^{N-1}\sum_{j=1}^{J} c_j g_j(x_i)\,\Delta\xi_i + \sum_{i=1}^{N-1} \log\!\big(c_{\nu_{\xi_i}} g_{\nu_{\xi_i}}(x_{i-1})\big).$$

Interchanging the order of summation and denoting by $R_{j,[0,T]}$ the number of times that the reaction $\nu_j$ occurred in the interval $[0,T]$, we have
$$\sum_{j=1}^{J}\left(-c_j\sum_{i=0}^{N-1} g_j(x_i)\,\Delta\xi_i + \log(c_j)\,R_{j,[0,T]}\right) + \sum_{i=1}^{N-1} \log\!\big(g_{\nu_{\xi_i}}(x_{i-1})\big). \qquad (1.18)$$
Observing that the last term in (1.18) does not depend on $\theta$, we conclude that, for any particular $\omega_0\in\Omega$, the complete log-likelihood of the path $(X(t,\omega_0))_{t\in[0,T]}$ is, up to constant terms, given by
$$\ell^c(\theta) := \sum_{j=1}^{J}\Big(\log(c_j)\,R_{j,[0,T]}(\omega_0) - c_j\,F_{j,[0,T]}(\omega_0)\Big),$$
where $F_{j,[0,T]} := g_j(x_0)\,\Delta\xi_0 + \cdots + g_j(x_{N-1})\,\Delta\xi_{N-1}$, which is in this case an exact forward Euler sum for $\int_0^T g_j(X(s))\,ds$.
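Since $\partial\ell^c/\partial c_j = R_{j,[0,T]}/c_j - F_{j,[0,T]}$, the maximizer has the closed form $\hat c_j = R_{j,[0,T]}/F_{j,[0,T]}$. The following Python sketch computes it from one recorded path; the argument names are illustrative placeholders.

```python
import numpy as np

def mle_reaction_rates(holding_times, states, fired, g, J):
    """Closed-form MLE c_hat_j = R_j / F_j from a completely observed path.

    holding_times[i] = Delta xi_i (sojourn time in state x_i, the last one
    running up to T); states[i] = x_i; fired[k] in {0, ..., J-1} is the
    channel that fired at the (k+1)-th jump; g(x) = (g_1(x), ..., g_J(x)).
    """
    R = np.bincount(np.asarray(fired), minlength=J)   # R_{j,[0,T]}
    F = sum(dt * np.asarray(g(x))                     # F_{j,[0,T]}
            for dt, x in zip(holding_times, states))
    return R / F
```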

1.9.2 Methods for Maximum Likelihood Estimation

Computing the maximizer of $\ell^c(\theta)$, i.e., the maximum likelihood estimator (MLE) of $\theta$, is a trivial task if we have observed completely and continuously one or more paths of $X$. But, in general, the available data is discrete and partial. In such cases, numerical methods like the EM algorithm or its Monte Carlo version [32] can be applied (EM stands for its two steps, Expectation and Maximization; see 9.3.1 for a complete description). Other methods are derived from upscaled versions of SRNs, that is, inference based on reaction-rate ODEs, Langevin diffusions or stochastic differential equations driven by other types of noise, such as Gamma noise.

In this PhD thesis, we present two different approaches to the problem of estimating the vector of unknown coefficients, $\theta = (c_1, c_2, \ldots, c_J)$, of the propensity functions, $a_j(x) = c_j g_j(x)$, $j = 1, 2, \ldots, J$. The first approach is an indirect inference method where, for each $t$, $X(t)$ is treated as a Gaussian random variable whose mean, $m(t)$, and variance, $\sigma^2(t)$, are adjusted to the data through the moment-matching technique. We also assume independent Gaussian noise in the measurements, and arrive at a penalized, weighted, non-linear least squares problem that can be solved by classical, deterministic optimization methods. It is worth mentioning that in this approach we do not assume knowledge of any aspect of the set of reaction channels; we methodologically depart from the simplest possible model and add complexity until we find a model that reasonably fits the data and allows us to make predictions. The second approach is based on the Monte Carlo EM algorithm, where we show how to estimate expected values of functionals of bridges that link consecutive observations. This approach is based on a forward-reverse representation for bridges, and here we assume that we know the set of reaction channels with the exception of the coefficients $\theta$. This representation is derived from the Master Equation and the backward Kolmogorov equations for Markov processes. We refer to Sections 3.4 and 3.5 for overviews of these two contributions.

1.10 Bibliographical Notes

We refer to the review article [33] for an interesting compendium of examples of SRNs. The recent book [11] is recommended for learning about physical principles and different approaches to finding approximate solutions to the Master Equation, as well as interesting applications of Stochastic Chemical Kinetics. It is worth comparing it with a previous version [34] to see more than two decades of progress in this area. The review article [35] contains interesting insights into the chemical-kinetic structure of SRNs and numerical simulation schemes previous to the ones we present in this PhD thesis. Closer to our approach is the first chapter of [3] by David Anderson and Tom Kurtz.
REFERENCES
[1] T. G. Kurtz, Approximation of Population Processes (CBMS-NSF Regional Con-
ference Series in Applied Mathematics). Society for Industrial and Applied
Mathematics, 1 1987.

[2] S. Ethier and T. Kurtz, Markov Processes: Characterization and Convergence


(Wiley Series in Probability and Statistics), 2nd ed. Wiley-Interscience, 9 2005.

[3] H. Koeppl, D. Densmore, G. Setti, and M. di Bernardo (Editors), Design &


Analysis of Biomolecular Circuits. Springer, 1 2011.

[4] J. Karlsson and R. Tempone, “Towards automatic global error control: Com-
putable weak error expansion for the tau-leap method,” Monte Carlo Methods
and Applications, vol. 17, no. 3, pp. 233–278, 2011.

[5] D. Blount, "Limit theorems for a sequence of nonlinear reaction-diffusion systems," Stochastic Processes and their Applications, vol. 45, no. 2, pp. 193–207, 1993.

[6] S. Engblom, “On the stability of stochastic jump kinetics,” arXiv preprint
arXiv:1202.3892v6, 2014.

[7] C. Briat, A. Gupta, and M. Khammash, “A scalable computational framework


for establishing long-term behavior of stochastic reaction networks,” PLOS com-
putational biology, vol. 10, 2014.

[8] N. G. van Kampen, Stochastic Processes in Physics and Chemistry, Third Edition (North-Holland Personal Library), 3rd ed. North Holland, 5 2007.

[9] C. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences
(Springer Series in Synergetics). Springer, 2010.

[10] H. Risken and T. Frank, The Fokker-Planck Equation: Methods of Solution and
Applications (Springer Series in Synergetics). Springer, 1996.
[11] P. Érdi and G. Lente, Stochastic Chemical Kinetics: Theory and (Mostly) Sys-
tems Biological Applications (Springer Series in Synergetics), 1st ed. Springer,
5 2014.

[12] F. Klebaner, Introduction to Stochastic Calculus With Applications (2nd Edi-


tion), 2nd ed. Imperial College Press, 6 2005.

[13] C. Gillespie, “Moment-closure approximations for mass-action models,” IET Syst


Biol, vol. 3, no. 1, 2009.

[14] P. Smadbeck and Y. Kaznessis, “A closure scheme for chemical master equa-
tions,” Proc Natl Acad Sci USA, vol. 110, no. 35, 2013.

[15] V. Kazeev, M. Khammash, M. Nip, and C. Schwab, “Direct solution of the


chemical master equation using quantized tensor trains,” PLoS computational
biology, vol. 10, no. 3, p. e1003359, 2014.

[16] L. C. G. Rogers and D. Williams, Diffusions, Markov Processes, and Martingales. Volume 1, Foundations, ser. Cambridge Mathematical Library. Cambridge, U.K., New York: Cambridge University Press, 2000.

[17] M. Lachowicz, “Microscopic, mesoscopic and macroscopic descriptions of com-


plex systems,” Journal of Probabilistic Engineering Mechanics, vol. 26, 2011.

[18] G. Pavliotis and A. Stuart, Multiscale Methods: Averaging and Homogenization


(Texts in Applied Mathematics). Springer, 11 2010.

[19] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications (Universitext), 6th ed. Springer, 2 2014.

[20] D. T. Gillespie, “A general method for numerically simulating the stochastic


time evolution of coupled chemical reactions,” Journal of Computational Physics,
vol. 22, pp. 403–434, 1976.

[21] M. A. Gibson and J. Bruck, “Efficient exact stochastic simulation of chemical


systems with many species and many channels,” The Journal of Physical Chem-
istry A, vol. 104, no. 9, pp. 1876–1889, 2000.
[22] D. Anderson and D. Higham, "Multilevel Monte Carlo for continuous Markov chains, with applications in biochemical kinetics," SIAM Multiscale Model. Simul., vol. 10, no. 1, Mar 2012.

[23] D. F. Anderson, “A modified next reaction method for simulating chemical sys-
tems with time dependent propensities and delays,” The Journal of Chemical
Physics, vol. 127, no. 21, p. 214107, 2007.

[24] D. T. Gillespie, “Approximate accelerated stochastic simulation of chemically


reacting systems,” Journal of Chemical Physics, vol. 115, pp. 1716–1733, Jul.
2001.

[25] T. Li, “Analysis of explicit tau-leaping schemes for simulating chemically reacting
systems,” Multiscale Modeling and Simulation, vol. 6, no. 2, pp. 417–436, 2007.

[26] T. Tian and K. Burrage, “Binomial leap methods for simulating stochastic chem-
ical kinetics,” The Journal of Chemical Physics, vol. 121, no. 21, pp. 10 356–
10 364, 2004.

[27] A. Chatterjee, D. G. Vlachos, and M. A. Katsoulakis, "Binomial distribution based τ-leap accelerated stochastic simulation," The Journal of Chemical Physics, vol. 122, no. 2, p. 024112, 2005.

[28] A. Moraes, R. Tempone, and P. Vilanova, "Hybrid Chernoff tau-leap," Multiscale Modeling and Simulation, vol. 12, no. 2, pp. 581–615, 2014.

[29] Y. Cao, D. Gillespie, and L. Petzold, “Avoiding negative populations in explicit


poisson tau-leaping,” The Journal of Chemical Physics, vol. 123, p. 054104, 2005.

[30] D. F. Anderson, “Incorporating postleap checks in tau-leaping,” Journal of


Chemical Physics, vol. 128, no. 5, Feb 7 2008.

[31] J. Karlsson and R. Tempone, “Towards automatic global error control: Com-
putable weak error expansion for the tau-leap method,” Monte Carlo Methods
and Applications, vol. 17, no. 3, pp. 233–278, March 2011.
[32] C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in
Statistics), 2nd ed. Springer, 2005.

[33] J. Goutsias and G. Jenkinson, “Markovian dynamics on complex reaction net-


works,” Physics Reports, vol. 529, no. 2, pp. 199–264, 2013.

[34] P. Érdi and J. Tóth, Mathematical Models of Chemical Reactions: Theory and Applications of Deterministic and Stochastic Models (Nonlinear Science). Princeton University Press, 2 1989.

[35] D. T. Gillespie, A. Hellander, and L. R. Petzold, “Perspective: Stochastic algo-


rithms for chemical kinetics,” The Journal of Chemical Physics, vol. 138, 2013.

Chapter 2

Stochastic Epidemic Models

The theory of Markov processes and its applications to population processes has been rigorously developed by Kurtz during the last decades [1]. His results have been applied by Andersson and Britton, in Chapters 5 and 8 of [2], to the context of Markovian epidemic models. In [3], Greenwood and Gordillo give a concise and clear exposition of stochastic epidemic modeling that explicitly recognizes the need for stochastic simulation for models exhibiting mild complexity. The aim of this chapter is to present the elements of Markovian epidemic models in the context of SRNs, for which we have been developing fast and accurate stochastic simulation techniques as well as inference methods (see Chapter 3).

2.1 Markovian Epidemic Models as SRNs

Let us consider a homogeneous and well-mixed population of individuals. This population is partitioned into mutually exclusive compartments that describe the possible different stages of an epidemic process. In the context of SRNs, we identify the set of species with the set of compartments, while the set of reaction channels is defined by the natural flow of the epidemic process through its different stages, taking into account the rates at which individuals move from one compartment to the next. Thanks to the simplifying assumptions of homogeneity and well-mixing of the population, we do not need to add any spatial (either graph or other) structure to our epidemic models.
Stochastic models have some clear advantages over the deterministic ones. First,
the nature of a contagion contact seems to be more a consequence of chance than
a deterministic phenomenon. Second, deterministic models, like the SIR below, do
not admit the possibility of the sudden extinction of the epidemic or the possibility
of a minor epidemic outbreak. There is a standard list of questions associated with
stochastic epidemic models that does not have a counterpart in the deterministic
setting, including, among others: the probability of extinction, the probability of
observing an outbreak, the distribution of the duration of the epidemic process, the
distribution of the maximum number of infected individuals, the distribution of the
final size of the epidemic, etc.
Simulation studies for epidemic models have been performed extensively during the last decades, but the increasing complexity of the mathematical models describing the spread of transmissible diseases and the size of the involved populations make exact simulation methods, like the SSA, computationally infeasible. Articles I, II and III in this PhD thesis introduce novel, fast hybrid path-simulation algorithms that can be used to compute quantities associated with complex stochastic epidemic models.
In what follows, we show how to interpret the compartment-rate diagrams that are typical in deterministic epidemic models as SRNs, and how to use the tools presented in Chapter 1.

2.1.1 The SIR Model

Let us consider a closed population partitioned into three compartments: S (susceptible), I (infectious) and R (removed); see Chapter 9 of [4] for the historical origin of the SIR model.

Figure 2.1: Three compartments: S (susceptible), I (infectious) and R (removed). [Diagram: $S \xrightarrow{\beta SI} I \xrightarrow{\gamma I} R$.]

Assume that individuals in the I-class are not only infected but also infectious (or infective), that is, they carry the pathogen that causes a certain disease and they are also able to transmit the disease to susceptible individuals. Contagious contacts can only happen when an individual from the S-class meets an infective individual. Infected individuals recover after an exponentially distributed random time and gain immunity to the disease, becoming individuals of the R-class. A removed individual is no longer part of the epidemic process. In the language of SRNs, we have:

$S + I \to 2I$, contagious contact.

$I \to R$, removal.

Since we are considering a closed population, the sum $S+I+R$ is constant in time and equal to the total number of individuals in the population, $N$. Therefore, we just need to keep track of the sizes of the S and I classes only.

Figure 2.2: The two possible transitions from the state $(s,i)$ in the SIR model: $\nu_1$ leads to $(s-1,\,i+1)$ and $\nu_2$ leads to $(s,\,i-1)$ (axes: S horizontal, I vertical). Observe that the disease-free states $(s,0)$ are absorbing states.

By the kinetic mass-action principle (see (1.3)), we have two reaction channels: i) contagion, $R_1 = (\nu_1, a_1(s,i)) = ((-1,1)^T, \beta s i)$, and ii) remotion (the act of removing), $R_2 = (\nu_2, a_2(s,i)) = ((0,-1)^T, \gamma i)$. The units of the parameter $\beta$ are $[\text{individuals}]^{-2} \times [\text{time}]^{-1}$, expressing the transmission rate per capita, whereas the parameter $\gamma$ is expressed in $[\text{individuals}]^{-1} \times [\text{time}]^{-1}$ and represents the recovery rate.

In the notation used in (1.1), the stochastic process $X(t)$ defining the SIR model is described by
$$X(t): \begin{cases} P\big(X(t+dt) = (s,i) + (-1,1) \mid X(t) = (s,i)\big) = \beta s i\, dt + o(dt),\\ P\big(X(t+dt) = (s,i) + (0,-1) \mid X(t) = (s,i)\big) = \gamma i\, dt + o(dt). \end{cases} \qquad (2.1)$$

Figure 2.2 depicts the possible transitions from $(s,i)$. Figure 2.3 shows three SSA paths of the SIR model.

Figure 2.3: Three SSA paths in the (Susceptible, Infected) plane, for $(s_0,i_0)=(99,1)$. Left: observe that one of the paths is quickly absorbed by the disease-free states $(s,0)$; in this case we do not observe an epidemic outbreak. Right: details close to the initial point $(S_0, I_0) = (99, 1)$. Here the population size is $N = 100$.
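For concreteness, here is a minimal Python sketch of the SSA applied to the SIR model (2.1); the parameter values are illustrative only.

```python
import numpy as np

def ssa_sir(s, i, beta, gamma, T, rng):
    """Gillespie's SSA for the SIR model (2.1); returns the jump skeleton."""
    t, path = 0.0, [(0.0, s, i)]
    while t < T and i > 0:                 # (s, 0) states are absorbing
        a1, a2 = beta * s * i, gamma * i   # contagion and removal propensities
        t += rng.exponential(1.0 / (a1 + a2))
        if t >= T:
            break
        if rng.random() < a1 / (a1 + a2):  # contagion fires
            s, i = s - 1, i + 1
        else:                              # removal fires
            i -= 1
        path.append((t, s, i))
    return path

rng = np.random.default_rng(1)
path = ssa_sir(99, 1, beta=0.02, gamma=0.5, T=50.0, rng=rng)
```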

The stoichiometric matrix, $\nu$, and the vector of propensities, $a$, are given by:
$$\nu = (\nu_j)_{j=1}^{J} := \begin{pmatrix} -1 & 1\\ 0 & -1 \end{pmatrix}^{T} \quad\text{and}\quad a(X) := \begin{pmatrix} \beta S I\\ \gamma I \end{pmatrix},$$
where $X(t) = (S(t), I(t))$.
Let us denote by $X(0) = (S_0, I_0)$ the initial numbers of susceptible and infected individuals, respectively. The random time-change representation (see (1.4)) of the process $X$ is given by
$$\begin{pmatrix} S(t)\\ I(t)\end{pmatrix} = \begin{pmatrix} S_0\\ I_0\end{pmatrix} + Y_1\!\left(\int_0^t \beta S(u) I(u)\,du\right)\begin{pmatrix}-1\\ 1\end{pmatrix} + Y_2\!\left(\int_0^t \gamma I(u)\,du\right)\begin{pmatrix}0\\ -1\end{pmatrix}, \qquad (2.2)$$
where $Y_1$ and $Y_2$ are two independent unit-rate Poisson processes.

The Mean Field Approximation

By replacing the Poisson processes $Y_1$ and $Y_2$ in (2.2) by the identity function, we obtain the classical deterministic reaction-rate ODEs (Kermack and McKendrick 1927 [5]), also called mean field equations (see (1.13)), in integral form:
$$\begin{pmatrix} S_{MF}(t)\\ I_{MF}(t)\end{pmatrix} = \begin{pmatrix} S_0\\ I_0\end{pmatrix} + \int_0^t \beta S_{MF}(u) I_{MF}(u)\,du \begin{pmatrix}-1\\ 1\end{pmatrix} + \int_0^t \gamma I_{MF}(u)\,du \begin{pmatrix}0\\ -1\end{pmatrix},$$
which, in differential form, is
$$\begin{cases} \dot S_{MF}(t) = -\beta S_{MF}(t)\, I_{MF}(t),\\ \dot I_{MF}(t) = \beta S_{MF}(t)\, I_{MF}(t) - \gamma I_{MF}(t),\\ (S_{MF}(0), I_{MF}(0)) = (S_0, I_0). \end{cases} \qquad (2.3)$$
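A minimal sketch integrating the mean field system (2.3) with an off-the-shelf ODE solver; the parameter values are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

beta, gamma, S0, I0, T = 0.02, 0.5, 99.0, 1.0, 50.0   # illustrative values

def rhs(t, y):
    s, i = y
    return [-beta * s * i, beta * s * i - gamma * i]  # right-hand side of (2.3)

sol = solve_ivp(rhs, (0.0, T), [S0, I0], dense_output=True)
S_MF, I_MF = sol.y                                    # mean field trajectories
```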

In Figure 2.4, we observe that, when $I_0$ is small, the mean field approximation $(S_{MF}(t), I_{MF}(t))$, given by the solution of (2.3), overestimates the average number of infected individuals, that is, $I_{MF}(t) \ge E[I(t)]$, $\forall t \in (0,T]$. In a certain sense, the mean field does not take into account the number of paths of $X$ that are quickly absorbed by the disease-free states. The SSA paths are generated with the algorithm described in Section 1.8.1.

Figure 2.4: Mean field versus SSA-average in the (Susceptible, Infected) plane, for $(s_0,i_0)=(99,1)$ (left) and $(s_0,i_0)=(90,10)$ (right). Notice how the mean field path overestimates the mean of the stochastic SIR model when $I_0 = 1$, but, when the initial number of infectives is $I_0 = 10$, the mean field gives a better approximation. Observe that the mean field seems to give a good approximation for the mean of the trajectories that escape from the disease-free states, $I = 0$.

The Infinitesimal Generator of the SIR Model

The infinitesimal generator (see (1.8)) of $X$ is given by:
$$\mathcal{L}_X f(s,i) = \beta s i\,\big(f(s-1,i+1) - f(s,i)\big) + \gamma i\,\big(f(s,i-1) - f(s,i)\big).$$
Notice that, if we apply the Dynkin formula (see (1.10)) to $g(s,i) = i$, we have that
$$\mathcal{L}_X g(s,i) = \beta s i - \gamma i,$$
and therefore
$$\frac{dE[I(t)]}{dt} = \beta E[S(t)I(t)] - \gamma E[I(t)],$$
implying that an ODE for $E[I(t)]$ depends on higher order moments, in this case, $E[S(t)I(t)]$. This is caused by the nonlinearity of the term $a_1(s,i) = \beta s i$ (see Remark 1.5.2).

The Master Equation for the SIR Model

The Master Equation (see (1.6)) of the SIR model is given by
$$\frac{dp_{(s,i)}(t)}{dt} = \beta(s+1)(i-1)\,\mathbf{1}_{\{(s+1,i-1)\in D\}}\, p_{(s+1,i-1)}(t) + \gamma(i+1)\,\mathbf{1}_{\{(s,i+1)\in D\}}\, p_{(s,i+1)}(t) - (\beta s i + \gamma i)\,\mathbf{1}_{\{(s,i)\in D\}}\, p_{(s,i)}(t), \qquad (2.4)$$
where $D := \{(s,i)\in\mathbb{Z}^2_+ : s+i \le S_0+I_0\}$ is the state space of $X$, and the initial condition is a Dirac delta on $(S_0, I_0)$. The indicator functions, $\mathbf{1}_{\{\cdot\}}$, are necessary to avoid any probability inflow from outside the natural domain of $X$. In this case, the ODE system (2.4) can be solved numerically for modest population levels, but it quickly turns infeasible for moderate population sizes (observe that the size of the system is $O(|D|^2)$, where $|D|$ is the size of the state space of the process $X$ or, in general, the size of a truncated approximation of $D$); see [6] for a discussion of techniques for solving the Master Equation.

Remark 2.1.1. Due to the structure and sparsity of the coefficient matrix of (2.4), it may be possible to apply fast numerical methods based on, for example, numerical tensorial linear algebra [7], but we do not follow this approach in this PhD thesis.
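For small populations, (2.4) can nonetheless be solved directly by assembling the generator matrix on $D$ and exponentiating it, as in the following brute-force Python sketch (dense linear algebra with illustrative parameter values; this is not one of the fast techniques of [6, 7]).

```python
import numpy as np
from scipy.linalg import expm

beta, gamma, S0, I0, T = 0.02, 0.5, 10, 1, 5.0        # small, illustrative

states = [(s, i) for s in range(S0 + I0 + 1)
                 for i in range(S0 + I0 + 1 - s)]     # the state space D
idx = {x: k for k, x in enumerate(states)}

Q = np.zeros((len(states), len(states)))              # Q[x, y] = rate x -> y
for (s, i), k in idx.items():
    if s > 0 and i > 0:
        Q[k, idx[(s - 1, i + 1)]] = beta * s * i      # contagion
    if i > 0:
        Q[k, idx[(s, i - 1)]] = gamma * i             # removal
    Q[k, k] = -Q[k].sum()                             # rows sum to zero

p0 = np.zeros(len(states)); p0[idx[(S0, I0)]] = 1.0   # Dirac delta at (S0, I0)
pT = p0 @ expm(T * Q)                                 # solution of (2.4) at time T
```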
The Langevin Approximation for the SIR Model

According to (1.14), the Langevin diffusion approximation, $Y(t) = (S_{ChL}(t), I_{ChL}(t))$, to our epidemic process, $X(t)$, is the following stochastic process driven by the pair of independent standard Wiener processes $(W_S(t), W_I(t))_{t\in(0,T]}$:
$$Y(t): \begin{cases} dS = -\beta S I\,dt - \sqrt{\beta S I}\,dW_S,\\ dI = (\beta S I - \gamma I)\,dt + \sqrt{\beta S I}\,dW_S - \sqrt{\gamma I}\,dW_I,\\ \text{IC: } (S(0), I(0)) = (s,i). \end{cases}$$

For sufficiently high reaction rates, where Gaussian random variables are good approximations of Poisson random variables, the Langevin diffusion $Y$ is an interesting alternative to SSA paths, since a linear combination of independent Gaussian random variables is Gaussian and there are fast Gaussian random number generators. Figure 2.5 shows the Langevin approximation to $X$. Tools from the theory of Stochastic Differential Equations (SDEs) (see [8]) can be used to derive distributions of some typical quantities associated with epidemic models, such as: i) the basic reproduction number, $R_0$, defined as the average number of infections caused by a single infective in a susceptible population; ii) the quasi-stationary distribution of infectives, that is, the limit distribution of the number of infectives conditional on the disease-free boundary not having been reached; iii) the time to extinction of the epidemic, that is, the hitting time of the disease-free boundary; and iv) the final size of the epidemic, defined as $N - S(+\infty)$, where $S(+\infty)$ is the number of individuals untouched by the epidemic process. All those quantities are specific to each epidemic model. An application of this approximation technique, i.e., SRNs approximated by SDEs, can be found in [9] in the context of alcohol drinking on college campuses across the USA.
Figure 2.5: Here we observe that the empirical mean of the Langevin paths is close to the empirical mean of the SSA paths independently of the initial number of infectives. This is due to the same boundary phenomenon. KMC stands for Kinetic Monte Carlo, which is another common name for the SSA.
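A minimal Euler-Maruyama sketch for the Langevin approximation $Y(t)$; clipping the propensities at zero near the boundary is an ad hoc safeguard of this sketch, not part of the model.

```python
import numpy as np

def langevin_sir(s, i, beta, gamma, T, n_steps, rng):
    """Euler-Maruyama discretization of the Langevin SIR diffusion."""
    dt = T / n_steps
    S, I = float(s), float(i)
    for _ in range(n_steps):
        a1 = max(beta * S * I, 0.0)                  # clipped contagion rate
        a2 = max(gamma * I, 0.0)                     # clipped removal rate
        dWs, dWi = rng.normal(0.0, np.sqrt(dt), 2)   # Wiener increments
        S += -a1 * dt - np.sqrt(a1) * dWs
        I += (a1 - a2) * dt + np.sqrt(a1) * dWs - np.sqrt(a2) * dWi
    return S, I
```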

2.1.2 Examples of Epidemic Models Based on the SIR Model

The SIS Model

The SIS model is a particular case of the SIR model which assumes that, once an infected individual recovers, she does not gain immunity and immediately returns to the susceptible class; see Figure 2.6.

Figure 2.6: Two compartments: S (susceptible) and I (infectious). [Diagram: $S \xrightarrow{\beta SI} I$ and $I \xrightarrow{\gamma I} S$.]

Since $S(t) + I(t) = N$, the two reaction channels can be expressed as follows: i) contagion, $R_1 = (\nu_1, a_1(i)) = (1, \beta (N-i)\,i)$, and ii) return to susceptibility, $R_2 = (\nu_2, a_2(i)) = (-1, \gamma i)$. According to this, the infinitesimal generator of the SIS model is given by
$$\mathcal{L}f(i) := \beta (N-i)\,i\,\big(f(i+1) - f(i)\big) + \gamma i\,\big(f(i-1) - f(i)\big).$$


Taking $f(i) = i$ and using the Dynkin formula, we obtain a differential equation for $E[I(t)]$:
$$\frac{dE[I(t)]}{dt} = (\beta N - \gamma)\,E[I(t)] - \beta\,E\big[I^2(t)\big], \qquad (2.5)$$
which depends on higher order moments of $I(t)$. At this point, we would like to remark how straightforwardly we obtained equation (2.5) with the SRN machinery developed in Chapter 1. For instance, in Chapter 3 of [10] there is a two-page derivation of (2.5) based on moment generating functions (MGFs).

The SEIR Model

While the SIR model is suitable for diseases in which individuals of the I-class are infected and infective at the same time, there are infectious diseases where a recently infected individual has an exposed period before he or she may develop symptoms and become infective. Figure 2.7 depicts the compartmental diagram for the SEIR model. The reaction channels of the SEIR model are: $R_1 = (\nu_1, a_1(s,e,i)) = ((-1,1,0)^T, \beta s i)$, $R_2 = (\nu_2, a_2(s,e,i)) = ((0,-1,1)^T, \kappa e)$ and $R_3 = (\nu_3, a_3(s,e,i)) = ((0,0,-1)^T, \gamma i)$.

Figure 2.7: SEIR model. [Diagram: $S \xrightarrow{\beta SI} E \xrightarrow{\kappa E} I \xrightarrow{\gamma I} R$.] An infected individual (E) may become infective (I). The exposed period is an exponential random variable with rate $\kappa$.

The SIR Model with Demography

For endemic diseases, the scale on which the epidemic develops should account for demographic effects. In this case, a birth and death process (see 1.2.3) affects the SIR epidemic process. If the population is in demographic equilibrium, the inflow of newborns, who we assume are susceptible (no vertical transmission of the disease), should match the outflow due to the deaths, which may occur in any compartment. Figure 2.8 depicts the compartment diagram for the SIR model with demography.

Figure 2.8: SIR model with demographic effects. [Diagram: births enter S at rate $\theta N$; the flows $S \xrightarrow{\beta SI} I \xrightarrow{\gamma I} R$ are as before; deaths leave the compartments at rates $\theta S$, $\theta I$ and $\theta R$.]

The set of reaction channels in this case is (in the 2-dimensional case in which we do not track the number of removed individuals): i) contagion, $R_1 = (\nu_1, a_1(s,i)) = ((-1,1)^T, \beta s i)$; ii) remotion, $R_2 = (\nu_2, a_2(s,i)) = ((0,-1)^T, \gamma i)$; iii) birth, $R_3 = (\nu_3, a_3(s,i)) = ((1,0)^T, \theta N)$; iv) death of a susceptible, $R_4 = (\nu_4, a_4(s,i)) = ((-1,0)^T, \theta s)$; and v) death of an infective, $R_5 = (\nu_5, a_5(s,i)) = ((0,-1)^T, \theta i)$. Figure 2.9 shows 100 SSA paths of the SIR model with demography along with its mean field and the empirical average of the SSA paths.
Figure 2.9: SIR with demography, $N = 100$: the reaction-rate ODEs vs. the mean of the corresponding SRN, for $(s_0,i_0)=(99,1)$ (left) and $(s_0,i_0)=(90,10)$ (right). Left: one initial infected produces a relatively large probability of a quick absorption by the disease-free states. Right: the empirical average differs from the mean field by not producing a spiral behavior, again due to the positive probability of the sudden extinction of the epidemic disease. Observe that the mean field seems to give a good approximation for the mean of the trajectories that escape from the disease-free states.

Remark 2.1.2 (Markovian SIR, SIS and SIR with demography). In Chapters 5 and 8 of the very pedagogical lecture notes by Andersson and Britton [2], a survey of results regarding the Markovian SIR, SIS and SIR with demography models is presented. There, functional laws of large numbers and central limit theorems are derived from the results of Kurtz [1]. It also contains results on epidemic models obtained by Djehiche, Nåsell, Ball and many others.

2.2 The Role of SRNs in Epidemic Models

In Articles I, II and III (references [11, 12, 13], respectively), presented in the second part of this PhD thesis, we developed fast hybrid algorithms for simulating paths of SRNs. We also developed Multilevel Monte Carlo methods for estimating expected values of observables of SRNs at some fixed time $T$; for example, the expected value and the variance of the number of infected individuals one month after the start of the epidemic process. With the aid of our numerical methods, it is possible to estimate quantities of interest arising from Markovian epidemic models by path simulation. Observe that our hybrid paths take values on the lattice $\mathbb{Z}^d_+$. This is a desirable characteristic not shared by the paths generated by Langevin SDEs. In Figure 2.10, we can see a single SIR mixed path generated by Algorithm 25. Observe that, when the hybrid process is visiting states close to the boundary, an exact method is preferred but, sufficiently far from the boundary, the method selects the Chernoff tau-leap for some reactions (generally both in this case), allowing us to take larger time-steps and, therefore, save computational work.
Figure 2.10: Left: mixed path for the SIR model starting at $(S_0, I_0, R_0) = (700, 1, 0)$. Right: detail of the path close to the disease-free boundary.

Regarding the inferential aspects of SRNs, in Articles IV and V (references [14, 15], respectively), presented in the second part of this PhD thesis, we address the problem of estimating the coefficients of a given SRN from discretely observed data. The traditional least squares approach (see Chapter 10 of [4]) can be viewed as an indirect inference method where goodness-of-fit techniques based on the Master Equation can be applied [14]. Now, we briefly summarize our results obtained in [15] for the classical SIR stochastic model given by (2.1). In an SIR problem, consider an initial state $X_0 = (S_0, I_0, R_0) = (300, 5, 0)$, $T = 10$, and synthetic data generated using the parameters $c_1 = 1.66$ and $c_2 = 0.44$, observed at uniform time intervals of size $\Delta t = 1/16$, without adding observation noise. The data trajectory is shown in the left panel of Figure 2.11.
Figure 2.11: Left: data trajectory (species counts of S and I versus time) for the SIR example. This is obtained by observing the values of an SSA path at regular time intervals of size $\Delta t = 1/16$. Right: FREM estimation (phase I and phase II) for the SIR model, showing the initial points of phases I and II and the final point of phase II.

Our FREM estimation gave us a cluster average of $\hat\theta = (1.86, 0.43)$. The FREM algorithm took $p^* = 3$ iterations to converge (the imposed minimum). Details can be found in Table 2.1 and in the right panel of Figure 2.11.

Remark 2.2.1. At this point, it is worth mentioning that the distance between the estimate $\hat\theta = (1.86, 0.43)$ and the values used for generating the synthetic data, $(1.66, 0.44)$, is meaningless. The important one is the distance between our FREM estimate, $\hat\theta$, and the true MLE estimate of $\theta$ (which we do not have).

i    $\theta^{(0)}_{I,i}$     $\theta^{(0)}_{II,i}$    $\hat\theta_i = \theta^{(p^*)}_{II,i}$, $p^* = 3$
1    (0.40, 0.05)    (2.96, 0.66)    (1.86, 0.43)
2    (0.40, 1.00)    (2.96, 0.66)    (1.86, 0.43)
3    (3.00, 0.05)    (2.96, 0.66)    (1.86, 0.43)
4    (3.00, 1.00)    (2.96, 0.66)    (1.86, 0.43)

Table 2.1: Values generated by the FREM Algorithm for the SIR model.

Summarizing, using the numerical stochastic methods developed in this PhD thesis, we can perform simulation studies of complex epidemic models described by SRNs, where the scarcity of individuals in one or more compartments makes the reaction-rate ODEs an unattractive option. Regarding the statistical inference methodology for SRNs, it offers a powerful novel approach that should be explored more deeply in the epidemics context.

2.3 Challenges and Opportunities in Stochastic Epidemic Models

There are many di↵erent generalizations of the classical SIR model: we have models
for specific diseases, for interaction between populations, for multiple concurrent epi-
demics, for age-dependent contagion-rates, for vaccination and quarantine strategies,
just to mention a few.
The scientific production of Carlos Castillo-Chavez and his collaborators [4, 16,
10, 3, 17, 18, 19, 20, 21, 22] constitutes a major reference for all classes of epidemic
models where compartmental models described by systems of ODEs have a privileged
position. In [16], models for Influenza, HIV, Tuberculosis and Sexually Transmitted
Diseases (STDs) are found. Complex diagrams and their associated reaction-rate
ODEs are presented and analyzed using tools of the theory of dynamical systems [23].
Many of these models have an immediate translation into SRNs where simulation studies can be performed. Fast and efficient simulation methods are required to deal with complex epidemic models. On our immediate horizon of future work, we would like to mention: implicit hybrid tau-leap schemes, the incorporation of spatial inhomogeneity, sensitivity analysis of SRNs by dual-based methods and control, as well as continuing to develop statistical inference techniques.

Stochastic Backward Bifurcation

An interesting phenomenon related to the basic reproduction number, $R_0$, is backward bifurcation, discovered by Castillo-Chavez and Hadeler [24] and clearly exposed in Section 2.5 of [16]. Many epidemic models described by reaction-rate ODEs exhibit a bifurcation of the equilibrium number of infected individuals with respect to the parameter $R_0$ at the value 1, called forward bifurcation: if $R_0 < 1$, the infection dies out but, if $R_0 > 1$, there is a positive endemic number of infectives. A backward bifurcation phenomenon has been observed in some vaccination models; that is, there exists a positive critical value $R_c < 1$ such that, when $R_0 \in (R_c, 1)$, there are two positive endemic equilibrium points, one unstable and the other asymptotically stable, in such a way that, if the initial number of infectives is greater than a certain value, the system moves to the stable endemic equilibrium state. A related ongoing research effort, in collaboration with Dr. Fabio Sánchez (University of Costa Rica), is focused on whether backward bifurcation occurs in the stochastic versions of those vaccination models, as well as on its related probability distributions. A deterministic backward bifurcation study for a simple SIS model with demography and treatment can be found in Song et al. [25].

Partially Observed Data

When collecting epidemic data, we rarely can observe the number of individuals in all compartments at the same time, especially in complex models. Typically, we can only count the symptomatic individuals reported by the health authorities. Dr. Anuj Mubayi (ASU) suggested developing fast statistical methods for partially observed data in Leishmaniasis models [26].

Estimation of R0

The estimation of the basic reproduction number, $R_0$, is mainly based on the observation of the first stage of the epidemic process [27], where it behaves like a branching process (see A.5). Prof. Carlos Castillo-Chavez (ASU) suggested estimating $R_0$ in stochastic models from the final size relation (see Chapter 9 of [4])
$$\log\!\left(\frac{S_0}{S(+\infty)}\right) = R_0\left(1 - \frac{S(+\infty)}{S_0}\right).$$
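Note that the relation yields $R_0$ in closed form, $R_0 = \log\big(S_0/S(+\infty)\big)\big/\big(1 - S(+\infty)/S_0\big)$, as in this small sketch with illustrative values:

```python
import numpy as np

def r0_from_final_size(S0, S_inf):
    """Solve the final size relation for R0."""
    return np.log(S0 / S_inf) / (1.0 - S_inf / S0)

print(r0_from_final_size(S0=99.0, S_inf=20.0))  # illustrative final size
```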

Remark 2.3.1 (MTBI). The Mathematical and Theoretical Biology Institute (MTBI) is a research program created and organized by Prof. Carlos Castillo-Chavez that yearly encourages a diverse and well-motivated group of young undergraduate and graduate students coming from all around the world to formulate their own research questions while acquiring an impressive number of skills to answer them with the help of mentors and outstanding, experienced researchers. The results of the research projects can be found at http://mtbi.asu.edu/research/archive. This is not only a valuable source of relevant epidemic models, but it also represents a tremendous source of research opportunities. For a thorough description of MTBI, see [28].
REFERENCES
[1] S. Ethier and T. Kurtz, Markov Processes: Characterization and Convergence
(Wiley Series in Probability and Statistics), 2nd ed. Wiley-Interscience, 9 2005.

[2] H. Andersson and T. Britton, Stochastic Epidemic Models and Their Statistical
Analysis (Lecture Notes in Statistics), 2000th ed. Springer, 7 2000.

[3] G. Chowell, J. M. Hyman, L. M. A. Bettencourt, and C. Castillo-Chavez, Eds., Mathematical and Statistical Estimation Approaches in Epidemiology, 2009th ed. Springer, 6 2009.

[4] F. Brauer and C. Castillo-Chavez, Mathematical Models for Communicable Dis-


eases (CBMS-NSF Regional Conference Series in Applied Mathematics). Society
for Industrial and Applied Mathematics, 12 2012.

[5] W. Kermack and A. McKendrick, "Contributions to the mathematical theory of epidemics—I," Bulletin of Mathematical Biology, vol. 53, no. 1, pp. 33–55, 1991.

[6] V. Kazeev, M. Khammash, M. Nip, and C. Schwab, “Direct solution of the


chemical master equation using quantized tensor trains,” PLoS computational
biology, vol. 10, no. 3, p. e1003359, 2014.

[7] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus (Springer Series
in Computational Mathematics, Vol. 42), 1st ed. Springer, 2 2012.

[8] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications (Universitext), 6th ed. Springer, 2 2014.

[9] R. Bani, R. Hameed, S. Szymanowski, P. Greenwood, C. M. Kribs-Zaleta, and


A. Mubayi, “Influence of environmental factors on college alcohol drinking pat-
terns,” Mathematical biosciences and engineering, vol. 10, no. 5-6, pp. 1281–1300,
2013.

[10] F. Brauer, P. van den Driessche, J. Wu, L. Allen, C. Bauch, C. Castillo-Chavez, D. Earn, Z. Feng, M. Lewis, J. Li, M. Martcheva, M. Nuno, J. Watmough, M. Wonham, and P. Yan, Eds., Mathematical Epidemiology (Lecture Notes in Mathematics / Mathematical Biosciences Subseries), 2008th ed. Springer, 4 2008.

[11] A. Moraes, R. Tempone, and P. Vilanova, "Hybrid Chernoff tau-leap," Multiscale Modeling and Simulation, vol. 12, no. 2, pp. 581–615, 2014.

[12] ——, "Multilevel hybrid Chernoff tau-leap," accepted for publication in BIT Numerical Mathematics, 2015.

[13] ——, “A multilevel adaptive reaction-splitting simulation method for stochastic


reaction networks,” preprint arXiv:1406.1989, 2014.

[14] A. Moraes, F. Ruggeri, R. Tempone, and P. Vilanova, “Multiscale modeling of


wear degradation in cylinder liners,” Multiscale Modeling and Simulation, 2014.

[15] C. Bayer, A. Moraes, R. Tempone, and P. Vilanova, “The forward-reverse algo-


rithm for stochastic reaction networks with applications to statistical inference,”
preprint, 2015.

[16] F. Brauer and C. Castillo-Chavez, Mathematical Models in Population Biology


and Epidemiology (Texts in Applied Mathematics), 2nd ed. Springer, 9 2011.

[17] A. Gumel, C. Castillo-Chavez, R. E. Mickens, and D. P. Clemence, Mathemati-


cal Studies on Human Disease Dynamics: Emerging Paradigms and Challenges
(Contemporary Mathematics). American Mathematical Society, 11 2006.

[18] C. Castillo-Chavez, S. Blower, P. van den Driessche, D. Kirschner, and A.-A.


Yakubu, Eds., Mathematical Approaches for Emerging and Reemerging Infec-
tious Diseases: An Introduction (The IMA Volumes in Mathematics and its Ap-
plications), 2002nd ed. Springer, 5 2002.

[19] ——, Mathematical Approaches for Emerging and Reemerging Infectious Dis-
eases: Models, Methods, and Theory (The IMA Volumes in Mathematics and its
Applications), 2002nd ed. Springer, 5 2002.
[20] D. Zeng, H. Chen, C. Castillo-Chavez, W. B. Lober, and M. Thurmond, Eds.,
Infectious Disease Informatics and Biosurveillance (Integrated Series in Infor-
mation Systems), 2011th ed. Springer, 11 2010.

[21] H. T. Banks and C. Castillo-Chavez, Eds., Bioterrorism: Mathematical Modeling


Applications in Homeland Security (Frontiers in Applied Mathematics), 1st ed.
Society for Industrial and Applied Mathematics, 1 1987.

[22] C. Castillo-Chavez, Ed., Mathematical and Statistical Approaches to AIDS Epi-


demiology (Lecture Notes in Biomathematics). Springer, 1 1990.

[23] S. H. Strogatz, Nonlinear Dynamics And Chaos: With Applications To Physics,


Biology, Chemistry, And Engineering (Studies in Nonlinearity), 1st ed. West-
view Press, 1 2001.

[24] K. P. Hadeler and C. Castillo-Chávez, “A core group model for disease trans-
mission,” Mathematical Biosciences, vol. 128, no. 1, pp. 41–55, 1995.

[25] B. Song, W. Du, and J. Lou, "Different types of backward bifurcations due to density-dependent treatments," Mathematical Biosciences and Engineering: MBE, vol. 10, no. 5-6, p. 1651, 2013.

[26] A. Mubayi, C. Castillo-Chavez, G. Chowell, C. Kribs-Zaleta, N. Ali Siddiqui, N. Kumar, and P. Das, "Transmission dynamics and underreporting of kala-azar in the Indian state of Bihar," Journal of Theoretical Biology, vol. 262, no. 1, pp. 177–185, 2010.

[27] G. Chowell, C. Ammon, N. Hengartner, and J. Hyman, "Estimation of the reproductive number of the Spanish flu epidemic in Geneva, Switzerland," Vaccine, vol. 24, no. 44, pp. 6747–6750, 2006.

[28] E. T. Camacho, C. Kribs-Zaleta, and S. Wirkus, “The mathematical and theo-


retical biology institute-a model of mentorship through research.” Mathematical
biosciences and engineering: MBE, vol. 10, no. 5-6, p. 1351, 2013.

Chapter 3

Overview of Articles

The central subject of this PhD thesis is known under different names; among the most common ones we have: stochastic reaction networks (SRNs), chemical reaction kinetics and continuous-time Markovian pure jump processes. For the reader unfamiliar with this topic, a quick review of SRNs is presented in Chapter 1.

In this work, we have focused on two different problems related to SRNs: i) fast path-simulation and global error control, and ii) statistical inference for the set of reaction coefficients. Problem i) is treated in the first three chapters of the second part of this thesis (Articles I, II and III), while problem ii) is addressed in the last two (Articles IV and V).

The main objective of problem i) is the following: given an SRN, $X$, defined through its set of reaction channels, and its deterministic initial state, estimate $E[g(X(T))]$, that is, the expected value of a scalar observable, $g$, of the process, $X$, at a fixed time, $T$. This problem led us to define a series of Monte Carlo estimators, $\mathcal{M}$, that with high probability can produce values close to the quantity of interest, $E[g(X(T))]$. More specifically, given a user-selected tolerance, $TOL$, and a small confidence level, $\eta$, find an estimator, $\mathcal{M}$, based on sampled paths of $X$, such that
$$P\big(|E[g(X(T))] - \mathcal{M}| > TOL\big) < \eta;$$
furthermore, we want to achieve this objective with near optimal computational work.

Regarding problem ii), we want to infer, from observed data, the unknown reaction rates of our SRN, that is, the unknown coefficients of our propensity functions.

3.1 Overview of Article I

A. Moraes, R. Tempone and P. Vilanova, "Hybrid Chernoff Tau-Leap", SIAM Multiscale Modeling and Simulation, Vol. 12, Issue 2, (2014).

The author contributed to the theoretical sections of the paper and especially to the formulation of the Chernoff bound. This work was presented by the author at the ECCOMAS conference, Sept 2012, Vienna, Austria.

In this article, we present a novel, adaptive, hybrid algorithm for simulating paths of SRNs. It is hybrid because, at each step, our algorithm decides between the SSA (see Section 1.8.1) and the Chernoff tau-leap method.

3.1.1 The Chernoff Tau-Leap Method

In this article, we develop a pre-leap method (see Section 1.8.2) for controlling, but not avoiding, the one-step exit probability that is a consequence of the tau-leap method. That is, let $x = \bar X(t)$ be the tau-leap approximation of $X(t)$; then, the value of $\bar X$ at the next leap of size $\tau$ is given by $\bar X(t+\tau) = x + \sum_{j=1}^{J} P_j(a_j(x)\,\tau)\,\nu_j$, where $P_j(\lambda_j)$, $j = 1, 2, \ldots, J$, are independent Poisson random variables with rates $\lambda_j$, respectively. Note that $\sum_{j=1}^{J} P_j(a_j(x)\,\tau)\,\nu_j$ is a linear combination of the stoichiometric vectors, $\nu_j$, with unbounded coefficients. If any $\nu_j$ has at least one negative component, then there is a positive probability that the state $\bar X(t+\tau)$ has negative components too. This probability is clearly a function of $x$ and $\tau$. In this article, we address the problem of, given $\delta > 0$, finding the largest $\tau \equiv \tau(x,\delta)$ such that $P\big(\bar X(t+\tau) \notin \mathbb{Z}^d_+ \mid \bar X(t) = x\big) < \delta$. We develop a Chernoff-type bound (see Section A), $\mathrm{ChBnd}(x,\tau)$, for a linear combination of independent Poisson random variables. The function $\mathrm{ChBnd}(x,\tau)$ satisfies:
$$P\big(\bar X(t+\tau) \notin \mathbb{Z}^d_+ \mid \bar X(t) = x\big) \le \mathrm{ChBnd}(x,\tau) \le \delta. \qquad (3.1)$$
Let $\tau_{Ch}$ be the largest value of $\tau$ satisfying (3.1). Figure 3.1 depicts the ChBnd in a simple decay example. The Chernoff bound has a closed analytic expression only in cases where there is a single reaction channel. In this work, we introduce a fast numerical algorithm for approximating the value $\tau_{Ch}$ (since its exact value involves solving a transcendental equation).

Figure 3.1: Let $n = 10$ and $\lambda \in (2, 10)$. Semi-logarithmic plot of $P(Q(\lambda)\ge n) \le \mathrm{ChBnd}(n,\lambda) = \exp\big(n(1-\log(n/\lambda)) - \lambda\big)$, compared with Klar's bound, the exact Poisson tail and the Gaussian approximation. See Klar's bound in [1] and the Gaussian approximation in [2].
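In the single-channel decay case of Figure 3.1 ($\nu = -1$, so one leap exits $\mathbb{Z}_+$ exactly when the Poisson variate reaches $n = x+1$), $\tau_{Ch}$ can be computed by solving $\mathrm{ChBnd}(n,\lambda) = \delta$ for $\lambda = a(x)\,\tau$, since the bound is increasing in $\lambda$ on $(0,n)$. A minimal sketch, assuming a generic root finder:

```python
import numpy as np
from scipy.optimize import brentq

def chernoff_tau(x, a, delta):
    """Largest tau with ChBnd(x+1, a*tau) <= delta, single decay channel."""
    n = x + 1
    # log ChBnd(n, lam) - log(delta); increasing in lam on (0, n).
    f = lambda lam: n * (1.0 - np.log(n / lam)) - lam - np.log(delta)
    lam_star = brentq(f, 1e-12, n)   # root of ChBnd(n, lam) = delta
    return lam_star / a

print(chernoff_tau(x=10, a=10.0, delta=1e-6))  # illustrative numbers
```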

3.1.2 Our Hybrid Switching Rule

The way in which both methods are blended in one path depends on a cost-based switching rule. This rule takes into account the time-mesh and the one-step exit probabilities associated with the tau-leap method.

The decision rule is as follows: given the current time, $t$, the current state, $x$, the next mesh point, $T_k$, and a given one-step exit probability bound, $\delta$, our hybrid algorithm makes its choice by comparing the expected computational cost of reaching $T_k$ from $t$ between the SSA and Chernoff tau-leap methods. In this way, our hybrid method induces a natural partition of the state space of the process $X$ into two regions, one for the SSA and the other for the tau-leap. It turns out that the SSA region is close to the boundaries, where the probability of the tau-leap step should be very small to control the one-step exit probability. Figure 3.2 depicts this fact for the Gene Transcription and Translation (GTT) example, described in 5.5.2. Observe that in the left panel of Figure 3.2 the SSA region has few points of the form $(0,y)$ with small $y$. It means that there is at least one reaction channel pushing the process $X$ in the direction of the vector $(0,-1)$ such that, for the states $(0,y)$ with small $y$, and for the given time-mesh, the tau-leap method has a one-step exit probability greater than $\delta = 10^{-2}$. In the central panel of the same figure, the SSA region incorporates many states of the form $(1,y)$ and $(x,1)$, but not $(0,y)$ or $(x,0)$! In the states $(1,y)$, there is a reaction pushing out of the lattice in the direction $(-1,0)$, which is not active in $(0,y)$, because when the process is at the boundary, the reactions pushing out of this boundary are inactive; the $(x,1)$ case is analogous.

Figure 3.2: Regions of the one-step switching rule in the Gene Transcription and Translation model (see Section 5.5.2), in the (mRNA, Proteins) plane. The blue and red dots show the Chernoff tau-leap and the SSA regions, respectively. From left to right, $\delta = 10^{-2}, 10^{-4}, 10^{-6}$, respectively.
We observe in 5.A that, when the size of the time-step or the parameter $\delta$ goes to zero, the hybrid method decides for the SSA. This implies that the expected work of a hybrid path remains bounded by the expected computational work of one SSA path.

Let us describe the hybrid algorithm in more detail. When $\tau_{Ch}$ is of the same order as the expected inter-arrival time of the SSA, $\tau_{SSA} = (a_0(x))^{-1}$, it is convenient to take an exact step instead of a tau-leap step. In this way, we arrive at a hybrid (exact-approximate) algorithm (Algorithm 1) that adaptively switches between the SSA and the Chernoff tau-leap method by choosing the method that moves forward faster per unit cost.

Algorithm 1 Let $x$ be the state of our hybrid path at time $t$. Let $T_k$ be the next grid point. $K_1$ is the cost of computing $\tau_{Ch}(x)$ divided by the cost of taking an SSA step. $K_2 = K_2(x,\delta)$ is the cost of taking a Chernoff tau-leap step divided by the cost of taking an SSA step plus the cost of computing $\tau_{Ch}(x,\delta)$. This cost analysis is due to the fact that the computational cost of generating Poisson random variables depends on their rate $\lambda$; see Figure 3.3.
1: Compute $\tau_{SSA}$. (A low cost calculation.)
2: if $K_1 \tau_{SSA} > T_k - t$ then
3:   Use SSA.
4: else
5:   Compute $\tau_{Ch}$. (A more expensive calculation.)
6:   if $\tau_{Ch} \ge K_2\,\tau_{SSA}$ then
7:     Use Chernoff tau-leap.
8:   else
9:     Use SSA.
10:  end if
11: end if
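In code, the decision logic of Algorithm 1 reads roughly as follows; this is an illustrative paraphrase, with compute_tau_ch standing for the numerical approximation of $\tau_{Ch}$ mentioned in Section 3.1.1.

```python
def choose_method(t, x, T_k, tau_ssa, compute_tau_ch, K1, K2):
    """Cost-based switching rule of Algorithm 1 (illustrative paraphrase)."""
    if K1 * tau_ssa > T_k - t:      # computing tau_Ch would not pay off
        return "SSA", tau_ssa
    tau_ch = compute_tau_ch(x)      # the more expensive calculation
    if tau_ch >= K2 * tau_ssa:      # tau-leap advances enough per unit cost
        return "TL", tau_ch
    return "SSA", tau_ssa
```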
Figure 3.3: Left: the computational work (runtime) model, $C_P(\lambda)$, for generating a Poisson random variate, using the Gamma method by Ahrens and Dieter [3]; actual simulation runtimes and least squares fit. Right: linear growth detail, for $\lambda \in [0, 15]$.

3.1.3 Global Error Control

The global error is defined as the difference $\mathcal{E} := E[g(X(T))] - \mathcal{M}$, where the Monte Carlo estimator, $\mathcal{M}$, is
$$\mathcal{M} := \frac{1}{M}\sum_{m=1}^{M} g(\bar X(T))\,\mathbf{1}_A(\omega_m).$$
Here $A$ is the event in which the hybrid path, $\bar X$, arrives at the final time $T$ without exiting the state space of $X$, and $A^c$ is its complement. Notice that $\mathcal{M}$ is an unbiased estimator of $E[g(\bar X(T))\mathbf{1}_A]$, but a biased estimator of $E[g(X(T))]$.

The global error can be decomposed as follows:
$$E[g(X(T))] - \mathcal{M} = E[g(X(T))(\mathbf{1}_A + \mathbf{1}_{A^c})] - E[g(\bar X(T))\mathbf{1}_A] + E[g(\bar X(T))\mathbf{1}_A] - \mathcal{M}$$
$$= E\big[(g(X(T)) - g(\bar X(T)))\mathbf{1}_A\big] + E[g(X(T))\mathbf{1}_{A^c}] + \frac{1}{M}\sum_{m=1}^{M}\Big(E[g(\bar X(T))\mathbf{1}_A] - g(\bar X(T))\mathbf{1}_A\Big)(\omega_m).$$
The first component, $E[(g(X(T)) - g(\bar X(T)))\mathbf{1}_A]$, is the discretization error, $\mathcal{E}_I$. It depends mostly on the size of the time-step, $\Delta t$. In this article, we introduce a dual-weighted method for fast estimation of $\mathcal{E}_I$. The second component, $E[g(X(T))\mathbf{1}_{A^c}]$, is named the global exit error, $\mathcal{E}_E$. It is controlled by the one-step exit probability bound, $\delta$, but it also depends on the expected number of tau-leap steps in a hybrid path, which in turn depends on $\Delta t$. The third term of the global error decomposition, $M^{-1}\sum_{m=1}^{M}\big(E[g(\bar X(T))\mathbf{1}_A] - g(\bar X(T))\mathbf{1}_A\big)(\omega_m)$, is the statistical error, $\mathcal{E}_S$. It can be controlled by the number of generated hybrid paths, $M$.
To provide the simulation setting, i.e., the one-step exit probability bound, $\delta$, the time-step, $\Delta t$, and the number of hybrid paths, $M$, needed for estimating $E[g(X(T))]$ with near optimal computational work, we show in this article a calibration algorithm designed to approximately solve:
$$\begin{cases} \min_{M,\,\Delta t,\,\delta} \; M\,\psi(\Delta t, \delta)\\ \text{s.t. } \mathcal{E}_I + \mathcal{E}_E + \mathcal{E}_S \le TOL. \end{cases} \qquad (3.2)$$
Here, $\psi(\Delta t, \delta)$ is the expected cost of a hybrid path generated with parameters $\Delta t$ and $\delta$.

Remark 3.1.1 (On the optimization problem). In fact, the constraint we use in (3.2) is the sum of three terms: i) $TOL^2$ as an upper bound for $|\mathcal{E}_E|$ (this is achieved by controlling $\delta$), ii) a dual-weighted estimate of the magnitude of $\mathcal{E}_I$, and iii) a term proportional to an estimate of $\sqrt{\mathrm{Var}[\mathcal{E}_S]}$, where the constant of proportionality is chosen according to the confidence level we want to achieve.

3.1.4 Results

In Figure 3.4, we show an ensemble of five independent realizations of the calibration algorithm and the comparison of its corresponding predicted and actual work. We can appreciate the robustness of the calibration procedure. We can also observe that the hybrid method converges to the SSA as the tolerance goes to zero.
Figure 3.4: Left: predicted work (runtime) versus the estimated error bound for the gene transcription and translation model. The hybrid method is preferred over the SSA for the first two (larger) tolerances. For the last four tolerances, the SSA is preferred; therefore, in the latter case, the total predicted runtime is the same for the hybrid and SSA methods. Right: predicted and actual work (runtime) versus the estimated error bound.

3.1.5 Summary

Our hybrid method allows us (i) to control the global exit error caused by the tau-leap steps and (ii) to obtain accurate and computable estimates of the expected value of observables of SRNs with near optimal computational work. Another advantage derived from the use of a hard bound for the one-step exit probabilities is that we do not need to make any distributional approximation for the tau-leap increments (e.g., exchanging Poisson random variables for binomial ones) and, therefore, we are not introducing additional modeling error. It is worth mentioning that, by simulating hybrid paths, we obtained accurate estimates of the average number of steps required by the SSA method to reach the final time. This is especially relevant in problems where the process visits regions of the state space where the total propensity is very high.

3.2 Overview of Article II

A. Moraes, R. Tempone and P. Vilanova, "Multilevel Hybrid Chernoff Tau-Leap", accepted for publication in BIT Numerical Mathematics, (2015).

The author contributed especially to the development of the hybrid coupled-paths technique and the strong error formula. This work was presented by the author at XI MCQMC, April 2014, Leuven, Belgium.

This article extends the hybrid Chernoff tau-leap method presented in [4] to the multilevel Monte Carlo (MLMC) setting. Inspired by the work of Anderson and Higham on the tau-leap MLMC method with uniform time-meshes, we develop a novel algorithm that is able to couple two hybrid Chernoff tau-leap paths at different levels. But, unlike the multilevel algorithms proposed by Anderson and Higham, we do not need to distinguish between biased and unbiased discretizations. When our hybrid algorithm chooses exact paths at the bottom level, we automatically have an unbiased algorithm.

3.2.1 Coupling Two Hybrid Paths

The levels are given by a hierarchy of $L+1$ nested time-meshes of the interval $[0,T]$,
indexed by $\ell = 0, 1, \ldots, L$, such that $\Delta t_0$ is the size of the coarsest time-mesh and
$\Delta t_\ell = 2^{-\ell}\Delta t_0$, $\ell = 1, \ldots, L$. Let $\bar X_\ell(\cdot) := \bar X(\cdot\,; \Delta t_\ell, \delta)$ be a hybrid Chernoff tau-leap
path generated using a time-mesh of size $\Delta t_\ell$ and one-step exit probability bound
$\delta$. Define $A_\ell := \{\bar\omega \in \Omega : \bar X_\ell(t) \in \mathbb{Z}_+^d,\ \forall t \in [0,T]\}$, and $g_\ell := g(\bar X_\ell(T))$. The MLMC
estimator proposed in this article, $\mathcal{M}_L$, requires sampling from the random variables
$[g_\ell - g_{\ell-1}](\omega)$, that is, the difference between the observable $g$, computed at the end
of two coupled hybrid paths generated by two consecutive time-meshes of sizes $\Delta t_\ell$
and $\Delta t_{\ell-1}$, respectively.
To couple two hybrid paths, we use four algorithms as building blocks at each time
step:

Block   level $\ell-1$   level $\ell$   description
B1      TL               TL             couple two tau-leap paths in $[t, H]$
B2      TL               MNRM           couple one tau-leap path with one exact path in $[t, H]$
B3      MNRM             TL             couple one exact path with one tau-leap path in $[t, H]$
B4      MNRM             MNRM           couple two exact paths in $[t, H]$

These algorithms are detailed in [5].


In Algorithm 2, we couple $\bar X$ and $\bar{\bar X}$, which are the processes associated with
the coarse and fine time-meshes, $\Delta t_{\ell-1}$ and $\Delta t_\ell$, respectively. The coupling is based
on sequences of horizons, $\bar H$ and $\bar{\bar H}$, for the processes $\bar X$ and $\bar{\bar X}$, respectively. The
processes $\bar X$ and $\bar{\bar X}$ choose independently which method to use for moving forward
from the present time to their respective time-horizons. Once we know the methods and
the horizons, we move both processes forward from the current time, $t$, to the next
synchronization point, $H := \min\{\bar H, \bar{\bar H}\}$, using one of the building blocks B1, B2, B3
and B4, as shown in Algorithm 2.

To ensure the telescoping sum property of the MLMC method, the decision about
the method chosen for the next step must be made completely disregarding the
decision made by the other process. For that reason, each process has its own next
horizon, $H$, as decision point. See Figure 3.5, which shows the time-horizon scheme.


Figure 3.5: This figure depicts a particular instance of the Chernoff hybrid coupling
algorithm (Algorithm 2), where $\bar\tau < \bar{\bar\tau}$. The synchronization horizon $H$, defined as
$H := \min\{\bar H, \bar{\bar H}\}$, is equal to $\bar H$ in this case. Notice that $\bar H := \min\{t+\bar\tau, \bar t, T\}$ and
$\bar{\bar H} := \min\{t+\bar{\bar\tau}, \bar{\bar t}, T\}$. Algorithm 2 computes $\bar X$ and $\bar{\bar X}$ from $t$ to $H$.
Algorithm 2 Inputs: initial point $x_0$, coarse and fine meshes, final time $T$. Outputs:
two coupled hybrid paths, $\bar X$ and $\bar{\bar X}$, in the interval $[0, T]$.
1: Set $\bar X \leftarrow x_0$, $\bar{\bar X} \leftarrow x_0$
2: Set $t \leftarrow 0$
3: Set $\bar t$ as the smallest coarse mesh point greater than $t$
4: Set $\bar{\bar t}$ as the smallest fine mesh point greater than $t$
5: Compute $\bar H \leftarrow H(t, \bar X, \bar t, T)$ using Algorithm 3
6: Compute $\bar{\bar H} \leftarrow H(t, \bar{\bar X}, \bar{\bar t}, T)$ using Algorithm 3
7: while $t < T$ do
8:   $H \leftarrow \min\{\bar H, \bar{\bar H}\}$
9:   Select a building block and move $\bar X$ and $\bar{\bar X}$ forward from $t$ to $H$
10:  Set $t \leftarrow H$
11:  Set $\bar t$ as the smallest coarse mesh point greater than $t$
12:  Set $\bar{\bar t}$ as the smallest fine mesh point greater than $t$
13:  if $H = \bar H$ then
14:    Compute $\bar H \leftarrow H(t, \bar X, \bar t, T)$ using Algorithm 3
15:  end if
16:  if $H = \bar{\bar H}$ then
17:    Compute $\bar{\bar H} \leftarrow H(t, \bar{\bar X}, \bar{\bar t}, T)$ using Algorithm 3
18:  end if
19: end while

Algorithm 3 Inputs: current time $t$, current state $x$, smallest mesh point $s$ greater
than $t$, and final time $T$. Outputs: $H$.
1: Given $x$, $s$, $t$ and $T$, get the method $m$ and the step size $\tau$
2: if $m$ is TL then
3:   $H \leftarrow \min\{t+\tau, s, T\}$
4: else
5:   $H \leftarrow \min\{t+\tau, T\}$
6: end if
7: return $H$
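A minimal Python sketch of the synchronization loop of Algorithm 2 is given below; next_horizon and select_block_and_advance are hypothetical placeholders for Algorithm 3 and for the building blocks B1-B4, which in turn rely on the Chernoff step-size selection of [4].

import numpy as np

def couple_hybrid_paths(x0, coarse_mesh, fine_mesh, T,
                        next_horizon, select_block_and_advance):
    """Sketch of Algorithm 2: evolve two coupled hybrid paths up to T.

    next_horizon(t, x, mesh, T) plays the role of Algorithm 3 and
    select_block_and_advance(...) stands for the building blocks B1-B4;
    both are assumed callables, not part of this sketch.
    """
    X_c, X_f = np.array(x0), np.array(x0)  # coarse and fine states
    t = 0.0
    H_c = next_horizon(t, X_c, coarse_mesh, T)
    H_f = next_horizon(t, X_f, fine_mesh, T)
    while t < T:
        H = min(H_c, H_f)  # next synchronization point
        # Move both processes from t to H with the appropriate block.
        X_c, X_f = select_block_and_advance(t, H, X_c, X_f)
        t = H
        # Only the process whose horizon was reached draws a new one, so
        # each level decides its method independently (telescoping property).
        if H == H_c:
            H_c = next_horizon(t, X_c, coarse_mesh, T)
        if H == H_f:
            H_f = next_horizon(t, X_f, fine_mesh, T)
    return X_c, X_f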

3.2.2 Global Error Control

Here, the global error is $\mathcal{E}_L := \mathrm{E}[g(X(T))] - \mathcal{M}_L$, where

$$\mathcal{M}_L := \frac{1}{M_0}\sum_{m_0=1}^{M_0} g_0 \mathbf{1}_{A_0}(\omega_{m_0}) + \sum_{\ell=1}^{L} \frac{1}{M_\ell} \sum_{m_\ell=1}^{M_\ell} \left[ g_\ell \mathbf{1}_{A_\ell} - g_{\ell-1} \mathbf{1}_{A_{\ell-1}} \right](\omega_{m_\ell}).$$

Using dual-weighted residual expansion techniques, we also develop a new way
to estimate the variance of the difference of two consecutive levels, $g_\ell - g_{\ell-1}$. This
is crucial because the computational work required to stabilize the sample variance
estimator of the difference between two consecutive levels is often unaffordable for
the deepest levels of the MLMC hierarchy. Our algorithm enforces the total error to
be below a prescribed tolerance, $TOL$, with high probability. This is achieved with
nearly optimal computational work. More specifically, we solve:
$$\min_{\{L,\,(M_\ell,\,\delta_\ell)_{\ell=0}^{L}\}} \;\sum_{\ell=0}^{L} \psi_\ell\, M_\ell \qquad \text{s.t.} \quad \mathcal{E}_{I,L} + \mathcal{E}_{E,L} + \mathcal{E}_{S,L} \le TOL,$$

where $\psi_\ell$ denotes the expected computational work per sample at level $\ell$.

The meaning of these expressions is analogous to those described in Article I
(see Remark 3.1.1). To reach this optimality, we derived novel formulas based
on dual-weighted residual estimations for computing the variance of the difference
of the observables between two consecutive levels in coupled hybrid paths, and also
the bias of the deepest level. These formulas are particularly relevant for stochastic
reaction networks, since alternative standard sample estimators become too costly
at deep levels because of the presence of large kurtosis.
Of paramount importance is that the computational complexity of our hybrid
MLMC method is of order $\mathcal{O}(TOL^{-2})$, that is, the same computational complexity
as an exact method, but with a smaller constant. To put this into perspective, our
algorithm acts as if we were generating exact paths and then using the standard Monte
Carlo method.
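For concreteness, the following Python sketch shows how the estimator $\mathcal{M}_L$ is assembled from level-0 samples and coupled-difference samples; the two sampling routines are hypothetical stand-ins for the hybrid path simulators described above.

def mlmc_estimate(sample_level0, sample_coupled_diff, M):
    """Sketch of the MLMC estimator M_L.

    sample_level0() returns one sample of g_0 * 1_{A_0};
    sample_coupled_diff(l) returns one sample of
    g_l * 1_{A_l} - g_{l-1} * 1_{A_{l-1}} from one coupled pair of paths;
    M = [M_0, ..., M_L] are the numbers of samples per level.
    """
    est = sum(sample_level0() for _ in range(M[0])) / M[0]
    for l in range(1, len(M)):
        # Telescoping correction terms, one per level.
        est += sum(sample_coupled_diff(l) for _ in range(M[l])) / M[l]
    return est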
Our numerical examples show substantial gains compared to the previous single-
level approach and the SSA.

3.2.3 Results

We now analyze an ensemble of five independent runs of the calibration algorithm,
using different relative tolerances. In Figure 3.6, we show, in the left panel, the total
predicted work (runtime) for the single-level hybrid method, for the multilevel hybrid
method and for the SSA method, versus the estimated error bound. We also show the
estimated asymptotic work of the multilevel method. Again, the multilevel hybrid
method outperforms the others, and we remark that the observed computational work
of the multilevel method is of order $\mathcal{O}(TOL^{-2})$.


Figure 3.6: Left: Predicted work (runtime) versus the estimated error bound for the
gene transcription and translation model (Section 5.5.2). The hybrid method is preferred
over the SSA for the first three tolerances only. The multilevel hybrid method is preferred
over the SSA and the single-level method for all tolerances. Right: Actual work
(runtime) versus the estimated error bound.

3.2.4 Summary

In this article, we developed a multilevel Monte Carlo version of the single-level
hybrid Chernoff tau-leap algorithm presented in [4]. We showed that the computational
complexity of this method is of order $\mathcal{O}(TOL^{-2})$ and, therefore, that it can be seen
as a variance reduction of the SSA method, which has the same complexity. This
represents an important advantage of the hybrid tau-leap compared to the pure tau-leap
in the multilevel context. In our numerical examples, we obtained substantial
gains with respect to both the SSA and the single-level hybrid Chernoff tau-leap. The
present approach, like the one in [4], also provides an approximation of $\mathrm{E}[g(X(T))]$
with prescribed accuracy and confidence level, with nearly optimal computational
work.

3.3 Overview of Article III

A. Moraes, R. Tempone and P. Vilanova, “Multilevel adaptive reaction-splitting
simulation method for stochastic reaction networks”, preprint arXiv:1406.1989v1,
(2014).

The author contributed especially to the splitting algorithm and the formulation
of the expected computational work per path. This work was presented by
the author at the Mathematical, Computational and Modeling Sciences Center
at Arizona State University, June 2014, Tempe, USA.

In this article, we present a novel multilevel Monte Carlo method for kinetic simulation
of stochastic reaction networks that is specifically designed for systems in which the
set of reaction channels can be adaptively partitioned into two subsets, $\mathcal{R}_{TL}$ and
$\mathcal{R}_{MNRM}$. The idea is to find the next state of the system, $\bar X_{n+1}$, as the current state,
$\bar X_n$, plus two increments, $\Delta_{TL} + \Delta_{MNRM}$, where $\Delta_{TL}$ is a tau-leap increment involving
the reactions in the class $\mathcal{R}_{TL}$ and $\Delta_{MNRM}$ is an exact increment produced by the
reactions in the class $\mathcal{R}_{MNRM}$. Adaptivity in this context means that the partition
evolves in time according to the states visited by the stochastic paths of the system.

3.3.1 Optimal-work Splitting Rule

The partition of the set of reaction channels is based on heuristically and greedily
optimizing an objective function defined as the expected computational work of
moving the system from the current time $t$ to the next time-horizon $H$.
For a reaction $j$ to be in the $\mathcal{R}_{TL}$ class, there are two simultaneous requirements:
A) high propensity, $a_j(x)$, and B) low probability, $\theta_j$, of reaching a negative population
state (see the precise definition in Equation (7.4)).
We propose to split the sorted set of penalized propensities

$$\mathrm{Sort}\left(\{(1-\theta_1)a_1(x),\, (1-\theta_2)a_2(x),\, \ldots,\, (1-\theta_J)a_J(x)\}\right).$$

In such a case, the objective function has $J+1$ values. We propose to reduce this
number to 3 by searching only the current partition and its two neighbors; that is,
if at the $k$-th step of our algorithm we select the highest $p$ penalized propensities to
be in the tau-leap group, then, at the $(k+1)$-th step, we evaluate the objective function
at three partitions: the highest $p$, $p-1$ and $p+1$ penalized propensities. Algorithm
4 performs the described split; a Python sketch of the neighbor search follows the
listing. Observe that if the current time is close to the next
grid point, we select the trivial partition $(\emptyset, \mathcal{R}, \kappa)$, that is, we take an exact step.

Algorithm 4 The one-step mixing rule. Inputs: the current state of the approximate
process, $\bar X(t)$; the current time, $t$; the values of the propensity functions evaluated
at $\bar X(t)$, $(a_j(\bar X(t)))_{j=1}^J$; the one-step exit probability bound, $\delta$; the next grid point,
$\tilde T$; and the previous optimal split, $\kappa$. Outputs: the tau-leap set, $\mathcal{R}_{TL}$; the exact set,
$\mathcal{R}_{MNRM}$; and the new optimal split, $\kappa$.
Require: $a_0 := \sum_{j=1}^{J} a_j > 0$
1: if $K_1/a_0 < \tilde T - t$ then
2:   Compute $\theta_j$, $j = 1, \ldots, J$ (see (7.4))
3:   $\tilde a_{(j)} \leftarrow$ Sort $\{(1-\theta_j)a_j\}$ in descending order, $j = 1, \ldots, J$
4:   $S_i \leftarrow$ Compute the splits, taking into account the previous optimal split, $\kappa$
5:   $(\mathcal{R}_{TL}, \mathcal{R}_{MNRM}, \kappa) \leftarrow$ Take the minimum-work split
6:   return $(\mathcal{R}_{TL}, \mathcal{R}_{MNRM}, \kappa)$
7: else
8:   return $(\emptyset, \mathcal{R}, \kappa)$
9: end if
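The neighbor search of Algorithm 4 can be sketched in Python as follows; expected_work is a hypothetical placeholder for the expected-work objective derived in the article, and theta stands for the exit probabilities of Equation (7.4).

def one_step_split(a, theta, p_prev, expected_work):
    """Sketch of the greedy neighbor search in Algorithm 4.

    a: propensities a_j(x); theta: exit probabilities theta_j;
    p_prev: previous optimal number of tau-leap reactions;
    expected_work(p, order): hypothetical objective returning the
    expected work when the p largest penalized propensities are
    simulated with tau-leap.
    """
    penalized = [(1.0 - th) * aj for th, aj in zip(theta, a)]
    order = sorted(range(len(a)), key=lambda j: -penalized[j])
    # Evaluate the objective only at the previous split and its neighbors.
    candidates = [p for p in (p_prev - 1, p_prev, p_prev + 1)
                  if 0 <= p <= len(a)]
    p_new = min(candidates, key=lambda p: expected_work(p, order))
    R_TL = set(order[:p_new])     # tau-leap reactions
    R_MNRM = set(order[p_new:])   # exact reactions
    return R_TL, R_MNRM, p_new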

3.3.2 Coupling Two Mixed Paths


Let $\bar X$ and $\bar{\bar X}$ be two mixed paths, corresponding to two nested time discretizations,
called coarse and fine, respectively. Assume that the current time is $t$, and that we know
the states, $\bar X(t)$ and $\bar{\bar X}(t)$, the next grid points at each level, $\bar t$ and $\bar{\bar t}$, and the corresponding
one-step exit probability bounds, $\bar\delta$ and $\bar{\bar\delta}$. Based on this knowledge, we have to determine
the four sets $(\bar{\mathcal R}_{TL}, \bar{\mathcal R}_{MNRM}, \bar{\bar{\mathcal R}}_{TL}, \bar{\bar{\mathcal R}}_{MNRM})$, which correspond to four algorithms, B1,
B2, B3 and B4, that we use as building blocks. Table 3.1 summarizes them.

                                      $\bar{\mathcal R}_{TL}$   $\bar{\mathcal R}_{MNRM}$
$\bar{\bar{\mathcal R}}_{TL}$         B1                        B2
$\bar{\bar{\mathcal R}}_{MNRM}$       B3                        B4

Table 3.1: Building blocks for simulating two coupled mixed Chernoff tau-leap paths.
Algorithms B1 and B2 are presented as Algorithms 2 and 3 in [6]. Algorithms B3
and B4 can be directly obtained from Algorithm B2 (see [5]).

In order to do that, the algorithm computes, independently, the sets $\mathcal{R}_{TL}$ and $\mathcal{R}_{MNRM}$ for
each level, and the time until the next decision is taken, $H$, using Algorithm 27 in
Section 7.6. Next, it computes concurrently the increments due to each one of the
sets (storing the results in $\bar X$ and $\bar{\bar X}$ for the coarse and fine grid, respectively).
We note that the only case in which we use a Poisson random variate generator for
the tau-leap method is in Algorithm B1 (Algorithm 28 in Section 7.6).

3.3.3 A New Control Variate

Based on the random time-change representation of an SRN, it is possible to replace
the current process, $X(t)$, by its approximating reaction-rate ODEs, which can be
computed beforehand. Thus, by keeping track of the values of the unit-rate Poisson
processes, it is possible to obtain a control variate for the 0-level of our multilevel
approach. Details can be found in Section 7.4.

3.3.4 Global Error Control

To estimate expected values of observables of the system at a prescribed final time,
our method bounds the global computational error below a prescribed tolerance,
$TOL$, within a given confidence level. In this article, we use the same error control
strategy employed in [5]. Therefore, we achieve a computational complexity of order
$\mathcal{O}(TOL^{-2})$, the same as with an exact method, but with a smaller constant.

3.3.5 An illuminating example: A holding company

We consider a system with hundreds of species and reactions that is still easy to
reproduce due to its simple structure. This example is intended to show the advantages
of our multilevel adaptive reaction-splitting technique over the multilevel tau-leap
proposed by Anderson and Higham in [6], which is regarded as the state of the art. To
this end, and to make the comparison clear and fair, we do not use the MATLAB feature
that allows calling a batch of Poisson deviates, because this feature is not present
in programming languages like C or FORTRAN. Remember that coupling two simulated
paths on two consecutive time-meshes is essential in multilevel Monte Carlo
methods (see Section A.7).

Example description

Consider a closed system (economy) formed by one big particle (the holding company)
and N small particles (business units). Each business unit obtains its funds by trading
with the environment (represented by the empty set) and it makes net transfers to the
holding company according to its current money level. At the same time, the holding
company pays dividends to the environment, and from time to time, its money level
is increased by its own investments. Let us consider the following particular case:

The number of business units is N = 200.

Initial money levels: $X(0) = (\underbrace{1, \ldots, 1}_{N\ \text{times}}, 10^5)$.

Each business unit obtains its funds from the environment at a constant rate and
makes net transfers to the holding company at a rate proportional to its current
money level:

$$\emptyset \xrightarrow{k} X_1, \;\ldots,\; \emptyset \xrightarrow{k} X_N, \qquad X_1 \xrightarrow{c} Y, \;\ldots,\; X_N \xrightarrow{c} Y.$$

The holding company pays dividends to the environment, and receives returns
from its own investments:

$$Y \xrightarrow{a} \emptyset, \qquad Y \xrightarrow{b} 50\, Y.$$

This last reaction can be interpreted as

$$P\left(Y(t+dt) = y + 49 \mid Y(t) = y\right) = b\, y\, dt + O(dt^2),$$

that is, with a rate proportional to its current money level, the holding company
increases its money level by 49 units.
Stoichiometric matrix:

$$\nu = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0\\
0 & 1 & \cdots & 0 & 0\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & 1 & 0\\
-1 & 0 & \cdots & 0 & 1\\
0 & -1 & \cdots & 0 & 1\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & -1 & 1\\
0 & 0 & \cdots & 0 & -1\\
0 & 0 & \cdots & 0 & 49
\end{pmatrix} \in \mathbb{Z}^{2(N+1) \times (N+1)}.$$

Propensity functions: $a(X) = (\underbrace{k, \ldots, k}_{N\ \text{times}},\, c X_1, \ldots, c X_N,\, a Y,\, b Y)$.

Rate coefficients: k = 1, c = 1, a = 0.5, b = 0.01.

Final simulation time: T = 3.


Quantity of interest: $Y(T)$, that is, the holding company's final money level.


Figure 3.7: Left panel: One SSA path of the money level of the holding company.
Right panel: five SSA paths of the business units. Observe that this example is far
from a deterministic mean field approximation and has relevant stochastic behavior.
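Setting this model up programmatically is straightforward. The following Python sketch, using the parameter values listed above, assembles the stoichiometric matrix and the propensity function and draws one SSA path; it is illustrative code, not the article's implementation, and the direct SSA loop is deliberately naive (its slowness at these propensity levels is precisely what motivates the mixed method).

import numpy as np

def build_model(N=200, k=1.0, c=1.0, a=0.5, b=0.01):
    # Species ordering: (X_1, ..., X_N, Y); reactions as in the text.
    nu = np.zeros((2 * N + 2, N + 1), dtype=int)
    for i in range(N):
        nu[i, i] = 1                        # 0 -> X_i
        nu[N + i, i], nu[N + i, N] = -1, 1  # X_i -> Y
    nu[2 * N, N] = -1                       # Y -> 0 (dividends)
    nu[2 * N + 1, N] = 49                   # Y -> 50 Y (investments)

    def propensities(x):
        return np.concatenate(([k] * N, c * x[:N], [a * x[N], b * x[N]]))

    return nu, propensities

def ssa_path(nu, propensities, x0, T, rng=np.random.default_rng(0)):
    x, t = np.array(x0, dtype=float), 0.0
    while True:
        props = propensities(x)
        a0 = props.sum()
        t += rng.exponential(1.0 / a0)   # exact exponential waiting time
        if t > T:
            return x
        j = rng.choice(len(props), p=props / a0)  # sample next reaction
        x += nu[j]

# Example usage (slow on purpose; roughly 1.5e5 events for T = 3):
# nu, props = build_model()
# y_T = ssa_path(nu, props, [1.0] * 200 + [1e5], T=3.0)[-1]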

Results using 5 batches of M = 200 sampled paths

In this section, we present results showing the computational work involved in
generating sample paths. We compare single-level path generation, i) the vanilla tau-leap
with ii) our mixed method using uniform time-meshes, and then we compare coupled-level
path generation, iii) Anderson and Higham's coupled TL-TL method with iv) our
coupled mixed method using uniform time-meshes.
Since we are just comparing discretization schemes, to make fair comparisons we
do not control the exit error for the generation of mixed paths, so there is no Chernoff
bound cost involved.
The savings in computational work when generating Poisson random variables
heavily depend on MATLAB's performance. For example, we do not generate the
random variates in batches, nor do we use any "vectorization" advantage. In fact, we
should expect better results from our method if we implemented our algorithms in
more performance-oriented languages or if we sampled Poisson random variables in
batches.
As a baseline for comparison, the average work (runtime) per path of the SSA is
17 seconds.

The Anderson and Higham unbiased scheme, which involves the generation of
two coupled paths, where the tau-leap one uses a time-mesh with $\Delta t = 0.0625$
and the other is generated using Anderson's exact Modified Next Reaction Method,
has an average work per path (runtime) of 74 seconds. This makes
the unbiased approach unattractive, and therefore it is not considered further in this
comparison.

Average work (runtime) per path over 5 batches

We compute the average, minimum and maximum value over a batch of 5 runs.

$\Delta t$ coarse level   TL (avg, min, max)                 Mix (avg, min, max)
0.0625                (1.27e+00, 1.27e+00, 1.27e+00)     (8.21e-01, 8.18e-01, 8.23e-01)
0.0312                (2.52e+00, 2.52e+00, 2.52e+00)     (8.71e-01, 8.68e-01, 8.76e-01)
0.0156                (5.02e+00, 5.02e+00, 5.03e+00)     (9.67e-01, 9.60e-01, 9.72e-01)
0.0078                (9.99e+00, 9.98e+00, 1.00e+01)     (1.12e+00, 1.12e+00, 1.13e+00)

$\Delta t$ coarse level   A&H biased (avg, min, max)         Mix Coupled (avg, min, max)
0.0625                (2.53e+00, 2.53e+00, 2.54e+00)     (1.85e+00, 1.82e+00, 1.89e+00)
0.0312                (5.05e+00, 5.04e+00, 5.05e+00)     (2.01e+00, 2.01e+00, 2.03e+00)
0.0156                (1.00e+01, 1.00e+01, 1.01e+01)     (2.35e+00, 2.34e+00, 2.38e+00)
0.0078                (2.01e+01, 2.00e+01, 2.01e+01)     (3.01e+00, 2.99e+00, 3.03e+00)

Percentage of exit paths over total sampled paths

Note that our Mixed and Coupled Mixed paths (with constant $\Delta t$) both have zero
observed exited paths.

Figure 3.8: Loglog plot of the average work (runtime) per path over 5 batches.

$\Delta t$ coarse level      TL (avg, min, max)      A&H biased (avg, min, max)
0.0625 (48 steps)        (99.9, 99.5, 100.0)     (99.1, 98.0, 100.0)
0.0312 (96 steps)        (98.0, 97.0, 99.0)      (87.9, 86.0, 91.0)
0.0156 (192 steps)       (85.4, 82.0, 89.0)      (64.1, 60.5, 70.0)
0.0078 (384 steps)       (61.7, 58.0, 67.0)      (39.0, 35.5, 44.0)

Average number of exit events given that the path exited

Note that our Mixed and Coupled Mixed paths (with constant $\Delta t$) have zero exited
paths.

$\Delta t$ coarse level      TL (avg, min, max)     A&H biased (avg, min, max)
0.0625 (48 steps)        (7.42, 7.35, 7.55)     (7.93, 7.45, 8.28)
0.0312 (96 steps)        (3.92, 3.78, 4.03)     (4.56, 4.49, 4.66)
0.0156 (192 steps)       (2.33, 2.10, 2.53)     (3.14, 3.06, 3.34)
0.0078 (384 steps)       (1.51, 1.46, 1.63)     (2.52, 2.44, 2.62)
Calibration of the Chernoff Mixed ML method

In this section, we compare our Chernoff Mixed ML method, which controls the global
approximation error and, in particular, the effect of the exit error, against the SSA, for
different levels of $TOL$. Work is measured in runtime (seconds).

Figure 3.9: Left: Predicted work (runtime) versus the estimated error bound. Right:
Predicted work (runtime) versus the estimated error bound using the control variate
at level 0 (see Section 7.4). An additional gain of a multiplicative factor of 50 is
obtained.

$TOL$       Work Mixed ML with CV   Work Mixed ML   Work SSA
6.25e-03    3.84e+00                1.22e+02        2.27e+04
3.12e-03    8.04e+00                4.43e+02        8.33e+04
1.56e-03    4.25e+01                2.24e+03        4.19e+05
7.81e-04    1.95e+02                7.52e+03        1.40e+06
3.91e-04    5.83e+02                3.15e+04        5.86e+06

Conclusions from the example

This example, which is similar to economic or communication networks, shows that
the mixed strategy exhibits a noticeable advantage over the vanilla Anderson and
Higham method. We also observe that the unbiased version of the Anderson and Higham
algorithm is too expensive, and the corresponding biased version is of relatively
little value here, since it is slower than the mixed approach and, moreover, its generated
paths frequently reach negative population numbers.

3.3.6 Summary

In this article, we developed an adaptive reaction-splitting multilevel Monte Carlo
method, based on our Chernoff tau-leap methods [4, 5]. Its computational complexity
is $\mathcal{O}(TOL^{-2})$ and, therefore, it can be seen as a variance reduction of the SSA,
which has the same complexity. In our numerical examples, this algorithm shows
important advantages in performance compared to a non-split strategy. We also present
a novel control variate technique, based on the stochastic time-change representation
by Kurtz, which may dramatically reduce the variance of the coarsest level at a
negligible computational cost.

3.4 Overview of Article IV

A. Moraes, F. Ruggeri, R. Tempone and P. Vilanova, “Multiscale Modeling of
Wear Degradation in Cylinder Liners”, SIAM Multiscale Modeling and Simulation,
Vol. 12, Issue 1 (2014).

The author contributed especially by proposing and computing the goodness-of-fit
techniques based on the Master Equation. This work was presented
by the author in a seminar talk at the Laser Interferometer Gravitational-Wave
Observatory (LIGO) at the California Institute of Technology, July 2014,
Pasadena, USA.

Every mechanical system is naturally subjected to some kind of wear process that,
at some point, will cause failure in the system if no monitoring or treatment process
is applied. Since failures are expensive, it is essential both to predict and to avoid
them. To achieve this, a monitoring system of the wear level should be implemented
to decrease the risk of failure. In this work, we take a first step towards the development of
a multiscale indirect inference methodology for state-dependent Markovian pure jump
processes. This allows us to model the evolution of the wear level, and to identify
when the system reaches some critical level that triggers a maintenance response.
Since the likelihood function of a discretely observed pure jump process does not
have an expression that is simple enough for standard non-sampling optimization
methods, we approximate this likelihood by expressions from upscaled models of the
data. We use the Master Equation to assess the goodness-of-fit and to compute the
distribution of the hitting-time to the critical level.

3.4.1 What is wear?

“In materials science, wear is erosion or sideways displacement of material from its
‘derivative’ and original position on a solid surface performed by the action of another
surface” (Wikipedia). The wear in the cylinder liner arises from several mechanisms
(see http://www.marineinsight.com).

3.4.2 The Data Set

The data set consists of wear levels observed on 32 cylinder liners of eight-cylinder
SULZER engines, measured by a caliper with a precision of $\delta = 0.05$ mm (see
Figure 3.10). Warranty clauses specify that the liner should be substituted before it
accumulates a wear level of 4.0 mm, in order to avoid expensive failures.
A motivational question could be: when should we send the ship for maintenance?


Figure 3.10: Due to the caliper's finite precision, every single measurement of the
wear process, $W(t)$, belongs to the lattice $\{0, \delta, 2\delta, \ldots\}$.

3.4.3 The Base Model

We find it more natural in our framework to model a decay process instead of an
increasing one. For that reason, we define $X(t)$ as the thickness process, i.e.,
$X(t) = X_0 - W(t)$, where $W$ is the wear process and $X_0$ is the initial thickness.
After a systematic model selection process, we obtained: $(a_1(x), \nu_1) = (c_1 x, -\delta)$ and
$(a_2(x), \nu_2) = (c_2 x, -k\delta)$, where $k$ is a positive integer to be determined.

The probability of observing a thickness decrease in a small time interval $(t, t+dt)$
is

$$P\left(X(t+dt) = X(t) - \delta \mid X(t) = x\right) = c_1 x\, dt, \qquad (3.3)$$
$$P\left(X(t+dt) = X(t) - k\delta \mid X(t) = x\right) = c_2 x\, dt,$$

where $X(0) = x_0$ is the initial thickness and $\theta = (c_1, c_2, X_0, k)$ is the vector of unknown
parameters.

3.4.4 A Gaussian moment expansion model for indirect inference

The data $x = \{x_i\}_{i=1}^n$ are modeled according to $x_i = Z(t_i) + \epsilon_i$ (indirect inference
model). Here $Z(t) \sim \mathcal{N}(m(t), \sigma^2(t))$, where $m(t)$ and $\sigma^2(t)$ satisfy

$$\begin{cases}
dm(t) = (c_1\nu_1 + c_2\nu_2)\, m(t)\, dt,\\
d\sigma^2(t) = \left(2(c_1\nu_1 + c_2\nu_2)\,\sigma^2(t) + (c_1\nu_1^2 + c_2\nu_2^2)\, m(t)\right) dt,\\
(m(0), \sigma^2(0)) = (x_0, 0), \quad x_0 \in \mathbb{R}_+,\ t \in \mathbb{R}_+,
\end{cases}$$

and the $\epsilon_i$ are i.i.d. realizations of $\mathcal{N}(0, \sigma_\epsilon^2)$ for $i = 1, \ldots, n$. Here, $\nu_1 = -\delta$ and
$\nu_2 = -k\delta$, according to (3.3).

In this case, the likelihood can be written as

$$L(\theta; x) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\left(\sigma_\epsilon^2 + \sigma^2(t_i;\theta)\right)}}\, \exp\left\{ -\frac{(x_i - m(t_i;\theta))^2}{2\left(\sigma_\epsilon^2 + \sigma^2(t_i;\theta)\right)} \right\}.$$

The MLE for $\theta$ is given by the minimizer of minus the log-likelihood,

$$\theta^* = \arg\min_{\theta\in\Theta} \sum_{i=1}^n \left\{ \frac{(x_i - m(t_i;\theta))^2}{\sigma_\epsilon^2 + \sigma^2(t_i;\theta)} + \log\left(\sigma_\epsilon^2 + \sigma^2(t_i;\theta)\right) \right\}.$$

We first determine the minimum conditioned on $k$ and $X_0$, and then the global
optimizer.
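For concreteness, the following Python sketch performs this fit for fixed $k$ and $X_0$ by integrating the moment ODEs and minimizing the negative log-likelihood over $(c_1, c_2)$; the measurement-noise variance sigma_eps2 and the data arrays are assumed inputs, and the snippet is not the article's code.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def neg_loglik(c, times, data, x0, nu1, nu2, sigma_eps2):
    """Minus log-likelihood of the Gaussian moment-expansion model
    for fixed jump sizes nu1, nu2 and initial thickness x0."""
    c1, c2 = c
    drift = c1 * nu1 + c2 * nu2
    diff = c1 * nu1**2 + c2 * nu2**2

    def ode(t, y):  # y = (m, s2): mean and variance
        m, s2 = y
        return [drift * m, 2.0 * drift * s2 + diff * m]

    sol = solve_ivp(ode, (0.0, times[-1]), [x0, 0.0], t_eval=times)
    m, s2 = sol.y
    v = sigma_eps2 + s2
    return np.sum((data - m) ** 2 / v + np.log(v))

# Hypothetical usage with delta = 0.05 mm and k = 4:
# res = minimize(neg_loglik, x0=[1e-4, 1e-4],
#                args=(times, data, 5.0, -0.05, -4 * 0.05, 0.05**2),
#                method="Nelder-Mead")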

3.4.5 Results

In Figure 3.11, we see that the likelihood provided by our indirect inference model
has a unique maximum at $(c_1^*, c_2^*) = (0.63 \cdot 10^{-4}, 1.2 \cdot 10^{-4})$. The 90% confidence band
derived from the Master Equation (1.6) associated with our SRN (3.3) is given in the
right panel.


Figure 3.11: Left panel: unique global maximum $(c_1^*, c_2^*) = (0.63 \cdot 10^{-4}, 1.2 \cdot 10^{-4})$.
Right panel: the exact 90% confidence band computed from the associated Master
Equation.

Answer to the motivational question: the ship should be sent to maintenance
at the time at which the thickness process, $X$, is less than or equal to $B = X_0 - 4$ mm,
where $X_0 = 5$ mm is the initial thickness.

We have that $F_{\tau_B;\theta}(t) := P(X(t) \le B \mid \theta) = \sum_{x \le B} p_x(t;\theta)$, where $p_x(t;\theta)$ is the
probability that $X(t) = x$, given the value of the parameter vector $\theta$.
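Since the thickness process lives on the finite lattice $\{0, \delta, \ldots, X_0\}$, the Master Equation is a finite linear ODE system that can be integrated directly. The Python sketch below works in units of $\delta$ (states $0, \ldots, n_0$), assembles the generator of the two-channel decay model, and evaluates $F_{\tau_B;\theta}(t)$; clamping jumps of size $k$ at zero near the boundary is a simplifying assumption of this sketch.

import numpy as np
from scipy.integrate import solve_ivp

def hitting_time_cdf(c1, c2, k, n0, B, t_grid):
    """CDF of the hitting time to level B (in units of delta) for the
    decay model with jumps -1 (rate c1*x) and -k (rate c2*x)."""
    n = n0 + 1
    A = np.zeros((n, n))  # generator: dp/dt = A p
    for x in range(1, n):
        A[x, x] -= (c1 + c2) * x
        A[x - 1, x] += c1 * x           # jump of size 1
        A[max(x - k, 0), x] += c2 * x   # jump of size k, clamped at 0
    p0 = np.zeros(n)
    p0[n0] = 1.0  # start from full thickness
    sol = solve_ivp(lambda t, p: A @ p, (0.0, t_grid[-1]), p0,
                    t_eval=t_grid, method="BDF")
    return sol.y[: B + 1, :].sum(axis=0)  # P(X(t) <= B) on t_grid

# e.g., with delta = 0.05 mm: X0 = 5 mm -> n0 = 100, B = X0 - 4 mm -> 20:
# cdf = hitting_time_cdf(0.63e-4, 1.2e-4, 4, 100, 20,
#                        np.linspace(0.0, 1.2e5, 200))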

Conditional Residual Reliability

Suppose that we know that the wear process, $W$, is at level $w_0$ at time $t_0 \ge 0$. Assume
that there exists a critical stopping level, $w_{max} > w_0$, that determines the residual
lifetime $\tau_{max} - t_0$. For $t > 0$, the residual lifetime is greater than $t$ if and only if
$W(t_0 + t) < w_{max}$.

Figure 3.12: Left panel: CDF of the hitting-time for B = 1. Right panel: PDF of
the hitting-time to the critical level.

Therefore, the conditional probability satisfies

$$P\left(\tau_{max} - t_0 > t \mid W(t_0) = w_0\right) = P\left(W(t_0 + t) < w_{max} \mid W(t_0) = w_0\right).$$

Taking into account the relation between the wear and the thickness processes, we
have that the conditional residual reliability function, defined as

$$R(t; t_0, w_0) := P\left(\tau_{max} - t_0 > t \mid W(t_0) = w_0\right),$$

can be written as $P\left(X(t; X_0 - w_0) > X_0 - w_{max}\right)$, where $X(\cdot, x_0)$ is the thickness
process starting from $x_0$.

3.4.6 Summary

In this paper, we presented a novel approach to the problem of modeling the wear
process of cylinder liners. Since the measuring caliper has finite precision, the wear
process takes values in a lattice, and therefore a pure jump process is a sensible model.
In this approach, we started by fitting one of the simplest pure jump processes, i.e.,
the simple decay model, and added complexity only when necessary. We found that
the wear process can be modeled using only two jumps, of amplitudes $\delta$ and $4\delta$, with
linear propensity functions.
Figure 3.13: Behavior of the conditional residual reliability function, R(t; 0, w0 ) for
some values of w0 . In this case, we set wmax = 4. As expected, for a fixed residual
lifetime t, we have that R(t; 0, w0 ) is a decreasing function of w0 .

In contrast to the work of Giorgio, Guida, and Pulcini
[7], we did not need to use age-dependent propensity functions or gamma noise.
Nevertheless, our approach can deal with age-dependent propensities, since time does not
play any other role than that of a given constant. One of the main contributions of this work
is the multiscale indirect inference approach, where the inferences are based on
upscaled models. The coefficients of the linear propensity functions were inferred using
the likelihood associated with a Gaussian upscaled model. The mean and variance
of this Gaussian process are the solutions of a second-order moment expansion ODE
system. In this way, we computed the MLE by solving a standard nonlinear least
squares problem. We observe that this method is much simpler than dealing directly
with the likelihood of the pure jump process, which in general cannot be expressed in
closed form and requires computationally intensive sampling techniques to be solved.
We notice that, as long as the probability distribution of the pure jump process is
unimodal at every time, our Gaussian inference approach is applicable and produces
substantial savings in the computational work. Otherwise, the Langevin model, while
more computationally demanding, is more flexible.

Thanks to the remarkable simplicity of our model, we can easily obtain the
distribution of any observable of the process directly from the solution of the
associated Master Equation, which provides the probability distribution of the process at
all times. From this probability mass function, we easily compute the CDF of the
hitting-time to the critical value stipulated in the warranty and the conditional
residual reliability function. It is worth mentioning that we did not use Monte Carlo
simulation or any other sampling procedure.

3.5 Overview of Article V

C. Bayer, A. Moraes, R. Tempone and P. Vilanova, “Forward-Reverse
Representation for Stochastic Reaction Networks with Applications to Statistical
Inference”, preprint, (2015).

The author contributed especially to the forward-reverse formulae, the theoretical
development of the Monte Carlo EM estimators, and the method for reducing
the computational work based on reaction-rate ODEs.

In this work, we present an extension of the forward-reverse algorithm by Bayer and
Schoenmakers [8] to the context of stochastic reaction networks (SRNs). It makes the
approximation of expected values of functionals of bridges for this type of process
computationally feasible.

We then apply this SRN-bridge-generation technique to the statistical inference
problem of estimating the reaction coefficients based on discretely observed data.
To this end, we introduce a two-phase iterative inference method, named FREM, in
which, during the first phase, we solve a set of deterministic optimization problems
where the SRNs are replaced by their reaction-rate ODE approximations; then, during
the second phase, the Monte Carlo version of the Expectation-Maximization (EM)
algorithm is applied, starting from the output of the previous phase. The method is
represented in Figure 3.14.

$$\theta_I^{(0)} \rightarrow \theta_{II}^{(0)} \rightarrow \hat\theta_{II}^{(1)} \rightarrow \cdots \rightarrow \hat\theta_{II}^{(p)} \rightarrow \cdots \rightarrow \hat\theta$$

Figure 3.14: The two-phase estimation process. In the first step, we obtain $\theta_{II}^{(0)}$ from
$\theta_I^{(0)}$ by solving the optimization problem (3.5). In the subsequent steps, we generate
the stochastic sequence $(\hat\theta_{II}^{(p)})_{p=1}^{+\infty}$ using Monte Carlo EM (3.12).

Starting from a set of over-dispersed seeds, the output of our two-phase method is
a cluster of maximum likelihood estimates obtained by using convergence assessment
techniques from the theory of Markov chain Monte Carlo. An example of the output
of our method for a birth-death process (see 1.2.3) is provided in Figure 3.16
and Table 3.2. The data set for this example is shown in Figure 3.15.

Figure 3.15: Data trajectory for the birth-death example. This is obtained by
observing the values of an SSA path at uniform time intervals of size $\Delta t = 5$.
Figure 3.16: FREM estimation (phase I and phase II) for the birth-death process.
The horizontal axis is for the birth rate and the vertical axis for the death rate.

seed $i$   $\theta_{I,i}^{(0)}$   $\theta_{II,i}^{(0)}$          $\hat\theta_{II,i}^{(p^*)}$
1          (0.5, 0.04)    (6.24e-01, 3.29e-02)   (1.24e+00, 6.55e-02)
2          (0.5, 0.08)    (7.68e-01, 4.07e-02)   (1.29e+00, 6.67e-02)
3          (1.5, 0.04)    (1.01e+00, 5.25e-02)   (1.18e+00, 6.27e-02)
4          (1.5, 0.08)    (1.53e+00, 7.97e-02)   (1.20e+00, 6.34e-02)

Table 3.2: Values generated by the FREM algorithm for the birth-death example.
3.5.1 A Two-phase Algorithm

In this section, we present a two-phase Forward-Reverse Expectation-Maximization
(FREM) algorithm for estimating the parameter $\theta$. Phase I is deterministic,
while phase II is stochastic.

We can organize the information in our data set, $\mathcal{D}$, as a finite collection,

$$\mathcal{D} = \left([s_k, t_k],\, x(s_k),\, x(t_k)\right)_{k=1}^{K}, \qquad (3.4)$$

such that, for each $k$, $I_k := [s_k, t_k]$ is the time interval determined by two consecutive
observational points, $s_k$ and $t_k$, where the states $x(s_k)$ and $x(t_k)$ have been observed,
respectively.

Phase I: Using approximating ODEs

The main goal of phase I is to address the key problem of finding a suitable initial
point, $\theta_{II}^{(0)}$, for phase II. The idea is to increase (in some cases dramatically) the
number of SRN-bridges obtained from the sampled forward-reverse trajectories for all time
intervals.

Let us now describe phase I. From the user-selected seed, $\theta_I^{(0)}$, we solve the following
deterministic optimization problem using some appropriate numerical iterative
method:

$$\theta_{II}^{(0)} := \arg\min_{\theta \ge 0} \sum_k w_k\, d\!\left(\tilde Z^{(f)}(t_k^*;\theta),\, \tilde Z^{(b)}(t_k^*;\theta)\right), \quad \text{starting from } \theta_I^{(0)}. \qquad (3.5)$$

Here $\tilde Z^{(f)}$ is the ODE approximation, defined by (9.5) (or (1.13)), in the interval
$[s_k, t_k^*]$, to the SRN defined by the reaction channels, $((\nu_j, a_j))_{j=1}^J$, and the initial
condition $x(s_k)$; and $\tilde Z^{(r)}$ is the ODE approximation, in the interval $[t_k^*, t_k]$, to the
SRN defined by the reaction channels, $((-\nu_j, \tilde a_j))_{j=1}^J$, and by the initial condition
$x(t_k)$, where $\tilde a_j(x) := a_j(x - \nu_j)$. We define $\tilde Z^{(b)}(u, \theta) := \tilde Z^{(r)}(t_k^* + t_k - u, \theta)$ for $u \in [t_k^*, t_k]$.
Here $w_k := (t_k - s_k)^{-1}$ and $d(\cdot,\cdot)$ is an appropriate distance in $\mathbb{R}^d$. The rationale behind
this particular choice of the weight factors, $w_k$, is the mitigation of the
effect of very large time intervals, where the evolution of the process, $X$, may be more
uncertain. A better (but more costly) choice would be the inverse of the maximal
variance of the SRN-bridge.

Remark 3.5.1 (Alternative definition of $\theta_{II}^{(0)}$). In some cases, convergence issues
arise when solving problem (3.5). We found it useful to solve a set of simpler
problems whose answers can be combined to provide a reasonable seed for phase
II: more precisely, we solve $K$ deterministic optimization problems, one for each time
interval $[s_k, t_k]$:

$$\theta_k := \arg\min_{\theta \ge 0} \left\| \tilde Z^{(f)}(t_k^*;\theta) - \tilde Z^{(b)}(t_k^*;\theta) \right\|,$$

all of them solved iteratively with the same seed, $\theta_I^{(0)}$. Then, we define

$$\theta_{II}^{(0)} := \frac{\sum_k w_k\, \theta_k}{\sum_k w_k}. \qquad (3.6)$$
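In practice, phase I reduces to nested calls to an ODE integrator inside a standard optimizer. The Python sketch below evaluates the objective of (3.5) under the simplifying assumption of mass-action propensities $a_j(x) = \theta_j\, g_j(x)$ and with $t_k^*$ taken as the interval midpoint; note that, at the ODE scale, the reversed channels amount (up to the $O(\nu_j)$ shift in $\tilde a_j$) to integrating the same reaction-rate ODE backward in time.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def phase_one_mismatch(theta, intervals, nu, g):
    """Objective of (3.5): weighted distance between the forward and
    backward reaction-rate ODE flows at the interval midpoints t_k*.

    intervals: list of (s_k, t_k, x_sk, x_tk); nu: J x d stoichiometric
    matrix; g(x): vector of propensity factors, so that
    a_j(x) = theta_j * g(x)[j] (mass-action assumption).
    """
    total = 0.0
    for s, t, xs, xt in intervals:
        tm = 0.5 * (s + t)  # midpoint used as t_k* in this sketch
        rhs = lambda u, z: nu.T @ (theta * g(z))
        zf = solve_ivp(rhs, (s, tm), xs).y[:, -1]  # forward from x(s_k)
        # Backward integration of the same ODE realizes the reversed
        # channels in the fluid limit.
        zb = solve_ivp(rhs, (t, tm), xt).y[:, -1]
        total += np.linalg.norm(zf - zb) / (t - s)  # weight w_k
    return total

# theta0_II = minimize(phase_one_mismatch, theta0_I,
#                      args=(intervals, nu, g), method="Nelder-Mead").x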

Phase II: The Monte Carlo EM

In our statistical estimation approach, the Monte Carlo EM algorithm uses data
(pseudo-data) generated by those forward and backward simulated paths that result
in SRN-bridges, either exact or approximate. Figure 3.17 illustrates this idea.
This last notion is associated with the use of kernels. Phase II implements the
Monte Carlo EM algorithm for SRNs.

Simulating forward and backward paths: this phase starts with the
simulation of forward and backward paths in each time interval $I_k$. More specifically,
given an estimate of the true parameter $\theta$, say $\hat\theta = (\hat c_1, \hat c_2, \ldots, \hat c_J)$, the first step
Figure 3.17: Illustration of the forward-reverse path simulation in phase II. This
corresponds to a given interval of the wear data presented in Section 9.6.2. The
observed values are the one at the beginning of the interval and the last one. On
the y-axis, we plot the thickness process, $X(t)$, derived from the wear process of the
cylinder liner. Observe that every forward path that ends up at a certain value will
be joined with every backward path that ends up at the same value, when using the
Dirac kernel.
is to simulate $M_k$ forward paths with reaction channels $((\nu_j, \hat c_j g_j(x)))_{j=1}^J$ in $[s_k, t_k^*]$, all
of them starting at $s_k$ from $x(s_k)$ (see Section 9.5.1 for details about the selection of
$M_k$). Then, we simulate $M_k$ backward paths with reaction channels
$((-\nu_j, \hat c_j g_j(x - \nu_j)))_{j=1}^J$ in $[t_k^*, t_k]$, all of them starting at $t_k$ from $x(t_k)$. Let
$(\tilde X^{(f)}(t_k^*, \tilde\omega_m))_{m=1}^{M_k}$ and $(\tilde X^{(b)}(t_k^*, \tilde\omega_{m'}))_{m'=1}^{M_k}$ denote the values of the simulated
forward and backward paths at the time $t_k^*$, respectively. If the intersection of these two
sets of points is nonempty, then there exists at least one $m$ and one $m'$ such that the
forward and backward paths can be linked as one SRN-bridge connecting the data
values $x(s_k)$ and $x(t_k)$.
When the number of simulated paths, $M_k$, is large enough, and an appropriate
guess of the parameter $\theta$ is used to generate those paths, then, due to the discrete
nature of our state space, $\mathbb{Z}_+^d$, we expect to generate a number of exact SRN-bridges
sufficiently large to perform statistical inference. However, at early stages of the
Monte Carlo EM algorithm, our approximations of the unknown parameter, $\theta$, are
not expected to provide a large number of exact SRN-bridges. In such a case, we
can use kernels to relax the notion of an exact SRN-bridge (see Section 9.2.3). Notice
that, in the case of exact SRN-bridges, in formula (9.16), we are implicitly using
a Kronecker kernel; that is, $\kappa$ takes the value 1 when $\tilde X^{(f)}(t_k^*, \tilde\omega_m) = \tilde X^{(b)}(t_k^*, \tilde\omega_{m'})$
and 0 otherwise. We can relax this condition to obtain approximate SRN-bridges.
To make an efficient use of kernels, we first transform the endpoints of the forward
and backward paths generated in the interval $I_k$,

$$\mathcal{X}_k := \left(\tilde X^{(f)}(t_k^*, \tilde\omega_1),\, \tilde X^{(f)}(t_k^*, \tilde\omega_2),\, \ldots,\, \tilde X^{(f)}(t_k^*, \tilde\omega_{M_k}),\right. \qquad (3.7)$$
$$\left.\tilde X^{(b)}(t_k^*, \tilde\omega_{M_k+1}),\, \tilde X^{(b)}(t_k^*, \tilde\omega_{M_k+2}),\, \ldots,\, \tilde X^{(b)}(t_k^*, \tilde\omega_{2M_k})\right)$$

into

$$H(\mathcal{X}_k) := \left(\tilde Y^{(f)}(t_k^*, \tilde\omega_1),\, \tilde Y^{(f)}(t_k^*, \tilde\omega_2),\, \ldots,\, \tilde Y^{(f)}(t_k^*, \tilde\omega_{M_k}),\right. \qquad (3.8)$$
$$\left.\tilde Y^{(b)}(t_k^*, \tilde\omega_{M_k+1}),\, \tilde Y^{(b)}(t_k^*, \tilde\omega_{M_k+2}),\, \ldots,\, \tilde Y^{(b)}(t_k^*, \tilde\omega_{2M_k})\right)$$

by a linear transformation, H, with the aim of eliminate possibly high correlations in


the components of Xk . The original cloud of points, Xk , formed by the extremes of the
forward and backward paths, is then transformed into, H(Xk ), which hopefully has a
covariance matrix close to a multiple of the d-dimensional identity matrix, ↵Id . Ide-
ally, the coefficient, ↵, should be chosen in such way that each d-dimensional unitary
cube, centered at Ỹ (f ) (t⇤k , !
˜ m ) contains in average, one element of [m0 {Ỹ (b) (t⇤k , !
˜ m0 )}
(see Section 9.5.3 for details about the selection of ↵ and H).
In our numerical examples, we use the Epanechnikov kernel:

$$\kappa(\eta) := \left(\tfrac{3}{4}\right)^{d} \prod_{i=1}^{d} (1 - \eta_i^2)\, \mathbf{1}_{\{|\eta_i| \le 1\}}, \qquad (3.9)$$

where $\eta$ is defined as

$$\eta \equiv \eta_k(m, m') := \tilde Y^{(f)}(t_k^*, \tilde\omega_m) - \tilde Y^{(b)}(t_k^*, \tilde\omega_{m'}). \qquad (3.10)$$

This choice is motivated by the way in which we compute $\eta_k(m, m')$, avoiding,
whenever possible, having to make $M_k^2$ calculations. The support of $\kappa$ is perfectly adapted to
our strategy of dividing $\mathbb{R}^d$ into unit cubes with vertices in $\mathbb{Z}^d$.
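A direct implementation of this kernel evaluation, assuming the endpoints have already been mapped through $H$, could read as follows; the quadratic loop is shown only for clarity, since Section 9.5.2 describes how to avoid it.

import numpy as np

def epanechnikov(eta):
    """Product Epanechnikov kernel (3.9); eta is a d-vector."""
    inside = np.abs(eta) <= 1.0
    if not inside.all():
        return 0.0  # compact support: the unit cube
    return (0.75 ** eta.size) * np.prod(1.0 - eta ** 2)

def kernel_weights(Yf, Yb):
    """Weights kappa(eta_k(m, m')) for all forward/backward endpoint
    pairs; Yf, Yb are (M_k, d) arrays of transformed endpoints.
    Hashing the unit cubes would avoid this O(M_k^2) double loop."""
    M = len(Yf)
    W = np.zeros((M, M))
    for m in range(M):
        for mp in range(M):
            W[m, mp] = epanechnikov(Yf[m] - Yb[mp])
    return W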
Kernel-weighted averages for the Monte Carlo EM: as we previously
mentioned, the only available data in the interval $I_k$ correspond to the observed values
of the process, $X$, at its endpoints. Therefore, the expected values, $\mathrm{E}_{\theta^{(p)}}\left[R_{j,I_k} \mid \mathcal{D}\right]$
and $\mathrm{E}_{\theta^{(p)}}\left[F_{j,I_k} \mid \mathcal{D}\right]$, in formula (9.24), must be approximated by SRN-bridge
simulation. To this end, we generate a set of $M_k$ forward paths in the interval $I_k$ using
$\hat\theta_{II}^{(p)}$ as the current guess for the unknown parameter $\theta^{(p)}$. Having generated those
paths, we record $R_{j,I_k}^{(f)}(\tilde\omega_m)$ and $F_{j,I_k}^{(f)}(\tilde\omega_m)$ for all $j = 1, 2, \ldots, J$ and $m = 1, 2, \ldots, M_k$,
as defined in Section 9.3.2. Analogously, we record $R_{j,I_k}^{(b)}(\tilde\omega_{m'})$ and $F_{j,I_k}^{(b)}(\tilde\omega_{m'})$ for all
$j = 1, 2, \ldots, J$ and $m' = 1, 2, \ldots, M_k$.
⇥ ⇤
Consider the following $\kappa$-weighted averages that approximate $\mathrm{E}_{\theta^{(p)}}\left[R_{j,I_k} \mid \mathcal{D}\right]$ and
$\mathrm{E}_{\theta^{(p)}}\left[F_{j,I_k} \mid \mathcal{D}\right]$, respectively:

$$A_{\hat\theta_{II}^{(p)}}(R_{j,I_k} \mid \mathcal{D}; \kappa) := \frac{\sum_{m,m'} \left(R_{j,I_k}^{(f)}(\tilde\omega_m) + R_{j,I_k}^{(b)}(\tilde\omega_{m'})\right) \kappa(\eta_k(m,m'))\, \phi_k(m')}{\sum_{m,m'} \kappa(\eta_k(m,m'))\, \phi_k(m')}, \qquad (3.11)$$

$$A_{\hat\theta_{II}^{(p)}}(F_{j,I_k} \mid \mathcal{D}; \kappa) := \frac{\sum_{m,m'} \left(F_{j,I_k}^{(f)}(\tilde\omega_m) + F_{j,I_k}^{(b)}(\tilde\omega_{m'})\right) \kappa(\eta_k(m,m'))\, \phi_k(m')}{\sum_{m,m'} \kappa(\eta_k(m,m'))\, \phi_k(m')},$$

where $\eta_k(m, m')$ has been defined in (3.10), $m, m' = 1, 2, \ldots, M_k$, and
$\phi_k(m') := \exp\left(\int_{t_k^*}^{t_k} c_j(\tilde X^{(b)}(s, \tilde\omega_{m'}))\, ds\right)$, according to Theorem 9.2.2. Observe that we generate
$M_k$ forward and reverse paths in the interval $I_k$, but we do not directly control
the number of exact or approximate SRN-bridges that are created. The number
$M_k$ is chosen such that either the number of SRN-bridges is of order $\mathcal{O}(M_k)$ or we
reach a computational budget, $M_b$, which is 200 in our numerical experiments. In
Section 9.5.2, we indicate an algorithm that reduces the computational complexity
of computing these $\kappa$-weighted averages from $\mathcal{O}(M_k^2)$ to $\mathcal{O}(M_k)$.
Finally, the Monte Carlo EM algorithm for this particular problem generates a
stochastic sequence, $(\hat\theta_{II}^{(p)})_{p=1}^{+\infty}$, starting from the initial guess $\theta_{II}^{(0)}$ provided by phase
I (3.5), and evolving by

$$\hat c_j^{(p+1)} = \frac{\sum_{k=1}^{K} A_{\hat\theta_{II}^{(p)}}(R_{j,I_k} \mid \mathcal{D}; \kappa)}{\sum_{k=1}^{K} A_{\hat\theta_{II}^{(p)}}(F_{j,I_k} \mid \mathcal{D}; \kappa)}, \qquad (3.12)$$

where $\hat\theta_{II}^{(p)} = \left(\hat c_1^{(p)}, \ldots, \hat c_J^{(p)}\right)$. In Section 9.5.4, a stopping criterion based on techniques
widely used in Markov chain Monte Carlo (MCMC) [9] is applied.


3.5.2 Summary

In this work, we addressed the problem of efficiently computing approximations of
expectations of functionals of bridges in the context of stochastic reaction networks
by extending the forward-reverse technique developed by Bayer and Schoenmakers in
[8]. We showed how to apply this technique to the statistical problem of inferring the
set of coefficients of the propensity functions. We presented a two-phase approach,
namely the FREM algorithm, in which the first phase, based on reaction-rate ODEs, is
deterministic and intended to provide a starting point that reduces the computational
work of the second phase, which is properly the Monte Carlo EM algorithm. Our
novel algorithm for generating bridges provides a clear advantage over shooting
methods and methods based on acceptance-rejection techniques. Our work is illustrated
with numerical examples. As future work, we plan to incorporate higher-order kernels
and multilevel Monte Carlo methods into the FREM algorithm, as well as to extend the
inference methodology to the set of stoichiometric vectors, $\nu_j$.
REFERENCES
[1] B. Klar, “Bounds on tail probabilities of discrete distributions,” Probab. Eng. Inf.
Sci., vol. 14, pp. 161–171, April 2000.

[2] Y. Cao, D. Gillespie, and L. Petzold, “Avoiding negative populations in explicit
Poisson tau-leaping,” The Journal of Chemical Physics, vol. 123, p. 054104, 2005.

[3] J. Ahrens and U. Dieter, “Computer methods for sampling from gamma, beta,
Poisson and binomial distributions,” Computing, vol. 12, pp. 223–246, 1974.

[4] A. Moraes, R. Tempone, and P. Vilanova, “Hybrid Chernoff tau-leap,” Multiscale
Modeling and Simulation, vol. 12, no. 2, pp. 581–615, 2014.

[5] ——, “Multilevel hybrid Chernoff tau-leap,” accepted for publication in BIT
Numerical Mathematics, 2015.

[6] D. Anderson and D. Higham, “Multilevel Monte Carlo for continuous Markov
chains, with applications in biochemical kinetics,” SIAM Multiscale Model. Simul.,
vol. 10, no. 1, Mar. 2012.

[7] M. Giorgio, M. Guida, and G. Pulcini, “An age- and state-dependent Markov
model for degradation processes,” IIE Transactions, vol. 43, no. 9, pp. 621–632,
2011.

[8] C. Bayer and J. Schoenmakers, “Simulation of forward-reverse stochastic
representations for conditional diffusions,” Annals of Applied Probability, vol. 24, no. 5,
pp. 1994–2032, October 2014.

[9] C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in
Statistics), 2nd ed. Springer, 2005.

Chapter 4

Concluding Remarks

4.1 Summary

For path-simulation of stochastic reaction networks with error control, we developed in
[1, 2, 3] stochastic algorithms for approximating the quantity of interest, $\mathrm{E}[g(X(T))]$,
with the same computational complexity as an exact algorithm but with a (sometimes
dramatically) smaller constant. Our path-simulation algorithms are based on
a hybrid exact-approximate adaptive strategy guided by a greedy cost-based
principle. Inspired by the work of Anderson and Higham [4], in [2] we extended our
single path-simulation hybrid method introduced in [1] to the multilevel Monte Carlo
setting. A major challenge posed in [2] is the accurate estimation of the variances
of the observable of the process for two consecutive levels in the hierarchy of non-nested
time discretizations. By a thorough analysis of the coupling method and the
use of dual-based residual expansions, we derived a computable approximation for this
quantity. The relevance of this variance estimation technique is clear, since it has the
potential to mitigate the effect of the large kurtosis issue, frequent in standard
estimators of processes with a discrete state space. Our third contribution [3] efficiently
addresses stiff problems by an adaptive reaction-splitting technique. Based on a low
computational-work heuristic, it is able to produce hybrid path-simulations for a wide
class of problems where the scheme proposed in [4] is clearly inadequate. Our work
provides a precise analysis and control of the global error estimator and simulation
settings where the accuracy goal is achieved with near optimal computational work.
The inverse problem for SRNs, that is, the statistical inference problem of
estimating the coefficients of the propensity functions, is addressed in articles [5]
and [6]. Article [5] presents an indirect inference strategy based on a Gaussian field
perturbation of the up-scaled reaction-rate ODE system associated with our SRN.
In this up-scaled model, a weighted least-squares penalized problem is solved and
its solution is tested using the Master Equation of the system. This non-sampling
inference method allows quick estimation of the propensity coefficients to be used,
for instance, in condition-based maintenance, as we show by solving a real example
in wear degradation of cylinder liners. In [6], we present a direct inference method
based on building stochastic bridges for SRNs between two consecutive observations.
We extend a technique of Bayer and Schoenmakers [7] to the context of SRNs and
combine it with the celebrated Expectation-Maximization (EM) algorithm. As a
result, we propose a Monte Carlo estimation strategy that can efficiently approximate
the maximum likelihood estimators of the propensity coefficients.

4.2 Future Research Work

The simulation of SRNs started with Feller and Doob [8] and has been an active area
of research for the last sixty years, especially active after the introduction of the
tau-leap method by Gillespie [9] in 2001 (see also [10]).

Simulation methods with rigorous error control for systems with hundreds or even
thousands of reaction channels and/or a high number of species are required by
chemical combustion, genomics, social networks and the design of hydro-crackers, just to
mention a few disciplines. Fast algorithms for statistical inference in high-dimensional
systems are also required.

Incorporating techniques from polynomial chaos, mean-field learning, tensorial
numerical analysis and sparse matrix computations seems attractive for addressing
those simulation and inference problems.

To the best of our knowledge, up-scaled approximations of SRNs, like chemical
Langevin diffusions or reaction-rate ODEs, have been proposed in hybrid simulation
schemes, but they have neither been rigorously studied from the numerical point of view,
nor has global error control been performed. We believe that our dual-weighted residual
expansion techniques can be applied to obtain accurate error estimates, and then our
global error control techniques can be applied to optimize the computational work.

In our immediate research plans, we have the exploration of techniques for
incorporating implicit tau-leap methods into our hybrid schemes presented in [1, 2, 3], as
well as methods for incorporating spatial dimensions.

Regarding the statistical inference problem, we plan to extend the FREM
algorithm presented in [6] to the multilevel Monte Carlo setting and to incorporate
higher-order kernels to deal with high-dimensional problems. We are also planning to
extend our indirect inference methodology presented in [5] to the multidimensional
case.
REFERENCES
[1] A. Moraes, R. Tempone, and P. Vilanova, “Hybrid Chernoff tau-leap,” Multiscale
Modeling and Simulation, vol. 12, no. 2, pp. 581–615, 2014.

[2] ——, “Multilevel hybrid Chernoff tau-leap,” accepted for publication in BIT
Numerical Mathematics, 2015.

[3] ——, “A multilevel adaptive reaction-splitting simulation method for stochastic
reaction networks,” submitted to SIAM Journal of Scientific Computing, preprint
arXiv:1406.1989, 2014.

[4] D. Anderson and D. Higham, “Multilevel Monte Carlo for continuous Markov
chains, with applications in biochemical kinetics,” SIAM Multiscale Model. Simul.,
vol. 10, no. 1, Mar. 2012.

[5] A. Moraes, F. Ruggeri, R. Tempone, and P. Vilanova, “Multiscale modeling of
wear degradation in cylinder liners,” Multiscale Modeling and Simulation, 2014.

[6] C. Bayer, A. Moraes, R. Tempone, and P. Vilanova, “The forward-reverse
algorithm for stochastic reaction networks with applications to statistical inference,”
preprint, 2015.

[7] C. Bayer and J. Schoenmakers, “Simulation of forward-reverse stochastic
representations for conditional diffusions,” Annals of Applied Probability, vol. 24,
no. 5, pp. 1994–2032, October 2014.

[8] P. Érdi and G. Lente, Stochastic Chemical Kinetics: Theory and (Mostly)
Systems Biological Applications (Springer Series in Synergetics), 1st ed. Springer,
2014.

[9] D. T. Gillespie, “Approximate accelerated stochastic simulation of chemically
reacting systems,” Journal of Chemical Physics, vol. 115, pp. 1716–1733, Jul.
2001.

[10] J. P. Aparicio and H. Solari, “Population dynamics: Poisson approximation and
its relation to the Langevin process,” Physical Review Letters, vol. 86, no. 18,
pp. 4183–4186, Apr. 2001.

Part II

Included Papers

Chapter 5

Hybrid Chernoff Tau-leap

Alvaro Moraes, Raúl Tempone and Pedro Vilanova¹

Abstract

Markovian pure jump processes model a wide range of phenomena, including
chemical reactions at the molecular level, dynamics of wireless communication networks,
and the spread of epidemic diseases in small populations. There exist algorithms,
like Gillespie's Stochastic Simulation Algorithm (SSA) or Anderson's Modified Next
Reaction Method (MNRM), that simulate a single path with the exact distribution
of the process, but this can be time consuming when many reactions take place
during a short time interval. Gillespie's approximated tau-leap method, on the other
hand, can be used to reduce computational time, but it may lead to non-physical
values due to a positive one-step exit probability, and it also introduces a time
discretization error. Here, we present a novel hybrid algorithm for simulating individual
paths which adaptively switches between the SSA and the tau-leap method. The
switching strategy is based on a comparison of the expected inter-arrival time of the
SSA and an adaptive time step derived from a Chernoff-type bound for the one-step
exit probability. Because this bound is non-asymptotic, we do not need to make any
distributional approximation for the tau-leap increments. This hybrid method allows
us (i) to control the global exit probability of any simulated path and (ii) to obtain
accurate and computable estimates of the expected value of any smooth observable of
the process with minimal computational work. We present numerical examples that
illustrate the performance of the proposed method.

¹A. Moraes, R. Tempone and P. Vilanova, “Hybrid Chernoff Tau-Leap”, SIAM Multiscale
Modeling and Simulation, Vol. 12, Issue 2, (2014).

5.1 Introduction

In this work, we present a hybrid algorithm to accurately compute

$$\mathrm{E}[g(X(T))], \qquad (5.1)$$

the expected value of some given smooth function, $g : \mathbb{R}^d \to \mathbb{R}$, where $X$ is a
non-homogeneous Poisson process taking values in $\mathbb{Z}_+^d$, and $T$ is a given final time. Here,
$\mathbb{Z}_+$ denotes the set of non-negative integers, and the $i$-th component, $X_i(t)$, describes,
for example, the number of particles of species $i$ present in a chemical system at time
$t$. In that type of system, different species undergo reactions at random times by
changing the number of particles of at least one of the species. The probability of a
reaction happening in a small time interval is modeled by a propensity function that
depends on the current state of the system.
Pathwise realizations of such pure jump processes (see [1]) can be simulated
exactly using the Stochastic Simulation Algorithm (SSA) introduced by Gillespie in [2].
Independently, an equivalent kinetic Monte Carlo algorithm was developed in the
physics community in the 1960s (see [3] for references).

Although these algorithms generate exact realizations of the Markov process, $X$,
they are only computationally feasible for relatively low propensities. For example,
in the SSA, at each time step, the process is simulated exactly by sampling the
next reaction to occur and the waiting time for this reaction to happen (see 5.1.2).
Then, the total computational work of the SSA roughly becomes proportional to the
expected value of the total propensity integrated over an SSA path (see 5.3.3). For
that reason, Gillespie proposed in [4] the tau-leap method to approximate the SSA by
evolving the process with fixed time steps, keeping the propensity fixed within each
time step. In fact, the tau-leap method can be seen as a forward Euler method for a
stochastic differential equation driven by Poisson random measures (see [5]). In the
limit, as the time steps go to zero, the tau-leap solution converges to the SSA one
(see [6]).
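In its simplest form, a tau-leap step freezes all propensities over the step and draws, for each channel, a Poisson number of firings; a minimal Python sketch (not tied to any particular implementation) is:

import numpy as np

def tau_leap_step(x, tau, nu, propensities, rng):
    """One explicit tau-leap step: x(t + tau) = x(t) + sum_j P_j nu_j,
    with P_j ~ Poisson(a_j(x) * tau) and propensities frozen at x."""
    a = propensities(x)
    firings = rng.poisson(a * tau)  # one Poisson draw per reaction channel
    return x + firings @ nu         # nu: J x d stoichiometric matrix

# Note: nothing prevents the result from leaving Z_+^d, which is the
# exit problem addressed by the Chernoff step-size selection below.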
A drawback of the tau-leap method is that the simulated process may take
negative values, which is an undesirable consequence of the approximation and not a
feature of the original process. For this purpose, a Chernoff-type bound for the
time step size is developed here. It controls the probability of taking negative values by
adjusting the time steps. Nevertheless, there are two main scenarios in which we could
obtain extremely small time steps by using the Chernoff bound: either in the case of
very stringent probabilities of taking negative values, or because the current state of
the tau-leap approximate process is relatively close to the boundary. On the contrary,
by using an exact step, the probability of taking negative values is obviously zero,
and, when the process is relatively close to the boundary, the expected time step size
of the exact method is usually larger than the one obtained by the Chernoff bound.
Therefore, to avoid extremely small time steps, we propose to switch adaptively between
the SSA and the Chernoff tau-leap method, creating a hybrid SSA/Chernoff
tau-leap method. The selection of the simulation method depends on the current
state of the approximate process through the total propensity, which is a measure of
the activity of the system around the current state. Therefore, the hybrid algorithm
reveals the existence of two scales (low/high) of activity that determine whether to
choose an exact or an approximate simulation method. Moreover, our hybrid Chernoff
tau-leap method gives accurate estimates of the global error of the approximation
and also of its corresponding computational work.
In [7], a hybrid SSA/tau-leap algorithm is proposed. In that work, the proposed
switching rule depends on two free parameters, and it is based on the so-called leap
condition, which can be interpreted as a local time-discretization error control. While
the authors focus on avoiding negative population values, the global error control and
its computational work are not treated. Methods to prevent negative values in the
tau-leap method can roughly be divided into three classes: pre-leap checks, post-leap
checks, and modifications of the Poisson-distributed increments. A pre-leap check
calculates the largest possible time step fulfilling some leap criterion, often based on
controlling the relative change in the propensity function before taking the step (see
[7, 8, 9]). This is primarily aimed at reducing the local time-discretization error, but
it also reduces the probability of taking negative values. The approach presented
here includes a pre-leap check that strictly bounds the exit probability, and it is
better suited for estimating the tails of the Poisson distribution than a standard
Gaussian approximation. In [10], an alternative post-leap check was introduced to
guarantee a non-negative population in each step. If a step leading to a negative
population has been taken, the post-leap procedure retakes a shorter step, conditioned
on already sampled data from the failed step, to avoid sampling bias. However,
this procedure may be expensive since, when computing the new step, binomially
distributed Poisson bridges need to be simulated. A third way to prevent negative
populations is to replace the Poisson-distributed increments in the tau-leap method
with bounded increments from the binomial or multinomial distributions (see [11,
12, 13]). This technique introduces another approximation error, but it also imposes a
restriction on the maximum step size in order to preserve the expected value of the
tau-leap increment.
In this work, we derive a novel pre-leap check that is based on the general Chernoff
bound used in large deviation theory [14]. More specifically, let x̄ be the state of the
approximate process at time t, and let δ ∈ (0, 1) be given. We compute a time step,
τ = τ(δ, x̄), such that the probability that the approximate process reaches a non-physical
negative value in the interval [t, t + τ) is less than δ. Also, by bounding the
one-step exit probability by δ, we are able to control the probability that a whole
hybrid path exits the Z^d_+ lattice. Simply put, this is a global exit probability.
The global error arising from the hybrid method can be decomposed into three
components: the global exit error, the time discretization error, and the statistical
error. This global error should be less than a prescribed tolerance, TOL, with probability
larger than a certain confidence level. The global exit error is a quantity derived
from the global exit probability, and it can therefore be controlled by δ. The
analysis and control of this component are among the main contributions
of this work. The discretization error inherent in the tau-leap method is controlled
through a time mesh of size h (see [15]). Finally, the statistical error is controlled
by the number of hybrid paths, M, by making use of the Central Limit Theorem
([16]). The parameters δ, h and M are functions of TOL, since they are obtained by
approximately minimizing the computational work of the hybrid method under the
constraint that the global error must be less than TOL. Here, the computational work
is measured as the amount of time needed for computing an estimate of E[g(X(T))]
within TOL with a given level of confidence. This is known in the literature as CPU
runtime.

The methodology presented here also allows the determination of when an exact
method is preferred over the hybrid method. Similar hybrid methods have been
proposed for the regular tau-leap method (see [7]), but without the rigorous global
error estimation and control that are presented here.
5.1.1 The Pure Jump Process

To describe the pure jump process, X : [0, T] × Ω → Z^d_+, occurring in (5.1), we
consider a system of d species interacting through J different reaction channels. For
the sake of brevity, we write X(t, ω) ≡ X(t). Let X_i(t) be the number of particles of
species i in the system at time t. We want to study the evolution of the state vector,

\[
X(t) = (X_1(t), \ldots, X_d(t)) \in \mathbb{Z}^d_+,
\]

modelled as a continuous-time, discrete-space Markov chain starting at some state,
X(0) ∈ Z^d_+. Each reaction can be described by the vector ν_j ∈ Z^d such that, for a
state vector x ∈ Z^d_+, a single firing of reaction j leads to the change

\[
x \to x + \nu_j .
\]

The probability that reaction j will occur during the small interval (t, t + dt) is then
assumed to be

\[
P\big(\text{reaction } j \text{ fires during } (t, t+dt) \mid X(t) = x\big) = a_j(x)\,dt + o(dt), \tag{5.2}
\]

with a given non-negative polynomial propensity function, a_j : R^d → R. We set
a_j(x) = 0 for those x such that x + ν_j ∉ Z^d_+.

A process, X, that satisfies the Markov property together with (5.2) is a continuous-time,
discrete-space Markov chain that can be characterized by the non-homogeneous
Poisson process,

\[
X(t) = X(0) + \sum_{j=1}^{J} \nu_j \, Y_j\!\left(\int_0^t a_j(X(s))\, ds\right), \tag{5.3}
\]

where Y_j : R_+ × Ω → Z_+ are independent unit-rate Poisson processes [1]. In this work,
we do not assume that the species can only be transformed into other species or be
consumed, as in [17]. In our numerical examples, we allow the set of possible states
of the system to be infinite, but we explicitly avoid cases in which one or more species
grows exponentially fast or blows up in the time interval [0, T].

Remark 5.1.1. In chemical kinetics, the above setting can be used to describe well-stirred
systems of chemical species, interacting through different chemical reactions,
characterized by stoichiometric vectors, ν_j, and polynomial propensities, a_j, derived
from the mass-action principle (see [18]). Such systems are assumed to be confined
to a constant volume and to be in thermal, but not necessarily chemical, equilibrium
at some constant temperature. Other popular applications can be found in population
biology, epidemiology, and communication networks (see e.g., [19, 20]).

Example 5.1.2 (Simple decay model). Consider the reaction $X \xrightarrow{c} \emptyset$, where one
particle is consumed. In this case, the state vector X(t) is in Z_+, where X denotes
the number of particles in the system. The stoichiometric vector for this reaction is ν = −1. The
propensity function in this case could be, for example, a(X) = cX, where c > 0.

The classical approach to chemical kinetics deals with state vectors of non-negative
real numbers representing the concentrations of the species at time t, usually measured in
moles per liter. In this setting, the concentrations are assumed to vary continuously
in time, according to the mass-action principle, which says that each reaction in the
system affects the rate of change of the species. More precisely, the effect on the
instantaneous rate of change is proportional to the product of the concentrations of
the reacting species. For the simple decay example, we have the reaction rate ODE
(or mean field): ẋ(t) = −c x(t) for t ∈ R_+ and x(0) = x_0 ∈ R_+. In general, let
ν be the stoichiometric matrix with columns ν_j, and let a(x) be the column vector of
propensities. Then, we have

\[
\begin{cases}
\dot{x}(t) = \nu\, a(x(t)), & t \in \mathbb{R}_+ \\
x(0) = x_0 \in \mathbb{R}^d_+ .
\end{cases} \tag{5.4}
\]

5.1.2 Gillespie’s SSA Method

The SSA simulates exact paths of X using equation (5.3). It requires the
sampling of two random variables per time step: one to find the time of the next
reaction and another to determine which reaction fires at that time.
In [2], Gillespie presented the original SSA, also called the direct method.

Given a state X(t), the direct method is carried out by drawing two uniform
random numbers, U_1, U_2 ∼ U(0, 1), which give the time to, and index of, the next
reaction, i.e.,

\[
j = \min\Big\{ k \in \{1,\ldots,J\} : \sum_{i=1}^{k} \frac{a_i(X(t))}{a_0(X(t))} > U_1 \Big\},
\qquad
\tau_{\min} = \frac{1}{a_0(X(t))} \ln\Big(\frac{1}{U_2}\Big),
\]

where a_0(x) := Σ_{j=1}^J a_j(x). The new state is X(t + τ_min) = X(t) + ν_j, and, by repeating
the above procedure until the final time T, a complete path of the process, X, can be
simulated.
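As an illustration, the following is a minimal Python sketch of the direct method; the interface (a state vector x0, the d × J stoichiometric matrix nu, and a propensities function returning the vector (a_j(x))_{j=1}^J) is an assumption of this sketch and not part of the thesis implementation.

    import numpy as np

    def ssa_path(x0, nu, propensities, T, rng=None):
        """Direct method (SSA): simulate one exact path of X on [0, T].
        x0: initial state (d,); nu: stoichiometric matrix (d, J);
        propensities: function x -> vector (a_j(x))_{j=1..J}."""
        rng = np.random.default_rng() if rng is None else rng
        t, x = 0.0, np.array(x0, dtype=np.int64)
        path = [(t, x.copy())]
        while True:
            a = propensities(x)
            a0 = a.sum()
            if a0 == 0.0:                   # absorbing state: no reaction can fire
                break
            u1, u2 = rng.random(2)
            tau = np.log(1.0 / u2) / a0     # waiting time ~ Exp(a0)
            if t + tau > T:                 # next jump falls after the final time
                break
            j = int(np.searchsorted(np.cumsum(a) / a0, u1, side='right'))
            x += nu[:, j]                   # fire reaction j
            t += tau
            path.append((t, x.copy()))
        return path

For the simple decay model of Example 5.1.2, one exact path is generated by ssa_path(np.array([100]), np.array([[-1]]), lambda x: np.array([1.0 * x[0]]), T=2.0).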
The drawback of this algorithm appears clearly as the sum of the intensities of all
the reactions, a_0(x), becomes large: since all the jump times have to be included in the
time discretization, the corresponding computational work may become unaffordable.
Indeed, the expected number of jump times in the interval (t, t + τ) is
approximately a_0(X(t))τ + o(τ).
5.1.3 The Tau-leap Approximation

In the following, we denote by X̄ : [0, T] × Ω → Z^d the tau-leap approximation of
X. To avoid the computational drawback of the exact methods, i.e., when many
reactions occur during a short time interval, the tau-leap method was proposed in
[4]: given a population, X̄(t), and a time step, τ > 0, the population at time t + τ is
generated by

\[
\bar{X}(t+\tau) = \bar{X}(t) + \sum_{j=1}^{J} \nu_j \, Y_j\big(a_j(\bar{X}(t))\,\tau\big), \tag{5.5}
\]

where {Y_j(λ_j)}_{j=1}^J are independent Poisson distributed random variables with
parameters λ_j = a_j(X̄(t))τ, used to model the number of times that reaction j fires during
the interval (t, t + τ). This is nothing other than a forward Euler discretization of the
stochastic differential equation of the pure jump process (5.3), realized by the Poisson
random measure with state-dependent intensity (see [5]).

In the limit, when τ → 0, the tau-leap method gives the same solution as the exact
methods, using the property that, for a constant propensity, the firing probability in
one reaction channel is independent of the other reaction channels. The total number
of firings in each channel is then a Poisson distributed random variable depending
only on the initial population, X̄(t). The error thus comes from the variation of
a(X(s)) for s ∈ (t, t + τ).
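For comparison with the SSA sketch above, a minimal Python version of the plain (fixed-step) tau-leap method follows; the interface is the same assumed one and, as discussed, nothing prevents the result from leaving Z^d_+.

    import numpy as np

    def tau_leap_path(x0, nu, propensities, T, tau, rng=None):
        """Plain tau-leap (5.5) with a fixed step size tau: propensities are
        frozen at the left endpoint of each step and the state is advanced
        with independent Poisson increments. No exit control is applied."""
        rng = np.random.default_rng() if rng is None else rng
        t, x = 0.0, np.array(x0, dtype=np.int64)
        while t < T:
            dt = min(tau, T - t)
            firings = rng.poisson(propensities(x) * dt)   # Y_j(a_j(x) dt)
            x += nu @ firings                             # may become negative!
            t += dt
        return x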

5.1.4 Outline of this Work

The outline of this work is as follows. In Section 5.2, we derive and give an implementation
of the Chernoff-type bound that guarantees that the one-step exit probability
in the tau-leap method is less than a predefined quantity. We also show that the
Gaussian pre-leap selection step is not accurate and should not be used as a reliable
bound. In Section 5.3, we motivate and give implementation details of the one-step
switching decision rule, which is the key ingredient for generating hybrid paths.
We show how to choose between the SSA and the tau-leap method on the basis of the
current state of the approximated process. Next, we show how to generate hybrid
paths and how to obtain an estimate of the path exit error based on the probability that
one hybrid path exits the Z^d_+ lattice. This estimation of the global exit probability
depends on the expected number of tau-leap steps taken by the hybrid algorithm; it is
easy to prove that this number is finite. Hybrid paths can also be used for estimating
the expected number of steps that the SSA algorithm needs in order to reach the final
time. In Section 5.4, we decompose the total error into three components: the discretization
error, the statistical error and the global exit error, which were studied in
the previous section. To control these errors, we give an algorithm capable of estimating
the error components. We also compute the necessary ingredients for obtaining
the desired estimate, i.e., a time mesh, a bound for the one-step exit probability and
the total number of Monte Carlo hybrid paths to be simulated. These ingredients
are computed by optimizing the expected work of the hybrid method constrained to
the error requirements. In Section 5.5, we present numerical experiments using
well-known examples taken from the literature. Finally, in Section 5.6, we provide
conclusions and suggest directions for future work.

5.2 The Chernoff Bound: One-Step Exit Probabilities

In this section, we derive a Chernoff-type bound that allows us to guarantee that the
one-step exit probability in the tau-leap method is less than a predefined quantity,
δ > 0. This is crucial for controlling the computational global error, E, which is
defined below in Section 5.4. To motivate the main ideas, the bound is first derived
for the single-reaction case and then generalized to several reactions. At the end of
this section, we present an algorithm that efficiently computes the step size.
5.2.1 The Single-reaction Case

Let Q ≡ Q(λ) be a Poisson random variable with parameter λ > 0. Given a non-negative
integer, n, consider the two following upper bounds for P(Q ≥ n): the Klar
bound [21] and a Chernoff-type bound [14], which we derive below. The Klar bound
is given by

\[
P(Q \ge n) \le \Big(1 - \frac{\lambda}{n+1}\Big)^{-1} e^{-\lambda} \frac{\lambda^n}{n!}, \tag{5.6}
\]

and is valid for λ < n + 1, while the Chernoff bound is given by

\[
P(Q \ge n) \le \exp\big( n(1 - \log(n/\lambda)) - \lambda \big), \tag{5.7}
\]

and is valid for λ < n; otherwise, it is trivial.

In order to prove the Chernoff bound (5.7), we first note that the Markov inequality
gives, for every s > 0,

\[
P(Q \ge n) = P\big( e^{sQ} \ge e^{sn} \big) \le \frac{E\big[e^{sQ}\big]}{e^{sn}},
\]

and thus

\[
P(Q \ge n) \le \exp\Big( \inf_{s>0}\{ -sn + \lambda(e^s - 1) \} \Big).
\]

When λ ∈ (0, n), the infimum,

\[
\inf_{s>0}\{ -sn + \lambda(e^s - 1) \},
\]

is achieved at s* = log(n/λ), and its value is n(1 − log(n/λ)) − λ. From this simple
calculation, we obtain the Chernoff bound (5.7).
Given a positive integer, n, representing the state of the system at a certain time,
and δ ∈ (0, 1), we would like to obtain the largest value of λ such that P(Q(λ) ≥ n) ≤ δ.
From the Chernoff bound, we have

\[
n(1 - \log(n/\lambda)) - \lambda \le \log(\delta),
\]

or equivalently,

\[
\frac{\lambda}{n} - \log\Big(\frac{\lambda}{n}\Big) \ge 1 - \frac{\log(\delta)}{n}. \tag{5.8}
\]
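Since the left-hand side of the first inequality above is increasing in λ on (0, n), the largest admissible λ can be found numerically, e.g., by bisection. The following short Python sketch (an illustration, not the thesis code) does this:

    import numpy as np

    def lambda_max(n, delta, tol=1e-12):
        """Largest lambda in (0, n) with n(1 - log(n/lambda)) - lambda <= log(delta).
        The left-hand side increases from -inf to 0 on (0, n), so bisection
        on the root of the shifted function applies."""
        f = lambda lam: n * (1.0 - np.log(n / lam)) - lam - np.log(delta)
        lo, hi = 1e-14, float(n)       # f(lo) < 0 < f(hi) for delta < 1
        while hi - lo > tol * n:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if f(mid) <= 0.0 else (lo, mid)
        return lo

    # lambda_max(10, 1e-6) is roughly 1.0: with n = 10 particles, a Poisson
    # increment of rate about 1 keeps P(Q >= 10) below 1e-6.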

If in the Klar bound (5.6) we neglect the factor

\[
\Big(1 - \frac{\lambda}{n+1}\Big)^{-1},
\]

which lies between 1 and n + 1 when λ ∈ (0, n), then we obtain

\[
e^{-\lambda} \frac{\lambda^n}{n!} \le \delta .
\]

Taking logarithms on both sides and using Stirling's approximation, log(n!) ≈ n log(n) − n,
we arrive exactly at the Chernoff bound (5.8). We
can see in Figure 5.1 that the Klar bound (5.6) is sharp, except when λ gets close
to the singularity at n + 1. The Chernoff bound (5.7) is not as sharp as Klar's
bound but, as we will see in the next subsection, it has a generalization to the more
practical many-reaction case. We observe that the Gaussian approximation in Figure
5.1 performs poorly for small values of λ and is not a bound in general.

5.2.2 The Many-reaction Case

To the best of our knowledge, there is no simple expression for the cumulative distribution
function of a linear combination of independent Poisson random variables.
For that reason, we propose a Chernoff-type bound for estimating the maximum size
of the tau-leap step when many reactions are involved.
Figure 5.1: Let n = 10 and λ ∈ (2, 10). Semi-logarithmic plot of P(Q(λ) ≥ n) (exact
Poisson), the Chernoff bound exp(n(1 − log(n/λ)) − λ), the Klar bound and the
Gaussian approximation.

Consider the following pre-leap check problem: find the largest possible τ such
that, with high probability, the next step of the tau-leap method will take a value in
the Z^d_+ lattice of non-negative integers, i.e.,

\[
P\Big( \bar{X}(t) + \sum_{j=1}^{J} \nu_j \, Y_j\big(a_j(\bar{X}(t))\,\tau\big) \in \mathbb{Z}^d_+ \;\Big|\; \bar{X}(t) \Big) \ge 1 - \delta, \tag{5.9}
\]

for some small δ > 0. Observe that this value of τ depends on X̄(t).

Condition (5.9) can be achieved by solving d auxiliary problems, one for each
coordinate, i = 1, 2, ..., d: find the largest possible τ_i ≥ 0 such that

\[
P\Big( \bar{X}_i(t) + \sum_{j=1}^{J} \nu_{ji} \, Y_j\big(a_j(\bar{X}(t))\,\tau_i\big) < 0 \;\Big|\; \bar{X}(t) \Big) \le \delta_i, \tag{5.10}
\]

where δ_i = δ/d and ν_{ji} is the i-th coordinate of the j-th reaction channel, ν_j.
Inequality (5.9) is then fulfilled if we let τ := min{τ_i : i = 1, 2, ..., d}.

In the following sections, we show how to find the largest time steps, τ_i.
Defining the function τ_i(s)

Consider the random variable Q_i(t, τ_i), representing the opposite of the increment of
the process X̄_i(t):

\[
Q_i(t, \tau_i) := \sum_{j=1}^{J} (-\nu_{ji}) \, Y_j\big(a_j(\bar{X}(t))\,\tau_i\big).
\]

Observe that Q_i(t, τ_i) is a linear combination of J independent Poisson random variables
whose intensities are multiples of τ_i.
For all s > 0, using the Markov inequality, we obtain an upper bound for the
probabilities we want to control:

\[
P\big( Q_i(t,\tau_i) > \bar{X}_i(t) \mid \bar{X}(t) \big)
= P\big( \exp(s Q_i(t,\tau_i)) > \exp(s \bar{X}_i(t)) \mid \bar{X}(t) \big)
\le \frac{E\big[\exp(s Q_i(t,\tau_i))\big]}{\exp(s \bar{X}_i(t))}. \tag{5.11}
\]

Observe that the independent Poisson random variables, Y_j(a_j(X̄(t))τ_i), have
moment-generating functions

\[
M_j(s) = \exp\big( a_j(\bar{X}(t))\,\tau_i\,(e^s - 1) \big),
\]

and, therefore,

\[
E\big[\exp(s Q_i(t,\tau_i))\big] = \prod_{j=1}^{J} M_j(-s\nu_{ji})
= \exp\Big( \tau_i \sum_{j=1}^{J} a_j(\bar{X}(t))\big(e^{-s\nu_{ji}} - 1\big) \Big). \tag{5.12}
\]

By combining (5.11) and (5.12), we obtain the Chernoff bound for the multi-reaction
case, namely

\[
P\big( Q_i(t,\tau_i) > \bar{X}_i(t) \mid \bar{X}(t) \big)
\le \inf_{s>0} \exp\Big( -s\bar{X}_i(t) + \tau_i \sum_{j=1}^{J} a_j(\bar{X}(t))\big(e^{-s\nu_{ji}} - 1\big) \Big). \tag{5.13}
\]
To avoid the computational problem of finding the above infimum exactly, and to
guarantee that

\[
P\big( Q_i(t,\tau_i) > \bar{X}_i(t) \mid \bar{X}(t) \big) \le \delta_i ,
\]

we proceed as follows. First, according to (5.10) and (5.13), we impose

\[
-s\bar{X}_i(t) + \tau_i \sum_{j=1}^{J} a_j(\bar{X}(t))\big(e^{-s\nu_{ji}} - 1\big) = \log(\delta_i).
\]

Using this fact, we can express τ_i as a function of s:

\[
\tau_i(s) = \frac{\log(\delta_i) + s\bar{X}_i(t)}{-a_0\big(\bar{X}(t)\big) + \sum_{j=1}^{J} a_j(\bar{X}(t))\,e^{-s\nu_{ji}}}, \tag{5.14}
\]

where a_0(X̄(t)) := Σ_{j=1}^J a_j(X̄(t)).

Study of τ_i(s)

In this section, we study how much we can increase τ_i while satisfying condition
(5.10). Obviously, it is satisfied for τ_i = 0^+. By a continuity argument, we want
to obtain τ_i*, defined as the maximum τ_i such that every point of the interval [0, τ_i]
satisfies (5.10). Note that τ_i* could be +∞.

We discuss how, depending on certain relations among the pairs {(a_j(X̄(t)), ν_{ji})}_{j=1}^J,
we can conclude that τ_i* is either a real number or +∞. First of all, if ν_{ji} ≥ 0 for all j,
then τ_i* must be +∞, since no reaction is pointing to zero. From now on, we assume
that, given the coordinate i, there is at least one reaction pointing to zero, i.e.,

\[
\exists j \text{ such that } \nu_{ji} < 0. \tag{5.15}
\]
The denominator of (5.14) is the function

\[
D_i(s) := -a_0\big(\bar{X}(t)\big) + \sum_{j=1}^{J} a_j(\bar{X}(t))\,e^{-s\nu_{ji}}, \tag{5.16}
\]

which is convex, since it is a positive linear combination of the convex functions e^{−sν_{ji}}
plus the constant term −a_0(X̄(t)). We also notice that D_i(0) = 0 and D_i(+∞) = +∞
when (5.15) holds.

On the other hand, the numerator of (5.14),

\[
R_i(s) := \log(\delta_i) + s\bar{X}_i(t),
\]

is a straight line crossing the vertical axis at log(δ_i) < 0, and we can assume that its
slope, X̄_i(t), is positive. Otherwise, the X̄(t) process is at the boundary of Z^d_+, and
therefore no reaction is pointing outside the lattice, Z^d_+; in that case we set τ_i* = +∞.
Let us define s_i as the root of the numerator, R_i(s), i.e.,

\[
s_i := -\log(\delta_i)/\bar{X}_i(t). \tag{5.17}
\]

By direct substitution of (5.17) into (5.16), we obtain

\[
D_i(s_i) = -a_0(\bar{X}(t)) + \sum_{j=1}^{J} a_j(\bar{X}(t))\,\delta_i^{\,\nu_{ji}/\bar{X}_i(t)}, \tag{5.18}
\]

and

\[
D_i'(s_i) = -\sum_{j=1}^{J} a_j(\bar{X}(t))\,\nu_{ji}\,\delta_i^{\,\nu_{ji}/\bar{X}_i(t)}. \tag{5.19}
\]

In order to determine whether τ_i* < ∞ or τ_i* = ∞, we have to analyze all the possible
cases regarding the pair (R_i(s), D_i(s)).
Indeed, note that

\[
D_i'(0) = -\sum_{j=1}^{J} a_j(\bar{X}(t))\,\nu_{ji},
\]

and, if D_i'(0) ≥ 0, which can be interpreted as a drift pointing to the boundary, then
D_i(s) is monotonically increasing in [0, +∞). This situation is illustrated in the figure
below: in the left panel, we see the pair (R_i(s), D_i(s)); in the right panel, we see the
quotient τ_i(s) = R_i(s)/D_i(s). The function τ_i achieves its maximum, τ_i*, at a unique
point, s̃_i.

Figure: Left: numerator R_i(s) and denominator D_i(s). Right: quotient
τ_i(s) = R_i(s)/D_i(s). Both plots are for the case D_i'(0) ≥ 0.

If D_i'(0) < 0, which can be interpreted as a drift pointing to +∞, the value of
τ_i* depends on X̄_i(t), i.e., on the size of the slope of R_i(s). Observe that D_i(s) is
then negative in an interval (0, d_i), with D_i(d_i) = 0, and in general there is no
closed form for d_i. Also, since D_i(s) and R_i(s) may have opposite signs for some
s ≤ max(s_i, d_i), this allows for artificially negative values of τ_i, which should not be
taken into account.

The value of τ_i* is finite or +∞ according to the sign of D_i(s_i). These three cases
are shown in the left panel of Figure 5.2. When X̄_i(t) is large enough, i.e., when
D_i(s_i) < 0, we can see in the right panel of Figure 5.2 that τ_i* = +∞. This is true
because the limit of τ_i(s), as s → d_i^+, is +∞. Therefore, if X̄_i(t) is far from the
boundary and the drift is pointing to +∞, we can take τ_i to be as large as we wish.
Figure 5.2: The case Σ_{j=1}^J a_j(X̄(t)) ν_{ji} > 0. Left: relative positions of
(R_i(s), D_i(s)), depending on the sign of D_i(s_i). Right: τ_i(s) = R_i(s)/D_i(s) in the
case D_i(s_i) < 0.

The two other cases are as follows. If D_i(s_i) > 0, it means that X̄_i(t) is, in a
certain sense, close to the boundary, and, even if the drift is pointing to +∞, there
exists an upper bound for τ_i. This is illustrated in the left part of Figure 5.3, where
τ_i* is the maximum to the right of s_i. Finally, if D_i(s_i) = 0, then τ_i* can be obtained
as the limit of τ_i(s) as s → s_i^+ (note that d_i = s_i in this case). By l'Hôpital's rule,
we have that τ_i* = X̄_i(t)/D_i'(s_i).

Figure 5.3: The other two cases for τ_i(s) when Σ_{j=1}^J a_j(X̄(t)) ν_{ji} > 0. Left: D_i(s_i) > 0.
Right: D_i(s_i) = 0.

We can summarize the previous discussion as follows. If ν_{ji} ≥ 0 for all j, then
τ_i* = +∞; otherwise, we have the following three cases:

1. D_i(s_i) > 0. In this case, τ_i(s_i) = 0 and D_i(s) is positive and increasing for all
s ≥ s_i. Therefore, τ_i(s) is equal to the ratio of two positive increasing functions.
The numerator, R_i(s), is a linear function and the denominator, D_i(s), grows
exponentially fast. Then, there exist an upper bound, τ_i*, and a unique number,
s̃_i, which satisfies τ_i(s̃_i) = τ_i*. We develop an algorithm for approximating s̃_i,
using the relation τ_i'(s̃_i) = 0.

2. If D_i(s_i) < 0, then τ_i* = +∞.

3. If D_i(s_i) = 0, then τ_i* = X̄_i(t)/D_i'(s_i).

Approximating s̃_i

In this section, we present a simple and fast algorithm for approximating s̃_i, which
was defined in case (1) above. We proceed in two steps. In the first step, we find
an initial guess, s*_{i,0}, and in the second one, we improve this guess and obtain s*_{i,1}.
Therefore, τ_i* = τ_i(s̃_i) will be approximated by τ_i(s*_{i,1}).

From (5.15), the equation τ_i'(s) = 0 is equivalent to

\[
-a_0(\bar{X}(t)) + \sum_{j=1}^{J} a_j(\bar{X}(t))\, e^{-s\nu_{ji}}
= (s - s_i) \sum_{j=1}^{J} a_j(\bar{X}(t))\,(-\nu_{ji})\, e^{-s\nu_{ji}}. \tag{5.20}
\]

Let us define

\[
\hat{s} := s - s_i, \qquad b_{ji}(\bar{X}(t)) := a_j(\bar{X}(t))\,\delta_i^{\,\nu_{ji}/\bar{X}_i(t)} > 0.
\]

As a consequence,

\[
e^{-s\nu_{ji}} = \delta_i^{\,\nu_{ji}/\bar{X}_i(t)}\, e^{-\hat{s}\nu_{ji}},
\]

and therefore (5.20) can be written as

\[
\sum_{j=1}^{J} b_{ji}(\bar{X}(t))\, e^{-\hat{s}\nu_{ji}}
= a_0(\bar{X}(t)) + \hat{s} \sum_{j=1}^{J} b_{ji}(\bar{X}(t))\,(-\nu_{ji})\, e^{-\hat{s}\nu_{ji}}. \tag{5.21}
\]

Once we introduce the auxiliary functions φ_{ji},

\[
\varphi_{ji}(y) := e^{-\nu_{ji} y}\,(1 + \nu_{ji}\, y),
\]

(5.21) becomes equivalent to finding the root ŝ of G, where

\[
G(y) = -a_0(\bar{X}(t)) + \sum_{j=1}^{J} b_{ji}(\bar{X}(t))\,\varphi_{ji}(y),
\]

so that s̃_i = s_i + ŝ.

The left graph in Figure 5.4 shows the shape of φ_{ji}, depending on the sign of ν_{ji}. We
deduce that G is a decreasing function such that G(0) = D_i(s_i) and G(+∞) = −∞.

Figure 5.4: Left: function φ(s) for different values of ν (positive, zero, negative).
Right: function G and its approximating parabola at the initial guess.

By neglecting the exponential term in φ_{ji}, we can obtain an initial guess for ŝ,
i.e.,

\[
s^*_{i,0} = \frac{-a_0(\bar{X}(t)) + \sum_{j=1}^{J} b_{ji}(\bar{X}(t))}{\sum_{j=1}^{J} b_{ji}(\bar{X}(t))\,(-\nu_{ji})}
= \frac{D_i(s_i)}{D_i'(s_i)} .
\]

As we observed in case (1), the values of D_i(s_i) and D_i'(s_i) are positive, and our
initial guess, s*_{i,0}, is therefore a positive number.

In the right graph in Figure 5.4, we can see that the parabola obtained as the
second-order Taylor approximation of G at s*_{i,0} is a good approximation of G close to its
root. Therefore, we obtain s*_{i,1} as the largest root of the approximating parabola.
By evaluating τ_i(s*_{i,1}), we obtain a sharp lower bound of sup_{s>0} τ_i(s).
An expression for s*_{i,1} in terms of G and its derivatives up to second order,
evaluated at s*_{i,0}, is given by

\[
s^*_{i,1} = s^*_{i,0} + \frac{-G'(s^*_{i,0}) + \sqrt{G'(s^*_{i,0})^2 - 2\,G''(s^*_{i,0})\,G(s^*_{i,0})}}{G''(s^*_{i,0})}. \tag{5.22}
\]
An efficient implementation for computing τ_i(s*_{i,1}) ≈ τ_i* can be found in Algorithm
5 (see the definition of τ_i* in case (1) at the end of Section 5.2.2).

Algorithm 5 Computes the Chernoff tau-leap step size. Inputs: the current state
of the approximate process, X̄, the propensity functions evaluated at X̄, (a_j(X̄))_{j=1}^J,
the stoichiometric matrix, ν_{ji}, and the one-step exit probability bounds, δ_i = δ/d.
Output: τ. Notes: for a fixed coordinate, i, such that (5.15) is fulfilled (otherwise
τ_i = +∞), this algorithm determines whether or not τ_i* is finite. When τ_i* is finite,
this algorithm computes an approximation of τ_i(s*_{i,1}), based on (5.22).

Require: a_0 ← Σ_{j=1}^J a_j > 0
1: for i = 1 to d do
2:   if ∃j : ν_{ji} < 0 and X̄_i(t) > 0 then
3:     x ← X̄_i(t)
4:     b_j ← a_j δ_i^{ν_{ji}/x}
5:     b̂ ← Σ_{j=1}^J b_j
6:     if b̂ − a_0 < 0 then
7:       τ_i ← +∞
8:     else
9:       if b̂ − a_0 > 0 then
10:        s ← (a_0 − b̂)/(Σ_{j=1}^J b_j ν_{ji})
11:        ξ_j ← b_j exp(−s ν_{ji})
12:        c_p ← Σ_{j=1}^J ξ_j ν_{ji}^p,  p = 0, 1, 2, 3
13:        α ← (c_3 s − c_2)/2
14:        β ← −c_2 s
15:        γ ← −a_0 + c_0 + c_1 s
16:        s ← s + (−β + √(β² − 4αγ))/(2α)
17:        τ_i ← s x/(−a_0 + Σ_{j=1}^J b_j exp(−s ν_{ji}))
18:      else
19:        τ_i ← x/(−Σ_{j=1}^J b_j ν_{ji})
20:      end if
21:    end if
22:  else
23:    τ_i ← +∞
24:  end if
25: end for
26: return min{τ_1, ..., τ_d}
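A direct Python transcription of Algorithm 5 may read as follows; this is a sketch under the same assumed array interface as before, with delta the one-step exit bound, split uniformly as δ_i = δ/d.

    import numpy as np

    def chernoff_tau(x, a, nu, delta):
        """Chernoff tau-leap step size for state x (d,), propensities a (J,),
        stoichiometric matrix nu (d, J), one-step exit bound delta."""
        d, J = nu.shape
        a0 = a.sum()
        delta_i = delta / d
        taus = np.full(d, np.inf)
        for i in range(d):
            if not (np.any(nu[i] < 0) and x[i] > 0):
                continue                        # no reaction points to zero: tau_i = +inf
            b = a * delta_i ** (nu[i] / x[i])   # b_j = a_j * delta_i^{nu_ji / x_i}
            bhat = b.sum()
            if bhat - a0 < 0:                   # D_i(s_i) < 0: tau_i = +inf
                continue
            if bhat - a0 > 0:                   # case (1): correction step (5.22)
                s = (a0 - bhat) / (b @ nu[i])   # initial guess D_i(s_i)/D_i'(s_i)
                xi = b * np.exp(-s * nu[i])
                c = [(xi * nu[i] ** p).sum() for p in range(4)]
                alpha = 0.5 * (c[3] * s - c[2])      # G''(s)/2
                beta = -c[2] * s                     # G'(s)
                gamma = -a0 + c[0] + c[1] * s        # G(s)
                # assumes a real root of the approximating parabola, as in case (1)
                s = s + (-beta + np.sqrt(beta**2 - 4*alpha*gamma)) / (2*alpha)
                taus[i] = s * x[i] / (-a0 + (b * np.exp(-s * nu[i])).sum())
            else:                               # case (3): D_i(s_i) = 0
                taus[i] = x[i] / (-(b @ nu[i]))
        return taus.min()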
5.2.3 Computational Work of the Pre-Leap Methods: Chernoff Bound versus Gaussian Approximation

In this section, we first summarize an alternative pre-leap method, introduced in [15],
which uses a Gaussian-type approximation. We then compare the algorithm that
computes the Chernoff step size with the one that computes the Gaussian-type step
size, τ_gau.
Given δ > 0, we want to find the largest τ_gau such that

\[
P\big( \bar{X}_i(t + \tau_{gau}) < 0 \mid \bar{X}(t) \big) \le \delta_i, \qquad i = 1, \ldots, d. \tag{5.23}
\]

Using (5.5), we get

\[
P\big( \bar{X}_i(t) - Q_i(t, \tau_{gau}) < 0 \mid \bar{X}(t) \big)
= P\big( Q_i(t, \tau_{gau}) > \bar{X}_i(t) \mid \bar{X}(t) \big)
= 1 - P\big( Q_i(t, \tau_{gau}) \le \bar{X}_i(t) \mid \bar{X}(t) \big) \le \delta_i,
\]

where Q_i(t, τ_gau) := Σ_{j=1}^J (−ν_{ji}) Y_j(a_j(X̄(t)) τ_gau).

Now, we approximate Q_i(t, τ_gau) by

\[
\hat{Q}_i(t, \tau_{gau}) := E[Q_i(t, \tau_{gau})] + \sqrt{\mathrm{Var}[Q_i(t, \tau_{gau})]}\; \mathcal{N},
\]

where N ∼ N(0, 1) is a standard normal random variable. We get

\[
P\big( \hat{Q}_i(t, \tau_{gau}) \le \bar{X}_i(t) \mid \bar{X}(t) \big)
= \Phi\left( \frac{\bar{X}_i(t) + \sum_{j=1}^{J} \nu_{ji}\, a_j(\bar{X}(t))\,\tau_{gau}}
{\sqrt{\sum_{j=1}^{J} \nu_{ji}^2\, a_j(\bar{X}(t))\,\tau_{gau}}} \right),
\]

where Φ is the cumulative distribution function of the standard normal distribution.
Finally, let z_{δ_i} satisfy Φ(z_{δ_i}) = 1 − δ_i. Then, the τ_gau that approximately solves (5.23)
is obtained from

\[
\bar{X}_i(t) + \sum_{j=1}^{J} \nu_{ji}\, a_j(\bar{X}(t))\,\tau_{gau}
= z_{\delta_i} \sqrt{\sum_{j=1}^{J} \nu_{ji}^2\, a_j(\bar{X}(t))\,\tau_{gau}} .
\]

Algorithm 6 efficiently computes the step size, τ_gau, using the Gaussian approximation.

Algorithm 6 Computes the tau-leap step size using a Gaussian approximation. Inputs:
the current state of the approximate process, X̄, the propensity functions evaluated
at X̄, (a_j(X̄))_{j=1}^J, the stoichiometric matrix, ν_{ji}, and the bounds δ_i. Output: τ.
Notes: for a fixed coordinate, i, this algorithm determines whether or not τ_i* is
finite. When τ_i* is finite, this algorithm computes its value.

Require: Σ_{j=1}^J a_j > 0
1: for i = 1 to d do
2:   x ← X̄_i(t)
3:   c_p ← Σ_{j=1}^J a_j ν_{ji}^p,  p = 1, 2
4:   ρ ← z_{δ_i}²
5:   α ← ρ² c_2² − 4ρ c_1 c_2 x
6:   if c_2 = 0 or (c_1 > 0 and α < 0) then
7:     τ_i ← +∞
8:   else
9:     if c_2 ≠ 0 and (c_1 < 0 or (c_1 > 0 and α ≥ 0)) then
10:      β ← ρ c_2 − 2 c_1 x
11:      τ_i ← (β − √α)/(2 c_1²)
12:    else
13:      τ_i ← x²/(ρ c_2)
14:    end if
15:  end if
16: end for
17: return min{τ_1, ..., τ_d}
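An analogous Python sketch of Algorithm 6 follows (same assumed interface; scipy's norm.ppf provides z_{δ_i}):

    import numpy as np
    from scipy.stats import norm

    def gaussian_tau(x, a, nu, delta):
        """Tau-leap step size from the Gaussian approximation of Q_i(t, tau)."""
        d, J = nu.shape
        delta_i = delta / d
        rho = norm.ppf(1.0 - delta_i) ** 2      # rho = z_{delta_i}^2
        taus = np.full(d, np.inf)
        for i in range(d):
            c1 = a @ nu[i]                      # sum_j a_j nu_ji
            c2 = a @ nu[i] ** 2
            alpha = rho**2 * c2**2 - 4.0 * rho * c1 * c2 * x[i]
            if c2 == 0 or (c1 > 0 and alpha < 0):
                continue                        # tau_i = +inf
            if c1 != 0:                         # real roots exist here
                beta = rho * c2 - 2.0 * c1 * x[i]
                taus[i] = (beta - np.sqrt(alpha)) / (2.0 * c1**2)
            else:                               # c1 == 0
                taus[i] = x[i]**2 / (rho * c2)
        return taus.min()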

To quantify the relative efficiency of Algorithm 5 vs Algorithm 6, we use the


following nominal operation count convention (based on McMahon [22]): add-mul,
subtraction, and division 1 flop; square root 4 flops; and exp function 8 flops. We do
not count the flow control work, and we assume d=1 because it is easily extended to
d>1. Moreover, we are not taking into account the memory access cost, which usually
is dominant. The total flop count for Algorithm 5 is 33 + 26J, and for Algorithm
135
6 it is 19 + 2J. The ratio tends to 13 when J ! 1. However, the actual runtime
in the MATLAB implementation is, in all the examples we tested, more optimistic
than the predicted using the flop count. Empirically, we observed that the dominant
computational work of the hybrid algorithm at each step is due to the simulation of
a Poisson random variable (see [23] for details). The additional work of computing
the Cherno↵ step size is, in fact, almost negligible.
In Figure 5.5, we compare the Chernoff bound and the Gaussian approximation
for the simple decay model, with initial condition X_0 = 100 (see
Section 5.5). The Chernoff bound appears to be conservative and it holds for any δ,
which is not the case for the Gaussian approximation, whereas their computational
works are of the same order. We can see that, in the Gaussian case, the approximation
does not attain the required one-step exit probability, with a confidence level of 95%,
for most δ.
Figure 5.5: The Chernoff bound vs. the Gaussian approximation in the simple decay
model example, with initial condition X_0 = 100 (see Section 5.5). Left: the empirical
one-step exit probability versus the bound δ, with asymptotic confidence intervals (95%)
and a reference line with unit slope (solid line), for the Chernoff tau-leap. Missing
confidence intervals mean that the values are zero or negative. Right: the Gaussian
approximation case. We can observe that the Chernoff bound holds for any δ, with a
confidence level of 95%, which is not the case for the Gaussian approximation.
5.3 The One-step Switching Rule and Hybrid Trajectories

In this section, we first present a one-step switching rule that, given the current state
of the approximate process, X̄(t), adaptively determines whether to use an exact
or an approximate method for the next step. Then, we present an algorithm for
simulating a whole hybrid path. This algorithm consists of a certain number of exact
and approximate steps. Next, we estimate the probability that one hybrid path exits
the lattice, Z^d_+, which is an event that depends on the expected number of tau-leap
steps, as we will see. Finally, we show how to estimate, based only on hybrid paths,
the expected number of steps of a pure SSA path.

5.3.1 The One-step Switching Rule Algorithm

Here, we provide a justification for the one-step switching rule, as described
in Algorithm 7.

Let x = X̄(t) be the current state of the approximate process, X̄. The
expected time step of the SSA algorithm is then given by 1/a_0(x). Let τ_Ch = τ_Ch(x, δ) be the
Chernoff tau-leap step, obtained using Algorithm 5. To move one step forward using
the SSA method, we should compute at least a_0(x) and sample two uniform random
variables. On the other hand, to move one step forward using the Chernoff tau-leap
method, we not only have to compute τ_Ch (discussed at the end of Section 5.2), but we
also have to generate J Poisson random variables, where J is the number of reaction
channels. It is critical to observe that the computational work of generating J Poisson
random variables is much larger than the computational work of generating only two
uniform random variables. This computational work can be measured, for example,
as the average execution time of the operations involved.
We now describe K_1 and K_2. In order to avoid the overhead caused by unnecessary
computations of τ_Ch, we first estimate the computational work of moving forward from
the current time, t, to the next grid point, T_0, by using the SSA method. If this work is
less than the work of computing τ_Ch, we take an exact step. This motivates us to define
K_1 as the ratio between the work of computing τ_Ch and the work of computing a_0(x)
plus sampling two uniform random variables. Otherwise, we compute τ_Ch and decide
whether to take an SSA step or a tau-leap one, according to the comparison between
τ_Ch and K_2/a_0(x). Here, K_2 = K_2(x, δ) is defined as the work of taking a Chernoff tau-leap
step given the current state of the process, divided by the work of taking an SSA
step plus the work of computing τ_Ch. As we mentioned, there is computational work
associated with each type of step. In the first case, when K_1/a_0(x) > T_0 − t, the work
is C_1, and it includes the computation of 1/a_0(x) and the generation of two uniform
random variates. In the same way, when K_1/a_0(x) > T_0 − t is false and K_2/a_0(x) > τ_Ch
is true, the work is C_2, and it involves the work contained in C_1 plus the work of
computing τ_Ch(x, δ), which is denoted by C_3. On the other hand, when a Chernoff
tau-leap step is taken, we have not only the constant work, C_3, but also a variable
work, which is the work of generating the Poisson random variates. The latter is a
function of the propensities of all the reaction channels, namely, of a(x)τ_Ch(x, δ).
We model the computational work of generating one Poisson random variate according
to [23], and denote this work by C_P(·). In the Gamma simulation method developed
by Ahrens and Dieter in [23], which is used by MATLAB, the work grows like
b_1 + b_2 ln λ, where λ > 15 is the rate of the Poisson random variable. For λ ≤ 15,
the growth is linear. In practice, it is possible to estimate b_1 and b_2 using a Monte
Carlo method with a least squares fit, as shown in Figure 5.6.

Summarizing,

\[
K_1 := \frac{C_3}{C_1}, \qquad
K_2(\bar{X}(t), \delta) := \frac{C_3 + \sum_{j=1}^{J} C_P\big(a_j(\bar{X}(t))\,\tau_{Ch}(\bar{X}(t), \delta)\big)}{C_1 + C_3}.
\]

Observe that K_2(x, δ) → (C_3 + J b_1)/(C_1 + C_3) =: C̃ > 0 as δ → 0.
Here, we estimate the coefficients (offline precomputed, machine-dependent quantities)
C_1, C_2, C_3, b_1, and b_2 by computing average execution times of the corresponding
machine code block (in this case, MATLAB code).
Figure 5.6: Left: the computational work (runtime) model for generating a Poisson
random variate, using the Gamma method of Ahrens and Dieter [23] (actual simulation
runtimes and least squares fit). Right: linear growth detail, for λ ∈ [0, 15].


Algorithm 7 The one-step switching rule. Inputs: the current time, t, the current
state of the approximate process, X̄(t), the propensity functions, (a_j(X̄(t)))_{j=1}^J, and
the next grid point, T_0. Outputs: the method and τ. Notes: based on E[τ_SSA(X̄(t))] =
1/a_0(X̄(t)) and τ_Ch(X̄(t), δ), this algorithm adaptively selects which method to use:
SSA or TL. We denote by τ_SSA (τ_Ch) the step size when the decision is to use the
SSA (tau-leap) method.

Require: a_0 ← Σ_{j=1}^J a_j > 0
1: if K_1/a_0 < T_0 − t then
2:   τ_Ch ← Algorithm 5
3:   if τ_Ch < K_2(X̄(t), δ)/a_0 then
4:     return (SSA, τ_SSA)
5:   else
6:     return (TL, τ_Ch)
7:   end if
8: else
9:   return (SSA, τ_SSA)
10: end if
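In Python, the rule could be sketched as follows, with chernoff_tau from the sketch of Algorithm 5 and with the work constant K1 and the state-dependent function K2_fn precomputed offline, as described above (these names are illustrative, not the thesis code):

    import numpy as np

    def one_step_rule(t, x, a, nu, T0, delta, K1, K2_fn, rng=None):
        """Sketch of the one-step switching rule (Algorithm 7). Returns the
        selected method and the corresponding step size."""
        rng = np.random.default_rng() if rng is None else rng
        a0 = a.sum()
        if K1 / a0 >= T0 - t:                   # cheap first check: no tau_Ch needed
            return 'SSA', rng.exponential(1.0 / a0)
        tau_ch = chernoff_tau(x, a, nu, delta)  # second check needs tau_Ch
        if tau_ch < K2_fn(x, delta) / a0:
            return 'SSA', rng.exponential(1.0 / a0)
        return 'TL', tau_ch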

We now briefly describe Algorithm 7. The first decision is made by comparing
the expected SSA step size with the remaining time until the next grid point, T_0.
To interpret this rule, we first assume that T_0 − t tends to zero. Then, the selected
method tends to be the SSA. This decision rule favors the SSA over the tau-leap and
trivially guarantees the Chernoff bound. In the case of problems where the SSA
method is more convenient, the advantage is obvious: it is not necessary to superfluously
compute the tau-leap step size. On the other hand, this choice has reasonable
computational work in terms of choosing the SSA over the tau-leap, since there is
little time left until T_0. Now assume that K_1/a_0 tends to infinity; that is, a_0 tends
to zero. Then, the reasonable choice is the SSA, because the Chernoff tau-leap step size
tends to zero in this case. It should be noted that this first decision rule incurs no extra
computational work, because a_0 must be computed anyway. If K_1/a_0 < T_0 − t holds,
then the tau-leap size is computed and the second decision is made (line 3). In this
case, first assume that τ_Ch tends to zero. Then, the selected method tends to be the SSA,
which is a natural choice. If, on the contrary, τ_Ch tends to infinity, the chosen method
tends to be the tau-leap, which again is a natural choice. Now, assume that K_2/a_0
tends to infinity. Then, a reasonable choice is the SSA, because the step size is large and
the bound is guaranteed. If K_2/a_0 tends to zero, the reasonable choice is the tau-leap.
A summary of the one-step switching rule decisions is given in Table 5.1.

              If ... tends to       ∞                   0
Decision 1:   T_0 − t               go to Decision 2    SSA
              K_1/a_0               SSA                 TL
Decision 2:   τ_Ch                  TL                  SSA
              K_2/a_0               SSA                 TL

Table 5.1: One-step switching rule summary. Decision 1 is made at line 1 of Algorithm
7, whereas Decision 2 is made at line 3.

Remark 5.3.1. In Figure 5.7, we illustrate the result of the one-step switching
rule in the gene transcription and translation model (see Section 5.5). As δ (the
parameter that controls the one-step exit probability) decreases, the SSA region, in the
state space of the problem, increases. We observe that, for δ = 10^{−2}, almost all the
state space is a Chernoff tau-leap region. For smaller δ, we observe that, if the number
of proteins (y-axis) is zero, and the number of mRNA molecules (x-axis) is large enough, the
states belong to the tau-leap region, because the propensity of the reactions pointing
outside the lattice is weaker than the propensity of the reactions pointing inside the
lattice. When the number of proteins increases, there is a narrow region in which the
propensity of the reactions pointing out dominates, and consequently, the switching
rule chooses the SSA method. After that, the Chernoff tau-leap is preferred. The
situation is almost symmetric with respect to the axis x = y.

Figure 5.7: Regions of the one-step switching rule in the gene transcription and
translation model (see Section 5.5), plotted as proteins versus mRNA. The blue and red
dots show the Chernoff tau-leap and the SSA regions, respectively. From left to right,
δ = 10^{−2}, 10^{−4}, 10^{−6}.

Remark 5.3.2. According to Algorithm 7, the selection of the simulation method
depends on the current state, x, of the approximate process, X̄, through the total
propensity, a_0(x), which is a measure of the activity of the system around the state,
x. High activity around x leads to the Chernoff tau-leap method. Therefore, Algorithm
7 reveals the existence of two scales (low/high) of activity that determine whether to
choose an exact or an approximate simulation method. Observe that the scale of activity
depends not only on x but also on the one-step exit probability bound, δ, through the
Chernoff step size, τ_Ch, and on the time grid.

5.3.2 The Hybrid Algorithm

In this subsection, we present a novel algorithm that adaptively switches between the
approximate (Chernoff tau-leap) and the exact (SSA) method to generate a whole
hybrid path. Algorithm 8 presents this idea.

On the one hand, a path generated by an exact method never exits the lattice,
Z^d_+, although the computational work could be unaffordable due to the many small
inter-arrival times typically occurring when the process is "far" from the boundary. On the
other hand, a tau-leap path, which may be cheaper than an exact one, could leave
the lattice at any step; whether it does depends on the size of the next time step and on the
current state of the approximate process, X̄(t). This one-step exit probability could be large,
especially when the approximate process is "close" to the boundary. In Section 5.2,
we showed how to control this one-step exit probability adaptively, by adjusting the
tau-leap step size. As we previously mentioned, a hybrid path consists of a certain
number of exact and approximate steps; a hybrid path could therefore leave the
lattice. In Section 5.3.3, we show how to estimate and control the probability of this
event.

Given a problem, Algorithm 8 returns the last system state, X̄(t_K), and its
respective time, t_K, such that the process belongs to the lattice. At each time, t_k, Algorithm
7 chooses the method to use (exact or approximate) for taking the (k+1)-th step
and its size.

5.3.3 The Global Exit Probability Bound

Once we introduce the hybrid approximate process, X̄, one issue is to estimate the
probability that one hybrid path exits the lattice, Z^d_+. Let Ω̄ be the sample space for
the set of all hybrid paths generated by Algorithm 8. The event A = {ω̄ ∈ Ω̄ : t_K = T}
means that the whole hybrid path, (X̄(t_k, ω̄))_{k=0}^K, belongs to the lattice, Z^d_+. Among
these paths, the number of successful leaps using the tau-leap method is N_TL(ω̄) ≡
N_TL. Then,

\[
\bar{\Omega} = A^c \cup A = \{\bar{\omega} \in \bar{\Omega} : t_K < T\} \cup \{\bar{\omega} \in \bar{\Omega} : t_K = T\}.
\]
Algorithm 8 The hybrid Chernoff tau-leap algorithm. Inputs: the initial state, X(0), the
propensity functions, (a_j)_{j=1}^J, the stoichiometric vectors, ν = (ν_j)_{j=1}^J, and the final
time, T. Outputs: a sequence of states, (X̄(t_k))_{k=0}^K ⊂ Z^d_+, such that t_K ≤ T. If
t_K < T, then the path exited the Z^d_+ lattice before the final time T. It also returns the
number of times, N_TL, the tau-leap method was successfully applied (i.e., X̄(t_k) ∈ Z^d_+
and applying the tau-leap method gives X̄(t_{k+1}) ∈ Z^d_+), the number of SSA steps such
that K_1/a_0(X̄(t)) > T_0 − t is true, N_{SSA,K_1}, and the number of SSA steps such that
K_1/a_0(X̄(t)) > T_0 − t is false and K_2(X̄(t))/a_0(X̄(t)) > τ_Ch is true, N_{SSA,K_2} (see
Algorithm 7). Notes: given the current state, nextSSA computes the next state using
the SSA method. Here, t_i denotes the current time at the i-th step.

1: i ← 0, t_i ← 0, X̄(t_i) ← X(0), Z̄ ← X(0)
2: while t_i < T do
3:   T_0 ← next grid point greater than t_i
4:   (m, τ) ← Algorithm 7 with (t_i, Z̄, (a_j(Z̄))_{j=1}^J, T_0)
5:   if m = SSA then
6:     N_SSA ← N_SSA + 1
7:     if t_i + τ < T then
8:       Z̄ ← nextSSA(Z̄)
9:     end if
10:    t_{i+1} ← min{T, t_i + τ}
11:  else
12:    τ ← min{τ, T − t_i}
13:    Z̄ ← Z̄ + P(a(Z̄)τ) ν
14:    if Z̄ ∈ Z^d_+ then
15:      N_TL ← N_TL + 1
16:      t_{i+1} ← t_i + τ
17:    else
18:      return ((X̄(t_k))_{k=0}^i, N_TL, N_SSA)
19:    end if
20:  end if
21:  i ← i + 1
22:  X̄(t_i) ← Z̄
23: end while
24: return ((X̄(t_k))_{k=0}^i, N_TL, N_SSA)
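A compact Python sketch of Algorithm 8 follows; it assumes the one_step_rule sketched above (with the constants K1 and K2_fn already bound, e.g., via functools.partial with keyword arguments), a time mesh grid containing T, and the array interface used throughout these sketches.

    import numpy as np

    def hybrid_path(x0, nu, propensities, T, grid, delta, rule, rng=None):
        """One hybrid SSA/Chernoff tau-leap path. Returns the recorded path,
        the number of successful tau-leap steps and the number of SSA steps;
        a path that exits Z^d_+ is returned truncated at the offending step."""
        rng = np.random.default_rng() if rng is None else rng
        t, x = 0.0, np.array(x0, dtype=np.int64)
        path, n_tl, n_ssa = [(t, x.copy())], 0, 0
        while t < T:
            T0 = grid[np.searchsorted(grid, t, side='right')]  # next grid point > t
            a = propensities(x)
            method, tau = rule(t, x, a, nu, T0, delta, rng=rng)
            if method == 'SSA':
                n_ssa += 1
                if t + tau < T:                    # otherwise no jump before T
                    j = rng.choice(len(a), p=a / a.sum())
                    x = x + nu[:, j]
                t = min(T, t + tau)
            else:
                tau = min(tau, T - t)
                x = x + nu @ rng.poisson(a * tau)
                if np.any(x < 0):                  # exited the lattice
                    return path, n_tl, n_ssa
                n_tl += 1
                t += tau
            path.append((t, x.copy()))
        return path, n_tl, n_ssa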

Let t_k = Σ_{k'≤k} τ_{k'}, where each τ_{k'} is obtained using either the SSA or the tau-leap method.
Then, ω̄ ∈ A if and only if ∃k ∈ N : t_k(ω̄) = T, and

\[
A = \{\exists k \in \mathbb{N} : t_k = T\}
= \bigcup_{n=0}^{+\infty} \big(\{\exists k \in \mathbb{N} : t_k = T\} \cap \{N_{TL} = n\}\big).
\]

Then, we can write

\[
P(A) = \sum_{n=0}^{+\infty} P\big(\{\exists k \in \mathbb{N} : t_k = T,\; N_{TL} = n\}\big)
= \sum_{n=0}^{+\infty} P\big(\{\exists k \in \mathbb{N} : t_k = T\} \mid N_{TL} = n\big)\, P(N_{TL} = n).
\]

At each step, the probability of exiting the lattice is less than δ when the step
size is computed using the Chernoff method (Algorithm 5), and it is equal to
zero if the SSA is adopted. At this stage, it should be pointed out that, if we used the
Gaussian approximation (Algorithm 6), it would not be possible to guarantee an upper
bound for the probability of the event A. Observe that

\[
P(\{\exists k \in \mathbb{N} : t_k = T\})
= P\big(\bar{X}(t_0) \in \mathbb{Z}^d_+,\; \bar{X}(t_1) \in \mathbb{Z}^d_+,\; \ldots,\; \bar{X}(t_k) \in \mathbb{Z}^d_+\big)
= \prod_{j=1}^{k} P\big(\bar{X}(t_j) \in \mathbb{Z}^d_+ \mid \bar{X}(t_{j-1})\big),
\]

where the notation P(Y ∈ Z^d_+ | X) assumes that X ∈ Z^d_+. Now, by construction,

\[
P\big(\{\exists k \in \mathbb{N} : t_k = T\} \mid N_{TL} = n\big) \ge (1 - \delta)^n,
\]

because

\[
P\big(\bar{X}(t_j) \in \mathbb{Z}^d_+ \mid \bar{X}(t_{j-1})\big)
\begin{cases}
\ge 1 - \delta & \text{if we use the Chernoff algorithm}, \\
= 1 & \text{if we use the SSA}.
\end{cases}
\]

That is, if the path reached time T, and N_TL = n, then the Chernoff algorithm was
successfully applied n times. By definition,

\[
P(A) = \sum_{n=0}^{+\infty} P\big(\{\exists k \in \mathbb{N} : t_k = T\} \mid N_{TL} = n\big)\, P(N_{TL} = n)
\ge E\big[(1-\delta)^{N_{TL}}\big].
\]
Moreover, for small values of δ, using a second-order Taylor expansion of the
function (1 − δ)^{N_TL} and taking expectations, we obtain the following:

\[
E\big[(1-\delta)^{N_{TL}}\big] = 1 - \delta\, E[N_{TL}]
+ \frac{\delta^2}{2}\big(E[N_{TL}^2] - E[N_{TL}]\big) + o(\delta^2).
\]

Finally, we arrive at the desired bound on the path exit probability,

\[
P(A^c) \le \delta\, E[N_{TL}] - \frac{\delta^2}{2}\big(E[N_{TL}^2] - E[N_{TL}]\big) + o(\delta^2).
\]

In practice, we use δ E[N_TL] as an upper bound of P(A^c), since δ is very small
and Var[N_TL] is moderate. In Appendix 5.A, we prove that E[N_TL] is bounded for
polynomial propensity functions; in particular, δ E[N_TL] tends to zero as δ → 0.

Remark 5.3.3 (Hybrid estimation of E[N_SSA*]: the expected number of steps of a
pure SSA path). In the SSA algorithm, the time spent in the state X(s),
namely Δt|X(s), is an exponential random variable with intensity a_0(X(s)). Therefore,
the quantity

\[
\int_0^T a_0(X(s))\, ds = \int_0^T \frac{1}{E[\Delta t \mid X(s)]}\, ds
\]

is an approximation of the number of steps of an exact path, (X(s))_{0≤s≤T}. By sampling
M hybrid paths, we have that the sample mean, A(∫_0^T a_0(X̄(s)) ds; M), defined
through A(Y; M) := (1/M) Σ_{i=1}^M Y(ω_i), is an estimator of E[N_SSA*].

This allows us, for example, to approximate Cost_SSA(TOL), i.e., the computational
work that the SSA method requires to estimate E[g(X(T))] within a given tolerance
(TOL). This remark is used below in Algorithm 9.
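Given a hybrid path stored as (t_k, X̄(t_k)) pairs, as in the sketches above, the integral in this remark is exact for the piecewise-constant path and reduces to a left-endpoint sum; a short sketch:

    import numpy as np

    def n_ssa_star(path, propensities):
        """Estimate of int_0^T a_0(X(s)) ds along one piecewise-constant hybrid
        path, per Remark 5.3.3; path is the list of (t_k, state) pairs."""
        times = np.array([t for t, _ in path])
        a0 = np.array([propensities(x).sum() for _, x in path])
        return float(np.sum(a0[:-1] * np.diff(times)))   # left-endpoint rule

Averaging this quantity over M hybrid paths gives the estimator A(∫ a_0 ds; M).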
5.4 Error Decomposition, Estimation and Control

In this section, we define the computational global error, E, and show how it can be
naturally decomposed into three components: the discretization error, E_I, and the
exit error, E_E, both coming from the tau-leap part of the hybrid method, and the
Monte Carlo statistical error, E_S. Next, we show how to model and control the global
error, E, giving upper bounds for each one of the three components. Finally, given
a prescribed tolerance, TOL, we present a procedure for obtaining the parameters
needed for estimating E[g(X(T))] by sampling hybrid paths. These parameters are
the time mesh, (t_k)_{k=0}^K(TOL), the one-step exit probability bound, δ(TOL), and the
number of Monte Carlo samples, M_Hyb(TOL).

5.4.1 Global Error Decomposition

As we already mentioned, the main goal of this work is to estimate, accurately and
efficiently, the expected value E[g(X(T))], where X : [0, T] → Z^d_+ is a Markov pure
jump process and g : R^d → R is a smooth observable of the process at the final time T.
We propose the following estimator:

\[
\frac{1}{M} \sum_{m=1}^{M} g(\bar{X}(T))\, \mathbf{1}_A(\bar{\omega}_m), \tag{5.24}
\]

where X̄ : [0, T] → Z^d is the hybrid approximate process introduced in Section 5.3.2,
and ω̄ ∈ Ω̄. The set A ⊂ Ω̄ was defined in Section 5.3.3. We recall that 1_A(ω̄_m) = 1
if and only if the m-th hybrid path did not exit Z^d_+.

We define the computational global error, E, as

\[
\mathcal{E} := E[g(X(T))] - \frac{1}{M} \sum_{m=1}^{M} g(\bar{X}(T))\, \mathbf{1}_A(\bar{\omega}_m). \tag{5.25}
\]
We can split E into three parts:

\[
E[g(X(T))] - \frac{1}{M}\sum_{m=1}^{M} g(\bar{X}(T))\mathbf{1}_A(\bar{\omega}_m)
= \underbrace{E\big[\big(g(X(T)) - g(\bar{X}(T))\big)\mathbf{1}_A\big]}_{=:\mathcal{E}_I}
+ \underbrace{E\big[g(X(T))\mathbf{1}_{A^c}\big]}_{=:\mathcal{E}_E}
+ \underbrace{E\big[g(\bar{X}(T))\mathbf{1}_A\big] - \frac{1}{M}\sum_{m=1}^{M} g(\bar{X}(T))\mathbf{1}_A(\bar{\omega}_m)}_{=:\mathcal{E}_S}.
\]

Here, E_I and E_S are the discretization and Monte Carlo statistical errors, respectively,
and they are associated with the hybrid paths, X̄, on A. E_E is the global
exit error. We observe that the error term, E_E, is defined as the expected value of
g(X(T))1_{A^c}, which is a random variable defined on Ω × Ω̄. More specifically, we set
E_E such that

\[
|\mathcal{E}_E| = \min_{P \in \mathcal{P}} |\mathcal{E}_E(P)|,
\]

where P is the set of all probability measures on Ω × Ω̄. By choosing P ∈ P as the
product probability measure, we have that g(X(T)) and 1_{A^c} are independent random
variables. As a consequence,

\[
|\mathcal{E}_E| \le |E[g(X(T))]|\; P(A^c).
\]

An approximate upper bound, B, for |E[g(X(T))]| could be obtained, for instance,
as the 95% quantile of a bootstrap sample of |A(g(X̄(T)); ·)|. As we showed in
Section 5.3.3, P(A^c) can be approximated by δ E[N_TL]. Therefore, B δ A(N_TL; ·) is
an approximate upper bound for |E_E|, where A(N_TL; ·) is the estimator of E[N_TL].
The discretization error, E_I = E[(g(X(T)) − g(X̄(T)))1_A], is actually the weak
error associated with the hybrid paths in A. An efficient procedure for accurately
estimating this quantity in the context of the tau-leap method is described in [17].
This procedure computes E_I(ω̄) for every simulated hybrid path, (X̄(t_k, ω̄))_{k=0}^K, as a
weighted sum of local errors at the mesh times, (t_k)_{k=0}^K. The sequence of weights,
(φ_k(ω̄))_{k=1}^K, considered in [17], is defined through the duals motivated by approximate
variations of g(X̄(T)) with respect to the initial data. According to this method, E_I is
approximated by A(E_I(ω̄); ·). We adapt this method in Algorithm 11 for estimating
the weak error in the hybrid context. A brief description follows. For each hybrid
path, we compute backwards the sequence of dual weights:

\[
\varphi_K = \nabla g(\bar{X}_K),
\]
\[
\varphi_k = \big(\mathbf{I}_d + \tau_k\, \mathbf{J}_a^T(\bar{X}_k)\, \nu^T\big)\, \varphi_{k+1}, \qquad k = K-1, K-2, \ldots, 1,
\]

where ∇ is the gradient operator and J_a(X̄_k) = [∂_i a_j(X̄_k)]_{j,i} is the Jacobian matrix of
the propensity functions, a_j, for j = 1, ..., J and i = 1, ..., d. Then, we have

\[
\mathcal{E}_I(\bar{\omega}) = \sum_{k=1}^{K} \frac{\tau_k}{2}\, \mathbf{1}_{TL}(k)
\sum_{j=1}^{J} \Delta a_{j,k}(\bar{\omega})\, \nu_j^T \varphi_k .
\]

Here, X̄_k ≡ X̄(t_k), τ_k = t_{k+1} − t_k, Δa_{j,k} = a_j(X̄_{k+1}) − a_j(X̄_k), 1_TL(k) = 1 if and only if,
at time t_k, the tau-leap method was used, and I_d is the d × d identity matrix.
We model the Monte Carlo statistical error, E_S, as a Gaussian random variable
that has zero mean and variance Var[g(X(T))]/M, which can be controlled by
obtaining a rough estimate of Var[g(X(T))]. The sample variance is denoted by
S²(Y; M) := A(Y²; M) − A(Y; M)². Therefore, C_A √(S²(g(X(T)); ·)/M) is used as
an estimate of E_S, where C_A ≥ 2 sets the desired confidence level.

5.4.2 Error Estimation and Control

Given a tolerance, TOL, we would like to have a procedure that determines whether
we should use the SSA method or the hybrid one. This decision should be based on the
expected computational work of both methods, and the procedure should provide, in
any case, the necessary elements for computing the estimator. When the SSA method
is chosen, the procedure should provide the number of simulations, M_SSA(TOL). On
the contrary, when the hybrid method is chosen, the procedure should provide not
only the number of simulations, M_Hyb(TOL), but also the time mesh, (t_k)_{k=0}^K(TOL),
and the one-step exit probability bound, δ(TOL). Let us describe such a decision
procedure. The building block of a hybrid path is Algorithm 7, which adaptively
determines whether to use an SSA step or a tau-leap one. According to this algorithm,
given the current state of the approximate process, x, there are two ways of taking an
SSA step, depending on the logical conditions K_1/a_0(x) > T_0 − t and K_2(x, δ)/a_0(x) >
τ_Ch. The first way is when K_1/a_0(x) > T_0 − t is true; in this case, we take an SSA step
and avoid the computation of τ_Ch(x). The second is when K_1/a_0(x) > T_0 − t is false
and K_2(x, δ)/a_0(x) > τ_Ch is true; in this case, we have to compute τ_Ch(x). We
consider one particular hybrid path, and we let N_{SSA,K_1}(h, δ) be the number of SSA
steps such that K_1/a_0(x) > T_0 − t is true. In the same way, let N_{SSA,K_2}(h, δ) be the
number of SSA steps such that K_1/a_0(x) > T_0 − t is false and K_2(x, δ)/a_0(x) > τ_Ch is
true. Finally, let N_TL(h, δ) be the total number of tau-leap steps. We define ψ(h, δ)
as the expected work of a hybrid path, i.e.,

\[
\psi(h, \delta) = C_1\, E[N_{SSA,K_1}(h,\delta)] + C_2\, E[N_{SSA,K_2}(h,\delta)]
+ C_3\, E[N_{TL}(h,\delta)]
+ \sum_{j=1}^{J} E\left[ \int_{[0,T]} C_P\big(a_j(\bar{X}(s))\,\tau_{Ch}(\bar{X}(s),\delta)\big)\, \mathbf{1}_{TL}(\bar{X}(s))\, ds \right].
\]
Therefore, the expected computational work of the hybrid method is M ψ(h, δ),
where M is the total number of hybrid paths.

Given TOL > 0, we consider the problem

\[
\begin{cases}
\min_{M, h, \delta} \; M\, \psi(h, \delta) \\
\text{s.t.} \quad \mathcal{E}_I + \mathcal{E}_E + \mathcal{E}_S \le TOL .
\end{cases} \tag{5.26}
\]
In Algorithm 9, we propose an iterative method for obtaining an approximate
solution to this problem.

A brief description of the ideas involved in this algorithm follows. Consider that a
relative tolerance, TOL > 0, is given. By using Algorithm 10, we simulate a number
of hybrid paths on a coarse mesh of size h_0, with δ sufficiently small (say δ = 10^{−6}),
to obtain accurate estimates of Var[g(X̄(T))] and E_I. The total runtime of this
procedure is recorded in the variable r.

Now, we estimate ψ(h_0, δ) and, in particular, the bound, B δ A(N_TL; M_s),
for the exit error, E_E. This error is required to be of order O(TOL²). We
thus divide δ by a factor (e.g., 10) and re-estimate B δ A(N_TL; M_s) until this condition
is fulfilled. Then, we compute the discretization error, E_I, and S²(g(X̄(T)); M_s).
For fixed δ > 0 and ε > 0, let us consider an auxiliary problem:

\[
\begin{cases}
\min_{M, h} \; M\, \psi(h, \delta) \\
\text{s.t.} \quad \mathcal{E}_I(h, \delta) + C_A \sqrt{S^2(g(X(T)); M_s)/M} = \epsilon,
\end{cases} \tag{5.27}
\]

where C_A ≥ 2.
Instead of solving (5.27), we proceed as follows. First, we fix h = h_0 and derive
M_aux and ε_0 as functions of h_0 and δ:

\[
M_{aux}(h_0, \delta) = \frac{\partial_h \psi(h_0, \delta)}{\psi(h_0, \delta)}
\left( \frac{C_A \sqrt{S^2(g(X(T)); M_s)}}{2\, \partial_h \mathcal{E}_I(h_0, \delta)} \right)^2, \tag{5.28}
\]

and

\[
\epsilon_0(h_0, \delta) = \mathcal{E}_I(h_0, \delta) + C_A \frac{\sqrt{S^2(g(X(T)); \cdot)}}{\sqrt{M_{aux}(h_0, \delta)}} .
\]

If ε_0 < TOL − TOL², we take the current values of h_0 and δ as solutions of our
optimization problem (5.26). Otherwise, we refine the time mesh by a factor (e.g.,
4, which is near the optimal refinement factor for the multilevel tau-leap) and proceed
iteratively. Each time we refine the mesh or δ, we set the budget for the computational
work, r_0, to 2r, where r is the current total computational work of the calibration
algorithm (see the details in Algorithm 10). In this way, we can guarantee that the
current computational work of the calibration is less than or equal to two times the
computational work at the last refinement.
Once the previous process is finished, we can take advantage of the slack,
TOL − TOL² − E_I(h_0, δ), to reduce the value of M_aux, and obtain

\[
M_{Hyb}(h_0, \delta) = \left( \frac{C_A \sqrt{S^2(g(X(T)); M_s)}}{TOL - TOL^2 - \mathcal{E}_I(h_0, \delta)} \right)^2. \tag{5.29}
\]
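For concreteness, (5.29) amounts to the following one-line computation (a sketch with illustrative names):

    import numpy as np

    def m_hyb(C_A, S2, TOL, E_I):
        """Number of hybrid Monte Carlo paths from (5.29); S2 is the sample
        variance of g(X(T)) and E_I the estimated discretization error."""
        return int(np.ceil((C_A * np.sqrt(S2) / (TOL - TOL**2 - E_I)) ** 2))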

The estimation of ∂_h ψ(h_0, δ)/ψ(h_0, δ) and ∂_h E_I(h_0, δ) in (5.28) deserves some remarks.
First, note that ∂_h ψ(h_0, δ)/ψ(h_0, δ) = ∂_h log(ψ(h_0, δ)). In a pure tau-leap regime, the number
of steps is approximately inversely proportional to the size of the mesh; we therefore
have E[N_TL(h)] = O(h^{−1}). In a hybrid regime, we model ψ(h, δ) = O(h^a). Therefore,
for large values of h, a plausible model for log(ψ(h, δ)) is a log(h) + b. We denote
this model by ψ̃(h; a, b); see Algorithm 9 for details. An initial guess for a is −1.

For the estimation of ∂_h E_I(h_0, δ), we simply take numerical derivatives when
consecutive meshes are available, as follows:

\[
\partial_h \mathcal{E}_I(h_k, \delta) \approx \frac{\mathcal{E}_I(h_k, \delta_k) - \mathcal{E}_I(h_{k-1}, \delta_{k-1})}{h_k - h_{k-1}} .
\]

As an initial value, we can consider E_I(h_0, δ)/h_0.

When h is close to zero, ψ(h, δ) is the expected work of an exact path, C_1 E[N_SSA*]
(see Remark 5.3.3). Therefore, if in any iteration ψ(h, δ) is greater than C_1 A(N_SSA*; ·), we
decide to use the SSA method.
Algorithm 9 Calibration and error estimation. Inputs: the initial state, X(0), the
final time, T, the propensity functions, (a_j)_{j=1}^J, the stoichiometric vectors, (ν_j)_{j=1}^J,
the smooth observable, g, and TOL > 0. Outputs: (SSA, M_SSA) or (Hyb, M_Hyb, δ,
(t_k)_{k=0}^K). Notes: the values C_A and C_1 are defined in Sections 5.4.1 and 5.3.1,
respectively. For the sake of simplicity, we omit the arguments of the algorithms when
there is no risk of confusion.

1: Set the initial mesh {t_k}_{k=0}^K (h_0 its diameter)
2: δ ← O(TOL³)
3: r_0 ← +∞
4: (ψ̂, r, S²(g(X̄(T)); ·), A({g(X̄(T)), N_SSA*, E_I, N_TL}; ·)) ← Algorithm 10
5: M_SSA ← C_A² S²(g(X̄(T)); ·)/(TOL − TOL²)²
6: a ← −1
7: b ← log(ψ̂) − a log(h_0)
8: fin ← false
9: while not fin and ψ̂ < C_1 A(N_SSA*; ·) do
10:   while |A(g(X̄(T)); ·)| δ A(N_TL; ·) > TOL² do
11:     Refine δ (e.g., δ ← δ/10)
12:     r_0 ← 2r
13:     (ψ̂, r, S²(g(X̄(T)); ·), A({g(X̄(T)), N_SSA*, E_I, N_TL}; ·)) ← Algorithm 10
14:     M_SSA ← C_A² S²(g(X̄(T)); ·)/(TOL − TOL²)²
15:   end while
16:   Compute ∂_h E_I and ∂_h ψ̃(h; a, b)
17:   Compute M_aux(h_0; δ) and ε, see (5.28)
18:   if ε < TOL − TOL² then
19:     fin ← true
20:     Compute M_Hyb and ε, see (5.29)
21:     if M_Hyb ψ̂ < M_SSA C_1 A(N_SSA*; ·) then
22:       return (Hyb, M_Hyb, δ, {t_k}_{k=0}^K)
23:     else
24:       return (SSA, M_SSA)
25:     end if
26:   else
27:     Refine the mesh {t_k}_{k=0}^K, and update h_0
28:     r_0 ← 2r
29:     (ψ̂, r, S²(g(X̄(T)); ·), A({g(X̄(T)), N_SSA*, E_I, N_TL}; ·)) ← Algorithm 10
30:     M_SSA ← C_A² S²(g(X̄(T)); ·)/(TOL − TOL²)²
31:     Update a and b using a linear regression
32:   end if
33: end while

5.5 Numerical Examples

In this section, we present two examples to illustrate the performance of our proposed
method.
Algorithm 10 Auxiliary function for Algorithm 9. Inputs: the same as Algorithm 8,
and the constant r_0, used to control the total computational work of the algorithm (budget).
Outputs: the estimated runtime of a hybrid path, ψ̂, the total accumulated
runtime, r, an estimate of Var[g(X(T))], S²(g(X̄(T)); ·), an estimate of E[g(X(T))],
A(g(X̄(T)); ·), an estimate of E[E_I], A(E_I; ·), an estimate of the expected number of
steps needed by the SSA method, A(N_SSA*; ·), and A(N_TL; ·). Here, 1_TL(k) = 1 if
and only if the decision at time t_k was tau-leap. Notes: the values C_1, C_2 and C_3
are defined in Section 5.3.1. Set appropriate values for M_0 and CV_0. For the sake of
simplicity, we omit the arguments of the algorithms when there is no risk of confusion.

1: M ← M_0, cv ← +∞, r ← 0, M_f ← 0
2: while cv > CV_0 and r ≤ r_0 do
3:   for m ← 1 to M do
4:     ((X̄(t_k))_{k=0}^K, N_TL, N_{SSA,K_1}, N_{SSA,K_2}) ← Algorithm 8
5:     if the path does not exit Z^d_+ then
6:       M_f ← M_f + 1
7:       Compute g(X̄(T; ω̄_m))
8:       E_I ← Algorithm 11
9:       Use Remark 5.3.3 for estimating N_SSA*(ω̄_m)
10:      C_Poi(ω̄_m) ← Σ_{j=1}^J Σ_{k=0}^K C_P(a_j(X̄(t_k))(t_{k+1} − t_k)) 1_TL(k)
11:    end if
12:  end for
13:  Estimate the coefficients of variation, cv_g and cv_{E_I}, of the estimators of
Var[g(X(T))] and E[E_I], respectively
14:  cv ← max{cv_g, cv_{E_I}}
15:  ψ̂ ← C_1 A(N_{SSA,K_1}; M_f) + C_2 A(N_{SSA,K_2}; M_f) + C_3 A(N_TL; M_f) + A(C_Poi; M_f)
16:  r ← r + M_f ψ̂
17:  M ← 2M
18: end while
19: return (ψ̂, r, S²(g(X̄(T)); M_f), A({g(X̄(T)), E_I, N_SSA*, N_TL}; M_f))

5.5.1 A Simple Decay Model

The classical radioactive decay model provides a simple and important example for
the application of the hybrid method. This model has only one species and one
reaction,

\[
X \xrightarrow{c} \emptyset .
\]
Algorithm 11 Computes the discretization error, E_I ≡ E_I(ω̄_m). Inputs: (X̄(t_k))_{k=0}^K.
Here, 1_TL(k) = 1 if and only if the decision at time t_k was tau-leap, and I_d is the
d × d identity matrix. Output: E_I(ω̄_m).
 1: E_I ← 0
 2: Compute φ_K ← ∇g(X̄(t_K))
 3: for k ← K−1 down to 1 do
 4:   Δt_k ← t_{k+1} − t_k
 5:   Compute J_a(X̄(t_k)) = [∂_i a_j(X̄(t_k))]_{j,i}
 6:   φ_k ← (I_d + Δt_k J_a^T(X̄(t_k)) ν^T) φ_{k+1}
 7:   Δa_k ← a(X̄(t_{k+1})) − a(X̄(t_k))
 8:   E_I ← E_I + (Δt_k/2) 1_TL(k) (Δa_k)^T ν^T φ_k
 9: end for
10: return E_I
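For concreteness, the backward sweep of Algorithm 11 can be written in a few lines of vectorized code. The following is a minimal sketch in Python/NumPy, assuming the path states, the mesh, the tau-leap indicator and callables for a, its Jacobian and ∇g are supplied by the caller; all names are illustrative, not part of the implementation described above.

```python
import numpy as np

def discretization_error(path, mesh, is_tl, a, jac_a, grad_g, nu):
    """Dual-weighted estimate of the one-path discretization error E_I.

    path : (K+1, d) array of states X̄(t_k); mesh : (K+1,) array of times t_k;
    is_tl : (K+1,) booleans, True if the step at t_k was tau-leap;
    a : state (d,) -> propensities (J,); jac_a : state (d,) -> Jacobian (J, d);
    grad_g : state (d,) -> gradient of the observable (d,);
    nu : (J, d) array whose rows are the stoichiometric vectors nu_j.
    """
    E_I = 0.0
    phi = grad_g(path[-1])                   # dual weight at the final time
    K = len(mesh) - 1
    for k in range(K - 1, 0, -1):            # backward sweep, k = K-1, ..., 1
        dt = mesh[k + 1] - mesh[k]
        # propagate the dual weight: phi_k = (I + dt * J_a^T nu^T) phi_{k+1}
        phi = phi + dt * jac_a(path[k]).T @ (nu @ phi)
        da = a(path[k + 1]) - a(path[k])     # forward difference of propensities
        if is_tl[k]:                         # only tau-leap steps contribute
            E_I += 0.5 * dt * da @ (nu @ phi)
    return E_I
```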

Its stoichiometric matrix, ν ∈ R, and the propensity function, a : Z_+ → R, are given
by

\nu = -1 \quad\text{and}\quad a(X) = c\,X.

Here, we choose c = 1 and g(x) = x. In this particularly simple example, we have
the exact solution, namely, E[g(X(T)) | X(t) = X_0] = X_0 \exp(-c(T-t)).
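Since the exact mean is available, this model is convenient for sanity checks. Below is a minimal SSA sketch in Python for this single-reaction model (the function name and the sample size are illustrative); the Monte Carlo average of g(X(T)) = X(T) can be compared directly against X_0 exp(−cT).

```python
import numpy as np

rng = np.random.default_rng(0)

def ssa_decay(x0, c, T):
    """Exact (SSA) path of the pure-death process X -> X-1 with rate c*X."""
    x, t = x0, 0.0
    while x > 0:
        t += rng.exponential(1.0 / (c * x))  # next reaction time
        if t > T:
            break
        x -= 1                               # the single reaction fires
    return x

x0, c, T, M = 100, 1.0, 2.0, 10_000
mc_mean = np.mean([ssa_decay(x0, c, T) for _ in range(M)])
print(mc_mean, x0 * np.exp(-c * T))          # agree up to O(M^{-1/2})
```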
In Figure 5.8, we show the behavior of the time step size of the Chernoff tau-leap
method, τ_Ch, as a function of the one-step exit probability bound, δ. We compare
τ_Ch with the expected value of the SSA step size, τ_SSA, in a log-log scale, for x_0 ∈
{5, 10, 15, 20}. We can see that τ_Ch goes to zero as δ goes to zero. For small values
of δ, we have that E[τ_SSA] = 1/a_0(x_0) = 1/x_0 is larger than τ_Ch, and, therefore, the
SSA method is chosen by the hybrid algorithm (Algorithm 7). The expected SSA
step size, which is independent of δ, is shown with horizontal dotted lines starting
from the right, until the intersection with the τ_Ch curve. For example, if x_0 = 10, τ_Ch
is larger than the expected τ_SSA whenever δ > 0.0259. These dotted lines show two
regimes: as we mentioned, below the dotted line, we can say that the process is close
to the boundary, but, when τ_Ch is larger than the expected τ_SSA, we can say that the
process, X, is far from the boundary. In this regime, the Chernoff tau-leap method
will be chosen by the hybrid algorithm.
Summarizing, in Figure 5.8, we can observe when the SSA method is preferred
over the Chernoff tau-leap: either because we have very stringent probabilities of
taking negative values, yielding a small value for δ, or because the current state of
the process (x_0) is relatively close to the boundary.

[Figure: τ_Ch and E[τ_SSA | X_0] versus δ (log-log scale) for the decay model with T = 2 and X_0 ∈ {5, 10, 15, 20}.]

Figure 5.8: Chernoff step size, τ_Ch, as a function of δ, for x_0 ∈ {5, 10, 15, 20}, compared
to E[τ_SSA | X_0]. For a fixed x_0, we can observe two regimes delimited by the dotted
lines. Above the dotted line, the Chernoff tau-leap method is preferred and, below
the line, the preferred method is the SSA.

In Figure 5.9, we show τ_Ch as a function of x_0, using a log scale on the x-axis, for
different values of δ. It is interesting to observe that the maximum value of τ_Ch is 1,
even when the final time is T = 2. This is influenced by the propensity function and
the value of c. For smaller values of c, the maximum increases. This figure shows
that when x_0 is small, the values of τ_Ch decrease rapidly and become much smaller
than τ_SSA. As we mentioned, being close to or far from the boundary is a relative notion
and it must be seen according to the probability of exiting the lattice. For instance,
when x_0 = 10, we have that τ_SSA is approximately equal to τ_Ch for δ = 10^{-5}, which
is greater than the values of δ typically needed to achieve small tolerances. In the
figure, we can see that, when x_0 tends to 1 (its minimum value), the expected τ_SSA
tends to 1 and it is greater than τ_Ch. This shows that, as we are getting closer to the
boundary by decreasing x_0, τ_Ch becomes too small. On the contrary, when x_0
increases, the Chernoff tau-leap step size becomes larger and, therefore, the tau-leap
method is preferred.

[Figure: τ_Ch and E[τ_SSA | X_0] versus X_0 (log scale on the x-axis) for the decay model with T = 2 and δ ∈ {10^{-17}, 10^{-14}, 10^{-11}, 10^{-8}, 10^{-5}, 10^{-3}, 10^{-2}, 10^{-1}}.]

Figure 5.9: Chernoff step size, τ_Ch, as a function of x_0 for different values of δ. We
observe two regimes: as x_0 decreases, the SSA method is preferred; as x_0 increases,
the Chernoff tau-leap is preferred.

Consider the initial condition X_0 = 100 and the final time T = 2. We can observe that
the process starts in a regime where the expected SSA step size is smaller than the
Chernoff tau-leap one but, after a certain time, the opposite holds. In Figure 5.10, we show
20 SSA paths and 20 hybrid paths.
Now, we consider the initial condition, X_0 = 10^5, and the final time, T = 0.5. In
this case, the process starts far from the boundary. First, we observe in Figure 5.11
[Figure: left panel, 20 exact (SSA) paths; right panel, 20 hybrid paths with δ = 1.0e−04; number of particles versus time.]

Figure 5.10: Left: 20 SSA paths for the simple decay model with X_0 = 100 and T = 2.
Right: 20 hybrid trajectories, with linear interpolation between sample points (time
steps). We can observe that, near the x-axis, the hybrid algorithm takes more SSA
steps and fewer tau-leap steps.

that the SSA paths are very close to each other; that is, the variance of g(X(T ))
is small. We analyze an ensemble of five independent realizations of the calibration

[Figure: left panel, 20 exact paths for X_0 = 10^5 over [0, 0.5]; right panel, a zoom near the final time, t ∈ [0.495, 0.5].]

Figure 5.11: Left: 20 SSA paths for the simple decay model with X_0 = 10^5 and T = 0.5.
Right: Details.

algorithm (Algorithm 9), using different relative tolerances. In Figure 5.12, we show,
in the left panel, the total predicted work (runtime) given by the calibration algorithm
for both methods, the hybrid and the SSA, versus the estimated error bound, and
its corresponding confidence intervals at the 95% level. The method chooses the
hybrid algorithm for the first three tolerances (the largest ones) and the SSA for the two
smallest ones. For the fourth tolerance, the method chooses the hybrid in 80% of the
runs and the SSA for the rest (see Table 5.2). Note that, as TOL decreases, the hybrid
path converges to the exact one because δ goes to 0 (see Appendix 5.A). In the right
panel, we show, for different tolerances, the actual work (runtime) of both methods,
using a 12-core Intel GLNXA64 architecture and MATLAB version R2012b. The
actual runtimes are in accordance with our predictions.
[Figure: left panel, predicted work versus error bound for the SSA and the hybrid method, with a slope-1/2 reference line; right panel, predicted and actual work versus error bound.]

Figure 5.12: Left: Predicted work (runtime) versus the estimated error bound for
X_0 = 10^5 and T = 0.5. The hybrid method is preferred over the SSA one for the first
three tolerances (the larger ones). For the last two tolerances, the SSA is preferred.
Therefore, in that case, the total predicted runtime is the same for the hybrid and
SSA methods. Right: Predicted and actual work (runtime) versus the estimated error
bound.

In the simple decay model, where an explicit expression for E[g(X(T))] is available,
we can accurately compute the ratio between the estimated weak error and E_I,
which we call the efficiency index of the discretization error. We compute this quantity
when the preferred method is the hybrid one. Recall that

E_I = E\big[ \big( g(X(T)) - g(\bar X(T)) \big)\, \mathbf{1}_A \big].

In order to compute that quantity, for each run of the calibration algorithm (Algorithm 9),
we use a large sample, in order to control the statistical error. The sample
size is such that the statistical error, in the estimation of E_I, is ten times smaller
than the prescribed tolerance. In Figure 5.13, we show the efficiency index of the
discretization error, with confidence intervals at 95%. In the same figure, we also
show TOL versus the actual computational error. It can be seen that the prescribed
TOL        Method (SSA / HYB)   δ(TOL)      h(TOL)     M(TOL)
3.13e-03   0.00 / 1.00          3.05e-08    2.0e-03    5.0
1.56e-03   0.00 / 1.00          3.81e-09    9.8e-04    1.6e+01
7.81e-04   0.00 / 1.00          4.77e-10    4.9e-04    6.4e+01
3.91e-04   0.80 / 0.20          5.96e-11    2.4e-04    2.6e+02
1.95e-04   1.00 / 0.00          -           -          -
9.77e-05   1.00 / 0.00          -           -          -

TOL        M_SSA      Ŵ_Hyb/Ŵ_SSA    A(N_TL; ·)   A(N_SSA*; ·)
3.13e-03   3.0        0.20 ±0.03     2.6e+02      3.9e+04
1.56e-03   1.2e+01    0.37 ±0.05     5.1e+02      3.9e+04
7.81e-04   4.6e+01    0.57 ±0.09     1.0e+03      3.9e+04
3.91e-04   1.8e+02    0.97 ±0.05     2.0e+03      3.9e+04
1.95e-04   7.2e+02    1.00           -            3.9e+04
9.77e-05   2.9e+03    1.00           -            3.9e+04

Table 5.2: Details for an ensemble of five independent runs of Algorithm 9 for the
simple decay model with X_0 = 10^5 and T = 0.5. For example, the third row of the table
tells us that we should run M = 64 hybrid paths, with a time mesh of size h = 4.9 · 10^{-4}
and a one-step exit probability bound of δ = 4.77 · 10^{-10}. The work of the hybrid
method is, on average, 57% of the work of the SSA (third column in the second part
of the table). Here, Ŵ_Hyb := M_Hyb ψ̂ and Ŵ_SSA := M_SSA C_1 A(N_SSA*; ·). The fourth
row shows, in the second and third columns, that in 4 runs of Algorithm 9 the SSA
method is chosen, and in one run the hybrid method. In that case, we should simulate
M_SSA = 180 SSA paths or M = 260 hybrid paths. Confidence intervals at the 95% level are
also provided.

tolerance is achieved with the required confidence of 95%, since C_A = 1.96.

5.5.2 Gene transcription and translation [24]

This model has five reactions,

\emptyset \xrightarrow{c_1} R, \qquad R \xrightarrow{c_2} R + P,
2P \xrightarrow{c_3} D, \qquad R \xrightarrow{c_4} \emptyset,
P \xrightarrow{c_5} \emptyset,
[Figure: left panel, estimated over actual weak error versus TOL; right panel, TOL versus total error, with the percentages of runs exceeding the tolerance annotated.]

Figure 5.13: Left: Efficiency index for E_I and 95% confidence intervals. Right: TOL
versus the actual computational error. The numbers above the straight line show the
percentage of runs that had errors larger than the required tolerance. We observe
that in all cases the computational error follows the imposed tolerance closely, with
the expected confidence of 95%.

described by the stoichiometric matrix and the propensity function

\nu = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}
\quad\text{and}\quad
a(X) = \begin{pmatrix} c_1 \\ c_2 R \\ c_3 P(P-1) \\ c_4 R \\ c_5 P \end{pmatrix},

respectively, where X(t) = (R(t), P(t), D(t)), and c_1 = 25, c_2 = 10^3, c_3 = 0.001, c_4 = 0.1,
and c_5 = 1. In the simulations, the initial condition is (0, 0, 0) and the final time is
T = 1. The observable is given by g(X) = X_3 = D.
We can see that the abundance of the mRNA species, represented by R, is close
to zero for t ∈ [0, T]. Therefore, this can be interpreted as the process being close
to the boundary. However, according to Table 5.3, the calibration algorithm chooses,
in all runs, the hybrid method only for the first two tolerances. This happens because
small tolerances induce small one-step exit probabilities and, as a consequence, the
Chernoff tau-leap steps are smaller than the expected SSA steps. This suggests that
the reduced abundance of one of the species is not enough to ensure that the SSA
method should be used. The tolerance also plays a role in this choice.
In Figure 5.14, we show an ensemble of five independent realizations of the calibration
algorithm and the comparison of its corresponding predicted and actual work.
We can appreciate the robustness of the calibration procedure. We can also observe
that the hybrid method converges to the SSA one when the tolerance goes to zero.

[Figure: left panel, predicted work versus error bound for the SSA and the hybrid method, with a slope-1/2 reference line; right panel, predicted and actual work versus error bound.]

Figure 5.14: Left: Predicted work (runtime) versus the estimated error bound for
the gene transcription and translation model. The hybrid method is preferred over
the SSA one for the first two tolerances (the larger ones). For the last four tolerances,
the SSA is preferred. Therefore, in the latter case, the total predicted runtime is the
same for the hybrid and SSA methods. Right: Predicted and actual work (runtime)
versus the estimated error bound.

In Figure 5.15, we show the efficiency index of the discretization error, with confidence
intervals at 95%. In the same figure, we also show TOL versus the actual
computational error. It can be seen that the prescribed tolerance is achieved with
the required confidence of 95%, since C_A = 1.96.

5.6 Conclusions

In this work, we addressed the problem of accurately estimating the expected value of
an observable of a Markov pure jump process at a given final time, within a certain
prescribed tolerance, with high probability. Examples of settings where such estimation
TOL        Method (SSA / HYB)   δ(TOL)              h(TOL)           M(TOL)
1.00e-01   0.00 / 1.00          8.0e-05 ±2e-05      2e-02 ±2e-03     66 ±3
5.00e-02   0.00 / 1.00          1.0e-05 ±2e-06      7e-03 ±7e-04     230 ±8
2.50e-02   0.40 / 0.60          1.1e-06 ±5e-07      3e-03 ±7e-04     840 ±70
1.25e-02   0.80 / 0.20          1.9e-07             2.0e-03          3e+03
6.25e-03   1.00 / 0.00          -                   -                -
3.13e-03   1.00 / 0.00          -                   -                -

TOL        M_SSA      Ŵ_Hyb/Ŵ_SSA    A(N_TL; ·)        A(N_SSA*; ·)
1.00e-01   3.5e+01    0.39 ±0.04     7e+01 ±1e+01      1.8e+04
5.00e-02   1.4e+02    0.54 ±0.10     1.4e+02 ±2e+01    1.8e+04
2.50e-02   5.5e+02    0.88 ±0.10     3.2e+02 ±9e+01    1.7e+04
1.25e-02   2.2e+03    0.99 ±0.02     4.9e+02           1.8e+04
6.25e-03   8.8e+03    1.00           -                 1.8e+04
3.13e-03   3.5e+04    1.00           -                 1.8e+04

Table 5.3: Details for an ensemble of five independent runs of Algorithm 9 for the
gene transcription and translation model. Details on how to read the table are provided
in Table 5.2.
[Figure: left panel, estimated over actual weak error versus TOL; right panel, TOL versus total error, with the percentages of runs exceeding the tolerance annotated.]

Figure 5.15: Left: Efficiency index for E_I and 95% confidence intervals. Right: TOL
versus the actual computational error. The numbers above the straight line show the
percentage of runs that had errors larger than the required tolerance. We observe
that in all cases the computational error follows the imposed tolerance closely, with
the expected confidence of 95%.

is necessary are message delivery times and connectivity in wireless communication
networks, and the number of infected agents in epidemic modeling of small populations.
Although there are methods that simulate paths with the exact distribution
of the process (e.g., Gillespie’s SSA method), the computational work of generating
the number of paths required to control the statistical error in a Monte Carlo setting
turns out to be prohibitive for some real applications. On the other hand, Gillespie’s
approximate tau-leap method could produce, in certain cases, less expensive paths at
the price of additionally introducing a time discretization error and an exit error.
In this work, we proposed a hybrid algorithm that, at each step, adaptively chooses
to adopt the SSA method when the work of the tau-leap step becomes high. As a
consequence, the expected work of a hybrid path remains bounded by the expected
work of an SSA path and potentially can be much smaller.
The global exit error is related to the fact that, at any time, a tau-leap path
can attain a non-physical value. Pre-leap checks are common techniques for deal-
ing with this problem by controlling the time step size. Here, we presented a novel
non-asymptotic Chernoff-type hard bound to control large deviations of linear com-
binations of independent Poisson random variables. This bound allows us not only
to obtain a pre-leap check for the tau-leap method, which does not change the dis-
tribution of the increments nor does it require any type of assumption regarding the
reactions that can occur, but also to estimate and control the global exit error. To
the best of our knowledge, there is no previous attempt in the literature to estimate
and control this type of error at the path level.
Another important contribution of this work is a calibration algorithm that can
determine if it is suitable to use the hybrid algorithm for a given problem and that can
also provide the associated simulation parameters. In the hybrid case, the calibration
algorithm provides the one-step exit probability bound, the time mesh and the number
of hybrid paths that are needed for computing the mentioned expected value with
low computational work.
It is worth mentioning that, by simulating hybrid paths, we obtained accurate
estimates of the average number of steps required by the SSA method to reach the
final time. This is especially relevant in problems where the process visits regions of
the state space where the total propensity is very high.
The numerical results that we obtained from different models show that the hybrid
method proposed here is suitable for addressing problems in which one or more species
has few individuals while the total propensity is high. In these types of problems (e.g.,
the gene transcription and translation model), the reaction-rate ODEs do not provide
an accurate approximation of the average behavior of the process and, at the same time,
the cost of the exact methods is high. Moreover, we observed that generating Poisson
random variables makes the computational work of a tau-leap step much higher than
the work of an SSA step. This last argument, together with the advantages already
discussed in terms of the time discretization error and the global exit error, adds more
evidence in favor of avoiding the tau-leap method whenever possible.
Our next step is to extend this hybrid algorithm to the multilevel Monte Carlo
setting [25, 26]. We aim to obtain substantial computational work gains with respect
to the traditional exact methods (SSA or MNRM) and the single-level hybrid Chernoff
tau-leap, and to show that the computational complexity of this multilevel extension is
of order O(TOL^{-2}).

Acknowledgments

The authors are members of the KAUST SRI Center for Uncertainty Quantification
in the division of Computer, Electrical and Mathematical Sciences and Engineering at
King Abdullah University of Science and Technology (KAUST). The authors would
like to thank Jesper Karlsson for many interesting discussions at the early stages of
this work.

Appendix

Appendix 5.A An upper bound for the expected number of tau-leap steps of a hybrid trajectory, E[N_TL(h, δ)]

Let E[N_TL(h, δ)] be the expected number of tau-leap steps of a hybrid path with a
mesh of size h and a one-step exit probability bound, δ. Let {T_i} be the sequence of
grid points, t the current time and X̄(t) the current state of the hybrid process, X̄.
Let τ_Ch(X̄(t), δ) be the Chernoff tau-leap step size computed using Algorithm 5 and,
finally, K_1 and K_2 = K_2(X̄(t), δ) are the ones introduced in Section 5.3.1.
According to Algorithm 7, the logical conditions for choosing a tau-leap step are
given by

\frac{K_1}{a_0(\bar X(t))} < T_i - t \quad\text{and}\quad \frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t), \delta).

The effective step size in this case is given by min{τ_Ch(X̄(t), δ), T_i − t}. Observe
that {K_2/a_0(X̄(t)) < τ_Ch(X̄(t), δ)} → ∅ as δ → 0, because K_2 → C̃ and τ_Ch → 0 (see
Section 5.3.1). By the definition of N_TL, we have that

\begin{aligned}
E[N_{TL}(h,\delta)]
&= E\left[ \sum_i \int_{T_{i-1}}^{T_i}
\frac{ \mathbf{1}\left\{ \frac{K_1}{a_0(\bar X(t))} < T_i - t,\; \frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta) \right\} }
{ \min\{ \tau_{Ch}(\bar X(t),\delta),\, T_i - t \} } \, dt \right] \\
&\le E\left[ \sum_i \int_{T_{i-1}}^{T_i}
\frac{ \mathbf{1}\left\{ \frac{K_1}{a_0(\bar X(t))} < T_i - t,\; \frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta) \right\} }
{ \min\left\{ \frac{K_1}{a_0(\bar X(t))},\, \frac{K_2}{a_0(\bar X(t))} \right\} } \, dt \right] \\
&\le E\left[ \sum_i \int_{T_{i-1}}^{T_i}
\frac{ a_0(\bar X(t))\, \mathbf{1}\left\{ \frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta) \right\} }
{ \min\{K_1, K_2\} } \, dt \right] \;\to\; 0, \quad\text{as } \delta \to 0.
\end{aligned}

It is also true that E[N_TL] has a polynomial bound, since a_0 is polynomial and

E\left[ \sum_i \int_{T_{i-1}}^{T_i} \frac{ a_0(\bar X(t))\, \mathbf{1}\left\{ \frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta) \right\} }{ \min\{K_1, K_2\} } \, dt \right]
\le \frac{ \int_0^T E[a_0(\bar X(t))]\, dt }{ \min\{K_1, K_2\} }.

Finally, for the problems where \max_{x \in \mathbb{Z}^d_+} a_0(x) < \infty, we get the rough upper
bound

\frac{ \int_0^T E[a_0(\bar X(t))]\, dt }{ \min\{K_1, K_2\} } \le \frac{T}{\min\{K_1, K_2\}} \max_{x \in \mathbb{Z}^d_+} a_0(x).

Observe that Z^d_+ can be substituted by Z^d_+(x_0, T) ⊂ Z^d_+, defined as the subset of states
that can be reached by a path starting from x_0 and evolving up to time T. Therefore,
we have an upper bound for E[N_TL] that does not depend on δ. When the lattice is
finite, as in the exponential decay model (Example 5.1.2), this bound is c T x_0 / K_2.
REFERENCES
[1] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Conver-
gence (Wiley Series in Probability and Statistics), 2nd ed. Wiley-Interscience,
9 2005.

[2] D. T. Gillespie, “A general method for numerically simulating the stochastic


time evolution of coupled chemical reactions,” Journal of Computational Physics,
vol. 22, pp. 403–434, 1976.

[3] A. Voter, “Introduction to the kinetic Monte Carlo method,” Radiation Effects
in Solids, pp. 1–23, 2007.

[4] D. T. Gillespie, “Approximate accelerated stochastic simulation of chemically


reacting systems,” Journal of Chemical Physics, vol. 115, pp. 1716–1733, Jul.
2001.

[5] T. Li, “Analysis of explicit tau-leaping schemes for simulating chemically reacting
systems,” Multiscale Model. Simul., vol. 6, no. 2, pp. 417–436 (electronic), 2007.

[6] M. Rathinam, L. R. Petzold, Y. Cao, and D. T. Gillespie, “Consistency and


stability of tau-leaping schemes for chemical reaction systems,” Multiscale Model.
Simul., vol. 4, no. 3, pp. 867–895 (electronic), 2005.

[7] Y. Cao, D. T. Gillespie, and L. R. Petzold, “Avoiding negative populations in


explicit Poisson tau-leaping,” Journal of Chemical Physics, vol. 123, no. 5, Aug
1 2005.

[8] ——, “Efficient step size selection for the tau-leaping simulation method.” The
Journal of Chemical Physics, vol. 124, no. 4, p. 044109, 2006.

[9] D. T. Gillespie and L. R. Petzold, “Improved leap-size selection for accelerated


stochastic simulation,” The Journal of Chemical Physics, vol. 119, no. 16, p.
8229, 2003.

[10] D. F. Anderson, “Incorporating postleap checks in tau-leaping,” Journal of


Chemical Physics, vol. 128, no. 5, 2008.
[11] A. Chatterjee, D. G. Vlachos, and M. A. Katsoulakis, “Binomial distribution
based tau-leap accelerated stochastic simulation,” The Journal of Chemical
Physics, vol. 122, no. 2, p. 024112, 2005.

[12] M. F. Pettigrew and H. Resat, “Multinomial tau-leaping method for stochastic


kinetic simulations,” Journal of Chemical Physics, vol. 126, no. 8, p. 084101,
Feb. 2007.

[13] T. Tian and K. Burrage, “Binomial leap methods for simulating stochastic chem-
ical kinetics,” The Journal of Chemical Physics, vol. 121, no. 21, pp. 10 356–
10 364, 2004.

[14] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based
on the sum of observations,” Ann. Math. Stat., vol. 23, pp. 493–507, 1952.

[15] J. Karlsson, M. Katsoulakis, A. Szepessy, and R. Tempone, “Automatic Weak


Global Error Control for the Tau-Leap Method,” 2012, working paper.

[16] V. Petrov, Sums of Independent Random Variables. Springer, 1976.

[17] J. Karlsson and R. Tempone, “Towards automatic global error control: Com-
putable weak error expansion for the tau-leap method,” Monte Carlo Methods
and Applications, vol. 17, no. 3, pp. 233–278, March 2011.

[18] T. G. Kurtz, “The relationship between stochastic and deterministic models for
chemical reactions,” The Journal of Chemical Physics, vol. 57, no. 7, p. 2976,
1972.

[19] F. Brauer and C. Castillo-Chavez, Mathematical Models in Population Biology


and Epidemiology (Texts in Applied Mathematics), 2nd ed. Springer, 2011.

[20] F. Gebali, Analysis of Computer and Communication Networks. Springer, 2008.

[21] B. Klar, “Bounds on tail probabilities of discrete distributions,” Probab. Eng.


Inf. Sci., vol. 14, pp. 161–171, April 2000.
[22] F. McMahon, The Livermore Fortran kernels: a computer test of the numerical
performance range. Lawrence Livermore National Laboratory, 1986.

[23] J. Ahrens and U. Dieter, “Computer methods for sampling from gamma, beta,
Poisson and binomial distributions,” Computing, vol. 12, pp. 223–246, 1974.

[24] D. F. Anderson and M. Koyama, “Weak error analysis of numerical methods for
stochastic models of population processes,” Multiscale Modeling and Simulation,
vol. 10, no. 4, pp. 1493–1524, 2012.

[25] M. Giles, “Multilevel Monte Carlo path simulation,” Operations Research,
vol. 56, no. 3, pp. 607–617, 2008.

[26] D. Anderson and D. Higham, “Multilevel Monte Carlo for continuous Markov
chains, with applications in biochemical kinetics,” Multiscale Model. Simul.,
vol. 10, no. 1, 2012.

Chapter 6

Multilevel Hybrid Chernoff Tau-leap¹

Alvaro Moraes, Raúl Tempone and Pedro Vilanova

Abstract

In this work, we extend the hybrid Chernoff tau-leap method to the multilevel Monte
Carlo (MLMC) setting. Inspired by the work of Anderson and Higham on the tau-leap
MLMC method with uniform time steps, we develop a novel algorithm that is
able to couple two hybrid Chernoff tau-leap paths at different levels. Using dual-weighted
residual expansion techniques, we also develop a new way to estimate the
variance of the difference of two consecutive levels and the bias. This is crucial
because the computational work required to stabilize the coefficient of variation of
the sample estimators of both quantities is often unaffordable for the deepest levels
of the MLMC hierarchy. Our method bounds the global computational error to be
below a prescribed tolerance, TOL, within a given confidence level. This is achieved
with nearly optimal computational work. Indeed, the computational complexity of
our method is of order O(TOL^{-2}), the same as with an exact method, but with a
smaller constant. Our numerical examples show substantial gains with respect to the
previous single-level approach and the Stochastic Simulation Algorithm.

¹ A. Moraes, R. Tempone and P. Vilanova, “Multilevel Hybrid Chernoff Tau-Leap,” accepted for
publication in BIT Numerical Mathematics, (2015).

6.1 Introduction

This work, inspired by the multilevel discretization schemes introduced in [1], extends
the hybrid Chernoff tau-leap method [2] to the multilevel Monte Carlo setting
[3]. Consider a non-homogeneous Poisson process, X, taking values in the lattice
of non-negative integers, Z^d_+. We want to estimate the expected value of a given
observable, g : R^d → R, of X, at a final time, T, i.e., E[g(X(T))]. For example, in
a chemical reaction in thermal equilibrium, the i-th component of X, X_i(t), could
describe the number of particles of species i present at time t. In the systems modeled
here, different species undergo reactions at random times by changing the number of
particles in at least one of the species. The probability of a single reaction happening
in a small time interval is modeled by a non-negative propensity function that
depends on the current state of the system. We present a formal description of the
problem in Section 6.1.1.
Pathwise realizations of such pure jump processes (see, e.g., [4]) can be simulated
exactly using the Stochastic Simulation Algorithm (SSA), introduced by Gillespie in
[5], or the Modified Next Reaction Method (MNRM) introduced by Anderson in [6].
Although these algorithms generate exact realizations for the Markov process, X,
they are computationally feasible for only relatively low propensities.
For that reason, Gillespie in [7] and Aparicio and Solari in [8] independently
proposed the tau-leap method to approximate the SSA by evolving the process with
fixed time steps and by keeping the propensity fixed within each time step. In fact, the
tau-leap method can be seen as a forward Euler method for a stochastic differential
equation driven by Poisson random measures (see, e.g., [9]).
A drawback of the tau-leap method is that the simulated process may take negative
values, which is an undesirable consequence of the approximation and not a qualitative
feature of the original process. For this purpose, we proposed in [2] a Chernoff-type
bound that controls the probability of reaching negative values by adjusting the time
steps. Also, to avoid extremely small time steps, we proposed switching adaptively
between the tau-leap and an exact method, creating a hybrid tau-leap/exact method
that combines the strengths of both methods.
More specifically, let x̄ be the state of the approximate process at time t, and let
δ ∈ (0, 1) be given. The main idea is to compute a time step, τ = τ(δ, x̄), such that
the probability that the approximate process reaches an unphysical negative value in
[t, t+τ) is less than δ. This allows us to control the probability that an entire hybrid
path exits the lattice, Z^d_+. In turn, this quantity leads to the definition of the global
exit error, which is a global error component along with the time discretization error
and the statistical error (see Section 6.3.2 for details).
The multilevel Monte Carlo idea goes back at least to [10, 11]. In that setting,
the main goal was to solve high-dimensional, parameter-dependent integral equations
and to conduct corresponding complexity analyses. Later, in [3], Giles developed
and analyzed multilevel techniques that were used to reduce the computational work
when estimating an expected value using Monte Carlo path simulations of a certain
quantity of interest of a stochastic differential equation. Independently, in [12],
Speight introduced a multilevel approach to control variates. Control variates are a
widespread variance reduction technique with the main goal of increasing the precision
of an estimator or reducing the computational effort. The main idea is as follows:
to reduce the variance of the standard Monte Carlo estimator of E[X],

\hat\mu_1 := \frac{1}{M} \sum_{m=1}^{M} X(\omega_m),
we consider another unbiased estimator of E[X],

\hat\mu_2 := \frac{1}{M} \sum_{m=1}^{M} \big( X(\omega_m) - ( Y(\omega_m) - E[Y] ) \big),

where Y is a random variable correlated with X with known mean, E[Y]. The
variable Y is called a control variate. Since Var[μ̂_2] = Var[μ̂_1] + Var[Y] − 2 Cov[X, Y],
whenever Cov[X, Y] > Var[Y]/2, we have that Var[μ̂_2] < Var[μ̂_1]. If we assume that
the computational work of generating the pair (X(ω), Y(ω)) is less than twice the
computational work of generating X(ω), it is straightforward to conclude that μ̂_2 is
preferred when ρ²_{X,Y} > 1/2, where ρ_{X,Y} is the correlation coefficient of the pair (X, Y).
We observe that μ̂_2 can be written as

\hat\mu_2 = E[Y] + \frac{1}{M} \sum_{m=1}^{M} (X - Y)(\omega_m).

In the case where E[Y] is unknown and sampling from Y is computationally less
expensive than sampling from X, it is natural to estimate E[Y] using Monte Carlo
sampling to yield a two-level Monte Carlo estimator of E[X] based on the control
variate, Y, i.e.,

\tilde\mu_2 := \frac{1}{M_0} \sum_{m_0=1}^{M_0} Y(\omega_{m_0}) + \frac{1}{M_1} \sum_{m_1=1}^{M_1} (X - Y)(\omega_{m_1}).

See Section 6.1.6 for details about the definition of levels in our context.
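To make the variance accounting above concrete, here is a minimal numerical sketch in Python (all names are illustrative; X and Y are just a generic correlated pair of random variables, not the reaction network itself) comparing the standard estimator μ̂_1 with the two-level estimator μ̃_2.

```python
import numpy as np

rng = np.random.default_rng(1)
M0, M1 = 100_000, 10_000          # many cheap samples of Y, few coupled pairs

def sample_pair(n):
    """A correlated pair (X, Y): Y is a coarse, cheap surrogate of X."""
    y = rng.normal(0.0, 1.0, n)
    x = y + 0.1 * rng.normal(0.0, 1.0, n)   # X = Y + small perturbation
    return x, y

# Standard Monte Carlo estimator of E[X] with M1 expensive samples
x1, _ = sample_pair(M1)
mu_hat_1 = x1.mean()

# Two-level estimator: E[Y] from M0 cheap samples + correction from M1 pairs
_, y0 = sample_pair(M0)
x, y = sample_pair(M1)
mu_tilde_2 = y0.mean() + (x - y).mean()

print(mu_hat_1, mu_tilde_2)       # mu_tilde_2 has a much smaller variance
```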
In this work, we apply Giles’s multilevel control variates idea to the hybrid Chernoff
tau-leap approach to reduce the computational cost, which is measured as the
amount of time needed for computing an estimate of E[g(X(T))], within TOL, with a
given level of confidence. We show that our hybrid MLMC method has the same
computational complexity as the pure SSA, i.e., order O(TOL^{-2}). From this perspective,
our method can be seen as a variance reduction for the SSA, since our MLMC method
does not change the complexity; it just reduces the corresponding multiplicative
constant. We note in passing that, in [13], the authors show that the computational
complexity for the pure MLMC tau-leap case has order O(TOL^{-2} (log(TOL))²). We
note also that here our goal is to provide an estimate of E[g(X(T))] in the probability
sense and not in the mean square sense as in [1].
The global error arising from our hybrid tau-leap MLMC method can naturally be
decomposed into three components: the global exit error, the time discretization error
and the statistical error. This global error should be less than a prescribed tolerance,
TOL, with probability larger than a certain confidence level. The global exit error
is controlled by the one-step exit probability bound, δ [2]. The time discretization
error, inherent to the tau-leap method, is controlled through the size of the mesh, Δt
[14]. At this point, it is crucial to stress that, by controlling the exit probability of
the set of hybrid paths, we are indirectly turning this event into a rare event. Thus,
direct sampling of exit paths is not an affordable way to estimate the probability of
such an event.
Motivated by the Central Limit results of Collier et al. [15] for the multilevel
Monte Carlo estimator (see Appendix A, Theorem 1), we approximate the statistical
error with a Gaussian random variable with zero mean. In our numerical experiments,
we tested this hypothesis by employing Q-Q plots and the Shapiro-Wilk test [16].
There, we did not reject the Gaussianity of the statistical error at the 1% significance
level. The variance of the statistical error is a linear combination of the variance
at the coarsest level and the variances of the difference of two consecutive levels, which
we sometimes call strong errors. In Section 6.3.3, motivated by the fact that sample
variance and bias estimators are inaccurate on the deepest levels, we develop a novel
dual-weighted residual expansion that allows us to estimate those quantities, cf. (6.7)
and (6.8). We also control the statistical error through the number of coupled hybrid
paths, (M_ℓ)_{ℓ=0}^L, simulated at each level.
We note that our use of duals in this work is different from the use in [14]. That
earlier work proposed an adaptive, single-level, tau-leap algorithm for error control,
choosing the time steps non-uniformly to control the global weak error based on dual-weighted
error estimators. In this work, we do not have an adaptive time step based
on dual-weighted error estimators as in [14]. We use instead dual-weighted error
estimators to reduce the statistical error in our error estimates.

6.1.1 A Class of Markovian Pure Jump Processes

To describe the class of Markovian pure jump processes, X : [0, T] × Ω → Z^d_+, that we
use in this work, we consider a system of d species interacting through J different
reaction channels. For the sake of brevity, we write X(t, ω) ≡ X(t). Let X_i(t) be
the number of particles of species i in the system at time t. We want to study the
evolution of the state vector,

X(t) = (X_1(t), \ldots, X_d(t)) \in \mathbb{Z}^d_+,

modeled as a continuous-time, discrete-space Markov chain starting at some state,
X(0) ∈ Z^d_+. Each reaction can be described by the vector ν_j ∈ Z^d, such that, for a
state vector x ∈ Z^d_+, a single firing of reaction j leads to the change

x \to x + \nu_j.

The probability that reaction j will occur during the small interval (t, t+dt) is then
assumed to be

P(X(t + dt) = x + \nu_j \,|\, X(t) = x) = a_j(x)\, dt + o(dt), \qquad (6.1)

with a given non-negative polynomial propensity function, a_j : R^d → R. We set
a_j(x) = 0 for those x such that x + ν_j ∉ Z^d_+. A process, X, that satisfies (6.1) is a
continuous-time, discrete-space Markov chain that admits the following random time
change representation [4]:

X(t) = X(0) + \sum_{j=1}^{J} \nu_j\, Y_j\!\left( \int_0^t a_j(X(s))\, ds \right), \qquad (6.2)

where Y_j : R_+ × Ω → Z_+ are independent unit-rate Poisson processes. Hence, X is a
non-homogeneous Poisson process.
In [14], the authors assume that there exists a vector, w ∈ R^d_+, such that (w, ν_j) ≤
0, for any reaction ν_j. Therefore, every reaction, ν_j, must have at least one negative
component. This means that the species can either be transformed into other species
or be consumed during the reaction. As a consequence, the space of states is contained
in a simplex with vertices on the coordinate axes. This assumption excludes, for
instance, birth processes. In our numerical examples, we allow the set of possible
states of the system to be infinite, but we explicitly avoid cases in which one or more
species grows exponentially fast or blows up in the time interval [0, T].

Remark 6.1.1. In this setting, the solution of the following system of ordinary differential
equations,

\dot{x}(t) = \nu\, a(x(t)), \quad t \in \mathbb{R}_+, \qquad x(0) = x_0 \in \mathbb{R}^d_+,

is called the mean field solution, where ν is the matrix with columns ν_j and a(x) is the
column vector of propensities. In Section 6.4, we use the mean field path for scaling
and preprocessing constants associated with the computational work of the SSA and
Chernoff tau-leap steps.
6.1.2 Description of the Modified Next Reaction Method (MNRM)

The MNRM, introduced in [6] and based on the Next Reaction Method (NRM) [17], is
an exact simulation algorithm, like Gillespie’s SSA, that explicitly uses representation
(6.2) for simulating exact paths and generates only one exponential random variable
per iteration. The reaction times are modeled with firing times of Poisson processes,
Y_j, with internal times given by the integrated propensity functions. The randomness
is now separated from the state of the system and is encapsulated in the Y_j’s. For
each reaction, j, the internal time is defined as R_j(t) = ∫_0^t a_j(X(s)) ds. There are J+1
time frames in the system, the absolute one, t, and one for each Poisson process,
Y_j. Computing the next reaction and its time is equivalent to computing how much
time passes before one of the Poisson processes, Y_j, fires, and to identifying which
process fires at that particular time, by taking the minimum of such times. The NRM
and the MNRM make use of internal times to reduce the number of simulated random
variables by half. In the following, we describe the MNRM and then we present its
implementation in Algorithm 12.
Given t, we have the propensity a_j = a_j(X(t)) and the internal time R_j = R_j(t).
Now, let ΔR_j be the remaining time for reaction j to fire, assuming that a_j
stays constant over the interval [t, t+ΔR_j). Then, t+ΔR_j is the time when the next
reaction, j, occurs. The next internal time at which reaction j fires is then
given by R_j + a_j ΔR_j. When simulating the next step, the first reaction that fires
occurs after Δ = min_j ΔR_j. We then update the state of the system according to that
reaction, add Δ to the global time, t, and then update the internal times by adding
a_j Δ to each R_j. We are left to determine the value of ΔR_j, i.e., the amount of time
until the Poisson process, Y_j, fires, taking into account that a_j remains constant until
the first reaction occurs. Denote by P_j the first firing time of Y_j that is strictly larger
than R_j, i.e., P_j := min{s > R_j : Y_j(s) > Y_j(R_j)}, and finally ΔR_j = (1/a_j)(P_j − R_j).
Algorithm 12 The Modified Next Reaction Method. Inputs: the initial state, X(0),
the next grid point, T_0 > t_0, the propensity functions, (a_j)_{j=1}^J, the stoichiometric
vectors, (ν_j)_{j=1}^J. Outputs: the history of system states, (X(t_k))_{k=0}^K. Here, we denote
S ≡ (S_j)_{j=1}^J, P ≡ (P_j)_{j=1}^J, and R ≡ (R_j)_{j=1}^J.
 1: k ← 0, t_k ← 0, X(t_k) ← X(0) and R ← 0
 2: Generate J independent, uniform(0, 1) random numbers, r_j
 3: P ← (log(1/r_j))_{j=1}^J
 4: while t_k < T_0 do
 5:   S ← (a_j(X(t_k)))_{j=1}^J
 6:   (ΔR_j)_{j=1}^J ← ((P_j − R_j)/S_j)_{j=1}^J
 7:   μ ← argmin_j {ΔR_j}
 8:   Δ ← min_j {ΔR_j}
 9:   t_{k+1} ← t_k + Δ
10:   X(t_{k+1}) ← X(t_k) + ν_μ
11:   R ← R + SΔ
12:   r ← uniform(0, 1)
13:   P_μ ← P_μ + log(1/r)
14:   k ← k + 1
15: end while
16: return (X(t_l))_{l=0}^{k-1}
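As a companion to Algorithm 12, the following Python sketch implements the same next-reaction loop for a general network (illustrative code; the propensity functions and stoichiometric vectors are passed in by the caller).

```python
import numpy as np

def mnrm(x0, T0, a, nu, rng):
    """Modified Next Reaction Method: one exact path up to time T0.

    x0 : initial state (d,); a : callable state -> propensities (J,);
    nu : (J, d) array whose rows are the stoichiometric vectors;
    rng : a numpy random Generator.
    """
    J = nu.shape[0]
    t, x = 0.0, np.array(x0, dtype=float)
    R = np.zeros(J)                           # internal times R_j
    P = np.log(1.0 / rng.uniform(size=J))     # first firing times P_j
    path = [(t, x.copy())]
    while True:
        S = a(x)                              # propensities frozen at t
        with np.errstate(divide="ignore"):
            dR = np.where(S > 0, (P - R) / S, np.inf)
        mu = int(np.argmin(dR))               # index of the next reaction
        delta = dR[mu]
        if t + delta > T0:
            break
        t += delta
        x += nu[mu]                           # fire reaction mu
        R += S * delta                        # advance all internal clocks
        P[mu] += np.log(1.0 / rng.uniform())  # next firing time of Y_mu
        path.append((t, x.copy()))
    return path
```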

Among the advantages already mentioned, we can easily modify Algorithm 12 to
generate paths in the cases where the rate functions depend on time and also when
there are reactions delayed in time. Finally, it is possible to simulate correlated
exact/tau-leap paths using this algorithm, as well as nested tau-leap/tau-leap paths.
In [1], this technique is used to develop a uniform-step, unbiased, multilevel Monte
Carlo (MLMC) algorithm. In Section 6.2.2, we use this feature for coupling two exact
paths.

6.1.3 The Tau-Leap Approximation

In this section, we define X̄, the tau-leap approximation of the process, X, which
follows from applying the forward Euler approximation to the integral term in the
following random time-change representation of X:

X(t + \tau) = X(t) + \sum_{j=1}^{J} \nu_j\, Y_j\!\left( \int_t^{t+\tau} a_j(X(s))\, ds \right).

The tau-leap method was proposed in [7] to avoid the computational drawback
of the exact methods, i.e., when many reactions occur during a short time interval.
The tau-leap process, X̄, starts from X(0) at time 0, and given that X̄(t) = x̄ and a
time step τ > 0, we have that X̄ at time t+τ is generated by

\bar X(t + \tau) = \bar x + \sum_{j=1}^{J} \nu_j\, \mathcal{P}_j\big( a_j(\bar x)\, \tau \big),

where {P_j(λ_j)}_{j=1}^J are independent Poisson distributed random variables with parameters
λ_j, used to model the number of times that reaction j fires during the
(t, t+τ) interval. Again, this is nothing other than a forward Euler discretization
of the stochastic differential equation formulation of the pure jump process (6.2),
realized by the Poisson random measure with state-dependent intensity (see, e.g.,
[9]).
In the limit, when τ tends to zero, the tau-leap method gives the same solution
as the exact methods. The total number of firings in each channel is a Poisson-distributed
stochastic variable depending only on the initial population, X̄(t). The
error thus comes from the variation of a(X(s)) for s ∈ (t, t+τ).
We observe that the computational work of a tau-leap step involves the generation
of J independent Poisson random variables. This is in contrast to the computational
work of an exact step, which involves only the work of generating two uniform random
variables, in the case of the SSA, and only one in the case of the MNRM.
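For reference, a single tau-leap step is only a few lines of code. Here is a minimal Python sketch (illustrative names, reusing the conventions of the MNRM sketch above):

```python
import numpy as np

def tau_leap_step(x, tau, a, nu, rng):
    """One tau-leap step: freeze propensities at x and fire Poisson counts.

    x : current state (d,); tau : step size; a : state -> propensities (J,);
    nu : (J, d) stoichiometric vectors; rng : a numpy random Generator.
    """
    counts = rng.poisson(a(x) * tau)   # J independent Poisson variates
    return x + counts @ nu             # may exit Z^d_+, hence the Chernoff check
```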
6.1.4 The Chernoff-Based Pre-Leap Check

In [2], we derived a Chernoff-type bound that allows us to guarantee that the one-step
exit probability in the tau-leap method is less than a predefined quantity, δ > 0. We
now briefly summarize the main idea. Consider the following pre-leap check problem:
find the largest possible τ such that, with high probability, in the next step, the
approximate process, X̄, will take a value in the lattice, Z^d_+, of non-negative integers.
The solution to that problem can be achieved by solving d auxiliary problems, one for
each x-coordinate, i = 1, 2, . . . , d, as follows. Find the largest possible τ_i ≥ 0, such
that

P\left( \bar X_i(t) + \sum_{j=1}^{J} \nu_{ji}\, \mathcal{P}_j\big( a_j(\bar X(t))\, \tau_i \big) < 0 \;\Big|\; \bar X(t) \right) \le \delta_i, \qquad (6.3)

where δ_i = δ/d, and ν_ji is the i-th coordinate of the j-th reaction channel, ν_j. Finally,
we let τ := min{τ_i : i = 1, 2, . . . , d}. To find the largest time steps, τ_i, let
Q_i(t, τ_i) := \sum_{j=1}^{J} (-\nu_{ji})\, \mathcal{P}_j(a_j(\bar X(t))\, \tau_i). Then, for all s > 0, we have the Chernoff
bound:

P\left( Q_i(t, \tau_i) > \bar X_i(t) \;\middle|\; \bar X(t) \right)
\le \inf_{s>0} \exp\left( -s\, \bar X_i(t) + \tau_i \sum_{j=1}^{J} a_j(\bar X(t)) \big( e^{-s\,\nu_{ji}} - 1 \big) \right).

Expressing τ_i as a function of s, we write

\tau_i(s) = \frac{ \log(\delta_i) + s\, \bar X_i(t) }{ -a_0\big(\bar X(t)\big) + \sum_{j=1}^{J} a_j(\bar X(t))\, e^{-s\,\nu_{ji}} } =: \frac{R_i(s)}{D_i(s)},

where

a_0(\bar X(t)) := \sum_{j=1}^{J} a_j(\bar X(t)).

We want to maximize τ_i while satisfying condition (6.3). Let τ_i^* be this maximum.
We then have the following possibilities: if ν_ji ≥ 0, for all j, then naturally τ_i^* = +∞;
otherwise, we have the following three cases:

1. D_i(s_i) > 0. In this case, τ_i(s_i) = 0 and D_i(s) is positive and increasing for all s ≥
s_i. Therefore, τ_i(s) is equal to the ratio of two positive increasing functions.
The numerator, R_i(s), is a linear function and the denominator, D_i(s), grows
exponentially fast. Then, there exist an upper bound, τ_i^*, and a unique number,
s̃_i, which satisfy τ_i(s̃_i) = τ_i^*. We developed an algorithm in [2] for approximating
s̃_i, using the relation τ_i'(s̃_i) = 0.

2. If D_i(s_i) < 0, then τ_i^* = +∞.

3. If D_i(s_i) = 0, then τ_i^* = X̄_i(t)/D_i'(s_i).

Here, s_i := −log(δ_i)/X̄_i(t).
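As a simple illustration of this construction, the sketch below solves the problem for a single coordinate numerically, maximizing τ_i(s) over a grid of s values instead of using the root-finding algorithm of [2] (all names are illustrative; this is a sketch, not the production algorithm).

```python
import numpy as np

def chernoff_tau(x_i, a, nu_i, delta_i, s_grid=np.logspace(-6, 3, 2000)):
    """Approximate tau_i^* by maximizing tau_i(s) = R_i(s)/D_i(s) on a grid.

    x_i : current value of coordinate i; a : (J,) frozen propensities;
    nu_i : (J,) i-th coordinates of the reaction channels; delta_i : bound.
    """
    if np.all(nu_i >= 0):
        return np.inf                         # coordinate i cannot decrease
    R = np.log(delta_i) + s_grid * x_i        # numerator R_i(s)
    D = -a.sum() + (a[None, :] * np.exp(-np.outer(s_grid, nu_i))).sum(axis=1)
    if np.any((R > 0) & (D <= 0)):
        return np.inf                         # the bound holds for every tau
    tau = np.where((R > 0) & (D > 0), R / D, 0.0)
    return tau.max()                          # approximates tau_i^*

# decay model, X -> X-1 with a(x) = x: one species, one channel
print(chernoff_tau(x_i=10.0, a=np.array([10.0]), nu_i=np.array([-1.0]),
                   delta_i=1e-5))
```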

6.1.5 The Hybrid Algorithm for Single-Path Generation

In this section, we briefly summarize our previous work on hybrid paths, presented
in [2].
The main idea behind the hybrid algorithm is the following. A path generated by
an exact method (like the SSA or the MNRM) never exits the lattice, Z^d_+, although the
computational cost may be unaffordable due to the many small inter-arrival times that
typically occur when the process is “far” from the boundary. A tau-leap path, which may
be cheaper than an exact one, could leave the lattice at any step. The probability
of this event depends on the size of the next time step and the current state of the
approximate process, X̄(t). This one-step exit probability could be large, especially
when the approximate process is “close” to the boundary. We developed in [2] a
Chernoff-type bound to control the mentioned one-step exit probability. Even more,
by construction, the probability that one hybrid path exits the lattice, Z^d_+, can be
estimated by

P(A^c) \le E\left[ 1 - (1-\delta)^{N_{TL}} \right]
= \delta\, E[N_{TL}] - \frac{\delta^2}{2}\left( E[N_{TL}^2] - E[N_{TL}] \right) + o(\delta^2),
where ω̄ ∈ A if and only if the whole hybrid path, (X̄(t_k, ω̄))_{k=0}^{K(ω̄)}, belongs to the
lattice, Z^d_+, δ > 0 is the one-step exit probability bound, and N_TL(ω̄) ≡ N_TL is the
number of tau-leap steps in a hybrid path. Here, A^c is the complement of the set A.
To simulate a hybrid exact/Chernoff tau-leap path, we first developed a one-step
switching rule that, given the current state of the approximate process, X̄(t),
adaptively determines whether to use an exact or an approximate method for the
next step. This decision is based on the relative computational cost of taking an
exact step (MNRM) versus the cost of taking a Chernoff tau-leap step. We show the
switching rule in Algorithm 13. To compare the mentioned computational costs, we

Algorithm 13 The one-step switching rule. Inputs: the current state of the approximate
process, X̄(t), the current time, t, the values of the propensity functions
evaluated at X̄(t), (a_j(X̄(t)))_{j=1}^J, the one-step exit probability bound, δ, and the next
grid point, T_0. Outputs: method and τ. Notes: based on E[τ_SSA(X̄(t)) | X̄(t)] =
1/a_0(X̄(t)) and τ_Ch(X̄(t), δ), this algorithm adaptively selects between the MNRM and
the Chernoff tau-leap (TL) method. We denote by τ_MNRM (τ_Ch) the step size when the decision is
to use the MNRM (tau-leap) method.
Require: a_0 ← Σ_{j=1}^J a_j > 0
 1: if K_1/a_0 < T_0 − t then
 2:   τ_Ch ← compute the Chernoff step size (see Section 2.2 in [2])
 3:   if τ_Ch < K_2(X̄(t), δ)/a_0 then
 4:     return (MNRM, τ_MNRM)
 5:   else
 6:     return (TL, τ_Ch)
 7:   end if
 8: else
 9:   return (MNRM, τ_MNRM)
10: end if

define K_1 as the ratio between the cost of computing τ_Ch and the cost of computing
one step using the MNRM method, and K_2 = K_2(X̄(t), δ) is defined as the cost of
taking a Chernoff tau-leap step, divided by the cost of taking an MNRM step plus the
cost of computing τ_Ch. For further details on the switching rule, we refer to [2].
6.1.6 The Multilevel Monte Carlo Setting

In this subsection, we briefly summarize the control variates idea developed by Giles
in [3]. Let {X̄_ℓ(t)}_{t∈[0,T]} be a hybrid Chernoff tau-leap process with a time mesh of size
Δt_ℓ and a one-step exit probability bound, δ. We can simulate paths of {X̄_ℓ(t)}_{t∈[0,T]}
by using Algorithm 4 in [2]. Let g_ℓ := g(X̄_ℓ(T)).
Consider a hierarchy of nested meshes of the time interval [0, T], indexed by
ℓ = 0, 1, . . . , L. Let Δt_0 be the size of the coarsest time mesh that corresponds
to the level ℓ = 0. The size of the time mesh at level ℓ ≥ 1 is given by Δt_ℓ = R^{-ℓ} Δt_0,
where R > 1 is a given integer constant.
Assume that we are interested in estimating E[g_L], and we are able to simulate
correlated pairs, (g_ℓ, g_{ℓ−1}), for ℓ = 1, . . . , L. Then, the following unbiased Monte Carlo
estimator of E[g_L] uses g_{L−1} as a control variate:

\tilde\mu_L := \frac{1}{M_L} \sum_{m_L=1}^{M_L} \big( g_L(\omega_{m_L}) - ( g_{L-1}(\omega_{m_L}) - E[g_{L-1}] ) \big)
= E[g_{L-1}] + \frac{1}{M_L} \sum_{m_L=1}^{M_L} (g_L - g_{L-1})(\omega_{m_L}).

Applying this idea recursively and taking into account the following telescopic decomposition,
E[g_L] = E[g_0] + \sum_{\ell=1}^{L} E[g_\ell - g_{\ell-1}], we arrive at the multilevel Monte
Carlo estimator of E[g_L]:

\hat\mu_L := \frac{1}{M_0} \sum_{m_0=1}^{M_0} g_0(\omega_{m_0}) + \sum_{\ell=1}^{L} \frac{1}{M_\ell} \sum_{m_\ell=1}^{M_\ell} (g_\ell - g_{\ell-1})(\omega_{m_\ell}). \qquad (6.4)

We have that μ̂_L is unbiased, since E[μ̂_L] = E[g_L]. The variance of μ̂_L is given
by

\mathrm{Var}[\hat\mu_L] = \frac{\mathrm{Var}[g_0]}{M_0} + \sum_{\ell=1}^{L} \frac{\mathrm{Var}[g_\ell - g_{\ell-1}]}{M_\ell}.

Here, we are assuming independence among the batches between levels. For highly
correlated pairs, (g_ℓ, g_{ℓ−1}), we can expect, for the same computational work, that
Var[μ̂_L] is much smaller than the variance of the standard Monte Carlo estimator of E[g_L].
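The estimator (6.4) assembles in a few lines once a sampler for the coupled differences is available. The following Python sketch is illustrative: sample_coupled(l, m) is a hypothetical routine returning m iid samples of g_ℓ − g_{ℓ−1} (of g_0 for ℓ = 0), e.g., produced by the coupling algorithms of Section 6.2.

```python
import numpy as np

def mlmc_estimate(sample_coupled, M):
    """Multilevel Monte Carlo estimator (6.4).

    sample_coupled : callable (level l, sample size m) -> (m,) array of
        g_0 samples when l == 0, and of (g_l - g_{l-1}) samples when l >= 1;
    M : list of per-level sample sizes [M_0, ..., M_L].
    """
    estimate, variance = 0.0, 0.0
    for l, m in enumerate(M):
        y = sample_coupled(l, m)
        estimate += y.mean()             # telescoping sum of level means
        variance += y.var(ddof=1) / m    # independent batches across levels
    return estimate, variance            # value and its sampling variance
```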

6.1.7 The Large Kurtosis Problem

Let us give a close examination of the problem of estimating Var[g_ℓ − g_{ℓ−1}] for highly
correlated pairs, (g_ℓ, g_{ℓ−1}). This estimation is required to solve the optimization
problem (6.25), which indicates how to choose the simulation parameters, particularly
the number of simulated coupled paths for each pair of consecutive levels, (M_ℓ)_{ℓ=0}^L.
When ℓ becomes large, due to our coupling strategy developed in Section 6.2, we
expect to obtain g_ℓ = g_{ℓ−1} in most of our simulations, while observing differences only
in a very small proportion of the simulated coupled paths.
For the sake of illustration, let us assume that the random variable Δ_ℓ := g_ℓ − g_{ℓ−1}
takes values in the set {−1, 0, 1}, with respective probabilities {p_ℓ, 1 − 2p_ℓ, p_ℓ}, where
p_ℓ goes to zero. The kurtosis of Δ_ℓ is, by definition,

\frac{ E\big[ (\Delta_\ell - E[\Delta_\ell])^4 \big] }{ E\big[ (\Delta_\ell - E[\Delta_\ell])^2 \big]^2 } - 3.

Simple calculations show that the kurtosis of Δ_ℓ is (2p_ℓ)^{-1} − 3, and we observe that Δ_ℓ² ∼
Bernoulli(2p_ℓ). The maximum likelihood estimator of 2p_ℓ, θ̂_ℓ, is the sample average
of M_ℓ independent and identically distributed (iid) values of Δ_ℓ². The coefficient of
variation of θ̂_ℓ, defined as (Var[θ̂_ℓ])^{1/2} (E[θ̂_ℓ])^{-1}, is approximately (2p_ℓ M_ℓ)^{-1/2}. Therefore, an accurate
estimation of p_ℓ requires a sample of size

M_\ell \gtrsim (2p_\ell)^{-1} \to \infty.

This lower bound on M_ℓ goes strongly against the spirit of the multilevel Monte Carlo
method, where M_ℓ should be a decreasing function of ℓ.
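A quick numeric check makes the point: the snippet below (illustrative) estimates 2p_ℓ by the sample mean of Δ_ℓ² and shows how its coefficient of variation degrades when 2p_ℓ M_ℓ is small.

```python
import numpy as np

rng = np.random.default_rng(2)
p, M = 1e-4, 1000                 # rare differences, modest sample size
# sample Delta in {-1, 0, 1} with probabilities {p, 1-2p, p}
delta = rng.choice([-1, 0, 1], size=(10_000, M), p=[p, 1 - 2 * p, p])
theta_hat = (delta ** 2).mean(axis=1)          # MLE of 2p from each batch
cv = theta_hat.std(ddof=1) / theta_hat.mean()  # empirical coefficient of variation
print(cv, (2 * p * M) ** -0.5)                 # both are roughly 2.2 here
```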
To overcome this difficulty, in Section 6.3.3, we developed a formula based on
dual-weighted residuals. The technique of dual-weighted residuals can be motivated
as follows: consider a process, X̿, such that its position at time s, having departed
from the state x at a previous time t, is denoted by X̿(s; t, x). Notice that, for
t < s < T, we have that X̿(T; t, x) = X̿(T; s, X̿(s; t, x)). Let us define an auxiliary
function, U(t, x) := g(X̿(T; t, x)), where g is an observable scalar function of the
final state of the process X̿ that started from the state x at the initial time, t. If
X̄ is a process approximating X̿, we want to have a computable approximation for
g(X̄(T; 0, x_0)) − g(X̿(T; 0, x_0)). Consider a time mesh, {0 = t_0, t_1, . . . , t_N = T}, and
define X̄_{t_n} := X̄(t_n; 0, x_0), X̿_{t_{n+1}} := X̿(t_{n+1}; t_n, X̄_{t_n}) and e_{n+1} := X̄_{t_{n+1}} − X̿_{t_{n+1}}. Observe
that

\begin{aligned}
g(\bar X(T;0,x_0)) - g(\bar{\bar X}(T;0,x_0))
&= U(T, \bar X(T;0,x_0)) - U(0, x_0) \\
&= \sum_{n=0}^{N-1} \big( U(t_{n+1}, \bar X_{t_{n+1}}) - U(t_n, \bar X_{t_n}) \big) \\
&= \sum_{n=0}^{N-1} \big( U(t_{n+1}, \bar X_{t_{n+1}}) - U(t_{n+1}, \bar{\bar X}(t_{n+1}; t_n, \bar X_{t_n})) \big) \\
&= \sum_{n=0}^{N-1} \left( e_{n+1} \cdot \int_0^1 \nabla_x U(t_{n+1}, \bar X_{t_{n+1}} - s\, e_{n+1})\, ds \right) \\
&= \sum_{n=0}^{N-1} \big( e_{n+1} \cdot \nabla_x U(t_{n+1}, \bar X_{t_{n+1}}) + O\big( \|\nabla^2 U\|\, \|e_{n+1}\|^2 \big) \big) + \text{h.o.t.}
\end{aligned}
We can now write a backward recurrence for the dual weights, (φ_n)_{n=1}^N:

\begin{aligned}
\varphi_n := \nabla_x U(t_n, \bar X_{t_n})
&= \partial_{\bar X_{t_n}}\, g(\bar{\bar X}(T; t_n, \bar X_{t_n})) \\
&= \partial_{\bar X_{t_n}}\, g(\bar{\bar X}(T; t_{n+1}, \bar X_{t_{n+1}})) \\
&= \frac{\partial \bar X_{t_{n+1}}}{\partial \bar X_{t_n}}\, \partial_{\bar X_{t_{n+1}}} g(\bar{\bar X}(T; t_{n+1}, \bar X_{t_{n+1}})) \\
&= \frac{\partial \bar X_{t_{n+1}}}{\partial \bar X_{t_n}}\, \nabla_x U(t_{n+1}, \bar X_{t_{n+1}}) \\
&= \frac{\partial \bar X_{t_{n+1}}}{\partial \bar X_{t_n}}\, \varphi_{n+1},
\end{aligned}
\qquad
\varphi_N := \nabla g(\bar X(T; 0, x_0)).
¯ that are pathwise di↵erentiable


This reasoning evidently works for processes X̄
with respect to the initial condition. Our space state is in general a subset of the
lattice, Zd+ , and for that reason, we can not directly apply this technique. In [18],
the authors show how this dual-weighted residual technique can be adapted to the
tau-leap case in regimes close to the mean field or to the Stochastic Langevin limit.
In more general regimes, the formula (6.8), which provides accurate estimates of
Var [g` g` 1 ] in our numerical examples (see for instance Figure 6.5 in Section 6.5), is
promising but more research is needed in this direction. Specifically, in Section 6.3.3,
the formula (6.8) is deduced from the conditional distribution of the local errors,
en+1 |F, conditional on a sigma-algebra, F, generated by the sequence, (X̄tn )N
n=1 , and

applying the tower properties of conditional expectation and conditional variance.


Similar comments apply to Formula (6.7) regarding the weak error, E [g(X(T )) gL ].

6.1.8 Outline of this Work

In Section 6.2, we first show the main idea for coupling two tau-leap paths, which
comes from a construction by Kurtz [19] for coupling two Poisson random variables.
Then, inspired by the ideas of Anderson and Higham in [1], we propose an algorithm
for coupling two hybrid Chernoff tau-leap paths (see [2]). This algorithm uses four
building blocks that result from the combination of the MNRM and the tau-leap
methods. In Section 6.3, we propose a novel hybrid MLMC estimator. Next, we
introduce a global error decomposition; and, finally, we develop formulae to efficiently
estimate the variance of the difference of two consecutive levels and to estimate the
bias, based on dual-weighted residuals. These estimates are particularly useful for
addressing the large kurtosis problem, described in Section 6.1.7, that appears at the
deeper levels and makes standard sample estimators too costly. Next, in Section 6.4,
we show how to control the three error components of the global error and how to
obtain the parameters needed for computing the hybrid MLMC estimator to achieve
a given tolerance with nearly optimal computational work. We also show that the
computational complexity of our method is of order O(TOL^{-2}). In Section 6.5,
the numerical examples illustrate the advantages of the hybrid MLMC method over
the single-level approach presented in [2] and over the SSA. Section 6.6 presents our
conclusions and suggestions for future work.

6.2 Generating Coupled Hybrid Paths

In this section, we present an algorithm that generates coupled hybrid Chernoff tau-leap
paths, which is an essential ingredient for the multilevel Monte Carlo estimator.
We first show how to couple two Poisson random variables and then we explain how
we make use of the two algorithms presented in [1] as Algorithms 2 and 3, together with two
additional algorithms we developed, to create an algorithm that generates coupled
hybrid paths.
6.2.1 Coupling Two Poisson Random Variables

We motivate our coupling algorithm (Algorithm 14) by first describing how to couple
two Poisson random variables. In our context, ‘coupling’ means that we want to
induce a correlation between them that is as strong as possible. This construction
was first proposed by Kurtz in [19]. Suppose that we want to couple P_1(λ_1) and
P_2(λ_2), two Poisson random variables, with rates λ_1 and λ_2, respectively. Consider
the following decompositions,

P_1(\lambda_1) := P^*(\lambda_1 \wedge \lambda_2) + Q_1(\lambda_1 - \lambda_1 \wedge \lambda_2),
P_2(\lambda_2) := P^*(\lambda_1 \wedge \lambda_2) + Q_2(\lambda_2 - \lambda_1 \wedge \lambda_2),

where P^*(λ_1 ∧ λ_2), Q_1(λ_1 − λ_1 ∧ λ_2) and Q_2(λ_2 − λ_1 ∧ λ_2) are three independent
Poisson random variables. Here, λ_1 ∧ λ_2 := min{λ_1, λ_2}. Observe that at least one
of the random variables Q_1(λ_1 − λ_1 ∧ λ_2) and Q_2(λ_2 − λ_1 ∧ λ_2) vanishes, because
at least one of the rates is zero. Algorithm 14 implements these ideas. Finally, note
that, by construction, we have

\mathrm{Var}[P_1(\lambda_1) - P_2(\lambda_2)] = \mathrm{Var}[Q_1(\lambda_1 - \lambda_1 \wedge \lambda_2) - Q_2(\lambda_2 - \lambda_1 \wedge \lambda_2)] = |\lambda_1 - \lambda_2|.

However, if instead we consider making P_1(λ_1) and P_2(λ_2) independent, then

\mathrm{Var}[P_1(\lambda_1) - P_2(\lambda_2)] = \lambda_1 + \lambda_2,

which may be a large value even when λ_1 and λ_2 are close.
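A direct implementation of this decomposition is tiny; the following Python sketch (illustrative) couples the two variates and can be used to check empirically that Var[P_1 − P_2] = |λ_1 − λ_2|.

```python
import numpy as np

rng = np.random.default_rng(3)

def coupled_poisson(lam1, lam2, size=None):
    """Kurtz coupling: (P1, P2) with marginals Poisson(lam1), Poisson(lam2)."""
    common = rng.poisson(min(lam1, lam2), size)     # shared part P*
    q1 = rng.poisson(lam1 - min(lam1, lam2), size)  # one of q1, q2 is always 0
    q2 = rng.poisson(lam2 - min(lam1, lam2), size)
    return common + q1, common + q2

p1, p2 = coupled_poisson(10.0, 10.5, size=100_000)
print(np.var(p1 - p2))   # close to |10.0 - 10.5| = 0.5, not 20.5
```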


6.2.2 Coupling Two Hybrid Paths

In this section, we describe how to generate two coupled hybrid Chernoff tau-leap
paths, X̄ and X̿, corresponding to two nested time discretizations, called coarse and
fine, respectively. Assume that the current time is t, and we know the states, X̄(t)
and X̿(t). Based on this knowledge, we have to determine a method for each level.
This method can be either the MNRM or the tau-leap one, determining four possible
combinations that lead to four algorithms, B1, B2, B3 and B4, that we use as building
blocks. Table 6.1 summarizes them.

Algorithm                    At the coarse mesh   At the fine mesh
B1 (part of Algorithm 14)    TL                   TL
B2 (Algorithm 16)            TL                   MNRM
B3 (Algorithm 16)            MNRM                 TL
B4 (Algorithm 16)            MNRM                 MNRM

Table 6.1: Building blocks for simulating two coupled hybrid Chernoff tau-leap paths.
Algorithms B1 and B2 are presented as Algorithms 2 and 3 in [1]. Algorithm B3 can
be directly obtained from Algorithm B2. Algorithm B4 is also based on Algorithm
B2, but, to produce MNRM steps, we update the propensities at the coarse level at
the beginning of each time interval defined by the fine level.

We note that the only case in which we use a Poisson random variate generator
for the tau-leap method is in Algorithm B1. In Algorithms B2 and B3, the Poisson
random variables are simulated by adding independent exponential random variables
with the same rate, λ, until a given final time, T, is exceeded. The rate, λ,
is obtained by freezing the propensity functions, a, at time t. More specifically,
the Poisson random variates are obtained by using the MNRM repeatedly without
updating the intensity.
We now briefly describe the Chernoff hybrid coupling algorithm, i.e., Algorithm
14. Given the current time, t, and the current states of the process at the coarse level,
X̄(t), and at the fine level, X̿(t), this algorithm determines the next time point at which
we run the algorithm (called the time “horizon”). To fix ideas, let us assume that,
based on X̄(t), the one-step switching rule, i.e., Algorithm 13, chooses the tau-leap
method at the coarse level, with the corresponding Chernoff step size, τ̄. As we
mentioned, this τ̄ is the largest step size such that the probability that the process,
in the next time step, takes a value outside Z^d_+ is less than δ̄. This step size plus
the current time, t, cannot be greater than the final time, T, and also cannot be
greater than the next time discretization grid point in the coarse grid, t̄, because the
discretization error must be controlled. Taking the minimum of all those values, we
obtain the next time horizon at the coarse grid, H̄. Note that, if the chosen method
is the MNRM instead of the tau-leap, we do not need to take into account the grid, and the
next time horizon will be the minimum between the next reaction time and the final
time, T.
We now explain algorithm B1 (TL-TL). Assume that tau-leap is chosen at the
coarse and at the fine level. We thus obtain two time horizons, one for the coarse
level, $\bar{H}$, and another for the fine level, $\bar{\bar{H}}$. In this case, the global time horizon
will be $H := \min\{\bar{H}, \bar{\bar{H}}\}$. Since the chosen method at both grid levels is tau-leap, we
need to freeze the propensities at the beginning of the corresponding intervals: in the
coarse case, during the interval $[t, \bar{H})$, the propensities are equal to $a(\bar{X}(t)) =: \bar{a}$, and,
in the fine case, during the interval $[t, \bar{\bar{H}})$, they are equal to $a(\bar{\bar{X}}(t)) =: \bar{\bar{a}}$.
Suppose that $\bar{H} < \bar{\bar{H}}$ (see Figure 6.1).

[Figure 6.1 (timeline diagram): the interval $[t, T]$ with the grid points $\bar{t}$, $\bar{\bar{t}}$, the candidate steps $t+\bar{\tau}$, $t+\bar{\bar{\tau}}$, and the horizon $\bar{H}$.]
Figure 6.1: This figure depicts a particular instance of the Chernoff hybrid coupling
algorithm (Algorithm 14), where $\bar{\tau} < \bar{\bar{\tau}}$. The synchronization horizon, $H$, defined as
$H := \min\{\bar{H}, \bar{\bar{H}}\}$, is equal to $\bar{H}$ in this case. Notice that $\bar{H} := \min\{\bar{t}, t+\bar{\tau}, T\}$ and
$\bar{\bar{H}} := \min\{\bar{\bar{t}}, t+\bar{\bar{\tau}}, T\}$.

Then, we couple two Poisson random variables at time $t = \bar{H}$, using the idea described
in Section 6.2.1. When time reaches $\bar{H}$, the decision about which method
to use (and the corresponding step size) at the coarse level must be made again. Note
that the propensities of the process at the fine grid are kept frozen until $\bar{\bar{H}}$. The
case $\bar{H} > \bar{\bar{H}}$ is analogous to the one we described, but the decisions on the
method and step size are made at the finer level, when time reaches $\bar{\bar{H}}$. It can also be
possible that $\bar{H} = \bar{\bar{H}}$. In that case, the decision about which method to use (and
the corresponding step size) must be made at the coarse and at the fine level.

In the case of algorithm B2 (TL-MNRM), we assume that tau-leap is chosen
at the coarse level and MNRM at the fine level, obtaining two time horizons, one
for the coarse level, $\bar{H}$, and another for the fine level, $\bar{\bar{H}}$. The only difference in
how we determine the time horizons between algorithms B1 and B2 is that the time
discretization grid points in the fine grid are not taken into account to determine $\bar{\bar{H}}$.
Algorithm B2 is then applied until the simulation reaches $H := \min\{\bar{H}, \bar{\bar{H}}\}$. Suppose
that $\bar{\bar{H}} < \bar{H}$. In this case, the process $\bar{\bar{X}}$ could take more than one step to reach $\bar{\bar{H}}$.
At each step, the propensity functions $a(\bar{\bar{X}}(\cdot))$ are computed, but not the propensities
at the coarse level, because there the tau-leap method is used. Note that the
decision about which algorithm to use (B2 or another) is not made at those steps,
but only when time reaches $\bar{\bar{H}}$. When time reaches $\bar{\bar{H}}$, the decision of which method
to use (and the corresponding step size) at the fine level must be made again. In this
case, the propensities at the coarse grid are kept frozen until $\bar{H}$. The reasoning
for the cases $\bar{\bar{H}} > \bar{H}$ and $\bar{\bar{H}} = \bar{H}$ is similar to before.

The other two cases, B3 and B4, are handled in the same way as B2. The only difference
resides in when to update the propensity values, $\bar{a}$ and $\bar{\bar{a}}$. See Algorithm 14 for more
details. As made clear in the preceding paragraphs, the decision about which algorithm
to use for a certain time interval is made only at the horizon points.
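As a rough illustration of the horizon bookkeeping just described (our own sketch, not the thesis implementation), the per-level horizons could be computed as:

```python
def level_horizon(t, next_grid_point, tau, T, uses_tau_leap):
    """Next horizon for one level: min{next grid point, t + tau, T} when
    tau-leap is used; for MNRM the grid is ignored, and tau is interpreted
    as the time to the next reaction."""
    if uses_tau_leap:
        return min(next_grid_point, t + tau, T)
    return min(t + tau, T)

# Global synchronization horizon of the coupled pair:
# H = min(level_horizon(<coarse args>), level_horizon(<fine args>)).
```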

Remark 6.2.1. [About telescoping] To ensure the telescoping sum property, the prob-
191
ability law of the hybrid process at level ` should be the same disregarding whether level
¯ ) or the coarser in the pair (X̄ , X̄
` is the finer in the pair (X̄` 1 , X̄ ¯ ). For that
` ` `+1

reason, each process has its own next horizon as its decision points. See Figure 6.1
showing the time horizons scheme and Figures 6.14 and 6.15 in Section 6.5 to see
that the telescoping sum property is satisfied by our hybrid coupling sampling scheme.

6.3 Multilevel Monte Carlo Estimator and Global Error Decomposition

In this section, we present the multilevel Monte Carlo estimator. We first show the
estimator and its properties and then we analyze and control the computational global
error, which is decomposed into three error components: the discretization error, the
global exit error, and the Monte Carlo statistical error. We give upper bounds for
each one of the three components.

6.3.1 The MLMC Estimator

In this section, we discuss and implement a variation of the multilevel Monte Carlo
estimator (6.4) for the hybrid Chernoff tau-leap case. The main ingredient of this
section is Algorithm 14, which generates coupled hybrid paths at levels $\ell-1$ and $\ell$.
Let us now introduce some notation. Let $A_\ell$ be the event in which the $\bar{X}_\ell$-path arrives
at the final time, $T$, without exiting the state space of $X$. Let $1_A$ be the indicator
function of an arbitrary set, $A$. Finally, $g_\ell := g(\bar{X}_\ell(T))$ was defined in Section 6.1.6.
Consider the following telescoping decomposition:
\[
\mathrm{E}\left[g_L 1_{A_L}\right] = \mathrm{E}\left[g_0 1_{A_0}\right] + \sum_{\ell=1}^{L} \mathrm{E}\left[g_\ell 1_{A_\ell} - g_{\ell-1} 1_{A_{\ell-1}}\right],
\]
which motivates the definition of our MLMC estimator of $\mathrm{E}[g(X(T))]$,
\[
\mathcal{M}_L := \frac{1}{M_0}\sum_{m=1}^{M_0} g_0 1_{A_0}(\omega_{m,0}) + \sum_{\ell=1}^{L} \frac{1}{M_\ell}\sum_{m=1}^{M_\ell} \left[g_\ell 1_{A_\ell} - g_{\ell-1} 1_{A_{\ell-1}}\right](\omega_{m,\ell}). \tag{6.5}
\]
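In Python-like pseudocode (ours; the two samplers are placeholders for Algorithms 23 and 14), the estimator (6.5) is simply a sum of per-level sample means:

```python
import numpy as np

def mlmc_estimate(sample_single_level0, sample_coupled_diff, M):
    """Evaluate (6.5). sample_single_level0(m) returns m i.i.d. draws of
    g_0 1_{A_0}; sample_coupled_diff(l, m) returns m i.i.d. draws of
    g_l 1_{A_l} - g_{l-1} 1_{A_{l-1}} from coupled hybrid paths; M[l] = M_l."""
    estimate = np.mean(sample_single_level0(M[0]))
    for l in range(1, len(M)):
        estimate += np.mean(sample_coupled_diff(l, M[l]))
    return estimate
```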

6.3.2 Global Error Decomposition

In this section, we define the computational global error, $\mathcal{E}_L$, and show how it can
be naturally decomposed into three components: the discretization error, $\mathcal{E}_{I,L}$, and
the exit error, $\mathcal{E}_{E,L}$, both coming from the tau-leap part of the hybrid method, and
the Monte Carlo statistical error, $\mathcal{E}_{S,L}$. Next, we show how to model and control
the global error, $\mathcal{E}_L$, giving upper bounds for each of the three components. We
define the computational global error, $\mathcal{E}_L$, as
\[
\mathcal{E}_L := \mathrm{E}\left[g(X(T))\right] - \mathcal{M}_L.
\]
Now, consider the following decomposition of $\mathcal{E}_L$:
\[
\mathrm{E}\left[g(X(T))\right] - \mathcal{M}_L = \mathrm{E}\left[g(X(T))(1_{A_L} + 1_{A_L^c})\right] - \mathrm{E}\left[g_L 1_{A_L}\right] + \mathrm{E}\left[g_L 1_{A_L}\right] - \mathcal{M}_L
\]
\[
= \underbrace{\mathrm{E}\left[g(X(T))\, 1_{A_L^c}\right]}_{=:\mathcal{E}_{E,L}} + \underbrace{\mathrm{E}\left[(g(X(T)) - g_L)\, 1_{A_L}\right]}_{=:\mathcal{E}_{I,L}} + \underbrace{\mathrm{E}\left[g_L 1_{A_L}\right] - \mathcal{M}_L}_{=:\mathcal{E}_{S,L}}.
\]
We show in [2] that, by adequately choosing the one-step exit probability bound, $\delta$,
the exit error, $\mathcal{E}_{E,L}$, satisfies $|\mathcal{E}_{E,L}| \le |\mathrm{E}[g(X(T))]|\, \mathrm{P}(A_L^c) \le TOL^2$. An efficient
procedure for accurately estimating $\mathcal{E}_{I,L}$ in the context of the tau-leap method is described
in [14]. We adapt this method in Algorithm 20 for estimating the weak error in the
hybrid context. A brief description follows. For each hybrid path, $(\bar{X}_\ell(t_{n,\ell}, \bar\omega))_{n=0}^{N(\bar\omega)}$,
we define the sequence of dual weights, $(\varphi_{n,\ell}(\bar\omega))_{n=1}^{N(\bar\omega)}$, backwards as follows (see Section
6.1.7):

\[
\varphi_{N(\bar\omega),\ell} := \nabla g(\bar{X}_\ell(t_{N(\bar\omega),\ell}, \bar\omega)), \tag{6.6}
\]
\[
\varphi_{n,\ell} := \left(\mathrm{Id} + \Delta t_{n,\ell}\, J_a^T(\bar{X}_\ell(t_{n,\ell}, \bar\omega))\, \nu^T\right) \varphi_{n+1,\ell}, \quad n = N(\bar\omega)-1, \ldots, 1,
\]
where $\Delta t_{n,\ell} := t_{n+1,\ell} - t_{n,\ell}$, $\nabla$ is the gradient operator, and $J_a(\bar{X}_\ell(t_{n,\ell}, \bar\omega)) \equiv [\partial_i a_j(\bar{X}_\ell(t_{n,\ell}, \bar\omega))]_{j,i}$
is the Jacobian matrix of the propensity functions, $a_j$, for $j = 1, \ldots, J$ and $i = 1, \ldots, d$. According
to this method, $\mathcal{E}_{I,L}$ is approximated by $\mathcal{A}(\mathcal{E}_{I,L}(\bar\omega); \cdot)$, where
\[
\mathcal{E}_{I,L}(\bar\omega) := \sum_{n=1}^{N(\bar\omega)} 1_{TL}(n)\, \frac{\Delta t_{n,L}}{2} \sum_{j=1}^{J} (\varphi_{n,L} \cdot \nu_j)\, \Delta a_{j,n}(\bar\omega), \tag{6.7}
\]
$\mathcal{A}(X; M) := \frac{1}{M}\sum_{m=1}^{M} X(\omega_m)$, and $\mathcal{S}^2(X; M) := \mathcal{A}(X^2; M) - \mathcal{A}(X; M)^2$ denote the
sample mean and the sample variance of the random variable, $X$, respectively. Here,
$\Delta a_{j,n}(\bar\omega) := a_j(\bar{X}_L(t_{n+1,\ell}, \bar\omega)) - a_j(\bar{X}_L(t_{n,\ell}, \bar\omega))$, $1_{TL}(n) = 1$ if and only if, at time $t_{n,\ell}$, the
tau-leap method was used, and $\mathrm{Id}$ denotes the $d \times d$ identity matrix.
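The backward recursion (6.6) is cheap to implement. A minimal Python sketch (ours), assuming the path, the mesh, and callables for $\nabla g$ and the Jacobian $J_a$ are available:

```python
import numpy as np

def dual_weights(path, dts, grad_g, jac_a, nu):
    """Backward recursion (6.6). path: states x_0..x_N, shape (N+1, d);
    dts[n] = t_{n+1} - t_n, shape (N,); nu: rows are the nu_j, shape (J, d);
    jac_a(x): Jacobian [d_i a_j(x)], shape (J, d). Returns phi_1..phi_N
    (phi[0] is unused and left at zero)."""
    N, d = len(dts), path.shape[1]
    phi = np.zeros((N + 1, d))
    phi[N] = grad_g(path[N])
    for n in range(N - 1, 0, -1):
        Ja = jac_a(path[n])
        # phi_n = (Id + dt_n * Ja^T nu^T) phi_{n+1}
        phi[n] = phi[n + 1] + dts[n] * Ja.T @ (nu @ phi[n + 1])
    return phi
```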

Remark 6.3.1 (Computational cost of dual computations). It is easy to see that
the computational cost per path of the dual computations in (6.6) is comparable to,
and possibly smaller than, that of the hybrid path itself. Indeed, no new random variables
need to be sampled (in particular, no Poisson random variables, which are the most
computationally expensive in the forward simulation), and no coupling between levels
is needed. Moreover, we use (6.6) only to determine the discretization parameters for
the actual run; (6.6) is thus used only in a fraction of the realizations.

The variance of the statistical error, $\mathcal{E}_{S,L}$, is given by $\sum_{\ell=0}^{L} V_\ell/M_\ell$, where $V_0 :=
\mathrm{Var}[g_0 1_{A_0}]$ and $V_\ell := \mathrm{Var}\left[g_\ell 1_{A_\ell} - g_{\ell-1} 1_{A_{\ell-1}}\right]$, $\ell \ge 1$. In the next subsection, we
show how to estimate $V_\ell$ efficiently using the duals from (6.6).
6.3.3 Dual-weighted Residual Estimation of $V_\ell$

Here, we derive the formula (6.8) for estimating the variance, $V_\ell$, $\ell \ge 1$. It is based
on dual-weighted local errors arising from two consecutive tau-leap approximations
of the process, $X$. For each level $\ell \ge 1$, the formula estimates $V_\ell$ with a much smaller
statistical error than the standard sample estimator, which is seriously affected by
the large kurtosis present at the deepest levels (see Section 6.1.7).
Let us introduce some notation:
\[
\begin{aligned}
f_{j,n} &:= (\varphi_{n+1} \cdot \nu_j),\\
\mu_{j,n} &:= \frac{\Delta t_n}{2} \sum_i (\nabla a_j(x_n) \cdot \nu_i)\, a_i(x_n),\\
\bar\mu_{j,n} &:= \frac{\Delta t_n}{2} \sum_i |(\nabla a_j(x_n) \cdot \nu_i)|\, a_i(x_n),\\
\sigma^2_{j,n} &:= \frac{\Delta t_n}{2} \sum_i (\nabla a_j(x_n) \cdot \nu_i)^2\, a_i(x_n),\\
m_{j,n} &:= \min\{\bar\mu_{j,n},\ \sqrt{\mu_{j,n}^2 + \sigma_{j,n}^2}\},\\
q_{j,n} &:= \frac{\mu_{j,n}}{\sigma_{j,n}},\\
p_{j,n} &:= \Phi(-q_{j,n}),\\
\tilde\mu_{j,n} &:= \mu_{j,n}(1 - 2 p_{j,n}),\\
\tilde\sigma_{j,n} &:= \sqrt{\tfrac{2}{\pi}}\, \sigma_{j,n} \exp(-q_{j,n}^2/2).
\end{aligned}
\]
Here, $\Phi(x)$ is the cumulative distribution function of a standard Gaussian random
variable. We define our dual-weighted estimator of $V_\ell$ as
\[
\begin{aligned}
\hat{V}_\ell :=\ & \mathcal{S}^2\Big( \sum_n 1_{TL}(n)\, \frac{\Delta t_n}{2} \sum_j f_{j,n}\, \mu_{j,n};\ M_\ell \Big) \\
&+ \mathcal{A}\Big( \sum_n 1_{TL}(n)\, \frac{(\Delta t_n)^3}{8} \sum_{j,j'} f_{j,n} f_{j',n} \sum_i (\nabla a_j(x_n) \cdot \nu_i)(\nabla a_{j'}(x_n) \cdot \nu_i)\, a_i(x_n);\ M_\ell \Big) \\
&+ \mathcal{A}\Big( \sum_n 1_{TL}(n)\, \frac{\Delta t_n}{2} \sum_j f_{j,n}^2 \left(1_{G_n}(\tilde\mu_{j,n} + \tilde\sigma_{j,n}) + 1_{G_n^c}\, m_{j,n}\right);\ M_\ell \Big),
\end{aligned} \tag{6.8}
\]
where $1_{G_n} = 1$ if and only if $a_j(x_n)\frac{\Delta t_n}{2} > c$ for all $j \in \{1, \ldots, J\}$, where $c$ is a positive
user-defined constant.
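To make the ingredients of (6.8) concrete, the per-step quantities above can be vectorized over the reaction channels as in the following Python sketch (ours; scipy's normal CDF stands in for $\Phi$):

```python
import numpy as np
from scipy.stats import norm

def per_step_stats(grad_a, nu, a_x, dt):
    """Quantities entering (6.8). grad_a: rows grad a_j(x_n), shape (J, d);
    nu: rows nu_i, shape (J, d); a_x: propensities a_i(x_n), shape (J,)."""
    G = grad_a @ nu.T                        # G[j, i] = grad a_j(x_n) . nu_i
    mu = 0.5 * dt * G @ a_x                  # mu_{j,n}
    mu_bar = 0.5 * dt * np.abs(G) @ a_x      # mu-bar_{j,n}
    sig2 = 0.5 * dt * (G ** 2) @ a_x         # sigma^2_{j,n}
    m = np.minimum(mu_bar, np.sqrt(mu ** 2 + sig2))
    q = mu / np.sqrt(sig2)
    p = norm.cdf(-q)                         # p_{j,n} = Phi(-q_{j,n})
    mu_tilde = mu * (1.0 - 2.0 * p)
    sig_tilde = np.sqrt(2.0 * sig2 / np.pi) * np.exp(-q ** 2 / 2.0)
    return mu, mu_bar, sig2, m, q, p, mu_tilde, sig_tilde
```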
First, notice that $V_\ell$ could be a very small positive number. In fact, in our numerical
experiments, we observe that the standard Monte Carlo sample estimation
of this quantity turns out to be computationally infeasible, due to the huge number
of simulations required to stabilize its coefficient of variation. For this reason, we
initially consider the following dual-weighted approximations:
\[
\mathrm{E}\left[g_\ell - g_{\ell-1}\right] \approx \mathrm{E}\Big[\sum_n \varphi_{n+1,\ell-1} \cdot e_{n+1,\ell-1}\Big], \tag{6.9}
\]
\[
\mathrm{Var}\left[g_\ell - g_{\ell-1}\right] \approx \mathrm{Var}\Big[\sum_n \varphi_{n+1,\ell-1} \cdot e_{n+1,\ell-1}\Big],
\]
where $(\varphi_{n+1,\ell-1})_{n=0}^{N(\bar\omega)-1}$, defined in (6.6), is a sequence of dual weights computed
backwards from a simulated path, $(\bar{X}_\ell(t_{n,\ell-1}))_{n=1}^{N(\bar\omega)}$, and the sequence of local errors,
$(e_{n+1,\ell-1})_{n=0}^{N(\bar\omega)-1}$, defined in (6.14), is the subject of the next subsection.

Defining the Sequence of Local Errors

For simplicity of the analysis, we make two assumptions: i) the time mesh associated with
level $\ell$ is obtained by halving the intervals of level $\ell-1$; ii) we perform the
tau-leap at both levels without considering the Chernoff bounds described in Section
6.1.4.

Let $\bar{X}$ and $\bar{\bar{X}}$ be two tau-leap approximations of $X$ based on two consecutive grid
levels, for instance, $\bar{X} := \bar{X}_{\ell-1}$ and $\bar{\bar{X}} := \bar{X}_\ell$. Consider two consecutive time-mesh points
for $\bar{X}$, $\{t_n, t_{n+1}\}$, and three consecutive time-mesh points for $\bar{\bar{X}}$, $\{t_n, (t_n+t_{n+1})/2, t_{n+1}\}$.
Let $\bar{X}$ and $\bar{\bar{X}}$ start from $x_n$ at time $t_n$.

The first step for coupling $\bar{X}$ and $\bar{\bar{X}}$ is to define
\[
\bar{X}_{n+1} := x_n + \sum_j \nu_j\, Y_{j,n}(a_j(x_n)\, \Delta t_n), \tag{6.10}
\]
\[
Z_{n+1} := x_n + \sum_j \nu_j\, Q_{j,n}\Big(a_j(x_n)\, \frac{\Delta t_n}{2}\Big), \tag{6.11}
\]
\[
\bar{\bar{X}}_{n+1} := Z_{n+1} + \sum_j \nu_j\, R_{j,n}\Big(a_j(Z_{n+1})\, \frac{\Delta t_n}{2}\Big),
\]
where $\{Y_{j,n}\}_{j=1}^J \cup \{Q_{j,n}\}_{j=1}^J \cup \{R_{j,n}\}_{j=1}^J$ are Poisson random variables. To couple the
$\bar{X}$ and $\bar{\bar{X}}$ processes, we first decompose $Y_{j,n}(a_j(x_n)\, \Delta t_n)$ as the sum of two independent
Poisson random variables, $Q_{j,n}(a_j(x_n)\, \frac{\Delta t_n}{2}) + Q'_{j,n}(a_j(x_n)\, \frac{\Delta t_n}{2})$. As a consequence,
$\bar{X}$ and $\bar{\bar{X}}$ coincide in the closed interval $[t_n, (t_n+t_{n+1})/2]$. By applying this decomposition
in (6.10), we obtain
\[
\bar{X}_{n+1} = x_n + \sum_j \nu_j\, Q_{j,n}\Big(a_j(x_n)\, \frac{\Delta t_n}{2}\Big) + \sum_j \nu_j\, Q'_{j,n}\Big(a_j(x_n)\, \frac{\Delta t_n}{2}\Big), \tag{6.12}
\]
\[
\bar{\bar{X}}_{n+1} = x_n + \sum_j \nu_j\, Q_{j,n}\Big(a_j(x_n)\, \frac{\Delta t_n}{2}\Big) + \sum_j \nu_j\, R_{j,n}\Big(a_j(Z_{n+1})\, \frac{\Delta t_n}{2}\Big).
\]
The second step for coupling $\bar{X}$ and $\bar{\bar{X}}$, according to [1], is as follows: let $m_j :=
\min\{a_j(x_n), a_j(Z_{n+1})\}$, $c_j := a_j(x_n) - m_j$ and $f_j := a_j(Z_{n+1}) - m_j$. Notice that, for
each $j$, either $c_j$ or $f_j$ is zero (or both).
Now, consider the following decompositions:
\[
Q'_{j,n}\Big(a_j(x_n)\, \frac{\Delta t_n}{2}\Big) = P'_{j,n}\Big(m_j\, \frac{\Delta t_n}{2}\Big) + P''_{j,n}\Big(c_j\, \frac{\Delta t_n}{2}\Big), \tag{6.13}
\]
\[
R_{j,n}\Big(a_j(Z_{n+1})\, \frac{\Delta t_n}{2}\Big) = P'_{j,n}\Big(m_j\, \frac{\Delta t_n}{2}\Big) + R'_{j,n}\Big(f_j\, \frac{\Delta t_n}{2}\Big),
\]
where $P'_{j,n}$, $P''_{j,n}$ and $R'_{j,n}$ are independent Poisson random variables.

By substituting (6.13) into (6.12), we define the local error, $e_{n+1,\ell-1}$, as
\[
\begin{aligned}
e_{n+1,\ell-1} &:= \bar{\bar{X}}_{n+1} - \bar{X}_{n+1} \\
&= \sum_j \nu_j \Big( R'_{j,n}\Big(f_j\, \frac{\Delta t_n}{2}\Big) - P''_{j,n}\Big(c_j\, \frac{\Delta t_n}{2}\Big) \Big) \\
&= \sum_j \nu_j \Big( R'_{j,n}\Big(\Delta a_{j,n}\, \frac{\Delta t_n}{2}\Big) 1_{\{\Delta a_{j,n} > 0\}} - P''_{j,n}\Big(-\Delta a_{j,n}\, \frac{\Delta t_n}{2}\Big) 1_{\{\Delta a_{j,n} < 0\}} \Big),
\end{aligned} \tag{6.14}
\]
where $\Delta a_{j,n} := a_j(Z_{n+1}) - a_j(x_n)$ and $Z_{n+1}$ is defined in (6.11). Note that, in (6.14),
not only are $R'_{j,n}$ and $P''_{j,n}$ random variables, but $\Delta a_{j,n}$ is also random, because it
depends on the random variables $(Q_{j,n})_{j=1}^J$. Also note that all the mentioned random
variables are independent.
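The construction (6.10)-(6.13) translates almost line by line into code. A minimal Python sketch of one coupled coarse step (ours; $\nu$ stored with the reaction channels as rows):

```python
import numpy as np

def coupled_tau_leap_step(x, a, nu, dt, rng):
    """One coarse interval [t_n, t_n + dt] of the coupling (6.10)-(6.13).
    x: state, shape (d,); a(x): propensities, shape (J,); nu: shape (J, d).
    Returns (coarse X_{n+1}, fine X_{n+1}); their difference is e_{n+1}."""
    a_x = a(x)
    Q = rng.poisson(a_x * dt / 2)               # shared first-half increments
    Z = x + Q @ nu                              # fine path at the midpoint
    a_z = a(Z)
    m = np.minimum(a_x, a_z)                    # m_j
    P_shared = rng.poisson(m * dt / 2)          # P'_j, common to both paths
    P_coarse = rng.poisson((a_x - m) * dt / 2)  # P''_j, rate c_j dt/2
    R_fine = rng.poisson((a_z - m) * dt / 2)    # R'_j, rate f_j dt/2
    x_coarse = x + (Q + P_shared + P_coarse) @ nu
    x_fine = Z + (P_shared + R_fine) @ nu
    return x_coarse, x_fine
```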

Conditioning

At this point, it is convenient to recall the tower properties of the conditional
expectation and the conditional variance: given a random variable, $X$, and a sigma-algebra,
$\mathcal{F}$, defined over the same probability space, we have
\[
\mathrm{E}[X] = \mathrm{E}\left[\mathrm{E}[X \mid \mathcal{F}]\right],
\]
\[
\mathrm{Var}[X] = \mathrm{Var}\left[\mathrm{E}[X \mid \mathcal{F}]\right] + \mathrm{E}\left[\mathrm{Var}[X \mid \mathcal{F}]\right]. \tag{6.15}
\]
Hereafter, we fix $\ell$ and, for the sake of brevity, omit it as a subindex.
Applying (6.15) to $\sum_n \varphi_{n+1} \cdot e_{n+1}$ and conditioning on $\mathcal{F}$, we obtain
\[
\mathrm{Var}\Big[\sum_n \varphi_{n+1} \cdot e_{n+1}\Big] = \mathrm{Var}\Big[\mathrm{E}\Big[\sum_n \varphi_{n+1} \cdot e_{n+1} \,\Big|\, \mathcal{F}\Big]\Big] + \mathrm{E}\Big[\mathrm{Var}\Big[\sum_n \varphi_{n+1} \cdot e_{n+1} \,\Big|\, \mathcal{F}\Big]\Big]
\]
\[
= \mathrm{Var}\Big[\sum_n \mathrm{E}\left[\varphi_{n+1} \cdot e_{n+1} \mid \mathcal{F}\right]\Big] + \mathrm{E}\Big[\sum_n \mathrm{Var}\left[\varphi_{n+1} \cdot e_{n+1} \mid \mathcal{F}\right]\Big].
\]
The main idea is to generate $M_\ell$ Monte Carlo paths, $(\bar{X}_\ell(t_n; \bar\omega))_{n=1}^{N(\bar\omega)}$, and to estimate
$\mathrm{Var}[\sum_n \varphi_{n+1} \cdot e_{n+1}]$ using
\[
\hat{V}_\ell := \mathcal{S}^2\Big( \underbrace{\sum_n \mathrm{E}\left[\varphi_{n+1} \cdot e_{n+1} \mid \mathcal{F}\right](\bar\omega)}_{S_e(\bar\omega)};\ M_\ell \Big) + \mathcal{A}\Big( \underbrace{\sum_n \mathrm{Var}\left[\varphi_{n+1} \cdot e_{n+1} \mid \mathcal{F}\right](\bar\omega)}_{S_v(\bar\omega)};\ M_\ell \Big). \tag{6.16}
\]

To avoid nested Monte Carlo calculations, we develop exact and approximate
formulas for computing $\mathrm{E}[\varphi_{n+1} \cdot e_{n+1} \mid \mathcal{F}]$ and $\mathrm{Var}[\varphi_{n+1} \cdot e_{n+1} \mid \mathcal{F}]$. To derive those
formulas, we consider a sigma-algebra, $\mathcal{F}$, such that $(\varphi_n(\bar\omega))_{n=1}^{N(\bar\omega)}$, conditioned on $\mathcal{F}$,
is deterministic, i.e., $(\varphi_n(\bar\omega))_{n=1}^{N(\bar\omega)}$ is measurable with respect to $\mathcal{F}$. In this way, the
only randomness in $\mathrm{E}[\varphi_{n+1} \cdot e_{n+1} \mid \mathcal{F}]$ and $\mathrm{Var}[\varphi_{n+1} \cdot e_{n+1} \mid \mathcal{F}]$ comes from the local
errors, $(e_n)_{n=1}^{N(\bar\omega)}$.

Conditional Local Error Representation

In this section, we derive a local error representation that takes into account the fact
that the dual is computed backwards; the distribution of the local errors that is
relevant to our calculations is therefore not exactly the one given by (6.14), but the
one given by (6.17).

Consider the sequence $(\bar{X}_n)_{n=0}^{N(\bar\omega)}$ defined in (6.10). For fixed $n$, define $\mathcal{F}_n$ as the
sigma-algebra
\[
\mathcal{F}_n := \sigma\left( (Y_{j,k}(a_j(x_k)\, \Delta t_k))_{j=1,\ldots,J,\ k=1,\ldots,n} \right),
\]
i.e., the information we obtain by observing the randomness used to generate $\bar{X}_{n+1}$
from $x_0$. Motivated by the dual-weighted expansions (6.9), we want to express the local
error representation (6.14) conditional on $\mathcal{F} := \mathcal{F}_{N(\bar\omega)}$.
At this point, it is convenient to remember a key result for building Poissonian
bridges: if $X_1$ and $X_2$ are two independent Poisson random variables with parameters
$\lambda_1$ and $\lambda_2$, respectively, then $X_1 \mid X_1 + X_2 = k$ is a binomial random variable
with parameters $k$ and $\lambda_1/(\lambda_1 + \lambda_2)$.

Applying this observation to the decomposition $Y_{j,n}(a_j(x_n)\, \Delta t_n) = Q_{j,n}(a_j(x_n)\, \frac{\Delta t_n}{2}) +
Q'_{j,n}(a_j(x_n)\, \frac{\Delta t_n}{2})$, we conclude that the conditional distribution of $Q_{j,n}(a_j(x_n)\, \frac{\Delta t_n}{2})$
given $\mathcal{F}_n$, i.e., $Q_{j,n}(a_j(x_n)\, \frac{\Delta t_n}{2}) \mid \mathcal{F}_n$, is binomial with parameters $Y_{j,n}$ and $1/2$.
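In code, this thinning amounts to a single binomial draw; a tiny sketch (ours) of how $Q_{j,n} \mid \mathcal{F}_n$ could be generated from an observed $Y_{j,n}$:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.poisson(20.0)      # observed Y_{j,n}(a_j(x_n) * dt_n)
Q = rng.binomial(Y, 0.5)   # Q_{j,n} | F_n ~ Binomial(Y_{j,n}, 1/2)
Q_prime = Y - Q            # the second half-interval increment Q'_{j,n}
```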
Define now the sigma-algebra, $\mathcal{G}_n$, as
\[
\mathcal{G}_n := \sigma\left( \Big(Q_{j,n}\Big(a_j(x_n)\, \tfrac{\Delta t_n}{2}\Big) \mid \mathcal{F}_n\Big)_{j=1}^{J} \right).
\]
Applying the same argument to $P''_{j,n}$, defined in (6.13), we conclude that
\[
P''_{j,n} \mid \{\mathcal{F}_n, \mathcal{G}_n\} \sim \text{binomial}\left( Y_{j,n} - Q_{j,n},\ \frac{c_j}{a_j(x_n)} \right).
\]
From the definition of $Z_{n+1} = x_n + \sum_j \nu_j Q_{j,n}$ in (6.11), we conclude that
\[
R'_{j,n} \mid \mathcal{G}_n \sim \text{Poisson}\left( (a_j(Z_{n+1}) - m_j)\, \frac{\Delta t_n}{2} \right).
\]
Notice that, by construction, $P''_{j,n} \mid \{\mathcal{F}_n, \mathcal{G}_n\}$ and $R'_{j,n} \mid \mathcal{G}_n$ are independent random
variables. Since $c_j = -\Delta a_{j,n} 1_{\{\Delta a_{j,n} < 0\}}$ and $a_j(Z_{n+1}) - m_j = \Delta a_{j,n} 1_{\{\Delta a_{j,n} \ge 0\}}$, we can
express the conditional local error as
\[
e_{n+1} \mid \{\mathcal{F}_n, \mathcal{G}_n\} = \sum_j \nu_j \left( R'_{j,n}\Big(\Delta a_{j,n}\, \tfrac{\Delta t_n}{2}\Big) 1_{\{\Delta a_{j,n} \ge 0\}} - P''_{j,n}\Big(Y_{j,n} - Q_{j,n},\ \tfrac{-\Delta a_{j,n}}{a_j(x_n)}\Big) 1_{\{\Delta a_{j,n} < 0\}} \right) \tag{6.17}
\]
in the distribution sense. For instance, we can easily compute the expectation of
$e_{n+1} \mid \{\mathcal{F}_n, \mathcal{G}_n\}$ as follows:
\[
\mathrm{E}\left[e_{n+1} \mid \{\mathcal{F}_n, \mathcal{G}_n\}\right] = \sum_j \nu_j\, \Delta a_{j,n} \left( \frac{\Delta t_n}{2}\, 1_{\{\Delta a_{j,n} \ge 0\}} + \frac{Y_{j,n} - Q_{j,n}}{a_j(x_n)}\, 1_{\{\Delta a_{j,n} < 0\}} \right).
\]

Taking into account that the joint distribution of $(Q_{j,n})_{j=1}^J \mid \mathcal{F}_n$ is given by
\[
\mathrm{P}\left( (Q_{j,n} = q_{j,n})_{j=1}^J \mid \mathcal{F}_n \right) = 2^{-\sum_j Y_{j,n}} \prod_{j=1}^{J} \frac{Y_{j,n}!}{q_{j,n}!\,(Y_{j,n} - q_{j,n})!}, \quad 0 \le q_{j,n} \le Y_{j,n},
\]
we can exactly compute the expected value and the variance of $v_{n+1} \cdot e_{n+1} \mid \mathcal{F}_n$ for
any given deterministic vector, $v_{n+1}$. Notice that, given $\mathcal{F}$, the sequence $(\bar{X}_n)_{n=0}^{N(\bar\omega)}$ is
deterministic and, as a consequence, the sequence $(\varphi_n)_{n=1}^{N(\bar\omega)} \mid \mathcal{F}$ is also a deterministic
sequence of vectors. We can thus compute
\[
\mathrm{E}\Big[\sum_n \varphi_{n+1} \cdot e_{n+1} \,\Big|\, \mathcal{F}\Big] \quad \text{and} \quad \mathrm{Var}\Big[\sum_n \varphi_{n+1} \cdot e_{n+1} \,\Big|\, \mathcal{F}\Big] \tag{6.18}
\]
exactly and proceed as stated at the beginning of this section. However, trying to
develop computable expressions from (6.17) has two main disadvantages: i) it may
lead to computationally demanding procedures, especially for systems with many
reaction channels or in regimes with high activity; ii) it may be affected by the
variance associated with the randomness in $\mathcal{F}_n$ and $\mathcal{G}_n$.
Deriving a Formula for $\hat{V}_\ell$

In this section, we derive the formula (6.8). Our goal is to find computable approximations
of (6.18), where the underlying sigma-algebra, $\mathcal{F}$, is just the information
gathered by observing the coarse path, $\bar{X}$. This means that our formula should not
depend explicitly on the knowledge of the random variables that generate $\mathcal{F}_n$ and
$\mathcal{G}_n$. At this point, it is important to recall the comments in Section 6.3.3; that is,
the sequence $(\varphi_n(\bar\omega))_{n=1}^{N(\bar\omega)}$ is measurable with respect to $\mathcal{F}$. This implies that, for all
$n$, $\varphi_{n+1}$ is independent of $\mathcal{G}_n$. Hereafter, for notational convenience, we omit writing
explicitly the conditioning on $\mathcal{F}$ in our formulae.

It turns out that the leading-order terms of the conditional moments obtained
from (6.17) are essentially the same as those computed from (6.14). We therefore
derive (6.8) from (6.14). Using the notation from Section 6.3.3, we have
\[
(\varphi_{n+1} \cdot e_{n+1}) = \sum_j f_{j,n} \left( R'_{j,n}\Big(\Delta a_{j,n}\, \tfrac{\Delta t_n}{2}\Big) 1_{\{\Delta a_{j,n} > 0\}} - P''_{j,n}\Big(-\Delta a_{j,n}\, \tfrac{\Delta t_n}{2}\Big) 1_{\{\Delta a_{j,n} < 0\}} \right).
\]
By the tower property, we obtain
\[
\mathrm{E}\left[(\varphi_{n+1} \cdot e_{n+1})\right] = \mathrm{E}\left[\mathrm{E}\left[(\varphi_{n+1} \cdot e_{n+1}) \mid \mathcal{G}_n\right]\right] = \frac{\Delta t_n}{2} \sum_j f_{j,n}\, \mathrm{E}\left[\Delta a_{j,n}\right].
\]

Now let us consider the first-order Taylor expansion:
\[
\begin{aligned}
\Delta a_{j,n} &:= a_j\Big(x_n + \sum_i \nu_i\, Q_{i,n}(a_i(x_n)\, \Delta t_n/2)\Big) - a_j(x_n) \\
&\approx \sum_i \left(\nabla a_j(x_n) \cdot \nu_i\, Q_{i,n}(a_i(x_n)\, \Delta t_n/2)\right) \\
&= \sum_i (\nabla a_j(x_n) \cdot \nu_i)\, Q_{i,n}(a_i(x_n)\, \Delta t_n/2).
\end{aligned}
\]
Since $Q_{i,n}(a_i(x_n)\, \Delta t_n/2) \sim \text{Poisson}(a_i(x_n)\, \Delta t_n/2)$, we have $\mathrm{E}[\Delta a_{j,n}] = \mu_{j,n}$ and
$\mathrm{Var}[\Delta a_{j,n}] = \sigma^2_{j,n}$. Thus,
\[
\mathrm{E}\left[(\varphi_{n+1} \cdot e_{n+1})\right] \approx \frac{\Delta t_n}{2} \sum_j f_{j,n}\, \mu_{j,n}.
\]

Now, we use the tower property again, this time for the variance:
\[
\mathrm{Var}\left[(\varphi_{n+1} \cdot e_{n+1})\right] = \mathrm{Var}\left[\mathrm{E}\left[(\varphi_{n+1} \cdot e_{n+1}) \mid \mathcal{G}_n\right]\right] + \mathrm{E}\left[\mathrm{Var}\left[(\varphi_{n+1} \cdot e_{n+1}) \mid \mathcal{G}_n\right]\right].
\]
We then immediately obtain
\[
\mathrm{Var}\left[\mathrm{E}\left[(\varphi_{n+1} \cdot e_{n+1}) \mid \mathcal{G}_n\right]\right] \approx \frac{(\Delta t_n)^3}{8} \sum_{j,j'} f_{j,n} f_{j',n} \sum_i (\nabla a_j(x_n) \cdot \nu_i)(\nabla a_{j'}(x_n) \cdot \nu_i)\, a_i(x_n),
\]
\[
\mathrm{E}\left[\mathrm{Var}\left[(\varphi_{n+1} \cdot e_{n+1}) \mid \mathcal{G}_n\right]\right] \approx \frac{\Delta t_n}{2} \sum_j f_{j,n}^2\, \mathrm{E}\left[\Delta a_{j,n}\, \mathrm{sgn}(\Delta a_{j,n})\right].
\]

Let us consider the case where $a_i(x_n)\, \Delta t_n/2$ is large enough for all $i$. It is well
known that a Poisson random variable, $Q(\lambda)$, is well approximated by a Gaussian random
variable, $N(\lambda, \lambda)$, for moderate values of $\lambda$, say $\lambda > 10$. Since $Q_{i,n}(a_i(x_n)\, \Delta t_n/2) \sim$
$\text{Poisson}(a_i(x_n)\, \Delta t_n/2)$, we have that, when $a_i(x_n)\, \Delta t_n/2$ is large enough for all $i$,
$\Delta a_{j,n} \approx N(\mu_{j,n}, \sigma^2_{j,n})$. Consider a standard Gaussian random variable, $Z$, and parameters $\mu$
and $\sigma^2 > 0$. Then,
\[
\begin{aligned}
\mathrm{E}\left[(\mu + \sigma Z)1_{\{\mu + \sigma Z > 0\}}\right] &= \mu\, \mathrm{P}(\mu + \sigma Z > 0) + \frac{\sigma}{\sqrt{2\pi}} \int_{-\mu/\sigma}^{+\infty} z \exp\left(-z^2/2\right) dz \\
&= \mu(1 - \Phi(-\mu/\sigma)) + \frac{\sigma}{\sqrt{2\pi}} \exp\left(-(\mu/\sigma)^2/2\right).
\end{aligned} \tag{6.19}
\]
From (6.19), we immediately get
\[
\mathrm{E}\left[\Delta a_{j,n} 1_{\{\Delta a_{j,n} > 0\}}\right] \approx \mu_{j,n}(1 - p_{j,n}) + \frac{\sigma_{j,n}}{\sqrt{2\pi}} \exp\left(-\frac{q_{j,n}^2}{2}\right), \tag{6.20}
\]
\[
\mathrm{E}\left[\Delta a_{j,n} 1_{\{\Delta a_{j,n} < 0\}}\right] \approx \mu_{j,n}\, p_{j,n} - \frac{\sigma_{j,n}}{\sqrt{2\pi}} \exp\left(-\frac{q_{j,n}^2}{2}\right).
\]
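As a quick sanity check of (6.19) (our own verification, not part of the thesis), the closed form can be compared against a Monte Carlo average:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
mu, sigma = 0.3, 1.2
x = mu + sigma * rng.standard_normal(1_000_000)
lhs = np.mean(x * (x > 0))                     # E[(mu + sigma Z) 1_{>0}]
rhs = mu * (1 - norm.cdf(-mu / sigma)) \
      + sigma / np.sqrt(2 * np.pi) * np.exp(-(mu / sigma) ** 2 / 2)
print(lhs, rhs)  # the two values agree up to Monte Carlo error
```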
By subtracting the expressions in (6.20), we obtain
\[
\mathrm{E}\left[\mathrm{Var}\left[(\varphi_{n+1} \cdot e_{n+1}) \mid \mathcal{G}_n\right]\right] \approx \frac{\Delta t_n}{2} \sum_j f_{j,n}^2 \left(\tilde\mu_{j,n} + \tilde\sigma_{j,n}\right). \tag{6.21}
\]

Let us now consider the case where $a_i(x_n)\, \Delta t_n/2$ is close to zero for some $i$. We can
bound $\mathrm{E}\left[\Delta a_{j,n}\, \mathrm{sgn}(\Delta a_{j,n})\right]$ by $\mathrm{E}\left[|\Delta a_{j,n}|\right]$ and also by $\sqrt{\mathrm{E}\left[(\Delta a_{j,n})^2\right]}$. It is
easy to see that $\mathrm{E}\left[|\Delta a_{j,n}|\right] \le \bar\mu_{j,n}$. Regarding $\mathrm{E}\left[(\Delta a_{j,n})^2\right]$, it can be approximated by
\[
\mathrm{E}\Big[ \sum_{i,i'} (\nabla a_j(x_n) \cdot \nu_i)(\nabla a_j(x_n) \cdot \nu_{i'})\, Q_i Q_{i'} \Big] = \sum_{i,i'} (\nabla a_j(x_n) \cdot \nu_i)(\nabla a_j(x_n) \cdot \nu_{i'})\, \mathrm{E}\left[Q_i Q_{i'}\right].
\]
Since
\[
\mathrm{E}\left[Q_i Q_{i'}\right] = \frac{(\Delta t_n)^2}{4}\, a_i(x_n)\, a_{i'}(x_n)\, 1_{i \ne i'} + \left( a_i(x_n)\frac{\Delta t_n}{2} + \Big(a_i(x_n)\frac{\Delta t_n}{2}\Big)^2 \right) 1_{i = i'}, \tag{6.22}
\]
we can rearrange terms and approximate $\mathrm{E}\left[(\Delta a_{j,n})^2\right]$ by $\mu_{j,n}^2 + \sigma_{j,n}^2$.

We conclude that $\mathrm{E}\left[\Delta a_{j,n}\, \mathrm{sgn}(\Delta a_{j,n})\right]$ can be bounded by $m_{j,n}$, which was
defined as $\min\{\bar\mu_{j,n}, \sqrt{\mu_{j,n}^2 + \sigma_{j,n}^2}\}$.

Remark 6.3.2. Formula (6.8) can be considered an initial, relatively successful
attempt to estimate $V_\ell$, but there is still room for improvement. The main problem
is the lack of sharp concentration inequalities for linear combinations of independent
Poisson random variables. In the numerical examples, we show that the efficiency
index of the formula is acceptable for our estimation purposes.

Remark 6.3.3. We are assuming that only tau-leap steps are taken; in our hybrid
algorithms, however, some steps can be exact and, hence, do not contribute to the local
error. For that reason, we include the indicator function of the tau-leap step, $1_{TL}$, in the
estimator, $\hat{V}_\ell$.

Remark 6.3.4. The dual-weighted residual approach makes the estimation of $V_\ell$ feasible.
In our numerical experiments, we found that, using the same number of simulated
coupled hybrid paths, the variance of $\hat{V}_\ell$ is much smaller than the variance of
the standard Monte Carlo sample estimator of $\mathrm{Var}[g_\ell - g_{\ell-1}]$. Note that $\hat{V}_\ell$ can be computed
using only single-level hybrid paths at level $\ell-1$. In the upper right panel of Figure
6.8, we can see that, due to the hybrid nature of the simulated paths, it is not possible
to predict where the variance of $g_\ell - g_{\ell-1}$ will enter a superlinear regime. Thus,
by extrapolating $\mathrm{Var}[g_\ell - g_{\ell-1}]$ from the coarser levels, we may overestimate the
values of $\mathrm{Var}[g_\ell - g_{\ell-1}]$ for the deepest levels.

6.4 Estimation Procedure

In this section, we present a procedure that estimates $\mathrm{E}[g(X(T))]$ within a given
prescribed relative tolerance, $TOL > 0$, with high probability. The procedure consists of
three phases:

Phase I Calibration of virtual-machine-dependent quantities.

Phase II Solution of the work optimization problem: we obtain the total number of
levels, $L$, and the sequences $(\delta_\ell)_{\ell=0}^L$ and $(M_\ell)_{\ell=0}^L$, i.e., the one-step exit probability
bounds and the required number of simulations at each level. We recall
that, in Section 6.1.6, we defined $\Delta t_\ell := \Delta t_0\, R^{-\ell}$, where $R > 1$ is a given integer
constant. For that reason, to define the whole sequence of meshes, $(\Delta t_\ell)_{\ell=0}^L$, we
simply need to define the size of the coarsest mesh, $\Delta t_0$.

Phase III Estimation of $\mathrm{E}[g(X(T))]$.

6.4.1 Phase I

In this section, we describe the estimation of several constants, $C_1$, $C_2$, $C_3$ and $K_1$,
and functions, $C_P$ and $K_2$, that allow us to model the expected computational work
(or just work), measured in terms of the runtime of hybrid paths; see definitions
(6.23) and (6.24). Those quantities are virtual-machine dependent; that is, they
depend on the computer system used for running the simulations and also on the
implementation language. Those quantities are also estimated off-line; that is, we
need to estimate them only once for each virtual machine on which we want to run
the hybrid method.

The constants $C_1$, $C_2$, and $C_3$ reflect the average execution times of each logical path
of Algorithm 13. Specifically, $C_1$ and $C_2$ reflect the work associated with the two
different types of steps in the MNRM, while $C_3$ reflects the work needed for
computing the Chernoff tau-leap size, $\tau_{Ch}$. Finally, when we perform a tau-leap step,
we have the work needed for simulating Poisson random variates, which is modeled
by the function $C_P$ [2]. This function has two constants that are also virtual-machine
dependent.

The constant, $K_1$, and the function, $K_2 \equiv K_2(x, \delta)$, defined through $C_1$, $C_2$, and
$C_3$, were introduced in Section 6.1.5.

6.4.2 Phase II

In this section, we set up and solve the work optimization problem. Our objective function
is the expected total work of the MLMC estimator, $\mathcal{M}_L$, defined in (6.5), i.e.,
\[
\sum_{\ell=0}^{L} \psi_\ell M_\ell,
\]
where $L$ is the maximum (deepest) level, $\psi_0$ is the expected work of a single-level
path at level 0, and $\psi_\ell$, for $\ell \ge 1$, is the expected computational work of two coupled
paths at levels $\ell-1$ and $\ell$. Finally, $M_0$ is the number of single-level paths at level 0,
and $M_\ell$, for $\ell \ge 1$, is the number of coupled paths at levels $\ell-1$ and $\ell$.
Let us now describe in detail the quantities $(\psi_\ell)_{\ell=0}^L$. For $\ell = 0$, Algorithm 23
generates a single hybrid path. The building block of a single hybrid path is Algorithm
13, which adaptively determines whether to use an MNRM step or a tau-leap
one. According to this algorithm, there are two ways of taking an MNRM step, depending
on the logical conditions $K_1/a_0(x) > T_0 - t$ and $K_2/a_0(x) > \tau_{Ch}$. Given one
particular hybrid path, let $N_{K_1}(\Delta t_0, \delta_0)$ be the number of MNRM steps such that
$K_1/a_0(x) > T_0 - t$ is true, and let $N_{K_2}(\Delta t_0, \delta_0)$ be the number of MNRM steps such
that $K_1/a_0(x) > T_0 - t$ is false and $K_2/a_0(x) > \tau_{Ch}$ is true. When a Chernoff tau-leap
step is taken, we have constant work, $C_3$, plus variable work computed with the aid
of $C_P$. Then, the expected work of a single hybrid path, at level $\ell = 0$, is
\[
\begin{aligned}
\psi_0 :=\ & C_1\, \mathrm{E}\left[N_{K_1}(\Delta t_0, \delta_0)\right] + C_2\, \mathrm{E}\left[N_{K_2}(\Delta t_0, \delta_0)\right] + C_3\, \mathrm{E}\left[N_{TL}(\Delta t_0, \delta_0)\right] \\
&+ \sum_{j=1}^{J} \mathrm{E}\left[ \int_{[0,T]} C_P\big(a_j(\bar{X}_0(s))\, \tau_{Ch}(\bar{X}_0(s), \delta_0)\big)\, 1_{TL}(\bar{X}_0(s))\, ds \right],
\end{aligned} \tag{6.23}
\]
where $\Delta t_0$ is the size of the time mesh at level 0 and $\delta_0$ is the exit probability bound
at level 0. Therefore, the expected work at level 0 is $\psi_0 M_0$, where $M_0$ is the total
number of single hybrid paths.
For $\ell \ge 1$, we use Algorithm 14 to generate $M_\ell$ coupled paths that couple the
$\ell-1$ and $\ell$ levels. Given two coupled paths, let $N_{K_1}(\Delta t_{\ell-1}, \delta_{\ell-1})$ and $N_{K_1}(\Delta t_\ell, \delta_\ell)$ be
the number of exact steps for level $\ell-1$ (coarse mesh) and $\ell$ (fine mesh), respectively,
with associated work $C_1$. We define $N_{K_2}(\Delta t_{\ell-1}, \delta_{\ell-1})$ and $N_{K_2}(\Delta t_\ell, \delta_\ell)$ analogously.
Then, the expected work of a pair of coupled hybrid paths at levels $\ell$ and $\ell-1$ is
\[
\begin{aligned}
\psi_\ell :=\ & C_1\, \mathrm{E}\big[N^{(c)}_{K_1}(\ell)\big] + C_2\, \mathrm{E}\big[N^{(c)}_{K_2}(\ell)\big] + C_3\, \mathrm{E}\big[N^{(c)}_{TL}(\ell)\big] \\
&+ \sum_{j=1}^{J} \mathrm{E}\left[ \int_{[0,T]} C_P\big(a_j(\bar{X}_\ell(s))\, \tau_{Ch}(\bar{X}_\ell(s), \delta_\ell)\big)\, 1_{TL}(\bar{X}_\ell(s))\, ds \right] \\
&+ \sum_{j=1}^{J} \mathrm{E}\left[ \int_{[0,T]} C_P\big(a_j(\bar{X}_{\ell-1}(s))\, \tau_{Ch}(\bar{X}_{\ell-1}(s), \delta_{\ell-1})\big)\, 1_{TL}(\bar{X}_{\ell-1}(s))\, ds \right],
\end{aligned} \tag{6.24}
\]
where
\[
\begin{aligned}
N^{(c)}_{K_1}(\ell) &:= N_{K_1}(\Delta t_\ell, \delta_\ell) + N_{K_1}(\Delta t_{\ell-1}, \delta_{\ell-1}),\\
N^{(c)}_{K_2}(\ell) &:= N_{K_2}(\Delta t_\ell, \delta_\ell) + N_{K_2}(\Delta t_{\ell-1}, \delta_{\ell-1}),\\
N^{(c)}_{TL}(\ell) &:= N_{TL}(\Delta t_\ell, \delta_\ell) + N_{TL}(\Delta t_{\ell-1}, \delta_{\ell-1}).
\end{aligned}
\]

Now, recalling the definitions of the error decomposition given at the beginning of
Section 6.3.2, we have all the elements needed to formulate the work optimization problem.
Given a relative tolerance, $TOL > 0$, we solve
\[
\begin{cases}
\min_{\{\Delta t_0, L, (M_\ell, \delta_\ell)_{\ell=0}^L\}} \sum_{\ell=0}^{L} \psi_\ell M_\ell \\[4pt]
\text{s.t.} \quad \mathcal{E}_{E,L} + \mathcal{E}_{I,L} + \mathcal{E}_{S,L} \le TOL.
\end{cases} \tag{6.25}
\]
It is natural to consider the following family of auxiliary problems, indexed on
$L \ge 1$, where we assume for now that the double sequence, $(\Delta t_\ell, \delta_\ell)_{\ell=0}^L$, is known:
\[
\begin{cases}
\min_{(M_\ell \ge 1)_{\ell=0}^L} \sum_{\ell=0}^{L} \psi_\ell M_\ell \\[4pt]
\text{s.t.} \quad \mathcal{E}_{I,L} + C_A \sqrt{\sum_{\ell=0}^{L} \frac{V_\ell}{M_\ell}} \le TOL - TOL^2,
\end{cases} \tag{6.26}
\]
where we take $C_A \ge 2$ to guarantee an asymptotic confidence level of at least 95%.


Let us assume for now that we know $\psi_\ell$, $V_\ell$ and $\mathcal{E}_{I,\ell}$, for $\ell = 0, 1, \ldots, L$. Let
$L_0$ be the smallest value of $L$ such that $\mathcal{E}_{I,L} < TOL - TOL^2$. This value exists and is
finite, since the discretization error, $\mathcal{E}_{I,L}$, tends to zero as $L$ goes to infinity. For
each $L \ge L_0$, define $w_L := \sum_{\ell=0}^{L} \psi_\ell M_\ell^*$, where the sequence $(M_\ell^*)_{\ell=0}^L$ is the solution of
the problem (6.26). It is worth mentioning that $(M_\ell^*)_{\ell=0}^L$ is quickly obtained as the
solution of the following Karush-Kuhn-Tucker problem (see, e.g., [20]):
\[
\begin{cases}
\min_{(M_\ell \ge 1)_{\ell=0}^L} \sum_{\ell=0}^{L} \psi_\ell M_\ell \\[4pt]
\text{s.t.} \quad \sum_{\ell=0}^{L} \frac{V_\ell}{M_\ell} \le R.
\end{cases} \tag{6.27}
\]

We do not develop all the calculations here; a pseudo-code is given in Algorithm 22.
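For reference, the Lagrangian solution of (6.27) has the familiar MLMC form $M_\ell^* \propto \sqrt{V_\ell/\psi_\ell}$. A minimal Python sketch (ours; it handles the $M_\ell \ge 1$ constraint only by rounding up):

```python
import numpy as np

def optimal_samples(V, psi, stat_tol, CA=1.96):
    """KKT solution of (6.27), scaled so that CA * sqrt(sum V_l / M_l)
    meets the statistical budget stat_tol (theta * TOL in the notation of
    the next subsection):
    M_l* = (CA / stat_tol)^2 * sqrt(V_l / psi_l) * sum_k sqrt(V_k psi_k)."""
    V, psi = np.asarray(V, float), np.asarray(psi, float)
    total = np.sum(np.sqrt(V * psi))
    M = (CA / stat_tol) ** 2 * np.sqrt(V / psi) * total
    return np.maximum(1, np.ceil(M)).astype(int)
```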
Let us now analyze two extreme cases: i) for $L$ such that $\mathcal{E}_{I,L}$ is less than, but very close
to, $TOL - TOL^2$, the constraint in (6.26) forces $\sum_{\ell=0}^{L} V_\ell/M_\ell^*$ to be a very small number. As a consequence,
we obtain large values of $M_\ell^*$ and, hence, a large value of $w_L$. By adding one more
level, i.e., $L \leftarrow L+1$, we expect a larger gap between $\mathcal{E}_{I,L}$ and $TOL - TOL^2$; that means that
we expect a larger admissible value of $\sum_{\ell=0}^{L} V_\ell/M_\ell^*$, which may lead to smaller values of $M_\ell^*$. We
observe that, in spite of adding one more term to $w_L$, this leads to a smaller value of
$w_L$. ii) At the other extreme, a large value of $L$ is associated with large values of $\psi_L$
and, therefore, with large values of $w_L$.


This informal 'extreme case analysis' has been confirmed by our numerical experiments
(see, for instance, Figures 6.2 and 6.8 (lower right)), which allow us to
conjecture that the sequence $(w_L)_{L=L_0}^{+\infty}$ is a convex function of $L$ and, hence, that it
has a unique optimal value, achieved at a certain $L^*$. A pseudo-algorithm to find
$L^*$ could be to start by computing $w_{L_0}$ and $w_{L_0+1}$. If $w_{L_0+1} \ge w_{L_0}$, we accept $L^* = L_0$;
otherwise, we proceed to compute the next term of the sequence, $(w_L)_{L=L_0}^{+\infty}$. If,
for some $p$, we have $w_{L_{p+1}} \ge w_{L_p}$, we accept $L^* = L_p$. Of course, we can also stop if
$w_{L_{p+1}} < w_{L_p}$ but the difference $w_{L_p} - w_{L_{p+1}}$ is sufficiently small. In this last case, we
accept $L^* = L_{p+1}$.
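The search just described can be written as a short loop. A sketch (ours), where work_at(L) is assumed to solve (6.27) at depth L and return $w_L$:

```python
def find_L_star(L0, work_at, rel_gain=1e-2):
    """Exploit the conjectured convexity of (w_L): stop the first time an
    extra level does not reduce the optimal work, or reduces it only
    marginally (in which case the deeper level is accepted)."""
    L, w = L0, work_at(L0)
    while True:
        w_next = work_at(L + 1)
        if w_next >= w:
            return L
        if (w - w_next) / w < rel_gain:
            return L + 1
        L, w = L + 1, w_next
```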
Computational Complexity

At this point, we have all the necessary elements to establish a key result of this work:
the computational complexity of the multilevel hybrid Chernoff tau-leap method.
Let us now analyze the optimal amount of work at level $L$, $w_L$, as a function of the
given relative tolerance, $TOL$. For simplicity, let us assume that $M_\ell^* > 1$, $\ell = 0, \ldots, L$.
In this case, the optimal number of samples at level $\ell$ is given by
\[
M_\ell^* = (C_A/\theta)^2\, TOL^{-2}\, \sqrt{V_\ell/\psi_\ell}\, \sum_{\ell'=0}^{L} \sqrt{V_{\ell'}\, \psi_{\ell'}},
\]
for some $\theta \in (0, 1)$. In fact, $\theta$ is the proportion of the tolerance, $TOL$, that our
computational cost optimization algorithm selects for the statistical error, $\mathcal{E}_{S,L}$. In
our algorithms, we impose $\theta \ge 0.5$; however, our numerical experiments always select
a larger value (see Figures 6.3 and 6.9).

By substituting $M_\ell^*$ into the total work formula, $w_L$, we conclude that the optimal
expected work, conditional on $\theta$, is given by
\[
\mathrm{E}\left[w_L^*(TOL) \mid \theta\right] = \left( \frac{C_A}{\theta} \sum_{\ell=0}^{L(\theta)} \sqrt{V_\ell\, \psi_\ell} \right)^2 TOL^{-2}.
\]
Due to the constraint $\theta \ge 0.5$, we have
\[
w_L^*(TOL) \le \sup_L \left\{ \Big( 2\, C_A \sum_{\ell=0}^{L} \sqrt{V_\ell\, \psi_\ell} \Big)^2 \right\} TOL^{-2}.
\]
Let us consider the series $\sum_{\ell=0}^{\infty} \sqrt{V_\ell\, \psi_\ell}$. First, observe that the expected computational
work per path at level $\ell$, $\psi_\ell$, is bounded by a multiple of the expected
computational work of the MNRM (see Section 6.1.2), i.e., by $K\, \psi_{MNRM}$. In our numerical
experiments, we observe that taking $K$ around 3 is enough. Therefore,
$\sum_{\ell=0}^{\infty} \sqrt{V_\ell\, \psi_\ell} \le \sqrt{K\, \psi_{MNRM}} \sum_{\ell=0}^{\infty} \sqrt{V_\ell}$. Observe that, by construction, $V_\ell \to 0$ superlinearly.
More specifically, it satisfies the bound $V_\ell = \mathcal{O}(\Delta t_\ell) \le C\, \Delta t_0 (1/2)^\ell$ for some
positive constant $C$. Therefore, the series $\sum_{\ell=0}^{\infty} \sqrt{V_\ell}$ is dominated by the geometric
series $\sum_{\ell=0}^{\infty} (1/\sqrt{2})^\ell < \infty$. We conclude that $\sup_L \{\sum_{\ell=0}^{L} \sqrt{V_\ell\, \psi_\ell}\}$ is bounded and,
therefore, the expected computational complexity of the multilevel hybrid Chernoff
tau-leap method is $w_L^*(TOL) = \mathcal{O}(TOL^{-2})$.

Some Comments on the Algorithms for Phase II

In Algorithm 18, we propose an iterative method to obtain an approximate solution
to the problem (6.25). Notice that we are assuming that there are at least two levels
in the multilevel hierarchy, i.e., $L \ge 1$.

To solve the problem (6.25), we bound the global exit error, $\mathcal{E}_{E,L}$, by $TOL^2$. More
specifically, we choose $\delta_L$ sufficiently small such that
\[
|\mathcal{A}(g_L; \cdot)|\, \delta_L\, \mathcal{A}(N_{TL}(\Delta t_L, \delta_L); \cdot) < TOL^2. \tag{6.28}
\]
At this point, it is crucial to observe that, if we impose the condition (6.28) on any
level $\ell < L$, then we unnecessarily enforce a dependence of $\delta_\ell$ on $TOL$. This
dependence may result in very small values of $\delta_\ell$, which in turn may increase the
expected number of exact steps and tau-leap steps at level $\ell$, implying a larger expected
computational work at level $\ell$. In the appendix of [2], we proved that, when
$\delta_\ell$ tends to zero, the expected number of tau-leap steps at level $\ell$ goes to
zero, and therefore our hybrid MLMC strategy would converge to the SSA method
without the desired reduction in computational work. To avoid the dependence of
$(\delta_\ell)_{\ell=0}^{L-1}$ on $TOL$, we adopt a different strategy, based on the following decomposition:
\[
\begin{aligned}
V_\ell = \mathrm{Var}\left[g_\ell 1_{A_\ell} - g_{\ell-1} 1_{A_{\ell-1}}\right]
&= \mathrm{Var}\left[g_\ell - g_{\ell-1} \mid A_\ell \cap A_{\ell-1}\right] \mathrm{P}(A_\ell \cap A_{\ell-1}) \\
&\quad + \mathrm{Var}\left[g_\ell \mid A_\ell \cap A_{\ell-1}^c\right] \mathrm{P}\left(A_\ell \cap A_{\ell-1}^c\right) \\
&\quad + \mathrm{Var}\left[g_{\ell-1} \mid A_\ell^c \cap A_{\ell-1}\right] \mathrm{P}(A_\ell^c \cap A_{\ell-1}).
\end{aligned}
\]
We impose that the first term of the right-hand side dominates the other two. This is
because the conditional variances appearing in the last two terms are of order $\mathcal{O}(1)$,
while the conditional variance appearing in the first term is of order $\mathcal{O}(\Delta t_\ell)$, and
we make our computations with approximations of $V_\ell$ assuming that $\mathrm{P}(A_\ell \cap A_{\ell-1})$
is close to one. We proceed as follows: first, we approximate $\mathrm{P}(A_\ell \cap A_{\ell-1})$ by
$\mathrm{P}(A_\ell)\, \mathrm{P}(A_{\ell-1})$; then, we consider $1 - \delta_\ell\, \mathcal{A}(N_{TL}(\Delta t_\ell, \delta_\ell); \cdot)$ as an approximate upper
bound for $\mathrm{P}(A_\ell)$ when $\delta_\ell\, \mathcal{A}(N_{TL}(\Delta t_\ell, \delta_\ell); \cdot) \ll 1$. Those considerations lead us to impose
\[
\begin{aligned}
&\mathrm{Var}\left[g_\ell - g_{\ell-1} \mid A_\ell \cap A_{\ell-1}\right] \left(1 - \delta_\ell\, \mathcal{A}(N_{TL}(\Delta t_\ell, \delta_\ell); \cdot)\right) \left(1 - \delta_{\ell-1}\, \mathcal{A}(N_{TL}(\Delta t_{\ell-1}, \delta_{\ell-1}); \cdot)\right) > \\
&\quad \mathrm{Var}\left[g_\ell \mid A_\ell \cap A_{\ell-1}^c\right] \delta_{\ell-1}\, \mathcal{A}(N_{TL}(\Delta t_{\ell-1}, \delta_{\ell-1}); \cdot) + \mathrm{Var}\left[g_{\ell-1} \mid A_\ell^c \cap A_{\ell-1}\right] \delta_\ell\, \mathcal{A}(N_{TL}(\Delta t_\ell, \delta_\ell); \cdot).
\end{aligned} \tag{6.29}
\]
To avoid simultaneous refinements of $\delta_\ell$ and $\delta_{\ell-1}$, based on (6.29), we impose on $\delta_\ell$
the following condition:
\[
\hat{V}_\ell\, \left(1 - \delta_\ell\, \mathcal{A}(N_{TL}(\Delta t_\ell, \delta_\ell); \cdot)\right)^2 > 2\, \mathcal{S}^2(g; \cdot)\, \delta_\ell\, \mathcal{A}(N_{TL}(\Delta t_\ell, \delta_\ell); \cdot). \tag{6.30}
\]
Algorithms 23 and 18 provide $\mathcal{A}(g_\ell; \cdot)$, $\mathcal{A}(N_{TL}; \cdot)$ and the other required quantities.
Condition (6.30) does not affect the telescoping sum property of our multilevel
estimator, $\mathcal{M}_L$, defined in (6.5), since each level, $\ell$, has its own $\delta_\ell$.

Remark 6.4.1 (Multilevel estimators used in Algorithm 18). Although in Algorithm
18 we show that the estimates of $\mathrm{E}[g(X(T))]$ and $\mathrm{Var}[g(X(T))]$ are computed using
the information from the last level only, in fact we compute them using a
multilevel estimator. We omit the details in the algorithm for the sake of simplicity.
For the case of $\mathrm{E}[g(\bar{X}(T))]$, we use the standard multilevel estimator, and, for the
case of $\mathrm{Var}[g(X(T))]$, we use the following telescoping decomposition:
\[
\mathrm{Var}\left[g(\bar{X}_l(T))\right] = \mathrm{Var}\left[g(\bar{X}_0(T))\right] + \sum_{\ell=1}^{l} \left( \mathrm{Var}\left[g(\bar{X}_\ell(T))\right] - \mathrm{Var}\left[g(\bar{X}_{\ell-1}(T))\right] \right),
\]
where $l > 1$ is a fixed level. Using the usual variance estimators for each level, we
obtain an unbiased multilevel estimator of the variance of $g(\bar{X})$. We refer to [21] for
details.

Remark 6.4.2 (Coupled paths exiting the lattice, $\mathbb{Z}_+^d$). Algorithm 14 could compute
four types of paths. It could happen that neither approximate process (the coarse one,
$\bar{X}_{\ell-1}$, or the fine one, $\bar{X}_\ell$) exits the lattice, which is the most common case. It could
also happen that one of the approximate processes exits the lattice. And, finally, both
approximate processes could exit the lattice. The first case is the most common one
and requires no further explanation. We now explain the case when one of the processes
exits the lattice. Suppose that the coarse one exits the lattice. In that case, until the
fine process reaches time $T$ or exits the lattice, we continue simulating the coupled process
by simulating only the fine path, using the single-level hybrid algorithm presented in
[2]. If the fine path reaches $T$, we have $1_{A_{\ell-1}} = 0$ and $1_{A_\ell} = 1$. Vice versa, if
the fine process exits and the coarse one reaches $T$, we have $1_{A_{\ell-1}} = 1$ and $1_{A_\ell} = 0$.

Remark 6.4.3 (Coupling with an exact path). Algorithm 18 uses a computational-cost-based
stopping criterion. That is, the algorithm stops refining the time mesh when
the estimated total computational cost of the multilevel estimator, $\hat{W}_{ML} := \sum_{\ell=0}^{l} \hat\psi_\ell M_\ell$,
at level $l$, is greater than the corresponding computational cost for level $l-1$, and
only when the condition $\hat{\mathcal{E}}_I < TOL - TOL^2$ is already satisfied. In that case, $L^* = l-1$.
The latter condition is required for obtaining a solution of the optimization problem
(6.27). In our numerical experiments, we observed that the computational cost of two
coupled hybrid paths, $\psi_\ell$, may be greater than the computational cost of "hybrid-exact"
coupled paths; that is, the computational cost of a hybrid path at level $l-1$ coupled with
an exact path at level $l$. That kind of path, used only at the last level, leads to the
following unbiased multilevel estimator:
\[
\begin{aligned}
\tilde{\mathcal{M}}_L :=\ & \frac{1}{M_0}\sum_{m=1}^{M_0} g_0 1_{A_0}(\omega_{m,0}) + \sum_{\ell=1}^{L-1} \frac{1}{M_\ell}\sum_{m=1}^{M_\ell} \left[g_\ell 1_{A_\ell} - g_{\ell-1} 1_{A_{\ell-1}}\right](\omega_{m,\ell}) \\
&+ \frac{1}{M_L}\sum_{m=1}^{M_L} \left[g(X(T)) - g_{L-1} 1_{A_{L-1}}\right](\omega_{m,L}).
\end{aligned}
\]
Therefore, it is possible to add another stopping criterion to Algorithm 18, related
to the comparison between the estimated computational cost of two coupled hybrid
paths and the computational cost of hybrid-exact coupled paths. Please note that the
condition $\delta_L\, \mathcal{A}(N_{TL,L}; \cdot)\, \mathcal{A}(g_L; \cdot) \le TOL^2$ trivially holds, because $\mathcal{A}(N_{TL,L}; \cdot)$ is zero in
such a case. In our numerical examples, there are no significant computational gains
in the estimation phase from using that stopping rule and its corresponding estimator.
This alternative unbiased hybrid estimator is inspired by the work of Anderson and
Higham [1].

6.4.3 Phase III

From Phase II, we know that, to compute our multilevel Monte Carlo estimator,
$\mathcal{M}_L$, for a given tolerance, we have to run $M_0^*$ single hybrid paths with parameters
$(\Delta t_0, \delta_0)$ and $M_\ell^*$ coupled hybrid paths with parameters $(\Delta t_{\ell-1}, \delta_{\ell-1})$ and $(\Delta t_\ell, \delta_\ell)$,
for $\ell = 1, 2, \ldots, L^*$. However, we follow a slightly different strategy: we run half of
the required simulations and use them to update our estimates of the sequences
$(\mathcal{E}_{I,\ell})_{\ell=0}^{L^*}$, $(V_\ell)_{\ell=0}^{L^*}$, and $(\psi_\ell)_{\ell=0}^{L^*}$. Then, we solve the problem (6.26) again and recalculate
the values of $M_\ell^*$ for all $\ell$. We proceed iteratively until convergence. In
this way, we take advantage of the information generated by the newly simulated paths
and update the estimates of the sequences of weak errors, computational costs, and
variances, obtaining more control over the total work of the method.
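Schematically (our sketch; the three callbacks are placeholders for the path simulation, the error/cost/variance re-estimation, and the re-solution of (6.26)), Phase III can be organized as:

```python
def phase_iii(M_star, simulate_level, update_estimates, resolve_problem):
    """Run half of the prescribed paths per level, update the (E_I, V, psi)
    estimates with the new samples, re-solve (6.26), and iterate until the
    prescribed sample sizes stabilize."""
    while True:
        for l, M in enumerate(M_star):
            simulate_level(l, max(1, M // 2))
        update_estimates()
        M_new = resolve_problem()
        if all(abs(a - b) <= 1 for a, b in zip(M_new, M_star)):
            return M_new
        M_star = M_new
```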

6.5 Numerical Examples

In this section, we present two examples to illustrate the performance of our proposed
method, and we compare the results with the single-level approach given in [2]. For
benchmarking purposes, we use Gillespie's Stochastic Simulation Algorithm (SSA)
instead of the Modified Next Reaction Method (MNRM), because the former is widely
used in the literature.

6.5.1 A Simple Decay Model

The classical radioactive decay model provides a simple and important example for
the application of our method. This model has only one species and one first-order
reaction,
\[
X \xrightarrow{\ c\ } \emptyset. \tag{6.31}
\]
Its stoichiometric matrix, $\nu \in \mathbb{R}$, and its propensity function, $a: \mathbb{Z}_+ \to \mathbb{R}$, are given by
\[
\nu = -1 \quad \text{and} \quad a(X) = cX.
\]
Here, we choose $c = 1$, and define $g(x) = x$ as the scalar observable. In this particularly
simple example, we have $\mathrm{E}[g(X(T)) \mid X(t) = X_0] = X_0 \exp(-c(T-t))$.
Consider the initial condition $X_0 = 10^5$ and the final time $T = 0.5$. In this case, the
process starts relatively far from the boundary, i.e., it is a tau-leap-dominated setting.
We now analyze an ensemble of five independent runs of the calibration algorithm
(Algorithm 18), using different relative tolerances. In Figure 6.1, we show, in the
left panel, the total predicted work (runtime) for the single-level hybrid method, for
the multilevel hybrid method, and for the SSA method, versus the estimated error
bound. The multilevel method is preferred over the SSA and the single-level hybrid
method for all tolerances. We also show the estimated asymptotic work of the
multilevel method. In the right panel, we show, for different tolerances, the actual
work (runtime), using a 20-core Intel GLNXA64 architecture and MATLAB version
R2014a.

In Table 6.1, we summarize an ensemble run of the calibration algorithm, where
$W_{ML}$ is the average actual computational work of the multilevel estimator (the sum
of all the seconds taken to compute the estimation) and $W_{SSA}$ is the corresponding
average actual work of the SSA. We compare those values with the corresponding
estimates, $\hat{W}_{ML}$ and $\hat{W}_{SSA}$.

[Figure: log-log plots of estimated error bound versus predicted work (left) and actual work (right) for the decay model; curves for SSA, single-level hybrid, multilevel hybrid, a slope-1/2 reference, and the asymptotic work estimate.]
Figure 6.1: Left: Predicted work (runtime) versus the estimated error bound for
the simple decay model (6.31), with 95% confidence intervals. The multilevel hybrid
method is preferred over the SSA and the single-level method for all tolerances.
Right: Actual computational work (runtime) versus the estimated error bound. Notice
that the computational complexity has order $\mathcal{O}(TOL^{-2})$.
In Figure 6.2, we can observe how the estimated weak error, $\hat{\mathcal{E}}_{I,\ell}$, and the estimated
variance of the difference of the functional between two consecutive levels, $\hat{V}_\ell$, decrease
linearly as we refine the time mesh. This corresponds to the pure tau-leap case, since
the process, $X$, remains far from the boundary in $[0, T]$. As expected, the linear
relationship for the variance starts at level 1. The estimated total path work, $\hat\psi_\ell$,
increases as we refine the mesh. Observe that it increases more slowly than linearly.
This is because the work needed for generating Poisson random variables decreases
as we refine the time mesh. In the lower right panel, we show the total computational
work, only in the cases in which $\hat{\mathcal{E}}_{I,\ell} < TOL - TOL^2$.

In Figure 6.4, we show the main outputs of Algorithm 18, $\delta_\ell$ and $M_\ell$ for $\ell =
0, \ldots, L^*$, for the smallest considered tolerance. In this case, $L^*$ is 12. We observe that
the number of realizations decreases more slowly than linearly, from levels 1 to $L^*-1$, until
it reaches $M_{L^*} = 1$.
TOL      | L* (avg) | Min | Max | Ŵ_ML/Ŵ_SSA | Min  | Max  | W_ML/W_SSA | Min  | Max
3.13e-03 | 5        | 5   | 5   | 0.03       | 0.02 | 0.04 | 0.03       | 0.02 | 0.05
1.56e-03 | 6        | 6   | 6   | 0.04       | 0.02 | 0.10 | 0.04       | 0.02 | 0.13
7.81e-04 | 8        | 8   | 8   | 0.03       | 0.02 | 0.05 | 0.03       | 0.02 | 0.06
3.91e-04 | 9.2      | 9   | 10  | 0.02       | 0.02 | 0.03 | 0.02       | 0.01 | 0.03
1.95e-04 | 11       | 11  | 11  | 0.02       | 0.02 | 0.03 | 0.02       | 0.02 | 0.04
9.77e-05 | 12       | 12  | 12  | 0.03       | 0.02 | 0.03 | 0.03       | 0.02 | 0.03

Table 6.1: Details of the ensemble run of Algorithm 18 for the simple decay model
(6.31). As an example, the second row of the table indicates that, for a tolerance
$TOL = 1.56 \cdot 10^{-3}$, six levels are needed. The predicted work of the multilevel hybrid
method is, on average, 4% of the predicted work of the SSA method, which coincides
with the actual work. Observed minimum and maximum values in the ensemble are
also provided.

In the left panel of Figure 6.5, we show the performance of formula (6.8), implemented
in Algorithm 21 and used to estimate the strong error, $V_\ell$, defined in Section
6.3.2. The quotient of $\hat{V}_\ell$ over a standard Monte Carlo estimate of $V_\ell$ is almost 1 for
the first ten levels. At levels 11 and 12, we obtain 0.99 and 0.91, respectively.
[Figure: four log-log panels for the decay model at TOL = 9.77e-05: weak error vs. h, variance vs. h, path work vs. h, and predicted total work vs. L, with linear reference lines.]
Figure 6.2: Upper left: estimated weak error, $\hat{\mathcal{E}}_{I,\ell}$, as a function of the time mesh size,
$\Delta t$, for the simple decay model (6.31). Upper right: estimated variance of the difference
between two consecutive levels, $\hat{V}_\ell$, as a function of $\Delta t$. Lower left: estimated
path work, $\hat\psi_\ell$, as a function of $\Delta t$. Lower right: estimated total computational work,
$\sum_{l=0}^{L} \hat\psi_l M_l$, as a function of the level, $L$.
l=0

Both quantities are estimated using a coefficient of variation of less than 5%, but there is a
remarkable difference in terms of computational work in favor of our dual-weighted
estimator. In the right panel of the same figure, we show the estimated variance of
$\hat{V}_\ell$, computed by the dual-weighted estimation (6.8) and by direct sampling.
Observe that, in this case, the computational savings may be up to order $\mathcal{O}(10^5)$.

In the simulations, we observed that, as we refine $TOL$, the optimal number of
levels increases approximately logarithmically, which is a desirable feature. We fit the
model $L^* = a + b \log(TOL^{-1})$, obtaining $b = 2.11$ and $a = -7.3$.
[Figure: statistical error percentage vs. TOL (left) and sqrt of V̂_ℓ ψ̂_ℓ vs. level (right) for the decay model.]
Figure 6.3: Left: Percentage of the statistical error over the computational global
error, for the simple decay model (6.31). As mentioned in Section 6.4, it is well above
0.5 for all tolerances. Right: $\sqrt{\hat{V}_\ell\, \hat\psi_\ell}$ as a function of $\ell$, for the smallest tolerance;
it decreases as the level increases. Observe that the contribution of level 0 is less
than 50% of the sum of the other levels.

[Figure: δ_ℓ vs. level (left) and M_ℓ vs. level (right) for the smallest tolerance.]
Figure 6.4: One-step exit probability bound, $\delta_\ell$, and $M_\ell$ for $\ell = 0, 1, \ldots, L^*$, for the
smallest tolerance, for the simple decay model (6.31).

The QQ-plot in Figure 6.6 shows, for the smallest considered $TOL$, $10^3$ independent
realizations of the multilevel estimator, $\mathcal{M}_L$ (defined by (6.5)). Those $10^3$
points are generated using 5 sets of parameters given by independent runs of the
calibration algorithm (Algorithm 18). This plot, complemented with a Shapiro-Wilk
normality test, validates our assumption about the Gaussian distribution of the statistical
error. Observe that the estimates are concentrated around the theoretical
value, $X_0 \exp(-c(T-t)) = 10^5 \exp(-0.5) \approx 6.0653\mathrm{e}{+}04$.
[Figure: dual-based vs. empirical variance estimates (left) and their estimated variances (right) for the decay model.]
Figure 6.5: Left: performance of the formula (6.8) as a strong error estimate, for the
simple decay model (6.31). Here, $h = \Delta t$. Right: estimated variance of $\hat{V}_\ell$ with 95%
confidence intervals.
[Figure: QQ-plot of the MLMC estimates (left) and TOL vs. actual error (right) for the decay model.]
Figure 6.6: Left: QQ-plot for the hybrid Chernoff MLMC estimates, $\mathcal{M}_L$, in the
simple decay model (6.31). A Shapiro-Wilk normality test gives a p-value of 0.0105.
Right: $TOL$ versus the actual computational error. The numbers above the straight
line show the percentage of runs that had errors larger than the required tolerance.
We observe that, in all cases except the smallest tolerance, the computational error
follows the imposed tolerance with the expected confidence of 95%.

In the same figure, we also show $TOL$ versus the actual computational error. It
can be seen that the prescribed tolerance is achieved with the required confidence of
95% for all tolerances.
6.5.2 Gene Transcription and Translation [1]

This model has five reactions,
\[
\emptyset \xrightarrow{\ c_1\ } R, \quad R \xrightarrow{\ c_2\ } R+P, \quad 2P \xrightarrow{\ c_3\ } D, \quad R \xrightarrow{\ c_4\ } \emptyset, \quad P \xrightarrow{\ c_5\ } \emptyset, \tag{6.32}
\]
described, respectively, by the stoichiometric matrix and the propensity function
\[
\nu = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix}
\quad \text{and} \quad
a(X) = \begin{pmatrix} c_1 \\ c_2 R \\ c_3 P(P-1) \\ c_4 R \\ c_5 P \end{pmatrix},
\]
where $X(t) = (R(t), P(t), D(t))$, and $c_1 = 25$, $c_2 = 10^3$, $c_3 = 0.001$, $c_4 = 0.1$, and $c_5 = 1$.
In the simulations, the initial condition is $(0, 0, 0)$ and the final time is $T = 1$. The
observable is given by $g(X) = D$. We observe that the abundance of the mRNA
species, represented by $R$, is close to zero for $t \in [0, T]$. However, as we pointed out in
[2], the reduced abundance of one of the species is not enough to ensure that the SSA
method should be used.
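For readers who want to experiment with this model, a plain (non-hybrid, non-Chernoff) tau-leap simulation of (6.32) can be sketched in a few lines of Python (ours; it ignores the exit-probability control that the thesis methods provide, crudely clipping negative populations instead):

```python
import numpy as np

# Model (6.32): X = (R, P, D); rows of nu are the five reaction channels.
nu = np.array([[ 1,  0, 0],    # birth of R
               [ 0,  1, 0],    # translation R -> R + P
               [ 0, -2, 1],    # dimerization 2P -> D
               [-1,  0, 0],    # degradation of R
               [ 0, -1, 0]])   # degradation of P
c = np.array([25.0, 1000.0, 0.001, 0.1, 1.0])

def propensities(x):
    R, P, _ = x
    return np.array([c[0], c[1] * R, c[2] * P * (P - 1), c[3] * R, c[4] * P])

def plain_tau_leap(x0, T, dt, rng):
    x, t = np.array(x0, float), 0.0
    while t < T:
        h = min(dt, T - t)
        k = rng.poisson(propensities(x) * h)  # one Poisson count per channel
        x = np.maximum(x + k @ nu, 0.0)       # crude guard against exits
        t += h
    return x

rng = np.random.default_rng(3)
print(plain_tau_leap((0, 0, 0), T=1.0, dt=1e-3, rng=rng))  # (R, P, D) at T
```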
We now analyze an ensemble of five independent runs of the calibration algorithm
(Algorithm 18), using different relative tolerances. In Figure 6.7, we show, in the left
panel, the total predicted work (runtime) for the single-level hybrid method, for the
multilevel hybrid method, and for the SSA method, versus the estimated error bound.
We also show the estimated asymptotic work of the multilevel method. Again, the
multilevel hybrid method outperforms the others, and we remark that the observed
computational work of the multilevel method is of order $\mathcal{O}(TOL^{-2})$.
[Figure: log-log plots of estimated error bound versus predicted work (left) and actual work (right) for the genes model; curves for SSA, single-level hybrid, multilevel hybrid, a slope-1/2 reference, and the asymptotic work estimate.]
Figure 6.7: Left: Predicted work (runtime) versus the estimated error bound for the
gene transcription and translation model (6.32). The hybrid method is preferred over
the SSA for the first three tolerances only. The multilevel hybrid method is preferred
over the SSA and the single-level method for all tolerances. Right: Actual work
(runtime) versus the estimated error bound.

In Figure 6.8, we can observe how the estimated weak error decreases linearly
for the coarser time meshes but, as we continue refining the time mesh, quickly
decreases towards zero. The estimated variance, $\hat{V}_\ell$, decreases faster
than linearly, and it also quickly decreases towards zero afterwards. This is a consequence
of the transition from a hybrid regime to a purely exact one. The estimated
total path work, $\hat\psi_\ell$, increases sublinearly as we refine the mesh. Note that $\hat\psi_\ell$ reaches
a maximum, which corresponds to an SSA-dominant regime. In the lower right panel,
we show the total computational work, only in the cases in which $\hat{\mathcal{E}}_{I,\ell} < TOL - TOL^2$.

In Figure 6.10, we show the main outputs of Algorithm 18, $\delta_\ell$ and $M_\ell$ for $\ell =
0, \ldots, L^*$, for the smallest tolerance. We observe that the number of realizations decreases
more slowly than linearly from levels 1 to 12.

In Figure 6.11, we see that our dual-weighted estimator of the strong error, $V_\ell$,
gives essentially the same results as the standard Monte Carlo estimator, but with
much less computational work. In this case, an accurate empirical estimate of $V_7$
took almost 48 hours, whereas the dual-based computation of $\hat{V}_7$ took just a few minutes.
[Figure: four log-log panels for the genes model at TOL = 3.13e-03: weak error vs. h, variance vs. h, path work vs. h, and predicted total work vs. L, with linear reference lines.]
Figure 6.8: Upper left: estimated weak error, $\hat{\mathcal{E}}_{I,\ell}$, as a function of the time mesh size,
$\Delta t$, for the gene transcription and translation model (6.32). Upper right: estimated
variance of the difference between two consecutive levels, $\hat{V}_\ell$, as a function of $\Delta t$.
Lower left: estimated path work, $\hat\psi_\ell$, as a function of $\Delta t$. Lower right: estimated
total computational work, $\sum_{l=0}^{L} \hat\psi_l M_l$, as a function of the level, $L$.
TOL      | L* (avg) | Min | Max | Ŵ_ML/Ŵ_SSA | Min  | Max  | W_ML/W_SSA | Min  | Max
1.00e-01 | 3        | 3   | 3   | 0.04       | 0.04 | 0.04 | 0.06       | 0.05 | 0.07
5.00e-02 | 4.6      | 4   | 5   | 0.04       | 0.03 | 0.04 | 0.05       | 0.05 | 0.05
2.50e-02 | 6        | 6   | 6   | 0.03       | 0.03 | 0.04 | 0.05       | 0.04 | 0.05
1.25e-02 | 8        | 8   | 8   | 0.03       | 0.03 | 0.03 | 0.05       | 0.05 | 0.06
6.25e-03 | 10       | 10  | 10  | 0.03       | 0.03 | 0.03 | 0.05       | 0.04 | 0.05
3.13e-03 | 11.4     | 11  | 13  | 0.03       | 0.03 | 0.03 | 0.05       | 0.04 | 0.05

Table 6.2: Details of the ensemble run of Algorithm 18 for the gene transcription
and translation model (6.32).
[Figure: statistical error percentage vs. TOL (left) and sqrt of V̂_ℓ ψ̂_ℓ vs. level (right) for the genes model.]
Figure 6.9: Left: Percentage of the statistical error over the computational global
error, for the gene transcription and translation model (6.32). As mentioned in Section
6.4, it is well above 0.5 for all tolerances. Right: $\sqrt{\hat{V}_\ell\, \hat\psi_\ell}$ as a function of $\ell$,
for the smallest tolerance; it decreases as the level increases. Observe that the
contribution of level 0 is almost equal to the sum of the other levels.

[Figure: δ_ℓ vs. level (left) and M_ℓ vs. level (right) for the smallest tolerance, genes model.]
Figure 6.10: The one-step exit probability bound, $\delta_\ell$, and $M_\ell$ for $\ell = 0, 1, \ldots, L^*$, for
the smallest tolerance in the gene transcription and translation model (6.32).

In the simulations, we observed that, as we refine $TOL$, the optimal number of
levels increases approximately logarithmically, which is a desirable feature. We fit the
model $L^* = a + b \log(TOL^{-1})$, obtaining $b = 2.48$ and $a = -2.85$.

The QQ-plot in Figure 6.12, computed in the same way as in the previous
example, together with a Shapiro-Wilk normality test, shows the validity of the Gaussian
assumption for the statistical errors. In the same figure, we also show $TOL$ versus
the actual global computational error.
[Figure: dual-based vs. empirical variance estimates (left) and their estimated variances (right) for the genes model.]
Figure 6.11: Left: performance of formula (6.8) as a strong error estimate for the gene
transcription and translation model (6.32). Here, $h = \Delta t$. Right: estimated variance
of $\hat{V}_\ell$ with 95% confidence intervals.

[Figure: QQ-plot of the MLMC estimates (left) and TOL vs. actual error (right) for the genes model.]
Figure 6.12: Left: QQ-plot based on $\mathcal{M}_L$ estimates for the gene transcription and
translation model (6.32). A Shapiro-Wilk normality test gives a p-value of 0.6.
Right: $TOL$ versus the actual global computational error. The numbers above the
straight line show the percentage of runs that had errors larger than the required
tolerance. We observe that, in all cases (except the second, by a very small margin),
the computational error follows the imposed tolerance with the expected confidence
of 95%.

It can be seen that the prescribed tolerance is achieved, except for the second
smallest tolerance, with the required confidence of 95%, since $C_A = 1.96$.
MLMC Hybrid-Path Analysis

We now analyze an ensemble of $10^3$ independent runs of the multilevel estimator,
$\mathcal{M}_L$, for $TOL = 1.25\mathrm{e}{-}2$. In this case, $L^* = 8$. In Figures 6.13, 6.14 and 6.15, we
show boxplots corresponding to that ensemble. In each one, we indicate the coupling
pair (on the x-axis) and the value of $\delta_\ell$ (below the title of the plot). In each figure,
the first boxplot, starting from the left, corresponds to single-level hybrid simulations
at the coarsest level, $\ell = 0$, with a time mesh of size $\Delta t_0$, and with a bound for
the one-step exit probability, $\delta_0 = 1\mathrm{e}{-}5$. Next, we show the boxplots corresponding to
coupled hybrid paths, at levels $\ell = 0$ and $\ell = 1$, generated using time meshes of size $\Delta t_0$
and $\Delta t_1$, respectively, and exit probability bounds, $\delta_0$ and $\delta_1$, respectively. This is
indicated under the boxplots by the symbols 1C and 1F, which stand for 'Coarse'
and 'Fine' levels in the first coupling, respectively. We proceed in the same fashion
until the final level, $L^*$. At this point, it is crucial to observe that the probability law
for the samples in the boxplots indicated by kF and (k+1)C should be the same for
any $k \in \{0, 1, \ldots, L^*-1\}$ (in the single-level case, we interpret the symbol 0 as 0F).
This is because both samples are generated using the same time mesh of size $\Delta t_k$ and
the same one-step exit probability bound, $\delta_k$; see Remark 6.2.1.
Figure 6.13 shows the total proportion of Chernoff tau-leap steps over the total
number of tau-leap steps. Here, we understand that a Chernoff tau-leap step is taken
when the size of $\tau_{Ch}$ (see Section 6.1.5) is strictly smaller than the distance from the
current time to the next mesh point and, therefore, the Chernoff bound is acting as
an actual constraint on the size of the tau-leap step. We can see how the Chernoff
steps are present in the first levels but not in the final ones, where exact steps are
preferred according to our computational work criterion. We observe a small increase
in the proportion of the number of Chernoff steps from levels 1F/2C to levels 3F/4C
(strictly speaking, a shift in the median and the third quartile). This is due to
consecutive refinements of the values of $\delta$, from $1\mathrm{e}{-}5$ to $1\mathrm{e}{-}7$, producing smaller and
smaller values of $\tau_{Ch}$. This is also because the Chernoff step size is, at some time
points, still smaller than the grid size, and because the cost of reaching the time
horizon using Chernoff steps is still preferred over the cost of using exact steps. The
abundance of outliers at all levels up to $\ell = 3$ indicates that the Chernoff bound is
actively controlling the size of the tau-leap steps.
Figures 6.14 and 6.15 show the total count of tau-leap and exact steps, respectively.
These plots are intended as diagnostic plots with two main objectives: i) checking
the telescoping sum property as stated in Remark 6.2.1, and ii) understanding the
'blending' phenomenon in our simulated hybrid paths, that is, the presence of both
methods, tau-leap and exact. It could be useful to think in terms of the domain of
each method: given a time mesh, Δt, and a value of the one-step exit probability
bound, δ, we could decompose the interval [0, T] into two domains, I_TL and I_MNRM,
for the tau-leap and exact methods, respectively. The domain I_MNRM should be
monotonically increasing with refinements of the time mesh and δ, since, when the
size of the time mesh, Δt_ℓ, or δ_ℓ goes to zero, the expected number of tau-leap steps
also goes to zero; see [2], Appendix A. As a consequence, we expect the total count
of exact steps to be a monotonically increasing function of the level, ℓ. On the other
hand, the domain I_TL decreases, but, since the size of the time mesh halves when
passing from one level to the next, we also expect to see an increasing number
of tau-leap steps, at least for levels that are not very deep. The blending effect of the
hybrid decision rules in Algorithm 14 is depicted in Figure 6.16, where the proportion
of tau-leap steps over the total number of steps is shown for levels ℓ ∈ {0, 5, 8}. In
the left panel, we can see that the number of tau-leap steps dominates except close to
the origin, where the coarse time mesh is finer. Remember that, in our methodology,
the initial mesh can be nonuniform. We then see how the domain I_MNRM increases
until it occupies almost 80% of the time interval [0, T].

Remark 6.5.1. The savings in computational work when generating Poisson random
Figure 6.13: Proportion of the number of Chernoff tau-leap steps over the total
number of tau-leap steps for the gene transcription and translation model (6.32). On
the x-axis, we show the corresponding level (starting from level 0) and, subsequently,
the coarse (C) and fine (F) level. Below the title, we show the corresponding δ_ℓ of
each level. We observe a small increase in the proportion of the number of Chernoff
steps from levels 1F/2C to levels 3F/4C (strictly speaking, a shift in the median and
the third quartile). This is due to consecutive refinements in the values of δ, from
1e−5 to 1e−7, producing smaller and smaller values of τ_Ch.

variables heavily depend on MATLAB’s performance capabilities. For example, we


do not generate the random variates in batches, as in [1], and that could have an
impact on the results. In fact, we should expect better results from our method if we
implement our algorithms in more performance-oriented languages or if we sample
Poisson random variables in batches.
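As a minimal illustration of this point, the following MATLAB sketch compares drawing Poisson variates one at a time against a single vectorized (batched) call to poissrnd; the rate and sample size are arbitrary.

% One-at-a-time versus batched Poisson sampling (illustrative values only).
lambda = 50; M = 1e5;
tic;
x1 = zeros(M, 1);
for m = 1:M
    x1(m) = poissrnd(lambda);   % one variate per call
end
t_loop = toc;
tic;
x2 = poissrnd(lambda, M, 1);    % one batched call
t_batch = toc;
fprintf('loop: %.3f s, batch: %.3f s\n', t_loop, t_batch);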

Remark 6.5.2. (Level 0 time mesh) In this example, we use an adaptive mesh at
level 0. This is because this example is mildly stiff. Using a uniform time mesh at
level 0 would impose a small time step size for all times, which is not needed.
Moreover, this issue would propagate to the finer levels. In all our numerical examples,

Figure 6.14: Total number of tau-leap steps per path for the gene transcription and
translation model (6.32). On the x-axis, we show the corresponding pairings of two
consecutive levels (starting from level 0) and, subsequently, the coarse (C) and fine
(F) meshes for two consecutive levels. Below the title, we show the corresponding
δ_ℓ of each level. The domain I_TL of the tau-leap method decreases with refinements,
but, since the size of the time mesh halves when passing from one level to the next,
we see an increasing number of tau-leap steps until, at a certain level, there are no
more tau-leap steps due to the relative computational cost of the tau-leap method.

at level 0, we use the coarsest possible time mesh such that the Forward Euler method
is numerically stable.

6.6 Conclusions

In this work, we developed a multilevel Monte Carlo version of the single-level hybrid
Chernoff tau-leap algorithm presented in [2]. We showed that the computational
complexity of this method is of order O(TOL⁻²) and, therefore, that it can be seen
as a variance reduction of the SSA method, which has the same complexity. This
represents an important advantage of the hybrid tau-leap with respect to the pure

Figure 6.15: Total number of exact steps per path for the gene transcription and
translation model (6.32). On the x-axis, we show the corresponding pairings of two
consecutive levels (starting from level 0) and, subsequently, the coarse (C) and fine
(F) meshes for two consecutive levels. Below the title, we show the corresponding δ_ℓ
of each level. The domain I_MNRM of the exact method is monotonically increasing
with refinements of the time mesh and the one-step exit probability bound. As a
consequence, we expect the total count of exact steps to be a monotonically increasing
function of the level, ℓ.

Figure 6.16: This figure depicts the 'blending' effect produced by our hybrid path-
simulation algorithm, for levels ℓ ∈ {0, 5, 8}. Here, we can see the proportion of
tau-leap steps adaptively taken based on expected work optimization. We see how the
presence of the tau-leap method decreases when we move to the deepest levels. We
observe that, for the chosen tolerance, coupling with an exact path at the last level
is not optimal.
tau-leap in the multilevel context. In our numerical examples, we obtained substantial
gains with respect to both the SSA and the single-level hybrid Chernoff tau-leap. The
present approach, like the one in [2], also provides an approximation of E[g(X(T))]
with prescribed accuracy and confidence level, with nearly optimal computational
work. To reach this optimality, we derived novel formulas, based on dual-weighted
residual estimations, for computing the variance of the difference of the observables
between two consecutive levels in coupled hybrid paths, and also the bias of the deepest
level (see (6.7) and (6.8)). These formulas are particularly relevant in the present
context of Stochastic Reaction Networks because alternative standard sample
estimators become too costly at deep levels due to the presence of large kurtosis.
Future extensions may involve better hybridization techniques as well as implicit
and higher-order versions of the hybrid MLMC.

Acknowledgments

The authors would like to thank two anonymous reviewers for their constructive
comments that helped us improve our manuscript. We would also like to thank
Prof. Mike Giles for very enlightening discussions. The authors are members of
the KAUST SRI Center for Uncertainty Quantification in the Computer, Electrical
and Mathematical Sciences and Engineering Division at King Abdullah University of
Science and Technology (KAUST). This work was supported by KAUST.

Appendix

Algorithm 14 Coupled hybrid path. Inputs: the initial state, X(0), the final time,
T, the propensity functions, a = (a_j)_{j=1}^J, the stoichiometric vectors, ν = (ν_j)_{j=1}^J,
two one-step exit probability bounds, one for the coarse level, δ̄, and another for the fine
level, δ̿, and two time meshes, one coarse, (t_k)_{k=0}^K, such that t_K = T, and a finer one,
(s_l)_{l=0}^{K'}, such that s_0 = t_0, s_{K'} = t_K, and (t_k)_{k=0}^K ⊂ (s_l)_{l=0}^{K'}. Outputs: a sequence of
states evaluated at the coarse grid, (X̄(t_k))_{k=0}^K ⊂ Z_+^d, such that t_K ≤ T, and a sequence
of states evaluated at the fine grid, (X̿(s_l))_{l=0}^{K'} ⊂ Z_+^d, such that X̄(t_K) ∈ Z_+^d or
X̿(s_{K'}) ∈ Z_+^d. If t_K < T, both paths exited the Z_+^d lattice before the final time, T.
It also returns the number of times the tau-leap method was successfully applied at the
fine level and at the coarse level, and the number of exact steps at the fine level and at
the coarse level. For the sake of simplicity, we omit the sentences involving the recording
of X̄(t_k) and X̿(s_l) from the current state variables X̄ and X̿, respectively, the counting
of the number of steps, and the return sentence.
1: t ← 0, X̄ ← X(0), X̿ ← X(0)
2: t̄ ← next grid point in the coarse grid larger than t
3: (H̄, m̄, ā) ← Algorithm 15 with (X̄, t, t̄, T, δ̄, a)
4: t̿ ← next grid point in the fine grid larger than t
5: (H̿, m̿, a̿) ← Algorithm 15 with (X̿, t, t̿, T, δ̿, a)
6: while t < T do
7:   H ← min{H̄, H̿}
8:   if m̄ = TL and m̿ = TL then
9:     S ← Algorithm 17 with (ā, a̿)
10:    Λ ← P(S·(H−t)) (generate Poisson random variates)
11:    X̿ ← X̿ + (Λ_1 + Λ_3) ν
12:    X̄ ← X̄ + (Λ_1 + Λ_2) ν
13:    t ← H
14:  else
15:    Initialize internal clocks R, P if needed (see Algorithm 12)
16:    while t < H do
17:      if m̿ = MNRM then
18:        a̿ ← a(X̿)
19:      end if
20:      if m̄ = MNRM then
21:        ā ← a(X̄)
22:      end if
23:      S ← Algorithm 17 with (ā, a̿)
24:      (t, X̄, X̿, R, P) ← Algorithm 16 with (t, H, X̄, X̿, R, P, S)
25:    end while
26:  end if
27:  if t < T then
28:    if H = H̄ then
29:      t̄ ← next grid point in the coarse grid larger than t
30:      (H̄, m̄, ā) ← Algorithm 15 with (X̄, t, t̄, T, δ̄, a)
31:    end if
32:    if H = H̿ then
33:      t̿ ← next grid point in the fine grid larger than t
34:      (H̿, m̿, a̿) ← Algorithm 15 with (X̿, t, t̿, T, δ̿, a)
35:    end if
36:  end if
37: end while

Algorithm 15 Compute the next time horizon. Inputs: the current state, X̃, the current
time, t, the next grid point, t̃, the final time, T, the one-step exit probability bound, δ̃,
and the propensity functions, a = (a_j)_{j=1}^J. Outputs: the next horizon, H, the selected
method, m, and the current propensity values, ã.
1: ã ← a(X̃)
2: (m, τ̃) ← Algorithm 13 with (X̃, t, ã, δ̃, t̃)
3: if m = TL then
4:   H ← min{t̃, t+τ̃, T}
5: else
6:   H ← min{t+τ̃, T}
7: end if
8: return (H, m, ã)

Algorithm 16 Auxiliary function used in Algorithm 14. Inputs: the current time,
t, the current time horizon, H, the current system state at the coarser level, X̄, and at
the finer level, X̿, the internal clocks R_i, P_i, i=1, 2, 3, and the values S_i, i=1, 2, 3 (see
Section 6.1.2 for more information on these values). Outputs: updated time, t, updated
system states, X̄, X̿, and updated internal clocks R_i, P_i, i=1, 2, 3.
1: Δt_i ← (P_i − R_i)/S_i, for i=1, 2, 3
2: Δ ← min_i {Δt_i}
3: μ ← argmin_i {Δt_i}
4: if t + Δ > H then
5:   R ← R + S·(H−t)
6:   t ← H
7: else
8:   update X̄ and X̿
9:   R ← R + S·Δ
10:  r ← uniform(0, 1)
11:  P_μ ← P_μ + log(1/r)
12:  t ← t + Δ
13: end if
14: return (t, X̄, X̿, R, P)

Algorithm 17 Auxiliary function used in Algorithm 14. Inputs: the propensity
values at the coarse and fine grid, ā, a̿. Output: S_i, i=1, 2, 3.
1: S_1 ← min(ā, a̿)
2: S_2 ← ā − S_1
3: S_3 ← a̿ − S_1
4: return S

Algorithm 18 Multilevel calibration and error estimation. Inputs: same as Algorithm
14, plus the observable, g, and the prescribed tolerance, TOL > 0. Outputs:
(M_ℓ)_{ℓ=0}^L, (δ_ℓ)_{ℓ=0}^L, ((t_{n,ℓ})_{n=0}^{N_ℓ})_{ℓ=0}^L, the estimated computational work of the multilevel
estimator, Ŵ_ML, and the estimated computational work of the SSA method, Ŵ_SSA.
We denote g_l ≡ g(X̄_l(T; ω̄)) and g_{l+1} − g_l ≡ g(X̄_{l+1}(T; ω̄)) − g(X̄_l(T; ω̄)). Here, C*
is the unitary cost of a pure SSA step, and c is the factor of refinement of δ (in our
experiments, c=10). See also Remark 6.4.1 regarding the estimators of Var[g(X(T))]
and E[g(X(T))], and Remark 6.4.3.
1: l ← 0, δ_l ← 0.01, Ŵ_ML^(a) ← +∞
2: Set initial meshes (t_k)_{k=0}^K and (s_l)_{l=0}^{K'}
3: fin-delta ← false
4: while not fin-delta do
5:   (ψ̂_0, S²(g_l; ·), A({g_l, E_I, N_SSA*, N_TL}; ·)) ← Algorithm 23
6:   if V̂_l (1 − δ_l A(N_TL; ·))² ≥ 2 S²(g_l; ·) δ_l A(N_TL; ·) and δ_l A(N_TL; ·) < 0.1 then
7:     fin-delta ← true
8:   else
9:     Refine δ_l by a factor of c
10:  end if
11: end while
12: δ_{l+1} ← δ_l
13: fin ← false
14: while not fin do
15:   fin-delta ← false
16:   while not fin-delta do
17:     (ψ̂_{l+1}, V̂_{l+1}, A({g_{l+1}, N_SSA*, E_I, N_TL,l+1}; ·), S²(g_{l+1}; ·)) ← Algorithm 19
18:     if V̂_{l+1} (1 − δ_{l+1} A(N_TL,l+1; ·))² ≥ 2 S²(g_{l+1}; ·) δ_{l+1} A(N_TL,l+1; ·) and
        δ_{l+1} A(N_TL,l+1; ·) < 0.1 then
19:       fin-delta ← true
20:       δ_l ← δ_{l+1}
21:     else
22:       Refine δ_{l+1} by a factor of c
23:     end if
24:   end while
25:   M_SSA ← C_A² S²(g_{l+1}; ·)/TOL²
26:   Ŵ_SSA ← C* M_SSA A(N_SSA*; ·)
27:   if Ê_I < TOL − TOL² then
28:     (M_ℓ)_{ℓ=0}^{l+1} ← Algorithm 22 with ((ψ̂_ℓ)_{ℓ=0}^{l+1}, (V̂_ℓ)_{ℓ=0}^{l+1}, TOL, Ê_I)
29:     Ŵ_ML ← Σ_{ℓ=0}^{l+1} ψ̂_ℓ M_ℓ
30:   else
31:     Ŵ_ML ← +∞
32:   end if
33:   if (Ŵ_ML > Ŵ_ML^(a) or Ê_I > TOL − TOL²) and A(N_TL,l+1; ·) > 0 then
34:     l ← l+1
35:     Ŵ_ML^(a) ← Ŵ_ML
36:     Refine meshes (t_k)_{k=0}^K and (s_l)_{l=0}^{K'}
37:   else
38:     fin ← ( δ_{l+1} A(N_TL,l+1; ·) A(g_{l+1}; ·) ≤ TOL² )
39:     if not fin then
40:       δ_{l+1} ← c^⌊log_c(TOL² / (A(g_{l+1}; ·)·A(N_TL,l+1; ·)))⌋
41:       while not fin do
42:         A({g_{l+1}, N_TL,l+1}; ·) ← Algorithm 19
43:         fin ← ( δ_{l+1} A(N_TL,l+1; ·) A(g_{l+1}; ·) ≤ TOL² )
44:         if not fin then
45:           Refine δ_{l+1} by a factor of c
46:         end if
47:       end while
48:     end if
49:     if A(N_TL,l+1; ·) = 0 then
50:       l ← l+1
51:       (M_ℓ)_{ℓ=0}^l ← Algorithm 22 with ((ψ̂_ℓ)_{ℓ=0}^l, (V̂_ℓ)_{ℓ=0}^l, TOL, 0)
52:       Ŵ_ML ← Σ_{ℓ=0}^l ψ̂_ℓ M_ℓ
53:     end if
54:   end if
55: end while

Algorithm 19 Auxiliary function for Algorithm 18. Inputs: same as Algorithm 14.
Outputs: the estimated runtime of the coupled path, ψ̂, an estimate of
Var[g(X̄(T)) − g(X̿(T))], V̂, an estimate of E[g(X(T))], A(g(X̿(T)); ·), an estimate
of the expected number of steps needed by the SSA method, A(N_SSA*; ·), an estimate
of E[E_I], A(E_I; ·), an estimate of the expected number of tau-leap steps taken at the
fine level, A(N_TL; ·), and an estimate of Var[g(X(T))], S²(g(X̿(T)); ·). Here, X̿(t)
refers to the process approximated using a finer grid than that of the approximated
process X̄(t). Moreover, (X̄(t), X̿(t)) are two coupled paths. Here, 1_TL(k) = 1 if and
only if the decision at time t_k was tau-leap. Set appropriate values for M_0 and CV_0.
For the sake of simplicity, we omit the arguments of the algorithms when there is no
risk of confusion. See also Remark 6.4.1 regarding the estimators of Var[g(X(T))] and
E[g(X(T))].
1: M ← M_0, cv ← 1, M_f ← 0
2: while cv > CV_0 do
3:   for m ← 1 to M do
4:     Generate two coupled paths, (X̄(s_l; ω̄_m))_{l=0}^{K'} and (X̿(s_l; ω̄_m))_{l=0}^{K'}, with Algorithm 14
5:     if the path does not exit Z_+^d then
6:       M_f ← M_f + 1
7:       (S_e(ω̄_m), S_v(ω̄_m)) ← Algorithm 21 with (X̄(t_k; ω̄_m))_{k=0}^K
8:       E_I(ω̄_m) ← Algorithm 20 with (X̿(s_l; ω̄_m))_{l=0}^{K'}
9:       Estimate N_SSA*(ω̄_m), using ∫_0^T a_0(X̿(s)) ds (see [2])
10:      C_Poi(ω̄_m) ← Σ_{j=1}^J Σ_{l=0}^{K'} C_P(a_j(X̄(s_l))(s_{l+1}−s_l)) 1_TL(l)
                      + Σ_{j=1}^J Σ_{l=0}^{K'} C_P(a_j(X̿(s_l))(s_{l+1}−s_l)) 1_TL(l)
11:      Compute N_K1^(c), N_K2^(c), N_TL^(c), and N_TL
12:    end if
13:  end for
14:  V̂ ← S²(S_e; M_f) + A(S_v; M_f)
15:  Compute the coefficients of variation, cv_V and cv_EI, of V̂ and A(E_I; ·), respectively
16:  cv ← max{cv_V, cv_EI}
17:  ψ̂ ← C_1 A(N_K1^(c); M_f) + C_2 A(N_K2^(c); M_f) + C_3 A(N_TL^(c); M_f) + A(C_Poi; M_f)
18:  M ← 2M
19: end while
20: return (ψ̂, V̂, A({g(X̿(T)), N_SSA*, E_I, N_TL}; M_f), S²(g(X̿(T)); M_f))

Algorithm 20 Compute the discretization error of a given approximated path. Inputs:
(X̄(t_k))_{k=0}^K. Here, 1_TL(k) = 1 if and only if the decision at time t_k was tau-leap,
and I_d is the d×d identity matrix. Output: E_I. Notes: x_k ≡ X̄(t_k).
1: E_I ← 0
2: Compute φ_K ← ∇g(x_K)
3: for k ← K−1 down to 1 do
4:   Δt_k ← t_{k+1} − t_k
5:   Compute J_a ← [∂_i a_j(x_k)]_{j,i}
6:   φ_k ← (I_d + Δt_k J_a^T ν^T) φ_{k+1}
7:   Δa_k ← a(x_{k+1}) − a(x_k)
8:   E_I ← E_I + (Δt_k/2) 1_TL(k) Δa_k (ν^T φ_k)
9: end for
10: return E_I

Algorithm 21 Compute S_e ≡ S_e(ω̄) and S_v ≡ S_v(ω̄), defined in (6.16). Inputs:
(X̄(t_k))_{k=0}^K and a positive constant c. Outputs: S_e and S_v. Notes: if a is a vector,
then diag(a) is a diagonal matrix with main diagonal a. Here, 1_TL(k) = 1 if and only
if the decision at time t_k was tau-leap, I_d is the d×d identity matrix, x_k ≡ X̄(t_k),
and Φ(x) is the cumulative distribution function of a standard Gaussian random variable.
1: S_e ← 0
2: S_v ← 0
3: Compute φ_K ← ∇g(x_K)
4: for k ← K−1 down to 1 do
5:   Δt_k ← t_{k+1} − t_k
6:   Compute J_a ← [∂_i a_j(x_k)]_{j,i}
7:   φ_k ← (I_d + Δt_k J_a^T ν^T) φ_{k+1}
8:   ν_φ ← ν^T φ_k
9:   ν_a ← (J_a ν)^T
10:  μ_j ← (Δt_k/2) Σ_i (∇a_j(x_k)·ν_i) a_i(x_k)
11:  μ̄_j ← (Δt_k/2) Σ_i |∇a_j(x_k)·ν_i| a_i(x_k)
12:  σ_j² ← (Δt_k/2) Σ_i (∇a_j(x_k)·ν_i)² a_i(x_k)
13:  S_e ← S_e + 1_TL(k) (Δt_k/2) μ·ν_φ
14:  aux1 ← ((Δt_k)³/8) (ν_a ν_φ)^T diag(a) (ν_a ν_φ)
15:  aux2 ← (Δt_k/2) Σ_j (φ_k·ν_j)² 1{Δt_k a_j(x_k) > c} ( μ_j (1 − Φ(−μ_j/σ_j))
         + (σ_j/√(2π)) exp(−(1/2)(μ_j/σ_j)²) )
16:  aux3 ← (Δt_k/2) Σ_j (φ_k·ν_j)² 1{Δt_k a_j(x_k) ≤ c} min{ μ̄_j, √(μ_j² + σ_j²) }
17:  S_v ← S_v + 1_TL(k) (aux1 + aux2 + aux3)
18: end for
19: return (S_e, S_v)

Algorithm 22 Solve the optimization problem (6.27) using a greedy scheme. Inputs:
the estimates of the coupled-path cost for all the levels, (ψ̂_ℓ)_{ℓ=0}^L, the estimate of the
variance of the quantity of interest at level 0, V̂_0, the estimates of the differences of
the quantity of interest for all the coupled levels, (V̂_ℓ)_{ℓ=1}^L, the prescribed tolerance,
TOL, and the weak error estimate for level L, E_I. Output: the number of realizations
needed for each level, (M_ℓ)_{ℓ=0}^L.
Define q_k := ( Σ_{ℓ=0}^{L−k} √(ψ̂_ℓ V̂_ℓ) ) / ( RHS − Σ_{ℓ=L−k+1}^{L} V̂_ℓ ).
1: RHS ← ((TOL − TOL² − E_I)/C_A)²
2: fin ← false
3: k ← 0
4: while not fin and k ≤ L do
5:   if ψ̂_{L−k} − q_k² V̂_{L−k} < 0 then
6:     fin ← true
7:     (M_ℓ)_{ℓ=0}^{L−k} ← q_k √(V̂_ℓ/ψ̂_ℓ)
8:   else
9:     M_{L−k} ← 1
10:    k ← k+1
11:  end if
12: end while
13: return (M_ℓ)_{ℓ=0}^L

Algorithm 23 Auxiliary function for Algorithm 18. Inputs: same as Algorithm 14.
Outputs: the estimated runtime of the hybrid path at level 0, ψ̂_0, an estimate of
Var[g(X̄(T))], S²(g(X̄(T)); ·), an estimate of E[g(X(T))], A(g(X̄(T)); ·), an estimate
of E[E_I], A(E_I; ·), an estimate of the expected number of steps needed by the SSA
method, A(N_SSA*; ·), and A(N_TL; ·). Here, 1_TL(k) = 1 if and only if the decision
at time t_k was tau-leap. Notes: the values C_1, C_2 and C_3 are described in Section
6.4. Set appropriate values for M_0 and CV_0. For the sake of simplicity, we omit the
arguments of the algorithms when there is no risk of confusion.
1: M ← M_0, cv ← 1, M_f ← 0
2: while cv > CV_0 do
3:   for m ← 1 to M do
4:     ((X̄(t_k))_{k=0}^K, N_TL, N_SSA,K1, N_SSA,K2) ← generate one hybrid path (see [2])
5:     if the path does not exit Z_+^d then
6:       M_f ← M_f + 1
7:       Compute g(X̄(T; ω̄_m))
8:       E_I ← Algorithm 20 with (X̄(t_k))_{k=0}^K
9:       (S_e(ω̄_m), S_v(ω̄_m)) ← Algorithm 21 with (X̄(t_k))_{k=0}^K
10:      Estimate N_SSA*(ω̄_m), using ∫_0^T a_0(X̄(s)) ds (see [2])
11:      C_Poi(ω̄_m) ← Σ_{j=1}^J Σ_{k=0}^K C_P(a_j(X̄(t_k))(t_{k+1}−t_k)) 1_TL(k)
12:    end if
13:  end for
14:  V̂ ← S²(S_e; M_f) + A(S_v; M_f)
15:  Estimate the coefficients of variation, cv_V, cv_g and cv_EI, of the estimators of
     Var[g(X̄(T)) − g(X̿(T))], Var[g(X̄(T))] and E[E_I], respectively
16:  cv ← max{cv_V, cv_g, cv_EI}
17:  ψ̂_0 ← C_1 A(N_SSA,K1; M_f) + C_2 A(N_SSA,K2; M_f) + C_3 A(N_TL; M_f) + A(C_Poi; M_f)
18:  M ← 2M
19: end while
20: return (ψ̂_0, S²(g(X̄(T)); M_f), A({g(X̄(T)), E_I, N_SSA*, N_TL}; M_f))
REFERENCES
[1] D. Anderson and D. Higham, “Multilevel Monte Carlo for continuous Markov
chains, with applications in biochemical kinetics,” SIAM Multiscale Model. Simul.,
vol. 10, no. 1, 2012.

[2] A. Moraes, R. Tempone, and P. Vilanova, “Hybrid Chernoff tau-leap,” Multiscale
Modeling & Simulation, vol. 12, no. 2, pp. 581–615, 2014.

[3] M. Giles, “Multi-level Monte Carlo path simulation,” Operations Research,
vol. 56, no. 3, pp. 607–617, 2008.

[4] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Conver-
gence (Wiley Series in Probability and Statistics), 2nd ed. Wiley-Interscience,
2005.

[5] D. T. Gillespie, “A general method for numerically simulating the stochastic


time evolution of coupled chemical reactions,” Journal of Computational Physics,
vol. 22, pp. 403–434, 1976.

[6] D. F. Anderson, “A modified next reaction method for simulating chemical sys-
tems with time dependent propensities and delays,” The Journal of Chemical
Physics, vol. 127, no. 21, 2007.

[7] D. T. Gillespie, “Approximate accelerated stochastic simulation of chemically


reacting systems,” Journal of Chemical Physics, vol. 115, pp. 1716–1733, Jul.
2001.

[8] J. P. Aparicio and H. Solari, “Population dynamics: Poisson approximation and
its relation to the Langevin process,” Physical Review Letters, vol. 86, no. 18,
pp. 4183–4186, Apr. 2001.

[9] T. Li, “Analysis of explicit tau-leaping schemes for simulating chemically reacting
systems,” Multiscale Modeling and Simulation, vol. 6, no. 2, pp. 417–436, 2007.
[10] S. Heinrich, “Multilevel Monte Carlo methods,” in Large-Scale Scientific Com-
puting, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg,
2001, vol. 2179, pp. 58–67.

[11] ——, “Monte Carlo complexity of global solution of integral equations,” Journal
of Complexity, vol. 14, no. 2, pp. 151–175, 1998.

[12] A. Speight, “A multilevel approach to control variates,” Journal of Computa-


tional Finance, vol. 12, pp. 1–25, 2009.

[13] D. Anderson, D. Higham, and Y. Sun, “Complexity of multilevel Monte Carlo


tau-leaping,” arXiv:1310.2676v1, 2013.

[14] J. Karlsson and R. Tempone, “Towards automatic global error control: Com-
putable weak error expansion for the tau-leap method,” Monte Carlo Methods
and Applications, vol. 17, no. 3, pp. 233–278, March 2011.

[15] N. Collier, A.-L. Haji-Ali, F. Nobile, E. von Schwerin, and R. Tempone, “A


continuation multilevel Monte Carlo algorithm,” Mathematics Institute of Com-
putational Science and Engineering, Technical report Nr. 10.2014, EPFL, 2014.

[16] S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality (com-
plete samples),” Biometrika, vol. 52, no. 3/4, pp. 591–611, Dec. 1965.

[17] M. A. Gibson and J. Bruck, “Efficient exact stochastic simulation of chemical


systems with many species and many channels,” The Journal of Physical Chem-
istry A, vol. 104, no. 9, pp. 1876–1889, 2000.

[18] J. Karlsson, M. Katsoulakis, A. Szepessy, and R. Tempone, “Automatic Weak


Global Error Control for the Tau-Leap Method,” preprint arXiv:1004.2948v3,
pp. 1–22, 2010.

[19] T. G. Kurtz, “Representation and approximation of counting processes,” in
Advances in Filtering and Optimal Stochastic Control, Lecture Notes in Control
and Information Sciences, vol. 42, Springer, 1982.

[20] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming (International


Series in Operations Research and Management Science). Springer, 2010.
[21] C. Bierig and A. Chernov, “Convergence analysis of multilevel variance esti-
mators in multilevel Monte Carlo methods and application for random obsta-
cle problems,” Institute for Numerical Simulation, University of Bonn, Preprint
1309, 2013, submitted.

Chapter 7

A multilevel adaptive reaction-splitting simulation method for stochastic reaction networks¹

Alvaro Moraes, Raúl Tempone and Pedro Vilanova

¹ A. Moraes, R. Tempone and P. Vilanova, “Multilevel adaptive reaction-splitting simulation
method for stochastic reaction networks”, preprint arXiv:1406.1989v1, (2014).

Abstract

Stochastic modeling of reaction networks is a framework used to describe the time
evolution of many natural and artificial systems, including biochemical reactive systems
at the molecular level, viral kinetics, the spread of epidemic diseases, and wireless
communication networks, among many other examples. In this work, we present a
novel multilevel Monte Carlo method for kinetic simulation of stochastic reaction
networks that is specifically designed for systems in which the set of reaction channels
can be adaptively partitioned into two subsets characterized by either “high” or “low”
activity. Adaptive in this context means that the partition evolves in time according
to the states visited by the stochastic paths of the system. To estimate expected
values of observables of the system at a prescribed final time, our method bounds
the global computational error to be below a prescribed tolerance, TOL, within a
given confidence level. This is achieved with a computational complexity of order
O(TOL⁻²), the same as with an exact method, but with a smaller constant. We also
present a novel control variate technique based on the stochastic time change
representation by Kurtz, which may dramatically reduce the variance of the coarsest
level at a negligible computational cost. Our numerical examples show substantial
gains with respect to the standard Stochastic Simulation Algorithm (SSA) by Gillespie
and also with respect to our previous hybrid Chernoff tau-leap method.

7.1 Introduction

Stochastic reaction networks (SRNs) are mathematical models that employ Markovian
dynamics to describe the time evolution of interacting particle systems where one
particle interacts with the others through a finite set of reaction channels. Typically,
there is a finite number of interacting chemical species (S_1, S_2, ..., S_d) and a stochastic
process, X, such that its i-th coordinate, X_i(t), is a non-negative integer that keeps
track of the abundance of the i-th species at time t. Therefore, the state space of
the process X is the lattice Z_+^d.
Our main goal is to estimate the expected value E[g(X(T))], where X is a non-
homogeneous Poisson process describing an SRN, and g: R^d → R is a given real
observable of X at a final time T. Pathwise realizations can be simulated exactly
using the Stochastic Simulation Algorithm (SSA), introduced by Gillespie in [1] (also
known as Kinetic Monte Carlo among physicists; see [2] and references therein), or
the Modified Next Reaction Method (MNRM), introduced by Anderson in [3], among
other methods. Although these algorithms generate exact realizations of X, they
may be computationally expensive for systems that undergo high activity. For that
reason, Gillespie proposed in [4] the tau-leap method to approximate the SSA by
evolving the process with fixed time steps while freezing the propensity functions at
the beginning of each time step.
A drawback of the tau-leap method is that the simulated paths may take negative
values, which is a nonphysical consequence of the approximation and not a qualitative
feature of the original process. For that reason, we proposed in [5, 6] a Chernoff-based
hybrid method that switches adaptively between the tau-leap and an exact method.
This allows us to control the probability of reaching negative values while keeping the
computational work substantially smaller than the work of an exact method. The
hybrid method developed in [5, 6] can be successfully applied to systems where the
state space, Z_+^d, can be decomposed into two regions according to the activity of the
system, where all the propensities are uniformly low or uniformly high, i.e., non-stiff
systems. To handle stiff systems, we first measure the total activity of the system at
a certain state by the total sum of the propensity functions evaluated at that state.
The activity of the system is low when all the propensities are uniformly low, but a
high level of activity can be the result of a high activity level in one single channel.
This observation suggests that, to reduce computational costs, we should adaptively
split the set of reaction channels into two subsets according to the individual high and
low activity levels. It is natural to evolve the system in time by applying the tau-leap
method to the high activity channels and an exact method to the low activity ones.
This is the main idea we develop in this work.
Reaction-splitting methods for simulating stochastic reaction networks are treated,
for instance, in [7, 8, 9, 10], but our work is, to the best of our knowledge, the first
that i) achieves the computational complexity of an exact method like the SSA by
using the multilevel Monte Carlo paradigm, ii) explicitly uses a decomposition of
the global error to provide all the simulation parameters needed to achieve our goal
with minimal computational effort, iii) effectively controls the global probability of
reaching negative populations with the tau-leap method, and iv) needs only two user-
defined parameters that are natural quantities: the maximum allowed relative global
error, or tolerance, and the confidence level.
In [7], the authors propose an adaptive reaction-splitting scheme that considers
not only the exact and tau-leap methods but also the Langevin and mean-field ones.
Their main goal is to obtain fast hybrid simulated paths, and they do not try to
control the global error. The efficiency of their method is measured a posteriori using
smoothed frequency histograms that should be close to the exact ones according to the
distance defined by Cao and Petzold in [11]. In their work, the tau-leap step is chosen
according to the “leap condition” (as in [12]), but they do not perform a rigorous
control of the global discretization error. In order to avoid negative populations,
the authors reverse population updates if any value is found to be negative after
accounting for all the reactions. Then, the tau-leap step size is decremented and the
path simulation is restarted. This approach introduces bias in the estimations and,
even when controlling the small reactant populations, a tau-leap step may still lead to
negative populations, subsequently increasing the computational work. Our Chernoff-
based bound is a fast and accurate procedure for obtaining the correct tau-leap step
size. Finally, the method in [7] needs three parameters that quantify the speed
of the reaction channels, which, in principle, are not trivial to determine for a given
problem.
Puchalka and Kierzek's approach [8] seems to be the closest to ours in spirit,
since they also explore the idea of adaptively splitting the set of reaction channels,
using the tau-leap method for the fast ones and an exact method for the slow ones.
They seek to simulate fast approximate paths while maintaining qualitative features
of the system. The quantitative features are checked a posteriori against an exact
method. Regarding their tau-leap step size selection, Puchalka and Kierzek consider
a user-defined maximal time step, empirically chosen by numerical tests, instead of
controlling the discretization error. Their classification rule is applied individually to
each reaction channel. It takes into account both the percentage of individual activity
and the abundance of the species consumed. In a certain sense, it can be seen as a
way of controlling the probability of negative populations and an ad hoc manner of
splitting the reaction channels by optimizing the computational work.
In [9] and [10], the reaction-splitting issue is addressed but the partition method
is not adaptive, i.e., fast and slow reaction channels are identified offline and are
inputs of the algorithms. We note that these works do not provide any measure or
control of the resulting global error. Furthermore, they do not control the probability
of attaining negative populations.
In the remainder of this section, we introduce the mathematical model and the
path simulation techniques used in this work. In Section 7.2, we present an algorithm
to generate mixed trajectories; that is, the algorithm generates a trajectory using
an exact method for the low activity channels and the Chernoff tau-leap method
for the high activity ones. Then, inspired by the ideas of Anderson and Higham
[13], we propose an algorithm for coupling two mixed Chernoff tau-leap paths. This
algorithm uses four building blocks that result from the combination of the MNRM
and the tau-leap methods. In Section 7.3, we propose a mixed MLMC estimator.
Next, we introduce a global error decomposition and show that the computational
complexity of our method is of order O(TOL⁻²). Finally, we show the automatic
procedure that estimates our quantity of interest within a given prescribed relative
tolerance, up to a given confidence level. Next, in Section 7.4, we present a novel
control variate technique to reduce the variance of the quantity of interest at level 0.
In Section 7.5, numerical examples illustrate the advantages of the mixed
MLMC method over the hybrid MLMC method presented in [6] and over the SSA.
Finally, Section 7.6 presents our conclusions.
7.1.1 A Class of Markovian Pure Jump Processes

In this section, we describe the class of Markovian pure jump processes, X: [0, T] ×
Ω → Z_+^d, frequently used for modeling stochastic biochemical reaction networks.
Consider a biochemical system of d species interacting through J different reaction
channels. For the sake of brevity, we write X(t, ω) ≡ X(t). Let X_i(t) be the number
of particles of species i in the system at time t. We study the evolution of the state
vector, X(t) = (X_1(t), ..., X_d(t)) ∈ Z_+^d, modeled as a continuous-time Markov chain
starting at X(0) ∈ Z_+^d. Each reaction can be described by the vector ν_j ∈ Z^d, such
that, for a state vector x ∈ Z_+^d, a single firing of reaction j leads to the change
x → x + ν_j. The probability that reaction j occurs during the small interval
(t, t+dt) is then assumed to be

P(X(t+dt) = x + ν_j | X(t) = x) = a_j(x) dt + o(dt), (7.1)

for a given non-negative polynomial propensity function, a_j: R^d → R. We set
a_j(x) = 0 for those x such that x + ν_j ∉ Z_+^d.
The process X admits the following random time change representation by Kurtz
[14]:

X(t) = X(0) + Σ_{j=1}^J ν_j Y_j( ∫_0^t a_j(X(s)) ds ), (7.2)

where the Y_j: R_+ × Ω → Z_+ are independent unit-rate Poisson processes. Hence, X
is a non-homogeneous Poisson process.

7.1.2 The Modified Next Reaction Method (MNRM)

The MNRM, introduced in [3] and based on the Next Reaction Method [15], is an
exact simulation algorithm, like Gillespie's SSA, that explicitly uses representation
(7.2) for simulating exact paths and generates only one exponential random variable
per iteration. The reaction times are modeled by the firing times of the Poisson
processes, Y_j, with internal times given by the integrated propensity functions.
The randomness is now separated from the state of the system and is encapsulated
in the Y_j's.
Computing the next reaction and its time is equivalent to computing how much
time passes before one of the Poisson processes, Y_j, fires, and which process fires at
that particular time, by taking the minimum of such times.
It is important to mention that the MNRM can be used to simulate correlated
exact/tau-leap paths as well as nested tau-leap/tau-leap paths, as in [6, 13]. In Section
7.2.5, we use this feature for coupling two mixed paths.
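A minimal MATLAB sketch of the core MNRM loop follows; it assumes that afun(x) returns the column vector of the J propensities and that nu is the J-by-d stoichiometric matrix, and both names, like the function name itself, are illustrative.

function [x, t] = mnrm_path(x, T, afun, nu)
% Simulates one exact path on [0, T] with the MNRM bookkeeping:
% internal times Tint of the unit-rate Poisson processes Y_j and
% their next firing times Pnext.
J = size(nu, 1);
t = 0;
Tint  = zeros(J, 1);
Pnext = -log(rand(J, 1));            % unit-rate exponential firing times
while t < T
    a  = afun(x);
    dt = (Pnext - Tint) ./ a;        % time until each channel would fire
    dt(a <= 0) = Inf;
    [Delta, mu] = min(dt);           % next reaction and waiting time
    if t + Delta > T, break; end
    t    = t + Delta;
    Tint = Tint + a * Delta;         % advance all internal clocks
    Pnext(mu) = Pnext(mu) - log(rand);  % schedule next firing of channel mu
    x = x + nu(mu, :)';              % fire reaction mu
end
end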

7.1.3 The Tau-Leap Approximation

In this section, we define X̄, the tau-leap approximation of the process X, which
follows from applying the forward Euler approximation to the integral term in the
random time change representation (7.2).
The tau-leap method was proposed in [4] to avoid the computational drawback
of exact methods, i.e., when many reactions occur during a short time interval.
The tau-leap process, X̄, starts from X(0) at time 0, and, given that X̄(t) = x̄ and a
time step τ > 0, the state X̄ at time t+τ is generated by

X̄(t+τ) = x̄ + Σ_{j=1}^J ν_j P_j(a_j(x̄) τ),

where {P_j(λ_j)}_{j=1}^J are independent Poisson distributed random variables with
parameters λ_j, used to model the number of times that reaction j fires during the
interval (t, t+τ). Again, this is nothing else than a forward Euler discretization of the
stochastic differential equation formulation of the pure jump process (7.2), realized
by the Poisson random measure with state-dependent intensity (see, e.g., [16]).
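A single tau-leap step then reduces to a few lines of MATLAB; as in the MNRM sketch above, afun and the J-by-d matrix nu are assumed names, and poissrnd draws the Poisson variates.

function x = tau_leap_step(x, tau, afun, nu)
% One tau-leap step: propensities are frozen at the current state.
lambda = afun(x) * tau;          % Poisson rates of the J channels
K = poissrnd(lambda);            % number of firings per channel
x = x + nu' * K;                 % note: x may leave the lattice Z_+^d
end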
In the limit, when τ tends to zero, the tau-leap method gives the same solution
as the exact methods [16]. The total number of firings in each channel is a Poisson
distributed random variable depending only on the initial population, X̄(t). The
error thus comes from the variation of a(X(s)) for s ∈ (t, t+τ).

7.1.4 The Hybrid Chernoff Tau-leap Method

In [5], we derived a Chernoff-type bound that allows us to guarantee that the one-step
exit probability in the tau-leap method is less than a predefined quantity, δ > 0. The
idea is to find the largest possible time step, τ, such that, with high probability, in
the next step, the approximate process, X̄, will take a value in the lattice, Z_+^d, of
non-negative integers.
This can be achieved by solving d auxiliary problems, one for each coordinate,
X̄_i(t), i = 1, 2, ..., d, as follows. Find the largest possible τ_i ≥ 0 such that

P( X̄_i(t) + Σ_{j=1}^J ν_ji P_j(a_j(X̄(t)) τ_i) < 0 | X̄(t) ) ≤ δ_i, (7.3)

where δ_i = δ/d, and ν_ji is the i-th coordinate of the j-th reaction channel, ν_j. Finally,
we let τ := min{τ_i : i = 1, 2, ..., d}.
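The bound we actually use is analytic and is derived in [5]; purely as a sanity check, the one-step exit probability in (7.3) for a trial step size τ can also be estimated by crude Monte Carlo, as in the following sketch (afun and nu are the assumed names from the previous sketches; M is the number of samples).

function p = one_step_exit_prob_mc(x, tau, afun, nu, M)
% Crude Monte Carlo estimate of the probability in (7.3); the analytic
% Chernoff-type bound of [5] replaces this in the actual method.
lambda = afun(x) * tau;                 % J-by-1 frozen rates
K = poissrnd(repmat(lambda', M, 1));    % M-by-J firings
Xnew = repmat(x', M, 1) + K * nu;       % M-by-d one-step states
p = mean(any(Xnew < 0, 2));             % fraction of steps that exit Z_+^d
end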
The exact pre-leap method we developed in [5, 6] for single-level and multilevel
hybrid schemes allows us to switch adaptively between the tau-leap and an exact
method.
By construction, the probability that one hybrid path exits the lattice, Z_+^d, can
be estimated by

P(A^c) ≤ E[1 − (1−δ)^{N_TL}] = δ E[N_TL] − (δ²/2)(E[N_TL²] − E[N_TL]) + o(δ²),

where ω̄ ∈ A if and only if the whole hybrid path, (X̄(t_k, ω̄))_{k=0}^{K(ω̄)}, belongs to the
lattice, Z_+^d, δ > 0 is the one-step exit probability bound, and N_TL(ω̄) ≡ N_TL is the
number of tau-leap steps in a hybrid path. Here, A^c is the complement of the set A.
To simulate a hybrid path, given the current state of the approximate process,
X̄(t), we adaptively determine whether to use an exact or the tau-leap method for
the next step. This decision is based on the relative computational cost of taking an
exact step versus the cost of taking a Chernoff tau-leap step. Instead, in the present
work, at each time step, we adaptively determine which reactions are suitable for
the exact method and which are suitable for the Chernoff tau-leap method.

7.2 Generating Mixed Paths

In this section, we explain how mixed paths are generated. First, we present the
splitting heuristic; that is, we discuss how to partition the set of reaction channels
at each decision time. Then, we present the one-step mixing rule, which is the main
building block for constructing a mixed path. Finally, we show how to couple two
mixed paths.

7.2.1 The Splitting Heuristic

In this section, we explain how we partition the set of reaction channels, R := {1, ..., J},
into R_TL and R_MNRM.
Let (t, x) be the current time and state of the approximate process, X̄, and let H be
the next decision (or synchronization) time (given by the Chernoff tau-leap step size
τ_Ch = τ_Ch(x, δ) and the time mesh). We want to split R into two subsets, R_MNRM
and R_TL, such that the expected computational work of reaching H, starting at t, is
minimal over all possible splittings.
The idea goes as follows. First, we define a linear order on R, based on the basic
principle that we want to use the tau-leap method for the j-th reaction if its activity is
high. This linear order determines J+1 possible splittings, out of 2^J. In order to measure
the activity, it turns out that using only the propensity functions evaluated at x, that
is, a_j(x), is not enough. This is because the j-th reaction could affect components
of x with small values. If this is the case, this determines small Chernoff tau-leap
step sizes. In order to avoid this scenario, we penalize the j-th reaction channel if
it has a high exit probability. We approximate this exit probability using a Poisson
distribution for each dimension of x. For example, let ν_ji be the i-th component of
the j-th reaction channel. If ν_ji < 0, then the probability that a Poisson distributed
random variable with rate a_j(x)(H−t) is greater than −x_i/ν_ji measures how likely
species i is to become negative in the interval (t, H), independently of the reactions
j' ∈ R, j' ≠ j. Let I_j := {i : ν_ji < 0}, and

θ_j := P( P(a_j(x)(H−t)) > min_{i∈I_j} {−x_i/ν_ji} | x ) if I_j ≠ ∅, and θ_j := 0 otherwise. (7.4)

Then, the penalty weight for a_j(x) is 1−θ_j. We define ã_j(x) := (1−θ_j) a_j(x). The
linear order is then a permutation, π, over R such that

ã_{π(j)}(x) > ã_{π(j+1)}(x), j = 1, ..., J−1.
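A MATLAB sketch of the penalty weights (7.4), using the Poisson cumulative distribution function poisscdf, could read as follows; afun and nu are the assumed names from the previous sketches, and dtH stands for the length H − t.

function theta = penalty_weights(x, dtH, afun, nu)
% Penalty weights theta_j of (7.4): probability that channel j alone
% exhausts one of the species it consumes before the next decision time.
J = size(nu, 1);
a = afun(x);
theta = zeros(J, 1);
for j = 1:J
    Ij = find(nu(j, :) < 0);          % species consumed by channel j
    if ~isempty(Ij)
        xi  = x(Ij);  xi  = xi(:);
        vji = abs(nu(j, Ij)); vji = vji(:);
        m = min(xi ./ vji);           % firings needed to go negative
        theta(j) = 1 - poisscdf(floor(m), a(j) * dtH);
    end
end
end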

Second, we find, among the J+1 partitions, the one with optimal work. This is
the computational work incurred when performing one step of the algorithm using
the tau-leap method for the reactions in R_TL and the MNRM for the reactions in
R_MNRM. The work corresponding to R_TL is

Work(R_TL, x, t) := ((H−t)/min{τ_Ch, H−t}) ( C_s + Σ_{j∈R_TL} C_P(a_j(x) τ_Ch) ), (7.5)
where C_s is the work of computing the split (see Section 7.2.2), and C_P(λ) is the work
of generating a Poisson random variate with rate λ. The factor (H−t)/min{τ_Ch, H−t}
takes into account the number of steps required to reach H = H(t) from t. For the
Gamma simulation method developed by Ahrens and Dieter in [17], which is the one
used by MATLAB, C_P is defined as

C_P(λ) := b_1 + b_2 ln λ for λ > 15, and C_P(λ) := b_3 + b_4 λ for λ ≤ 15.

In practice, it is possible to estimate b_i, i = 1, 2, 3, 4, using Monte Carlo sampling and
a least squares fit. For more details, we refer to [5].
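One possible calibration in MATLAB, in the spirit of what is described above, times poissrnd at several rates and fits the two regimes by least squares; the rates and sample size below are arbitrary, and the threshold 15 follows the cost model above.

% Monte Carlo calibration of the Poisson sampling cost C_P(lambda).
rates = [0.5 1 2 5 10 15 20 50 1e2 1e3 1e4];
M = 1e5;
cost = zeros(size(rates));
for i = 1:numel(rates)
    tic;
    tmp = poissrnd(rates(i), M, 1);   % batch of M variates at this rate
    cost(i) = toc / M;                % runtime per variate
end
lo = rates <= 15;
p_lo = polyfit(rates(lo), cost(lo), 1);          % C_P = b3 + b4*lambda
p_hi = polyfit(log(rates(~lo)), cost(~lo), 1);   % C_P = b1 + b2*ln(lambda)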
Similarly, the work corresponding to R_MNRM is

Work(R_MNRM, x, t) := C_MNRM (H−t)/min{τ_MNRM, H−t},

where the constant C_MNRM is the work of an MNRM step and
τ_MNRM = ( Σ_{j∈R_MNRM} a_j(x) )⁻¹.

7.2.2 On the Work Required by the Splitting Heuristic, C_s = C_s(J)

The work required to perform the splitting includes the work required to determine
Work(R_TL) and Work(R_MNRM), both defined in Section 7.2.1. The linear order
previously defined determines J+1 possible splittings, S_i, i = 0, ..., J, as follows:

S_0:  R_TL = ∅,                        R_MNRM = R
S_1:  R_TL = {π⁻¹(1)},                 R_MNRM = {π⁻¹(2), ..., π⁻¹(J)}
S_2:  R_TL = {π⁻¹(1), π⁻¹(2)},         R_MNRM = {π⁻¹(3), ..., π⁻¹(J)}
...
S_J:  R_TL = R,                        R_MNRM = ∅
The cost of computing each of the J+1 splits is dominated by the cost of determining
the Chernoff tau-leap step size, τ_Ch (see (7.5)). As we observed in [5], the work of
computing a single τ_Ch is linear in J. Then, in order to avoid J² complexity of the
splitting rule, we implement a local search instead of computing J τ_Ch's, to keep the
complexity of C_s linear in J. The main idea is to keep track of the last split at each
decision time, assuming that the propensities do not vary widely between decision
times. If that is the case, we can just evaluate the previous split, S_κ, and its neighbors,
S_{κ−1} and S_{κ+1}. Then, the cost of the splitting rule is on the order of three
computations of a Chernoff step size. It turns out that this local search is very accurate
for the examples we worked on. In order to avoid being trapped in local minima, a
randomization rule may be applied.

Remark 7.2.1 (Pareto splitting rule). Instead of computing a cost-based splitting at
each decision time, the following rule can be applied:

R_TL is defined s.t. ( Σ_{j∈R_TL} ã_{π(j)} ) / ( Σ_{k=1}^J ã_k ) ≥ ν,

where ν is a problem-dependent threshold, which can be estimated using the cost-
based splitting rule. The idea is to use the tau-leap method for (100×ν)% of the
penalized activity (measured as before using the ã_j's), and an exact method for the
other channels.
This rule is adaptive because it depends on the current state of the process, but it
does not take into account the computational cost of the resulting partition of R. The
advantage of this rule is that it is three times faster than the previous one. For the
examples we worked on, the overall average gain in terms of computational work over
a whole mixed path is about 45% of the total work.
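A minimal MATLAB sketch of this Pareto rule follows; atilde is the vector of penalized propensities (1−θ_j)a_j(x), and nu_bar is an illustrative name for the threshold ν of the remark (renamed in the code to avoid clashing with the stoichiometric matrix).

function [RTL, RMNRM] = pareto_split(atilde, nu_bar)
% Pareto splitting: tau-leap the most active channels until a fraction
% nu_bar of the total penalized activity is covered.
[as, idx] = sort(atilde, 'descend');       % the linear order pi
frac = cumsum(as) / sum(as);               % covered activity fraction
k = find(frac >= nu_bar, 1, 'first');      % smallest prefix reaching nu_bar
RTL   = idx(1:k);
RMNRM = idx(k+1:end);
end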
7.2.3 The one-step Mixing Rule

In this section, we present the main building block for simulating a mixed path. Let
x = X̄(t) be the current state of the approximate process, X̄. The expected
time step of the MNRM is then given by 1/a_0(x). To move one step forward using the
MNRM, we should compute at least a_0(x) and sample a uniform random variable.
On the other hand, to move one step forward using the mixed Chernoff tau-leap
method, we need first to compute the split, then compute the tau-leap increments for
the reactions in the tau-leap set, R_TL, and finally compute the MNRM steps for the
reactions in the set R_MNRM, as discussed in Section 7.2.2.
To avoid the overhead caused by unnecessary computation of the split, we first
estimate the computational work of moving forward from the current time, t, to the
next grid point, T̃, using the MNRM only. If this work is less than the work of
computing the split, we take an exact step.

Algorithm 24 The one-step mixing rule. Inputs: the current state of the approximate
process, X̄(t), the current time, t, the values of the propensity functions evaluated
at X̄(t), (a_j(X̄(t)))_{j=1}^J, the one-step exit probability bound, δ, the next grid point,
T̃, and the previous optimal split, κ. Outputs: the tau-leap set, R_TL, the exact set,
R_MNRM, and the new optimal split, κ.
Require: a_0 ← Σ_{j=1}^J a_j > 0
1: if K_1/a_0 < T̃ − t then
2:   Compute θ_j, j=1, ..., J (see (7.4))
3:   ã_{π(j)} ← Sort {(1−θ_j) a_j} in descending order, j=1, ..., J
4:   S_i ← Compute the splits, taking into account the previous optimal split, κ
5:   (R_TL, R_MNRM, κ) ← Take the minimum-work split
6:   return (R_TL, R_MNRM, κ)
7: else
8:   return (∅, R, κ)
9: end if

In order to compare these computational costs, we define K_1 as the ratio
between the cost of computing the split, C_s, and the cost of computing one step using
the MNRM.
Remark 7.2.2 (Comparison with the one-step hybrid rule). In [5], we developed a
hybrid method, which, at each decision point, determines which method, exact or tau-
leap, is cheaper to apply to the whole set of reactions. That is, in the hybrid method,
we have either R_TL = ∅ and R_MNRM = R, or R_TL = R and R_MNRM = ∅. Then, the
mixed method can be seen as a generalization of the hybrid one. The key difference
is in the cost of the decision rule, which, as we saw in Section 7.2.2, in the mixed
method is on the order of three times the computation of the Chernoff step size. This
difference can be significant in some problems. A Pareto splitting rule may be able to
recover the cost of the hybrid one-step decision rule.

7.2.4 The Mixed-Path Algorithm

In this section, we present a novel algorithm (Algorithm 25) that combines the
approximate Chernoff tau-leap method and the exact MNRM to generate a whole
mixed path. This algorithm combines the advantages of an exact method (expensive
but exact) and the tau-leap method (possibly cheaper, but with a discretization error
and a positive probability of exiting the lattice). The algorithm automatically
and adaptively partitions the reactions into two subsets, R_TL and R_MNRM, using a
computational work criterion.
Since a mixed path consists of a certain number of exact/approximate steps, it
may also exit the lattice, except in those steps in which the tau-leap method is not
applied, that is, when R_TL is empty. The idea of this algorithm is to apply, at each
decision point, the one-step mixing rule (Algorithm 24) to determine the sets R_TL
and R_MNRM, and then to apply the corresponding method.

7.2.5 Coupled Mixed Paths

In this section, we explain how to couple two mixed paths. This is essential for the
multilevel estimator. The four algorithms that are the building blocks of the coupling
Algorithm 25 The mixed-path algorithm. Inputs: the initial state, X(0), the
propensity functions, (a_j)_{j=1}^J, the stoichiometric vectors, ν = (ν_j)_{j=1}^J, the final time, T,
and the one-step exit probability bound, δ. Outputs: a sequence of states, (X̄(t_k))_{k=0}^K,
and the number of times, N_TL, that the tau-leap method was successfully applied (i.e.,
X̄(t_k) ∈ Z_+^d, we applied the tau-leap method, and we obtained X̄(t_{k+1}) ∈ Z_+^d).
Notes: given the current state, nextMNRM computes the next state using the MNRM
method. Here, t_i denotes the current time at the i-th step, and τ_Ch(R_TL) is the
Chernoff step size associated with R_TL.
1: i ← 0, t_i ← t_0, X̄(t_i) ← X(0), Z̄ ← X(0)
2: S_j ← Compute splits, j=0, ..., J
3: κ ← argmin_j Work(S_j)
4: while t_i < T do
5:   T̃ ← next grid point greater than t_i
6:   (R_TL, R_MNRM, κ) ← Algorithm 24 with (Z̄, t_i, (a_j(Z̄))_{j=1}^J, δ, T̃, κ)
7:   Λ_TL ← 0
8:   if R_TL ≠ ∅ then
9:     Λ_TL ← Σ_{j∈R_TL} ν_j P_j(a_j(Z̄) τ_Ch(R_TL))
10:    H ← t_i + τ_Ch(R_TL)
11:  else
12:    H ← min{t_i − log(r)/Σ_j a_j, T}, r ~ Unif(0, 1)
13:  end if
14:  if R_MNRM ≠ ∅ then
15:    while t_i < H do
16:      (Z̄, t_i) ← nextMNRM(Z̄, R_MNRM, t_i, H)
17:    end while
18:  end if
19:  Z̄ ← Z̄ + Λ_TL
20:  if Z̄ ∈ Z_+^d then
21:    N_TL ← N_TL + 1
22:    t_{i+1} ← H
23:  else
24:    return ((X̄(t_k))_{k=0}^i, N_TL)
25:  end if
26:  i ← i+1
27:  X̄(t_i) ← Z̄
28: end while
29: return ((X̄(t_k))_{k=0}^i, N_TL)

algorithm were already presented in [6]. The novelty here comes from the fact that
the coupled mixed algorithm may have to run the four algorithms concurrently, in
the sense of the time of the process, t. In this section, we denote coarse and fine
grid-related quantities with a bar, ·̄, and a double bar, ·̿, respectively.
We now briefly describe the mixed Chernoff coupling algorithm, i.e., Algorithm 26.
Let X̄ and X̿ be two mixed paths, corresponding to two nested time discretizations,
called coarse and fine, respectively. Assume that the current time is t, and that we
know the states, X̄(t) and X̿(t), the next grid points at each level, t̄ and t̿, and the
corresponding one-step exit probability bounds, δ̄ and δ̿. Based on this knowledge,
we have to determine the four sets (R̄_TL, R̄_MNRM, R̿_TL, R̿_MNRM), which correspond
to the four algorithms, B1, B2, B3 and B4, that we use as building blocks. Table 7.1
summarizes them.

            R̄_TL   R̄_MNRM
R̿_TL       B1     B2
R̿_MNRM     B3     B4

Table 7.1: Building blocks for simulating two coupled mixed Chernoff tau-leap paths.
Algorithms B1 and B2 are presented as Algorithms 2 and 3 in [13]. Algorithms B3
and B4 can be directly obtained from Algorithm B2 (see [6]).

In order to determine them, the algorithm computes, independently, the sets R_TL and
R_MNRM for each level, and the time until the next decision is taken, H, using Algorithm
27. Next, it computes concurrently the increments due to each one of the sets (storing
the results in X̄ and X̿ for the coarse and fine grid, respectively). We note that the
only case in which we use a Poisson random variate generator for the tau-leap method
is in Algorithm B1 (Algorithm 28). For Algorithms B2, B3 and B4, the Poisson random
variables are simulated by adding independent exponential random variables with the
same rate, λ, until exceeding a given final time, T. The only difference among the
latter blocks is the time points at which the propensities, a_j, are computed. For B2,
the coarse propensities are frozen at time t, whereas for B3 the finer ones are frozen at t.
In B4, the propensities are computed at each time step. After arriving at time H, the
four sets (R̄_TL, R̄_MNRM, R̿_TL, R̿_MNRM) and the time until the next decision, H, are
determined again, and all procedures are repeated until the simulation reaches the final
time, T.
7.3 The Multilevel Estimator and Total Error Decomposition

In this section, we first present the multilevel Monte Carlo estimator. We then analyze
and control the computational global error, which is decomposed into three error
components: the discretization error, the global exit error, and the Monte Carlo
statistical error. Upper bounds for each of the three components are given.
Finally, we briefly describe the automatic estimation procedure that allows us to
estimate our quantity of interest within a given prescribed relative tolerance, up to a
given confidence level.

7.3.1 The MLMC Estimator

In this section, we discuss and implement a multilevel Monte Carlo estimator for the
mixed Chernoff tau-leap case.
Consider a hierarchy of nested meshes of the time interval [0, T], indexed by
ℓ = 0, 1, ..., L. Let Δt_0 be the size of the coarsest time mesh, corresponding
to the level ℓ=0. The size of the time mesh at level ℓ ≥ 1 is given by Δt_ℓ = R^{−ℓ} Δt_0,
where R > 1 is a given integer constant. Let {X̄_ℓ(t)}_{t∈[0,T]} be a mixed Chernoff tau-
leap process with a time mesh of size Δt_ℓ and a one-step exit probability bound δ_ℓ,
and let g_ℓ := g(X̄_ℓ(T)) be our quantity of interest computed with a mesh of size Δt_ℓ.
We can simulate paths of {X̄_ℓ(t)}_{t∈[0,T]} by using Algorithm 25. We are interested in
estimating E[g_L], and we can simulate correlated pairs, (g_ℓ, g_{ℓ−1}), for ℓ = 1, ..., L, by
using Algorithm 26. Let A_ℓ be the event in which the ℓ-th grid level path, X̄_ℓ, arrives
at the final time, T, without exiting the state space of X.
Consider the following telescopic decomposition:

E[g_L 1_{A_L}] = E[g_0 1_{A_0}] + Σ_{ℓ=1}^L E[g_ℓ 1_{A_ℓ} − g_{ℓ−1} 1_{A_{ℓ−1}}],

where 1_A is the indicator function of the set A. This motivates the definition of our
MLMC estimator of E[g(X(T))]:

M_ML := (1/M_0) Σ_{m=1}^{M_0} g_0 1_{A_0}(ω_{m,0})
        + Σ_{ℓ=1}^L (1/M_ℓ) Σ_{m=1}^{M_ℓ} [g_ℓ 1_{A_ℓ} − g_{ℓ−1} 1_{A_{ℓ−1}}](ω_{m,ℓ}). (7.6)
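Assembling (7.6) from per-level samples is then immediate, as the following MATLAB sketch illustrates; g0 holds the M_0 samples of g_0 1_{A_0} and dG{l} holds the M_ℓ samples of the coupled differences (both names are illustrative).

function est = mlmc_estimate(g0, dG)
% MLMC estimator (7.6): level-0 sample mean plus telescoping corrections.
est = mean(g0);
for l = 1:numel(dG)
    est = est + mean(dG{l});   % correction from coupled levels l-1 and l
end
end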

Computational Complexity

A key property of our multilevel estimator is that the computational work, as a
function of the given relative tolerance, TOL, is of order TOL⁻². The optimal work
is given by

w_L*(TOL) = ( (C_A/θ) Σ_{ℓ=0}^L √(V_ℓ ψ_ℓ) )² TOL⁻².

From the fact that the sum Σ_{ℓ=0}^∞ √(V_ℓ ψ_ℓ) converges, because ψ_ℓ = O(ψ_MNRM), we
conclude that sup_L { Σ_{ℓ=0}^L √(V_ℓ ψ_ℓ) } is bounded and, therefore, the expected
computational complexity of the multilevel mixed Chernoff tau-leap method is
w_L*(TOL) = O(TOL⁻²).

7.3.2 Global Error Decomposition

In this section, we define the computational global error, E_L, and show how it can be
naturally decomposed into three components: the discretization error, E_{I,L}, and the
exit error, E_{E,L}, both coming from the tau-leap part of the mixed method, and the
Monte Carlo statistical error, E_{S,L}. We also give upper bounds for each of the
three components.
The computational global error, E_L, is defined as

E_L := E[g(X(T))] − M_ML,
and can be decomposed as

E[g(X(T))] − M_ML = E[g(X(T))(1_{A_L} + 1_{A_L^c})] ± E[g_L 1_{A_L}] − M_ML
                  = E[g(X(T)) 1_{A_L^c}] + E[(g(X(T)) − g_L) 1_{A_L}] + (E[g_L 1_{A_L}] − M_ML)
                  =: E_{E,L} + E_{I,L} + E_{S,L}.

We showed in [5] that, by adequately choosing the one-step exit probability bound,
δ, the exit error, E_{E,L}, satisfies |E_{E,L}| ≤ |E[g(X(T))]| P(A_L^c) ≤ TOL².
An efficient procedure for accurately estimating E_{I,L} in the context of the tau-leap
method is described in [6].
For each mixed path, (X̄_ℓ(t_{n,ℓ}, ω̄))_{n=0}^{N(ω̄)}, we define the sequence of dual weights,
(φ_{n,ℓ}(ω̄))_{n=1}^{N(ω̄)}, backwards as follows:

φ_{N(ω̄),ℓ} := ∇g(X̄_ℓ(t_{N(ω̄),ℓ}, ω̄)), (7.7)
φ_{n,ℓ} := (I_d + Δt_{n,ℓ} J_a^T(X̄_ℓ(t_{n,ℓ}, ω̄)) ν^T) φ_{n+1,ℓ}, n = N(ω̄)−1, ..., 1,

where Δt_{n,ℓ} := t_{n+1,ℓ} − t_{n,ℓ}, ∇ is the gradient operator, and
J_a(X̄_ℓ(t_{n,ℓ}, ω̄)) ≡ [∂_i a_j(X̄_ℓ(t_{n,ℓ}, ω̄))]_{j,i} is the Jacobian matrix of the propensity
functions, a_j, for j = 1, ..., J and i = 1, ..., d.
We then approximate E_{I,L} by A(E_{I,L}(ω̄); ·), where

E_{I,L}(ω̄) := Σ_{n=1}^{N(ω̄)} (Δt_{n,L}/2) Σ_{j=1}^J 1_{j∈R_TL}(n) [a_j(X̄_L(t_{n+1,L})) − a_j(X̄_L(t_{n,L}))] ν_j^T φ_{n,L} (ω̄),

and A(X; M) := (1/M) Σ_{m=1}^M X(ω_m) and S²(X; M) := A(X²; M) − A(X; M)² denote
the sample mean and the sample variance of the random variable X, respectively.
Here, 1_{j∈R_TL}(n) = 1 if and only if, at time t_{n,ℓ}, the tau-leap method was used for
reaction channel j, and we denote by I_d the d×d identity matrix.
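As an illustration, the backward recursion (7.7) is a short loop in MATLAB; gradg(x) is assumed to return the d-by-1 gradient of g, jac(x) the J-by-d Jacobian [∂_i a_j(x)]_{j,i}, nu the J-by-d stoichiometric matrix, and xs, ts the stored path; all names are illustrative.

function phi = dual_weights(xs, ts, gradg, jac, nu)
% Dual weights of (7.7), computed backwards along a stored path.
% xs is d-by-(N+1), ts is 1-by-(N+1).
[d, Np1] = size(xs);
N = Np1 - 1;
phi = zeros(d, N + 1);
phi(:, N + 1) = gradg(xs(:, N + 1));   % terminal condition
for n = N:-1:1
    dt = ts(n + 1) - ts(n);
    Ja = jac(xs(:, n));                % J-by-d
    phi(:, n) = (eye(d) + dt * (Ja' * nu)) * phi(:, n + 1);
end
end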
The variance of the statistical error, E_{S,L}, is given by Σ_{ℓ=0}^L V_ℓ/M_ℓ, where V_0 :=
Var[g_0 1_{A_0}] and V_ℓ := Var[g_ℓ 1_{A_ℓ} − g_{ℓ−1} 1_{A_{ℓ−1}}] for ℓ ≥ 1. In [6], we presented an efficient
and accurate method for estimating V_ℓ, ℓ ≥ 1, using the formula

\[
\hat V_\ell := S^2\!\left( \sum_n E\big[ \varphi_{n+1} \cdot e_{n+1} \,\big|\, \mathcal{F} \big](\bar\omega); \, M_\ell \right) + A\!\left( \sum_n \mathrm{Var}\big[ \varphi_{n+1} \cdot e_{n+1} \,\big|\, \mathcal{F} \big](\bar\omega); \, M_\ell \right),
\]

where F is a suitably chosen sigma-algebra with respect to which (φ_n(ω̄))_{n=1}^{N(ω̄)} is
measurable, and N(ω̄) is the total number of steps given by Algorithm 26. In this way,
the only randomness in E[φ_{n+1} · e_{n+1} | F] and Var[φ_{n+1} · e_{n+1} | F] comes from the
local errors, (e_n)_{n=1}^{N(ω̄)}, defined as e_n := X̄_{ℓ,n} − X̄_{ℓ−1,n}. In the aforementioned work,
we derived exact and approximate formulas for computing E[φ_{n+1} · e_{n+1} | F] and
Var[φ_{n+1} · e_{n+1} | F].

Remark 7.3.1 (Backward Euler). In (7.7), φ_{n,ℓ} can be computed by
a backward Euler formula when very fine time meshes would otherwise be required for stability, i.e.,

\[
\varphi_{n,\ell} := \Big( \mathrm{Id} - \Delta t_{n,\ell} \, J_a^T\big( \bar X_\ell(t_{n,\ell}, \bar\omega) \big) \, \nu^T \Big)^{-1} \varphi_{n+1,\ell}.
\]
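A minimal Python sketch of the backward recursion (7.7) could read as follows; the path of states, the time steps, the stoichiometric matrix ν, the propensity Jacobian and the gradient of g are all assumed inputs, and the function name is ours.

import numpy as np

def dual_weights(path, dts, nu, jac_a, grad_g):
    # Backward recursion (7.7): phi_N = grad g(X_N), and
    # phi_n = (I + dt_n * Ja(X_n)^T nu^T) phi_{n+1}, n = N-1, ..., 1.
    # path: states X_0..X_N (arrays of length d); dts: dt_0..dt_{N-1};
    # nu: d x J stoichiometric matrix; jac_a(x): J x d Jacobian of a.
    d, N = len(path[0]), len(path) - 1
    phi = [None] * (N + 1)                 # phi[0] unused, as in (7.7)
    phi[N] = np.asarray(grad_g(path[N]), float)
    for n in range(N - 1, 0, -1):
        A = np.eye(d) + dts[n] * jac_a(path[n]).T @ nu.T
        phi[n] = A @ phi[n + 1]
    return phi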

7.3.3 Estimation Procedure

In this section, we briefly describe the automatic procedure that estimates E[g(X(T))]
within a given prescribed relative tolerance, TOL > 0, up to a given confidence level.
Up to minor changes, it is the same as the one presented in [6]. It is important to
remark that only minimal user intervention is required to obtain the parameters needed
to simulate the mixed paths and, subsequently, to compute the estimates using
(7.6). Once the reaction network is given (stoichiometric matrix ν and J propensity
functions a_j), the user only needs to set the required maximum allowed relative global
error, or tolerance, TOL, and the confidence level, α. This process has three phases:

Phase I Calibration of virtual machine-dependent quantities. In this phase, we
estimate the quantities C_MNRM, C_TL, C_s and the function C_P that allow us to
model the expected computational work, measured in runtime.
Phase II Solution of the work optimization problem: we obtain the total number
of levels, L, and the sequences (δ_ℓ)_{ℓ=0}^L and (M_ℓ)_{ℓ=0}^L, i.e., the one-step exit
probability bounds and the required number of simulations at each level. In
this phase, given a relative tolerance, TOL > 0, we solve the work optimization
problem

\[
\begin{cases}
\min_{\{\Delta t_0, L, (M_\ell, \delta_\ell)_{\ell=0}^L\}} \ \sum_{\ell=0}^{L} \psi_\ell M_\ell \\[4pt]
\text{s.t.} \quad \mathcal{E}_{E,L} + \mathcal{E}_{I,L} + \mathcal{E}_{S,L} \le TOL.
\end{cases} \tag{7.8}
\]

An algorithm to efficiently compute the solution of this optimization problem
is given in [6]. Our objective function is the expected total work of the MLMC
estimator, M_L, i.e., Σ_{ℓ=0}^L ψ_ℓ M_ℓ, where L is the deepest level, ψ_0 is the expected
work of a single-level path at level 0, and ψ_ℓ, for ℓ ≥ 1, is the expected
computational work of two coupled paths at levels ℓ−1 and ℓ. Finally, M_0 is
the number of single-level paths at level 0, and M_ℓ, for ℓ ≥ 1, is the number
of coupled paths at levels ℓ−1 and ℓ. We now describe the quantities (ψ_ℓ)_{ℓ=0}^L.
First, ψ_0 is the expected work of a single hybrid path (simulated by Algorithm
25),

\[
\psi_0 := C_{MNRM} \, E[N_{MNRM}(\Delta t_0, \delta_0)] + C_{TL} \, E[N_{TL}(\Delta t_0, \delta_0)] + E\left[ \int_{[0,T]} \sum_{j \in R_{TL}(s)} C_P\big( a_j(\bar X_0(s)) \, \tau_{Ch}(\bar X_0(s), \delta_0) \big) \, ds \right], \tag{7.9}
\]

where Δt_0 is the size of the time mesh at level 0, δ_0 is the exit probability
bound at level 0, and R_TL = R_TL(t) is the tau-leap set, which depends on time
(and also on the current state of the process). The set R_TL is determined at each
decision step by Algorithm 24. Therefore, the expected work at level 0 is ψ_0 M_0,
where M_0 is the total number of single hybrid paths.


For ℓ ≥ 1, we use Algorithm 26 to generate M_ℓ coupled paths that couple levels
ℓ−1 and ℓ.

The expected work of a pair of coupled hybrid paths at levels ℓ and ℓ−1 is

\[
\begin{aligned}
\psi_\ell :=\ & C_{MNRM} \, E\big[ N^{(c)}_{MNRM}(\ell) \big] + C_{TL} \, E\big[ N^{(c)}_{TL}(\ell) \big] \\
&+ E\left[ \int_{[0,T]} \sum_{j \in R_{TL,\ell}(s)} C_P\big( a_j(\bar X_\ell(s)) \, \tau_{Ch}(\bar X_\ell(s), \delta_\ell) \big) \, ds \right] \\
&+ E\left[ \int_{[0,T]} \sum_{j \in R_{TL,\ell-1}(s)} C_P\big( a_j(\bar X_{\ell-1}(s)) \, \tau_{Ch}(\bar X_{\ell-1}(s), \delta_{\ell-1}) \big) \, ds \right], \tag{7.10}
\end{aligned}
\]

where

\[
\begin{aligned}
N^{(c)}_{MNRM}(\ell) &:= N_{MNRM}(\Delta t_\ell, \delta_\ell) + N_{MNRM}(\Delta t_{\ell-1}, \delta_{\ell-1}), \\
N^{(c)}_{TL}(\ell) &:= N_{TL}(\Delta t_\ell, \delta_\ell) + N_{TL}(\Delta t_{\ell-1}, \delta_{\ell-1}).
\end{aligned}
\]

Phase III Estimation of E [g(X(T ))].

7.4 A Control Variate Based on a Deterministic Time Change

In this section, we motivate a novel control variate for the random variable X(T, ω)
defined by the random time change representation

\[
X(T, \omega) = x_0 + \sum_j \nu_j \, Y_j\!\left( \int_0^T a_j(X(s)) \, ds, \ \omega \right).
\]

First, we replace the independent Poisson processes, (Y_j(s, ω))_{s≥0}, by the identity
function. This defines the deterministic mean field,

\[
Z(T) = x_0 + \sum_j \nu_j \int_0^T a_j(Z(s)) \, ds.
\]

Next, we consider the random variable

\[
\tilde X(T, \omega) = x_0 + \sum_j \nu_j \, Y_j\!\left( \int_0^T a_j(Z(s)) \, ds, \ \omega \right),
\]

which uses the same realizations of (Y_j(s, ω))_{s≥0} that define X(T, ω). In this way,
we expect some correlation between X(T) and X̃(T). Since E[X̃(T)] = Z(T) is
a computable quantity, X̃(T) is a potential control variate for X(T),
obtained at almost negligible extra computational cost. In fact, X̃(T, ω) can be
considered a deterministic time change approximation of X(T, ω).
To implement this idea, we first consider the sequence Z_k, defined as a forward
Euler discretization of the mean field over a suitable mesh, {t_0 = 0, t_1, ..., t_K = T},
with Δt_k := t_{k+1} − t_k, k = 0, 1, ..., K−1; that is,

\[
\begin{cases}
Z_{k+1} = Z_k + \sum_j \nu_j \, a_j(Z_k) \, \Delta t_k, & k = 0, \ldots, K-1, \\
Z_0 = x_0.
\end{cases}
\]

The sequence Z_k allows us to define another sequence, Λ̂_{j,k}, by

\[
\begin{cases}
\hat\Lambda_{j,k+1} = \hat\Lambda_{j,k} + a_j(Z_k) \, \Delta t_k, & k = 0, \ldots, K-1, \\
\hat\Lambda_{j,0} = 0,
\end{cases}
\]

where Λ̂_{j,K} approximates ∫_0^T a_j(Z(s)) ds.
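A minimal Python sketch of this forward Euler construction, for generic ν and a generic propensity function a (both assumed given; the function name is ours):

import numpy as np

def mean_field_times(a, nu, x0, T, K):
    # Forward Euler for the mean field Z and the accumulated internal
    # times Lam[j] approximating int_0^T a_j(Z(s)) ds.
    dt = T / K
    Z = np.asarray(x0, float).copy()
    Lam = np.zeros(nu.shape[1])            # one internal time per channel
    for _ in range(K):
        props = np.asarray(a(Z), float)    # propensities at current state
        Lam += props * dt
        Z = Z + nu @ props * dt
    return Z, Lam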

Then, for each realization of X̄(T, ω), which is an approximation of X(T, ω), we
compute the control variate

\[
\hat X_K = x_0 + \sum_j \nu_j \, Y_j\big( \hat\Lambda_{j,K} \big), \tag{7.11}
\]

which is the corresponding approximation of X̃(T, ω) and has the computable expectation

\[
\mu_K := E[\hat X_K] = x_0 + \sum_j \nu_j \, \hat\Lambda_{j,K}.
\]

Now, we consider the random sequence, {X̄_n(ω)}_{n=0}^{N(ω)}, generated in this case by
the mixed algorithm. Here, X̄_{N(ω)}(ω) is an approximation of X(T, ω). The sequence
of mixed random times, {Λ̄_{j,n}(ω)}, is defined by

\[
\begin{cases}
\bar\Lambda_{j,n+1} = \bar\Lambda_{j,n} + a_j(\bar X_n(\omega)) \, \Delta s_n, & n = 0, \ldots, N(\omega)-1, \\
\bar\Lambda_{j,0} = 0,
\end{cases}
\]

over the mesh {s_0 = 0, s_1, ..., s_{N(ω)} = T}, with Δs_n := s_{n+1} − s_n, n = 0, 1, ..., N(ω)−1.

At this point, it is crucial to observe that we can keep track of the values Y_j(Λ̄_{j,n}, ω),
since at each step of the approximation algorithm we are sampling the increments of
the processes Y_j. From now on, we omit ω in our notation.
The values Y_j(Λ̂_{j,K}), required in (7.11), can be obtained by sampling the process
Y_j as follows. For each realization of X̄, we have two scenarios:

1. For some n, Λ̄_{j,n} < Λ̂_{j,K} < Λ̄_{j,n+1}. Since Y_j(Λ̄_{j,n}) and Y_j(Λ̄_{j,n+1}) are known, we
sample a Poissonian bridge (binomial), i.e.,

\[
\begin{aligned}
Y_j(\hat\Lambda_{j,K}) \,\big|\, Y_j(\bar\Lambda_{j,n}), Y_j(\bar\Lambda_{j,n+1}) &\sim Y_j(\bar\Lambda_{j,n}) + B, \\
B &\sim \mathrm{binomial}\!\left( Y_j(\bar\Lambda_{j,n+1}) - Y_j(\bar\Lambda_{j,n}), \ \frac{\hat\Lambda_{j,K} - \bar\Lambda_{j,n}}{\bar\Lambda_{j,n+1} - \bar\Lambda_{j,n}} \right).
\end{aligned}
\]

2. Λ̂_{j,K} > Λ̄_{j,N}. Since we know the value Y_j(Λ̄_{j,N}), we just have to sample a
Poisson random variate as follows:

\[
\begin{aligned}
Y_j(\hat\Lambda_{j,K}) \,\big|\, Y_j(\bar\Lambda_{j,N}) &\sim Y_j(\bar\Lambda_{j,N}) + P, \\
P &\sim \mathrm{Poisson}\big( \hat\Lambda_{j,K} - \bar\Lambda_{j,N} \big).
\end{aligned}
\]
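A Python sketch of this sampling rule, for a single reaction channel j; the arrays Lam_bar and Y_bar are assumed to hold the recorded pairs (Λ̄_{j,n}, Y_j(Λ̄_{j,n})) along one mixed path, and the function name is ours.

import numpy as np
rng = np.random.default_rng()

def sample_Y_at(lam_hat, Lam_bar, Y_bar):
    # Sample Y_j(lam_hat) given the recorded skeleton of the unit-rate
    # Poisson process Y_j at the increasing internal times Lam_bar.
    n = np.searchsorted(Lam_bar, lam_hat, side="right") - 1
    if n >= len(Lam_bar) - 1:              # scenario 2: beyond the last time
        return Y_bar[-1] + rng.poisson(lam_hat - Lam_bar[-1])
    # scenario 1: Poissonian (binomial) bridge on [Lam_bar[n], Lam_bar[n+1]]
    p = (lam_hat - Lam_bar[n]) / (Lam_bar[n + 1] - Lam_bar[n])
    return Y_bar[n] + rng.binomial(Y_bar[n + 1] - Y_bar[n], p)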

Finally, using the aforementioned control variate, we can estimate E[g(X̄(T))]
with

\[
\frac{1}{M} \sum_{m=1}^{M} g(\bar X_N(\omega_m)) - \frac{1}{M} \sum_{m=1}^{M} \big( g(\hat X_K(\omega_m)) - g(\mu_K) \big), \tag{7.12}
\]

for any linear functional, g, since E[g(X̂_K)] = g(E[X̂_K]) = g(μ_K).
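In code, the estimator (7.12) is a one-liner; in this sketch, gX and gXhat are the sampled arrays of g(X̄_N(ω_m)) and g(X̂_K(ω_m)), and g_muK is the exactly computed value g(μ_K).

import numpy as np

def cv_estimate(gX, gXhat, g_muK):
    # (7.12): plain sample mean minus the centered control variate.
    return np.mean(gX) - (np.mean(gXhat) - g_muK)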

Remark 7.4.1 (Nonlinear observables). Observe in (7.11) that X̂_K is a linear combination
of independent Poisson random variables. For that reason, we can exactly
compute the expected value of any polynomial or exponential function of X̂_K. Let C
be the class of functions spanned by the family of polynomials and exponential functions
on X̂_K. Let g be any nonlinear function and g̃ its projection onto the class C.
Consider now the following telescoping sum:

\[
E[g(\bar X_N)] = E[(g - \tilde g)(\bar X_N)] + E[\tilde g(\bar X_N) - \tilde g(\hat X_K)] + E[\tilde g(\hat X_K)].
\]

The random variable (g − g̃)(X̄_N) has small variance since g is well approximated by
g̃. The variance of g̃(X̄_N) − g̃(X̂_K) is small since X̂_K is coupled to X̄_N.
This observation leads us to approximate E[g(X̄_N)] by

\[
\frac{1}{M} \sum_{m=1}^{M} \big( g(\bar X_N(\omega_m)) - \tilde g(\hat X_K(\omega_m)) \big) + E[\tilde g(\hat X_K)],
\]

where the last term can be computed exactly.


Remark 7.4.2 (Reducing the variance at the coarsest level). The main application
of the deterministic time change control variate, X̃(T), in this work is at the coarsest
level of our multilevel hierarchy. Consider the trivial decomposition

\[
g(\bar X_0(T)) = g(\tilde X(T)) + \big( g(\bar X_0(T)) - g(\tilde X(T)) \big).
\]

Therefore,

\[
E[g(\bar X_0(T))] = E[g(\tilde X(T))] + E[g(\bar X_0(T)) - g(\tilde X(T))].
\]

Assuming that we can compute E[g(X̃(T))] exactly (if not, we can use a g̃ as in Remark
7.4.1), we just have to estimate E[g(X̄_0(T)) − g(X̃(T))] instead of E[g(X̄_0(T))]
in our multilevel scheme. The computational gain lies in the fact that Var[g(X̄_0(T)) − g(X̃(T))]
could be substantially lower than Var[g(X̄_0(T))].

Remark 7.4.3 (Computational cost). An advantage of this control variate is that its
computational cost is almost negligible, because we only need to store two scalars, Λ̄_{j,n}
and Λ̄_{j,n+1}, for each reaction, j. These values are determined at each step by a_j(X̄_n),
which is a quantity that is already computed at each time step of the mixed algorithm.
Also, for each realization of the control variate, at most one Poisson random variate
is needed for each reaction channel.

Remark 7.4.4 (Empirical time change). We can also compute the final times,
Λ̂_{j,K}, using a sample average of mixed paths instead of the mean field. We found
no significant improvements when using that approach, which requires much more
computational work. We conjecture that, for settings in which the mean field is not
representative, this approach is the only reasonable option.
7.5 Numerical Examples

In this section, we present two examples to illustrate the performance of our proposed
method, and we compare the results with the hybrid MLMC approach given in [6]. For
benchmarking purposes, we use Gillespie's Stochastic Simulation Algorithm (SSA)
instead of the Modified Next Reaction Method (MNRM), because the former is widely
used in the literature.

Intracellular Virus Kinetics

This model, first developed in [18], has four species and six reactions:

\[
\begin{aligned}
&E \xrightarrow{\ 1\ } E + G, && \text{the viral template (E) forms a viral genome (G),} \\
&G \xrightarrow{\ 0.025\ } E, && \text{the genome generates a new template,} \\
&E \xrightarrow{\ 1000\ } E + S, && \text{a viral structural protein (S) is generated,} \\
&G + S \xrightarrow{\ 7.5 \times 10^{-6}\ } V, && \text{the virus (V) is produced,} \\
&E \xrightarrow{\ 0.25\ } \emptyset, \quad S \xrightarrow{\ 2\ } \emptyset, && \text{degradation reactions.}
\end{aligned}
\]

Its stoichiometric matrix and its propensity functions, a_j : Z_+^4 → R_+, are given by

\[
\nu = \begin{pmatrix}
1 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
-1 & -1 & 0 & 1 \\
0 & 0 & -1 & 0 \\
0 & -1 & 0 & 0
\end{pmatrix}^{tr}
\quad \text{and} \quad
a(X) = \begin{pmatrix}
E \\ 0.025 \, G \\ 1000 \, E \\ 7.5 \times 10^{-6} \, G \, S \\ 0.25 \, E \\ 2 \, S
\end{pmatrix},
\]

respectively.
In this model, X(t) = (G(t), S(t), E(t), V(t)) and g(X(t)) = V(t), the number of
viruses produced. The initial condition is X_0 = (0, 0, 10, 0) and the final time is T = 20.
This example is interesting because i) it shows a clear separation of time scales, ii)
our previous hybrid Chernoff method has no computational work gain with respect to
an exact method, and iii) in [13], the authors take an alternative approach, not using
the multilevel aspect of their paper.
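For concreteness, a minimal Python tau-leap sketch of this model on a uniform time mesh is given below. It implements neither the Chernoff step-size control nor the switching to an exact method of our mixed algorithm; negative excursions are simply clipped at the boundary, and the step size dt is an arbitrary illustrative choice.

import numpy as np
rng = np.random.default_rng(0)

# Species order X = (G, S, E, V); columns of nu are the six reactions.
nu = np.array([[ 1, -1,  0, -1,  0,  0],
               [ 0,  0,  1, -1,  0, -1],
               [ 0,  1,  0,  0, -1,  0],
               [ 0,  0,  0,  1,  0,  0]])

def a(x):
    G, S, E, V = x
    return np.array([E, 0.025 * G, 1000.0 * E, 7.5e-6 * G * S,
                     0.25 * E, 2.0 * S])

def tau_leap(x0, T, dt):
    x, t = np.asarray(x0, float), 0.0
    while t < T:
        h = min(dt, T - t)
        k = rng.poisson(a(x) * h)          # Poisson number of firings
        x = np.maximum(x + nu @ k, 0.0)    # crude clipping at the boundary
        t += h
    return x

print(tau_leap([0, 0, 10, 0], T=20.0, dt=1e-3))   # last entry: V(T)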
We now analyze an ensemble of 10 independent runs of the phase II algorithm (see
Section 7.3.3), using different relative tolerances. In Figure 7.1, we show the total
predicted work (runtime) for the multilevel mixed method and for the SSA method
versus the estimated error bound. We also show the estimated asymptotic work of the
multilevel mixed method. We remark that the computational work of the multilevel
hybrid method is the same as the work of the SSA.
TOL        L*            Ŵ_ML/Ŵ_SSA
1.00e-01   1.0           0.02 ±0.001
5.00e-02   1.0           0.02 ±0.001
2.50e-02   1.2 ±0.261    0.02 ±0.001
1.25e-02   2.2 ±0.261    0.03 ±0.002
6.25e-03   3.4 ±0.320    0.04 ±0.004
3.13e-03   4.6 ±0.320    0.04 ±0.002
1.56e-03   5.8 ±0.261    0.06 ±0.008
7.81e-04   7.4 ±0.433    0.07 ±0.006
3.91e-04   8.6 ±0.320    0.06 ±0.007

Figure 7.1: Left: predicted work (runtime) versus the estimated error bound,
with 95% confidence intervals. The multilevel mixed method is preferred over
the SSA and the multilevel hybrid method for all the tolerances. Right: details
for the ensemble run of the phase II algorithm. Here, Ŵ_ML = Σ_{ℓ=0}^{L*} ψ̂_ℓ M_ℓ and
Ŵ_SSA = M_SSA C_SSA A(N_SSA; ·). As an example, the fourth row of the table tells
us that, for a tolerance TOL = 1.25·10^{-2}, 2.2 levels are needed on average, and the work
of the multilevel mixed method is, on average, 3% of the work of the SSA and of the
multilevel hybrid method. Confidence intervals at 95% are also provided.

In Figure 7.2, we can observe how the estimated weak error, Ê_{I,ℓ}, and the estimated
variance of the difference of the functional between two consecutive levels, V̂_ℓ, decrease
linearly as we refine the time mesh, which corresponds to a tau-leap dominated regime.
This linear relationship for the variance starts at level 1, as expected. When the
MNRM-dominated regime is reached, both quickly converge to zero, as expected. The
estimated total path work, ψ̂_ℓ, increases as we refine the time mesh. Observe that it
increases linearly for the coarser grids, until it reaches a plateau, which corresponds
to the pure MNRM case where the computational cost is independent of the grid size.
In the lower right panel, we show the total computational work, only for the cases in
which Ê_{I,ℓ} < TOL − TOL².

Figure 7.2: Upper left: estimated weak error, Ê_{I,ℓ}, as a function of the time mesh size,
h. Upper right: estimated variance of the difference between two consecutive levels,
V̂_ℓ, as a function of h. Lower left: estimated path work, ψ̂_ℓ, as a function of h. Lower
right: estimated total computational work, Σ_{l=0}^L ψ̂_l M_l, as a function of the level, L.

Figure 7.3: Left: percentage of the statistical error over the total error. As we
mentioned in Section 7.3.1, it is well above 0.5 for all the tolerances. Right: √(V̂_ℓ ψ̂_ℓ),
as a function of ℓ, for the smallest tolerance, which decreases as the level increases.
Observe that the contribution of level 0 is less than 50% of the sum of the other levels.

In Figure 7.4, we show the main outputs of the phase II algorithm, δ_ℓ and M_ℓ
for ℓ = 0, ..., L*, for the smallest considered tolerance. In this example, L* is 8 or 9,
depending on the run. We observe that the number of realizations decreases more slowly
than linearly, from levels 1 to L*−1, until it drops, due to the change to an MNRM-dominated
regime.

Figure 7.4: The one-step exit probability bound, δ_ℓ, and M_ℓ for ℓ = 0, 1, ..., L*, for the
smallest tolerance.

In Figure 7.5, we show TOL versus the actual computational error. It can be
seen that the prescribed tolerance is achieved with the required confidence of 95%,
since C_A = 1.96, for all the tolerances. The QQ-plot in the right part of Figure 7.5
was obtained as follows: i) for the range of tolerances specified in the first column of
Table 7.5, we ran the phase II algorithm 5 times; ii) for each output of the calibration
algorithm, we sampled the multilevel estimator, M_L, defined in (7.6), 100 times. This
plot reaffirms our assumption about the Gaussian distribution of the statistical error.

Figure 7.5: Left: T OL versus the actual computational error. The numbers above
the straight line show the percentage of runs that had errors larger than the required
tolerance. We observe that in all cases, the computational error follows the imposed
tolerance with the expected confidence of 95%. Right: quantile-quantile plot based
on realizations of ML .

Remark 7.5.1. In the simulations, we observe that, as we refine TOL, the optimal
number of levels increases approximately logarithmically, which is a desirable feature.
We fit the model L* = a log(TOL^{-1}) + b, obtaining a = 1.47 and b = 3.56.

Remark 7.5.2 (Pareto rule). Using the cost-based rule (see Remark 7.2.1), we estimate
the threshold for the Pareto rule, obtaining ν = 0.95419. It turns out that, for
this example, Ŵ_MixPareto/Ŵ_Mix ranges from 0.6 to 0.75 (for most TOLs). This shows
that it is possible to further increase the computational work gains in some examples.

Remark 7.5.3. The savings in computational work when generating Poisson random
variables heavily depend on MATLAB’s performance capabilities. In fact, we would
expect better results from our method if we were to implement our algorithms in more
performance-oriented languages or if we were to sample Poisson random variables in
batches.
A Simple Stiff System

This model, adapted from [19], has three species and a mixture of fast and slow
reaction channels,

\[
X_1 \underset{c_2}{\overset{c_1}{\rightleftharpoons}} X_2 \xrightarrow{\ c_3\ } X_3 \xrightarrow{\ c_4\ } \emptyset, \qquad c_2 \gg c_3 > c_4.
\]

Its stoichiometric matrix and propensity functions, a_j : Z_+^3 → R_+, are given by

\[
\nu = \begin{pmatrix}
-1 & 1 & 0 \\
1 & -1 & 0 \\
0 & -1 & 1 \\
0 & 0 & -1
\end{pmatrix}^{tr}
\quad \text{and} \quad
a(X) = \begin{pmatrix}
c_1 X_1 \\ c_2 X_2 \\ c_3 X_2 \\ c_4 X_3
\end{pmatrix},
\]

respectively, where g(X(t)) = X_3(t). In this model, successive firings of the reaction
X_2 → X_3 are separated by many reversible firings between X_1 and X_2, which
takes a lot of computational work in a standard SSA run. In [20], Gillespie et al.
claim that this inefficiency cannot be addressed using ordinary tau-leaping because
of the stiffness of the system. We show here that we obtain substantial gains using our
mixed method, which also controls the global error. In this example, we also show
the performance of the control variate idea presented in Section 7.4. We analyze 10
independent runs of the phase II algorithm (see Section 7.3.3), using different relative
tolerances. In Figure 7.6, we show the total predicted work (runtime) for the
multilevel mixed method, with and without a control variate at level 0, and for the
SSA method, versus the estimated error bound. We also show the estimated asymptotic
work of the multilevel mixed method. Observe that, for practical tolerances, the
computational work gains with respect to the SSA method, when using the control
variate, amount to a factor of 500. Without using the control variate, the computational
gains are also substantial.
[Plot: predicted work (runtime) versus error bound for the simple stiff model; curves for the SSA, the mixed method, the mixed method with control variate at level 0, the asymptotic work, and a slope-1/2 reference line.]

TOL        L*     Ŵ_MLcv/Ŵ_SSA     Ŵ_ML/Ŵ_SSA
3.13e-03   1.0    0.002 ±0.0004    0.03 ±0.001
1.56e-03   1.0    0.003 ±0.0004    0.04 ±0.001
7.81e-04   1.0    0.003 ±0.0010    0.04 ±0.002
3.91e-04   1.0    0.004 ±0.0004    0.06 ±0.003
1.95e-04   2.0    0.013 ±0.0015    0.09 ±0.008
9.77e-05   3.0    0.027 ±0.0040    0.13 ±0.016
4.88e-05   4.0    0.065 ±0.0146    0.19 ±0.025
2.44e-05   6.0    0.100 ±0.0136    0.21 ±0.020
1.22e-05   6.0    0.109 ±0.0299    0.22 ±0.029
6.10e-06   6.0    0.108 ±0.0168    0.19 ±0.020

Figure 7.6: Left: predicted work (runtime) versus the estimated error bound, with
95% confidence intervals, for the simple stiff model, with and without using the control
variate at level 0, as described in Section 7.4. Right: details of the ensemble run of
the phase II algorithm using the control variate (third column) and without using
the control variate (fourth column). As an example, the fifth row of the table tells
us that, for a tolerance TOL = 1.95·10^{-4}, 2 levels are needed on average; the work
of the multilevel mixed method using the control variate at level 0 is, on average, 1%
of the work of the SSA, and without the control variate it is 9%. Confidence
intervals at 95% are also provided.

7.6 Conclusions

In this work, we addressed the problem of approximating the quantity of interest
E[g(X(T))], where X is a non-homogeneous Poisson process that describes a stochastic
reaction network and g is a given suitable observable of X, within a prescribed
relative tolerance, TOL > 0, up to a given confidence level, and at near-optimal
computational work.

We developed an automatic, adaptive reaction-splitting multilevel Monte Carlo
method, based on our Chernoff tau-leap method [5, 6]. Its computational complexity
is O(TOL^{-2}). This method can therefore be seen as a variance reduction of the SSA,
which has the same complexity. In our numerical examples, we obtained substantial
gains with respect to the SSA and, for systems in which the set of reaction channels can
be adaptively partitioned into "high" and "low" activity, over our previous multilevel
hybrid Chernoff tau-leap method.
We also presented a novel control variate for g(X(T)), which adds negligible computational
cost when simulating a path of X(T) and may lead to additional dramatic
cost reductions.

Acknowledgments

The research reported here was supported by King Abdullah University of Science
and Technology (KAUST). The authors are members of the KAUST SRI Center
for Uncertainty Quantification at the Computer, Electrical and Mathematical Sci-
ences & Engineering Division at King Abdullah University of Science and Technology
(KAUST).

Appendix
Algorithm 26 Coupled mixed path. Inputs: the initial state, X(0), the final time,
T, the propensity functions, (a_j)_{j=1}^J, the stoichiometric vectors, (ν_j)_{j=1}^J, and two time
meshes: a coarser one, (t_i)_{i=0}^N, such that t_N = T, and a finer one, (s_j)_{j=0}^{N'}, such that
s_0 = t_0, s_{N'} = t_N, and (t_i)_{i=0}^N ⊂ (s_j)_{j=0}^{N'}. Outputs: a sequence of states evaluated
at the coarse grid, (X̄(t_k))_{k=0}^K ⊂ Z_+^d, such that t_K ≤ T, and a sequence of states
evaluated at the fine grid, (X̿(s_l))_{l=0}^{K'} ⊂ Z_+^d. If t_K < T, both paths exited the Z_+^d
lattice before the final time, T. It also returns the number of times the tau-leap
method was successfully applied at the fine and the coarse levels, and the number
of exact steps at the fine and the coarse levels. For the sake of simplicity, we omit
the sentences involving the recording of current state variables, the counting of the
number of steps, the checking of whether the path jumps out of the lattice, the
updating of the current split, and the return sentence.
1: t ← t_0; X̄ ← X(0); X̿ ← X(0)
2: t̄ ← next grid point in (t_i)_{i=0}^N larger than t
3: (H̄, R̄_TL, R̄_MNRM, ā) ← Algorithm 27 with (X̄, t, t̄, T, δ̄)
4: t̿ ← next grid point in (s_i)_{i=0}^{N'} larger than t
5: (H̿, R̿_TL, R̿_MNRM, a̿) ← Algorithm 27 with (X̿, t, t̿, T, δ̿)
6: while t < T do
7:   H ← min{H̄, H̿}
8:   (B_1, B_2, B_3, B_4) ← split building blocks from (R̄_TL, R̄_MNRM, R̿_TL, R̿_MNRM)
9:   Algorithm 28 (compute state changes due to block B_1)
10:  initialize internal clocks R, P if needed (see [5, 6])
11:  ΔX̄ ← 0; ΔX̿ ← 0
12:  for B = B_2, B_3, B_4 do
13:    t_r ← t
14:    X̄_r ← X̄; X̿_r ← X̿
15:    while t_r < H do
16:      update P_{j∈B}
17:      switch B
18:      case B_2
19:        d ← ā_{j∈B}
20:        d̿ ← a_{j∈B}(X̿)
21:        τ_r ← compute the Chernoff tau-leap step size using (X̄_r, ā_{j∈B}, H, δ̄)
22:      end case
23:      case B_3
24:        d ← a_{j∈B}(X̄)
25:        d̿ ← a̿_{j∈B}
26:        τ_r ← compute the Chernoff tau-leap step size using (X̿_r, a̿_{j∈B}, H, δ̿)
27:      end case
28:      case B_4
29:        d ← a_{j∈B}(X̄)
30:        d̿ ← a_{j∈B}(X̿)
31:        τ_r ← ∞
32:      end case
33:      end switch
34:      A_1 ← min(d, d̿)
35:      A_2 ← d − A_1; A_3 ← d̿ − A_1
36:      H_r ← min{H, t_r + τ_r}
37:      (t_r, X̄_r, X̿_r, R_{j∈B}, P_{j∈B}) ← Algorithm 29 with (t_r, H_r, X̄_r, X̿_r, R_{j∈B}, P_{j∈B}, A)
38:    end while
39:    ΔX̄ ← ΔX̄ + (X̄_r − X̄); ΔX̿ ← ΔX̿ + (X̿_r − X̿)
40:  end for
41:  X̄ ← X̄ + X̂ + ΔX̄; X̿ ← X̿ + X̂̂ + ΔX̿
42:  t ← H
43:  if t < T then
44:    if H̄ ≤ H̿ then
45:      t̄ ← next grid point in (t_i)_{i=0}^N larger than t
46:      (H̄, R̄_TL, R̄_MNRM, ā) ← Algorithm 27 with (X̄, t, t̄, T, δ̄)
47:    end if
48:    if H̄ ≥ H̿ then
49:      t̿ ← next grid point in (s_j)_{j=0}^{N'} larger than t
50:      (H̿, R̿_TL, R̿_MNRM, a̿) ← Algorithm 27 with (X̿, t, t̿, T, δ̿)
51:    end if
52:  end if
53: end while

Algorithm 27 Compute the next time horizon. Inputs: the current state, X̃, the current
time, t, the next grid point, t̃, the final time, T, the one-step exit probability bound, δ̃,
and the propensity functions, a = (a_j)_{j=1}^J. Outputs: the next horizon, H̃, the set of reaction
channels to which the tau-leap method should be applied, R̃_TL, the set of reaction channels
to which the MNRM should be applied, R̃_MNRM, and the current propensity values, ã.
1: ã ← a(X̃)
2: (R̃_TL, R̃_MNRM) ← Algorithm 24 with (X̃, t, (a_j(X̃))_{j=1}^J, δ̃, t̃)
3: if R̃_TL ≠ ∅ then
4:   H̃ ← min{t̃, t + τ(R̃_TL), T}
5: else
6:   H̃ ← min{t + τ(R̃_TL), T}
7: end if
8: return (H̃, R̃_TL, R̃_MNRM, ã)

Algorithm 28 Compute building block 1. This algorithm is part of Algorithm 26.
1: t_r ← t
2: X̂ ← 0; X̂̂ ← 0
3: while t_r < H do
4:   τ̄_r ← compute the Chernoff tau-leap step size using (X̄ + X̂, ā_{j∈B_1}, H, δ̄)
5:   τ̿_r ← compute the Chernoff tau-leap step size using (X̿ + X̂̂, a̿_{j∈B_1}, H, δ̿)
6:   H_r ← min{H, t_r + τ̄_r, t_r + τ̿_r}
7:   A_1 ← min(ā_{j∈B_1}, a̿_{j∈B_1})
8:   A_2 ← ā_{j∈B_1} − A_1
9:   A_3 ← a̿_{j∈B_1} − A_1
10:  Λ ← P(A·(H_r − t_r))
11:  X̂ ← X̂ + (Λ_1 + Λ_2) ν_{j∈B_1}
12:  X̂̂ ← X̂̂ + (Λ_1 + Λ_3) ν_{j∈B_1}
13:  t_r ← H_r
14: end while

Algorithm 29 The auxiliary function used in Algorithm 26. Inputs: the current time, t,
the current time horizon, T̿, the current system states at the coarser and the finer levels,
X̄ and X̿, respectively, the internal clocks, R and P, the values, A, and the current building
block, B. Outputs: the updated time, t, the updated system states, X̄ and X̿, and the
updated internal clocks, R_i, P_i, i = 1, 2, 3.
1: Δt_i ← (P_i − R_i)/A_i, for i = 1, 2, 3
2: Δ ← min_i{Δt_i}
3: μ ← argmin_i{Δt_i}
4: if t + Δ > T̿ then
5:   R ← R + A·(T̿ − t)
6:   t ← T̿
7: else
8:   update X̄ and X̿ using ν_{j∈B}
9:   R ← R + A·Δ
10:  r ← uniform(0, 1)
11:  P_μ ← P_μ + log(1/r)
12:  t ← t + Δ
13: end if
14: return (t, X̄, X̿, R, P)
REFERENCES
[1] D. T. Gillespie, “A general method for numerically simulating the stochastic
time evolution of coupled chemical reactions,” Journal of Computational Physics,
vol. 22, pp. 403–434, 1976.

[2] C. C. Battaile, "The kinetic Monte Carlo method: Foundation, implementation,
and application," Computer Methods in Applied Mechanics and Engineering, vol. 197,
no. 41-42, pp. 3386-3398, 2008.

[3] D. F. Anderson, “A modified next reaction method for simulating chemical sys-
tems with time dependent propensities and delays,” The Journal of Chemical
Physics, vol. 127, no. 21, 2007.

[4] D. T. Gillespie, “Approximate accelerated stochastic simulation of chemically


reacting systems,” Journal of Chemical Physics, vol. 115, pp. 1716–1733, Jul.
2001.

[5] A. Moraes, R. Tempone, and P. Vilanova, "Hybrid Chernoff tau-leap," Multiscale
Modeling & Simulation, vol. 12, no. 2, pp. 581-615, 2014.

[6] ——, "Multilevel hybrid Chernoff tau-leap," accepted for publication in BIT
Numerical Mathematics, 2015.

[7] L. Harris and P. Clancy, "A 'partitioned leaping' approach for multiscale modeling
of chemical reaction dynamics," J. Chem. Phys., vol. 125, 2006.

[8] J. Puchalka and A. Kierzek, "Bridging the gap between stochastic and deterministic
regimes in the kinetic simulations of the biochemical reaction networks,"
Biophysical Journal, vol. 86, no. 3, pp. 1357-1372, 2004.

[9] E. Haseltine and J. Rawlings, “Approximate simulation of coupled fast and slow
reactions for stochastic chemical kinetics,” J. Chem. Phys, vol. 117, no. 15, 2002.

[10] S. Plyasunov, “Averaging methods for stochastic dynamics of complex reaction


networks: description of multi-scale couplings,” arXiv:physics/0510054v1, 2005.
[11] Y. Cao and L. Petzold, “Accuracy limitations and the measurement of errors in
the stochastic simulation of chemically reacting systems,” Journal of Computa-
tional Physics, vol. 212, no. 1, pp. 6–24, 2006.

[12] Y. Cao, D. T. Gillespie, and L. R. Petzold, “Efficient step size selection for
the tau-leaping simulation method,” The Journal of Chemical Physics, vol. 124,
no. 4, p. 044109, 2006.

[13] D. Anderson and D. Higham, "Multilevel Monte Carlo for continuous time Markov
chains, with applications in biochemical kinetics," SIAM Multiscale Model. Simul.,
vol. 10, no. 1, 2012.

[14] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence
(Wiley Series in Probability and Statistics), 2nd ed. Wiley-Interscience,
2005.

[15] M. A. Gibson and J. Bruck, “Efficient exact stochastic simulation of chemical


systems with many species and many channels,” The Journal of Physical Chem-
istry A, vol. 104, no. 9, pp. 1876–1889, 2000.

[16] T. Li, “Analysis of explicit tau-leaping schemes for simulating chemically reacting
systems,” Multiscale Model. Simul., vol. 6, no. 2, pp. 417–436 (electronic), 2007.

[17] J. Ahrens and U. Dieter, "Computer methods for sampling from gamma, beta,
Poisson and binomial distributions," Computing, vol. 12, pp. 223-246, 1974.

[18] R. Srivastava, L. You, J. Summers, and J. Yin, “Stochastic vs. deterministic


modeling of intracellular viral kinetics,” Journal of Theoretical Biology, vol. 218,
no. 3, pp. 309–321, 2002.

[19] Y. Cao, D. T. Gillespie, and L. R. Petzold, "The slow-scale stochastic simulation
algorithm," The Journal of Chemical Physics, vol. 122, no. 1, 2005.

[20] D. T. Gillespie, A. Hellander, and L. R. Petzold, "Perspective: Stochastic algorithms
for chemical kinetics," The Journal of Chemical Physics, vol. 138, no. 17,
2013.

Chapter 8

Multiscale Modeling of Wear Degradation in Cylinder Liners

Alvaro Moraes, Fabrizio Ruggeri, Raúl Tempone and Pedro Vilanova¹

Abstract

Every mechanical system is naturally subjected to some kind of wear process that,
at some point, will cause failure in the system if no monitoring or treatment process
is applied. Since failures often lead to high economical costs, it is essential both to
predict and to avoid them. To achieve this, a monitoring system of the wear level
should be implemented to decrease the risk of failure. In this work, we take a first
step into the development of a multiscale indirect inference methodology for state-
dependent Markovian pure jump processes. This allows us to model the evolution of
the wear level, and to identify when the system reaches some critical level that triggers
a maintenance response. Since the likelihood function of a discretely observed pure
jump process does not have an expression that is simple enough for standard non-
sampling optimization methods, we approximate this likelihood by expressions from
upscaled models of the data. We use the Master Equation to assess the goodness-of-fit
and to compute the distribution of the hitting time to the critical level.

¹ A. Moraes, F. Ruggeri, P. Vilanova and R. Tempone, "Multiscale Modeling of Wear Degradation
in Cylinder Liners", SIAM Multiscale Modeling and Simulation, Vol. 12, Issue 1 (2014).

8.1 Introduction

It is well known that one of the main factors in the failure of heavy-duty diesel
engines used for marine propulsion is wear of the cylinder liner [1]. The stochastic
modeling of the wear degradation of cylinder liners is extensively treated in [2, 1, 3, 4]
and references therein. This wear process, at some point, will cause failure if no
maintenance program is utilized. An effective maintenance program is one that can
be carried out when there is some identifiable warning of the occurrence of failure.
Then, preventive maintenance can be carried out on the basis of the current condition
of the liner, generally when the maximum wear approaches a specified limit as imposed
by warranty clauses.
In this work, we aim to use a multiscale indirect inference approach for the wear
degradation problem. In our context, the term indirect inference is used in the sense
that “it is impossible to efficiently estimate the parameters of interest because of
the intractability of the likelihood function” [5]. But, instead of using a sampling-
oriented method to obtain consistent estimators, we propose the use of a multiscale
hierarchy of approximate “tractable” likelihoods. After optimizing these likelihoods,
we plug the estimated parameters into our base model and assess its quality by checking
confidence bands computed directly from the estimated base model.
The data set, w = {w_i}_{i=1}^n, taken from [4], consists of wear levels observed on n =
32 cylinder liners of eight-cylinder SULZER engines, as measured by a caliper with
a precision of 0.05 mm. Warranty clauses specify that, to avoid failures, the liner
should be changed before it accumulates a wear level of 4.0 mm. Data are presented
in Figure 8.1.
[Figure: observed wear process; wear (mm) versus operating time (h).]

Figure 8.1: Data set from [4]. Data refer to cylinder liners used in ships of the
Grimaldi Group.

As a consequence of the finite resolution of the caliper, the set of possible mea-
surements of the cylinder wear is represented with a finite lattice in the positive real
line. For that reason, we propose to model the resulting measurements of the wear
process as a Markovian pure jump process [6], which is the simplest class of pure
jump processes. This type of process can be characterized by a finite set of possible
jumps, each one having a certain intensity function (see Section 8.2.1 for details).
In this work, we propose a multiscale inference approach that gradually allows
us to estimate the number of possible jumps of the process, its amplitudes, and the
corresponding intensity functions. We depart from the simplest possible pure jump
model, i.e., the one that has only one possible jump with a linear intensity function,
and proceed with more complex models, by, for instance, adding more possible jumps
and/or more general intensity functions.
Our base model, which defines the microscopic scale, is a continuous-time Markov
pure jump process in a lattice. Since the process is observed only at a finite set of
times, i.e., it is partially observed, its likelihood function usually cannot be written
in a simple closed form amenable to standard optimization procedures.
Proper inference based on partially observed continuous-time Markov chains
in lattices should be based on likelihoods corresponding to non-homogeneous Poisson
processes. We refer to the chapter "Inference for Stochastic Kinetic Models" in [7]
for details on the mentioned likelihoods, and on Markov Chain Monte Carlo (MCMC)
techniques for inference on pure jump processes. In [7], the inference procedure is
intended only for linear rates, and it is based on MCMC and on exact path-simulation
techniques such as the Stochastic Simulation Algorithm (SSA) of [8]. As a consequence,
this inference methodology may be very computationally demanding, and it does not
address the problem for general nonlinear rates. To the authors' knowledge, there
are no computationally low-cost schemes for sampling non-homogeneous Poissonian
bridges.
For that reason, the idea is to consider upscaled auxiliary versions for modeling
our data, from which we can obtain simpler likelihood functions. Once we estimate
the parameters from this approximate likelihood, we plug them into the base model
and look at how well it fits the data. We use confidence bands, computed from the
Master Equation [9] at the microscopic level, as a visual goodness-of-fit criterion. The
inference methodology, motivated by the introduction of several temporal scales, is as
follows; the first two categories correspond to macroscale indirect inference, whereas
the third category corresponds to mesoscale indirect inference and the last one
corresponds to direct inference at the microscale level:
Perturbed Mean Field We first approximate the likelihood function of our base
microscopic model by the likelihood corresponding to its Mean Field (macroscale
reaction-rate ordinary differential equations, ODEs) whose observations are
perturbed with Gaussian noise.

Perturbed Gaussian process If the parameters estimated for each member of a
suitable family of macroscopic Mean Field models do not fit the experimental
data, we translate the indirect inference problem into a slightly more complex
one. In this case, the approximated likelihood corresponds to a Gaussian process
whose mean and variance are obtained by a second-order moment expansion of
the base model, perturbed with additive independent Gaussian noise. Then,
provided that the probability distribution of the underlying pure jump process
is unimodal for all times, we expect good agreement between the estimated
microscopic model and the experimental data.

Langevin diffusion If the estimated microscopic model does not fit the data, we
translate the inference problem into the mesoscopic scale, where inference techniques
for Langevin diffusions apply. We observe that the likelihood functions
based on Gaussian models for the data are more restrictive than the likelihoods
based on the Langevin one, because the latter allows us to model
the time evolution of multimodal distributions. When the Gaussian model is
appropriate, however, it is estimated much more quickly than the Langevin one.

Direct inference In the same way, if the parameters estimated in a suitable family
of mesoscale Langevin models do not fit the experimental data, we should make
a direct inference at the microscopic level. It is worth mentioning that inference
procedures at the mesoscale and microscale are much more involved from the
computational point of view. For these two scales, the likelihood functions in
general cannot be written in a closed form and, for that reason, they have to
be approximated and optimized by sampling procedures [10].

In the literature, there are other approaches to the wear inference problem based
on pure jump processes that do not deal directly with the continuous-time model.
For example, in [1], in a chapter entitled "Stochastic Processes for Modeling the Wear
of Marine Engine Cylinder Liners", the authors modeled the wear process as a time-continuous,
state-dependent Markov chain. In their methodology, they approximated
the continuous nature of time by using a discrete-time Markov chain with uniform
time steps and modeled its transition probabilities with a Poisson kernel depending
on two parameters. The resulting fit with this approach is poor, and the authors
do not proceed further in this direction. In [4], a state-dependent Markov chain,
inhomogeneous in time, is proposed. The authors use a similar inference strategy by
discretizing time and space. These approximation steps and their relevant associated
errors are not discussed, and the computational cost of this approximate inference
method explodes as one refines the time and/or space discretizations.

Similar observations can be made for diffusion processes, where the lattice is approximated
by a continuum of states and the Poissonian noise is replaced by Gaussian
noise. Sampling bridges from diffusions is still an ongoing research area; see [11] and
the references therein.
Our main contribution is twofold. First, we offer a novel approach to the problem
of modeling the wear degradation of cylinder liners by using a continuous-time
Markov chain in a lattice determined by caliper precision. Second, we take a first step
toward a general methodology for a multiscale indirect inference approach. It is a
first step because, for this particular problem, we did not need to use the mesoscopic
or microscopic levels of approximation.

The goals of this work are: (i) to estimate the parameters of the wear process,
modeled as a Markovian pure jump process, and (ii) to obtain the distribution of the
hitting time to the critical warranty level.
The remainder of this paper is organized as follows. Section 8.2 presents the base
model, its upscalings, and the system of ODEs for the first two moments of the base
model. In Section 8.3, we present the model that actually fits the data. Next, in
Section 8.4, we derive the likelihood functions for the macroscopic scales. Later, in
Section 8.5, we develop a method for computing the hitting time to the critical level
based on the solution of the Master Equation. Section 8.6 contains the results of the
inference process and the distribution of the hitting time for the fitted model. Finally,
Section 8.7 offers the conclusions.

8.2 Methodology

In this section, we first present the elements of the base microscopic model and
its infinitesimal generator. Then, we derive the macroscopic Mean Field and the
mesoscopic Langevin approximations. Finally, we show how to derive a system of
ODEs for the time evolution of the first two moments of the base model. The Mean
Field and the second-order expansion are used in Section 8.4 as the basis of the
indirect inference method.

8.2.1 The Pure Jump Process

Consider a Markov pure jump process, X, taking values in a lattice, S, in R_+. This
means that the evolution of the state vector, X(t), is modeled as a continuous-time
Markov chain (see [12]).

Assume that each possible jump in the system occurs according to one of the
pairs {(a_j(x; θ), ν_j)}_{j=1}^J, where a_j : S × Θ → R_+ is known as the propensity function
associated with the jump ν_j. For any θ, we define a_j(x; θ) = 0 for those x where
x + ν_j ∉ Z_+^d. The propensity functions depend on a parameter θ ∈ Θ, where Θ is
assumed here to be finite dimensional.

The probability that the system jumps from x ∈ S to x + ν_j ∈ S during the small
interval (t, t+dt) is

\[
P\big( X(t+dt) = x + \nu_j \,\big|\, X(t) = x \big) = a_j(x; \theta) \, dt + o(dt).
\]

Example 8.2.1 (Simple decay model). The Simple Decay model is a pure jump
process, X, in the lattice S = δN, where δ is a positive real number. The system
starts from x_0 ∈ S at time t = 0, and the only reaction allowed is ν = −δ. Its associated
propensity function is a(x; c) = c x, where c > 0.

8.2.2 Upscaling: the Mean Field and Langevin equations

The generator, L_X, of a pure jump Markov process, X, is a linear operator defined on
the set of bounded functions. In our case, it is given by (see [6])

\[
L_X(f) := \sum_j a_j(x; \theta) \big( f(x + \nu_j) - f(x) \big). \tag{8.1}
\]

Using a first-order Taylor expansion of f in (8.1), we obtain the generator

\[
L_Z(f) := \sum_j a_j(x; \theta) \, \partial_x f(x) \, \nu_j,
\]

which corresponds to the reaction-rate ordinary differential equation (also known as
the Mean Field ODE)

\[
\begin{cases}
dZ(t) = \nu \, a(Z(t); \theta) \, dt, & t \in R_+, \\
Z(0) = x_0 \in R_+,
\end{cases} \tag{8.2}
\]

where the j-th column of the matrix ν is ν_j and a is a column vector with components
a_j.
Using a second-order Taylor expansion of f in (8.1), we obtain the generator of
an Itô diffusion process, Y,

\[
L_Y(f) := \sum_j a_j(x; \theta) \Big( \partial_x f(x) \, \nu_j + \tfrac{1}{2} \, \nu_j^\top \partial_x^2 f(x) \, \nu_j \Big),
\]

where Y is the diffusion process defined by the Langevin Itô stochastic differential
equation (SDE)

\[
\begin{cases}
dY(t) = \nu \, a(Y(t); \theta) \, dt + \nu \, \mathrm{diag}\big( \sqrt{a(Y(t); \theta)} \big) \, dB(t), & t \in R_+, \\
Y(0) = x_0 \in R_+,
\end{cases} \tag{8.3}
\]

and B(t) is an R^J-valued Wiener process with independent components.
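A minimal Euler-Maruyama sketch of (8.3), for generic ν and a (assumed given; the function name is ours). The clipping of the propensities inside the square root and the projection onto the positive orthant are pragmatic safeguards for the sketch, not part of the equation itself.

import numpy as np
rng = np.random.default_rng()

def langevin_em(a, nu, x0, theta, T, K):
    # Euler-Maruyama discretization of the Langevin SDE (8.3).
    dt = T / K
    y = np.asarray(x0, float).copy()
    for _ in range(K):
        props = np.maximum(a(y, theta), 0.0)       # clip inside the sqrt
        dB = rng.normal(0.0, np.sqrt(dt), size=len(props))
        y = y + nu @ props * dt + nu @ (np.sqrt(props) * dB)
        y = np.maximum(y, 0.0)                     # stay in the positive orthant
    return y

# e.g., simple decay (Example 8.2.1): one channel, nu = [[-delta]],
# a(y, c) = [c * y[0]]; all parameter values here are illustrative.
yT = langevin_em(lambda y, c: np.array([c * y[0]]),
                 np.array([[-0.05]]), [5.0], 1e-4, T=6e4, K=10000)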

8.2.3 The second-order moment expansion

The Mean Field equations (8.2) approximate the evolution of the mean of the pure
jump process X. But, sometimes, it is desirable to have ODEs that approximate
higher-order moments as well. Here, we show how to derive a system of ODEs for the
evolution of the first two moments of a pure jump process by approximating it with
a Gaussian process. This approach is well suited for unimodal distributions of X(t)
for all times. In such a case, it has a main advantage with respect to the Langevin
diffusion approach (8.3), because we do not need to sample any random variables, and
in particular diffusion bridges, to obtain estimates of our parameters.

Direct approach: Consider the Dynkin formula [6] for the process X,

\[
E[f(X(t))] = f(X(0)) + \int_0^t E[L_X(f)(s)] \, ds. \tag{8.4}
\]

To obtain the second-order moment expansion, we simply consider the formula (8.4)
applied to f(x) = x and f(x) = x². This leads to

\[
\begin{cases}
E[X(t)] = x_0 + \int_0^t E\Big[ \sum_j a_j(X(s); \theta) \, \nu_j \Big] \, ds, \\[4pt]
E[X^2(t)] = x_0^2 + \int_0^t E\Big[ \sum_j a_j(X(s); \theta) \big( 2 \nu_j X(s) + \nu_j^2 \big) \Big] \, ds.
\end{cases} \tag{8.5}
\]

In general, the system (8.5) is not closed, and it depends on the form of the
propensity functions, a_j. In the linear case, when a_j(x; θ) = g(θ)x, the system (8.5) is
closed. We derive in Section 8.4.2 an ODE system for μ(t) := E[X(t)] and σ²(t) :=
E[(X(t) − μ(t))²].
An alternative approach: We present here an alternative way of deriving a system
of ODEs that approximately describes the evolution of the first two moments of the
process X. The advantage of this approach is that it always gives a closed system of
ODEs.

Consider a general SDE of the form

\[
dY(t) = \alpha(Y(t)) \, dt + \sigma(Y(t)) \, dB(t). \tag{8.6}
\]

It is possible to approximate the moments of Y(t) as follows. First, take expectations
on both sides of (8.6) and on both sides of the SDE that results from
applying the Itô formula to the function g(y, t) := (y − E[Y(t)])². Then, define
Δ(t) := Y(t) − E[Y(t)], and Taylor-expand α(Y(t)), σ(Y(t)) and Δ(t)(α(t) − E[α(t)])
around E[Y(t)] in powers of Δ(t). Finally, drop the terms of order E[Δ³(t)] and higher.
In that way, we obtain the following system of ODEs:

\[
\begin{cases}
d\mu(t) = \big( \alpha(\mu(t)) + \alpha''(\mu(t)) \, \sigma^2(t)/2 \big) \, dt, \\
d\sigma^2(t) = \big( (2\alpha'(\mu(t)) + \beta''(\mu(t))) \, \sigma^2(t) + 2\beta(\mu(t)) \big) \, dt, \\
(\mu(0), \sigma^2(0)) = (x_0, 0), \quad x_0 \in R_+, \ t \in R_+,
\end{cases} \tag{8.7}
\]

where β(y) := σ(y)²/2, such that μ(t) and σ²(t) approximate E[Y(t)] and E[Δ²(t)],
respectively. This moment expansion approach can be extended directly to the multidimensional
case. For the linear case, when a_j(x; θ) = g(θ)x, the system in (8.7) is
equivalent to the system in (8.5).
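As a sketch, the system (8.7) for a scalar SDE can be integrated with any standard ODE solver; here, alpha, its derivatives, and beta(y) := σ(y)²/2 are assumed user-supplied callables, the function name is ours, and the example parameters are purely illustrative.

import numpy as np
from scipy.integrate import solve_ivp

def moment_odes(alpha, d_alpha, dd_alpha, beta, dd_beta, x0, T):
    # Integrate (8.7); beta(y) := sigma(y)^2 / 2.
    def rhs(t, u):
        mu, var = u
        dmu = alpha(mu) + 0.5 * dd_alpha(mu) * var
        dvar = (2.0 * d_alpha(mu) + dd_beta(mu)) * var + 2.0 * beta(mu)
        return [dmu, dvar]
    return solve_ivp(rhs, (0.0, T), [x0, 0.0], dense_output=True)

# Illustrative linear case: alpha(y) = b*y, sigma(y)^2 = q*y.
b, q = -1e-4, 2.5e-6
sol = moment_odes(lambda y: b * y, lambda y: b, lambda y: 0.0,
                  lambda y: 0.5 * q * y, lambda y: 0.0, x0=5.0, T=6e4)
print(sol.y[:, -1])        # approximations of (E[Y(T)], Var[Y(T)])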

Remark 8.2.2. When the distribution of X(t) is multimodal, we can extend the
approach in (8.7) by approximating the distribution of X(t) with a Gaussian mixture.
The price to pay in this case is the increase in the dimension of the resulting ODE
system.
8.3 The thickness measurement process

In this section, we define an auxiliary process, named the thickness process, that is
used for modeling and inference convenience. The relation between the wear and
the thickness processes is simple: the sum of both is a constant that equals the initial
thickness. Therefore, the thickness process is decreasing, and it takes positive values
in the lattice generated by the caliper precision.

It is worth mentioning that the simple decay model described in Example 8.2.1
predicts a much smaller variability than the one observed in the data, and it cannot
be used as a model for the thickness process. For that reason, we propose modeling
the thickness process using two reactions with linear state-dependent coefficients.
This model generates satisfactory confidence bands that we use as a goodness-of-fit
test.
Definition of the thickness process. Let X(t) be the thickness process derived
from the wear of the cylinder liners up to time t (see [3, 4]), i.e., X(t) = T_0 − W(t),
where W is the wear process and T_0 is the initial thickness. We model X(t) as a
sum of two simple decay processes (see Example 8.2.1) with δ = 0.05 (which is
the resolution of the measurement instrument), since one simple decay process is
not enough to explain the variance of the data. The two considered intensity-jump
pairs are (a_1(x), ν_1) = (c_1 x, −δ) and (a_2(x), ν_2) = (c_2 x, −kδ), where k is a positive
integer to be determined, and c_1 and c_2 are coefficients with dimension (mm·hour)^{-1}.
Therefore, the probability of observing a thickness decrement in a small time interval
(t, t + dt) is

\[
\begin{aligned}
P\big( X(t + dt) = X(t) - \delta \,\big|\, X(t) = x \big) &= c_1 x \, dt, \\
P\big( X(t + dt) = X(t) - k\delta \,\big|\, X(t) = x \big) &= c_2 x \, dt, \tag{8.8}
\end{aligned}
\]

where the initial thickness, X(0) = T_0, and the coefficients c_1, c_2 and k are the four unknown
parameters.
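For illustration, a minimal Gillespie (SSA) sketch of the two-jump process (8.8); the function name is ours, and the parameter values used in the call are merely of the magnitude of the estimates reported later in Section 8.6.

import numpy as np
rng = np.random.default_rng(1)

def ssa_thickness(T0, c1, c2, k, delta, t_end):
    # Exact (SSA) simulation of the two-jump thickness process (8.8).
    x, t = float(T0), 0.0
    while True:
        rates = np.array([c1 * x, c2 * x])
        total = rates.sum()
        if total <= 0.0:                   # absorbed at zero thickness
            return x
        t += rng.exponential(1.0 / total)  # exponential waiting time
        if t > t_end:
            return x
        jump = delta if rng.random() < rates[0] / total else k * delta
        x = max(x - jump, 0.0)

# Illustrative values close to the estimates of Section 8.6:
print(ssa_thickness(T0=5.0, c1=0.63e-4, c2=1.2e-4, k=4, delta=0.05,
                    t_end=6e4))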

8.4 Inference for the thickness measurement process

In this section, we obtain the likelihood functions at the macroscopic level for the
thickness process (8.8). The first step is to transform the data set so as to observe a
decreasing thickness process. We define the thickness data, x = {x_i}_{i=1}^n, as x_i :=
T_0 − w_i, where T_0 is an unknown parameter that we expect to be around 5.0 mm (see
Section 8.6 for further details) and w_i is the wear of the i-th datum. Observe that x
depends on T_0.

We consider two approximate models for the experimental data, x, as follows. The
first one postulates that each data point is the Mean Field of the thickness process
plus Gaussian noise with constant variance. In this case, the maximum likelihood
estimation (MLE) leads to an ordinary least squares problem. This model turns
out to be unsatisfactory for two reasons: when we consider only one reaction, it
gives a very narrow confidence band; when we consider two reactions, there is an
identifiability problem, since there is a straight line on which the maximum of the
likelihood is attained. The second model is slightly more complex. It postulates that
the data are the sum of two terms that are independent realizations of two Gaussian
random variables. The moments of the first random variable evolve in time according
to a system of ODEs obtained by moment expansion. The second term is just additive
Gaussian noise with constant variance. The MLE leads to a weighted least squares
problem with a logarithmic penalization term. In this case, as we see in Section 8.6,
there is only one point at which the maximum of the likelihood is attained.
8.4.1 Mean Field approximation

Let us consider a Mean Field approximation for the thickness data, x; i.e., the data
x are modeled according to

\[
x_i = Z(t_i) + \epsilon_i \quad \text{(model 1)}, \tag{8.9}
\]

where Z(t) satisfies the Mean Field ODE (8.2), and the ε_i are i.i.d. realizations of N(0, σ_E²)
for i = 1, ..., n, where σ_E > 0 is the experimental measurement error. In this work,
we set σ_E = δ (see Remark 8.4.1).

In this case, the likelihood function can be written as

\[
L(\theta; x) \propto \prod_{i=1}^{n} \exp\left\{ -\frac{(x_i - Z(t_i; \theta))^2}{2 \sigma_E^2} \right\}, \tag{8.10}
\]

where θ = (c_1, c_2, k, T_0).

Now, given k and T_0, the MLE for (c_1, c_2) is the minimizer of the opposite of the
log-likelihood, i.e.,

\[
c^*(k, T_0) := \arg\min_{c_1 \ge 0, \, c_2 \ge 0} \sum_{i=1}^{n} (x_i - Z(t_i; \theta))^2. \tag{8.11}
\]

8.4.2 A Gaussian approximation based on moment expansion

To obtain a system of ODEs for the time evolution of the first two moments of the
process X, we write down the system (8.5) in differential form, where the
propensity functions and jumps are defined in (8.8). Since the propensity functions
are linear functions of the state, we have a closed system of ODEs:

\[
\begin{cases}
d\mu(t) = (c_1 \nu_1 + c_2 \nu_2) \, \mu(t) \, dt, \\
d\sigma^2(t) = \big( 2(c_1 \nu_1 + c_2 \nu_2) \, \sigma^2(t) + (c_1 \nu_1^2 + c_2 \nu_2^2) \, \mu(t) \big) \, dt, \\
(\mu(0), \sigma^2(0)) = (x_0, 0), \quad x_0 \in R_+, \ t \in R_+.
\end{cases} \tag{8.12}
\]

Its solution is given by

\[
\mu(t) = x_0 \exp\big( (c_1 \nu_1 + c_2 \nu_2) t \big), \tag{8.13}
\]
\[
\sigma^2(t) = x_0 \, \frac{c_1 \nu_1^2 + c_2 \nu_2^2}{c_1 \nu_1 + c_2 \nu_2} \, \exp\big( (c_1 \nu_1 + c_2 \nu_2) t \big) \big( \exp\big( (c_1 \nu_1 + c_2 \nu_2) t \big) - 1 \big). \tag{8.14}
\]

Based on μ(t) and σ²(t), we consider a Gaussian model for our data; i.e., the data
x are modeled according to

\[
x_i = \tilde Y(t_i) + \epsilon_i \quad \text{(model 2)}, \tag{8.15}
\]

where Ỹ(t) ∼ N(μ(t), σ²(t)), with mean μ(t) and variance σ²(t). We also consider
that the ε_i are i.i.d. realizations of N(0, σ_E²) for i = 1, ..., n, where σ_E > 0 is the
experimental measurement error. Here, σ_E = δ (see Remark 8.4.1).

In this case, the likelihood function can be written as

\[
L(\theta; x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi(\sigma_E^2 + \sigma^2(t_i; \theta))}} \exp\left\{ -\frac{(x_i - \mu(t_i; \theta))^2}{2(\sigma_E^2 + \sigma^2(t_i; \theta))} \right\}, \tag{8.16}
\]

where θ = (c_1, c_2, k, T_0).

Now, the MLE for (c_1, c_2), for fixed k and T_0, is the minimizer of the opposite of
the log-likelihood,

\[
c^*(k, T_0) := \arg\min_{c_1 \ge 0, \, c_2 \ge 0} \sum_{i=1}^{n} \left\{ \frac{(x_i - \mu(t_i; \theta))^2}{\sigma_E^2 + \sigma^2(t_i; \theta)} + \log\big( \sigma_E^2 + \sigma^2(t_i; \theta) \big) \right\}. \tag{8.17}
\]

Finally, we determine the appropriate values of k and T_0 by analyzing the sequence
{c*(k, T_0)}_{k≥2} for different values of T_0. In Section 8.6, we see that, for our data set,
w, the appropriate values for k and T_0 are 4 and 5.0 mm, respectively.
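A sketch of this fit for fixed k and T_0, using the closed forms (8.13)-(8.14) and a standard optimizer. The data arrays below are placeholders only, standing in for the thickness data x described above, and the function names are ours.

import numpy as np
from scipy.optimize import minimize

delta = 0.05
sigE = delta                               # sigma_E = delta (Remark 8.4.1)
nu1, nu2 = -delta, -4 * delta              # jumps for k = 4

def moments(t, c1, c2, x0):
    # Closed forms (8.13)-(8.14).
    b = c1 * nu1 + c2 * nu2
    q = c1 * nu1 ** 2 + c2 * nu2 ** 2
    mu = x0 * np.exp(b * t)
    var = x0 * (q / b) * np.exp(b * t) * (np.exp(b * t) - 1.0)
    return mu, var

def neg_loglik(c, t, x, T0):
    mu, var = moments(t, c[0], c[1], T0)
    s2 = sigE ** 2 + var
    return np.sum((x - mu) ** 2 / s2 + np.log(s2))    # as in (8.17)

# Placeholder data only; replace with the real thickness data.
t = np.array([1e4, 2e4, 3e4, 4e4, 5e4])
x = np.array([4.7, 4.4, 4.0, 3.7, 3.5])
res = minimize(neg_loglik, x0=[1e-4, 1e-4], args=(t, x, 5.0),
               method="L-BFGS-B",
               bounds=[(1e-12, None), (1e-12, None)])  # keep (8.14) defined
print(res.x)                               # estimates of (c1, c2)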

Remark 8.4.1. Since the precision of the caliper is δ, if we assume that the measurement
errors are normally distributed, then the interval ±δ/2 is approximately
±3σ_E wide; therefore, we could set σ_E = 2δ/3. Numerical experiments show that
our inferences are essentially the same whether σ_E is δ or 2δ/3.

Remark 8.4.2. From the Langevin equation (8.3), if we define α(y) := νa(y; θ) and
σ(y) := ν diag(√(a(y; θ))), we again obtain the system in (8.12).

Remark 8.4.3. The wear process, by its physical nature, is increasing and bounded;
therefore, the thickness process should be decreasing and bounded from below. Thus,
we expect the mean of the thickness process, μ(t), defined in (8.13), to decay to
zero. Regarding the variance, σ²(t), defined in (8.14), it should start from zero at
time zero, increase, and then return to zero again.

8.5 Hitting Times

In this section, we address the problem of computing the distribution of the time at
which the wear attains a certain critical value, L, i.e., the hitting time to L. Let τ_L
be the first time that the wear process, W, is greater than or equal to the critical level,
L,

\[
\tau_L := \inf\{ t \in R_+ : W(t) \ge L \}.
\]

This is exactly the first time that the thickness process, X, is less than or equal to
B = T_0 − L, where T_0 is the initial thickness.

We have that F_{τ_B;θ}(t) := P(X(t) ≤ B | θ) = Σ_{x≤B} p_x(t; θ), where p_x(t; θ) is the
probability that X(t) = x, given the value of the parameter vector θ. We know that
p_x(t; θ) satisfies a system of ODEs named the Master Equation (ME) (see [13, 14, 9]).
In our setting, the ME is given by

\[
\begin{cases}
\dfrac{dp_x(t; \theta)}{dt} = \sum_j p_{x - \nu_j}(t; \theta) \, a_j(x - \nu_j; \theta) - p_x(t; \theta) \sum_j a_j(x; \theta), & t \in R_+, \\
p_x(0; \theta) = \mathbf{1}_{x = x_0}, \tag{8.18}
\end{cases}
\]

where 1_A is the indicator function of the set A, x, x + ν_j ∈ S, and θ = (c_1, c_2, k, T_0).
This system of ODEs can be efficiently solved by any standard numerical technique.
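A sketch of solving (8.18) for the thickness model on the finite lattice {0, δ, ..., T_0} and reading off the hitting-time CDF F_{τ_B;θ}(t) = Σ_{x≤B} p_x(t; θ); the time grid, the solver choice and the function name are our own illustrative assumptions.

import numpy as np
from scipy.integrate import solve_ivp

def master_equation_cdf(T0, c1, c2, k, delta, B, t_grid):
    # Solve the ME (8.18) on the lattice x = i*delta, i = 0..n, and
    # return F(t) = P(X(t) <= B) on t_grid.
    n = int(round(T0 / delta))
    x = delta * np.arange(n + 1)
    a1 = c1 * x
    a2 = c2 * x
    a2[:k] = 0.0                    # jumps leaving the lattice are forbidden
    def rhs(t, p):
        dp = -(a1 + a2) * p                  # outflow
        dp[:-1] += a1[1:] * p[1:]            # inflow, jump of size delta
        dp[:-k] += a2[k:] * p[k:]            # inflow, jump of size k*delta
        return dp
    p0 = np.zeros(n + 1)
    p0[n] = 1.0                              # point mass at x = T0
    sol = solve_ivp(rhs, (0.0, t_grid[-1]), p0, t_eval=t_grid, method="BDF")
    iB = int(round(B / delta))
    return sol.y[: iB + 1, :].sum(axis=0)    # sum of p_x(t) over x <= B

t_grid = np.linspace(0.0, 1.2e5, 121)
F = master_equation_cdf(5.0, 0.63e-4, 1.2e-4, 4, 0.05, B=1.0, t_grid=t_grid)
print(F[-1])                                 # P(tau_B <= 1.2e5 hours)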

8.5.1 Conditional Residual Reliability

Suppose that we know that the wear process, W, is at level w_0 at time t_0 ≥ 0. Assume
that there exists a critical stopping level, w_max > w_0, that determines the residual
lifetime τ_max − t_0. For t > 0, the residual lifetime is greater than t if and only if
W(t_0 + t) < w_max. Therefore, the conditional probability satisfies

\[
P(\tau_{max} - t_0 > t \,|\, W(t_0) = w_0) = P(W(t_0 + t) < w_{max} \,|\, W(t_0) = w_0).
\]

Taking into account the relation between the wear and the thickness processes, we
have that the conditional residual reliability function, defined as

\[
R(t; t_0, w_0) := P(\tau_{max} - t_0 > t \,|\, W(t_0) = w_0),
\]

can be written as P(X(t; T_0 − w_0) > T_0 − w_max), where X(·, x_0) is the thickness
process starting from x_0.
8.6 Numerical Results

As mentioned at the beginning of Section 8.3, simple decay models (see Example 8.2.1) do
not fit the wear data, w = {w_i}_{i=1}^n, since they produce very narrow confidence bands,
like the dashed blue one shown in the left panel of Figure 8.3. In fact, we modeled
a(x; θ) as Σ_{j=1}^J c_j x^j for J = 1, 2, 3; in each case, the only nontrivial coefficient was c_1.
It is important to notice that all the confidence bands are computed using the ME
(8.18).

Consider the pure jump process defined in (8.8). For this process, we have to estimate
the values of c_1, c_2, k and T_0. Figure 8.1 shows, in the left panel, the contour plot
of the least squares function (8.11), associated with the likelihood function defined in
(8.10), for k = 4 (for other values of k, we obtain the same results) and T_0 = 5.0. We
can observe an identifiability problem, since the maximum of the likelihood function
is attained on a straight line; see the left panel of Figure 8.1. By varying the values of
c_1 and c_2 in the minimum level set of the least squares function, we obtain a family
of confidence bands. For c_2 = 0 (one-reaction model), or small values of c_2, the confidence
band is very narrow; see the right panel of Figure 8.1. In the other extreme,
when c_1 is positive but close to zero, we obtain satisfactorily wide confidence bands,
shown in the right panel of Figure 8.2.
To properly identify the values of c_1 and c_2 for each integer k ≥ 2 and T_0 ∈ S, we
use model 2, defined in (8.15), for the thickness data x. Figure 8.2 shows, in the
left panel, the contour plot of the least squares function (8.17), associated with the
likelihood function defined in (8.16), for k = 4 and T_0 = 5.0. Now, we are in a better
situation regarding identifiability. Conditional on those values of k and T_0, the MLE
for (c_1*, c_2*) is given by (0.63·10^{-4}, 1.2·10^{-4}). In the right panel, the corresponding
90% confidence band is shown, which is very similar to the one obtained in [4], but
we use a more parsimonious model for the wear process.
[Figure: left panel, contour plot of the residuals; right panel, wear data with the 90% confidence band.]

Figure 8.1: Left panel: residuals of the opposite of the loglikelihood of (8.10) for
k = 4. There is an identifiability problem for the parameters c_1 and c_2; for each pair
in the minimizing set, we have a different confidence band. Right panel: wear data
and the 90% confidence band under model 1, defined in (8.9), for positive but small
c_2. The confidence band turns out to be narrow when c_1 increases.

[Figure: left panel, contour plot of the residuals; right panel, wear data with the 90% confidence band.]

Figure 8.2: Left panel: residuals of the minus loglikelihood (8.16) for k = 4. Model 2,
defined in (8.15), for the thickness data x produces a likelihood function
with a unique global maximum. The MLE is (c_1*, c_2*) = (0.63·10^{-4}, 1.2·10^{-4}). Right
panel: the 90% confidence band.

Figure 8.3 shows, in the left panel, the wear data, w, along with the confidence
bands computed for the data models 1 and 2. The values of c_1, c_2 and k were computed
using the upscaled models, but the fit was assessed using the ME, which acts directly
on the microscopic base model (8.8).
Now, consider the values of the objective function defined in (8.17), evaluated at θ* := (c*(k, T₀), k, T₀), as a function of k and T₀,

F(k, T₀) := Σ_{i=1}^{n} { (x_i − m(t_i; θ*))² / (σ_E² + σ²(t_i; θ*)) + log(σ_E² + σ²(t_i; θ*)) }.   (8.19)

Figure 8.3 shows in the right panel that F(k, T₀=5.0mm) decreases until k=4, where it reaches a plateau. The same holds for other values of T₀; moreover, F(4, 5.0) ≤ F(4, T₀) for T₀ ∈ S. For that reason, θ** := (0.63·10⁻⁴, 1.2·10⁻⁴, 4, 5.0) is the MLE for our model. As a consequence, we identify two types of jumps, one with amplitude δ and the other with amplitude 4δ.
Figure 8.4 shows the evolution in time of the probability mass function defined in (8.18), p_x(t; θ**), which is the solution of an ODE system. It looks like the typical surface obtained from the Fokker-Planck equation for diffusions, but this is because we are considering a fine lattice, S = δN, with δ=0.05. We see that it departs at time t=0 from a point mass concentrated at the initial thickness T₀, and it diffuses into a unimodal bell-shaped distribution. In the domain, we plotted 100 exact simulated paths and their average (see [15]).
In Section 8.5, we defined the hitting time to the critical level L. Let L=4, as specified in warranty clauses. Then, since T₀=5.0, we have B=1. We can see the cumulative distribution function (CDF) and the probability density function (PDF) of the hitting time, τ_B, for B=1, in the left and right panels of Figure 8.5, respectively. The figure indicates that, at around t = 30,000 hours, it is advisable to start monitoring the wear.
Figure 8.6, in the left panel, shows the evolution of the Gaussian confidence intervals with the mean and variance computed from the process X. The functions µ(t) and σ²(t) are defined in (8.12). In the right panel of Figure 8.6, we see that the confidence band computed from the ME (8.18) does not contain any negative value.
In Figure 8.7, we show, in the left panel, the QQ-plot of the normalized thickness

Figure 8.3: Left panel: the exact 90% confidence band from the ME (8.18). Right panel: plot of F(k) defined in (8.19); F(k) decreases until k=4, where it reaches a plateau.

Figure 8.4: Solution of the ME (8.18) and 100 exact simulated paths [15].

data, z = {z_i}_{i=1}^n, defined by z_i = (x_i − µ(t_i)) / √(σ²(t_i) + σ_E²), where the thickness data {x_i}_{i=1}^n are defined in Section 8.4. The figure suggests that there is good agreement between z and the standard Gaussian distribution. In the right panel of Figure 8.7, we show the percentage histogram and a kernel density estimation of z. The p-value of the Shapiro-Wilk test is 0.68; we therefore cannot reject Gaussianity. This analysis strongly supports the use we made of model 2, defined in (8.15), for the thickness

Figure 8.5: Left panel: CDF of the hitting time for B = 1. Right panel: PDF of the hitting time to the critical level.

Figure 8.6: Left panel: wear data and the band made from the 90% confidence intervals µ(t) ± 1.645√(σ²(t) + σ_E²), where µ(t) and σ²(t) are defined in (8.12). In this case, µ(t) and σ²(t) describe exactly the evolution of the mean and variance of the process X. Right panel: details of the left panel. We see that the confidence band computed from the ME (8.18) does not contain any negative value.

data, x.
Figure 8.8 shows the behavior of the conditional residual reliability function,
R(t; 0, w0 ) (see Section 8.5.1), for some values of w0 . In this case, we set wmax = 4.
As expected, for a fixed residual lifetime t, we have that R(t; 0, w0 ) is a decreasing
function of w0 .

Figure 8.7: Left panel: QQ-plot of the normalized thickness data, z_i = (x_i − µ(t_i)) / √(σ²(t_i) + σ_E²). This plot suggests that there is good agreement between z and the standard Gaussian distribution. Right panel: percentage histogram and a kernel density estimation of z. The p-value of the Shapiro-Wilk test is 0.68. We can therefore not reject Gaussianity.


Figure 8.8: The conditional residual reliability function, R(t; 0, w0 ) (see Section 8.5.1),
for some values of w0 .

8.7 Conclusions

In this paper, we presented a novel approach to the problem of modeling the wear
process of cylinder liners. Since the measuring caliper has finite precision, the wear
process takes values in a lattice and therefore a pure jump process is a sensible model.
In this approach, we started by fitting one of the simplest pure jump processes, i.e., the simple decay model, and added complexity only when necessary. We found that the wear process can be modeled using only two jumps, of amplitudes δ and 4δ, with linear propensity functions. In contrast to the work of Giorgio et al. (2011) [4], we did not need to use age-dependent propensity functions or gamma noise. Nevertheless, our approach is fully suitable for dealing with age-dependent propensities, since time plays no role in it other than through given constant quantities.
One of the main contributions of this work is the multiscale indirect inference approach, where the inferences are based on upscaled models. The coefficients of the linear propensity functions were inferred using the likelihood associated with a Gaussian upscaled model. The mean and variance of this Gaussian process are the solutions of a second-order moment expansion ODE system. In this way, we computed the MLE by solving a standard nonlinear least squares problem. We observe that this method is much simpler than dealing directly with the likelihood of the pure jump process, which in general cannot be expressed in closed form and requires computationally intensive sampling techniques. We notice that, as long as the probability distribution of the pure jump process is unimodal at every time, our Gaussian inference approach is applicable and produces substantial savings in computational work. Otherwise, the Langevin model, while more computationally demanding, is more flexible.
Thanks to the remarkable simplicity of our model, we can easily obtain the dis-
tribution of any observable of the process directly from the solution of the Master
Equation, which provides the probability distribution of the process at all times.
From this probability mass function, we easily compute the cumulative distribution
function of the hitting time to the critical value stipulated in the warranty and the
conditional residual reliability function. It is worth mentioning that we did not use
Monte Carlo simulation or any other sampling procedure.
Acknowledgments

The first, third and fourth authors are members of the SRI Center for Uncertainty
Quantification in Computational Science & Engineering at KAUST. This research
was performed when the second author visited KAUST.
REFERENCES
[1] M. Giorgio, M. Guida, and G. Pulcini, “Stochastic processes for modeling the
wear of marine engine cylinder liners,” in Statistics for Innovation, P. Erto, Ed.
Springer Milan, 2009, pp. 213–230.

[2] ——, “A wear model for assessing the reliability of cylinder liners in marine
diesel engines,” Reliability, IEEE Transactions on, vol. 56, no. 1, pp. 158–166,
2007.

[3] ——, "A state-dependent wear model with an application to marine engine cylinder liners," Technometrics, vol. 52, no. 2, pp. 172–187, 2010.

[4] ——, “An age- and state-dependent Markov model for degradation processes,”
IIE Transactions, vol. 43, no. 9, pp. 621–632, 2011.

[5] C. Gourieroux, A. Monfort, and E. Renault, "Indirect inference," Journal of Applied Econometrics, vol. 8 (supplement), pp. S85–118, 1993.

[6] F. Klebaner, Introduction to Stochastic Calculus With Applications, 2nd ed. Imperial College Press, 2005.

[7] D. J. Wilkinson, Stochastic Modelling for Systems Biology (Chapman & Hall/CRC Mathematical & Computational Biology), 2nd ed. CRC Press, 2011.

[8] D. T. Gillespie, "A general method for numerically simulating the stochastic time evolution of coupled chemical reactions," Journal of Computational Physics, vol. 22, pp. 403–434, 1976.

[9] N. Van Kampen, Stochastic Processes in Physics and Chemistry, Third Edition
(North-Holland Personal Library), 3rd ed. North Holland, 2007.

[10] M. Bladt and M. Sørensen, “Statistical inference for discretely observed Markov
jump processes,” Journal of the Royal Statistical Society Series B, vol. 67, no. 3,
pp. 395–410, 2005.
[11] C. Bayer and J. Schoenmakers, "Simulation of conditional diffusions via forward-reverse stochastic representations," 2013.

[12] J. Norris, Markov Chains (Cambridge Series in Statistical and Probabilistic Mathematics). Cambridge University Press, 1998.

[13] C. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences
(Springer Series in Synergetics). Springer, 2010.

[14] H. Risken and T. Frank, The Fokker-Planck Equation: Methods of Solution and
Applications (Springer Series in Synergetics). Springer, 1996.

[15] D. T. Gillespie, "A general method for numerically simulating the stochastic time evolution of coupled chemical reactions," Journal of Computational Physics, vol. 22, pp. 403–434, 1976.

Chapter 9

The Forward-Reverse Algorithm


for Stochastic Reaction Networks
with Applications to Statistical
Inference
Christian Bayer, Alvaro Moraes, Raúl Tempone and Pedro
Vilanova 1

Abstract

In this work, we present an extension of the forward-reverse algorithm by Bayer and Schoenmakers [Annals of Applied Probability, 24(5):1994–2032, October 2014] to the context of stochastic reaction networks (SRNs). It makes the approximation of expected values of functionals of bridges for this type of process computationally feasible. We then apply this bridge-generation technique to the statistical inference problem of approximating the reaction coefficients based on discretely observed data. To this end, we introduce a two-phase iterative inference method in which, during the first phase, we solve a set of deterministic optimization problems where the SRNs are replaced by their reaction-rate ODE approximations; then, during the second phase, the Monte Carlo version of the Expectation-Maximization (EM) algorithm is applied, starting from the output of the previous phase. By selecting a set of over-dispersed seeds as initial points for phase I, the output of parallel runs of our two-phase method is a cluster of maximum likelihood estimates. For convergence assessment, we use techniques from the theory of Markov Chain Monte Carlo. Our results are illustrated by numerical examples.
¹ C. Bayer, A. Moraes, R. Tempone and P. Vilanova, "The Forward-Reverse Algorithm for Stochastic Reaction Networks with Applications to Statistical Inference", preprint, (2015).

9.1 Introduction

Stochastic Reaction Networks (SRNs) are a class of continuous-time Markov chains, X ≡ {X(t)}_{t∈[0,T]}, that take values in Z^d_+, i.e., the lattice of d-tuples of non-negative integers. SRNs are mathematical models employed to describe the time evolution of many natural and artificial systems. Among them we find biochemical reactions, the spread of epidemic diseases, communication networks, social networks, transcription and translation in genomics, and virus kinetics.
For historical reasons, the jargon of chemical kinetics is used to describe the elements of SRNs. The integer d ≥ 1 is the number of chemical species reacting in our system. The coordinates of the Markov chain, X(t) = (X₁(t), ..., X_d(t)), account for the number of molecules or individuals of each species present in the system at time t. The transitions of our system are given by a finite number J of reaction channels, (R_j)_{j=1}^J. Each reaction channel R_j is a pair formed by a vector ν_j with d integer components and a non-negative function a_j(x) of the state of the system. Usually, ν_j and a_j are called the stoichiometric vector and the propensity function, respectively. Since our state space is a lattice, our system evolves in time by jumps from one state to the next, and for that reason X is a pure jump process.
The propensity functions, a_j, are usually derived through the mass action principle, also known as the law of mass action; see, for instance, Section 3.2.1 in [1]. For that reason, we assume that a_j(x) = c_j g_j(x), where c_j is a non-negative coefficient and g_j(x) is a given monomial in the coordinates of the process X. However, our results can be easily extended to polynomial propensities.
In this work, we address the statistical inference problem of estimating the coefficients θ = (c₁, ..., c_J) from discretely observed data, i.e., data collected by observing one or more paths of the process X at a finite number of observational times or epochs. This means that our data, D, is a finite collection {(t_{n,m}, x(t_{n,m}))}, where m = 1, 2, ..., M indicates the observed path, n = 1, 2, ..., N(m) indicates the n-th observational time corresponding to the m-th path, and the datum x(t_{n,m}) can be considered an observation of the m-th path of the process X at time t_{n,m}. The observational times, t_{n,m}, are either deterministic or random, but independent of the state of the process X. In what follows, we denote by X_{i,n,m} the i-th coordinate of X(t_{n,m}, ω_m), and by X_{·,n,m} the vector X(t_{n,m}, ω_m), where ω_m is the m-th path of the process X.
Let us remark that we observe all the coordinates of X, and not only a fixed subset, at each observational time t_{n,m}. In that sense, we are not treating the case of partially observed data, where only a fixed proper subset of the coordinates of X is observed.

Remark 9.1.1. The partially observed case can in principle also be treated by a
variant of the FREM algorithm based on [2], Corollary 3.8.

For further convenience, we organize the information in our data set, D, as a finite collection,

D = ([s_k, t_k], x(s_k), x(t_k))_{k=1}^{K},   (9.1)

such that for each k, I_k := [s_k, t_k] is the time interval determined by two consecutive observational points s_k and t_k, at which the states x(s_k) and x(t_k) have been observed, respectively. Notice that D collects all the data corresponding to the M observed paths of the process X. For that reason, it is possible to have [s_k, t_k] = [s_{k'}, t_{k'}] for k ≠ k', for instance, in the case of repeated measurements.
For technical reasons, we need to define a sequence of intermediate times, (t*_k)_{k=1}^{K}; for instance, t*_k could be the midpoint of [s_k, t_k].


It turns out that the likelihood function, lik_c(θ), corresponding to data obtained from continuously observed paths of X is relatively easy to derive; see Section 9.3.2. It depends on the total number of times each reaction channel fires over the time interval [0, T] and on the values of the monomials g_j evaluated at the jump times of X. Since the observational times, t_{n,m}, are not necessarily equal to the jump times of the process X, we cannot directly work with the likelihood lik_c(θ). For that reason, we consider the Monte Carlo version of the Expectation-Maximization (EM) algorithm [3, 4, 5, 6], in which we treat the jump times of X and their corresponding reactions as missing data. The "missing data" can be gathered by simulating SRN-bridges of the process X conditional on the data, D, i.e., X(s_k) = x(s_k) and X(t_k) = x(t_k) for all intervals [s_k, t_k]. To simulate SRN-bridges, we extend the forward-reverse technique developed by Bayer and Schoenmakers [2] for Itô diffusions to the case of SRNs. As explained in Section 9.2, the forward-reverse algorithm generates forward paths from s_k to t*_k and backward paths from t_k to t*_k. An exact SRN-bridge is formed when a forward and a backward path meet at t*_k. Observe that the probability of producing SRN-bridges strongly depends on the approximation of θ we use to generate the forward and backward paths. Not only exact bridges are considered in this work; we also relax this meeting condition by using a kernel κ.
In this work, we present a two-phase algorithm that approximates the Maximum Likelihood Estimator, θ̂_MLE, of the vector θ, using the collected data, D.
Phase I is the result of a deterministic procedure, while phase II is the result of a stochastic one. The purpose of phase I is to generate an estimate of θ that will be used as the initial point for phase II. To this end, in phase I we solve a deterministic global optimization problem obtained by substituting, in each time interval [s_k, t_k], the ODE approximations for the forward and reverse stochastic paths, and minimizing a weighted sum of the squares of the Euclidean distances between the ODE approximations at the times t*_k. Using this value as the starting point of phase II, we hope to simulate an acceptable number of SRN-bridges in the interval [s_k, t_k] without too much computational effort. Phase I starts at θ_I^{(0)} and provides θ_II^{(0)}. In phase II, we run a Monte Carlo EM stochastic sequence, (θ̂_II^{(p)})_{p=1}^{+∞}, until a certain convergence criterion is fulfilled. Here we have a schematic representation of the two-phase method:

θ_I^{(0)} → θ_II^{(0)} → θ̂_II^{(1)} → ··· → θ̂_II^{(p)} → ··· → θ̂.

During phase II, we make intensive use of a computationally efficient implementation of the SRN-bridge simulation algorithm to simulate the "missing data" that feed the Monte Carlo EM algorithm. Details are provided in Section 9.4. Our two-phase algorithm is named FREM, as the acronym for Forward-Reverse Expectation-Maximization.
Our FREM algorithm bears a certain similarity to the estimation methodology proposed in [7], but it also has remarkable differences. In terms of the similarity, in [7] the authors propose a two-phase method where the first phase is intended to select a seed for the second phase, which is an implementation of the Monte Carlo EM algorithm. While our first phase is deterministic and uses the reaction-rate ODEs as approximations of the SRN-paths, theirs is stochastic, and a number of parameters must be chosen to determine the amount of computational work and the accuracy of the estimates. There is also a main difference in the implementation of the second phase: while the FREM algorithm focuses on efficiently generating kernel-based SRN-bridges using the novel forward-reverse technology introduced by Bayer and Schoenmakers in [2], the authors of [7] propose a trial-and-error shooting method for sampling SRN-bridges. This shooting method can be viewed as a particular case of the FREM algorithm obtained by systematically choosing the intermediate point t*_k as the right endpoint t_k, leaving no room for the backward paths. To quantify the uncertainty in our estimates, we prefer to have the outputs of our algorithm start from a set of over-dispersed initial points without assuming Gaussianity in their distribution (see [4]). The variance of our estimators can be easily assessed by bootstrap calculations. In our numerical experiments, we observe that the outputs lie on a low-dimensional manifold in the parameter space; this is a motivation against the use of the Gaussianity assumption. Regarding the stopping criterion proposed in [7], we found that the condition imposed there, of obtaining three consecutive iterations close to each other up to a certain tolerance, could be a rare event in some examples, and it may lead to the generation of an excessive number of Monte Carlo EM iterations. We refer to [7] for comparisons against other existing related statistical inference methods for SRNs.
In [8], the authors propose a method based on maximum likelihood for parameter inference. It is based on first estimating the gradient of the likelihood function with respect to the parameters by using reversible jump Markov Chain Monte Carlo sampling (RJMCMC) [9, 10], and then applying a gradient descent method to obtain the maximum likelihood estimate of the parameter values. The authors provide a formula for the gradient of the likelihood function given the observations. The idea of the RJMCMC method is to generate an initial reaction path, and then to generate new samples by adding or deleting a set of reactions from the path, using an acceptance method. The authors propose a general method for obtaining a sampler that can work for any reaction system. This sampler can be inefficient in the case of large observation intervals. At this point, we would like to observe that their approach can be combined with ours if, instead of using RJMCMC for computing the gradient of the likelihood function, we use our forward-reverse method. We think that this combination may be useful in cases in which many iterations of our method are needed (see Section 9.6.3 for such an example). This is left as future work.
In the remainder of this section, we formally introduce SRNs, their reaction-rate ODE approximations, the Stochastic Simulation Algorithm, and the forward-reverse method. In Section 9.2, we develop the main result of this article, that is, the extension of the forward-reverse technique to the context of SRNs. The EM algorithm for SRNs is introduced in Section 9.3. Next, in Section 9.4, we introduce the main application of this article, that is, the Forward-Reverse EM (FREM) algorithm for SRNs. In Section 9.5, we provide computational details for the practical implementation of the FREM algorithm. Later, in Section 9.6, we present numerical examples to illustrate the FREM algorithm. Finally, in Section 9.7, we give our conclusions. At the end, Appendix 7.6 contains the pseudo-code for the implementation of the FREM algorithm.

9.1.1 Stochastic Reaction Networks

Stochastic Reaction Networks are continuous-time Markov chains, X : [0, T] × Ω → Z^d_+, that describe the stochastic evolution of a system of d interacting species. In this context, the i-th coordinate of the process X, X_i(t), can be interpreted as the number of individuals of species i present in the system at time t.
The system evolves randomly through J different reaction channels R_j := (ν_j, a_j). Each stoichiometric vector ν_j ∈ Z^d represents a possible jump of the system, x → x + ν_j. The probability that reaction j fires during an infinitesimal interval (t, t+dt) is given by

P( reaction j fires during (t, t+dt) | X(t) = x ) = a_j(x)dt + o(dt),   (9.2)

where a_j : R^d → [0, ∞) are known as propensity functions. We set a_j(x) = 0 for those x such that x + ν_j ∉ Z^d_+. We assume that the initial condition of X, X(0) = x₀ ∈ Z^d_+, is deterministic and known.

Example 9.1.2 (Simple decay model). Consider the reaction X → ∅, with rate constant c, where one particle is consumed. In this case, the state vector X(t) is in Z_+, where X denotes the number of particles in the system. The stoichiometric vector for this reaction is ν = −1. The propensity function in this case could be, for example, a(X) = cX, where c > 0.

Section 9.6 contains more examples of Stochastic Reaction Networks.

9.1.2 Deterministic Approximations of SRN

The infinitesimal generator L_X of the process X is a linear operator defined on the set of bounded functions [11]. In the case of SRNs, it is given by

L_X(f)(x) := Σ_j a_j(x) ( f(x + ν_j) − f(x) ).   (9.3)

The Dynkin formula (see [12]),

E[f(X(t))] = f(X(0)) + ∫₀ᵗ E[L_X(f)(X(s))] ds,   (9.4)

can be used to obtain integral equations describing the time evolution of any observable of the process X. In particular, taking the canonical projections f_i(x) = x_i, we obtain a system of equations for E[X_i(t)],

E[X_i(t)] = x_{0,i} + ∫₀ᵗ Σ_j E[a_j(X(s))] ν_{j,i} ds.

If all the propensity functions, a_j, are affine functions of the state, then this system of equations forms a closed system of ODEs. In general, some propensity functions may not depend on the coordinates x in an affine way, and for that reason the integral equations for E[X_i(t)] obtained from the Dynkin formula depend on higher moments of X. This can be treated by moment-closure techniques [13, 14] or by taking a different approach: using a formal first-order Taylor expansion of f in (9.3), we obtain the generator

L_Z(f)(x) := Σ_j a_j(x) ∂_x f(x) ν_j,

which corresponds to the reaction-rate ODEs (also known as the mean-field ODEs)

dZ(t) = ν a(Z(t)) dt,  t ∈ R_+,   Z(0) = x₀,   (9.5)

where the j-th column of the matrix ν is ν_j and a is a column vector with components a_j.
This derivation motivates the use of Z(t) as an approximation of E[X(t)] in phase I of our FREM algorithm.
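As a minimal numerical sketch of (9.5) (ours, for illustration; the decay model of Example 9.1.2 with an assumed coefficient c = 0.5 serves as the test case), the reaction-rate ODE can be integrated with a standard solver:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Reaction-rate (mean-field) ODE dZ/dt = nu @ a(Z), as in (9.5).
# Simple decay model X -> 0 with propensity a(x) = c*x.
nu = np.array([[-1.0]])          # stoichiometric matrix, one column per channel
c = 0.5                          # assumed reaction coefficient

def rhs(t, z):
    return nu @ np.array([c * z[0]])

sol = solve_ivp(rhs, t_span=(0.0, 10.0), y0=[100.0], rtol=1e-8)
print(sol.y[0, -1])              # Z(10) ~ 100*exp(-0.5*10)
```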

9.1.3 The Stochastic Simulation Algorithm

To simulate paths of the process X, we employ the Stochastic Simulation Algorithm (SSA) by Gillespie [15]. The SSA simulates statistically exact paths of X, i.e., the probability law of any path generated by the SSA satisfies (9.2). It requires one to sample two independent uniform random variables per time step: one is used to find the time of the next reaction and the other to determine which reaction fires at that time. Concretely, given the current state of the system, x := X(t), we simulate two independent uniform random numbers, U₁, U₂ ~ U(0,1), and compute

j = min{ k ∈ {1, ..., J} : Σ_{i=1}^{k} a_i(x) > U₁ a₀(x) },   τ_min = −(a₀(x))⁻¹ ln(U₂),

where a₀(x) := Σ_{j=1}^{J} a_j(x). The system remains in the state x until the time t + τ_min, and then it jumps: X(t + τ_min) = x + ν_j. In this way, we can simulate a full path of the process X.
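For illustration, a minimal implementation of this step (the helper name and input conventions are ours):

```python
import numpy as np

def ssa_path(x0, nu, propensities, T, rng=np.random.default_rng()):
    """Exact SSA (Gillespie) path on [0, T]. x0: initial state,
    nu: (d x J) stoichiometric matrix, propensities: x -> (J,) array.
    Returns jump times and states of the piecewise-constant path."""
    t, x = 0.0, np.array(x0, dtype=float)
    times, states = [t], [x.copy()]
    while True:
        a = propensities(x)
        a0 = a.sum()
        if a0 <= 0.0:                                 # absorbing state
            break
        u1, u2 = rng.random(), rng.random()
        tau = -np.log(u2) / a0                        # exponential holding time
        if t + tau > T:
            break
        j = min(int(np.searchsorted(np.cumsum(a), u1 * a0)), len(a) - 1)
        t += tau
        x += nu[:, j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Simple decay example: one channel, nu = [-1], a(x) = c*x with c = 0.5.
times, states = ssa_path([100], np.array([[-1]]),
                         lambda x: np.array([0.5 * x[0]]), T=10.0)
```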
Exact paths can be generated using more efficient algorithms like the Modified Next Reaction Method by Anderson [16], where only one uniform variate is needed at each step. However, in regimes where the total propensity, a₀(x), is high, approximate path-simulation methods like the hybrid Chernoff tau-leap [17] or its multilevel versions [18, 19] may be required.

9.1.4 Bridge Simulation for SDEs

In [2], Bayer and Schoenmakers introduced the so-called forward-reverse algorithm for computing conditional expectations of path-dependent functionals of a diffusion process, conditioned on the values of the diffusion process at the end-points of the time interval. More precisely, let X = X(t), 0 ≤ t ≤ T, denote the solution of a d-dimensional stochastic differential equation (SDE) driven by standard Brownian motion. Under mild regularity conditions, a stochastic representation is provided for conditional expectations of the form,

H ≡ E[ g(X) | X₀ = x, X_T = y ],

for fixed values x, y ∈ R^d and a (sufficiently regular) functional g on the path-space.² More precisely, they prove a limiting equality of the form

H = lim_{ε→0} E[ g(X^(f) ⊕ X^(b)) κ_ε(X^(f)(t*) − X^(b)(t*)) 𝒴 ] / lim_{ε→0} E[ κ_ε(X^(f)(t*) − X^(b)(t*)) 𝒴 ].   (9.6)

Here, X^(f) is the solution of the original SDE (i.e., it is a copy of X) started at X^(f)(0) = x and solved until some time 0 < t* < T. X^(b) is the time-reversal of another diffusion process Y, whose dynamics are again given by an SDE (with coefficients explicitly given in terms of the coefficients of the original SDE), started at Y(t*) = y and run until time T. Hence, X^(b) starts at t* and ends at X^(b)(T) = y. We then evaluate the functional g on the "concatenation" X^(f) ⊕ X^(b) of the paths X^(f) and X^(b), which is a path defined on the full interval [0, T] by

X^(f) ⊕ X^(b)(s) ≡ X^(f)(s) for 0 ≤ s ≤ t*, and X^(b)(s) for t* < s ≤ T.

In particular, we remark that X^(f) ⊕ X^(b) may exhibit a jump at t*. Here, 𝒴 is an exponential weighting term of the form 𝒴 = exp( ∫_{t*}^{T} c(Y_s) ds ). At last, κ_ε denotes a kernel with bandwidth ε > 0. Notice that the process X^(f) and the pair (X^(b), 𝒴) are chosen to be independent.
Let us roughly explain the structure of the representation (9.6). First, note that the right-hand side only contains standard (unconditional) expectations, implying that the right-hand side (unlike the left-hand side) is amenable to standard Monte Carlo simulation, which is why we call (9.6) a "stochastic representation". The denominator of (9.6) actually equals the transition density p(0, x, T, y) of the solution X, and its presence directly follows from the same term in the (analytical) definition of the conditional expectation in terms of densities. In fact, it was precisely in this context (i.e., in the context of density estimation) that Milstein, Schoenmakers and Spokoiny introduced the general idea for the first time [20]. In essence, the reverse process Y can be thought of as an "adjoint" process to X, as its infinitesimal generator is essentially the adjoint operator of the infinitesimal generator of X; see below for a more detailed discussion in the SRN setting.
² In fact, Bayer and Schoenmakers [2] require g to be a smooth function of the values X_{t_i} of the process X along a grid t_i, but a closer look at the paper reveals that more general, truly path-dependent functionals can be allowed.
In a nutshell, the idea is that the law of the diffusion bridge admits a Radon-Nikodym density with respect to the law of the concatenated process X^(f) ⊕ X^(b), with density given by 𝒴, provided that the trajectories meet at time t*, i.e., provided that X^(f)(t*) = X^(b)(t*). Of course, this happens only with zero probability³, so we relax the above equality with the help of a kernel with a positive bandwidth ε. Furthermore, note that, by the independence of X^(f) and X^(b), we can independently sample many trajectories of X^(f) and many trajectories of X^(b), and then identify all pairs of trajectories satisfying the approximate identity X^(f)(t*) ≈ X^(b)(t*), as determined by the kernel κ_ε. This results in a Monte Carlo algorithm which, in principle, requires the calculation of a huge double sum over all pairs of N samples from X^(f) and M samples from X^(b). A naive implementation of that algorithm would require a prohibitive computational cost of order O(M²) operations, but fortunately there are more efficient implementations relying on the structure of the kernel, often reducing the complexity to O(M log(M)); see [2, 21]. In this way, the forward-reverse algorithm can almost achieve the optimal Monte Carlo convergence rate of −1/2. More precisely, assuming enough regularity of the density of X and the use of a kernel of sufficiently high order (depending on the dimension), the root mean squared error of the estimator is O(M^{−1/2}), with a complexity O(M log(M)) and a bandwidth chosen as ε = O(M^{−1/d}). These statements assume that we can exactly solve the SDEs driving the forward and the reverse processes. Otherwise, the error induced by, say, the Euler scheme, will be added.
³ In the SRN setting, the probability is positive, since the state space is discrete.
The structure of the construction of the forward-reverse representation (9.6), and later of the corresponding Monte Carlo estimator in [2], strongly suggests that the forward-reverse approach does not rely on the continuity of diffusion processes, but merely on the Markov property. Hence, the approach was generalized to discrete-time Markov chains in [21], and it is generalized to the case of continuous-time Markov chains with discrete state space in the present work.
For a literature review on computational algorithms for computing conditional expectations of functionals of diffusion processes, we refer to [2].

9.2 Expectations of Functionals of SRN-Bridges

In this section, we derive the dynamics of the reverse paths and the expectation formula for functionals of SRN-bridges. The derivation follows the same scheme used in [20], that is: i) write the Master Equation, ii) manipulate the Master Equation to obtain a backward Kolmogorov equation, and iii) derive the infinitesimal generator of the reverse process.

9.2.1 The Master Equation

Let X be a SRN defined by the intensity-reaction pairs ((ν_j, a_j(x)))_{j=1}^{J}. Let p(t, x, s, y) be its transition probability function, i.e., p(t, x, s, y) := P(X(s) = y | X(t) = x), where x, y ∈ Z^d_+ and 0 < t < s < T. The function p satisfies the following linear system of ODEs, known as the Master Equation [22, 23, 24]:

∂_s p(t, x, s, y) = Σ_{j=1}^{J} ( a_j(y − ν_j) p(t, x, s, y − ν_j) − a_j(y) p(t, x, s, y) ),
p(t, x, t, y) = δ_{x=y},   (9.7)

where δ is the Kronecker delta function.


A general analytic solution of (9.7) is infeasible in terms of computational work. Even numerical solutions are infeasible for systems with an infinite or large number of states. For continuous state spaces, (9.7) becomes a parabolic PDE known as the Fokker-Planck Equation. Next, we derive the generator of the reverse process in the SRN setting.

9.2.2 Derivation of the Reverse Process

Let us consider a fixed time interval [t, T]. For s ∈ [t, T] and x, y ∈ Z^d_+, let us define v(s, y) := Σ_x g(x) p(t, x, s, y), provided that the sum converges. We remark here that v cannot in general be interpreted as an expectation of g. Indeed, while Σ_y p(t, x, s, y) = 1, the sum over x could, in principle, even diverge. Hence, it is not a priori clear that v admits a stochastic representation. However, multiplying both sides of the Master Equation (9.7) by g(x) and summing over x, we obtain:

∂_s v(s, y) = Σ_{j=1}^{J} ( a_j(y − ν_j) v(s, y − ν_j) − a_j(y) v(s, y) ),
v(t, y) = g(y).   (9.8)

Now, let us consider the time-reversal induced by the change of variables s̃ = T + t − s, with ṽ(s̃, y) := v(T + t − s̃, y) = v(s, y), leading to the following backward equation:

−∂_s̃ ṽ(s̃, y) = Σ_{j=1}^{J} ( a_j(y − ν_j) ṽ(s̃, y − ν_j) − a_j(y) ṽ(s̃, y) ),  t < s̃ < T,
ṽ(T, y) = v(t, y) = g(y).   (9.9)

Let ν̃_j := −ν_j. By adding and subtracting the term a_j(y + ν̃_j) ṽ(s̃, y), we can write the first equation of (9.9) as

∂_s̃ ṽ(s̃, y) + Σ_{j=1}^{J} ( a_j(y + ν̃_j)(ṽ(s̃, y + ν̃_j) − ṽ(s̃, y)) + (a_j(y + ν̃_j) − a_j(y)) ṽ(s̃, y) ) = 0.

As a consequence, the system (9.9) can be written as

∂_s̃ ṽ(s̃, y) + Σ_{j=1}^{J} a_j(y + ν̃_j)(ṽ(s̃, y + ν̃_j) − ṽ(s̃, y)) + c(y) ṽ(s̃, y) = 0,
ṽ(T, y) = g(y),   (9.10)

where c(y) := Σ_{j=1}^{J} ( a_j(y + ν̃_j) − a_j(y) ).
Let us now define ã_j(y) := a_j(y + ν̃_j) and substitute it into (9.10). We have arrived at the following backward Kolmogorov equation [25] for the cost-to-go function ṽ(s̃, y):

∂_s̃ ṽ(s̃, y) + Σ_{j=1}^{J} ã_j(y)(ṽ(s̃, y + ν̃_j) − ṽ(s̃, y)) + c(y) ṽ(s̃, y) = 0,
ṽ(T, y) = g(y).   (9.11)

We recognize in (9.11) the generator L_Y(ṽ)(s̃, y) := Σ_{j=1}^{J} ã_j(y)(ṽ(s̃, y + ν̃_j) − ṽ(s̃, y)), which defines the so-called reverse process Y ≡ {Y(s̃, ω)}_{t ≤ s̃ ≤ T} by

P( Y(s̃ + ds̃) = y + ν̃_j | Y(s̃) = y ) = ã_j(y) ds̃,   (9.12)

or, equivalently, by

P( Y(s̃ + ds̃) = y − ν_j | Y(s̃) = y ) = a_j(y − ν_j) ds̃.   (9.13)

The Feynman-Kac formula [25] provides a stochastic representation of the solution of (9.11),

ṽ(s̃, y) = E[ g(Y(T)) exp( ∫_{s̃}^{T} c(Y(s)) ds ) | Y(s̃) = y ].   (9.14)

Notice that Y is a SRN in its own right.
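As a small illustration (ours), the reverse channels ((−ν_j, ã_j))_{j=1}^{J} and the weight exponent c(y) of (9.10)-(9.13) can be assembled mechanically from the forward ones; reverse paths can then be simulated with the same SSA routine of Section 9.1.3:

```python
import numpy as np

def reverse_channels(nu, propensities):
    """Given forward stoichiometry nu (d x J) and propensities a: x -> (J,),
    return the reverse stoichiometry -nu, the reverse propensities
    a_tilde_j(y) = a_j(y - nu_j), and the weight exponent c(y) of (9.10)."""
    def propensities_rev(y):
        return np.array([propensities(y - nu[:, j])[j] for j in range(nu.shape[1])])

    def c_weight(y):
        # c(y) = sum_j ( a_j(y - nu_j) - a_j(y) )
        return propensities_rev(y).sum() - propensities(y).sum()

    return -nu, propensities_rev, c_weight

# Decay model: a(x) = 0.5*x, nu = [-1]; then a_tilde(y) = 0.5*(y + 1).
nu = np.array([[-1]])
a = lambda x: np.array([0.5 * x[0]])
nu_rev, a_rev, c = reverse_channels(nu, a)
print(a_rev(np.array([10.0])), c(np.array([10.0])))   # [5.5] and 0.5
```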


9.2.3 The Forward-reverse Formula for SRN

Let us consider a time interval [s, t] and assume that we only observe the process X at the end-points, i.e., that we have X(s) = x and X(t) = y for some observed values x, y ∈ Z^d_+. Fix an intermediate time s < t* < t, which will be considered a numerical input parameter later on. Denote by X^(f) the process X conditioned on starting at X^(f)(s) = x and restricted to the time domain [s, t*].
Furthermore, let Y denote the reverse process constructed in (9.12) on the time domain [t*, t] (i.e., inserting t* for t and t for T in the above subsection), started at Y(t*) = y. As noted above, Y is again an SRN, with reaction channels ((−ν_j, ã_j))_{j=1}^{J}. For convenience, we also introduce the notation X^(b) for the process Y run backward in time, i.e., we define X^(b)(u) := Y(t* + t − u) for u ∈ [t*, t], and notice that X^(b)(t) = y.
Recall that we aim to provide a stochastic representation, i.e., a representation containing standard expectations only, for conditional expectations of the form

H(x, y) ≡ E[ φ(X, [s, t]) | X(s) = x, X(t) = y ],   (9.15)

for φ mapping Z^d_+-valued paths to real numbers. Obviously, φ needs to be integrable in order for H to be well-defined, and we shall also assume polynomial-growth conditions on φ and its derivatives (here we are assuming that there is a sensible smooth extension of φ to the real domain) with respect to the jump times of the underlying path. Moreover, we assume that p(s, x, t, y) > 0. Once again, the fundamental idea of the forward-reverse algorithm of Bayer and Schoenmakers [2] is to simulate trajectories of X^(f) and (independently) of X^(b), and then to look for pairs which are "linked". Since the state space is now discrete, we may, in principle, require exact linkage, in the sense that we may only consider pairs such that X^(f)(t*) = X^(b)(t*). However, in order to decrease the variance of the estimator, it may once again be advantageous to relax this condition by introducing a kernel.
By a kernel, we understand a function κ : Z^d → R satisfying

Σ_{x∈Z^d} κ(x) = 1.

Moreover, we call κ a kernel of order r ≥ 0 if, in addition,

Σ_{x∈Z^d} x^α κ(x) = 0

for any multi-index α with 1 ≤ |α| ≤ r, where |α| := α₁ + ··· + α_d and x^α := x₁^{α₁} ··· x_d^{α_d}, α_i ∈ {0, 1, 2, ...}. For instance, any non-negative symmetric kernel has order r = 1 in this sense.
Having fixed one such kernel κ, we define a whole family of kernels κ_ε, indexed by the bandwidth ε ≥ 0, by

κ_ε(x) = C_ε κ(x/ε),

with the constant C_ε being defined by the normalization condition Σ_{x∈Z^d} κ_ε(x) = 1. Here, we implicitly assume the kernel κ to be extended to R^d, for instance in a piece-wise constant way. As we necessarily have κ(x) → 0 as |x| → ∞, it is easy to see that we have the special case

κ₀(x) = 1 for x = 0, and κ₀(x) = 0 for x ≠ 0.

Remark 9.2.1. The Kronecker kernel κ₀ can, indeed, also be realized as κ₀ = κ_{ε₀} for some ε₀ > 0, which will depend on the base kernel κ, provided that the base kernel κ has finite support.
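For instance, in d = 1, the symmetric kernel κ(x) = 1/3 for x ∈ {−1, 0, 1} and κ(x) = 0 otherwise satisfies Σ_x κ(x) = 1 and Σ_x x κ(x) = (−1 + 0 + 1)/3 = 0, so it has order r ≥ 1; since Σ_x x² κ(x) = 2/3 ≠ 0, its order is exactly 1, consistent with the statement above about non-negative symmetric kernels.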

Theorem 9.2.2. Let φ be a continuous real-valued functional on the space of piecewise constant functions defined on [s, t] and taking values in Z^d (w.r.t. the uniform topology), such that both H and the right-hand side of (9.16) are finite for any ε. With κ_ε, X^(f) and X^(b) as above, we have

H(x, y) = lim_{ε→0} E[ φ(X^(f) ⊕ X^(b), [s, t]) κ_ε(X^(f)(t*) − X^(b)(t*)) Ψ(X^(b), [t*, t]) ] / E[ κ_ε(X^(f)(t*) − X^(b)(t*)) Ψ(X^(b), [t*, t]) ],   (9.16)

where X^(f) ⊕ X^(b) denotes the concatenation of the paths X^(f) and X^(b), in the sense defined by

X^(f) ⊕ X^(b)(u) ≡ X^(f)(u) for s ≤ u ≤ t*, and X^(b)(u) for t* < u ≤ t,

and

Ψ(Z, [a, b]) := exp( ∫_a^b c(Z(u)) du ).

Remark 9.2.3. In line with Remark 9.2.1, we note that we could easily have avoided taking limits in Theorem 9.2.2 by replacing κ_ε with κ₀ everywhere in (9.16). We note at this stage that the Monte Carlo estimator based on (9.16) with positive ε will have considerably smaller variance than the version with ε = 0, potentially outweighing the increased bias.

Sketch of proof of Theorem 9.2.2. For simplicity, we assume that the kernel κ has finite support and that the functional φ is uniformly bounded. We will prove convergence of the numerator and the denominator in (9.16) separately. Let us, hence, prove the more general case first, i.e., the convergence

h(x, y) := H(x, y) p(s, x, t, y) = lim_{ε→0} E[ φ(X^(f) ⊕ X^(b), [s, t]) κ_ε(X^(f)(t*) − X^(b)(t*)) Ψ(X^(b), [t*, t]) ].   (9.17)

In the first step, we assume that φ(Z, [s, t]) only depends on the values of Z on a fixed grid, say s = t₀ < t₁ < ··· < t_n = t, i.e.,

φ(Z, [s, t]) = g(Z(t₀), ..., Z(t_n)).

Then (9.17) is proved (with minor modifications) in [2], Theorem 3.4. Indeed, a closer look at that proof reveals that only the Markovianity of X is really used.
Furthermore, note that any continuous functional φ can be approximated by functionals φ_n depending only on the values of the process on an (ever finer) finite grid t₀, ..., t_n. As, on the one side,

h(x, y) = E[ φ(X, [s, t]) | X(s) = x, X(t) = y ] p(s, x, t, y) = lim_{n→∞} E[ φ_n(X, [s, t]) | X(s) = x, X(t) = y ] p(s, x, t, y),

and, on the other side,

lim_{ε→0} lim_{n→∞} E[ φ_n(X^(f) ⊕ X^(b), [s, t]) κ_ε(X^(f)(t*) − X^(b)(t*)) Ψ(X^(b), [t*, t]) ] = lim_{ε→0} E[ φ(X^(f) ⊕ X^(b), [s, t]) κ_ε(X^(f)(t*) − X^(b)(t*)) Ψ(X^(b), [t*, t]) ],

we are left to prove that

lim_{ε→0} lim_{n→∞} E[ φ_n(X^(f) ⊕ X^(b), [s, t]) κ_ε(X^(f)(t*) − X^(b)(t*)) Ψ(X^(b), [t*, t]) ] = lim_{n→∞} lim_{ε→0} E[ φ_n(X^(f) ⊕ X^(b), [s, t]) κ_ε(X^(f)(t*) − X^(b)(t*)) Ψ(X^(b), [t*, t]) ],

which follows since κ₀ = κ_{ε₀} for some ε₀ > 0. In fact, it even follows in the general case by dominated convergence.
Finally, the proof of convergence of the denominator is a special case of the proof for the numerator, and therefore the convergence of the fraction follows from the continuity of (a, b) ↦ a/b for b > 0.
9.3 The EM Algorithm for SRN

In this section, we present the EM algorithm for SRNs, which is the main step in computing the parameter estimates. First, we recall what the EM algorithm is in general. Then, we derive the log-likelihood function for a fixed realization of the process X. Finally, we present the EM algorithm for SRNs.

9.3.1 The EM Algorithm

The EM algorithm [3, 4, 5, 6] owes its name to its two steps: Expectation and Maximization. It is an iterative algorithm that, given an initial guess and a stopping rule, provides an approximation for a local maximum or saddle point of the likelihood function, lik(θ | D). It is a data augmentation technique in the sense that the maximization of the likelihood lik(θ | D) is performed by treating the data D as part of a larger data set, (D, D̃), where the complete likelihood, lik_c(θ | D, D̃), is amenable to maximization. Given an initial guess θ^(0), the EM algorithm maps θ^(p) into θ^(p+1) by

1. Expectation step: Q_{θ^(p)}(θ | D) := E_{θ^(p)}[ log(lik_c(θ | D, D̃)) | D ].

2. Maximization step: θ^(p+1) := arg max_θ Q_{θ^(p)}(θ | D).

Here, E_{θ^(p)}[ · | D ] denotes the expectation associated with the distribution of D̃ under the parameter choice θ^(p), conditional on the data, D. In many applications, the Expectation step is computationally infeasible and Q_{θ^(p)}(θ | D) must be approximated by some estimate,

Q̂_{θ^(p)}(θ | D) := Ê_{θ^(p)}[ log(lik_c(θ | D, D̃)) | D ].

Remark 9.3.1 (The Monte Carlo EM). If we know how to sample a sequence of M independent variates (D̃_i)_{i=1}^{M} ~ D̃ | D, with parameter θ^(p), then we can define the following Monte Carlo estimator of Q_{θ^(p)}(θ | D):

Q̂_{θ^(p)}(θ | D) := (1/M) Σ_{i=1}^{M} log(lik_c(θ | D, D̃_i)).

In Section 9.4, we describe how to simulate exact and approximate samples of D̃ | D.

9.3.2 The Log-likelihood Function for Continuously Observed Paths

The goal of this section is to derive an expression for the likelihood of a particular path, (X(t, ω₀))_{t∈[0,T]}, of the process X, where ω₀ ∈ Ω is a fixed realization. An important assumption in this work is that the propensity functions a_j can be written as a_j(x) = c_j g_j(x), for j = 1, ..., J and x ∈ Z^d_+, where the g_j are known functionals and the c_j are considered the unknown parameters. Define θ := (c₁, ..., c_J). Let us denote the jump times of (X(t, ω₀))_{t∈[0,T]} in (0, T) by ξ₁, ξ₂, ..., ξ_{N−1}. Define ξ₀ := 0, ξ_N := T and Δξ_i = ξ_{i+1} − ξ_i for i = 0, 1, ..., N−1.
Let us assume that the system is in the state x₀ at time 0. We have that ξ₁ is the time of the first reaction or, equivalently, the time that the system spends at x₀ (the sojourn or holding time at state x₀). Let us denote by ν_{ξ₁} the reaction that takes place at ξ₁; therefore, at time ξ₁ the system is in the state x₁ := x₀ + ν_{ξ₁}. From the SSA algorithm, it is easy to see that the probability density corresponding to this transition is the product a_{ν_{ξ₁}}(x₀) exp(−a₀(x₀) Δξ₀).
By the Markov property, we can see that the density of one path ((ξ_i, x_i))_{i=0}^{N−1} is given by

Π_{i=1}^{N−1} a_{ν_{ξᵢ}}(x_{i−1}) exp(−a₀(x_{i−1}) Δξ_{i−1}) × exp(−a₀(x_{N−1}) Δξ_{N−1}).   (9.18)
The last factor in (9.18) is due to the fact that we know that the system remains in the state x_{N−1} during the time interval [ξ_{N−1}, T).
Rearranging the factors in (9.18), we obtain

exp( −Σ_{i=0}^{N−1} a₀(x_i) Δξ_i ) Π_{i=1}^{N−1} a_{ν_{ξᵢ}}(x_{i−1}).   (9.19)

Now, taking logarithms in (9.19), we have

−Σ_{i=0}^{N−1} a₀(x_i) Δξ_i + Σ_{i=1}^{N−1} log(a_{ν_{ξᵢ}}(x_{i−1})).   (9.20)

By the definition of a₀ and the assumption a_j(x) = c_j g_j(x), we can write (9.20) as

−Σ_{i=0}^{N−1} Σ_{j=1}^{J} c_j g_j(x_i) Δξ_i + Σ_{i=1}^{N−1} log(c_{ν_{ξᵢ}} g_{ν_{ξᵢ}}(x_{i−1})).

Interchanging the order of summation and denoting by R_{j,[0,T]} the number of times that the reaction ν_j occurred in the interval [0, T], we have

Σ_{j=1}^{J} ( −Σ_{i=0}^{N−1} c_j g_j(x_i) Δξ_i + log(c_j) R_{j,[0,T]} ) + Σ_{i=1}^{N−1} log(g_{ν_{ξᵢ}}(x_{i−1})).   (9.21)

Observing that the last term in (9.21) does not depend on θ, the complete log-likelihood of the path (X(t, ω₀))_{t∈[0,T]} is, up to constant terms, given by

ℓ_c(θ) := Σ_{j=1}^{J} ( log(c_j) R_{j,[0,T]} − c_j F_{j,[0,T]} ), with θ = (c₁, ..., c_J),   (9.22)

where F_{j,[0,T]} := g_j(x₀) Δξ₀ + ··· + g_j(x_{N−1}) Δξ_{N−1} = ∫₀ᵀ g_j(X(s)) ds. The last equality is due to g_j(X(s)) being piece-wise constant with respect to the partition {ξ₀, ξ₁, ..., ξ_N}.
Now, let us assume that we have a collection of intervals, (I_k = [s_k, t_k])_{k=1}^{K} ⊂ [0, T], in which we have continuously observed the process (X(t, ·))_{t∈I_k}. We define the log-likelihood function as

ℓ_c(θ) := Σ_{j=1}^{J} ( log(c_j) Σ_{k=1}^{K} R_{j,I_k} − c_j Σ_{k=1}^{K} F_{j,I_k} ).

Remark 9.3.2. Note that R_{j,I_k} and F_{j,I_k} are random variables that are functions of the full paths of X, but not of the discretely observed paths. Hence, they are random given the data D, as defined in (9.1).
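For concreteness, here is a small sketch (ours) of how R_{j,[0,T]} and F_{j,[0,T]} of (9.22) can be accumulated from a simulated path in the output format of the SSA sketch of Section 9.1.3; it assumes the stoichiometric vectors ν_j are pairwise distinct, so the fired channel can be read off from the jump:

```python
import numpy as np

def sufficient_stats(times, states, nu, g, T):
    """R_j: number of firings of channel j in [0, T];
    F_j = int_0^T g_j(X(s)) ds, with X piecewise constant between jumps."""
    J = nu.shape[1]
    R = np.zeros(J, dtype=int)
    F = np.zeros(J)
    for i in range(len(times)):
        t_next = times[i + 1] if i + 1 < len(times) else T
        F += g(states[i]) * (t_next - times[i])        # holding-time contribution
        if i + 1 < len(times):
            jump = states[i + 1] - states[i]
            j = np.argmax((nu.T == jump).all(axis=1))  # assumes distinct nu_j
            R[j] += 1
    return R, F

# The complete log-likelihood (9.22) is then, up to an additive constant,
# sum_j ( log(c_j) * R[j] - c_j * F[j] ).
```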

9.3.3 The EM Algorithm for SRNs

According to Section 9.3.1, for a particular value of the parameter θ, say θ^(p), we define

Q_{θ^(p)}(c₁, ..., c_J | D) := Σ_{j=1}^{J} ( log(c_j) Σ_{k=1}^{K} E_{θ^(p)}[R_{j,I_k} | D] − c_j Σ_{k=1}^{K} E_{θ^(p)}[F_{j,I_k} | D] ),

where E_{θ^(p)}[R_{j,I_k} | D] = E_{θ^(p)}[R_{j,I_k} | X(s_k) = x(s_k), X(t_k) = x(t_k)] (by the Markov property), and analogously for F_{j,I_k}.
Consider now the partial derivatives of Q_{θ^(p)}(c₁, ..., c_J | D) with respect to c_j:

∂_{c_j} Q_{θ^(p)}(c₁, ..., c_J | D) = (1/c_j) Σ_{k=1}^{K} E_{θ^(p)}[R_{j,I_k} | D] − Σ_{k=1}^{K} E_{θ^(p)}[F_{j,I_k} | D].

Therefore, ∇Q_{θ^(p)}(c₁, ..., c_J | D) = 0 is attained at θ* = (c₁*, ..., c_J*) such that

c_j* = Σ_{k=1}^{K} E_{θ^(p)}[R_{j,I_k} | D] / Σ_{k=1}^{K} E_{θ^(p)}[F_{j,I_k} | D],  j = 1, ..., J.   (9.23)

This is clearly the global maximization point of the function Q_{θ^(p)}(· | D).
The EM algorithm for this particular problem generates a deterministic sequence (θ^(p))_{p=1}^{+∞} that starts from a deterministic initial guess θ^(0), provided by phase I (see Section 9.4.1), and evolves by

c_j^{(p+1)} = Σ_{k=1}^{K} E_{θ^(p)}[R_{j,I_k} | D] / Σ_{k=1}^{K} E_{θ^(p)}[F_{j,I_k} | D],   (9.24)

where θ^(p) = (c₁^{(p)}, ..., c_J^{(p)}).
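As a quick sanity check of (9.23)-(9.24), consider the simple decay model of Example 9.1.2 with a single, continuously observed path on [0, T]. Then there is no missing data, the conditional expectations reduce to the observed statistics, and the update (9.24) converges in one step to the classical MLE

ĉ = R_{1,[0,T]} / F_{1,[0,T]} = (number of decays in [0, T]) / ∫₀ᵀ X(s) ds.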

9.4 The Forward-Reverse Monte Carlo EM Algorithm for SRN

In this section, we present a two-phase algorithm for estimating the parameter θ. Phase I is deterministic, while phase II is stochastic. We consider the data, D, as given by (9.1). The main goal of this section is to provide a Monte Carlo version of formula (9.24).

9.4.1 Phase I: Using Approximating ODEs

The objective of phase I is to address the key problem of finding a suitable initial point θ_II^{(0)} to reduce the variance (or the computational work) of phase II, thereby increasing (in some cases dramatically) the number of SRN-bridges obtained from the sampled forward-reverse trajectories for all time intervals.
Let us now describe phase I. From the user-selected seed, θ_I^{(0)}, we solve the following deterministic optimization problem using some appropriate numerical iterative method:

θ_II^{(0)} := arg min_{θ ≥ 0} Σ_k w_k ‖ Z̃^(f)(t*_k; θ) − Z̃^(b)(t*_k; θ) ‖².   (9.25)

Here, Z̃^(f) is the ODE approximation, defined by (9.5), on the interval [s_k, t*_k], to the SRN defined by the reaction channels ((ν_j, a_j))_{j=1}^{J} and the initial condition x(s_k); and Z̃^(r) is the ODE approximation, on the interval [t*_k, t_k], to the SRN defined by the reaction channels ((−ν_j, ã_j))_{j=1}^{J} and the initial condition x(t_k). Let us recall that in Section 9.2.2, ã_j(x) was defined as a_j(x − ν_j). We define Z̃^(b)(u; θ) := Z̃^(r)(t*_k + t_k − u; θ) for u ∈ [t*_k, t_k]. Further, w_k := (t_k − s_k)⁻¹, and ‖·‖ is the Euclidean norm in R^d. The rationale behind this particular choice of the weight factors is the mitigation of the effect of very large time intervals, where the evolution of the process X may be more uncertain. A better (but more costly) measure would be the inverse of the maximal variance of the SRN-bridge.

Remark 9.4.1 (Alternative definition of θ_II^{(0)}). In some cases, convergence issues arise when solving problem (9.25). We found it useful to solve a set of simpler problems whose answers can be combined to provide a reasonable seed for phase II: more precisely, we solve K deterministic optimization problems, one for each time interval [s_k, t_k],

θ_k := arg min_{θ ≥ 0} ‖ Z̃^(f)(t*_k; θ) − Z̃^(b)(t*_k; θ) ‖,

all of them solved iteratively with the same seed, θ_I^{(0)}. Then, we define

θ_II^{(0)} := ( Σ_k w_k θ_k ) / ( Σ_k w_k ).   (9.26)
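A minimal sketch of the per-interval problem of Remark 9.4.1, assuming mass-action propensities a_j(x) = θ_j g_j(x); the function names and the scipy-based setup are ours, for illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def phase1_interval(sk, tk, xk, yk, nu, g, theta0):
    """theta_k of Remark 9.4.1 for one interval [sk, tk]: minimize the distance
    at t*_k between the forward mean-field ODE started at xk = x(sk) and the
    reverse one started at yk = x(tk). g(x) returns the J monomials g_j(x)."""
    tstar = 0.5 * (sk + tk)               # midpoint intermediate time

    def g_shift(z):
        # monomials of the reverse propensities, a_tilde_j(z) = theta_j g_j(z - nu_j)
        return np.array([g(z - nu[:, j])[j] for j in range(nu.shape[1])])

    def mismatch(theta):
        # forward ODE (9.5) on [sk, t*]
        zf = solve_ivp(lambda t, z: nu @ (theta * g(z)), (sk, tstar), xk).y[:, -1]
        # reverse ODE for Y on [t*, tk] started at yk; by the time flip of
        # Section 9.4.1, its endpoint equals Z_b(t*_k)
        zb = solve_ivp(lambda t, z: -nu @ (theta * g_shift(z)), (tstar, tk), yk).y[:, -1]
        return np.sum((zf - zb) ** 2)

    return minimize(mismatch, theta0, bounds=[(0.0, None)] * len(theta0)).x
```

The weighted combination (9.26) of the returned θ_k then gives the phase II seed.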

9.4.2 Phase II: The Monte Carlo EM

In our statistical estimation approach, the Monte Carlo EM algorithm uses data (pseudo-data) generated by those forward and backward simulated paths that result in SRN-bridges, either exact or approximate. In Figure 9.1, we illustrate this idea for the wear-example data presented in Section 9.6.2. Phase II implements the Monte Carlo EM algorithm for SRNs.

Figure 9.1: Left: Illustration of the forward-reverse path simulation in phase II. The plot corresponds to a given interval for the wear data presented in Section 9.6.2. The observed values are marked with a black circle (beginning and end of the interval). On the y-axis we plot the thickness process X(t), derived from the wear process of the cylinder liner. Observe that every forward path that ends up at a certain value will be joined with every backward path that ends up at the same value, when using the Kronecker kernel. For example, this happens at the value 58, where several forward paths end and several backward paths start. Right: Zoom near the value 58.

Simulating Forward and Backward Paths

This phase starts with the simulation of forward and backward paths in each time interval I_k, for k = 1, ..., K. More specifically, given an estimate of the true parameter θ, say θ̂ = (ĉ₁, ĉ₂, ..., ĉ_J), the first step is to simulate M_k forward paths with reaction channels (ν_j, ĉ_j g_j(x))_{j=1}^{J} in [s_k, t*_k], all of them starting at s_k from x(s_k) (see Section 9.5.1 for details about the selection of M_k). Then, we simulate M_k backward paths with reaction channels (−ν_j, ĉ_j g_j(x − ν_j))_{j=1}^{J} in [t*_k, t_k], all of them starting at t_k from x(t_k). Let (X̃^(f)(t*_k, ω̃_m))_{m=1}^{M_k} and (X̃^(b)(t*_k, ω̃_{m'}))_{m'=1}^{M_k} denote the values of the simulated forward and backward paths at the time t*_k, respectively. If the intersection of these two sets of points is nonempty, then there exist at least one m and one m' such that the forward and backward paths can be linked as one SRN-path connecting the data values x(s_k) and x(t_k).
When the number of simulated paths M_k is large enough, and an appropriate guess of the parameter θ is used to generate those paths, then, due to the discrete nature of our state space Z^d_+, we expect to generate a sufficiently large number of exact SRN-bridges to perform statistical inference. However, at early stages of the Monte Carlo EM algorithm, our approximations of the unknown parameter θ are not expected to provide a large number of exact SRN-bridges. In such a case, we can use kernels to relax the notion of an exact SRN-bridge (see Section 9.2.3). Notice that in the case of exact SRN-bridges, we are implicitly using a Kronecker kernel in formula (9.16), that is, κ takes the value 1 when X̃^(f)(t*_k, ω̃_m) = X̃^(b)(t*_k, ω̃_{m'}) and 0 otherwise. We can relax this condition to obtain approximate SRN-bridges.
To make a computationally efficient use of kernels, we sometimes transform the endpoints of the forward and backward paths generated in the interval I_k,

X_k := (X̃^(f)(t*_k, ω̃₁), ..., X̃^(f)(t*_k, ω̃_{M_k}), X̃^(b)(t*_k, ω̃_{M_k+1}), ..., X̃^(b)(t*_k, ω̃_{2M_k})),   (9.27)

into

H(X_k) := (Ỹ^(f)(t*_k, ω̃₁), ..., Ỹ^(f)(t*_k, ω̃_{M_k}), Ỹ^(b)(t*_k, ω̃_{M_k+1}), ..., Ỹ^(b)(t*_k, ω̃_{2M_k})),   (9.28)

by a linear transformation H, with the aim of eliminating possibly high correlations in the components of X_k. The original cloud of points X_k, consisting of the endpoints of the forward and backward paths, is then transformed into H(X_k), which hopefully has a covariance matrix close to a multiple of the d-dimensional identity matrix, αI_d. Ideally, the coefficient α should be chosen in such a way that each d-dimensional unit cube centered at Ỹ^(f)(t*_k, ω̃_m) contains on average one element of ∪_{m'} {Ỹ^(b)(t*_k, ω̃_{m'})}. Note that this transformation changes (generally slightly) the variances of our estimators (see Section 9.5.3 for details about the selection of α and H).
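A brief sketch (ours; the eigenvalue-based construction is one natural choice of H, not necessarily the one used in Section 9.5.3) of such a decorrelating linear map built from the empirical covariance of the cloud X_k:

```python
import numpy as np

def whitening_map(cloud, alpha=1.0):
    """Linear map H such that the transformed cloud has covariance ~ alpha * I_d.
    cloud: (n, d) array of forward and backward endpoints at t*_k."""
    cov = np.atleast_2d(np.cov(cloud, rowvar=False))
    w, V = np.linalg.eigh(cov)                    # cov = V diag(w) V^T
    H = np.sqrt(alpha) * V @ np.diag(1.0 / np.sqrt(np.maximum(w, 1e-12))) @ V.T
    return lambda x: x @ H.T

# Usage: Y = whitening_map(Xk, alpha)(Xk); the kernel arguments (9.30) are then
# differences of transformed forward and backward endpoints.
```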
In our numerical examples, we use the Epanechnikov kernel

κ(η) := (3/4)^d Π_{i=1}^{d} (1 − η_i²) 1_{|η_i| ≤ 1},   (9.29)

where η is defined as

η ≡ η_k(m, m') := Ỹ^(f)(t*_k, ω̃_m) − Ỹ^(b)(t*_k, ω̃_{m'}).   (9.30)

This choice is motivated by the way in which we compute η_k(m, m'), avoiding whenever possible making M_k² calculations. The support of κ is perfectly adapted to our strategy of dividing R^d into unit cubes with vertices in Z^d.

Kernel-weighted Averages for the Monte Carlo EM

As we previously mentioned, the only available data in the interval I_k correspond to the observed values of the process X at its endpoints. Therefore, the expected values E_{θ^(p)}[R_{j,I_k} | D] and E_{θ^(p)}[F_{j,I_k} | D] in formula (9.24) must be approximated by SRN-bridge simulation. To this end, we generate a set of M_k forward paths in the interval I_k using θ̂_II^{(p)} as the current guess for the unknown parameter θ^(p). Having generated those paths, we record R^(f)_{j,I_k}(ω̃_m) and F^(f)_{j,I_k}(ω̃_m) for all j = 1, 2, ..., J and m = 1, 2, ..., M_k, as defined in Section 9.3.2. Analogously, we record R^(b)_{j,I_k}(ω̃_{m'}) and F^(b)_{j,I_k}(ω̃_{m'}) for all j = 1, 2, ..., J and m' = 1, 2, ..., M_k.
Consider the following κ-weighted averages, where κ = κ_ε for an appropriate choice of bandwidth ε, which approximate E_{θ^(p)}[R_{j,I_k} | D] and E_{θ^(p)}[F_{j,I_k} | D], respectively:

A_{θ̂_II^{(p)}}(R_{j,I_k} | D; κ) := Σ_{m,m'} ( R^(f)_{j,I_k}(ω̃_m) + R^(b)_{j,I_k}(ω̃_{m'}) ) κ(η_k(m, m')) Ψ_k(m') / Σ_{m,m'} κ(η_k(m, m')) Ψ_k(m'),   (9.31)

A_{θ̂_II^{(p)}}(F_{j,I_k} | D; κ) := Σ_{m,m'} ( F^(f)_{j,I_k}(ω̃_m) + F^(b)_{j,I_k}(ω̃_{m'}) ) κ(η_k(m, m')) Ψ_k(m') / Σ_{m,m'} κ(η_k(m, m')) Ψ_k(m'),

where η_k(m, m') has been defined in (9.30), m, m' = 1, 2, ..., M_k, and Ψ_k(m') := exp( ∫_{t*_k}^{t_k} c(X̃^(b)(s, ω̃_{m'})) ds ), according to Theorem 9.2.2. Observe that we generate M_k forward and reverse paths in the interval I_k, but we do not directly control the number of exact or approximate SRN-bridges that are formed. The number M_k is chosen using a coefficient-of-variation criterion, as explained in Section 9.5.1. In Section 9.5.2, we indicate an algorithm that reduces the computational complexity of computing those κ-weighted averages from O(M_k²) to O(M_k log(M_k)).
Finally, the Monte Carlo EM algorithm for this particular problem generates a stochastic sequence (θ̂_II^{(p)})_{p=1}^{+∞}, starting from the initial guess θ_II^{(0)} provided by phase I (9.25), and evolving by

ĉ_j^{(p+1)} = Σ_{k=1}^{K} A_{θ̂_II^{(p)}}(R_{j,I_k} | D; κ) / Σ_{k=1}^{K} A_{θ̂_II^{(p)}}(F_{j,I_k} | D; κ),   (9.32)

where θ̂_II^{(p)} = (ĉ₁^{(p)}, ..., ĉ_J^{(p)}). In Section 9.5.4, a stopping criterion based on techniques widely used in Markov chain Monte Carlo is applied.
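A compact sketch of one Monte Carlo EM step, assembling the κ-weighted averages (9.31) and the update (9.32). This is our own illustration: it uses the naive O(M_k²) double sum rather than the O(M_k log M_k) joining algorithm of Section 9.5.2, and the data layout is an assumed convention, not the thesis code:

```python
import numpy as np

# Epanechnikov kernel (9.29), vectorized over the last (coordinate) axis.
epanechnikov = lambda e: np.prod(0.75 * (1.0 - e**2) * (np.abs(e) <= 1.0), axis=-1)

def em_update(intervals, kernel, J):
    """One step of (9.31)-(9.32). Each element of `intervals` is a dict with
    'Rf','Ff','Rb','Fb' ((Mk, J) per-path statistics of Section 9.3.2),
    'eta' ((Mk, Mk, d) differences (9.30)) and 'psi' ((Mk,) weights Psi_k(m'))."""
    numR, numF = np.zeros(J), np.zeros(J)
    for iv in intervals:
        W = kernel(iv['eta']) * iv['psi'][None, :]   # (Mk, Mk) pair weights
        Wsum = W.sum()
        if Wsum == 0.0:                              # no (approximate) bridges found
            continue
        for j in range(J):
            Rpair = iv['Rf'][:, None, j] + iv['Rb'][None, :, j]
            Fpair = iv['Ff'][:, None, j] + iv['Fb'][None, :, j]
            numR[j] += (Rpair * W).sum() / Wsum      # A(R_{j,I_k} | D; kappa)
            numF[j] += (Fpair * W).sum() / Wsum      # A(F_{j,I_k} | D; kappa)
    return numR / numF                               # c^(p+1), formula (9.32)
```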

9.5 Computational Details

This section presents computational details omitted in Section 9.4. Here,
we explain why and how we transform the clouds X_k, formed by the endpoints of the forward
and reverse paths in the time interval I_k at the time t*_k, for k = 1, ..., K. Then, we
explain how to choose the number of simulated forward and backward paths, M_k,
in the time interval I_k, to obtain accurate estimates of the expected values of R_{j,I_k}
and F_{j,I_k} for j = 1, 2, ..., J. Next, we show how to reduce the computational cost of
computing approximate SRN-bridges from O(M_k²) to O(M_k log(M_k)) using a strategy
introduced by Bayer and Schoenmakers [21]. Finally, we indicate how to choose the
initial seeds for phase I and a stopping criterion for phase II.
9.5.1 On the Selection of the Number of Simulated Forward-Backward Paths

The selection strategy for the number of sampled forward-backward paths, M_k, for
the interval I_k, is determined by the following sampling scheme:

1. First sample M forward-reverse paths (in the numerical examples we use M = 100).

2. If the number of joined forward-reverse paths obtained with a delta kernel is less than a
certain threshold, we transform the data as described in Section 9.5.3. This
data transformation allows us to use the Epanechnikov kernel (9.29). In this
way, we are likely to obtain a larger number of joined paths.

3. We then compute, for j = 1, ..., J, the coefficient of variation of the sample mean of
R^(f)_{j,I_k} + R^(b)_{j,I_k}, the total number of times that reaction j fired in the interval I_k,
and of the sample mean of F^(f)_{j,I_k} + F^(b)_{j,I_k}, where F^(f)_{j,I_k} = ∫_{I_k} g_j(X^(f)(s)) ds and
F^(b)_{j,I_k} = ∫_{I_k} g_j(X^(b)(s)) ds. Further details can be found in Section 9.3.2. The
coefficient of variation (cv) of a random variable is defined as the ratio of its standard
deviation, σ, over its mean, µ: cv := σ/|µ|. In this case, for the reaction channel j in
the interval I_k, we have:

cv_R̄(I_k, j) = L_k^{−1/2} S( R^(f)_{j,I_k}(ω̃_m) + R^(b)_{j,I_k}(ω̃_m); L_k ) / A( R^(f)_{j,I_k}(ω̃_m) + R^(b)_{j,I_k}(ω̃_m); L_k )

and

cv_F̄(I_k, j) = L_k^{−1/2} S( F^(f)_{j,I_k}(ω̃_m) + F^(b)_{j,I_k}(ω̃_m); L_k ) / A( F^(f)_{j,I_k}(ω̃_m) + F^(b)_{j,I_k}(ω̃_m); L_k ),

where S(Y; L) := ( A(Y²; L) − A(Y; L)² )^{1/2} is the sample standard deviation of the
random variable Y over an ensemble of size L, and A(Y; L) := (1/L) Σ_{m=1}^L Y(ω_m) is
its sample average. Here L_k denotes the number of joined paths in the interval
I_k, which is bounded by M_k². In the case where L_k is small, we compute a
bootstrapped coefficient of variation.

The idea is that, by controlling both coefficients of variation, we can control the
variation of the p-th iteration estimate, θ̂^(p)_II. Our numerical experiments confirm
this fact.

4. If each coefficient of variation is less than a certain threshold, then the sampling
for the interval I_k finishes, M_k being the total number of sampled paths; we accept
the quantities computed in step 3, as well as the quantities κ(η_k(m, m')) ψ_k(m'),
m, m' = 1, ..., L_k, defined in Section 9.3.2. Otherwise, we sample additional
forward-reverse paths (increasing the number of sampled paths, M, at each
iteration) and go to step 2.

This selection procedure is implemented in Algorithm 30.
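For concreteness, the coefficient-of-variation check of step 3 can be sketched as follows (Python; function names are ours, and the bootstrapped variant is the one used when the number of joined paths is small):

import numpy as np

def cv_of_mean(samples):
    # cv of the sample mean: L^{-1/2} * S(Y; L) / |A(Y; L)|.
    L = len(samples)
    return np.std(samples) / (np.sqrt(L) * abs(np.mean(samples)))

def bootstrap_cv_of_mean(samples, B=1000, rng=np.random.default_rng(0)):
    # Bootstrapped cv of the mean, for small ensembles L_k.
    means = np.array([rng.choice(samples, size=len(samples)).mean()
                      for _ in range(B)])
    return means.std() / abs(means.mean())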

9.5.2 On the Complexity of the Path Joining Algorithm

In this section, we describe the computational complexity of Algorithm 31 for joining
paths in phase II, and show that this complexity is O(M log(M)) on average.
Let us describe the idea. First, fix a time interval I_k and a reaction channel j.
We use the following double sum as an example:

Σ_{m=1}^M Σ_{m'=1}^M ( R^(f)_{j,I_k}(ω̃_m) + R^(b)_{j,I_k}(ω̃_{m'}) ) κ_{m,m'}.

A double sum like this one appears in the numerator of (9.31). Instead of computing a
double loop, which always takes O(M²) steps (many of which contribute 0 to
the sum), we take the following alternative approach: let ×_{i=1}^d [A_i, B_i] be the smallest
hyper-rectangle with sides [A_i, B_i], i = 1, ..., d, that contains the cloud Y defined in
(9.28). Let us also assume that A_i, B_i, i = 1, ..., d, are integers. The length B_i − A_i
depends on how sparse the cloud is in its i-th dimension. Given the cloud, it is easy to
check that the values A_i, B_i, i = 1, ..., d, can be computed in O(M) operations. Now,
we subdivide the hyper-rectangle into sub-boxes of side-length 1, with sides parallel to
the coordinate axes.
Since we have a finite number of those sub-boxes, we can associate an index with
each one, in such a way that each sub-box can be retrieved directly using a suitable
data structure (for example, an efficient sparse matrix or a hash table). The average
access cost of such a structure is constant with respect to M. With each sub-box, we
associate the list of forward points that ended up in that sub-box. It is also direct to
see that the construction of such a structure takes a computational cost of O(M) steps
on average. Then, instead of evaluating the double sum, which has O(M²) terms, we
evaluate only the non-zero ones. This is because, when a kernel κ is used, κ(x, y) ≠ 0
only if x and y are situated in neighboring sub-boxes. That is,

Σ_{m=1}^M Σ_{m'=1}^M ( R^(f)_{j,I_k}(ω̃_m) + R^(b)_{j,I_k}(ω̃_{m'}) ) κ_{m,m'}
= Σ_{m'=1}^M Σ_{i=1}^{3^d} Σ_{l=1}^{n(b_i)} ( R^(f)_{j,I_k}(ω̃_{ℓ(l)}) + R^(b)_{j,I_k}(ω̃_{m'}) ) κ_{ℓ(l),m'},

where n(b_i) is the total number of forward endpoints contained in the i-th neighbor
of the sub-box to which the reverse endpoint X̃^(b)(t*_k, ω̃_{m'}) belongs, whereas ℓ(l)
indexes one of those forward endpoints. Note that the constant in this complexity
depends exponentially on the dimension (3^d).
The cost that dominates the triple sum on the right-hand side is the expected maximum
number of points that can be found in a sub-box. This size can be proved
to be O(log(M)), which makes the whole joining algorithm of order O(M log(M)).
For additional details we refer to [2].
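A minimal sketch of this bucketing strategy (Python; assuming the transformed endpoints are stored as NumPy arrays, with helper names of our own choosing) could read:

import numpy as np
from collections import defaultdict
from itertools import product

def join_endpoints(Yf, Yb, kernel):
    # Bucket forward endpoints into unit cells of Z^d (a hash table), then,
    # for each reverse endpoint, scan only the 3^d neighboring cells.
    cells = defaultdict(list)
    for m, y in enumerate(Yf):
        cells[tuple(np.floor(y).astype(int))].append(m)
    d = Yf.shape[1]
    joined = []                      # (forward index, reverse index, weight)
    for mp, y in enumerate(Yb):
        base = np.floor(y).astype(int)
        for off in product((-1, 0, 1), repeat=d):    # 3^d neighboring cells
            for m in cells.get(tuple(base + np.array(off)), []):
                w = kernel(Yf[m] - y)
                if w > 0.0:
                    joined.append((m, mp, w))
    return joined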
9.5.3 A Linear Transformation for the Epanechnikov Kernel

We have seen in our numerical experiments that the clouds formed by the endpoints of
the simulated paths, X_k, usually have a shape similar to the cloud Z shown in the left
panel of Figure 9.1.
It turns out that partitioning the space into d-dimensional cubes with sides parallel
to the coordinate axes is a far-from-optimal way to select the kernel domains and,
consequently, to find SRN-bridges. A more natural way of proceeding would be to divide
the space into a system of parallelepipeds with sides parallel to the principal directions
of the cloud Z and proportional to the lengths of its corresponding semi-axes
(we have in mind some sort of singular value decomposition here), and to use them as
the supports of our kernels.
Another way of proceeding (somehow related, but not totally equivalent) is to
transform the original cloud Z to obtain another cloud, T(Z), with near-spherical
shape, and then to scale it so as to have, on average, one point of the cloud in each d-dimensional
cube (with sides parallel to the coordinate axes). In this new cloud, H(Z), we can
naturally find neighbors using the algorithm described in Section 9.5.2 and
the Epanechnikov kernel to assign weights. For that reason, we stated in Section
9.4 that we want to transform the data X_k into an isotropic cloud such that every
unit cube centered at Ỹ^(f)(t*_k, ω̃_m) contains, on average, one point of the cloud
∪_{m'} Ỹ^(b)(t*_k, ω̃_{m'}).
We now proceed to describe the details of the mentioned transformations.
We first recall a customary procedure in statistics to motivate the transformation.
Let Σ := cov(Z) be the sample covariance matrix computed from a cloud of points
Z. To obtain a de-correlated version of Z, the linear transformation T(z) = Σ^{−1/2} z
is widely used in statistics. For example, consider a cloud Z of points obtained by
sampling 10³ independent, highly correlated bi-variate Gaussian random variables.
The corresponding cloud T(Z), depicted in the right panel of Figure 9.1, has the
aspect of a sphere of radius 3. The next step is to obtain a radius α such that the
volume of a d-dimensional sphere of radius 3α equals the volume of M unit
d-dimensional cubes. From the equation M = (3α)^d V_d, we obtain α = (1/3)(M/V_d)^{1/d},
where V_d = π^{d/2}/Γ(d/2 + 1) is the volume of the unit sphere in R^d. Therefore, the linear
transformation H is defined by H(x) := α T(x). The result of this transformation is
depicted in Figure 9.2 in our Gaussian example.

Figure 9.1: Left: a bivariate Gaussian cloud, Z. Right: its corresponding decorrelated and scaled version, T(Z).

Figure 9.2: The cloud H(Z) = α T(Z).
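As an illustration, the transformation H could be assembled as follows (a minimal Python sketch, assuming Σ is positive definite; make_H is a hypothetical helper name):

import numpy as np
from math import gamma, pi

def make_H(cloud, M, c=0.0):
    # H(x) = alpha * Sigma^{-1/2} x, with alpha = (1/3) (M / V_d)^{1/d}.
    # 'cloud' is the (2M, d) array of endpoints; c > 0 gives the
    # regularized version T_c of Remark 9.5.2 below.
    d = cloud.shape[1]
    Sigma = np.cov(cloud, rowvar=False)
    Sigma += c * np.diag(np.diag(Sigma))
    w, U = np.linalg.eigh(Sigma)               # spectral decomposition
    T = U @ np.diag(w ** -0.5) @ U.T           # Sigma^{-1/2}
    Vd = pi ** (d / 2) / gamma(d / 2 + 1)      # volume of the unit ball
    alpha = (M / Vd) ** (1.0 / d) / 3.0
    return alpha * T

# Transformed cloud: cloud @ make_H(cloud, M).T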
In the general case, we do not expect to have a Gaussian-like distribution for
X_k, but this seems to be a good approximation in our numerical examples. At this
point, it is worth mentioning that, in examples with several dimensions (species), the
number of approximate SRN-bridges we obtain by using the transformation may be of
the order of M². This indicates that the bandwidth is too large and, consequently,
that the bias introduced in the estimation may be large. In these cases, we expand α by
a factor, say 1.5, until O(M) approximate bridges are formed. Generally, one or two
expansions are enough.
A motivation for the Gaussian approximation is that, for short time intervals and
in certain regimes of activity of the system, especially where the total propensity, a_0, is
high enough, a Langevin approximation of our SRN provides an Ornstein-Uhlenbeck
process, which can potentially be close in distribution to our SRN (see [26]).

Remark 9.5.1. According to the transformation H, the kernel used in our case is
approximately equal to

κ_H(z) := (1/det(H)) κ( H^{−1} z ),

where κ is the Epanechnikov kernel defined in (9.29); the approximation arises since
this formula corresponds to the continuous case and not to the lattice case.

Remark 9.5.2. We can even consider a perturbed version of T, say T_c, obtained by adding a
multiple of the diagonal matrix formed by the diagonal elements of Σ, i.e., T_c(z) = (Σ +
c·diag(Σ))^{−1/2} z, where c is a positive constant of order O(1). The linear transformation
T_c can be considered as a regularization of T that does not change the scale of the
transformation T.

9.5.4 On the Stopping Criterion

A well-known fact about the EM algorithm is that, given a starting point, it converges
to a saddle point or a local maximum of the likelihood function. Unless we know
beforehand that the likelihood function has a unique global maximum, we cannot be
sure that the output of the EM algorithm is the MLE we are looking for. The same
phenomenon occurs in the case of the Monte Carlo EM algorithm, and for that reason
Casella and Robert [4] recommend generating a set of N (usually around five)
parallel, independent Monte Carlo EM sequences starting from a set of over-dispersed
initial guesses. Usually, we do not even know the scale of the coordinates of our
unknown parameter θ = (c_1, c_2, ..., c_d). For that reason, we recommend running only
phase I of our algorithm over a set of uniformly distributed random samples
drawn from a d-dimensional hyper-rectangle ∏_{i=1}^d (0, C_i], where C_i is a reasonable,
case-dependent upper bound for each reaction-rate parameter c_i. We observed in our
numerical experiments that the result of this procedure is a number of points lying
on a low-dimensional manifold. Once this manifold is identified, N different initial
guesses are taken as over-dispersed seeds for phase II.
Note that the stochastic iterative scheme given by formula (9.32) may be easily
adapted to produce N parallel stochastic sequences where, for each i = 1, 2, ..., N,
the distribution of the random variable θ̂^(p+1)_{II,i} depends on its history, (θ̂^(k)_{II,i})_{k=1}^p, only
through its previous value, θ̂^(p)_{II,i}. In this sense, the N sequences (θ̂^(p)_{II,i})_{p=1}^{+∞} are Markov
chain Monte Carlo (MCMC) sequences [27, 4]. There is a number of convergence
assessment techniques, or convergence diagnostic tools, in the MCMC literature; in
this work, we adopt the R̂ criterion by Gelman and Rubin [28, 29], which monitors
the convergence of N parallel random sequences (ψ^(p)_i)_{p=1}^{+∞}, where i = 1, 2, ..., N.
Compute:

B := p/(N−1) Σ_{i=1}^N ( ψ̄_{·,i} − ψ̄ )², where ψ̄_{·,i} := (1/p) Σ_{k=1}^p ψ^(k)_i and ψ̄ := (1/N) Σ_{i=1}^N ψ̄_{·,i}, and

W := (1/N) Σ_{i=1}^N s_i², where s_i² := 1/(p−1) Σ_{k=1}^p ( ψ^(k)_i − ψ̄_{·,i} )².

Then define

V := (p−1)/p W + (1/p) B and R̂ := ( V/W )^{1/2}.   (9.33)
B and W are known as the between and within variances, respectively. It is expected that
R̂ (the potential scale reduction) declines to 1 as p → +∞. In our numerical experiments
we use 1.4 as a threshold. Quoting Gelman and Shirley in Chapter 6 of [30]: “At
convergence, the chains will have mixed, so that the distribution of the simulations
between and within chains will be identical, and the ratio R̂ should equal 1. If R̂
is greater than 1, this implies that the chains have not fully mixed and that further
simulation might increase the precision of inferences”.
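For reference, a direct transcription of (9.33) reads as follows (a minimal Python sketch; gelman_rubin is a hypothetical name):

import numpy as np

def gelman_rubin(chains):
    # chains: (N, p) array, one row per parallel sequence psi_i.
    N, p = chains.shape
    means = chains.mean(axis=1)                           # psi-bar_{.,i}
    B = p / (N - 1) * np.sum((means - means.mean()) ** 2)
    W = chains.var(axis=1, ddof=1).mean()                 # mean of s_i^2
    V = (p - 1) / p * W + B / p
    return np.sqrt(V / W)                                 # R-hat of (9.33)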
Once we stop iterating, after p* iterations, the individual outputs,

θ̂^(p*)_{II,1}, θ̂^(p*)_{II,2}, ..., θ̂^(p*)_{II,N},

form a small cluster. We cannot be totally sure that this cluster is near the MLE,
but at least we have some confidence in that. In such a case, we can use
the mean of that small cluster as an estimate of the MLE of our unknown parameter, θ.
Otherwise, if we have two or more clusters, or over-dispersed results, we should make
a more careful analysis.

Remark 9.5.3. The R̂ stopping criterion works if there is only one local maximum
in the basin of attraction of the algorithm. Otherwise, R̂ may not decrease to 1;
even worse, it may go to +∞. For that reason, it is advisable to monitor the
evolution of R̂. In our numerical examples, R̂ is decreasing, and we stop
the algorithm using R̂_0 = 1.4 as a threshold.

9.6 Numerical Examples

In this section, we present numerical results that show the performance of our FREM
algorithm. In phase I, we use the alternative definition of θ^(0)_{II,i} described in Remark
9.4.1. For phase II, we run N = 4 parallel sequences using 1.4 as a threshold for R̂
(as described in Section 9.5.4). As a point estimator of θ, we provide the cluster average
of the sequence θ̂^(p*)_{II,1}, θ̂^(p*)_{II,2}, ..., θ̂^(p*)_{II,N}.
For each example, we report: i) the number of iterations of phase II, p*; ii) a
table containing a) the initial points, θ^(0)_{I,i}, b) the outputs of phase I, θ^(0)_{II,i}, and c)
the outputs of phase II, θ̂^(p*)_{II,i}; and iii) a figure with all those values.
For the examples in which we generate synthetic data, we provide the seed parameter
θ_G used to generate the observations. It is important to stress that the distance
from our point estimator to θ_G depends on the number of generated observations.

9.6.1 Decay Process

We first start with a simple model, with only one species and two reaction channels,
described respectively by the stoichiometric matrix and the propensity function

ν = ( −1, −4 )ᵀ  and  a(X) = ( c₁X, c₂X·1_{X≥4} )ᵀ.

We set X₀ = 100, T = 1, and consider synthetic data observed at uniform time intervals
of size Δt = 1/16. This determines a set of 17 observations generated from a single
path, using the parameter θ_G = (3.78, 7.20). The data trajectory is shown in Figure
9.1.
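Synthetic data of this type can be produced with a standard Gillespie (SSA) simulation; the following minimal sketch (Python; parameter values as above, helper name ours) illustrates it:

import numpy as np

def ssa_decay(x0=100, c1=3.78, c2=7.20, T=1.0, seed=1):
    # Gillespie SSA for the decay model: X -> X - 1 at rate c1*X and
    # X -> X - 4 at rate c2*X*1{X >= 4}.
    rng = np.random.default_rng(seed)
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        a = np.array([c1 * x, c2 * x * (x >= 4)], dtype=float)
        a0 = a.sum()
        if a0 == 0.0:
            break                                  # absorbed at x = 0
        t += rng.exponential(1.0 / a0)             # next jump time
        if t > T:
            break
        x += (-1, -4)[rng.choice(2, p=a / a0)]     # fire one channel
        path.append((t, x))
    return path

# Observing such a path at times 0, 1/16, ..., 1 yields the 17 data points.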
For this example, we ran N = 4 FREM sequences starting at θ^(0)_{I,1} = (1, 5), θ^(0)_{I,2} = (6, 5),
θ^(0)_{I,3} = (1, 9) and θ^(0)_{I,4} = (6, 9). We obtained a cluster average of θ̂ = (3.68, 7.50), and the
algorithm took p* = 3 iterations to converge (the imposed minimum). Details can be found in Table 9.1
and Figure 9.2.
 i    θ^(0)_{I,i}    θ^(0)_{II,i}      θ̂^(p*)_{II,i}
 1    (1, 5)         (1.35, 10.67)     (3.65, 7.52)
 2    (6, 5)         (7.85, 9.11)      (3.80, 7.46)
 3    (1, 9)         (1.20, 10.71)     (3.63, 7.50)
 4    (6, 9)         (7.06, 9.30)      (3.65, 7.50)
Table 9.1: Values computed by the FREM Algorithm for the decay example.
Figure 9.1: Data trajectory for the decay example, obtained by observing the values of an SSA path at uniform time intervals of size Δt = 1/16.

Figure 9.2: FREM estimation (phases I and II) for the decay process.
Remark 9.6.1. Recall that the distance between the value θ_G used to generate the
synthetic data and the estimate θ̂ is meaningless for small data sets. The relevant
distance in this estimation problem is the one between our FREM estimate
θ̂ and the estimator θ̂_MLE obtained by maximizing the true likelihood function; the latter,
however, is not available in most cases.

9.6.2 Wear in Cylinder Liners

We now test our FREM algorithm on real data. We will show that these data
can be modeled using a decay process. The data set w = {w_i}_{i=1}^n, taken from [31],
consists of wear levels observed on n = 32 cylinder liners of eight-cylinder SULZER
engines, as measured by a caliper with a precision of δ = 0.05 mm. The data are presented
in Figure 9.3.

Figure 9.3: Data set from [31]. The data refer to cylinder liners used in ships of the Grimaldi Group.

The finite resolution of the caliper allows us to represent the set of possible
measurements using a finite lattice. We propose to model the measurements as a Markovian
pure jump process. Let X(t) be the thickness process derived from the wear
of the cylinder liners up to time t, i.e., X(t) = X₀ − W(t), where W is the wear
process and X₀ is the initial thickness. The final time of some observations is close
to T = 60,000 hours.
We model X(t) as a decay process with two reaction channels and δ = 0.05,
since a simple decay process is not enough to explain the data. The two considered
intensity-jump pairs are (a₁(x), ν₁) = (c₁x, −δ) and (a₂(x), ν₂) = (c₂x, −4δ). Here
c₁ and c₂ are coefficients with dimension (mm·hour)⁻¹.
The linear propensity functions, the value X₀ = 5 mm, and the initial values for
phase I, θ^(0)_{I,1} = (1, 1), θ^(0)_{I,2} = (10, 1), θ^(0)_{I,3} = (1, 10) and θ^(0)_{I,4} = (10, 10), are motivated by
previous studies of the same data set; see [26] for details.
In our computations, we re-scaled the original problem by setting δ = 1 and T = 1.
Our FREM estimation gave a cluster average of θ̂ = (8.9, 5.7), which corresponds
to the estimate (1.5·10⁻⁴, 0.97·10⁻⁴) in the non-scaled model. The algorithm took
p* = 93 iterations to converge. Details can be found in Table 9.2 and Figure 9.4.

Figure 9.4: FREM estimation (phases I and II) for the wear data set.
 i    θ^(0)_{I,i}    θ^(0)_{II,i}      θ̂^(p*)_{II,i}
 1    (1, 1)         (2.81, 9.90)      (8.56, 5.83)
 2    (10, 1)        (36.88, 1.58)     (9.07, 5.71)
 3    (1, 10)        (1.13, 10.31)     (8.68, 5.80)
 4    (10, 10)       (11.44, 7.79)     (9.34, 5.62)
Table 9.2: Values computed by the FREM Algorithm for the wear example.

Figure 9.5: Left: confidence band with the parameter θ̃ obtained in [26] for the wear example. Right: the confidence band obtained with the FREM algorithm.

Remark 9.6.2. In this particular example, the data set has been obtained with the
help of a caliper with finite precision. Therefore, our likelihood should also incorporate
the distribution of the measurement errors, which may be assumed Gaussian,
independent and identically distributed, with mean zero and variance equal to the
caliper's precision. We omitted this step in our analysis for the sake of simplicity and
brevity.

Remark 9.6.3. Comparing our FREM estimate in the non-scaled model, (1.5·10⁻⁴, 0.97·10⁻⁴), with the
value obtained in [26] for the same data set and the same model, θ̃ = (0.63·10⁻⁴, 1.2·10⁻⁴),
we obtained the same scale in the coefficients and a quite similar confidence
band; see Figure 9.5.
9.6.3 Birth-death Process

This model has one species and two reaction channels,

∅ → X and X → ∅, with rate constants c₁ and c₂,

described respectively by the stoichiometric matrix and the propensity function

ν = ( 1, −1 )ᵀ  and  a(X) = ( c₁, c₂X )ᵀ.

Since we are not continuously observing the paths of X, an increment of size k in
the number of particles in a time interval [t₁, t₂] may be the consequence of any
combination of n+k firings of channel 1 and n firings of channel 2 in that interval.
This fact makes the estimation of c₁ and c₂ non-trivial.
We set X₀ = 17, T = 200, and consider synthetic data observed at uniform time
intervals of size Δt = 5. This determines a set of 41 observations generated from
a single path, using the parameter θ_G = (1, 0.06). The data trajectory is shown in
Figure 9.6.
For this example, we ran N = 4 FREM sequences starting at θ^(0)_{I,1} = (0.5, 0.04), θ^(0)_{I,2} = (0.5, 0.08),
θ^(0)_{I,3} = (1.5, 0.04) and θ^(0)_{I,4} = (1.5, 0.08). Those points were chosen after a previous
exploration with phase I.
ploration with the phase I.
Our FREM estimation gave us a cluster average of θ̂ = (1.22, 0.065). The FREM
algorithm took p* = 95 iterations to converge. Details can be found in Table 9.3 and
Figure 9.7.

Figure 9.6: Data trajectory for the birth-death example, obtained by observing the values of an SSA path at uniform time intervals of size Δt = 5.

Figure 9.7: FREM estimation (phases I and II) for the birth-death process.

 i    θ^(0)_{I,i}      θ^(0)_{II,i}            θ̂^(p*)_{II,i}
 1    (0.5, 0.04)      (6.24e-01, 3.29e-02)    (1.24e+00, 6.55e-02)
 2    (0.5, 0.08)      (7.68e-01, 4.07e-02)    (1.29e+00, 6.67e-02)
 3    (1.5, 0.04)      (1.01e+00, 5.25e-02)    (1.18e+00, 6.27e-02)
 4    (1.5, 0.08)      (1.53e+00, 7.97e-02)    (1.20e+00, 6.34e-02)
Table 9.3: Values computed by the FREM Algorithm for the birth-death example.

9.6.4 SIR Epidemic Model

In this section, we consider the SIR epidemic model (susceptible-infected-removed
species), where X(t) = (S(t), I(t), R(t)) and the total population is constant, S + I + R = N
(see [32]). The importance of this example lies in the fact that it has a non-linear
propensity function and two dimensions.

This model has two reaction channels,

S + I → 2I, I → R,

described by the stoichiometric matrix and the propensity function

ν = ( −1  0
       1 −1
       0  1 )   and   a(X) = ( c₁SI, c₂I )ᵀ,

where the rows of ν correspond to the species S, I and R.

We set X₀ = (300, 5), T = 10, and consider synthetic data generated using the parameter
θ_G = (1.66, 0.44) by observing at uniform time intervals of size Δt = 1. The data
trajectory is shown in Figure 9.8.
For this example, we ran N = 4 FREM sequences starting at θ^(0)_{I,1} = (0.40, 0.05),
θ^(0)_{I,2} = (0.40, 1.00), θ^(0)_{I,3} = (3.00, 0.05) and θ^(0)_{I,4} = (3.00, 1.00). Those points were chosen
after some previous exploration with phase I.
Our FREM algorithm estimation gave us a cluster average of θ̂ = (1.65, 0.39). The
FREM algorithm took p* = 3 iterations to converge (the imposed minimum). Details can
be found in Table 9.4 and Figure 9.9.
 i    θ^(0)_{I,i}      θ^(0)_{II,i}     θ̂^(p*)_{II,i}
 1    (0.40, 0.05)     (1.50, 0.38)     (1.65, 0.39)
 2    (0.40, 1.00)     (1.50, 0.38)     (1.65, 0.39)
 3    (3.00, 0.05)     (1.50, 0.38)     (1.66, 0.39)
 4    (3.00, 1.00)     (1.50, 0.38)     (1.66, 0.39)
Table 9.4: Values computed by the FREM Algorithm for the SIR model.
Figure 9.8: Data trajectory for the SIR example, obtained by observing the values of an SSA path at uniform time intervals of size Δt = 1.
Figure 9.9: FREM estimation (phases I and II) for the SIR model. In this particular case, where the results of phase I collapse to a single point, 4 MCMC chains seem unnecessary, but the R̂ criterion needs at least 2 chains.
9.6.5 Auto-regulatory Gene Network

The following model, taken from [7], has eight reaction channels and five species. It
has been selected to test the robustness of our FREM algorithm in dealing with several
dimensions and several reactions:

DNA + P₂ → DNA–P₂ (rate constant c₁), DNA–P₂ → DNA + P₂ (c₂),
DNA → DNA + mRNA (c₃), mRNA → ∅ (c₄),
P + P → P₂ (c₅), P₂ → P + P (c₆),
mRNA → mRNA + P (c₇), P → ∅ (c₈),

described respectively by the stoichiometric matrix and the propensity function

ν = ( −1   1   0   0  −1
       1  −1   0   0   1
       0   0   1   0   0
       0   0  −1   0   0
       0   0   0  −2   1
       0   0   0   2  −1
       0   0   0   1   0
       0   0   0  −1   0 )   and   a(X) = ( c₁·DNA·P₂,
                                            c₂·(DNA–P₂),
                                            c₃·DNA,
                                            c₄·mRNA,
                                            c₅·P(P−1),
                                            c₆·P₂,
                                            c₇·mRNA,
                                            c₈·P ),

where the rows of ν correspond to the eight reactions and the columns to the species, ordered as (DNA, DNA–P₂, mRNA, P, P₂).

Quoting [7], “DNA, P, P₂, and mRNA represent DNA promoters, protein gene
products, protein dimers, and messenger RNA molecules, respectively”. As in
the cited work, we also set the initial state of the system at

X₀ = (DNA, DNA–P₂, mRNA, P, P₂) = (7, 3, 10, 10, 10),


and run the system up to the final time T = 50. Synthetic data are gathered by observing
a single trajectory, generated using θ_G = (0.1, 0.7, 0.35, 0.3, 0.1, 0.9, 0.2, 0.1), at uniform
time intervals of size Δt = 1/2. The data trajectory is shown in Figure 9.10.

Figure 9.10: Data trajectory for the auto-regulatory gene network example, obtained by observing the values of an SSA path at uniform time intervals of size Δt = 1/2.

For this example, we ran N = 2 FREM sequences starting at θ^(0)_{I,1} = 0.1·v and θ^(0)_{I,2} = 0.5·v,
respectively, where v is the vector of R⁸ with all its components equal to one. Our
FREM algorithm estimation gave us a cluster average of

θ̂ = (0.107, 0.649, 0.337, 0.319, 0.087, 0.835, 0.053, 0.025),

which is in the range of the components of θ_G and seems to be a satisfactory estimate
of θ. The FREM algorithm took p* = 169 iterations to converge, taking 2 days on our
workstation configuration: a 12-core Intel GLNXA64 architecture with MATLAB
version R2014b.

Remark 9.6.4. Observe that, in the examples where the stoichiometric vectors are
linearly dependent, the results of phase I, θ^(0)_{II,i}, i = 1, 2, 3, 4, lie on a hyperplane,
which reflects a certain amount of indifference in the coefficient estimates. This does
not happen in the SIR example, where all the phase I estimates are essentially
the same.

9.7 Conclusions

In this work, we addressed the problem of efficiently computing approximations of
expectations of functionals of bridges in the context of stochastic reaction networks,
by extending the forward-reverse technique developed by Bayer and Schoenmakers in
[2]. We showed how to apply this technique to the statistical problem of inferring the
coefficients of the propensity functions. We presented a two-phase approach,
namely the FREM algorithm, in which the first phase, based on reaction-rate ODEs, is
deterministic and intended to provide a starting point that reduces the computational
work of the second phase, which is the Monte Carlo EM algorithm proper. Our
novel algorithm for generating bridges provides a clear advantage over shooting methods
and methods based on acceptance-rejection techniques. Our work is illustrated
with numerical examples. As future work, we plan to incorporate higher-order kernels
and multilevel Monte Carlo methods into the FREM algorithm.

Acknowledgments

The research reported here was supported by King Abdullah University of Science
and Technology (KAUST). A. Moraes, R. Tempone and P. Vilanova are members of
the KAUST SRI Center for Uncertainty Quantification at the Computer, Electrical
and Mathematical Sciences & Engineering Division at King Abdullah University of
Science and Technology (KAUST).

Appendix
Algorithm 30 The F-R (forward-reverse) path generation algorithm in the MCEM
phase, for a given time interval [s, t]. Inputs: the initial sample size, M₀; the
coefficient-of-variation threshold, cv₀; the initial time, s; the final time, t; the initial
observed state, x(s); and the final observed state, x(t). Outputs: a sequence of the
number of times that each reaction channel fired in the given time interval, ((r_{j,l})_{j=1}^J)_{l=1}^L;
a sequence of forward-Euler values for the given time interval, ((u_{j,l})_{j=1}^J)_{l=1}^L; and a
sequence of kernel weights for the given time interval, ((w_{j,l})_{j=1}^J)_{l=1}^L. Notes: here V_d
is the volume of the d-dimensional unit sphere, X̃^(f)_{·,·,n} is the sampled forward process
value at time t_n, X̃^(b)_{·,·,n'} is the sampled reverse process value at time t_{n'}, κ_δ is the Kronecker
delta kernel, κ_e is the Epanechnikov kernel, and L is the number of joined F-R paths
in the time interval [s, t], where 0 ≤ L ≤ M̃². Finally, 0 < γ < 1 and C_L is an integer
greater than 1 (in our examples we use 2).
1:  M̃ ← 1
2:  M ← M₀
3:  t* ← (s + t)/2
4:  while cv ≥ cv₀ do
5:    for m = M̃ to M̃ + M − 1 do
6:      ((X̃^(f)_{·,m,n}, t^(f)_{m,n})_{n=1}^{N(m)}, (r^(f)_{j,m})_{j=1}^J) ← FW path from s to t* starting at x(s)
7:      u^(f)_{j,m} ← Σ_n (t^(f)_{m,n+1} − t^(f)_{m,n}) g_j(X̃^(f)_{·,m,n})
8:      ((X̃^(b)_{·,m,n'}, t^(b)_{m,n'})_{n'=1}^{N'(m)}, (r^(b)_{j,m})_{j=1}^J) ← RV path from t to t* starting at x(t)
9:      u^(b)_{j,m} ← Σ_{n'} (t^(b)_{m,n'+1} − t^(b)_{m,n'}) g_j(X̃^(b)_{·,m,n'+1})
10:   end for
11:   (u_{·,l}, r_{·,l}, w_{·,l})_{l=1}^L ← join F-R paths (X̃^(f,b)_{·,·}(t*), (r^(f,b)_{j,·})_{j=1}^J, (α^(f,b)_{j,·})_{j=1}^J, κ_δ)
12:     Here, α_{j,l} = α^(f)_{j,m} + α^(b)_{j,m'} s.t. m, m' ∈ {1, 2, ..., M̃} and
13:     κ_δ(X̃^(f)_{·,m}(t*), X̃^(b)_{·,m'}(t*)) > 0. Similarly for r_{j,l}.
14:   if L < ⌈γM̃⌉ then
15:     Σ ← covariance matrix of (X̃^(f)_{·,m}(t*), X̃^(b)_{·,m}(t*))
16:     Σ ← Σ + c·diag(Σ), where c is a positive constant
17:     if Σ^{−1/2} is not singular then
18:       H ← (1/3) Σ^{−1/2} (M̃/V_d)^{1/d}
19:       ζ ← 1
20:       repeat
21:         Ỹ^(f)_{·,m}(t*) ← ζ H X̃^(f)_{·,m}(t*)
22:         Ỹ^(b)_{·,m}(t*) ← ζ H X̃^(b)_{·,m}(t*)
23:         (u_{·,l}, r_{·,l}, w_{·,l})_{l=1}^L ← join F-R paths (Ỹ^(f,b)_{·,·}(t*), (r^(f,b)_{j,·})_{j=1}^J, (α^(f,b)_{j,·})_{j=1}^J, κ_e)
24:         ζ ← 1.5ζ
25:       until L ≤ C_L M̃
26:     end if
27:   end if
28:   compute the coefficient of variation of (u_{·,l})_{l=1}^L and (r_{·,l})_{l=1}^L (see Section 9.5.1)
29:   M̃ ← M̃ + M
30:   M ← 2M
31: end while

Algorithm 31 The F-R path join algorithm in the MCEM. Inputs: a sequence of
forward-backward samples for the time interval [s, t], evaluated at the intermediate
time t*, X̃^(f,b)_{·,·}(t*); a sequence of the number of times that each reaction channel fired
in the forward interval [s, t*] and in the reverse interval [t*, t], r^(f,b)_{·,·}; the sequence
of forward-Euler values for each reaction channel for the forward interval [s, t*] and
for the backward interval [t*, t], u^(f,b)_{·,·}; and the kernel κ. Outputs: the number of
joined paths, L; a sequence of the number of times that each reaction channel fired in
the interval [s, t], ((r_{j,l})_{j=1}^J)_{l=1}^L; the sequence of forward-Euler values for each reaction
channel for the interval [s, t], ((u_{j,l})_{j=1}^J)_{l=1}^L; and the sequence of kernel weights for
the interval [s, t], ((w_{j,l})_{j=1}^J)_{l=1}^L. Notes: S is a two-dimensional sparse matrix of size
C × M̃.
1:  L ← 0
2:  for i = 1 to d do
3:    A_i ← min_m ⌊X̃^(f,b)_{i,m}(t*)⌋
4:    B_i ← max_m ⌈X̃^(f,b)_{i,m}(t*)⌉
5:    E_i ← 1 + B_i − A_i
6:  end for
7:  for m = 1 to M̃ do
8:    p_i ← 1 + ⌈X̃^(f)_{i,m}(t*)⌉ − A_i, for i = 1, ..., d
9:    c ← convert(p, E) (converts a d-dimensional address to {1, ..., C})
10:   S_{c,n(c)+1} ← m, where n(c) is the number of elements in row c of S
11:   n(c) ← n(c) + 1
12: end for
13: for m = 1 to M̃ do
14:   (b_k)_{k=1}^{3^d} ← get the neighboring sub-boxes of X̃^(b)_{·,m}(t*), with b_k ∈ {1, ..., C}
15:   for k = 1 to 3^d do
16:     for j = 1 to n(b_k) do
17:       ℓ ← S_{b_k,j}
18:       v ← κ(X̃^(f)_{·,ℓ}(t*), X̃^(b)_{·,m}(t*))
19:       if v > 0 then
20:         L ← L + 1
21:         u_L ← u^(f)_ℓ + u^(b)_m
22:         r_L ← r^(f)_ℓ + r^(b)_m
23:         w_L ← v
24:       end if
25:     end for
26:   end for
27: end for
REFERENCES
[1] M. H. Holmes, Introduction to the foundations of applied mathematics, ser. Texts
in applied mathematics. Dordrecht, London: Springer, 2009.

[2] C. Bayer and J. Schoenmakers, “Simulation of forward-reverse stochastic representations for conditional diffusions,” Annals of Applied Probability, vol. 24, no. 5, pp. 1994–2032, October 2014.

[3] A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete


data via the EM algorithm,” Journal of the Royal Statistical Society, vol. 39
(Series B), pp. 1–38, 1977.

[4] C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in
Statistics), 2nd ed. Springer, 2005.

[5] M. Watanabe and K. Yamaguchi, The EM Algorithm and Related Statistical


Models. Marcel Dekker Inc, 10 2003.

[6] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions, 2nd ed.
Wiley-Interscience, 3 2008.

[7] B. J. Daigle, M. K. Roh, L. R. Petzold, and J. Niemi, “Accelerated maximum


likelihood parameter estimation for stochastic biochemical systems,” BMC bioin-
formatics, vol. 13, no. 1, p. 68, 2012.

[8] Y. Wang, S. Christley, E. Mjolsness, and X. Xie, “Parameter inference


for discretely observed stochastic kinetic models using stochastic gradient
descent,” BMC Systems Biology, vol. 4, no. 1, p. 99, 2010. [Online]. Available:
http://www.biomedcentral.com/1752-0509/4/99

[9] P. J. Green, “Reversible jump Markov chain Monte Carlo computation and
Bayesian model determination,” Biometrika, vol. 82, pp. 711–732, 1995.

[10] R. Boys, D. Wilkinson, and T. Kirkwood, “Bayesian inference for a discretely


observed stochastic kinetic model,” Statistics and Computing, vol. 18, no. 2, pp.
125–135, Jun. 2008.
[11] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Conver-
gence (Wiley Series in Probability and Statistics), 2nd ed. Wiley-Interscience,
9 2005.

[12] F. Klebaner, Introduction to Stochastic Calculus With Applications (2nd Edi-


tion), 2nd ed. Imperial College Press, 2005.

[13] C. Gillespie, “Moment-closure approximations for mass-action models,” IET Syst


Biol, vol. 3, no. 1, 2009.

[14] P. Smadbeck and Y. Kaznessis, “A closure scheme for chemical master equa-
tions,” Proc Natl Acad Sci USA, vol. 110, no. 35, 2013.

[15] D. T. Gillespie, “A general method for numerically simulating the stochastic


time evolution of coupled chemical reactions,” Journal of Computational Physics,
vol. 22, pp. 403–434, 1976.

[16] D. F. Anderson, “A modified next reaction method for simulating chemical sys-
tems with time dependent propensities and delays,” The Journal of Chemical
Physics, vol. 127, no. 21, p. 214107, 2007.

[17] A. Moraes, R. Tempone, and P. Vilanova, “Hybrid Chernoff tau-leap,” Multiscale Modeling & Simulation, vol. 12, no. 2, pp. 581–615, 2014.

[18] ——, “Multilevel hybrid Chernoff tau-leap,” accepted for publication in BIT Numerical Mathematics, 2015.

[19] ——, “Multilevel adaptive reaction-splitting simulation method for stochastic


reaction networks,” submitted to SIAM Journal of Scientific Computing, preprint
arXiv:1406.1989, 2014.

[20] G. N. Milstein, J. G. Schoenmakers, and V. Spokoiny, “Transition density estimation for stochastic differential equations via forward-reverse representations,” Bernoulli, vol. 10, no. 2, pp. 281–312, 2004.

[21] C. Bayer, H. Mai, and J. Schoenmakers, “Forward-reverse EM algorithm for


Markov Chains,” preprint WIAS, 2013.
[22] C. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences
(Springer Series in Synergetics). Springer, 2010.

[23] H. Risken and T. Frank, The Fokker-Planck Equation: Methods of Solution and
Applications (Springer Series in Synergetics). Springer, 1996.

[24] N. Van Kampen, Stochastic Processes in Physics and Chemistry, Third Edition
(North-Holland Personal Library), 3rd ed. North Holland, 2007.

[25] L. C. G. Rogers and D. Williams, Diffusions, Markov Processes, and Martingales. Volume 1, Foundations, ser. Cambridge Mathematical Library. Cambridge, U.K., New York: Cambridge University Press, 2000.

[26] A. Moraes, F. Ruggeri, R. Tempone, and P. Vilanova, “Multiscale modeling of


wear degradation in cylinder liners,” Multiscale Modeling & Simulation, vol. 12,
no. 1, pp. 396–409, 2014.

[27] J. Norris, Markov Chains (Cambridge Series in Statistical and Probabilistic


Mathematics). Cambridge University Press, 7 1998.

[28] A. Gelman and D. B. Rubin, “Inference from iterative simulation using multiple sequences (with discussion),” Statistical Science, vol. 7, pp. 457–511, 1992.

[29] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B.


Rubin, Bayesian Data Analysis, Third Edition (Chapman & Hall/CRC Texts in
Statistical Science), 3rd ed. Chapman and Hall/CRC, 11 2013.

[30] S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Eds., Handbook of Markov
Chain Monte Carlo (Chapman & Hall/CRC Handbooks of Modern Statistical
Methods), 1st ed. Chapman and Hall/CRC, 5 2011.

[31] M. Giorgio, M. Guida, and G. Pulcini, “An age- and state-dependent Markov
model for degradation processes,” IIE Transactions, vol. 43, no. 9, pp. 621–632,
2011.

[32] F. Brauer and C. Castillo-Chavez, Mathematical Models in Population Biology


and Epidemiology (Texts in Applied Mathematics), 2nd ed. Springer, 2011.

APPENDICES

Appendix A

Brief Review of Probability and Random Processes

In this chapter, we present a minimal set of concepts and results from probability
theory and the theory of random processes needed to understand the main results
of this thesis. A brief list of references is [1, 2, 3].

A.1 Probability Spaces

Let Ω be a nonempty set whose elements, ω ∈ Ω, are the possible outcomes of
a random experiment. Examples of random experiments are:

Example A.1.1.

1. Roll a die: Ω = {1, 2, 3, 4, 5, 6}

2. Pick a point at random in the unit interval: Ω = [0, 1]

3. Flip a coin infinitely many times: Ω = {(M_i)_{i=1}^∞ : M_i ∈ {H, T}, ∀i}
An event is a subset A ⊂ Ω. Examples are:

Example A.1.2.

1. Roll a die and obtain an odd number: A = {1, 3, 5}

2. Pick a point at random in the unit interval and obtain a number larger than
0.3: A = (0.3, 1]

3. Flip a coin infinitely many times and obtain heads in the first toss: A = {(M_i)_{i=1}^∞ : M₁ = H, M_i ∈ {H, T}, ∀i ≥ 2}

The outcome ω is not an event, but {ω} is. Let A be an event and ω₀ ∈ Ω be
the outcome of one particular realization of the considered random experiment. If
ω₀ ∈ A, we say that the event A 'has happened'; otherwise, we say that its complement,
A^c := {ω ∈ Ω : ω ∉ A}, 'has happened'.

Definition A.1.3 (Sigma-algebra of events). A nonempty family, F, of events of Ω
is called a sigma-algebra if

(i) Ω ∈ F

(ii) A ∈ F implies A^c ∈ F

(iii) (A_i)_{i=1}^∞ ⊂ F implies ∪_{i=1}^∞ A_i ∈ F

Immediate consequences are:

(a) ∅ ∈ F

(b) (A_i)_{i=1}^∞ ⊂ F implies ∩_{i=1}^∞ A_i ∈ F

(c) If A, B ∈ F, then A ∪ B ∈ F and A ∩ B ∈ F

Definition A.1.4 (Probability space). A probability space is a mathematical triplet
(Ω, F, P) such that Ω is a nonempty set of outcomes, F is a sigma-algebra of events
of Ω, and P is a set function P : F → [0, 1] satisfying the Kolmogorov axioms:

(i) P(Ω) = 1

(ii) If (A_i)_{i=1}^∞ is a collection of pairwise disjoint events of F, then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

P is called a probability measure on (Ω, F).

Immediate consequences are:

(a) P(A^c) = 1 − P(A)

(b) P(∅) = 0

(c) if A, B ∈ F are such that A ⊂ B, then P(A) ≤ P(B)

(d) if A, B ∈ F, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

(e) if A, B, C ∈ F, then P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)

Definition A.1.5 (Conditional probability). Let H ∈ F be such that P(H) > 0, i.e.,
an event of strictly positive probability. The function P_H : F → [0, 1] such that

P_H(A) := P(A ∩ H)/P(H)

defines a probability measure on (H, F), called the conditional probability given H. We use
the notation P(A | H) for P_H(A).

Definition A.1.6 (Independence of events). Let A, B ∈ F; we say that A and B
are independent if and only if P(A ∩ B) = P(A)P(B). More generally, any family
A ⊂ F is called independent if and only if, for any finite subfamily (A_i)_{i=1}^N of A, it
is true that P(∩_{i=1}^N A_i) = ∏_{i=1}^N P(A_i).

Intuitively, A is independent of B when P(A | B) = P(A).
A.2 Random Variables

Definition A.2.1 (Random variable). A function X : Ω → R is a random variable
defined on (Ω, F, P) if {X ≤ a} := {ω ∈ Ω : X(ω) ≤ a} ∈ F, ∀a ∈ R.

Definition A.2.2 (CDF). The cumulative distribution function (CDF) of the random
variable X is defined as F_X(a) = P(X ≤ a), ∀a ∈ R.

Immediate consequences are:

(a) If a < b, then F_X(a) ≤ F_X(b) (monotonic non-decreasing)

(b) F_X(a) → 0 as a → −∞

(c) F_X(a) → 1 as a → +∞

(d) F_X(a + ε) → F_X(a) as ε → 0⁺ (right-continuous)

Definition A.2.3 (Discrete random variables). A random variable X is discrete if
there exists a finite or countable set {x_i} ⊂ R such that F_X is piecewise constant with
jumps at {x_i}. In such a case, we define the probability mass function (PMF) of X
as p_X(a) := F_X(a) − F_X(a⁻) = P(X = a), ∀a ∈ R.

Observe that p_X is nonzero only at {x_i}, and Σ_i p_X(x_i) = 1.

Definition A.2.4 (Continuous random variables). A random variable X is continuous
if there exists a function f_X : R → R such that, for any a < b, we have F_X(b) − F_X(a) =
∫_a^b f_X(x) dx. In such a case, P(X = a) = 0 for any a ∈ R, and f_X is called the
probability density function (PDF) of X.

Observe that f_X ≥ 0 and ∫_{−∞}^{+∞} f_X = 1.
Definition A.2.5 (Independence of random variables). A family {X_i}_{i∈I} of random
variables, where X_i has CDF F_i, is said to be independent if, for every finite subset J ⊂ I,
we have

P( ∩_{i_k∈J} {X_{i_k} ≤ x_{i_k}} ) = ∏_{i_k∈J} F_{i_k}(x_{i_k}), ∀{x_{i_k}}_{i_k∈J}.

Definition A.2.6 (Identically distributed). A family of random variables is said to be
identically distributed if all the members of the family have the same CDF.

Definition A.2.7 (IID). A family of random variables is said to be independent and
identically distributed (iid) if it is an independent family and all the members of the family
have the same CDF.

Now, we define the expectation of a random variable. Let g : R → R be any function.

Definition A.2.8. If X is a discrete random variable, we define

E[g(X)] := Σ_i g(x_i) p_X(x_i).

Definition A.2.9. If X is a continuous random variable, we define

E[g(X)] := ∫_{−∞}^{+∞} g(x) f_X(x) dx.

Notice that there are convergence issues here, since E[X] is well defined except in
the case '∞ − ∞'.

Immediate consequences are:

(a) E[a] = a for any a ∈ R

(b) E[aX + bY] = aE[X] + bE[Y] for any random variables X and Y and any a, b ∈ R

(c) If X ≤ Y, then E[X] ≤ E[Y]

(d) If h is a convex function, then h(E[X]) ≤ E[h(X)]


Definition A.2.10 (Variance). The variance of X is defined as

Var[X] := E[(X − E[X])²].

It is easy to see that Var[X] = E[X²] − (E[X])².

Immediate consequences are:

(a) Var[a] = 0 for any a ∈ R

(b) Var[aX] = a² Var[X] for any random variable X and any a ∈ R

The following result is known as the Markov inequality:

If h : R → [0, +∞) and a > 0, then

P(h(X) ≥ a) ≤ E[h(X)]/a.

By taking h(x) = ((x − E[X])/σ(X))² and a = k², we obtain the Chebyshev inequality:

P(|X − E[X]| > kσ(X)) ≤ 1/k²,

where σ(X) := (Var[X])^{1/2} is the standard deviation of X.

Definition A.2.11 (Moments of a random variable). The random variable X has
finite moment of order r > 0 when E[|X|^r] < +∞.

Definition A.2.12 (Moment generating function (MGF)). The moment generating
function of a random variable X is defined as M_X(t) := E[exp(tX)], if ∃ε > 0 such
that E[exp(tX)] < +∞, ∀t ∈ (−ε, ε).

Immediate consequences are:

(a) If M_X(t) = M_Y(t), ∀t ∈ (−ε, ε), then F_X(t) = F_Y(t), ∀t ∈ R.

(b) If M_X converges in (−ε, ε), then (dⁿ/dtⁿ) M_X(0) = E[Xⁿ].

(c) If X and Y are independent RVs, then M_{X+Y} = M_X · M_Y.

Chernoff Bounds

Let X be a random variable with MGF M_X. Then, ∀t > 0,

P(X > a) = P(exp(tX) > exp(ta))
         ≤ E[exp(tX)]/exp(ta)   (Markov inequality)
         = exp(−ta) M_X(t)   (definition of the MGF).

Definition A.2.13 (Chernoff bound). Let X be a random variable with MGF M_X;
the Chernoff bound of X is inf_{t>0} { exp(−ta) M_X(t) }.
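As a quick worked illustration (added here for concreteness): if T ∼ E(λ), then M_T(t) = λ/(λ − t) for t < λ and, for a > 1/λ,

inf_{0<t<λ} exp(−ta) λ/(λ − t) = λa exp(1 − λa),

the infimum being attained at t* = λ − 1/a. Since the exact tail is P(T > a) = exp(−λa), the Chernoff bound captures the correct exponential decay rate.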

A.2.1 The Bernoulli random variable

Let Ω be nonempty, and let A ⊂ Ω be such that A ≠ ∅ and A ≠ Ω. Define the Bernoulli
sigma-algebra F := {A, A^c, ∅, Ω} and the random variable X(ω) = 1_A(ω), and let
p := P(X = 1) = P(A). Here 1_A(ω) is the indicator function of the set A, taking
the value 1 when ω ∈ A, and 0 otherwise.
In this case, we use the notation X ∼ Bernoulli(p). It is immediate to obtain:

(a) E[X] = p

(b) Var[X] = p(1 − p)

X can be interpreted as the outcome of a binary experiment in which ω ∈ A
means success and ω ∈ A^c failure. Bernoulli trials can be used as building blocks
to define more complex random variables, such as the binomial, the geometric and the
negative binomial random variables.
A.2.2 Binomial and Geometric random variables

X is a binomial random variable with parameters n and p, denoted X ∼ Binomial(n, p),
when X is the number of successes in a sequence of n independent Bernoulli trials,
each of which yields success with probability p. Then

P(X = k) = n!/(k!(n−k)!) p^k (1−p)^{n−k}, k = 0, 1, 2, ..., n.

Y is a geometric random variable with parameter p, denoted Y ∼ Geometric(p),
when Y is the number of independent Bernoulli trials needed to get one success.
Then

P(Y = k) = (1−p)^{k−1} p, k = 1, 2, ....

A.2.3 The Uniform random variable

Let Ω = [0, 1]. Here the sigma-algebra F is defined as the intersection of all sigma-algebras
on [0, 1] containing the intervals of [0, 1]; F is called the Borel sigma-algebra
generated by the intervals of [0, 1]. We say that U ∼ U(0, 1) if U has PDF f_U(x) =
1_{[0,1]}(x). We have that

F_U(x) = x, ∀x ∈ [0, 1].

Computer languages use pseudo-random number generators to provide samples of U.
For instance, in MATLAB the function RAND is used to this end. Uniform random
variables are used to generate many other random variables.

A.2.4 Inverse Transformation Method

Definition A.2.14. Let X be a random variable with CDF F. Define

F⁻(u) := inf{x : F(x) ≥ u}.

Theorem A.2.15 (Inverse transformation method). If U ∼ U(0, 1), then the random
variable F⁻(U) has distribution F.

Proof:
For all u ∈ [0, 1] and x ∈ F([0, 1]), we have that

F(F⁻(u)) ≥ u (since F is right-continuous), and F⁻(F(x)) ≤ x.

Therefore,

{(u, x) : F⁻(u) ≤ x} = {(u, x) : F(x) ≥ u}.

We conclude that

P(F⁻(U) ≤ x) = P(U ≤ F(x)) = F(x).
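A minimal sketch of this sampling recipe (Python; for example, for F(x) = 1 − exp(−λx), the exponential distribution of the next subsection):

import numpy as np

def sample_exponential(lam, M, seed=2):
    # Inverse transform: F^-(u) = -log(1 - u)/lam.
    u = np.random.default_rng(seed).uniform(size=M)
    return -np.log(1.0 - u) / lam

print(sample_exponential(2.0, 100_000).mean())   # close to 1/lam = 0.5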

A.2.5 The Exponential random variable

A random variable T satisfies the memoryless property if, ∀a, b > 0,

P(T > a + b | T > b) = P(T > a).

It can be shown that the only continuous and positive random variable T that
satisfies this property is the exponential random variable.
We say that T ∼ E(λ) is exponentially distributed with rate λ > 0 when

P(T > t) = exp(−λt), ∀t > 0.

In this case, we have:

(a) CDF: F_T(t) = 1 − exp(−λt) for t > 0.

(b) PDF: f_T(t) = λ exp(−λt) for t > 0.

(c) E[T] = 1/λ.

(d) Var[T] = 1/λ².

(e) If U ∼ U(0, 1), then −ln(U)/λ ∼ E(λ).

A.2.6 Minimum of independent random variables

Here we have to apply the definition of the CDF and the independence of the random
variables.
Let X₁, X₂, ..., X_N be independent random variables. We want to find the distribution
of X = min{X₁, X₂, ..., X_N}:

F_X(x) = P(X ≤ x) = 1 − P(X > x) = 1 − P(min{X₁, X₂, ..., X_N} > x)
       = 1 − P(X₁ > x, X₂ > x, ..., X_N > x)
       = 1 − P(X₁ > x) P(X₂ > x) ··· P(X_N > x)
       = 1 − (1 − F_{X₁}(x))(1 − F_{X₂}(x)) ··· (1 − F_{X_N}(x)).

It is easy to see that if T_j ∼ E(λ_j) for j = 1, 2, ..., J, then min{T₁, ..., T_J} ∼
E(λ₁ + ··· + λ_J).

A.2.7 Gaussian random variable

The random variable Y ∼ N(µ, σ²) is said to be Gaussian with parameters µ ∈ R and
σ² > 0 if, for all x ∈ R,

F_Y(x) := ∫_{−∞}^x (2πσ²)^{−1/2} exp( −(1/2)((t − µ)/σ)² ) dt.
A.3 Stochastic Processes

Definition A.3.1. Given a probability space (Ω, F, P), a stochastic process, X :=
{X_i}_{i∈I}, is an indexed family of random variables taking values in a fixed set, C.

When I = [0, +∞), we write {X_t}_{t≥0} and think of t as a time variable and of X as a
function of time that evolves randomly. More specifically, X : [0, +∞) × Ω → C, such
that X(t, ·) = X_t(·) is a random variable for any fixed t, and X(·, ω) is a function of
t for any fixed ω. In the latter case, X(·, ω) : [0, +∞) → C is the path of the process
X corresponding to the outcome ω.

Definition A.3.2 (Continuous-time Markov chain). A stochastic process X =
{X(t)}_{t≥0} taking values in C is called a continuous-time Markov chain if

P( X(t+h) = j | X(t) = i, (X(t_k) = i_k)_{k=1}^n ) = P( X(t+h) = j | X(t) = i )

for all i, j, i₁, ..., i_n ∈ C, 0 ≤ t₁, t₂, ..., t_n < t and h > 0.

This means that the future probabilistic evolution of the process X depends only on
the present state, X(t), and not on the past history of X.

A.3.1 The Poisson process

Let us consider a stochastic process N on [0, +∞) taking values in {0, 1, 2, ...}, such
that N(0) = 0 and

1. P(N(t + dt) − N(t) = 1) = λ dt + o(dt)

2. P(N(t + dt) − N(t) > 1) = o(dt)

3. for any t₁ < t₂ < s₁ < s₂, the increments N(t₂) − N(t₁) and N(s₂) − N(s₁) are
independent random variables.

Here dt is an infinitesimal and o(dt)/dt → 0 as dt → 0. It can be shown that

P(N(t + h) − N(h) = k) = P(N(t) = k) = exp(−λt)(λt)^k / k!,

for k = 0, 1, 2, ... and for any h > 0. This is equivalent to saying that N(t) ∼ Poisson(λt).
In general, X ∼ Poisson(λ) when

P(X = k) = exp(−λ) λ^k / k!

for k = 0, 1, 2, ....

A.3.2 Relation between Poisson and Exponential

Let E₁, E₂, ... be a sequence of independent and identically distributed (iid)
exponential random variables with rate λ > 0. It can be shown that, if we define
S_i = Σ_{k=1}^i E_k, then the random variable Σ_i 1_{S_i ≤ t} ∼ Poisson(λt).
If N(t) ∼ Poisson(λt), then

1. E[N(t)] = λt

2. Var[N(t)] = λt

The following result can be considered as the superposition of two independent
Poisson processes.
If M(t) ∼ Poisson(µt) is independent of N(t), then

M(t) + N(t) ∼ Poisson((µ + λ)t).

As a consequence, the inter-arrival times between two consecutive events in the
superposition of M and N are independent exponential random variables of rate µ + λ.
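A minimal sketch illustrating this superposition property (Python; function and variable names are ours):

import numpy as np

def poisson_arrivals(rate, T, rng):
    # Arrival times in [0, T]: cumulative sums of E(rate) inter-arrivals.
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)
        if t > T:
            return np.array(times)
        times.append(t)

rng = np.random.default_rng(3)
lam, mu, T = 2.0, 3.0, 1000.0
merged = np.sort(np.concatenate([poisson_arrivals(lam, T, rng),
                                 poisson_arrivals(mu, T, rng)]))
print(np.diff(merged).mean(), 1.0 / (lam + mu))   # both close to 0.2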
A.4 Convergence Concepts

In probability theory there are four different, but interrelated, concepts of convergence of
a sequence of random variables (X_n)_{n=1}^{+∞}.

Definition A.4.1 (Convergence in probability). Let (X_n)_{n=1}^{+∞} and X be random
variables defined on (Ω, F, P). The sequence (X_n)_{n=1}^{+∞} converges in probability
to the random variable X if, ∀ε > 0, lim_{n→∞} P(|X_n − X| > ε) = 0. We write
X_n →^P X.

Definition A.4.2 (Almost sure convergence). Let (X_n)_{n=1}^{+∞} and X be random
variables defined on (Ω, F, P). The sequence (X_n)_{n=1}^{+∞} converges almost
surely to the random variable X if P( lim_{n→∞} X_n = X ) = 1. We write X_n →^{a.s.} X.

Definition A.4.3 (Convergence in law). Let (X_n)_{n=1}^{+∞} and X be random variables
with CDFs (F_n)_{n=1}^{+∞} and F, respectively. The sequence (X_n)_{n=1}^{+∞} converges
in law, or in distribution, to the random variable X if lim_{n→∞} F_n(x) = F(x) at every x at
which F is continuous. We write X_n ⇒ X.

Definition A.4.4 (L^p convergence). Given a real number p ≥ 1 and random variables (X_n)_{n=1}^{+∞} and X
such that E[|X_n|^p] < +∞, ∀n, and E[|X|^p] < +∞, the sequence
(X_n)_{n=1}^{+∞} converges in L^p to the random variable X if lim_{n→∞} E[|X_n − X|^p] = 0. We
write X_n →^{L^p} X.

Theorem A.4.5 (Strong law of large numbers (SLLN)). Let (X_n)_{n=1}^{+∞} be an
iid sequence of random variables such that E[|X₁|] < +∞, and define µ := E[X₁]. We
have that

(X₁ + X₂ + ··· + X_n)/n →^{a.s.} µ.

Theorem A.4.6 (Central limit theorem (CLT)). Let (X_i)_{i=1}^{+∞} be an iid sequence of
random variables such that E[X] = µ and Var[X] = σ². Then, for all x ∈ R,

lim_{n→∞} P( (X₁ + X₂ + ··· + X_n − nµ)/(σ√n) ≤ x ) = ∫_{−∞}^x (2π)^{−1/2} exp(−t²/2) dt.

If we define Z_n := (X₁ + X₂ + ··· + X_n − nµ)/(σ√n) and Z ∼ N(0, 1), the CLT says that Z_n ⇒ Z.
This result justifies the following approximation: for large n we have that

(X₁ + X₂ + ··· + X_n)/n ≈ N(µ, σ²/n).

A.5 Branching Processes

Let Y be a random variable taking values in {0, 1, 2, ...} with probabilities {p₀, p₁, p₂, ...},
respectively. Consider the following construction, denominated a branching process,
based on a triangular array of independent random variables (Y_{n,m})_{n,m≥1} distributed as Y:

X₀ = 1
X₁ = Y_{1,1}
X₂ = Y_{2,1} + Y_{2,2} + ··· + Y_{2,X₁}
···
X_n = Y_{n,1} + Y_{n,2} + ··· + Y_{n,X_{n−1}}

Let P(s) := Σ_{i=0}^∞ p_i s^i, ∀s ∈ [0, 1], and m := E[Y].
Define π as the extinction probability, i.e., π := P(∃N : X_N = 0).
If m ≤ 1, then π = 1. If m > 1, then π < 1, and π is the unique non-negative root of P(s) = s
that is less than 1.
A.6 The Monte Carlo Method

The Monte Carlo method was created to solve the problem of integration in high
dimensions, where the usual deterministic quadrature methods fail. It is based on
the law of large numbers. Let us assume that we want to compute

I := ∫_{[0,1]^d} f(x) dx.

The idea is to sample M independent and identically distributed random vectors
X(ω₁), X(ω₂), ..., X(ω_M), such that X is uniformly distributed in [0,1]^d, and to
approximate I by

Î := (1/M) Σ_{m=1}^M f(X(ω_m)).

The random variable Î has expectation I, since the probability density function of X
is 1_{[0,1]^d}(x); its variance is M⁻¹ Var[f(X)]. If, instead of sampling the random
vectors X(ω_m), we carefully choose a deterministic sequence x₁, x₂, ..., x_M, the
average (1/M) Σ_{m=1}^M f(x_m) gives us a quasi-Monte Carlo approximation of I. Depending
on the regularity of f, we can then obtain a convergence rate proportional to M⁻¹.
To implement the Monte Carlo method, we should have at hand a random number
generator (RNG). In general, an RNG is a recipe that generates finite deterministic
sequences of numbers in the interval [0, 1] that pass a number of statistical hypothesis
tests of uniformity. We should also have ways of sampling random variables with
specific distributions from uniform random variables in [0, 1], such as the inverse transform
method (see Section A.2.4). For a general exposition of the Monte Carlo method in
statistics we refer to [4].
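A minimal sketch of the method (Python; the integrand and sample size are chosen only for illustration):

import numpy as np

def mc_integrate(f, d, M, seed=4):
    # Average f over M uniform samples in [0,1]^d.
    X = np.random.default_rng(seed).uniform(size=(M, d))
    vals = f(X)
    return vals.mean(), vals.std(ddof=1) / np.sqrt(M)  # estimate, stat. error

# Example: f(x) = prod_i cos(x_i) in d = 10; the exact value is sin(1)^10.
est, err = mc_integrate(lambda X: np.prod(np.cos(X), axis=1), d=10, M=100_000)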
A.7 Multilevel Monte Carlo

In general, it is possible to reduce the variance of the Monte Carlo estimator of I,
M⁻¹ Var[f(X)], by sampling pairs (f(X), Y), where Y is highly correlated with
f(X) and computationally cheaper to sample. The random variable Y is
named a control variate. For simplicity, in what follows, we rename f(X) as X.
To reduce the variance of the standard Monte Carlo estimator

θ̂ := (1/M) Σ_{m=1}^M X(ω_m)

of E[X], we define another estimator that uses a control variate Y, correlated with
X, where E[Y] is known. In fact, we assume that we can generate pairs (X(ω), Y(ω))
in such a way that the cost of generating Y(ω) is less than the corresponding cost of
generating X(ω):

θ̂₂ := E[Y] + (1/M) Σ_{m=1}^M (X − Y)(ω_m).

If Var[X − Y] < Var[X], we have that Var[θ̂₂] ≤ Var[θ̂].
Observe here that, if we do not know E[Y], we can use a third unbiased estimator
of E[X]:

θ̂₃ := (1/M₀) Σ_{m=1}^{M₀} Y(ω_m) + (1/M₁) Σ_{m=1}^{M₁} (X − Y)(ω_m).

Here X − Y is computed from the sampled pair (X, Y); this means that X and Y are
not independent in general and, moreover, they should be highly correlated.
Let us assume now that we have a hierarchy of L levels of approximation for
the random variable X; that is, Y^(0), Y^(1), ..., Y^(L) are random variables, possibly
obtained by discretizing some dimension of the domain of definition of X. For instance,
if X is continuously defined in [0, T], we can think of (Y^(ℓ))_{ℓ=0}^L as a hierarchy of
discretizations of X using a nested family of time-meshes with decreasing sizes. The
last reasoning can be extended to define

θ̂_L := (1/M₀) Σ_{m=1}^{M₀} Y^(0)(ω_m) + Σ_{ℓ=1}^{L+1} (1/M_ℓ) Σ_{m=1}^{M_ℓ} ( Y^(ℓ) − Y^(ℓ−1) )(ω_m),

where Y^(L+1) := X.
To compute θ̂_L, one should be able to sample from Y^(0) and from the pairs
(Y^(ℓ−1), Y^(ℓ)) for ℓ = 1, 2, ..., L+1. The expected value of θ̂_L is E[X], and Var[θ̂_L] is

Var[Y^(0)]/M₀ + Σ_{ℓ=1}^{L+1} Var[Y^(ℓ) − Y^(ℓ−1)]/M_ℓ.

In general, for the same computational work spent in computing θ̂ and θ̂_L, we find
that Var[θ̂_L] < Var[θ̂]. A review of Monte Carlo methods is given in [5].
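A minimal toy sketch (Python; the quantized levels and sample sizes are ours, chosen only for illustration, with the finest level playing the role of X):

import numpy as np

rng = np.random.default_rng(5)

def Y(l, u):
    # Level-l approximation of X = u**2: quantize u on a grid of width 2^-l.
    return (np.floor(u * 2 ** l) / 2 ** l) ** 2

def mlmc(L, M):
    # theta_L = mean(Y^0) + sum_l mean(Y^l - Y^{l-1}); each correction uses
    # the same uniforms for both levels, which is what makes it a coupling.
    est = np.mean(Y(0, rng.uniform(size=M[0])))
    for l in range(1, L + 1):
        u = rng.uniform(size=M[l])
        est += np.mean(Y(l, u) - Y(l - 1, u))
    return est

print(mlmc(L=8, M=[4000 // 2 ** l + 10 for l in range(9)]))  # ~ E[U^2] = 1/3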

REFERENCES
[1] S. Resnick, A Probability Path, 1st ed. Birkhäuser, 10 1999.

[2] E. Çinlar, Probability and Stochastics (Graduate Texts in Mathematics, Vol. 261),
1st ed. Springer, 2 2011.

[3] A. N. Shiryaev, Probability (Graduate Texts in Mathematics) (v. 95), 2nd ed.
Springer, 12 1995.

[4] C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in
Statistics), 2nd ed. Springer, 2005.

[5] M. Giles, “Multilevel Monte Carlo methods,” Monte Carlo and Quasi-Monte
Carlo Methods, pp. 79–98, 2012.

Appendix B

Articles: Published and Submitted

Simulation Algorithms:

A. Moraes, R. Tempone and P. Vilanova, “Hybrid Chernoff Tau-Leap”, SIAM Multiscale Modeling and Simulation, Vol. 12, Issue 2, (2014).

A. Moraes, R. Tempone and P. Vilanova, “Multilevel Hybrid Chernoff Tau-Leap”, accepted for publication in BIT Numerical Mathematics, (2015).

A. Moraes, R. Tempone and P. Vilanova, “Multilevel adaptive reaction-splitting


simulation method for stochastic reaction networks”, preprint arXiv:1406.1989v1,
(2014).

Statistical Inference:

A. Moraes, F. Ruggeri, R. Tempone and P. Vilanova, “Multiscale Modeling of


Wear Degradation in Cylinder Liners”, SIAM Multiscale Modeling and Simula-
tion, Vol. 12, Issue 1 (2014).

C. Bayer, A. Moraes, R. Tempone and P. Vilanova, “Forward-Reverse Rep-


resentation for Stochastic Reaction Networks with Applications to Statistical
Inference”, preprint (2015).
