Thesis by
Alvaro Moraes

Doctor of Philosophy

ABSTRACT
Epidemics have shaped, sometimes more than wars and natural disasters, demographic aspects of human populations around the world, their health habits and their economies. Ebola and the Middle East Respiratory Syndrome (MERS) are clear and current examples of potential hazards on a planetary scale.
During the spread of an epidemic disease, there are phenomena, like the sudden extinction of the epidemic, that cannot be captured by deterministic models. As a consequence, stochastic models have been proposed during the last decades. A typical forward problem in the stochastic setting is the approximation of the expected number of infected individuals one month from now. On the other hand, a typical inverse problem is, given a discretely observed set of epidemiological data, to infer the transmission rate of the epidemic or its basic reproduction number.
Markovian epidemic models are stochastic models belonging to a wide class of pure
jump processes known as Stochastic Reaction Networks (SRNs), that are intended to
describe the time evolution of interacting particle systems where one particle interacts
with the others through a finite set of reaction channels. SRNs have been mainly
developed to model biochemical reactions but they also have applications in neural
networks, virus kinetics, and dynamics of social networks, among others.
This PhD thesis is focused on novel fast simulation algorithms and statistical
inference methods for SRNs.
Our novel Multi-level Monte Carlo (MLMC) hybrid simulation algorithms provide
accurate estimates of expected values of a given observable of SRNs at a prescribed
final time. They are designed to control the global approximation error up to a
user-selected accuracy and up to a certain confidence level, and with near optimal
computational work. We also present novel dual-weighted residual expansions for fast
estimation of weak and strong errors arising from the MLMC methodology.
Regarding the statistical inference aspect, we first mention an innovative multi-scale approach, in which we introduce a deterministic, systematic way of using up-scaled likelihoods for parameter estimation, while the statistical fittings are done in the base model through the use of the Master Equation. In a different approach, we derive a new forward-reverse representation for simulating stochastic bridges between consecutive observations. This allows us to use the well-known EM Algorithm to infer the reaction rates. The forward-reverse methodology is boosted by an initial phase in which, using multi-scale approximation techniques, we provide initial values for the EM Algorithm.
ACKNOWLEDGEMENTS
Foremost, I would like to thank my supervisor, Raúl Tempone, for his outstanding scientific advice and his constant care in providing an ideal atmosphere for doing research. To Pedro Vilanova, for our long-standing collaboration and friendship.
To my collaborators Christian Bayer and Fabrizio Ruggeri. To Anders Szepessy,
Jesper Oppelstrup, Erik von Schwerin, Håkon Hoel and Georgios Zouraris, for many
interesting scientific discussions and support during my PhD studies. To the past and
present members of the KAUST Stochastic Numerics Research Group and the SRI
Uncertainty Quantification Center, especially Omar Knio, Olivier Le Maître and
Serge Prudhomme. To the members of my PhD thesis committee, for their feedback
and constructive criticism. To Carlos Castillo-Chavez, for his advice and generosity,
and to all his team in the MTBI program. To Boualem Djehiche, for sharing his
deep insights on stochastic analysis. To Petr Plecháč, for hosting me twice at Oak
Ridge National Lab. To Peter Glynn and Gerardo Rubino, for generously sharing
their vision about critical parts of my PhD research.
To my mentors and teachers in probability and statistics, Enrique Cabaña, Marco
Scavino, Gonzalo Perera and Ernesto Mordecki. To my friends and colleagues at the University of the Republic, Jorge Graneri, Gustavo Guerberoff, Franco Robledo and Claudio Risso, who always encouraged me to pursue my dreams.
To Leticia García and Amal El Euch for their extraordinary efficiency, care and constant help with administrative and logistical issues.
Last but not least, this PhD thesis is dedicated to my wife Estela and my children
Bruno and Juana, for their unconditional patience and love.
TABLE OF CONTENTS
Abstract 3
Acknowledgements 5
List of Abbreviations 13
List of Figures 14
List of Tables 17
I Introductory Chapters 18
References 63
3 Overview of Articles 66
3.1 Overview of Article I . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.1.1 The Cherno↵ Tau-Leap Method . . . . . . . . . . . . . . . . . 67
3.1.2 Our Hybrid Switching Rule . . . . . . . . . . . . . . . . . . . 68
3.1.3 Global Error Control . . . . . . . . . . . . . . . . . . . . . . . 71
3.1.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.2 Overview of Article II . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.2.1 Coupling Two Hybrid Paths . . . . . . . . . . . . . . . . . . . 74
3.2.2 Global Error Control . . . . . . . . . . . . . . . . . . . . . . . 76
3.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.3 Overview of Article III . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.3.1 Optimal-work Splitting Rule . . . . . . . . . . . . . . . . . . . 79
3.3.2 Coupling Two Mixed Paths . . . . . . . . . . . . . . . . . . . 80
3.3.3 A New Control Variate . . . . . . . . . . . . . . . . . . . . . . 81
3.3.4 Global Error Control . . . . . . . . . . . . . . . . . . . . . . . 81
3.3.5 An illuminating example: A holding company . . . . . . . . . 82
3.3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.4 Overview of Article IV . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.4.1 What is wear? . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4.2 The Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.4.3 The Base Model . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.4.4 A Gaussian moment expansion . . . . . . . . . . . . . . . . . 91
3.4.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 Overview of Article V . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.5.1 A Two-phase Algorithm . . . . . . . . . . . . . . . . . . . . . 98
3.5.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
References 105
References 109
Appendices 164
Appendix 5.A Expected number of tau-leap steps of a hybrid trajectory . 164
References 166
Appendices 231
References 241
Appendices 278
References 282
References 308
Appendices 359
References 362
Appendices 365
LIST OF ABBREVIATIONS
EM Expectation Maximization
ME Master Equation
MGF Moment Generating Function
MLE Maximum Likelihood Estimator
MLMC Multi-level Monte Carlo
MNRM Modified Next Reaction Method
LIST OF FIGURES
2.1 SIR diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.2 SIR stencil . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.3 SIR 3 paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.4 SIR Mean Field and SSA paths . . . . . . . . . . . . . . . . . . . . . 52
2.5 Mean of Langevin paths for the SIR model . . . . . . . . . . . . . . . 55
2.6 SIS diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.7 SEIR diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.8 SIRDEM diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.9 SIR with Demography . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.10 Mixed paths for the SIR model . . . . . . . . . . . . . . . . . . . . . 59
2.11 SIR example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.1 Predicted work versus the estimated error bound and details for the
ensemble run of the phase II algorithm . . . . . . . . . . . . . . . . . 271
7.2 Estimated weak error, $\hat{E}_{I,\ell}$, as a function of the time mesh size, $h$ . . . 272
7.3 Percentage of the statistical error over the total error . . . . . . . . . 273
7.4 The one-step exit probability bound . . . . . . . . . . . . . . . . . . . 273
7.5 TOL versus the actual computational error . . . . . . . . . . . . . . . 274
7.6 Predicted work (runtime) versus the estimated error bound . . . . . . 276
LIST OF TABLES
2.1 Values generated by the FREM Algorithm for the SIR model. . . . . 60
9.1 Values computed by the FREM Algorithm for the decay example. . . 347
9.2 Values computed by the FREM Algorithm for the wear example. . . . 351
9.3 Values computed by the FREM Algorithm for the birth-death example. 354
9.4 Values computed by the FREM Algorithm for the SIR model. . . . . 354
Part I
Introductory Chapters
Chapter 1

1 Appendix A contains a brief review of Probability and Random Processes.
Formula (1.1),
$$P\big(X(t+dt)=x+\nu_j \,\big|\, X(t)=x\big) = a_j(x)\,dt + o(dt), \qquad j=1,\ldots,J, \tag{1.1}$$
means that the probability of observing a jump of the process, $X$, from the state $x$ to the state $x+\nu_j$, caused by the firing of the $j$-th reaction channel, $R_j$, during the infinitesimal time interval $(t,t+dt]$, is proportional to the length of the time interval, $dt$, with $a_j(x)$ as the constant of proportionality. Moreover, due to the memoryless property of Markov processes, given that $X$ is at state $x$ at time $t$, the time to the next reaction is exponentially distributed with parameter $a_0(x) := \sum_{j=1}^{J} a_j(x)$.

Every reaction channel, $R_j$, must satisfy the non-negativity assumption: if $x \in \mathbb{Z}_+^d$ but $x+\nu_j \notin \mathbb{Z}_+^d$, then $a_j(x)=0$; i.e., the system can never produce negative population values.
The jargon (e.g., "species" and "reaction channels") is taken from the theory of chemical kinetics. For instance, the stoichiometric vectors, $\nu_j$, and the propensity functions, $a_j$, are known as the transition vectors and the intensity functions, respectively, in the theory of Markov processes. Another very important feature taken from the theory of stochastic chemical kinetics is the so-called stochastic mass-action kinetics principle, which provides a mathematical model for the reaction channels, $R_j$. The stochastic mass-action kinetics principle is usually represented by a diagram like
$$\alpha_{j,1} S_1 + \cdots + \alpha_{j,d} S_d \;\xrightarrow{c_j}\; \beta_{j,1} S_1 + \cdots + \beta_{j,d} S_d, \tag{1.2}$$
implying that, when the $j$-th reaction channel, $R_j=(\nu_j, a_j(x))$, fires in the infinitesimal time interval $(t,t+dt]$, and the process $X$ is at the state $x$ at time $t$, the number of particles of the species $S_i$ changes from $x_i$ to $x_i-\alpha_{j,i}+\beta_{j,i}$. More specifically, relation (1.2) implies that, when the reaction $R_j$ takes place, $\alpha_{j,i}$ molecules of the species $S_i$ are consumed and $\beta_{j,i}$ are produced. Thus, $\alpha_{j,i}\in\mathbb{Z}_+$ and $\beta_{j,i}\in\mathbb{Z}_+$, but $\beta_{j,i}-\alpha_{j,i}$ can be a negative integer. In this case, the vectors, $\nu_j$, in equation (1.1) are defined by $\nu_j := (\beta_{j,1}-\alpha_{j,1},\ldots,\beta_{j,d}-\alpha_{j,d}) \in \mathbb{Z}^d$, and the propensities, $a_j$, appearing in the right-hand side of equation (1.1) are defined by
$$a_j(x) := c_j \prod_{i=1}^{d} \frac{x_i!}{(x_i-\alpha_{j,i})!}\,\mathbf{1}_{\{x_i \ge \alpha_{j,i}\}}. \tag{1.3}$$
The process $X$ admits the random time-change representation
$$X(t) = x_0 + \sum_{j=1}^{J} \nu_j\, Y_j\!\left(\int_0^t a_j(X(s))\, ds\right), \tag{1.4}$$
where the $Y_j$ are independent unit-rate Poisson processes.
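The propensity formula (1.3) is straightforward to evaluate. The following sketch (the function name `mass_action_propensity` and its signature are ours, for illustration only) computes $a_j(x)$ for one channel from its rate constant $c_j$ and consumption vector $\alpha_j$:

```python
from math import factorial

def mass_action_propensity(c, alpha, x):
    """Stochastic mass-action propensity (1.3):
    a(x) = c * prod_i x_i!/(x_i - alpha_i)!, with indicator 1{x_i >= alpha_i}."""
    a = c
    for xi, ai in zip(x, alpha):
        if xi < ai:                       # indicator: not enough particles
            return 0.0
        a *= factorial(xi) // factorial(xi - ai)
    return float(a)

# I -> 0 with rate constant mu = 0.5: a(i) = mu * i
print(mass_action_propensity(0.5, (1,), (10,)))   # 5.0
# 2I -> I: a(i) = c * i * (i - 1); zero probability when i < 2
print(mass_action_propensity(2.0, (2,), (4,)))    # 24.0
print(mass_action_propensity(2.0, (2,), (1,)))    # 0.0
```

Note how the indicator in (1.3) makes the propensity vanish whenever a reactant is scarce, which is exactly the non-negativity assumption stated above.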
In this section, we present a few examples of SRNs. For each example, from the chemical-kinetics diagrams, we derive the reaction channels $R_j = (\nu_j, a_j(x))$.
Example 1.2.1 (Standard Poisson Process). Consider the diagram $\emptyset \xrightarrow{1} I$, which corresponds to a process, $Y$, taking values in $\mathbb{Z}_+$, with a unique reaction channel, $R = (1, 1)$. The process evolves in time by jumps of size 1, and with constant intensity, $a(y) \equiv 1$, $y = 0,1,2,\ldots$. If $Y(0) = 0$ then $Y$ is a standard Poisson process with rate $\lambda = 1$ on the real line. In this case, we know that $P(Y(t)=y) = \exp(-t)\,\frac{t^y}{y!}$, $y=0,1,2,\ldots$, i.e., $Y(t) \sim \mathrm{Poisson}(t)$ (see Section A.3.1).
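Example 1.2.1 can be checked by simulation: summing unit-rate exponential inter-arrival times yields a standard Poisson path, and the number of arrivals in $[0,t]$ should have mean $t$. A small sketch (the function name is ours, for illustration):

```python
import random

def poisson_count(rng, t):
    """Arrivals of a standard Poisson process in [0, t]: sum unit-rate
    exponential inter-arrival times until the running sum exceeds t."""
    n, s = 0, rng.expovariate(1.0)
    while s <= t:
        n += 1
        s += rng.expovariate(1.0)
    return n

rng = random.Random(2015)
t, N = 2.0, 20000
mean = sum(poisson_count(rng, t) for _ in range(N)) / N
print(round(mean, 2))  # close to E[Y(t)] = t = 2.0
```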
Example 1.2.2 (Exponential Decay). Let us consider the diagram $I \xrightarrow{\mu} \emptyset$, which corresponds to a SRN with a unique reaction channel, $R = (-1, \mu i)$. If the state of the process $X$ at time $t$ is $X(t)=i$, the holding time to the next reaction is given by an exponential random variable with expected value $(\mu i)^{-1}$, implying that the system slows down with the passage of time. If we set $X(0)=x_0$ then, using Dynkin's formula (1.10), it can be proved that $E[X(t)] = x_0 \exp(-\mu t)$.
Example 1.2.3 (Birth and Death Processes). Let us consider a stochastic process $X$ on the set of non-negative integers, $\mathbb{Z}_+$, such that from the state $i$ we have two possible transitions: $i \to i+1$ (birth) and $i \to i-1$ (death, when $i \ge 1$).
Now, let us list a few cases of birth and death processes derived from SRNs:

1. $\emptyset \xrightarrow{\lambda} I$ and $I \xrightarrow{\mu} \emptyset$. This means that there are two reaction channels, $R_1 = (1, \lambda)$ and $R_2 = (-1, \mu i)$. The first channel, $R_1$, means that newborns are introduced in the population at the constant rate $\lambda$. The second reaction channel, $R_2$, corresponds to a death process driven by an exponential decay with rate $\mu$.

2. The diagrams $I \xrightarrow{\lambda} 2I$ and $I \xrightarrow{\mu} \emptyset$ correspond to the reaction channels $R_1 = (1, \lambda i)$ and $R_2 = (-1, \mu i)$. Here both propensities are linear functions of the state of the process, $i$.

3. The diagrams $I \xrightarrow{\lambda} 2I$ and $2I \xrightarrow{\mu} I$ correspond to the reaction channels $R_1 = (1, \lambda i)$ and $R_2 = (-1, \mu i(i-1))$. Observe that the second reaction happens with zero probability when $i < 2$.
In this context, a commonplace problem is the estimation of the expected value of a given functional, $g$, of the SRN, $X$, at a certain time $T$, i.e., $E[g(X(T))]$, within a prescribed accuracy and up to a certain confidence level.
1.3 The Master Equation

The Master Equation, also known as the Chemical Master Equation (CME), is a linear system of ODEs for $P(X(t)=x \mid X(0)=x_0)$, i.e., the probability of the process, $X$, being in the state $x$ at time $t$, having departed from the state $x_0$ at time $0$.

Since $X$ is a Markov chain, it satisfies the Chapman-Kolmogorov equation: for any $s \in (0,t)$,
$$P\big(X(t)=x \mid X(0)=x_0\big) = \sum_{y \in \mathbb{Z}_+^d} P\big(X(t)=x \mid X(s)=y\big)\, P\big(X(s)=y \mid X(0)=x_0\big). \tag{1.5}$$
Writing $p_x(t) := P(X(t)=x \mid X(0)=x_0)$, and using (1.1),
$$p_x(t+h) = \sum_{y} P\big(X(t)=y \mid X(0)=x_0\big)\, P\big(X(t+h)=x \mid X(t)=y\big)
= \sum_{j} p_{x-\nu_j}(t)\,\big(a_j(x-\nu_j)\,h + o(h)\big) + p_x(t)\Big(1 - h\sum_j a_j(x) + o(h)\Big),$$
which gives the Master Equation for SRNs (see [8, 9, 10, 11]):
$$\begin{cases}
\dfrac{dp_x(t)}{dt} = \displaystyle\lim_{h\to 0}\dfrac{p_x(t+h)-p_x(t)}{h} = \sum_j p_{x-\nu_j}(t)\,a_j(x-\nu_j) - p_x(t)\sum_j a_j(x),\\[6pt]
p_x(0) = \mathbf{1}_{\{x=x_0\}},
\end{cases} \tag{1.6}$$
which is a linear system of ODEs, $\frac{dP(t)}{dt} = A\,P(t)$, where: i) the square matrix $A$ is sparse, ii) it has the size of the state space of the process, $X$, and iii) its entries are the propensities, $a_j$, which in most cases are nonlinear functions of the states of $X$.
In principle, every question about the time evolution of a SRN can be answered by solving its corresponding Master Equation, but a closed-form solution of (1.6) is rarely available, and solving it numerically is in general too costly in terms of computational work [11]. Since numerical solutions are computationally feasible only for relatively small, non-stiff systems, in general those questions can be addressed only by Monte Carlo methods. Any Monte Carlo method applied to SRNs depends on simulating paths of the process, $X$, in the time interval $[0,T]$. Developing fast and accurate methods for simulating paths of SRNs is a major motivation of this PhD thesis.
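For a small system, (1.6) can nevertheless be integrated directly. As an illustration (ours, not from the thesis), consider the exponential-decay network of Example 1.2.2: its Master Equation on the finite state space $\{0,\ldots,x_0\}$ reads $\dot p_k = \mu(k+1)\,p_{k+1} - \mu k\, p_k$, and the exact solution is $X(t)\sim\mathrm{Binomial}(x_0, e^{-\mu t})$, against which a forward Euler sketch can be checked:

```python
import math

def death_master_equation(x0, mu, T, dt=1e-4):
    """Forward Euler on the ME (1.6) for I -> 0 with propensity a(k) = mu*k:
    dp_k/dt = mu*(k+1)*p_{k+1} - mu*k*p_k, with p_k(0) = 1{k = x0}."""
    p = [0.0] * (x0 + 1)
    p[x0] = 1.0
    for _ in range(int(round(T / dt))):
        p = [pk + dt * (mu * (k + 1) * (p[k + 1] if k < x0 else 0.0)
                        - mu * k * pk)
             for k, pk in enumerate(p)]
    return p

x0, mu, T = 10, 1.0, 0.5
p = death_master_equation(x0, mu, T)
q = math.exp(-mu * T)                 # survival probability of one particle
exact = [math.comb(x0, k) * q**k * (1 - q)**(x0 - k) for k in range(x0 + 1)]
err = max(abs(a - b) for a, b in zip(p, exact))
```

The total probability is conserved exactly by this scheme, and for this step size the pointwise error against the binomial solution stays well below $10^{-2}$; for systems with more species the matrix $A$ grows combinatorially, which is precisely why Monte Carlo methods are needed.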
1.4 The Infinitesimal Generator of a SRN

Define the strongly continuous semi-group $\{T(s)\}_{s \ge 0}$ of operators by
$$T(s)f(X(t)) := E\big[f(X(t+s)) \,\big|\, \mathcal{F}_t^X\big],$$
and the infinitesimal generator
$$L_X(f) := \lim_{t\to 0^+} \frac{1}{t}\,\{T(t)f - f\}, \tag{1.7}$$
where the domain of the differential operator, $L_X$, is the set of functions, $f$, for which the limit on the right-hand side of (1.7) exists.

Since $X$ is a Markov process, we know that conditioning on the filtration generated by the process $X$ up to the time $t$, $\mathcal{F}_t^X$, is equivalent to conditioning on the current state of the process, $X(t)=x$. Then, we have
$$\begin{aligned}
\frac{1}{h}\{T(h)f(x)-f(x)\} &= \frac{1}{h}\Big( E\big[f(X(t+h)) \mid X(t)=x\big] - f(x)\Big)\\
&= \frac{1}{h}\Big( \sum_y f(y)\, P\big(X(t+h)=y \mid X(t)=x\big) - f(x)\Big)\\
&= \frac{1}{h}\Big( \sum_j f(x+\nu_j)\big(a_j(x)h+o(h)\big) + f(x)\Big(1 - h\sum_j a_j(x) + o(h)\Big) - f(x)\Big)\\
&= \sum_j f(x+\nu_j)\Big(a_j(x)+\frac{o(h)}{h}\Big) - f(x)\Big(\sum_j a_j(x) - \frac{o(h)}{h}\Big).
\end{aligned}$$
Taking limits as $h \to 0^+$, we obtain the formula for the infinitesimal generator of $X$:
$$Lf(x) := \sum_{j=1}^{J} a_j(x)\,\big(f(x+\nu_j) - f(x)\big). \tag{1.8}$$
1.5 Dynkin's Formula for SRN

Dynkin's formula reads
$$E[f(X(t))] = f(X(0)) + \int_0^t E\big[Lf(X(s))\big]\, ds, \tag{1.9}$$
which, for the generator (1.8), becomes
$$E[f(X(t))] = f(X(0)) + \int_0^t E\Big[\sum_j a_j(X(s))\,\big(f(X(s)+\nu_j)-f(X(s))\big)\Big]\, ds. \tag{1.10}$$
Let us prove this particular case: first, multiply both sides of the Master Equation (1.6) by $f(x)$ and sum over $x \in \mathbb{Z}_+^d$. We obtain (provided all the series converge)
$$\begin{aligned}
\sum_x f(x)\,\frac{dp_x(t)}{dt} &= \sum_x f(x) \Big(\sum_j p_{x-\nu_j}(t)\,a_j(x-\nu_j) - p_x(t)\,a_j(x)\Big)\\
(*)\quad &= \sum_j \sum_x f(x)\,a_j(x-\nu_j)\,p_{x-\nu_j}(t) \;-\; \sum_j\sum_x f(x)\,a_j(x)\,p_x(t)\\
(**)\quad &= \sum_j \sum_{x'} f(x'+\nu_j)\,a_j(x')\,p_{x'}(t) \;-\; \sum_j\sum_x f(x)\,a_j(x)\,p_x(t)\\
&= \sum_j \sum_{x'} \big(f(x'+\nu_j) - f(x')\big)\,a_j(x')\,p_{x'}(t)\\
&= \sum_x p_x(t) \sum_j \big(f(x+\nu_j)-f(x)\big)\,a_j(x)\\
&= E\big[Lf(X(t))\big].
\end{aligned}$$
Notice that in $\sum_j f(x)\,a_j(x-\nu_j)\,p_{x-\nu_j}(t)$ we are keeping fixed the final state, $x$, while in $\sum_j f(x'+\nu_j)\,a_j(x')\,p_{x'}(t)$ we are keeping fixed the starting state, $x'$, so the two inner sums are not equal term by term. But notice that in $(*) = \sum_j\sum_x f(x)\,a_j(x-\nu_j)\,p_{x-\nu_j}(t)$ and $(**) = \sum_j\sum_{x'} f(x'+\nu_j)\,a_j(x')\,p_{x'}(t)$, every pair of states, $y$ and $z$, connected by one reaction channel (i.e., $y+\nu_j=z$ for some $\nu_j$) is counted once, with $a$ and $p$ computed on $y$ and $f$ computed on $z$; thus $(*)=(**)$. Therefore,
$$\frac{dE[f(X(t))]}{dt} = E\big[Lf(X(t))\big]. \tag{1.11}$$
Formula (1.10) follows by integrating both sides of (1.11) over the interval $(0,t]$.
Remark 1.5.2 (Closure techniques). One may be tempted to use Dynkin's formula to obtain systems of ODEs for $E[f(X(t))]$ or $\mathrm{Var}[f(X(t))]$ (or higher-order moments of $f(X(t))$), but in most cases this is only possible by using moment closure techniques [13, 14].
Define
$$u(t,x) := E\big[f(X(T)) \,\big|\, X(t)=x\big] = \sum_y f(y)\, P\big(X(T)=y \mid X(t)=x\big).$$
Conditioning on the first jump after time $t$,
$$P\big(X(T)=y \mid X(t)=x\big) = \sum_j P\big(X(T)=y \mid X(t+h)=x+\nu_j\big)\,\big(a_j(x)h+o(h)\big) + P\big(X(T)=y \mid X(t+h)=x\big)\Big(1 - h\sum_j a_j(x) + o(h)\Big).$$
As a consequence,
$$\frac{\partial P\big(X(T)=y \mid X(t)=x\big)}{\partial t} = -\sum_j a_j(x)\,\Big( P\big(X(T)=y \mid X(t)=x+\nu_j\big) - P\big(X(T)=y \mid X(t)=x\big)\Big).$$
Now, multiply both sides by $f(y)$ and sum over $y \in \mathbb{Z}_+^d$, to obtain
$$\begin{cases}
\dfrac{\partial u}{\partial t}(t,x) + \displaystyle\sum_j a_j(x)\,\big(u(t,x+\nu_j) - u(t,x)\big) = 0, & 0 \le t < T,\\[6pt]
u(T,x) = f(x).
\end{cases} \tag{1.12}$$
In general, it is not possible to solve in closed form the linear system of ODEs (1.12), which can be seen as the dual of the Master Equation. Numerical methods for solving (1.12) are an active area of research [15]. Its unique solution, $u$, admits a stochastic representation, $E[f(X(T)) \mid X(t)=x]$, which is a particular case of the more general Feynman-Kac formula (see [16]). In this PhD thesis we are particularly interested in estimating $u(0,x_0)$, through its stochastic representation $E[f(X(T)) \mid X(0)=x_0]$, using Monte Carlo methods. This means that we sample independent and identically distributed approximate paths of $X$ departing from $x_0$ at time zero, evaluate $f$ at the final time, $T$, and consider empirical averages of those values.
1.7 Approximations to SRNs

In this section, we consider two types of approximations to our SRN models. Regarding the scale at which we describe some natural and artificial phenomena, a SRN model corresponds to the microscopic scale, while its reaction-rate ODE and Langevin-Itô diffusion approximations correspond to its macroscopic and mesoscopic scales, respectively [17, 18].
Let $L_X$ be the generator of a SRN, $X$, defined through its reaction channels, $R_j = (\nu_j, a_j)$, $j=1,2,\ldots,J$, and with initial state $x_0$:
$$L_X f(x) = \sum_j a_j(x)\,\big(f(x+\nu_j) - f(x)\big).$$
A formal Taylor expansion of $f$ of order one, at the state $x$, gives
$$L_Z f(x) = \sum_j a_j(x)\, f'(x)\,\nu_j,$$
which is the generator of the solution, $Z$, of the reaction-rate ODEs
$$\begin{cases} \dfrac{dZ(t)}{dt} = \nu\, a(Z(t)), & t \in \mathbb{R}_+,\\[4pt] Z(0)=x_0, \end{cases} \tag{1.13}$$
where the $j$-th column of the matrix $\nu$ is $\nu_j$, and $a$ is a column vector with components $a_j$.
To see this, define the strongly continuous semi-group $\{T(t)\}_{t \ge 0}$ of operators by
$$T(s)f(z) := f\big(Z(t+s;\, Z(t)=z)\big),$$
that is, the value of the solution $Z$ of (1.13) at time $t+s$, knowing that $Z(t)=z$. Consider the quotient
$$\begin{aligned}
\frac{1}{h}\{T(h)f(z)-f(z)\} &= \frac{1}{h}\Big(f\big(Z(t+h;\, Z(t)=z)\big) - f(z)\Big)\\
&= \frac{1}{h}\Big(f(z) + f'(z)\,\frac{dZ(t)}{dt}\,h + o(h) - f(z)\Big)\\
&= f'(z)\,\frac{dZ(t)}{dt} + \frac{o(h)}{h}.
\end{aligned}$$
Then, taking limits as $h\to 0^+$, we obtain
$$L_Z f(z) = f'(z)\,\frac{dZ(t)}{dt} = f'(z)\,\nu\, a(z) = f'(z) \sum_j \nu_j\, a_j(z) = \sum_j a_j(z)\, f'(z)\,\nu_j.$$
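As a small numerical illustration (ours, not from the thesis), the reaction-rate ODE (1.13) for the decay network $I \xrightarrow{\mu} \emptyset$ is $dZ/dt = -\mu Z$, whose solution matches the mean found in Example 1.2.2. The function and channel encoding below are our own conventions:

```python
import math

def reaction_rate_ode(x0, channels, T, dt=1e-4):
    """Forward Euler for dZ/dt = nu * a(Z), the reaction-rate ODEs (1.13).
    channels: list of (nu, a), with nu a state-change vector and a(z) a rate."""
    z = list(map(float, x0))
    for _ in range(int(round(T / dt))):
        drift = [sum(a(z) * nu[i] for nu, a in channels) for i in range(len(z))]
        z = [zi + dt * di for zi, di in zip(z, drift)]
    return z

mu = 2.0
z = reaction_rate_ode([100.0], [((-1,), lambda z: mu * z[0])], T=1.0)
exact = 100.0 * math.exp(-mu * 1.0)   # mean-field solution x0 * exp(-mu*t)
```

For this step size the Euler solution agrees with $x_0 e^{-\mu t}$ to about three decimal places, consistent with Remark 1.7.1 below: for affine propensities the reaction-rate ODE reproduces the mean of the SRN exactly.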
Remark 1.7.1 (Relation with Dynkin's formula). In the affine case, i.e., when all the propensities, $a_j(x)$, as well as $f$, are affine functions of $x$, the reaction-rate ODEs coincide with the differential equation for $E[X(t)]$ obtained by Dynkin's formula (see Example 1.5.1).
In the 1970s, T. G. Kurtz [1, 2] proved versions of the law of large numbers and the central limit theorem relating SRNs and the associated reaction-rate ODEs (1.13). These results are intended for density-dependent SRNs, i.e., when $a_j(x)$ can be written as $n\,\tilde a_j(x/n)$. The parameter $n$ is a scaling parameter that could be, for instance, the initial number of particles in the system. Kurtz considered the limit of a scaled family of SRNs indexed by $n$, where the sequence of initial states, $x_{0,n}$, is such that $x_{0,n}/n \to x_0$ as $n \to +\infty$. This means that, at least initially, all the species of our system are in abundance. For that reason, the propensity functions obey the power law known as the deterministic mass-action kinetics principle, i.e., $a_j^*(x) := c_j \prod_{i=1}^{d} x_i^{\alpha_{j,i}} \approx a_j(x)$ for large $n$. In this PhD thesis, we are not particularly interested in asymptotic results; on the contrary, we are interested in the cases where one or more species (but not all) are scarce.
Now, let us consider a generator obtained by a formal Taylor expansion of $f$ of order two, at the state $x$:
$$L_Y f(x) = \sum_j a_j(x)\Big( f'(x)\,\nu_j + \frac{1}{2}\,\nu_j^T f''(x)\,\nu_j \Big),$$
which is the generator of an Itô diffusion, $Y$, satisfying the Langevin stochastic differential equation (SDE) (see [18]):
$$\begin{cases}
dY(t) = \nu\, a(Y(t))\, dt + \nu\, \mathrm{diag}\big(\sqrt{a(Y(t))}\,\big)\, dW(t), & t \in \mathbb{R}_+,\\[4pt]
Y(0) = x_0.
\end{cases} \tag{1.14}$$
Exact algorithms simulate paths of $X$ that satisfy the defining probabilities (1.1) and, consequently, the Master Equation (1.6). Hence, a path simulated by an exact method has the correct statistical distribution and, as a consequence, only the sampling error (Monte Carlo error, or statistical error) is relevant when estimating quantities of interest like $E[g(X(T))]$.

In [20], Gillespie popularized, for simulating chemical reactions, the algorithm known as the SSA (also called next reaction, kinetic Monte Carlo, or the Feller-Doob algorithm):

1. Set $x \leftarrow x_0$ and $t \leftarrow 0$.
2. In state $x$ at time $t$, compute $(a_j(x))_{j=1}^J$ and the sum $a_0(x) = \sum_j a_j(x)$.
3. Sample the holding time, $\tau$, as an exponential random variable with rate $a_0(x)$.
4. Sample the index, $j$, of the next reaction to fire, with probability $a_j(x)/a_0(x)$.
5. Update $t \leftarrow t+\tau$ and $x \leftarrow x+\nu_j$.
6. Record $(t, x)$. Return to step 2 if $t < T$; otherwise end the simulation.
It is based on the idea that, obeying (1.1), the probability that only the $j$-th reaction fires in $(t, t+h)$ is $a_j(x)\exp(-a_0(x)h)$, where $x = X(t)$ and $a_0(x) = \sum_j a_j(x)$. This last expression can be written as the product $\frac{a_j(x)}{a_0(x)} \times a_0(x)\exp(-a_0(x)h)$, so it can be sampled as the product of two independent random variables: the first factor, $\frac{a_j(x)}{a_0(x)}$, is the probability of choosing the $j$-th reaction proportionally to its current propensity; the second factor is the density of an exponential random variable with rate $a_0(x)$. Summarizing, the SSA can be deduced as follows: since $X$ is a continuous-time Markov chain, we have that a) the holding times (times between two consecutive reactions, or inter-arrival times) are exponentially distributed, with rate equal to the sum of the propensities evaluated at the current state, $x$; and b) the reaction to occur next should be chosen at random such that the probability of choosing reaction $i$ is proportional to $a_i(x)$.
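The steps above can be sketched in a few lines. This is a generic illustration (our own naming, not the thesis' code), applied here to the pure-death network of Example 1.2.2:

```python
import random

def ssa_path(x0, channels, T, seed=0):
    """Gillespie SSA. channels: list of (nu, a) pairs, where nu is the
    state-change vector and a(x) the propensity function."""
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    path = [(t, tuple(x))]
    while True:
        props = [a(x) for _, a in channels]
        a0 = sum(props)
        if a0 == 0.0:                      # absorbing state: nothing can fire
            break
        t += rng.expovariate(a0)           # step 3: holding time ~ Exp(a0)
        if t >= T:
            break
        u, acc = rng.random() * a0, 0.0    # step 4: pick j with prob a_j/a0
        for j, pj in enumerate(props):
            acc += pj
            if u < acc:
                break
        x = [xi + ni for xi, ni in zip(x, channels[j][0])]   # step 5
        path.append((t, tuple(x)))         # step 6
    return path

# decay I -> 0 with mu = 1: every jump decreases the population by one
path = ssa_path([20], [((-1,), lambda x: float(x[0]))], T=10.0, seed=42)
states = [s[0] for _, s in path]
```

For the decay channel the path is, by construction, non-increasing in unit steps and stops once the population hits the absorbing state $0$.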
The MNRM is based on the random time-change representation (1.4) and on the Next Reaction Method (NRM) [21]. It is an exact simulation algorithm like Gillespie's SSA, but it needs only one exponential random variable per iteration. The reaction times are modeled as firing times of Poisson processes, $Y_j$, with internal times given by the integrated propensity functions. The MNRM can be easily modified to generate paths in cases where the rate functions depend on time, and also when there are reactions delayed in time. The MNRM is also used in [22] to couple exact and approximate steps in the multilevel Monte Carlo setting. A careful description of the MNRM, due to D. Anderson [23], is given in Section 7.1.2.
At first glance, there is no need for approximate algorithms, since we have simple and easy-to-implement exact algorithms at hand (e.g., the SSA and the MNRM). Still, observe that the expected holding time is $(a_0(x))^{-1}$, which can be extremely short in some regions of the state space, making simulations computationally very costly. Large values of $a_0(x)$ can be caused, for instance, by a few reaction channels with high propensities at $x$ (channels with high activity) while the others have small propensity values. In this case, if there is variability caused by the reaction channels with smaller propensities, then we cannot rely on the reaction-rate ODE approximation.
Tau-Leap Methods

Starting from the random time-change representation (1.4), where the $Y_j$ are independent unit-rate Poisson processes, we obtain the explicit tau-leap method [24] by approximating the integrals, $\int_0^t a_j(X(s))\,ds$, by forward Euler sums on a partition, $\{s_0, s_1, \ldots, s_N\}$, of the interval $[0,t]$, resulting in
$$\bar X(t) = x_0 + \sum_{j=1}^{J} Y_j\!\Big(\sum_{n=0}^{N-1} a_j(\bar X(s_n))\,(s_{n+1}-s_n)\Big)\,\nu_j.$$
In this case, $N_j = Y_j\big(\sum_{n=0}^{N-1} a_j(\bar X(s_n))(s_{n+1}-s_n)\big)$. Since each $Y_j$ is a path of a unit-rate Poisson process, to simulate $\bar X(t+\tau)$ conditional on having observed $\bar X$ until time $t$, we can split $Y_j\big(\sum_{n=0}^{N-1} a_j(\bar X(s_n))(s_{n+1}-s_n) + a_j(\bar X(t))\,\tau\big)$ into two terms: i) $Y_j\big(\sum_{n=0}^{N-1} a_j(\bar X(s_n))(s_{n+1}-s_n)\big)$, which is known because it depends on $\bar X$ up to the time $t$, and ii) a Poisson random variable, $P_j(a_j(\bar X(t))\,\tau)$.²

² A Poisson random variable with rate $\lambda_1+\lambda_2$ can be decomposed as the sum of two independent Poisson random variables with respective rates $\lambda_1$ and $\lambda_2$, provided $\lambda_1$ and $\lambda_2$ are non-negative real numbers.
In this way, we have an explicit iterative tau-leap method: given $\bar X(t) = x \in \mathbb{Z}_+^d$,
$$\bar X(t+\tau) = x + \sum_{j=1}^{J} P_j\big(a_j(x)\,\tau\big)\,\nu_j,$$
which can be decomposed into a drift and a fluctuation part,
$$\bar X(t+\tau) = x + \tau \sum_{j=1}^{J} a_j(x)\,\nu_j + \sum_{j=1}^{J} \big(P_j(a_j(x)\,\tau) - a_j(x)\,\tau\big)\,\nu_j.$$
Each step thus amounts to: 1. in the current state, $x$, compute the propensities, $(a_j(x))_{j=1}^J$; 2. simulate $\bar X(t+\tau) = x + \sum_{j=1}^{J} P_j(a_j(x)\,\tau)\,\nu_j$ (observe that $\bar X(t+\tau) \in \mathbb{Z}^d$).

The increments of the tau-leap method, $\bar X(t;\tau) = N_1\nu_1 + \ldots + N_J\nu_J$, are linear combinations of the stoichiometric vectors. Let us recall that i) the random variables $N_j$, $j=1,2,\ldots,J$, are independent and non-negative integer valued, and ii) the stoichiometric vectors, $\nu_j$, may have negative components. As a consequence, a drawback of the tau-leap method is that it is perfectly possible to jump from one state, $x$, with non-negative components, to a state, $y$, with at least one negative coordinate; that is, $x = \bar X(t) \in \mathbb{Z}_+^d$, but $x + \bar X(t;\tau) \notin \mathbb{Z}_+^d$.
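A minimal explicit tau-leap sketch (ours; the `poisson` sampler uses Knuth's multiplication method, adequate for the small rates used here) makes the step concrete. Since the states may leave $\mathbb{Z}_+^d$, the clipped propensity `max(a(x), 0)` below is needed merely to keep the step defined once a coordinate has gone negative:

```python
import math, random

def poisson(rng, lam):
    """Knuth's method for sampling Poisson(lam); fine for moderate rates."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def tau_leap_path(x0, channels, T, tau, seed=3):
    """Explicit tau-leap: X(t+tau) = x + sum_j P_j(a_j(x) tau) nu_j.
    channels: list of (nu, a) pairs, as in the exact simulation sketch."""
    rng = random.Random(seed)
    x, t, path = list(x0), 0.0, [(0.0, tuple(x0))]
    while t < T:
        jumps = [poisson(rng, max(a(x), 0.0) * tau) for _, a in channels]
        for nj, (nu, _) in zip(jumps, channels):
            x = [xi + nj * ni for xi, ni in zip(x, nu)]
        t += tau
        path.append((t, tuple(x)))
    return path

# birth-death: 0 -> I at constant rate 10, I -> 0 at rate 1*i
channels = [((1,), lambda x: 10.0), ((-1,), lambda x: 1.0 * x[0])]
path = tau_leap_path([0], channels, T=5.0, tau=0.05)
```

Unlike the exact methods, the cost per step here is fixed regardless of how many reactions fire inside $(t, t+\tau]$, which is the whole point of leaping.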
To remedy this feature, many methods have been proposed in the literature; in general, they fall into at least one of the following categories:

4. Post-leap methods: each time $y$ has negative components, record the Poisson random variables and sample new ones, but halving $\tau$. Repeat this recursively until all the coordinates are non-negative. Then, use the recorded values of the Poisson variates to sample Poisson bridges (see, e.g., [30, 31]).

All the previous categories have their own disadvantages: in the first two, we introduce a modeling error that should be taken into account when approximating quantities of interest; in the pre-leap category, we do not avoid the problem of reaching negative populations, we just control the exit probability; and the post-leap methods can be memory consuming and impractical for multilevel Monte Carlo. In this PhD thesis, we propose a pre-leap method for the explicit tau-leap scheme, so we do not avoid the possibility of reaching negative populations, but we introduce an efficient way of controlling the exit probabilities. Moreover, when $\tau$ is of the order of the inter-arrival times of the exact method, we switch adaptively to the SSA or the MNRM. Thus, we propose hybrid path-simulation methods.
1.9 Statistical Inference for SRNs

The problem of inferring the coefficients of the propensity functions from observed data is relevant in applications (e.g., condition-based maintenance, design of chemical reactors, control of epidemic diseases).

Different techniques can be used depending on the data and on the prior knowledge about the unknown parameters. Regarding the data, it can be continuously observed (when we observe the whole time evolution of the paths of $X$) or discretely observed (when there is a separation between consecutive observations of one path of $X$). Let us remark that SRNs are pure jump processes; for that reason, they are constant between two consecutive jumps, and the jumps are events of Poisson type. Hence, by knowing the jump times and their respective jump vectors, we have a complete observation of the path. Observe that this is not possible for other types of stochastic processes, like Itô diffusions. Finally, the data can be completely or partially observed, depending on whether we observe all the coordinates of $X$ or only a fixed subset of them.
Let us derive the likelihood function of a continuously and completely observed path, $X(t,\omega_0)$, $t\in[0,T]$, $\omega_0\in\Omega$. Let us assume that the propensity functions can be written as $a_j(x) = c_j\, g_j(x)$, for all $j=1,\ldots,J$ and $x\in\mathbb{Z}_+^d$. Assume also that the functions $g_j$ are known, for instance, from the stochastic mass-action kinetics principle. Define $\theta := (c_1,\ldots,c_J)$ as the vector of unknown coefficients that we have to infer from our data.

Let us denote the jump times of $(X(t,\omega_0))_{t\in[0,T]}$ in $(0,T)$ by $\xi_1, \xi_2, \ldots, \xi_{N-1}$. The likelihood of the path is given by
$$\prod_{i=1}^{N-1} a_{\nu_{\xi_i}}(x_{i-1})\, \exp\!\big({-a_0(x_{i-1})\,\Delta\xi_{i-1}}\big) \;\times\; \exp\!\big({-a_0(x_{N-1})\,\Delta\xi_{N-1}}\big). \tag{1.15}$$
The last factor in (1.15) is due to the fact that we know that the system will remain in the state $x_{N-1}$ in the time interval $[\xi_{N-1}, T)$. The likelihood can be rewritten as
$$\exp\!\Big({-\sum_{i=0}^{N-1} a_0(x_i)\,\Delta\xi_i}\Big) \prod_{i=1}^{N-1} a_{\nu_{\xi_i}}(x_{i-1}), \tag{1.16}$$
and its logarithm is
$$-\sum_{i=0}^{N-1} a_0(x_i)\,\Delta\xi_i + \sum_{i=1}^{N-1} \log\!\big(a_{\nu_{\xi_i}}(x_{i-1})\big). \tag{1.17}$$
By the definition of $a_0$ and the assumption $a_j(x) = c_j g_j(x)$, we can write (1.17) as
$$-\sum_{i=0}^{N-1} \sum_{j=1}^{J} c_j\, g_j(x_i)\,\Delta\xi_i + \sum_{i=1}^{N-1} \log\!\big(c_{\nu_{\xi_i}}\, g_{\nu_{\xi_i}}(x_{i-1})\big).$$
Interchanging the order of summation and denoting by $R_{j,[0,T]}$ the number of times that the reaction $\nu_j$ occurred in the interval $[0,T]$, we have
$$\sum_{j=1}^{J} \Big({-c_j} \sum_{i=0}^{N-1} g_j(x_i)\,\Delta\xi_i + \log(c_j)\, R_{j,[0,T]}\Big) + \sum_{i=1}^{N-1} \log\!\big(g_{\nu_{\xi_i}}(x_{i-1})\big). \tag{1.18}$$
Observing that the last term in (1.18) does not depend on $\theta$, we conclude that, for any particular $\omega_0\in\Omega$, the complete log-likelihood of the path $(X(t,\omega_0))_{t\in[0,T]}$ is, up to constant terms, given by
$$\ell^c(\theta) := \sum_{j=1}^{J} \Big( \log(c_j)\, R_{j,[0,T]}(\omega_0) - c_j\, F_{j,[0,T]}(\omega_0) \Big),$$
where $F_{j,[0,T]} := \sum_{i=0}^{N-1} g_j(x_i)\,\Delta\xi_i$.
Computing the maximum of $\ell^c(\theta)$, i.e., the maximum likelihood estimator (MLE) of $\theta$, is a trivial task if we have observed one or more paths of $X$ completely and continuously. But, in general, the available data is discrete and partial. In such cases, numerical methods like the EM algorithm or its Monte Carlo version [32] can be applied (EM stands for its two steps, Expectation and Maximization; see Section 9.3.1 for a complete description). Other methods are derived from up-scaled versions of SRNs, that is, inference based on reaction-rate ODEs, Langevin diffusions, or stochastic differential equations driven by other types of noise, such as Gamma noise.
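Indeed, for complete continuous observation, maximizing $\ell^c$ factorizes over $j$ and gives the closed form $\hat c_j = R_{j,[0,T]}/F_{j,[0,T]}$. A sketch (all names ours, for illustration) on a toy decay path with $g(x)=x$:

```python
def complete_data_mle(jump_times, states, T, g_list, which):
    """MLE c_j = R_j / F_j from a completely observed path.
    jump_times: 0 = xi_0 < xi_1 < ...; states[i] = state held on [xi_i, xi_{i+1});
    which[i] = index j of the channel that fired at jump i+1;
    F_j = sum_i g_j(x_i) * delta_xi_i, R_j = number of firings of channel j."""
    J = len(g_list)
    R, F = [0] * J, [0.0] * J
    times = list(jump_times) + [T]
    for i, x in enumerate(states):
        dt = times[i + 1] - times[i]
        for j, g in enumerate(g_list):
            F[j] += g(x) * dt
    for j in which:
        R[j] += 1
    return [Rj / Fj if Fj > 0 else float("nan") for Rj, Fj in zip(R, F)]

# toy decay path: X = 3 on [0,1), 2 on [1,2), 1 on [2,3]; two deaths observed
c_hat = complete_data_mle([0.0, 1.0, 2.0], [3, 2, 1], T=3.0,
                          g_list=[lambda x: float(x)], which=[0, 0])
# R = 2 and F = 3*1 + 2*1 + 1*1 = 6, so c_hat = [1/3]
```

This closed form is exactly why the complete-data M-step of the EM algorithm is cheap; the hard part, addressed in Articles IV and V, is the E-step when the data is discrete and partial.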
In this PhD thesis, we present two different approaches to the problem of estimating the vector of unknown coefficients, $\theta = (c_1, c_2, \ldots, c_J)$, of the propensity functions, $a_j(x) = c_j g_j(x)$, $j = 1, 2, \ldots, J$. The first approach is an indirect inference method where, for all $t$, $X(t)$ is treated as a Gaussian random variable whose mean, $m(t)$, and variance, $\sigma^2(t)$, are adjusted to the data through the moment-matching technique. We also assume independent Gaussian noise in the measurements, and arrive at a penalized, weighted, non-linear least squares problem that can be solved by classical, deterministic optimization methods. It is worth mentioning that in this approach we do not assume knowledge of any aspect of the set of reaction channels;
we methodologically depart from the simplest possible model and add complexity until we find a model that reasonably fits the data and allows us to make predictions. The second approach is based on the Monte Carlo EM algorithm, where we show how to estimate expected values of functionals of bridges that link consecutive observations. This approach is based on a forward-reverse representation for bridges, and here we assume that we know the set of reaction channels with the exception of the coefficients $\theta$. This representation is derived from the Master Equation and the backward Kolmogorov equations for Markov processes. We refer to Sections 3.4 and 3.5 for overviews of these two contributions.
[4] J. Karlsson and R. Tempone, “Towards automatic global error control: Com-
putable weak error expansion for the tau-leap method,” Monte Carlo Methods
and Applications, vol. 17, no. 3, pp. 233–278, 2011.
[6] S. Engblom, “On the stability of stochastic jump kinetics,” arXiv preprint
arXiv:1202.3892v6, 2014.
[9] C. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences
(Springer Series in Synergetics). Springer, 2010.
[10] H. Risken and T. Frank, The Fokker-Planck Equation: Methods of Solution and
Applications (Springer Series in Synergetics). Springer, 1996.
[11] P. Érdi and G. Lente, Stochastic Chemical Kinetics: Theory and (Mostly) Systems Biological Applications (Springer Series in Synergetics), 1st ed. Springer, 2014.
[14] P. Smadbeck and Y. Kaznessis, “A closure scheme for chemical master equa-
tions,” Proc Natl Acad Sci USA, vol. 110, no. 35, 2013.
[23] D. F. Anderson, “A modified next reaction method for simulating chemical sys-
tems with time dependent propensities and delays,” The Journal of Chemical
Physics, vol. 127, no. 21, p. 214107, 2007.
[25] T. Li, “Analysis of explicit tau-leaping schemes for simulating chemically reacting
systems,” Multiscale Modeling and Simulation, vol. 6, no. 2, pp. 417–436, 2007.
[26] T. Tian and K. Burrage, "Binomial leap methods for simulating stochastic chemical kinetics," The Journal of Chemical Physics, vol. 121, no. 21, pp. 10356–10364, 2004.
[31] J. Karlsson and R. Tempone, “Towards automatic global error control: Com-
putable weak error expansion for the tau-leap method,” Monte Carlo Methods
and Applications, vol. 17, no. 3, pp. 233–278, March 2011.
[32] C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in
Statistics), 2nd ed. Springer, 2005.
Chapter 2
The theory of Markov processes and its applications to population processes have been rigorously developed by Kurtz during the last decades [1]. His results have been applied by Andersson and Britton, in Chapters 5 and 8 of [2], in the context of Markovian epidemic models. In [3], Greenwood and Gordillo give a concise and clear exposition of stochastic epidemic modeling that explicitly recognizes the need for stochastic simulation for models exhibiting mild complexity. The aim of this chapter is to present the elements of Markovian epidemic models in the context of SRNs, for which we have been developing fast and accurate stochastic simulation techniques as well as inference methods (see Chapter 3).
Let us consider a homogeneous and well-mixed population of individuals. This population is partitioned into mutually exclusive compartments that describe the possible different stages of an epidemic process. In the context of SRNs, we identify the set of species with the set of compartments, while the set of reaction channels is defined by the natural flow of the epidemic process through its different stages, taking into account the rates at which individuals move from one compartment to the next. Thanks to the simplifying assumptions of homogeneity and perfect mixing of the population, we do not need to add any spatial (graph or other) structure to our epidemic models.
Stochastic models have some clear advantages over the deterministic ones. First,
the nature of a contagion contact seems to be more a consequence of chance than
a deterministic phenomenon. Second, deterministic models, like the SIR below, do
not admit the possibility of the sudden extinction of the epidemic or the possibility
of a minor epidemic outbreak. There is a standard list of questions associated with
stochastic epidemic models that does not have a counterpart in the deterministic
setting, including, among others: the probability of extinction, the probability of
observing an outbreak, the distribution of the duration of the epidemic process, the
distribution of the maximum number of infected individuals, the distribution of the
final size of the epidemic, etc.
Simulation studies for epidemic models have been performed extensively during the last decades, but the increasing complexity of the mathematical models describing the spread of transmissible diseases and the size of the involved populations make exact simulation methods, like the SSA, computationally infeasible. Articles I, II and III in this PhD thesis introduce novel, fast hybrid path-simulation algorithms that can be used to compute quantities associated with complex stochastic epidemic models.
In what follows, we show how to interpret the compartment-rate diagrams that are typical in deterministic epidemic models as SRNs, and how to use the tools presented in Chapter 1.

Figure 2.1: Compartment diagram of the SIR model: $S \to I$ at rate $\beta S I$, and $I \to R$ at rate $\gamma I$.
Assume that individuals in the I-class are not only infected but also infectious (or infective); that is, they carry the pathogen that causes a certain disease and are able to transmit it to susceptible individuals. Contagious contacts can only happen when an individual from the S-class meets an infective individual. Infected individuals recover after an exponentially distributed random time and gain immunity to the disease, becoming individuals of the R-class. A removed individual is no longer part of the epidemic process. In the language of SRNs, we have:
\[
S + I \to 2I, \ \text{contagion}; \qquad I \to R, \ \text{removal}.
\]
Figure 2.2: The two possible transitions from the state $(s, i)$ in the SIR model: $\nu_1$ leads to $(s-1, i+1)$ and $\nu_2$ leads to $(s, i-1)$. Observe that the disease-free states $(s, 0)$ are absorbing states.
By the kinetic mass-action principle (see (1.3)), we have two reaction channels: i) contagion, $R_1 = (\nu_1, a_1(s,i)) = ((-1,1)^T, \beta s i)$, and ii) remotion (the act of removing), $R_2 = (\nu_2, a_2(s,i)) = ((0,-1)^T, \gamma i)$. The units of the parameter $\beta$ are $[\text{individuals}]^{-2} \times [\text{time}]^{-1}$, expressing the transmission rate per capita, whereas the parameter $\gamma$ is expressed in $[\text{individuals}]^{-1} \times [\text{time}]^{-1}$ and represents the recovery rate.
In the notation used in (1.1), the stochastic process $X(t)$ defining the SIR model is described by
\[
X(t):
\begin{cases}
P\left(X(t+dt) = (s,i) + (-1,1) \,\middle|\, X(t) = (s,i)\right) = \beta s i\, dt + o(dt),\\[2pt]
P\left(X(t+dt) = (s,i) + (0,-1) \,\middle|\, X(t) = (s,i)\right) = \gamma i\, dt + o(dt).
\end{cases} \tag{2.1}
\]
Figure 2.2 depicts the possible transitions from (s, i). Figure 2.3 shows 3 SSA paths
of the SIR model.
Figure 2.3: Three SSA paths of the SIR model with $(s_0, i_0) = (99, 1)$. Left: observe that one of the paths is quickly absorbed by the disease-free states $(s, 0)$; in this case we do not observe an epidemic outbreak. Right: detail close to the initial point $(S_0, I_0) = (99, 1)$. Here the population size is $N = 100$.
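Such SSA paths can be produced with a few lines of code; the following is a minimal Gillespie sketch for model (2.1) (the parameter values are illustrative choices of ours, not the thesis'):

```python
import math
import random

def ssa_sir(beta, gamma, s0, i0, T, rng):
    """One SSA (Gillespie) path of the SIR model (2.1).
    Reactions: contagion (s,i) -> (s-1,i+1) at rate beta*s*i,
               removal   (s,i) -> (s,i-1)   at rate gamma*i."""
    t, s, i = 0.0, s0, i0
    path = [(t, s, i)]
    while t < T and i > 0:                        # (s, 0) states are absorbing
        a1, a2 = beta * s * i, gamma * i
        a0 = a1 + a2
        t += -math.log(1.0 - rng.random()) / a0   # Exp(a0) inter-arrival time
        if t >= T:
            break
        if rng.random() * a0 < a1:
            s, i = s - 1, i + 1                   # contagion fires
        else:
            i = i - 1                             # removal fires
        path.append((t, s, i))
    return path

rng = random.Random(1)
path = ssa_sir(beta=2.0 / 100, gamma=0.5, s0=99, i0=1, T=100.0, rng=rng)
```

Each run either dies out quickly (absorption at $i = 0$) or produces an outbreak, which is precisely the qualitative behavior shown in Figure 2.3.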
The stoichiometric matrix, $\nu$, and the vector of propensities, $a$, are given by:
\[
\nu = (\nu_j)_{j=1}^{J} := \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix}^T
\quad\text{and}\quad
a(X) := \begin{pmatrix} \beta S I \\ \gamma I \end{pmatrix},
\]
and the corresponding mean-field approximation, $(S_{MF}(t), I_{MF}(t))$, satisfies
\[
\begin{pmatrix} S_{MF}(t) \\ I_{MF}(t) \end{pmatrix}
= \begin{pmatrix} S_0 \\ I_0 \end{pmatrix}
+ \int_0^t \beta S_{MF}(u)\, I_{MF}(u)\, du \begin{pmatrix} -1 \\ 1 \end{pmatrix}
+ \int_0^t \gamma I_{MF}(u)\, du \begin{pmatrix} 0 \\ -1 \end{pmatrix},
\]
or, in differential form,
\[
\begin{cases}
\dot{S}_{MF}(t) = -\beta S_{MF}(t)\, I_{MF}(t),\\
\dot{I}_{MF}(t) = \beta S_{MF}(t)\, I_{MF}(t) - \gamma I_{MF}(t),\\
(S_{MF}(0), I_{MF}(0)) = (S_0, I_0).
\end{cases} \tag{2.3}
\]
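The reaction-rate ODEs (2.3) are easy to integrate numerically; a forward-Euler sketch follows (the scheme, step size and parameter values are our illustrative choices):

```python
def mean_field_sir(beta, gamma, s0, i0, T, n_steps=10000):
    """Forward-Euler integration of the reaction-rate ODEs (2.3)."""
    dt = T / n_steps
    s, i = float(s0), float(i0)
    for _ in range(n_steps):
        ds = -beta * s * i            # susceptibles only decrease
        di = beta * s * i - gamma * i # gain by contagion, loss by removal
        s, i = s + dt * ds, i + dt * di
    return s, i

sT, iT = mean_field_sir(beta=2.0 / 100, gamma=0.5, s0=99, i0=1, T=100.0)
```

For these parameters the deterministic epidemic always takes off, leaving only a small residue of susceptibles at the final time; the stochastic model, by contrast, assigns positive probability to early extinction.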
In Figure 2.4, we observe that when $I_0$ is small, the mean-field approximation $(S_{MF}(t), I_{MF}(t))$, given by the solution of (2.3), overestimates the average number of infected individuals, that is, $I_{MF}(t) \ge E[I(t)]$ for all $t \in (0, T]$. In a certain sense, the mean field does not take into account the number of paths of $X$ that are quickly absorbed by the disease-free states. The SSA paths are generated with the algorithm described in Section 1.8.1.

Figure 2.4: Mean field vs. SSA average, for $(s_0, i_0) = (99, 1)$ (left) and $(s_0, i_0) = (90, 10)$ (right). Notice how the mean-field path overestimates the mean of the stochastic SIR model when $I_0 = 1$, but when the initial number of infectives is $I_0 = 10$, the mean field gives a better approximation. Observe that the mean field seems to give a good approximation of the mean of the trajectories that escape from the disease-free states, $I = 0$.
Notice that if we apply the Dynkin formula (see (1.10)) to $g(s,i) = i$, we have that
\[
\mathcal{L}_X g(s,i) = \beta s i - \gamma i,
\]
and therefore
\[
\frac{d\,E[I(t)]}{dt} = \beta\, E[S(t) I(t)] - \gamma\, E[I(t)],
\]
implying that an ODE for $E[I(t)]$ depends on higher-order moments, in this case $E[S(t)I(t)]$. This is caused by the nonlinearity of the term $a_1(s,i) = \beta s i$ (see Remark 1.5.2).
The Master Equation for the SIR model reads
\[
\frac{d\, p_{(s,i)}(t)}{dt} = \beta (s+1)(i-1)\, \mathbf{1}_{\{(s+1,i-1)\in D\}}\, p_{(s+1,i-1)}(t)
+ \gamma (i+1)\, \mathbf{1}_{\{(s,i+1)\in D\}}\, p_{(s,i+1)}(t)
- \left(\beta s i + \gamma i\right) p_{(s,i)}(t). \tag{2.4}
\]
Remark 2.1.1. Due to the structure and sparsity of the coefficient matrix of (2.4), it may be possible to apply fast numerical methods based on, for example, numerical tensorial linear algebra [7], but we do not follow this approach in this PhD thesis.
The Langevin Approximation for the SIR Model
According to (1.14), the Langevin diffusion approximation, $Y(t) = (S_{ChL}(t), I_{ChL}(t))$, to our epidemic process, $X(t)$, is the following stochastic process driven by the pair of independent standard Wiener processes $(W_S(t), W_I(t))_{t\in(0,T]}$:
\[
Y(t):
\begin{cases}
dS = -\beta S I\, dt + \sqrt{\beta S I}\, dW_S,\\[2pt]
dI = \left(\beta S I - \gamma I\right) dt - \sqrt{\beta S I}\, dW_S + \sqrt{\gamma I}\, dW_I,\\[2pt]
\text{IC}: (S(0), I(0)) = (s, i).
\end{cases}
\]
For sufficiently high reaction rates, where Gaussian random variables are good approximations of Poisson random variables, the Langevin diffusion $Y$ is an interesting alternative to SSA paths, since a linear combination of independent Gaussian random variables is Gaussian and there are fast Gaussian random number generators. Figure 2.5 shows the Langevin approximation to $X$. Tools from the theory of Stochastic Differential Equations (SDEs) (see [8]) can be used to derive distributions of some typical quantities associated with epidemic models, such as: i) the basic reproduction number, $R_0$, defined as the average number of infections caused by a single infective in a susceptible population; ii) the quasi-stationary distribution of infectives, that is, the limit distribution of the number of infectives conditional on the disease-free boundary not having been reached; iii) the time to extinction of the epidemic, that is, the hitting time of the disease-free boundary; and iv) the final size of the epidemic, defined as $N - S(+\infty)$, that is, the number of individuals untouched by the epidemic process. All these quantities are specific to each epidemic model. An application of this approximation technique, i.e., SRNs approximated by SDEs, can be found in [9] in the context of alcohol drinking on college campuses across the USA.
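A path of the Langevin approximation above can be generated with the Euler-Maruyama scheme; the sketch below clamps the propensities at zero near the boundary so the square roots stay real (a common practical fix that is our choice here, not taken from the thesis):

```python
import math
import random

def langevin_sir(beta, gamma, s0, i0, T, n_steps, rng):
    """Euler-Maruyama path of the chemical Langevin SIR approximation.
    The contagion noise enters S and I with opposite signs, preserving
    the coupling of the two species."""
    dt = T / n_steps
    s, i = float(s0), float(i0)
    for _ in range(n_steps):
        a1 = max(beta * s * i, 0.0)    # contagion propensity, clamped
        a2 = max(gamma * i, 0.0)       # removal propensity, clamped
        dWs = rng.gauss(0.0, math.sqrt(dt))
        dWi = rng.gauss(0.0, math.sqrt(dt))
        s += -a1 * dt + math.sqrt(a1) * dWs
        i += (a1 - a2) * dt - math.sqrt(a1) * dWs + math.sqrt(a2) * dWi
    return s, i

rng = random.Random(2)
sT, iT = langevin_sir(beta=2.0 / 100, gamma=0.5, s0=90, i0=10,
                      T=30.0, n_steps=3000, rng=rng)
```

Unlike the hybrid paths discussed later, these paths take real (non-lattice) values and require boundary fixes such as the clamping above.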
Figure 2.5: Here we observe that the empirical mean of the Langevin paths is close to the empirical mean of the SSA paths, independently of the initial number of infectives. This is due to the same boundary phenomenon. KMC stands for Kinetic Monte Carlo, which is another common name for the SSA.
The SIS model is a particular case of the SIR model which assumes that, once an infected individual recovers, she does not gain immunity and immediately returns to the susceptible class; see Figure 2.6. Since $S(t) + I(t) = N$, the two reaction channels can be written in terms of $I(t)$ alone.

Figure 2.6: Compartment diagram of the SIS model: $S \to I$ at rate $\beta S I$, and $I \to S$ at rate $\gamma I$.

Applying Dynkin's formula with $g(i) = i$ yields
\[
\frac{d\,E[I(t)]}{dt} = (\beta N - \gamma)\, E[I(t)] - \beta\, E\!\left[I^2(t)\right], \tag{2.5}
\]
which depends on higher-order moments of $I(t)$. At this point, we would like to remark how straightforwardly we obtained equation (2.5) from the SRN machinery developed in Chapter 1. For instance, in Chapter 3 of [10] there is a two-page derivation of (2.5) based on moment generating functions (MGFs).
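For completeness, here is the one-line Dynkin computation behind (2.5): in the SIS model $s = N - i$, so contagion (propensity $\beta (N-i) i$) shifts $i$ by $+1$ and removal (propensity $\gamma i$) shifts it by $-1$. With $g(i) = i$,

```latex
\mathcal{L}_X g(i) = \beta (N - i)\, i \cdot (+1) + \gamma i \cdot (-1)
                   = (\beta N - \gamma)\, i - \beta i^2,
\qquad\text{hence}\qquad
\frac{d\,\mathbb{E}[I(t)]}{dt}
  = (\beta N - \gamma)\,\mathbb{E}[I(t)] - \beta\,\mathbb{E}\!\left[I^2(t)\right].
```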
While the SIR model is suitable for diseases in which individuals of the I-class are infected and infective at the same time, there are infectious diseases where a recently infected individual goes through an exposed period before developing symptoms and becoming infective. Figure 2.7 depicts the compartmental diagram of the SEIR model. The reaction channels of the SEIR model are: $R_1 = (\nu_1, a_1(s,e,i)) = ((-1,1,0)^T, \beta s i)$, $R_2 = (\nu_2, a_2(s,e,i)) = ((0,-1,1)^T, \kappa e)$ and $R_3 = (\nu_3, a_3(s,e,i)) = ((0,0,-1)^T, \gamma i)$.
Figure 2.7: SEIR model: $S \to E$ at rate $\beta S I$, $E \to I$ at rate $\kappa E$, and $I \to R$ at rate $\gamma I$. An infected individual (E) may become infective (I). The exposed period is an exponentially distributed random variable with rate $\kappa$.
For endemic diseases, the scale at which the epidemic develops should account for demographic effects. In this case, a birth-death process (see 1.2.3) affects the SIR epidemic process. If the population is in demographic equilibrium, the inflow of newborns, whom we assume to be susceptible (no vertical transmission of the disease), should match the outflow due to deaths, which may occur in any compartment. Figure 2.8 depicts the compartment diagram of the SIR model with demography.

Figure 2.8: SIR model with demography: births into S at rate $\theta N$, contagion at rate $\beta S I$, removal at rate $\gamma I$, and deaths from the compartments at rates $\theta S$, $\theta I$ and $\theta R$.
The set of reaction channels in this case is (in the two-dimensional setting in which we do not track the number of removed individuals): i) contagion, $R_1 = (\nu_1, a_1(s,i)) = ((-1,1)^T, \beta s i)$; ii) remotion, $R_2 = (\nu_2, a_2(s,i)) = ((0,-1)^T, \gamma i)$; iii) birth, $R_3 = (\nu_3, a_3(s,i)) = ((1,0)^T, \theta N)$; iv) death of a susceptible, $R_4 = (\nu_4, a_4(s,i)) = ((-1,0)^T, \theta s)$; and v) death of an infective, $R_5 = (\nu_5, a_5(s,i)) = ((0,-1)^T, \theta i)$. Figure 2.9 shows 100 SSA paths of the SIR model with demography, along with its mean field and the empirical average of the SSA paths.
Figure 2.9: SIR with demography, $N = 100$: the reaction-rate ODEs vs. the mean of the corresponding SRN, for $(s_0, i_0) = (99, 1)$ (left) and $(s_0, i_0) = (90, 10)$ (right). Left: one initial infected produces a relatively large probability of a quick absorption by the disease-free states. Right: the empirical average differs from the mean field by not producing a spiral behavior, again due to the positive probability of the sudden extinction of the epidemic disease. Observe that the mean field seems to give a good approximation of the mean of the trajectories that escape from the disease-free states.
Remark 2.1.2 (Markovian SIR, SIS and SIR with demography). In Chapters 5 and 8 of the very pedagogical lecture notes by Andersson and Britton [2], a survey of results regarding the Markovian SIR, SIS and SIR-with-demography models is presented. There, functional laws of large numbers and central limit theorems are derived from the results of Kurtz [1]. The notes also contain results on epidemic models obtained by Djehiche, Nåsell, Ball and many others.
In Articles I, II and III (references [11, 12, 13], respectively), presented in the second part of this PhD thesis, we develop fast hybrid algorithms for simulating paths of SRNs. We also develop Multilevel Monte Carlo methods for estimating expected values of observables of SRNs at some fixed time $T$; for example, the expected value and the variance of the number of infected individuals one month after the onset of the epidemic process. With the aid of our numerical methods, it is possible to estimate quantities of interest arising from Markovian epidemic models by path simulation. Observe that our hybrid paths take values in the lattice $\mathbb{Z}^d_+$. This is a desirable characteristic not shared by the paths generated by Langevin SDEs. In Figure 2.10, we can see a single SIR mixed path generated by Algorithm 25. Observe that, when the hybrid process is visiting states close to the boundary, an exact method is preferred but, sufficiently far from the boundary, the method selects the Chernoff tau-leap for some reactions (generally both in this case), allowing us to take larger time steps and, therefore, save computational work.
Regarding the inferential aspects of SRNs, in Articles IV and V (references [14, 15], respectively), presented in the second part of this PhD thesis, we address the problem of estimating the coefficients of a given SRN from discretely observed data. The traditional least squares approach (see Chapter 10 of [4]) can be viewed as an indirect inference method where goodness-of-fit techniques based on the Master Equation can be applied [14].

Figure 2.10: Left: mixed paths for the SIR model starting at $(S_0, I_0, R_0) = (700, 1, 0)$. Right: detail of the paths close to the disease-free boundary.

Now, we briefly summarize the results obtained in [15] for the classical SIR stochastic model given by (2.1). Consider an initial state $X_0 = (S_0, I_0, R_0) = (300, 5, 0)$, a final time $T = 10$, and synthetic data generated using the parameters $c_1 = 1.66$ and $c_2 = 0.44$, observed at uniform time intervals of size $\Delta t = 1/16$, without adding observation noise. The data trajectory is shown in the left panel of Figure 2.11.
Figure 2.11: Left: data trajectory for the SIR example (species counts of S and I vs. time), obtained by observing the values of an SSA path at regular time intervals of size $\Delta t = 1/16$. Right: FREM estimation (phase I and phase II) for the SIR model; the legend marks the initial point of phase I, the initial point of phase II, and the final point of phase II.
Our FREM estimation gave us a cluster average of $\hat\theta = (1.86, 0.43)$. The FREM algorithm took $p^* = 3$ iterations to converge (the imposed minimum). Details can be found in Table 2.1 and the right panel of Figure 2.11.
Remark 2.2.1. At this point, it is worth mentioning that the distance between the estimation $\hat\theta = (1.86, 0.43)$ and the values used for generating the synthetic data, $(1.66, 0.44)$, is meaningless. The important one is the distance between our FREM estimation, $\hat\theta$, and the true MLE of $\theta$ (which we do not have).
$i$ | $\theta^{(0)}_{I,i}$ | $\theta^{(0)}_{II,i}$ | $\hat\theta^{(p^*)}_{II,i}$
1 | (0.40, 0.05) | (2.96, 0.66) | (1.86, 0.43)
2 | (0.40, 1.00) | (2.96, 0.66) | (1.86, 0.43)
3 | (3.00, 0.05) | (2.96, 0.66) | (1.86, 0.43)
4 | (3.00, 1.00) | (2.96, 0.66) | (1.86, 0.43)

Table 2.1: Values generated by the FREM Algorithm for the SIR model.
Complex Epidemic Models
There are many different generalizations of the classical SIR model: there are models for specific diseases, for interactions between populations, for multiple concurrent epidemics, for age-dependent contagion rates, and for vaccination and quarantine strategies, just to mention a few.
The scientific production of Carlos Castillo-Chavez and his collaborators [4, 16, 10, 3, 17, 18, 19, 20, 21, 22] constitutes a major reference for all classes of epidemic models where compartmental models described by systems of ODEs have a privileged position. In [16], models for Influenza, HIV, Tuberculosis and Sexually Transmitted Diseases (STDs) are found. Complex diagrams and their associated reaction-rate ODEs are presented and analyzed using tools from the theory of dynamical systems [23]. Many of these models have an immediate translation into SRNs where simulation studies can be performed. Fast and efficient simulation methods are required to deal with complex epidemic models. As immediate future work, we would like to mention: implicit hybrid tau-leap schemes, the incorporation of spatial inhomogeneity, sensitivity analysis of SRNs by dual-based methods, and control, as well as continuing to develop statistical inference techniques.
When collecting epidemic data, we can rarely observe the number of individuals in all compartments at the same time, especially in complex models. Typically, we can only count the symptomatic individuals reported by the health authorities. Dr. Anuj Mubayi (ASU) suggested developing fast statistical methods for partially observed data in Leishmaniasis models [26].
Estimation of R0
The estimation of the basic reproduction number, $R_0$, is mainly based on the observation of the first stage of the epidemic process [27], where it behaves like a branching process (see A.5). Prof. Carlos Castillo-Chavez (ASU) suggested estimating $R_0$ in stochastic models from the final size relation (see Chapter 9 of [4])
\[
\log\left(\frac{S_0}{S(+\infty)}\right) = R_0 \left(1 - \frac{S(+\infty)}{S_0}\right).
\]
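Solving the relation above for $R_0$ is immediate once the final size is observed; a small sketch (the observed counts below are hypothetical, chosen only to illustrate the formula):

```python
import math

def r0_from_final_size(s0, s_inf):
    """Estimate the basic reproduction number R0 from the final size relation
    log(S0 / S(inf)) = R0 * (1 - S(inf) / S0)."""
    return math.log(s0 / s_inf) / (1.0 - s_inf / s0)

# Hypothetical data: 300 initially susceptible, 30 untouched by the epidemic.
r0 = r0_from_final_size(s0=300, s_inf=30)
```

As a sanity check, an outbreak that barely touches the population (e.g. $S(+\infty)/S_0 \approx 1$) yields $R_0 \approx 1$, the epidemic threshold.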
Remark 2.3.1 (MTBI). The Mathematical and Theoretical Biology Institute (MTBI) is a research program created and organized by Prof. Carlos Castillo-Chavez that yearly encourages a diverse and well-motivated group of young undergraduate and graduate students from all around the world to formulate their own research questions while acquiring an impressive number of skills to answer them, with the help of mentors and outstanding, experienced researchers. The results of the research projects can be found at http://mtbi.asu.edu/research/archive. This is not only a valuable source of relevant epidemic models, but also a tremendous source of research opportunities. For a thorough description of MTBI, see [28].
REFERENCES
[1] S. Ethier and T. Kurtz, Markov Processes: Characterization and Convergence (Wiley Series in Probability and Statistics), 2nd ed. Wiley-Interscience, 2005.
[2] H. Andersson and T. Britton, Stochastic Epidemic Models and Their Statistical Analysis (Lecture Notes in Statistics). Springer, 2000.
[7] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus (Springer Series in Computational Mathematics, Vol. 42), 1st ed. Springer, 2012.
[12] ——, "Multilevel hybrid Chernoff tau-leap," accepted for publication in BIT Numerical Mathematics, 2015.
[19] ——, Mathematical Approaches for Emerging and Reemerging Infectious Diseases: Models, Methods, and Theory (The IMA Volumes in Mathematics and its Applications). Springer, 2002.
[20] D. Zeng, H. Chen, C. Castillo-Chavez, W. B. Lober, and M. Thurmond, Eds., Infectious Disease Informatics and Biosurveillance (Integrated Series in Information Systems). Springer, 2010.
[24] K. P. Hadeler and C. Castillo-Chávez, “A core group model for disease trans-
mission,” Mathematical Biosciences, vol. 128, no. 1, pp. 41–55, 1995.
[25] B. Song, W. Du, and J. Lou, "Different types of backward bifurcations due to density-dependent treatments," Mathematical Biosciences and Engineering: MBE, vol. 10, no. 5-6, p. 1651, 2013.
Chapter 3
Overview of Articles
The central subject of this PhD thesis is known under different names; among the most common ones we have: stochastic reaction networks (SRNs), chemical reaction kinetics, and continuous-time Markovian pure jump processes. For the reader unfamiliar with this topic, a quick review of SRNs is presented in Chapter 1.

In this work, we focus on two different problems related to SRNs: i) fast path-simulation and global error control, and ii) statistical inference for the set of reaction coefficients. Problem i) is treated in the first three chapters of the second part of this thesis (Articles I, II and III), while problem ii) is addressed in the last two (Articles IV and V).
The main objective of problem i) is the following: given an SRN, $X$, defined through its set of reaction channels and its deterministic initial state, estimate $E[g(X(T))]$, that is, the expected value of a scalar observable, $g$, of the process, $X$, at a fixed time, $T$. This problem led us to define a series of Monte Carlo estimators, $\mathcal{M}$, that with high probability can produce values close to the quantity of interest, $E[g(X(T))]$. More specifically, given a user-selected tolerance, $TOL$, and a small confidence level, $\eta$, find an estimator, $\mathcal{M}$, based on sampled paths of $X$, such that $P\left(|E[g(X(T))] - \mathcal{M}| > TOL\right) \le \eta$.
The author contributed to the theoretical sections of the paper and especially to the formulation of the Chernoff bound. This work was presented by the author at the ECCOMAS conference, September 2012, Vienna, Austria.
In this article, we present a novel, adaptive, hybrid algorithm for simulating paths of SRNs. It is hybrid because, at each step, our algorithm decides between the SSA (see Section 1.8.1) and the Chernoff tau-leap method.
In this article, we develop a pre-leap method (see Section 1.8.2) for controlling, but not avoiding, the one-step exit probability that is a consequence of the tau-leap method. That is, let $x = \bar{X}(t)$ be the tau-leap approximation of $X(t)$; then the value of $\bar{X}$ at the next leap of size $\tau$ is given by $\bar{X}(t+\tau) = x + \sum_{j=1}^{J} P_j(a_j(x)\tau)\, \nu_j$, where $P_j(\lambda_j)$, $j = 1, 2, \ldots, J$, are independent Poisson random variables with rates $\lambda_j$, respectively. Note that $\sum_{j=1}^{J} P_j(a_j(x)\tau)\, \nu_j$ is a linear combination of the stoichiometric vectors, $\nu_j$, with unbounded coefficients. If any $\nu_j$ has at least one negative component, then there is a positive probability that the state $\bar{X}(t+\tau)$ has negative components too. This probability is clearly a function of $x$ and $\tau$. In this article, we address the problem of,
given $\bar{X}(t) = x$ and $\delta > 0$, finding the largest $\tau \equiv \tau(x, \delta)$ such that $P\left(\bar{X}(t+\tau) \notin \mathbb{Z}^d_+ \,\middle|\, \bar{X}(t) = x\right) \le \delta$. We develop a Chernoff-type bound (see Section A), $\mathrm{ChBnd}(x, \tau)$, for a linear combination of independent Poisson random variables. The function $\mathrm{ChBnd}(x, \tau)$ satisfies:
\[
P\left(\bar{X}(t+\tau) \notin \mathbb{Z}^d_+ \,\middle|\, \bar{X}(t) = x\right) \le \mathrm{ChBnd}(x, \tau) \le \delta. \tag{3.1}
\]
Let $\tau_{Ch}$ be the largest value of $\tau$ satisfying (3.1). Figure 3.1 depicts the ChBnd in a simple decay example. The Chernoff bound has a closed analytic expression only in cases where there is a single reaction channel. In this work, we introduce a fast numerical algorithm for approximating the value $\tau_{Ch}$ (since its exact value involves solving a transcendental equation).
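The one-step exit probability being controlled can be made concrete in the one-channel decay example $X \to X - 1$, $a(x) = c x$, where a single tau-leap gives $\bar{X}(t+\tau) = x - \mathcal{P}(c x \tau)$. The sketch below (with our own, deliberately aggressive, illustrative numbers) estimates $P(\bar{X}(t+\tau) < 0)$ by Monte Carlo:

```python
import math
import random

def poisson_sample(lam, rng):
    """Poisson(lam) by inversion of the CDF (adequate for moderate rates)."""
    u, k = rng.random(), 0
    p = math.exp(-lam)
    cum = p
    while u > cum and k < 1000:
        k += 1
        p *= lam / k
        cum += p
    return k

def tau_leap_step(x, c, tau, rng):
    """One tau-leap step of the decay network X -> X - 1 with a(x) = c*x."""
    return x - poisson_sample(c * x * tau, rng)

rng = random.Random(3)
x, c, tau = 10, 1.0, 1.5            # too-large step: rate c*x*tau = 15 > x
n = 20000
exit_prob = sum(tau_leap_step(x, c, tau, rng) < 0 for _ in range(n)) / n
```

With this step the one-step exit probability is close to 0.9; a pre-leap check such as (3.1) would reject this $\tau$ and shrink it until the user-selected bound $\delta$ is met.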
Figure 3.1: Let $n = 10$ and $\lambda \in (2, 10)$. Semi-logarithmic plot of $P(Q(\lambda) \ge n)$ and $\mathrm{ChBnd}(n, \lambda) = \exp\left(n(1 - \log(n/\lambda)) - \lambda\right)$, together with Klar's one-dimensional bound, the exact Poisson tail, and the Gaussian approximation. See Klar's bound in [1] and the Gaussian approximation in [2].
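The bound in Figure 3.1 is straightforward to reproduce; the following sketch compares the exact Poisson tail $P(Q(\lambda) \ge n)$ with $\mathrm{ChBnd}(n, \lambda)$ for $n = 10$ (a small verification of our own, not code from the thesis):

```python
import math

def poisson_tail(n, lam):
    """Exact P(Q(lam) >= n) for Q(lam) ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k)
                     for k in range(n))

def chernoff_bound(n, lam):
    """ChBnd(n, lam) = exp(n(1 - log(n/lam)) - lam), valid for lam < n."""
    return math.exp(n * (1.0 - math.log(n / lam)) - lam)

n = 10
rows = [(lam, poisson_tail(n, lam), chernoff_bound(n, lam))
        for lam in (2.0, 4.0, 6.0, 8.0)]
```

The bound always dominates the exact tail, and both decay fast as $\lambda$ moves away from $n$, which is what makes the pre-leap check inexpensive in practice.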
The way in which both methods are blended in one path depends on a cost-based
switching rule. This rule takes into account the time-mesh and the one-step exit
probabilities associated with the tau-leap method.
The decision rule is as follows: given the current time, $t$, the current state, $x$, the next mesh point, $T_k$, and a given one-step exit probability bound, $\delta$, our hybrid algorithm makes its choice by comparing the expected computational cost of reaching $T_k$ from $t$ between the SSA and Chernoff tau-leap methods. In this way, our hybrid method induces a natural partition of the state space of the process $X$ into two regions, one for the SSA and the other for the tau-leap. It turns out that the SSA region is close to the boundaries, where the tau-leap step must be very small to control the one-step exit probability. Figure 3.2 depicts this fact for the Gene Transcription and Translation (GTT) example, described in 5.5.2. Observe that in the left panel of Figure 3.2 the SSA region has few points of the form $(0, y)$ with small $y$. It means that there is at least one reaction channel pushing the process $X$ in the direction of the vector $(0, -1)$ such that, for the states $(0, y)$ with small $y$, and for the given time-mesh, the tau-leap method has a one-step exit probability greater than $\delta = 10^{-2}$. In the central panel of the same figure, the SSA region incorporates many states of the form $(1, y)$ and $(x, 1)$, but not $(0, y)$ or $(x, 0)$! In the states $(1, y)$, there is a reaction pushing out of the lattice in the direction $(-1, 0)$, which is not active in $(0, y)$, because when the process is at the boundary, the reactions pushing out of this boundary are inactive; analogously in the $(x, 1)$ case.
Figure 3.2: Regions of the one-step switching rule in the Gene Transcription and Translation model (see Section 5.5.2), in the (mRNA, proteins) plane. The blue and red dots show the Chernoff tau-leap and the SSA regions, respectively. From left to right, $\delta = 10^{-2}, 10^{-4}, 10^{-6}$.
We observe in 5.A that, when the size of the time step or the parameter $\delta$ goes to zero, the hybrid method decides for the SSA. This implies that the expected work of a hybrid path remains bounded by the expected computational work of one SSA path.

Let us describe the hybrid algorithm in more detail. When $\tau_{Ch}$ is of the same order as the expected inter-arrival time of the SSA, $\tau_{SSA} = (a_0(x))^{-1}$, it is convenient to take an exact step instead of a tau-leap step. In this way, we arrive at a hybrid (exact-approximate) algorithm (Algorithm 1) that adaptively switches between the SSA and the Chernoff tau-leap method by choosing the method that moves forwards faster per unit cost.
Algorithm 1. Let $x$ be the state of our hybrid path at time $t$, and let $T_k$ be the next grid point. $K_1$ is the cost of computing $\tau_{Ch}(x, \delta)$ divided by the cost of taking an SSA step. $K_2 = K_2(x, \delta)$ is the cost of taking a Chernoff tau-leap step divided by the cost of taking an SSA step, plus the cost of computing $\tau_{Ch}(x, \delta)$. This cost analysis is due to the fact that the computational cost of generating a Poisson random variable depends on its rate $\lambda$; see Figure 3.3.
1: Compute $\tau_{SSA}$. (A low-cost calculation.)
2: if $K_1 \tau_{SSA} > T_k - t$ then
3:   Use SSA.
4: else
5:   Compute $\tau_{Ch}$. (A more expensive calculation.)
6:   if $\tau_{Ch} \ge K_2 \tau_{SSA}$ then
7:     Use Chernoff tau-leap.
8:   else
9:     Use SSA.
10:   end if
11: end if
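The switching rule of Algorithm 1 can be rendered as a small function; in this sketch, `tau_ch_fn` and the cost constants are stand-ins for the quantities the thesis calibrates, and the numbers below are purely illustrative:

```python
def one_step_choice(t, Tk, tau_ssa, tau_ch_fn, K1, K2):
    """Cost-based switching rule of Algorithm 1.
    Returns 'SSA' or 'TL' (Chernoff tau-leap)."""
    if K1 * tau_ssa > Tk - t:
        # Reaching Tk by SSA steps is cheaper than even computing tau_Ch.
        return "SSA"
    tau_ch = tau_ch_fn()                  # the more expensive calculation
    return "TL" if tau_ch >= K2 * tau_ssa else "SSA"

# Near the boundary tau_Ch is small, so the SSA wins; far from it, tau-leap wins.
choice_boundary = one_step_choice(t=0.0, Tk=1.0, tau_ssa=0.01,
                                  tau_ch_fn=lambda: 0.02, K1=5.0, K2=10.0)
choice_interior = one_step_choice(t=0.0, Tk=1.0, tau_ssa=0.01,
                                  tau_ch_fn=lambda: 0.5, K1=5.0, K2=10.0)
```

This reproduces, in miniature, the partition of the state space seen in Figure 3.2: SSA near the boundary, tau-leap in the interior.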
Figure 3.3: Left: the computational work (runtime) model, $C_P(\lambda)$, for generating a Poisson random variate using the Gamma method by Ahrens and Dieter [3]: actual simulation runtimes and a least squares fit. Right: linear growth detail, for $\lambda \in [0, 15]$.
The global error is defined as the difference $\mathcal{E} := E[g(X(T))] - \mathcal{M}$, where the Monte Carlo estimator, $\mathcal{M}$, is
\[
\mathcal{M} := \frac{1}{M} \sum_{m=1}^{M} g(\bar{X}(T))\, \mathbf{1}_A(\omega_m).
\]
Here $A$ is the event in which the hybrid path, $\bar{X}$, arrives at the final time $T$ without exiting the state space of $X$, and $A^c$ is its complement. Notice that $\mathcal{M}$ is an unbiased estimator of $E[g(\bar{X}(T))\mathbf{1}_A]$, but a biased estimator of $E[g(X(T))]$.

The global error can be decomposed as follows:
\[
E[g(X(T))] - \mathcal{M} = E\left[g(X(T))(\mathbf{1}_A + \mathbf{1}_{A^c})\right] - E\left[g(\bar{X}(T))\mathbf{1}_A\right] + E\left[g(\bar{X}(T))\mathbf{1}_A\right] - \mathcal{M}
\]
\[
= E\left[(g(X(T)) - g(\bar{X}(T)))\mathbf{1}_A\right] + E\left[g(X(T))\mathbf{1}_{A^c}\right] + \frac{1}{M}\sum_{m=1}^{M} \left( E\left[g(\bar{X}(T))\mathbf{1}_A\right] - g(\bar{X}(T))\mathbf{1}_A \right)(\omega_m).
\]
The first component, E[(g(X(T)) − g(X̄(T))) 1_A], is the discretization error, E_I. It
depends mostly on the size of the time step, Δt. In this article, we introduce a dual-weighted
method for fast estimation of E_I. The second component, E[g(X(T)) 1_{A^c}],
is named the global exit error, E_E. It is controlled by the one-step exit probability
bound, δ, but it also depends on the expected number of tau-leap steps in a hybrid
path, which in turn depends on Δt. The third term of the global error decomposition,
M^{-1} \sum_{m=1}^{M} (E[g(X̄(T)) 1_A] − g(X̄(T)) 1_A)(ω_m), is the statistical error, E_S. It can be
controlled by the number of generated hybrid paths, M.
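As a minimal illustration of the estimator and the role of the indicator 1_A, the following sketch assumes each hybrid path has already been simulated and reduced to its final observable value together with an arrived/exited flag:

```python
def mc_estimator(g_values, arrived):
    """Sample average of g(X(T)) * 1_A over M hybrid paths.

    g_values -- g evaluated at the final state of each path
    arrived  -- 1 if the path reached T inside the state space, else 0
    """
    M = len(g_values)
    # Exited paths contribute zero, but still count in the divisor M.
    return sum(g * a for g, a in zip(g_values, arrived)) / M
```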
To provide the simulation setting, i.e., the one-step exit probability bound, δ,
the time step, Δt, and the number of hybrid paths, M, needed for estimating
E[g(X(T))] with near-optimal computational work, we show in this article a
calibration algorithm designed to approximately solve:
$$
\begin{cases}
\min_{\delta,\,\Delta t,\,M} \; \mathcal{W}(\Delta t, \delta, M) \\
\text{s.t.} \;\; E_I + E_E + E_S \le TOL.
\end{cases} \tag{3.2}
$$
Remark 3.1.1 (On the optimization problem). In fact, the constraint we use in (3.2)
is the sum of three terms: i) TOL² as an upper bound for |E_E| (this is achieved by
controlling δ), ii) a dual-weighted estimate of the magnitude of E_I, and iii) a term
proportional to an estimate of √Var[E_S], where the constant of proportionality is
chosen according to the confidence level we want to achieve.
3.1.4 Results
Figure 3.4: Left: Predicted work (runtime) versus the estimated error bound for the
gene transcription and translation model. The hybrid method is preferred over the
SSA for the first two (larger) tolerances. For the last four tolerances, the SSA is
preferred; in the latter case, the total predicted runtime is therefore the same for the
hybrid and SSA methods. Right: Predicted and actual work (runtime) versus the
estimated error bound.
3.1.5 Summary
Our hybrid method allows us (i) to control the global exit error caused by the tau-leap
steps and (ii) to obtain accurate and computable estimates of the expected value
of observables of SRNs with near-optimal computational work. Another advantage
derived from the use of a hard bound for one-step exit probabilities is that we do
not need to make any distributional approximation for the tau-leap increments (e.g.,
exchanging Poisson random variables for binomial ones), and thus we do not introduce
additional modeling error. It is worth mentioning that, by simulating hybrid
paths, we obtained accurate estimates of the average number of steps required by the
SSA method to reach the final time. This is especially relevant in problems where
the process visits regions of the state space where the total propensity is very high.
This article extends the hybrid Chernoff tau-leap method presented in [4] to the
multilevel Monte Carlo (MLMC) setting. Inspired by the work of Anderson and
Higham on the tau-leap MLMC method with uniform time meshes, we develop a
novel algorithm that is able to couple two hybrid Chernoff tau-leap paths at different
levels. But, unlike the multilevel algorithms proposed by Anderson and Higham, we
do not need to distinguish between biased and unbiased discretizations: when our
hybrid algorithm chooses exact paths at the bottom level, we automatically obtain an
unbiased algorithm.
The levels are given by a hierarchy of L + 1 nested time meshes of the interval [0, T],
indexed by ℓ = 0, 1, . . . , L, such that Δt_0 is the size of the coarsest time mesh and
Δt_ℓ = 2^{−ℓ} Δt_0, ℓ = 1, . . . , L. Let X̄_ℓ(·) := X̄(·; Δt_ℓ, δ) be a hybrid Chernoff tau-leap
path generated using a time mesh of size Δt_ℓ and one-step exit probability bound
δ. Define A_ℓ := {ω̄ ∈ Ω : X̄_ℓ(t) ∈ Z^d_+, ∀t ∈ [0, T]}, and g_ℓ := g(X̄_ℓ(T)). The MLMC
estimator proposed in this article, \mathcal{M}_L, requires sampling from the random variables
[g_ℓ − g_{ℓ−1}](ω), that is, the difference between the observable g, computed at the end
of two coupled hybrid paths generated by two consecutive time meshes of sizes Δt_ℓ
and Δt_{ℓ−1}, respectively.
To couple two hybrid paths, we use at each time step four algorithms as building
blocks:
Figure 3.5: This figure depicts a particular instance of the Chernoff hybrid coupling
algorithm (Algorithm 2), where τ̄ < τ̄̄. The synchronization horizon H, defined as
H := min{H̄, H̄̄}, is equal to H̄ in this case. Notice that H̄ := min{t + τ̄, t̄, T} and
H̄̄ := min{t + τ̄̄, t̄̄, T}. Algorithm 2 computes X̄ and X̄̄ from t to H.
Algorithm 2 Inputs: initial point x_0, coarse and fine meshes, final time T. Outputs:
two coupled hybrid paths, X̄ and X̄̄, in the interval [0, T].
1: Set X̄ ← x_0, X̄̄ ← x_0
2: Set t ← 0
3: Set t̄ as the smallest coarse mesh point greater than t
4: Set t̄̄ as the smallest fine mesh point greater than t
5: Compute H̄ ← H(t, X̄, t̄, T)
6: Compute H̄̄ ← H(t, X̄̄, t̄̄, T)
7: while t < T do
8:   H ← min{H̄, H̄̄}
9:   Select Block and move forward X̄ and X̄̄ from t to H
10:  Set t ← H
11:  Set t̄ as the smallest coarse mesh point greater than t
12:  Set t̄̄ as the smallest fine mesh point greater than t
13:  if H = H̄ then
14:    Compute H̄ ← H(t, X̄, t̄, T) using Algorithm 3
15:  end if
16:  if H = H̄̄ then
17:    Compute H̄̄ ← H(t, X̄̄, t̄̄, T) using Algorithm 3
18:  end if
19: end while
Algorithm 3 Inputs: current time t, current state x, smallest mesh point s greater
than t, and final time T. Outputs: H.
1: Given x, s, t and T, get the method m and the step size τ
2: if m is TL then
3:   H ← min{t + τ, s, T}
4: else
5:   H ← min{t + τ, T}
6: end if
7: return H
$$
\mathcal{M}_L := \frac{1}{M_0} \sum_{m_0=1}^{M_0} g_0 \mathbf{1}_{A_0}(\omega_{m_0})
+ \sum_{\ell=1}^{L} \frac{1}{M_\ell} \sum_{m_\ell=1}^{M_\ell}
\left[ g_\ell \mathbf{1}_{A_\ell} - g_{\ell-1} \mathbf{1}_{A_{\ell-1}} \right](\omega_{m_\ell}).
$$
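The telescoping structure of the estimator can be sketched as follows; the expensive part, simulating the coupled hybrid paths, is assumed done and is not shown:

```python
def mlmc_estimator(level_samples):
    """Multilevel estimator: sum of per-level sample averages.

    level_samples[0]         -- values of g_0 * 1_{A_0} from single paths
    level_samples[l], l >= 1 -- values of g_l * 1_{A_l} - g_{l-1} * 1_{A_{l-1}}
                                from *coupled* pairs of hybrid paths
    """
    total = 0.0
    for samples in level_samples:
        total += sum(samples) / len(samples)
    return total
```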
As in Article I, the simulation parameters are obtained by approximately solving a
calibration problem of the form
$$
\begin{cases}
\min \; \text{expected computational work} \\
\text{s.t.} \;\; E_{I,L} + E_{E,L} + E_{S,L} \le TOL.
\end{cases}
$$
The meaning of these expressions is analogous to that described in Article I
(see Remark 3.1.1). To reach this optimality, we derived novel formulas based
on dual-weighted residual estimations for computing the variance of the difference
of the observables between two consecutive levels in coupled hybrid paths, and also
the bias of the deepest level. These formulas are particularly relevant for Stochastic
Reaction Networks, since alternative standard sample estimators become too costly
at deep levels because of the presence of large kurtosis.
Of paramount importance is that the computational complexity of our hybrid
MLMC method is of order O(TOL⁻²), that is, the same computational complexity
as an exact method, but with a smaller constant. To put this into perspective, our
algorithm acts as if we were generating exact paths and then using the standard Monte
Carlo method.
Our numerical examples show substantial gains compared to the previous single-level
approach and the SSA.
3.2.3 Results
Figure 3.6: Left: Predicted work (runtime) versus the estimated error bound for the
Gene Transcription and Translation model (Section 5.5.2). The hybrid method is
preferred over the SSA for the first three tolerances only. The multilevel hybrid
method is preferred over the SSA and the single-level method for all tolerances.
Right: Actual work (runtime) versus the estimated error bound.
3.2.4 Summary
In this article, we developed a multilevel Monte Carlo version of the single-level
hybrid Chernoff tau-leap algorithm presented in [4]. We showed that the computational
complexity of this method is of order O(TOL⁻²) and, therefore, that it can be seen
as a variance reduction of the SSA method, which has the same complexity. This
represents an important advantage of the hybrid tau-leap compared to the pure tau-leap
in the multilevel context. In our numerical examples, we obtained substantial
gains with respect to both the SSA and the single-level hybrid Chernoff tau-leap. The
present approach, like the one in [4], also provides an approximation of E[g(X(T))]
with prescribed accuracy and confidence level, with nearly optimal computational
work.
The author contributed especially to the splitting algorithm and the formulation
of the expected computational work per path. This work was presented by
the author at the Mathematical, Computational and Modeling Sciences Center
at Arizona State University, June 2014, Tempe, USA.
In this article, we present a novel multilevel Monte Carlo method for kinetic simulation
of stochastic reaction networks that is specifically designed for systems in which the
set of reaction channels can be adaptively partitioned into two subsets: R_TL and
R_MNRM. The idea is to find the next state of the system, X̄_{n+1}, as the current state,
X̄_n, plus two increments, Δ_TL + Δ_MNRM, where Δ_TL is a tau-leap increment involving
the reactions in the class R_TL and Δ_MNRM is an exact increment produced by the
reactions in the class R_MNRM. Adaptivity in this context means that the partition
evolves in time according to the states visited by the stochastic paths of the system.
The partition of the set of reaction channels is based on a heuristic that greedily
optimizes an objective function defined as the expected computational work of
moving the system from the current time t to the next time horizon H.
For a reaction j to be in the R_TL class, there are two simultaneous requirements:
A) a high propensity a_j(x), and B) a low probability, θ_j, of reaching a negative
population state (see the precise definition in Equation (7.4)).
We propose to split the sorted set of penalized propensities.
In such a case, the objective function has J + 1 values. We propose to reduce this
number to 3 by searching only the current partition and its two neighbors; that is,
if, at the k-th step of our algorithm, we select the highest p penalized propensities to
be in the tau-leap group, then, at the (k + 1)-th step, we evaluate the objective function
at three partitions: the highest p, p − 1 and p + 1 penalized propensities. Algorithm
4 performs the described split. Observe that, if the current time is close to the next
grid point, we select the trivial partition (∅, R), that is, we take an exact step.
Algorithm 4 The one-step mixing rule. Inputs: the current state of the approximate
process, X̄(t), the current time, t, the values of the propensity functions evaluated
at X̄(t), (a_j(X̄(t)))_{j=1}^J, the one-step exit probability bound δ, the next grid point,
T̃, and the previous optimal split, p. Outputs: the tau-leap set, R_TL, the exact set,
R_MNRM, and the new optimal split, p*.
Require: a_0 ← Σ_{j=1}^J a_j > 0
1: if K_1/a_0 < T̃ − t then
2:   Compute θ_j, j = 1, . . . , J (see (7.4))
3:   ã_(j) ← Sort {(1 − θ_j) a_j} descending, j = 1, . . . , J
4:   {S_i} ← Compute the splits, taking into account the previous optimal split
5:   (R_TL, R_MNRM, p*) ← Take the minimum-work split
6:   return (R_TL, R_MNRM, p*)
7: else
8:   return (∅, R, 0)
9: end if
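The local search over the three candidate partitions can be sketched as follows; here `work(p)` is a hypothetical stand-in for the expected-work objective of the article:

```python
def one_step_split(penalized, p_prev, work):
    """Greedy local search for the tau-leap / exact partition.

    penalized -- list of penalized propensities (1 - theta_j) * a_j
    p_prev    -- previous optimal number of tau-leaped reactions
    work      -- callable p -> expected work when the p largest penalized
                 propensities are tau-leaped (a user-supplied model)
    """
    J = len(penalized)
    # Reaction indices sorted by decreasing penalized propensity.
    order = sorted(range(J), key=lambda j: -penalized[j])
    # Evaluate the objective only at the previous split and its neighbours.
    candidates = [p for p in (p_prev - 1, p_prev, p_prev + 1) if 0 <= p <= J]
    p_best = min(candidates, key=work)
    return set(order[:p_best]), set(order[p_best:]), p_best
```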
B2, B3 and B4, that we use as building blocks. Table 3.1 summarizes them.

             R̄_TL   R̄_MNRM
  R̄̄_TL     B1      B2
  R̄̄_MNRM   B3      B4

Table 3.1: Building blocks for simulating two coupled mixed Chernoff tau-leap paths.
Algorithms B1 and B2 are presented as Algorithms 2 and 3 in [6]. Algorithms B3
and B4 can be directly obtained from Algorithm B2 (see [5]).

In order
to do that, the algorithm computes, independently, the sets R_TL and R_MNRM for
each level, and the time until the next decision is taken, H, using Algorithm 27 in
Section 7.6. Next, it computes concurrently the increments due to each one of the
sets (storing the results in X̄ and X̄̄ for the coarse and fine grid, respectively).
We note that the only case in which we use a Poisson random variate generator for
the tau-leap method is in Algorithm B1 (Algorithm 28 in Section 7.6).
We consider a system with hundreds of species and reactions, but one that is still easy
to reproduce due to its simple structure. This example is intended to show the advantages
of our multilevel adaptive reaction-splitting technique over the multilevel tau-leap
proposed by Anderson and Higham in [6], which is regarded as the state of the art. To
this end, and to make the comparison clear and fair, we do not use the MATLAB feature
that allows calling a batch of Poisson deviates, because this feature is not present
in programming languages like C or FORTRAN. Remember that coupling two simulated
paths in two consecutive time meshes is essential in multilevel Monte Carlo
methods (see Section A.7).
Example description
Consider a closed system (economy) formed by one big particle (the holding company)
and N small particles (business units). Each business unit obtains its funds by trading
with the environment (represented by the empty set) and it makes net transfers to the
holding company according to its current money level. At the same time, the holding
company pays dividends to the environment, and from time to time, its money level
is increased by its own investments. Let us consider the following particular case:
Each business unit obtains its funds from the environment at a constant rate and
makes net transfers to the holding company at a rate proportional to its current
money level,
$$
\emptyset \xrightarrow{k} X_1, \;\ldots,\; \emptyset \xrightarrow{k} X_N,
\qquad
X_1 \xrightarrow{c} Y, \;\ldots,\; X_N \xrightarrow{c} Y.
$$
The holding company pays dividends to the environment, and receives returns
from its own investments,
$$
Y \xrightarrow{a} \emptyset, \qquad Y \xrightarrow{b} 50\,Y,
$$
that is, with a rate proportional to its current money level, the holding company
increases its money level by 49 units.
The stoichiometric matrix, ν, of size 2(N + 1) × (N + 1), can be written in block form
as
$$
\nu = \begin{pmatrix} I_N & 0 \\ -I_N & \mathbf{1}_N \\ 0 & -1 \\ 0 & 49 \end{pmatrix},
$$
where the first N rows correspond to the funding reactions ∅ → X_i, the next N rows
to the transfers X_i → Y, and the last two rows to the dividend reaction Y → ∅ and
the investment reaction Y → 50Y, respectively. Here I_N is the N × N identity matrix
and \mathbf{1}_N is a column of ones.
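Under the block structure above, ν and the propensity vector can be assembled programmatically. This is a sketch; the species ordering X_1, ..., X_N, Y and the rates k, c, a, b follow the description above:

```python
def holding_company_network(N, k, c, a, b):
    """Stoichiometric matrix (2(N+1) rows, N+1 columns) and propensity
    function for the holding-company example; Y is the last species."""
    S = N + 1
    nu = [[0] * S for _ in range(2 * N + 2)]
    for i in range(N):
        nu[i][i] = 1            # funding:    0   -> X_i   at rate k
        nu[N + i][i] = -1       # transfer:   X_i -> Y     at rate c * x_i
        nu[N + i][N] = 1
    nu[2 * N][N] = -1           # dividend:   Y -> 0       at rate a * y
    nu[2 * N + 1][N] = 49       # investment: Y -> 50 Y    at rate b * y

    def propensities(x):
        return [k] * N + [c * x[i] for i in range(N)] + [a * x[N], b * x[N]]

    return nu, propensities
```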
Figure 3.7: Left panel: One SSA path of the money level of the holding company.
Right panel: five SSA paths of the business units. Observe that this example is far
from a deterministic mean field approximation and has relevant stochastic behavior.
In this section, we present results showing the computational work involved in
generating sample paths. We first compare single-level path generation: i) the vanilla
tau-leap versus ii) our mixed method using uniform time meshes. We then compare
coupled-level path generation: iii) Anderson and Higham's coupled TL-TL method
versus iv) our coupled mixed method using uniform time meshes.
Since we are only comparing discretization schemes, to make the comparisons fair, we
do not control the exit error for the generation of mixed paths, so there is no Chernoff
bound cost involved.
The savings in computational work when generating Poisson random variables
heavily depend on MATLAB's performance. For example, we do not generate the
random variates in batches, nor do we use any "vectorization" advantage. In fact, we
should expect better results from our method if we implemented our algorithms in
more performance-oriented languages, or if we sampled Poisson random variables in
batches.
As a baseline for comparison, the average work (runtime) per path of the SSA is
17 seconds.
The Anderson and Higham unbiased scheme, which involves the generation of
two coupled paths, where the tau-leap one uses a time mesh with Δt = 0.0625
and the other is generated using Anderson's exact Modified Next Reaction Method,
has an average work per path (runtime) of 74 seconds. This makes the unbiased
approach unattractive, and therefore it is not considered any further in this
comparison.
We compute the average, minimum and maximum values over a batch of 5 runs.
Note that our Mixed and Coupled Mixed paths (with constant Δt) both have zero
observed exited paths.
In this section, we compare our Chernoff Mixed ML method, which controls the global
approximation error and, in particular, the effect of the exit error, against the SSA, for
different levels of TOL. Work is measured in runtime (seconds).
[Both panels: work (runtime, seconds) versus error bound for the NPLAY2 model; curves: SSA, Mixed ML, a slope-1/2 reference, and the asymptotic regime.]
Figure 3.9: Left: Predicted work (runtime) versus the estimated error bound. Right:
Predicted work (runtime) versus the estimated error bound using the control variate
at level 0 (see Section 7.4). An additional gain of a multiplicative factor of 50 is
obtained.
3.3.6 Summary
Every mechanical system is naturally subject to some kind of wear process that,
at some point, will cause failure in the system if no monitoring or treatment process
is applied. Since failures are expensive, it is essential both to predict and to avoid
them. To achieve this, a monitoring system of the wear level should be implemented
to decrease the risk of failure. In this work, we take a first step in the development of
a multiscale indirect inference methodology for state-dependent Markovian pure jump
processes. This allows us to model the evolution of the wear level, and to identify
when the system reaches some critical level that triggers a maintenance response.
Since the likelihood function of a discretely observed pure jump process does not
have an expression that is simple enough for standard non-sampling optimization
methods, we approximate this likelihood by expressions derived from upscaled models
of the data. We use the Master Equation to assess the goodness-of-fit and to compute
the distribution of the hitting time of the critical level.
"In materials science, wear is erosion or sideways displacement of material from its
'derivative' and original position on a solid surface performed by the action of another
surface" (Wikipedia). The wear in the cylinder liner is mainly due to the following
reasons (see http://www.marineinsight.com):
The data set consists of wear levels observed on 32 cylinder liners of eight-cylinder
SULZER engines, measured by a caliper with a precision of Δ = 0.05 mm (see
Figure 3.10). Warranty clauses specify that the liner should be replaced before it
accumulates a wear level of 4.0 mm, in order to avoid expensive failures.
A motivational question could be: when should we send the ship for maintenance?
Figure 3.10: Due to the caliper's finite precision, every single measurement of the
wear process, W(t), belongs to the lattice {0, Δ, 2Δ, . . .}.
where X(0) = x_0 is the initial thickness and θ = (c_1, c_2, X_0, k) is the vector of unknown
parameters.
Indirect inference model

The data x = {x_i}_{i=1}^n are modeled according to x_i = Z(t_i) + ε_i (indirect inference
model). Here Z(t) ∼ N(m(t), σ²(t)), and m(t) and σ²(t) satisfy
$$
\begin{cases}
dm(t) = (c_1 \nu_1 + c_2 \nu_2)\, m(t)\, dt, \\
d\sigma^2(t) = \left( 2 (c_1 \nu_1 + c_2 \nu_2)\, \sigma^2(t) + (c_1 \nu_1^2 + c_2 \nu_2^2)\, m(t) \right) dt, \\
(m(0), \sigma^2(0)) = (x_0, 0), \quad x_0 \in \mathbb{R}_+, \; t \in \mathbb{R}_+,
\end{cases}
$$
and the ε_i are i.i.d. realizations of N(0, σ_ε²) for i = 1, . . . , n. Here, ν_1 = −Δ and
ν_2 = −kΔ according to (3.3).
In this case, the likelihood can be written as
$$
L(\theta; x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi(\sigma_\varepsilon^2 + \sigma^2(t_i; \theta))}}
\exp\left\{ -\frac{(x_i - m(t_i; \theta))^2}{2(\sigma_\varepsilon^2 + \sigma^2(t_i; \theta))} \right\}.
$$
The MLE for θ is given by the minimizer of minus the log-likelihood,
$$
\theta^{*} = \arg\min_{\theta \in \Theta} \sum_{i=1}^{n}
\left\{ \frac{(x_i - m(t_i; \theta))^2}{\sigma_\varepsilon^2 + \sigma^2(t_i; \theta)}
+ \log\left(\sigma_\varepsilon^2 + \sigma^2(t_i; \theta)\right) \right\}.
$$
We first determine the minimum conditioned on k and X_0, and then the global
optimizer.
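For this linear model, the moment ODE system solves in closed form, which makes the inner minimization over (c_1, c_2) for fixed k and X_0 cheap. The sketch below assumes r := c_1 ν_1 + c_2 ν_2 ≠ 0 and writes σ_ε² for the measurement-noise variance:

```python
import math

def moments(t, c1, c2, x0, nu1, nu2):
    """Mean and variance of the upscaled Gaussian model Z(t), in closed
    form for linear propensities (assumes r != 0)."""
    r = c1 * nu1 + c2 * nu2
    q = c1 * nu1 ** 2 + c2 * nu2 ** 2
    m = x0 * math.exp(r * t)                                  # dm = r m dt
    s2 = (q * x0 / r) * (math.exp(2.0 * r * t) - math.exp(r * t))
    return m, s2

def neg_log_likelihood(c1, c2, x0, data, sigma_eps2, nu1, nu2):
    """Minus the log-likelihood (up to an additive constant): exactly the
    objective minimized by the MLE above."""
    total = 0.0
    for t_i, x_i in data:
        m, s2 = moments(t_i, c1, c2, x0, nu1, nu2)
        v = sigma_eps2 + s2
        total += (x_i - m) ** 2 / v + math.log(v)
    return total
```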
3.4.5 Results
In Figure 3.11, we see that the likelihood provided by our indirect inference model
has a unique maximum at (c_1^*, c_2^*) = (0.63 · 10⁻⁴, 1.2 · 10⁻⁴). The 90% confidence band
derived from the Master Equation (1.6) associated with our SRN (3.3) is given in the
right panel.
[Left: likelihood surface over (c_1, c_2). Right: wear data with the 90% confidence bands of models 1 and 2, wear [mm] versus operating time [h].]
Figure 3.11: Left panel: Unique global maximum (c_1^*, c_2^*) = (0.63 · 10⁻⁴, 1.2 · 10⁻⁴).
Right panel: the exact 90% confidence band computed from the associated Master
Equation.
Suppose that we know that the wear process, W, is at level w_0 at time t_0 ≥ 0. Assume
that there exists a critical stopping level, w_max > w_0, that determines the residual
lifetime τ_max − t_0. For t > 0, the residual lifetime is greater than t if and only if
Figure 3.12: Left panel: CDF of the hitting-time for B = 1. Right panel: PDF of
the hitting-time to the critical level.
Taking into account the relation between the wear and the thickness processes, we
have that the conditional residual reliability function defined as
3.4.6 Summary
In this paper, we presented a novel approach to the problem of modeling the wear
process of cylinder liners. Since the measuring caliper has finite precision, the wear
process takes values in a lattice, and therefore a pure jump process is a sensible model.
In this approach, we started by fitting one of the simplest pure jump processes, i.e.,
the simple decay model, and added complexity only when necessary. We found that
the wear process can be modeled using only two jumps, of amplitudes Δ and 4Δ, with
[Plot: conditional residual reliability R(t; 0, w_0) versus residual lifetime [h], for w_0 = 2, 3, 4, 5.]
Figure 3.13: Behavior of the conditional residual reliability function, R(t; 0, w0 ) for
some values of w0 . In this case, we set wmax = 4. As expected, for a fixed residual
lifetime t, we have that R(t; 0, w0 ) is a decreasing function of w0 .
linear propensity functions. In contrast to the work of Giorgio, Guida, and Pulcini
[7], we did not need to use age-dependent propensity functions or gamma noise.
Nevertheless, our approach can deal with age-dependent propensities, since time would
enter only as a given constant. One of the main contributions of this work
is the multiscale indirect inference approach, where the inferences are based on
upscaled models. The coefficients of the linear propensity functions were inferred using
the likelihood associated with a Gaussian upscaled model. The mean and variance
of this Gaussian process are the solutions of a second-order moment expansion ODE
system. In this way, we computed the MLE by solving a standard nonlinear least
squares problem. We observe that this method is much simpler than dealing directly
with the likelihood of the pure jump process, which in general cannot be expressed in
closed form and requires computationally intensive sampling techniques.
We notice that, as long as the probability distribution of the pure jump process is
unimodal at every time, our Gaussian inference approach is applicable and produces
substantial savings in the computational work. Otherwise, the Langevin model, while
Thanks to the remarkable simplicity of our model, we can easily obtain the dis-
tribution of any observable of the process directly from the solution of the associ-
ated Master Equation, which provides the probability distribution of the process at
all times. From this probability mass function, we easily compute the CDF of the
hitting-time to the critical value stipulated in the warranty and the conditional resid-
ual reliability function. It is worth mentioning that we did not use Monte Carlo
simulation or any other sampling procedure.
Figure 3.14: The two-phase estimation process. In the first step, we obtain θ_II^{(0)} from
θ_I^{(0)} by solving the optimization problem (3.5). In the subsequent steps, we generate
the stochastic sequence (θ_II^{(p)})_{p=1}^{+∞} using Monte Carlo EM (3.12).
Starting from a set of over-dispersed seeds, the output of our two-phase method is
a cluster of maximum likelihood estimates obtained by using convergence assessment
techniques from the theory of Markov chain Monte Carlo. An example of the output
of our method for a Birth and Death process (see 1.2.3) is provided in Figure 3.16
and Table 3.2. The data set for this example is shown in Figure 3.15.
Figure 3.15: Data trajectory for the birth-death example. This is obtained by
observing the values of an SSA path at uniform time intervals of size Δt = 5.
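The data-generating mechanism in the caption — an exact path observed on a uniform grid — can be sketched for the birth-death process (∅ → X at the birth rate, X → ∅ at the death rate times x); the rates and seed below are illustrative:

```python
import random

def ssa_birth_death(x0, birth, death, T, obs_dt, rng):
    """Gillespie SSA path of a birth-death process, recorded at the
    uniform observation times t = 0, obs_dt, 2*obs_dt, ... <= T."""
    t, x = 0.0, x0
    obs, next_obs = [], 0.0
    while True:
        a1, a2 = birth, death * x          # propensities
        a0 = a1 + a2
        dt = rng.expovariate(a0) if a0 > 0 else float("inf")
        # Record every observation time passed before the next jump.
        while next_obs <= min(t + dt, T):
            obs.append((next_obs, x))
            next_obs += obs_dt
        if t + dt > T:
            return obs
        t += dt
        x += 1 if rng.random() < a1 / a0 else -1
```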
[Scatter plot: initial points of phase I, initial points of phase II, and final points of phase II.]
Figure 3.16: FREM estimation (phase I and phase II) for the birth-death process.
The horizontal axis is for the birth rate and the vertical axis for the death rate.
seed i   θ_I,i^{(0)}    θ_II,i^{(0)}             θ̂_II,i^{(p*)}
1        (0.5, 0.04)    (6.24e-01, 3.29e-02)    (1.24e+00, 6.55e-02)
2        (0.5, 0.08)    (7.68e-01, 4.07e-02)    (1.29e+00, 6.67e-02)
3        (1.5, 0.04)    (1.01e+00, 5.25e-02)    (1.18e+00, 6.27e-02)
4        (1.5, 0.08)    (1.53e+00, 7.97e-02)    (1.20e+00, 6.34e-02)
Table 3.2: Values generated by the FREM algorithm for the birth-death example.
3.5.1 A Two-phase Algorithm
such that, for each k, I_k := [s_k, t_k] is the time interval determined by two consecutive
observational points, s_k and t_k, where the states x(s_k) and x(t_k) have been observed,
respectively.
The main goal of phase I is to address the key problem of finding a suitable initial
point, θ_II^{(0)}, for phase II. The idea is to increase (in some cases dramatically) the
number of SRN-bridges obtained from the sampled forward-reverse trajectories for all
time intervals.
Let us now describe phase I. From the user-selected seed, θ_I^{(0)}, we solve the following
deterministic optimization problem using an appropriate numerical iterative method:
$$
\theta_{II}^{(0)} := \arg\min_{\theta \ge 0} \sum_{k} w_k \,
d\!\left( \tilde{Z}^{(f)}(t_k^*; \theta), \tilde{Z}^{(b)}(t_k^*; \theta) \right),
\quad \text{starting from } \theta_I^{(0)}. \tag{3.5}
$$
Here \tilde{Z}^{(f)} is the ODE approximation, defined by (9.5) (or (1.13)), in the interval
[s_k, t_k^*], to the SRN defined by the reaction channels ((ν_j, a_j))_{j=1}^J and the initial
condition x(s_k); and \tilde{Z}^{(r)} is the ODE approximation, in the interval [t_k^*, t_k], to the
SRN defined by the reaction channels ((−ν_j, ã_j))_{j=1}^J and the initial condition
x(t_k), where ã_j(x) := a_j(x − ν_j). We define \tilde{Z}^{(b)}(u, θ) := \tilde{Z}^{(r)}(t_k^* + t_k − u, θ) for u ∈ [t_k^*, t_k].
Here w_k := (t_k − s_k)^{−1} and d(·, ·) is an appropriate distance in R^d. The rationale behind
this particular choice of the weight factors, w_k, is to mitigate the effect of very large
time intervals, where the evolution of the process, X, may be more uncertain. A better
(but more costly) choice would be the inverse of the maximal variance of the SRN-bridge.
Remark 3.5.1 (Alternative definition of θ_II^{(0)}). In some cases, convergence issues
arise when solving problem (3.5). We found it useful to solve a set of simpler
problems whose answers can be combined to provide a reasonable seed for phase
II: more precisely, we solve K deterministic optimization problems, one for each time
interval [s_k, t_k], all of them solved iteratively with the same seed, θ_I^{(0)}. Then, we define
$$
\theta_{II}^{(0)} := \frac{\sum_k w_k \theta_k}{\sum_k w_k}, \tag{3.6}
$$
where θ_k denotes the answer of the k-th simpler problem.
In our statistical estimation approach, the Monte Carlo EM algorithm uses data
(pseudo-data) generated by those forward and backward simulated paths that result
in SRN-bridges, either exact or approximate. Figure 3.17 illustrates this idea.
This last notion is associated with the use of kernels. Phase II implements the
Monte Carlo EM algorithm for SRNs.
Simulating forward and backward paths: this phase starts with the simulation
of forward and backward paths on each time interval I_k. More specifically,
given an estimate of the true parameter θ, say θ̂ = (ĉ_1, ĉ_2, . . . , ĉ_J), the first step
Let (X̃^{(f)}(t_k^*, ω̃_m))_{m=1}^{M_k} and (X̃^{(b)}(t_k^*, ω̃_{m'}))_{m'=1}^{M_k}
denote the values of the simulated forward and backward paths at the time t_k^*,
respectively. If the intersection of these two sets of points is nonempty, then there
exists at least one m and one m' such that the forward and backward paths can be
linked as one SRN-bridge connecting the data values x(s_k) and x(t_k).
When the number of simulated paths, M_k, is large enough, and an appropriate
guess of the parameter θ is used to generate those paths, then, due to the discrete
nature of our state space, Z^d_+, we expect to generate a number of exact SRN-bridges
large enough to perform statistical inference. However, at early stages of the
Monte Carlo EM algorithm, our approximations of the unknown parameter, θ, are
not expected to provide a large number of exact SRN-bridges. In such a case, we
can use kernels to relax the notion of an exact SRN-bridge (see Section 9.2.3). Notice
that in the case of exact SRN-bridges, in formula (9.16), we are implicitly using
a Kronecker kernel, that is, one that takes the value 1 when X̃^{(f)}(t_k^*, ω̃_m) = X̃^{(b)}(t_k^*, ω̃_{m'})
and 0 otherwise. We can relax this condition to obtain approximate SRN-bridges.
To make an efficient use of kernels, we first transform the endpoints of the forward
and backward paths generated in the interval I_k,
$$
\mathcal{X}_k := \left( \tilde{X}^{(f)}(t_k^*, \tilde\omega_1), \ldots, \tilde{X}^{(f)}(t_k^*, \tilde\omega_{M_k}),
\tilde{X}^{(b)}(t_k^*, \tilde\omega_{M_k+1}), \ldots, \tilde{X}^{(b)}(t_k^*, \tilde\omega_{2M_k}) \right), \tag{3.7}
$$
into
$$
\mathcal{Y}_k := \left( \tilde{Y}^{(f)}(t_k^*, \tilde\omega_1), \ldots, \tilde{Y}^{(f)}(t_k^*, \tilde\omega_{M_k}),
\tilde{Y}^{(b)}(t_k^*, \tilde\omega_{M_k+1}), \ldots, \tilde{Y}^{(b)}(t_k^*, \tilde\omega_{2M_k}) \right).
$$
We use the product Epanechnikov kernel
$$
\psi(\eta) := \left( \tfrac{3}{4} \right)^{d} \prod_{i=1}^{d} (1 - \eta_i^2)\, \mathbf{1}_{|\eta_i| \le 1}, \tag{3.9}
$$
where η is defined as
$$
\eta \equiv \eta_k(m, m') := \tilde{Y}^{(f)}(t_k^*, \tilde\omega_m) - \tilde{Y}^{(b)}(t_k^*, \tilde\omega_{m'}). \tag{3.10}
$$
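A sketch of the kernel evaluation: the product Epanechnikov kernel attains its maximum (3/4)^d at a perfect, Kronecker-style match η = 0, and vanishes as soon as any coordinate of η leaves [−1, 1]; the helper names are illustrative:

```python
def epanechnikov(eta):
    """Product Epanechnikov kernel (3/4)^d * prod_i (1 - eta_i^2) 1_{|eta_i| <= 1}."""
    w = 1.0
    for e in eta:
        if abs(e) > 1.0:
            return 0.0
        w *= 0.75 * (1.0 - e * e)
    return w

def bridge_weights(y_fwd, y_bwd):
    """Kernel weight for every (forward, backward) endpoint pair, after the
    normalizing transformation of the endpoints."""
    return [[epanechnikov([f - b for f, b in zip(yf, yb)]) for yb in y_bwd]
            for yf in y_fwd]
```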
$$
A_{\hat\theta_{II}^{(p)}}(R_{j,I_k} \mid D; \psi) :=
\frac{ \sum_{m,m'} \left( R_{j,I_k}^{(f)}(\tilde\omega_m) + R_{j,I_k}^{(b)}(\tilde\omega_{m'}) \right)
\psi(\eta_k(m, m'))\, \Lambda_k(m') }
{ \sum_{m,m'} \psi(\eta_k(m, m'))\, \Lambda_k(m') }, \tag{3.11}
$$
$$
A_{\hat\theta_{II}^{(p)}}(F_{j,I_k} \mid D; \psi) :=
\frac{ \sum_{m,m'} \left( F_{j,I_k}^{(f)}(\tilde\omega_m) + F_{j,I_k}^{(b)}(\tilde\omega_{m'}) \right)
\psi(\eta_k(m, m'))\, \Lambda_k(m') }
{ \sum_{m,m'} \psi(\eta_k(m, m'))\, \Lambda_k(m') },
$$
where Λ_k(m') denotes the weight associated with the backward path ω̃_{m'}.
We generate M_k forward and reverse paths in the interval I_k, but we do not directly
control the number of exact or approximate SRN-bridges that are created. The number
M_k is chosen such that either the number of SRN-bridges is of order O(M_k) or we
reach a computational budget M_b, which is 200 in our numerical experiments. In
Section 9.5.2, we indicate an algorithm that reduces the computational complexity
of computing those ψ-weighted averages from O(M_k²) to O(M_k).
Finally, the Monte Carlo EM algorithm for this particular problem generates a
stochastic sequence (θ̂_II^{(p)})_{p=1}^{+∞}, starting from the initial guess θ_II^{(0)} provided by phase
I, via
$$
\hat{c}_j^{(p+1)} = \frac{ \sum_{k=1}^{K} A_{\hat\theta_{II}^{(p)}}(R_{j,I_k} \mid D; \psi) }
{ \sum_{k=1}^{K} A_{\hat\theta_{II}^{(p)}}(F_{j,I_k} \mid D; \psi) }, \tag{3.12}
$$
where θ̂_II^{(p)} = (ĉ_1^{(p)}, . . . , ĉ_J^{(p)}). In Section 9.5.4, a stopping criterion based on techniques
[3] J. Ahrens and U. Dieter, "Computer methods for sampling from gamma, beta,
Poisson and binomial distributions," Computing, vol. 12, pp. 223–246, 1974.
[5] ——, "Multilevel hybrid Chernoff tau-leap," accepted for publication in BIT
Numerical Mathematics, 2015.
[6] D. Anderson and D. Higham, "Multilevel Monte Carlo for continuous Markov
chains, with applications in biochemical kinetics," SIAM Multiscale Model. Simul.,
vol. 10, no. 1, Mar. 2012.
[7] M. Giorgio, M. Guida, and G. Pulcini, “An age- and state-dependent Markov
model for degradation processes,” IIE Transactions, vol. 43, no. 9, pp. 621–632,
2011.
[9] C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in
Statistics), 2nd ed. Springer, 2005.
Chapter 4
Concluding Remarks
4.1 Summary
The simulation of SRNs started with Feller and Doob [8] and has been an active area
of research for the last sixty years; it has been especially active since the introduction
of the tau-leap method by Gillespie [9] in 2001 (see also [10]).
Simulation methods with rigorous error control for systems with hundreds or even
thousands of reaction channels and/or a high number of species are required by
chemical combustion, genomics, social networks and the design of hydro-crackers, just
to mention a few disciplines. Fast algorithms for statistical inference in high-dimensional
systems are also required.
Incorporating techniques from polynomial chaos, mean-field learning, tensorial
numerical analysis and sparse matrix computations seems attractive for addressing
those simulation and inference problems.
To the best of our knowledge, upscaled approximations of SRNs, like Chemical
Langevin diffusions or reaction-rate ODEs, have been proposed in hybrid simulation
schemes, but they have neither been rigorously studied from the numerical point of
view, nor has global error control been performed. We believe that our dual-weighted
residual expansion techniques can be applied to obtain accurate error estimates, and
then our global error control techniques can be applied to optimize the computational
work.
In our immediate research plans, we have the exploration of techniques for incor-
porating implicit tau-leap methods to our hybrid schemes presented in [1, 2, 3] as
well as methods for incorporating spatial dimensions.
Regarding the statistical inference problem, we plan to extend the FREM algorithm presented in [6] to the Multilevel Monte Carlo setting and to incorporate higher-order kernels to deal with high-dimensional problems. We are also planning to extend our indirect inference methodology presented in [5] to the multidimensional case.
REFERENCES
[1] A. Moraes, R. Tempone, and P. Vilanova, “Hybrid Chernoff tau-leap,” Multiscale Modeling and Simulation, vol. 12, no. 2, pp. 581–615, 2014.
[2] ——, “Multilevel hybrid Chernoff tau-leap,” accepted for publication in BIT Numerical Mathematics, 2015.
[4] D. Anderson and D. Higham, “Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics,” SIAM Multiscale Model. Simul., vol. 10, no. 1, Mar. 2012.
[8] P. Érdi and G. Lente, Stochastic Chemical Kinetics: Theory and (Mostly) Systems Biological Applications (Springer Series in Synergetics), 1st ed. Springer, 2014.
Part II
Included Papers
Chapter 5
Abstract
Markovian pure jump processes model a wide range of phenomena, including chem-
ical reactions at the molecular level, dynamics of wireless communication networks
and the spread of epidemic diseases in small populations. There exist algorithms
like Gillespie’s Stochastic Simulation Algorithm (SSA) or Anderson’s Modified Next
Reaction Method (MNRM) that simulate a single path with the exact distribution
of the process, but this can be time consuming when many reactions take place dur-
ing a short time interval. Gillespie’s approximated tau-leap method, on the other
hand, can be used to reduce computational time, but it may lead to non-physical
values due to a positive one-step exit probability, and it also introduces a time dis-
cretization error. Here, we present a novel hybrid algorithm for simulating individual
paths which adaptively switches between the SSA and the tau-leap method. The
switching strategy is based on a comparison of the expected inter-arrival time of the SSA and an adaptive time step derived from a Chernoff-type bound for the one-step exit probability. Because this bound is non-asymptotic, we do not need to make any
¹A. Moraes, R. Tempone and P. Vilanova, “Hybrid Chernoff Tau-Leap”, SIAM Multiscale Modeling and Simulation, Vol. 12, Issue 2, (2014).
distributional approximation for the tau-leap increments. This hybrid method allows
us (i) to control the global exit probability of any simulated path and (ii) to obtain
accurate and computable estimates of the expected value of any smooth observable of
the process with minimal computational work. We present numerical examples that
illustrate the performance of the proposed method.
5.1 Introduction
x → x + ν_j.

The probability that reaction j will occur during the small interval (t, t + dt) is then assumed to be a_j(x) dt + o(dt). The process X can be written in terms of independent unit-rate Poisson processes, Y_j, as

    X(t) = X(0) + Σ_{j=1}^{J} ν_j Y_j( ∫_0^t a_j(X(s)) ds ),   (5.3)
Remark 5.1.1. In chemical kinetics, the above setting can be used to describe well-stirred systems of chemical species, interacting through different chemical reactions, characterized by stoichiometric vectors, ν_j, and polynomial propensities, a_j, derived from the mass-action principle (see [18]). Such systems are assumed to be confined to a constant volume and to be in thermal, but not necessarily chemical, equilibrium at some constant temperature. Other popular applications can be found in population biology, epidemiology, and communication networks (see e.g., [19, 20]).
Example 5.1.2 (Simple decay model). Consider the reaction X → ∅, with rate constant c, in which one particle is consumed. In this case, the state vector X(t) is in Z_+, where X denotes the number of particles in the system. The stoichiometric vector for this reaction is ν = −1. The propensity function in this case could be, for example, a(X) = cX, where c > 0.
The classical approach to chemical kinetics deals with state vectors of non-negative real numbers representing the concentrations of species at time t, usually measured in moles per liter. In this setting, the concentrations are assumed to vary continuously in time, according to the mass-action principle, which says that each reaction in the system affects the rate of change of the species. More precisely, the effect on the instantaneous rate of change is proportional to the product of the concentrations of the reacting species. For the simple decay example, we have the reaction-rate ODE (or mean field): ẋ(t) = −c x(t) for t ∈ R_+ and x(0) = x_0 ∈ R_+. In general, let ν be the stoichiometric matrix with columns ν_j, and a(x) be the column vector of propensities. Then, we have

    ẋ(t) = ν a(x(t)),   t ∈ R_+,
    x(0) = x_0 ∈ R_+^d.   (5.4)
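For the simple decay example, system (5.4) is scalar and linear, so it can be integrated in closed form; the following lines record this elementary check of the notation:

```latex
% Simple decay: J = 1, \nu = -1, a(x) = c x, so (5.4) reads
\dot{x}(t) = -c\, x(t), \qquad x(0) = x_0 \in \mathbb{R}_+ ,
% whose solution is the exponential decay profile
x(t) = x_0\, e^{-c t}, \qquad t \in \mathbb{R}_+ .
```

Since the propensity is linear in the state, this mean-field trajectory coincides with the expected value E[X(t)] of the stochastic model.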
The SSA method simulates exact paths of X using equation (5.3). It requires the
sampling of two random variables per time step: one to find the time of the next
reaction and another to determine which is the reaction that is firing at that time.
In [2], Gillespie presented the original SSA, also known as the direct method.

Given a state X(t), the direct method is carried out by drawing two uniform random numbers, U_1, U_2 ∼ U(0, 1), which give the time to, and index of, the next reaction, i.e.,

    j = min{ k ∈ {1, ..., J} : Σ_{i=1}^{k} a_i(X(t)) / a_0(X(t)) > U_1 },    τ_min = (1/a_0(X(t))) ln(1/U_2),

where a_0(x) := Σ_{j=1}^{J} a_j(x). The new state is X(t + τ_min) = X(t) + ν_j, and by repeating the above procedure until the final time, T, a complete path of the process, X, can be simulated.
The drawback of this algorithm appears clearly as the sum of the intensities of all reactions, a_0(x), becomes large: since all the jump times have to be included in the time discretization, the corresponding computational work may become unaffordable. Indeed, the expected number of jumps in the interval (t, t + τ) is approximately a_0(X(t))τ + o(τ).
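As an illustration of the two sampling steps above, the following sketch implements the direct method for a generic reaction network. It is our own minimal transcription, not the paper's implementation; the names `ssa_path`, `propensities` and `nus` are ours:

```python
import math
import random

def ssa_path(x0, propensities, nus, T, seed=0):
    """Gillespie's direct method (SSA): simulate one exact path on [0, T].

    x0: initial state tuple; propensities: function x -> [a_1(x), ..., a_J(x)];
    nus: list of stoichiometric vectors nu_j. Returns a list of (time, state)."""
    rng = random.Random(seed)
    t, x = 0.0, tuple(x0)
    path = [(t, x)]
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 == 0.0:                       # no reaction can fire: stop
            break
        u1, u2 = rng.random(), rng.random()
        # smallest j such that the normalized cumulative propensity exceeds u1
        acc, j = 0.0, len(a) - 1
        for k, ak in enumerate(a):
            acc += ak / a0
            if acc > u1:
                j = k
                break
        # Exp(a0) inter-arrival time; 1 - u2 avoids log(0) since u2 is in [0, 1)
        tau = -math.log(1.0 - u2) / a0
        if t + tau > T:
            break
        t, x = t + tau, tuple(xi + n for xi, n in zip(x, nus[j]))
        path.append((t, x))
    return path
```

For the simple decay model of Example 5.1.2, each event removes exactly one particle, so a path started at X(0) = 20 visits 20, 19, ..., 0 in order.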
5.1.3 The Tau-leap Approximation
The tau-leap method advances the approximate process, X̄, from t to t + τ through

    X̄(t + τ) = X̄(t) + Σ_{j=1}^{J} ν_j Y_j( a_j(X̄(t)) τ ),

where {Y_j(λ_j)}_{j=1}^{J} are independent Poisson distributed random variables with parameters λ_j = a_j(X̄(t))τ, used to model the number of times that reaction j fires during the interval (t, t + τ). This is nothing other than a forward Euler discretization of the stochastic differential equation of the pure jump process (5.3), realized by the Poisson random measure with state-dependent intensity (see [5]).
In the limit, when ⌧ ! 0, the tau-leap method gives the same solution as the exact
methods, using the property that, for a constant propensity, the firing probability in
one reaction channel is independent of the other reaction channels. The total number
of firings in each channel is then a Poisson distributed stochastic variable depending
only on the initial population, X̄(t). The error thus comes from the variation of
a(X(s)) for s 2 (t, t + ⌧ ).
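A single tau-leap step can be sketched as follows. This is our own illustration (names are ours); the simple Knuth-style Poisson sampler is chosen only for self-containedness, whereas the paper's computational work model assumes the Ahrens-Dieter Gamma method [23]:

```python
import math
import random

def poisson(lam, rng):
    """Knuth-style Poisson sampler; adequate for small rates (illustration only)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def tau_leap_step(x, propensities, nus, tau, rng):
    """One explicit tau-leap step: freeze the propensities at the current state x
    and fire channel j a Poisson(a_j(x) * tau) number of times. The resulting
    state may leave Z^d_+, which is exactly the exit event controlled later by
    the Chernoff-type bound."""
    a = propensities(x)
    new = list(x)
    for nu_j, aj in zip(nus, a):
        k = poisson(aj * tau, rng)        # firings of channel j in (t, t + tau)
        for i, nu_ji in enumerate(nu_j):
            new[i] += nu_ji * k
    return tuple(new)
```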
The outline of this work is as follows. In Section 5.2, we derive and give an implementation of the Chernoff-type bound that guarantees that the one-step exit probability
in the tau-leap method is less than a predefined quantity. We also show that the
Gaussian pre-leap selection step is not accurate and should not be used as a reliable
bound. In Section 5.3, we motivate and give implementation details of the one-step
switching decision rule, which will be the key ingredient for generating hybrid paths.
We show how to choose between the SSA or the tau-leap method, on the basis of the
current state of the approximated process. Next, we show how to generate hybrid
paths and to obtain an estimate of the path exit error based on the probability that
one hybrid path exits the Z^d_+ lattice. This estimation of the global exit probability
depends on the expected number of tau-leap steps taken by the hybrid algorithm. It is
easy to prove that this number is finite. Hybrid paths can also be used for estimating
the expected number of steps that the SSA algorithm needs in order to reach the final
time. In Section 5.4, we decompose the total error into three components, the dis-
cretization error, the statistical error and the global exit error, which were studied in
the previous section. To control these errors, we give an algorithm capable of estimat-
ing the error components. We also compute the necessary ingredients for obtaining
the desired estimate, i.e., a time mesh, a bound for the one-step exit probability and
the total number of Monte Carlo hybrid paths to be simulated. These ingredients
are computed by optimizing the expected work of the hybrid method constrained to
the error requirements. In Section 5.5, we present some numerical experiments using
well-known examples taken from the literature. Finally, in Section 5.6, we provide
conclusions and suggest directions for future work.
5.2 Chernoff-type Bounds for the One-step Exit Probabilities
In this section, we derive a Chernoff-type bound that helps us to guarantee that the one-step exit probability in the tau-leap method is less than a predefined quantity, δ > 0. This is crucial for controlling the computational global error, E, which is defined below in Section 5.4. To motivate the main ideas, the bound is first derived for the single-reaction case and then generalized to several reactions. At the end of this section, we present an algorithm that efficiently computes the step size.
5.2.1 The Single-reaction Case
Let Q(λ) denote a Poisson random variable with rate λ, and let n be a positive integer. The Chernoff bound gives

    P( Q(λ) ≥ n ) ≤ exp( n(1 − log(n/λ)) − λ ).   (5.7)

Indeed, for every s > 0, by Markov's inequality,

    P( Q ≥ n ) = P( e^{sQ} ≥ e^{sn} ) ≤ E[e^{sQ}] / e^{sn},

and thus

    P( Q ≥ n ) ≤ exp( inf_{s>0} { −sn + λ(e^s − 1) } ).

The infimum is achieved at s* = log(n/λ), and its value is n(1 − log(n/λ)) − λ. From this simple calculation, we obtain the Chernoff bound (5.7).
Given a positive integer, n, representing the state of the system at a certain time, and δ ∈ (0, 1), we would like to obtain the largest value for λ such that P( Q(λ) ≥ n ) ≤ δ. From the Chernoff bound, we have

    exp( n(1 − log(n/λ)) − λ ) ≤ δ,

or equivalently

    log(δ)/n ≥ log(λ) − λ/n − log(n) + 1.   (5.8)

The Klar bound (5.6), valid for λ < n + 1, multiplies the Poisson probability mass at n by the factor

    ( 1 − λ/(n+1) )^{−1},

the mass itself being

    exp(−λ) λ^n / n!.

Taking logarithms in this last expression and approximating n! by Stirling's formula, (n/e)^n, we arrive exactly at the Chernoff bound (5.8). We can see in Figure 5.1 that the Klar bound (5.6) is sharp, except when λ gets close to the singularity at n + 1. The Chernoff bound (5.7) is not as sharp as Klar's bound but, as we will see in the next subsection, it has a generalization to the more practical many-reaction case. We observe that the Gaussian approximation in Figure 5.1 performs poorly for small values of δ and is not a bound in general.
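For the single-reaction case, the critical rate in (5.8) can be computed numerically: the left-hand side of the condition is increasing in λ on (0, n), so a simple bisection suffices. The sketch below is ours (the name `max_poisson_rate` is not from the paper):

```python
import math

def max_poisson_rate(n, delta, tol=1e-12):
    """Largest rate lam in (0, n) satisfying the Chernoff condition (5.8),
    log(lam) - lam/n - log(n) + 1 <= log(delta)/n.  The left-hand side is
    increasing in lam on (0, n) (its derivative is 1/lam - 1/n > 0), so the
    critical rate can be found by bisection."""
    target = math.log(delta) / n
    f = lambda lam: math.log(lam) - lam / n - math.log(n) + 1.0
    lo, hi = tol, float(n)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo
```

At the returned rate, the Chernoff bound is (numerically) tight, and the exact Poisson tail P( Q(λ) ≥ n ) lies below δ, as the bound guarantees.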
To the best of our knowledge, there is no simple expression for the cumulative dis-
tribution function of a linear combination of independent Poisson random variables.
For that reason, we propose a Cherno↵-type bound for estimating the maximum size
of the tau-leap step when many reactions are involved.
Consider the following pre-leap check problem: find the largest possible τ such
Figure 5.1: Let n = 10 and λ ∈ (2, 10). Here, we show the semi-logarithmic plot of P( Q(λ) ≥ n ), the Chernoff bound exp( n(1 − log(n/λ)) − λ ), the Klar bound and the Gaussian approximation.
that, with high probability, the next step of the tau-leap method will take a value in the lattice of non-negative integers, Z^d_+, i.e.,

    P( X̄(t) + Σ_{j=1}^{J} ν_j Y_j( a_j(X̄(t)) τ ) ∈ Z^d_+ | X̄(t) ) ≥ 1 − δ,   (5.9)

for some small δ > 0. Observe that this value of τ depends on X̄(t).
Condition (5.9) can be achieved by solving d auxiliary problems, one for each x-coordinate, i = 1, 2, ..., d: find the largest possible τ_i ≥ 0, such that

    P( X̄_i(t) + Σ_{j=1}^{J} ν_ji Y_j( a_j(X̄(t)) τ_i ) < 0 | X̄(t) ) ≤ δ_i,   (5.10)

where δ_i = δ/d and ν_ji is the i-th coordinate of the j-th reaction channel, ν_j. Inequality (5.9) is then fulfilled if we let τ := min{ τ_i : i = 1, 2, ..., d }.
In the following sections, we show how to find the largest time steps, τ_i.
Defining the function τ_i(s)

Consider the random variable Q_i(t, τ_i) representing the opposite of the increment of the process X̄_i(t),

    Q_i(t, τ_i) := Σ_{j=1}^{J} (−ν_ji) Y_j( a_j(X̄(t)) τ_i ).

For any s > 0, Markov's inequality gives

    P( Q_i(t, τ_i) > X̄_i(t) | X̄(t) ) = P( exp(s Q_i(t, τ_i)) > exp(s X̄_i(t)) | X̄(t) )
                                      ≤ E[ exp(s Q_i(t, τ_i)) ] / exp(s X̄_i(t)).   (5.11)
Observe that the independent Poisson random variables, Y_j( a_j(X̄(t)) τ_i ), have moment-generating functions

    M_j(u) = exp( τ_i a_j(X̄(t)) (e^u − 1) ),

and, therefore,

    E[ exp(s Q_i(t, τ_i)) ] = Π_{j=1}^{J} M_j(−s ν_ji) = exp( τ_i Σ_{j=1}^{J} a_j(X̄(t)) (e^{−s ν_ji} − 1) ).   (5.12)
By combining (5.11) and (5.12), we obtain the Chernoff bound for the multi-reaction case, namely

    P( Q_i(t, τ_i) > X̄_i(t) | X̄(t) ) ≤ inf_{s>0} exp( −s X̄_i(t) + τ_i Σ_{j=1}^{J} a_j(X̄(t)) (e^{−s ν_ji} − 1) ).   (5.13)
To avoid the computational problem of finding exactly the above infimum, and still guarantee that

    P( Q_i(t, τ_i) > X̄_i(t) | X̄(t) ) ≤ δ_i,

we choose s > 0 and define τ_i = τ_i(s) through

    −s X̄_i(t) + τ_i Σ_{j=1}^{J} a_j(X̄(t)) (e^{−s ν_ji} − 1) = log(δ_i),

i.e.,

    τ_i(s) = ( s X̄_i(t) + log(δ_i) ) / ( −a_0(X̄(t)) + Σ_{j=1}^{J} a_j(X̄(t)) e^{−s ν_ji} ),   (5.14)

where

    a_0(X̄(t)) := Σ_{j=1}^{J} a_j(X̄(t)).
Study of τ_i(s)

In this section, we study how much we can increase τ_i while satisfying condition (5.10). Obviously, it is satisfied for τ_i = 0+. By a continuity argument, we want to obtain τ_i* defined as the maximum τ_i such that every point of the interval [0, τ_i] satisfies (5.10). Note that τ_i* could be +∞.
We discuss how, depending on certain relations among the pairs {(a_j(X̄(t)), ν_ji)}_{j=1}^{J}, we can conclude that τ_i* is either a real number or +∞. First of all, if ν_ji ≥ 0 for all j, then τ_i* must be +∞, since no reaction is pointing to zero. From now on, we assume that, given the coordinate i, there is at least one reaction pointing to zero, i.e.,

    ∃ j ∈ {1, ..., J} : ν_ji < 0.   (5.15)

Consider the denominator of (5.14),

    D_i(s) := −a_0(X̄(t)) + Σ_{j=1}^{J} a_j(X̄(t)) e^{−s ν_ji},   (5.16)

which is convex, since it is a positive linear combination of the convex functions e^{−s ν_ji} plus the constant term −a_0(X̄(t)). We also notice that D_i(0) = 0 and D_i(+∞) = +∞ when (5.15) holds.
On the other hand, the numerator of (5.14),

    R_i(s) := s X̄_i(t) + log(δ_i),   (5.17)

is a straight line crossing the vertical axis at log(δ_i) < 0, and we can assume that its slope, X̄_i(t), is positive. Otherwise, the X̄(t) process is at the boundary of Z^d_+, and therefore no reaction is pointing outside the lattice, Z^d_+; we therefore set τ_i* = +∞.
Let us define s_i as the root of the numerator R_i(s), i.e., s_i := −log(δ_i)/X̄_i(t) > 0. Then,

    D_i(s_i) = −a_0(X̄(t)) + Σ_{j=1}^{J} a_j(X̄(t)) δ_i^{ν_ji/X̄_i(t)},   (5.18)

and

    D_i'(s_i) = −Σ_{j=1}^{J} a_j(X̄(t)) ν_ji δ_i^{ν_ji/X̄_i(t)}.   (5.19)
In order to determine whether τ_i* < +∞ or τ_i* = +∞, we have to analyze all possible cases regarding the pair (R_i(s), D_i(s)).
Indeed, note that

    D_i'(0) = −Σ_{j=1}^{J} a_j(X̄(t)) ν_ji,

and if D_i'(0) ≥ 0, which could be interpreted as a drift pointing to the boundary, then D_i(s) is monotonically increasing in [0, +∞). This situation is illustrated in Figure 5.2.2: in the left panel, we see the pair (R_i(s), D_i(s)); in the right panel, we see the quotient τ_i(s) = R_i(s)/D_i(s). The function τ_i achieves its maximum, τ_i*, at a unique point, s̃_i.
Figure [τ_i(s)]: Left: Numerator R_i(s) and denominator D_i(s). Right: Quotient τ_i(s) = R_i(s)/D_i(s). Both plots are for the case D_i'(0) ≥ 0.
If D_i'(0) < 0, which can be interpreted as a drift pointing to +∞, the value of τ_i* depends on X̄_i(t), i.e., on the size of the slope of R_i(s). Observe that D_i(s) is then negative in an interval (0, d_i), with D_i(d_i) = 0, and in general there is no closed form for d_i. Also, since D_i(s) and R_i(s) may have opposite signs for some s ≤ max(s_i, d_i), this allows for artificially negative values of τ_i, which should not be taken into account.
The value of τ_i* is finite or +∞ according to the sign of D_i(s_i). These three cases are shown in the left panel of Figure 5.2. When X̄_i(t) is large enough, i.e., when D_i(s_i) < 0, we can see in the right panel of Figure 5.2 that τ_i* = +∞. This is true because the limit of τ_i(s), as s → d_i^+, is +∞. Therefore, if X̄_i(t) is far from the boundary and the drift is pointing to +∞, we can take τ_i to be as large as we wish.
The two other cases are as follows: if D_i(s_i) > 0, it means that X̄_i(t) is, in a
Figure 5.2: In this case, Σ_{j=1}^{J} a_j(X̄(t)) ν_ji > 0. Left: relative positions of (R_i(s), D_i(s)), depending on the sign of D_i(s_i). Right: τ_i(s) = R_i(s)/D_i(s) in the case D_i(s_i) < 0.
certain sense, close to the boundary, and even if the drift is pointing to +∞, there exists an upper bound for τ_i. This is illustrated in the left part of Figure 5.3, where τ_i* is the maximum to the right of s_i. Finally, if D_i(s_i) = 0, then τ_i* can be obtained as the limit of τ_i(s) as s → d_i^+. By l'Hôpital's rule, we have that τ_i* = X̄_i(t)/D_i'(s_i).
Figure 5.3: The other two cases for τ_i(s) when Σ_{j=1}^{J} a_j(X̄(t)) ν_ji > 0. Left: D_i(s_i) > 0. Right: D_i(s_i) = 0.
We can summarize the previous discussion as follows: if ν_ji ≥ 0 for all j, then τ_i* = +∞; otherwise, we have the following three cases:
1. D_i(s_i) > 0. In this case, τ_i(s_i) = 0 and D_i(s) is positive and increasing for all s ≥ s_i. Therefore, τ_i(s) is equal to the ratio of two positive increasing functions. The numerator, R_i(s), is a linear function and the denominator, D_i(s), grows exponentially fast. Then, there exists an upper bound, τ_i*, and a unique number, s̃_i, which satisfies τ_i(s̃_i) = τ_i*. We develop an algorithm for approximating s̃_i, using the relation τ_i'(s̃_i) = 0.
2. D_i(s_i) < 0. In this case, τ_i* = +∞, since τ_i(s) → +∞ as s → d_i^+.
3. D_i(s_i) = 0. In this case, τ_i* = X̄_i(t)/D_i'(s_i), by l'Hôpital's rule.
Approximating s̃_i

In this section, we present a simple and fast algorithm for approximating s̃_i, which was defined in case (1) above. We proceed in two steps. In the first step, we find an initial guess, s*_{i,0}, and in the second one, we improve this guess and obtain s*_{i,1}. Therefore, τ_i* = τ_i(s̃_i) will be approximated by τ_i(s*_{i,1}).
From (5.14), the equation τ_i'(s) = 0 is equivalent to

    −a_0(X̄(t)) + Σ_{j=1}^{J} a_j(X̄(t)) exp(−s ν_ji) = (s − s_i) Σ_{j=1}^{J} a_j(X̄(t)) (−ν_ji) exp(−s ν_ji).   (5.20)
Let us define ŝ := s − s_i and b_ji(X̄(t)) := a_j(X̄(t)) δ_i^{ν_ji/X̄_i(t)}. As a consequence,

    exp(−s ν_ji) = δ_i^{ν_ji/X̄_i(t)} exp(−ŝ ν_ji),

and (5.20) becomes

    Σ_{j=1}^{J} b_ji(X̄(t)) exp(−ŝ ν_ji) = a_0(X̄(t)) + ŝ Σ_{j=1}^{J} b_ji(X̄(t)) (−ν_ji) exp(−ŝ ν_ji).   (5.21)
Introducing the functions φ_ji(y) := (1 + y ν_ji) exp(−y ν_ji), equation (5.21) is equivalent to G(ŝ) = 0, where

    G(y) := −a_0(X̄(t)) + Σ_{j=1}^{J} b_ji(X̄(t)) φ_ji(y).

The left graph in Figure 5.4 shows the shape of φ_ji depending on the sign of ν_ji. We deduce that G is a decreasing function such that G(0) = D_i(s_i) and G(+∞) = −∞.
Figure 5.4: Left: the function φ(s) for different values of ν. Right: the function G and its approximating parabola.
By neglecting the exponential term in φ_ji, we can obtain an initial guess for s̃_i, i.e.,

    s*_{i,0} = ( −a_0(X̄(t)) + Σ_{j=1}^{J} b_ji(X̄(t)) ) / ( Σ_{j=1}^{J} b_ji(X̄(t)) (−ν_ji) ) = D_i(s_i) / D_i'(s_i).

As we observed in case (1), the values of D_i(s_i) and D_i'(s_i) are positive, and our initial guess, s*_{i,0}, is a positive number.
In the right graph in Figure 5.4, we can see that the parabola obtained as the second-order approximation of G at s*_{i,0} is a good approximation of G close to its root, s̃_i. Therefore, we obtain s*_{i,1} as the largest root of the approximating parabola. By evaluating τ_i(s*_{i,1}), we obtain a sharp lower bound of sup_{s>0} τ_i(s).
An expression for s*_{i,1} in terms of G and its derivatives up to second order, evaluated at s*_{i,0}, is given by

    s*_{i,1} = s*_{i,0} + ( −G'(s*_{i,0}) + √( G'(s*_{i,0})² − 2 G''(s*_{i,0}) G(s*_{i,0}) ) ) / G''(s*_{i,0}).   (5.22)
An efficient implementation for computing τ_i(s*_{i,1}) ≈ τ_i* can be found in Algorithm 5 (see the definition of τ_i* in case (1) at the end of Section 5.2.2).

Algorithm 5 Computes the Chernoff tau-leap step size. Inputs: the current state of the approximate process, X̄, the propensity functions evaluated at X̄, (a_j(X̄))_{j=1}^{J}, and the stoichiometric matrix, (ν_ji). Output: τ. Notes: for a fixed coordinate, i, such that (5.15) is fulfilled (otherwise τ_i = +∞), this algorithm determines whether or not τ_i* is finite. When τ_i* is finite, this algorithm computes an approximation of τ_i(s*_{i,1}) based on (5.22).

Require: a_0 ← Σ_{j=1}^{J} a_j > 0
1: for i = 1 to d do
2:   if ∃ j : ν_ji < 0 and X̄_i(t) > 0 then
3:     x ← X̄_i(t)
4:     b_j ← a_j δ_i^{ν_ji/x}
5:     b̂ ← Σ_{j=1}^{J} b_j
6:     if b̂ − a_0 < 0 then
7:       τ_i ← +∞
8:     else
9:       if b̂ − a_0 > 0 then
10:        s ← (a_0 − b̂) / Σ_{j=1}^{J} b_j ν_ji
11:        ξ_j ← b_j exp(−s ν_ji)
12:        c_p ← Σ_{j=1}^{J} ξ_j ν_ji^p,   p = 0, 1, 2, 3
13:        α ← (c_3 s − c_2)/2
14:        β ← −c_2 s
15:        γ ← −a_0 + c_0 + c_1 s
16:        s ← s + ( −β + √(β² − 4αγ) ) / (2α)
17:        τ_i ← s x / ( −a_0 + Σ_{j=1}^{J} b_j exp(−s ν_ji) )
18:      else
19:        τ_i ← x / ( −Σ_{j=1}^{J} b_j ν_ji )
20:      end if
21:    end if
22:  else
23:    τ_i ← +∞
24:  end if
25: end for
26: return min{τ_1, ..., τ_d}
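A direct transcription of Algorithm 5 into code might look like the following sketch. It is our own illustration, not the authors' reference implementation: the names are ours, and the choice of the parabola root is handled defensively (we take the largest real root and fall back to the initial guess when the parabola has no positive root). For any admissible s, the returned τ_i(s) satisfies the exit-probability bound by construction, so this defensive choice affects only efficiency, not correctness:

```python
import math

def chernoff_tau(x_state, a, nu, delta):
    """Chernoff tau-leap step size (a sketch of Algorithm 5).

    x_state: current state (list of d counts); a: propensity values a_j(x) at that
    state; nu[j][i]: i-th coordinate of the j-th stoichiometric vector; delta:
    bound on the one-step exit probability. Returns tau = min_i tau_i."""
    d, J = len(x_state), len(a)
    a0 = sum(a)
    delta_i = delta / d
    taus = []
    for i in range(d):
        if any(nu[j][i] < 0 for j in range(J)) and x_state[i] > 0:
            x = float(x_state[i])
            b = [a[j] * delta_i ** (nu[j][i] / x) for j in range(J)]
            bhat = sum(b)
            if bhat < a0:                     # D_i(s_i) < 0: no finite bound needed
                taus.append(math.inf)
            elif bhat > a0:                   # case (1): maximize tau_i(s)
                s = (a0 - bhat) / sum(bj * nu[j][i] for j, bj in enumerate(b))
                xi = [bj * math.exp(-s * nu[j][i]) for j, bj in enumerate(b)]
                c = [sum(xi[j] * nu[j][i] ** p for j in range(J)) for p in range(4)]
                alpha = 0.5 * (c[3] * s - c[2])        # G''(s)/2
                beta = -c[2] * s                       # G'(s)
                gamma = -a0 + c[0] + c[1] * s          # G(s)
                disc = beta * beta - 4.0 * alpha * gamma
                if alpha != 0.0 and disc >= 0.0:
                    root = max((-beta + math.sqrt(disc)) / (2.0 * alpha),
                               (-beta - math.sqrt(disc)) / (2.0 * alpha))
                    if root > 0.0:             # keep the initial guess otherwise
                        s += root
                denom = -a0 + sum(bj * math.exp(-s * nu[j][i])
                                  for j, bj in enumerate(b))
                taus.append(s * x / denom)
            else:                              # D_i(s_i) = 0: l'Hopital limit
                taus.append(x / (-sum(bj * nu[j][i] for j, bj in enumerate(b))))
        else:
            taus.append(math.inf)              # no reaction points to the boundary
    return min(taus)
```

For the simple decay model, the exact one-step exit probability is a Poisson tail, which must lie below the prescribed δ.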
5.2.3 Computational Work of the Pre-Leap Methods: Chernoff vs. Gaussian

A popular alternative pre-leap selection is based on a Gaussian approximation: choose the largest step, τ_gau, such that, approximately,

    P( X̄_i(t + τ_gau) < 0 | X̄(t) ) ≤ δ_i,   i = 1, ..., d.   (5.23)

Observe that

    P( X̄_i(t) − Q_i(t, τ_gau) < 0 | X̄(t) ) = P( Q_i(t, τ_gau) > X̄_i(t) | X̄(t) ),

where Q_i(t, τ_gau) := Σ_{j=1}^{J} (−ν_ji) Y_j( a_j(X̄(t)) τ_gau ).
Now, we approximate Q_i(t, τ_gau) by

    Q̂_i(t, τ_gau) := E[ Q_i(t, τ_gau) ] + √( Var[ Q_i(t, τ_gau) ] ) N,

where N is a standard normal random variable. Then,

    P( Q̂_i(t, τ_gau) ≥ X̄_i(t) | X̄(t) ) = Φ( −( X̄_i(t) + Σ_{j=1}^{J} ν_ji a_j(X̄(t)) τ_gau ) / √( Σ_{j=1}^{J} ν_ji² a_j(X̄(t)) τ_gau ) ),

where Φ is the cumulative distribution function of the standard normal distribution. Finally, let z_{δ_i} satisfy Φ(z_{δ_i}) = 1 − δ_i. Then, the τ_gau that approximately solves (5.23) is obtained from

    X̄_i(t) + Σ_{j=1}^{J} ν_ji a_j(X̄(t)) τ_gau = z_{δ_i} √( Σ_{j=1}^{J} ν_ji² a_j(X̄(t)) τ_gau ).
Algorithm 6 efficiently computes the step size, τ_gau, using the Gaussian approximation.

Algorithm 6 Computes the tau-leap step size using a Gaussian approximation. Inputs: the current state of the approximate process, X̄, the propensity functions evaluated at X̄, (a_j(X̄))_{j=1}^{J}, and the stoichiometric matrix, (ν_ji). Output: τ. Notes: for a fixed coordinate, i, this algorithm determines whether or not τ_i* is finite. When τ_i* is finite, this algorithm computes its value.

Require: Σ_{j=1}^{J} a_j > 0
1: for i = 1 to d do
2:   x ← X̄_i(t)
3:   c_p ← Σ_{j=1}^{J} a_j ν_ji^p,   p = 1, 2
4:   ρ ← z_{δ_i}²
5:   α ← ρ²c_2² − 4ρc_1c_2x
6:   if c_2 = 0 or (c_1 > 0 and α < 0) then
7:     τ_i ← +∞
8:   else
9:     if c_2 ≠ 0 and (c_1 < 0 or (c_1 > 0 and α ≥ 0)) then
10:      β ← ρc_2 − 2c_1x
11:      τ_i ← ( β − √α ) / (2c_1²)
12:    else
13:      τ_i ← x² / (ρc_2)
14:    end if
15:  end if
16: end for
17: return min{τ_1, ..., τ_d}
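The Gaussian step size amounts to solving a quadratic equation per coordinate. The sketch below is our transcription of Algorithm 6 (naming is ours); the inverse normal CDF is computed by bisection over math.erf so the example stays self-contained:

```python
import math

def inv_norm_cdf(p, tol=1e-12):
    """Inverse standard normal CDF by bisection on Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gaussian_tau(x_state, a, nu, delta):
    """Gaussian pre-leap step size (a sketch of Algorithm 6)."""
    d, J = len(x_state), len(a)
    rho = inv_norm_cdf(1.0 - delta / d) ** 2          # rho = z_{delta_i}^2
    taus = []
    for i in range(d):
        x = float(x_state[i])
        c1 = sum(a[j] * nu[j][i] for j in range(J))
        c2 = sum(a[j] * nu[j][i] ** 2 for j in range(J))
        alpha = rho * rho * c2 * c2 - 4.0 * rho * c1 * c2 * x
        if c2 == 0.0 or (c1 > 0.0 and alpha < 0.0):
            taus.append(math.inf)                     # no finite step constraint
        elif c1 != 0.0:
            beta = rho * c2 - 2.0 * c1 * x            # quadratic in tau after squaring
            taus.append((beta - math.sqrt(alpha)) / (2.0 * c1 * c1))
        else:
            taus.append(x * x / (rho * c2))           # degenerate linear case
    return min(taus)
```

By construction, the returned step satisfies the defining equation X̄_i + c_1 τ = z_{δ_i} √(c_2 τ) up to floating-point precision.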
Figure 5.5: The Chernoff bound vs. the Gaussian approximation in the simple decay model example, with initial condition X_0 = 100 (see Section 5.5). Left: The empirical one-step exit probability bound with asymptotic confidence intervals (95%) versus a reference line with unit slope (solid line) for the Chernoff tau-leap. Missing confidence intervals mean that the values are zero or negative. Right: The Gaussian approximation case. We can observe that the Chernoff bound holds for any δ, with a confidence level of 95%, which is not the case for the Gaussian approximation.
5.3 The One-step Switching Rule and Hybrid Trajectories
In this section, we first present a one-step switching rule that, given the current state
of the approximate process, X̄(t), adaptively determines whether to use an exact
or an approximated method for the next step. Then, we present an algorithm for
simulating a whole hybrid path. This algorithm consists of a certain number of exact
and approximate steps. Next, we estimate the probability that one hybrid path exits
the lattice, Z^d_+, which is an event that depends on the expected number of tau-leap
steps, as we will see. Finally, we show how to estimate, based only on hybrid paths,
the expected number of steps of a pure SSA path.
Here, we provide a justification for the one-step switching rule algorithm, as described
in Algorithm 7.
Let x = X̄(t) be the current state of the approximate process, X̄. Therefore, the expected time step of the SSA algorithm is given by 1/a_0(x). Let τ_Ch = τ_Ch(x, δ) be the Chernoff tau-leap step, obtained using Algorithm 5. To move one step forward using the SSA method, we should compute at least a_0(x) and sample two uniform random variables. On the other hand, to move one step forward using the Chernoff tau-leap method, we not only have to compute τ_Ch (discussed at the end of Section 5.2), but we also have to generate J Poisson random variables, where J is the number of reaction channels. It is critical to observe that the computational work of generating J Poisson random variables is much larger than the computational work of generating only two uniform random variables. This computational work could be measured, for example, as the average execution time of the operations involved in it.
We now describe K_1 and K_2. In order to avoid the overhead caused by unnecessary computations of τ_Ch, we first estimate the computational work of moving forward from the current time, t, to the next grid point, T_0, by using the SSA method. If this work is less than the work of computing τ_Ch, we take an exact step. This motivates us to define K_1 as the ratio between the work of computing τ_Ch and the work of computing a_0(x) plus sampling two uniform random variables. Otherwise, we compute τ_Ch and decide whether to take an SSA step or a tau-leap one, according to the comparison between τ_Ch and K_2/a_0(x). Here, K_2 = K_2(x, δ) is defined as the work of taking a Chernoff tau-leap step given the current state of the process, divided by the work of taking an SSA step plus the work of computing τ_Ch. As we mentioned, associated with each type of step there is a computational work. In the first case, when K_1/a_0(x) > T_0 − t, the work is C_1, and includes the computation of 1/a_0(x) and the generation of two uniform random variates. In the same way, when K_1/a_0(x) < T_0 − t and K_2/a_0(x) > τ_Ch, the work is C_2, and involves the work contained in C_1 plus that of computing τ_Ch(x, δ), which is denoted by C_3. On the other hand, when a Chernoff tau-leap step is taken, we have not only the constant work, C_3, but also a variable work, which is the work of generating the Poisson random variates. The latter is a function of the propensities of all the reaction channels, namely, a(x)τ_Ch(x, δ). We model the computational work of generating one Poisson random variate according to [23], and this work is denoted by C_P(·). In the Gamma simulation method developed by Ahrens and Dieter in [23], which is used by MATLAB, the work grows like b_1 + b_2 ln(λ), where λ > 15 is the rate of the Poisson random variable. For λ ≤ 15, the growth is linear. In practice, it is possible to estimate b_1 and b_2 using a Monte Carlo method with a least squares fit, as shown in Figure 5.6.
Summarizing,

    K_1 := C_3 / C_1,   and   K_2(X̄(t), δ) := ( C_3 + Σ_{j=1}^{J} C_P( a_j(X̄(t)) τ_Ch(X̄(t), δ) ) ) / ( C_1 + C_3 ).

Observe that K_2(x, δ) → (C_3 + J b_1)/(C_1 + C_3) =: C̃ > 0, as δ → 0.
Here, we estimate the coefficients (offline precomputed, machine-dependent quantities) C_1, C_2, C_3, b_1, and b_2 by computing average execution times of the corresponding operations.
Figure 5.6: Left: The computational work (runtime) model for generating a Poisson random variate, using the Gamma method by Ahrens and Dieter [23]. Right: Linear growth detail, for λ ∈ [0, 15].
Algorithm 7 The one-step switching rule. Inputs: the current time, t, the current state of the approximate process, X̄(t), the propensity functions, (a_j(X̄(t)))_{j=1}^{J}, and the next grid point, T_0. Outputs: method and τ. Notes: based on E[τ_SSA(X̄(t))] = 1/a_0(X̄(t)) and τ_Ch(X̄(t), δ), this algorithm adaptively selects which method to use: SSA or TL. We denote by τ_SSA (τ_Ch) the step size when the decision is to use the SSA (tau-leap) method.

Require: a_0 ← Σ_{j=1}^{J} a_j > 0
1: if K_1/a_0 < T_0 − t then
2:   τ_Ch ← Algorithm 5
3:   if τ_Ch < K_2(X̄(t), δ)/a_0 then
4:     return (SSA, τ_SSA)
5:   else
6:     return (TL, τ_Ch)
7:   end if
8: else
9:   return (SSA, τ_SSA)
10: end if
We now briefly describe Algorithm 7. The first decision is made through the comparison between the expected SSA step size and the remaining time until the next grid point, T_0. To interpret this rule, we first assume that T_0 − t tends to zero. Then, the selected method tends to be SSA. This decision rule favors SSA over tau-leap and trivially guarantees the Chernoff bound. In the case of problems where the SSA method is more convenient, the advantage is obvious: it is not necessary to superfluously compute the tau-leap step size. On the other hand, this choice has “reasonable” computational work in terms of choosing SSA over tau-leap, since there is little time left until T_0. Now assume that K_1/a_0 tends to infinity; that is, a_0 tends to zero. Then, the reasonable choice is SSA, because the Chernoff tau-leap step size tends to zero in this case. It should be noted that this first decision rule incurs no extra computational work, because a_0 must be computed anyway. If K_1/a_0 < T_0 − t holds, then the tau-leap size is computed and the second decision is made (line 3). In this case, first assume that τ_Ch tends to zero. Then, the selected method tends to be SSA, which is a natural choice. If, on the contrary, τ_Ch tends to infinity, the chosen method tends to be the tau-leap, which again is a natural choice. Now, assume that K_2/a_0 tends to infinity. Then, a reasonable choice is SSA, because the step size is large and the bound is guaranteed. If K_2/a_0 tends to zero, the reasonable choice is tau-leap. A summary of the one-step switching rule decisions is given in Table 5.1.
A summary of the one-step switching rule decisions is given in Table 5.1.
aa
aa
tends to
If aa 1 0
aa
Decision 1 T0 t go to Decision 2 SSA
K1 /a0 SSA TL
Decision 2 ⌧Ch TL SSA
K2 /a0 SSA TL
Table 5.1: One-step switching rule summary. Decision 1 is made at line 1 of algorithm
7, whereas decision 2 is made at line 3.
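The control flow of the rule is short enough to sketch directly. This is our own illustration: K_1 and K_2 are passed as plain numbers here, although K_2 is state- and δ-dependent in the text, and the Chernoff step is passed as a zero-argument callable so that its cost is only paid when the rule actually needs it:

```python
def one_step_switching_rule(t, T0, a0, tau_chernoff, K1, K2):
    """Sketch of the one-step switching rule (Algorithm 7).

    a0: total propensity at the current state; tau_chernoff: zero-argument
    callable computing the Chernoff step size; K1, K2: work-ratio constants."""
    if K1 / a0 >= T0 - t:        # cheaper to reach T0 by SSA than to compute tau_Ch
        return ("SSA", None)
    tau_ch = tau_chernoff()      # only now pay the cost of computing tau_Ch
    if tau_ch < K2 / a0:         # Chernoff step too small to amortize its extra cost
        return ("SSA", None)
    return ("TL", tau_ch)
```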
Remark 5.3.1. In Figure 5.7, we illustrate the result of the one-step switching rule in the Gene Transcription and Translation model (see Section 5.5). As δ (the parameter that controls the one-step exit probability) decreases, the SSA region, in the state space of the problem, increases. We observe that, for δ = 10⁻², almost all the state space is a Chernoff tau-leap region. For smaller δ, we observe that, if the number of proteins (y-axis) is zero, and the number of mRNAs (x-axis) is large enough, the states belong to the tau-leap region, because the propensity of the reactions pointing outside the lattice is weaker than the propensity of the reactions pointing inside the lattice. When the number of proteins increases, there is a narrow region in which the propensity of the reactions pointing out dominates, and consequently, the switching rule chooses the SSA method. After that, the Chernoff tau-leap is preferred. The situation is almost symmetric about the axis x = y.
Figure 5.7: Regions of the one-step switching rule in the gene transcription and translation model (see Section 5.5). The blue and red dots show the Chernoff tau-leap and the SSA regions, respectively. From left to right, δ = 10⁻², 10⁻⁴, 10⁻⁶, respectively.
In this subsection, we present a novel algorithm that adaptively switches between the approximate (Chernoff tau-leap) and the exact (SSA) method to generate a whole hybrid path. Algorithm 8 presents this idea.
On the one hand, a path generated by an exact method never exits the lattice, Z^d_+, although the computational work could be unaffordable due to many small inter-arrival times, typically occurring when the process is “far” from the boundary. On the other hand, a tau-leap path, which may be cheaper than an exact one, could leave the lattice at any step. It depends on the size of the next time step and the current state of the approximate process, X̄(t). This one-step exit probability could be large, especially when the approximate process is “close” to the boundary. In Section 5.2, we show how to control this one-step exit probability adaptively, by adjusting the tau-leap step size. As we previously mentioned, a hybrid path consists of a certain number of exact and approximate steps. A hybrid path could therefore leave the lattice. In Section 5.3.3, we show how to estimate and control the probability of this event.
Given a problem, Algorithm 8 returns the last system state, X̄(t_K), and its respective time, t_K, such that the process belongs to the lattice. At each time, t_k, Algorithm 7 chooses the method to use (exact or approximate) for taking the (k+1)-th step, and its size.
Once we introduce the hybrid approximate process, $\bar X$, one issue is to estimate the probability that one hybrid path exits the lattice, $\mathbb{Z}^d_+$. Let $\bar\Omega$ be the sample space of all hybrid paths generated by Algorithm 8. The event $A = \{\bar\omega \in \bar\Omega : t_K = T\}$ collects the hybrid paths that reach the final time without leaving the lattice, and we have the decomposition
$$\bar\Omega = A^c \cup A = \{\bar\omega \in \bar\Omega : t_K < T\} \cup \{\bar\omega \in \bar\Omega : t_K = T\}.$$
Algorithm 8 The hybrid tau-leap algorithm. Inputs: the initial state, $X(0)$, the propensity functions, $(a_j)_{j=1}^J$, the stoichiometric vectors, $\nu = (\nu_j)_{j=1}^J$, and the final time, $T$. Outputs: a sequence of states, $(\bar X(t_k))_{k=0}^K \subset \mathbb{Z}^d_+$, such that $t_K \le T$. If $t_K < T$, then the path exited the $\mathbb{Z}^d_+$ lattice before the final time $T$. It also returns the number of times, $N_{TL}$, the tau-leap method was successfully applied (i.e., $\bar X(t_k) \in \mathbb{Z}^d_+$ and the tau-leap step yields $\bar X(t_{k+1}) \in \mathbb{Z}^d_+$), the number of SSA steps such that $K_1/a_0(\bar X(t)) > T_0 - t$ is true, $N_{SSA,K_1}$, and the number of SSA steps such that $K_1/a_0(\bar X(t)) > T_0 - t$ is false and $K_2(\bar X(t),\delta)/a_0(\bar X(t)) > \tau_{Ch}$ is true, $N_{SSA,K_2}$ (see Algorithm 7). Notes: given the current state, nextSSA computes the next state using the SSA method. Here, $t_i$ denotes the current time at the $i$-th step.
1: $i \leftarrow 0$, $t_i \leftarrow 0$, $\bar X(t_i) \leftarrow X(0)$, $\bar Z \leftarrow X(0)$
2: while $t_i < T$ do
3:   $T_0 \leftarrow$ next grid point greater than $t_i$
4:   $(m, \tau) \leftarrow$ Algorithm 7 with $(t_i, \bar Z, (a_j(\bar Z))_{j=1}^J, T_0)$
5:   if $m = $ SSA then
6:     $N_{SSA} \leftarrow N_{SSA} + 1$
7:     if $t_i + \tau < T$ then
8:       $\bar Z \leftarrow \mathrm{nextSSA}(\bar Z)$
9:     end if
10:    $t_{i+1} \leftarrow \min\{T,\, t_i + \tau\}$
11:  else
12:    $\tau \leftarrow \min\{\tau,\, T - t_i\}$
13:    $\bar Z \leftarrow \bar Z + \mathcal{P}(a(\bar Z)\tau)\,\nu$
14:    if $\bar Z \in \mathbb{Z}^d_+$ then
15:      $N_{TL} \leftarrow N_{TL} + 1$
16:      $t_{i+1} \leftarrow t_i + \tau$
17:    else
18:      return $((\bar X(t_k))_{k=0}^i,\, N_{TL},\, N_{SSA})$
19:    end if
20:  end if
21:  $i \leftarrow i + 1$
22:  $\bar X(t_i) \leftarrow \bar Z$
23: end while
24: return $((\bar X(t_k))_{k=0}^i,\, N_{TL},\, N_{SSA})$
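The control flow of Algorithm 8 can be sketched in Python for a one-species system. The Chernoff step-size selection (Algorithm 5) and the full switching test (Algorithm 7) are not reproduced here; `tau_selector` and the constant `k1` are simplified stand-ins for them, so this illustrates only the bookkeeping of a hybrid path, not the actual step-size control:

```python
import numpy as np

def hybrid_path(x0, nu, a, T, tau_selector, k1=10.0, seed=0):
    """Sketch of the switching loop of Algorithm 8 for a single species.
    `tau_selector` and `k1` are simplified stand-ins for the Chernoff
    step-size rule (Algorithm 5) and the switching test (Algorithm 7)."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    n_tl = n_ssa = 0
    exited = False
    while t < T:
        a0 = a(x)
        if a0 == 0.0:            # absorbing state: nothing more can fire
            t = T
            break
        tau_ch = tau_selector(x)
        if k1 / a0 > tau_ch:     # a few exact steps already cover the leap
            n_ssa += 1
            dt = rng.exponential(1.0 / a0)
            if t + dt < T:
                x += nu          # an exact step never leaves Z_+
            t = min(T, t + dt)
        else:                    # take a tau-leap step
            tau = min(tau_ch, T - t)
            x_new = x + nu * rng.poisson(a0 * tau)
            if x_new < 0:        # one-step exit from the lattice
                exited = True
                break
            n_tl += 1
            x, t = x_new, t + tau
    return x, t, n_tl, n_ssa, exited

# Decay model: nu = -1, a(x) = c*x with c = 1; generous stand-in leap of 0.2.
x_T, t_end, n_tl, n_ssa, exited = hybrid_path(
    100, -1, lambda x: float(x), 2.0, lambda x: 0.2, seed=1)
```

With these (hypothetical) parameters, the path starts with tau-leap steps and switches to SSA once `k1 / a0(x)` exceeds the proposed leap, mirroring the behavior described above.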
Let $t_k = \sum_{k'=1}^{k} \tau_{k'}$, where each $\tau_{k'}$ is obtained using either the SSA or the tau-leap method. Then,
$$\{\bar\omega \in A\} = \{\exists k \in \mathbb{N} : t_k = T\} = \bigcup_{n=0}^{+\infty}\left(\{\exists k \in \mathbb{N} : t_k = T\} \cap \{N_{TL} = n\}\right).$$
Then, we can write
$$P(A) = \sum_{n=0}^{+\infty} P\left(\{\exists k \in \mathbb{N} : t_k = T,\; N_{TL} = n\}\right) = \sum_{n=0}^{+\infty} P\left\{\exists k \in \mathbb{N} : t_k = T \mid N_{TL} = n\right\} P(N_{TL} = n).$$
At each step, the probability of exiting the lattice is less than $\delta$ when the step size is computed using the Chernoff method (Algorithm 5), and it is equal to zero if the SSA is adopted. At this stage, it should be pointed out that if we use the Gaussian approximation (Algorithm 6), it will not be possible to guarantee an upper bound for the probability of the event $A$. Observe that
$$P\left\{\exists k \in \mathbb{N} : t_k = T \mid N_{TL} = n\right\} \ge (1-\delta)^n;$$
that is, if the path reached time $T$ and $N_{TL} = n$, then the Chernoff algorithm was successfully applied $n$ times. By definition,
$$P(A) = \sum_{n=0}^{+\infty} P\left\{\exists k \in \mathbb{N} : t_k = T \mid N_{TL} = n\right\} P(N_{TL} = n) \ge \mathbb{E}\left[(1-\delta)^{N_{TL}}\right].$$
Moreover, for small values of $\delta$, using a second-order Taylor expansion of the function $(1-\delta)^{N_{TL}}$ and taking expectations, we obtain
$$\mathbb{E}\left[(1-\delta)^{N_{TL}}\right] = 1 - \delta\,\mathbb{E}[N_{TL}] + \frac{\delta^2}{2}\left(\mathbb{E}\left[N_{TL}^2\right] - \mathbb{E}[N_{TL}]\right) + o(\delta^2),$$
and therefore
$$P(A^c) \le \delta\,\mathbb{E}[N_{TL}] - \frac{\delta^2}{2}\left(\mathbb{E}\left[N_{TL}^2\right] - \mathbb{E}[N_{TL}]\right) + o(\delta^2).$$
Since the expected inter-arrival time of the SSA satisfies $\mathbb{E}[\Delta t \mid X(s)] = 1/a_0(X(s))$, the expected number of exact steps in $[0,T]$ can be computed from
$$\int_0^T a_0(X(s))\,ds = \int_0^T \frac{1}{\mathbb{E}[\Delta t \mid X(s)]}\,ds.$$
This allows us, for example, to approximate $\mathrm{Cost}_{SSA}(TOL)$, i.e., the computational work that the SSA method requires to estimate $\mathbb{E}[g(X(T))]$ for a given tolerance, $TOL$. This remark is used below in Algorithm 9.
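For the radioactive decay model used as an example below ($a(X) = cX$, $\nu = -1$), this remark can be checked directly: with $\mathbb{E}[X(s)] = x_0 e^{-cs}$, the integral above evaluates to $x_0(1 - e^{-cT})$, the expected number of SSA steps up to time $T$. A quick Monte Carlo check (our own toy script, not part of the thesis):

```python
import math, random

def ssa_decay_steps(x0, c, T, rng):
    """Count the SSA steps of the pure-death model X -> 0 up to time T."""
    t, x, steps = 0.0, x0, 0
    while x > 0:
        dt = rng.expovariate(c * x)   # exact inter-arrival time, rate a0 = c*x
        if t + dt > T:
            break
        t += dt
        x -= 1
        steps += 1
    return steps

rng = random.Random(0)
x0, c, T, M = 100, 1.0, 2.0, 2000
mc_steps = sum(ssa_decay_steps(x0, c, T, rng) for _ in range(M)) / M
exact = x0 * (1.0 - math.exp(-c * T))   # E[ int_0^T a_0(X(s)) ds ]
# mc_steps agrees with exact up to Monte Carlo noise
```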
5.4 Error Decomposition, Estimation and Control
In this section, we define the computational global error, $\mathcal{E}$, and show how it can be naturally decomposed into three components: the discretization error, $\mathcal{E}_I$, and the exit error, $\mathcal{E}_E$, both coming from the tau-leap part of the hybrid method, and the Monte Carlo statistical error, $\mathcal{E}_S$. Next, we show how to model and control the global error, $\mathcal{E}$, giving upper bounds for each of the three components. Finally, given a prescribed tolerance, $TOL$, we present a procedure for obtaining the parameters needed for estimating $\mathbb{E}[g(X(T))]$ by sampling hybrid paths. These parameters are the time mesh, $(t_k)_{k=0}^K(TOL)$, the one-step exit probability bound, $\delta(TOL)$, and the number of hybrid paths, $M(TOL)$.
As we already mentioned, the main goal of this work is to estimate accurately and efficiently the expected value $\mathbb{E}[g(X(T))]$, where $X : [0,T] \to \mathbb{Z}^d_+$ is a Markov pure jump process and $g : \mathbb{R}^d \to \mathbb{R}$ is a smooth observable of the process at the final time, $T$. We propose the following estimator:
$$\frac{1}{M}\sum_{m=1}^{M} g(\bar X(T))\,\mathbf{1}_A(\bar\omega_m), \qquad (5.24)$$
where $\bar X : [0,T] \to \mathbb{Z}^d$ is the hybrid approximate process introduced in Section 5.3.2, and $\bar\omega \in \bar\Omega$. The set $A \subset \bar\Omega$ was defined in Section 5.3.3. We recall that $\mathbf{1}_A(\bar\omega_m) = 1$ if and only if the $m$-th hybrid path did not exit $\mathbb{Z}^d_+$.
We define the computational global error, $\mathcal{E}$, as
$$\mathcal{E} := \mathbb{E}[g(X(T))] - \frac{1}{M}\sum_{m=1}^{M} g(\bar X(T))\,\mathbf{1}_A(\bar\omega_m). \qquad (5.25)$$
We can split $\mathcal{E}$ into three parts:
$$\mathbb{E}[g(X(T))] - \frac{1}{M}\sum_{m=1}^{M} g(\bar X(T))\mathbf{1}_A(\bar\omega_m) = \underbrace{\mathbb{E}\left[\left(g(X(T)) - g(\bar X(T))\right)\mathbf{1}_A\right]}_{=:\mathcal{E}_I} + \underbrace{\mathbb{E}\left[g(X(T))\mathbf{1}_{A^c}\right]}_{=:\mathcal{E}_E} + \underbrace{\mathbb{E}\left[g(\bar X(T))\mathbf{1}_A\right] - \frac{1}{M}\sum_{m=1}^{M} g(\bar X(T))\mathbf{1}_A(\bar\omega_m)}_{=:\mathcal{E}_S}.$$
Here, $\mathcal{E}_I$ and $\mathcal{E}_S$ are the discretization and Monte Carlo statistical errors, respectively, and they are associated with the hybrid paths, $\bar X$, on $A$. $\mathcal{E}_E$ is the global exit error. We observe that the error term, $\mathcal{E}_E$, is defined as the expected value of $g(X(T))\mathbf{1}_{A^c}$, which is a random variable defined on $\Omega \times \bar\Omega$. More specifically, we set $\mathcal{E}_E$ such that
$$|\mathcal{E}_E| = \min_{P \in \mathcal{P}} |\mathcal{E}_E(P)|,$$
where $\mathcal{P}$ is the set of all probability measures on $\Omega \times \bar\Omega$. By choosing $P \in \mathcal{P}$ as the product probability measure, we have that $g(X(T))$ and $\mathbf{1}_{A^c}$ are independent random variables. As a consequence, $\mathcal{E}_E = \mathbb{E}[g(X(T))]\,P(A^c)$.

An approximate upper bound, $B$, for $|\mathbb{E}[g(X(T))]|$ could be obtained, for instance, as the 95% quantile of a bootstrap sample of $|\mathcal{A}(g(X(T)); \cdot)|$. As we showed in Section 5.3.3, $P(A^c)$ can be approximated by $\delta\,\mathbb{E}[N_{TL}]$. Therefore, $\delta\,B\,\mathcal{A}(N_{TL}; \cdot)$ is an approximate upper bound for $|\mathcal{E}_E|$, where $\mathcal{A}(N_{TL}; \cdot)$ is the sample-average estimator of $\mathbb{E}[N_{TL}]$.
The discretization error, $\mathcal{E}_I = \mathbb{E}\left[\left(g(X(T)) - g(\bar X(T))\right)\mathbf{1}_A\right]$, is actually the weak error associated with the hybrid paths in $A$. An efficient procedure for accurately estimating this quantity in the context of the tau-leap method is described in [17]. This procedure computes $\mathcal{E}_I(\bar\omega)$ for every simulated hybrid path, $(\bar X(t_k,\bar\omega))_{k=0}^K$, as a dual-weighted residual sum. The dual weights are computed backward in time, starting from
$$\varphi_K = \nabla g(\bar X_K)$$
and satisfying $\varphi_k = \left(I_d + \tau_k\, J_a^T(\bar X_k)\,\nu\right)\varphi_{k+1}$, where $\nabla$ is the gradient operator and $J_a(\bar X_k) = [\partial_i a_j(\bar X_k)]_{j,i}$ is the Jacobian matrix of the propensity functions, $a_j$, for $j = 1,\ldots,J$ and $i = 1,\ldots,d$. Then, we have
$$\mathcal{E}_I(\bar\omega) = \sum_{k=1}^{K} \frac{\tau_k}{2}\,\mathbf{1}_{TL}(k)\; \varphi_k \cdot \left(\sum_{j=1}^{J} \Delta a_{j,k}(\bar\omega)\,\nu_j\right).$$
Here, $\bar X_k \equiv \bar X(t_k)$, $\tau_k = t_{k+1} - t_k$, $\Delta a_{j,k} = a_j(\bar X_{k+1}) - a_j(\bar X_k)$, $\mathbf{1}_{TL}(k) = 1$ if and only if, at time $t_k$, the tau-leap method was used, and $I_d$ is the $d \times d$ identity matrix.
We model the Monte Carlo statistical error, $\mathcal{E}_S$, as a Gaussian random variable that has zero mean and variance $\mathrm{Var}[g(X(T))]/M$, which can be controlled by obtaining a rough estimate of $\mathrm{Var}[g(X(T))]$. The sample variance is denoted by $S^2(Y;M) := \mathcal{A}(Y^2;M) - \mathcal{A}(Y;M)^2$. Therefore, $C_A\sqrt{S^2(g(X(T)); \cdot)/M}$ is used as an estimate of $\mathcal{E}_S$, where the constant $C_A$ sets the desired confidence level (in our numerical examples, $C_A = 1.96$, corresponding to 95%).
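The sample-variance estimator and the resulting statistical error bound are straightforward to compute; the sketch below uses synthetic Gaussian data as a stand-in for samples of $g(X(T))$:

```python
import math, random

def sample_stats(ys):
    """A(Y; M) and the sample variance S^2(Y; M) = A(Y^2; M) - A(Y; M)^2."""
    m = len(ys)
    mean = sum(ys) / m
    return mean, sum(y * y for y in ys) / m - mean * mean

rng = random.Random(42)
C_A = 1.96                                          # 95% confidence level
ys = [rng.gauss(5.0, 2.0) for _ in range(10000)]    # stand-in for g(X(T)) samples
mean, s2 = sample_stats(ys)
stat_error_bound = C_A * math.sqrt(s2 / len(ys))    # estimate of E_S
```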
Given a tolerance, $TOL$, we would like to have a procedure that determines whether we should use the SSA method or the hybrid one. This decision should be based on the expected computational work of both methods, and the procedure should provide, in either case, the necessary elements for computing the estimator. When the SSA method is chosen, the procedure should provide the number of simulations, $M_{SSA}(TOL)$. When the hybrid method is chosen, the procedure should provide not only the number of simulations, $M_{Hyb}(TOL)$, but also the time mesh, $(t_k)_{k=0}^K(TOL)$, and the one-step exit probability bound, $\delta(TOL)$. Let us describe such a decision procedure. The building block of a hybrid path is Algorithm 7, which adaptively determines whether to use an SSA step or a tau-leap one. According to this algorithm, given the current state of the approximate process, $x$, there are two ways of taking an SSA step, depending on the logical conditions $K_1/a_0(x) > T_0 - t$ and $K_2(x,\delta)/a_0(x) > \tau_{Ch}$. The first is when $K_1/a_0(x) > T_0 - t$ is true; in this case, we take an SSA step and avoid the computation of $\tau_{Ch}(x)$. The second is when $K_1/a_0(x) > T_0 - t$ is false and $K_2(x,\delta)/a_0(x) > \tau_{Ch}$ is true; in this case, we have to compute $\tau_{Ch}(x)$. We consider one particular hybrid path, and we let $N_{SSA,K_1}(h,\delta)$ be the number of SSA steps such that $K_1/a_0(x) > T_0 - t$ is true. In the same way, let $N_{SSA,K_2}(h,\delta)$ be the number of SSA steps such that $K_1/a_0(x) > T_0 - t$ is false and $K_2(x,\delta)/a_0(x) > \tau_{Ch}$ is true. Finally, let $N_{TL}(h,\delta)$ be the total number of tau-leap steps. We define $\psi(h,\delta)$ as the expected work of a hybrid path.

Instead of solving (5.27), we proceed as follows. First, we fix $h = h_0$ and derive $M_{aux}$ and $\epsilon_0$ as functions of $h_0$ and $\delta$:
$$M_{aux}(h_0,\delta) = \left(\frac{\partial_h \psi(h_0,\delta)}{\psi(h_0,\delta)}\cdot\frac{C_A\sqrt{S^2(g(X(T)); M_s)}}{2\,\partial_h \mathcal{E}_I(h_0,\delta)}\right)^2, \qquad (5.28)$$
and
$$\epsilon_0(h_0,\delta) = \mathcal{E}_I(h_0,\delta) + C_A\,\frac{\sqrt{S^2(g(X(T)); \cdot)}}{\sqrt{M_{aux}(h_0,\delta)}}.$$
The number of hybrid simulations is then
$$M_{Hyb}(h_0,\delta) = \left(\frac{C_A\sqrt{S^2(g(X(T)); M_s)}}{TOL - \mathcal{E}_I(h_0,\delta)}\right)^2. \qquad (5.29)$$
The estimation of $\partial_h \psi(h_0,\delta)/\psi(h_0,\delta)$ and $\partial_h \mathcal{E}_I(h_0,\delta)$ in (5.28) deserves some remarks. First, note that $\partial_h \psi(h_0,\delta)/\psi(h_0,\delta) = \partial_h \log(\psi(h_0,\delta))$. In a pure tau-leap regime, the number of steps is approximately inversely proportional to the size of the mesh; we therefore have $\mathbb{E}[N_{TL}(h)] = O(h^{-1})$. In a hybrid regime, we model $\mathbb{E}[\psi(h,\delta)] = O(h^a)$. Therefore, for large values of $h$, a plausible model for $\log(\psi(h,\delta))$ is $a\log(h) + b$. We denote it by $\tilde\psi(h; a, b)$; see Algorithm 9 for details. An initial guess for $a$ is $-1$.

For the estimation of $\partial_h \mathcal{E}_I(h_0,\delta)$, we simply take numerical derivatives when consecutive meshes are available.
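The exact finite-difference formula did not survive extraction here; the following sketch shows one plausible forward-difference estimate of $\partial_h \log(\psi)$ from two consecutive meshes, consistent with the model $\tilde\psi(h; a, b)$ above:

```python
import math

def dlog_psi_dh(h0, h1, psi0, psi1):
    """Forward-difference estimate of d/dh log(psi) between two
    consecutive meshes h0 < h1 (a sketch, not the thesis' formula)."""
    return (math.log(psi1) - math.log(psi0)) / (h1 - h0)

# Under the model log(psi(h)) = a*log(h) + b with a = -1, the exact
# derivative at h0 = 0.1 is a/h0 = -10; the forward difference is close:
slope = dlog_psi_dh(0.10, 0.11, 1.0 / 0.10, 1.0 / 0.11)
```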
In this section, we present two examples to illustrate the performance of our proposed
method.
Algorithm 10 Auxiliary function for Algorithm 9. Inputs: same as Algorithm 8, and a constant, $r_0$, used to control the total computational work of the algorithm (budget). Outputs: the estimated runtime of a hybrid path, $\hat\psi$, the total accumulated runtime, $r$, an estimate of $\mathrm{Var}[g(X(T))]$, $S^2(g(\bar X(T)); \cdot)$, an estimate of $\mathbb{E}[g(X(T))]$, $\mathcal{A}(g(\bar X(T)); \cdot)$, an estimate of $\mathbb{E}[\mathcal{E}_I]$, $\mathcal{A}(\mathcal{E}_I; \cdot)$, an estimate of the expected number of steps needed by the SSA method, $\mathcal{A}(N_{SSA^*}; \cdot)$, and $\mathcal{A}(N_{TL}; \cdot)$. Here, $\mathbf{1}_{TL}(k) = 1$ if and only if the decision at time $t_k$ was tau-leap. Notes: the values $C_1$, $C_2$ and $C_3$ are defined in Section 5.3.1. Set appropriate values for $M_0$ and $CV_0$. For the sake of simplicity, we omit the arguments of the algorithms when there is no risk of confusion.
1: $M \leftarrow M_0$, $cv \leftarrow \infty$, $r \leftarrow 0$, $M_f \leftarrow 0$
2: while $cv > CV_0$ and $r \le r_0$ do
3:   for $m \leftarrow 1$ to $M$ do
4:     $((\bar X(t_k))_{k=0}^K, N_{TL}, N_{SSA,K_1}, N_{SSA,K_2}) \leftarrow$ Algorithm 8
5:     if the path does not exit $\mathbb{Z}^d_+$ then
6:       $M_f \leftarrow M_f + 1$
7:       Compute $g(\bar X(T; \bar\omega_m))$
8:       $\mathcal{E}_I \leftarrow$ Algorithm 11
9:       Use Remark 5.3.3 for estimating $N_{SSA^*}(\bar\omega_m)$
10:      $C_{Poi}(\bar\omega_m) \leftarrow \sum_{j=1}^{J}\sum_{k=0}^{K} C_P\!\left(a_j(\bar X(t_k))(t_{k+1} - t_k)\right)\mathbf{1}_{TL}(k)$
11:    end if
12:  end for
13:  Estimate the coefficients of variation, $cv_g$ and $cv_{\mathcal{E}_I}$, of the estimators of $\mathrm{Var}[g(X(T))]$ and $\mathbb{E}[\mathcal{E}_I]$, respectively.
14:  $cv \leftarrow \max\{cv_g, cv_{\mathcal{E}_I}\}$
15:  $\hat\psi \leftarrow C_1\,\mathcal{A}(N_{SSA,K_1}; M_f) + C_2\,\mathcal{A}(N_{SSA,K_2}; M_f) + C_3\,\mathcal{A}(N_{TL}; M_f) + \mathcal{A}(C_{Poi}; M_f)$
16:  $r \leftarrow r + M_f\,\hat\psi$
17:  $M \leftarrow 2M$
18: end while
19: return $(\hat\psi,\, r,\, S^2(g(\bar X(T)); M_f),\, \mathcal{A}(\{g(\bar X(T)), \mathcal{E}_I, N_{SSA^*}, N_{TL}\}; M_f))$
The classical radioactive decay model provides a simple and important example for the application of the hybrid method. This model has only one species and one reaction,
$$X \xrightarrow{\;c\;} \emptyset.$$
Algorithm 11 Computes the discretization error, $\mathcal{E}_I \equiv \mathcal{E}_I(\bar\omega_m)$. Input: $(\bar X(t_k))_{k=0}^K$. Here, $\mathbf{1}_{TL}(k) = 1$ if and only if the decision at time $t_k$ was tau-leap, and $I_d$ is the $d \times d$ identity matrix. Output: $\mathcal{E}_I(\bar\omega_m)$.
1: $\mathcal{E}_I \leftarrow 0$
2: Compute $\varphi_K \leftarrow \nabla g(\bar X(t_K))$
3: for $k \leftarrow K-1$ to $1$ do
4:   $\Delta t_k \leftarrow t_{k+1} - t_k$
5:   Compute $J_a(\bar X(t_k)) = [\partial_i a_j(\bar X(t_k))]_{j,i}$
6:   $\varphi_k \leftarrow \left(I_d + \Delta t_k\, J_a^T(\bar X(t_k))\,\nu\right)\varphi_{k+1}$
7:   $\Delta a_k \leftarrow a(\bar X(t_{k+1})) - a(\bar X(t_k))$
8:   $\mathcal{E}_I \leftarrow \mathcal{E}_I + \frac{\Delta t_k}{2}\,\mathbf{1}_{TL}(k)\left(\Delta a_k^T\,\nu\right)\varphi_k$
9: end for
10: return $\mathcal{E}_I$
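A direct transcription of Algorithm 11 might look as follows. This is a sketch: we assume $\nu$ is stored as a $J \times d$ matrix of row vectors $\nu_j$, and states, propensities and gradients are plain Python lists, which matches the backward recursion above but is our own choice of data layout:

```python
def discretization_error(path, taus, is_tl, nu, a, jac_a, grad_g):
    """Backward dual-weighted estimate of E_I for one path (sketch of
    Algorithm 11; storing nu as a J x d matrix is our assumption)."""
    K = len(path) - 1
    d, J = len(path[0]), len(nu)
    phi = list(grad_g(path[K]))                 # phi_K = grad g(X_K)
    ei = 0.0
    for k in range(K - 1, 0, -1):
        Ja = jac_a(path[k])                     # J x d matrix [d_i a_j]
        # phi_k = (I_d + tau_k * Ja^T nu) phi_{k+1}
        phi = [sum(((1.0 if i == l else 0.0)
                    + taus[k] * sum(Ja[j][i] * nu[j][l] for j in range(J)))
                   * phi[l] for l in range(d)) for i in range(d)]
        if is_tl[k]:
            da = [a(path[k + 1])[j] - a(path[k])[j] for j in range(J)]
            # (tau_k / 2) * phi_k . sum_j Delta a_{j,k} nu_j
            ei += 0.5 * taus[k] * sum(
                phi[i] * sum(da[j] * nu[j][i] for j in range(J))
                for i in range(d))
    return ei

# Check on the decay model (d = J = 1, nu = [[-1]], a(x) = x, g(x) = x[0]),
# using the deterministic Euler mean path X_{k+1} = X_k * (1 - tau):
path = [[100.0]]
for _ in range(10):
    path.append([path[-1][0] * 0.9])
ei = discretization_error(path, [0.1] * 10, [True] * 10, [[-1.0]],
                          lambda x: [x[0]], lambda x: [[1.0]], lambda x: [1.0])
# ei is comparable to the true weak error 100*(exp(-1) - 0.9**10) ≈ 1.92
```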
Its stoichiometric vector and propensity function are $\nu = -1$ and $a(X) = cX$, respectively.
Figure 5.8: Chernoff step size, $\tau_{Ch}$, as a function of $\delta$, for $x_0 \in \{5, 10, 15, 20\}$, compared to $\mathbb{E}[\tau_{SSA} \mid X_0]$. For fixed $x_0$, we can observe two regimes delimited by the dotted lines: above the dotted line, the Chernoff tau-leap method is preferred; below the line, the preferred method is the SSA.
In Figure 5.9, we show $\tau_{Ch}$ as a function of $x_0$, using a log scale on the x-axis, for different values of $\delta$. It is interesting to observe that the maximum value of $\tau_{Ch}$ is 1, even when the final time is $T = 2$. This is a consequence of the propensity function and the value of $c$; for smaller values of $c$, the maximum increases. The figure shows that when $x_0$ is small, the values of $\tau_{Ch}$ decrease rapidly and become much smaller than $\tau_{SSA}$. As we mentioned, being close to or far from the boundary is a relative notion, and it must be understood in terms of the probability of exiting the lattice. For instance, when $x_0 = 10$, we have that $\tau_{SSA}$ is approximately equal to $\tau_{Ch}$ for $\delta = 10^{-5}$, which is greater than the values of $\delta$ typically needed to achieve small tolerances. In the figure, we can see that, when $x_0$ tends to 1 (its minimum value), the expected $\tau_{SSA}$ tends to 1 and is greater than $\tau_{Ch}$. This shows that, as we get closer to the boundary by decreasing $x_0$, $\tau_{Ch}$ becomes too small. On the contrary, when $x_0$ increases, the Chernoff tau-leap step size becomes larger and, therefore, the tau-leap method is preferred.
Figure 5.9: Chernoff step size, $\tau_{Ch}$, as a function of $x_0$, for different values of $\delta$. We observe two regimes: as $x_0$ decreases, the SSA method is preferred; as $x_0$ increases, the Chernoff tau-leap is preferred.
Consider the initial condition $X_0 = 100$ and the final time $T = 2$. We can observe that the process starts in a regime where the expected SSA step size is smaller than the Chernoff tau-leap one but, after a certain time, the opposite holds. In Figure 5.10, we show 20 SSA paths and 20 hybrid paths.
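The regime change is visible in the conditional mean SSA step, $\mathbb{E}[\tau_{SSA} \mid X(t)] = 1/(cX(t))$, recorded along one exact path (our own toy illustration):

```python
import random

def ssa_mean_steps_along_path(x0, c, T, seed=3):
    """One exact (SSA) path of X -> 0; records E[tau_SSA | X] = 1/(c*X)
    just before each jump."""
    rng = random.Random(seed)
    t, x, mean_steps = 0.0, x0, []
    while x > 0:
        dt = rng.expovariate(c * x)
        if t + dt > T:
            break
        mean_steps.append(1.0 / (c * x))
        t, x = t + dt, x - 1
    return mean_steps

means = ssa_mean_steps_along_path(100, 1.0, 2.0)
# means[0] = 1/100; the values grow monotonically as the path decays
# toward the boundary, which is where SSA becomes the cheaper choice.
```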
Now, we consider the initial condition, $X_0 = 10^5$, and the final time, $T = 0.5$. In this case, the process starts far from the boundary. First, we observe in Figure 5.11
Figure 5.10: Left: 20 SSA paths for the simple decay model with X0 =100 and T =2.
Right: 20 hybrid trajectories, with linear interpolation between sample points (time
steps). We can observe that, near the x-axis, the hybrid algorithm takes more SSA
steps and fewer tau-leap steps.
that the SSA paths are very close to each other; that is, the variance of g(X(T ))
is small. We analyze an ensemble of five independent realizations of the calibration
Figure 5.11: Left: 20 SSA paths for the simple decay model with $X_0 = 10^5$ and $T = 0.5$. Right: Details.
algorithm (Algorithm 9), using different relative tolerances. In Figure 5.12, we show, in the left panel, the total predicted work (runtime) given by the calibration algorithm for both methods, the hybrid and the SSA, versus the estimated error bound, with its corresponding confidence intervals at the 95% level. The method chooses the hybrid algorithm for the first three (largest) tolerances and the SSA for the two smallest ones. For the fourth tolerance, the method chooses the hybrid in 80% of the runs and the SSA for the rest (see Table 5.2). Note that as $TOL$ decreases, the hybrid path converges to the exact one because $\delta$ goes to 0 (see Appendix 5.A). In the right panel, we show, for different tolerances, the actual work (runtime) of both methods, using a 12-core Intel GLNXA64 architecture and MATLAB version R2012b. The actual runtimes are in accordance with our predictions.
Figure 5.12: Left: Predicted work (runtime) versus the estimated error bound for $X_0 = 10^5$ and $T = 0.5$. The hybrid method is preferred over the SSA for the first three (larger) tolerances; for the last two tolerances, the SSA is preferred, and in that case the total predicted runtime is the same for the hybrid and SSA methods. Right: Predicted and actual work (runtime) versus the estimated error bound.
In the simple decay model, where an explicit expression for $\mathbb{E}[g(X(T))]$ is available, we can accurately compute the ratio between the estimated weak error and $\mathcal{E}_I$, which we call the efficiency index of the discretization error. We compute this quantity when the preferred method is the hybrid one. Recall that
$$\mathcal{E}_I = \mathbb{E}\left[\left(g(X(T)) - g(\bar X(T))\right)\mathbf{1}_A\right].$$
In order to compute this quantity, for each run of the calibration algorithm (Algorithm 9), we use a sample large enough to control the statistical error: the sample size is such that the statistical error in the estimation of $\mathcal{E}_I$ is ten times smaller than the prescribed tolerance. In Figure 5.13, we show the efficiency index of the discretization error, with confidence intervals at the 95% level. In the same figure, we also show $TOL$ versus the actual computational error. It can be seen that the prescribed
TOL        Method(SSA)  Method(HYB)  δ(TOL)     h(TOL)    M(TOL)
3.13e-03   0.00         1.00         3.05e-08   2.0e-03   5.0
1.56e-03   0.00         1.00         3.81e-09   9.8e-04   1.6e+01
7.81e-04   0.00         1.00         4.77e-10   4.9e-04   6.4e+01
3.91e-04   0.80         0.20         5.96e-11   2.4e-04   2.6e+02
1.95e-04   1.00         0.00         -          -         -
9.77e-05   1.00         0.00         -          -         -

TOL        M_SSA     Ŵ_Hyb/Ŵ_SSA    A(N_TL;·)   A(N_SSA*;·)
3.13e-03   3.0       0.20 ±0.03     2.6e+02     3.9e+04
1.56e-03   1.2e+01   0.37 ±0.05     5.1e+02     3.9e+04
7.81e-04   4.6e+01   0.57 ±0.09     1.0e+03     3.9e+04
3.91e-04   1.8e+02   0.97 ±0.05     2.0e+03     3.9e+04
1.95e-04   7.2e+02   1.00           -           3.9e+04
9.77e-05   2.9e+03   1.00           -           3.9e+04

Table 5.2: Details for an ensemble of five independent runs of Algorithm 9 for the simple decay model with $X_0 = 10^5$ and $T = 0.5$. For example, the third row tells us that we should run $M = 64$ hybrid paths, with a time mesh of size $h = 4.9\cdot10^{-4}$ and a one-step exit probability bound of $\delta = 4.77\cdot10^{-10}$. The work of the hybrid method is, on average, 57% of the work of the SSA (third column in the second part of the table). Here, $\hat W_{Hyb} := M_{Hyb}\,\hat\psi$ and $\hat W_{SSA} := M_{SSA}\,C_1\,\mathcal{A}(N_{SSA^*};\cdot)$. The fourth row shows, in the second and third columns, that in four runs of Algorithm 9 the SSA method is chosen and in one run the hybrid method; in that case, we should simulate $M_{SSA} = 180$ SSA paths or $M = 260$ hybrid paths. Confidence intervals at the 95% level are also provided.
The gene transcription and translation model has the following five reaction channels:
$$\emptyset \xrightarrow{c_1} R, \qquad R \xrightarrow{c_2} R + P, \qquad 2P \xrightarrow{c_3} D, \qquad R \xrightarrow{c_4} \emptyset, \qquad P \xrightarrow{c_5} \emptyset.$$
Figure 5.13: Left: Efficiency index for $\mathcal{E}_I$ and 95% confidence intervals. Right: $TOL$ versus the actual computational error. The numbers above the straight line show the percentage of runs that had errors larger than the required tolerance. We observe that in all cases the computational error follows the imposed tolerance closely, with the expected confidence of 95%.
The corresponding stoichiometric matrix and propensity function are
$$\nu = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \end{pmatrix} \quad\text{and}\quad a(X) = \begin{pmatrix} c_1 \\ c_2 R \\ c_3 P(P-1) \\ c_4 R \\ c_5 P \end{pmatrix},$$
respectively, where $X(t) = (R(t), P(t), D(t))$, and $c_1 = 25$, $c_2 = 10^3$, $c_3 = 0.001$, $c_4 = 0.1$, and $c_5 = 1$. In the simulations, the initial condition is $(0,0,0)$ and the final time is $T = 1$. The observable is given by $g(X) = X_3 = D$.
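The model data above translate directly into code. The sketch below uses the stated $\nu$, propensities and constants, but takes plain fixed-size tau-leap steps with no Chernoff step-size control, so an exit from the lattice is possible and simply reported:

```python
import numpy as np

# Stoichiometric matrix and rate constants of the gene transcription and
# translation model; rows follow the five reaction channels listed above.
NU = np.array([
    [ 1,  0, 0],   # 0 -> R
    [ 0,  1, 0],   # R -> R + P
    [ 0, -2, 1],   # 2P -> D
    [-1,  0, 0],   # R -> 0
    [ 0, -1, 0],   # P -> 0
])
C1, C2, C3, C4, C5 = 25.0, 1e3, 1e-3, 0.1, 1.0

def propensities(x):
    r, p, _ = x
    return np.array([C1, C2 * r, C3 * p * (p - 1), C4 * r, C5 * p])

def tau_leap_path(x0, T, tau, seed=0):
    """Plain fixed-step tau-leap path (no Chernoff step-size control, so
    the path may exit the lattice); returns the final state and a flag."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=np.int64)
    t = 0.0
    while t < T:
        dt = min(tau, T - t)
        fires = rng.poisson(propensities(x) * dt)   # reaction counts
        x = x + fires @ NU
        if (x < 0).any():
            return x, False      # a population went negative: exit
        t += dt
    return x, True

x_T, ok = tau_leap_path((0, 0, 0), 1.0, 1e-4, seed=3)
```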
We can see that the abundance of the mRNA species, represented by $R$, is close to zero for $t \in [0,T]$. This can be interpreted as the process being close to the boundary. However, according to Table 5.3, the calibration algorithm chooses the hybrid method only for the first two tolerances. This happens because small tolerances induce small one-step exit probabilities and, as a consequence, the Chernoff tau-leap steps become smaller than the expected SSA steps. This suggests that the reduced abundance of one of the species is not enough to conclude that the SSA method should be used; the tolerance also plays a role in this choice.
In Figure 5.14, we show an ensemble of five independent realizations of the calibration algorithm and compare its corresponding predicted and actual work. We can appreciate the robustness of the calibration procedure. We can also observe that the hybrid method converges to the SSA as the tolerance goes to zero.
Figure 5.14: Left: Predicted work (runtime) versus the estimated error bound for the gene transcription and translation model. The hybrid method is preferred over the SSA for the first two (larger) tolerances; for the last four tolerances, the SSA is preferred, and in the latter case the total predicted runtime is the same for the hybrid and SSA methods. Right: Predicted and actual work (runtime) versus the estimated error bound.
In Figure 5.15, we show the efficiency index of the discretization error, with confidence intervals at the 95% level. In the same figure, we also show $TOL$ versus the actual computational error. It can be seen that the prescribed tolerance is achieved with the required confidence of 95%, since $C_A = 1.96$.
5.6 Conclusions
In this work, we addressed the problem of accurately estimating the expected value of an observable of a Markov pure jump process at a given final time, within a prescribed tolerance and with high probability. Examples of settings where such estimation
TOL        Method(SSA)  Method(HYB)  δ(TOL)            h(TOL)          M(TOL)
1.00e-01   0.00         1.00         8.0e-05 ±2e-05    2e-02 ±2e-03    66 ±3
5.00e-02   0.00         1.00         1.0e-05 ±2e-06    7e-03 ±7e-04    230 ±8
2.50e-02   0.40         0.60         1.1e-06 ±5e-07    3e-03 ±7e-04    840 ±70
1.25e-02   0.80         0.20         1.9e-07           2.0e-03         3e+03
6.25e-03   1.00         0.00         -                 -               -
3.13e-03   1.00         0.00         -                 -               -

TOL        M_SSA     Ŵ_Hyb/Ŵ_SSA    A(N_TL;·)        A(N_SSA*;·)
1.00e-01   3.5e+01   0.39 ±0.04     7e+01 ±1e+01     1.8e+04
5.00e-02   1.4e+02   0.54 ±0.10     1.4e+02 ±2e+01   1.8e+04
2.50e-02   5.5e+02   0.88 ±0.10     3.2e+02 ±9e+01   1.7e+04
1.25e-02   2.2e+03   0.99 ±0.02     4.9e+02          1.8e+04
6.25e-03   8.8e+03   1.00           -                1.8e+04
3.13e-03   3.5e+04   1.00           -                1.8e+04

Table 5.3: Details for an ensemble of five independent runs of Algorithm 9 for the gene transcription and translation model. Details on how to read the table are provided in Table 5.2.
Figure 5.15: Left: Efficiency index for $\mathcal{E}_I$ and 95% confidence intervals. Right: $TOL$ versus the actual computational error. The numbers above the straight line show the percentage of runs that had errors larger than the required tolerance. We observe that in all cases the computational error follows the imposed tolerance closely, with the expected confidence of 95%.
Acknowledgments
The authors are members of the KAUST SRI Center for Uncertainty Quantification
in the division of Computer, Electrical and Mathematical Sciences and Engineering at
King Abdullah University of Science and Technology (KAUST). The authors would
like to thank Jesper Karlsson for many interesting discussions at the early stages of
this work.
Appendix
Let $\mathbb{E}[N_{TL}(h,\delta)]$ be the expected number of tau-leap steps of a hybrid path with a mesh of size $h$ and a one-step exit probability bound, $\delta$. Let $\{T_i\}$ be the sequence of grid points, $t$ the current time, and $\bar X(t)$ the current state of the hybrid process, $\bar X$. Let $\tau_{Ch}(\bar X(t),\delta)$ be the Chernoff tau-leap step size computed using Algorithm 5; finally, $K_1$ and $K_2 = K_2(\bar X(t), \tau_{Ch})$ are the constants introduced in Section 5.3.1.
According to Algorithm 7, the logical conditions for choosing a tau-leap step are given by
$$\frac{K_1}{a_0(\bar X(t))} < T_i - t \quad\text{and}\quad \frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta).$$
The effective step size in this case is given by $\min\{\tau_{Ch}(\bar X(t),\delta),\, T_i - t\}$. Observe that $\left\{\frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta)\right\} \to \emptyset$ as $\delta \to 0$, because $K_2 \to \tilde C$ and $\tau_{Ch} \to 0$ (see Section 5.3.1). By the definition of $N_{TL}$, we have that
$$\mathbb{E}[N_{TL}(h,\delta)] = \mathbb{E}\left[\sum_i \int_{T_{i-1}}^{T_i} \frac{\mathbf{1}\left\{\frac{K_1}{a_0(\bar X(t))} < T_i - t,\ \frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta)\right\}}{\min\{\tau_{Ch}(\bar X(t),\delta),\ T_i - t\}}\, dt\right]$$
$$\le \mathbb{E}\left[\sum_i \int_{T_{i-1}}^{T_i} \frac{\mathbf{1}\left\{\frac{K_1}{a_0(\bar X(t))} < T_i - t,\ \frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta)\right\}}{\min\left\{\frac{K_1}{a_0(\bar X(t))},\ \frac{K_2}{a_0(\bar X(t))}\right\}}\, dt\right]$$
$$\le \mathbb{E}\left[\sum_i \int_{T_{i-1}}^{T_i} \frac{a_0(\bar X(t))\,\mathbf{1}\left\{\frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta)\right\}}{\min\{K_1, K_2\}}\, dt\right] \longrightarrow 0, \quad\text{as } \delta \to 0.$$
It is also true that $\mathbb{E}[N_{TL}]$ has a polynomial bound, since $a_0$ is polynomial and
$$\mathbb{E}\left[\sum_i \int_{T_{i-1}}^{T_i} \frac{a_0(\bar X(t))\,\mathbf{1}\left\{\frac{K_2}{a_0(\bar X(t))} < \tau_{Ch}(\bar X(t),\delta)\right\}}{\min\{K_1, K_2\}}\, dt\right] \le \frac{\int_0^T \mathbb{E}\left[a_0(\bar X(t))\right] dt}{\min\{K_1, K_2\}}.$$
Finally, for problems where $\max_{x \in \mathbb{Z}^d_+} a_0(x) < \infty$, we get the rough upper bound
$$\frac{\int_0^T \mathbb{E}\left[a_0(\bar X(t))\right] dt}{\min\{K_1, K_2\}} \le \frac{T}{\min\{K_1, K_2\}}\,\max_{x \in \mathbb{Z}^d_+} a_0(x).$$
Observe that $\mathbb{Z}^d_+$ can be substituted by $\mathbb{Z}^d_+(x_0, T) \subset \mathbb{Z}^d_+$, defined as the subset of states that can be reached by a path starting from $x_0$ and evolving up to time $T$. Therefore, we have an upper bound for $\mathbb{E}[N_{TL}]$ that does not depend on $\delta$. When the lattice is finite, as in the exponential decay model (Example 5.1.2), this bound is $c\,T\,x_0/K_2$.
REFERENCES
[1] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Conver-
gence (Wiley Series in Probability and Statistics), 2nd ed. Wiley-Interscience,
9 2005.
[3] A. Voter, "Introduction to the kinetic Monte Carlo method," Radiation Effects in Solids, pp. 1–23, 2007.
[5] T. Li, “Analysis of explicit tau-leaping schemes for simulating chemically reacting
systems,” Multiscale Model. Simul., vol. 6, no. 2, pp. 417–436 (electronic), 2007.
[8] ——, “Efficient step size selection for the tau-leaping simulation method.” The
Journal of Chemical Physics, vol. 124, no. 4, p. 044109, 2006.
[13] T. Tian and K. Burrage, “Binomial leap methods for simulating stochastic chem-
ical kinetics,” The Journal of Chemical Physics, vol. 121, no. 21, pp. 10 356–
10 364, 2004.
[17] J. Karlsson and R. Tempone, “Towards automatic global error control: Com-
putable weak error expansion for the tau-leap method,” Monte Carlo Methods
and Applications, vol. 17, no. 3, pp. 233–278, March 2011.
[18] T. G. Kurtz, “The relationship between stochastic and deterministic models for
chemical reactions,” The Journal of Chemical Physics, vol. 57, no. 7, p. 2976,
1972.
[23] J. Ahrens and U. Dieter, "Computer methods for sampling from gamma, beta, Poisson and binomial distributions," Computing, vol. 12, pp. 223–246, 1974.
[24] D. F. Anderson and M. Koyama, “Weak error analysis of numerical methods for
stochastic models of population processes,” Multiscale Modeling and Simulation,
vol. 10, no. 4, pp. 1493–1524, 2012.
[26] D. Anderson and D. Higham, "Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics," SIAM Multiscale Model. Simul., vol. 10, no. 1, 2012.
Chapter 6

Abstract

In this work, we extend the hybrid Chernoff tau-leap method to the multilevel Monte Carlo (MLMC) setting. Inspired by the work of Anderson and Higham on the tau-leap MLMC method with uniform time steps, we develop a novel algorithm that is able to couple two hybrid Chernoff tau-leap paths at different levels. Using dual-weighted residual expansion techniques, we also develop a new way to estimate the variance of the difference of two consecutive levels and the bias. This is crucial because the computational work required to stabilize the coefficient of variation of the sample estimators of both quantities is often unaffordable for the deepest levels of the MLMC hierarchy. Our method bounds the global computational error to be below a prescribed tolerance, $TOL$, within a given confidence level. This is achieved with nearly optimal computational work. Indeed, the computational complexity of our method is of order $O(TOL^{-2})$, the same as with an exact method, but with a smaller constant. Our numerical examples show substantial gains with respect to the previous single-level approach and the Stochastic Simulation Algorithm.

[1] A. Moraes, R. Tempone and P. Vilanova, "Multilevel Hybrid Chernoff Tau-Leap", accepted for publication in BIT Numerical Mathematics, 2015.
6.1 Introduction
This work, inspired by the multilevel discretization schemes introduced in [1], extends the hybrid Chernoff tau-leap method [2] to the multilevel Monte Carlo setting [3]. Consider a non-homogeneous Poisson process, $X$, taking values in the lattice of non-negative integers, $\mathbb{Z}^d_+$. We want to estimate the expected value of a given observable, $g : \mathbb{R}^d \to \mathbb{R}$, of $X$ at a final time, $T$, i.e., $\mathbb{E}[g(X(T))]$. For example, in a chemical reaction in thermal equilibrium, the $i$-th component of $X$, $X_i(t)$, could describe the number of particles of species $i$ present at time $t$. In the systems modeled here, different species undergo reactions at random times, changing the number of particles in at least one of the species. The probability of a single reaction happening in a small time interval is modeled by a non-negative propensity function that depends on the current state of the system. We present a formal description of the problem in Section 6.1.1.
Pathwise realizations of such pure jump processes (see, e.g., [4]) can be simulated exactly using the Stochastic Simulation Algorithm (SSA), introduced by Gillespie in [5], or the Modified Next Reaction Method (MNRM), introduced by Anderson in [6]. Although these algorithms generate exact realizations of the Markov process, $X$, they are computationally feasible only for relatively low propensities.

For that reason, Gillespie in [7] and Aparicio and Solari in [8] independently proposed the tau-leap method to approximate the SSA by evolving the process with fixed time steps and keeping the propensities fixed within each time step. In fact, the tau-leap method can be seen as a forward Euler method for a stochastic differential equation driven by Poisson random measures (see, e.g., [9]).
A drawback of the tau-leap method is that the simulated process may take negative values, which is an undesirable consequence of the approximation and not a qualitative feature of the original process. To address this, we proposed in [2] a Chernoff-type bound that controls the probability of reaching negative values by adjusting the time steps. Also, to avoid extremely small time steps, we proposed switching adaptively between the tau-leap and an exact method, creating a hybrid tau-leap/exact method that combines the strengths of both.

More specifically, let $\bar x$ be the state of the approximate process at time $t$, and let $\delta \in (0,1)$ be given. The main idea is to compute a time step, $\tau = \tau(\delta, \bar x)$, such that the probability that the approximate process reaches an unphysical negative value in $[t, t+\tau)$ is less than $\delta$. This allows us to control the probability that an entire hybrid path exits the lattice, $\mathbb{Z}^d_+$. In turn, this quantity leads to the definition of the global exit error, which is a global error component along with the time discretization error and the statistical error (see Section 6.3.2 for details).
The multilevel Monte Carlo idea goes back at least to [10, 11]. In that setting, the main goal was to solve high-dimensional, parameter-dependent integral equations and to conduct corresponding complexity analyses. Later, in [3], Giles developed and analyzed multilevel techniques that reduce the computational work when estimating an expected value using Monte Carlo path simulations of a certain quantity of interest of a stochastic differential equation. Independently, in [12], Speight introduced a multilevel approach to control variates. Control variates are a widespread variance reduction technique whose main goal is to increase the precision of an estimator or to reduce the computational effort. The main idea is as follows:
to reduce the variance of the standard Monte Carlo estimator of $\mathrm{E}[X]$,
$$\hat{\mu}_1 := \frac{1}{M} \sum_{m=1}^{M} X(\omega_m),$$
we consider another unbiased estimator of $\mathrm{E}[X]$,
$$\hat{\mu}_2 := \frac{1}{M} \sum_{m=1}^{M} \left( X(\omega_m) - \left( Y(\omega_m) - \mathrm{E}[Y] \right) \right),$$
which can be rewritten as
$$\hat{\mu}_2 = \mathrm{E}[Y] + \frac{1}{M} \sum_{m=1}^{M} (X - Y)(\omega_m).$$
When $\mathrm{E}[Y]$ is not known, it can be estimated with an independent batch of $M_0$ samples, which yields the two-level estimator
$$\tilde{\mu}_2 := \frac{1}{M_0} \sum_{m_0=1}^{M_0} Y(\omega_{m_0}) + \frac{1}{M_1} \sum_{m_1=1}^{M_1} (X - Y)(\omega_{m_1}).$$
See Section 6.1.6 for details about the definition of levels in our context.
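To make the variance reduction concrete, here is a minimal sketch with a toy pair chosen by us for illustration ($X = W^2$ with control variate $Y = W$, $W$ uniform on $(0,1)$, so $\mathrm{E}[Y] = 1/2$ is known exactly); nothing in it is specific to stochastic reaction networks:

```python
import random

def estimators(M, rng):
    """Compare the plain MC estimator mu1 with the control-variate
    estimator mu2 = E[Y] + mean(X - Y), for X = W^2 and Y = W,
    W ~ uniform(0,1). X and Y are sampled on the same omega, so they
    are highly correlated and X - Y has small variance."""
    xs, ds = [], []
    for _ in range(M):
        w = rng.random()
        xs.append(w * w)          # X(omega)
        ds.append(w * w - w)      # (X - Y)(omega)
    mu1 = sum(xs) / M
    mu2 = 0.5 + sum(ds) / M      # E[Y] = 1/2 enters exactly
    return mu1, mu2

rng = random.Random(0)
mu1, mu2 = estimators(100000, rng)
# both estimate E[W^2] = 1/3; mu2 typically has a much smaller error
```

The variance of $X - Y$ here is $1/180$, versus $4/45$ for $X$ alone, which is exactly the mechanism exploited level by level in the multilevel estimator.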
In this work, we apply Giles's multilevel control variates idea to the hybrid Chernoff tau-leap approach to reduce the computational cost, which is measured as the amount of time needed for computing an estimate of $\mathrm{E}[g(X(T))]$ within a prescribed tolerance, $TOL$, with a given level of confidence. We show that our hybrid MLMC method has the same computational complexity as the pure SSA, i.e., of order $\mathcal{O}(TOL^{-2})$. From this perspective, our method can be seen as a variance reduction for the SSA, since our MLMC method does not change the complexity; it just reduces the corresponding multiplicative constant. We note in passing that in [13], the authors show that the computational complexity of the pure MLMC tau-leap case is of order $\mathcal{O}(TOL^{-2}(\log(TOL))^2)$. We note also that here our goal is to provide an estimate of $\mathrm{E}[g(X(T))]$ in the probability sense and not in the mean square sense as in [1].
The global error arising from our hybrid tau-leap MLMC method can naturally be decomposed into three components: the global exit error, the time discretization error, and the statistical error. This global error should be less than a prescribed tolerance, $TOL$, with probability larger than a certain confidence level. The global exit error is controlled by the one-step exit probability bound, $\delta$ [2]. The time discretization error, inherent to the tau-leap method, is controlled through the size of the mesh, $\Delta t$ [14]. At this point, it is crucial to stress that, by controlling the exit probability of the set of hybrid paths, we are indirectly turning this event into a rare event. Thus, direct sampling of exit paths is not an affordable way to estimate the probability of such an event.
Motivated by the Central Limit results of Collier et al. [15] for the Multilevel
Monte Carlo estimator (see appendix A, Theorem 1), we approximate the statistical
error with a Gaussian random variable with zero mean. In our numerical experiments,
we tested this hypothesis by employing Q-Q plots and the Shapiro-Wilk test [16].
There, we did not reject the Gaussianity of the statistical error at the 1% significance
level. The variance of the statistical error is a linear combination of the variance
at the coarsest level and variances of the difference of two consecutive levels, which
we sometimes call strong errors. In Section 6.3.3, motivated by the fact that sample
variance and bias estimators are inaccurate on the deepest levels, we develop a novel
dual-weighted residual expansion that allows us to estimate those quantities, cf. (6.7)
and (6.8). We also control the statistical error through the number of coupled hybrid
paths, (M` )L`=0 , simulated at each level.
We note that our use of duals in this work is different from the use in [14]. That
earlier work proposed an adaptive, single-level, tau-leap algorithm for error control,
choosing the time steps non-uniformly to control the global weak error based on dual-
weighted error estimators. In this work, we do not have an adaptive time step based
on dual-weighted error estimators as in [14]. We use instead dual-weighted error
estimators to reduce the statistical error in our error estimates.
To describe the class of Markovian pure jump processes, $X : [0,T] \times \Omega \to \mathbb{Z}_+^d$, that we use in this work, we consider a system of $d$ species interacting through $J$ different reaction channels. For the sake of brevity, we write $X(t,\omega) \equiv X(t)$. Let $X_i(t)$ be the number of particles of species $i$ in the system at time $t$. We want to study the evolution of the state vector, $X(t) = (X_1(t), \ldots, X_d(t)) \in \mathbb{Z}_+^d$. Each reaction channel $j$ is characterized by a stoichiometric vector, $\nu_j \in \mathbb{Z}^d$, and a propensity function, $a_j$; when reaction $j$ fires, the state of the system changes according to
$$x \to x + \nu_j.$$
The probability that reaction $j$ will occur during the small interval $(t, t+dt)$ is then assumed to be $a_j(X(t))\,dt + o(dt)$. This description leads to the random time-change representation
$$X(t) = X(0) + \sum_{j=1}^{J} \nu_j\, Y_j\!\left( \int_0^t a_j(X(s))\, ds \right), \qquad (6.2)$$
where the $Y_j$'s are independent unit-rate Poisson processes.
Remark 6.1.1. In this setting, the solution of the following system of ordinary differential equations,
$$\begin{cases} \dot{x}(t) = \nu\, a(x(t)), & t \in \mathbb{R}_+, \\ x(0) = x_0 \in \mathbb{R}_+^d, \end{cases}$$
is called the mean field solution, where $\nu$ is the matrix with columns $\nu_j$ and $a(x)$ is the column vector of propensities. In Section 6.4, we use the mean field path for scaling and preprocessing constants associated with the computational work of the SSA and Chernoff tau-leap steps.
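For intuition, the mean field ODE can be integrated with forward Euler; the birth-death network below, and its rates, is a hypothetical example chosen for this sketch, not one taken from the text:

```python
def mean_field_path(x0, nu, propensities, T, n_steps):
    """Forward Euler for dx/dt = nu * a(x): the deterministic mean field
    limit of the stochastic reaction network."""
    dt = T / n_steps
    x = list(x0)
    for _ in range(n_steps):
        a = [aj(x) for aj in propensities]
        drift = [sum(nu[j][i] * a[j] for j in range(len(nu)))
                 for i in range(len(x))]
        x = [xi + dt * di for xi, di in zip(x, drift)]
    return x

# birth-death: ∅ -> X at rate b, X -> ∅ at rate d*x; equilibrium x* = b/d
b, d = 10.0, 0.5
nu = [(1,), (-1,)]
props = [lambda x: b, lambda x: d * x[0]]
x_T = mean_field_path((0.0,), nu, props, T=20.0, n_steps=2000)
# exact solution x(t) = (b/d)(1 - exp(-d t)), so x(20) ≈ 20
```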
6.1.2 Description of the Modified Next Reaction Method
(MNRM)
The MNRM, introduced in [6] and based on the Next Reaction Method (NRM) [17], is an exact simulation algorithm that, like Gillespie's SSA, explicitly uses representation (6.2) for simulating exact paths, and it generates only one exponential random variable per iteration. The reaction times are modeled with the firing times of Poisson processes, $Y_j$, with internal times given by the integrated propensity functions. The randomness is now separated from the state of the system and is encapsulated in the $Y_j$'s. For each reaction, $j$, the internal time is defined as $R_j(t) = \int_0^t a_j(X(s))\,ds$. There are $J+1$ time frames in the system: the absolute one, $t$, and one for each Poisson process, $Y_j$. Computing the next reaction and its time is equivalent to computing how much time passes before one of the Poisson processes, $Y_j$, fires, and identifying which process fires at that particular time, by taking the minimum of such times. The NRM and MNRM make use of internal times to reduce the number of simulated random variables by half. In the following, we describe the MNRM and then present its implementation in Algorithm 12.
Given $t$, we have the propensity $a_j = a_j(X(t))$ and the internal time $R_j = R_j(t)$. Now, let $\Delta t_j$ be the remaining time for the reaction, $j$, to fire, assuming that $a_j$ stays constant over the interval $[t, t+\Delta t_j)$. Then, $t+\Delta t_j$ is the time when the next reaction, $j$, occurs. The next internal time at which the reaction, $j$, fires is then given by $R_j + a_j \Delta t_j$. When simulating the next step, the first reaction that fires occurs after $\Delta := \min_j \Delta t_j$. We then update the state of the system according to that reaction, add $\Delta$ to the global time, $t$, and then update the internal times by adding $a_j \Delta$ to each $R_j$. We are left to determine the value of $\Delta t_j$, i.e., the amount of time until the Poisson process, $Y_j$, fires, taking into account that $a_j$ remains constant until the first reaction occurs. Denote by $P_j$ the first firing time of $Y_j$ that is strictly larger than $R_j$, i.e., $P_j := \min\{s > R_j : Y_j(s) > Y_j(R_j)\}$, and finally $\Delta t_j = \frac{1}{a_j}(P_j - R_j)$.
Algorithm 12 The Modified Next Reaction Method. Inputs: the initial state, $X(0)$, the next grid point, $T_0 > t_0$, the propensity functions, $(a_j)_{j=1}^J$, the stoichiometric vectors, $(\nu_j)_{j=1}^J$. Outputs: the history of system states, $(X(t_k))_{k=0}^K$. Here, we denote $S \equiv (S_j)_{j=1}^J$, $P \equiv (P_j)_{j=1}^J$, and $R \equiv (R_j)_{j=1}^J$.
1: $k \leftarrow 0$, $t_k \leftarrow 0$, $X(t_k) \leftarrow X(0)$ and $R \leftarrow 0$
2: Generate $J$ independent, uniform(0,1) random numbers, $r_j$
3: $P \leftarrow (\log(1/r_j))_{j=1}^J$
4: while $t_k < T_0$ do
5:   $S \leftarrow (a_j(X(t_k)))_{j=1}^J$
6:   $(\Delta t_j)_{j=1}^J \leftarrow ((P_j - R_j)/S_j)_{j=1}^J$
7:   $\mu \leftarrow \arg\min_j \{\Delta t_j\}$
8:   $\Delta \leftarrow \min_j \{\Delta t_j\}$
9:   $t_{k+1} \leftarrow t_k + \Delta$
10:  $X(t_{k+1}) \leftarrow X(t_k) + \nu_\mu$
11:  $R \leftarrow R + S\Delta$
12:  $r \leftarrow$ uniform(0,1)
13:  $P_\mu \leftarrow P_\mu + \log(1/r)$
14:  $k \leftarrow k+1$
15: end while
16: return $(X(t_l))_{l=0}^{k-1}$
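The steps of Algorithm 12 can be sketched compactly in Python; the birth-death network at the bottom, with its rates, is our own illustrative choice:

```python
import math
import random

def mnrm(x0, nu, propensities, T0, rng=None):
    """Modified Next Reaction Method: exact paths of a stochastic
    reaction network, one exponential variate per iteration."""
    rng = rng or random.Random(0)
    J = len(nu)
    x = list(x0)
    t = 0.0
    path = [(t, tuple(x))]
    R = [0.0] * J                                     # internal times
    P = [-math.log(rng.random()) for _ in range(J)]   # next internal firing times
    while t < T0:
        S = [a(x) for a in propensities]
        # remaining real time until each channel fires (inf if rate is 0)
        dts = [(P[j] - R[j]) / S[j] if S[j] > 0 else math.inf for j in range(J)]
        mu = min(range(J), key=lambda j: dts[j])
        delta = dts[mu]
        if t + delta > T0:
            break
        t += delta
        x = [xi + nij for xi, nij in zip(x, nu[mu])]
        R = [R[j] + S[j] * delta for j in range(J)]
        P[mu] += -math.log(rng.random())
        path.append((t, tuple(x)))
    return path

# hypothetical birth-death network: ∅ -> X (rate 1), X -> ∅ (rate 0.1*x)
nu = [(1,), (-1,)]
props = [lambda x: 1.0, lambda x: 0.1 * x[0]]
path = mnrm((5,), nu, props, T0=10.0)
```

Note that, unlike the SSA, only one logarithm (one exponential variate) is drawn per accepted reaction, as the text describes.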
In this section, we define $\bar{X}$, the tau-leap approximation of the process, $X$, which follows from applying the forward Euler approximation to the integral term in the following random time-change representation of $X$:
$$X(t+\tau) = X(t) + \sum_{j=1}^{J} \nu_j\, Y_j\!\left( \int_t^{t+\tau} a_j(X(s))\, ds \right).$$
The tau-leap method was proposed in [7] to avoid the computational drawback of the exact methods, i.e., when many reactions occur during a short time interval. The tau-leap process, $\bar{X}$, starts from $X(0)$ at time 0, and given that $\bar{X}(t) = \bar{x}$ and a time step $\tau > 0$, we have that $\bar{X}$ at time $t+\tau$ is generated by
$$\bar{X}(t+\tau) = \bar{x} + \sum_{j=1}^{J} \nu_j\, \mathcal{P}_j\!\left( a_j(\bar{x})\,\tau \right),$$
where $\{\mathcal{P}_j(\lambda_j)\}_{j=1}^J$ are independent Poisson distributed random variables with parameter $\lambda_j$, used to model the number of times that reaction $j$ fires during the interval $(t, t+\tau)$. Again, this is nothing other than a forward Euler discretization of the stochastic differential equation formulation of the pure jump process (6.2), realized by the Poisson random measure with state-dependent intensity (see, e.g., [9]).
In the limit, when $\tau$ tends to zero, the tau-leap method gives the same solution as the exact methods. The total number of firings in each channel is a Poisson-distributed random variable depending only on the initial population, $\bar{X}(t)$. The error thus comes from the variation of $a(X(s))$ for $s \in (t, t+\tau)$.
We observe that the computational work of a tau-leap step involves the generation
of J independent Poisson random variables. This is in contrast to the computational
work of an exact step, which involves only the work of generating two uniform random
variables, in the case of the SSA, and only one in the case of MNRM.
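A minimal tau-leap sketch in the same spirit (the Poisson sampler and the birth-death example are our own choices for this illustration; note that, as discussed in Section 6.1.4, nothing prevents the leaped state from going negative):

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small rates of this sketch."""
    if lam <= 0:
        return 0
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def tau_leap_path(x0, nu, propensities, T, dt, rng=None):
    """Plain tau-leap: freeze the propensities over each step of size dt
    and draw one Poisson variate per reaction channel (J variates per step)."""
    rng = rng or random.Random(1)
    x = list(x0)
    t = 0.0
    while t < T - 1e-12:
        tau = min(dt, T - t)
        rates = [a(x) for a in propensities]   # frozen over [t, t+tau)
        for j, lam in enumerate(rates):
            k = poisson(lam * tau, rng)        # number of firings of channel j
            x = [xi + k * nij for xi, nij in zip(x, nu[j])]
        t += tau
    return x

# hypothetical birth-death network; max(..., 0) guards the propensity in case
# a leap overshoots into negative populations
x_T = tau_leap_path((100,), [(1,), (-1,)],
                    [lambda s: 10.0, lambda s: 0.1 * max(s[0], 0)],
                    T=1.0, dt=0.05)
```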
6.1.4 The Chernoff-Based Pre-Leap Check
In [2], we derived a Chernoff-type bound that allows us to guarantee that the one-step exit probability in the tau-leap method is less than a predefined quantity, $\delta > 0$. We now briefly summarize the main idea. Consider the following pre-leap check problem:
find the largest possible $\tau$ such that, with high probability, in the next step, the approximate process, $\bar{X}$, will take a value in the lattice, $\mathbb{Z}_+^d$, of non-negative integers. The solution to that problem can be achieved by solving $d$ auxiliary problems, one for each $x$-coordinate, $i = 1, 2, \ldots, d$, as follows. Find the largest possible $\tau_i \geq 0$, such that
$$P\left( \bar{X}_i(t) + \sum_{j=1}^{J} \nu_{ji}\, \mathcal{P}_j\!\left( a_j(\bar{X}(t))\,\tau_i \right) < 0 \,\Big|\, \bar{X}(t) \right) \leq \delta_i, \qquad (6.3)$$
where $\delta_i = \delta/d$, and $\nu_{ji}$ is the $i$-th coordinate of the $j$-th reaction channel, $\nu_j$. Finally, we let $\tau := \min\{\tau_i : i = 1, 2, \ldots, d\}$. To find the largest time steps, $\tau_i$, let $Q_i(t, \tau_i) := \sum_{j=1}^{J} (-\nu_{ji})\, \mathcal{P}_j\!\left( a_j(\bar{X}(t))\,\tau_i \right)$. Then, for all $s > 0$, we have the Chernoff bound:
$$P\left( Q_i(t, \tau_i) > \bar{X}_i(t) \,\Big|\, \bar{X}(t) \right) \leq \inf_{s>0} \exp\!\left( -s \bar{X}_i(t) + \tau_i \sum_{j=1}^{J} a_j(\bar{X}(t)) \left( e^{-s\nu_{ji}} - 1 \right) \right),$$
where
$$a_0(\bar{X}(t)) := \sum_{j=1}^{J} a_j(\bar{X}(t))$$
denotes the total propensity. We want to maximize $\tau_i$ while satisfying condition (6.3). Let $\tau_i^*$ be this maximum. We then have the following possibilities: if $\nu_{ji} \geq 0$ for all $j$, then naturally $\tau_i^* = +\infty$; otherwise, we have the following three cases:
1. $D_i(s_i) > 0$. In this case, $\tau_i(s_i) = 0$ and $D_i(s)$ is positive and increasing for all $s \geq s_i$. Therefore, $\tau_i(s)$ is equal to the ratio of two positive increasing functions. The numerator, $R_i(s)$, is a linear function and the denominator, $D_i(s)$, grows exponentially fast. Then, there exist an upper bound, $\tau_i^*$, and a unique number, $\tilde{s}_i$, which satisfy $\tau_i(\tilde{s}_i) = \tau_i^*$. We developed an algorithm in [2] for approximating $\tilde{s}_i$, using the relation $\tau_i'(\tilde{s}_i) = 0$.
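The pre-leap check can be made concrete with a crude numerical version of this search. Taking logarithms in the Chernoff bound gives $\tau_i(s) = R_i(s)/D_i(s)$ with $R_i(s) = s\,\bar{x}_i + \log \delta_i$ (linear) and $D_i(s) = \sum_j a_j(e^{-s\nu_{ji}} - 1)$; these identifications, and the grid search that replaces the root-finding of [2], are our own reconstruction for this sketch:

```python
import math

def chernoff_tau(x_i, channels, delta_i):
    """Largest grid-searched tau such that the Chernoff bound on the one-step
    exit probability of coordinate i is at most delta_i.
    channels: list of (a_j, nu_ji) pairs (propensity value, i-th coordinate)."""
    if all(nu >= 0 for _, nu in channels):
        return math.inf                      # this coordinate can never decrease
    best = 0.0
    for k in range(1, 2001):                 # crude grid over s in (0, 20]
        s = 0.01 * k
        R = s * x_i + math.log(delta_i)      # linear numerator R_i(s)
        D = sum(a * (math.exp(-s * nu) - 1.0) for a, nu in channels)
        if R > 0 and D > 0:
            # tau = R/D makes the bound equal delta_i at this particular s,
            # so the infimum over s is <= delta_i: a valid, possibly
            # conservative, step size
            best = max(best, R / D)
    return best

tau_small = chernoff_tau(10, [(2.0, -1)], 0.05)    # near the boundary
tau_large = chernoff_tau(100, [(2.0, -1)], 0.05)   # far from the boundary
# the admissible step grows with the distance to the boundary
```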
In this section, we briefly summarize our previous work, presented in [2], on hybrid
paths.
The main idea behind the hybrid algorithm is the following. A path generated by an exact method (like SSA or MNRM) never exits the lattice, $\mathbb{Z}_+^d$, although the computational cost may be unaffordable due to the many small inter-arrival times that typically occur when the process is "far" from the boundary. A tau-leap path, which may be cheaper than an exact one, could leave the lattice at any step. The probability of this event depends on the size of the next time step and the current state of the approximate process, $\bar{X}(t)$. This one-step exit probability could be large, especially when the approximate process is "close" to the boundary. We developed in [2] a Chernoff-type bound to control the mentioned one-step exit probability. Even more, by construction, the probability that one hybrid path exits the lattice, $\mathbb{Z}_+^d$, can be estimated by
$$P(A^c) \leq \mathrm{E}\!\left[ 1 - (1-\delta)^{N_{TL}} \right] = \delta\, \mathrm{E}[N_{TL}] - \frac{\delta^2}{2} \left( \mathrm{E}\!\left[ N_{TL}^2 \right] - \mathrm{E}[N_{TL}] \right) + o(\delta^2),$$
where $\bar{\omega} \in A$ if and only if the whole hybrid path, $(\bar{X}(t_k, \bar{\omega}))_{k=0}^{K(\bar{\omega})}$, belongs to the lattice, $\mathbb{Z}_+^d$, $\delta > 0$ is the one-step exit probability bound, and $N_{TL}(\bar{\omega}) \equiv N_{TL}$ is the number of tau-leap steps in a hybrid path. Here, $A^c$ is the complement of the set $A$.
To simulate a hybrid exact/Chernoff tau-leap path, we first developed a one-step switching rule that, given the current state of the approximate process, $\bar{X}(t)$, adaptively determines whether to use an exact or an approximate method for the next step. This decision is based on the relative computational cost of taking an exact step (MNRM) versus the cost of taking a Chernoff tau-leap step. We show the switching rule in Algorithm 13. To compare the mentioned computational costs, we
Algorithm 13 The one-step switching rule. Inputs: the current state of the approximate process, $\bar{X}(t)$, the current time, $t$, the values of the propensity functions evaluated at $\bar{X}(t)$, $(a_j(\bar{X}(t)))_{j=1}^J$, the one-step exit probability bound, $\delta$, and the next grid point, $T_0$. Outputs: method and $\tau$. Notes: based on $\mathrm{E}\left[ \tau_{SSA}(\bar{X}(t)) \mid \bar{X}(t) \right] = 1/a_0(\bar{X}(t))$ and $\tau_{Ch}(\bar{X}(t), \delta)$, this algorithm adaptively selects between MNRM and Chernoff tau-leap (TL). We denote by $\tau_{MNRM}$ ($\tau_{Ch}$) the step size when the decision is to use the MNRM (tau-leap) method.
Require: $a_0 \equiv \sum_{j=1}^J a_j > 0$
1: if $K_1/a_0 < T_0 - t$ then
2:   $\tau_{Ch} \leftarrow$ compute Chernoff step size (see Section 2.2 in [2])
3:   if $\tau_{Ch} < K_2(\bar{X}(t), \delta)/a_0$ then
4:     return (MNRM, $\tau_{MNRM}$)
5:   else
6:     return (TL, $\tau_{Ch}$)
7:   end if
8: else
9:   return (MNRM, $\tau_{MNRM}$)
10: end if
define $K_1$ as the ratio between the cost of computing $\tau_{Ch}$ and the cost of computing one step using the MNRM method, and $K_2 = K_2(\bar{X}(t), \delta)$ is defined as the cost of taking a Chernoff tau-leap step, divided by the cost of taking an MNRM step plus the cost of computing $\tau_{Ch}$. For further details on the switching rule, we refer to [2].
6.1.6 The Multilevel Monte Carlo Setting
In this subsection, we briefly summarize the control variates idea developed by Giles in [3]. Let $\{\bar{X}_\ell(t)\}_{t \in [0,T]}$ be a hybrid Chernoff tau-leap process with a time mesh of size $\Delta t_\ell$ and a one-step exit probability bound, $\delta$. We can simulate paths of $\{\bar{X}_\ell(t)\}_{t \in [0,T]}$ by using Algorithm 4 in [2]. Let $g_\ell := g(\bar{X}_\ell(T))$.
Consider a hierarchy of nested meshes of the time interval $[0,T]$, indexed by $\ell = 0, 1, \ldots, L$. Let $\Delta t_0$ be the size of the coarsest time mesh, corresponding to the level $\ell = 0$. The size of the time mesh at level $\ell \geq 1$ is given by $\Delta t_\ell = R^{-\ell}\, \Delta t_0$, where $R > 1$ is a given integer constant.
Assume that we are interested in estimating $\mathrm{E}[g_L]$, and we are able to simulate correlated pairs, $(g_\ell, g_{\ell-1})$, for $\ell = 1, \ldots, L$. Then, the following unbiased Monte Carlo estimator of $\mathrm{E}[g_L]$ uses $g_{L-1}$ as a control variate:
$$\tilde{\mu}_L := \frac{1}{M_L} \sum_{m_L=1}^{M_L} \left( g_L(\omega_{m_L}) - \left( g_{L-1}(\omega_{m_L}) - \mathrm{E}[g_{L-1}] \right) \right) = \mathrm{E}[g_{L-1}] + \frac{1}{M_L} \sum_{m_L=1}^{M_L} (g_L - g_{L-1})(\omega_{m_L}).$$
Applying this idea recursively and taking into account the telescoping decomposition $\mathrm{E}[g_L] = \mathrm{E}[g_0] + \sum_{\ell=1}^{L} \mathrm{E}[g_\ell - g_{\ell-1}]$, we arrive at the multilevel Monte Carlo estimator of $\mathrm{E}[g_L]$:
$$\hat{\mu}_L := \frac{1}{M_0} \sum_{m_0=1}^{M_0} g_0(\omega_{m_0}) + \sum_{\ell=1}^{L} \frac{1}{M_\ell} \sum_{m_\ell=1}^{M_\ell} (g_\ell - g_{\ell-1})(\omega_{m_\ell}). \qquad (6.4)$$
We have that $\hat{\mu}_L$ is unbiased, since $\mathrm{E}[\hat{\mu}_L] = \mathrm{E}[g_L]$. The variance of $\hat{\mu}_L$ is given by
$$\mathrm{Var}[\hat{\mu}_L] = \frac{\mathrm{Var}[g_0]}{M_0} + \sum_{\ell=1}^{L} \frac{\mathrm{Var}[g_\ell - g_{\ell-1}]}{M_\ell}.$$
Here, we are assuming independence among the batches between levels. For highly correlated pairs, $(g_\ell, g_{\ell-1})$, we can expect, for the same computational work, that $\mathrm{Var}[\hat{\mu}_L]$ is much less than the variance of the standard Monte Carlo estimator of $\mathrm{E}[g_L]$.
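The mechanics of (6.4) can be exercised on a toy problem; the "levels" below, Taylor approximations of $\exp(W)$ for $W$ uniform on $(0,1)$, are our own illustrative stand-in for the coupled tau-leap levels:

```python
import math
import random

def g(level, w):
    """Level-l approximation of exp(w): Taylor partial sum with level+2 terms
    (a toy stand-in for the coupled tau-leap approximations g_l)."""
    return sum(w**k / math.factorial(k) for k in range(level + 2))

def mlmc(L, M, rng):
    """Multilevel estimator (6.4): coarse-level term plus correction terms,
    each correction using the SAME sample w for both levels of the pair."""
    est = sum(g(0, rng.random()) for _ in range(M[0])) / M[0]
    for lev in range(1, L + 1):
        corr = 0.0
        for _ in range(M[lev]):
            w = rng.random()             # shared randomness = coupling
            corr += g(lev, w) - g(lev - 1, w)
        est += corr / M[lev]
    return est

rng = random.Random(42)
M = [4000, 1000, 250, 60, 15]            # decreasing samples per level
estimate = mlmc(L=4, M=M, rng=rng)
# target: E[exp(U)] = e - 1 for U ~ uniform(0,1)
```

Because the level differences shrink rapidly, the deep levels need very few samples, which is exactly the source of the multilevel savings.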
Let us now examine closely the problem of estimating $\mathrm{Var}[g_\ell - g_{\ell-1}]$ for highly correlated pairs, $(g_\ell, g_{\ell-1})$. This estimation is required to solve the optimization problem (6.25), which indicates how to choose the simulation parameters, particularly the number of simulated coupled paths for each pair of consecutive levels, $(M_\ell)_{\ell=0}^L$. When $\ell$ becomes large, due to our coupling strategy developed in Section 6.2, we expect to obtain $g_\ell = g_{\ell-1}$ in most of our simulations, while observing differences only in a very small proportion of the simulated coupled paths.
For the sake of illustration, let us assume that the random variable $\Delta_\ell := g_\ell - g_{\ell-1}$ takes values in the set $\{-1, 0, 1\}$, with respective probabilities $\{p_\ell, 1-2p_\ell, p_\ell\}$, where $p_\ell$ goes to zero. The kurtosis of $\Delta_\ell$ is by definition
$$\frac{\mathrm{E}\left[ (\Delta_\ell - \mathrm{E}[\Delta_\ell])^4 \right]}{\mathrm{E}\left[ (\Delta_\ell - \mathrm{E}[\Delta_\ell])^2 \right]^2} - 3.$$
Simple calculations show that the kurtosis of $\Delta_\ell$ is $(2p_\ell)^{-1} - 3$, and we observe that $\Delta_\ell^2 \sim \text{Bernoulli}(2p_\ell)$. The maximum likelihood estimator of $2p_\ell$, $\hat{\theta}_\ell$, is the sample average of $M_\ell$ independent and identically distributed (iid) values of $\Delta_\ell^2$. The coefficient of variation of $\hat{\theta}_\ell$, defined as $(\mathrm{Var}[\hat{\theta}_\ell])^{1/2} (\mathrm{E}[\hat{\theta}_\ell])^{-1}$, is approximately $(2p_\ell M_\ell)^{-1/2}$. Therefore, an accurate estimation of $p_\ell$ requires a sample of size
$$M_\ell \gtrsim (2p_\ell)^{-1} \to \infty.$$
This lower bound on $M_\ell$ goes strongly against the spirit of the multilevel Monte Carlo method, where $M_\ell$ should be a decreasing function of $\ell$.
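The kurtosis computation for the three-point distribution can be checked by direct arithmetic (a standalone sketch, using the excess-kurtosis definition with the $-3$ term):

```python
def kurtosis_three_point(p):
    """Excess kurtosis of Delta taking values -1, 0, 1 with probabilities
    p, 1-2p, p (the mean is 0 by symmetry)."""
    m2 = 2 * p          # E[Delta^2]
    m4 = 2 * p          # E[Delta^4]; Delta^4 = Delta^2 for values in {-1,0,1}
    return m4 / m2**2 - 3

# agrees with the closed form 1/(2p) - 3 for small p
for p in (0.1, 0.01, 0.001):
    assert abs(kurtosis_three_point(p) - (1 / (2 * p) - 3)) < 1e-9
```

For $p = 0.5$ the distribution is a fair coin on $\{-1, 1\}$ and the excess kurtosis is $-2$; as $p \to 0$ it blows up like $(2p)^{-1}$, which is the sampling difficulty described above.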
To overcome this difficulty, in Section 6.3.3, we develop a formula based on dual-weighted residuals. The technique of dual-weighted residuals can be motivated as follows: consider a process, $\bar{\bar{X}}$, such that its position at time $s$, having departed from the state $x$ at a previous time $t$, is denoted by $\bar{\bar{X}}(s; t, x)$. Notice that for $t < s < T$, we have that $\bar{\bar{X}}(T; t, x) = \bar{\bar{X}}(T; s, \bar{\bar{X}}(s; t, x))$. Let us define an auxiliary function, $U(t, x) := g(\bar{\bar{X}}(T; t, x))$, where $g$ is an observable scalar function of the final state of the process $\bar{\bar{X}}$ that started from the state $x$ at the initial time, $t$. If $\bar{X}$ is a process approximating $\bar{\bar{X}}$, we want to have a computable approximation for $g(\bar{X}(T; 0, x_0)) - g(\bar{\bar{X}}(T; 0, x_0))$. Consider a time mesh, $\{0 = t_0, t_1, \ldots, t_N = T\}$, and define $\bar{X}_{t_n} := \bar{X}(t_n; 0, x_0)$, $\bar{\bar{X}}_{t_{n+1}} := \bar{\bar{X}}(t_{n+1}; t_n, \bar{X}_{t_n})$ and $e_{n+1} := \bar{X}_{t_{n+1}} - \bar{\bar{X}}_{t_{n+1}}$. Observe that
$$\begin{aligned}
g(\bar{X}(T; 0, x_0)) - g(\bar{\bar{X}}(T; 0, x_0)) &= \sum_{n=0}^{N-1} \left( U(t_{n+1}, \bar{X}_{t_{n+1}}) - U(t_n, \bar{X}_{t_n}) \right) \\
&= \sum_{n=0}^{N-1} \left( U(t_{n+1}, \bar{X}_{t_{n+1}}) - U(t_{n+1}, \bar{\bar{X}}(t_{n+1}; t_n, \bar{X}_{t_n})) \right) \\
&= \sum_{n=0}^{N-1} e_{n+1} \cdot \left( \int_0^1 \nabla_x U(t_{n+1}, \bar{X}_{t_{n+1}} - s\, e_{n+1})\, ds \right) \\
&= \sum_{n=0}^{N-1} e_{n+1} \cdot \nabla_x U(t_{n+1}, \bar{X}_{t_{n+1}}) + \mathcal{O}\!\left( \nabla^2 U\, \|e_{n+1}\|^2 \right) + \text{h.o.t.}
\end{aligned}$$
We can now write a backward recurrence for the dual weights, $(\varphi_n)_{n=1}^N$:
$$\begin{aligned}
\varphi_n := \nabla_x U(t_n, \bar{X}_{t_n}) &= \partial_{\bar{X}_{t_n}}\, g(\bar{\bar{X}}(T; t_n, \bar{X}_{t_n})) \\
&= \partial_{\bar{X}_{t_n}}\, g(\bar{\bar{X}}(T; t_{n+1}, \bar{X}_{t_{n+1}})) \\
&= \partial_{\bar{X}_{t_{n+1}}}\, g(\bar{\bar{X}}(T; t_{n+1}, \bar{X}_{t_{n+1}}))\, \frac{\partial \bar{X}_{t_{n+1}}}{\partial \bar{X}_{t_n}} \\
&= \nabla_x U(t_{n+1}, \bar{X}_{t_{n+1}})\, \frac{\partial \bar{X}_{t_{n+1}}}{\partial \bar{X}_{t_n}} = \varphi_{n+1}\, \frac{\partial \bar{X}_{t_{n+1}}}{\partial \bar{X}_{t_n}},
\end{aligned}$$
with the terminal condition $\varphi_N := \nabla g(\bar{X}(T; 0, x_0))$.
In Section 6.2, we first show the main idea for coupling two tau-leap paths, which
comes from a construction by Kurtz [19] for coupling two Poisson random variables.
Then, inspired by the ideas of Anderson and Higham in [1], we propose an algorithm for coupling two hybrid Chernoff tau-leap paths (see [2]). This algorithm uses four building blocks that result from the combination of the MNRM and the tau-leap methods. In Section 6.3, we propose a novel hybrid MLMC estimator. Next, we introduce a global error decomposition; and finally, we develop formulae to efficiently estimate the variance of the difference of two consecutive levels and to estimate the bias based on dual-weighted residuals. These estimates are particularly useful for addressing the large kurtosis problem, described in Section 6.1.7, that appears at the deeper levels and makes standard sample estimators too costly. Next, in Section 6.4, we show how to control the three error components of the global error and how to obtain the parameters needed for computing the hybrid MLMC estimator to achieve a given tolerance with nearly optimal computational work. We also show that the computational complexity of our method is of order $\mathcal{O}(TOL^{-2})$. In Section 6.5, the numerical examples illustrate the advantages of the hybrid MLMC method over the single-level approach presented in [2] and over the SSA. Section 6.6 presents our conclusions and suggestions for future work.
In this section, we present an algorithm that generates coupled hybrid Chernoff tau-leap paths, which is an essential ingredient for the multilevel Monte Carlo estimator. We first show how to couple two Poisson random variables, and then we explain how we combine the two algorithms presented in [1] (as Algorithms 2 and 3) with two additional algorithms we developed to create an algorithm that generates coupled hybrid paths.
6.2.1 Coupling Two Poisson Random Variables
We motivate our coupling algorithm (Algorithm 14) by first describing how to couple
two Poisson random variables. In our context, ‘coupling’ means that we want to
induce a correlation between them that is as strong as possible. This construction
was first proposed by Kurtz in [19]. Suppose that we want to couple P1 ( 1 ) and
P2 ( 2 ), two Poisson random variables, with rates 1 and 2, respectively. Consider
the following decompositions,
P1 ( 1 ) := P ⇤ ( 1 ^ 2) + Q1 ( 1 1 ^ 2)
P2 ( 2 ) := P ⇤ ( 1 ^ 2) + Q2 ( 2 1 ^ 2 ),
=| 1 2| .
Var [P1 ( 1 ) P2 ( 2 )] = 1 + 2,
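The decomposition can be checked empirically; the Knuth sampler and the rates $\lambda_1 = 5$, $\lambda_2 = 4.5$ below are illustrative choices of ours:

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler (fine for moderate rates)."""
    if lam <= 0:
        return 0
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def coupled_poisson_pair(lam1, lam2, rng):
    """Kurtz coupling: share a Poisson(min) part, add independent remainders,
    so that Var[P1 - P2] = |lam1 - lam2| instead of lam1 + lam2."""
    lo = min(lam1, lam2)
    common = poisson(lo, rng)
    p1 = common + poisson(lam1 - lo, rng)
    p2 = common + poisson(lam2 - lo, rng)
    return p1, p2

# empirical check of the variance reduction
rng = random.Random(7)
lam1, lam2 = 5.0, 4.5
diffs = [a - b for a, b in (coupled_poisson_pair(lam1, lam2, rng)
                            for _ in range(20000))]
mean = sum(diffs) / len(diffs)
var = sum((d - mean) ** 2 for d in diffs) / len(diffs)
# var should be close to |lam1 - lam2| = 0.5; independent sampling gives 9.5
```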
In this section, we describe how to generate two coupled hybrid Chernoff tau-leap paths, $\bar{X}$ and $\bar{\bar{X}}$, corresponding to two nested time discretizations, called coarse and fine, respectively. Assume that the current time is $t$, and we know the states, $\bar{X}(t)$ and $\bar{\bar{X}}(t)$. Based on this knowledge, we have to determine a method for each level. This method can be either the MNRM or the tau-leap one, giving four possible combinations, which lead to four algorithms, B1, B2, B3 and B4, that we use as building blocks. Table 6.1 summarizes them.
Table 6.1: Building blocks for simulating two coupled hybrid Chernoff tau-leap paths.
Algorithms B1 and B2 are presented as Algorithms 2 and 3 in [1]. Algorithm B3 can
be directly obtained from Algorithm B2. Algorithm B4 is also based on Algorithm
B2, but to produce MNRM steps, we update the propensities at the coarse level at
the beginning of each time interval defined by the fine level.
We note that the only case in which we use a Poisson random variates generator for the tau-leap method is in Algorithm B1. In Algorithms B2 and B3, the Poisson random variables are simulated by adding independent exponential random variables with the same rate, $\lambda$, until a given final time, $T$, is exceeded. The rate, $\lambda$, is obtained by freezing the propensity functions, $a$, at time $t$. More specifically, the Poisson random variates are obtained by using the MNRM repeatedly without updating the intensity.
We now briefly describe the Chernoff hybrid coupling algorithm, i.e., Algorithm 14. Given the current time, $t$, and the current state of the process at the coarse level, $\bar{X}(t)$, and at the fine level, $\bar{\bar{X}}(t)$, this algorithm determines the next time point at which we run the algorithm (called the time "horizon"). To fix ideas, let us assume that,
based on $\bar{X}(t)$, the one-step switching rule, i.e., Algorithm 13, chooses the tau-leap method at the coarse level, with the corresponding Chernoff step size, $\bar{\tau}$. As we mentioned, this $\bar{\tau}$ is the largest step size such that the probability that the process, in the next time step, takes a value outside $\mathbb{Z}_+^d$ is less than $\bar{\delta}$. This step size plus the current time, $t$, cannot be greater than the final time, $T$, and also cannot be greater than the next time discretization grid point in the coarse grid, $\bar{t}$, because the discretization error must be controlled. Taking the minimum of all those values, we obtain the next time horizon at the coarse grid, $\bar{H}$. Note that, if the chosen method is MNRM instead of tau-leap, we do not need to take into account the grid, and the next time horizon will be the minimum between the next reaction time and the final time, $T$.
We now explain algorithm B1 (TL-TL). Assume that tau-leap is chosen at the coarse and at the fine level. We thus obtain two time horizons, one for the coarse level, $\bar{H}$, and another for the fine level, $\bar{\bar{H}}$. In this case, the global time horizon will be $H := \min\{\bar{H}, \bar{\bar{H}}\}$. Since the chosen method at both grid levels is tau-leap, we need to freeze the propensities at the beginning of the corresponding intervals: in the coarse case, during the interval $[t, \bar{H})$, the propensities are equal to $a(\bar{X}(t)) =: \bar{a}$, and in the fine case, during the interval $[t, \bar{\bar{H}})$, the propensities are equal to $a(\bar{\bar{X}}(t)) =: \bar{\bar{a}}$. Suppose that $\bar{H} < \bar{\bar{H}}$ (see Figure 6.1).
Figure 6.1: This figure depicts a particular instance of the Chernoff hybrid coupling algorithm (Algorithm 14), where $\bar{\tau} < \bar{\bar{\tau}}$. The synchronization horizon, $H$, defined as $H := \min\{\bar{H}, \bar{\bar{H}}\}$, is equal to $\bar{H}$ in this case. Notice that $\bar{H} := \min\{\bar{t}, t + \bar{\tau}, T\}$ and $\bar{\bar{H}} := \min\{\bar{\bar{t}}, t + \bar{\bar{\tau}}, T\}$.
Then, we couple two Poisson random variables at time $t = \bar{H}$, using the idea described in Section 6.2.1. When time reaches $\bar{H}$, the decision of which method to use (and the corresponding step size) at the coarse level must be made again. Note that the propensities of the process at the fine grid will be kept frozen until $\bar{\bar{H}}$. The case when $\bar{H} > \bar{\bar{H}}$ is analogous to the one we described, but the decisions on the method and step size are made at the finer level, when time reaches $\bar{\bar{H}}$. It can also be possible that $\bar{H} = \bar{\bar{H}}$. In that case, the decision of which method to use (and the corresponding step size) must be made at both the coarse and the fine level.
In the case of algorithm B2 (TL-MNRM), we assume that tau-leap is chosen at the coarse level, and MNRM at the fine level, obtaining two time horizons, one for the coarse level, $\bar{H}$, and another for the fine level, $\bar{\bar{H}}$. The only difference in how we determine the time horizons between algorithms B1 and B2 is that the time discretization grid points in the fine grid are not taken into account to determine $\bar{\bar{H}}$. Algorithm B2 is then applied until the simulation reaches $H := \min\{\bar{H}, \bar{\bar{H}}\}$. Suppose that $\bar{\bar{H}} < \bar{H}$. In this case, the process $\bar{\bar{X}}$ could take more than one step to reach $\bar{\bar{H}}$. At each step, the propensity functions $a(\bar{\bar{X}}(\cdot))$ are computed, but not the propensities for the coarse level, because in that case the tau-leap method is used. Note that the decision of which algorithm to use (B2 or another) is not made at those steps, but only when time reaches $\bar{\bar{H}}$. When time reaches $\bar{\bar{H}}$, the decision of which method to use (and the corresponding step size) at the fine level must be made again. In this case, the propensities at the coarse grid will be kept frozen until $\bar{H}$. The reasoning for the cases $\bar{\bar{H}} > \bar{H}$ and $\bar{\bar{H}} = \bar{H}$ is similar to before.
The other two cases, B3 and B4, are handled in the same way as B2. The only difference resides in when to update the propensity values, $\bar{a}$ and $\bar{\bar{a}}$. See Algorithm 14 for more details. As made clear in the preceding paragraphs, the decision on which algorithm to use for a certain time interval is made only at the horizon points.
Remark 6.2.1 (About telescoping). To ensure the telescoping sum property, the probability law of the hybrid process at level $\ell$ should be the same regardless of whether level $\ell$ is the finer in the pair $(\bar{X}_{\ell-1}, \bar{\bar{X}}_\ell)$ or the coarser in the pair $(\bar{X}_\ell, \bar{\bar{X}}_{\ell+1})$. For that reason, each process has its own next horizon as its decision points. See Figure 6.1 for the time horizons scheme, and Figures 6.14 and 6.15 in Section 6.5 for evidence that the telescoping sum property is satisfied by our hybrid coupling sampling scheme.
Error Decomposition
In this section, we present the multilevel Monte Carlo estimator. We first show the
estimator and its properties and then we analyze and control the computational global
error, which is decomposed into three error components: the discretization error, the
global exit error, and the Monte Carlo statistical error. We give upper bounds for
each one of the three components.
In this section, we discuss and implement a variation of the multilevel Monte Carlo estimator (6.4) for the hybrid Chernoff tau-leap case. The main ingredient of this section is Algorithm 14, which generates coupled hybrid paths at levels $\ell-1$ and $\ell$. Let us now introduce some notation. Let $A_\ell$ be the event in which the $\bar{X}_\ell$-path arrived at the final time, $T$, without exiting the state space of $X$. Let $\mathbf{1}_A$ be the indicator function of an arbitrary set, $A$. Finally, $g_\ell := g(\bar{X}_\ell(T))$ was defined in Section 6.1.6.
Consider the following telescoping decomposition:
$$\mathrm{E}[g_L \mathbf{1}_{A_L}] = \mathrm{E}[g_0 \mathbf{1}_{A_0}] + \sum_{\ell=1}^{L} \mathrm{E}\left[ g_\ell \mathbf{1}_{A_\ell} - g_{\ell-1} \mathbf{1}_{A_{\ell-1}} \right],$$
which motivates the definition of our MLMC estimator of $\mathrm{E}[g(X(T))]$,
$$\mathcal{M}_L := \frac{1}{M_0} \sum_{m=1}^{M_0} g_0 \mathbf{1}_{A_0}(\omega_{m,0}) + \sum_{\ell=1}^{L} \frac{1}{M_\ell} \sum_{m=1}^{M_\ell} \left[ g_\ell \mathbf{1}_{A_\ell} - g_{\ell-1} \mathbf{1}_{A_{\ell-1}} \right](\omega_{m,\ell}). \qquad (6.5)$$
In this section, we define the computational global error, $\mathcal{E}_L$, and show how it can be naturally decomposed into three components: the discretization error, $\mathcal{E}_{I,L}$, and the exit error, $\mathcal{E}_{E,L}$, both coming from the tau-leap part of the hybrid method, and the Monte Carlo statistical error, $\mathcal{E}_{S,L}$. Next, we show how to model and control the global error, $\mathcal{E}_L$, giving upper bounds for each one of the three components. We define the computational global error, $\mathcal{E}_L$, as
$$\mathcal{E}_L := \mathrm{E}[g(X(T))] - \mathcal{M}_L.$$
It can be decomposed as follows:
$$\begin{aligned}
\mathrm{E}[g(X(T))] - \mathcal{M}_L &= \mathrm{E}\left[ g(X(T)) (\mathbf{1}_{A_L} + \mathbf{1}_{A_L^c}) \right] - \mathrm{E}[g_L \mathbf{1}_{A_L}] + \mathrm{E}[g_L \mathbf{1}_{A_L}] - \mathcal{M}_L \\
&= \underbrace{\mathrm{E}\left[ g(X(T)) \mathbf{1}_{A_L^c} \right]}_{=: \mathcal{E}_{E,L}} + \underbrace{\mathrm{E}\left[ (g(X(T)) - g_L)\, \mathbf{1}_{A_L} \right]}_{=: \mathcal{E}_{I,L}} + \underbrace{\mathrm{E}[g_L \mathbf{1}_{A_L}] - \mathcal{M}_L}_{=: \mathcal{E}_{S,L}}.
\end{aligned}$$
We show in [2] that, by adequately choosing the one-step exit probability bound, $\delta$, the exit error, $\mathcal{E}_{E,L}$, satisfies $|\mathcal{E}_{E,L}| \leq |\mathrm{E}[g(X(T))]|\, P(A_L^c) \leq TOL^2$. An efficient procedure for accurately estimating $\mathcal{E}_{I,L}$ in the context of the tau-leap method is described in [14]. We adapt this method in Algorithm 20 for estimating the weak error in the hybrid context. A brief description follows. For each hybrid path, $(\bar{X}_L(t_{n,L}, \bar{\omega}))_{n=0}^{N(\bar{\omega})}$, we define the sequence of dual weights, $(\varphi_{n,L}(\bar{\omega}))_{n=1}^{N(\bar{\omega})}$, backwards as follows (see Section 6.1.7):
$$\varphi_{n,L} := \left( \mathbf{Id} + \mathbf{1}_{TL}(n)\, \Delta t_{n,L} \left( \nu\, J_a(\bar{X}_L(t_{n,L}, \bar{\omega})) \right)^T \right) \varphi_{n+1,L}, \qquad \varphi_{N,L} := \nabla g(\bar{X}_L(T)), \qquad (6.6)$$
where $\Delta t_{n,L} := t_{n+1,L} - t_{n,L}$, $\nabla$ is the gradient operator, and $J_a(\bar{X}_L(t_{n,L}, \bar{\omega})) \equiv [\partial_i a_j(\bar{X}_L(t_{n,L}, \bar{\omega}))]_{j,i}$ is the Jacobian matrix of the propensity functions, $a_j$, for $j = 1, \ldots, J$ and $i = 1, \ldots, d$. According to this method, $\mathcal{E}_{I,L}$ is approximated by $\mathcal{A}(\mathcal{E}_{I,L}(\bar{\omega}); \cdot)$, where
$$\mathcal{E}_{I,L}(\bar{\omega}) := \sum_{n=1}^{N(\bar{\omega})} \mathbf{1}_{TL}(n)\, \frac{\Delta t_{n,L}}{2} \sum_{j=1}^{J} (\varphi_{n,L} \cdot \nu_j)\, \Delta a_{j,n}(\bar{\omega}), \qquad (6.7)$$
$\mathcal{A}(X; M) := \frac{1}{M} \sum_{m=1}^{M} X(\omega_m)$, and $\mathcal{S}^2(X; M) := \mathcal{A}(X^2; M) - \mathcal{A}(X; M)^2$ denote the sample mean and the sample variance of the random variable, $X$, respectively. Here, $\Delta a_{j,n}(\bar{\omega}) := a_j(\bar{X}_L(t_{n+1,L}, \bar{\omega})) - a_j(\bar{X}_L(t_{n,L}, \bar{\omega}))$, $\mathbf{1}_{TL}(n) = 1$ if and only if, at time $t_{n,L}$, the tau-leap method was used, and we denote by $\mathbf{Id}$ the $d \times d$ identity matrix.
The variance of the statistical error, $\mathcal{E}_{S,L}$, is given by $\sum_{\ell=0}^{L} \frac{V_\ell}{M_\ell}$, where $V_0 := \mathrm{Var}[g_0 \mathbf{1}_{A_0}]$ and $V_\ell := \mathrm{Var}\left[ g_\ell \mathbf{1}_{A_\ell} - g_{\ell-1} \mathbf{1}_{A_{\ell-1}} \right]$, $\ell \geq 1$. In the next subsection, we show how to estimate $V_\ell$ efficiently using the duals from (6.6).
6.3.3 Dual-weighted Residual Estimation of $V_\ell$
Here, we derive the formula (6.8) for estimating the variance, $V_\ell$, $\ell \geq 1$. It is based on dual-weighted local errors arising from two consecutive tau-leap approximations of the process, $X$. For each level $\ell \geq 1$, the formula estimates $V_\ell$ with a much smaller statistical error than the standard sample estimator, which is seriously affected by the large kurtosis present at the deepest levels (see Section 6.1.7).
Let us introduce some notation:
$$f_{j,n} := (\varphi_{n+1} \cdot \nu_j),$$
$$\mu_{j,n} := \frac{\Delta t_n}{2} \sum_i (\nabla a_j(x_n) \cdot \nu_i)\, a_i(x_n),$$
$$\bar{\mu}_{j,n} := \frac{\Delta t_n}{2} \sum_i \left| (\nabla a_j(x_n) \cdot \nu_i) \right| a_i(x_n),$$
$$\sigma_{j,n}^2 := \frac{\Delta t_n}{2} \sum_i (\nabla a_j(x_n) \cdot \nu_i)^2\, a_i(x_n),$$
$$m_{j,n} := \min\left\{ \bar{\mu}_{j,n},\, \sqrt{\mu_{j,n}^2 + \sigma_{j,n}^2} \right\},$$
$$q_{j,n} := \frac{\mu_{j,n}}{\sigma_{j,n}},$$
$$p_{j,n} := \Phi(-q_{j,n}),$$
where $\mathbf{1}_{G_n} = 1$ if and only if $a_j(x_n)\frac{\Delta t_n}{2} > c$ for all $j \in \{1, \ldots, J\}$, and $c$ is a positive user-defined constant.
First, notice that $V_\ell$ could be a very small positive number. In fact, in our numerical experiments, we observe that the standard Monte Carlo sample estimation of this quantity turns out to be computationally infeasible due to the huge number of simulations required to stabilize its coefficient of variation. For this reason, we initially consider the following dual-weighted approximations:
$$\mathrm E[g_\ell - g_{\ell-1}] \approx \mathrm E\Big[\sum_n \varphi_{n+1,\ell-1}\cdot e_{n+1,\ell-1}\Big], \qquad (6.9)$$
$$\operatorname{Var}[g_\ell - g_{\ell-1}] \approx \operatorname{Var}\Big[\sum_n \varphi_{n+1,\ell-1}\cdot e_{n+1,\ell-1}\Big],$$
where $(\varphi_{n+1,\ell-1})_{n=0}^{N(\bar\omega)-1}$, defined in (6.6), is a sequence of dual weights computed backwards from a simulated path, $(\bar X_\ell(t_{n,\ell-1}))_{n=1}^{N(\bar\omega)}$, and the sequence of local errors, $(e_{n+1,\ell-1})_{n=0}^{N(\bar\omega)-1}$, defined in (6.14), is the subject of the next subsection.
For simplicity of analysis, we make two assumptions: i) the time mesh associated with level $\ell$ is obtained by halving the intervals of level $\ell-1$; ii) we perform the tau-leap at both levels without considering the Chernoff bounds described in Section 6.1.4.

Let $\bar X$ and $\bar{\bar X}$ be two tau-leap approximations of $X$ based on two consecutive grid levels, for instance, $\bar X := \bar X_{\ell-1}$ and $\bar{\bar X} := \bar X_\ell$. Consider two consecutive time-mesh points for $\bar X$, $\{t_n, t_{n+1}\}$, and three consecutive time-mesh points for $\bar{\bar X}$, $\{t_n, (t_n{+}t_{n+1})/2, t_{n+1}\}$. The first step for coupling $\bar X$ and $\bar{\bar X}$ is to define
$$\bar X_{n+1} := x_n + \sum_j \nu_j\,Y_{j,n}(a_j(x_n)\,\Delta t_n), \qquad (6.10)$$
$$Z_{n+1} := x_n + \sum_j \nu_j\,Q_{j,n}\Big(a_j(x_n)\,\frac{\Delta t_n}{2}\Big), \qquad (6.11)$$
$$\bar{\bar X}_{n+1} := Z_{n+1} + \sum_j \nu_j\,R_{j,n}\Big(a_j(Z_{n+1})\,\frac{\Delta t_n}{2}\Big),$$
where $\{Y_{j,n}\}_{j=1}^J \cup \{Q_{j,n}\}_{j=1}^J \cup \{R_{j,n}\}_{j=1}^J$ are Poisson random variables. To couple the $\bar X$ and $\bar{\bar X}$ processes, we first decompose $Y_{j,n}(a_j(x_n)\Delta t_n)$ as the sum of two independent Poisson random variables, $Q_{j,n}\big(a_j(x_n)\frac{\Delta t_n}{2}\big)+Q'_{j,n}\big(a_j(x_n)\frac{\Delta t_n}{2}\big)$. As a consequence, $\bar X$ and $\bar{\bar X}$ coincide in the closed interval $[t_n,(t_n{+}t_{n+1})/2]$. By applying this decomposition, we obtain
$$\bar X_{n+1} = x_n + \sum_j \nu_j\,Q_{j,n}\Big(a_j(x_n)\frac{\Delta t_n}{2}\Big)+\sum_j \nu_j\,Q'_{j,n}\Big(a_j(x_n)\frac{\Delta t_n}{2}\Big), \qquad (6.12)$$
$$\bar{\bar X}_{n+1} = x_n + \sum_j \nu_j\,Q_{j,n}\Big(a_j(x_n)\frac{\Delta t_n}{2}\Big)+\sum_j \nu_j\,R_{j,n}\Big(a_j(Z_{n+1})\frac{\Delta t_n}{2}\Big).$$
Define $m_j := \min\{a_j(x_n), a_j(Z_{n+1})\}$, $c_j := a_j(x_n)-m_j$, and $f_j := a_j(Z_{n+1})-m_j$. Notice that, for each $j$, either $c_j$ or $f_j$ is zero (or both).
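The coupled construction above can be sketched directly: one coarse step over $[t, t+\Delta t]$ and two fine half-steps that reuse the shared variables $Q_{j,n}$. This is an illustrative stand-in for the exact-method-free coupling of (6.10)–(6.12), with a stdlib Poisson sampler (the thesis' algorithms additionally use Chernoff bounds and exact steps, which are omitted here).

```python
import random
from math import exp

def poisson(rng, lam):
    """Knuth's Poisson sampler; adequate for the small rates of this sketch."""
    if lam <= 0.0:
        return 0
    L, k, p = exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def coupled_tau_leap_step(rng, x, dt, nu, a):
    """One coupled step in the spirit of (6.10)-(6.12): the coarse level uses
    Q + Q' over [t, t + dt]; the fine level reuses Q on the first half-interval
    and draws R, with propensities refreshed at Z_{n+1}, on the second half."""
    J, d = len(nu), len(x)
    ax = a(x)
    Q  = [poisson(rng, ax[j] * dt / 2.0) for j in range(J)]   # shared half-step
    Qp = [poisson(rng, ax[j] * dt / 2.0) for j in range(J)]   # Q'_{j,n}, coarse only
    z  = [x[i] + sum(nu[j][i] * Q[j] for j in range(J)) for i in range(d)]
    az = a(z)                                                 # propensities at Z_{n+1}
    R  = [poisson(rng, az[j] * dt / 2.0) for j in range(J)]   # fine second half
    coarse = [x[i] + sum(nu[j][i] * (Q[j] + Qp[j]) for j in range(J))
              for i in range(d)]
    fine = [z[i] + sum(nu[j][i] * R[j] for j in range(J)) for i in range(d)]
    return coarse, fine

# With zero propensities, both levels stay put (and coincide, as they must).
rng = random.Random(7)
assert coupled_tau_leap_step(rng, [50], 0.1, [[-1]], lambda s: [0.0]) == ([50], [50])
```

Because the first half-interval is shared, the difference `fine - coarse` is exactly the local error $e_{n+1,\ell-1}$ analyzed next.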
Now, consider the following decompositions:
$$Q'_{j,n}\Big(a_j(x_n)\frac{\Delta t_n}{2}\Big) = P'_{j,n}\Big(m_j\frac{\Delta t_n}{2}\Big) + P''_{j,n}\Big(c_j\frac{\Delta t_n}{2}\Big), \qquad (6.13)$$
$$R_{j,n}\Big(a_j(Z_{n+1})\frac{\Delta t_n}{2}\Big) = P'_{j,n}\Big(m_j\frac{\Delta t_n}{2}\Big) + R'_{j,n}\Big(f_j\frac{\Delta t_n}{2}\Big),$$
and define the local error
$$e_{n+1,\ell-1} := \bar{\bar X}_{n+1} - \bar X_{n+1} \qquad (6.14)$$
$$= \sum_j \nu_j\Big( R'_{j,n}\Big(f_j\frac{\Delta t_n}{2}\Big) - P''_{j,n}\Big(c_j\frac{\Delta t_n}{2}\Big)\Big)$$
$$= \sum_j \nu_j\Big( R'_{j,n}\Big(\Delta a_{j,n}\frac{\Delta t_n}{2}\Big)\mathbf 1_{\{\Delta a_{j,n}>0\}} - P''_{j,n}\Big(-\Delta a_{j,n}\frac{\Delta t_n}{2}\Big)\mathbf 1_{\{\Delta a_{j,n}<0\}}\Big),$$
where $\Delta a_{j,n} := a_j(Z_{n+1})-a_j(x_n)$ and $Z_{n+1}$ is defined in (6.11). Note that in (6.14) not only are $R'_{j,n}$ and $P''_{j,n}$ random variables, but $\Delta a_{j,n}$ is also random because it depends on the random variables $(Q_{j,n})_{j=1}^J$. Also note that all the mentioned random variables are independent.
Conditioning on a sigma-algebra, $\mathcal F$, we have
$$\mathrm E[X] = \mathrm E\big[\mathrm E[X\mid\mathcal F]\big],$$
$$\operatorname{Var}[X] = \operatorname{Var}\big[\mathrm E[X\mid\mathcal F]\big] + \mathrm E\big[\operatorname{Var}[X\mid\mathcal F]\big]. \qquad (6.15)$$
The main idea is to generate $M_\ell$ Monte Carlo paths, $(\bar X_\ell(t_n;\bar\omega))_{n=1}^{N(\bar\omega)}$, and to estimate $\operatorname{Var}\big[\sum_n \varphi_{n+1}\cdot e_{n+1}\big]$ using
$$\hat V_\ell := \mathcal S^2\Big(\underbrace{\mathrm E\Big[\sum_n \varphi_{n+1}\cdot e_{n+1}\,\Big|\,\mathcal F\Big](\bar\omega)}_{S_e(\bar\omega)};\ M_\ell\Big) + \mathcal A\Big(\underbrace{\operatorname{Var}\Big[\sum_n \varphi_{n+1}\cdot e_{n+1}\,\Big|\,\mathcal F\Big](\bar\omega)}_{S_v(\bar\omega)};\ M_\ell\Big). \qquad (6.16)$$
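The estimator (6.16) is the sample analogue of the total-variance identity (6.15): a sample variance of conditional means plus a sample mean of conditional variances. The identity itself can be checked exactly on a toy pair of random variables (illustrative, not the SRN quantities), using rational arithmetic so both sides match exactly:

```python
from fractions import Fraction

# Toy check of the total-variance decomposition (6.15):
# U is uniform on {1,...,6}; given U = u, X is uniform on {0,...,u}.
us = range(1, 7)
pu = Fraction(1, 6)

def cond_mean(u):      # E[X | U = u]
    return Fraction(u, 2)

def cond_var(u):       # Var[X | U = u] for a uniform law on u + 1 points
    return Fraction((u + 1) ** 2 - 1, 12)

# Left-hand side: Var[X] by full enumeration of (u, x)
ex  = sum(pu * Fraction(1, u + 1) * x for u in us for x in range(u + 1))
ex2 = sum(pu * Fraction(1, u + 1) * x * x for u in us for x in range(u + 1))
var_x = ex2 - ex ** 2

# Right-hand side: Var[E[X|U]] + E[Var[X|U]]
m  = sum(pu * cond_mean(u) for u in us)
vm = sum(pu * (cond_mean(u) - m) ** 2 for u in us)   # variance of cond. means
ev = sum(pu * cond_var(u) for u in us)               # mean of cond. variances
assert var_x == vm + ev
```

In (6.16), `vm` corresponds to the $\mathcal S^2(S_e;M_\ell)$ term and `ev` to the $\mathcal A(S_v;M_\ell)$ term, with the exact conditional moments replaced by per-path quantities.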
In this section, we derive a local error representation that takes into account the fact that the dual is computed backwards; the distribution of the local errors that is relevant to our calculations is therefore not exactly the one given by (6.14), but the one given by (6.17).

Consider the sequence $(\bar X_n)_{n=0}^{N(\bar\omega)}$ defined in (6.10). For fixed $n$, define $\mathcal F_n$ as the sigma-algebra generated by the randomness used to produce $\bar X_{n+1}$ from $x_0$, i.e., the information we obtain by observing that randomness. Motivated by the dual-weighted expansions (6.9), we want to express the local error representation (6.14) conditional on $\mathcal F := \mathcal F_{N(\bar\omega)}$.
At this point, it is convenient to remember a key result for building Poissonian bridges. If $X_1$ and $X_2$ are two independent Poisson random variables with parameters $\lambda_1$ and $\lambda_2$, then the conditional distribution of $X_1$ given $X_1+X_2$ is binomial, with $X_1+X_2$ trials and success probability $\lambda_1/(\lambda_1+\lambda_2)$. Applying this observation to the decomposition $Y_{j,n}(a_j(x_n)\Delta t_n)=Q_{j,n}\big(a_j(x_n)\frac{\Delta t_n}{2}\big)+Q'_{j,n}\big(a_j(x_n)\frac{\Delta t_n}{2}\big)$, we conclude that the conditional distribution of $Q_{j,n}\big(a_j(x_n)\frac{\Delta t_n}{2}\big)$ given $\mathcal F_n$, i.e., $Q_{j,n}\big(a_j(x_n)\frac{\Delta t_n}{2}\big)\mid\mathcal F_n$, is binomial with parameters $Y_{j,n}$ and $1/2$.
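The Poissonian-bridge fact can be verified directly on probability mass functions: for two i.i.d. Poisson variables with equal rates (the half-interval case above), the conditional law of the first given the sum is Binomial$(n,1/2)$. The rate value below is an illustrative placeholder:

```python
from math import comb, exp, factorial

def pois_pmf(lam, k):
    """Poisson probability mass function."""
    return exp(-lam) * lam ** k / factorial(k)

lam, n = 3.7, 10     # illustrative half-interval rate and observed total Y = Q + Q'
# P(Q = k | Q + Q' = n) for Q, Q' i.i.d. Poisson(lam) ...
conds = [pois_pmf(lam, k) * pois_pmf(lam, n - k) / pois_pmf(2.0 * lam, n)
         for k in range(n + 1)]
# ... equals the Binomial(n, 1/2) pmf, independently of lam.
binoms = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
assert all(abs(c - b) < 1e-12 for c, b in zip(conds, binoms))
assert abs(sum(conds) - 1.0) < 1e-12
```

The general statement, with unequal rates $\lambda_1,\lambda_2$, yields the success probability $\lambda_1/(\lambda_1+\lambda_2)$ by the same cancellation.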
Define now the sigma-algebra, $\mathcal G_n$, as
$$\mathcal G_n := \sigma\Big(\big(Q_{j,n}\big(a_j(x_n)\tfrac{\Delta t_n}{2}\big)\mid\mathcal F_n\big)_{j=1}^J\Big).$$
Applying the same argument to $P''_{j,n}$, defined in (6.13), we conclude that
$$P''_{j,n}\mid\{\mathcal F_n,\mathcal G_n\} \sim \operatorname{binomial}\Big(Y_{j,n}-Q_{j,n},\ \frac{c_j}{a_j(x_n)}\Big).$$
From the definition of $Z_{n+1}=x_n+\sum_j \nu_j Q_{j,n}$ in (6.11), we conclude that
$$R'_{j,n}\mid\mathcal G_n \sim \operatorname{Poisson}\Big((a_j(Z_{n+1})-m_j)\frac{\Delta t_n}{2}\Big).$$
Notice that, by construction, $P''_{j,n}\mid\{\mathcal F_n,\mathcal G_n\}$ and $R'_{j,n}\mid\mathcal G_n$ are independent random variables. Since $c_j = -\Delta a_{j,n}\mathbf 1_{\{\Delta a_{j,n}<0\}}$ and $a_j(Z_{n+1})-m_j = \Delta a_{j,n}\mathbf 1_{\{\Delta a_{j,n}\ge 0\}}$, we can express the conditional local error, in the distribution sense, as
$$e_{n+1}\mid\{\mathcal F_n,\mathcal G_n\} = \sum_j \nu_j\Big(\big(R'_{j,n}\mid\mathcal G_n\big)\mathbf 1_{\{\Delta a_{j,n}>0\}} - \big(P''_{j,n}\mid\{\mathcal F_n,\mathcal G_n\}\big)\mathbf 1_{\{\Delta a_{j,n}<0\}}\Big). \qquad (6.17)$$
For instance, we can easily compute the expectation of $e_{n+1}\mid\{\mathcal F_n,\mathcal G_n\}$ as follows:
$$\mathrm E\big[e_{n+1}\mid\{\mathcal F_n,\mathcal G_n\}\big] = \sum_j \nu_j\,\Delta a_{j,n}\Big(\frac{\Delta t_n}{2}\mathbf 1_{\{\Delta a_{j,n}\ge 0\}} + \frac{Y_{j,n}-Q_{j,n}}{a_j(x_n)}\mathbf 1_{\{\Delta a_{j,n}<0\}}\Big).$$
Taking into account that the joint distribution of $(Q_{j,n})_{j=1}^J\mid\mathcal F_n$ is given by
$$\mathrm P\big((Q_{j,n}=q_{j,n})_{j=1}^J\mid\mathcal F_n\big) = 2^{-\sum_j Y_{j,n}}\prod_{j=1}^J \frac{Y_{j,n}!}{q_{j,n}!\,(Y_{j,n}-q_{j,n})!}, \qquad 0\le q_{j,n}\le Y_{j,n},$$
we can exactly compute the expected value and the variance of $v_{n+1}\cdot e_{n+1}\mid\mathcal F_n$ for any given deterministic vector, $v_{n+1}$. Notice that, given $\mathcal F$, the sequence $(\bar X_n)_{n=0}^{N(\bar\omega)}$ is deterministic and, as a consequence, the sequence $(\varphi_n)_{n=1}^{N(\bar\omega)}\mid\mathcal F$ is also a deterministic sequence of vectors. We can thus compute
$$\mathrm E\Big[\sum_n \varphi_{n+1}\cdot e_{n+1}\,\Big|\,\mathcal F\Big] \quad\text{and}\quad \operatorname{Var}\Big[\sum_n \varphi_{n+1}\cdot e_{n+1}\,\Big|\,\mathcal F\Big] \qquad (6.18)$$
exactly and proceed as stated at the beginning of this section. However, trying to develop computable expressions from (6.17) has two main disadvantages: i) it may lead to computationally demanding procedures, especially for systems with many reaction channels or in regimes with high activity; ii) it may be affected by the variance associated with the randomness in $\mathcal F_n$ and $\mathcal G_n$.
Deriving a Formula for $\hat V_\ell$
In this section, we derive the formula (6.8). Our goal is to find computable approximations of (6.18), where the underlying sigma-algebra, $\mathcal F$, is just the information gathered by observing the coarse path, $\bar X$. This means that our formula should not depend explicitly on the knowledge of the random variables that generate $\mathcal F_n$ and $\mathcal G_n$. At this point, it is important to recall the comments in Section 6.3.3; that is, the sequence $(\varphi_n(\bar\omega))_{n=1}^{N(\bar\omega)}$ is measurable with respect to $\mathcal F$. This implies that, for all $n$, $\varphi_{n+1}$ is independent of $\mathcal G_n$. Hereafter, for notational convenience, we omit writing explicitly the conditioning on $\mathcal F$ in our formulae.

It turns out that the leading-order terms of the conditional moments obtained from (6.17) are essentially the same as those computed from (6.14). We will therefore derive (6.8) from (6.14). Using the notation from Section 6.3.3, we have that
$$(\varphi_{n+1}\cdot e_{n+1}) = \sum_j f_{j,n}\Big( R'_{j,n}\Big(\Delta a_{j,n}\frac{\Delta t_n}{2}\Big)\mathbf 1_{\{\Delta a_{j,n}>0\}} - P''_{j,n}\Big(-\Delta a_{j,n}\frac{\Delta t_n}{2}\Big)\mathbf 1_{\{\Delta a_{j,n}<0\}}\Big),$$
and hence
$$\mathrm E[(\varphi_{n+1}\cdot e_{n+1})] = \mathrm E\big[\mathrm E[(\varphi_{n+1}\cdot e_{n+1})\mid\mathcal G_n]\big] = \frac{\Delta t_n}{2}\sum_j f_{j,n}\,\mathrm E[\Delta a_{j,n}].$$
Moreover,
$$\Delta a_{j,n} := a_j\Big(x_n+\sum_i \nu_i\,Q_{i,n}(a_i(x_n)\Delta t_n/2)\Big) - a_j(x_n)$$
$$\approx \sum_i \nabla a_j(x_n)\cdot\nu_i\,Q_{i,n}(a_i(x_n)\Delta t_n/2)$$
$$= \sum_i (\nabla a_j(x_n)\cdot\nu_i)\,Q_{i,n}(a_i(x_n)\Delta t_n/2).$$
Since $Q_{i,n}(a_i(x_n)\Delta t_n/2)\sim\operatorname{Poisson}(a_i(x_n)\Delta t_n/2)$, we have that $\mathrm E[\Delta a_{j,n}]=\mu_{j,n}$ and $\operatorname{Var}[\Delta a_{j,n}]=\sigma^2_{j,n}$. Thus,
$$\mathrm E[(\varphi_{n+1}\cdot e_{n+1})] \approx \frac{\Delta t_n}{2}\sum_j f_{j,n}\,\mu_{j,n}.$$
By the total-variance identity (6.15),
$$\operatorname{Var}[(\varphi_{n+1}\cdot e_{n+1})] = \operatorname{Var}\big[\mathrm E[(\varphi_{n+1}\cdot e_{n+1})\mid\mathcal G_n]\big] + \mathrm E\big[\operatorname{Var}[(\varphi_{n+1}\cdot e_{n+1})\mid\mathcal G_n]\big],$$
where
$$\operatorname{Var}\big[\mathrm E[(\varphi_{n+1}\cdot e_{n+1})\mid\mathcal G_n]\big] \approx \frac{(\Delta t_n)^3}{8}\sum_{j,j'} f_{j,n}f_{j',n}\sum_i (\nabla a_j(x_n)\cdot\nu_i)(\nabla a_{j'}(x_n)\cdot\nu_i)\,a_i(x_n),$$
$$\mathrm E\big[\operatorname{Var}[(\varphi_{n+1}\cdot e_{n+1})\mid\mathcal G_n]\big] \approx \frac{\Delta t_n}{2}\sum_j f_{j,n}^2\,\mathrm E[\Delta a_{j,n}\operatorname{sgn}(\Delta a_{j,n})].$$
Let us consider the case where $a_i(x_n)\Delta t_n/2$ is large enough for all $i$. It is well known that a Poisson random variable, $Q(\lambda)$, is well approximated by a Gaussian random variable, $\mathcal N(\lambda,\lambda)$, for moderate values of $\lambda$, say $\lambda>10$. Since $Q_{i,n}(a_i(x_n)\Delta t_n/2)\sim\operatorname{Poisson}(a_i(x_n)\Delta t_n/2)$, we have that, when $a_i(x_n)\Delta t_n/2$ is large enough for all $i$, $\Delta a_{j,n}\approx\mathcal N(\mu_{j,n},\sigma^2_{j,n})$. Consider a Gaussian random variable with mean $\mu$ and variance $\sigma^2>0$, written as $\mu+\sigma Z$ with $Z$ standard Gaussian. Then,
$$\mathrm E\big[(\mu+\sigma Z)\mathbf 1_{\{\mu+\sigma Z>0\}}\big] = \mu\,\mathrm P(\mu+\sigma Z>0) + \frac{\sigma}{\sqrt{2\pi}}\int_{-\mu/\sigma}^{+\infty} z\exp(-z^2/2)\,dz, \qquad (6.19)$$
and therefore
$$\mathrm E\big[\Delta a_{j,n}\mathbf 1_{\{\Delta a_{j,n}>0\}}\big] \approx \mu_{j,n}(1-p_{j,n}) + \frac{\sigma_{j,n}}{\sqrt{2\pi}}\exp\Big(-\frac{q_{j,n}^2}{2}\Big), \qquad (6.20)$$
$$\mathrm E\big[\Delta a_{j,n}\mathbf 1_{\{\Delta a_{j,n}<0\}}\big] \approx \mu_{j,n}\,p_{j,n} - \frac{\sigma_{j,n}}{\sqrt{2\pi}}\exp\Big(-\frac{q_{j,n}^2}{2}\Big).$$
By subtracting the expressions in (6.20), we obtain
$$\mathrm E\big[\operatorname{Var}[(\varphi_{n+1}\cdot e_{n+1})\mid\mathcal G_n]\big] \approx \frac{\Delta t_n}{2}\sum_j f_{j,n}^2\,(\tilde\mu_{j,n}+\tilde\sigma_{j,n}), \qquad (6.21)$$
where $\tilde\mu_{j,n} := \mu_{j,n}(1-2p_{j,n})$ and $\tilde\sigma_{j,n} := \frac{2\sigma_{j,n}}{\sqrt{2\pi}}\exp\big(-q_{j,n}^2/2\big)$.
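Identity (6.19) evaluates to the standard closed form $\mu\Phi(\mu/\sigma)+\sigma\varphi(\mu/\sigma)$, which is what (6.20) applies to $\Delta a_{j,n}$. A quick numerical check against quadrature, with illustrative values of $\mu$ and $\sigma$:

```python
from math import erf, exp, pi, sqrt

def trunc_mean(mu, sigma):
    """Closed form of E[(mu + sigma Z) 1_{mu + sigma Z > 0}], Z standard normal,
    i.e. identity (6.19): mu * Phi(mu/sigma) + sigma * phi(mu/sigma)."""
    q = mu / sigma
    Phi = 0.5 * (1.0 + erf(q / sqrt(2.0)))        # standard normal CDF at q
    phi = exp(-0.5 * q * q) / sqrt(2.0 * pi)      # standard normal density at q
    return mu * Phi + sigma * phi

def trunc_mean_quad(mu, sigma, n=100000, zmax=10.0):
    """Midpoint-rule quadrature of the same expectation, for comparison."""
    h = 2.0 * zmax / n
    total = 0.0
    for i in range(n):
        z = -zmax + (i + 0.5) * h
        x = mu + sigma * z
        if x > 0.0:
            total += x * exp(-0.5 * z * z) / sqrt(2.0 * pi) * h
    return total

for mu, sigma in [(0.3, 1.0), (-0.7, 2.0), (1.5, 0.5)]:
    assert abs(trunc_mean(mu, sigma) - trunc_mean_quad(mu, sigma)) < 1e-4
```

Subtracting `trunc_mean(mu, sigma)` and its negative-part analogue reproduces the $\tilde\mu_{j,n}+\tilde\sigma_{j,n}$ combination used in (6.21).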
Let us now consider the case where $a_i(x_n)\Delta t_n/2$ is close to zero for some $i$. We can bound the expression $\mathrm E[\Delta a_{j,n}\operatorname{sgn}(\Delta a_{j,n})]$ by $\mathrm E[|\Delta a_{j,n}|]$ and also by $\sqrt{\mathrm E[(\Delta a_{j,n})^2]}$. It is easy to see that $\mathrm E[|\Delta a_{j,n}|]\le\bar\mu_{j,n}$. Regarding $\mathrm E[(\Delta a_{j,n})^2]$, it can be approximated by
$$\mathrm E\Big[\sum_{i,i'}(\nabla a_j(x_n)\cdot\nu_i)(\nabla a_j(x_n)\cdot\nu_{i'})\,Q_iQ_{i'}\Big] = \sum_{i,i'}(\nabla a_j(x_n)\cdot\nu_i)(\nabla a_j(x_n)\cdot\nu_{i'})\,\mathrm E[Q_iQ_{i'}].$$
Since
$$\mathrm E[Q_iQ_{i'}] = \frac{(\Delta t_n)^2}{4}a_i(x_n)a_{i'}(x_n)\mathbf 1_{i\ne i'} + \Big(a_i(x_n)\frac{\Delta t_n}{2}+\Big(a_i(x_n)\frac{\Delta t_n}{2}\Big)^2\Big)\mathbf 1_{i=i'}, \qquad (6.22)$$
we conclude that $\mathrm E[\Delta a_{j,n}\operatorname{sgn}(\Delta a_{j,n})]$ can be bounded by $m_{j,n}$, which has been defined as $\min\{\bar\mu_{j,n},\sqrt{\mu_{j,n}^2+\sigma_{j,n}^2}\}$.
Remark 6.3.3. We are assuming that only tau-leap steps are taken but, in our hybrid algorithms, some steps can be exact and, hence, do not contribute to the local error. For that reason, we include the indicator function of the tau-leap step, $\mathbf 1_{TL}$, in the estimator, $\hat V_\ell$.
Remark 6.3.4. The dual-weighted residual approach makes the estimation of $V_\ell$ feasible. In our numerical experiments, we found that, using the same number of simulated coupled hybrid paths, the variance of $\hat V_\ell$ is much smaller than the variance of the standard Monte Carlo estimator of $\operatorname{Var}[g_\ell-g_{\ell-1}]$. Note that $\hat V_\ell$ can be computed using only single-level hybrid paths at level $\ell-1$. In the upper right panel of Figure 6.8, we can see that, due to the hybrid nature of the simulated paths, it is not possible to predict where the variance of $g_\ell-g_{\ell-1}$ will enter into a superlinear regime. Thus, by extrapolating $\operatorname{Var}[g_\ell-g_{\ell-1}]$ from the coarser levels, we may overestimate the values of $\operatorname{Var}[g_\ell-g_{\ell-1}]$ for the deepest levels.
In this section, we present a procedure that estimates $\mathrm E[g(X(T))]$ within a given prescribed relative tolerance, $TOL>0$, with high probability. The process contains three phases:

Phase II. Solution of the work optimization problem: we obtain the total number of levels, $L$, and the sequences $(\delta_\ell)_{\ell=0}^L$ and $(M_\ell)_{\ell=0}^L$, i.e., the one-step exit probability bounds and the required number of simulations at each level. We recall that, in Section 6.1.6, we defined $\Delta t_\ell := \Delta t_0\,R^{-\ell}$, where $R>1$ is a given integer constant. For that reason, to define the whole sequence of meshes, $(\Delta t_\ell)_{\ell=0}^L$, we simply need to define the size of the coarsest mesh, $\Delta t_0$.
6.4.1 Phase I
6.4.2 Phase II
In this section, we set and solve the work optimization problem. Our objective function is the expected total work of the MLMC estimator, $\mathcal M_L$, defined in (6.5), i.e.,
$$\sum_{\ell=0}^{L} \psi_\ell\,M_\ell,$$
where $L$ is the maximum (deepest) level, $\psi_0$ is the expected work of a single-level path at level 0, and $\psi_\ell$, for $\ell\ge 1$, is the expected computational work of two coupled paths at levels $\ell-1$ and $\ell$. Finally, $M_0$ is the number of single-level paths at level 0, and $M_\ell$, for $\ell\ge 1$, is the number of coupled paths at levels $\ell-1$ and $\ell$.
Let us now describe in detail the quantities $(\psi_\ell)_{\ell=0}^L$. For $\ell=0$, Algorithm 23 generates a single hybrid path. The building block of a single hybrid path is Algorithm 13, which adaptively determines whether to use an MNRM step or a tau-leap one. According to this algorithm, there are two ways of taking an MNRM step, depending on the logical conditions, $K_1/a_0(x)>T_0-t$ and $K_2/a_0(x)>\tau_{Ch}$. Given one particular hybrid path, let $N_{K_1}(\Delta t_0,\delta_0)$ be the number of MNRM steps such that $K_1/a_0(x)>T_0-t$ is true, and let $N_{K_2}(\Delta t_0,\delta_0)$ be the number of MNRM steps such that $K_1/a_0(x)>T_0-t$ is false and $K_2/a_0(x)>\tau_{Ch}$ is true. When a Chernoff tau-leap step is taken, we have constant work, $C_3$, and variable work computed with the aid of $C_P$. The expected work of a single hybrid path at level $\ell=0$, $\psi_0$, is then a function of $\Delta t_0$, the size of the time mesh at level 0, and $\delta_0$, the exit probability bound at level 0. Therefore, the expected work at level 0 is $\psi_0 M_0$, where $M_0$ is the total number of single hybrid paths.
For $\ell\ge 1$, we use Algorithm 14 to generate $M_\ell$ coupled paths that couple levels $\ell-1$ and $\ell$. Given two coupled paths, let $N_{K_1}(\Delta t_{\ell-1},\delta_{\ell-1})$ and $N_{K_1}(\Delta t_\ell,\delta_\ell)$ be the number of exact steps for level $\ell-1$ (coarse mesh) and $\ell$ (fine mesh), respectively, with associated work $C_1$. We define $N_{K_2}(\Delta t_{\ell-1},\delta_{\ell-1})$ and $N_{K_2}(\Delta t_\ell,\delta_\ell)$ analogously. Then, the expected work of a pair of coupled hybrid paths at levels $\ell$ and $\ell-1$ is
$$\psi_\ell := C_1\,\mathrm E\big[N^{(c)}_{K_1}(\ell)\big] + C_2\,\mathrm E\big[N^{(c)}_{K_2}(\ell)\big] + C_3\,\mathrm E\big[N^{(c)}_{TL}(\ell)\big] \qquad (6.24)$$
$$+ \sum_{j=1}^{J}\mathrm E\Big[\int_{[0,T]} C_P\big(a_j(\bar X_\ell(s))\,\tau_{Ch}(\bar X_\ell(s),\delta_\ell)\big)\,\mathbf 1_{TL}(\bar X_\ell(s))\,ds\Big]$$
$$+ \sum_{j=1}^{J}\mathrm E\Big[\int_{[0,T]} C_P\big(a_j(\bar X_{\ell-1}(s))\,\tau_{Ch}(\bar X_{\ell-1}(s),\delta_{\ell-1})\big)\,\mathbf 1_{TL}(\bar X_{\ell-1}(s))\,ds\Big],$$
where
$$N^{(c)}_{K_1}(\ell) := N_{K_1}(\Delta t_\ell,\delta_\ell)+N_{K_1}(\Delta t_{\ell-1},\delta_{\ell-1}),$$
$$N^{(c)}_{K_2}(\ell) := N_{K_2}(\Delta t_\ell,\delta_\ell)+N_{K_2}(\Delta t_{\ell-1},\delta_{\ell-1}),$$
$$N^{(c)}_{TL}(\ell) := N_{TL}(\Delta t_\ell,\delta_\ell)+N_{TL}(\Delta t_{\ell-1},\delta_{\ell-1}).$$
Now, recalling the definitions of the error decomposition given at the beginning of Section 6.3.2, we have all the elements to formulate the work optimization problem. Given a relative tolerance, $TOL>0$, we solve
$$\begin{cases}\ \min_{\{\Delta t_0,\,L,\,(M_\ell,\delta_\ell)_{\ell=0}^L\}}\ \sum_{\ell=0}^{L} \psi_\ell M_\ell\\ \ \text{s.t.}\quad \mathcal E_{E,L}+\mathcal E_{I,L}+\mathcal E_{S,L}\le TOL.\end{cases} \qquad (6.25)$$
In practice, for fixed $\Delta t_0$, $L$ and $(\delta_\ell)_{\ell=0}^L$, we solve the reduced problem
$$\begin{cases}\ \min_{(M_\ell\ge 1)_{\ell=0}^L}\ \sum_{\ell=0}^{L} \psi_\ell M_\ell\\ \ \text{s.t.}\quad \mathcal E_{I,L}+C_A\sqrt{\textstyle\sum_{\ell=0}^{L} V_\ell/M_\ell}\ \le\ TOL-TOL^2,\end{cases} \qquad (6.26)$$
whose statistical constraint can be written as
$$\sum_{\ell=0}^{L}\frac{V_\ell}{M_\ell}\ \le\ R, \qquad R := \Big(\frac{TOL-TOL^2-\mathcal E_{I,L}}{C_A}\Big)^2. \qquad (6.27)$$
We do not develop all the calculations here, but pseudocode is given in Algorithm 22.
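The constrained problem above has the familiar MLMC closed-form solution: by a Lagrange-multiplier argument, the optimal $M_\ell$ is proportional to $\sqrt{V_\ell/\psi_\ell}$. A sketch with illustrative inputs (the ceiling and the lower bound $M_\ell\ge 1$ mirror the integrality constraints of the problem, not a formula from the text):

```python
from math import ceil, sqrt

def optimal_samples(V, psi, R):
    """Sample sizes minimizing sum(psi_l * M_l) subject to sum(V_l / M_l) <= R,
    via the Lagrange solution M_l = sqrt(V_l / psi_l) * sum(sqrt(V_k psi_k)) / R."""
    s = sum(sqrt(v * p) for v, p in zip(V, psi))
    return [max(1, ceil(sqrt(v / p) * s / R)) for v, p in zip(V, psi)]

# Illustrative inputs: variances decaying like Delta t_l, costs growing per level.
V   = [1e-2 * 0.5 ** l for l in range(6)]
psi = [1.0 * 1.8 ** l for l in range(6)]
R   = 1e-4
M = optimal_samples(V, psi, R)
assert sum(v / m for v, m in zip(V, M)) <= R          # constraint satisfied
assert all(M[l] >= M[l + 1] for l in range(5))        # fewer samples at finer levels
```

Rounding each $M_\ell$ up keeps the statistical constraint satisfied at a slightly increased cost.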
Let us now analyze two extreme cases: i) for $L$ such that $\mathcal E_{I,L}$ is less than, but very close to, $TOL-TOL^2$, we have that $\sum_{\ell=0}^L V_\ell/M_\ell^*$ is a very small number. As a consequence, we obtain large values of $M_\ell^*$ and, hence, a large value of $w_L$. By adding one more level, i.e., $L\to L+1$, we expect a larger gap between $\mathcal E_{I,L}$ and $TOL-TOL^2$; that means that we expect a larger value of $\sum_{\ell=0}^L V_\ell/M_\ell^*$, which may lead to smaller values of $M_\ell^*$. We observe that, in spite of adding one more term to $w_L$, this leads to a smaller value of $w_L$. ii) At the other extreme, a large value of $L$ is associated with large values of $\psi_L$. We therefore examine an increasing sequence of candidate maximum levels, $(L_p)$: when, for some $p$, we have $w_{L_{p+1}}\ge w_{L_p}$, we accept $L^*=L_p$. Of course, we can stop even if $w_{L_{p+1}}<w_{L_p}$, provided the difference $w_{L_p}-w_{L_{p+1}}$ is sufficiently small. In this last case, we accept $L^*=L_{p+1}$.
Computational Complexity

At this point, we have all the necessary elements to establish a key point of this work: the computational complexity of the multilevel hybrid Chernoff tau-leap method. Let us now analyze the optimal amount of work at level $L$, $w_L$, as a function of the given relative tolerance, $TOL$. For simplicity, let us assume that $M_\ell^*>1$, $\ell=0,\dots,L$. In this case, the optimal number of samples at level $\ell$ is given by
$$M_\ell^* = (C_A/\theta)^2\,TOL^{-2}\,\sqrt{V_\ell/\psi_\ell}\,\sum_{\ell'=0}^{L}\sqrt{V_{\ell'}\,\psi_{\ell'}},$$
for some $\theta\in(0,1)$. In fact, $\theta$ is the proportion of the tolerance, $TOL$, that our computational cost optimization algorithm selects for the statistical error, $\mathcal E_{S,L}$. In our algorithms, we impose $\theta\ge 0.5$; however, our numerical experiments always select a larger value (see Figures 6.3 and 6.9).

By substituting $M_\ell^*$ into the total work formula, $w_L$, we conclude that the optimal expected work, conditional on $\theta$, is given by
$$\mathrm E\big[w_L^*(TOL)\mid\theta\big] = \Big(\frac{C_A}{\theta}\sum_{\ell=0}^{L(\theta)}\sqrt{V_\ell\,\psi_\ell}\Big)^2\,TOL^{-2}.$$
Let us consider the series $\sum_{\ell=0}^{\infty}\sqrt{V_\ell\,\psi_\ell}$. First, observe that the expected computational work per path at level $\ell$, $\psi_\ell$, is bounded by a multiple of the expected computational work of the MNRM (see Section 6.1.2), i.e., by $K\,\psi_{MNRM}$. In our numerical experiments, we observe that taking $K$ around 3 is enough. Therefore, $\sum_{\ell=0}^{\infty}\sqrt{V_\ell\,\psi_\ell} \le \sqrt{K\,\psi_{MNRM}}\,\sum_{\ell=0}^{\infty}\sqrt{V_\ell}$. Observe that, by construction, $V_\ell\to 0$ superlinearly. More specifically, it satisfies the bound $V_\ell = O(\Delta t_\ell) \le C\,\Delta t_0\,(1/2)^\ell$ for some positive constant $C$. Therefore, the series $\sum_{\ell=0}^{\infty}\sqrt{V_\ell}$ is dominated by the geometric series $\sum_{\ell=0}^{\infty}(1/\sqrt 2)^{\ell}<\infty$. We conclude that $\sup_L\big\{\sum_{\ell=0}^{L}\sqrt{V_\ell\,\psi_\ell}\big\}$ is bounded and, therefore, the expected computational complexity of the multilevel hybrid Chernoff tau-leap method is $w_L^*(TOL)=O(TOL^{-2})$.
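The geometric-series argument can be checked numerically under the stated work model. All constants below ($C_A$, $\theta$, $K$, $\psi_{MNRM}$, $C$, $\Delta t_0$) are illustrative placeholders, not values from the experiments:

```python
from math import sqrt

# Work model: V_l <= C * dt0 * (1/2)**l and psi_l <= K * psi_mnrm, so the level
# sums are bounded by a geometric series and E[w_L*] ~ const * TOL**-2.
C_A, theta, K, psi_mnrm, C, dt0 = 1.96, 0.5, 3.0, 1.0, 1.0, 0.1

def expected_work(tol, L):
    """E[w_L*(TOL) | theta] under the model above, with L + 1 levels."""
    s = sum(sqrt(C * dt0 * 0.5 ** l * K * psi_mnrm) for l in range(L + 1))
    return (C_A / theta * s) ** 2 * tol ** -2

# Uniform bound from the full geometric series, independent of L ...
bound = (C_A / theta) ** 2 * C * dt0 * K * psi_mnrm \
        * (1.0 / (1.0 - 1.0 / sqrt(2.0))) ** 2
assert all(expected_work(1.0, L) <= bound for L in range(50))

# ... so work * TOL^2 does not depend on TOL: the O(TOL^-2) complexity.
w10 = expected_work(1e-2, 10) * (1e-2) ** 2
w20 = expected_work(1e-4, 10) * (1e-4) ** 2
assert abs(w10 - w20) < 1e-9
```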
At this point, it is crucial to observe that if we impose the condition (6.28) on any level $\ell<L$, then we unnecessarily enforce a dependence of $\delta_\ell$ on $TOL$. This dependence may result in very small values of $\delta_\ell$, which in turn may increase the expected number of exact steps and tau-leap steps at level $\ell$, implying a larger expected computational work at level $\ell$. In the appendix of [2], we proved that, when $\delta_\ell$ tends to zero, the expected number of tau-leap steps at level $\ell$ goes to zero, and therefore our hybrid MLMC strategy would converge to the SSA method without the desired reduction in computational work. To avoid the dependence of $(\delta_\ell)_{\ell=0}^{L-1}$ on $TOL$, we adopt a different strategy based on the following decomposition:
$$V_\ell = \operatorname{Var}\big[g_\ell\mathbf 1_{A_\ell}-g_{\ell-1}\mathbf 1_{A_{\ell-1}}\big]$$
$$= \operatorname{Var}\big[g_\ell-g_{\ell-1}\mid A_\ell\cap A_{\ell-1}\big]\,\mathrm P(A_\ell\cap A_{\ell-1})$$
$$\quad + \operatorname{Var}\big[g_\ell\mid A_\ell\cap A_{\ell-1}^c\big]\,\mathrm P(A_\ell\cap A_{\ell-1}^c)$$
$$\quad + \operatorname{Var}\big[g_{\ell-1}\mid A_\ell^c\cap A_{\ell-1}\big]\,\mathrm P(A_\ell^c\cap A_{\ell-1}).$$
We impose that the first term of the right-hand side dominates the other two. This is because the conditional variances appearing in the last two terms are of order $O(1)$, while the conditional variance appearing in the first term is of order $O(\Delta t_\ell)$, and we make our computations with approximations of $V_\ell$ assuming that $\mathrm P(A_\ell\cap A_{\ell-1})$ is close to one. We proceed as follows: first, we approximate $\mathrm P(A_\ell\cap A_{\ell-1})$ by $\mathrm P(A_\ell)\,\mathrm P(A_{\ell-1})$; then, we consider $1-\delta_\ell\,\mathcal A(N_{TL}(\Delta t_\ell,\delta_\ell);\cdot)$ as an approximate lower bound for $\mathrm P(A_\ell)$ when $\delta_\ell\,\mathcal A(N_{TL}(\Delta t_\ell,\delta_\ell);\cdot)\ll 1$. Those considerations lead us to impose
$$\operatorname{Var}\big[g_\ell-g_{\ell-1}\mid A_\ell\cap A_{\ell-1}\big]\big(1-\delta_\ell\mathcal A(N_{TL}(\Delta t_\ell,\delta_\ell);\cdot)\big)\big(1-\delta_{\ell-1}\mathcal A(N_{TL}(\Delta t_{\ell-1},\delta_{\ell-1});\cdot)\big) \qquad (6.29)$$
$$> \operatorname{Var}\big[g_\ell\mid A_\ell\cap A_{\ell-1}^c\big]\,\delta_{\ell-1}\mathcal A(N_{TL}(\Delta t_{\ell-1},\delta_{\ell-1});\cdot) + \operatorname{Var}\big[g_{\ell-1}\mid A_\ell^c\cap A_{\ell-1}\big]\,\delta_\ell\mathcal A(N_{TL}(\Delta t_\ell,\delta_\ell);\cdot).$$
Algorithms 23 and 18 provide $\mathcal A(g_\ell;\cdot)$, $\mathcal A(N_{TL};\cdot)$ and the other required quantities. Condition (6.30) does not affect the telescoping-sum property of our multilevel estimator, $\mathcal M_L$, defined in (6.5), since each level, $\ell$, has its own $\delta_\ell$.
Observe also that
$$\operatorname{Var}\big[g(\bar X_l(T))\big] = \operatorname{Var}\big[g(\bar X_0(T))\big] + \sum_{\ell=1}^{l}\Big(\operatorname{Var}\big[g(\bar X_\ell(T))\big]-\operatorname{Var}\big[g(\bar X_{\ell-1}(T))\big]\Big),$$
where $l>1$ is a fixed level. Using the usual variance estimators for each level, we obtain an unbiased multilevel estimator of the variance of $g(\bar X)$. We refer to [21] for details.
Remark 6.4.2 (Coupled paths exiting the lattice, $\mathbb Z_+^d$). Algorithm 14 could compute four types of paths. It could happen that neither approximate process (the coarse one, $\bar X_{\ell-1}$, or the fine one, $\bar X_\ell$) exits the lattice, which is the most common case. It could also happen that exactly one of the approximate processes exits the lattice. And finally, both approximate processes could exit the lattice. The first case is the most common one and no further explanation is required. We now explain the case when one of the processes exits the lattice. Suppose that the coarse one exits the lattice. In that case, until the fine process reaches time $T$ or exits the lattice, we still simulate the coupled process by simulating only the fine path, using the single-level hybrid algorithm presented in [2]. If the fine path reaches $T$, we have that $\mathbf 1_{A_{\ell-1}}=0$ and $\mathbf 1_{A_\ell}=1$. Vice versa, if the fine process exits and the coarse one reaches $T$, we have $\mathbf 1_{A_{\ell-1}}=1$ and $\mathbf 1_{A_\ell}=0$. We also consider the estimator
$$\tilde{\mathcal M}_L := \frac{1}{M_0}\sum_{m=1}^{M_0} g_0\mathbf 1_{A_0}(\omega_{m,0}) + \sum_{\ell=1}^{L-1}\frac{1}{M_\ell}\sum_{m=1}^{M_\ell}\big[g_\ell\mathbf 1_{A_\ell}-g_{\ell-1}\mathbf 1_{A_{\ell-1}}\big](\omega_{m,\ell}) + \frac{1}{M_L}\sum_{m=1}^{M_L}\big[g(X(T))-g_{L-1}\mathbf 1_{A_{L-1}}\big](\omega_{m,L}).$$
From Phase II, we found that, to compute our multilevel Monte Carlo estimator, $\mathcal M_L$, for a given tolerance, we have to run $M_0^*$ single hybrid paths with parameters $(\Delta t_0,\delta_0)$, and $M_\ell^*$ coupled hybrid paths with parameters $(\Delta t_{\ell-1},\delta_{\ell-1})$ and $(\Delta t_\ell,\delta_\ell)$, for $\ell=1,2,\dots,L^*$. However, we follow a slightly different strategy: we run half of the required simulations and use them to update our estimates of the sequences $(\mathcal E_{I,\ell})_{\ell=0}^{L^*}$, $(V_\ell)_{\ell=0}^{L^*}$, and $(\psi_\ell)_{\ell=0}^{L^*}$. Then, we solve problem (6.26) again and recalculate the values of $M_\ell^*$ for all $\ell$. We proceed iteratively until convergence. In this way, we take advantage of the information generated by newly simulated paths and update the estimates of the sequences of weak errors, computational costs, and variances, obtaining more control over the total work of the method.
In this section, we present two examples to illustrate the performance of our proposed method, and we compare the results with the single-level approach given in [2]. For benchmarking purposes, we use Gillespie's Stochastic Simulation Algorithm (SSA) instead of the Modified Next Reaction Method (MNRM), because the former is widely used in the literature.
The classical radioactive decay model provides a simple and important example for the application of our method. This model has only one species and one first-order reaction,
$$X \xrightarrow{\ c\ } \emptyset, \qquad (6.31)$$
with $\nu=-1$ and $a(X)=cX$. Here, we choose $c=1$ and define $g(x)=x$ as the scalar observable. In this particularly simple example, we have $\mathrm E[g(X(T))\mid X(t)=X_0] = X_0\exp(-c(T-t))$. Consider the initial condition $X_0=10^5$ and the final time $T=0.5$. In this case, the process starts relatively far from the boundary, i.e., it is a tau-leap-dominated setting.
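For this model, a plain tau-leap simulation can be compared against the exact mean $X_0 e^{-cT}$. The sketch below is scaled down ($X_0=100$ rather than $10^5$), uses a stdlib Poisson sampler and a seeded generator, and clips at the boundary of $\mathbb Z_+$ instead of using the Chernoff control of the thesis:

```python
import random
from math import exp

def poisson(rng, lam):
    """Knuth's Poisson sampler; fine for the small per-step rates used here."""
    if lam <= 0.0:
        return 0
    L, k, p = exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def tau_leap_decay(rng, x0, c, T, n_steps):
    """Tau-leap path of X ->(c) 0: propensity a(x) = c x, stoichiometry nu = -1."""
    x, dt = x0, T / n_steps
    for _ in range(n_steps):
        x = max(0, x - poisson(rng, c * x * dt))   # clip at the lattice boundary
    return x

rng = random.Random(2014)
x0, c, T = 100, 1.0, 0.5
mean = sum(tau_leap_decay(rng, x0, c, T, 50) for _ in range(2000)) / 2000.0
exact = x0 * exp(-c * T)                            # about 60.65
```

For this linear propensity, the tau-leap mean is $x_0(1-c\,\Delta t)^{T/\Delta t}$, so the sample mean should land within a small bias plus statistical noise of `exact`.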
We now analyze an ensemble of five independent runs of the calibration algorithm (Algorithm 18), using different relative tolerances. In Figure 6.1, we show, in the left panel, the total predicted work (runtime) for the single-level hybrid method, for the multilevel hybrid method, and for the SSA method, versus the estimated error bound. The multilevel method is preferred over the SSA and the single-level hybrid method for all the tolerances. We also show the estimated asymptotic work of the multilevel method. In the right panel, we show, for different tolerances, the actual work (runtime), using a 20-core Intel GLNXA64 architecture and MATLAB version R2014a.

In Table 6.1, we summarize an ensemble run of the calibration algorithm, where $W_{ML}$ is the average actual computational work of the multilevel estimator (the sum of all the seconds taken to compute the estimation) and $W_{SSA}$ is the corresponding average actual work of the SSA. We compare those values with the corresponding estimates, $\hat W_{ML}$ and $\hat W_{SSA}$.
Figure 6.1: Left: predicted work (runtime) versus the estimated error bound for the simple decay model (6.31), with 95% confidence intervals. The multilevel hybrid method is preferred over the SSA and the single-level method for all the tolerances. Right: actual computational work (runtime) versus the estimated error bound. Notice that the computational complexity has order $O(TOL^{-2})$.
In Figure 6.2, we can observe how the estimated weak error, $\hat{\mathcal E}_{I,\ell}$, and the estimated variance of the difference of the functional between two consecutive levels, $\hat V_\ell$, decrease linearly as we refine the time mesh. This corresponds to the pure tau-leap case, since the process, $X$, remains far from the boundary in $[0,T]$. As expected, the linear relationship for the variance starts at level 1. The estimated total path work, $\hat\psi_\ell$, increases as we refine the mesh. Observe that it increases more slowly than linearly; this is because the work needed for generating Poisson random variables decreases as we refine the time mesh. In the lower right panel, we show the total computational work only for the cases in which $\hat{\mathcal E}_{I,\ell} < TOL-TOL^2$.

In Figure 6.4, we show the main outputs of Algorithm 18, $\delta_\ell$ and $M_\ell$ for $\ell=0,\dots,L^*$, for the smallest considered tolerance. In this case, $L^*$ is 12. We observe that the number of realizations decreases more slowly than linearly, from levels 1 to $L^*-1$, until it reaches $M_{L^*}=1$.
TOL        L*   Min  Max   Ŵ_ML/Ŵ_SSA  Min   Max   W_ML/W_SSA  Min   Max
3.13e-03   5    5    5     0.03        0.02  0.04  0.03        0.02  0.05
1.56e-03   6    6    6     0.04        0.02  0.10  0.04        0.02  0.13
7.81e-04   8    8    8     0.03        0.02  0.05  0.03        0.02  0.06
3.91e-04   9.2  9    10    0.02        0.02  0.03  0.02        0.01  0.03
1.95e-04   11   11   11    0.02        0.02  0.03  0.02        0.02  0.04
9.77e-05   12   12   12    0.03        0.02  0.03  0.03        0.02  0.03

Table 6.1: Details of the ensemble run of Algorithm 18 for the simple decay model (6.31). As an example, the second row of the table indicates that, for a tolerance $TOL=1.56\cdot 10^{-3}$, six levels are needed. The predicted work of the multilevel hybrid method is, on average, 4% of the predicted work of the SSA method, which coincides with the actual work. Observed minimum and maximum values in the ensemble are also provided.
In the left panel of Figure 6.5, we show the performance of formula (6.8), implemented in Algorithm 21, used to estimate the strong error, $V_\ell$, defined in Section 6.3.2. The quotient of $\hat V_\ell$ over a standard Monte Carlo estimate of $V_\ell$ is almost 1 for the first ten levels. At levels 11 and 12, we obtain 0.99 and 0.91, respectively. Both
Figure 6.2: Upper left: estimated weak error, $\hat{\mathcal E}_{I,\ell}$, as a function of the time mesh size, $\Delta t$, for the simple decay model (6.31). Upper right: estimated variance of the difference between two consecutive levels, $\hat V_\ell$, as a function of $\Delta t$. Lower left: estimated path work, $\hat\psi_\ell$, as a function of $\Delta t$. Lower right: estimated total computational work, $\sum_{l=0}^{L}\hat\psi_l M_l$, as a function of the level, $L$.
quantities are estimated using a coefficient of variation of less than 5%, but there is a remarkable difference in terms of computational work in favor of our dual-weighted estimator. In the right panel of the same figure, we show the estimated variance of $V_\ell$, computed by the dual-weighted estimation (6.8) and by direct sampling. Observe that, in this case, the computational savings may be up to order $O(10^5)$.

In the simulations, we observed that, as we refine $TOL$, the optimal number of levels increases approximately logarithmically, which is a desirable feature. We fit the model $L^* = a + b\log(TOL^{-1})$, obtaining $b=2.11$ and $a=-7.3$.
Figure 6.3: Left: percentage of the statistical error over the computational global error, for the simple decay model (6.31). As mentioned in Section 6.4, it is well above 0.5 for all the tolerances. Right: $\sqrt{\hat V_\ell\,\hat\psi_\ell}$ as a function of $\ell$, for the smallest tolerance; it decreases as the level increases. Observe that the contribution of level 0 is less than 50% of the sum of the other levels.
Figure 6.4: One-step exit probability bound, $\delta_\ell$, and number of realizations, $M_\ell$, for $\ell=0,1,\dots,L^*$, for the smallest tolerance, for the simple decay model (6.31).
The QQ-plot in Figure 6.6 shows, for the smallest considered $TOL$, $10^3$ independent realizations of the multilevel estimator, $\mathcal M_L$ (defined by (6.5)). Those $10^3$ points are generated using 5 sets of parameters given by independent runs of the calibration algorithm (Algorithm 18). This plot, complemented with a Shapiro-Wilk normality test, validates our assumption about the Gaussian distribution of the statistical error. Observe that the estimates are concentrated around the theoretical
Figure 6.5: Left: performance of formula (6.8) as a strong error estimate, for the simple decay model (6.31). Here, $h=\Delta t$. Right: estimated variance of $V_\ell$, with 95% confidence intervals.
Figure 6.6: Left: QQ-plot for the hybrid Chernoff MLMC estimates, $\mathcal M_L$, in the simple decay model (6.31). A Shapiro-Wilk normality test gives a p-value of 0.0105. Right: $TOL$ versus the actual computational error. The numbers above the straight line show the percentage of runs that had errors larger than the required tolerance. We observe that, in all cases except the smallest tolerance, the computational error follows the imposed tolerance with the expected confidence of 95%.
value $X_0\exp(-c(T-t)) = 10^5\exp(-0.5)\approx 6.0653\cdot 10^4$. In the same figure, we also show $TOL$ versus the actual computational error. It can be seen that the prescribed tolerance is achieved with the required confidence of 95% for all the tolerances.
6.5.2 Gene Transcription and Translation [1]

This model involves three species, mRNA ($R$), protein ($P$), and dimer ($D$), interacting through five reaction channels,
$$\emptyset \xrightarrow{\ c_1\ } R, \qquad R \xrightarrow{\ c_2\ } R+P, \qquad 2P \xrightarrow{\ c_3\ } D, \qquad R \xrightarrow{\ c_4\ } \emptyset, \qquad P \xrightarrow{\ c_5\ } \emptyset, \qquad (6.32)$$
with
$$\nu=\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & -2 & 1\\ -1 & 0 & 0\\ 0 & -1 & 0 \end{pmatrix} \quad\text{and}\quad a(X)=\begin{pmatrix} c_1\\ c_2 R\\ c_3 P(P-1)\\ c_4 R\\ c_5 P \end{pmatrix},$$
where $X(t)=(R(t),P(t),D(t))$, and $c_1=25$, $c_2=10^3$, $c_3=0.001$, $c_4=0.1$, and $c_5=1$. In the simulations, the initial condition is $(0,0,0)$ and the final time is $T=1$. The observable is given by $g(X)=D$. We observe that the abundance of the mRNA species, represented by $R$, is close to zero for $t\in[0,T]$. However, as we point out in [2], the reduced abundance of one of the species is not enough to ensure that the SSA method should be used.
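The stoichiometric matrix and propensity function above translate directly into code. A sketch with the state ordering $(R,P,D)$ as in the text; the helper names are illustrative, and the Poisson draws of a tau-leap step would be supplied by the caller:

```python
# Stoichiometric matrix nu (one row per reaction channel) and propensities a(X)
# for the gene transcription and translation model (6.32); X = (R, P, D).
NU = [
    ( 1,  0, 0),   # c1: 0 -> R
    ( 0,  1, 0),   # c2: R -> R + P
    ( 0, -2, 1),   # c3: 2P -> D
    (-1,  0, 0),   # c4: R -> 0
    ( 0, -1, 0),   # c5: P -> 0
]
C = (25.0, 1000.0, 0.001, 0.1, 1.0)

def propensities(x):
    r, p, d = x
    return (C[0], C[1] * r, C[2] * p * (p - 1), C[3] * r, C[4] * p)

def apply_reactions(x, counts):
    """Advance the state by the given number of firings per channel
    (one tau-leap update, with the Poisson counts drawn by the caller)."""
    return tuple(x[i] + sum(NU[j][i] * counts[j] for j in range(5))
                 for i in range(3))

a = propensities((2, 10, 0))
assert a[1] == 2000.0 and abs(a[2] - 0.09) < 1e-12
assert apply_reactions((2, 10, 0), (1, 3, 1, 0, 2)) == (3, 9, 1)
```

Note that the dimerization propensity uses $P(P-1)$, the number of ordered protein pairs, consistent with the mass-action form in (6.32).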
We now analyze an ensemble of five independent runs of the calibration algorithm (Algorithm 18), using different relative tolerances. In Figure 6.7, we show, in the left panel, the total predicted work (runtime) for the single-level hybrid method, for the multilevel hybrid method, and for the SSA method, versus the estimated error bound. We also show the estimated asymptotic work of the multilevel method. Again, the multilevel hybrid method outperforms the others, and we remark that the observed computational work of the multilevel method is of order $O(TOL^{-2})$.
Figure 6.7: Left: predicted work (runtime) versus the estimated error bound for the gene transcription and translation model (6.32). The single-level hybrid method is preferred over the SSA for the first three tolerances only, while the multilevel hybrid method is preferred over both the SSA and the single-level method for all the tolerances. Right: actual work (runtime) versus the estimated error bound.
In Figure 6.8, we can observe how the estimated weak error decreases linearly for the coarser time meshes but, as we continue refining the time mesh, it quickly decreases towards zero. The estimated variance, V̂_ℓ, decreases faster than linearly and also quickly decreases towards zero afterwards. This is a consequence of the transition from a hybrid regime to a pure exact one. The estimated total path work, ψ̂_ℓ, increases sublinearly as we refine the mesh. Note that ψ̂_ℓ reaches a maximum, which corresponds to an SSA-dominant regime. In the lower right panel, we show the total computational work only in the cases in which Ê_{I,ℓ} < TOL − TOL².
In Figure 6.10, we show the main outputs of Algorithm 18, δ_ℓ and M_ℓ for ℓ = 0, ..., L*, for the smallest tolerance. We observe that the number of realizations decreases more slowly than linearly from levels 1 to 12.
In Figure 6.11, we see that our dual-weighted estimator of the strong error, V_ℓ, gives essentially the same results as the standard Monte Carlo estimator, but with much less computational work. In this case, an accurate empirical estimate of V_7 took almost 48 hours, whereas the dual-based computation of V̂_7 took just a few minutes.
Figure 6.8: Upper left: estimated weak error, Ê_{I,ℓ}, as a function of the time mesh size, Δt, for the gene transcription and translation model (6.32), for TOL = 3.13e-03. Upper right: estimated variance of the difference between two consecutive levels, V̂_ℓ, as a function of Δt. Lower left: estimated path work, ψ̂_ℓ, as a function of Δt. Lower right: estimated total computational work, Σ_{l=0}^{L} ψ̂_l M_l, as a function of the level, L.
TOL       | L*: avg (min, max) | Ŵ_ML/Ŵ_SSA: avg (min, max) | W_ML/W_SSA: avg (min, max)
1.00e-01  | 3    (3, 3)        | 0.04 (0.04, 0.04)          | 0.06 (0.05, 0.07)
5.00e-02  | 4.6  (4, 5)        | 0.04 (0.03, 0.04)          | 0.05 (0.05, 0.05)
2.50e-02  | 6    (6, 6)        | 0.03 (0.03, 0.04)          | 0.05 (0.04, 0.05)
1.25e-02  | 8    (8, 8)        | 0.03 (0.03, 0.03)          | 0.05 (0.05, 0.06)
6.25e-03  | 10   (10, 10)      | 0.03 (0.03, 0.03)          | 0.05 (0.04, 0.05)
3.13e-03  | 11.4 (11, 13)      | 0.03 (0.03, 0.03)          | 0.05 (0.04, 0.05)
Table 6.2: Details for the ensemble run of Algorithm 18 for the gene transcription
and translation model (6.32).
Figure 6.9: Left: Percentage of the statistical error over the computational global error, for the gene transcription and translation model (6.32). As mentioned in Section 6.4, it is well above 0.5 for all the tolerances. Right: √(V̂_ℓ ψ̂_ℓ) as a function of ℓ, for the smallest tolerance; it decreases as the level increases. Observe that the contribution of level 0 is almost equal to the sum of the other levels.
Figure 6.10: The one-step exit probability bound, δ_ℓ, and the number of realizations, M_ℓ, for ℓ = 0, 1, ..., L*, for the smallest tolerance in the gene transcription and translation model (6.32).
Figure 6.11: Left: performance of formula (6.8) as a strong error estimate for the gene transcription and translation model (6.32); here, h = Δt. Right: estimated variance of V̂_ℓ, with 95% confidence intervals, for the dual-based and empirical estimators.
Figure 6.12: Left: QQ-plot based on ML estimates for the gene transcription and translation model (6.32), for TOL = 3.13e-03. We also performed a Shapiro-Wilk normality test and obtained a p-value of 0.6. Right: TOL versus the actual global computational error. The numbers above the straight line show the percentage of runs that had errors larger than the required tolerance. We observe that in all cases (except the second, by a very small margin) the computational error follows the imposed tolerance with the expected confidence of 95%.
the actual global computational error. It can be seen that the prescribed tolerance is achieved, except for the second smallest tolerance, with the required confidence of 95%, since C_A = 1.96.
MLMC Hybrid-Path Analysis
Remark 6.5.1. The savings in computational work when generating Poisson random
Figure 6.13: Proportion of the number of Chernoff tau-leap steps over the total number of tau-leap steps for the gene transcription and translation model (6.32), for TOL = 1.25e-02. On the x-axis, we show the corresponding level (starting from level 0) and, subsequently, the coarse (C) and fine (F) level. Below the title, we show the corresponding δ_ℓ of each level. We observe a small increase in the proportion of the number of Chernoff steps from levels 1F/2C to levels 3F/4C (strictly speaking, a shift in the median and the third quartile). This is due to consecutive refinements in the values of δ, from 1e-5 to 1e-7, producing smaller and smaller values of τ_Ch.
Remark 6.5.2. (Level 0 time mesh) In this example, we use an adaptive mesh at level 0 because this example is mildly stiff. Using a uniform time mesh at level 0 would impose an unnecessarily small time step size for all times. Moreover, this issue would propagate to the finer levels. In all our numerical examples,
Figure 6.14: Total number of tau-leap steps per path for the gene transcription and translation model (6.32), for TOL = 1.25e-02. On the x-axis, we show the corresponding pairings of two consecutive levels (starting from level 0) and, subsequently, the coarse (C) and fine (F) meshes for two consecutive levels. Below the title, we show the corresponding δ_ℓ of each level. The domain I_TL of the tau-leap method decreases with refinements but, since the time mesh size halves when passing from one level to the next, we see an increasing number of tau-leap steps until, at a certain level, there are no more tau-leap steps due to the relative computational cost of the tau-leap method.
at level 0, we use the coarsest possible time mesh such that the Forward Euler method
is numerically stable.
6.6 Conclusions
In this work, we developed a multilevel Monte Carlo version of the single-level hybrid Chernoff tau-leap algorithm presented in [2]. We showed that the computational complexity of this method is of order O(TOL⁻²) and, therefore, that it can be seen as a variance reduction of the SSA method, which has the same complexity. This represents an important advantage of the hybrid tau-leap with respect to the pure
Figure 6.15: Total number of exact steps per path for the gene transcription and translation model (6.32), for TOL = 1.25e-02. On the x-axis, we show the corresponding pairings of two consecutive levels (starting from level 0) and, subsequently, the coarse (C) and fine (F) meshes for two consecutive levels. Below the title, we show the corresponding δ_ℓ of each level. The domain I_MNRM of the exact method is monotonically increasing with refinements of the time mesh and the one-step exit probability bound. As a consequence, we expect the total count of exact steps to be a monotonically increasing function of the level, ℓ.
Figure 6.16: Average proportion of tau-leap steps over total steps as a function of time, at levels 0, 5, and 8. This figure depicts the 'blending' effect produced by our hybrid path-simulation algorithm: we can see the proportion of tau-leap steps adaptively taken based on expected work optimization. The presence of the tau-leap method decreases as we move to the deepest levels. We observe that, for the chosen tolerance, coupling with an exact path at the last level is not optimal.
tau-leap in the multilevel context. In our numerical examples, we obtained substantial gains with respect to both the SSA and the single-level hybrid Chernoff tau-leap. The present approach, like the one in [2], also provides an approximation of E[g(X(T))] with prescribed accuracy and confidence level, with nearly optimal computational work. To reach this optimality, we derived novel formulas based on dual-weighted residual estimations for computing the variance of the difference of the observables between two consecutive levels in coupled hybrid paths, and also the bias of the deepest level (see (6.7) and (6.8)). These formulas are particularly relevant in the present context of Stochastic Reaction Networks because alternative standard sample estimators become too costly at deep levels due to the large kurtosis present there.
Future extensions may involve better hybridization techniques as well as implicit
and higher-order versions of the hybrid MLMC.
Acknowledgments
The authors would like to thank two anonymous reviewers for their constructive
comments that helped us to improve our manuscript. We also would like to thank
Prof. Mike Giles for very enlightening discussions. The authors are members of
the KAUST SRI Center for Uncertainty Quantification in the Computer, Electrical
and Mathematical Sciences and Engineering Division at King Abdullah University of
Science and Technology (KAUST). This work was supported by KAUST.
Appendix
Algorithm 14 Coupled hybrid path. Inputs: the initial state, X(0), the final time, T, the propensity functions, a = (a_j)_{j=1}^J, the stoichiometric vectors, ν = (ν_j)_{j=1}^J, two one-step exit probability bounds, one for the coarse level and another for the fine level, and two time meshes, a coarse one, (t_k)_{k=0}^K, such that t_K = T, and a finer one, (s_l)_{l=0}^{K'}, such that s_0 = t_0, s_{K'} = t_K, and (t_k)_{k=0}^K ⊂ (s_l)_{l=0}^{K'}. Outputs: a sequence of states evaluated at the coarse grid, (X̄(t_k))_{k=0}^K ⊂ Z_+^d, such that t_K ≤ T, a sequence of states
Algorithm 15 Compute next time horizon. Inputs: the current state, X̃, the current time, t, the next grid point, t̃, the final time, T, the one-step exit probability bound, δ̃, and the propensity functions, a = (a_j)_{j=1}^J. Outputs: the next horizon, H, the selected method, m, and the current propensity values, ã.
1: ã ← a(X̃)
2: (m, τ̃) ← Algorithm 13 with (X̃, t, ã, δ̃, t̃)
3: if m = TL then
4:   H ← min{t̃, t+τ̃, T}
5: else
6:   H ← min{t+τ̃, T}
7: end if
8: return (H, m, ã)
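As a sketch, the branching logic of Algorithm 15 can be written in a few lines of Python. The function and argument names are ours, and the method/step-size pair is assumed to be supplied by the selection rule of Algorithm 13:

```python
def next_time_horizon(t, t_next_grid, T, method, tau_step):
    """Sketch of Algorithm 15: pick the next synchronization horizon.

    `method` is 'TL' (tau-leap) or 'EXACT', and `tau_step` is the step
    size proposed by the method-selection rule (Algorithm 13 in the
    text); both are assumed to be computed elsewhere.
    """
    if method == 'TL':
        # A tau-leap step must not overshoot the next grid point
        # or the final time.
        return min(t_next_grid, t + tau_step, T)
    # An exact step is only capped by the final time.
    return min(t + tau_step, T)
```

Note that only the tau-leap branch is constrained by the grid point t̃, since the coupled paths must synchronize at the mesh points of the coarser level.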
Algorithm 16 Auxiliary function used in Algorithm 14. Inputs: the current time, t, the current time horizon, H, the current system state at the coarser level, X̄, and at the finer level, X̿, the internal clocks R_i, P_i, i=1, 2, 3, and the values S_i, i=1, 2, 3 (see Section 6.1.2 for more information on these values). Outputs: updated time, t, updated system states, X̄, X̿, and updated internal clocks R_i, P_i, i=1, 2, 3.
1: Δt_i ← (P_i − R_i)/S_i, for i=1, 2, 3
2: Δ ← min_i {Δt_i}
3: µ ← argmin_i {Δt_i}
4: if t + Δ > H then
5:   R ← R + S·(H − t)
6:   t ← H
7: else
8:   update X̄ and X̿
9:   R ← R + S·Δ
10:  r ← uniform(0, 1)
11:  P_µ ← P_µ + log(1/r)
12:  t ← t + Δ
13: end if
14: return (t, X̄, X̿, R, P)
estimator, Ŵ_ML, and the estimated computational work of the SSA method, Ŵ_SSA. We denote g_l ≡ g(X̄_l(T; ω̄)) and g_{l+1} − g_l ≡ g(X̄_{l+1}(T; ω̄)) − g(X̄_l(T; ω̄)). Here, C* is the unitary cost of a pure SSA step, and c is the refinement factor of δ (in our experiments, c = 10). See also Remark 6.4.1 regarding the estimators of Var[g(X(T))] and E[g(X(T))], and Remark 6.4.3.
1: l ← 0, δ_l ← 0.01, Ŵ_ML^(a) ← 1
2: Set initial meshes (t_k)_{k=0}^K and (s_l)_{l=0}^{K'}
3: fin-delta ← false
4: while not fin-delta do
5:   (ψ̂_0, S²(g_l; ·), A({g_l, E_I, N_SSA*, N_TL}; ·)) ← Algorithm 23
6:   if V̂_l (1 − δ_l A(N_TL; ·)) ≥ 2 S²(g_l; ·) δ_l A(N_TL; ·) and δ_l A(N_TL; ·) < 0.1 then
7:     fin-delta ← true
8:   else Refine δ_l by a factor of c
9:   end if
10: end while
11: δ_{l+1} ← δ_l
12: fin ← false
13: while not fin do
14:   fin-delta ← false
15:   while not fin-delta do
16:     (ψ̂_{l+1}, V̂_{l+1}, A({g_{l+1}, N_SSA*, E_I, N_{TL,l+1}}; ·), S²(g_{l+1}; ·)) ← Algorithm 19
17:     if V̂_{l+1} (1 − δ_{l+1} A(N_{TL,l+1}; ·)) ≥ 2 S²(g_{l+1}; ·) δ_{l+1} A(N_{TL,l+1}; ·) and δ_{l+1} A(N_{TL,l+1}; ·) < 0.1 then
18:       fin-delta ← true
19:       δ_l ← δ_{l+1}
20:     else
21:       Refine δ_{l+1} by a factor of c
22:     end if
23:   end while
24:   M_SSA ← C_A² S²(g_{l+1}; ·)/TOL²
Algorithm 22 Solve the optimization problem (6.27) using a greedy scheme. Inputs: the estimations of the coupled path cost for all the levels, (ψ̂_ℓ)_{ℓ=0}^L, the estimation of the variance of the quantity of interest at level 0, V̂_0, the estimations of the differences of the quantity of interest for all the coupled levels, (V̂_ℓ)_{ℓ=1}^L, the prescribed tolerance, TOL, and the weak error estimation for level L, E_I. Output: the number of realizations needed for each level, (M_ℓ)_{ℓ=0}^L.

Define q_k := ( Σ_{ℓ=0}^{L−k} √(ψ̂_ℓ V̂_ℓ) ) / ( RHS − Σ_{ℓ=L−k+1}^{L} V̂_ℓ )

1: RHS ← ((TOL − TOL² − E_I)/C_A)²
2: fin ← false
3: k ← 0
4: while not fin and k ≤ L do
5:   if ψ̂_{L−k} − q_k² V̂_{L−k} < 0 then
6:     fin ← true
7:     (M_ℓ)_{ℓ=0}^{L−k} ← q_k √(V̂_ℓ/ψ̂_ℓ)
8:   else
9:     M_{L−k} ← 1
10:    k ← k+1
11:  end if
12: end while
13: return (M_ℓ)_{ℓ=0}^L
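A minimal Python sketch of this greedy scheme, assuming the per-level cost and variance estimates are given as lists. All names are illustrative, and the rounding of M_ℓ to integers is our addition; the thesis couples this routine with its estimation algorithms:

```python
import math

def greedy_sample_allocation(psi, V, TOL, E_I, C_A=1.96):
    """Sketch of the greedy scheme (Algorithm 22): choose the number of
    realizations M_l per level to meet the statistical error budget.

    psi[l] is the estimated cost of a coupled path at level l; V[l] is
    the estimated variance contribution of level l (V[0] being the
    variance of the quantity of interest at level 0).
    """
    L = len(psi) - 1
    RHS = ((TOL - TOL**2 - E_I) / C_A) ** 2
    M = [1] * (L + 1)  # levels whose optimum would be < 1 stay pinned at 1
    for k in range(L + 1):
        top = L - k
        # Budget left after pinning M_l = 1 on the levels already visited.
        budget = RHS - sum(V[top + 1:])
        if budget <= 0:
            continue  # deeper levels consumed the budget; keep M_l = 1
        q = sum(math.sqrt(psi[l] * V[l]) for l in range(top + 1)) / budget
        if psi[top] - q**2 * V[top] < 0:
            # The unconstrained optimum at this level already exceeds 1:
            # accept it for levels 0..top and stop.
            for l in range(top + 1):
                M[l] = max(1, math.ceil(q * math.sqrt(V[l] / psi[l])))
            break
    return M
```

The accepted allocation M_ℓ ∝ √(V̂_ℓ/ψ̂_ℓ) is the classical MLMC optimum; the greedy part only handles the integrality constraint M_ℓ ≥ 1 at the deepest levels.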
[6] D. F. Anderson, “A modified next reaction method for simulating chemical sys-
tems with time dependent propensities and delays,” The Journal of Chemical
Physics, vol. 127, no. 21, 2007.
[9] T. Li, “Analysis of explicit tau-leaping schemes for simulating chemically reacting
systems,” Multiscale Modeling and Simulation, vol. 6, no. 2, pp. 417–436, 2007.
[10] S. Heinrich, “Multilevel Monte Carlo methods,” in Large-Scale Scientific Com-
puting, ser. Lecture Notes in Computer Science. Springer Berlin Heidelberg,
2001, vol. 2179, pp. 58–67.
[11] ——, “Monte Carlo complexity of global solution of integral equations,” Journal
of Complexity, vol. 14, no. 2, pp. 151–175, 1998.
[14] J. Karlsson and R. Tempone, “Towards automatic global error control: Com-
putable weak error expansion for the tau-leap method,” Monte Carlo Methods
and Applications, vol. 17, no. 3, pp. 233–278, March 2011.
[16] S. S. Shapiro and M. B. Wilk, “An analysis of variance test for normality (com-
plete samples),” Biometrika, vol. 52, no. 3/4, pp. 591–611, Dec. 1965.
Chapter 7

A multilevel adaptive reaction-splitting simulation method for stochastic reaction networks¹

Alvaro Moraes, Raúl Tempone and Pedro Vilanova
Abstract
Stochastic modeling of reaction networks is a framework used to describe the time evolution of many natural and artificial systems, including biochemical reactive systems at the molecular level, viral kinetics, the spread of epidemic diseases, and wireless communication networks, among many other examples. In this work, we present a
novel multilevel Monte Carlo method for kinetic simulation of stochastic reaction net-
works that is specifically designed for systems in which the set of reaction channels
can be adaptively partitioned into two subsets characterized by either “high” or “low”
activity. Adaptive in this context means that the partition evolves in time according
¹ A. Moraes, R. Tempone and P. Vilanova, "Multilevel adaptive reaction-splitting simulation method for stochastic reaction networks", preprint arXiv:1406.1989v1, (2014).
to the states visited by the stochastic paths of the system. To estimate expected
values of observables of the system at a prescribed final time, our method bounds
the global computational error to be below a prescribed tolerance, TOL, within a given confidence level. This is achieved with a computational complexity of order O(TOL⁻²), the same as that of an exact method, but with a smaller constant. We also present a novel control variate technique based on the stochastic time change representation by Kurtz, which may dramatically reduce the variance of the coarsest level at a negligible computational cost. Our numerical examples show substantial gains with respect to the standard Stochastic Simulation Algorithm (SSA) by Gillespie and also with respect to our previous hybrid Chernoff tau-leap method.
7.1 Introduction
Stochastic reaction networks (SRNs) are mathematical models that employ Markovian dynamics to describe the time evolution of interacting particle systems, where one particle interacts with the others through a finite set of reaction channels. Typically, there is a finite number of interacting chemical species (S_1, S_2, ..., S_d) and a stochastic process, X, such that its i-th coordinate, X_i(t), is a non-negative integer that keeps track of the abundance of the i-th species at time t. Therefore, the state space of the process X is the lattice Z_+^d.
Our main goal is to estimate the expected value E[g(X(T))], where X is a non-homogeneous Poisson process describing an SRN, and g : R^d → R is a given real observable of X at a final time T. Pathwise realizations can be simulated exactly using the Stochastic Simulation Algorithm (SSA), introduced by Gillespie in [1] (also known as Kinetic Monte Carlo among physicists; see [2] and references therein), or the Modified Next Reaction Method (MNRM), introduced by Anderson in [3], among other methods. Although these algorithms generate exact realizations of X, they
may be computationally expensive for systems that undergo high activity. For that
reason, Gillespie proposed in [4] the tau-leap method to approximate the SSA by
evolving the process with fixed time steps while freezing the propensity functions at
the beginning of each time step.
A drawback of the tau-leap method is that the simulated paths may take negative values, which is a nonphysical consequence of the approximation and not a qualitative feature of the original process. For that reason, we proposed in [5, 6] a Chernoff-based hybrid method that switches adaptively between the tau-leap and an exact method. This allows us to control the probability of reaching negative values while keeping the computational work substantially smaller than that of an exact method. The hybrid method developed in [5, 6] can be successfully applied to systems where the state space, Z_+^d, can be decomposed into two regions according to the activity of the system, where all the propensities are either uniformly low or uniformly high, i.e., non-stiff systems. To handle stiff systems, we first measure the total activity of the system at a certain state by the total sum of the propensity functions evaluated at that state. The activity of the system is low when all the propensities are uniformly low, but a high level of activity can be the result of high activity in one single channel. This observation suggests that, to reduce computational costs, we should adaptively split the set of reaction channels into two subsets according to their individual high and low activity levels. It is then natural to evolve the system in time by applying the tau-leap method to the high activity channels and an exact method to the low activity ones. This is the main idea we develop in this work.
Reaction-splitting methods for simulating stochastic reaction networks are treated, for instance, in [7, 8, 9, 10], but our work is, to the best of our knowledge, the first that i) achieves the computational complexity of an exact method like the SSA by using the multilevel Monte Carlo paradigm, ii) explicitly uses a decomposition of the global error to provide all the simulation parameters needed to achieve our goal with minimal computational effort, iii) effectively controls the global probability of reaching negative populations with the tau-leap method, and iv) needs only two user-defined parameters that are natural quantities: the maximum allowed relative global error, or tolerance, and the confidence level.
In [7], the authors propose an adaptive reaction-splitting scheme that considers not only the exact and tau-leap methods but also the Langevin and mean-field ones. Their main goal is to obtain fast hybrid simulated paths, and they do not try to control the global error. The efficiency of their method is measured a posteriori using smoothed frequency histograms that should be close to the exact ones according to the distance defined by Cao and Petzold in [11]. In their work, the tau-leap step is chosen according to the "leap condition" (as in [12]), but they do not perform a rigorous control of the global discretization error. In order to avoid negative populations, the authors reverse population updates if any value is found to be negative after accounting for all the reactions. Then, the tau-leap step size is decremented and the path simulation is restarted. This approach introduces bias in the estimations and, even when controlling the small reactant populations, a tau-leap step may still lead to negative populations, subsequently increasing the computational work. Our Chernoff-based bound is a fast and accurate procedure to obtain the correct tau-leap step size. Finally, the method in [7] needs three parameters that quantify the speed of the reaction channels, which, in principle, are not trivial to determine for a given problem.
Puchalka and Kierzek's approach [8] seems to be the closest to ours in spirit, since they also explore the idea of adaptively splitting the set of reaction channels, using the tau-leap method for the fast ones and an exact method for the slow ones. They seek to simulate fast approximate paths while maintaining qualitative features of the system. The quantitative features are checked a posteriori against an exact method. Regarding their tau-leap step size selection, Puchalka and Kierzek consider a user-defined maximal time step, empirically chosen by numerical tests, instead of controlling the discretization error. Their classification rule is applied individually to each reaction channel. It takes into account both the percentage of individual activity and the abundance of the species consumed. In a certain sense, it can be seen as a way of controlling the probability of negative populations and an ad hoc manner of splitting the reaction channels by optimizing the computational work.
In [9] and [10], the reaction-splitting issue is addressed, but the partition method is not adaptive, i.e., fast and slow reaction channels are identified offline and are inputs of the algorithms. We note that these works do not provide any measure or control of the resulting global error. Furthermore, they do not control the probability of attaining negative populations.
In the remainder of this section, we introduce the mathematical model and the path simulation techniques used in this work. In Section 7.2, we present an algorithm to generate mixed trajectories; that is, the algorithm generates a trajectory using an exact method for the low activity channels and the Chernoff tau-leap method for the high activity ones. Then, inspired by the ideas of Anderson and Higham [13], we propose an algorithm for coupling two mixed Chernoff tau-leap paths. This algorithm uses four building blocks that result from combining the MNRM and the tau-leap methods. In Section 7.3, we propose a mixed MLMC estimator. Next, we introduce a global error decomposition and show that the computational complexity of our method is of order O(TOL⁻²). Finally, we show the automatic procedure that estimates our quantity of interest within a given prescribed relative tolerance, up to a given confidence level. Next, in Section 7.4, we present a novel control variate technique to reduce the variance of the quantity of interest at level 0. In Section 7.5, numerical examples illustrate the advantages of the mixed MLMC method over the hybrid MLMC method presented in [6] and over the SSA. Finally, Section 7.6 presents our conclusions.
7.1.1 A Class of Markovian Pure Jump Processes
In this section, we describe the class of Markovian pure jump processes, X : [0, T] × Ω → Z_+^d, frequently used for modeling stochastic biochemical reaction networks. Consider a biochemical system of d species interacting through J different reaction channels. For the sake of brevity, we write X(t, ω) ≡ X(t). Let X_i(t) be the number of particles of species i in the system at time t. We study the evolution of the state vector, X(t) = (X_1(t), ..., X_d(t)) ∈ Z_+^d, modeled as a continuous-time Markov chain starting at X(0) ∈ Z_+^d. Each reaction can be described by a vector ν_j ∈ Z^d, such that, for a state vector x ∈ Z_+^d, a single firing of reaction j leads to the change x → x + ν_j. The probability that reaction j occurs during the small interval (t, t+dt) is then assumed to be a_j(x) dt + o(dt), where a_j is the propensity function of the j-th reaction channel.
The MNRM, introduced in [3] and based on the Next Reaction Method [15], is an exact simulation algorithm, like Gillespie's SSA, that explicitly uses representation (7.2) to simulate exact paths and generates only one exponential random variable per iteration. The reaction times are modeled with the firing times of Poisson processes, Y_j, with internal times given by the integrated propensity functions.
The randomness is now separated from the state of the system and is encapsulated
in the Yj ’s.
Computing the next reaction and its time is equivalent to computing how much time passes before one of the Poisson processes, Y_j, fires, and determining which process fires at that particular time, by taking the minimum of such times.
It is important to mention that the MNRM is used to simulate correlated exact/tau-
leap paths as well as nested tau-leap/tau-leap paths, as in [6, 13]. In Section 7.2.5,
we use this feature for coupling two mixed paths.
In this section, we define X̄, the tau-leap approximation of the process, X, which
follows from applying the forward-Euler approximation to the integral term in the
random time change representation (7.2).
The tau-leap method was proposed in [4] to avoid the computational drawback
of the exact methods, i.e., when many reactions occur during a short time interval.
The tau-leap process, X̄, starts from X(0) at time 0 and, given that X̄(t) = x̄ and a time step τ > 0, the state X̄ at time t+τ is generated by

X̄(t + τ) = x̄ + Σ_{j=1}^{J} ν_j P_j(a_j(x̄) τ),

where {P_j(λ_j)}_{j=1}^J are independent Poisson distributed random variables with parameter λ_j, used to model the number of times that reaction j fires during the interval (t, t+τ). Again, this is nothing else than a forward-Euler discretization of the stochastic differential equation formulation of the pure jump process (7.2), realized by the Poisson random measure with state-dependent intensity (see, e.g., [16]).
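The update above can be sketched as follows (illustrative names; a library Poisson sampler would normally replace the inline inverse-transform one):

```python
import math
import random

def tau_leap_step(x, propensities, nu, tau, rng=None):
    """One explicit tau-leap step: freeze the propensities at the
    current state x and let each channel fire a Poisson number of
    times. A sketch, not the thesis implementation."""
    rng = rng or random.Random(0)

    def poisson(lam):
        # Knuth-style sampler; adequate for the moderate rates of a sketch.
        L, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= rng.random()
            if p <= L:
                return k
            k += 1

    a = [f(x) for f in propensities]
    x_new = list(x)
    for j, nu_j in enumerate(nu):
        fires = poisson(a[j] * tau) if a[j] * tau > 0 else 0
        for i, inc in enumerate(nu_j):
            x_new[i] += inc * fires  # apply nu_j scaled by the firing count
    return x_new
```

Note that nothing in this update prevents a coordinate of x_new from becoming negative, which is precisely the drawback addressed by the Chernoff step-size selection below.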
In the limit, when τ tends to zero, the tau-leap method gives the same solution as the exact methods [16]. The total number of firings in each channel is a Poisson distributed random variable depending only on the initial population, X̄(t). The error thus comes from the variation of a(X(s)) for s ∈ (t, t+τ).
In [5], we derived a Chernoff-type bound that allows us to guarantee that the one-step exit probability in the tau-leap method is less than a predefined quantity, δ > 0. The idea is to find the largest possible time step, τ, such that, with high probability, the approximate process, X̄, takes a value in the lattice, Z_+^d, of non-negative integers at the next step.
This can be achieved by solving d auxiliary problems, one for each coordinate, X̄_i(t), i = 1, 2, ..., d, as follows. Find the largest possible τ_i ≥ 0 such that

P( X̄_i(t) + Σ_{j=1}^{J} ν_{ji} P_j(a_j(X̄(t)) τ_i) < 0 | X̄(t) ) ≤ δ_i,   (7.3)

where δ_i = δ/d, and ν_{ji} is the i-th coordinate of the j-th reaction channel, ν_j. Finally, we let τ := min{τ_i : i = 1, 2, ..., d}.
The exact pre-leap method we developed in [5, 6] for single-level and multilevel hybrid schemes allows us to switch adaptively between the tau-leap and an exact method.
By construction, the probability that one hybrid path exits the lattice, Z_+^d, can be estimated by

P(A^c) ≤ E[ 1 − (1 − δ)^{N_TL} ] = δ E[N_TL] − (δ²/2) ( E[N_TL²] − E[N_TL] ) + o(δ²),

where ω̄ ∈ A if and only if the whole hybrid path, (X̄(t_k, ω̄))_{k=0}^{K(ω̄)}, belongs to the lattice, Z_+^d, δ > 0 is the one-step exit probability bound, and N_TL(ω̄) ≡ N_TL is the number of tau-leap steps in a hybrid path. Here, A^c is the complement of the set A.
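Given sampled tau-leap step counts, both the exact expression E[1 − (1−δ)^{N_TL}] and its second-order expansion are straightforward to estimate; the sketch below does both from a sample of step counts (inputs are illustrative):

```python
def exit_probability_bound(n_tl_samples, delta):
    """Estimate E[1 - (1 - delta)^N_TL] from sampled tau-leap step
    counts, together with its second-order expansion
    delta * E[N] - (delta**2 / 2) * (E[N**2] - E[N])."""
    M = len(n_tl_samples)
    exact = sum(1.0 - (1.0 - delta) ** n for n in n_tl_samples) / M
    m1 = sum(n_tl_samples) / M                 # sample mean of N_TL
    m2 = sum(n * n for n in n_tl_samples) / M  # sample second moment
    expansion = delta * m1 - 0.5 * delta**2 * (m2 - m1)
    return exact, expansion
```

For the small values of δ used in practice, the two quantities agree to several digits, which is why the expansion is a convenient proxy for the exit probability.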
To simulate a hybrid path, given the current state of the approximate process, X̄(t), we adaptively determine whether to use an exact method or the tau-leap method for the next step. This decision is based on the relative computational cost of taking an exact step versus that of taking a Chernoff tau-leap step. In the present work, instead, at each time step we adaptively determine which reactions are suitable for the exact method and which are suitable for the Chernoff tau-leap method.
In this section we explain how mixed paths are generated. First, we present the
splitting heuristic; that is, we discuss how to partition the set of reaction channels
at each decision time. Then, we present the one-step mixing rule, which is the main
building block for constructing a mixed path. Finally, we show how to couple two
mixed paths.
In this section, we explain how we partition the set of reaction channels, R := {1, ..., J}, into R_TL and R_MNRM.
Let (t, x) be the current time and state of the approximate process, X̄, and let H be the next decision (or synchronization) time, given by the Chernoff tau-leap step size, τ_Ch = τ_Ch(x, δ), and the time mesh. We want to split R into two subsets, R_MNRM and R_TL, such that the expected computational work of reaching H, starting at t, is minimal over all possible splittings.
The idea goes as follows. First, we define a linear order on R, based on the basic principle that we want to use the tau-leap method for the j-th reaction if its activity is high. This linear order determines J+1 possible splittings, out of the 2^J available. In order to measure the activity, it turns out that using only the propensity functions evaluated at x, that is, a_j(x), is not enough. This is because the j-th reaction could affect components of x with small values and, if this is the case, it forces small Chernoff tau-leap step sizes. To avoid this scenario, we penalize the j-th reaction channel if it has a high exit probability. We approximate this exit probability using a Poisson distribution for each dimension of x. For example, let ν_{ji} be the i-th component of the j-th reaction channel. If ν_{ji} < 0, then the probability that a Poisson distributed random variable with rate a_j(x)(H−t) is greater than x_i/|ν_{ji}| measures how likely species i is to become negative in the interval of length H−t, independently of the reactions j' ∈ R, j' ≠ j. Let I_j := {i : ν_{ji} < 0}, and

θ_j := P( P(a_j(x)(H−t)) > min_{i∈I_j} { x_i/|ν_{ji}| } | x )  if I_j ≠ ∅,  and  θ_j := 0 otherwise.   (7.4)

Then, the penalty weight for a_j(x) is 1 − θ_j, and we define ã_j(x) := (1 − θ_j) a_j(x). The linear order is then a permutation, σ, over R such that ã_{σ⁻¹(1)}(x) ≥ ã_{σ⁻¹(2)}(x) ≥ ... ≥ ã_{σ⁻¹(J)}(x).
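The penalization and ordering can be sketched as follows, using a plain complementary-cumulative-sum evaluation of the Poisson tail (all function and variable names are ours):

```python
import math

def penalized_order(x, a, nu, horizon):
    """Sketch of the splitting heuristic of Section 7.2.1: penalize each
    channel by the probability theta_j (7.4) that it alone could drive a
    consumed species negative before the horizon, then order channels by
    the penalized propensity (1 - theta_j) * a_j, largest first."""
    def poisson_sf(n, lam):
        # P(Poisson(lam) > n) via the complementary cumulative sum.
        if n < 0:
            return 1.0
        term, cdf = math.exp(-lam), 0.0
        for k in range(int(n) + 1):
            cdf += term
            term *= lam / (k + 1)
        return max(0.0, 1.0 - cdf)

    a_pen = []
    for j, nu_j in enumerate(nu):
        consumed = [i for i, v in enumerate(nu_j) if v < 0]
        if consumed:
            n_max = min(x[i] / abs(nu_j[i]) for i in consumed)
            theta = poisson_sf(math.floor(n_max), a[j] * horizon)
        else:
            theta = 0.0  # the channel consumes nothing: no penalty
        a_pen.append((1.0 - theta) * a[j])
    order = sorted(range(len(a)), key=lambda j: -a_pen[j])
    return a_pen, order
```

A channel consuming a scarce species is pushed towards the end of the order even if its raw propensity is large, which is exactly the intent of the penalty 1 − θ_j.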
Second, we find among the J+1 partitions the one with optimal work. This is the computational work incurred when performing one step of the algorithm using the tau-leap method for the reactions in R_TL and the MNRM for the reactions in R_MNRM. The work corresponding to R_TL is

Work(R_TL, x, t) := (H − t)/min{τ_Ch, H − t} · ( C_s + Σ_{j∈R_TL} C_P(a_j(x) τ_Ch) ),   (7.5)
254
where Cs is the work of computing the split (see Section 7.2.2), and CP ( ) is the work
H t
of a Poisson random variate with rate . The factor min{⌧Ch ,H t}
takes into account
the number of steps required to reach H = H(t) from t. For the Gamma simulation
method developed by Ahrens and Dieter in [17], which is the one used by MATLAB,
CP is defined as
8
>
< b1 +b2 ln for > 15
CP ( ) := .
>
: b3 +b4 for 15
H t
Work(RMNRM , x, t) := CMNRM ,
min{⌧MNRM , H t}
⇣P ⌘ 1
where the constant CMNRM is the work of an MNRM step and ⌧MNRM = j2RMNRM aj (x) .
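Putting the two work terms together, the candidate splits induced by the linear order can be compared directly. The sketch below evaluates the J+1 splits; for simplicity it treats τ_Ch as fixed across splits (the thesis recomputes it per split), and the cost constants b_1..b_4, C_s and C_MNRM are illustrative placeholders:

```python
import math

def best_split(a_sorted, H_minus_t, tau_ch, C_s=1.0, C_mnrm=1.0,
               b=(1.0, 0.04, 0.1, 0.05)):
    """Evaluate the J+1 candidate splits S_0..S_J with the work model
    (7.5), given channels already sorted by penalized activity (largest
    first). Returns (i, work): the first i channels go to tau-leap."""
    b1, b2, b3, b4 = b

    def cp(lam):  # cost model of one Poisson variate (Ahrens-Dieter style)
        return b1 + b2 * math.log(lam) if lam > 15 else b3 + b4 * lam

    J = len(a_sorted)
    best_i, best_w = 0, math.inf
    for i in range(J + 1):
        tl, mnrm = a_sorted[:i], a_sorted[i:]
        w_tl = 0.0
        if tl:
            steps = H_minus_t / min(tau_ch, H_minus_t)
            w_tl = steps * (C_s + sum(cp(aj * tau_ch) for aj in tl))
        w_mn = 0.0
        if mnrm:
            tau_mn = 1.0 / sum(mnrm)  # expected MNRM step size
            w_mn = C_mnrm * H_minus_t / min(tau_mn, H_minus_t)
        if w_tl + w_mn < best_w:
            best_i, best_w = i, w_tl + w_mn
    return best_i, best_w
```

With one very active channel and one slow channel, the optimum is typically the mixed split: the fast channel pays a few cheap Poisson draws while the slow channel fires rarely enough for the exact method.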
The work required to perform the splitting includes the work required to determine Work(R_TL) and Work(R_MNRM), both defined in Section 7.2.1. The linear order previously defined determines J+1 possible splittings, S_i, i = 0, ..., J, as follows:

       R_TL                          R_MNRM
S_0    ∅                             R
S_1    {σ⁻¹(1)}                      {σ⁻¹(2), ..., σ⁻¹(J)}
S_2    {σ⁻¹(1), σ⁻¹(2)}              {σ⁻¹(3), ..., σ⁻¹(J)}
...
S_J    R                             ∅
The cost of computing each of the J+1 splits is dominated by the cost of determining the Chernoff tau-leap step size, τ_Ch (see (7.5)). As we observed in [5], the work of computing a single τ_Ch is linear in J. Then, in order to avoid a J² complexity of the splitting rule, we implement a local search instead of computing J values of τ_Ch, keeping the complexity of C_s linear in J. The main idea is to keep track of the last split at each decision time, assuming that the propensities do not vary widely between consecutive decision times. If that is the case, we can just evaluate the previous split, S_i, and its neighbors, S_{i−1} and S_{i+1}. Then, the cost of the splitting rule is on the order of three computations of a Chernoff step size. It turns out that this local search is very accurate for the examples we worked on. In order to avoid being trapped in local minima, a randomization rule may be applied.
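The local search can be sketched as follows. The work function passed in stands for the (expensive) per-split work evaluation; the randomized probe is one possible form of the randomization rule, not necessarily the one used in the thesis:

```python
import random

def local_split_search(work_of_split, prev, J, rng=None, p_restart=0.0):
    """Evaluate the previously optimal split index and its two neighbours,
    keeping the cheapest, so each decision costs O(1) Chernoff-step-size
    computations instead of O(J).  With probability p_restart a random
    split index is also probed, to help escape local minima."""
    rng = rng or random.Random(0)
    candidates = {max(0, prev - 1), prev, min(J, prev + 1)}
    if p_restart > 0.0 and rng.random() < p_restart:
        candidates.add(rng.randint(0, J))  # occasional global probe
    return min(candidates, key=work_of_split)
```

Repeated over successive decision times, the returned index tracks a slowly moving minimum of the work profile at a constant cost per decision.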
Alternatively, R_TL is defined s.t.

    ( Σ_{j∈R_TL} ã_{σ(j)} ) / ( Σ_{k=1}^J ã_k ) ≥ ν,

for a given threshold ν ∈ (0, 1).
In this section, we present the main building block for simulating a mixed path. Let
x=X̄(t) be the current state of the approximate process, X̄. Therefore, the expected
time step of the MNRM is given by 1/a0 (x). To move one step forward using the
MNRM, we should compute at least a0 (x) and sample a uniform random variable.
On the other hand, to move one step forward using the mixed Chernoff tau-leap
method, we need first to compute the split, then compute the tau-leap increments for
the reactions in the tau-leap set, RTL , and finally compute the MNRM steps for the
reactions in the set RMNRM , as discussed in Section 7.2.2.
To avoid the overhead caused by unnecessary computation of the split, we first
estimate the computational work of moving forward from the current time, t, to the
next grid point, T̃ , by using the MNRM only. If this work is less than the work of
computing the split, we take an exact step.
Algorithm 24 The one-step mixing rule. Inputs: the current state of the approximate process, X̄(t), the current time, t, the values of the propensity functions evaluated at X̄(t), (a_j(X̄(t)))_{j=1}^J, the one-step exit probability bound, δ, the next grid point, T̃, and the previous optimal split, S. Outputs: the tau-leap set, R_TL, the exact set, R_MNRM, and the new optimal split, S.
Require: a₀ := Σ_{j=1}^J a_j > 0
1: if K₁/a₀ < T̃ − t then
2:   Compute θ_j, j = 1, .., J (see (7.4))
3:   ã_{σ(j)} ← Sort {(1−θ_j) a_j} descending, j = 1, .., J
4:   S_i ← Compute the splits taking into account the previous optimal split
5:   (R_TL, R_MNRM, S) ← Take the minimum work split
6:   return (R_TL, R_MNRM, S)
7: else
8:   return (∅, R, S)
9: end if
In this section, we present a novel algorithm (Algorithm 25) that combines the approximate Chernoff tau-leap method and the exact MNRM to generate a whole hybrid path. It combines the advantages of an exact method (expensive but exact) and of the tau-leap method (possibly cheaper, but with a discretization error and a positive probability of exiting the lattice). The algorithm automatically and adaptively partitions the reactions into two subsets, R_TL and R_MNRM, using a computational work criterion.
Since a mixed path consists of a certain number of exact/approximate steps, it
may also exit the lattice, except in those steps in which the tau-leap method is not
applied; that is, when RTL is empty. The idea of this algorithm is to apply, at each
decision point, the one-step mixing rule (Algorithm 24) to determine the sets RTL
and RMNRM , and then to apply the corresponding method.
In this section, we explain how to couple two mixed paths. This is essential for the
multilevel estimator. The four algorithms that are the building blocks of the coupling
Algorithm 25 The mixed-path algorithm. Inputs: the initial state, X(0), the propensity functions, (a_j)_{j=1}^J, the stoichiometric vectors, ν = (ν_j)_{j=1}^J, the final time, T, and the one-step exit probability bound, δ. Outputs: a sequence of states, (X̄(t_k))_{k=0}^K, and the number of times, N_TL, that the tau-leap method was successfully applied (i.e., X̄(t_k) ∈ Z₊^d, we applied the tau-leap method and we obtained X̄(t_{k+1}) ∈ Z₊^d). Notes: given the current state, nextMNRM computes the next state using the MNRM. Here, t_i denotes the current time at the i-th step, and τ_Ch(R_TL) is the Chernoff step size associated with R_TL.
1: i ← 0, t_i ← t₀, X̄(t_i) ← X(0), Z̄ ← X(0), N_TL ← 0
2: S_j ← Compute splits, j = 0, ..., J
3: S ← arg min_j Work(S_j)
4: while t_i < T do
5:   T̃ ← next grid point greater than t_i
6:   (R_TL, R_MNRM, S) ← Algorithm 24 with (Z̄, t_i, (a_j(Z̄))_{j=1}^J, δ, T̃, S)
7:   if R_TL ≠ ∅ then
8:     Δ_TL ← Σ_{j∈R_TL} P(a_j(Z̄) τ_Ch(R_TL)) ν_j
9:     H ← t_i + τ_Ch(R_TL)
10:  else
11:    Δ_TL ← 0, H ← min{t_i − log(r)/Σ_j a_j(Z̄), T}, r ∼ Unif(0, 1)
12:  end if
13:  if R_MNRM ≠ ∅ then
14:    while t_i < H do
15:      (Z̄, t_i) ← nextMNRM(Z̄, R_MNRM, t_i, H)
16:    end while
17:  end if
18:  Z̄ ← Z̄ + Δ_TL
19:  if Z̄ ∈ Z₊^d then
20:    N_TL ← N_TL + 1
21:    t_{i+1} ← H
22:  else
23:    return ((X̄(t_k))_{k=0}^i, N_TL)
24:  end if
25:  i ← i + 1
26:  X̄(t_i) ← Z̄
27: end while
28: return ((X̄(t_k))_{k=0}^i, N_TL)
algorithm were already presented in [6]. The novelty here comes from the fact that the coupled mixed algorithm may have to run the four algorithms concurrently in the sense of the time of the process, t. In this section, we denote coarse grid-related quantities with a single bar, ·̄, and fine grid-related quantities with a double bar, ·̿.
We now briefly describe the mixed Chernoff coupling algorithm, i.e., Algorithm 26. Let X̄ and X̿ be two mixed paths, corresponding to two nested time discretizations, called coarse and fine, respectively. Assume that the current time is t, and that we know the states, X̄(t) and X̿(t), the next grid points at each level, t̄ and t̿, and the corresponding one-step exit probabilities, δ̄ and δ̿. Based on this knowledge, we have to determine the four sets (R̄_TL, R̄_MNRM, R̿_TL, R̿_MNRM), which correspond to four algorithms, B1, B2, B3 and B4, that we use as building blocks. Table 7.1 summarizes them.

              R̄_TL    R̄_MNRM
    R̿_TL     B1       B2
    R̿_MNRM   B3       B4

Table 7.1: Building blocks for simulating two coupled mixed Chernoff tau-leap paths. Algorithms B1 and B2 are presented as Algorithms 2 and 3 in [13]. Algorithms B3 and B4 can be directly obtained from Algorithm B2 (see [6]).

In order to do that, the algorithm computes, independently, the sets R_TL and R_MNRM for each level, and the time until the next decision is taken, H, using Algorithm 27. Next, it computes concurrently the increments due to each one of the sets (storing the results in ΔX̄ and ΔX̿ for the coarse and fine grid, respectively). We note that the only case in which we use a Poisson random variate generator for the tau-leap method is in Algorithm B1 (Algorithm 28). For Algorithms B2, B3 and B4, the Poisson random variables are simulated by adding independent exponential random variables with the same rate until exceeding a given final time, T. The only difference among the latter blocks is the time points at which the propensities, a_j, are computed. For B2, the coarse propensities are frozen at time t, whereas for B3 the fine ones are frozen at t. In B4, the propensities are computed at each time step. After arriving at time H, the four sets (R̄_TL, R̄_MNRM, R̿_TL, R̿_MNRM) and the time until the next decision is taken, H, are determined again, and the whole procedure is repeated until the simulation reaches the final time, T.
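The exponential-summation mechanism used by blocks B2–B4 can be sketched for a constant rate. This is a simplified illustration: in B2 and B3 the propensities are frozen between decision times, so the rate is indeed piecewise constant there, while B4 would recompute the rate at each step:

```python
import math
import random

def poisson_by_exponentials(lam, t_final, rng):
    """Count Poisson arrivals in [0, t_final) for a constant rate lam, by
    adding independent Exp(lam) inter-arrival times until the final time
    is exceeded.  The count is then a Pois(lam * t_final) variate."""
    t, n = 0.0, 0
    while True:
        t += -math.log(rng.random()) / lam   # Exp(lam) variate by inversion
        if t >= t_final:
            return n
        n += 1
```

Because both coupled paths can consume the same exponential arrivals, this construction keeps the coarse and fine increments strongly correlated, which is what makes the level differences in the multilevel estimator small.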
7.3 The Multilevel Estimator and Total Error Decomposition
In this section, we first show the multilevel Monte Carlo estimator. We then analyze
and control the computational global error, which is decomposed into three error
components: the discretization error, the global exit error, and the Monte Carlo
statistical error. Upper bounds for each one of the three components are given.
Finally, we briefly describe the automatic estimation procedure that allows us to
estimate our quantity of interest within a given prescribed relative tolerance, up to a
given confidence level.
In this section, we discuss and implement a multilevel Monte Carlo estimator for the mixed Chernoff tau-leap case.
Consider a hierarchy of nested meshes of the time interval [0, T], indexed by ℓ = 0, 1, ..., L. Let Δt₀ be the size of the coarsest time mesh, corresponding to the level ℓ = 0. The size of the time mesh at level ℓ ≥ 1 is given by Δt_ℓ = R⁻ℓ Δt₀, where R > 1 is a given integer constant. Let {X̄_ℓ(t)}_{t∈[0,T]} be a mixed Chernoff tau-leap process with a time mesh of size Δt_ℓ and a one-step exit probability bound δ, and let g_ℓ := g(X̄_ℓ(T)) be our quantity of interest computed with a mesh of size Δt_ℓ. We can simulate paths of {X̄_ℓ(t)}_{t∈[0,T]} by using Algorithm 25. We are interested in estimating E[g_L], and we can simulate correlated pairs, (g_ℓ, g_{ℓ−1}), for ℓ = 1, ..., L, by using Algorithm 26. Let A_ℓ be the event in which the ℓ-th grid level path, X̄_ℓ, arrives at the final time, T, without exiting the state space of X.
Consider the following telescopic decomposition:

    E[g_L 1_{A_L}] = E[g₀ 1_{A₀}] + Σ_{ℓ=1}^L E[ g_ℓ 1_{A_ℓ} − g_{ℓ−1} 1_{A_{ℓ−1}} ],

where 1_A is the indicator function of the set A. This motivates the definition of our MLMC estimator of E[g(X(T))]:

    M_L := (1/M₀) Σ_{m=1}^{M₀} g₀ 1_{A₀}(ω_{m,0}) + Σ_{ℓ=1}^L (1/M_ℓ) Σ_{m=1}^{M_ℓ} [ g_ℓ 1_{A_ℓ} − g_{ℓ−1} 1_{A_{ℓ−1}} ](ω_{m,ℓ}).   (7.6)
Computational Complexity

A key property of our multilevel estimator is that the computational work, as a function of the given relative tolerance, TOL, is of order TOL⁻². The optimal work is given by

    w*_L(TOL) = (C_A/θ) ( Σ_{ℓ=0}^L √(V_ℓ ψ_ℓ) )² TOL⁻².

From the fact that the sum Σ_{ℓ=0}^∞ √(V_ℓ ψ_ℓ) converges, because ψ_ℓ = O(ψ_MNRM), we conclude that sup_L { Σ_{ℓ=0}^L √(V_ℓ ψ_ℓ) } is bounded and, therefore, the expected computational complexity of the multilevel mixed Chernoff tau-leap method is w*_L(TOL) = O(TOL⁻²).
In this section, we define the computational global error, E_L, and show how it can be naturally decomposed into three components: the discretization error, E_{I,L}, and the exit error, E_{E,L}, both coming from the tau-leap part of the mixed method, and the Monte Carlo statistical error, E_{S,L}. We also give upper bounds for each one of the three components.
The computational global error, E_L, is defined as

    E_L := E[g(X(T))] − M_L,

and can be decomposed as

    E[g(X(T))] − M_L = E[ g(X(T)) (1_{A_L} + 1_{A_L^c}) ] ± E[g_L 1_{A_L}] − M_L
                     = E[ g(X(T)) 1_{A_L^c} ] + E[ (g(X(T)) − g_L) 1_{A_L} ] + ( E[g_L 1_{A_L}] − M_L )
                     =: E_{E,L} + E_{I,L} + E_{S,L}.

We showed in [5] that, by adequately choosing the one-step exit probability bound, δ, the exit error, E_{E,L}, satisfies |E_{E,L}| ≤ |E[g(X(T))]| P(A_L^c) ≤ TOL².
An efficient procedure for accurately estimating E_{I,L} in the context of the tau-leap method is described in [6]. For each mixed path, (X̄_ℓ(t_{n,ℓ}, ω̄))_{n=0}^{N(ω̄)}, we define the sequence of dual weights, (φ_{n,ℓ}(ω̄))_{n=1}^{N(ω̄)}, backwards as follows:

    φ_{n,ℓ} := ( Id + Δt_{n,ℓ} Jₐᵀ(X̄_ℓ(t_{n,ℓ}, ω̄)) νᵀ ) φ_{n+1,ℓ},   φ_{N(ω̄),ℓ} := ∇g(X̄_ℓ(T, ω̄)),   (7.7)

where Δt_{n,ℓ} := t_{n+1,ℓ} − t_{n,ℓ}, ∇ is the gradient operator and J_a(X̄_ℓ(t_{n,ℓ}, ω̄)) ≡ [∂_i a_j(X̄_ℓ(t_{n,ℓ}, ω̄))]_{j,i} is the Jacobian matrix of the propensity functions, a_j, for j = 1...J and i = 1...d. We then approximate E_{I,L} by A(E_{I,L}(ω̄); ·), where

    E_{I,L}(ω̄) := Σ_{n=1}^{N(ω̄)} (Δt_{n,L}/2) Σ_{j=1}^J 1_{j∈R_TL(n)} φ_{n,L} · ν_j ( a_j(X̄_L(t_{n+1,L})) − a_j(X̄_L(t_{n,L})) )(ω̄),

and A(X; M) := (1/M) Σ_{m=1}^M X(ω_m) and S²(X; M) := A(X²; M) − A(X; M)² denote the sample mean and the sample variance of the random variable, X, respectively. Here 1_{j∈R_TL(n)} = 1 if and only if, at time t_{n,L}, the tau-leap method was used for reaction channel j, and we denote by Id the d×d identity matrix.
The variance of the statistical error, E_{S,L}, is given by Σ_{ℓ=0}^L V_ℓ/M_ℓ, where V₀ := Var[g₀ 1_{A₀}] and V_ℓ := Var[ g_ℓ 1_{A_ℓ} − g_{ℓ−1} 1_{A_{ℓ−1}} ], ℓ ≥ 1. In [6], we presented an efficient and accurate method for estimating V_ℓ, ℓ ≥ 1, using the formula

    V̂_ℓ := S²( Σ_n E[ φ_{n+1} · e_{n+1} | F ](ω̄); M_ℓ ) + A( Σ_n Var[ φ_{n+1} · e_{n+1} | F ](ω̄); M_ℓ ),

where F is a suitably chosen sigma-algebra such that (φ_n(ω̄))_{n=1}^{N(ω̄)} is measurable, with N(ω̄) being the total number of steps given by Algorithm 26. In this way, the only randomness in E[φ_{n+1} · e_{n+1} | F] and Var[φ_{n+1} · e_{n+1} | F] comes from the local errors, (e_n)_{n=1}^{N(ω̄)}, defined as e_n := X̄_{ℓ,n} − X̄_{ℓ−1,n}. In the aforementioned work, we derived exact and approximate formulas for computing E[φ_{n+1} · e_{n+1} | F] and Var[φ_{n+1} · e_{n+1} | F].
Remark 7.3.1 (Backward Euler). In (7.7), φ_{n,ℓ} can alternatively be computed by a backward Euler formula when the explicit recursion would require excessively fine time meshes for stability, i.e.,

    φ_{n,ℓ} := ( Id − Δt_{n,ℓ} Jₐᵀ(X̄_ℓ(t_{n,ℓ}, ω̄)) νᵀ )⁻¹ φ_{n+1,ℓ}.
In this section, we briefly describe the automatic procedure that estimates E[g(X(T))] within a given prescribed relative tolerance, TOL > 0, up to a given confidence level. Up to minor changes, it is the same as the one presented in [6]. It is important to remark that only minimal user intervention is required to obtain the parameters needed to simulate the mixed paths and, subsequently, to compute the estimates using (7.6). Once the reaction network is given (stoichiometric matrix ν and J propensity functions a_j), the user only needs to set the required maximum allowed relative global error, or tolerance, TOL, and the confidence level, α. This process has three phases. In the first phase, the simulation parameters at level 0 are calibrated by solving a problem of the form

    min_{(Δt₀, δ₀)} ψ₀ M₀   s.t.   E_{E,L} + E_{I,L} + E_{S,L} ≤ TOL,   (7.8)

where Δt₀ is the size of the time mesh at level 0 and δ₀ is the exit probability bound at level 0, and R_TL = R_TL(t) is the tau-leap set, which depends on time (and also on the current state of the process). The set R_TL is determined at each decision step by Algorithm 24. Therefore, the expected work at level 0 is ψ₀ M₀.
The expected work of a coupled pair of paths at levels ℓ and ℓ−1 is

    ψ_ℓ := C_MNRM E[ N⁽ᶜ⁾_MNRM(ℓ) ] + C_TL E[ N⁽ᶜ⁾_TL(ℓ) ]
         + E[ ∫_{[0,T]} Σ_{j∈R_TL,ℓ(s)} C_P( a_j(X̄_ℓ(s)) τ_Ch(X̄_ℓ(s), δ_ℓ) ) ds ]
         + E[ ∫_{[0,T]} Σ_{j∈R_TL,ℓ−1(s)} C_P( a_j(X̄_{ℓ−1}(s)) τ_Ch(X̄_{ℓ−1}(s), δ_{ℓ−1}) ) ds ],   (7.10)

where

    N⁽ᶜ⁾_MNRM(ℓ) := N_MNRM(Δt_ℓ, δ_ℓ) + N_MNRM(Δt_{ℓ−1}, δ_{ℓ−1}),
    N⁽ᶜ⁾_TL(ℓ) := N_TL(Δt_ℓ, δ_ℓ) + N_TL(Δt_{ℓ−1}, δ_{ℓ−1}).
7.4 A Control Variate Based on a Time Change

In this section, we motivate a novel control variate for the random variable X(T, ω) defined by the random time change representation,

    X(T, ω) = x₀ + Σ_j ν_j Y_j( ∫₀ᵀ a_j(X(s)) ds, ω ).

First, we replace the independent Poisson processes, (Y_j(s, ω))_{s≥0}, by the identity function. This defines the deterministic mean field,

    Z(T) = x₀ + Σ_j ν_j ∫₀ᵀ a_j(Z(s)) ds.

Keeping the Poisson processes but evaluating them at the mean-field integrated propensities yields the auxiliary process

    X̃(T, ω) = x₀ + Σ_j ν_j Y_j( ∫₀ᵀ a_j(Z(s)) ds, ω ).
Let (Z_k)_{k=0}^K be an approximation of the mean field, Z, on a time mesh (t_k)_{k=0}^K, with Δt_k := t_{k+1} − t_k. The sequence Z_k allows us to define another sequence, Λ̂_{j,k}, by

    Λ̂_{j,k+1} = Λ̂_{j,k} + a_j(Z_k) Δt_k,   k = 0, ..., K−1,   Λ̂_{j,0} = 0,

where Λ̂_{j,K} approximates ∫₀ᵀ a_j(Z(s)) ds.
Then, for each realization of X̄(T, ω), which is an approximation of X(T, ω), we compute the control variate

    X̂_K = x₀ + Σ_j ν_j Y_j(Λ̂_{j,K}),   (7.11)

which is the corresponding approximation of X̃(T, ω) and has the computable expectation

    μ_K := E[X̂_K] = x₀ + Σ_j ν_j Λ̂_{j,K}.
Now, we consider the random sequence, {X̄_n(ω)}_{n=0}^{N(ω)}, generated in this case by the mixed algorithm. Here, X̄_{N(ω)}(ω) is an approximation of X(T, ω). The sequence of mixed random times, {Λ̄_{j,n}(ω)}, is defined by

    Λ̄_{j,n+1} = Λ̄_{j,n} + a_j(X̄_n(ω)) Δs_n,   n = 0, ..., N(ω)−1,   Λ̄_{j,0} = 0,

where Δs_n is the n-th step size of the mixed path.
To sample Y_j(Λ̂_{j,K}), two cases arise:

1. For some n, Λ̄_{j,n} < Λ̂_{j,K} < Λ̄_{j,n+1}. Since Y_j(Λ̄_{j,n}) and Y_j(Λ̄_{j,n+1}) are known, we can sample the bridge

    Y_j(Λ̂_{j,K}) | Y_j(Λ̄_{j,n}), Y_j(Λ̄_{j,n+1}) ∼ Y_j(Λ̄_{j,n}) + B,
    B ∼ binomial( Y_j(Λ̄_{j,n+1}) − Y_j(Λ̄_{j,n}), (Λ̂_{j,K} − Λ̄_{j,n}) / (Λ̄_{j,n+1} − Λ̄_{j,n}) ).

2. Λ̂_{j,K} > Λ̄_{j,N}. Since we know the value Y_j(Λ̄_{j,N}), we just have to sample a Poisson random variate as follows:

    Y_j(Λ̂_{j,K}) | Y_j(Λ̄_{j,N}) ∼ Y_j(Λ̄_{j,N}) + P,
    P ∼ Poisson(Λ̂_{j,K} − Λ̄_{j,N}).
Finally, using the aforementioned control variate, we can estimate E[g(X̄(T))] with

    (1/M) Σ_{m=1}^M g(X̄_N(ω_m)) − (1/M) Σ_{m=1}^M ( g(X̂_K(ω_m)) − g(μ_K) ),   (7.12)

for any linear functional, g, since E[g(X̂_K)] = g(E[X̂_K]) = g(μ_K).
Remark 7.4.1 (Nonlinear observables). Observe in (7.11) that X̂_K is a linear combination of independent Poisson random variables. For that reason, we can exactly compute the expected value of any polynomial or exponential function of X̂_K. Let C be the class of functions spanned by the family of polynomials and exponential functions on X̂_K. Let g be any nonlinear function and g̃ its projection onto the class C. Consider now the following telescoping sum:

    E[g(X̄_N)] = E[(g − g̃)(X̄_N)] + E[ g̃(X̄_N) − g̃(X̂_K) ] + E[g̃(X̂_K)].

The random variable (g − g̃)(X̄_N) has small variance since g is well approximated by g̃. The variance of g̃(X̄_N) − g̃(X̂_K) is small since X̂_K is coupled to X̄_N. This observation leads us to approximate E[g(X̄_N)] by

    (1/M) Σ_{m=1}^M ( g(X̄_N(ω_m)) − g̃(X̂_K(ω_m)) ) + E[g̃(X̂_K)].
In the same spirit, at level 0 we can write

    g(X̄₀(T)) = g(X̃(T)) + ( g(X̄₀(T)) − g(X̃(T)) ).

Therefore,

    E[g(X̄₀(T))] = E[g(X̃(T))] + E[ g(X̄₀(T)) − g(X̃(T)) ].

Assuming that we can compute E[g(X̃(T))] exactly (if not, we can use a g̃ as in Remark 7.4.1), we just have to estimate E[g(X̄₀(T)) − g(X̃(T))] instead of E[g(X̄₀(T))] in our multilevel scheme. The computational gain lies in the fact that Var[g(X̄₀(T)) − g(X̃(T))] could be substantially lower than Var[g(X̄₀(T))].
Remark 7.4.3 (Computational cost). An advantage of this control variate is that its computational cost is almost negligible, because we only need to store two scalars, Λ̄_{j,n} and Λ̄_{j,n+1}, for each reaction, j. These values are determined at each step by a_j(X̄_n), which is a quantity that is already computed at each time step of the mixed algorithm. Also, for each realization of the control variate, at most one Poisson random variate is needed for each reaction channel.
Remark 7.4.4 (Empirical time change). We can also compute the final times, Λ̂_{j,K}, using a sample average of mixed paths instead of the mean field. We found no significant improvement from that approach, which requires much more computational work. We conjecture that, for settings in which the mean field is not representative, this approach may nonetheless be the only reasonable option.
7.5 Numerical Examples
In this section, we present two examples to illustrate the performance of our proposed
method, and we compare the results with the hybrid MLMC approach given in [6]. For benchmarking purposes, we use Gillespie's Stochastic Simulation Algorithm (SSA) instead of the Modified Next Reaction Method (MNRM), because the former is widely used in the literature.
This model, first developed in [18], has four species and six reactions:

    E →(1) E + G,           the viral template (E) forms a viral genome (G),
    G →(0.025) E,           the genome generates a new template,
    E →(1000) E + S,        a viral structural protein (S) is generated,
    G + S →(7.5×10⁻⁶) V,    the virus (V) is produced,
    E →(0.25) ∅,  S →(2) ∅,  degradation reactions.
The stoichiometric matrix and the propensity functions are

    ν = (  1   0   0   0
          −1   0   1   0
           0   1   0   0
          −1  −1   0   1
           0   0  −1   0
           0  −1   0   0 )   and   a(X) = ( E, 0.025 G, 1000 E, 7.5×10⁻⁶ G S, 0.25 E, 2 S )ᵗʳ,

respectively, where the j-th row of ν is the stoichiometric vector ν_j.
In this model, X(t) = (G(t), S(t), E(t), V(t)), and g(X(t)) = V(t), the number of viruses produced. The initial condition is X₀ = (0, 0, 10, 0) and the final time is T = 20. This example is interesting because i) it shows a clear separation of time scales, ii) our previous hybrid Chernoff method shows no computational work gain with respect to an exact method, and iii) in [13], the authors take an alternative approach that does not use the multilevel aspect of their paper.
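A plain SSA simulation of this model can be sketched as follows. This is only an illustration of the model setup (rates and stoichiometry as listed above), not the benchmarked MATLAB implementation:

```python
import math
import random

# Species order X = (G, S, E, V); one row of NU per reaction channel.
NU = [( 1,  0,  0, 0),   # E -> E+G
      (-1,  0,  1, 0),   # G -> E
      ( 0,  1,  0, 0),   # E -> E+S
      (-1, -1,  0, 1),   # G+S -> V
      ( 0,  0, -1, 0),   # E -> 0
      ( 0, -1,  0, 0)]   # S -> 0

def propensities(x):
    g, s, e, v = x
    return [e, 0.025 * g, 1000.0 * e, 7.5e-6 * g * s, 0.25 * e, 2.0 * s]

def ssa(x0, T, rng):
    """Gillespie's SSA: exponential waiting times, channel chosen
    proportionally to its propensity."""
    x, t = list(x0), 0.0
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 == 0.0:          # absorbed state: nothing can fire
            return x
        t += -math.log(rng.random()) / a0
        if t > T:
            return x
        u, j, acc = rng.random() * a0, 0, a[0]
        while acc < u and j < len(a) - 1:
            j += 1
            acc += a[j]
        x = [xi + d for xi, d in zip(x, NU[j])]
```

The separation of time scales is visible immediately: with E = 10, the protein-production channel fires at rate 10⁴ while the genome channels fire only a few times per unit time, which is what makes a pure exact simulation expensive here.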
We now analyze an ensemble of 10 independent runs of the phase II algorithm (see Section 7.3.3), using different relative tolerances. In Figure 7.1, we show the total predicted work (runtime) for the multilevel mixed method and for the SSA method, versus the estimated error bound. We also show the estimated asymptotic work of the multilevel mixed method. We remark that the computational work of the multilevel hybrid method is the same as the work of the SSA.

Figure 7.1: Left: Predicted work (runtime) versus the estimated error bound, with 95% confidence intervals. The multilevel mixed method is preferred over the SSA and the multilevel hybrid method for all the tolerances. Right: Details for the ensemble run of the phase II algorithm, showing TOL, L*, Ŵ_ML and Ŵ_SSA, where Ŵ_ML = Σ_{ℓ=0}^{L*} ψ̂_ℓ M_ℓ and Ŵ_SSA = M_SSA C_SSA A(N*_SSA; ·). As an example, the fourth row of the table tells us that, for a tolerance TOL = 1.25·10⁻², 2.2 levels are needed on average. The work of the multilevel mixed method is, on average, 3% of the work of the SSA and of the multilevel hybrid method. Confidence intervals at 95% are also provided.
In Figure 7.2, we can observe how the estimated weak error, Ê_{I,ℓ}, and the estimated variance of the difference of the functional between two consecutive levels, V̂_ℓ, decrease linearly as we refine the time mesh, which corresponds to a tau-leap dominated regime. This linear relationship for the variance starts at level 1, as expected. When the MNRM dominated regime is reached, both quickly converge to zero, as expected. The estimated total path work, ψ̂_ℓ, increases as we refine the time mesh. Observe that it increases linearly for the coarser grids, until it reaches a plateau, which corresponds to the pure MNRM case, where the computational cost is independent of the grid size. In the lower right panel, we show the total computational work, only for the cases in which Ê_{I,ℓ} < TOL − TOL².

Figure 7.2: Upper left: estimated weak error, Ê_{I,ℓ}, as a function of the time mesh size, h. Upper right: estimated variance of the difference between two consecutive levels, V̂_ℓ, as a function of h. Lower left: estimated path work, ψ̂_ℓ, as a function of h. Lower right: estimated total computational work, Σ_{ℓ=0}^L ψ̂_ℓ M_ℓ, as a function of the level, L.
In Figure 7.4, we show the main outputs of the phase II algorithm, δ_ℓ and M_ℓ for ℓ = 0, ..., L*, for the smallest considered tolerance. In this example, L* is 8 or 9, depending on the run. We observe that the number of realizations decreases more slowly than linearly, from levels 1 to L*−1, until it drops, due to the change to an MNRM dominated regime.

Figure 7.3: Left: Percentage of the statistical error over the total error. As we mentioned in Section 7.3.1, it is well above 0.5 for all the tolerances. Right: √(V̂_ℓ ψ̂_ℓ), as a function of ℓ, for the smallest tolerance, which decreases as the level increases. Observe that the contribution of level 0 is less than 50% of the sum of the other levels.

Figure 7.4: The one-step exit probability bound, δ_ℓ, and M_ℓ, for ℓ = 0, 1, ..., L*, for the smallest tolerance.
Figure 7.5: Left: TOL versus the actual computational error. The numbers above the straight line show the percentage of runs that had errors larger than the required tolerance. We observe that, in all cases, the computational error follows the imposed tolerance with the expected confidence of 95%. Right: quantile-quantile plot based on realizations of M_L.
Remark 7.5.1. In the simulations, we observe that, as we refine TOL, the optimal number of levels increases approximately logarithmically, which is a desirable feature. We fit the model L* = a log(TOL⁻¹) + b, obtaining a = 1.47 and b = 3.56.

Remark 7.5.2 (Pareto rule). Using the cost-based rule (see Remark 7.2.1), we estimate the threshold for the Pareto rule, obtaining ν = 0.95419. It turns out that, for this example, Ŵ_MixPareto/Ŵ_Mix ranges from 0.6 to 0.75 (for most TOLs). This shows that it is possible to further increase the computational work gains in some examples.

Remark 7.5.3. The savings in computational work when generating Poisson random variables depend heavily on MATLAB's performance capabilities. In fact, we would expect better results from our method if we were to implement our algorithms in more performance-oriented languages, or if we were to sample Poisson random variables in batches.
A Simple Stiff System

This model, adapted from [19], has three species and a mixture of fast and slow reaction channels,

    X₁ ⇌(c₁, c₂) X₂ →(c₃) X₃ →(c₄) ∅,   c₂ ≫ c₃ > c₄.

The stoichiometric matrix and the propensity functions are

    ν = ( −1   1   0
           1  −1   0
           0  −1   1
           0   0  −1 )   and   a(X) = ( c₁ X₁, c₂ X₂, c₃ X₂, c₄ X₃ )ᵗʳ,
respectively, where g(X(t)) = X₃(t). In this model, successive firings of the reaction X₂ → X₃ are separated by many reversible firings between X₁ and X₂, which takes a lot of computational work in a standard SSA run. In [20], Gillespie et al. claim that this inefficiency cannot be addressed using ordinary tau-leaping because of the stiffness of the system. We show here that we obtain substantial gains using our mixed method, which also controls the global error. In this example, we also show the performance of the control variate idea presented in Section 7.4. We analyze 10 independent runs of the phase II algorithm (see Section 7.3.3), using different relative tolerances. In Figure 7.6, we show the total predicted work (runtime) for the multilevel mixed method, with and without a control variate at level 0, and for the SSA method, versus the estimated error bound. We also show the estimated asymptotic work of the multilevel mixed method. Observe that, for practical tolerances, the computational work gains with respect to the SSA method, when using the control variate, are of a factor of 500. Without using the control variate, the computational gains are also substantial.
Figure 7.6: Left: Predicted work (runtime) versus the estimated error bound, with 95% confidence intervals, for the simple stiff model, with and without using the control variate at level 0, as described in Section 7.4. Right: Details of the ensemble run of the phase II algorithm, using the control variate (third column) and without using the control variate (fourth column); a few representative rows:

    TOL        L*    Ŵ_MLcv/Ŵ_SSA     Ŵ_ML/Ŵ_SSA
    3.13e-03   1.0   0.002 ±0.0004    0.03 ±0.001
    1.56e-03   1.0   0.003 ±0.0004    0.04 ±0.001
    7.81e-04   1.0   0.003 ±0.0010    0.04 ±0.002

As an example, the fifth row of the table tells us that, for a tolerance TOL = 1.95·10⁻⁴, 2 levels are needed on average. The work of the multilevel mixed method using the control variate at level 0 is, on average, 1% of the work of the SSA. When not using the control variate, it is 9%. Confidence intervals at 95% are also provided.
7.6 Conclusions
Acknowledgments
The research reported here was supported by King Abdullah University of Science
and Technology (KAUST). The authors are members of the KAUST SRI Center
for Uncertainty Quantification at the Computer, Electrical and Mathematical Sci-
ences & Engineering Division at King Abdullah University of Science and Technology
(KAUST).
Appendix
Algorithm 26 Coupled mixed path. Inputs: the initial state, X(0), the final time, T, the propensity functions, (a_j)_{j=1}^J, the stoichiometric vectors, (ν_j)_{j=1}^J, and two time meshes: a coarser one, (t_i)_{i=0}^N, such that t_N = T, and a finer one, (s_j)_{j=0}^{N′}, such that s₀ = t₀, s_{N′} = t_N, and (t_i)_{i=0}^N ⊂ (s_j)_{j=0}^{N′}. Outputs: a sequence of states evaluated at the coarse grid, (X̄(t_k))_{k=0}^K ⊂ Z₊^d, such that t_K ≤ T, a sequence of states evaluated at the fine grid, the number of times the tau-leap method was successfully applied at the fine level and at the coarse level, and the number of exact steps at the fine level and at the coarse level. For the sake of simplicity, we omit sentences involving the recording of current state variables, the counting of the number of steps, the checking of whether the path jumps out of the lattice, the updating of the current split, S, and the return sentence.
1: t ← t₀; X̄ ← X(0); X̿ ← X(0)
2: t̄ ← next grid point in (t_i)_{i=0}^N larger than t
3: (H̄, R̄_TL, R̄_MNRM, ā) ← Algorithm 27 with (X̄, t, t̄, T, δ̄)
4: t̿ ← next grid point in (s_j)_{j=0}^{N′} larger than t
5: (H̿, R̿_TL, R̿_MNRM, ā̿) ← Algorithm 27 with (X̿, t, t̿, T, δ̿)
6: while t < T do
7:   H ← min{H̄, H̿}
8:   (B1, B2, B3, B4) ← split building blocks from (R̄_TL, R̄_MNRM, R̿_TL, R̿_MNRM)
9:   Algorithm 28 (compute state changes due to block B1)
10:  Initialize internal clocks R, P if needed (see [5, 6])
11:  ΔX̄ ← 0; ΔX̿ ← 0
12:  for B = B2, B3, B4 do
13:    t_r ← t
14:    X̄_r ← X̄; X̿_r ← X̿
15:    while t_r < H do
16:      update P_{j∈B}
17:      switch B
18:        case B2
19:          d̄ ← ā_{j∈B}
20:          d̿ ← a_{j∈B}(X̿)
21:          τ_r ← Compute the Chernoff tau-leap step size using (X̄_r, ā_{j∈B}, H, δ̄)
22:        end case
23:        case B3
24:          d̄ ← a_{j∈B}(X̄)
25:          d̿ ← ā̿_{j∈B}
26:          τ_r ← Compute the Chernoff tau-leap step size using (X̿_r, ā̿_{j∈B}, H, δ̿)
27:        end case
28:        case B4
29:          d̄ ← a_{j∈B}(X̄)
30:          d̿ ← a_{j∈B}(X̿)
31:          τ_r ← ∞
32:        end case
33:      end switch
34:      A₁ ← min(d̄, d̿)
35:      A₂ ← d̄ − A₁; A₃ ← d̿ − A₁
36:      H_r ← min{H, t_r + τ_r}
37:      (t_r, X̄_r, X̿_r, R_{j∈B}, P_{j∈B}) ← Algorithm 29 with (t_r, H_r, X̄_r, X̿_r, R_{j∈B}, P_{j∈B}, A)
38:    end while
39:    ΔX̄ ← ΔX̄ + (X̄_r − X̄); ΔX̿ ← ΔX̿ + (X̿_r − X̿)
40:  end for
41:  X̄ ← X̄ + ΔX̄_{B1} + ΔX̄; X̿ ← X̿ + ΔX̿_{B1} + ΔX̿, where ΔX̄_{B1}, ΔX̿_{B1} are the increments computed in step 9
42:  t ← H
43:  if t < T then
44:    if H̄ ≤ H̿ then
45:      t̄ ← next grid point in (t_i)_{i=0}^N larger than t
46:      (H̄, R̄_TL, R̄_MNRM, ā) ← Algorithm 27 with (X̄, t, t̄, T, δ̄)
47:    end if
48:    if H̿ ≤ H̄ then
49:      t̿ ← next grid point in (s_j)_{j=0}^{N′} larger than t
50:      (H̿, R̿_TL, R̿_MNRM, ā̿) ← Algorithm 27 with (X̿, t, t̿, T, δ̿)
51:    end if
52:  end if
53: end while
Algorithm 27 Compute the next time horizon. Inputs: the current state, X̃, the current time, t, the next grid point, t̃, the final time, T, the one-step exit probability bound, δ̃, and the propensity functions, a = (a_j)_{j=1}^J. Outputs: the next horizon, H̃, the set of reaction channels to which the tau-leap method should be applied, R̃_TL, the set of reaction channels to which the MNRM should be applied, R̃_MNRM, and the current propensity values, ã.
1: ã ← a(X̃)
2: (R̃_TL, R̃_MNRM) ← Algorithm 24 with (X̃, t, (a_j(X̃))_{j=1}^J, δ̃, t̃, S)
3: if R̃_TL ≠ ∅ then
4:   H̃ ← min{t̃, t + τ(R̃_TL), T}
5: else
6:   H̃ ← min{t + τ(R̃_MNRM), T}
7: end if
8: return (H̃, R̃_TL, R̃_MNRM, ã)
Algorithm 29 The auxiliary function used in Algorithm 26. Inputs: the current time, t, the current time horizon, T̿, the current system states at the coarser and finer levels, X̄ and X̿, respectively, the internal clocks, R and P, the values, A, and the current building block, B. Outputs: the updated time, t, the updated system states, X̄ and X̿, and the updated internal clocks, R_i, P_i, i = 1, 2, 3.
1: Δt_i ← (P_i − R_i)/A_i, for i = 1, 2, 3
2: Δ ← min_i {Δt_i}
3: μ ← argmin_i {Δt_i}
4: if t + Δ > T̿ then
5:   R ← R + A·(T̿ − t)
6:   t ← T̿
7: else
8:   update X̄ and X̿ using ν_{j∈B}
9:   R ← R + A·Δ
10:  r ← uniform(0, 1)
11:  P_μ ← P_μ + log(1/r)
12:  t ← t + Δ
13: end if
14: return (t, X̄, X̿, R, P)
REFERENCES
[1] D. T. Gillespie, “A general method for numerically simulating the stochastic
time evolution of coupled chemical reactions,” Journal of Computational Physics,
vol. 22, pp. 403–434, 1976.
[3] D. F. Anderson, “A modified next reaction method for simulating chemical sys-
tems with time dependent propensities and delays,” The Journal of Chemical
Physics, vol. 127, no. 21, 2007.
[6] ——, "Multilevel hybrid Chernoff tau-leap," accepted for publication in BIT
Numerical Mathematics, 2015.
[7] L. Harris and P. Clancy, "A 'partitioned leaping' approach for multiscale modeling of chemical reaction dynamics," J. Chem. Phys., vol. 125, 2006.
[8] J. Puchalka and A. Kierzek, “Bridging the gap between stochastic and determin-
istic regimes in the kinetic simulations of the biochemical reaction networks,”
Biophysical Society Biophysical Journal, vol. 86, no. 3, pp. 1357–1372, 2004.
[9] E. Haseltine and J. Rawlings, “Approximate simulation of coupled fast and slow
reactions for stochastic chemical kinetics,” J. Chem. Phys, vol. 117, no. 15, 2002.
[12] Y. Cao, D. T. Gillespie, and L. R. Petzold, “Efficient step size selection for
the tau-leaping simulation method,” The Journal of Chemical Physics, vol. 124,
no. 4, p. 044109, 2006.
[13] D. Anderson and D. Higham, "Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics," SIAM Multiscale Model. Simul., vol. 10, no. 1, 2012.
[16] T. Li, “Analysis of explicit tau-leaping schemes for simulating chemically reacting
systems,” Multiscale Model. Simul., vol. 6, no. 2, pp. 417–436 (electronic), 2007.
[17] J. Ahrens and U. Dieter, "Computer methods for sampling from gamma, beta, Poisson and binomial distributions," Computing, vol. 12, pp. 223–246, 1974.
Chapter 8
Abstract
Every mechanical system is naturally subjected to some kind of wear process that,
at some point, will cause failure in the system if no monitoring or treatment process
is applied. Since failures often lead to high economic costs, it is essential both to
predict and to avoid them. To achieve this, a monitoring system of the wear level
should be implemented to decrease the risk of failure. In this work, we take a first
step into the development of a multiscale indirect inference methodology for state-
dependent Markovian pure jump processes. This allows us to model the evolution of
the wear level, and to identify when the system reaches some critical level that triggers
a maintenance response. Since the likelihood function of a discretely observed pure
jump process does not have an expression that is simple enough for standard non-sampling optimization methods, we approximate this likelihood by expressions from upscaled models of the data. We use the Master Equation to assess the goodness-of-fit and to compute the distribution of the hitting time to the critical level.

A. Moraes, F. Ruggeri, P. Vilanova and R. Tempone, “Multiscale Modeling of Wear Degradation in Cylinder Liners”, SIAM Multiscale Modeling and Simulation, Vol. 12, Issue 1 (2014).
8.1 Introduction
It is well known that one of the main factors in the failure of heavy-duty diesel
engines used for marine propulsion is wear of the cylinder liner [1]. The stochastic
modeling of the wear degradation of cylinder liners is extensively treated in [2, 1, 3, 4]
and references therein. This wear process, at some point, will cause failure if no
maintenance program is utilized. An effective maintenance program is one that can
be carried out when there is some identifiable warning of the occurrence of failure.
Then, preventive maintenance can be carried out on the basis of the current condition
of the liner, generally when the maximum wear approaches a specified limit as imposed
by warranty clauses.
In this work, we aim to use a multiscale indirect inference approach for the wear
degradation problem. In our context, the term indirect inference is used in the sense
that “it is impossible to efficiently estimate the parameters of interest because of
the intractability of the likelihood function” [5]. But, instead of using a sampling-
oriented method to obtain consistent estimators, we propose the use of a multiscale
hierarchy of approximate “tractable” likelihoods. After optimizing these likelihoods,
we plug the estimated parameters into our base model and assess its quality by checking
confidence bands computed directly from the estimated base model.
The data set, $w=\{w_i\}_{i=1}^{n}$, taken from [4], consists of wear levels observed on $n=32$ cylinder liners of eight-cylinder SULZER engines, as measured by a caliper with
a precision of 0.05 mm. Warranty clauses specify that, to avoid failures, the liner
should be changed before it accumulates a wear level of 4.0 mm. Data are presented
in Figure 8.1.
Figure 8.1: Data set from [4] (observed wear process: wear [mm] versus operating time [h $\times 10^4$]). Data refer to cylinder liners used in ships of the Grimaldi Group.
As a consequence of the finite resolution of the caliper, the set of possible measurements of the cylinder wear is represented by a finite lattice on the positive real line. For that reason, we propose to model the resulting measurements of the wear
process as a Markovian pure jump process [6], which is the simplest class of pure
jump processes. This type of process can be characterized by a finite set of possible
jumps, each one having a certain intensity function (see Section 8.2.1 for details).
In this work, we propose a multiscale inference approach that gradually allows
us to estimate the number of possible jumps of the process, their amplitudes, and the
corresponding intensity functions. We depart from the simplest possible pure jump
model, i.e., the one that has only one possible jump with a linear intensity function,
and proceed with more complex models, by, for instance, adding more possible jumps
and/or more general intensity functions.
Our base model, which defines the microscopic scale, is a continuous-time Markov
pure jump process in a lattice. Since the process is observed only in a finite set of
times, i.e., it is partially observed, its likelihood function usually cannot be written in a simple closed form amenable to standard optimization procedures. Proper inference based on partially observed continuous-time Markov chains
in lattices should be based on likelihoods corresponding to non-homogeneous Poisson
processes. We refer to the chapter “Inference for Stochastic Kinetic Models” in [7]
for details on the mentioned likelihoods, and Markov Chain Monte Carlo (MCMC) techniques for inference on pure jump processes. In [7], the inference procedure is intended only for linear rates and is based on MCMC and exact path-simulation techniques such as the Stochastic Simulation Algorithm (SSA) of [8]. As a consequence, this inference methodology may be very computationally demanding and does not address the problem for general nonlinear rates. To the authors' knowledge, there are no computationally low-cost sampling schemes from non-homogeneous Poissonian
bridges.
For that reason, the idea is to consider upscaled auxiliary versions for modeling
our data, from which we can obtain simpler likelihood functions. Once we estimate
the parameters from this approximate likelihood, we plug them into the base model
and look at how well it fits the data. We use confidence bands, computed from the
Master Equation [9] at the microscopic level, as a visual goodness-of-fit criterion. The inference methodology, motivated by the introduction of several temporal scales, is as follows; the first two categories correspond to macroscale indirect inference, whereas the third corresponds to mesoscale indirect inference, and the last corresponds to direct inference at the microscale level:
Perturbed Mean Field We first approximate the likelihood function of our base microscopic model by the likelihood corresponding to its Mean Field (macroscale reaction-rate ordinary differential equations, ODEs) whose observations are perturbed with Gaussian noise.
Langevin diffusion If the estimated microscopic model does not fit the data, we translate the inference problem to the mesoscopic scale, where inference techniques for Langevin diffusions apply. We observe that the likelihood functions based on Gaussian models for the data are more restrictive than the likelihoods based on the Langevin model, because the latter allows us to model the time evolution of multimodal distributions. When the Gaussian model is appropriate, however, it can be estimated much more quickly than the Langevin one.
Direct inference In the same way, if the parameters estimated in a suitable family of mesoscale Langevin models do not fit the experimental data, we should make a direct inference at the microscopic level. It is worth mentioning that inference procedures at the mesoscale and microscale are much more involved from the computational point of view. For these two scales, the likelihood functions in general cannot be written in closed form, and for that reason, they have to be approximated and optimized by sampling procedures [10].
In the literature there are other approaches for the wear inference problem based
on pure jump processes that do not deal directly with the continuous time model.
For example, in [1], in a chapter entitled “Stochastic Processes for Modeling the Wear
of Marine Engine Cylinder Liners”, the authors modeled the wear process as a time-continuous, state-dependent Markov chain. In their methodology, they approximated the continuous nature of time by using a discrete-time Markov chain with uniform time steps and modeled its transition probabilities with a Poisson kernel depending on two parameters. The resulting fit with this approach is poor, and the authors do not proceed further in this direction. In [4], a state-dependent, time-inhomogeneous Markov chain is proposed. The authors use a similar inference strategy by discretizing time and space. These approximation steps and their associated errors are not discussed, and the computational cost of this approximate inference method explodes as one refines the time and/or space discretizations.
Similar observations can be made for diffusion processes, where the lattice is approximated by a continuum of states and the Poissonian noise is replaced by Gaussian noise. Sampling bridges from diffusions is still an ongoing research area; see [11] and the references therein.
Our main contribution is twofold. First, we offer a novel approach to the problem of modeling the wear degradation of cylinder liners by using a continuous-time Markov chain in a lattice determined by caliper precision. Second, we take a first step
toward a general methodology for a multiscale indirect inference approach. It is a
first step because, for this particular problem, we did not need to use the mesoscopic
or microscopic levels of approximation.
The goals of this work are: (i) to estimate the parameters of the wear process, modeled as a Markovian pure jump process, and (ii) to obtain the distribution of the hitting time to the critical warranty level.
The remainder of this paper is organized as follows. Section 8.2 presents the base
model, its upscalings and the system of ODEs for the first two moments of the base
model. In Section 8.3, we present the model that actually fits the data. Next, in
Section 8.4, we derive the likelihood functions for the macroscopic scales. Later, in
Section 8.5, we develop a method for computing the hitting time to the critical level
based on the solution of the Master Equation. Section 8.6 contains the results of the inference process and the distribution of the hitting time for the fitted model. Finally, Section 8.7 offers the conclusions.
8.2 Methodology
In this section, we first present the elements of the base microscopic model and
its infinitesimal generator. Then, we derive the macroscopic Mean Field and the
mesoscopic Langevin approximations. Finally, we show how to derive a system of
ODEs for the time evolution of the first two moments of the base model. The Mean
Field and the second-order expansion are used in Section 8.4 as the basis of the
indirect inference method.
Example 8.2.1 (Simple decay model). The Simple Decay model is a pure jump process $X$ in the lattice $S=\delta\mathbb{N}$, where $\delta$ is a positive real number. The system starts from $x_0\in S$ at time $t=0$, and the only reaction allowed is $\nu=-\delta$. Its associated propensity function is $a(x;c)=cx$, where $c>0$.
The infinitesimal generator of the pure jump process $X$ is given by
$$\mathcal{L}_X(f) := \sum_j a_j(x;\theta)\,\big(f(x+\nu_j)-f(x)\big). \qquad (8.1)$$
Using a first-order Taylor expansion of $f$ in (8.1), we obtain the generator of the deterministic Mean Field process, $Z$,
$$\mathcal{L}_Z(f) := \sum_j a_j(x;\theta)\,\partial_x f(x)\,\nu_j,$$
where the $j$-th column of the matrix $\nu$ is $\nu_j$ and $a$ is a column vector with components $a_j$. The corresponding Mean Field ODE is
$$\frac{dZ(t)}{dt} = \nu\, a(Z(t);\theta). \qquad (8.2)$$
Using a second-order Taylor expansion of $f$ in (8.1), we obtain the generator of an Itô diffusion process, $Y$,
$$\mathcal{L}_Y(f) := \sum_j a_j(x;\theta)\left(\partial_x f(x)\,\nu_j + \tfrac{1}{2}\,\nu_j^{\top}\partial_x^2 f(x)\,\nu_j\right),$$
where $Y$ is the diffusion process defined by the Langevin Itô stochastic differential equation (SDE), in which $B(t)$ is an $\mathbb{R}^J$-valued Wiener process with independent components:
$$\begin{cases} dY(t) = \nu\,a(Y(t);\theta)\,dt + \nu\,\mathrm{diag}\!\left(\sqrt{a(Y(t);\theta)}\right)dB(t), & t\in\mathbb{R}_+\\ Y(0) = x_0 \in \mathbb{R}_+. \end{cases} \qquad (8.3)$$
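A path of the mesoscopic approximation (8.3) can be generated with a standard Euler–Maruyama scheme. The sketch below is generic and illustrative; the clipping at zero is our own pragmatic guard against negative propensities near the boundary, not part of the model.

```python
import math
import random

def euler_maruyama(a, nu, y0, t_end, n_steps, seed=0):
    """Euler-Maruyama discretization of the Langevin SDE (8.3):
       dY = nu . a(Y) dt + nu . diag(sqrt(a(Y))) dB.
    `a` maps the state to the list of propensities; `nu` is the list of jumps."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    sqdt = math.sqrt(dt)
    y = y0
    for _ in range(n_steps):
        props = a(y)
        # drift nu . a(y) and diffusion nu . diag(sqrt(a(y))) dB, per channel
        drift = sum(n * p for n, p in zip(nu, props))
        diffusion = sum(n * math.sqrt(max(p, 0.0)) * rng.gauss(0.0, 1.0)
                        for n, p in zip(nu, props))
        y = max(y + drift * dt + diffusion * sqdt, 0.0)  # clip at the boundary
    return y
```

For example, `euler_maruyama(lambda s: [0.63e-4 * s, 1.2e-4 * s], [-0.05, -0.20], 5.0, 6.0e4, 10000)` produces one terminal value of a two-channel linear-propensity model of the kind used later in this chapter.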
The Mean Field equations (8.2) approximate the evolution of the mean of the pure
jump process X. But, sometimes, it is desirable to have ODEs that approximate
higher-order moments as well. Here, we show how to derive a system of ODEs for the
evolution of the first two moments of a pure jump process by approximating it with
a Gaussian process. This approach is well suited for unimodal distributions of X(t)
for all times. In such a case, it has a main advantage with respect to the Langevin diffusion approach (8.3): we do not need to sample any random variables, in particular diffusion bridges, to obtain estimates of our parameters.
Direct approach: Consider the Dynkin formula [6] for the process $X$,
$$\mathbb{E}[f(X(t))] = f(X(0)) + \int_0^t \mathbb{E}\big[\mathcal{L}_X(f)(s)\big]\,ds. \qquad (8.4)$$
To obtain the second-order moment expansion, we simply apply formula (8.4) to $f(x)=x$ and $f(x)=x^2$. This leads to
$$\begin{cases} \mathbb{E}[X(t)] = x_0 + \displaystyle\int_0^t \mathbb{E}\Big[\sum_j a_j(X(s);\theta)\,\nu_j\Big]\,ds \\[4pt] \mathbb{E}[X^2(t)] = x_0^2 + \displaystyle\int_0^t \mathbb{E}\Big[\sum_j a_j(X(s);\theta)\big(2\nu_j X(s) + \nu_j^2\big)\Big]\,ds. \end{cases} \qquad (8.5)$$
In general, the system (8.5) is not closed, and it depends on the form of the propensity functions $a_j$. In the linear case, when $a_j(x;\theta)=g(\theta)x$, the system (8.5) is closed. We derive in Section 8.4.2 an ODE system for $\mu(t):=\mathbb{E}[X(t)]$ and $\sigma^2(t):=\mathbb{E}[(X(t)-\mu(t))^2]$.
An alternative approach: We present here an alternative way of deriving a system of ODEs that approximately describes the evolution of the first two moments of the process, $X$. The advantage of this approach is that it always gives a closed system of ODEs.
Consider a general SDE of the form
Remark 8.2.2. When the distribution of $X(t)$ is multimodal, we can extend the approach in (8.7) by approximating the distribution of $X(t)$ with a Gaussian mixture. The price to pay in this case is the increase in the dimension of the resulting ODE system.
8.3 The thickness measurement process
In this section, we define an auxiliary process, named the thickness process, that is used for modeling and inference convenience. The relation between the wear and thickness processes is simple: their sum is a constant that equals the initial thickness. Therefore, the thickness process is decreasing and takes positive values in the lattice generated by the caliper precision.

It is worth mentioning that the simple decay model described in Example 8.2.1 predicts much smaller variability than the one observed in the data, and it cannot be used as a model for the thickness process. For that reason, we propose modeling the thickness process using two reactions with linear state-dependent coefficients.
This model generates satisfactory confidence bands that we use as a goodness-of-fit
test.
Definition of the thickness process. Let $X(t)$ be the thickness process derived from the wear of the cylinder liners up to time $t$ (see [3, 4]), i.e., $X(t) = T_0 - W(t)$, where $W$ is the wear process and $T_0$ is the initial thickness. We model $X(t)$ as a sum of two simple decay processes (see Example 8.2.1) with $\delta = 0.05$ (the resolution of the measurement instrument), since one simple decay process is not enough to explain the variance of the data. The two considered intensity-jump pairs are $(a_1(x),\nu_1)=(c_1 x,\,-\delta)$ and $(a_2(x),\nu_2)=(c_2 x,\,-k\delta)$, where $k$ is a positive integer to be determined, and $c_1$ and $c_2$ are coefficients with dimension $(\text{mm}\cdot\text{hour})^{-1}$. Therefore, the probability of observing a thickness decrement in a small time interval $(t,t+dt)$ is
$$P\big(X(t+dt)=x-\delta \mid X(t)=x\big) = c_1 x\,dt + o(dt), \quad P\big(X(t+dt)=x-k\delta \mid X(t)=x\big) = c_2 x\,dt + o(dt), \qquad (8.8)$$
where the initial thickness $X(0)=T_0$ and the coefficients $c_1$, $c_2$ and $k$ are the four unknown parameters.
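The dynamics defined by the two intensity–jump pairs above can be simulated exactly with the Stochastic Simulation Algorithm (SSA). The following sketch is illustrative only: the default parameter values are placeholders of the same order of magnitude as the estimates reported in Section 8.6, not the fitted MLE.

```python
import random

def ssa_thickness(T0=5.0, delta=0.05, c1=0.6e-4, c2=1.2e-4, k=4,
                  t_end=6.0e4, seed=0):
    """Exact SSA simulation of the two-reaction thickness model:
    jumps of -delta with propensity c1*x and -k*delta with propensity c2*x."""
    rng = random.Random(seed)
    t, x = 0.0, T0
    times, states = [t], [x]
    while t < t_end and x > 0.0:
        a1, a2 = c1 * x, c2 * x      # linear propensities a_j(x) = c_j * x
        a0 = a1 + a2                 # total jump intensity
        t += rng.expovariate(a0)     # exponentially distributed waiting time
        if t >= t_end:
            break
        # select the firing reaction with probability proportional to its propensity
        x += -delta if rng.random() < a1 / a0 else -k * delta
        x = max(x, 0.0)
        times.append(t)
        states.append(x)
    return times, states
```

Each call returns one exact path of the thickness process; averaging many such paths approximates the mean that the upscaled models of Section 8.4 describe analytically.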
8.4 Likelihood functions for the thickness measurement process
In this section, we obtain the likelihood functions at the macroscopic level for the thickness process (8.8). The first step is to transform the data set so as to observe a decreasing thickness process. We define the thickness data $x=\{x_i\}_{i=1}^{n}$ as $x_i := T_0 - w_i$, where $T_0$ is an unknown parameter that we expect to be around 5.0 mm (see Section 8.6 for further details) and $w_i$ is the wear of the $i$-th datum. Observe that $x$ depends on $T_0$.
We consider two approximate models for the experimental data, x, as follows. The
first one postulates that each data point is the Mean Field of the thickness process
plus Gaussian noise with constant variance. In this case, the maximum likelihood
estimation (MLE) leads to an ordinary least squares problem. This model turns
out to be unsatisfactory for two reasons: when we consider only one reaction, it
gives a very narrow confidence band; when we consider two reactions, there is an
identifiability problem, since there is a straight line along which the maximum of the likelihood is attained. The second one is slightly more complex. It postulates that
the data are the sum of two terms that are independent realizations of two Gaussian
random variables. The moments of the first random variable evolve in time according
to a system of ODEs obtained by moment expansion. The second term is just additive
Gaussian noise with constant variance. The MLE leads to a weighted least squares
problem with a logarithmic penalization term. In this case, as we see in Section 8.6,
there is only one point at which the maximum of the likelihood is attained.
8.4.1 Mean Field approximation
Let us consider a Mean Field approximation for the thickness data, $x$; i.e., the data $x$ are modeled according to
$$x_i = Z(t_i;\theta) + \epsilon_i, \quad i=1,\dots,n, \qquad (8.9)$$
where $Z(t)$ satisfies the Mean Field ODE (8.2) and the $\epsilon_i$ are i.i.d. realizations of $N(0,\sigma_E^2)$ for $i=1,\dots,n$, where $\sigma_E>0$ is the experimental measurement error. In this work, we set $\sigma_E=\delta$ (see Remark 8.4.1).
In this case, the likelihood function can be written as
$$L(\theta;x) \propto \prod_{i=1}^{n} \exp\left\{-\frac{\big(x_i - Z(t_i;\theta)\big)^2}{2\sigma_E^2}\right\}, \qquad (8.10)$$
where $\theta=(c_1,c_2,k,T_0)$.
Now, given $k$ and $T_0$, the MLE for $(c_1,c_2)$ is the minimizer of the opposite of the log likelihood, i.e.,
$$c^*(k,T_0) := \operatorname*{arg\,min}_{c_1\ge0,\ c_2\ge0} \sum_{i=1}^{n}\big(x_i - Z(t_i;\theta)\big)^2. \qquad (8.11)$$
To obtain a system of ODEs for the time evolution of the first two moments of the process $X$, we write the system (8.5) in differential form, where the propensity functions and jumps are defined in (8.8). Since the propensity functions are linear functions of the state, we have a closed system of ODEs:
$$\begin{cases} d\mu(t) = (c_1\nu_1 + c_2\nu_2)\,\mu(t)\,dt, \\ d\sigma^2(t) = \big(2(c_1\nu_1 + c_2\nu_2)\,\sigma^2(t) + (c_1\nu_1^2 + c_2\nu_2^2)\,\mu(t)\big)\,dt, \\ (\mu(0),\sigma^2(0)) = (x_0, 0), \quad x_0\in\mathbb{R}_+,\ t\in\mathbb{R}_+. \end{cases} \qquad (8.12)$$
Based on $\mu(t)$ and $\sigma^2(t)$, we consider a Gaussian model for our data; i.e., the data $x$ are modeled according to
$$x_i = \tilde{Y}(t_i;\theta) + \epsilon_i, \quad i=1,\dots,n, \qquad (8.15)$$
where $\tilde{Y}(t)\sim N(\mu(t),\sigma^2(t))$, with mean $\mu(t)$ and variance $\sigma^2(t)$. We also consider that the $\epsilon_i$ are i.i.d. realizations of $N(0,\sigma_E^2)$ for $i=1,\dots,n$, where $\sigma_E>0$ is the experimental measurement error. Here, $\sigma_E=\delta$ (see Remark 8.4.1).
In this case, the likelihood function can be written as
$$L(\theta;x) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\big(\sigma_E^2+\sigma^2(t_i;\theta)\big)}}\, \exp\left\{-\frac{\big(x_i-\mu(t_i;\theta)\big)^2}{2\big(\sigma_E^2+\sigma^2(t_i;\theta)\big)}\right\}, \qquad (8.16)$$
where $\theta=(c_1,c_2,k,T_0)$.
Now, the MLE for $(c_1,c_2)$, for fixed $k$ and $T_0$, is the minimizer of the opposite of the log likelihood,
$$c^*(k,T_0) := \operatorname*{arg\,min}_{c_1\ge0,\ c_2\ge0} \sum_{i=1}^{n}\left\{\frac{\big(x_i-\mu(t_i;\theta)\big)^2}{\sigma_E^2+\sigma^2(t_i;\theta)} + \log\big(\sigma_E^2+\sigma^2(t_i;\theta)\big)\right\}. \qquad (8.17)$$
Finally, we determine the appropriate values of $k$ and $T_0$ by analyzing the sequence $\{c^*(k,T_0)\}_{k\ge2}$ for different values of $T_0$. In Section 8.6, we see that, for our data set $w$, the appropriate values of $k$ and $T_0$ are 4 and 5.0 mm, respectively.
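Objective (8.17) can be minimized with any nonlinear least squares routine. The self-contained sketch below uses the closed-form moments of the linear model and a crude grid search standing in for a proper constrained optimizer; all numerical values are illustrative.

```python
import math

DELTA, K, T0 = 0.05, 4, 5.0
SIG_E = DELTA                     # sigma_E = delta, as in the text

def mean_var(t, c1, c2):
    # Closed-form solution of the moment system (8.12) for the model (8.8).
    nu1, nu2 = -DELTA, -K * DELTA
    lam = c1 * nu1 + c2 * nu2
    q = c1 * nu1 ** 2 + c2 * nu2 ** 2
    mu = T0 * math.exp(lam * t)
    var = q * T0 / (-lam) * (math.exp(lam * t) - math.exp(2.0 * lam * t))
    return mu, var

def objective(data, c1, c2):
    # Weighted least squares with a logarithmic penalty, Eq. (8.17).
    total = 0.0
    for t, x in data:
        mu, var = mean_var(t, c1, c2)
        s2 = SIG_E ** 2 + var
        total += (x - mu) ** 2 / s2 + math.log(s2)
    return total

def grid_mle(data, c1_grid, c2_grid):
    # Crude grid search standing in for a proper constrained optimizer.
    return min(((objective(data, c1, c2), (c1, c2))
                for c1 in c1_grid for c2 in c2_grid))[1]
```

On noiseless synthetic data generated from the model itself, the search recovers the generating pair when it lies on the grid, illustrating that, unlike (8.11), objective (8.17) separates $c_1$ from $c_2$.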
Remark 8.4.1. Since the precision of the caliper is $\delta$, if we assume that the measurement errors are normally distributed, then the interval $\pm\delta/2$ is approximately $\pm3\sigma_E$ wide, and therefore we could set $\sigma_E=\delta/6$. Numerical experiments show that our inferences are essentially the same for either choice of $\sigma_E$.
Remark 8.4.2. From the Langevin equation (8.3), if we define $\alpha(y) := \nu\,a(y;\theta)$ and $\beta(y) := \nu\,\mathrm{diag}\big(\sqrt{a(y;\theta)}\big)$, we again obtain the system in (8.12).
Remark 8.4.3. The wear process, by its physical nature, is increasing and bounded; therefore, the thickness process should be decreasing and bounded from below. Thus, we expect that the mean of the thickness process, $\mu(t)$, defined in (8.13), decays to zero. Regarding the variance, $\sigma^2(t)$, defined in (8.14), it should start from zero at time zero, increase, and then return to zero again.
8.5 Hitting time to the critical level
In this section, we address the problem of computing the distribution of the time at which the wear attains a certain critical value, $L$; i.e., the hitting time to $L$. Let $\tau_L$ be the first time that the wear process, $W$, is greater than or equal to the critical level, $L$,
$$\tau_L := \inf\{t\in\mathbb{R}_+ : W(t)\ge L\}.$$
This is exactly the first time that the thickness process, $X$, is less than or equal to $B = T_0 - L$, where $T_0$ is the initial thickness.
We have that $F_{\tau_B;\theta}(t) := P(X(t)\le B \mid \theta) = \sum_{x\le B} p_x(t;\theta)$, where $p_x(t;\theta)$ is the probability that $X(t)=x$, given the value of the parameter vector $\theta$. We know that $p_x(t;\theta)$ satisfies a system of ODEs named the Master Equation (ME) (see [13, 14, 9]). In our setting, the ME is given by
$$\begin{cases} \dfrac{dp_x(t;\theta)}{dt} = \displaystyle\sum_j p_{x-\nu_j}(t;\theta)\,a_j(x-\nu_j;\theta) - p_x(t;\theta)\sum_j a_j(x;\theta), & t\in\mathbb{R}_+\\[4pt] p_x(0;\theta) = \mathbf{1}_{x=x_0}. \end{cases} \qquad (8.18)$$
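Since the thickness process is monotone decreasing, the CDF of $\tau_B$ is simply $F(t)=P(X(t)\le B)$, which can be read off the solution of the ME (8.18). The sketch below integrates the ME with forward Euler on the truncated lattice $\{0,\delta,\dots,T_0\}$; the rate constants are illustrative stand-ins for the fitted values.

```python
DELTA, K, T0 = 0.05, 4, 5.0
C1, C2 = 0.63e-4, 1.2e-4          # illustrative rate constants
N = round(T0 / DELTA)             # lattice: thickness x = i*DELTA, i = 0..N

def me_step(p, dt):
    """One forward-Euler step of the ME (8.18) for the two-reaction model.
    A propensity is set to zero whenever its jump would leave the lattice."""
    new = p[:]
    for i in range(N + 1):
        x = i * DELTA
        for c, jump in ((C1, 1), (C2, K)):   # jumps of -DELTA and -K*DELTA
            if i - jump >= 0:
                flux = c * x * p[i] * dt
                new[i] -= flux               # probability leaving state i
                new[i - jump] += flux        # ... arriving at state i - jump
    return new

def hitting_cdf(B, t_end, dt=20.0):
    """CDF of tau_B = inf{t : X(t) <= B}; since X is decreasing,
    F(t) = P(X(t) <= B) = sum of p_x(t) over lattice states x <= B."""
    p = [0.0] * (N + 1)
    p[N] = 1.0                               # X(0) = T0 with probability one
    i_B = int(round(B / DELTA))
    ts, cdf, t = [0.0], [0.0], 0.0
    while t < t_end:
        p = me_step(p, dt)
        t += dt
        ts.append(t)
        cdf.append(sum(p[: i_B + 1]))
    return ts, cdf
```

For example, `hitting_cdf(1.0, 1.2e5)` approximates the CDF of the hitting time to $B=T_0-L$ for $L=4$, the quantity discussed in Section 8.6.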
8.5.1 The conditional residual reliability function
Suppose that we know that the wear process, $W$, is at level $w_0$ at time $t_0\ge0$. Assume that there exists a critical stopping level, $w_{\max}>w_0$, that determines the residual lifetime $\tau_{\max}-t_0$. For $t>0$, the residual lifetime is greater than $t$ if and only if $W(t_0+t)<w_{\max}$. Therefore, the conditional residual reliability function, defined as the conditional probability
$$R(t;t_0,w_0) := P\big(W(t_0+t)<w_{\max} \mid W(t_0)=w_0\big),$$
can be written, taking into account the relation between the wear and the thickness processes, as $P\big(X(t;T_0-w_0) > T_0-w_{\max}\big)$, where $X(\cdot\,;x_0)$ is the thickness process starting from $x_0$.
8.6 Numerical Results
As mentioned at the beginning of Section 8.3, simple decay models (see Example 8.2.1) do not fit the wear data, $w=\{w_i\}_{i=1}^{n}$, since they produce very narrow confidence bands, like the dashed blue one shown in the left panel of Figure 8.3. In fact, we modeled $a(x;\theta)$ as $\sum_{j=1}^{J} c_j x^j$ for $J=1,2,3$; in each case, the only nontrivial coefficient was $c_1$. It is important to notice that all the confidence bands are computed using the ME (8.18).
Consider the pure jump process defined in (8.8). For this process, we have to estimate the values of $c_1$, $c_2$, $k$ and $T_0$. Figure 8.1 shows, in the left panel, the contour plot of the least squares function (8.11), associated with the likelihood function defined in (8.10), for $k=4$ (for other values of $k$, we obtain the same results) and $T_0=5.0$. We can observe an identifiability problem, since the maximum of the likelihood function is attained along a straight line; see the left panel of Figure 8.1. By varying the values of $c_1$ and $c_2$ in the minimum level set of the least squares function, we obtain a family of confidence bands. For $c_2=0$ (one-reaction model), or small values of $c_2$, the confidence band is very narrow; see the right panel of Figure 8.1. At the other extreme, when $c_1$ is positive but close to zero, we obtain satisfactorily wide confidence bands, shown in the right panel of Figure 8.2.

To properly identify the values of $c_1$ and $c_2$ for each integer $k\ge2$ and $T_0\in S$, we use model 2, defined in (8.15), for the thickness data $x$. Figure 8.2 shows, in the left panel, the contour plot of the least squares function (8.17), associated with the likelihood function defined in (8.16), for $k=4$ and $T_0=5.0$. Now, we are in a better situation regarding identifiability. Conditional on those values of $k$ and $T_0$, the MLE for $(c_1^*, c_2^*)$ is given by $(0.63\cdot10^{-4},\ 1.2\cdot10^{-4})$. In the right panel, the corresponding 90% confidence band is shown, which is very similar to the one obtained in [4], but we use a more parsimonious model for the wearing process.
Figure 8.1: Left panel: residuals of the opposite of the loglikelihood of (8.10) for $k=4$. There is an identifiability problem for the parameters $c_1$ and $c_2$: for each pair in the minimizing set, we obtain a different confidence band. Right panel: wear data and the 90% confidence band under model 1, defined in (8.9), for positive but small $c_2$. The confidence band becomes narrower as $c_1$ increases.

Figure 8.2: Left panel: residuals of the minus loglikelihood (8.16) for $k=4$. Model 2, defined in (8.15), for the thickness data $x$ produces a likelihood function with a unique global maximum. The MLE is $(c_1^*, c_2^*) = (0.63\cdot10^{-4},\ 1.2\cdot10^{-4})$. Right panel: the 90% confidence band.

Figure 8.3 shows, in the left panel, the wear data, $w$, along with the confidence
bands computed for the data models 1 and 2. The values of $c_1$, $c_2$ and $k$ were computed using the upscaled models, but the fit is assessed using the ME, which acts directly on the microscopic base model (8.8).
Now, consider the values of the objective function defined in (8.17), evaluated at $\theta^* := (c^*(k,T_0), k, T_0)$, as a function of $k$ and $T_0$,
$$F(k,T_0) := \sum_{i=1}^{n}\left\{\frac{\big(x_i-\mu(t_i;\theta^*)\big)^2}{\sigma_E^2+\sigma^2(t_i;\theta^*)} + \log\big(\sigma_E^2+\sigma^2(t_i;\theta^*)\big)\right\}. \qquad (8.19)$$
Figure 8.3 shows in the right panel that $F(k, T_0{=}5.0\,\text{mm})$ decreases until $k=4$, where it reaches a plateau. The same holds for other values of $T_0$; moreover, $F(4,5.0)\le F(4,T_0)$ for $T_0\in S$. For that reason, $\theta^{**} := (0.63\cdot10^{-4},\ 1.2\cdot10^{-4},\ 4,\ 5.0)$ is the MLE for our model. As a consequence, we identify two types of jumps, one with amplitude $\delta$ and the other with amplitude $4\delta$.
Figure 8.4 shows the evolution in time of the probability mass function defined in (8.18), $p_x(t;\theta^{**})$, which is the solution of an ODE system. It looks like the typical surface obtained from Fokker–Planck equations for diffusions, but this is because we are considering a fine lattice, $S=\delta\mathbb{N}$, with $\delta=0.05$. We see that it departs at time $t=0$ from a point mass concentrated at the initial thickness $T_0$, and it diffuses into a unimodal bell-shaped distribution. In the domain, we plotted 100 exact simulated paths and their average (see [15]).
In Section 8.5, we defined the hitting time to the critical level $L$. Let $L=4$, as specified in the warranty clauses. Then, since $T_0=5.0$, we have $B=1$. The cumulative distribution function (CDF) and the probability density function (PDF) of the hitting time, $\tau_B$, for $B=1$, are shown in the left and right panels of Figure 8.5, respectively. The figure indicates that at around $t=30{,}000$ hours, it is advisable to start monitoring the wear.
Figure 8.6, in the left panel, shows the evolution of the Gaussian confidence intervals with the mean and variance computed from the process $X$. The functions $\mu(t)$ and $\sigma^2(t)$ are defined in (8.12). In the right panel of Figure 8.6, we see that the confidence band computed from the ME (8.18) does not contain any negative value.
Figure 8.3: Left panel: the exact 90% confidence band from the ME (8.18). Right panel: plot of $F(k)$ defined in (8.19). $F(k)$ decreases until $k=4$, where it reaches a plateau.

Figure 8.4: Solution of the ME (8.18) and 100 exact simulated paths [15].

In Figure 8.7, we show, in the left panel, the QQ-plot of the normalized thickness data, $z=\{z_i\}_{i=1}^{n}$, defined by $z_i = (x_i-\mu(t_i))/\sqrt{\sigma^2(t_i)+\sigma_E^2}$, where the thickness data $\{x_i\}_{i=1}^{n}$ are defined in Section 8.4. The figure suggests that there is good agreement
between z and the standard Gaussian distribution. In the right panel of Figure 8.7, we
show the percentage histogram and a kernel density estimation of z. The p-value of
the Shapiro-Wilk test is 0.68. We therefore cannot reject Gaussianity. This analysis
strongly supports the use we made of model 2, defined in (8.15), for the thickness
data, $x$.

Figure 8.5: Left panel: CDF of the hitting time for $B=1$. Right panel: PDF of the hitting time to the critical level.

Figure 8.6: Data and the 90% Gaussian confidence band (left and right panels), comparing the 90% Gaussian confidence intervals with the 90% ME confidence band.
Figure 8.8 shows the behavior of the conditional residual reliability function, $R(t;0,w_0)$ (see Section 8.5.1), for some values of $w_0$. In this case, we set $w_{\max}=4$. As expected, for a fixed residual lifetime $t$, $R(t;0,w_0)$ is a decreasing function of $w_0$.
Figure 8.7: Left panel: QQ-plot of the normalized thickness data, $z_i=(x_i-\mu(t_i))/\sqrt{\sigma^2(t_i)+\sigma_E^2}$. This plot suggests that there is good agreement between $z$ and the standard Gaussian distribution. Right panel: percentage histogram and a kernel density estimation of $z$. The p-value of the Shapiro-Wilk test is 0.68; we can therefore not reject Gaussianity.
Figure 8.8: The conditional residual reliability function, $R(t;0,w_0)$ (see Section 8.5.1), for $w_0=2,3,4,5$.
8.7 Conclusions
In this paper, we presented a novel approach to the problem of modeling the wear
process of cylinder liners. Since the measuring caliper has finite precision, the wear
process takes values in a lattice and therefore a pure jump process is a sensible model.
In this approach, we started by fitting one of the simplest pure jump processes, i.e., the simple decay model, and added complexity only when necessary. We found that the wear process can be modeled using only two jumps, of amplitudes $\delta$ and $4\delta$, with linear propensity functions. In contrast to the work of Giorgio et al. (2011) [4], we did not need to use age-dependent propensity functions or gamma noise. Nevertheless, our approach is fully capable of dealing with age-dependent propensities, since time plays no role in it other than through given quantities.
One of the main contributions of this work is the multiscale indirect inference
approach, where the inferences are based on upscaled models. The coefficients of
the linear propensity functions were inferred using the likelihood associated with a
Gaussian upscaled model. The mean and variance of this Gaussian process are the
solutions of a second-order moment expansion ODE system. In this way, we computed
the MLE by solving a standard nonlinear least squares problem. We observe that this method is much simpler than dealing directly with the likelihood of the pure jump process, which in general cannot be expressed in closed form and requires computationally intensive sampling techniques. We notice that, as long as the probability distribution of the pure jump process is unimodal at every time, our Gaussian inference approach is applicable and produces substantial savings in the computational work. Otherwise, the Langevin model, while more computationally demanding, is more flexible.
Thanks to the remarkable simplicity of our model, we can easily obtain the dis-
tribution of any observable of the process directly from the solution of the Master
Equation, which provides the probability distribution of the process at all times.
From this probability mass function, we easily compute the cumulative distribution
function of the hitting time to the critical value stipulated in the warranty and the
conditional residual reliability function. It is worth mentioning that we did not use
Monte Carlo simulation or any other sampling procedure.
Acknowledgments
The first, third and fourth authors are members of the SRI Center for Uncertainty
Quantification in Computational Science & Engineering at KAUST. This research
was performed when the second author visited KAUST.
REFERENCES
[1] M. Giorgio, M. Guida, and G. Pulcini, “Stochastic processes for modeling the
wear of marine engine cylinder liners,” in Statistics for Innovation, P. Erto, Ed.
Springer Milan, 2009, pp. 213–230.
[2] ——, “A wear model for assessing the reliability of cylinder liners in marine diesel engines,” IEEE Transactions on Reliability, vol. 56, no. 1, pp. 158–166, 2007.
[3] ——, “A state-dependent wear model with an application to marine engine cylin-
der liners,” Technometrics, vol. 52, no. 2, pp. 172–187, 2010.
[4] ——, “An age- and state-dependent Markov model for degradation processes,”
IIE Transactions, vol. 43, no. 9, pp. 621–632, 2011.
[9] N. Van Kampen, Stochastic Processes in Physics and Chemistry, 3rd ed. North
Holland, 2007.
[10] M. Bladt and M. Sørensen, “Statistical inference for discretely observed Markov
jump processes,” Journal of the Royal Statistical Society Series B, vol. 67, no. 3,
pp. 395–410, 2005.
[11] C. Bayer and J. Schoenmakers, “Simulation of conditional diffusions via forward-
reverse stochastic representations,” 2013.
[13] C. Gardiner, Stochastic Methods: A Handbook for the Natural and Social Sciences
(Springer Series in Synergetics). Springer, 2010.
[14] H. Risken and T. Frank, The Fokker-Planck Equation: Methods of Solution and
Applications (Springer Series in Synergetics). Springer, 1996.
Chapter 9
Abstract
9.1 Introduction
Remark 9.1.1. The partially observed case can in principle also be treated by a
variant of the FREM algorithm based on [2], Corollary 3.8.
For further convenience, we organize the information in our data set, D, as a finite
collection,
such that, for each $k$, $I_k := [s_k, t_k]$ is the time interval determined by two consecutive
observational points $s_k$ and $t_k$, where the states $x(s_k)$ and $x(t_k)$ have been observed,
respectively. Notice that $D$ collects all the data corresponding to the $M$ observed
paths of the process $X$. For that reason, it is possible to have $[s_k, t_k] = [s_{k'}, t_{k'}]$ for
$k \neq k'$, for instance, in the case of repeated measurements.
For technical reasons, we need to define a sequence of intermediate times, $(t^*_k)_{k=1}^{K}$;

where $a_j : \mathbb{R}^d \to [0, \infty)$ are known as propensity functions. We set $a_j(x) = 0$ for those
$x$ such that $x + \nu_j \notin \mathbb{Z}^d_+$. We assume that the initial condition of $X$, $X(0) = x_0 \in \mathbb{Z}^d_+$,
is deterministic and known.
Example 9.1.2 (Simple decay model). Consider the reaction $X \xrightarrow{c} \emptyset$, where one
particle is consumed. In this case, the state vector $X(t)$ is in $\mathbb{Z}_+$, where $X$ denotes
the number of particles in the system. The change vector for this reaction is $\nu = -1$. The
propensity function in this case could be, for example, $a(X) = c\,X$, where $c > 0$.
The infinitesimal generator of the process $X$ is given by
\[
L_X(f)(x) := \sum_{j} a_j(x)\big(f(x + \nu_j) - f(x)\big). \tag{9.3}
\]
Dynkin's formula,
\[
\mathbb{E}[f(X(t))] = f(X(0)) + \int_0^t \mathbb{E}\big[L_X(f)(X(s))\big]\,ds, \tag{9.4}
\]
can be used to obtain integral equations describing the time evolution of any observable
of the process $X$. In particular, taking the canonical projections $f_i(x) = x_i$, we
obtain a system of equations for $\mathbb{E}[X_i(t)]$,
\[
\mathbb{E}[X_i(t)] = x_{0,i} + \int_0^t \sum_j \mathbb{E}[a_j(X(s))]\,\nu_{j,i}\,ds.
\]
If all the propensity functions $a_j$ are affine functions of the state, then this system
of equations forms a closed system of ODEs. In general, some propensity functions
may not depend on the coordinates of $x$ in an affine way, and for that reason, the
integral equations for $\mathbb{E}[X_i(t)]$ obtained from the Dynkin formula depend on higher
moments of $X$. This can be treated by moment-closure techniques [13, 14] or by
taking a different approach: using a formal first-order Taylor expansion of $f$ in (9.3),
we obtain the generator
\[
L_Z(f)(x) := \sum_j a_j(x)\,\partial_x f(x)\,\nu_j,
\]
which corresponds to the reaction-rate ODEs (also known as the mean-field ODEs)
\[
\begin{cases}
dZ(t) = \nu\, a(Z(t))\,dt, & t \in \mathbb{R}_+,\\
Z(0) = x_0,
\end{cases} \tag{9.5}
\]
where the $j$-th column of the matrix $\nu$ is $\nu_j$ and $a$ is a column vector with components
$a_j$.
This derivation motivates the use of Z(t) as an approximation of E [X(t)] in the
phase I of our FREM algorithm.
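As a concrete illustration, the reaction-rate ODE (9.5) for the simple decay model of Example 9.1.2 can be integrated in a few lines. The following is a minimal sketch (our own illustrative code, with forward Euler as the integrator; the function name is ours, not part of the FREM implementation):

```python
import math

# Mean-field (reaction-rate) ODE dZ/dt = nu * a(Z) for the simple decay model
# of Example 9.1.2: one channel, nu = -1, a(z) = c*z, hence dZ/dt = -c*Z.
def mean_field_decay(x0, c, T, n_steps=10000):
    """Integrate dZ/dt = -c*Z on [0, T] with forward Euler."""
    dt = T / n_steps
    z = float(x0)
    for _ in range(n_steps):
        z += dt * (-c * z)  # Euler step: Z_{n+1} = Z_n + dt * nu * a(Z_n)
    return z

# For this model the exact solution is Z(t) = x0 * exp(-c*t), which also
# equals E[X(t)] because the propensity is affine in the state.
z_approx = mean_field_decay(x0=100.0, c=1.0, T=1.0)
z_exact = 100.0 * math.exp(-1.0)
```

For affine propensities such as this one, $Z(t)$ coincides with $\mathbb{E}[X(t)]$, which is what makes it a cheap seed for phase I.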
\[
j = \min\Big\{ k \in \{1, \dots, J\} : \sum_{i=1}^{k} a_i(x) > U_1\, a_0(x) \Big\},
\qquad
\tau_{\min} = -\,a_0(x)^{-1} \ln(U_2),
\]
where $a_0(x) := \sum_{j=1}^{J} a_j(x)$. The system remains in the state $x$ until the time $t + \tau_{\min}$;
then it jumps, $X(t + \tau_{\min}) = x + \nu_j$. In this way, we can simulate a full path of the
process $X$.
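The selection step above is the core of the Stochastic Simulation Algorithm (SSA). A minimal sketch for the decay model of Example 9.1.2 (our own illustrative code, not the thesis implementation) could read:

```python
import math
import random

def ssa_path(x0, propensities, stoich, T, seed=0):
    """Simulate one exact SSA path on [0, T].
    propensities: list of functions a_j(x); stoich: list of state changes nu_j.
    Returns the jump times and the states of the piecewise-constant path."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        a = [aj(x) for aj in propensities]
        a0 = sum(a)
        if a0 <= 0.0:  # absorbing state: no reaction can fire
            break
        tau = -math.log(1.0 - rng.random()) / a0  # exponential holding time
        if t + tau > T:
            break
        u = rng.random() * a0  # select the reaction channel j
        acc, j = 0.0, 0
        for j, aj_val in enumerate(a):
            acc += aj_val
            if acc > u:
                break
        t += tau
        x += stoich[j]
        times.append(t)
        states.append(x)
    return times, states

# Simple decay model: X -> 0 with a(x) = c*x and nu = -1.
times, states = ssa_path(10, [lambda x: 1.0 * x], [-1], T=5.0)
```

Each step draws two uniforms, $U_1$ for the channel and $U_2$ for the holding time, exactly as in the display above.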
Exact paths can be generated using more efficient algorithms, like the Modified
Next Reaction Method by Anderson [16], where only one uniform variate is needed
at each step. However, in regimes where the total propensity, $a_0(x)$, is high, approximate
path-simulation methods like the hybrid Chernoff tau-leap [17] or its multilevel
versions [18, 19] may be required.
\[
H \equiv \mathbb{E}\left[\, g(X) \mid X_0 = x,\ X_T = y \,\right],
\]
for fixed values $x, y \in \mathbb{R}^d$ and a (sufficiently regular) functional $g$ on the path-space.

\[
H = \frac{\lim_{\epsilon \to 0} \mathbb{E}\big[\, g\big(X^{(f)} \odot X^{(b)}\big)\, \kappa_\epsilon\big(X^{(f)}(t^*) - X^{(b)}(t^*)\big)\, \mathcal{Y} \,\big]}
{\lim_{\epsilon \to 0} \mathbb{E}\big[\, \kappa_\epsilon\big(X^{(f)}(t^*) - X^{(b)}(t^*)\big)\, \mathcal{Y} \,\big]}. \tag{9.6}
\]
Here, $X^{(f)}$ is the solution of the original SDE (i.e., a copy of $X$) started at $X^{(f)}(0) =
x$ and solved until some time $0 < t^* < T$. $X^{(b)}$ is the time-reversal of another diffusion
process $Y$ whose dynamics are again given by an SDE (with coefficients explicitly given
in terms of the coefficients of the original SDE) started at $Y(t^*) = y$ and run until
time $T$. Hence, $X^{(b)}$ starts at $t^*$ and ends at $X^{(b)}(T) = y$. We then evaluate the
functional $g$ on the "concatenation" $X^{(f)} \odot X^{(b)}$ of the paths $X^{(f)}$ and $X^{(b)}$, which is
a path defined on the full interval $[0, T]$ by
\[
X^{(f)} \odot X^{(b)}(s) \equiv
\begin{cases}
X^{(f)}(s), & 0 \le s \le t^*,\\
X^{(b)}(s), & t^* < s \le T.
\end{cases}
\]
In this section, we derive the dynamics of the reverse paths and the expectation
formula for functionals of SRN-bridges. The derivation follows the same scheme used
in [20], that is: i) write the Master Equation; ii) manipulate the Master Equation to
obtain a backward Kolmogorov equation; and iii) derive the infinitesimal generator
of the reverse process.
Let $X$ be an SRN defined by the intensity-reaction pairs $((\nu_j, a_j(x)))_{j=1}^{J}$. Let $p(t,x,s,y)$
be its transition probability function, i.e., $p(t,x,s,y) := P\big(X(s) = y \mid X(t) = x\big)$, where
$x, y \in \mathbb{Z}^d_+$ and $0 < t < s < T$. The function $p$ satisfies the following linear system of
ODEs, known as the Master Equation [22, 23, 24]:
\[
\begin{cases}
\partial_s p(t,x,s,y) = \sum_{j=1}^{J} \big( a_j(y - \nu_j)\, p(t,x,s,y - \nu_j) - a_j(y)\, p(t,x,s,y) \big),\\[4pt]
p(t,x,t,y) = \mathbf{1}_{x=y},
\end{cases} \tag{9.7}
\]
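For the simple decay model of Example 9.1.2, the Master Equation is a finite linear ODE system that can be time-stepped directly. The following sketch (our own illustrative code, using plain forward Euler; not the thesis implementation) evolves the probability mass function from a deterministic initial state:

```python
import math

def master_equation_decay(x0, c, T, n_steps=20000):
    """Forward-Euler time stepping of the Master Equation for simple decay:
    d p_n / dt = c*(n+1)*p_{n+1} - c*n*p_n, for n = 0, ..., x0.
    The state space {0, ..., x0} is finite here, so no truncation is needed."""
    dt = T / n_steps
    p = [0.0] * (x0 + 1)
    p[x0] = 1.0  # deterministic initial condition X(0) = x0
    for _ in range(n_steps):
        q = p[:]
        for n in range(x0 + 1):
            gain = c * (n + 1) * p[n + 1] if n < x0 else 0.0
            loss = c * n * p[n]
            q[n] = p[n] + dt * (gain - loss)
        p = q
    return p

p = master_equation_decay(x0=10, c=1.0, T=1.0)
mean = sum(n * pn for n, pn in enumerate(p))
# For this model X(t) ~ Binomial(x0, e^{-c t}), so E[X(1)] = 10 * e^{-1}.
```

Note that the scheme conserves total probability mass exactly, since the gain and loss terms cancel when summed over $n$.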
Let us consider a fixed time interval $[t, T]$. For $s \in [t, T]$ and $x, y \in \mathbb{Z}^d_+$, let us
define $v(s, y) := \sum_x g(x)\, p(t,x,s,y)$, provided that the sum converges. We remark
here that $v$ cannot in general be interpreted as an expectation of $g$. Indeed, while
$\sum_y p(t,x,s,y) = 1$, the sum over $x$ could, in principle, even diverge. Hence, it is not
Let $\tilde\nu_j := -\nu_j$. By adding and subtracting the term $a_j(y + \tilde\nu_j)\,\tilde v(\tilde s, y)$, we can write
the first equation of (9.9) as:
\[
\partial_{\tilde s}\tilde v(\tilde s, y) + \sum_{j=1}^{J} \Big( a_j(y + \tilde\nu_j)\big(\tilde v(\tilde s, y + \tilde\nu_j) - \tilde v(\tilde s, y)\big) + \big(a_j(y + \tilde\nu_j) - a_j(y)\big)\,\tilde v(\tilde s, y) \Big) = 0.
\]
As a consequence, the system (9.9) can be written as
\[
\begin{cases}
\partial_{\tilde s}\tilde v(\tilde s, y) + \sum_{j=1}^{J} a_j(y + \tilde\nu_j)\big(\tilde v(\tilde s, y + \tilde\nu_j) - \tilde v(\tilde s, y)\big) + c(y)\,\tilde v(\tilde s, y) = 0,\\[4pt]
\tilde v(T, y) = g(y),
\end{cases} \tag{9.10}
\]
where $c(y) := \sum_{j=1}^{J} \big( a_j(y + \tilde\nu_j) - a_j(y) \big)$.
Let us now define $\tilde a_j(y) := a_j(y + \tilde\nu_j)$ and substitute it into (9.10). We arrive
at the following backward Kolmogorov equation [25] for the cost-to-go function $\tilde v(\tilde s, y)$,
\[
\begin{cases}
\partial_{\tilde s}\tilde v(\tilde s, y) + \sum_{j=1}^{J} \tilde a_j(y)\big(\tilde v(\tilde s, y + \tilde\nu_j) - \tilde v(\tilde s, y)\big) + c(y)\,\tilde v(\tilde s, y) = 0,\\[4pt]
\tilde v(T, y) = g(y).
\end{cases} \tag{9.11}
\]
We recognize in (9.11) the generator $L_Y(\tilde v)(\tilde s, y) := \sum_{j=1}^{J} \tilde a_j(y)\big(\tilde v(\tilde s, y + \tilde\nu_j) - \tilde v(\tilde s, y)\big)$
that defines the so-called reverse process $Y \equiv \{Y(\tilde s, \omega)\}_{t \le \tilde s \le T}$ by
or, equivalently, by
\[
\tilde v(\tilde s, y) = \mathbb{E}\left[\, g(Y(T)) \exp\left( \int_{\tilde s}^{T} c(Y(s))\,ds \right) \,\middle|\, Y(\tilde s) = y \,\right]. \tag{9.14}
\]
Let us consider a time interval $[s, t]$ and assume that we only observe the process $X$
at the end-points, i.e., that we have $X(s) = x$ and $X(t) = y$ for some observed values
$x, y \in \mathbb{Z}^d_+$. Fix an intermediate time $s < t^* < t$, which will be considered a numerical
input parameter later on. Denote by $X^{(f)}$ the process $X$ conditioned on starting at
$X^{(f)}(s) = x$ and restricted to the time domain $[s, t^*]$.
Furthermore, let $Y$ denote the reverse process constructed in (9.12) on the time
domain $[t^*, t]$ (i.e., inserting $t^*$ for $t$ and $t$ for $T$ in the above subsection) started at
$Y(t^*) = y$. As noted above, $Y$ is again an SRN with reaction channels $((-\nu_j, \tilde a_j))_{j=1}^{J}$.
For convenience, we also introduce the notation $X^{(b)}$ for the process $Y$ run backward
in time, i.e., we define $X^{(b)}(u) := Y(t^* + t - u)$ for $u \in [t^*, t]$, and notice that $X^{(b)}(t) = y$.
Recall that we aim to provide a stochastic representation, i.e., a representation
containing standard expectations only, for conditional expectations of the form
\[
H(x, y) = \mathbb{E}\big[\, \phi(X, [s, t]) \mid X(s) = x,\ X(t) = y \,\big],
\]
for a functional $\phi$ mapping $\mathbb{Z}^d_+$-valued paths to real numbers. Obviously, $\phi$ needs to be integrable
in order for $H$ to be well-defined, and we shall also assume polynomial-growth conditions
on $\phi$ and its derivatives (here we are assuming that there is a sensible smooth
extension of $\phi$ to the real domain) with respect to the jump-times of the underlying path.
Moreover, we assume that $p(s, x, t, y) > 0$. Once again, the fundamental idea of the
forward-reverse algorithm of Bayer and Schoenmakers [2] is to simulate trajectories
of $X^{(f)}$ and (independently) of $X^{(b)}$ and then look for any pairs which are "linked".
Since the state space is now discrete, we may, in principle, require exact linkage, in
the sense that we only consider pairs such that $X^{(f)}(t^*) = X^{(b)}(t^*)$. However, in
order to decrease the variance of the estimator, it may once again be advantageous
to relax this condition by introducing a kernel.
By a kernel, we understand a function $\kappa : \mathbb{Z}^d \to \mathbb{R}$ satisfying
\[
\sum_{x \in \mathbb{Z}^d} \kappa(x) = 1,
\]
\[
\sum_{x \in \mathbb{Z}^d} x^\alpha\, \kappa(x) = 0
\]
Remark 9.2.1. The Kronecker kernel $\kappa_0$ can, indeed, also be realized as $\kappa_0 = \kappa_{\epsilon_0}$ for
some $\epsilon_0 > 0$, which will depend on the base kernel $\kappa$, provided that the base kernel
has finite support.
\[
H(x, y) = \lim_{\epsilon \to 0}
\frac{\mathbb{E}\big[\, \phi\big(X^{(f)} \odot X^{(b)}, [s, t]\big)\, \kappa_\epsilon\big(X^{(f)}(t^*) - X^{(b)}(t^*)\big)\, \psi\big(X^{(b)}, [t^*, t]\big) \,\big]}
{\mathbb{E}\big[\, \kappa_\epsilon\big(X^{(f)}(t^*) - X^{(b)}(t^*)\big)\, \psi\big(X^{(b)}, [t^*, t]\big) \,\big]}, \tag{9.16}
\]
where $X^{(f)} \odot X^{(b)}$ denotes the concatenation of the paths $X^{(f)}$ and $X^{(b)}$ in the sense
defined by
\[
X^{(f)} \odot X^{(b)}(u) \equiv
\begin{cases}
X^{(f)}(u), & s \le u \le t^*,\\
X^{(b)}(u), & t^* < u \le t,
\end{cases}
\]
and
\[
\psi(Z, [a, b]) := \exp\left( \int_a^b c(Z(u))\,du \right).
\]
Remark 9.2.3. In line with Remark 9.2.1, we note that we could easily have avoided
taking limits in Theorem 9.2.2 by replacing $\kappa_\epsilon$ with $\kappa_0$ everywhere in (9.16). We note
at this stage that the Monte Carlo estimator based on (9.16) with positive $\epsilon$ will have
considerably smaller variance than the version with $\epsilon = 0$, potentially outweighing the
increased bias.
Sketch of proof of Theorem 9.2.2. For simplicity, we assume that the kernel $\kappa$ has
finite support and that the functional $\phi$ is uniformly bounded. We will prove convergence
of the numerator and the denominator in (9.16) separately. Let us, hence,
prove the more general case first, i.e., the convergence (9.17).
In the first step, we assume that $\phi(Z, [s, t])$ only depends on the values of $Z$ on a
fixed grid, say $s = t_0 < t_1 < \dots < t_n = t$, i.e.,
Then (9.17) is proved (with minor modifications) in [2], Theorem 3.4. Indeed, a closer
look at that proof reveals that only the Markovianity of $X$ is really used.
Furthermore, note that any continuous functional $\phi$ can be approximated by functionals
$\phi_n$ depending only on the values of the process on an (ever finer) finite grid
$t_0, \dots, t_n$. As, on the one side,
\[
\lim_{\epsilon \to 0} \lim_{n \to \infty} \mathbb{E}\big[\, \phi_n\big(X^{(f)} \odot X^{(b)}, [s, t]\big)\, \kappa_\epsilon\big(X^{(f)}(t^*) - X^{(b)}(t^*)\big)\, \psi\big(X^{(b)}, [t^*, t]\big) \,\big] =
\lim_{\epsilon \to 0} \mathbb{E}\big[\, \phi\big(X^{(f)} \odot X^{(b)}, [s, t]\big)\, \kappa_\epsilon\big(X^{(f)}(t^*) - X^{(b)}(t^*)\big)\, \psi\big(X^{(b)}, [t^*, t]\big) \,\big],
\]
and, on the other side,
\[
\lim_{\epsilon \to 0} \lim_{n \to \infty} \mathbb{E}\big[\, \phi_n\big(X^{(f)} \odot X^{(b)}, [s, t]\big)\, \kappa_\epsilon\big(X^{(f)}(t^*) - X^{(b)}(t^*)\big)\, \psi\big(X^{(b)}, [t^*, t]\big) \,\big] =
\lim_{n \to \infty} \lim_{\epsilon \to 0} \mathbb{E}\big[\, \phi_n\big(X^{(f)} \odot X^{(b)}, [s, t]\big)\, \kappa_\epsilon\big(X^{(f)}(t^*) - X^{(b)}(t^*)\big)\, \psi\big(X^{(b)}, [t^*, t]\big) \,\big],
\]
which follows as $\kappa_0 = \kappa_{\epsilon_0}$ for some $\epsilon_0 > 0$. In fact, it even follows in the general case
by dominated convergence.
Finally, the proof of convergence of the denominator is a special case of the proof
for the numerator, and therefore the convergence of the fraction follows from the
continuity of $(a, b) \mapsto a/b$ for $b > 0$.
9.3 The EM Algorithm for SRNs
In this section, we present the EM algorithm for SRNs, which is the main step in
computing the parameter estimates. First, we describe the EM algorithm
in general. Then, we derive the log-likelihood function for a fixed realization of the
process $X$. Finally, we present the EM algorithm for SRNs.
The EM algorithm [3, 4, 5, 6] owes its name to its two steps: Expectation and
Maximization. It is an iterative algorithm that, given an initial guess and a stopping
rule, provides an approximation for a local maximum or saddle point of the likelihood
function, $\mathrm{lik}(\theta \mid D)$. It is a data augmentation technique in the sense that the maximization
of the likelihood $\mathrm{lik}(\theta \mid D)$ is performed by treating the data $D$ as part of
a larger data set, $(D, \tilde D)$, where the complete likelihood, $\mathrm{lik}^c(\theta \mid D, \tilde D)$, is amenable to
maximization. Given an initial guess $\theta^{(0)}$, the EM algorithm maps $\theta^{(p)}$ into $\theta^{(p+1)}$ by
1. Expectation step: $Q_{\theta^{(p)}}(\theta \mid D) := \mathbb{E}_{\theta^{(p)}}\big[ \log(\mathrm{lik}^c(\theta \mid D, \tilde D)) \mid D \big]$.
2. Maximization step: $\theta^{(p+1)} := \arg\max_\theta Q_{\theta^{(p)}}(\theta \mid D)$.
Here, $\mathbb{E}_{\theta^{(p)}}[\,\cdot \mid D]$ denotes the expectation associated with the distribution of $\tilde D$ under
the parameter choice $\theta^{(p)}$, conditional on the data $D$. In many applications, the Expectation
step is computationally infeasible and $Q_{\theta^{(p)}}(\theta \mid D)$ must be approximated
by some estimate,
\[
\hat Q_{\theta^{(p)}}(\theta \mid D) := \hat{\mathbb{E}}_{\theta^{(p)}}\big[ \log(\mathrm{lik}^c(\theta \mid D, \tilde D)) \mid D \big].
\]
Remark 9.3.1 (The Monte Carlo EM). If we know how to sample a sequence of
$M$ independent variates $(\tilde D_i)_{i=1}^{M} \sim \tilde D \mid D$, with parameter $\theta^{(p)}$, then we can define the
following Monte Carlo estimator of $Q_{\theta^{(p)}}(\theta \mid D)$,
\[
\hat Q_{\theta^{(p)}}(\theta \mid D) := \frac{1}{M} \sum_{i=1}^{M} \log(\mathrm{lik}^c(\theta \mid D, \tilde D_i)).
\]
9.3.1 The Likelihood of Continuously Observed Paths
The goal of this section is to derive an expression for the likelihood of a particular
path, $(X(t, \omega_0))_{t \in [0,T]}$, of the process $X$, where $\omega_0 \in \Omega$ is a fixed realization. An
important assumption in this work is that the propensity functions $a_j$ can be written
as $a_j(x) = c_j\, g_j(x)$ for $j = 1, \dots, J$ and $x \in \mathbb{Z}^d_+$, where the $g_j$ are known functionals and the $c_j$
are considered the unknown parameters. Define $\theta := (c_1, \dots, c_J)$. Let us denote the
jump times of $(X(t, \omega_0))_{t \in [0,T]}$ in $(0, T)$ by $\xi_1, \xi_2, \dots, \xi_{N-1}$. Define $\xi_0 := 0$, $\xi_N := T$
and $\Delta\xi_i := \xi_{i+1} - \xi_i$ for $i = 0, 1, \dots, N-1$.
Let us assume that the system is in the state $x_0$ at time $0$. Then $\xi_1$ is
the time of the first reaction, or equivalently, the time that the system spends at $x_0$
(the sojourn time or holding time at state $x_0$). Let us denote by $\nu_{\xi_1}$ the reaction that
takes place at $\xi_1$; therefore, at time $\xi_1$ the system is in the state $x_1 := x_0 + \nu_{\xi_1}$.
From the SSA algorithm, it is easy to see that the probability density corresponding
to this transition is the product $a_{\nu_{\xi_1}}(x_0) \exp\big(-a_0(x_0)\,\Delta\xi_0\big)$.
By the Markov property, we see that the density of one path $((\xi_i, x_i))_{i=0}^{N-1}$ is
given by
\[
\prod_{i=1}^{N-1} a_{\nu_{\xi_i}}(x_{i-1}) \exp\big(-a_0(x_{i-1})\,\Delta\xi_{i-1}\big) \times \exp\big(-a_0(x_{N-1})\,\Delta\xi_{N-1}\big). \tag{9.18}
\]
The last factor in (9.18) is due to the fact that we know that the system remains
in the state $x_{N-1}$ in the time interval $[\xi_{N-1}, T)$. Rearranging, (9.18) equals
\[
\exp\left( -\sum_{i=0}^{N-1} a_0(x_i)\,\Delta\xi_i \right) \prod_{i=1}^{N-1} a_{\nu_{\xi_i}}(x_{i-1}). \tag{9.19}
\]
Taking logarithms, we obtain
\[
-\sum_{i=0}^{N-1} a_0(x_i)\,\Delta\xi_i + \sum_{i=1}^{N-1} \log\big(a_{\nu_{\xi_i}}(x_{i-1})\big). \tag{9.20}
\]
By the definition of $a_0$ and the assumption $a_j(x) = c_j\, g_j(x)$, we can write (9.20) as
\[
-\sum_{i=0}^{N-1} \sum_{j=1}^{J} c_j\, g_j(x_i)\,\Delta\xi_i + \sum_{i=1}^{N-1} \log\big(c_{\nu_{\xi_i}}\, g_{\nu_{\xi_i}}(x_{i-1})\big).
\]
Interchanging the order of summation and denoting by $R_{j,[0,T]}$ the number of times that the
reaction $\nu_j$ occurred in the interval $[0, T]$, we have
\[
\sum_{j=1}^{J} \left( -c_j \sum_{i=0}^{N-1} g_j(x_i)\,\Delta\xi_i + \log(c_j)\, R_{j,[0,T]} \right) + \sum_{i=1}^{N-1} \log\big(g_{\nu_{\xi_i}}(x_{i-1})\big). \tag{9.21}
\]
Observing that the last term in (9.21) does not depend on $\theta$, the complete log-likelihood
of the path $(X(t, \omega_0))_{t \in [0,T]}$ is, up to constant terms, given by
\[
\ell^c(\theta) := \sum_{j=1}^{J} \big( \log(c_j)\, R_{j,[0,T]} - c_j\, F_{j,[0,T]} \big), \quad \text{with } \theta = (c_1, \dots, c_J), \tag{9.22}
\]
where $F_{j,[0,T]} := g_j(x_0)\,\Delta\xi_0 + \dots + g_j(x_{N-1})\,\Delta\xi_{N-1} = \int_0^T g_j(X(s))\,ds$. The last equality
is due to $g_j(X(\cdot))$ being piecewise constant on the partition $\{\xi_0, \xi_1, \dots, \xi_N\}$.
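Given a fully observed path, the sufficient statistics $R_{j,[0,T]}$ and $F_{j,[0,T]}$ in (9.22), and hence the maximizer $\hat c_j = R_{j,[0,T]} / F_{j,[0,T]}$ of $\ell^c$, are straightforward to compute. A sketch (our own illustrative code; the list-based layout of the path is an assumption):

```python
def path_statistics(times, states, T, g_funcs, reaction_of_jump):
    """Compute R_j (number of firings of channel j in [0, T]) and
    F_j = integral_0^T g_j(X(s)) ds for a piecewise-constant path.
    times[i], states[i]: i-th jump time and the state entered there
    (times[0] = 0); reaction_of_jump[i]: channel firing at times[i+1]."""
    J = len(g_funcs)
    R, F = [0] * J, [0.0] * J
    for i in range(len(times)):
        t_next = times[i + 1] if i + 1 < len(times) else T
        dt = t_next - times[i]  # sojourn time spent in state states[i]
        for j, g in enumerate(g_funcs):
            F[j] += g(states[i]) * dt
        if i + 1 < len(times):
            R[reaction_of_jump[i]] += 1
    return R, F

# Two decay jumps at t = 0.5 and t = 1.2, observed on [0, 2], g(x) = x:
R, F = path_statistics([0.0, 0.5, 1.2], [2, 1, 0], 2.0,
                       [lambda x: x], [0, 0])
c_mle = R[0] / F[0]  # maximizer of (9.22): c_j = R_j / F_j
```

Here $F_1 = 2 \cdot 0.5 + 1 \cdot 0.7 + 0 \cdot 0.8 = 1.7$ and $R_1 = 2$, so $\hat c_1 = 2/1.7$.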
Now let us assume that we have a collection of intervals, $(I_k = [s_k, t_k])_{k=1}^{K} \subset [0, T]$,
on which we have continuously observed the process $(X(t, \cdot))_{t \in I_k}$. We define
the log-likelihood function as:
\[
\ell^c(\theta) := \sum_{j=1}^{J} \left( \log(c_j) \sum_{k=1}^{K} R_{j,I_k} - c_j \sum_{k=1}^{K} F_{j,I_k} \right).
\]
Remark 9.3.2. Note that $R_{j,I_k}$ and $F_{j,I_k}$ are random variables that are functions of
the full paths of $X$, but not of the discretely observed paths. Hence, they are random
given the data $D$ as defined in (9.1).
Following Section 9.3.1, for a particular value of the parameter $\theta$, say $\theta^{(p)}$, we
define
\[
Q_{\theta^{(p)}}(c_1, \dots, c_J \mid D) := \sum_{j=1}^{J} \left( \log(c_j) \sum_{k=1}^{K} \mathbb{E}_{\theta^{(p)}}\big[ R_{j,I_k} \mid D \big] - c_j \sum_{k=1}^{K} \mathbb{E}_{\theta^{(p)}}\big[ F_{j,I_k} \mid D \big] \right),
\]
where $\mathbb{E}_{\theta^{(p)}}[R_{j,I_k} \mid D] = \mathbb{E}_{\theta^{(p)}}[R_{j,I_k} \mid X(s_k) = x(s_k),\, X(t_k) = x(t_k)]$ (by the Markov
property), and analogously for $F_{j,I_k}$.
Consider now the partial derivatives of $Q_{\theta^{(p)}}(c_1, \dots, c_J \mid D)$ with respect to $c_j$,
\[
\partial_{c_j} Q_{\theta^{(p)}}(c_1, \dots, c_J \mid D) = \frac{1}{c_j} \sum_{k=1}^{K} \mathbb{E}_{\theta^{(p)}}\big[ R_{j,I_k} \mid D \big] - \sum_{k=1}^{K} \mathbb{E}_{\theta^{(p)}}\big[ F_{j,I_k} \mid D \big].
\]
Setting these derivatives to zero yields
\[
c^*_j = \frac{\sum_{k=1}^{K} \mathbb{E}_{\theta^{(p)}}\big[ R_{j,I_k} \mid D \big]}{\sum_{k=1}^{K} \mathbb{E}_{\theta^{(p)}}\big[ F_{j,I_k} \mid D \big]}, \quad j = 1, \dots, J. \tag{9.23}
\]
This is clearly the global maximization point of the function $Q_{\theta^{(p)}}(\cdot \mid D)$.
The EM algorithm for this particular problem generates a deterministic sequence
$(\theta^{(p)})_{p=1}^{+\infty}$ that starts from a deterministic initial guess $\theta^{(0)}$ provided by phase I (see
Section 9.4.1) and evolves by
\[
c_j^{(p+1)} = \frac{\sum_{k=1}^{K} \mathbb{E}_{\theta^{(p)}}\big[ R_{j,I_k} \mid D \big]}{\sum_{k=1}^{K} \mathbb{E}_{\theta^{(p)}}\big[ F_{j,I_k} \mid D \big]}, \tag{9.24}
\]
where $\theta^{(p)} = \big( c_1^{(p)}, \dots, c_J^{(p)} \big)$.
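A single iteration of (9.24) is just a ratio of summed conditional expectations. Schematically (our own illustrative code, with the conditional expectations assumed to be supplied by some external estimator):

```python
def em_update(expected_R, expected_F):
    """One EM step (9.24): for each channel j,
    c_j^(p+1) = sum_k E[R_{j,I_k} | D] / sum_k E[F_{j,I_k} | D].
    expected_R[j][k] and expected_F[j][k] hold the conditional
    expectations computed under the current parameter theta^(p)."""
    return [sum(R_j) / sum(F_j) for R_j, F_j in zip(expected_R, expected_F)]

# One channel (J = 1), two observation intervals (K = 2):
theta_next = em_update([[3.0, 5.0]], [[1.5, 2.5]])
```

Here `theta_next` equals $(3+5)/(1.5+2.5) = 2$ for the single channel; the hard part, of course, is producing the conditional expectations themselves.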
9.4 The Forward-Reverse EM (FREM) Algorithm
In this section, we present a two-phase algorithm for estimating the parameter $\theta$. Phase I
is deterministic, while phase II is stochastic. We consider the data, $D$, as
given by (9.1). The main goal of this section is to provide a Monte Carlo version of
formula (9.24).
9.4.1 Phase I
The objective of phase I is to address the key problem of finding a suitable initial
point $\theta_{II}^{(0)}$ to reduce the variance (or the computational work) of phase II, thereby
increasing (in some cases dramatically) the number of SRN-bridges obtained from the sampled
forward-reverse trajectories for all time intervals.
Let us now describe phase I. From the user-selected seed, $\theta_I^{(0)}$, we solve the following
deterministic optimization problem using an appropriate numerical iterative
method:
\[
\theta_{II}^{(0)} := \arg\min_{\theta \ge 0} \sum_{k} w_k \left\| \tilde Z^{(f)}(t^*_k; \theta) - \tilde Z^{(b)}(t^*_k; \theta) \right\|^2. \tag{9.25}
\]
k
Here Z̃ (f ) is the ODE approximation, defined by (9.5), in the interval [sk , t⇤k ], to
the SRN defined by the reaction channels, ((⌫j , aj ))Jj=1 , and the initial condition
334
x(sk ); and, Z̃ (r)
, is the ODE approximation in the interval [t⇤k , tk ], to the SRN de-
fined by the reaction channels, (( ⌫j , ãj ))Jj=1 , and by the initial condition x(tk ).
Let us recall that in Section 9.2.2, ãj (x) has been defined as aj (x ⌫j ). We define
Z̃ (b) (u, ✓):=Z̃ (r) (t⇤k +tk u, ✓) for u 2 [t⇤k , tk ]. Further, wk :=(tk sk ) 1
and k·k is the
Euclidean norm in Rd . The rationale behind this particular choice of the weight fac-
tors is based on the mitigation of the e↵ect of very large time intervals where the
evolution of the process, X, may be more uncertain. A better (but more costly)
measure would be the inverse of the maximal variance of the SRN-bridge.
Remark 9.4.1 (Alternative definition of $\theta_{II}^{(0)}$). In some cases, convergence issues
arise when solving the problem (9.25). We found it useful to solve a set of simpler
problems whose answers can be combined to provide a reasonable seed for phase
II: more precisely, we solve $K$ deterministic optimization problems, one for each time
interval $[s_k, t_k]$, all of them solved iteratively with the same seed, $\theta_I^{(0)}$. Denoting by
$\theta_k$ the solution of the $k$-th problem, we define
\[
\theta_{II}^{(0)} := \frac{\sum_k w_k\, \theta_k}{\sum_k w_k}. \tag{9.26}
\]
In our statistical estimation approach, the Monte Carlo EM algorithm uses data
(pseudo-data) generated by those forward and backward simulated paths that result
in SRN-bridges, either exact or approximate. In Figure 9.1, we illustrate this
idea for the wear example data presented in Section 9.6.2. Phase II implements
the Monte Carlo EM algorithm for SRNs.
Figure 9.1: Left: Illustration of the forward-reverse path simulation in Phase II. The
plot corresponds to a given interval for the wear data presented in Section 9.6.2. The
observed values are marked with a black circle (beginning and end of the interval).
On the y-axis we plot the thickness process $X(t)$, derived from the wear process of
the cylinder liner. Observe that, when using the Kronecker kernel, every forward path
that ends up at a certain value is joined with every backward path that ends up at
the same value. For example, this happens at the value 58, where several forward
paths end and several backward paths start. Right: Zoom near the value 58.
This phase starts with the simulation of forward and backward paths on each time
interval $I_k$, for $k = 1, \dots, K$. More specifically, given an estimate of the true parameter
$\theta$, say $\hat\theta = (\hat c_1, \hat c_2, \dots, \hat c_J)$, the first step is to simulate $M_k$ forward paths with reaction
channels $((\nu_j, \hat c_j g_j(x)))_{j=1}^{J}$ on $[s_k, t^*_k]$, all of them starting at $s_k$ from $x(s_k)$ (see Section
9.5.1 for details about the selection of $M_k$). Then, we simulate $M_k$ backward paths
with reaction channels $((-\nu_j, \hat c_j g_j(x - \nu_j)))_{j=1}^{J}$ on $[t^*_k, t_k]$, all of them starting at $t_k$
from $x(t_k)$. Let $(\tilde X^{(f)}(t^*_k, \tilde\omega_m))_{m=1}^{M_k}$ and $(\tilde X^{(b)}(t^*_k, \tilde\omega_{m'}))_{m'=1}^{M_k}$ denote the values of the
simulated forward and backward paths at the time $t^*_k$, respectively. If the intersection
of these two sets of points is nonempty, then there exists at least one $m$ and one $m'$
such that the forward and backward paths can be linked as one SRN-path connecting
the data values $x(s_k)$ and $x(t_k)$.
When the number of simulated paths $M_k$ is large enough, and an appropriate
guess of the parameter $\theta$ is used to generate those paths, then, due to the discrete
nature of our state space $\mathbb{Z}^d_+$, we expect to generate a sufficiently large number of
exact SRN-bridges to perform statistical inference. However, at early stages of the
Monte Carlo EM algorithm, our approximations to the unknown parameter $\theta$ are not
expected to provide a large number of exact SRN-bridges. In such a case, we can
use kernels to relax the notion of an exact SRN-bridge (see Section 9.2.3). Notice that
in the case of exact SRN-bridges, we are implicitly using a Kronecker kernel in the
formula (9.16), that is, $\kappa$ takes the value 1 when $\tilde X^{(f)}(t^*_k, \tilde\omega_m) = \tilde X^{(b)}(t^*_k, \tilde\omega_{m'})$ and 0
otherwise. We can relax this condition to obtain approximate SRN-bridges.
To make a computationally efficient use of kernels, we sometimes transform the
endpoints of the forward and backward paths generated in the interval $I_k$,
\[
\mathcal{X}_k := \big( \tilde X^{(f)}(t^*_k, \tilde\omega_1), \dots, \tilde X^{(f)}(t^*_k, \tilde\omega_{M_k}),\ \tilde X^{(b)}(t^*_k, \tilde\omega_{M_k+1}), \dots, \tilde X^{(b)}(t^*_k, \tilde\omega_{2M_k}) \big), \tag{9.27}
\]
into
\[
\mathcal{Y}_k := \big( \tilde Y^{(f)}(t^*_k, \tilde\omega_1), \dots, \tilde Y^{(f)}(t^*_k, \tilde\omega_{M_k}),\ \tilde Y^{(b)}(t^*_k, \tilde\omega_{M_k+1}), \dots, \tilde Y^{(b)}(t^*_k, \tilde\omega_{2M_k}) \big). \tag{9.28}
\]
\[
\kappa(\eta) := \left( \frac{3}{4} \right)^{d} \prod_{i=1}^{d} (1 - \eta_i^2)\, \mathbf{1}_{|\eta_i| \le 1}, \tag{9.29}
\]
where $\eta$ is defined as
\[
\eta \equiv \eta_k(m, m') := \tilde Y^{(f)}(t^*_k, \tilde\omega_m) - \tilde Y^{(b)}(t^*_k, \tilde\omega_{m'}). \tag{9.30}
\]
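The product kernel (9.29) is elementary to evaluate; a minimal sketch (our own code):

```python
def epanechnikov(eta):
    """Product Epanechnikov kernel (9.29):
    kappa(eta) = (3/4)^d * prod_i (1 - eta_i^2) * 1_{|eta_i| <= 1},
    where eta is a sequence of d coordinates."""
    out = 1.0
    for e in eta:
        if abs(e) > 1.0:
            return 0.0  # outside the support of the kernel
        out *= 0.75 * (1.0 - e * e)
    return out
```

For example, `epanechnikov([0.0])` returns $3/4$, and any coordinate outside $[-1, 1]$ makes the weight vanish.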
Once we have generated those paths, we record $R^{(f)}_{j,I_k}(\tilde\omega_m)$ and $F^{(f)}_{j,I_k}(\tilde\omega_m)$ for all $j = 1, 2, \dots, J$ and
$m = 1, 2, \dots, M_k$, as defined in Section 9.3.2. Analogously, we record $R^{(b)}_{j,I_k}(\tilde\omega_{m'})$ and
$F^{(b)}_{j,I_k}(\tilde\omega_{m'})$ for all $j = 1, 2, \dots, J$ and $m' = 1, 2, \dots, M_k$.
Consider the following $\kappa$-weighted averages, where $\kappa = \kappa_\epsilon$ for an appropriate choice
of bandwidth $\epsilon$, that approximate $\mathbb{E}_{\theta^{(p)}}[R_{j,I_k} \mid D]$ and $\mathbb{E}_{\theta^{(p)}}[F_{j,I_k} \mid D]$, respectively:
\[
A_{\hat\theta_{II}^{(p)}}(R_{j,I_k} \mid D; \kappa) := \frac{\sum_{m,m'} \big( R^{(f)}_{j,I_k}(\tilde\omega_m) + R^{(b)}_{j,I_k}(\tilde\omega_{m'}) \big)\, \kappa(\eta_k(m, m'))\, \psi_k(m')}{\sum_{m,m'} \kappa(\eta_k(m, m'))\, \psi_k(m')}, \tag{9.31}
\]
\[
A_{\hat\theta_{II}^{(p)}}(F_{j,I_k} \mid D; \kappa) := \frac{\sum_{m,m'} \big( F^{(f)}_{j,I_k}(\tilde\omega_m) + F^{(b)}_{j,I_k}(\tilde\omega_{m'}) \big)\, \kappa(\eta_k(m, m'))\, \psi_k(m')}{\sum_{m,m'} \kappa(\eta_k(m, m'))\, \psi_k(m')},
\]
where $\eta_k(m, m')$ has been defined in (9.30), $m, m' = 1, 2, \dots, M_k$, and $\psi_k(m') :=
\exp\big( \int_{t^*_k}^{t_k} c(\tilde X^{(b)}(s, \tilde\omega_{m'}))\,ds \big)$, according to Theorem 9.2.2. Observe that we generate
$M_k$ forward and reverse paths in the interval $I_k$, but we do not directly control the
number of exact or approximate SRN-bridges that are formed. The number $M_k$ is
chosen using a coefficient-of-variation criterion, as explained in Section 9.5.1. In
Section 9.5.2, we indicate an algorithm that reduces the computational complexity of
computing those $\kappa$-weighted averages from $O(M_k^2)$ to $O(M_k \log(M_k))$.
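In its naive $O(M_k^2)$ form, the estimator (9.31) is a short double loop. The following sketch (our own illustrative code; a one-dimensional state and externally supplied kernel and weight inputs are assumed) shows the structure:

```python
def weighted_average(stat_f, stat_b, Y_f, Y_b, kernel, psi):
    """Naive O(M^2) evaluation of a kappa-weighted average as in (9.31):
    sums (stat_f[m] + stat_b[m']) * kappa(Y_f[m] - Y_b[m']) * psi[m']
    over all forward/backward pairs, normalized by the total weight.
    Y_f, Y_b: (transformed) endpoint values of the forward/backward paths;
    psi[m']: the exponential weight from Theorem 9.2.2."""
    num, den = 0.0, 0.0
    for m in range(len(Y_f)):
        for mp in range(len(Y_b)):
            w = kernel([Y_f[m] - Y_b[mp]]) * psi[mp]
            num += (stat_f[m] + stat_b[mp]) * w
            den += w
    return num / den if den > 0 else float("nan")

# Kronecker kernel: exact SRN-bridges only (1-d endpoints).
kron = lambda eta: 1.0 if eta[0] == 0 else 0.0
avg = weighted_average([2.0, 3.0], [1.0, 1.0], [58, 57], [58, 58],
                       kron, [1.0, 1.0])
```

In the toy call, only the first forward endpoint (at 58) links with both backward endpoints, so the average is $(2+1)/1 = 3$ over the two linked pairs.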
Finally, the Monte Carlo EM algorithm for this particular problem generates a
stochastic sequence $(\hat\theta_{II}^{(p)})_{p=1}^{+\infty}$, starting from the initial guess $\theta_{II}^{(0)}$ provided by phase
I, via
\[
\hat c_j^{(p+1)} = \frac{\sum_{k=1}^{K} A_{\hat\theta_{II}^{(p)}}(R_{j,I_k} \mid D; \kappa)}{\sum_{k=1}^{K} A_{\hat\theta_{II}^{(p)}}(F_{j,I_k} \mid D; \kappa)}, \tag{9.32}
\]
where $\hat\theta_{II}^{(p)} = \big( \hat c_1^{(p)}, \dots, \hat c_J^{(p)} \big)$. In Section 9.5.4, a stopping criterion based on techniques
for assessing the convergence of Markov chains is presented.
9.5 Computational Details
This section presents computational details omitted in Section 9.4. Here,
we explain why and how we transform the clouds $\mathcal{X}_k$ formed by the endpoints of forward
and reverse paths in the time interval $I_k$ at the time $t^*_k$, for $k = 1, \dots, K$. Then, we
explain how to choose the number of simulated forward and backward paths, $M_k$,
in the time interval $I_k$, to obtain accurate estimates of the expected values of $R_{j,I_k}$
and $F_{j,I_k}$ for $j = 1, 2, \dots, J$. Next, we show how to reduce the computational cost of
computing approximate SRN-bridges from $O(M_k^2)$ to $O(M_k \log(M_k))$ using a strategy
introduced by Bayer and Schoenmakers [21]. Finally, we indicate how to choose the
initial seeds for phase I and a stopping criterion for phase II.
9.5.1 On the Selection of the Number of Simulated Forward-Backward Paths
1. First, sample $M$ forward-reverse paths (in the numerical examples we use $M = 100$).
2. If the number of joined forward-reverse paths using a delta kernel is less than a
certain threshold, we transform the data as described in Section 9.5.3. This
data transformation allows us to use the Epanechnikov kernel (9.29). In this
way, we are likely to obtain a larger number of joined paths.
3. We then compute the coefficient of variation of the sample mean of
$R^{(f)}_{j,I_k} + R^{(b)}_{j,I_k}$, the total number of times that reaction $j$ occurred in the interval $I_k$,
and of the sample mean of $F^{(f)}_{j,I_k} + F^{(b)}_{j,I_k}$, for $j = 1, \dots, J$. Here $F^{(f)}_{j,I_k} = \int_{I_k} g_j(X^{(f)}(s))\,ds$
and $F^{(b)}_{j,I_k} = \int_{I_k} g_j(X^{(b)}(s))\,ds$; further details can be found in Section 9.3.2. The
coefficient of variation (cv) of a random variable is defined as the ratio of its
standard deviation $\sigma$ over the absolute value of its mean $\mu$, $\mathrm{cv} := \sigma / |\mu|$. In this case,
for the reaction channel $j$ in the interval $I_k$, we have:
\[
\mathrm{cv}_{\bar R}(I_k, j) = L_k^{-1/2}\, \frac{S\big( R^{(f)}_{j,I_k}(\tilde\omega_m) + R^{(b)}_{j,I_k}(\tilde\omega_m);\, L_k \big)}{A\big( R^{(f)}_{j,I_k}(\tilde\omega_m) + R^{(b)}_{j,I_k}(\tilde\omega_m);\, L_k \big)}
\]
and
\[
\mathrm{cv}_{\bar F}(I_k, j) = L_k^{-1/2}\, \frac{S\big( F^{(f)}_{j,I_k}(\tilde\omega_m) + F^{(b)}_{j,I_k}(\tilde\omega_m);\, L_k \big)}{A\big( F^{(f)}_{j,I_k}(\tilde\omega_m) + F^{(b)}_{j,I_k}(\tilde\omega_m);\, L_k \big)},
\]
where $S(Y; L) := \big( A(Y^2; L) - A(Y; L)^2 \big)^{1/2}$ is the sample standard deviation of the
random variable $Y$ over an ensemble of size $L$, and $A(Y; L) := \frac{1}{L} \sum_{m=1}^{L} Y(\omega_m)$
its sample average. Here $L_k$ denotes the number of joined paths in the interval
$k$, which is bounded by $M_k^2$. In the case that $L_k$ is small, we compute a
bootstrapped coefficient of variation.
The idea is that, by controlling both coefficients of variation, we can control the
variation of the $p$-th iteration estimate $\hat\theta_{II}^{(p)}$. Our numerical experiments confirm this.
4. If each coefficient of variation is less than a certain threshold, then the sampling
for interval $I_k$ finishes, with $M_k$ being the total number of sampled paths; we accept
the quantities in step 3, as well as the quantities $\kappa(\eta_k(m, m'))\,\psi_k(m')$,
$m, m' = 1, \dots, L$, defined in Section 9.3.2. Otherwise, we sample additional
forward-reverse paths (increasing the number of sampled paths at each iteration
by $M$) and go to step 2.
9.5.2 Reducing the Cost of Computing Approximate SRN-Bridges
Consider a double sum of the form
\[
\sum_{m=1}^{M} \sum_{m'=1}^{M} \big( R^{(f)}_{j,I_k}(\tilde\omega_m) + R^{(b)}_{j,I_k}(\tilde\omega_{m'}) \big)\, \kappa_{m,m'}.
\]
A double sum like this one appears in the numerator of (9.31). Instead of computing a
double loop, which always takes $O(M^2)$ steps (and many of those steps contribute 0 to
the sum), we take the following alternative approach: let $\times_{i=1}^{d} [A_i, B_i]$ be the smallest
hyperrectangle of sides $[A_i, B_i]$, $i = 1, \dots, d$, that contains the cloud $\mathcal{Y}$ defined in
(9.28). Let us also assume that the $A_i, B_i$, $i = 1, \dots, d$, are integers. The length $B_i - A_i$
depends on how sparse the cloud is in its $i$-th dimension. Given the cloud, it is easy to
check that the values $A_i, B_i$, $i = 1, \dots, d$, can be computed in $O(M)$ operations. Now,
we subdivide the hyperrectangle into sub-boxes of side-length 1, with sides parallel to
the coordinate axes.
Since we have a finite number of those sub-boxes, we can associate an index with
each one, in such a way that it is possible to directly retrieve each one using a suitable
data structure (for example, an efficient sparse matrix or a hash table). The average
access cost of such a structure is constant with respect to $M$. With each sub-box, we
associate a list of the forward points that ended up in that sub-box. It is also direct to
see that the construction of such a structure takes a computational cost of $O(M)$ steps
on average. Then, instead of evaluating the double sum, which has $O(M^2)$ terms, we
evaluate only the nonzero terms. This is because, when a kernel is used, $\kappa(x - y) \neq 0$
if and only if $x$ and $y$ are situated in neighboring sub-boxes. That is,
\[
\sum_{m=1}^{M} \sum_{m'=1}^{M} \big( R^{(f)}_{j,I_k}(\tilde\omega_m) + R^{(b)}_{j,I_k}(\tilde\omega_{m'}) \big)\, \kappa_{m,m'}
= \sum_{m'=1}^{M} \sum_{i=1}^{3^d} \sum_{l=1}^{n(b_i)} \big( R^{(f)}_{j,I_k}(\tilde\omega_{\ell(l)}) + R^{(b)}_{j,I_k}(\tilde\omega_{m'}) \big)\, \kappa_{\ell(l), m'},
\]
where $n(b_i)$ is the total number of reverse end-points associated with the $i$-th neighbor
of the sub-box to which the forward end-point $\tilde Y^{(f)}(t^*_k, \tilde\omega_m)$ belongs, whereas $\ell(l)$
indexes one of those reverse end-points. Note that the constant in this complexity
depends exponentially on the dimension ($3^d$).
The cost that dominates the triple sum on the right-hand side is the expected maximum
number of reverse points that can be found in a sub-box. This size can be proved
to be $O(\log(M))$, which makes the whole joining algorithm of order $O(M \log(M))$.
For additional details, we refer to [2].
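The sub-box bookkeeping can be sketched with a hash map from integer cells to point indices (our own illustrative code; the thesis implementation may differ):

```python
from collections import defaultdict
from itertools import product

def join_by_cells(forward_pts, backward_pts):
    """Bucket the backward end-points by their integer cell, then, for each
    forward end-point, scan only the 3^d neighboring cells instead of all
    M backward points. Points are integer tuples (the transformed cloud).
    Returns the candidate pairs (m, m') whose coordinates differ by at most
    1 in every dimension, i.e., the only pairs that can receive a nonzero
    kernel weight."""
    d = len(forward_pts[0])
    boxes = defaultdict(list)
    for mp, y in enumerate(backward_pts):
        boxes[y].append(mp)
    pairs = []
    for m, x in enumerate(forward_pts):
        for offset in product((-1, 0, 1), repeat=d):  # 3^d neighbor cells
            cell = tuple(x[i] + offset[i] for i in range(d))
            for mp in boxes.get(cell, []):
                pairs.append((m, mp))
    return pairs

pairs = join_by_cells([(0, 0)], [(0, 1), (5, 5), (1, 1)])
```

The dictionary lookup has constant average cost, so the total work is proportional to $M \cdot 3^d$ plus the number of emitted pairs, in line with the complexity discussion above.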
9.5.3 A Linear Transformation for the Epanechnikov Kernel
We have seen in our numerical experiments that clouds formed by the endpoints of
the simulated paths, $\mathcal{X}_k$, usually have a shape similar to the cloud $Z$ shown in the left
panel of Figure 9.1.
It turns out that partitioning the space into $d$-dimensional cubes with sides parallel
to the coordinate axes is a far-from-optimal way to select the kernel domains, and
consequently to find SRN-bridges. A more natural way of proceeding would be to divide
the space into a system of parallelepipeds with sides parallel to the principal directions
of the cloud $Z$, with sides proportional to the lengths of its corresponding semi-axes
(we are thinking of some sort of singular value decomposition here), and use them as
the supports of our kernels.
Another way of proceeding (somehow related but not totally equivalent) is to
transform the original cloud $Z$ to obtain another cloud $T(Z)$ with near-spherical
shape, and then scale it to have, on average, one point of the cloud in each $d$-dimensional
cube (with sides parallel to the coordinate axes). In this new cloud, $H(Z)$, we can
naturally find neighbors using the algorithm described in Section 9.5.2 and use
the Epanechnikov kernel to assign weights. For that reason, we stated in Section
9.4 that we want to transform the data $\mathcal{X}_k$ into an isotropic cloud such that every
unit cube centered at a forward point $\tilde Y^{(f)}(t^*_k, \tilde\omega_m)$ contains, on average, one point of the
backward cloud $\cup_{m'} \tilde Y^{(b)}(t^*_k, \tilde\omega_{m'})$.
We now proceed to describe the details of the mentioned transformations.
We first recall a customary procedure in statistics to motivate the transformation.
Let $\Sigma := \mathrm{cov}(Z)$ be the sample covariance matrix computed from a cloud of points
$Z$. To obtain a de-correlated version of $Z$, the linear transformation $T(z) = \Sigma^{-1/2} z$
is widely used in statistics. For example, consider a cloud $Z$ of points obtained by
sampling $10^3$ independent draws of a highly correlated bi-variate Gaussian random variable.
The corresponding cloud $T(Z)$, depicted in the right panel of Figure 9.1, has the
aspect of a sphere of radius 3. The next step is to obtain a radius $\alpha$ such that the
scaled cloud $H(Z) := \alpha\, T(Z)$ contains, on average, one point in each unit cube.
Figure 9.1: Left: A bivariate Gaussian cloud, $Z$. Right: Its corresponding decorrelated
and scaled version $T(Z)$.
[Figure: the transformed and scaled cloud $Y = H(Z) = \alpha\, T(Z)$.]
Remark 9.5.1. According to the transformation $H$, the kernel used in our case is
approximately equal to
\[
\kappa_H(z) := \frac{1}{\det(H)}\, \kappa\big( H^{-1} z \big),
\]
where $\kappa$ is the Epanechnikov kernel defined in (9.29); it is only approximate, since $\kappa$
corresponds to the continuous case and not to the lattice case.
A well-known fact about the EM Algorithm is that, given a starting point, it converges to a saddle point or a local maximum of the likelihood function. Unless we know beforehand that the likelihood function has a unique global maximum, we cannot be sure that the output of the EM Algorithm is the MLE we are looking for. The same phenomenon occurs in the case of the Monte Carlo EM Algorithm, and for that reason Robert and Casella [4] recommend generating a set of N (usually around five) parallel independent Monte Carlo EM sequences starting from a set of over-dispersed initial guesses. Usually, we do not even know the scale of the coordinates of our unknown parameter θ = (c₁, c₂, ..., c_d). For that reason, we recommend running only phase I of our algorithm over a set of uniformly distributed random samples drawn from the d-dimensional hyper-rectangle $\prod_{i=1}^{d} (0, C_i]$, where C_i is a reasonable, case-dependent upper bound for each reaction rate parameter c_i. We observed in our numerical experiments that the result of this procedure is a number of points lying on a low-dimensional manifold. Once this manifold is identified, N different initial guesses are taken as over-dispersed seeds for phase II.
Note that the stochastic iterative scheme given by formula (9.32) may be easily adapted to produce N parallel stochastic sequences where, for each i = 1, 2, ..., N, the distribution of the random variable θ̂_{II,i}^{(p+1)} depends on its history, (θ̂_{II,i}^{(k)})_{k=1}^{p}, only through its previous value, θ̂_{II,i}^{(p)}. In this sense, the N sequences, (θ̂_{II,i}^{(p)})_{p=1}^{+∞}, are Markov chains. Writing ψ_i^{(k)} for a scalar summary of chain i at iteration k, define
$$ B := \frac{p}{N-1} \sum_{i=1}^{N} \left( \bar\psi_{\cdot,i} - \bar\psi \right)^2, \quad \text{where} \quad \bar\psi_{\cdot,i} := \frac{1}{p} \sum_{k=1}^{p} \psi_i^{(k)} \quad \text{and} \quad \bar\psi := \frac{1}{N} \sum_{i=1}^{N} \bar\psi_{\cdot,i}, $$
and
$$ W := \frac{1}{N} \sum_{i=1}^{N} s_i^2, \quad \text{where} \quad s_i^2 := \frac{1}{p-1} \sum_{k=1}^{p} \left( \psi_i^{(k)} - \bar\psi_{\cdot,i} \right)^2. $$
Then define
$$ V := \frac{p-1}{p}\, W + \frac{1}{p}\, B \quad \text{and} \quad \hat R := \sqrt{\frac{V}{W}}. \qquad (9.33) $$
B and W are known as the between-chain and within-chain variances, respectively. It is expected that R̂ (the potential scale reduction factor) declines to 1 as p → +∞; in our numerical experiments we use 1.4 as a threshold. Quoting Gelman and Shirley in Chapter 6 of [30]: “At convergence, the chains will have mixed, so that the distribution of the simulations between and within chains will be identical, and the ratio R̂ should equal 1. If R̂ is greater than 1, this implies that the chains have not fully mixed and that further simulation might increase the precision of inferences”.
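The diagnostic (9.33) is straightforward to compute; a minimal NumPy sketch (the function name and the synthetic chains are illustrative, not from the thesis):

```python
import numpy as np

def potential_scale_reduction(chains):
    """Gelman-Rubin R-hat, Eq. (9.33), for N parallel chains of length p.

    `chains` is an (N, p) array containing a scalar summary of each chain.
    """
    N, p = chains.shape
    chain_means = chains.mean(axis=1)                          # psi-bar_{.,i}
    grand_mean = chain_means.mean()                            # psi-bar
    B = p / (N - 1) * np.sum((chain_means - grand_mean) ** 2)  # between-chain
    W = np.mean(chains.var(axis=1, ddof=1))                    # within-chain
    V = (p - 1) / p * W + B / p
    return np.sqrt(V / W)

# Well-mixed chains drawn from the same distribution give R-hat near 1.
rng = np.random.default_rng(1)
chains = rng.normal(size=(4, 5000))
print(potential_scale_reduction(chains))  # close to 1
```

Chains centered at different values would instead yield a large between-chain variance B and hence R̂ well above 1, which is exactly what the stopping criterion monitors.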
Once we stop iterating after p* iterations, the individual outputs, θ̂_{II,i}^{(p*)}, i = 1, ..., N, form a small cluster. We cannot be totally sure that this cluster is near the MLE, but at least we have some degree of confidence in that. In such a case, we can use the mean of that small cluster as an MLE estimate of our unknown parameter, θ. Otherwise, if we have two or more clusters or over-dispersed results, we should perform a more careful analysis.
Remark 9.5.3. The R̂ stopping criterion works if there is only one local maximum in the basin of attraction of the algorithm. Otherwise, R̂ may not decrease to 1; even worse, it may go to +∞. For that reason, it is advisable to monitor the evolution of R̂. In our numerical examples, R̂ is decreasing, and we stop the algorithm using R̂₀ = 1.4 as a threshold.
In this section, we present numerical results that show the performance of our FREM algorithm. In phase I, we use the alternative definition of θ_{II,i}^{(0)} described in Remark 9.4.1. For phase II, we run N = 4 parallel sequences using 1.4 as a threshold for R̂ (described in Section 9.5.4). As a point estimator of θ, we provide the cluster average of the sequence θ̂_{II,1}^{(p*)}, θ̂_{II,2}^{(p*)}, ..., θ̂_{II,N}^{(p*)}.
For each example, we report: i) the number of iterations of phase II, p*; ii) a table containing a) the initial points, θ_{I,i}^{(0)}, b) the outputs of phase I, θ_{II,i}^{(0)}, and c) the outputs of phase II, θ̂_{II,i}^{(p*)}; and iii) a figure with all those values.
For the examples in which we generate synthetic data, we provide the seed parameter θ_G used to generate the observations. It is important to stress that the distance from our point estimator to θ_G depends on the number of generated observations.
We first start with a simple model, with only one species and two reaction channels, described respectively by the stoichiometric matrix and the propensity function
$$ \nu = \begin{pmatrix} -1 \\ -4 \end{pmatrix} \quad \text{and} \quad a(X) = \begin{pmatrix} c_1 X \\ c_2 X \cdot \mathbf{1}_{\{X \ge 4\}} \end{pmatrix}. $$
We set X₀ = 100, T = 1, and consider synthetic data observed at uniform time intervals of size Δt = 1/16. This determines a set of 17 observations generated from a single path, using the parameter θ_G = (3.78, 7.20). The data trajectory is shown in Figure 9.1.
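Synthetic data of this kind can be produced with a standard SSA (Gillespie) simulation of the two-channel decay model; a minimal sketch (the helper name and random seed are our own):

```python
import numpy as np

def ssa_decay(x0, T, c1, c2, rng):
    """SSA path of the decay model: X -> X-1 at rate c1*X,
    and X -> X-4 at rate c2*X (active only while X >= 4)."""
    t, x = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        a1 = c1 * x
        a2 = c2 * x if x >= 4 else 0.0
        a0 = a1 + a2
        if a0 == 0.0:
            break
        t += rng.exponential(1.0 / a0)          # exponential waiting time
        if t > T:
            break
        x += -1 if rng.random() < a1 / a0 else -4
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

# Observe the path at uniform times of size 1/16, as in the example.
rng = np.random.default_rng(42)
times, states = ssa_decay(100, 1.0, 3.78, 7.20, rng)
obs_times = np.linspace(0.0, 1.0, 17)
obs = states[np.searchsorted(times, obs_times, side="right") - 1]
print(obs)
```

The `searchsorted` step picks, for each observation time, the state at the last jump before that time, which is exactly how a discretely observed SSA trajectory is formed.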
For this example, we ran N = 4 FREM sequences starting at θ_{I,1}^{(0)} = (1, 5), θ_{I,2}^{(0)} = (6, 5), θ_{I,3}^{(0)} = (1, 9) and θ_{I,4}^{(0)} = (6, 9). We obtained a cluster average of θ̂ = (3.68, 7.50); the algorithm took p* = 3 iterations to converge (the imposed minimum). Details can be found in Table 9.1 and Figure 9.2.
  i    θ_{I,i}^{(0)}    θ_{II,i}^{(0)}     θ̂_{II,i}^{(p*)}
  1    (1, 5)           (1.35, 10.67)      (3.65, 7.52)
  2    (6, 5)           (7.85, 9.11)       (3.80, 7.46)
  3    (1, 9)           (1.20, 10.71)      (3.63, 7.50)
  4    (6, 9)           (7.06, 9.30)       (3.65, 7.50)
Table 9.1: Values computed by the FREM Algorithm for the decay example.
Figure 9.1: Data trajectory for the Decay example, obtained by observing the values of an SSA path at uniform time intervals of size Δt = 1/16.
Figure 9.2: FREM estimation (phase I and phase II) for the decay process.
Remark 9.6.1. Recall that the distance between the value θ_G used to generate synthetic data and the estimate θ̂ is meaningless for small data sets. The relevant distance in this estimation problem is between the estimate θ̂ obtained from our FREM algorithm and the estimate θ̂_MLE based on maximizing the true likelihood function; the latter is not available in most cases.
We now test our FREM algorithm using real data. We will show that these data can be modeled using a decay process. The data set $w = \{w_i\}_{i=1}^{n}$, taken from [31], consists of wear levels observed on n = 32 cylinder liners of eight-cylinder SULZER engines, as measured by a caliper with a precision of δ = 0.05 mm. The data are presented in Figure 9.3.
Figure 9.3: Data set from [31]. Data refer to cylinder liners used in ships of the
Grimaldi Group.
The finite resolution of the caliper allows us to represent the set of possible measurements using a finite lattice. We propose to model the measurements as a Markovian pure jump process. Let X(t) be the thickness process derived from the wear of the cylinder liners up to time t, i.e., X(t) = X₀ − W(t), where W is the wear process and X₀ is the initial thickness. The final time of some observations is close to T = 60,000 hours.
We model X(t) as a decay process with two reaction channels and δ = 0.05, since a simple decay process is not enough to explain the data. The two considered intensity-jump pairs are (a₁(x), ν₁) = (c₁x, −δ) and (a₂(x), ν₂) = (c₂x, −4δ). Here c₁ and c₂ are coefficients with dimension (mm · hour)⁻¹.
The linear propensity functions, the value X₀ = 5 mm, and the initial values for phase I, θ_{I,1}^{(0)} = (1, 1), θ_{I,2}^{(0)} = (10, 1), θ_{I,3}^{(0)} = (1, 10) and θ_{I,4}^{(0)} = (10, 10), are motivated by previous studies of the same data set; see [26] for details.
In our computations, we re-scaled the original problem by setting δ = 1 and T = 1. Our FREM algorithm gave us a cluster average of θ̂ = (8.9, 5.7), which corresponds to θ̂ = (1.5 · 10⁻⁴, 0.97 · 10⁻⁴) in the non-scaled model. The algorithm took p* = 93 iterations to converge. Details can be found in Table 9.2 and Figure 9.4.
Figure 9.4: FREM estimation (phase I and phase II) for the wear data set.
  i    θ_{I,i}^{(0)}    θ_{II,i}^{(0)}     θ̂_{II,i}^{(p*)}
  1    (1, 1)           (2.81, 9.90)       (8.56, 5.83)
  2    (10, 1)          (36.88, 1.58)      (9.07, 5.71)
  3    (1, 10)          (1.13, 10.31)      (8.68, 5.80)
  4    (10, 10)         (11.44, 7.79)      (9.34, 5.62)
Table 9.2: Values computed by the FREM Algorithm for the wear example.
Figure 9.5: Left: the confidence band with the parameter θ̃ obtained in [26] for the wear example. Right: the confidence band obtained with the FREM algorithm.
Remark 9.6.2. In this particular example, the data set has been obtained with the help of a caliper of finite precision. Therefore, our likelihood should also incorporate the distribution of the measurement errors, which may be assumed Gaussian, independent and identically distributed with mean zero and variance equal to the caliper's precision. We omitted this step in our analysis for the sake of simplicity and brevity.
Remark 9.6.3. Comparing our FREM estimate, θ̂ = (1.5 · 10⁻⁴, 0.97 · 10⁻⁴), with the value obtained in [26] for the same data set and the same model, θ̃ = (0.63 · 10⁻⁴, 1.2 · 10⁻⁴), we obtained the same scale in the coefficients and quite similar confidence bands; see Figure 9.5.
9.6.3 Birth-death Process
$$ \emptyset \xrightarrow{c_1} X, \qquad X \xrightarrow{c_2} \emptyset $$
Figure 9.6: Data trajectory for the Birth-death example, obtained by observing the values of an SSA path at uniform time intervals of size Δt = 5.
Figure 9.7: FREM estimation (phase I and phase II) for the birth-death process.
N (see [32]). The importance of this example lies in the fact that it has a non-linear propensity function and two dimensions.
  i    θ_{I,i}^{(0)}    θ_{II,i}^{(0)}           θ̂_{II,i}^{(p*)}
  1    (0.5, 0.04)      (6.24e-01, 3.29e-02)     (1.24e+00, 6.55e-02)
  2    (0.5, 0.08)      (7.68e-01, 4.07e-02)     (1.29e+00, 6.67e-02)
  3    (1.5, 0.04)      (1.01e+00, 5.25e-02)     (1.18e+00, 6.27e-02)
  4    (1.5, 0.08)      (1.53e+00, 7.97e-02)     (1.20e+00, 6.34e-02)
Table 9.3: Values computed by the FREM Algorithm for the birth-death example.
$$ S + I \xrightarrow{c_1} 2I, \qquad I \xrightarrow{c_2} R $$
We set X₀ = (300, 5), T = 10, and consider synthetic data generated using the parameter θ_G = (1.66, 0.44), observed at uniform time intervals of size Δt = 1. The data trajectory is shown in Figure 9.8.
For this example, we ran N = 4 FREM sequences starting at θ_{I,1}^{(0)} = (0.40, 0.05), θ_{I,2}^{(0)} = (0.40, 1.00), θ_{I,3}^{(0)} = (3.00, 0.05) and θ_{I,4}^{(0)} = (3.00, 1.00). Those points were chosen after some previous exploration with phase I.
Our FREM algorithm gave us a cluster average of θ̂ = (1.65, 0.39). The FREM algorithm took p* = 3 iterations to converge (the imposed minimum). Details can be found in Table 9.4 and Figure 9.9.
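A generic SSA routine for an SRN, given its stoichiometric matrix ν and propensity function a(X), can be sketched as follows, instantiated here with the SIR reactions; the 1/N population scaling of the infection propensity is our own assumption for illustration (the exact propensity form used in the thesis follows [32] and is not reproduced in this excerpt):

```python
import numpy as np

def ssa(x0, T, nu, propensities, rng):
    """Generic SSA (Gillespie) for a stochastic reaction network.
    nu: (J, d) stoichiometric matrix; propensities: x -> length-J array."""
    t, x = 0.0, np.array(x0, dtype=float)
    while True:
        a = propensities(x)
        a0 = a.sum()
        if a0 == 0.0:
            return x                          # no reaction can fire
        t += rng.exponential(1.0 / a0)        # exponential waiting time
        if t > T:
            return x
        j = rng.choice(len(a), p=a / a0)      # which channel fires
        x += nu[j]

# SIR example: S + I -> 2I and I -> R, with states x = (S, I).
nu = np.array([[-1.0, 1.0], [0.0, -1.0]])
c1, c2, N_pop = 1.66, 0.44, 305.0             # 1/N_pop scaling is an assumption
rates = lambda x: np.array([c1 * x[0] * x[1] / N_pop, c2 * x[1]])
rng = np.random.default_rng(3)
print(ssa([300.0, 5.0], 10.0, nu, rates, rng))
```

The R compartment is left implicit since it does not affect any propensity.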
  i    θ_{I,i}^{(0)}    θ_{II,i}^{(0)}    θ̂_{II,i}^{(p*)}
  1    (0.40, 0.05)     (1.50, 0.38)      (1.65, 0.39)
  2    (0.40, 1.00)     (1.50, 0.38)      (1.65, 0.39)
  3    (3.00, 0.05)     (1.50, 0.38)      (1.66, 0.39)
  4    (3.00, 1.00)     (1.50, 0.38)      (1.66, 0.39)
Table 9.4: Values computed by the FREM Algorithm for the SIR model.
Figure 9.8: Data trajectory for the SIR example, obtained by observing the values of an SSA path at uniform time intervals of size Δt = 1.
Figure 9.9: FREM estimation (phase I and phase II) for the SIR model. In this particular case, where the results of phase I collapse to a single point, 4 MCMC chains seem unnecessary, but the R̂ criterion needs at least 2 chains.
9.6.5 Auto-regulatory Gene Network
The following model, taken from [7], has eight reaction channels and five species. It has been selected to test the robustness of our FREM algorithm in dealing with several dimensions and several reactions.
$$ DNA + P_2 \xrightarrow{c_1} DNA{\cdot}P_2, \qquad DNA{\cdot}P_2 \xrightarrow{c_2} DNA + P_2, $$
$$ DNA \xrightarrow{c_3} DNA + mRNA, \qquad mRNA \xrightarrow{c_4} \emptyset, $$
$$ P + P \xrightarrow{c_5} P_2, \qquad P_2 \xrightarrow{c_6} P + P, $$
$$ mRNA \xrightarrow{c_7} mRNA + P, \qquad P \xrightarrow{c_8} \emptyset. $$
$$ \nu = \begin{pmatrix}
-1 & 1 & 0 & 0 & -1 \\
1 & -1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & -2 & 1 \\
0 & 0 & 0 & 2 & -1 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1 & 0
\end{pmatrix} \quad \text{and} \quad
a(X) = \begin{pmatrix}
c_1\, DNA\, P_2 \\
c_2\, DNA{\cdot}P_2 \\
c_3\, DNA \\
c_4\, mRNA \\
c_5\, P(P-1) \\
c_6\, P_2 \\
c_7\, mRNA \\
c_8\, P
\end{pmatrix}, $$
where the species are ordered as (DNA, DNA·P₂, mRNA, P, P₂) and the signs of ν follow the reactions listed above.
Figure 9.10: Data trajectory for the auto-regulatory gene network example, obtained by observing the values of an SSA path at uniform time intervals of size Δt = 1/2.
For this example, we ran N = 2 FREM sequences starting at θ_{I,1}^{(0)} = 0.1 v and θ_{I,2}^{(0)} = 0.5 v, respectively, where v is the vector of ℝ⁸ with all its components equal to one. Our FREM algorithm gave us a cluster average of
Remark 9.6.4. Observe that, in the examples where the stoichiometric vectors are linearly dependent, the results of phase I, θ_{II,i}^{(0)}, i = 1, 2, 3, 4, lie on a hyperplane, which reflects a certain amount of indifference in the coefficient estimations. This does not happen in the SIR example, where all the phase I estimations are essentially the same.
9.7 Conclusions
Acknowledgments
The research reported here was supported by King Abdullah University of Science
and Technology (KAUST). A. Moraes, R. Tempone and P. Vilanova are members of
the KAUST SRI Center for Uncertainty Quantification at the Computer, Electrical
and Mathematical Sciences & Engineering Division at King Abdullah University of
Science and Technology (KAUST).
359
Appendix
360
Algorithm 30 The F-R (forward-reverse) path generation algorithm in the MCEM phase, for a given time interval, [s, t]. Inputs: the initial sample size, M₀; the coefficient-of-variation threshold, cv₀; the initial time, s; the final time, t; the initial observed state, x(s); and the final observed state, x(t). Outputs: a sequence of the number of times each reaction channel fired in the given time interval, ((r_{j,l})_{j=1}^{J})_{l=1}^{L}; a sequence of forward Euler values for the given time interval, ((u_{j,l})_{j=1}^{J})_{l=1}^{L}; and a sequence of kernel weights for the given time interval, ((w_{j,l})_{j=1}^{J})_{l=1}^{L}. Notes: here V_d is the volume of the d-dimensional unit sphere, X̃^{(f)}_{·,·,n} is the sampled forward process value at time t_n, X̃^{(b)}_{·,·,n′} is the sampled reverse process at time t_{n′}, δ is the Kronecker delta kernel, and κ_e is the Epanechnikov kernel. L is the number of joined F-R paths in the time interval [s, t], where 0 ≤ L ≤ M̃². Finally, 0 < β < 1 and C_L is an integer greater than 1 (in our examples we use 2).
1: M̃ ← 1
2: M ← M₀
3: t* ← (t − s)/2
4: while cv ≥ cv₀ do
5:   for m = M̃ to M̃ + M − 1 do
6:     ((X̃^{(f)}_{·,m,n}, t^{(f)}_{m,n})_{n=1}^{N(m)}, (r^{(f)}_{j,m})_{j=1}^{J}) ← FW path from s to t* starting at x(s)
7:     u^{(f)}_{j,m} ← Σ_n (t^{(f)}_{m,n+1} − t^{(f)}_{m,n}) g_j(X̃^{(f)}_{·,m,n})
8:     ((X̃^{(b)}_{·,m,n′}, t^{(b)}_{m,n′})_{n′=1}^{N′(m)}, (r^{(b)}_{j,m})_{j=1}^{J}) ← RV path from t to t* starting at x(t)
9:     u^{(b)}_{j,m} ← Σ_{n′} (t^{(b)}_{m,n′+1} − t^{(b)}_{m,n′}) g_j(X̃^{(b)}_{·,m,n′+1})
10:   end for
11:   (u_{·,l}, r_{·,l}, w_{·,l})_{l=1}^{L} ← join F-R paths (X̃^{(f,b)}_{·,·}(t*), (r^{(f,b)}_{j,·})_{j=1}^{J}, (α^{(f,b)}_{j,·})_{j=1}^{J}, δ)
12:   Here, α_{j,l} = α^{(f)}_{j,m} + α^{(b)}_{j,m′} s.t. m, m′ ∈ {1, 2, ..., M̃} and
13:   δ(X̃^{(f)}_{·,m}(t*), X̃^{(b)}_{·,m′}(t*)) > 0. Similarly for r_{j,l}.
14:   if L < ⌈β M̃⌉ then
15:     Σ ← covariance matrix of (X̃^{(f)}_{·,m}(t*), X̃^{(b)}_{·,m}(t*))
16:     Σ ← Σ + c diag(Σ), where c is a positive constant
17:     if Σ^{−1/2} not singular then
18:       H ← (1/3) Σ^{−1/2} (M̃/V_d)^{1/d}
19:       ζ ← 1
20:       repeat
21:         Ỹ^{(f)}_{·,m}(t*) ← ζ H X̃^{(f)}_{·,m}(t*)
22:         Ỹ^{(b)}_{·,m}(t*) ← ζ H X̃^{(b)}_{·,m}(t*)
23:         (u_{·,l}, r_{·,l}, w_{·,l})_{l=1}^{L} ← join F-R paths (Ỹ^{(f,b)}_{·,·}(t*), (r^{(f,b)}_{j,·})_{j=1}^{J}, (α^{(f,b)}_{j,·})_{j=1}^{J}, κ_e)
24:         ζ ← 1.5 ζ
25:       until L ≥ C_L M̃
26:     end if
27:   end if
28:   compute the coefficient of variation, cv, of (u_{·,l})_{l=1}^{L} and (r_{·,l})_{l=1}^{L} (see Section 9.5.1)
29:   M̃ ← M̃ + M
30:   M ← 2M
31: end while
Algorithm 31 The F-R path join algorithm in the MCEM. Inputs: a sequence of forward-backward samples for the time interval [s, t] evaluated at the intermediate time, t*, X̃^{(f,b)}_{·,·}(t*); a sequence of the number of times each reaction channel fired in the forward interval [s, t*] and in the reverse interval [t*, t], r^{(f,b)}_{·,·}; the sequence of forward Euler values for each reaction channel for the forward interval [s, t*] and for the backward interval [t*, t], u^{(f,b)}_{·,·}; and the kernel κ. Outputs: the number of joined paths, L; a sequence of the number of times a reaction channel fired in the interval [s, t], ((r_{j,l})_{j=1}^{J})_{l=1}^{L}; the sequence of forward Euler values for each reaction channel for the interval [s, t], ((u_{j,l})_{j=1}^{J})_{l=1}^{L}; and the sequence of kernel weights for the interval [s, t], ((w_{j,l})_{j=1}^{J})_{l=1}^{L}. Notes: S is a two-dimensional sparse matrix of size C × M̃.
1: L ← 0
2: for i = 1 to d do
3:   A_i ← min_m ⌊X̃^{(f,b)}_{i,m}(t*)⌋
4:   B_i ← max_m ⌈X̃^{(f,b)}_{i,m}(t*)⌉
5:   E_i ← 1 + B_i − A_i
6: end for
7: for m = 1 to M̃ do
8:   p_i ← 1 + ⌈X̃^{(f)}_{i,m}(t*)⌉ − A_i
9:   c ← convert(p, E) (converts a d-dimensional address to {1, ..., C})
10:   S_{c,n(c)+1} ← m, where n(c) is the number of elements in row c of S
11:   n(c) ← n(c) + 1
12: end for
13: for m = 1 to M̃ do
14:   (b_k)_{k=1}^{3^d} ← get neighboring sub-boxes of X̃^{(b)}_{·,m}(t*) s.t. b_k ∈ {1, ..., C}
15:   for k = 1 to 3^d do
16:     for j = 1 to n(b_k) do
17:       ℓ ← S_{b_k, j}
18:       v ← κ(X̃^{(f)}_{·,ℓ}(t*), X̃^{(b)}_{·,m}(t*))
19:       if v > 0 then
20:         L ← L + 1
21:         u_L ← u^{(f)}_ℓ + u^{(b)}_m
22:         r_L ← r^{(f)}_ℓ + r^{(b)}_m
23:         w_L ← v
24:       end if
25:     end for
26:   end for
27: end for
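The core idea of Algorithm 31, hashing points into unit boxes so that each reverse point only inspects its 3^d neighboring boxes, can be sketched in NumPy as follows; all names are illustrative, and a simplified Epanechnikov-type kernel stands in for κ:

```python
import numpy as np
from collections import defaultdict

def join_by_boxes(Xf, Xb, kernel):
    """Match forward points Xf (Mf, d) with backward points Xb (Mb, d)
    whose kernel weight is positive, using unit-box hashing: each forward
    point is stored under its integer box address, and each backward point
    only inspects its 3^d neighboring boxes (valid for kernels whose
    support has radius at most 1)."""
    d = Xf.shape[1]
    boxes = defaultdict(list)
    for i, x in enumerate(Xf):
        boxes[tuple(np.floor(x).astype(int))].append(i)
    offsets = np.stack(np.meshgrid(*([[-1, 0, 1]] * d)), axis=-1).reshape(-1, d)
    joined = []
    for m, y in enumerate(Xb):
        base = np.floor(y).astype(int)
        for off in offsets:
            for i in boxes.get(tuple(base + off), []):
                v = kernel(Xf[i], y)
                if v > 0:
                    joined.append((i, m, v))
    return joined

# Epanechnikov-type kernel on the distance between joined endpoints.
epan = lambda x, y: max(0.0, 1.0 - float(np.sum((x - y) ** 2)))
rng = np.random.default_rng(0)
Xf = rng.normal(size=(200, 2))
Xb = rng.normal(size=(200, 2))
pairs = join_by_boxes(Xf, Xb, epan)
print(len(pairs))
```

Because the kernel vanishes at distance 1, any positive-weight pair has coordinate-wise offsets below 1, so restricting the search to the 3^d adjacent boxes loses no matches while avoiding the O(M̃²) all-pairs scan.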
REFERENCES
[1] M. H. Holmes, Introduction to the foundations of applied mathematics, ser. Texts
in applied mathematics. Dordrecht, London: Springer, 2009.
[4] C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in
Statistics), 2nd ed. Springer, 2005.
[6] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions, 2nd ed. Wiley-Interscience, 2008.
[9] P. J. Green, “Reversible jump Markov chain Monte Carlo computation and
Bayesian model determination,” Biometrika, vol. 82, pp. 711–732, 1995.
[14] P. Smadbeck and Y. Kaznessis, “A closure scheme for chemical master equa-
tions,” Proc Natl Acad Sci USA, vol. 110, no. 35, 2013.
[16] D. F. Anderson, “A modified next reaction method for simulating chemical sys-
tems with time dependent propensities and delays,” The Journal of Chemical
Physics, vol. 127, no. 21, p. 214107, 2007.
[18] ——, “Multilevel hybrid Chernoff tau-leap,” accepted for publication in BIT Numerical Mathematics, 2015.
[23] H. Risken and T. Frank, The Fokker-Planck Equation: Methods of Solution and
Applications (Springer Series in Synergetics). Springer, 1996.
[24] N. Van Kampen, Stochastic Processes in Physics and Chemistry, Third Edition
(North-Holland Personal Library), 3rd ed. North Holland, 2007.
[28] A. Gelman and D. B. Rubin, “Inference from iterative simulation using multiple sequences (with discussion),” Statistical Science, vol. 7, pp. 457–511, 1992.
[30] S. Brooks, A. Gelman, G. Jones, and X.-L. Meng, Eds., Handbook of Markov Chain Monte Carlo (Chapman & Hall/CRC Handbooks of Modern Statistical Methods), 1st ed. Chapman and Hall/CRC, 2011.
[31] M. Giorgio, M. Guida, and G. Pulcini, “An age- and state-dependent Markov
model for degradation processes,” IIE Transactions, vol. 43, no. 9, pp. 621–632,
2011.
APPENDICES
Appendix A
In this appendix, we present a minimal set of concepts and results from Probability Theory and the Theory of Random Processes needed to understand the main results of this thesis. Brief references are [1, 2, 3].
Let ⌦ be a nonempty set such that its elements, ! 2 ⌦, are the possible outcomes of
a random experiment. Examples of random experiments are:
Example A.1.1.
Example A.1.2.
2. Pick a point at random in the unit interval and obtain a number larger than 0.3: A = (0.3, 1].
3. Flip a coin infinitely many times and obtain heads in the first toss: A = {(M_i)_{i=1}^{∞} : M₁ = H, M_i ∈ {H, T}, ∀ i ≥ 2}.
The outcome ω is not an event, but {ω} is. Let A be an event and ω₀ ∈ Ω be the outcome of one particular realization of the considered random experiment. If ω₀ ∈ A, we say that the event A ‘has happened’; otherwise, we say that its complement, A^c := {ω ∈ Ω : ω ∉ A}, ‘has happened’.
(i) Ω ∈ F
(ii) A ∈ F implies A^c ∈ F
(iii) (A_i)_{i=1}^{∞} ⊂ F implies ∪_{i=1}^{∞} A_i ∈ F
(a) ∅ ∈ F
(b) (A_i)_{i=1}^{∞} ⊂ F implies ∩_{i=1}^{∞} A_i ∈ F
(ii) If (A_i)_{i=1}^{∞} is a collection of pairwise disjoint events of F, then P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i).
(b) P(∅) = 0
Definition A.1.5 (Conditional Probability). Let H ∈ F be such that P(H) > 0, i.e., an event of strictly positive probability. The function P_H : F → [0, 1] such that
P_H(A) := P(A ∩ H)/P(H)
Definition A.2.2 (CDF). The Cumulative Distribution Function (CDF) of the random variable X is defined as F_X(a) = P(X ≤ a), ∀ a ∈ ℝ.
(b) F_X(a) → 0 as a → −∞
(c) F_X(a) → 1 as a → +∞
Observe that p_X will be nonzero only at {x_i} and Σ_i p_X(x_i) = 1.
Observe that f_X ≥ 0 and $\int_{-\infty}^{+\infty} f_X = 1$.
Definition A.2.5 (Independence of random variables). A family {X_i}_{i∈I} of random variables, where X_i has CDF F_i, is said to be independent if, for every finite subset J ⊂ I, we have
$$ P\left( \cap_{i_k \in J} \{X_{i_k} \le x_{i_k}\} \right) = \prod_{i_k \in J} F_{i_k}(x_{i_k}), \qquad \forall \{x_{i_k}\}_{i_k \in J}. $$
Definition A.2.7 (IID). A family of random variables is said to be independent and identically distributed (iid) if it is an independent family and all the members of the family have the same CDF.
$$ E[g(X)] := \sum_i g(x_i)\, p_X(x_i). $$
$$ E[g(X)] := \int_{-\infty}^{+\infty} g(x) f_X(x)\, dx. $$
Notice that we may have convergence issues here: E[X] is well defined except in the case ‘∞ − ∞’.
$$ \mathrm{Var}[X] := E\left[ (X - E[X])^2 \right]. $$
(b) Var[aX] = a² Var[X] for any random variable X and any a ∈ ℝ
$$ P(h(X) \ge a) \le E[h(X)]/a. $$
By taking h(x) = ((x − E[X])/σ(X))² and a = k², we obtain the Chebyshev inequality:
$$ P(|X - E[X]| > k\,\sigma(X)) \le 1/k^2, $$
where σ(X) := √(Var[X]) is the standard deviation of X.
Chernoff Bounds
(a) E[X] = p
$$ P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n, $$
Let Ω = [0, 1]. Here the sigma-algebra F is defined as the intersection of all sigma-algebras on [0, 1] containing the intervals of [0, 1]; F is called the Borel sigma-algebra generated by the intervals of [0, 1]. We say that U ∼ U(0, 1) if U has PDF f_U(x) = 1_{[0,1]}(x). We have that
F_U(x) = x, ∀ x ∈ [0, 1].
Proof:
For all u ∈ [0, 1] and x, we have that F⁻¹(u) ≤ x if and only if u ≤ F(x). Therefore
{(u, x) : F⁻¹(u) ≤ x} = {(u, x) : u ≤ F(x)}.
We conclude that
P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x).
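As a concrete instance of the inverse transform method, the exponential CDF F(x) = 1 − e^{−λx} has generalized inverse F⁻¹(u) = −ln(1 − u)/λ; a quick sketch (the rate and seed are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(size=200_000)
# Inverse transform: F^{-1}(u) = -ln(1 - u) / lambda.
samples = -np.log(1.0 - u) / lam
print(samples.mean())  # close to 1/lambda = 0.5
```

The empirical mean agrees with E[T] = 1/λ, as expected from the exponential distribution.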
It can be shown that the only continuous and positive random variable, T, that satisfies this property is the exponential random variable.
We say that T ∼ E(λ), i.e., T is exponentially distributed with rate λ > 0, when
(c) E[T] = 1/λ.
(d) Var[T] = 1/λ².
Here we have to apply the definition of CDF and the independence among random
variables.
$$ F_Y(x) := \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{t-\mu}{\sigma} \right)^2 \right) dt. $$
A.3 Stochastic Processes
When I = [0, +∞), we write {X_t}_{t≥0} and think of t as a time variable and of X as a function of time that evolves randomly. More specifically, X : [0, +∞) × Ω → C, such that X(t, ·) = X_t(·) is a random variable for any fixed t, and X(·, ω) is a function of t for any fixed ω. In the latter case, X(·, ω) : [0, +∞) → C is the path of the process X corresponding to the outcome ω.
Let us consider a stochastic process, N , in [0, +1) taking values in {0, 1, 2, . . .}, such
that N (0) = 0 and
3. for any t₁ < t₂ < s₁ < s₂, the increments N(t₂) − N(t₁) and N(s₂) − N(s₁) are independent random variables.
Here dt is an infinitesimal and o(dt)/dt → 0 as dt → 0. It can be shown that
$$ P(N(t+h) - N(t) = k) = \frac{(\lambda h)^k}{k!}\, e^{-\lambda h} $$
for k = 0, 1, 2, ... and for any h > 0. This is equivalent to saying that N(t) ∼ Poisson(λt). In general, X ∼ Poisson(λ) when
$$ P(X = k) = \lambda^k \exp(-\lambda)/k! $$
for k = 0, 1, 2, ....
1. E[N(t)] = λt.
2. Var[N(t)] = λt.
As a consequence, the inter-arrival times between two consecutive events in the superposition of M and N are independent exponential random variables of rate µ + λ.
A.4 Convergence Concepts
ability to the random variable X if, ∀ ε > 0, $\lim_{n\to\infty} P(|X_n - X| > \varepsilon) = 0$. We write $X_n \xrightarrow{P} X$.
Definition A.4.4 ($L^p$ convergence). Given a real number p ≥ 1 and random variables $(X_n)_{n=1}^{+\infty}$ and X such that $E[|X_n|^p] < +\infty$, ∀ n, and $E[|X|^p] < +\infty$, the sequence $(X_n)_{n=1}^{+\infty}$ converges in $L^p$ to the random variable X if $\lim_{n\to\infty} E[|X_n - X|^p] = 0$. We write $X_n \xrightarrow{L^p} X$.
Theorem A.4.5 (Strong Law of Large Numbers (SLLN)). Let $(X_n)_{n=1}^{+\infty}$ be an iid sequence of random variables such that $E[|X_1|] < +\infty$. Define µ := E[X₁]. We have that
$$ \frac{X_1 + X_2 + \cdots + X_n}{n} \xrightarrow{a.s.} \mu. $$
$$ \lim_{n\to\infty} P\left( \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le x \right) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} t^2 \right) dt. $$
If we define $Z_n := \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$ and Z ∼ N(0, 1), the CLT says that $Z_n \Rightarrow Z$. This result justifies the following approximation: for large n, we have that
$$ \frac{X_1 + X_2 + \cdots + X_n}{n} \approx N(\mu, \sigma^2/n). $$
X₀ = 1
X₁ = Y_{1,1}
···
Let $P(s) := \sum_{i=0}^{\infty} p_i s^i$, ∀ s ∈ [0, 1], and m := E[Y].
Define π as the extinction probability, i.e., π := P(∃ N : X_N = 0).
If m ≤ 1, then π = 1. If m > 1, then π < 1, and π is the unique non-negative root of P(s) = s that is less than 1.
A.6 The Monte Carlo Method
The Monte Carlo method was created to solve the problem of integration in high
dimensions, where the usual deterministic quadrature methods failed. It is based on
the law of the large numbers. Let us assume that we want to compute
Z
I := f (x)dx.
[0,1]d
The random variable Iˆ has expectation I since the probability density function of X
is 1[0,1]d (x). The variance of Iˆ is M 1/2
Var [f (X)]. If instead of sampling the ran-
dom vectors X(!m ), we carefully choose a deterministic sequence x1 , x2 , . . . , xM , the
1 PM ˆ Depending
average f (xm ) give us a quasi-Monte Carlo approximation of I.
M m=1
on the regularity of f , we can obtain a convergence rate proportional to M 1 .
To implement the Monte Carlo method, we should have at hand a random number generator (RNG). In general, an RNG is a recipe that generates finite deterministic sequences of numbers in the interval [0, 1] which pass a number of statistical hypothesis tests of uniformity. We should also have ways of sampling random variables with specific distributions from uniform random variables in [0, 1], such as the Inverse Transform Method (see Section A.2.4). For a general exposition of the Monte Carlo method in statistics, we refer to [4].
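A minimal sketch of the plain Monte Carlo estimator for a d-dimensional integral; the integrand is our own choice, picked so that the exact value (e − 1)^d is known:

```python
import numpy as np

# Monte Carlo estimate of I = integral over [0,1]^d of f(x) dx, with
# f(x) = prod_i exp(x_i), whose exact value is (e - 1)^d.
rng = np.random.default_rng(0)
d, M = 5, 100_000
X = rng.uniform(size=(M, d))            # iid uniform points in [0,1]^d
fX = np.exp(X).prod(axis=1)
I_hat = fX.mean()
std_err = fX.std(ddof=1) / np.sqrt(M)   # statistical error ~ M^{-1/2}
print(I_hat, (np.e - 1) ** d, std_err)
```

Note that the estimated standard error decays like M^{-1/2} regardless of the dimension d, which is the reason the method wins over deterministic quadrature in high dimensions.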
A.7 Multilevel Monte Carlo
Instead of the standard Monte Carlo estimator
$$ \hat\theta := \frac{1}{M} \sum_{m=1}^{M} X(\omega_m) $$
of E[X], we define another estimator which uses a control variate Y, correlated with X, whose expectation E[Y] is known. In fact, we assume that we can generate pairs (X(ω), Y(ω)) in such a way that the cost of generating Y(ω) is less than the corresponding cost of generating X(ω):
$$ \hat\theta_2 := E[Y] + \frac{1}{M} \sum_{m=1}^{M} (X - Y)(\omega_m). $$
If Var[X − Y] < Var[X], we have that Var[θ̂₂] < Var[θ̂].
Observe here that, if we do not know E[Y], we can use a third unbiased estimator of E[X]:
$$ \hat\theta_3 := \frac{1}{M_0} \sum_{m=1}^{M_0} Y(\omega_m) + \frac{1}{M_1} \sum_{m=1}^{M_1} (X - Y)(\omega_m). $$
Here X − Y is computed from the sampled pair (X, Y); this means that X and Y are not independent in general; moreover, they should be highly correlated.
Let us assume now that we have a hierarchy of L levels of approximation for the random variable X; that is, Y^{(0)}, Y^{(1)}, ..., Y^{(L)} are random variables, possibly obtained by discretizing some dimension of the domain of definition of X. For instance, if X is continuously defined in [0, T], we can think of $(Y^{(\ell)})_{\ell=0}^{L}$ as a hierarchy of discretizations of X using a nested family of time meshes with decreasing size. The last reasoning can be extended to define
$$ \hat\theta_L := \frac{1}{M_0} \sum_{m=1}^{M_0} Y^{(0)}(\omega_m) + \sum_{\ell=1}^{L} \frac{1}{M_\ell} \sum_{m=1}^{M_\ell} \{ Y^{(\ell)} - Y^{(\ell-1)} \}(\omega_m), $$
where Y^{(L+1)} := X.
To compute θ̂_L, one should be able to sample from Y^{(0)} and from the pairs (Y^{(ℓ−1)}, Y^{(ℓ)}) for ℓ = 1, 2, ..., L. The expected value of θ̂_L is E[X], but Var[θ̂_L] is
$$ \frac{\mathrm{Var}\left[ Y^{(0)} \right]}{M_0} + \sum_{\ell=1}^{L} \frac{\mathrm{Var}\left[ Y^{(\ell)} - Y^{(\ell-1)} \right]}{M_\ell}. $$
In general, using the same computational work to compute θ̂ and θ̂_L, we find that Var[θ̂_L] < Var[θ̂]. A review of multilevel Monte Carlo methods is given in [5].
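A toy sketch of the estimator θ̂_L; the coupled Euler levels and all names are our own illustration, the key point being that each correction term uses the same Brownian increments on two nested meshes:

```python
import numpy as np

def mlmc_estimate(sampler, M):
    """Multilevel Monte Carlo: theta_L = mean(Y0) + sum_l mean(Yl - Y_{l-1}).
    sampler(l, Ml) returns Ml coupled samples (Y_{l-1}, Y_l); at level 0 the
    coarse component is ignored."""
    est = 0.0
    for l, Ml in enumerate(M):
        coarse, fine = sampler(l, Ml)
        est += np.mean(fine) if l == 0 else np.mean(fine - coarse)
    return est

def sampler(l, M, rng=np.random.default_rng(0)):
    """Coupled Euler approximations of dS = 0.5 * S dW, S(0) = 1, on nested
    time meshes: the coarse path reuses the fine path's Brownian increments."""
    n = 2 ** (l + 1)                                  # fine steps at level l
    dW = rng.normal(scale=np.sqrt(1.0 / n), size=(M, n))
    fine = np.prod(1.0 + 0.5 * dW, axis=1)            # Euler scheme in product form
    if l == 0:
        return None, fine
    dW_c = dW[:, 0::2] + dW[:, 1::2]                  # same randomness, coarser mesh
    coarse = np.prod(1.0 + 0.5 * dW_c, axis=1)
    return coarse, fine

# E[S(1)] = 1 for this martingale; the MLMC estimate should be close.
est = mlmc_estimate(sampler, M=[4000, 2000, 1000])
print(est)
```

Because coarse and fine paths share the same increments, the correction terms have small variance and can be estimated with far fewer samples than the level-0 term, which is the source of the MLMC work savings.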
REFERENCES
[1] S. Resnick, A Probability Path, 1st ed. Birkhäuser, 1999.
[2] E. Çinlar, Probability and Stochastics (Graduate Texts in Mathematics, Vol. 261), 1st ed. Springer, 2011.
[3] A. N. Shiryaev, Probability (Graduate Texts in Mathematics, Vol. 95), 2nd ed. Springer, 1995.
[4] C. Robert and G. Casella, Monte Carlo Statistical Methods (Springer Texts in
Statistics), 2nd ed. Springer, 2005.
[5] M. Giles, “Multilevel Monte Carlo methods,” Monte Carlo and Quasi-Monte
Carlo Methods, pp. 79–98, 2012.
Simulation Algorithms:
Statistical Inference: