
Vol. 00 no. 00, Pages 1-11

Reverse Engineering of Gene Regulatory Networks for Incomplete Expression Data: Transcriptional Control of Haemopoietic Commitment
Kristin Missal (a), Michael A. Cross (b), Dirk Drasdo (c, d)
(a) Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; (b) Interdisciplinary Center for Clinical Research and Division of Hematology/Oncology, Inselstraße 22, D-04103 Leipzig, Germany; (c) Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; (d) Max-Planck-Institute for Mathematics in the Sciences, Inselstr. 22, D-04103 Leipzig, Germany.

ABSTRACT
Motivation: The identification of the topology and function of
gene regulation networks remains a challenge. A frequently
used strategy is to reconstruct gene regulatory networks from
time series of gene expression levels measured on material pooled
from cell populations. However, this strategy causes problems if the
gene expression in different cells of the population is not synchronous, as is expected to be the case in the transcription
factor network that controls lineage commitment in haematopoietic stem cells. Here, a promising alternative may be to
measure the gene expression levels in single cells individually.
The inference of a network requires knowledge of the gene
expression levels at successive time points, at least before
and after a network transition. However, due to experimental
limitations a complete determination of the precursor state is
not possible.
Results: We investigate a strategy for the inference of gene regulatory networks from incomplete expression data based on dynamic Bayesian networks that permits prediction of the number of experiments necessary for network inference, depending on noise in the data, prior knowledge, limited attainability of initial states and other inference parameters. Our inference strategy combines a gradual Partial Learning strategy for the network topology, based only on actual experimental observations, with expectation maximization for the network parameters. We illustrate our strategy by extensive computer simulations, covering a high-dimensional parameter space, of network inference from simulated single-cell experiments during haematopoietic stem cell commitment. We find, for example, that the feasibility of network inference increases significantly with the experimental ability to force the system into different initial network states, with prior knowledge and with noise reduction.
Availability: Source code is available from the authors upon request.
Contact: drasdo@izbi.uni-leipzig.de

To whom correspondence should be addressed.
© Oxford University Press.

1 INTRODUCTION
Regulatory networks are frequently inferred from time series of gene expression levels obtained from microarray experiments in which the gene products of a large number of cells from a population are pooled at successive time points (e.g. [34, 3, 37]), such that the experimentally observed expression state represents an average over the subpopulation. This strategy may cause serious problems if the expression patterns in different cells of the population are not synchronous, such that the average values and the temporal development of the gene expression levels do not represent the true behavior within an individual cell (compare also [3]). This situation is relevant to haematopoietic stem cells undergoing lineage commitment [8]. Haematopoietic stem cells are the progenitors of all the different types of human blood cells, which develop from the stem cells by progressive specialization or differentiation. The choice of a specific cell lineage is believed to be determined by interactions between a relatively small number of lineage-associated transcription factors and their corresponding genes [8, 20].
For cases such as these, experimental procedures have been
developed that permit the determination of gene expression
levels in single cells individually [5]. They involve (i) the
generation of cDNA sequences corresponding to mRNA of
expressed genes by reverse transcription and (ii) the amplification of the cDNAs by polymerase chain reaction (PCR).

submitted

The major drawback of the procedure for modelling purposes is that the cell is destroyed during mRNA extraction, so that the full gene expression pattern within one cell can be measured only once. On the other hand, to uniquely infer the topology and rules of the underlying gene regulation network, the previous network state within the cell, i.e., the network state prior to the transition that leads to the network state at the moment of cell destruction, must also be known to a large extent. One way to gain knowledge of two successive network states is to up- or down-regulate the expression level of individual genes experimentally by introducing recombinant genes or specific inhibitory molecules into the cell [17, 25], followed by complete monitoring of the network state after a transition has occurred. Unfortunately, the number of gene expression levels that can be adjusted simultaneously in this way is currently limited in practice (typically to just one or two genes); hence the knowledge of the previous (manipulated) network state is largely incomplete. This makes the experimental strategy feasible only for relatively small gene regulation networks, as is assumed to be the case for haematopoietic lineage commitment, or for sub-networks of larger networks.
In this paper we use in-silico simulations to study the efficiency and reliability of reverse engineering of gene regulation networks from transition data that are largely incomplete.
As a guide for the network parameters, we use the example of haematopoietic lineage commitment (see Fig. 1). The main motivation for our simulation study was to estimate the approximate number of experiments necessary for a reliable inference of small gene regulation networks from experiments on state transitions. For this purpose we generated artificial expression data from Boolean networks (BoolN, [22]) by computer simulations, used reverse engineering strategies to infer networks from the data, and compared the inferred networks to those that we originally used to generate the data. Similar procedures have proved useful for reverse engineering methods [24, 1, 36, 32], as well as, for example, in sequence alignment (e.g. [11], [23]). The use of BoolNs is undoubtedly an oversimplification in many biological situations [10], but it is noteworthy that despite their simplicity and shortcomings, Boolean networks have been successfully used to model the gene regulatory network in a number of biological systems, e.g. Drosophila melanogaster [2]. Furthermore, Liang et al. [24] have presented a reverse engineering algorithm (REVEAL) that allows the inference of a unique BoolN from a time series in which all states can be measured. Using REVEAL, the full network (for arbitrary Boolean rules) can be constructed from gene expression data only if the number of known gene expression levels from the precursor network state is larger than or equal to the number of inputs of the gene that has the largest number of inputs (Missal and Drasdo, unpubl.). In the situations studied in our simulations, this condition is not fulfilled at all. Moreover, expression profiles of genes that could serve as a guide to substitute missing gene states [35] are not available, as is often the case in small networks. We have therefore adopted an alternative strategy. We use the framework
of reverse engineering methods of dynamic Bayesian networks (DBN) [4, 15, 19, 18, 28, 29, 30, 31, 36, 37] to identify
the most probable model given the artificially generated data,
i.e. our inferred networks are represented by DBNs (Fig. 1b).
DBNs use a probabilistic approach and are particularly suited to situations in which (1) the degree of uncertainty is high as a consequence of the inherent stochasticity of the biological processes during gene regulation, measurement errors, or inconsistencies in the data [30, 29]; (2) variables may be missing, as would be the case if not all influences are known [4]; and (3) inference is to be improved by the inclusion of prior knowledge [18].
A commonly used strategy to infer the network topology and
parameters from incomplete data is to substitute the missing expression levels with their expected values and then
re-estimate the maximum likelihood parameters and the network structure, assuming the expected values to be the true
observations. This method is known as Structural Expectation Maximization (SEM) [13, 16]. However, extensive
computer simulations strongly suggest that SEM is not the
most suitable reverse engineering strategy in cases in which
the state of many elements is missing (Missal and Drasdo,
unpubl.).
In this paper we therefore focus on a gradual Partial Learning (PL) strategy, which is based solely on observed experimental data [7, 28]. Since Boolean networks are a subclass of dynamic Bayesian networks [28], the PL strategy includes REVEAL as a special case.
The main aim of the in-silico experiments reported here is to study how the inference of gene regulatory networks depends on the degree of missing knowledge about the transition states, on the parameters of the PL algorithm, on the number of experiments and on prior biological knowledge, in order to design an optimal inference strategy. We find that the
required number of experiments using the currently available
technology is large but feasible given sufficient human and
financial resources.
This paper is organized as follows. In Section 2 we give a
detailed description of the model we use to generate the transition data, of our reverse engineering strategy, and define
how we quantify our inference results. Section 3 describes the simulation results on small artificial gene regulation
networks.

2 MODELING AND REVERSE ENGINEERING METHODS
The transcription factor network in vitro and in silico: We have generated gene expression data in-silico from artificial gene regulation networks. The network topology was specifically chosen according to alternative hypotheses on what the haemopoietic transcription factor network may look like, based on a variety of published expression and regulation data (e.g. [20]) (Fig. 1a). The hypothesis involves 11 transcription factors, of which six (marked by grey background colours in Fig. 1a) form a core network that is not affected by the state of the other five transcription factors. Published data have also been used to assign each hypothetical interaction to one of four probability groups: highly probable (label 1), probable (label 2), improbable (label 3), and possible but not experimentally examined (label 4). Note that this classification concerns only the occurrence and not the quality (the type) of the influences.

[Fig. 1 image: (a) hypothesized network of Fli1, Myb, PU.1, GATA1, NFE2, MafK, EKLF, Elf1, SCL, GATA2 and C/EBPa with activating (+) and repressing (−) influences labelled 1-3; (b) DBN over c/ebpa, pu.1, gata1, gata2, scl and elf1 in time slices [0], [t−1] and [t], with start structure G^(0).]
Fig. 1. (a) A gene regulation model for control of lineage commitment in haemopoietic stem cells. Labels are explained in the text. (b) DBN modelling the process of gene regulation involving influences with label 1 in (a). The random variables X[t] = {c/ebpa[t], pu.1[t], gata1[t], gata2[t], scl[t], elf1[t]} indicate the concentrations of the corresponding transcription factors at time t. G^(0) is the start structure specifying the conditional independence assumptions over the initial states X[0]. If X[0] is randomly distributed, no correlations in G^(0) are observed and hence no arcs are given. G^→ is the transition structure representing conditional independence assumptions between state transitions. We assume for simplicity that gene regulation is a Markovian and static process; hence the transition structure models only the dependencies between two successive time steps t−1 and t.
We use BoolNs to represent the gene regulation networks. The probability p_i, i ∈ {1, 2, 3, 4}, for the insertion of an edge (a link between two elements, or a line that denotes an influence of an element on itself in Fig. 1) is chosen according to the labels: p_1 = 0.9, p_2 = 0.66, p_3 = p_4 = 0.5. Edges not shown in Fig. 1 correspond to relationships for which we could find neither direct experimental evidence nor inference from the published literature. To test the robustness of our simulation results against modifications, we also performed simulations in which we introduced these edges with probability 0.1, but this resulted in only negligible differences. Next, the quality of the regulatory influence has to be specified. To this end, repressors (labelled with − in Fig. 1a) were assumed to act dominantly [6] and were modelled by an AND relation, while activators (labelled with + in Fig. 1a), which may act either competitively or synergistically [26], were modelled by AND or OR functions. The resulting Boolean rules are canalizing functions², which are believed to represent the biologically relevant rules [22]. A Boolean function is defined as canalizing if at least one state of at least one input element alone guarantees a particular state of the output element, i.e., independently of the states of all other input elements [22, 33]. For example, the number k_can(k) of canalizing functions with k inputs (counting only functions that depend on all k inputs) is k_can(1) = 2, k_can(2) = 8, k_can(3) = 88, k_can(4) = 3104, etc., while the total number of Boolean functions with k inputs is 2^(2^k).
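The counts k_can(k) can be reproduced by brute force for small k. The following Python sketch is our own illustration, not the code used for this study, and its function names are hypothetical. It enumerates all 2^(2^k) truth tables, keeps those that depend on every input, and applies the canalizing test as defined above.

```python
from itertools import product


def flip(row, i):
    """Return the input vector `row` with input i negated."""
    return row[:i] + (1 - row[i],) + row[i + 1:]


def count_canalizing(k):
    """Count k-input Boolean functions that depend on all k inputs and are canalizing."""
    rows = list(product((0, 1), repeat=k))
    count = 0
    for table in product((0, 1), repeat=len(rows)):
        out = dict(zip(rows, table))
        # Every input must change the output for at least one input vector.
        if not all(any(out[r] != out[flip(r, i)] for r in rows) for i in range(k)):
            continue
        # Canalizing: some value a of some input i fixes the output on its own.
        if any(len({out[r] for r in rows if r[i] == a}) == 1
               for i in range(k) for a in (0, 1)):
            count += 1
    return count


for k in (1, 2, 3):
    print(k, count_canalizing(k), 2 ** (2 ** k))   # k, k_can(k), all functions
```

Restricting the count to functions that depend on all inputs matters: constant functions and functions of fewer inputs would otherwise pass the canalizing test trivially.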
The initial value of each element is chosen randomly with equal probability from the set {0, 1}. We studied the situation where each initial network state is chosen only once (in order to avoid redundancy) and compared it to the situation where the initial network states were chosen randomly. The non-redundant case allows us to uniquely identify the number of different initial network states necessary to infer the network topology and parameters, while in the redundant case the same initial network states will appear repeatedly. Note that if M redundant network states of a network with N elements are given, they include on average

$$ \bar{M} = 2^N \left(1 - \left(1 - (1/2)^N\right)^M\right) $$

unique network states, so that for N = 6 (2^N = 64 possible initial states) at least M = 400 redundant network states must be sampled to observe more than 63 different initial states. In accordance with experimental feasibility we assume that either n = 1 or n = 2 elements can be jointly manipulated, i.e. for our inference strategies we assume that the states of either one or two of the N elements are known before the transition. The network state that evolves after one transition according to the Boolean network rules is assumed to be completely known, since it can be completely determined experimentally.

¹ … might also have correlations between variables of one time slice.
² This was achieved by investigating the Boolean functions of 1000 randomly created Boolean networks according to Fig. 1a.
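The expected number M̄ of distinct initial states can be checked numerically. This is a minimal Python sketch, assuming uniformly and independently drawn initial states as described above; the function names and the Monte Carlo settings (number of runs, seed) are our own choices.

```python
import random


def expected_unique_states(n_elements, n_samples):
    """M_bar = 2^N (1 - (1 - 2^-N)^M): expected distinct states among M uniform draws."""
    n_states = 2 ** n_elements
    return n_states * (1.0 - (1.0 - 1.0 / n_states) ** n_samples)


def simulated_unique_states(n_elements, n_samples, n_runs=2000, seed=1):
    """Monte Carlo estimate: draw M states with replacement, count distinct ones, average."""
    rng = random.Random(seed)
    n_states = 2 ** n_elements
    total = 0
    for _ in range(n_runs):
        total += len({rng.randrange(n_states) for _ in range(n_samples)})
    return total / n_runs


print(expected_unique_states(6, 400))    # about 63.9
print(simulated_unique_states(6, 400))
print(expected_unique_states(6, 64))     # about 40.6
```

The last line shows why redundancy is costly: drawing as many random initial states as there are possible states (64 for N = 6) yields only about 40.6 distinct ones on average.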
Reverse engineering method: The general modelling scheme of the PL reverse engineering approach to infer the topology of the DBN comprises the following steps:
(i) Initial determination of model parameters and topology of the DBN. {See lines 1-2 Alg. 1.}
(ii) Optimization of structure
(a) Generation of a set of successors of the DBN by inserting edges. {See line 7 Alg. 1.}
(b) Scoring all successors locally. A local score value evaluates to what extent element X_i[t] depends on its putative parents Pa(X_i[t]). {See lines 8-10 Alg. 1.}
(c) Continuing with the best scoring successors, i.e. hill climbing. {See line 11 Alg. 1.}
(iii) Optimization of parameters
(a) Given an optimal topology of the DBN, the parameters θ_{i j_i k_i} are estimated by expectation maximization (EM) [9, 13].

Algorithm 1: PL
1: Initialize G_init with empty structure and Θ_init with randomly chosen parameters
2: Set current optimal model to G_opt := G_init and Θ_opt := Θ_init
3: Optimization of structure:
4:   Let n be the maximal number of elements X_i[t−1] whose expression states are jointly known
5:   for l := 1 to n do
6:     for each combination Pa of l elements X_i[t−1] whose expression states are jointly known do
7:       Construct successors G_1, . . . , G_m by inserting edges from Pa to X_i[t], ∀i
8:       for all G_i such that 1 ≤ i ≤ m do
9:         Compute local score of G_i {see equation 1}
10:      end for
11:      G_opt := G_opt ∪ {G_i : Score(G_i) > 0} {see equations 2, 3 and 4}
12:    end for
13:  end for
14: Optimization of parameters:
15:   Θ_opt := EM(G_opt, Θ_opt, D) {see [9, 13]}
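The structure-learning loop of Algorithm 1 (lines 4-13) can be sketched in Python as follows, assuming binary genes and the single-set test of Eq. (2). This is our own simplification, not the authors' implementation: the incremental test of Eq. (3), the prior-knowledge variant and the EM step are omitted, and the χ² quantile uses the Wilson-Hilferty approximation instead of an exact value. All names (partial_learning, rule, and so on) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product
from statistics import NormalDist


def mutual_information(pairs):
    """Plug-in MI in bits (Eq. 1) between parent configurations and target states."""
    m = len(pairs)
    joint = Counter(pairs)
    n_parent = Counter(j for j, _ in pairs)
    n_target = Counter(k for _, k in pairs)
    return sum((n / m) * math.log2(n * m / (n_parent[j] * n_target[k]))
               for (j, k), n in joint.items())


def chi2_quantile(df, alpha):
    """Wilson-Hilferty approximation of the (1 - alpha) quantile of chi^2 with df degrees."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def partial_learning(samples, n_genes, n_known, alpha=0.01):
    """samples: (known, after) pairs; `known` maps gene -> state before the transition
    (manipulated genes only), `after` is the fully measured state after it."""
    parents = {i: set() for i in range(n_genes)}
    for size in range(1, n_known + 1):                        # Alg. 1, line 5
        for pa in combinations(range(n_genes), size):          # line 6
            # Only samples in which every element of pa was known are usable.
            usable = [(known, after) for known, after in samples
                      if all(g in known for g in pa)]
            m = len(usable)
            if m == 0:
                continue
            df = (2 - 1) * (2 ** size - 1)                   # (r_i - 1)(q_i - 1), binary genes
            for i in range(n_genes):                           # candidate edges pa -> X_i
                pairs = [(tuple(known[g] for g in pa), after[i]) for known, after in usable]
                statistic = mutual_information(pairs) * m * math.log(4)   # Eq. (2)
                if statistic >= chi2_quantile(df, alpha):
                    parents[i].update(pa)
    return parents


def rule(s):
    """Hypothetical toy network: x0' = x1, x1' = NOT x2, x2' = x0 AND x1."""
    return (s[1], 1 - s[2], s[0] & s[1])


# n = 1: in each experiment one gene is fixed and known; the others are unobserved.
samples = [({g: s[g]}, rule(s))
           for _ in range(10) for g in range(3) for s in product((0, 1), repeat=3)]
inferred = partial_learning(samples, n_genes=3, n_known=1)
print(inferred)   # {0: {1}, 1: {2}, 2: {0, 1}}
```

With ten replicates per configuration (M = 80 usable samples per candidate parent) both AND inputs of x2 pass the test. With a single replicate (M = 8) their statistic of about 3.5 stays below the threshold of about 6.6, while the one-to-one dependencies are still found, which illustrates why more experiments are needed for genes with more inputs.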

A dynamic Bayesian network DBN = (G, Θ) consists of a set of graphs G = (G^(0), G^→) and a set of parameters Θ = (Θ^(0), Θ^→) (Fig. 1b). G^(0) denotes the start structure, Θ^(0) the set of parameters of G^(0), G^→ the transition structure, and Θ^→ the set of parameters of G^→. In the following, z ∈ {0, →} summarizes the notation for the start and transition model, where 0 denotes the start and → the transition model.
G^z is a directed acyclic graph whose vertices correspond to random variables X[t] = {X_1[t], . . . , X_N[t]} such that t = 0 for G^(0) and 1 ≤ t ≤ T for G^→. N is the fixed number of variables in a time slice t. Θ^z is a set of conditional probability distributions θ_i[t] = P(X_i[t] | X[t−1], . . . , X[0]) for each random variable X_i[t] in G^z. G^z expresses conditional independence assumptions for each variable X_i[t]. As is inherent to Bayesian networks, G^z is assumed to obey the Markov assumption, namely, that each random variable is independent of its non-descendants given its parents Pa(X_i[t]) in G^z. Due to the Markov assumption, the θ_i[t] decompose into distributions θ_{i j_i k_i}[t] = P(X_i[t] = k_i | Pa(X_i[t]) = j_i) that denote the probability that X_i[t] = k_i given that its parents Pa(X_i[t]) in G^z have state j_i (note that the vector j_i contains the state of each parent of element X_i[t]).
Knowing the topological structure of the graph G^z permits us to express the state of an element in terms of the states of its parents, which will usually represent only a subset of all vertices. By applying the chain rule of probability, the joint distribution can then be written in the form

$$ P(X[0],\dots,X[T]) = P(X[0]) \prod_{t=1}^{T} P\big(X[t] \mid X[t-1],\dots,X[0]\big) = \prod_{t=0}^{T} \prod_{i=1}^{N} P\big(X_i[t] \mid \mathrm{Pa}(X_i[t])\big). $$

The joint probability distribution of the DBN represents the trajectories of the process. For simplicity we assume that the process of gene regulation is Markovian in time, i.e., parents can only be variables from the previous time slice, and homogeneous in time, meaning that the transition probabilities are time-independent. Under these assumptions only two successive time slices need to be considered [16]. Since the initial state of each element of a network is chosen randomly, the start structure of the DBN cannot contain correlations. We therefore focus below on the transition structure only and drop the index →.
The main problem of reverse engineering from expression data of single cells is the lack of information concerning the expression levels in the network state before the transition. We use PL to deal with this problem.
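For the idealized case of fully observed transitions, the factorization of the joint distribution means that the likelihood of a candidate structure can be evaluated gene by gene. This is a minimal Python sketch, assuming binary states, relative-frequency estimates of θ_{i j_i k_i} and complete data (precisely what the single-cell experiments cannot provide, which is why PL and EM are used instead); the toy network and all names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def fit_cpts(transitions, parents):
    """Relative-frequency estimates of P(X_i[t] = k | Pa(X_i[t]) = j)
    from fully observed (before, after) state pairs."""
    cpts = {}
    for i, pa in parents.items():
        joint = Counter((tuple(b[g] for g in pa), a[i]) for b, a in transitions)
        marginal = Counter(tuple(b[g] for g in pa) for b, _ in transitions)
        cpts[i] = {(j, k): n / marginal[j] for (j, k), n in joint.items()}
    return cpts


def log2_likelihood(transitions, parents, cpts):
    """Sum over transitions and genes of log2 P(X_i[t] | Pa(X_i[t]))."""
    total = 0.0
    for b, a in transitions:
        for i, pa in parents.items():
            total += math.log2(cpts[i][(tuple(b[g] for g in pa), a[i])])
    return total


def rule(s):
    """Hypothetical toy network: x0' = x1, x1' = NOT x2, x2' = x0 AND x1."""
    return (s[1], 1 - s[2], s[0] & s[1])


transitions = [(s, rule(s)) for s in product((0, 1), repeat=3)]
true_parents = {0: (1,), 1: (2,), 2: (0, 1)}
wrong_parents = {0: (), 1: (2,), 2: (0, 1)}   # the edge x1 -> x0 is missing
for pa in (true_parents, wrong_parents):
    print(log2_likelihood(transitions, pa, fit_cpts(transitions, pa)))
```

The correct structure explains the deterministic data perfectly (log-likelihood 0.0), whereas the missing edge costs one bit per transition (-8.0 in total), because x0 then looks like a fair coin.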

PL: This approach was first introduced in [7] to analyze the structure of multivariate systems and was first linked to gene regulation networks in [28].
Subsets of parents with significant impact on the target gene X_i at time t (note that the expression level of gene X_i at time point t corresponds to the value of the random variable X_i[t]) are identified based on the mutual information [24, 12]

$$ \mathrm{MI}(X_i[t],\mathrm{Pa}(X_i[t])) = \sum_{k_i=1}^{r_i} \sum_{\{j_i[t]\}} P(X_i[t]=k_i,\mathrm{Pa}(X_i[t])=j_i) \log_2 \frac{P(X_i[t]=k_i,\mathrm{Pa}(X_i[t])=j_i)}{P(X_i[t]=k_i)\,P(\mathrm{Pa}(X_i[t])=j_i)}. \qquad (1) $$

This quantifies the impact of the potential regulatory input genes Pa(X_i[t]) on the target gene i in time slice t. r_i is the number of states of element i, which does not change across time slices. {j_i[t]} denotes the set of possible state vectors of its parents at time t, where {j_i[t]} = {j_{1;i}, . . . , j_{q_i[t];i}}. Here, q_i[t] is the number of possible state vectors of the parents at time t, and j_{l;i} is the l-th state vector of these parents. If element i has 6 (binary) parents, then q_i[t] = 2^6. The probabilities P(·) are estimated from the relative frequencies of the occurrence of the corresponding events.
A mutual information significantly larger than zero indicates a regulatory influence with high probability and justifies the introduction of directed edges into the model topology from the genes in Pa(X_i[t]) to X_i[t], while MI(X_i[t], Pa(X_i[t])) = 0 suggests that the set Pa(X_i[t]) has no regulatory influence on X_i[t].
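To see what Eq. (1) measures, the relative frequencies can be replaced by exact probabilities for uniformly distributed input states (an assumption made only for this illustration). The Python sketch below, with hypothetical names, shows that a set of parents can carry more information jointly than any single member, and that a true input can have zero individual mutual information with its target.

```python
import math
from itertools import product


def mi_from_rule(rule, n_inputs, parent_idx):
    """Eq. (1) for uniformly distributed inputs: MI in bits between the states of the
    inputs listed in parent_idx and the output of the Boolean rule."""
    states = list(product((0, 1), repeat=n_inputs))
    p = 1.0 / len(states)
    joint, p_pa, p_out = {}, {}, {}
    for s in states:
        j, k = tuple(s[g] for g in parent_idx), rule(s)
        joint[(j, k)] = joint.get((j, k), 0.0) + p
        p_pa[j] = p_pa.get(j, 0.0) + p
        p_out[k] = p_out.get(k, 0.0) + p
    return sum(pjk * math.log2(pjk / (p_pa[j] * p_out[k]))
               for (j, k), pjk in joint.items())


def and_rule(s):
    return s[0] & s[1]


def xor_rule(s):
    return s[0] ^ s[1]


print(round(mi_from_rule(and_rule, 2, (0,)), 4))     # 0.3113: one AND input alone
print(round(mi_from_rule(and_rule, 2, (0, 1)), 4))   # 0.8113: both inputs jointly
print(round(mi_from_rule(xor_rule, 2, (0,)), 4))     # 0.0: one XOR input alone
```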
The mutual information score of a DBN is decomposable,

$$ \log \mathrm{MI}(G:D) = \sum_{t=1}^{T} \sum_{i=1}^{N} \log \mathrm{MI}(X_i[t],\mathrm{Pa}(X_i[t])), $$

and hence the maximal score is achieved by adding all edges for which MI(X_i[t], Pa(X_i[t])) > 0.
We use a χ²-independence test to quantify whether or not a regulatory influence is highly probable [27]. Our null hypothesis is H_0: MI(X_i[t], Pa(X_i[t])) = 0, and our test quantity is MI(X_i[t], Pa(X_i[t])) M ln 4; H_0 is rejected if

$$ \mathrm{MI}(X_i[t],\mathrm{Pa}(X_i[t]))\, M \ln 4 \;\geq\; \chi^2_{(r_i-1)(q_i[t]-1);\,1-\alpha}, \qquad (2) $$

where M is the number of observed transition samples and α is the significance level. In the resulting model structure there should be edges between Pa(X_i[t]) and the target gene X_i[t] if and only if the regulatory relation exists with high probability. In our simulations we studied significance levels α ∈ [10^−5, 0.1].
To assess whether the addition of an element to a set of elements that has already been identified to have a significant regulatory impact leads to a significant increase of the regulatory impact, a further χ²-independence test is performed. For this test the null hypothesis is H_0: MI(X_i[t], Pa(X_i[t])) − MI(X_i[t], Pa_S(X_i[t])) = 0, with the test quantity Δ_M = (MI(X_i[t], Pa(X_i[t])) − MI(X_i[t], Pa_S(X_i[t]))) M ln 4, which has been shown to follow approximately a χ² distribution as well [7]; H_0 is rejected if

$$ \Delta_M \;\geq\; \chi^2_{df(X_i[t],\mathrm{Pa}(X_i[t])) - df(X_i[t],\mathrm{Pa}_S(X_i[t]));\,1-\alpha_D}. \qquad (3) $$

The set Pa(X_i[t]) contains one more element than its subset Pa_S(X_i[t]). Both sets, Pa_S(X_i[t]) and Pa(X_i[t]), denote sets of potential parents of element X_i[t]. Only if H_0 for Δ_M is rejected is the additional element of Pa(X_i[t]) added to the final list of parents of element X_i[t].
After having learned a highly probable structure of the DBN, a full expectation maximization (EM) iteration is applied to obtain the parameter estimates θ_{i j_i k_i} [9, 13]³.
PL and prior knowledge: We modify the PL reverse engineering strategy for identifying the topology such that edges are now identified by a likelihood ratio test. This enables us to directly incorporate prior knowledge by treating prior probabilities of the occurrence of a regulatory influence between each gene and its potential regulatory input genes as additional observations. The likelihood ratio LR evaluates the posterior distributions as evidence in favor of or against the hypothesis H_0, that two random variables are independent, relative to the hypothesis H_1, that these two random variables are dependent:

$$ \mathrm{LR} := \frac{P(D\mid H_0)}{P(D\mid H_1)} = \frac{\prod_{k_i=1}^{r_i} P(X_i[t]=k_i)^{N_{k_i}[t]} \; \prod_{\{j_i[t]\}} P(\mathrm{Pa}(X_i[t])=j_i)^{N_{j_i}[t]}}{\prod_{\{k_i,\,j_i[t]\}} P(X_i[t]=k_i,\mathrm{Pa}(X_i[t])=j_i)^{N_{k_i j_i}[t]}}. $$

N_{k_i j_i}[t] is the number of observed transition samples where X_i[t] = k_i and Pa(X_i[t]) = j_i, N_{k_i}[t] is the number of observations where X_i[t] = k_i, and N_{j_i}[t] is the number of observations where Pa(X_i[t]) = j_i.
A likelihood ratio larger than 1 is evidence in favor of H_0, while a likelihood ratio smaller than 1 is interpreted as evidence in favor of H_1. The prior distributions are taken directly from the probabilities that an edge occurs in our hypothetical transcription factor network. For example, if a regulatory relation between an input gene and its target gene is labelled with 1 in Fig. 1a, then the prior probability that this relation does not exist in the true network is P(H_0) = 0.1, while the prior probability that it does exist is P(H_1) = 0.9. Quantifying the impact of a set of potential regulatory input genes Pa(X_i[t]) on a target gene X_i[t] by a likelihood ratio is equivalent to characterizing the impact by a mutual information [27]:

$$ -2 \ln \mathrm{LR}(X_i[t],\mathrm{Pa}(X_i[t])) = M \ln(4)\, \mathrm{MI}(X_i[t],\mathrm{Pa}(X_i[t])). $$

Hence the strength of evidence in favor of hypothesis H_0 can again be quantified by a χ²-independence test, reducing the method to PL with prior knowledge:

$$ \frac{P(H_0\mid D)}{P(H_1\mid D)} = \frac{P(D\mid H_0)\,P(H_0)}{P(D\mid H_1)\,P(H_1)}, \qquad -2 \ln \mathrm{LR}(X_i[t],\mathrm{Pa}(X_i[t])) \;\geq\; \chi^2_{(r_i-1)(q_i[t]-1);\,1-\alpha}. \qquad (4) $$

³ For this step we used the EM implementation of the LibB toolkit [14].
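The identity −2 ln LR = M ln(4) MI holds when the probabilities in LR are replaced by their maximum-likelihood (relative-frequency) estimates. The Python sketch below, with hypothetical names and made-up counts, checks this numerically and turns LR into posterior odds via the Bayes relation P(H_0|D)/P(H_1|D) = LR · P(H_0)/P(H_1), using the prior of a label-1 edge, P(H_0) = 0.1.

```python
import math
from collections import Counter


def minus_two_log_lr(pairs):
    """-2 ln LR for LR = P(D|H0) / P(D|H1), with relative-frequency estimates;
    pairs are observed (parent_configuration, target_state) samples."""
    m = len(pairs)
    n_kj = Counter(pairs)
    n_j = Counter(j for j, _ in pairs)
    n_k = Counter(k for _, k in pairs)
    log_h0 = (sum(n * math.log(n / m) for n in n_k.values())
              + sum(n * math.log(n / m) for n in n_j.values()))
    log_h1 = sum(n * math.log(n / m) for n in n_kj.values())
    return -2.0 * (log_h0 - log_h1)


def mutual_information_bits(pairs):
    """Plug-in mutual information in bits, as in Eq. (1)."""
    m = len(pairs)
    n_kj = Counter(pairs)
    n_j = Counter(j for j, _ in pairs)
    n_k = Counter(k for _, k in pairs)
    return sum((n / m) * math.log2(n * m / (n_j[j] * n_k[k])) for (j, k), n in n_kj.items())


def posterior_odds_h0(pairs, p_h0):
    """P(H0|D) / P(H1|D) = LR * P(H0) / P(H1)."""
    lr = math.exp(-0.5 * minus_two_log_lr(pairs))
    return lr * p_h0 / (1.0 - p_h0)


# Made-up data: target = x0 AND x1, observed together with the single parent x0.
pairs = [((0,), 0)] * 40 + [((1,), 0)] * 20 + [((1,), 1)] * 20
m = len(pairs)
print(minus_two_log_lr(pairs), m * math.log(4) * mutual_information_bits(pairs))
print(posterior_odds_h0(pairs, p_h0=0.1))
```

Both numbers on the first line agree up to rounding, and the very small posterior odds express strong evidence against independence.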


Assessment of parameters: To assess the learned model parameters we calculate the relative entropy [19, 12]

$$ H_{\mathrm{rel}}(Q,P) = \sum_{x[1],\dots,x[T]} Q(X[1],\dots,X[T]) \log_2 \frac{Q(X[1],\dots,X[T])}{P(X[1],\dots,X[T])} = \sum_{t=1}^{T} \sum_{i=1}^{N} \sum_{k_i=1}^{r_i} \sum_{\{j_i[t]\}} P(X_i[t]=k_i,\mathrm{Pa}(X_i[t])=j_i) \log_2 \frac{Q(X_i[t]=k_i \mid \mathrm{Pa}(X_i[t])=j_i)}{P(X_i[t]=k_i \mid \mathrm{Pa}(X_i[t])=j_i)}. \qquad (5) $$

This measure evaluates how well the model is able to explain an independent test data set. If a model explains the data well, H_rel(Q, P) is small. Q(·) is estimated from the relative frequencies observed in a test data set generated from the original network, and P(·) are the learned parameters of the DBN.
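The first form of Eq. (5) is the Kullback-Leibler divergence measured in bits. This is a minimal Python sketch over explicitly enumerated outcomes, with hypothetical names; in practice Q would come from test-data frequencies and P from the learned DBN, and zero probabilities in P would need smoothing, which is not addressed here.

```python
import math


def relative_entropy_bits(q, p):
    """H_rel(Q, P) = sum_x Q(x) log2(Q(x) / P(x)); q and p map outcomes to probabilities."""
    total = 0.0
    for x, qx in q.items():
        if qx == 0.0:
            continue                      # 0 * log(0 / p) is taken as 0
        px = p.get(x, 0.0)
        if px == 0.0:
            return math.inf               # Q puts mass where P has none
        total += qx * math.log2(qx / px)
    return total


# Hypothetical distributions over the state of one binary gene.
q_test = {0: 0.5, 1: 0.5}
p_model = {0: 0.25, 1: 0.75}
print(relative_entropy_bits(q_test, q_test))    # 0.0
print(relative_entropy_bits(q_test, p_model))   # about 0.2075
```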
Classification of simulation results: We introduce the fidelity f_k, defined as the fraction of correctly identified input elements, to quantify the similarity between the network structure of the inferred DBN and that of the Boolean network. f_k = 1 means that, for that particular k, all input elements and only those have been correctly identified. We study how the fidelity f_k behaves with the number of state transition measurements M. To evaluate the inferred structure we further calculate the sensitivity S_k, defined as the fraction of parents of X_i[t] in the BoolN that are also parents of X_i[t] in the inferred DBN, and the positive predictive value PPV_k, defined as the fraction of parents of X_i[t] in the inferred network that are also parents of X_i[t] in the BoolN. Finally, we calculate the Hamming distance as a distance measure between the inferred and the true networks. The Hamming distance is zero if and only if the inferred and the true network have the same topology. If some elements have not been found, or if false positive elements have been learned, then the Hamming distance is > 0.
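S_k, PPV_k and the Hamming distance can be computed directly from parent sets. The Python sketch below uses our own conventions, which are assumptions: S_k and PPV_k pool the parent counts of all genes that have k inputs in the true BoolN, and the Hamming distance counts edges present in exactly one of the two networks. The fidelity f_k is not computed here, and all names are hypothetical.

```python
def metrics_by_k(true_parents, inferred_parents):
    """Return {k: (S_k, PPV_k)} pooled over genes with k true inputs.
    Both arguments map gene -> set of parent genes."""
    genes_by_k = {}
    for gene, ps in true_parents.items():
        genes_by_k.setdefault(len(ps), []).append(gene)
    result = {}
    for k, genes in sorted(genes_by_k.items()):
        tp = sum(len(true_parents[g] & inferred_parents.get(g, set())) for g in genes)
        n_true = sum(len(true_parents[g]) for g in genes)
        n_inferred = sum(len(inferred_parents.get(g, set())) for g in genes)
        sensitivity = tp / n_true if n_true else float("nan")
        ppv = tp / n_inferred if n_inferred else float("nan")
        result[k] = (sensitivity, ppv)
    return result


def hamming_distance(true_parents, inferred_parents):
    """Number of edges present in exactly one of the two networks."""
    def edges(parents):
        return {(p, g) for g, ps in parents.items() for p in ps}
    return len(edges(true_parents) ^ edges(inferred_parents))


true_net = {0: {1}, 1: {2}, 2: {0, 1}}
inferred_net = {0: {1, 2}, 1: {2}, 2: {0}}       # one false positive, one missed edge
print(metrics_by_k(true_net, inferred_net))      # k=1: S=1.0, PPV=2/3; k=2: S=0.5, PPV=1.0
print(hamming_distance(true_net, inferred_net))  # 2
```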
Simulated initial conditions: We studied (i) situations in which all initial network states were in principle accessible and (ii) situations in which only a fraction of initial network states were accessible due to limitations in the number of potential perturbations of the network states.

3 RESULTS OF IN-SILICO SIMULATIONS


3.1 Random initial network states
We have performed extensive computer simulations to test the parameter dependency of the network inference. We studied f_k, S_k and PPV_k as functions of k, M, α, α_D, prior vs. no prior knowledge, redundancy vs. non-redundancy of the network states chosen, N, n, and noise vs. no noise. In case of noisy data the known states before the transition are assumed to have an error rate of 1% and those after the transition an error rate of 5%. The errors subsume errors from manipulation (usually less than 1%, since genetically manipulated cells may be sorted and confirmed by the co-expression of plasmid-derived fluorescent-protein genes) and from measuring gene expression. In the latter case the noise can be largely controlled by the design of the experiment: to increase the reliability of measurements when the expression level is low (such that it may be missed), the absolute level of mRNA available can be increased by a sufficiently strong amplification of cDNA from single cells using PCR [21].
A family of typical plots of f_k(M), S_k(M) and PPV_k(M) is shown in Fig. 2 for the cases with and without prior knowledge (the parameters in this plot are α = 0.01, n = 1, noise and no redundancy). The fidelity f_k shows a threshold behavior as a function of log_10(M) and saturates at f_k^sat. f_k^sat reflects the degree to which a network topology can be learned. Here, f_k^sat = 1, meaning that for those Boolean functions compatible with the potential hypotheses on haematopoietic lineage commitment each element transmitted information individually, i.e. MI(X_j[t], X_i[t+1]) > 0 for each input element X_j[t]. Note that this is generally not the case for canalizing functions. Only k_1(k) of the k_can(k) canalizing functions have the property that each of the input elements individually transmits information on the output element, with k_1(1) = 2, k_1(2) = 8, k_1(3) = 64, k_1(4) = 1888. Consequently, for networks that result from an arbitrary choice of canalizing functions, at most k_1(k) of k_can(k) elements can be inferred from the data in the case n = 1, i.e. f_k^{sat; n=1} = k_1(k)/k_can(k) ≤ 1 (Missal and Drasdo, unpubl.).
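The counts k_1(k) can be checked by brute force as well. The Python sketch below (hypothetical names, uniformly distributed inputs assumed) counts the canalizing functions that depend on all inputs and, among them, those in which every single input has nonzero mutual information with the output. For uniform inputs this holds exactly when P(output = 1 | input = 0) differs from P(output = 1 | input = 1).

```python
from itertools import product


def flip(row, i):
    """Input vector `row` with input i negated."""
    return row[:i] + (1 - row[i],) + row[i + 1:]


def canalizing_counts(k):
    """Return (k_can, k_1) for Boolean functions with k inputs."""
    rows = list(product((0, 1), repeat=k))
    k_can = k_1 = 0
    for table in product((0, 1), repeat=len(rows)):
        out = dict(zip(rows, table))
        if not all(any(out[r] != out[flip(r, i)] for r in rows) for i in range(k)):
            continue                                  # some input never matters
        if not any(len({out[r] for r in rows if r[i] == a}) == 1
                   for i in range(k) for a in (0, 1)):
            continue                                  # not canalizing
        k_can += 1
        # Both halves contain 2^(k-1) rows, so comparing the number of ones
        # compares P(out = 1 | x_i = 0) with P(out = 1 | x_i = 1).
        if all(sum(out[r] for r in rows if r[i] == 0) != sum(out[r] for r in rows if r[i] == 1)
               for i in range(k)):
            k_1 += 1
    return k_can, k_1


for k in (1, 2, 3):
    k_can, k_1 = canalizing_counts(k)
    print(k, k_can, k_1, k_1 / k_can)   # last column: f_k^sat for n = 1
```

For k = 3 the 24 excluded functions have the form "if one input has a given value the output is fixed, otherwise the output is the XOR (or XNOR) of the other two inputs"; neither of those two inputs is informative on its own.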
As shown in Fig. 2, the sensitivity S_k and the positive predictive value PPV_k also increase to one. The effect of prior knowledge is that the network is inferred at lower M (Fig. 2a). The simple threshold behavior of f_k(M), which may well be fitted by

$$ f_k(M) = \frac{1}{2} f_k^{\mathrm{sat}} \left(1 + \tanh\frac{\log_{10}(M) - A_1}{A_2}\right), \qquad (6) $$

suggests quantifying the inference by the threshold M_th, which we define as the number of experiments necessary to obtain a fidelity of 0.9 f_k^sat, i.e. f_k(M_th) = 0.9 f_k^sat and M_th = f_k^{−1}(0.9 f_k^sat), where f_k^sat, A_1 and A_2 are fit parameters. With Eq. (6) this condition reads tanh((log_10 M_th − A_1)/A_2) = 0.8, so that log_10 M_th = A_1 + A_2 artanh(0.8) = A_1 + A_2 ln 3. The resulting plot (Fig. 3) suggests that M_th may be approximated by M_th ≈ A exp(γk), where γ depends on the inference parameters. This permits estimation of M_th for larger k once M_th is known for smaller k = 2, 3. The shape of f_{k=1}(M) often deviates slightly from the shapes of f_{k>1}(M). Below we summarize our main findings:
(1) M_th decreases if n is increased or if prior knowledge is used. These tendencies are observed for all α ∈ [10^−5, 0.01]. Decreasing α increases M_th via A, both in the presence and in the absence of noise.
(2) M_th is shifted towards slightly larger values for redundancy, which is the expected experimental situation (Fig. 3).

[Fig. 2 image: (a) f_k, (b) sensitivity S_k and (c) PPV_k vs. M (10 to 1000, log scale), no prior and prior knowledge, noise, α = 0.01, n = 1; (d) PPV_k for n = 2, α = 0.01 with α_D = 10^−5 (prior) and α_D = 0.01 (no prior, prior).]
Fig. 2. The parameters of plots (a)-(c) are α = 0.01, n = 1, noise and no redundancy. (a) f_k vs. log(M). For e.g. M = 100 we performed 100 in-silico experiments in which the 1st gene was fixed randomly to either 0 or 1, 100 experiments in which the 2nd gene was fixed, etc. The dotted lines denote inference including prior knowledge, the full lines inference without prior knowledge, each for k = 1 (circles), k = 2 (squares), k = 3 (diamonds), k = 4 (triangles up) and k = 5 (triangles down). Each point represents an average over 1000 networks. f_k shows a threshold behavior as a function of log(M) and saturates at sufficiently large values of M. The larger k is, the larger is the value of M at which saturation is observed. (b) S_k vs. log(M). For large M, S_k saturates at S_k^sat = 1 for all k. Saturation occurs earlier for smaller values of k. Prior knowledge results in earlier saturation. (c) PPV_k vs. log(M). The PPV is already large at M = 10 and converges to PPV_k^sat = 1 for all k. For large k the convergence is much faster (at small M) than the saturation of f_k and S_k. That is, those parents that are found by the DBN are indeed also parents in the original BoolN, i.e., they are true positives. (d) Condensed information on PPV_k. The parameters are α = 0.01, n = 2, noise and no redundancy. The bars denote the 25%, 50% and 75% quantiles. For α_D = 0.01 and no prior knowledge, PPV_k rapidly converges to 1 for all k. However, in case of prior knowledge and too large α_D, PPV_k converges to 0.4. A smaller value of α_D is required to restrict the fraction of false positive parents (inset in (d)).

Redundancy also shifts the saturation of S_k (which is still at 1) towards larger values of M, both with and without prior knowledge. However, these shifts are significantly smaller than the shift in M_th.
(3) With redundancy, perfect inference is not achieved if α is chosen too large. In our computer simulations with α = 0.01 and n = 1 we found that the fidelity saturates at f_k^sat ≈ 0.9. In this case PPV_k does not saturate at PPV_k^sat = 1 for small k (we found PPV_5^sat = PPV_4^sat ≈ 1, PPV_3 = 0.99, PPV_2 = 0.98, PPV_1 = 0.95). If α is decreased to 0.001, both the fidelity and the PPV again saturate at 1. That is, for too large α the DBN finds false positive edges for small k; hence α has to be chosen sufficiently small.
If noise can be kept small, α need not be chosen very small (note that small values of α increase M_th). Generally, noise reduction improves M_th and often the quality of inference.
(4) For n = 2 and fixed α, M_th increases with decreasing α_D.
[Figure 3: M_th (log scale) vs. k. (a) Without redundancy: n = 1, α = 0.01 with and without prior knowledge; n = 2, α = 0.01, α_D = 10^-5 with and without prior knowledge. (b) With redundancy: n = 1, α = 0.01 without prior knowledge; n = 1, α = 0.001 with and without prior knowledge; n = 2, α = 0.001, α_D = 10^-5 with and without prior knowledge.]

Fig. 3. (a) M_th vs. k with noise and without redundancy in the simulation data. M_th, defined as the number of experiments at which f_k = 0.9 f_k^sat, increases approximately exponentially with k. This permits M_th to be extrapolated to larger k. All data points are calculated without redundancy and as averages over 1000 network samples. (b) M_th vs. k with noise and redundancy in the simulation data; see the text for an explanation. Note that for n = 1, α = 0.01 the fidelity does not converge to 1. In Section 3.2 we chose α = 0.001, because then the fidelity again saturates at 1 for each k, while the sensitivity at smaller M is still high.

(5) The addition of prior knowledge in a noisy situation can lead to an even smaller PPV_k value than under (2) (Fig. 2d).
For example, the saturation value of PPV_k with noise, n = 2 and prior knowledge at α = 0.01 and α_D = 0.01 is PPV_1 ≈ 0.43, while without prior knowledge PPV_1 = 0.95 (Fig. 2d). In the absence of noise in the data (which is never the case in experiments), PPV_k rapidly saturates at PPV_k^sat = 1 for otherwise identical parameters.
For n = 1 and otherwise the same parameter values (noise, prior knowledge, α = 0.01), PPV_k^sat = 1 for all k, i.e. the edges found are much more reliable. The reason is that for n = 1 the prior knowledge helps to select the correct edges, while for n = 2, even if the correct edges have already been learned, further edges are added in order to fit the noise in the data (overfitting). This overfitting can be compensated for by choosing a sufficiently small α_D, which ensures that the learned network topology does not contain false positive edges.
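To make the quantities compared in items (2)–(5) concrete, the following is a minimal sketch of how f_k, S_k and PPV_k could be computed from a true and an inferred topology. It assumes that each topology is stored as a parent set per element, that f_k is the fraction of elements with k true parents whose inferred parent set is exactly correct, and that S_k and PPV_k are the usual edge-level ratios TP/(TP+FN) and TP/(TP+FP) over those elements. The paper's own definitions are given earlier in the text and may differ in detail; the function `topology_scores` and the toy networks are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# element name -> set of its parent (input) elements
ParentSets = Dict[str, FrozenSet[str]]


def topology_scores(true: ParentSets, inferred: ParentSets,
                    k: int) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (f_k, S_k, PPV_k) over all elements with exactly k true parents.

    Assumed definitions (illustrative, not taken from the paper):
      f_k   = fraction of these elements whose inferred parent set is exactly correct,
      S_k   = TP / (TP + FN), pooled over their parent edges,
      PPV_k = TP / (TP + FP), pooled over their parent edges.
    Each value is None when it is undefined (no elements or zero denominator).
    """
    elements = [g for g, parents in true.items() if len(parents) == k]
    if not elements:
        return None, None, None
    exact = tp = fp = fn = 0
    for g in elements:
        found = inferred.get(g, frozenset())
        exact += int(found == true[g])
        tp += len(found & true[g])   # true parents that were recovered
        fp += len(found - true[g])   # spurious parents (false positives)
        fn += len(true[g] - found)   # missed parents (false negatives)
    fidelity = exact / len(elements)
    sensitivity = tp / (tp + fn) if tp + fn else None
    ppv = tp / (tp + fp) if tp + fp else None
    return fidelity, sensitivity, ppv


# Toy example: element B gets one correct and one spurious parent.
true_net = {"A": frozenset({"B", "C"}), "B": frozenset({"A", "C"}), "C": frozenset({"A"})}
learned = {"A": frozenset({"B", "C"}), "B": frozenset({"A", "D"}), "C": frozenset({"A"})}
print(topology_scores(true_net, learned, 2))  # (0.5, 0.75, 0.75)
```

The example shows how a single spurious parent lowers PPV_k and f_k at once while S_k only reflects the missed parent, which is the pattern that distinguishes overfitting (false positives) from incomplete learning.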
Fig. 4a shows the Hamming distance for α = 0.01 in the presence of noise and without redundancy. The Hamming distance decreases monotonically towards zero, meaning that the inferred and the true network topologies become almost identical. For example, for n = 1 the topology at M ≈ 250 (at which the network parameters have already largely been learned, see below) has on average 2 missing edges with prior knowledge and 3 missing edges without prior knowledge (see vertical line in Fig. 4a). The behavior of f_k, PPV_k and S_k in Fig. 2(a–c) shows that at M ≈ 250 only elements with k ≥ 3 have undetected parents. For n = 2 and prior knowledge, M ≈ 1500 experiments are necessary until the topology has been almost completely learned; for n = 1 and prior knowledge, M ≈ 2000; and for n = 1 in the absence of prior knowledge, M ≈ 2600 (horizontal line in Fig. 4a,b). Prior knowledge improves the convergence.
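A Hamming distance between two topologies can be computed by counting the directed edges that are present in exactly one of them. The sketch below assumes that edges are stored as (parent, child) pairs; the optional normalisation by N^2 (all possible directed edges, self-regulation allowed) is an assumption and not necessarily the normalisation used for Fig. 4a. Both function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Edge = Tuple[str, str]  # directed edge (parent, child)


def hamming_distance(true_edges: Iterable[Edge], inferred_edges: Iterable[Edge],
                     n_genes: Optional[int] = None) -> float:
    """Count directed edges present in exactly one of the two topologies.

    If n_genes is given, the count is divided by n_genes**2, the number of possible
    directed edges when self-regulation is allowed (an assumed normalisation).
    """
    diff = set(true_edges) ^ set(inferred_edges)  # symmetric difference
    return len(diff) / n_genes ** 2 if n_genes else float(len(diff))


def missing_edges(true_edges: Iterable[Edge], inferred_edges: Iterable[Edge]) -> int:
    """Number of true edges that the inference has not (yet) found."""
    return len(set(true_edges) - set(inferred_edges))


true_topology = {("A", "B"), ("B", "C"), ("C", "A")}
inferred_topology = {("A", "B"), ("B", "C"), ("A", "C")}
print(hamming_distance(true_topology, inferred_topology))  # 2.0 (one missing, one spurious)
print(missing_edges(true_topology, inferred_topology))     # 1
```

Separating `missing_edges` from the total distance mirrors the text, which reports missing edges at M ≈ 250 while the Hamming distance also counts spurious ones.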
Parameter learning: We used expectation maximization to learn the parameters and quantified the results by the relative entropy. Fig. 4b shows the result for α = 0.01, n = 1, 2, noise, redundancy, with and without prior knowledge. The relative entropy (see Eq. 5) is a cumulative measure and increases with the number of parents (since the number of summands increases with the number of parents). With prior knowledge, the parents are learned more quickly, so that the number of parents at a given M is larger than in the absence of prior knowledge. This explains why H_rel is larger with than without prior knowledge (we have checked that the individual summands are not larger with than without prior knowledge). Moreover, for n = 1 the relative entropy increases rapidly, regardless of whether prior knowledge is given or not: because of the large number of missing values, the landscape in which EM optimizes the parameters differs markedly from the true landscape, so the optima do not coincide with those of the true landscape. This is supported by the observation that when more elements are jointly known (n = 2), the entropy no longer shows a strong increase (Fig. 4b). Convergence to the true landscape can only be obtained if the number of elements that can be jointly manipulated (and hence are known) is sufficiently large.
If one is primarily interested in a model that does not necessarily reflect the true topology but is able to predict observations correctly, then PL already leads to good results at M ≈ 250 for n = 2.
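The role of EM with missing values can be illustrated on the smallest possible case. The sketch below is not the PL/EM procedure of this paper but a generic textbook EM [9] for two binary nodes, assuming one regulator X whose value is unobserved in some transitions and one target Y (its next state) that is always observed; it estimates π = P(X = 1) and θ_x = P(Y = 1 | X = x). The function `em_binary_cpt` and its starting values are hypothetical.

```python
from typing import List, Optional, Tuple

# (value of parent X at time t, or None if unobserved; value of child Y at time t+1)
Sample = Tuple[Optional[int], int]


def em_binary_cpt(data: List[Sample], n_iter: int = 500,
                  tol: float = 1e-10) -> Tuple[float, List[float]]:
    """EM estimate of pi = P(X=1) and theta[x] = P(Y=1 | X=x) when X is partly missing."""
    pi, theta = 0.5, [0.3, 0.7]  # arbitrary asymmetric start
    n = len(data)
    for _ in range(n_iter):
        # E-step: responsibility w = E[X | data]; an observed X is used as is.
        w = []
        for x, y in data:
            if x is not None:
                w.append(float(x))
                continue
            l1 = pi * (theta[1] if y else 1.0 - theta[1])
            l0 = (1.0 - pi) * (theta[0] if y else 1.0 - theta[0])
            w.append(l1 / (l1 + l0) if l1 + l0 > 0 else pi)
        # M-step: maximum-likelihood updates from the expected counts.
        n1 = sum(w)
        n0 = n - n1
        new_pi = n1 / n
        new_theta = [
            sum((1.0 - wi) * y for wi, (_, y) in zip(w, data)) / n0 if n0 > 0 else theta[0],
            sum(wi * y for wi, (_, y) in zip(w, data)) / n1 if n1 > 0 else theta[1],
        ]
        change = abs(new_pi - pi) + sum(abs(a - b) for a, b in zip(new_theta, theta))
        pi, theta = new_pi, new_theta
        if change < tol:
            break
    return pi, theta


complete = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
print(em_binary_cpt(complete))  # with no missing values EM reduces to counting
partial = complete + [(None, 1), (None, 0)] * 3
print(em_binary_cpt(partial))
```

With complete data the result equals the plain maximum-likelihood counts. With many missing values, EM only finds a local optimum of the likelihood, and different starting points can give different estimates, which is consistent with the landscape argument above.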

[Figure 4: (a) Hamming distance (log scale) vs. M; (b) H_rel vs. M; M = 400 to 2800; curves for n = 1, α = 0.01 with and without prior knowledge, and for n = 2, α = 0.01, α_D = 10^-5 with and without prior knowledge.]

Fig. 4. Hamming distance and relative entropy with noise and without redundancy. (a) 95% confidence intervals of the mean Hamming distance between the inferred and the true network. The vertical line is drawn at M = 250, the horizontal line at a Hamming distance of 0.01 (see text). (b) 95% confidence intervals of the mean relative entropy. The means were estimated from 1000 DBNs. H_rel is calculated for each inferred DBN from a sample of 1000 transition vectors generated from the original BoolN.
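Eq. (5) is not reproduced in this section. As a hedged illustration of the procedure in the caption (H_rel estimated from transition vectors sampled from the original BoolN), the sketch below sums, over elements, the sample average of log[p_true(x'_i | x) / p_model(x'_i | x)], which is a Monte Carlo estimate of the relative entropy between the true and the learned transition probabilities. It assumes a noisy Boolean network in which each output is flipped with probability `eps`, uniformly drawn initial states, and a learned model given as conditional probability tables; these assumptions and all names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]
# element -> (parent indices, Boolean update rule applied to the parent values)
BoolNet = Dict[int, Tuple[List[int], Callable[..., int]]]
# element -> (parent indices, {parent values: P(next value = 1)})
CPTModel = Dict[int, Tuple[List[int], Dict[Tuple[int, ...], float]]]


def sample_transitions(net: BoolNet, n_genes: int, n: int, eps: float,
                       rng: random.Random) -> List[Tuple[State, State]]:
    """Draw n (state, next state) pairs from uniform random states; outputs flip with prob. eps."""
    pairs = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(n_genes))
        nxt = tuple(net[i][1](*(x[p] for p in net[i][0])) ^ int(rng.random() < eps)
                    for i in range(n_genes))
        pairs.append((x, nxt))
    return pairs


def relative_entropy(net: BoolNet, model: CPTModel, pairs: List[Tuple[State, State]],
                     eps: float, floor: float = 1e-12) -> float:
    """Monte Carlo estimate of sum_i E[log p_true(x'_i | x) / p_model(x'_i | x)] in nats."""
    total = 0.0
    for i, (parents, rule) in net.items():
        m_parents, cpt = model[i]
        acc = 0.0
        for x, nxt in pairs:
            target = rule(*(x[p] for p in parents))
            p_true = 1.0 - eps if nxt[i] == target else eps
            q1 = cpt.get(tuple(x[p] for p in m_parents), 0.5)  # unseen configuration -> 0.5
            q = q1 if nxt[i] == 1 else 1.0 - q1
            acc += math.log(p_true / max(q, floor))
        total += acc / len(pairs)
    return total


EPS = 0.1
net = {0: ([1], lambda a: 1 - a), 1: ([0, 1], lambda a, b: a & b)}  # hypothetical 2-gene BoolN
pairs = sample_transitions(net, 2, 1000, EPS, random.Random(0))
perfect = {0: ([1], {(0,): 1 - EPS, (1,): EPS}),
           1: ([0, 1], {(a, b): (1 - EPS if a & b else EPS) for a in (0, 1) for b in (0, 1)})}
uninformed = {0: ([], {(): 0.5}), 1: ([], {(): 0.5})}
print(relative_entropy(net, perfect, pairs, EPS))     # close to 0
print(relative_entropy(net, uninformed, pairs, EPS))  # clearly positive
```

Because the estimate is a sum over elements, it grows with the number of learned conditional tables. This matches the remark above that H_rel is a cumulative measure.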

3.2 Initial network states on periodic attractors

In a true experiment, the haematopoietic stem cells are assumed to be in a periodic attractor with two or more states. We therefore generated networks with either 2 or 3 attractor states. The number of network states that can be attained by perturbing either one (n = 1) or at most two (n = 2) elements of the attractor states is very limited and usually smaller than 2^N. In view of experimental feasibility we chose n = 2 and the parameters given in Fig. 5. Again we assume that we are able to completely monitor the network state that follows the perturbation.
The simulated fidelity curves look very different from the case in which an arbitrary initial state could be chosen (Fig. 5). The fidelity increases almost immediately for all k but does not converge to one. The reason is that the limited number of initial states is offered to the DBN frequently and repeatedly, so that it quickly learns the topology inherent in the transitions that start from these initial states. On the other hand, if the attractors have too few states, then not enough states can be attained to ensure a complete inference. Both f_k and S_k depend significantly on the number of attractor states. With prior knowledge the saturation values were the same, but M_th was reduced by 20% (not shown).
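The restriction to a small set of accessible initial states can be made concrete by enumeration. The sketch below assumes a synchronous Boolean update, finds the attractor reached from a start state by iterating until a state repeats, and then collects all states obtained by setting between 1 and n elements of any attractor state to the opposite value. The self-negating toy network and all names are hypothetical and serve only to show how quickly the accessible fraction of the 2^N states shrinks.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

State = Tuple[int, ...]


def find_attractor(update: Callable[[State], State], start: State) -> List[State]:
    """Iterate a synchronous update from start until a state repeats; return the cycle."""
    first_visit: Dict[State, int] = {}
    trajectory: List[State] = []
    x = start
    while x not in first_visit:
        first_visit[x] = len(trajectory)
        trajectory.append(x)
        x = update(x)
    return trajectory[first_visit[x]:]


def perturbed_states(attractor: List[State], n: int) -> Set[State]:
    """States reachable by changing between 1 and n elements of any attractor state."""
    reachable: Set[State] = set()
    for s in attractor:
        for m in range(1, n + 1):
            for idx in combinations(range(len(s)), m):
                t = list(s)
                for i in idx:
                    t[i] = 1 - t[i]  # fix element i to the value it does not currently have
                reachable.add(tuple(t))
    return reachable


def negate(x: State) -> State:
    """Hypothetical toy network: every gene represses itself (2-state attractor)."""
    return tuple(1 - v for v in x)


N = 10
cycle = find_attractor(negate, (0,) * N)
print(len(cycle), len(perturbed_states(cycle, 1)), len(perturbed_states(cycle, 2)), 2 ** N)
# 2 20 110 1024
```

Even with n = 2, only 110 of the 1024 states of this toy network are accessible, illustrating why the fidelity can saturate below one when initial states are restricted to perturbed attractor states.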

4 DISCUSSION
We have simulated gene network inference for small networks in the absence of extensive knowledge concerning the transition states. To infer the network topology we used a partial learning strategy which identifies the inputs of gene elements by assessing the amount of information transmitted to a gene from each gene or from groups of genes. The network topology was represented by a dynamic Bayesian network, and the joint probabilities for a given topology were used to calculate the mutual information score. Finally, expectation maximization was used to optimize the inference parameters. This inference strategy permits the inclusion of topological prior knowledge by considering the likelihood ratio (the ratio of the likelihood of independent elements to the likelihood of dependent elements) in the same way as the mutual information score. Our studies were guided by a hypothesized core network for haematopoietic stem cell commitment. For this network, we found that each element transmits information individually, so that in principle one influenceable gene is sufficient to determine the whole network topology. We quantified the degree of knowledge of the network topology by the fraction of correctly learned inputs (which we call fidelity). We found that prior knowledge, a larger number of influenceable initial states, or a larger number of accessible initial states all decrease the number of experiments necessary to obtain the same fidelity. Redundancy increases the number of experiments as well as the requirement for a larger statistical significance of the resulting topology. Similar tendencies are found for the sensitivities. However, the positive predictive value behaves counter-intuitively: if the chosen significance value is not sufficiently small, the PPV saturates at values < 1 in the case of prior knowledge, a small number of input genes, and noise in the data. Prior knowledge increases the tendency to keep false positive elements if the prior knowledge does not match the situation found in the data.
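The mutual information score itself is defined earlier in the paper. As a hedged illustration only, the plug-in estimate below measures how much information a candidate parent set at time t carries about an element at time t+1, computed from observed (parent values, child value) pairs. The empirical-frequency estimator, the use of bits and the function name are assumptions; no finite-sample bias correction such as Miller's [27] is applied.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def mutual_information(pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Plug-in estimate (in bits) of I(parents_t; child_{t+1}) from observed pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    p_par = Counter(par for par, _ in pairs)
    p_child = Counter(child for _, child in pairs)
    mi = 0.0
    for (par, child), n_xy in joint.items():
        # p(x, y) * log2[p(x, y) / (p(x) p(y))], all probabilities as counts / n
        mi += n_xy / n * math.log2(n_xy * n / (p_par[par] * p_child[child]))
    return mi


# Child = XOR of two inputs: neither input alone carries information, the pair does.
states = [(a, b) for a in (0, 1) for b in (0, 1)] * 25
print(mutual_information(((a,), a ^ b) for a, b in states))    # 0.0
print(mutual_information(((a, b), a ^ b) for a, b in states))  # 1.0
```

The XOR case is the counter-example to the property found for the hypothesized core network: if an element does not transmit information individually, single-gene scores miss it and only groups of genes reveal the dependency.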
In a real experimental situation, networks may often be in attractors prior to experimental perturbation, which may severely limit the number of states accessible by perturbing only a small number of elements. For this case, we found a completely different shape of the fidelity curves and a saturation value often far below that found when all network states are in principle accessible. However, both the parameter learning (which we assessed by calculating the relative entropy) and the accessibility of states improve significantly with the number of gene states that can be experimentally modified.

[Figure 5: (a) fidelity f_k and (b) sensitivity S_k vs. M (log scale, M = 40 to 1000) for initial states on attractors with 2 and at least 3 states, n_commit = 2, noise, α = 0.001, α_D = 10^-5.]

Fig. 5. Parameter settings are n = 2, α = 0.001, α_D = 10^-5, no prior knowledge, redundancy and noise. (a) Fidelity vs. M for attractors with 2 and 3 states. The full lines denote inference on networks with 2 attractor states and the dashed lines inference on networks with 3 attractor states, each for k = 1 (circles), k = 2 (squares), k = 3 (diamonds), k = 4 (triangles up) and k = 5 (triangles down). For at least 3 attractor states, the saturation value of f_k is close to 1. (b) Corresponding sensitivity curves. PPV_k was almost 1 in all cases.

We also assessed these strategies for networks of N = 10 genes without any preconditions on the network topology. The only difference from the results reported here was that the fidelities saturated at values that were usually smaller than one. However, the fidelity can still be described by the functional form given in Eq. (6), and M_th can still be fitted to the form M_th ≈ A exp(λk). For a given number n of genes that can be manipulated jointly, the saturation value is given by the ratio of the number of canalizing functions in which n genes transmit information on the output to the total number of canalizing functions. Here, increasing n significantly improves network inference.
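The threshold M_th and the exponential fit can be sketched as follows, assuming a fidelity curve sampled at increasing M. Here M_th is taken as the first M at which f_k reaches 0.9 f_k^sat, linearly interpolated in log M between sampled points, and A and λ (`lam`) are obtained by least squares on log M_th versus k. Approximating f_k^sat by the last sampled value, the interpolation scheme and the function names are assumptions; the paper's own fits use the functional form of Eq. (6), which is not reproduced here.

```python
import math
from typing import Optional, Sequence, Tuple


def threshold_m(ms: Sequence[float], fidelity: Sequence[float],
                frac: float = 0.9) -> Optional[float]:
    """First M at which f_k reaches frac * f_k^sat, interpolated linearly in log M.

    f_k^sat is approximated by the fidelity at the largest sampled M (an assumption).
    """
    target = frac * fidelity[-1]
    for i, f in enumerate(fidelity):
        if f >= target:
            if i == 0:
                return float(ms[0])
            f0 = fidelity[i - 1]  # f0 < target <= f, so f - f0 > 0
            lo, hi = math.log(ms[i - 1]), math.log(ms[i])
            return math.exp(lo + (target - f0) / (f - f0) * (hi - lo))
    return None


def fit_exponential(ks: Sequence[float], m_th: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log M_th = log A + lam * k; returns (A, lam)."""
    ys = [math.log(m) for m in m_th]
    k_mean = sum(ks) / len(ks)
    y_mean = sum(ys) / len(ys)
    sxx = sum((k - k_mean) ** 2 for k in ks)
    sxy = sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
    lam = sxy / sxx
    return math.exp(y_mean - lam * k_mean), lam


# Synthetic inputs for illustration only (not simulation results).
print(threshold_m([10, 100, 1000], [0.0, 0.9, 1.0]))  # approximately 100
ks = [1, 2, 3, 4, 5]
print(fit_exponential(ks, [2.0 * math.exp(0.5 * k) for k in ks]))  # approximately (2.0, 0.5)
```

Fitting in log space turns the exponential law into a straight line, so ordinary least squares suffices. Once A and λ are known, M_th can be extrapolated to larger k, as done for Fig. 3a.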
Further steps could lead in several directions. First, our strategy may be extended to infer subnetworks of larger networks, as demonstrated in Ref. [31] for gene networks in which each state is fully experimentally accessible. Second, our strategy may be generalized to continuous expression levels based on DBNs that permit continuous state functions [28].
Acknowledgments: Useful discussions with D. Hasenclever and M. Löffler are gratefully acknowledged. This work was partly supported by the Interdisciplinary Center for Clinical Research, University of Leipzig (Project N02) and the grant BIZ-6 1/1 from the Deutsche Forschungsgemeinschaft.

REFERENCES
[1] T. Akutsu, S. Kuhara, O. Maruyama, and S. Miyano. Identification of gene regulatory networks by strategic gene disruptions and gene overexpressions. Proc. 9th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA98), pages 695–702, 1998.
[2] R. Albert and H. G. Othmer. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J. Theoretical Biol., 223:1–18, 2003.
[3] Z. Bar-Joseph. Analyzing time series gene expression data. Bioinformatics, 20(16):2493–2503, 2004.
[4] M. J. Beal, F. Falciani, Z. Ghahramani, C. Rangel, and D. L. Wild. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors. Bioinformatics, 21(3):349–356, 2005.
[5] G. Brady, F. Billia, J. Knox, T. Hoang, I. R. Kirsch, E. B. Voura, R. G. Hawley, R. Cumming, M. Buchwald, and K. Siminovitch. Analysis of gene expression in a complex differentiation hierarchy by global amplification of cDNA from single cells. Current Biology, 5(8):909–922, 1995.
[6] L. J. Burke and A. Baniahmad. Co-repressors 2000. FASEB Journal, 14(13):1876–1888, 2000.
[7] R. C. Conant. Extended dependency analysis of large systems. Part I: Dynamic analysis. Int. J. General Systems, 14:97–123, 1988.
[8] M. A. Cross and T. Enver. The lineage commitment of haemopoietic progenitor cells. Current Opinion in Genetics and Development, 7:609–613, 1997.
[9] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39:1–38, 1977.
[10] P. D'haeseleer, S. Liang, and R. Somogyi. Genetic network inference: From co-expression clustering to reverse engineering. Bioinformatics, 16(8):707–726, 2000.
[11] D. Drasdo, T. Hwa, and M. Lässig. Scaling laws and similarity detection in sequence alignment with gaps. J. Comp. Biol., 7:115–141, 2000.

[12] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.
[13] N. Friedman. Learning belief networks in the presence of missing values and hidden variables. Proc. 14th International Conference on Machine Learning, pages 125–133, 1997.
[14] N. Friedman and G. Elidan. LibB for Windows/Linux 2.1. http://www.cs.huji.ac.il/labs/compbio/LibB/.
[15] N. Friedman, M. Linial, I. Nachman, and D. Pe'er. Using Bayesian networks to analyze expression data. J. Comp. Biol., 7(3):601–620, 2000.
[16] N. Friedman, K. Murphy, and S. Russell. Learning the structure of dynamic probabilistic networks. Proc. 14th Conference on Uncertainty in Artificial Intelligence, pages 139–147, 1998.
[17] G. J. Hannon. RNA interference. Nature, 418:244–251, 2002.
[18] D. Heckerman. A tutorial on learning with Bayesian networks. Technical report MSR-TR-95-06, Microsoft Research, 1995.
[19] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243, 1995.
[20] T. Hoang. The origin of hematopoietic cell type diversity. Oncogene, 23:7188–7198, 2004.
[21] N. N. Iscove, M. Barbara, M. Gu, M. Gibson, C. Modi, and N. Winegarden. Representation is faithfully preserved in global cDNA amplified exponentially from sub-picogram quantities of mRNA. Nature Biotechnology, 20(9):940–943, 2002.
[22] S. A. Kauffman. The origins of order: Self-organization and selection in evolution. Oxford University Press, 1993.
[23] M. Kschischo and M. Lässig. Finite temperature sequence alignment. Pacific Symposium on Biocomputing, 1:624–635, 2000.
[24] S. Liang, S. Fuhrman, and R. Somogyi. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Proc. of the Pacific Symposium on Biocomputing, 3:18–29, 1998.
[25] Z. McIvor, S. Hein, H. Fiegler, T. Schroeder, C. Stocking, U. Just, and M. A. Cross. The transient expression of PU.1 commits multipotent progenitors to a myeloid fate, while continued expression favours macrophage over granulocyte differentiation. Experimental Hematology, 31:39–47, 2003.
[26] M. Merika and D. Thanos. Enhanceosomes. Current Opinion in Genetics and Development, 11(2):205–208, 2001.
[27] G. A. Miller. Note on the bias of information estimates. In Information Theory in Psychology. The Free Press, 1955.
[28] K. Murphy and S. Mian. Modelling gene expression data using dynamic Bayesian networks. Technical report, 1999.
[29] I. Nachman, A. Regev, and N. Friedman. Inferring quantitative models of regulatory networks from expression data. Bioinformatics, 20(Suppl. 1):i248–i256, 2004.
[30] I. M. Ong, J. D. Glasner, and D. Page. Modelling regulatory pathways in E. coli from time series expression profiles. Bioinformatics, 18(Suppl. 1):S241–S248, 2002.
[31] D. Pe'er, A. Regev, G. Elidan, and N. Friedman. Inferring subnetworks from perturbed expression profiles. Bioinformatics, 17(Suppl. 1):S215–S224, 2001.
[32] J. J. Rice, Y. Tu, and G. Stolovitzky. Reconstructing biological networks using conditional correlation analysis. Bioinformatics Advance Access, October 14, 2004.
[33] R. Somogyi and S. Fuhrman. Distributivity, a general information theoretic network measure, or why the whole is more than the sum of its parts. Proc. International Workshop on Information Processing in Cells and Tissues (IPCAT), 1997.
[34] R. Thomas, S. Mehrotra, E. T. Papoutsakis, and V. Hatzimanikatis. A model-based optimization framework for the inference on gene regulatory networks from DNA array data. Bioinformatics, 20(17):3221–3235, 2004.
[35] O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520–525, 2001.
[36] J. Yu, V. A. Smith, P. P. Wang, A. J. Hartemink, and E. D. Jarvis. Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics, 20(18):3594–3603, 2004.
[37] M. Zou and S. D. Conzen. A new Dynamic Bayesian Network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics Advance Access, August 12, 2004.
