Systems Biology
PROTEINS: Structure, Function, and Bioinformatics 60:525–546 (2005)
Phenotype Analysis Using Network Motifs Derived from
Changes in Regulatory Network Dynamics
German Cavelier1 and Dimitris Anastassiou1,2*
1
Genomic Information Systems Laboratory, Department of Electrical Engineering
2
Columbia Genome Center and Center for Computational Biology and Bioinformatics, Columbia University,
New York, New York
ABSTRACT
The intrinsic dynamic response of
a transcriptional regulatory network depends directly on molecular interactions in the cellular transcription, translation, and degradation machineries. These interactions can be incorporated into
dynamic mathematical models of the biochemical
system using the biophysical relationship with the
model parameters. Modifications of such interactions bring changes to the biological behavior of the
cells, and therefore, many normal and pathological
cellular states depend on them. It is important for
analysis, prediction, diagnosis, and treatment of
cellular function to have an experimentally derived
model with parameters that adequately represent
the molecular interactions of interest. Finding the
model and parameters of a transcriptional regulatory network is a difficult task that has been approached at different levels and with different techniques. We develop here a new analysis method
(based on previous work on network inference,
modeling, and parameter identification) that finds
the most changed parameters from yeast oligonucleotide microarray expression patterns in cases where
a phenotype difference exists between two samples.
We then relate and examine the changed parameters with their associated genes, corresponding genetic functional categories, and particular subnetworks and connectivities. The biophysical bases for
these changes are also identified by studying the
relationship of the changed parameters with the
transcription, translation, and degradation mechanisms. The method is improved to cases where there
are two or more transcription factors influencing
transcription, and a statistical analysis is performed to give a measurement of the uniqueness
and robustness of the parameter fit. Proteins 2005;
60:525–546. © 2005 Wiley-Liss, Inc.
Key words: network motif; genotype; phenotype;
transcription factor; transcription;
regulatory networks; evolutionary algorithms; high-density oligonucleotide microarrays
INTRODUCTION
The goal of this work is to identify subcomponents of
yeast regulatory networks that exhibit biophysically based
changes in dynamic response between two strains. This is
©
2005 WILEY-LISS, INC.
achieved by using microarray data to fit the parameters of
a simplified dynamical model of transcription, translation,
and macromolecular degradation. Parameters that exhibit
the greatest alterations between fitted models for the two
strains are identified, and the genes associated with those
parameters are recognized. Finally, the subnetworks in
which those genes are embedded are identified to examine
their connectivity features, their associations with known
functional genetic categories and with the yeast strain
phenotypes. The secondary goal of the work is to attempt
to identify the physical basis for the parameter differences
by examining their relationship to the biophysical mechanisms represented in the model.
Because regulatory networks are usually large nonlinear systems, the task of finding even their topology alone is
computationally daunting. For a full description of dynamics in the network, a model for the underlying biochemical
processes must be first mathematically described. Early
reports of gene circuit analysis1 formulated a phenomenological model and used simulated annealing for identification of its parameters. Another modeling approach by
Chen and Church2 provides a good theoretical treatment
of gene expression with differential equations and a complete analysis with a linearized version. This same approach has been used in additional efforts without linearization.3,4 Some issues of the required mathematical
representation5 have been depicted elsewhere,3,6 –9 as well
as in our previous article10 where we used a model based
on nonlinear ordinary differential equations (ODEs).
Once the mathematical representation is chosen for the
biochemical processes, the network connectivity must be
specified. Important prior work in network inference from
experimental data is found in the literature. One recent
approach11,12 (initially tested with in numero experiments) to reverse engineering the network topology used a
linear differential equation model of transcription with
linear mRNA degradation, and applied singular value
Abbreviations: GFP, green fluorescent protein; RNAP, RNA polymerase.
*Correspondence to: D. Anastassiou, Columbia Genome Center and
Center for Computational Biology and Bioinformatics, Columbia
University, Genomic Info Systems Lab, Department of Electrical
Engineering, New York, NY 10027. E-mail:-Received 17 April 2004; Revised 25 November 2004; Accepted 1
February 2005
Published online 21 June 2005 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/prot.20538
526
G. CAVELIER AND D. ANASTASSIOU
decomposition, robust regression, and transcriptional perturbations. The model included translation only in a
functional, nondetailed way. This method was applied to
experimental data13 in the SOS pathway in Escherichia
coli. Steady-state transcription profiles were measured
from this pathway to get a first-order description of the
regulatory interactions. A more recently developed technique to reconstruct genetic networks in yeast included a
computational analysis of gene modules incorporating
DNA binding measurements14 in addition to functional
information from expression data. Yet another recent in
silico analysis15 shows details of the modeling for the
mathematical equations of the network, as well as results
regarding the identifiability of their parameters. In this
case, a known topology was used together with detailed
biochemical equations for transcription factor dimerization and binding, a nonlinear description of transcription
regulation, and linear representations of translation and
macromolecular degradation.
Even more difficult is to assign experimentally based
numerical values to the parameters of the mathematical
equations describing the network dynamics and connectivity. An approach beginning with a known network topology4 used singular value decomposition and nonlinear
optimization to get the parameters of a nonlinear transcription model of promoter activity (i.e., not including mRNA
degradation, protein translation, or protein degradation).
Another interesting approach16 used a different modeling
based on the nonlinear Michaelis-Menten form for representing regulation of transcription (transcription factor
binding) without cooperativity effects.10 This representation was then used into dynamic Bayesian networks, and
gradient ascent optimization for parameter identification
was applied. The procedure is based on a given dataset of
gene transcription rates, and on knowing a priori the
values of mRNA degradation rates. It is quite powerful in
the sense that it finds the connectivity of the network ab
initio, the activity profiles of regulatory proteins, and the
cinematic parameters of transcription regulation.
Our present algorithm10 requires a priori knowledge of
the network connectivity and uses expression data from
sporulation in yeast17,18 to identify genes whose parameters change the most between two yeast strains,17 and the
motifs19 –24 associated with these genes. The methods
described above can be used to find the a priori necessary
network connectivity, but other methods can also be
employed for this purpose, such as genetic and biochemical
experiments25 complemented with known yeast databases.26 Other studies27 have found the network connectivity using support vector machines (SVM) as a tool to
identify transcription factor targets from gene expression
data. The present algorithm uses a model similar in some
aspects to the ones described above, but in an improved,
more comprehensive way. It includes a nonlinear representation of several processes of mRNA transcription (transcription factor binding, cooperativity, transcription regulation) together with linear mRNA degradation, as well as
linear protein translation and degradation. It also features
a model-independent, global, nonlinear parameter identifi-
cation procedure based on a type of genetic algorithms
called evolution strategies. The approach used here is
different in several respects from the usual clustering
algorithms or other genetic functional analysis procedures14,28,29 because, although it also uses gene expression patterns, it does not rely on similarities in such
expression patterns. It is instead a biophysically based
model of transcription, translation, and degradation that
reproduces each of the measured expression patterns. It
finds the corresponding dynamic parameters and their
changes in cases where a phenotype difference exists
between two samples. The method is able to identify the
biological or phenotypical consequences of such changes,
helping to bridge the gap between genotype and phenotype.9 The genes identified seem to complement those
found with clustering or other genetic functional analysis
algorithms. They overlap with some genes found with
them, but uncover additional ones and their genetic functional categories, as well as their relation to the particular
phenotype. The method also identifies and analyzes some
connectivity issues of the subnetworks or motifs found. As
a secondary goal, the present study examines the relationship of the parameter changes found to the mechanisms of
transcription, translation, and degradation. Details of
important molecular interactions3,7,8,30 may be included
in the mathematical description to understand the relationships of structural molecular details10,31–33 with the model
parameters. The method uses a particular model implementation that aims to get the most possible detail in the
description without making it computational prohibitive.
Examples of parameters in our model are: time constants
for concentration change (related to dilution and degradation rates), strength of protein and mRNA turnover rates,
equilibrium constants of transcription factor binding to
DNA, and binding cooperativity.10,34,35
METHODS
Transcription-Translation Modeling
We desire to model transcription and translation as
time-dependent, dynamic processes having biophysically
defined rates. The best way to describe such processes
under such assumptions is by means of differential equations.2,10,36 We will initially ignore spatial and stochastic
effects. Therefore, we describe the network with a set of
continuous time, deterministic ordinary differential equations, which in their state variable representation are:
ᠬ兴
d关P
ᠬ 兴,
⫽ gᠬ ([m
ᠬ ]) ⫺ Vp 䡠 关P
dt
(1)
d关m
ᠬ兴
ᠬ ]) ⫺ Vm 䡠 关m
ᠬ 兴,
⫽ ᠬf([P
dt
(2)
where J is the total number of gene products (proteins) in
the network; L is the total number of genes in the network;
ᠬ ] is the protein concentrations (nM) J-dimensional vec[P
tor, functions of time t; (m
ᠬ ) is the mRNA concentrations
(nM) L-dimensional vector, functions of time t; gᠬ ([m
ᠬ ]) is
the translation rate functions (nM 䡠 time⫺1) J-dimensional
ᠬ ]) is the transcription rate functions (nM 䡠
vector; fᠬ([P
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
time⫺1) L-dimensional vector; VP is the protein degradation rates (time⫺1) [constant (J ⫻ J) diagonal matrix]; Vm
is the mRNA degradation rates (time⫺1) [constant (L ⫻ L)
diagonal matrix].
The function gᠬ ([m
ᠬ ]) has usually been modeled as a linear
function,37 that is, the rate of protein production by the
process of translation is assumed to be just proportional to
the concentration of mRNA present. The degradation rates
Vm and VP are diagonal matrices because the degradation
of each substance is assumed to be proportional only to its
own concentration, so the off-diagonal elements are just
zero. The proportionality constant in such case is regularly
assumed to be the degradation rate (time 䡠 constant)⫺1.
ᠬ ]) represent transcription rates and
The functions in ᠬf([P
ᠬ ]) is
are nonlinear functions. Defining the functions ᠬf([P
critical because they represent the complex processes of
mRNA transcription initiation and elongation,3,38 including transcription factor binding and assembly of the RNA
polymerase initiation– competent complex. Structural details of the initiation complex31–33,39 of RNA polymerase II
have been found recently. Recent studies also demonstrate
that in the mathematical modeling of this system it may be
necessary to provide detailed descriptions, including molecular biophysical details.1,3,8,10,38,40,41 We try to include
as many biophysical details as possible without making
the model unpractical to solve. Some may be included in a
second round of model refinement. For example, additional
mechanisms involving the ubiquitin–proteasome degradation system can be defined both for the rate of transcription activation and for the rate of degradation.42 To
prevent excess complexity, their contribution to the transcription–translation processes will not be incorporated
into the present version of the model, but can be worked
with in future analysis to improve the modeling.
An intermediate-complexity approach (without cooperativity, structural, stochastic, spatial, or proteasome machinery details) is shown in the study by Zak et al.,15 and
includes rate-law kinetics derived from mass action principles. In this description, biochemical equations are written15 for: (a) binding and unbinding of transcription
factors to promoters to form TF-promoter complexes, (b)
translation modeled as a first-order process with a fixed
rate constant; (c) dimerization and undimerization for the
formation of the active (dimer) transcription factor. Dimer
concentrations are also affected by the promoter binding
and unbinding reactions in which they participate. Assuming that such reactions do not significantly influence the
concentration of transcription factor dimers, these terms
can be removed with only a small effect on the results;15 (d)
overall transcription rate as a linear sum of the transcription rates for the bound and unbound promoter states; (e)
in the case of promoters where more than one transcription factor can bind, additional rate terms15 (taking care of
the combinatorial problem too) are added to (d). We
developed in our previous work10 a still simpler rate-law
kinetics description derived from mass action principles.
Because these reactions are fast enough compared with
the dynamics of transcription and translation, our initial
simplification consisted in replacing for them the steady
527
state. This can be also written directly by a thermodynamic analysis of the combinatorial binding problem.10
This description would be complete, and would encompass
a direct relationship with the biophysical reality, but we
found that the large number of parameters (the on and off
rate constants in these equations, or the thermodynamically derived binding constants or binding free energies)
precludes their practical identification from experimental
data in almost all subnetworks with a meaningful size.
Therefore, to keep the direct relationship between the
model parameters and physical reality, the only practical
possibility is to use a two-step procedure. In a first
approximation, we model the system with a further simplification and abstraction of the rate-law kinetics derived
from mass action principles consisting in using the Hill
function (plus a constant basal rate) to represent the rate
of transcription.36 In addition, we use this simplified
model in a very simple subnetwork, namely one composed
of only two interacting genes. The model could then be
improved in a second step. Such improvement would
include, for example, finding (by a similar parameter
fitting optimization) a function with more details (and
correspondingly with more parameters) that fits the same
nonlinearity that the Hill function of the first step does.
Then, further parameter identification can be performed
with our method, beginning with the new more detailed
function and its larger number of parameters. The fact
that a first step gives initial values for the larger number
of parameters makes this second step feasible because the
optimization begins very near the optimum values. This
improved modeling would have a direct relationship of the
parameters with the biophysical reality, at the cost of
highly increased computational effort. We found in our
previous study10 that for most of our purposes only the
first step is necessary, and therefore chose to model here
our subnetworks only with the Hill function. The parameters found with this model still have a relationship with
the biophysical reality, although it is not one-to-one because they have been condensed in a few ones.
Hill Function
The Michaelis-Menten description of the rate of transcription and, in the case of cooperativity, the Hill function,4 have been the preferred descriptions considered for
biophysically based modeling of the transcription rate. In
previous modeling attempts, a normalized Hill function
was used within a wholly normalized model as a simulation tool.36 In our modeling10 we do not employ normalization of the whole model but instead use, only in the Hill
nonlinear function, explicit division of the nanomolar
concentration variable by its corresponding threshold for
steep change, preserving the dimensionality (in nanomoles) of such concentration variable. We add a basal rate
of transcription ␣0 and multiply the transcription rate
nonlinear function by ␣, a parameter that represents the
maximum strength of the nonbasal transcription rate
without repressor/activator. A complete treatment of the
system with Hill functions includes different degradation
rates for mRNA and proteins (represented here by the
528
G. CAVELIER AND D. ANASTASSIOU
time 䡠 decay constants, reciprocals of the degradation
rates, that is the time constants m and p in minutes), as
well as an explicit translation rate for mRNA (kt in min⫺1):
d关P j 兴
关P j 兴
,
⫽ k t 䡠 关m i 兴 ⫺
dt
P
(3)
d关m i 兴
关m i 兴
,
⫽ ␣ 0 ⫹ ␣ 䡠 f共关P j 兴兲 ⫺
dt
m
(4)
where t is the time (min); [mt] is the mRNA concentration
(nM); [Pj] is the corresponding (translated) protein concentration (nM); ␣0 is the “leakiness” of the promoter [mRNA
concentration per unit time (nM 䡠 min⫺1) produced from a
given promoter type during continuous growth in the
presence of saturating amounts of repressor]; ␣ ⫹ ␣0 is the
mRNA concentration per unit time (nM 䡠 min⫺1) produced
from a given promoter type during continuous growth in
the absence of repressor; kt is the protein translation
coefficient (min⫺1); p is the protein degradation time
constant (min); m is the mRNA degradation time constant
(min).
The Hill nonlinear (repression) function f([Pj]) is therefore in this case:
f共关P j 兴兲 ⫽
1
关P j 兴
1⫹
THR
冉 冊
,
nHILL
(5)
where nHill is the Hill coefficient; THR is the repressor
threshold concentration (nM) for steepest change of the
nonlinear function, that is, the concentration necessary to
half-maximally repress a promoter.
This corresponds to the case where [P] is the concentration of a repressor. If [P] is the concentration of an
activator, then replace f([P]) by [1 ⫺ f([P]). Similar modeling has been applied to the SOS DNA repair system in
Escherichia coli, but without the gene product (protein)
equation [Eq. (3)]. The parameters of such a model were
found with another nonlinear optimization procedure.4 In
our previous study,10 we analyzed different types of nonlinear functions to describe the process of transcription. The
simplest case displays only the transcription factor concentration variable ([m1(t)] ⫽ x1) and the RNAP concentration
([RNAP] ⫽ x2 ⫽ 30 nM). It considers only one binding site
for the transcription factor, and one (nonoverlapping)
binding site for RNAP. Upon constructing the corresponding binding site occupation table as described in ref. 10, it
is found that only terms of order 1 in the variable (x1) are
possible. This is therefore a simple sigmoid whose slope is
very smooth (it is the Michaelis-Menten type of sigmoid4).
The corresponding Hill function will display a Hill coefficient of 1 and will not be considered here because most
fittings to real data that we did10 indeed displayed sharper
slopes, corresponding to Hill coefficients larger than 1. The
second case to be considered is when there are two binding
sites (two operators) for the transcription factors and one
binding site for RNAP, as also analyzed in more detail in
our previous article.10 The simplest appropriate nonlinear
function in this case is still the one in Eqs. (4)–(5) with the
Hill coefficient greater than 1 taking care of the higher
complexity of the system. If even more detail is desired, for
example, to relate the network to molecular and atomic
X-ray structural details,30 a thermodynamically based
function and biophysical parameters can be found with our
described procedure.10
Genome-Wide Modeling
Modeling with a One-Input Gene Pair
A top-down approach seeks to define motifs and modules
that are repeated in the overall intracellular network,
whereas a bottom-up approach begins with the biochemical equations and tries to build and explain from there the
whole network. The model we used was similar to the Hill
model of Eqs. (3)–(5), with only two independent state
variables and one input. The analysis thus defines the
gene-pair motif as an independent entity:9,19 a first gene
whose experimental mRNA concentration in time, namely
[m1(t)], is taken as independent input for translation of a
certain concentration of its gene-product, [P1(t)], using a
form similar to Eq. (3). (This mRNA concentration is in
general a state variable, but because we have its timeseries measurement, we can assume that it is a known,
independent time-series input.) In turn, P1(t) (our first
state variable) is an activator or repressor for transcription of a certain concentration, [m2(t)], of the second gene
mRNA [Eq. (4)]. The output of this gene-pair motif is
precisely [m2(t)] (our second state variable), which is fitted
to the microarray time-series expression pattern for this
gene {[m2(t)] time-series data} by our parameter identification algorithm. This motif of two genes was chosen because
of the availability on a Web site of experimental microarray data (see below) for [m1(t)] and [m2(t)], even if there is
not data available for [P1(t)] in that Web site (no protein
chip measurements were done in that system). The method
developed here can be used in any pair of related genes or
biochemically related proteins, not necessarily transcription factors, as long as there is experimental data available
for the macromolecule concentrations. The ideal would be
to have concentration measurements for all the state
variables in the system equations. In the present case,
there was data for two out of three state variables (macromolecular concentrations) and the resulting parameter
identification was acceptable (the errors were below 20%
in most of the cases, in many cases below 10%). If only one
macromolecule concentration is available, the method can
also be used, but the procedure will be more difficult and
more prone to error. The gene product [P1(t)] can be
posttranslationally modified before binding as a transcription factor. We do not include such modifications explicitly
here, although if desired, this can be done. It should be
noticed that the present method finds the active transcription factor concentration necessary to produce the measured [m2(t)]. Therefore, the effects of posttranslational
modifications are included in the equations and parameters, not in a detailed way, but in a functional and
quantitatively correct way.
Note, in addition, that we suppose in this treatment that
the loading effects on the protein concentrations can be
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
neglected. Such negligible loading effects refer to the fact
that the protein concentrations (active transcription factors
concentrations) will not be appreciably affected by the binding and unbinding reactions in which they participate. This
assumption has been validated in a previous study,15 and
indeed seems reasonable because the protein concentrations
that we found in the optimization are quite large. A similar
assumption can be made about the concentration of mRNA
because of its participation in translation (namely, the concentration [m1(t)] is not modified by participation of this mRNA
in translation). The implications of deviations from such
ideal behavior could be studied by using stochastic approaches43,44 to the modeling in cases when the total number
of macromolecules is deemed to be very small, as discussed in
our previous study.10
Sampling Rate and Parameter Identification
We use our system identification algorithm10 to find the
whole set of parameters and the (usually unknown) initial
condition, P1(t ⫽ 0). The time constants m and p are in
units of minutes; THR and P1(t ⫽ 0) are in nanomoles. The
other units are as before. An important consideration
when obtaining experimental data for a model is the
sampling rate. In our case, because we used available data
obtained by other groups, we did not have the possibility of
selecting the best sampling rate. However, we can discuss
this issue in terms of our model and the system under
consideration. In one-dimensional linear systems, it is
known that the optimal sampling rate is determined by
the highest meaningful frequency, that is, the cutoff
frequency of the system’s frequency response. The sampling rate that theoretically permits to recover the original
signal in such a case must be at least twice the cutoff
frequency. For example, if the system responds only up to
excitations of the order of two cycles per minute [its cutoff
frequency is (2 䡠 min⫺1) two per minute] then the sampling
rate must be at least twice that value, or four samples per
minute. In our case, we are fitting to microarray values of
yeast mRNA concentrations, and it is known that the
half-life in yeast for mRNA degradation is in the approximate range of 40 – 60 min.45 Similarly, the observed oscillations in our data have a periodicity of about 60 –120 min.
This corresponds roughly to a cutoff frequency in the range
of 0.5–1.0 cycles per hour, and thus the sampling frequency should be at least twice such value, or about one to
two samples per hour. The data available that we used was
taken using sampling rates of the order of one sample per
hour. However, the fact that this is a nonlinear system,
plus the nonideal conditions like noise, model approximations, and multidimensionality, would make a higher
sampling rate more desirable. It is always a matter of
compromise between the cost of getting many samples and
the number of samples desired (ideally an infinite number).
The parameter identification of our gene-pair motif is
relatively simple, although it needs human supervision to
obtain meaningful results10 because in many cases it can
take a path in the optimization surface that will lead to
nonconvergence. This problem is most probably due to the
529
complexity of the nonlinear optimization surface. It might
be corrected in the future by automatic means, but until
now it has been only corrected by directly following the
results and visually identifying when the path will not
lead to convergence and stopping the procedure to restart
it on a better path. The procedure is thus assured convergence, and can then be successively repeated with any
desired number of gene pairs without the need for expensive or powerful computers or exponentially increasing
time or other computational resources.
Eukaryotic Transcription Machinery
The eukaryotic transcription machinery46 is much more
complicated than the simple examples above, which were
initially derived from and used in modeling a bacterial
system.37 However, a first attempt to identify the network
subsystems in an eukaryotic organism can be done by
considering that the higher level of complexity that eukaryotes have added to the transcription machinery to
achieve development and diversity46 can be reduced under
simplifying assumptions to the cases studied above. By
considering stages in development to be fixed, one can see
that many of the variables will be fixed. By considering
tissue cellular diversity to be fixed by taking only a specific
type of tissue, many other variables will be fixed. Therefore, the simple models employed above with only one or
two transcription factors, two binding sites for these
transcription factors and one binding site for RNAP may
represent a fixed stage in development in a particular cell
type. This type of modeling is similar to one already
discussed and incorporates some concepts of the top-down
regulatory architecture and modular structure.47,48 We
therefore only consider the binding of one or two transcription factors and RNAP. Because of the architectural
requirements of the transcription machinery, some transcription factors may need to loop the DNA towards the
binding site of RNAP46,49 and bind to the transcription
machinery with some interaction energy. This interaction
energy and the other fixed terms may be included in the
free energy of binding of the transcription factor of variable concentration, as many transcription factors are
known to act by increasing or decreasing the binding of
other transcription factors. Alternatively, it may be included in the cooperativity effects represented by the Hill
coefficient of the nonlinear function, as explained above in
the Methods section. We verified the validity of these
assumptions for eukaryotes in our previous article10 by
applying the procedure to several gene pairs in the yeast
sporulation time-series expression data.
Transcription Factor Oligomerization and
Cooperativity
We did not include the possibility of transcription factor
dimerization in our modeling10 of the nonlinear function.
For more accurate results, this should be included, and
with some mathematical and computational effort it may
be accomplished.37 Dimerization may increase the cooperativity effect of the overall function; thus, in detailed
models such as the thermodynamically derived function
Fig. 1.
its effect should be evaluated.34 For the sake of simplicity,
we will not study further this consideration here. Yet, this
factor will only change the Hill coefficient in the model
chosen for our present study, so it is taken into account,
albeit in a more general manner.
Modeling in the Two-Transcription Factor Binding
Case
The basic gene-pair motif can have more than one
transcriptional input. In such cases, the simple approach
with one nonlinear function of one variable presented
above is not complete. For example, in the case of two
inputs a three-dimensional nonlinear function is obtained,
namely two variables as inputs in the X-Y plane, and the
function value in the Z plane, giving a surface. Such
surface corresponds to a thermodynamically derived function of two variables, corresponding to the binding of two
transcription factors. Using Hill functions [Equation (5)]
Fig. 4.
Fig. 1. Possible combinations of AND/OR, activation and repression.
The surfaces show the normalized combined function of two concentration variables in the nanomolar logarithmic range of ⫺1 to ⫹3.
Fig. 4. (a) Normalized absolute differences with respect to the mean
of each parameter difference, for each of the 96 gene pairs. (b) Selection
of gene pairs with a normalized parameter difference higher than 3
standard deviations.
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
independently for each variable, one has the problem of
how to merge these two effects. We observed that they can
be either activation or repression, and that their combined
effect may be represented as the product (AND-type) or the
sum (OR-type). One has therefore 23 ⫽ 8 possible combinations of these possibilities, as shown in Figure 1. We found
that this approach is supported by experimental observation of this type of transcriptional input function because
with our definition the surface presents up to four different
plateaus (Fig. 1), as found experimentally.50 Each of the
eight possible combinations was used to model a transcription machinery complex and perform refined parameter
identifications in some of the cases where at least two
transcription factors are known to influence transcription.
More detailed and more complex analyses of the combinatorial transcription logic for the nonlinear function have
been performed elsewhere,51 and they show similarity to
our analysis in the AND/OR cases that we chose for our
modeling, although they do not show the four-level surface
found experimentally,50 which our modeling indeed shows.
More work is needed in the elucidation of which representation of the cis-regulatory combinatorial transcription
logic is best.
Parameter Identification
The parameter identification problem in nonlinear, complex systems requires using special purpose tools, because
of both the complexity and the high degree of nonlinearity
of these systems. This problem has been solved in engineering and in biological systems by means of Gradient Descent, Simulated Annealing,1 nonlinear least mean
squares,4 or Genetic/Evolutionary Programming Algorithms. We developed in our previous study10 an Evolution
Strategy algorithm that determines the parameters of a
large variety of transcription-translation mathematical
models, including the one discussed above with a Hill
nonlinear function. We use such tools in all the identification procedures carried out in the present study. We refer
the reader to the details of the procedure given in our
previous work.10 Briefly, the Evolution Strategy begins
with a population of assumed (random) values of the
parameters and generates by mutation and recombination
random variations on them, so that a random set of
solutions to the system can be obtained. Each of these
solutions is compared to the experimental data and a mean
square error metric for the model results with respect to
the measurements is generated. The different solutions
are then ranked in order of best fit to worst fit according to
this error metric, and a prespecified number of these
solutions is taken as the new population to continue in the
next iteration the process with another round of mutation,
recombination, solution, and error evaluation. Because not
only the best solution is taken in each case, there is the
possibility within the other less optimal solutions to
generate an exit from local minima, providing thus a
global optimization method. The solutions thus approach
the experimental data globally, and when a prespecified
minimum error is found, the procedure stops.
531
Identifying Highly Changed Parameters and Their
Biological Consequences
In our description, gene 1 is always the gene for a first
transcription factor, and gene 2 is always a gene that is
activated or repressed by said first transcription factor. We
used the connectivity table published as supplementary
information by Guelzim et al.26 as a primer to initiate our
procedure. The whole set of parameters for a large regulatory subnetwork was obtained by consecutive identification of the first 96 gene pairs in the Guelzim et al.26 (http://
www.nature.com/ng/journal/v31/n1/suppinfo/ng873_S1.html)
table, for each of two yeast strains. The corresponding
ORFs for these 96 gene pairs are shown in Table I. The
data employed for the parameter identifications was obtained by the laboratory of Esposito.17 In a few words,
yeast meiotic time-series expression patterns were obtained from high-density oligonucleotide microarray measurements on the yeast strains SK1 and W03 MATa/alpha
(http://re-esposito.bsd.uchicago.edu/). These two sporulation-proficient yeast strains were cultured in appropriate
media and sporulation was induced. Aliquots of sporulating cells from the W303 culture were taken at 0, 2, 4, 6, 8,
10, and 12 h, and from the SK1 culture at 0, 1, 2, 3, 4, 6, 8,
and 10 h. These time points were chosen to reflect the
progression of meiotic landmarks in each strain. Nonsporulating control samples from control strains were also
measured. Then RNA preparation and DNA probe synthesis, genomic DNA preparation, GeneChip probe array
(Affymetrix) hybridizations, and Expression Data Analysis (with GeneChip 3.0 software) were completed. Processed expression data of ⬃1600 yeast genes were organized in a relational database available in the above-cited
Web site.
The parameters identified included the initial conditions
for the concentrations of the gene product species (protein
Pi) as well as the seven parameters of the model for each
gene-pair motif: ␣, ␣0, m, p, kt, the Hill coefficient nHill,
and the threshold THR [Equations (3)–(5)]. The parameters thus found are first preprocessed by finding for each
gene pair the absolute value of the numerical difference of
each parameter between the two strains. The resulting
absolute values of the differences are then analyzed by
displaying them in a matrix containing the standard
deviations that each estimation departs from the mean
value for that particular parameter in the set. The matrix
of normalized absolute value of change in the parameters
is built. It is equivalent to a matrix of scores (for each
parameter and gene pair) with mean zero and standard
deviation of 1. Only those parameters that have a (normalized) parameter change greater than a chosen threshold
are considered in the final selection (a threshold of three
standard deviations was used). This subset of highly
changed parameters is our final selection, aimed at disclosing the associated network motifs and their interaction
with the rest of the subnetwork. This selection is visually
displayed and analyzed using the software Cytoscape52
(http://www.cytoscape.org/) to further examine it in the
whole yeast gene subnetwork from the database of 491
genes of Guelzim et al.26 Cytoscape provides a useful tool
532
G. CAVELIER AND D. ANASTASSIOU
TABLE I. ORF for the 96 Gene Pairs Identified
-)
Gene 1
Gene 2
YAL021C
YAL021C
YBL005W
YBL005W
YBL005W
YBL005W
YBL005W
YBL005W
YBL005W
YBL005W
YBL008W
YBL008W
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL021C
YBL093C
YBL103C
YBL103C
YBL103C
YBL103C
YBL103C
YBL103C
YBR008C
YBR049C
YBR049C
YBR083W
YCL067C
YDR123C
YGL073W
YGL207W
YLR182W
YLR256W
YHR119W
YMR303C
YDR011W
YDR406W
YGL013C
YGR281W
YJL219W
YOL156W
YOR153W
YOR328W
YDR225W
YML010W
YBL030C
YDL205C
YDL230W
YDR148C
YDR232W
YGL187C
YGR059W
YHR008C
YHR051W
YIL125W
YJL166W
YJR048W
YKL141W
YKL148C
YML091C
YNR001C
YOR065W
YOR375C
YPR191W
YBR020W
YCR005C
YDR176W
YLR304C
YNL037C
YNR001C
YOR136W
YBL005W
YAL038W
YBR019C
YHR084W
YNL216W
YDR207C
YDR207C
YER111C
YGL207W
YBR112C
for graphical network analysis and display, because the
initial Saccharomyces cerevisiae network is present on the
distribution of the software, can be loaded without difficulty and its subsets easily analyzed by using the menus or
entering just one- or two-column text files for their description. The software produces a graphical output of the
subnetworks in different formats, which can also be tailored with specific illustration or annotation needs. The
first neighbors of the selected genes are found with the
same tool. For each parameter member of the highly
variable final selection, an associated network motif is
Gene-)
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR049C
YBR083W
YBR112C
YBR112C
YBR112C
YBR112C
YBR112C
YBR112C
YBR112C
YBR112C
YBR112C
YBR112C
YBR112C
YBR112C
YBR279W
YBR289W
YBR289W
YBR289W
YBR289W
YBR297W
YBR297W
YCL066W
YCL066W
YCL066W
YCL066W
YCL066W
YCL067C
YCL067C
YCL067C
YCL067C
YCL067C
YCL067C
YCL067C
Gene 2
YBR020W
YCR012W
YDL140C
YDL164C
YDR050C
YDR146C
YER086W
YFL039C
YGL026C
YGR254W
YKL182W
YMR186W
YNL216W
YOL004W
YOL006C
YPL231W
YOR077W
YAL063C
YAR050W
YCR005C
YDR172W
YER065C
YHR051W
YHR211W
YIL162W
YJR048W
YJR094C
YLR256W
YML058WA
YPL256C
YAR071W
YBR093C
YIL162W
YPL187W
YBR298C
YBR299W
YGL089C
YIL099W
YKL178C
YLR113W
YPL187W
YDR007W
YDR103W
YFL026W
YGL089C
YIL015W
YKL209C
YPL187W
therefore found by finding its first neighbors in the database of 491 genes of Guelzim et al.26 The network motifs
found for each gene pair with highly variable parameters,
or the interconnected subnetwork of selected genes with
highly variable parameters may be subsequently studied
for dynamic response and other properties. Finally, the
selected highly variable gene set is visually examined with
Cytoscape with respect to the rest of the subnetwork of 491
genes of Guelzim et al.26 The connectivities of the gene set
are analyzed with Cytoscape and some tools from Guelzim
et al.26 Biological characteristics associated with the
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
changed parameters are found using the tools of the
Saccharomyces Genome Database (SGD, http://www.yeastgenome.org/). The genetic functional categories are found
and examined with respect to the corresponding biological
changes and phenotypes with the MIPS53 Comprehensive
Yeast Genome Database (CYGD, http://mips.gsf.de/genre/
proj/yeast/index.jsp). The biophysical relations of the parameter changes with the mechanisms of transcription,
translation, and degradation are explored. A comparison is
then made with results obtained by other groups with
other genetic functional examination methods such as
clustering and motif analysis. When there are more transcription factors in action an improvement will be obtained
by including an expanded input logic in the modeling, as
shown for example in Buchler et al.51 We improved the
algorithm to find in some of these cases the particular
combination of AND/OR, activation and repression, as well
as the corresponding parameters that provide the lowest
fitting error to the experimental data. This improvement
of the method is achieved by including into the same
algorithm additional parameters that optimize the activation/repression and Boolean combinations of several transcription factors as inputs. Finally, a robustness statistical
analysis of the results is performed on some of the gene
pairs by carrying out many more parameter evaluations
and finding their means and corresponding 95% confidence
intervals with Excel.
RESULTS
We obtained by successive identification with our method
in each yeast strain the whole set of parameters for the
genomic regulatory subnetwork composed of the first 96
gene pairs in the Guelzim et al.26 table as described in the
Methods section. Details of these results for six gene pairs
in each yeast strain are shown in Figure 2, and more of
such gene pairs are shown in Figure 3. Note the dynamic
changes in the top and bottom curves of each pair of yeast
strains (strain SK1 is on top, while strain W03 is at the
bottom) and the corresponding changes in the parameters.
Note that activation and repression are in some cases
switched between SK1 and W03. The errors in general
are around or below 20%, as found in our previous
study.10 It should also be noted that the degradation
time constants for mRNA and protein, m and p, do not
follow the rule accepted for prokaryotes,34 namely, that
the protein degradation time constants are much larger
than the mRNA time constants. In our eukaryote system, these two time constants are larger, nearly equal,
or shorter than the other is. There are, however, changes
in these orderings between the two strains for the same
gene pairs.
It was found (see Figs. 2–3) during the optimizations for
parameter identification that some of the paths in the
connectivity are in the opposite direction as that shown in
the published data.26 Some of the activators or repressors
found in the published data were also found to behave in
the opposite way. We believe that the optimization done
here gives a more accurate representation of causality,
activation and repression in these cases, because the time
courses (causality) of processes represented in detail by
533
the transcription and translation events in the time dependent equations cannot be reversed and follow very faithfully either an activation or repression function. Attempts
to change the causality found or to switch the activation or
repression functions in our identification resulted in a very
bad fit to the data (not shown).
Gene-Pair Motifs with Highly Changed Parameters
between the Two Strains
A further analysis is possible after the initial evaluation of
a set of 192 pairs of genes (96 pairs of genes for each yeast
strain) like those shown in Figures 2–3. The gene pairs
whose parameters change the most in absolute value from
strain SK1 to strain W03 can be found as explained in
METHODS. In Figure 4 the seven parameters identified for
each gene pair are shown on one axis as P1, P2, . . ., P7, while
the gene pairs are numbered on the other axis from 1 to 96.
The columns in Figure 4(a) represent the parameter change
in standard deviations from the mean for each parameter.
The selected genes with high parameter change are found as
described in the Methods section and shown in the columns
in Figure 4(b). They are also displayed in green in Figure 5(a)
within the network of 491 yeast genes of Guelzim et al.26 The
subnetwork [Fig. 5(b)] of selected genes with high variability
in their parameters (green) plus their first neighbors (red) is
extracted from the 491 genes of Guelzim et al.26 Several
known network motifs20,34 can be found among the selected
genes plus their first neighbors. One of them is shown in
Figure 5(c)–(d). The 11 genes whose parameters change most
are BAR1, CYC1, FLO5, GAL1, HUG1, MF(ALPHA)1,
PDR15, QCR8, STE2, STE5, and SUC2. Their list and ORFs
are shown in Tables II and III. Also extracted and displayed
for analysis are the connectivity and MIPS53 functional
categories of the selected genes with high variability in their
parameters (in green in Fig. 6) together with their upstream
genes (in red in Fig. 6) from the network of 491 yeast genes of
Guelzim et al.26 It can be seen that there are gene hubs that
control the different MIPS functional categories in an organized way.
The selected gene-pair motifs whose parameters changed
the most among the 96 gene-pairs analyzed in the present
study overlap with a number of known network motifs.9,19 –23 Such motifs are found to correspond in many
cases to already established categories, both in Escherichia coli and in Saccharomyces cerevisiae.9,19 –23 In other
cases, they have additional components, and it would be
interesting, for example, to see how the behavior of the
“canonical” Feed-Forward Loop22,23 (FFL) changes with
these additional components. The FFL is composed of a
transcription factor X, which regulates a second transcription factor Y, such that X and Y jointly regulate gene
Z.22,23 Some generalizations of these motifs have been
studied previously.54
The set of genes with high parameter change (the
selected genes shown in green in Figs. 5– 6) are all
terminal genes in the gene set of Guelzim et al.26 (they will
not influence any other gene in such set). They also have
an above average number of inputs, as can be observed in
Figure 6. The gene set of Guelzim et al.26 has 491 genes
534
G. CAVELIER AND D. ANASTASSIOU
Fig. 2. Model fitting results and parameters for six gene pairs of each yeast strain. Parameters listed at the
right follow the order given in the text. The three last items correspond to protein initial condition, optimization
error in percent, and digit 1 for activation or 0 for repression. Time series data: (Red – – –). Fitted m2 value:
(Red —). Curves are labeled with the corresponding gene ORF.
(from a yeast total of around 6400) and is derived from
confirmed genetic or biochemical experiments. The number of inputs for each of the genes with high parameter
change can be found with Cytoscape. The probability of
each of them having that particular number of inputs can
then be calculated as described.55 As shown in Table II,
most of the highly changed genes display a very low
probability of having these experimentally found large
numbers of inputs. It might be argued that the changes in
parameters found might be due to such large number of
inputs. We verified (shown below in Improving the model
results and Table IV) that increasing the number of
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
Fig. 3.
535
Model fitting results and parameters for 18 gene pairs of each yeast strain. The conventions are as in Figure 2.
inputs in the parameter identification did not in general
change the finding of highly changed parameters where
it had been previously found, although the change may
partially migrate to new parameters. It could also be
argued that the parameter change from SK1 to W03
might be due to the time-series waveform variance in
the inputs. The mean square differences between the
inputs of the two strains are shown in Table II, and they
536
G. CAVELIER AND D. ANASTASSIOU
Fig. 5. Selected gene pairs are shown in green. (a) Network for database of 491 gene pairs. (b) Selected
gene pairs with high difference in parameters and their first neighbors. (c) An example of a network motif with
two selected genes. (d) The same motif in a hierarchical display.
are, in general, around or below 25%. This suggests that
the changes in the input time series are not so large,
that when all of them are taken into account they will
modify the parameters. The number of inputs, the
probabilities of having such number of inputs, and the
mean square difference between the inputs in SK1 and
W03 are all shown in Table II.
Changes in Parameter Values and Their
Phenotypic or Biological Consequences
From the processing that we performed it is apparent
(Table III) that the changes from strain SK1 to strain W03
cause differences in transcription factor binding to DNA,
higher/lower mRNA rates of transcription, or higher/lower
537
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
TABLE II. Connectivity and Inputs of Genes with Highly Changed
Parameters
GENE/ORF
Number of Inputs
Probability
Diff SK1 -W03
-
1.07%
0.68%
24.90%
0.28%
10.00%
1.67%
15.90%
10.00%
4.12%
10.00%
0.11%
15.93%
19.60%
28.88%
23.74%
N/A
17.68%
12.10%
19.13%
14.54%
42.17%
25.27%
BAR1/YIL015W
CYC1/YJR048W
FLO5/YHR211W
GAL1/YBR020W
HUG1/ YML058W-A
MF (ALPHA) 1/YPL187W
PDR15/YDR406W
QCR8/YJL166W
STE2/YFL026W
STE5/YDR103W
SUC2/YIL162W
TABLE III. Summary of parameter changes, biophysical causes, functional categories
Parameter change
Gene-pair selection
MIPS category— changed parameter gene/ORF
SK1-W03 change/
measured
Standard
deviations
Possible biophysical cause
YBR049C 3 YBR020W
1 METABOLISM—GAL1/YBR020W
YBR112C 3 YIL162W
1 METABOLISM—SUC2/YIL162W
YBR112C 3 YJR048W
2 ENERGY—CYC1/YJR048W
YBL021C 3 YJL166W
2 ENERGY—QCR8/YJL166W
m 2–33/45
5.8
nHILL 1–3
m 84–7/112
p 1–97
m 39–87/33
THR-
3.1
YBR112C 3 YML058W-A
10 CELL CYCLE — HUG1/YML058W-A
kt 0–2
␣0 0–4
m 58–97
6.0
8.0
YBL005W 3 YDR406W
20 CELLULAR TRANSPORT—PDR15/YDR406W
YCL067C 3 YIL015W- Pheromone Response—BAR1/YIL015W
YBR289W 3 YPL187W- Pheromone Response—MF (ALPHA) 1/YPL187W
YCL067C 3 YFL026W- Pheromone Response—STE2/YFL026W
YCL067C 3 YDR103W- Pheromone Response—STE5/YDR103W
m 5–78/53
4.4
m 2–29/29
3.5
␣ 4–36
4.5
␣ 3–41
3.0
nHILL 3–39
5.2
YBR112C 3 YHR211W
34.07.01 cell–cell adhesion—FLO5/YHR211W
THR-
kt 0–5
3.6
4.6
activation thresholds; they also cause higher/lower protein
rates of translation, or higher/lower degradation rates for
proteins and mRNAs. All this can change the behavior of
an FFL because such parameters determine the functional
characteristics of the FFL.20,22,23 We summarize in Table
III some of the most significant parameter changes in our
analysis. We also identify in Table III the molecular
biophysical characteristics connected with each parameter
change, the MIPS53 functional categories associated with
the genes with highly changed parameters, and some
measured values in yeast strain Y26256 for the mRNA
degradation time constants m. The MIPS categories,
parameter changes, and other biological characteristics for
the highly changed genes were found as described in the
Methods section. They are listed in Table III, displayed in
Figure 6, and examined below.
Changes in mRNA degradation rates. An interesting
parameter modification appearing in several of the
8.7
4.0
mRNA degradation of
GAL1/YBR020W
Transcription cooperativity
for SUC2/YIL162W
Protein degradation
(YBR112C protein)
DNA binding,
cooperativity in
transcription machinery
for QCR8/YJL166W
Translation strength for
YBR112C protein
Basal transcription of
HUG1/YML058W-A
mRNA degradation of
PDR15/YDR406W
mRNA degradation of
BAR1/YIL015W
Transcription rate of MF
(ALPHA) 1/YPL187W
Transcription rate of
STE2/YFL026W
Cooperativity in
transcription machinery
for STE5/YDR103W
DNA binding,
cooperativity in
transcription machinery
for FL05/YHR211W
Translation strength for
YBR112C protein
Fig. 6.
Connectivity and MIPS functional categories of the highly changed selected genes with high variability in their parameters (in green), and their upstream genes (in red).
538
G. CAVELIER AND D. ANASTASSIOU
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
539
TABLE IV. Combinatorial input and statistical analysis of parameter fitting
Parameter
[SK1]
[W03]
␣1A
␣1R
␣2A
␣2R
␣0
m
P1
P2
kt1
kt2
nHILL1
nHILL2
THR1
THR2
Parameter
[SK1]
[W03]
␣
␣0
m
P
kt
nHILL
THR
Case 1 PDR15/YDR406W
1 run 2 inputs
1 run 1 input
-
-
-
0.66
1.36
3.33
7.93
-
Case 2
10 runs mean, 2 inputs
1 run 1 input
10 runs mean, 2 inputs
3.13 ⫾ 1.21
8.48 ⫾ 5.02
5.95 ⫾- ⫾ 3.86
7.14 ⫾- ⫾ 8.27
6.32 ⫾ 2.03
5.43 ⫾ 3.36
0.89 ⫾ 0.44
1.26 ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾ 5.10
0.45 ⫾ 0.23
1.06 ⫾ 0.85
0.24 ⫾ 0.11
1.20 ⫾ 1.88
1.58 ⫾ 0.65
5.31 ⫾ 3.81
1.89 ⫾ 0.43
4.58 ⫾- ⫾- ⫾- ⫾- ⫾ 113.35
-
57.49 ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾ 9.02
0.15 ⫾ 0.05
1.01 ⫾ 0.68
0.11 ⫾ 0.05
1.06 ⫾ 0.38
0.84 ⫾ 0.44
2.26 ⫾ 0.48
1.14 ⫾ 0.41
1.61 ⫾- ⫾- ⫾- ⫾- ⫾ 316.88
-
0.41
0.38
0.87
0.98
-
Case 3 FLO5/YHR211W
Case 4
1 run 1 input
10 runs mean, 1 input
1 run 1 input
10 runs mean, 1 input
-
13.57 ⫾ 6.62
7.63 ⫾ 1.98
1.99 ⫾- ⫾ 2.25
3.20 ⫾- ⫾ 1.73
5.89 ⫾- ⫾ 3.06
0.38 ⫾ 0.06
4.67 ⫾ 1.30
7.90 ⫾- ⫾- ⫾- ⫾ 886.70
-
28.76 ⫾- ⫾- ⫾ 0.75
1.14 ⫾ 0.84
8.82 ⫾- ⫾- ⫾- ⫾ 7.55
1.11 ⫾ 0.17
0.90 ⫾ 0.42
2.99 ⫾ 2.99
3.16 ⫾- ⫾- ⫾ 188.36
highly changed genes is that in m. It is increasingly
clear that the cell controls the levels of cellular mRNA
transcripts by regulating the rate at which mRNA
decays.57,58 The decay rates of housekeeping genes are
invariable, but the half-lives of many other mRNAs
change markedly in response to environmental clues. A
principal mRNA degradation path is initiated by removal of the 3⬘ poly(A) tail, an event that facilitates the
job of exonucleases. Poly(A) removal triggers rapid
cleavage of the 5⬘ cap by a decapping enzyme (Dcp1). In
addition, several cis-acting sequence elements can regu-
late the rate of turnover of a transcript, such as the A ⫹
U-rich element (ARE), found in the 3⬘ untranslated
regions (3⬘ UTRs). A plethora of ARE-binding proteins
exists, and they can either promote or inhibit mRNA
decay. Several signaling pathways control the ARE
degradation path, and there has even been found a
connection between degradation rates and translation.57 Microarray measurements of yeast mRNA halflives have been performed56 on strain Y262 (they were
the ones used, converted to time constant values in
minutes, as the measured values of m in Table III).
540
G. CAVELIER AND D. ANASTASSIOU
Strong evidence was found that precise control of the
decay of each mRNA is fundamental for gene expression,
as these decay rates were matched to one another with
high specificity and precision, for example, among the
mRNAs that encode protein subunits of stoichiometric
complexes. A similar precise control of mRNA decay
rates was found in transcripts encoding enzymes related
to a particular physiological function.56 It is therefore
clear that mRNA decay pathways make an important
contribution to the global control of gene expression, but
the mechanisms or their relationship to sequence remain obscure in many cases.59 In the cases found with
our calculations, the changes in decay rates may be due
to differences in coding sequences (there are a lot of
sequence disparities between strains SK1 and W0322),
but they can also be due to variations in noncoding
sequences, as many AREs are in such untranslated
regions. As discussed above, a variety of reasons may
modify the decay rate. Cytoplasmic enzymes in yeast
involved in mRNA decay pathways are: PAN2, PAN3,
CCR4, CAF1, DCP1, DCP2, DCPS, and XRN1; mRNA
decaping factors are DCP1, DCP2, VPS16, PAT1/MRT1/
SPB10, LSM, EDC1, EDC2, UPF1, UPF2/NMD2, UPF3,
and HRP1/NAB4;57 poly(A)-specific deadenylating nuclease is PARN, which interacts with PABP, and this
interacts with the translation initiation factor eIF4G,
which in turn forms a ternary complex with the capbinding protein eIF4E;57 ARE binding proteins are TTP,
UuR, KSRP, NF90, and TIA-1.59 The following genes
have been GO term annotated with mRNA catabolism
by the Gene OntologyTM (GO) Consortium: CBP1, CSL4,
DCP1, DCP2, DIS3, KEM1, LSM7, MRT4, MTR3, NAM7,
NMD2, RRP4, RRP40, RRP42, RRP43, RRP45, RRP46,
SKI2, SKI3, SKI6, SKI7, SKI8, and UPF3. Some of these
overlap with the lists of decay pathways above. There is,
however, no overlap of our present highly changed
parameter gene list with the mRNA catabolism GO term
annotated genes; neither is there overlap with the
mRNA decay pathway-related genes discussed above.
01 METABOLISM: GENES GAL1, AND SUC- Sugar, Glucoside, Polyol, and
Carboxylate Anabolism GAL- C-compound and carbohydrate
utilization SUC2
GAL1/YBR020W. Gene-pair YBR049C 3 YBR020W displays a change in m, and is a variation of the canonical
FFL within a cascade of four levels. The motif is rather
complex, and does not fit in any of the known network
motifs. The modification in m from 1.5 to 32.6 min is
probably due to a variation in the mRNA degradation rate
of GAL1. It is accompanied by a modification in p from 3.4
to 7.4 min. The measured m for GAL1 in strain Y26256 is
44.64 min (97.5–2.5% interval 36.00 –51.84 min). Gene
GAL1 is found in our results as repressed in both strains.
The parameter adjustments are in agreement with a more
delayed response for the slow growth strain W03. Gene
GAL1 is a galactokinase, involved in galactose metabolism. It is controlled by gene REB1/YBR049C, a DNA
binding protein involved in regulation of transcription
from Pol II promoter. Other genes regulated by REB1 are
YAL038W, YBR019C, YBR020W, YCR012W, (metabolism), YDL140C, (transcription from pol II promoter),
YDL164C, (transcriptional activator) YDR050C, (metabolism) YDR146C (transcriptional activator), YER086W
(amino acid biosynthesis), YFL039C (actin), YGL026C
(tryptophan biosynthesis), YGR254W (metabolism),
YKL182W (fatty acid biosynthesis), YMR186W (heat shock
protein), YNL216W (transcription factor activity),
YOL004W (DNA binding protein involved in transcriptional regulation, chromatin silencing), YOL006C (topoisomerase), and YPL231W (fatty acid biosynthesis). They
are all involved in general in metabolism, biosynthesis,
and transcription. The average numerical changes found
in our computations in the value of the parameters from
strain SK1 to strain W03 in this family of genes controlled
by REB1 are: ␣ ⫽ 58%, ␣0 ⫽ 279%, m ⫽ 202%, p ⫽ 157%,
kt ⫽ 358%, nHILL ⫽ 22%, and THR ⫽ 51%. Therefore, the
family of genes identified in the present work by an
adjustment in the mRNA decay rate of GAL1 display as a
set a large difference of their parameter values from strain
SK1 to strain W03, in particular, an increase in the time
constants for mRNA and protein degradation. The parameter modifications in these metabolism-related genes are
in agreement with the sluggishness of the W03 phenotype,
a sluggishness that the cell apparently tries to compensate
by high increases in the rates of transcription and translation.
SUC2/YIL162W. Gene-pair YBR112C 3 YIL162W displays a change in nHILL probably due to differences in DNA
binding strength and/or cooperativity in the transcription
machinery for SUC2. Gene SUC2 is an invertase (sucrose
hydrolyzing enzyme) involved in sucrose catabolism. It
gives in our parameter identification the following changes
from SK1 to W03: nHILL from 1.1 to 3.2, ␣ from 32.5 to 93.3,
␣0 from 1.1 to 4.1, and m from 83.8 to 6.6 min. The
measured m for SUC2 in strain Y26256 is 112.32 min
(97.5–2.5% interval 80.64 –217.44 min). Gene SUC2 is
found in our estimates as repressed in both strains. With
the parameter adjustments found, this metabolismrelated gene pair will behave more briskly in W03 than in
SK1. Therefore, the parameter modifications are not in
general in the direction of slowing the response.
02 ENERGY: GENES CYC1 AND QCR8
02.11 Electron Transport and MembraneAssociated Energy Conservation
CYC1/YJR048W. Gene pair YBR112C 3 YJR048W
displays a change in p, probably due to variations in
protein YBR112C degradation. Gene CYC1 is a Cytochrome c, isoform 1; electron carrier of the mitochondrial
intermembrane space that transfers electrons from ubiquinone– cytochrome c oxidoreductase to cytochrome c oxidase during cellular respiration. It is found in our parameter identification to have variations from SK1 to W03 in p
from 1.4 to 96.7 min, THR from 96.2 to 767.0, ␣ from 7.6 to
34.7 and m from 38.9 to 87.4 min. The measured m for
CYC1 in strain Y26256 is 33.12 min (97.5–2.5% interval
27.36 – 41.76 min). Gene CYC1 is found in our calculations
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
as repressed in SK1 but activated in W03. The parameter
modifications found for this gene-pair related to energy
conservation will, in general, induce in W03 a sluggishness
of the response, in agreement with the slow growth
phenotype of strain W03.
QCR8/YJL166W. Gene-pair YBL021C 3 YJL166W displays a change in threshold THR most probably due to a
change in DNA binding strength and/or cooperativity in
the transcription machinery for QCR8. The adjustment in
THR is from 570.16 in SK1 to 7844.69 in W03. Gene QCR8
is an ubiQuinol Cytochrome c (oxido)-reductase subunit 8,
involved in mitochondrial electron transport from ubiquinol to cytochrome c. It is found in our estimates as
repressed in SK1 and activated in W03. These variations
mean a change from repression at low concentrations of
the (transcription factor) gene-product of YBL021C to
activation only at very high concentrations, therefore
contributing to the sluggishness in the response of strain
W03.
10 CELL CYCLE AND DNA PROCESSING:
GENE HUG- Cell Cycle Arrest
HUG1/YML058W-A. Gene pair YBR112C 3 YML058W-A
displays two modifications: kt and ␣0. The variation in kt is
probably due to changes in translation strength for the
YBR112C protein. The variation in ␣0 is probably due to
differences in basal transcription of HUG1. Gene HUG1
gives in our parameter identification changes from SK1 to
W03 as follows: kt from 0.02 to 1.7, THR from 345.1 to
959.9, ␣ from 12.8 to 91.9, ␣0 from 0.05 to 3.8 and m from
58.3 to 96.7 min. No measurement of m for this gene is
found in the decay rate data for strain Y262.56 Gene HUG1
is found in our results as repressed in SK1 but activated in
W03. The parameter adjustments found for this cell cyclerelated gene pair are not all in the direction of increasing;
neither are they all in the direction of decreasing the
sluggishness of the W03 response. In general, it is a
stronger response, but with an increased delay due to the
increase in m.
20 CELLULAR TRANSPORT: GENE PDR- ABC Transporters
PDR15/YDR406W. Gene pair YBL005W 3 YDR406W
displays a variation in m for gene PDR15 from 4.5 to 77.8
min. The measured m for PDR15 in strain Y26256 is 53.28
min (97.5–2.5% interval- min). The higher m
value found in our estimates for W03 means that mRNA
concentration for gene PDR15 can be modified much more
rapidly in SK1 than in W03. This is indeed observed in the
time courses of these concentrations (not shown). It also
means that this change may be influencing the slowness of
W03 growth. It is therefore clear that the sequence disparity between SK1 and W03 in this cellular transport-related
gene has brought a considerable delay in gene expression
by making the decay rate much lower in W03. Thus, in
strain W03 gene PDR15 shows a considerably repressed
and delayed response to gene YBL005W compared to that
in SK1. Our analysis identifies gene PDR15 as implicated
in a change in mRNA turnover, although this gene has not
541
been previously identified in decay analysis. Therefore, its
involvement in a modification in turnover may have been
previously overlooked.
Gene PDR15 corresponds to protein Pdr15p, an ATPbinding Cassette (ABC) protein, a general stress response
factor implicated in cellular detoxification60 (PDR: Pleiotropic Drug Resistance). In a study60 made on yeast strain
W03-1A and some mutants, it was shown that Pdr15
becomes most abundant when cells exit the exponential
growth phase, an endogenous metabolic stress. PDR15 is
strongly induced by various other stress conditions like
heat shock, low pH, weak acids, or high osmolarity.60 Its
induction requires the general stress regulator Msn2p,
which directly decorates the stress response elements
(STREs) in the PDR15 promoter.60 This demonstrates a
crosstalk between PDR and general stress response. Pdr15p
has never been isolated in genetic screens for drug resistance, and while several genomic approaches hint at
PDR15 as a gene modulated by stress conditions, microarray data correlation with its functions had not been
previously found.60 Levels of Pdr15p remained high even
when the mRNA levels decreased.60 In our case, the
decrease in decay rate in strain W03 will help to maintain
the levels of PDR15 mRNA compared to those of SK1.
We have therefore found that the mRNA decay rate of
gene PDR15 is reduced in the slow growth phenotype in
strain W03 with respect to strain SK1. W03 probably
suffers from stress related conditions, while strain SK1
has a rapid turnover for this gene, in agreement with a
rapidly growing strain (can change rapidly upon induction
or repression). These facts raise the interesting possibility
that strain W03 is, at least in part, a slow grower because
of a stress response to its sequence. For example, several
ARE binding proteins are modulated by heat shock,57 and
mRNA turnover has been found very important in regulating gene expression following several conditions of stress.58
PDR15 cannot be found in any of the previously performed
clustering analysis in yeast found in the literature: It is
not found (in the article giving measurement during
sporulation17) among the genomic variations between
strains SK1 and W03, nor in any of the temporal clusters
found by analysis of the sporulation expression patterns.
When the parameters in the accompanying genes controlled by YBL005W (which are in FFL motifs nearby to
PDR15) are analyzed with our method, it is found that
they have also important changes. These genes are
YDR011W, YDR406W, YGL013C, YGR281W, YJL219W,
YOL156W, YOR153W, and YOR328W. They are controlled by a PDR gene and they by themselves are PDR
genes. The average numerical increases in the value of the
parameters from strain SK1 to strain W03 in this family of
8 PDR genes controlled by YBL005W are ␣ ⫽ 26%, ␣0 ⫽
121%, m ⫽ 357%, p ⫽ 318%, kt ⫽895%, nHILL ⫽ 34%, and
THR ⫽ 326%. There is, therefore, a massive average
increase in the parameters of this group of PDR genes.
These alterations, on one hand, are responsible for a delay
in the dynamics, and on the other hand, partially compensate for this sluggishness by increasing the transcription
and translation rates. The slow growth phenotype of strain
542
G. CAVELIER AND D. ANASTASSIOU
W03 may be therefore related to the stress response in this
family of cellular transport-related genes.
34 INTERACTION WITH THE CELLULAR
ENVIRONMENT: GENES BAR1, FLO5,
MF(ALPHA)1, STE2 AND STE- cell– cell adhesion FLO- Pheromone response BAR1,
MF(ALPHA)1, STE2, and STE5
FLO5/YHR211W. Gene-pair YBR112C 3 YHR211W
displays two highly changed parameters: the threshold
THR and the protein translation rate kt. The threshold
THR alteration is probably due to adjustments in DNA
binding strength and/or cooperativity in the transcription
machinery of FLO5. The variation in kt is probably due to a
translation strength modification of the YBR112C protein.
Gene FLO5 is a lectin-like protein involved in flocculation,
and a cell wall protein that binds to mannose chains on the
surface of other cells. It gives in our findings the following
changes from SK1 to W03: THR from 706.22 to 8964, kt
from 0.06 to 5.0, nHILL from 3.1 to 16.8, ␣0 from 0.5 to 11.6,
and m from 16.7 to 11.0 min. The m for FLO5 in strain
Y26256 could not be obtained because of poor data. Gene
FLO5 is found in our results as activated in both strains.
The parameter modifications found for this gene pair will
cause a stronger and somewhat faster response for strain
W03, but only at very high concentrations of gene product
YBR112C. Therefore, they are shown to contribute to the
slow growth phenotype of strain W03.
BAR1/YIL015W. Gene-pair YCL067C 3 YIL015W displays a change in m probably due to alterations in the
BAR1 gene mRNA degradation. Gene BAR1 is an aspartyl
protease secreted into the periplasmic space of mating
type a cells; it cleaves and inactivates alpha factor allowing cells to recover from ␣-factor–induced cell cycle arrest.
The variation in gene BAR1 corresponds to a change from
SK1 to W03 in m from 2.0 to 28.6 min, and in p from 4.0 to
48.7 min. Gene BAR1 is found in our estimates as repressed in both strains. Measurement of m for BAR1 in
strain Y26256 could not be done due to poor data quality,
while its measured deadenylation m is 28.80 min (97.5–
2.5% interval 21.60 – 40.32 min). The parameter adjustments found for this pheromone response gene pair in
general will give a more delayed response in W03, in
agreement with the slow growth W03 phenotype.
MF(ALPHA)1/YPL187W. Gene pair YBR289W 3
YPL187W displays a modification in ␣, probably due to an
adjustment in the transcription rate of MF(ALPHA)1,
which in turn, is perhaps due to changes in DNA binding
strength. Gene MF(ALPHA)1 codes for the mating ␣-factor
pheromone. It gives in the present parameter evaluation a
difference from SK1 to W03 in m from 3.5 to 36.2 min, and
p from 45.3 to 92.9 min. The measured m for MF(ALPHA)1 in strain Y26256 is 11.52 min (97.5–2.5% interval
10.08 –14.40 min). Gene MF(ALPHA)1 is found in our
evaluation as repressed in both strains. The parameter
modifications found here for this pheromone response gene
will in general give a more delayed response for W03.
STE2/YFL026W. Gene pair YCL067C 3 YFL026W displays a modification in ␣. Gene STE2 is an ␣-factor
pheromone receptor; seven-transmembrane domain protein (controlled by HMLALPHA2/YCL067C). It displays in
our parameter identification a change in ␣ from 3.2 in SK1
to 41.0 in W03 probably due to differences in its transcription rate, possibly by alteration of the DNA binding
strength. It also displays a variation in p from 4.5 to 22.0
min. It is found in our results as activated in SK1, but
repressed in W03. It displays as well a change in m from
15.7 to 23.0 min. Measurement of m for STE2 in strain
Y26256 was 64.80 min (97.5–2.5% interval-
min). According to the results of our parameter identification, this pheromone response gene will have a strongly
repressed and more delayed response in W03 in agreement
with the slow growth W03 phenotype.
STE5/YDR103W. Gene pair YCL067C 3 YDR103W
displays a change in nHILL, probably due to an adjustment
in cooperativity in the transcription machinery for STE5.
Gene STE5 (controlled by HMLALPHA2/YCL067C) is a
protein of the pheromone pathway with MAP-kinase scaffold activity. It is a scaffold protein that, in response to
pheromone, shuttles from the nucleus to the plasma
membrane and assembles kinases Ste11p, Ste7p, and
Fus3p into a specific signaling complex. Gene STE5 gives
in the present parameter estimates the following modifications from SK1 to W03: nHILL from 3.1 to 38.6, p from 2.8
to 23.5 min, and m from 27.7 to 7.8 min. The measured m
for STE5 in strain Y26256 is 20.16 min (97.5–2.5% interval
17.28 –24.48 min). Gene STE5 is found in our results as
repressed in SK1 but activated in W03. The parameter
variations found by our identification therefore show a
pheromone response kinase scaffold gene pair activated in
W03, which will be slower in the protein subsystem
(YCL067C gene product) due to the lower protein decay
rate, but more rapid in the mRNA subsystem (gene
HMLALPHA2/YCL067C) due to the higher mRNA decay
rate and the higher nHILL.
We therefore have found in addition to BAR1 three other
genes that belong to the ␣-factor pheromone pathway.
They are controlled by HMLALPHA2/YCL067C, and are
found in our study with high parameter changes: STE2,
STE5, and MF(ALPHA)1. They undergo on average the
following increases in their parameters from SK1 to W03:
␣ 377%, ␣0 ⫽ 119%, m ⫽ 398%, p ⫽ 588%, nHILL ⫽ 235%.
They show in our evaluations in general an agreement
with the slow growth phenotype of strain W03. This
ensemble is downstream of gene REB1, which also controls
GAL1 of the Metabolic (galactose) pathway (see Fig. 6).
In summary, we identify with our procedure five functional groups with important adjustments in their parameter values, which we can relate to as cooperating in
general to the phenotypical behaviors of the corresponding
yeast strains (changes in parameter values from the
faster, more synchronously sporulating strain SK1 to the
slow growth strain W03). The modifications found by our
identification correspond mainly to increases in the time
constants values and some increases in their transcription
and translation rates.
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
Results of Present Method Compared to Other
Functional Analysis Methods
The foregoing analysis of changes in parameters leading
to the identification of functional groups of genes is not
based on clustering analysis. Nonetheless, a comparison
can be made with results from clustering or other functional analysis methods as applied to this and similar
systems. Genes PDR15, GAL1, SUC2, BAR1, MF(ALPHA)1, STE2, STE5, FLO5, and HUG1 found in this study
are not in any of the clusters found by Primig et al.,17
clusters that were obtained by analysis of the same data.
Therefore, the important influence of their parameter
variations in the dynamics of functional responses would
not be found by application of traditional clustering methods. Only one of the genes found in the present study, gene
CYC1, is found in one of the clusters of Primig et al.17
(cluster 1, which contains stress-response genes and metabolic genes required for nitrogen uptake, amino acid
biosynthesis, and mitochondrial function17).
None of the genes found with our method is found in the
classes “Sporulation-Deficient Class,” “Enhanced Sporulation Efficiency Class,” and “Postgermination Growth Defective” in the Sporulation Phenotype and Clustering study
by Deutschbauer et al.61 Another study25 used genomewide location analysis experiments to find network motifs
in yeast and their functional assembling across the genome. From the genes selected in the present study, three
were also found in these motifs:25 genes BAR1 and STE2
were found in the Multiple Input Module (MIM) No. 93,
with regulators MCM1 and STE12, and eight potentially
regulated genes: BAR1, YDR179C, YDR179W-A, MFA2,
STE2, YFL027C, AGA2, and SIM1. Gene GAL1 was found
in the Single Input Motifs (SIM) regulated by Gal4, with
genes YLR408C, GAL7, FUR4, GAL80, GAL10, GAL1,
ORC2, YDR008C, and GAL3.
A more sophisticated, recently published method employs not only functional information from expression
data, but also links genes to the factors that regulate them
by incorporating DNA binding data.14 This method enriches previous results, giving a large number of functional
modules. Indeed, this method found gene STE2 (a pheromone response gene located also in our study) in cluster
73. The set of factors that regulate this module are MCM1
and STE12. Gene STE2 in this module14 is described as a
response to pheromone during conjugation with cellular
fusion, mating-type ␣-factor pheromone receptor activity,
integral to plasma membrane. The MIPS sub-categories
cited are- pheromone response, mating-type
determination, sex-specific proteins; 40.27 extracellular/
secretion proteins; and 14.04 cell differentiation. The
method14 also finds gene GAL1 (a metabolism gene uncovered as well in our study) in cluster 106 with regulator
GAL4. Gene GAL1 in this module is described as galactose
metabolism galactokinase activity. The MIPS subcategories cited are 01.05 C-compound and carbohydrate metabolism; 01 Metabolism; and 01.05.01 C-compound and carbohydrate utilization. In the rapamycin regulatory network
modules14 other genes appeared. Gene STE5 (a pheromone response gene found in our study) regulated by
543
RTG1 and MSN2, was found in cluster 19. It is described
as signal transduction during conjugation with cellular
fusion, MAP-kinase scaffold activity. The MIPS subcategories cited14 are:- pheromone response; 13.11
cellular sensing and response; and 13.11.03 chemoperception and response.
Some of the genes selected by the present study have
been therefore found by other microarray functional data
analysis methods (cluster or motif analysis) mainly when
they include in addition to clustering some other sophisticated procedures. By themselves, clustering techniques
only find one gene in common with the present study. In all
the other studies some of the MIPS categories overlap with
functional categories discovered in the present one, and
some genes are detected in common with our list of highly
changed genes. However, no dynamic parameters were
found, and the relationships of the parameter modifications with the biophysics of transcription, translation, and
degradation processes, with phenotype and with genetic
functional categories were not analyzed as it was done
here.
Results of an Improved Model with Two-Input
Motifs and Fitting Statistics
We chose to refine the model including a module with
two inputs in some of the cases where it is known that
more than one transcription factor affects transcription, as
described above in the Methods section. However, as
discussed also in the Methods section, there is still an
ambiguity due to the low sampling rate used to obtain the
data, so that these fittings may have undetected errors or
noise-induced variations. This uncertainty may increase
because of the higher number of parameters when two or
more inputs are considered while the number of samples
remains the same. As shown below and in Table IV,
running the parameter identification routine several times
and performing a statistical analysis will give confidence
intervals and will show how good the parameter fit is.
The large number of inputs found in our set of highly
changed genes has, in general, a very low probability of
occurrence, as shown in Table II. It might be that the
parameter changes in the models for the selected genes are
due to such high number of inputs, inputs that were not
taken into account in the initial evaluation with only one
input. To analyze this possibility, we performed 10 recalculations of the parameters with two inputs in cases in which
the initial calculation was performed only with one input
(Table IV, CASE 1 and CASE 2). The error obtained in
these cases was about the same with one input as with two
inputs, but the repeated calculations with two inputs
moderate the noise effects and give an indication of the
robustness of the parameter fit with the 95% confidence
intervals obtained for the mean parameter values. It was
established as discussed below and shown in Table IV that
the gene pairs with highly changed parameters evaluated
with one input maintain in general this high change in the
parameter values when estimated with two inputs. They
give, however, parameter changes of lesser magnitude,
544
G. CAVELIER AND D. ANASTASSIOU
and in some cases, the change has migrated to one or
several different/new parameters.
A final issue examined was the validity of the model and
the fitting approach, that is, the statistical robustness of
the fit, the confidence intervals for the parameters, and the
uniqueness of the fit. To examine these issues, we analyzed 10 recalculations of the parameters in several additional cases, both with one input only and with two inputs
(Table IV). Because of the inherently random nature of the
evolutionary algorithm procedure, these recalculations
were necessarily run with different initial conditions and
through different pathways in the optimization to the
minimum, yielding slightly different values for the parameters. The results show that a single run of the optimization may give too large a variation, while an average of
several runs will alleviate the noise effects and give an
indication of the robustness of the fit by providing the 95%
confidence intervals for the mean value of the parameters,
as shown in that table.
For example, in the first case analyzed in Table IV
(CASE 1) the gene pair was YBL005W 3 YDR406W
corresponding to the ABC transporter PDR15 examined
before (MIPS category 20 CELLULAR TRANSPORT).
This gene pair had a change in m for gene PDR15 of
1631.70% (from 4.49 to 77.82 min) using only one input,
and the average variation in all the parameters was
304.30% with a standard deviation of 586.60. When a
second input was taken into consideration (YGL013C 3
YBL005W 3 YDR406W, one run, two inputs) the mean
square error was approximately the same (17% for SK1
and 7% for W03). The modification in m was found to be
only 112.09% (from 29.19 to 61.91 min) but the mean
adjustment of all the parameters was 9722.70% with a
standard deviation of-. This clearly indicates an
overfitting of the parameters. When the parameter identification was run 10 times, an adjustment in m from SK1 to
W03 of 111.03% was found (from an average of 14.28 ⫾
6.41 to an average of 30.13 ⫾ 16.19 min, with a 95%
confidence interval for the means) and an average change
in all the parameters of 117.52% with a standard deviation
of 98.14. This shows that the trend in the parameter
modifications is conserved in this gene pair from the case
of evaluation with only one input to the case with two
inputs. It also shows that running the optimization several
times is better, as it moderates the noise and provides a
confidence interval for the mean of the parameter values.
In the second case analyzed in Table IV (CASE 2) the
gene pair was YBL021C 3 YNR001C. This gene pair did
not have highly changed parameters when the assessment
was done with only the input YBL021C. However, it gave
somewhat modified parameters when run 10 times with
the two inputs YBL021C and YBL103C. For example m
changed 494.88% (from an average of 3.40 ⫾ 2.25 min to an
average of 20.21 ⫾ 5.36 min, with a 95% confidence
interval for the means). An average change in all the
parameters of 182.73% with a standard deviation of 257.76
was found. The error in each of the cases is approximately
the same. Therefore, in the case of several parameter
optimizations, the mean and its confidence interval will be
a better assessment of the change in parameter values.
In the third case analyzed in Table IV (CASE 3), the
gene pair was YBR112C 3 YHR211W, which has only one
input. One modification discovered with the 10 evaluations
is in kt for the gene product of YBR112C (from an average
of 0.38 ⫾ 0.06 to an average of 4.67 ⫾ 1.30, with a 95%
confidence interval for the means). A second variation is in
THR for gene FLO5 (from an average of 803.06 ⫾ 152.72 to
an average of 8834.98 ⫾ 886.70, with a 95% confidence
interval for the means). This gene pair therefore also
maintains the trend in the parameter changes found with
only one run of the parameter identification.
The fourth case analyzed in Table IV (CASE 4) corresponds to the gene pair YBR083W 3 YOR077W, which did
not have a high change in its parameters initially with
only one input (it does not have more inputs) and one run
of the optimization. When the optimization was run 10
times with the sole input, a modification in m was found
(from an average of 8.82 ⫾ 1.39 min to an average of
44.23 ⫾ 19.36 min, with a 95% confidence interval for the
means). This shows that the optimization run 10 times
provides a better assessment of the parameter changes.
These evaluations prove that the general trends are
likely to be conserved from the case of one run to the case of
several runs of the parameter identification. However, to
rely only on the error metric of one run of the parameter
identification is not enough to guarantee a good fit, as
overfitting can give an equally low error but with an
unacceptable large variance of the parameters. This uncertainty can be reduced with more parameter evaluations
and the resulting robustness measured with the fitting
statistics proposed. A higher sampling rate would also
improve the fit, but at a resulting higher experimental
cost.
CONCLUSIONS
By repeating the parameter identification procedure
taking one gene pair at a time, covering subnetworks
across the genome of an organism, one can construct a
database of parameters and a corresponding model of
increasing complexity from the accumulated results. A set
of network motifs associated with the highly changed
parameters can be found as described, and their properties
analyzed with respect to phenotype-related biological
changes, network topology and connectivity, and associated genetic functional categories. The biophysical and
structural causes of the parameter modifications can be
studied, and their relationship with the transcription,
translation, and degradation processes uncovered. The
results obtained in the yeast subnetworks studied reveal
(compared to traditional methods such as clustering or
more sophisticated genetic functional analysis methods) a
number of additional genes and the properties of their
associated functional categories and network connectivities.
The method can be improved by including more inputs
when there is more than one transcription factor influencing transcription. There are in these cases not only the
PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS
usual system and measurement noises, but also parameter
uncertainty effects due to the limited amount of time
samples that is usually possible to obtain with microarray
measurements (underdetermined system). More time
samples or a statistical analysis performed over repeated
evaluations with the method can help to alleviate such
noise consequences and the parameter uncertainty effects.
The underlying processes may be further studied in the
future at the molecular and atomic scale using X-ray
structural details and the thermodynamical description of
the model, as explained above. For example, degradation
of mRNA and proteins, transcription factor binding to
DNA, binding cooperativity, mRNA transcription, and
translation rate strengths can be related to X-ray structural details of mutations or other sequence variations,
posttranslational modifications, macromolecular covalent
modifications, or protein misfolding. Similar analyses can
be imagined in cases where high-density oligonucleotide
microarray expression patterns over different tissues,
phenotypes, and organisms become available.
14.
15.
-.
ACKNOWLEDGMENTS
22.
The authors thank one of the anonymous reviewers for
very helpful comments that improved the manuscript, the
laboratory of U. Alon, in particular R. Milo, and the
laboratory of G. Church for very useful observations. They
also wish to thank the Cytoscape (http://www.cytoscape.
org/) development team, in particular, E. Cerami, for their
valuable comments and the software.
23.
24.
25.
REFERENCES
1. Reinitz J, Sharp DH. Mechanism of eve stripe formation. Mech
Dev 1995;49:-. Chen T, He HL, Church GM. Modeling gene expression with
differential equations. In: Altman RB, Dunker AK, Hunter L,
Klein TE, editors. BIOCOMPUTING ‘99 —Proceedings of the
Pacific Symposium 1999, Maui, Hawaii, USA. River Edge, NJ:
World Scientific Publishing Co., Inc.; 1999. p 29 – 40.
3. Gilman A, Arkin AP. Genetic “code”: representations and dynamical models of genetic components and networks. Annu Rev Genomics Hum Genet 2002;3:-. Ronen M, Rosenberg R, Shraiman BI, Alon U. Assigning numbers
to the arrows: parameterizing a gene regulation network by using
accurate expression kinetics. Proc Natl Acad Sci USA 2002;99:-. Hasty J, McMillen D, Collins JJ. Engineered gene circuits. Nature
2002;420:224 –230.
6. Kitano H. Computational systems biology. Nature 2002;420:206 –
210.
7. Ackers GK, Johnson AD, Shea MA. Quantitative model for gene
regulation by lambda phage repressor. Proc Natl Acad Sci USA
1982;79:1129 –1133.
8. Shea MA, Ackers GK. The OR control system of bacteriophage
lambda. A physical-chemical model for gene regulation. J Mol Biol
1985;181:-. Alm E, Arkin AP. Biological networks. Curr Opin Struct Biol
2003;13:-. Cavelier G, Anastassiou D. Data-based model and parameter
evaluation in dynamic transcriptional regulatory networks. Proteins Struct Funct Bioinformat 2004;55:339 –350.
11. Yeung MK, Tegner J, Collins JJ. Reverse engineering gene
networks using singular value decomposition and robust regression. Proc Natl Acad Sci USA 2002;99:-. Tegner J, Yeung MK, Hasty J, Collins JJ. Reverse engineering
gene networks: integrating genetic perturbations with dynamical
modeling. Proc Natl Acad Sci USA 2003;100:5944 –5949.
13. Gardner TS, di Bernardo D, Lorenz D, Collins JJ. Inferring
26.
27.
-.
545
genetic networks and identifying compound mode of action via
expression profiling. Science 2003;301:102–105.
Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F,
Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK.
Computational discovery of gene modules and regulatory networks. Nat Biotechnol 2003;21:-.
Zak DE, Gonye GE, Schwaber JS, Doyle FJ 3rd. Importance of
input perturbations and stochastic gene expression in the reverse
engineering of genetic regulatory networks: insights from an
identifiability analysis of an in silico network. Genome Res
2003;13:2396 –2405.
Nachman I, Regev A, Friedman N. Inferring quantitative models
of regulatory networks from expression data. Bioinformatics
2004;20(Suppl 1):I248 –I256.
Primig M, Williams RM, Winzeler EA, Tevzadze GG, Conway AR,
Hwang SY, Davis RW, Esposito RE. The core meiotic transcriptome in budding yeasts. Nat Genet 2000;26:415– 423.
Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO,
Herskowitz I. The transcriptional program of sporulation in
budding yeast. Science 1998;282:699 –705.
Wolf DM, Arkin AP. Motifs, modules and games in bacteria. Curr
Opin Microbiol 2003;6:125–134.
Shen-Orr SS, Milo R, Mangan S, Alon U. Network motifs in the
transcriptional regulation network of Escherichia coli. Nat Genet
2002;31:64 – 68.
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U.
Network motifs: simple building blocks of complex networks.
Science 2002;298:824 – 827.
Mangan S, Alon U. Structure and function of the feed-forward loop
network motif. Proc Natl Acad Sci USA 2003;100:11980 –11985.
Mangan S, Zaslaver A, Alon U. The coherent feedforward loop
serves as a sign-sensitive delay element in transcription networks. J Mol Biol 2003;334:197–204.
Rosenfeld N, Alon U. Response delays and the structure of
transcription networks. J Mol Biol 2003;329:645– 654.
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK,
Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J,
Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne
JB, Volkert TL, Fraenkel E, Gifford DK, Young RA. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science
2002;298:799 – 804.
Guelzim N, Bottani S, Bourgine P, Kepes F. Topological and
causal structure of the yeast transcriptional regulatory network.
Nat Genet 2002;31:60 – 63.
Qian J, Lin J, Luscombe NM, Yu H, Gerstein M. Prediction of
regulatory networks: genome-wide identification of transcription
factor targets from gene expression data. Bioinformatics 2003;19:-.
Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks.
Bioinformatics 2002;18(Suppl 1):S233–S240.
Bussemaker HJ, Li H, Siggia ED. Regulatory element detection
using correlation with expression. Nat Genet 2001;27:167–171.
Oobatake M, Kono H, Wang Y, Sarai A. Anatomy of specific
interactions between lambda repressor and operator DNA. Proteins 2003;53:33– 43.
Asturias FJ, Craighead JL. RNA polymerase II at initiation. Proc
Natl Acad Sci USA 2003;100:-.
Armache KJ, Kettenberger H, Cramer P. Architecture of initiationcompetent 12-subunit RNA polymerase II. Proc Natl Acad Sci
USA 2003;100:6964 – 6968.
Bushnell DA, Kornberg RD. Complete, 12-subunit RNA polymerase II at 4.1-A resolution: implications for the initiation of
transcription. Proc Natl Acad Sci USA 2003;100:6969 – 6973.
Rosenfeld N, Elowitz MB, Alon U. Negative autoregulation speeds
the response times of transcription networks. J Mol Biol 2002;323:
785–793.
Misteli T. The flow of gene expression. Nat Struct Mol Biol
2004;11:202–205.
Elowitz MB, Leibler S. A synthetic oscillatory network of transcriptional regulators. Nature 2000;403:335–338.
Elowitz ME. Transport, assembly, and dynamics in systems of
interacting proteins. Princeton, NJ: Princeton University; 1999.
von Hippel PH. An integrated model of the transcription complex
in elongation, termination, and editing. Science 1998;281:660 –
665.
Bushnell DA, Westover KD, Davis RE, Kornberg RD. Structural
546
40.
41.
-.
G. CAVELIER AND D. ANASTASSIOU
basis of transcription: an RNA polymerase II-TFIIB cocrystal at
4.5 Angstroms. Science 2004;303:983–988.
Dundr M, Hoffmann-Rohrer U, Hu QY, Grummt I, Rothblum LI,
Phair RD, Misteli T. A kinetic framework for a mammalian RNA
polymerase in vivo. Science 2002;298:-.
Algire MA, Maag D, Savio P, Acker MG, Tarun SZ, Jr., Sachs AB,
Asano K, Nielsen KH, Olsen DS, Phan L, Hinnebusch AG, Lorsch
JR. Development and characterization of a reconstituted yeast
translation initiation system. RNA 2002;8:382–397.
Lipford JR, Deshaies RJ. Diverse roles for ubiquitin-dependent
proteolysis in transcriptional activation. Nat Cell Biol 2003;5:845–
850.
Kiehl TR, Mattheyses RM, Simmons MK. Hybrid simulation of
cellular behavior. Bioinformatics 2004;20:316 –322.
Morton-Firth CJ, Bray D. Predicting temporal fluctuations in an
intracellular signalling pathway. J Theor Biol 1998;192:117–128.
Das B, Butler JS, Sherman F. Degradation of normal mRNA in the
nucleus of Saccharomyces cerevisiae. Mol Cell Biol 2003;23:-.
Levine M, Tjian R. Transcription regulation and animal diversity.
Nature 2003;424:147–151.
McAdams HH, Shapiro L. A bacterial cell-cycle regulatory network operating in time and space. Science 2003;301:1874 –1877.
Bray D. Molecular networks: The top-down view. Science 2003;301:
1864 –1865.
Vilar JM, Leibler S. DNA looping and physical constraints on
transcription regulation. J Mol Biol 2003;331:981–989.
Setty Y, Mayo AE, Surette MG, Alon U. Detailed map of a
cis-regulatory input function. Proc Natl Acad Sci USA 2003;100:-.
Buchler NE, Gerland U, Hwa T. On schemes of combinatorial
transcription logic. Proc Natl Acad Sci USA 2003;100:5136 –5141.
52. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D,
Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks.
Genome Res 2003;13:2498 –2504.
53. Mewes HW, Frishman D, Güldener U, Mannhaupt G, Mayer K,
Mokrejs M, Morgenstern B, Münsterkoetter M, Rudd S, Weil B.
MIPS: a database for genomes and protein sequences. Nucleic
Acids Res 2002;30:31–34.
54. Kashtan N, Itzkovitz S, Milo R, Alon U. Topological generalizations of network motifs, arXiv:q-bio.MN/- v3 16 May 2004.
55. Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N.
Revealing modular organization in the yeast transcriptional network. Nat Genet 2002;31:370 –377.
56. Wang YL, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, Brown
PO. Precision and functional specificity in mRNA decay. Proc Natl
Acad Sci USA 2002;99:5860 –5865.
57. Wilusz CJ, Wormington M, Peltz SW. The cap-to-tail guide to
mRNA turnover. Nat Rev Mol Cell Biol 2001;2:-. Fan JS, Yang XL, Wang WG, Wood WH, Becker KG, Gorospe M.
Global analysis of stress-regulated mRNA turnover by using
cDNA arrays. Proc Natl Acad Sci USA 2002;99:-. Wilusz CJ, Wilusz J. Bringing the role of mRNA decay in the
control of gene expression into focus. Trends Genet 2004;20:-. Wolfger H, Mamnun YM, Kuchler K. The Yeast Pdr15p ATPbinding cassette (ABC) protein is a general stress response factor
implicated in cellular detoxification. J Biol Chem 2004;279:-. Deutschbauer AM, Williams RM, Chu AM, Davis RW. Parallel
phenotypic analysis of sporulation and postgermination growth in
Saccharomyces cerevisiae. Proc Natl Acad Sci USA 2002;99:15530 –
15535.