German Cavelier | Freelancer Systems Biology

Systems Biology

PROTEINS: Structure, Function, and Bioinformatics 60:525–546 (2005) Phenotype Analysis Using Network Motifs Derived from Changes in Regulatory Network Dynamics German Cavelier1 and Dimitris Anastassiou1,2* 1 Genomic Information Systems Laboratory, Department of Electrical Engineering 2 Columbia Genome Center and Center for Computational Biology and Bioinformatics, Columbia University, New York, New York ABSTRACT The intrinsic dynamic response of a transcriptional regulatory network depends directly on molecular interactions in the cellular transcription, translation, and degradation machineries. These interactions can be incorporated into dynamic mathematical models of the biochemical system using the biophysical relationship with the model parameters. Modiﬁcations of such interactions bring changes to the biological behavior of the cells, and therefore, many normal and pathological cellular states depend on them. It is important for analysis, prediction, diagnosis, and treatment of cellular function to have an experimentally derived model with parameters that adequately represent the molecular interactions of interest. Finding the model and parameters of a transcriptional regulatory network is a difﬁcult task that has been approached at different levels and with different techniques. We develop here a new analysis method (based on previous work on network inference, modeling, and parameter identiﬁcation) that ﬁnds the most changed parameters from yeast oligonucleotide microarray expression patterns in cases where a phenotype difference exists between two samples. We then relate and examine the changed parameters with their associated genes, corresponding genetic functional categories, and particular subnetworks and connectivities. The biophysical bases for these changes are also identiﬁed by studying the relationship of the changed parameters with the transcription, translation, and degradation mechanisms. The method is improved to cases where there are two or more transcription factors inﬂuencing transcription, and a statistical analysis is performed to give a measurement of the uniqueness and robustness of the parameter ﬁt. Proteins 2005; 60:525–546. © 2005 Wiley-Liss, Inc. Key words: network motif; genotype; phenotype; transcription factor; transcription; regulatory networks; evolutionary algorithms; high-density oligonucleotide microarrays INTRODUCTION The goal of this work is to identify subcomponents of yeast regulatory networks that exhibit biophysically based changes in dynamic response between two strains. This is © 2005 WILEY-LISS, INC. achieved by using microarray data to ﬁt the parameters of a simpliﬁed dynamical model of transcription, translation, and macromolecular degradation. Parameters that exhibit the greatest alterations between ﬁtted models for the two strains are identiﬁed, and the genes associated with those parameters are recognized. Finally, the subnetworks in which those genes are embedded are identiﬁed to examine their connectivity features, their associations with known functional genetic categories and with the yeast strain phenotypes. The secondary goal of the work is to attempt to identify the physical basis for the parameter differences by examining their relationship to the biophysical mechanisms represented in the model. Because regulatory networks are usually large nonlinear systems, the task of ﬁnding even their topology alone is computationally daunting. For a full description of dynamics in the network, a model for the underlying biochemical processes must be ﬁrst mathematically described. Early reports of gene circuit analysis1 formulated a phenomenological model and used simulated annealing for identiﬁcation of its parameters. Another modeling approach by Chen and Church2 provides a good theoretical treatment of gene expression with differential equations and a complete analysis with a linearized version. This same approach has been used in additional efforts without linearization.3,4 Some issues of the required mathematical representation5 have been depicted elsewhere,3,6 –9 as well as in our previous article10 where we used a model based on nonlinear ordinary differential equations (ODEs). Once the mathematical representation is chosen for the biochemical processes, the network connectivity must be speciﬁed. Important prior work in network inference from experimental data is found in the literature. One recent approach11,12 (initially tested with in numero experiments) to reverse engineering the network topology used a linear differential equation model of transcription with linear mRNA degradation, and applied singular value Abbreviations: GFP, green ﬂuorescent protein; RNAP, RNA polymerase. *Correspondence to: D. Anastassiou, Columbia Genome Center and Center for Computational Biology and Bioinformatics, Columbia University, Genomic Info Systems Lab, Department of Electrical Engineering, New York, NY 10027. E-mail:-Received 17 April 2004; Revised 25 November 2004; Accepted 1 February 2005 Published online 21 June 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20538 526 G. CAVELIER AND D. ANASTASSIOU decomposition, robust regression, and transcriptional perturbations. The model included translation only in a functional, nondetailed way. This method was applied to experimental data13 in the SOS pathway in Escherichia coli. Steady-state transcription proﬁles were measured from this pathway to get a ﬁrst-order description of the regulatory interactions. A more recently developed technique to reconstruct genetic networks in yeast included a computational analysis of gene modules incorporating DNA binding measurements14 in addition to functional information from expression data. Yet another recent in silico analysis15 shows details of the modeling for the mathematical equations of the network, as well as results regarding the identiﬁability of their parameters. In this case, a known topology was used together with detailed biochemical equations for transcription factor dimerization and binding, a nonlinear description of transcription regulation, and linear representations of translation and macromolecular degradation. Even more difﬁcult is to assign experimentally based numerical values to the parameters of the mathematical equations describing the network dynamics and connectivity. An approach beginning with a known network topology4 used singular value decomposition and nonlinear optimization to get the parameters of a nonlinear transcription model of promoter activity (i.e., not including mRNA degradation, protein translation, or protein degradation). Another interesting approach16 used a different modeling based on the nonlinear Michaelis-Menten form for representing regulation of transcription (transcription factor binding) without cooperativity effects.10 This representation was then used into dynamic Bayesian networks, and gradient ascent optimization for parameter identiﬁcation was applied. The procedure is based on a given dataset of gene transcription rates, and on knowing a priori the values of mRNA degradation rates. It is quite powerful in the sense that it ﬁnds the connectivity of the network ab initio, the activity proﬁles of regulatory proteins, and the cinematic parameters of transcription regulation. Our present algorithm10 requires a priori knowledge of the network connectivity and uses expression data from sporulation in yeast17,18 to identify genes whose parameters change the most between two yeast strains,17 and the motifs19 –24 associated with these genes. The methods described above can be used to ﬁnd the a priori necessary network connectivity, but other methods can also be employed for this purpose, such as genetic and biochemical experiments25 complemented with known yeast databases.26 Other studies27 have found the network connectivity using support vector machines (SVM) as a tool to identify transcription factor targets from gene expression data. The present algorithm uses a model similar in some aspects to the ones described above, but in an improved, more comprehensive way. It includes a nonlinear representation of several processes of mRNA transcription (transcription factor binding, cooperativity, transcription regulation) together with linear mRNA degradation, as well as linear protein translation and degradation. It also features a model-independent, global, nonlinear parameter identiﬁ- cation procedure based on a type of genetic algorithms called evolution strategies. The approach used here is different in several respects from the usual clustering algorithms or other genetic functional analysis procedures14,28,29 because, although it also uses gene expression patterns, it does not rely on similarities in such expression patterns. It is instead a biophysically based model of transcription, translation, and degradation that reproduces each of the measured expression patterns. It ﬁnds the corresponding dynamic parameters and their changes in cases where a phenotype difference exists between two samples. The method is able to identify the biological or phenotypical consequences of such changes, helping to bridge the gap between genotype and phenotype.9 The genes identiﬁed seem to complement those found with clustering or other genetic functional analysis algorithms. They overlap with some genes found with them, but uncover additional ones and their genetic functional categories, as well as their relation to the particular phenotype. The method also identiﬁes and analyzes some connectivity issues of the subnetworks or motifs found. As a secondary goal, the present study examines the relationship of the parameter changes found to the mechanisms of transcription, translation, and degradation. Details of important molecular interactions3,7,8,30 may be included in the mathematical description to understand the relationships of structural molecular details10,31–33 with the model parameters. The method uses a particular model implementation that aims to get the most possible detail in the description without making it computational prohibitive. Examples of parameters in our model are: time constants for concentration change (related to dilution and degradation rates), strength of protein and mRNA turnover rates, equilibrium constants of transcription factor binding to DNA, and binding cooperativity.10,34,35 METHODS Transcription-Translation Modeling We desire to model transcription and translation as time-dependent, dynamic processes having biophysically deﬁned rates. The best way to describe such processes under such assumptions is by means of differential equations.2,10,36 We will initially ignore spatial and stochastic effects. Therefore, we describe the network with a set of continuous time, deterministic ordinary differential equations, which in their state variable representation are: ᠬ兴 d关P ᠬ 兴, ⫽ gᠬ ([m ᠬ ]) ⫺ Vp 䡠关P dt (1) d关m ᠬ兴 ᠬ ]) ⫺ Vm 䡠关m ᠬ 兴, ⫽ ᠬf([P dt (2) where J is the total number of gene products (proteins) in the network; L is the total number of genes in the network; ᠬ ] is the protein concentrations (nM) J-dimensional vec[P tor, functions of time t; (m ᠬ ) is the mRNA concentrations (nM) L-dimensional vector, functions of time t; gᠬ ([m ᠬ ]) is the translation rate functions (nM 䡠 time⫺1) J-dimensional ᠬ ]) is the transcription rate functions (nM 䡠 vector; fᠬ([P PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS time⫺1) L-dimensional vector; VP is the protein degradation rates (time⫺1) [constant (J ⫻ J) diagonal matrix]; Vm is the mRNA degradation rates (time⫺1) [constant (L ⫻ L) diagonal matrix]. The function gᠬ ([m ᠬ ]) has usually been modeled as a linear function,37 that is, the rate of protein production by the process of translation is assumed to be just proportional to the concentration of mRNA present. The degradation rates Vm and VP are diagonal matrices because the degradation of each substance is assumed to be proportional only to its own concentration, so the off-diagonal elements are just zero. The proportionality constant in such case is regularly assumed to be the degradation rate (time 䡠 constant)⫺1. ᠬ ]) represent transcription rates and The functions in ᠬf([P ᠬ ]) is are nonlinear functions. Deﬁning the functions ᠬf([P critical because they represent the complex processes of mRNA transcription initiation and elongation,3,38 including transcription factor binding and assembly of the RNA polymerase initiation– competent complex. Structural details of the initiation complex31–33,39 of RNA polymerase II have been found recently. Recent studies also demonstrate that in the mathematical modeling of this system it may be necessary to provide detailed descriptions, including molecular biophysical details.1,3,8,10,38,40,41 We try to include as many biophysical details as possible without making the model unpractical to solve. Some may be included in a second round of model reﬁnement. For example, additional mechanisms involving the ubiquitin–proteasome degradation system can be deﬁned both for the rate of transcription activation and for the rate of degradation.42 To prevent excess complexity, their contribution to the transcription–translation processes will not be incorporated into the present version of the model, but can be worked with in future analysis to improve the modeling. An intermediate-complexity approach (without cooperativity, structural, stochastic, spatial, or proteasome machinery details) is shown in the study by Zak et al.,15 and includes rate-law kinetics derived from mass action principles. In this description, biochemical equations are written15 for: (a) binding and unbinding of transcription factors to promoters to form TF-promoter complexes, (b) translation modeled as a ﬁrst-order process with a ﬁxed rate constant; (c) dimerization and undimerization for the formation of the active (dimer) transcription factor. Dimer concentrations are also affected by the promoter binding and unbinding reactions in which they participate. Assuming that such reactions do not signiﬁcantly inﬂuence the concentration of transcription factor dimers, these terms can be removed with only a small effect on the results;15 (d) overall transcription rate as a linear sum of the transcription rates for the bound and unbound promoter states; (e) in the case of promoters where more than one transcription factor can bind, additional rate terms15 (taking care of the combinatorial problem too) are added to (d). We developed in our previous work10 a still simpler rate-law kinetics description derived from mass action principles. Because these reactions are fast enough compared with the dynamics of transcription and translation, our initial simpliﬁcation consisted in replacing for them the steady 527 state. This can be also written directly by a thermodynamic analysis of the combinatorial binding problem.10 This description would be complete, and would encompass a direct relationship with the biophysical reality, but we found that the large number of parameters (the on and off rate constants in these equations, or the thermodynamically derived binding constants or binding free energies) precludes their practical identiﬁcation from experimental data in almost all subnetworks with a meaningful size. Therefore, to keep the direct relationship between the model parameters and physical reality, the only practical possibility is to use a two-step procedure. In a ﬁrst approximation, we model the system with a further simpliﬁcation and abstraction of the rate-law kinetics derived from mass action principles consisting in using the Hill function (plus a constant basal rate) to represent the rate of transcription.36 In addition, we use this simpliﬁed model in a very simple subnetwork, namely one composed of only two interacting genes. The model could then be improved in a second step. Such improvement would include, for example, ﬁnding (by a similar parameter ﬁtting optimization) a function with more details (and correspondingly with more parameters) that ﬁts the same nonlinearity that the Hill function of the ﬁrst step does. Then, further parameter identiﬁcation can be performed with our method, beginning with the new more detailed function and its larger number of parameters. The fact that a ﬁrst step gives initial values for the larger number of parameters makes this second step feasible because the optimization begins very near the optimum values. This improved modeling would have a direct relationship of the parameters with the biophysical reality, at the cost of highly increased computational effort. We found in our previous study10 that for most of our purposes only the ﬁrst step is necessary, and therefore chose to model here our subnetworks only with the Hill function. The parameters found with this model still have a relationship with the biophysical reality, although it is not one-to-one because they have been condensed in a few ones. Hill Function The Michaelis-Menten description of the rate of transcription and, in the case of cooperativity, the Hill function,4 have been the preferred descriptions considered for biophysically based modeling of the transcription rate. In previous modeling attempts, a normalized Hill function was used within a wholly normalized model as a simulation tool.36 In our modeling10 we do not employ normalization of the whole model but instead use, only in the Hill nonlinear function, explicit division of the nanomolar concentration variable by its corresponding threshold for steep change, preserving the dimensionality (in nanomoles) of such concentration variable. We add a basal rate of transcription ␣0 and multiply the transcription rate nonlinear function by ␣, a parameter that represents the maximum strength of the nonbasal transcription rate without repressor/activator. A complete treatment of the system with Hill functions includes different degradation rates for mRNA and proteins (represented here by the 528 G. CAVELIER AND D. ANASTASSIOU time 䡠 decay constants, reciprocals of the degradation rates, that is the time constants ␶m and ␶p in minutes), as well as an explicit translation rate for mRNA (kt in min⫺1): d关P j 兴关P j 兴 , ⫽ k t 䡠关m i 兴 ⫺ dt ␶P (3) d关m i 兴关m i 兴 , ⫽ ␣ 0 ⫹ ␣ 䡠 f共关P j 兴兲 ⫺ dt ␶m (4) where t is the time (min); [mt] is the mRNA concentration (nM); [Pj] is the corresponding (translated) protein concentration (nM); ␣0 is the “leakiness” of the promoter [mRNA concentration per unit time (nM 䡠 min⫺1) produced from a given promoter type during continuous growth in the presence of saturating amounts of repressor]; ␣ ⫹ ␣0 is the mRNA concentration per unit time (nM 䡠 min⫺1) produced from a given promoter type during continuous growth in the absence of repressor; kt is the protein translation coefﬁcient (min⫺1); ␶p is the protein degradation time constant (min); ␶m is the mRNA degradation time constant (min). The Hill nonlinear (repression) function f([Pj]) is therefore in this case: f共关P j 兴兲 ⫽ 1 关P j 兴 1⫹ THR 冉冊 , nHILL (5) where nHill is the Hill coefﬁcient; THR is the repressor threshold concentration (nM) for steepest change of the nonlinear function, that is, the concentration necessary to half-maximally repress a promoter. This corresponds to the case where [P] is the concentration of a repressor. If [P] is the concentration of an activator, then replace f([P]) by [1 ⫺ f([P]). Similar modeling has been applied to the SOS DNA repair system in Escherichia coli, but without the gene product (protein) equation [Eq. (3)]. The parameters of such a model were found with another nonlinear optimization procedure.4 In our previous study,10 we analyzed different types of nonlinear functions to describe the process of transcription. The simplest case displays only the transcription factor concentration variable ([m1(t)] ⫽ x1) and the RNAP concentration ([RNAP] ⫽ x2 ⫽ 30 nM). It considers only one binding site for the transcription factor, and one (nonoverlapping) binding site for RNAP. Upon constructing the corresponding binding site occupation table as described in ref. 10, it is found that only terms of order 1 in the variable (x1) are possible. This is therefore a simple sigmoid whose slope is very smooth (it is the Michaelis-Menten type of sigmoid4). The corresponding Hill function will display a Hill coefﬁcient of 1 and will not be considered here because most ﬁttings to real data that we did10 indeed displayed sharper slopes, corresponding to Hill coefﬁcients larger than 1. The second case to be considered is when there are two binding sites (two operators) for the transcription factors and one binding site for RNAP, as also analyzed in more detail in our previous article.10 The simplest appropriate nonlinear function in this case is still the one in Eqs. (4)–(5) with the Hill coefﬁcient greater than 1 taking care of the higher complexity of the system. If even more detail is desired, for example, to relate the network to molecular and atomic X-ray structural details,30 a thermodynamically based function and biophysical parameters can be found with our described procedure.10 Genome-Wide Modeling Modeling with a One-Input Gene Pair A top-down approach seeks to deﬁne motifs and modules that are repeated in the overall intracellular network, whereas a bottom-up approach begins with the biochemical equations and tries to build and explain from there the whole network. The model we used was similar to the Hill model of Eqs. (3)–(5), with only two independent state variables and one input. The analysis thus deﬁnes the gene-pair motif as an independent entity:9,19 a ﬁrst gene whose experimental mRNA concentration in time, namely [m1(t)], is taken as independent input for translation of a certain concentration of its gene-product, [P1(t)], using a form similar to Eq. (3). (This mRNA concentration is in general a state variable, but because we have its timeseries measurement, we can assume that it is a known, independent time-series input.) In turn, P1(t) (our ﬁrst state variable) is an activator or repressor for transcription of a certain concentration, [m2(t)], of the second gene mRNA [Eq. (4)]. The output of this gene-pair motif is precisely [m2(t)] (our second state variable), which is ﬁtted to the microarray time-series expression pattern for this gene {[m2(t)] time-series data} by our parameter identiﬁcation algorithm. This motif of two genes was chosen because of the availability on a Web site of experimental microarray data (see below) for [m1(t)] and [m2(t)], even if there is not data available for [P1(t)] in that Web site (no protein chip measurements were done in that system). The method developed here can be used in any pair of related genes or biochemically related proteins, not necessarily transcription factors, as long as there is experimental data available for the macromolecule concentrations. The ideal would be to have concentration measurements for all the state variables in the system equations. In the present case, there was data for two out of three state variables (macromolecular concentrations) and the resulting parameter identiﬁcation was acceptable (the errors were below 20% in most of the cases, in many cases below 10%). If only one macromolecule concentration is available, the method can also be used, but the procedure will be more difﬁcult and more prone to error. The gene product [P1(t)] can be posttranslationally modiﬁed before binding as a transcription factor. We do not include such modiﬁcations explicitly here, although if desired, this can be done. It should be noticed that the present method ﬁnds the active transcription factor concentration necessary to produce the measured [m2(t)]. Therefore, the effects of posttranslational modiﬁcations are included in the equations and parameters, not in a detailed way, but in a functional and quantitatively correct way. Note, in addition, that we suppose in this treatment that the loading effects on the protein concentrations can be PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS neglected. Such negligible loading effects refer to the fact that the protein concentrations (active transcription factors concentrations) will not be appreciably affected by the binding and unbinding reactions in which they participate. This assumption has been validated in a previous study,15 and indeed seems reasonable because the protein concentrations that we found in the optimization are quite large. A similar assumption can be made about the concentration of mRNA because of its participation in translation (namely, the concentration [m1(t)] is not modiﬁed by participation of this mRNA in translation). The implications of deviations from such ideal behavior could be studied by using stochastic approaches43,44 to the modeling in cases when the total number of macromolecules is deemed to be very small, as discussed in our previous study.10 Sampling Rate and Parameter Identiﬁcation We use our system identiﬁcation algorithm10 to ﬁnd the whole set of parameters and the (usually unknown) initial condition, P1(t ⫽ 0). The time constants ␶m and ␶p are in units of minutes; THR and P1(t ⫽ 0) are in nanomoles. The other units are as before. An important consideration when obtaining experimental data for a model is the sampling rate. In our case, because we used available data obtained by other groups, we did not have the possibility of selecting the best sampling rate. However, we can discuss this issue in terms of our model and the system under consideration. In one-dimensional linear systems, it is known that the optimal sampling rate is determined by the highest meaningful frequency, that is, the cutoff frequency of the system’s frequency response. The sampling rate that theoretically permits to recover the original signal in such a case must be at least twice the cutoff frequency. For example, if the system responds only up to excitations of the order of two cycles per minute [its cutoff frequency is (2 䡠 min⫺1) two per minute] then the sampling rate must be at least twice that value, or four samples per minute. In our case, we are ﬁtting to microarray values of yeast mRNA concentrations, and it is known that the half-life in yeast for mRNA degradation is in the approximate range of 40 – 60 min.45 Similarly, the observed oscillations in our data have a periodicity of about 60 –120 min. This corresponds roughly to a cutoff frequency in the range of 0.5–1.0 cycles per hour, and thus the sampling frequency should be at least twice such value, or about one to two samples per hour. The data available that we used was taken using sampling rates of the order of one sample per hour. However, the fact that this is a nonlinear system, plus the nonideal conditions like noise, model approximations, and multidimensionality, would make a higher sampling rate more desirable. It is always a matter of compromise between the cost of getting many samples and the number of samples desired (ideally an inﬁnite number). The parameter identiﬁcation of our gene-pair motif is relatively simple, although it needs human supervision to obtain meaningful results10 because in many cases it can take a path in the optimization surface that will lead to nonconvergence. This problem is most probably due to the 529 complexity of the nonlinear optimization surface. It might be corrected in the future by automatic means, but until now it has been only corrected by directly following the results and visually identifying when the path will not lead to convergence and stopping the procedure to restart it on a better path. The procedure is thus assured convergence, and can then be successively repeated with any desired number of gene pairs without the need for expensive or powerful computers or exponentially increasing time or other computational resources. Eukaryotic Transcription Machinery The eukaryotic transcription machinery46 is much more complicated than the simple examples above, which were initially derived from and used in modeling a bacterial system.37 However, a ﬁrst attempt to identify the network subsystems in an eukaryotic organism can be done by considering that the higher level of complexity that eukaryotes have added to the transcription machinery to achieve development and diversity46 can be reduced under simplifying assumptions to the cases studied above. By considering stages in development to be ﬁxed, one can see that many of the variables will be ﬁxed. By considering tissue cellular diversity to be ﬁxed by taking only a speciﬁc type of tissue, many other variables will be ﬁxed. Therefore, the simple models employed above with only one or two transcription factors, two binding sites for these transcription factors and one binding site for RNAP may represent a ﬁxed stage in development in a particular cell type. This type of modeling is similar to one already discussed and incorporates some concepts of the top-down regulatory architecture and modular structure.47,48 We therefore only consider the binding of one or two transcription factors and RNAP. Because of the architectural requirements of the transcription machinery, some transcription factors may need to loop the DNA towards the binding site of RNAP46,49 and bind to the transcription machinery with some interaction energy. This interaction energy and the other ﬁxed terms may be included in the free energy of binding of the transcription factor of variable concentration, as many transcription factors are known to act by increasing or decreasing the binding of other transcription factors. Alternatively, it may be included in the cooperativity effects represented by the Hill coefﬁcient of the nonlinear function, as explained above in the Methods section. We veriﬁed the validity of these assumptions for eukaryotes in our previous article10 by applying the procedure to several gene pairs in the yeast sporulation time-series expression data. Transcription Factor Oligomerization and Cooperativity We did not include the possibility of transcription factor dimerization in our modeling10 of the nonlinear function. For more accurate results, this should be included, and with some mathematical and computational effort it may be accomplished.37 Dimerization may increase the cooperativity effect of the overall function; thus, in detailed models such as the thermodynamically derived function Fig. 1. its effect should be evaluated.34 For the sake of simplicity, we will not study further this consideration here. Yet, this factor will only change the Hill coefﬁcient in the model chosen for our present study, so it is taken into account, albeit in a more general manner. Modeling in the Two-Transcription Factor Binding Case The basic gene-pair motif can have more than one transcriptional input. In such cases, the simple approach with one nonlinear function of one variable presented above is not complete. For example, in the case of two inputs a three-dimensional nonlinear function is obtained, namely two variables as inputs in the X-Y plane, and the function value in the Z plane, giving a surface. Such surface corresponds to a thermodynamically derived function of two variables, corresponding to the binding of two transcription factors. Using Hill functions [Equation (5)] Fig. 4. Fig. 1. Possible combinations of AND/OR, activation and repression. The surfaces show the normalized combined function of two concentration variables in the nanomolar logarithmic range of ⫺1 to ⫹3. Fig. 4. (a) Normalized absolute differences with respect to the mean of each parameter difference, for each of the 96 gene pairs. (b) Selection of gene pairs with a normalized parameter difference higher than 3 standard deviations. PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS independently for each variable, one has the problem of how to merge these two effects. We observed that they can be either activation or repression, and that their combined effect may be represented as the product (AND-type) or the sum (OR-type). One has therefore 23 ⫽ 8 possible combinations of these possibilities, as shown in Figure 1. We found that this approach is supported by experimental observation of this type of transcriptional input function because with our deﬁnition the surface presents up to four different plateaus (Fig. 1), as found experimentally.50 Each of the eight possible combinations was used to model a transcription machinery complex and perform reﬁned parameter identiﬁcations in some of the cases where at least two transcription factors are known to inﬂuence transcription. More detailed and more complex analyses of the combinatorial transcription logic for the nonlinear function have been performed elsewhere,51 and they show similarity to our analysis in the AND/OR cases that we chose for our modeling, although they do not show the four-level surface found experimentally,50 which our modeling indeed shows. More work is needed in the elucidation of which representation of the cis-regulatory combinatorial transcription logic is best. Parameter Identiﬁcation The parameter identiﬁcation problem in nonlinear, complex systems requires using special purpose tools, because of both the complexity and the high degree of nonlinearity of these systems. This problem has been solved in engineering and in biological systems by means of Gradient Descent, Simulated Annealing,1 nonlinear least mean squares,4 or Genetic/Evolutionary Programming Algorithms. We developed in our previous study10 an Evolution Strategy algorithm that determines the parameters of a large variety of transcription-translation mathematical models, including the one discussed above with a Hill nonlinear function. We use such tools in all the identiﬁcation procedures carried out in the present study. We refer the reader to the details of the procedure given in our previous work.10 Brieﬂy, the Evolution Strategy begins with a population of assumed (random) values of the parameters and generates by mutation and recombination random variations on them, so that a random set of solutions to the system can be obtained. Each of these solutions is compared to the experimental data and a mean square error metric for the model results with respect to the measurements is generated. The different solutions are then ranked in order of best ﬁt to worst ﬁt according to this error metric, and a prespeciﬁed number of these solutions is taken as the new population to continue in the next iteration the process with another round of mutation, recombination, solution, and error evaluation. Because not only the best solution is taken in each case, there is the possibility within the other less optimal solutions to generate an exit from local minima, providing thus a global optimization method. The solutions thus approach the experimental data globally, and when a prespeciﬁed minimum error is found, the procedure stops. 531 Identifying Highly Changed Parameters and Their Biological Consequences In our description, gene 1 is always the gene for a ﬁrst transcription factor, and gene 2 is always a gene that is activated or repressed by said ﬁrst transcription factor. We used the connectivity table published as supplementary information by Guelzim et al.26 as a primer to initiate our procedure. The whole set of parameters for a large regulatory subnetwork was obtained by consecutive identiﬁcation of the ﬁrst 96 gene pairs in the Guelzim et al.26 (http:// www.nature.com/ng/journal/v31/n1/suppinfo/ng873_S1.html) table, for each of two yeast strains. The corresponding ORFs for these 96 gene pairs are shown in Table I. The data employed for the parameter identiﬁcations was obtained by the laboratory of Esposito.17 In a few words, yeast meiotic time-series expression patterns were obtained from high-density oligonucleotide microarray measurements on the yeast strains SK1 and W03 MATa/alpha (http://re-esposito.bsd.uchicago.edu/). These two sporulation-proﬁcient yeast strains were cultured in appropriate media and sporulation was induced. Aliquots of sporulating cells from the W303 culture were taken at 0, 2, 4, 6, 8, 10, and 12 h, and from the SK1 culture at 0, 1, 2, 3, 4, 6, 8, and 10 h. These time points were chosen to reﬂect the progression of meiotic landmarks in each strain. Nonsporulating control samples from control strains were also measured. Then RNA preparation and DNA probe synthesis, genomic DNA preparation, GeneChip probe array (Affymetrix) hybridizations, and Expression Data Analysis (with GeneChip 3.0 software) were completed. Processed expression data of ⬃1600 yeast genes were organized in a relational database available in the above-cited Web site. The parameters identiﬁed included the initial conditions for the concentrations of the gene product species (protein Pi) as well as the seven parameters of the model for each gene-pair motif: ␣, ␣0, ␶m, ␶p, kt, the Hill coefﬁcient nHill, and the threshold THR [Equations (3)–(5)]. The parameters thus found are ﬁrst preprocessed by ﬁnding for each gene pair the absolute value of the numerical difference of each parameter between the two strains. The resulting absolute values of the differences are then analyzed by displaying them in a matrix containing the standard deviations that each estimation departs from the mean value for that particular parameter in the set. The matrix of normalized absolute value of change in the parameters is built. It is equivalent to a matrix of scores (for each parameter and gene pair) with mean zero and standard deviation of 1. Only those parameters that have a (normalized) parameter change greater than a chosen threshold are considered in the ﬁnal selection (a threshold of three standard deviations was used). This subset of highly changed parameters is our ﬁnal selection, aimed at disclosing the associated network motifs and their interaction with the rest of the subnetwork. This selection is visually displayed and analyzed using the software Cytoscape52 (http://www.cytoscape.org/) to further examine it in the whole yeast gene subnetwork from the database of 491 genes of Guelzim et al.26 Cytoscape provides a useful tool 532 G. CAVELIER AND D. ANASTASSIOU TABLE I. ORF for the 96 Gene Pairs Identiﬁed -) Gene 1 Gene 2 YAL021C YAL021C YBL005W YBL005W YBL005W YBL005W YBL005W YBL005W YBL005W YBL005W YBL008W YBL008W YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL021C YBL093C YBL103C YBL103C YBL103C YBL103C YBL103C YBL103C YBR008C YBR049C YBR049C YBR083W YCL067C YDR123C YGL073W YGL207W YLR182W YLR256W YHR119W YMR303C YDR011W YDR406W YGL013C YGR281W YJL219W YOL156W YOR153W YOR328W YDR225W YML010W YBL030C YDL205C YDL230W YDR148C YDR232W YGL187C YGR059W YHR008C YHR051W YIL125W YJL166W YJR048W YKL141W YKL148C YML091C YNR001C YOR065W YOR375C YPR191W YBR020W YCR005C YDR176W YLR304C YNL037C YNR001C YOR136W YBL005W YAL038W YBR019C YHR084W YNL216W YDR207C YDR207C YER111C YGL207W YBR112C for graphical network analysis and display, because the initial Saccharomyces cerevisiae network is present on the distribution of the software, can be loaded without difﬁculty and its subsets easily analyzed by using the menus or entering just one- or two-column text ﬁles for their description. The software produces a graphical output of the subnetworks in different formats, which can also be tailored with speciﬁc illustration or annotation needs. The ﬁrst neighbors of the selected genes are found with the same tool. For each parameter member of the highly variable ﬁnal selection, an associated network motif is Gene-) YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR049C YBR083W YBR112C YBR112C YBR112C YBR112C YBR112C YBR112C YBR112C YBR112C YBR112C YBR112C YBR112C YBR112C YBR279W YBR289W YBR289W YBR289W YBR289W YBR297W YBR297W YCL066W YCL066W YCL066W YCL066W YCL066W YCL067C YCL067C YCL067C YCL067C YCL067C YCL067C YCL067C Gene 2 YBR020W YCR012W YDL140C YDL164C YDR050C YDR146C YER086W YFL039C YGL026C YGR254W YKL182W YMR186W YNL216W YOL004W YOL006C YPL231W YOR077W YAL063C YAR050W YCR005C YDR172W YER065C YHR051W YHR211W YIL162W YJR048W YJR094C YLR256W YML058WA YPL256C YAR071W YBR093C YIL162W YPL187W YBR298C YBR299W YGL089C YIL099W YKL178C YLR113W YPL187W YDR007W YDR103W YFL026W YGL089C YIL015W YKL209C YPL187W therefore found by ﬁnding its ﬁrst neighbors in the database of 491 genes of Guelzim et al.26 The network motifs found for each gene pair with highly variable parameters, or the interconnected subnetwork of selected genes with highly variable parameters may be subsequently studied for dynamic response and other properties. Finally, the selected highly variable gene set is visually examined with Cytoscape with respect to the rest of the subnetwork of 491 genes of Guelzim et al.26 The connectivities of the gene set are analyzed with Cytoscape and some tools from Guelzim et al.26 Biological characteristics associated with the PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS changed parameters are found using the tools of the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org/). The genetic functional categories are found and examined with respect to the corresponding biological changes and phenotypes with the MIPS53 Comprehensive Yeast Genome Database (CYGD, http://mips.gsf.de/genre/ proj/yeast/index.jsp). The biophysical relations of the parameter changes with the mechanisms of transcription, translation, and degradation are explored. A comparison is then made with results obtained by other groups with other genetic functional examination methods such as clustering and motif analysis. When there are more transcription factors in action an improvement will be obtained by including an expanded input logic in the modeling, as shown for example in Buchler et al.51 We improved the algorithm to ﬁnd in some of these cases the particular combination of AND/OR, activation and repression, as well as the corresponding parameters that provide the lowest ﬁtting error to the experimental data. This improvement of the method is achieved by including into the same algorithm additional parameters that optimize the activation/repression and Boolean combinations of several transcription factors as inputs. Finally, a robustness statistical analysis of the results is performed on some of the gene pairs by carrying out many more parameter evaluations and ﬁnding their means and corresponding 95% conﬁdence intervals with Excel. RESULTS We obtained by successive identiﬁcation with our method in each yeast strain the whole set of parameters for the genomic regulatory subnetwork composed of the ﬁrst 96 gene pairs in the Guelzim et al.26 table as described in the Methods section. Details of these results for six gene pairs in each yeast strain are shown in Figure 2, and more of such gene pairs are shown in Figure 3. Note the dynamic changes in the top and bottom curves of each pair of yeast strains (strain SK1 is on top, while strain W03 is at the bottom) and the corresponding changes in the parameters. Note that activation and repression are in some cases switched between SK1 and W03. The errors in general are around or below 20%, as found in our previous study.10 It should also be noted that the degradation time constants for mRNA and protein, ␶m and ␶p, do not follow the rule accepted for prokaryotes,34 namely, that the protein degradation time constants are much larger than the mRNA time constants. In our eukaryote system, these two time constants are larger, nearly equal, or shorter than the other is. There are, however, changes in these orderings between the two strains for the same gene pairs. It was found (see Figs. 2–3) during the optimizations for parameter identiﬁcation that some of the paths in the connectivity are in the opposite direction as that shown in the published data.26 Some of the activators or repressors found in the published data were also found to behave in the opposite way. We believe that the optimization done here gives a more accurate representation of causality, activation and repression in these cases, because the time courses (causality) of processes represented in detail by 533 the transcription and translation events in the time dependent equations cannot be reversed and follow very faithfully either an activation or repression function. Attempts to change the causality found or to switch the activation or repression functions in our identiﬁcation resulted in a very bad ﬁt to the data (not shown). Gene-Pair Motifs with Highly Changed Parameters between the Two Strains A further analysis is possible after the initial evaluation of a set of 192 pairs of genes (96 pairs of genes for each yeast strain) like those shown in Figures 2–3. The gene pairs whose parameters change the most in absolute value from strain SK1 to strain W03 can be found as explained in METHODS. In Figure 4 the seven parameters identiﬁed for each gene pair are shown on one axis as P1, P2, . . ., P7, while the gene pairs are numbered on the other axis from 1 to 96. The columns in Figure 4(a) represent the parameter change in standard deviations from the mean for each parameter. The selected genes with high parameter change are found as described in the Methods section and shown in the columns in Figure 4(b). They are also displayed in green in Figure 5(a) within the network of 491 yeast genes of Guelzim et al.26 The subnetwork [Fig. 5(b)] of selected genes with high variability in their parameters (green) plus their ﬁrst neighbors (red) is extracted from the 491 genes of Guelzim et al.26 Several known network motifs20,34 can be found among the selected genes plus their ﬁrst neighbors. One of them is shown in Figure 5(c)–(d). The 11 genes whose parameters change most are BAR1, CYC1, FLO5, GAL1, HUG1, MF(ALPHA)1, PDR15, QCR8, STE2, STE5, and SUC2. Their list and ORFs are shown in Tables II and III. Also extracted and displayed for analysis are the connectivity and MIPS53 functional categories of the selected genes with high variability in their parameters (in green in Fig. 6) together with their upstream genes (in red in Fig. 6) from the network of 491 yeast genes of Guelzim et al.26 It can be seen that there are gene hubs that control the different MIPS functional categories in an organized way. The selected gene-pair motifs whose parameters changed the most among the 96 gene-pairs analyzed in the present study overlap with a number of known network motifs.9,19 –23 Such motifs are found to correspond in many cases to already established categories, both in Escherichia coli and in Saccharomyces cerevisiae.9,19 –23 In other cases, they have additional components, and it would be interesting, for example, to see how the behavior of the “canonical” Feed-Forward Loop22,23 (FFL) changes with these additional components. The FFL is composed of a transcription factor X, which regulates a second transcription factor Y, such that X and Y jointly regulate gene Z.22,23 Some generalizations of these motifs have been studied previously.54 The set of genes with high parameter change (the selected genes shown in green in Figs. 5– 6) are all terminal genes in the gene set of Guelzim et al.26 (they will not inﬂuence any other gene in such set). They also have an above average number of inputs, as can be observed in Figure 6. The gene set of Guelzim et al.26 has 491 genes 534 G. CAVELIER AND D. ANASTASSIOU Fig. 2. Model ﬁtting results and parameters for six gene pairs of each yeast strain. Parameters listed at the right follow the order given in the text. The three last items correspond to protein initial condition, optimization error in percent, and digit 1 for activation or 0 for repression. Time series data: (Red – – –). Fitted m2 value: (Red —). Curves are labeled with the corresponding gene ORF. (from a yeast total of around 6400) and is derived from conﬁrmed genetic or biochemical experiments. The number of inputs for each of the genes with high parameter change can be found with Cytoscape. The probability of each of them having that particular number of inputs can then be calculated as described.55 As shown in Table II, most of the highly changed genes display a very low probability of having these experimentally found large numbers of inputs. It might be argued that the changes in parameters found might be due to such large number of inputs. We veriﬁed (shown below in Improving the model results and Table IV) that increasing the number of PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS Fig. 3. 535 Model ﬁtting results and parameters for 18 gene pairs of each yeast strain. The conventions are as in Figure 2. inputs in the parameter identiﬁcation did not in general change the ﬁnding of highly changed parameters where it had been previously found, although the change may partially migrate to new parameters. It could also be argued that the parameter change from SK1 to W03 might be due to the time-series waveform variance in the inputs. The mean square differences between the inputs of the two strains are shown in Table II, and they 536 G. CAVELIER AND D. ANASTASSIOU Fig. 5. Selected gene pairs are shown in green. (a) Network for database of 491 gene pairs. (b) Selected gene pairs with high difference in parameters and their ﬁrst neighbors. (c) An example of a network motif with two selected genes. (d) The same motif in a hierarchical display. are, in general, around or below 25%. This suggests that the changes in the input time series are not so large, that when all of them are taken into account they will modify the parameters. The number of inputs, the probabilities of having such number of inputs, and the mean square difference between the inputs in SK1 and W03 are all shown in Table II. Changes in Parameter Values and Their Phenotypic or Biological Consequences From the processing that we performed it is apparent (Table III) that the changes from strain SK1 to strain W03 cause differences in transcription factor binding to DNA, higher/lower mRNA rates of transcription, or higher/lower 537 PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS TABLE II. Connectivity and Inputs of Genes with Highly Changed Parameters GENE/ORF Number of Inputs Probability Diff SK1 -W03 - 1.07% 0.68% 24.90% 0.28% 10.00% 1.67% 15.90% 10.00% 4.12% 10.00% 0.11% 15.93% 19.60% 28.88% 23.74% N/A 17.68% 12.10% 19.13% 14.54% 42.17% 25.27% BAR1/YIL015W CYC1/YJR048W FLO5/YHR211W GAL1/YBR020W HUG1/ YML058W-A MF (ALPHA) 1/YPL187W PDR15/YDR406W QCR8/YJL166W STE2/YFL026W STE5/YDR103W SUC2/YIL162W TABLE III. Summary of parameter changes, biophysical causes, functional categories Parameter change Gene-pair selection MIPS category— changed parameter gene/ORF SK1-W03 change/ measured Standard deviations Possible biophysical cause YBR049C 3 YBR020W 1 METABOLISM—GAL1/YBR020W YBR112C 3 YIL162W 1 METABOLISM—SUC2/YIL162W YBR112C 3 YJR048W 2 ENERGY—CYC1/YJR048W YBL021C 3 YJL166W 2 ENERGY—QCR8/YJL166W ␶m 2–33/45 5.8 nHILL 1–3 ␶m 84–7/112 ␶p 1–97 ␶m 39–87/33 THR- 3.1 YBR112C 3 YML058W-A 10 CELL CYCLE — HUG1/YML058W-A kt 0–2 ␣0 0–4 ␶m 58–97 6.0 8.0 YBL005W 3 YDR406W 20 CELLULAR TRANSPORT—PDR15/YDR406W YCL067C 3 YIL015W- Pheromone Response—BAR1/YIL015W YBR289W 3 YPL187W- Pheromone Response—MF (ALPHA) 1/YPL187W YCL067C 3 YFL026W- Pheromone Response—STE2/YFL026W YCL067C 3 YDR103W- Pheromone Response—STE5/YDR103W ␶m 5–78/53 4.4 ␶m 2–29/29 3.5 ␣ 4–36 4.5 ␣ 3–41 3.0 nHILL 3–39 5.2 YBR112C 3 YHR211W 34.07.01 cell–cell adhesion—FLO5/YHR211W THR- kt 0–5 3.6 4.6 activation thresholds; they also cause higher/lower protein rates of translation, or higher/lower degradation rates for proteins and mRNAs. All this can change the behavior of an FFL because such parameters determine the functional characteristics of the FFL.20,22,23 We summarize in Table III some of the most signiﬁcant parameter changes in our analysis. We also identify in Table III the molecular biophysical characteristics connected with each parameter change, the MIPS53 functional categories associated with the genes with highly changed parameters, and some measured values in yeast strain Y26256 for the mRNA degradation time constants ␶m. The MIPS categories, parameter changes, and other biological characteristics for the highly changed genes were found as described in the Methods section. They are listed in Table III, displayed in Figure 6, and examined below. Changes in mRNA degradation rates. An interesting parameter modiﬁcation appearing in several of the 8.7 4.0 mRNA degradation of GAL1/YBR020W Transcription cooperativity for SUC2/YIL162W Protein degradation (YBR112C protein) DNA binding, cooperativity in transcription machinery for QCR8/YJL166W Translation strength for YBR112C protein Basal transcription of HUG1/YML058W-A mRNA degradation of PDR15/YDR406W mRNA degradation of BAR1/YIL015W Transcription rate of MF (ALPHA) 1/YPL187W Transcription rate of STE2/YFL026W Cooperativity in transcription machinery for STE5/YDR103W DNA binding, cooperativity in transcription machinery for FL05/YHR211W Translation strength for YBR112C protein Fig. 6. Connectivity and MIPS functional categories of the highly changed selected genes with high variability in their parameters (in green), and their upstream genes (in red). 538 G. CAVELIER AND D. ANASTASSIOU PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS 539 TABLE IV. Combinatorial input and statistical analysis of parameter ﬁtting Parameter [SK1] [W03] ␣1A ␣1R ␣2A ␣2R ␣0 ␶m ␶P1 ␶P2 kt1 kt2 nHILL1 nHILL2 THR1 THR2 Parameter [SK1] [W03] ␣ ␣0 ␶m ␶P kt nHILL THR Case 1 PDR15/YDR406W 1 run 2 inputs 1 run 1 input - - - 0.66 1.36 3.33 7.93 - Case 2 10 runs mean, 2 inputs 1 run 1 input 10 runs mean, 2 inputs 3.13 ⫾ 1.21 8.48 ⫾ 5.02 5.95 ⫾- ⫾ 3.86 7.14 ⫾- ⫾ 8.27 6.32 ⫾ 2.03 5.43 ⫾ 3.36 0.89 ⫾ 0.44 1.26 ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾ 5.10 0.45 ⫾ 0.23 1.06 ⫾ 0.85 0.24 ⫾ 0.11 1.20 ⫾ 1.88 1.58 ⫾ 0.65 5.31 ⫾ 3.81 1.89 ⫾ 0.43 4.58 ⫾- ⫾- ⫾- ⫾- ⫾ 113.35 - 57.49 ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾- ⫾ 9.02 0.15 ⫾ 0.05 1.01 ⫾ 0.68 0.11 ⫾ 0.05 1.06 ⫾ 0.38 0.84 ⫾ 0.44 2.26 ⫾ 0.48 1.14 ⫾ 0.41 1.61 ⫾- ⫾- ⫾- ⫾- ⫾ 316.88 - 0.41 0.38 0.87 0.98 - Case 3 FLO5/YHR211W Case 4 1 run 1 input 10 runs mean, 1 input 1 run 1 input 10 runs mean, 1 input - 13.57 ⫾ 6.62 7.63 ⫾ 1.98 1.99 ⫾- ⫾ 2.25 3.20 ⫾- ⫾ 1.73 5.89 ⫾- ⫾ 3.06 0.38 ⫾ 0.06 4.67 ⫾ 1.30 7.90 ⫾- ⫾- ⫾- ⫾ 886.70 - 28.76 ⫾- ⫾- ⫾ 0.75 1.14 ⫾ 0.84 8.82 ⫾- ⫾- ⫾- ⫾ 7.55 1.11 ⫾ 0.17 0.90 ⫾ 0.42 2.99 ⫾ 2.99 3.16 ⫾- ⫾- ⫾ 188.36 highly changed genes is that in ␶m. It is increasingly clear that the cell controls the levels of cellular mRNA transcripts by regulating the rate at which mRNA decays.57,58 The decay rates of housekeeping genes are invariable, but the half-lives of many other mRNAs change markedly in response to environmental clues. A principal mRNA degradation path is initiated by removal of the 3⬘ poly(A) tail, an event that facilitates the job of exonucleases. Poly(A) removal triggers rapid cleavage of the 5⬘ cap by a decapping enzyme (Dcp1). In addition, several cis-acting sequence elements can regu- late the rate of turnover of a transcript, such as the A ⫹ U-rich element (ARE), found in the 3⬘ untranslated regions (3⬘ UTRs). A plethora of ARE-binding proteins exists, and they can either promote or inhibit mRNA decay. Several signaling pathways control the ARE degradation path, and there has even been found a connection between degradation rates and translation.57 Microarray measurements of yeast mRNA halflives have been performed56 on strain Y262 (they were the ones used, converted to time constant values in minutes, as the measured values of ␶m in Table III). 540 G. CAVELIER AND D. ANASTASSIOU Strong evidence was found that precise control of the decay of each mRNA is fundamental for gene expression, as these decay rates were matched to one another with high speciﬁcity and precision, for example, among the mRNAs that encode protein subunits of stoichiometric complexes. A similar precise control of mRNA decay rates was found in transcripts encoding enzymes related to a particular physiological function.56 It is therefore clear that mRNA decay pathways make an important contribution to the global control of gene expression, but the mechanisms or their relationship to sequence remain obscure in many cases.59 In the cases found with our calculations, the changes in decay rates may be due to differences in coding sequences (there are a lot of sequence disparities between strains SK1 and W0322), but they can also be due to variations in noncoding sequences, as many AREs are in such untranslated regions. As discussed above, a variety of reasons may modify the decay rate. Cytoplasmic enzymes in yeast involved in mRNA decay pathways are: PAN2, PAN3, CCR4, CAF1, DCP1, DCP2, DCPS, and XRN1; mRNA decaping factors are DCP1, DCP2, VPS16, PAT1/MRT1/ SPB10, LSM, EDC1, EDC2, UPF1, UPF2/NMD2, UPF3, and HRP1/NAB4;57 poly(A)-speciﬁc deadenylating nuclease is PARN, which interacts with PABP, and this interacts with the translation initiation factor eIF4G, which in turn forms a ternary complex with the capbinding protein eIF4E;57 ARE binding proteins are TTP, UuR, KSRP, NF90, and TIA-1.59 The following genes have been GO term annotated with mRNA catabolism by the Gene OntologyTM (GO) Consortium: CBP1, CSL4, DCP1, DCP2, DIS3, KEM1, LSM7, MRT4, MTR3, NAM7, NMD2, RRP4, RRP40, RRP42, RRP43, RRP45, RRP46, SKI2, SKI3, SKI6, SKI7, SKI8, and UPF3. Some of these overlap with the lists of decay pathways above. There is, however, no overlap of our present highly changed parameter gene list with the mRNA catabolism GO term annotated genes; neither is there overlap with the mRNA decay pathway-related genes discussed above. 01 METABOLISM: GENES GAL1, AND SUC- Sugar, Glucoside, Polyol, and Carboxylate Anabolism GAL- C-compound and carbohydrate utilization SUC2 GAL1/YBR020W. Gene-pair YBR049C 3 YBR020W displays a change in ␶m, and is a variation of the canonical FFL within a cascade of four levels. The motif is rather complex, and does not ﬁt in any of the known network motifs. The modiﬁcation in ␶m from 1.5 to 32.6 min is probably due to a variation in the mRNA degradation rate of GAL1. It is accompanied by a modiﬁcation in ␶p from 3.4 to 7.4 min. The measured ␶m for GAL1 in strain Y26256 is 44.64 min (97.5–2.5% interval 36.00 –51.84 min). Gene GAL1 is found in our results as repressed in both strains. The parameter adjustments are in agreement with a more delayed response for the slow growth strain W03. Gene GAL1 is a galactokinase, involved in galactose metabolism. It is controlled by gene REB1/YBR049C, a DNA binding protein involved in regulation of transcription from Pol II promoter. Other genes regulated by REB1 are YAL038W, YBR019C, YBR020W, YCR012W, (metabolism), YDL140C, (transcription from pol II promoter), YDL164C, (transcriptional activator) YDR050C, (metabolism) YDR146C (transcriptional activator), YER086W (amino acid biosynthesis), YFL039C (actin), YGL026C (tryptophan biosynthesis), YGR254W (metabolism), YKL182W (fatty acid biosynthesis), YMR186W (heat shock protein), YNL216W (transcription factor activity), YOL004W (DNA binding protein involved in transcriptional regulation, chromatin silencing), YOL006C (topoisomerase), and YPL231W (fatty acid biosynthesis). They are all involved in general in metabolism, biosynthesis, and transcription. The average numerical changes found in our computations in the value of the parameters from strain SK1 to strain W03 in this family of genes controlled by REB1 are: ␣ ⫽ 58%, ␣0 ⫽ 279%, ␶m ⫽ 202%, ␶p ⫽ 157%, kt ⫽ 358%, nHILL ⫽ 22%, and THR ⫽ 51%. Therefore, the family of genes identiﬁed in the present work by an adjustment in the mRNA decay rate of GAL1 display as a set a large difference of their parameter values from strain SK1 to strain W03, in particular, an increase in the time constants for mRNA and protein degradation. The parameter modiﬁcations in these metabolism-related genes are in agreement with the sluggishness of the W03 phenotype, a sluggishness that the cell apparently tries to compensate by high increases in the rates of transcription and translation. SUC2/YIL162W. Gene-pair YBR112C 3 YIL162W displays a change in nHILL probably due to differences in DNA binding strength and/or cooperativity in the transcription machinery for SUC2. Gene SUC2 is an invertase (sucrose hydrolyzing enzyme) involved in sucrose catabolism. It gives in our parameter identiﬁcation the following changes from SK1 to W03: nHILL from 1.1 to 3.2, ␣ from 32.5 to 93.3, ␣0 from 1.1 to 4.1, and ␶m from 83.8 to 6.6 min. The measured ␶m for SUC2 in strain Y26256 is 112.32 min (97.5–2.5% interval 80.64 –217.44 min). Gene SUC2 is found in our estimates as repressed in both strains. With the parameter adjustments found, this metabolismrelated gene pair will behave more briskly in W03 than in SK1. Therefore, the parameter modiﬁcations are not in general in the direction of slowing the response. 02 ENERGY: GENES CYC1 AND QCR8 02.11 Electron Transport and MembraneAssociated Energy Conservation CYC1/YJR048W. Gene pair YBR112C 3 YJR048W displays a change in ␶p, probably due to variations in protein YBR112C degradation. Gene CYC1 is a Cytochrome c, isoform 1; electron carrier of the mitochondrial intermembrane space that transfers electrons from ubiquinone– cytochrome c oxidoreductase to cytochrome c oxidase during cellular respiration. It is found in our parameter identiﬁcation to have variations from SK1 to W03 in ␶p from 1.4 to 96.7 min, THR from 96.2 to 767.0, ␣ from 7.6 to 34.7 and ␶m from 38.9 to 87.4 min. The measured ␶m for CYC1 in strain Y26256 is 33.12 min (97.5–2.5% interval 27.36 – 41.76 min). Gene CYC1 is found in our calculations PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS as repressed in SK1 but activated in W03. The parameter modiﬁcations found for this gene-pair related to energy conservation will, in general, induce in W03 a sluggishness of the response, in agreement with the slow growth phenotype of strain W03. QCR8/YJL166W. Gene-pair YBL021C 3 YJL166W displays a change in threshold THR most probably due to a change in DNA binding strength and/or cooperativity in the transcription machinery for QCR8. The adjustment in THR is from 570.16 in SK1 to 7844.69 in W03. Gene QCR8 is an ubiQuinol Cytochrome c (oxido)-reductase subunit 8, involved in mitochondrial electron transport from ubiquinol to cytochrome c. It is found in our estimates as repressed in SK1 and activated in W03. These variations mean a change from repression at low concentrations of the (transcription factor) gene-product of YBL021C to activation only at very high concentrations, therefore contributing to the sluggishness in the response of strain W03. 10 CELL CYCLE AND DNA PROCESSING: GENE HUG- Cell Cycle Arrest HUG1/YML058W-A. Gene pair YBR112C 3 YML058W-A displays two modiﬁcations: kt and ␣0. The variation in kt is probably due to changes in translation strength for the YBR112C protein. The variation in ␣0 is probably due to differences in basal transcription of HUG1. Gene HUG1 gives in our parameter identiﬁcation changes from SK1 to W03 as follows: kt from 0.02 to 1.7, THR from 345.1 to 959.9, ␣ from 12.8 to 91.9, ␣0 from 0.05 to 3.8 and ␶m from 58.3 to 96.7 min. No measurement of ␶m for this gene is found in the decay rate data for strain Y262.56 Gene HUG1 is found in our results as repressed in SK1 but activated in W03. The parameter adjustments found for this cell cyclerelated gene pair are not all in the direction of increasing; neither are they all in the direction of decreasing the sluggishness of the W03 response. In general, it is a stronger response, but with an increased delay due to the increase in ␶m. 20 CELLULAR TRANSPORT: GENE PDR- ABC Transporters PDR15/YDR406W. Gene pair YBL005W 3 YDR406W displays a variation in ␶m for gene PDR15 from 4.5 to 77.8 min. The measured ␶m for PDR15 in strain Y26256 is 53.28 min (97.5–2.5% interval- min). The higher ␶m value found in our estimates for W03 means that mRNA concentration for gene PDR15 can be modiﬁed much more rapidly in SK1 than in W03. This is indeed observed in the time courses of these concentrations (not shown). It also means that this change may be inﬂuencing the slowness of W03 growth. It is therefore clear that the sequence disparity between SK1 and W03 in this cellular transport-related gene has brought a considerable delay in gene expression by making the decay rate much lower in W03. Thus, in strain W03 gene PDR15 shows a considerably repressed and delayed response to gene YBL005W compared to that in SK1. Our analysis identiﬁes gene PDR15 as implicated in a change in mRNA turnover, although this gene has not 541 been previously identiﬁed in decay analysis. Therefore, its involvement in a modiﬁcation in turnover may have been previously overlooked. Gene PDR15 corresponds to protein Pdr15p, an ATPbinding Cassette (ABC) protein, a general stress response factor implicated in cellular detoxiﬁcation60 (PDR: Pleiotropic Drug Resistance). In a study60 made on yeast strain W03-1A and some mutants, it was shown that Pdr15 becomes most abundant when cells exit the exponential growth phase, an endogenous metabolic stress. PDR15 is strongly induced by various other stress conditions like heat shock, low pH, weak acids, or high osmolarity.60 Its induction requires the general stress regulator Msn2p, which directly decorates the stress response elements (STREs) in the PDR15 promoter.60 This demonstrates a crosstalk between PDR and general stress response. Pdr15p has never been isolated in genetic screens for drug resistance, and while several genomic approaches hint at PDR15 as a gene modulated by stress conditions, microarray data correlation with its functions had not been previously found.60 Levels of Pdr15p remained high even when the mRNA levels decreased.60 In our case, the decrease in decay rate in strain W03 will help to maintain the levels of PDR15 mRNA compared to those of SK1. We have therefore found that the mRNA decay rate of gene PDR15 is reduced in the slow growth phenotype in strain W03 with respect to strain SK1. W03 probably suffers from stress related conditions, while strain SK1 has a rapid turnover for this gene, in agreement with a rapidly growing strain (can change rapidly upon induction or repression). These facts raise the interesting possibility that strain W03 is, at least in part, a slow grower because of a stress response to its sequence. For example, several ARE binding proteins are modulated by heat shock,57 and mRNA turnover has been found very important in regulating gene expression following several conditions of stress.58 PDR15 cannot be found in any of the previously performed clustering analysis in yeast found in the literature: It is not found (in the article giving measurement during sporulation17) among the genomic variations between strains SK1 and W03, nor in any of the temporal clusters found by analysis of the sporulation expression patterns. When the parameters in the accompanying genes controlled by YBL005W (which are in FFL motifs nearby to PDR15) are analyzed with our method, it is found that they have also important changes. These genes are YDR011W, YDR406W, YGL013C, YGR281W, YJL219W, YOL156W, YOR153W, and YOR328W. They are controlled by a PDR gene and they by themselves are PDR genes. The average numerical increases in the value of the parameters from strain SK1 to strain W03 in this family of 8 PDR genes controlled by YBL005W are ␣ ⫽ 26%, ␣0 ⫽ 121%, ␶m ⫽ 357%, ␶p ⫽ 318%, kt ⫽895%, nHILL ⫽ 34%, and THR ⫽ 326%. There is, therefore, a massive average increase in the parameters of this group of PDR genes. These alterations, on one hand, are responsible for a delay in the dynamics, and on the other hand, partially compensate for this sluggishness by increasing the transcription and translation rates. The slow growth phenotype of strain 542 G. CAVELIER AND D. ANASTASSIOU W03 may be therefore related to the stress response in this family of cellular transport-related genes. 34 INTERACTION WITH THE CELLULAR ENVIRONMENT: GENES BAR1, FLO5, MF(ALPHA)1, STE2 AND STE- cell– cell adhesion FLO- Pheromone response BAR1, MF(ALPHA)1, STE2, and STE5 FLO5/YHR211W. Gene-pair YBR112C 3 YHR211W displays two highly changed parameters: the threshold THR and the protein translation rate kt. The threshold THR alteration is probably due to adjustments in DNA binding strength and/or cooperativity in the transcription machinery of FLO5. The variation in kt is probably due to a translation strength modiﬁcation of the YBR112C protein. Gene FLO5 is a lectin-like protein involved in ﬂocculation, and a cell wall protein that binds to mannose chains on the surface of other cells. It gives in our ﬁndings the following changes from SK1 to W03: THR from 706.22 to 8964, kt from 0.06 to 5.0, nHILL from 3.1 to 16.8, ␣0 from 0.5 to 11.6, and ␶m from 16.7 to 11.0 min. The ␶m for FLO5 in strain Y26256 could not be obtained because of poor data. Gene FLO5 is found in our results as activated in both strains. The parameter modiﬁcations found for this gene pair will cause a stronger and somewhat faster response for strain W03, but only at very high concentrations of gene product YBR112C. Therefore, they are shown to contribute to the slow growth phenotype of strain W03. BAR1/YIL015W. Gene-pair YCL067C 3 YIL015W displays a change in ␶m probably due to alterations in the BAR1 gene mRNA degradation. Gene BAR1 is an aspartyl protease secreted into the periplasmic space of mating type a cells; it cleaves and inactivates alpha factor allowing cells to recover from ␣-factor–induced cell cycle arrest. The variation in gene BAR1 corresponds to a change from SK1 to W03 in ␶m from 2.0 to 28.6 min, and in ␶p from 4.0 to 48.7 min. Gene BAR1 is found in our estimates as repressed in both strains. Measurement of ␶m for BAR1 in strain Y26256 could not be done due to poor data quality, while its measured deadenylation ␶m is 28.80 min (97.5– 2.5% interval 21.60 – 40.32 min). The parameter adjustments found for this pheromone response gene pair in general will give a more delayed response in W03, in agreement with the slow growth W03 phenotype. MF(ALPHA)1/YPL187W. Gene pair YBR289W 3 YPL187W displays a modiﬁcation in ␣, probably due to an adjustment in the transcription rate of MF(ALPHA)1, which in turn, is perhaps due to changes in DNA binding strength. Gene MF(ALPHA)1 codes for the mating ␣-factor pheromone. It gives in the present parameter evaluation a difference from SK1 to W03 in ␶m from 3.5 to 36.2 min, and ␶p from 45.3 to 92.9 min. The measured ␶m for MF(ALPHA)1 in strain Y26256 is 11.52 min (97.5–2.5% interval 10.08 –14.40 min). Gene MF(ALPHA)1 is found in our evaluation as repressed in both strains. The parameter modiﬁcations found here for this pheromone response gene will in general give a more delayed response for W03. STE2/YFL026W. Gene pair YCL067C 3 YFL026W displays a modiﬁcation in ␣. Gene STE2 is an ␣-factor pheromone receptor; seven-transmembrane domain protein (controlled by HMLALPHA2/YCL067C). It displays in our parameter identiﬁcation a change in ␣ from 3.2 in SK1 to 41.0 in W03 probably due to differences in its transcription rate, possibly by alteration of the DNA binding strength. It also displays a variation in ␶p from 4.5 to 22.0 min. It is found in our results as activated in SK1, but repressed in W03. It displays as well a change in ␶m from 15.7 to 23.0 min. Measurement of ␶m for STE2 in strain Y26256 was 64.80 min (97.5–2.5% interval- min). According to the results of our parameter identiﬁcation, this pheromone response gene will have a strongly repressed and more delayed response in W03 in agreement with the slow growth W03 phenotype. STE5/YDR103W. Gene pair YCL067C 3 YDR103W displays a change in nHILL, probably due to an adjustment in cooperativity in the transcription machinery for STE5. Gene STE5 (controlled by HMLALPHA2/YCL067C) is a protein of the pheromone pathway with MAP-kinase scaffold activity. It is a scaffold protein that, in response to pheromone, shuttles from the nucleus to the plasma membrane and assembles kinases Ste11p, Ste7p, and Fus3p into a speciﬁc signaling complex. Gene STE5 gives in the present parameter estimates the following modiﬁcations from SK1 to W03: nHILL from 3.1 to 38.6, ␶p from 2.8 to 23.5 min, and ␶m from 27.7 to 7.8 min. The measured ␶m for STE5 in strain Y26256 is 20.16 min (97.5–2.5% interval 17.28 –24.48 min). Gene STE5 is found in our results as repressed in SK1 but activated in W03. The parameter variations found by our identiﬁcation therefore show a pheromone response kinase scaffold gene pair activated in W03, which will be slower in the protein subsystem (YCL067C gene product) due to the lower protein decay rate, but more rapid in the mRNA subsystem (gene HMLALPHA2/YCL067C) due to the higher mRNA decay rate and the higher nHILL. We therefore have found in addition to BAR1 three other genes that belong to the ␣-factor pheromone pathway. They are controlled by HMLALPHA2/YCL067C, and are found in our study with high parameter changes: STE2, STE5, and MF(ALPHA)1. They undergo on average the following increases in their parameters from SK1 to W03: ␣ 377%, ␣0 ⫽ 119%, ␶m ⫽ 398%, ␶p ⫽ 588%, nHILL ⫽ 235%. They show in our evaluations in general an agreement with the slow growth phenotype of strain W03. This ensemble is downstream of gene REB1, which also controls GAL1 of the Metabolic (galactose) pathway (see Fig. 6). In summary, we identify with our procedure ﬁve functional groups with important adjustments in their parameter values, which we can relate to as cooperating in general to the phenotypical behaviors of the corresponding yeast strains (changes in parameter values from the faster, more synchronously sporulating strain SK1 to the slow growth strain W03). The modiﬁcations found by our identiﬁcation correspond mainly to increases in the time constants values and some increases in their transcription and translation rates. PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS Results of Present Method Compared to Other Functional Analysis Methods The foregoing analysis of changes in parameters leading to the identiﬁcation of functional groups of genes is not based on clustering analysis. Nonetheless, a comparison can be made with results from clustering or other functional analysis methods as applied to this and similar systems. Genes PDR15, GAL1, SUC2, BAR1, MF(ALPHA)1, STE2, STE5, FLO5, and HUG1 found in this study are not in any of the clusters found by Primig et al.,17 clusters that were obtained by analysis of the same data. Therefore, the important inﬂuence of their parameter variations in the dynamics of functional responses would not be found by application of traditional clustering methods. Only one of the genes found in the present study, gene CYC1, is found in one of the clusters of Primig et al.17 (cluster 1, which contains stress-response genes and metabolic genes required for nitrogen uptake, amino acid biosynthesis, and mitochondrial function17). None of the genes found with our method is found in the classes “Sporulation-Deﬁcient Class,” “Enhanced Sporulation Efﬁciency Class,” and “Postgermination Growth Defective” in the Sporulation Phenotype and Clustering study by Deutschbauer et al.61 Another study25 used genomewide location analysis experiments to ﬁnd network motifs in yeast and their functional assembling across the genome. From the genes selected in the present study, three were also found in these motifs:25 genes BAR1 and STE2 were found in the Multiple Input Module (MIM) No. 93, with regulators MCM1 and STE12, and eight potentially regulated genes: BAR1, YDR179C, YDR179W-A, MFA2, STE2, YFL027C, AGA2, and SIM1. Gene GAL1 was found in the Single Input Motifs (SIM) regulated by Gal4, with genes YLR408C, GAL7, FUR4, GAL80, GAL10, GAL1, ORC2, YDR008C, and GAL3. A more sophisticated, recently published method employs not only functional information from expression data, but also links genes to the factors that regulate them by incorporating DNA binding data.14 This method enriches previous results, giving a large number of functional modules. Indeed, this method found gene STE2 (a pheromone response gene located also in our study) in cluster 73. The set of factors that regulate this module are MCM1 and STE12. Gene STE2 in this module14 is described as a response to pheromone during conjugation with cellular fusion, mating-type ␣-factor pheromone receptor activity, integral to plasma membrane. The MIPS sub-categories cited are- pheromone response, mating-type determination, sex-speciﬁc proteins; 40.27 extracellular/ secretion proteins; and 14.04 cell differentiation. The method14 also ﬁnds gene GAL1 (a metabolism gene uncovered as well in our study) in cluster 106 with regulator GAL4. Gene GAL1 in this module is described as galactose metabolism galactokinase activity. The MIPS subcategories cited are 01.05 C-compound and carbohydrate metabolism; 01 Metabolism; and 01.05.01 C-compound and carbohydrate utilization. In the rapamycin regulatory network modules14 other genes appeared. Gene STE5 (a pheromone response gene found in our study) regulated by 543 RTG1 and MSN2, was found in cluster 19. It is described as signal transduction during conjugation with cellular fusion, MAP-kinase scaffold activity. The MIPS subcategories cited14 are:- pheromone response; 13.11 cellular sensing and response; and 13.11.03 chemoperception and response. Some of the genes selected by the present study have been therefore found by other microarray functional data analysis methods (cluster or motif analysis) mainly when they include in addition to clustering some other sophisticated procedures. By themselves, clustering techniques only ﬁnd one gene in common with the present study. In all the other studies some of the MIPS categories overlap with functional categories discovered in the present one, and some genes are detected in common with our list of highly changed genes. However, no dynamic parameters were found, and the relationships of the parameter modiﬁcations with the biophysics of transcription, translation, and degradation processes, with phenotype and with genetic functional categories were not analyzed as it was done here. Results of an Improved Model with Two-Input Motifs and Fitting Statistics We chose to reﬁne the model including a module with two inputs in some of the cases where it is known that more than one transcription factor affects transcription, as described above in the Methods section. However, as discussed also in the Methods section, there is still an ambiguity due to the low sampling rate used to obtain the data, so that these ﬁttings may have undetected errors or noise-induced variations. This uncertainty may increase because of the higher number of parameters when two or more inputs are considered while the number of samples remains the same. As shown below and in Table IV, running the parameter identiﬁcation routine several times and performing a statistical analysis will give conﬁdence intervals and will show how good the parameter ﬁt is. The large number of inputs found in our set of highly changed genes has, in general, a very low probability of occurrence, as shown in Table II. It might be that the parameter changes in the models for the selected genes are due to such high number of inputs, inputs that were not taken into account in the initial evaluation with only one input. To analyze this possibility, we performed 10 recalculations of the parameters with two inputs in cases in which the initial calculation was performed only with one input (Table IV, CASE 1 and CASE 2). The error obtained in these cases was about the same with one input as with two inputs, but the repeated calculations with two inputs moderate the noise effects and give an indication of the robustness of the parameter ﬁt with the 95% conﬁdence intervals obtained for the mean parameter values. It was established as discussed below and shown in Table IV that the gene pairs with highly changed parameters evaluated with one input maintain in general this high change in the parameter values when estimated with two inputs. They give, however, parameter changes of lesser magnitude, 544 G. CAVELIER AND D. ANASTASSIOU and in some cases, the change has migrated to one or several different/new parameters. A ﬁnal issue examined was the validity of the model and the ﬁtting approach, that is, the statistical robustness of the ﬁt, the conﬁdence intervals for the parameters, and the uniqueness of the ﬁt. To examine these issues, we analyzed 10 recalculations of the parameters in several additional cases, both with one input only and with two inputs (Table IV). Because of the inherently random nature of the evolutionary algorithm procedure, these recalculations were necessarily run with different initial conditions and through different pathways in the optimization to the minimum, yielding slightly different values for the parameters. The results show that a single run of the optimization may give too large a variation, while an average of several runs will alleviate the noise effects and give an indication of the robustness of the ﬁt by providing the 95% conﬁdence intervals for the mean value of the parameters, as shown in that table. For example, in the ﬁrst case analyzed in Table IV (CASE 1) the gene pair was YBL005W 3 YDR406W corresponding to the ABC transporter PDR15 examined before (MIPS category 20 CELLULAR TRANSPORT). This gene pair had a change in ␶m for gene PDR15 of 1631.70% (from 4.49 to 77.82 min) using only one input, and the average variation in all the parameters was 304.30% with a standard deviation of 586.60. When a second input was taken into consideration (YGL013C 3 YBL005W 3 YDR406W, one run, two inputs) the mean square error was approximately the same (17% for SK1 and 7% for W03). The modiﬁcation in ␶m was found to be only 112.09% (from 29.19 to 61.91 min) but the mean adjustment of all the parameters was 9722.70% with a standard deviation of-. This clearly indicates an overﬁtting of the parameters. When the parameter identiﬁcation was run 10 times, an adjustment in ␶m from SK1 to W03 of 111.03% was found (from an average of 14.28 ⫾ 6.41 to an average of 30.13 ⫾ 16.19 min, with a 95% conﬁdence interval for the means) and an average change in all the parameters of 117.52% with a standard deviation of 98.14. This shows that the trend in the parameter modiﬁcations is conserved in this gene pair from the case of evaluation with only one input to the case with two inputs. It also shows that running the optimization several times is better, as it moderates the noise and provides a conﬁdence interval for the mean of the parameter values. In the second case analyzed in Table IV (CASE 2) the gene pair was YBL021C 3 YNR001C. This gene pair did not have highly changed parameters when the assessment was done with only the input YBL021C. However, it gave somewhat modiﬁed parameters when run 10 times with the two inputs YBL021C and YBL103C. For example ␶m changed 494.88% (from an average of 3.40 ⫾ 2.25 min to an average of 20.21 ⫾ 5.36 min, with a 95% conﬁdence interval for the means). An average change in all the parameters of 182.73% with a standard deviation of 257.76 was found. The error in each of the cases is approximately the same. Therefore, in the case of several parameter optimizations, the mean and its conﬁdence interval will be a better assessment of the change in parameter values. In the third case analyzed in Table IV (CASE 3), the gene pair was YBR112C 3 YHR211W, which has only one input. One modiﬁcation discovered with the 10 evaluations is in kt for the gene product of YBR112C (from an average of 0.38 ⫾ 0.06 to an average of 4.67 ⫾ 1.30, with a 95% conﬁdence interval for the means). A second variation is in THR for gene FLO5 (from an average of 803.06 ⫾ 152.72 to an average of 8834.98 ⫾ 886.70, with a 95% conﬁdence interval for the means). This gene pair therefore also maintains the trend in the parameter changes found with only one run of the parameter identiﬁcation. The fourth case analyzed in Table IV (CASE 4) corresponds to the gene pair YBR083W 3 YOR077W, which did not have a high change in its parameters initially with only one input (it does not have more inputs) and one run of the optimization. When the optimization was run 10 times with the sole input, a modiﬁcation in ␶m was found (from an average of 8.82 ⫾ 1.39 min to an average of 44.23 ⫾ 19.36 min, with a 95% conﬁdence interval for the means). This shows that the optimization run 10 times provides a better assessment of the parameter changes. These evaluations prove that the general trends are likely to be conserved from the case of one run to the case of several runs of the parameter identiﬁcation. However, to rely only on the error metric of one run of the parameter identiﬁcation is not enough to guarantee a good ﬁt, as overﬁtting can give an equally low error but with an unacceptable large variance of the parameters. This uncertainty can be reduced with more parameter evaluations and the resulting robustness measured with the ﬁtting statistics proposed. A higher sampling rate would also improve the ﬁt, but at a resulting higher experimental cost. CONCLUSIONS By repeating the parameter identiﬁcation procedure taking one gene pair at a time, covering subnetworks across the genome of an organism, one can construct a database of parameters and a corresponding model of increasing complexity from the accumulated results. A set of network motifs associated with the highly changed parameters can be found as described, and their properties analyzed with respect to phenotype-related biological changes, network topology and connectivity, and associated genetic functional categories. The biophysical and structural causes of the parameter modiﬁcations can be studied, and their relationship with the transcription, translation, and degradation processes uncovered. The results obtained in the yeast subnetworks studied reveal (compared to traditional methods such as clustering or more sophisticated genetic functional analysis methods) a number of additional genes and the properties of their associated functional categories and network connectivities. The method can be improved by including more inputs when there is more than one transcription factor inﬂuencing transcription. There are in these cases not only the PHENOTYPE ANALYSIS USING DYNAMICS-DERIVED NETWORK MOTIFS usual system and measurement noises, but also parameter uncertainty effects due to the limited amount of time samples that is usually possible to obtain with microarray measurements (underdetermined system). More time samples or a statistical analysis performed over repeated evaluations with the method can help to alleviate such noise consequences and the parameter uncertainty effects. The underlying processes may be further studied in the future at the molecular and atomic scale using X-ray structural details and the thermodynamical description of the model, as explained above. For example, degradation of mRNA and proteins, transcription factor binding to DNA, binding cooperativity, mRNA transcription, and translation rate strengths can be related to X-ray structural details of mutations or other sequence variations, posttranslational modiﬁcations, macromolecular covalent modiﬁcations, or protein misfolding. Similar analyses can be imagined in cases where high-density oligonucleotide microarray expression patterns over different tissues, phenotypes, and organisms become available. 14. 15. -. ACKNOWLEDGMENTS 22. The authors thank one of the anonymous reviewers for very helpful comments that improved the manuscript, the laboratory of U. Alon, in particular R. Milo, and the laboratory of G. Church for very useful observations. They also wish to thank the Cytoscape (http://www.cytoscape. org/) development team, in particular, E. Cerami, for their valuable comments and the software. 23. 24. 25. REFERENCES 1. Reinitz J, Sharp DH. Mechanism of eve stripe formation. Mech Dev 1995;49:-. Chen T, He HL, Church GM. Modeling gene expression with differential equations. In: Altman RB, Dunker AK, Hunter L, Klein TE, editors. BIOCOMPUTING ‘99 —Proceedings of the Paciﬁc Symposium 1999, Maui, Hawaii, USA. River Edge, NJ: World Scientiﬁc Publishing Co., Inc.; 1999. p 29 – 40. 3. Gilman A, Arkin AP. Genetic “code”: representations and dynamical models of genetic components and networks. Annu Rev Genomics Hum Genet 2002;3:-. Ronen M, Rosenberg R, Shraiman BI, Alon U. Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. Proc Natl Acad Sci USA 2002;99:-. Hasty J, McMillen D, Collins JJ. Engineered gene circuits. Nature 2002;420:224 –230. 6. Kitano H. Computational systems biology. Nature 2002;420:206 – 210. 7. Ackers GK, Johnson AD, Shea MA. Quantitative model for gene regulation by lambda phage repressor. Proc Natl Acad Sci USA 1982;79:1129 –1133. 8. Shea MA, Ackers GK. The OR control system of bacteriophage lambda. A physical-chemical model for gene regulation. J Mol Biol 1985;181:-. Alm E, Arkin AP. Biological networks. Curr Opin Struct Biol 2003;13:-. Cavelier G, Anastassiou D. Data-based model and parameter evaluation in dynamic transcriptional regulatory networks. Proteins Struct Funct Bioinformat 2004;55:339 –350. 11. Yeung MK, Tegner J, Collins JJ. Reverse engineering gene networks using singular value decomposition and robust regression. Proc Natl Acad Sci USA 2002;99:-. Tegner J, Yeung MK, Hasty J, Collins JJ. Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proc Natl Acad Sci USA 2003;100:5944 –5949. 13. Gardner TS, di Bernardo D, Lorenz D, Collins JJ. Inferring 26. 27. -. 545 genetic networks and identifying compound mode of action via expression proﬁling. Science 2003;301:102–105. Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F, Gordon DB, Fraenkel E, Jaakkola TS, Young RA, Gifford DK. Computational discovery of gene modules and regulatory networks. Nat Biotechnol 2003;21:-. Zak DE, Gonye GE, Schwaber JS, Doyle FJ 3rd. Importance of input perturbations and stochastic gene expression in the reverse engineering of genetic regulatory networks: insights from an identiﬁability analysis of an in silico network. Genome Res 2003;13:2396 –2405. Nachman I, Regev A, Friedman N. Inferring quantitative models of regulatory networks from expression data. Bioinformatics 2004;20(Suppl 1):I248 –I256. Primig M, Williams RM, Winzeler EA, Tevzadze GG, Conway AR, Hwang SY, Davis RW, Esposito RE. The core meiotic transcriptome in budding yeasts. Nat Genet 2000;26:415– 423. Chu S, DeRisi J, Eisen M, Mulholland J, Botstein D, Brown PO, Herskowitz I. The transcriptional program of sporulation in budding yeast. Science 1998;282:699 –705. Wolf DM, Arkin AP. Motifs, modules and games in bacteria. Curr Opin Microbiol 2003;6:125–134. Shen-Orr SS, Milo R, Mangan S, Alon U. Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 2002;31:64 – 68. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science 2002;298:824 – 827. Mangan S, Alon U. Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci USA 2003;100:11980 –11985. Mangan S, Zaslaver A, Alon U. The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J Mol Biol 2003;334:197–204. Rosenfeld N, Alon U. Response delays and the structure of transcription networks. J Mol Biol 2003;329:645– 654. Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002;298:799 – 804. Guelzim N, Bottani S, Bourgine P, Kepes F. Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet 2002;31:60 – 63. Qian J, Lin J, Luscombe NM, Yu H, Gerstein M. Prediction of regulatory networks: genome-wide identiﬁcation of transcription factor targets from gene expression data. Bioinformatics 2003;19:-. Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 2002;18(Suppl 1):S233–S240. Bussemaker HJ, Li H, Siggia ED. Regulatory element detection using correlation with expression. Nat Genet 2001;27:167–171. Oobatake M, Kono H, Wang Y, Sarai A. Anatomy of speciﬁc interactions between lambda repressor and operator DNA. Proteins 2003;53:33– 43. Asturias FJ, Craighead JL. RNA polymerase II at initiation. Proc Natl Acad Sci USA 2003;100:-. Armache KJ, Kettenberger H, Cramer P. Architecture of initiationcompetent 12-subunit RNA polymerase II. Proc Natl Acad Sci USA 2003;100:6964 – 6968. Bushnell DA, Kornberg RD. Complete, 12-subunit RNA polymerase II at 4.1-A resolution: implications for the initiation of transcription. Proc Natl Acad Sci USA 2003;100:6969 – 6973. Rosenfeld N, Elowitz MB, Alon U. Negative autoregulation speeds the response times of transcription networks. J Mol Biol 2002;323: 785–793. Misteli T. The ﬂow of gene expression. Nat Struct Mol Biol 2004;11:202–205. Elowitz MB, Leibler S. A synthetic oscillatory network of transcriptional regulators. Nature 2000;403:335–338. Elowitz ME. Transport, assembly, and dynamics in systems of interacting proteins. Princeton, NJ: Princeton University; 1999. von Hippel PH. An integrated model of the transcription complex in elongation, termination, and editing. Science 1998;281:660 – 665. Bushnell DA, Westover KD, Davis RE, Kornberg RD. Structural 546 40. 41. -. G. CAVELIER AND D. ANASTASSIOU basis of transcription: an RNA polymerase II-TFIIB cocrystal at 4.5 Angstroms. Science 2004;303:983–988. Dundr M, Hoffmann-Rohrer U, Hu QY, Grummt I, Rothblum LI, Phair RD, Misteli T. A kinetic framework for a mammalian RNA polymerase in vivo. Science 2002;298:-. Algire MA, Maag D, Savio P, Acker MG, Tarun SZ, Jr., Sachs AB, Asano K, Nielsen KH, Olsen DS, Phan L, Hinnebusch AG, Lorsch JR. Development and characterization of a reconstituted yeast translation initiation system. RNA 2002;8:382–397. Lipford JR, Deshaies RJ. Diverse roles for ubiquitin-dependent proteolysis in transcriptional activation. Nat Cell Biol 2003;5:845– 850. Kiehl TR, Mattheyses RM, Simmons MK. Hybrid simulation of cellular behavior. Bioinformatics 2004;20:316 –322. Morton-Firth CJ, Bray D. Predicting temporal ﬂuctuations in an intracellular signalling pathway. J Theor Biol 1998;192:117–128. Das B, Butler JS, Sherman F. Degradation of normal mRNA in the nucleus of Saccharomyces cerevisiae. Mol Cell Biol 2003;23:-. Levine M, Tjian R. Transcription regulation and animal diversity. Nature 2003;424:147–151. McAdams HH, Shapiro L. A bacterial cell-cycle regulatory network operating in time and space. Science 2003;301:1874 –1877. Bray D. Molecular networks: The top-down view. Science 2003;301: 1864 –1865. Vilar JM, Leibler S. DNA looping and physical constraints on transcription regulation. J Mol Biol 2003;331:981–989. Setty Y, Mayo AE, Surette MG, Alon U. Detailed map of a cis-regulatory input function. Proc Natl Acad Sci USA 2003;100:-. Buchler NE, Gerland U, Hwa T. On schemes of combinatorial transcription logic. Proc Natl Acad Sci USA 2003;100:5136 –5141. 52. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003;13:2498 –2504. 53. Mewes HW, Frishman D, Güldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Münsterkoetter M, Rudd S, Weil B. MIPS: a database for genomes and protein sequences. Nucleic Acids Res 2002;30:31–34. 54. Kashtan N, Itzkovitz S, Milo R, Alon U. Topological generalizations of network motifs, arXiv:q-bio.MN/- v3 16 May 2004. 55. Ihmels J, Friedlander G, Bergmann S, Sarig O, Ziv Y, Barkai N. Revealing modular organization in the yeast transcriptional network. Nat Genet 2002;31:370 –377. 56. Wang YL, Liu CL, Storey JD, Tibshirani RJ, Herschlag D, Brown PO. Precision and functional speciﬁcity in mRNA decay. Proc Natl Acad Sci USA 2002;99:5860 –5865. 57. Wilusz CJ, Wormington M, Peltz SW. The cap-to-tail guide to mRNA turnover. Nat Rev Mol Cell Biol 2001;2:-. Fan JS, Yang XL, Wang WG, Wood WH, Becker KG, Gorospe M. Global analysis of stress-regulated mRNA turnover by using cDNA arrays. Proc Natl Acad Sci USA 2002;99:-. Wilusz CJ, Wilusz J. Bringing the role of mRNA decay in the control of gene expression into focus. Trends Genet 2004;20:-. Wolfger H, Mamnun YM, Kuchler K. The Yeast Pdr15p ATPbinding cassette (ABC) protein is a general stress response factor implicated in cellular detoxiﬁcation. J Biol Chem 2004;279:-. Deutschbauer AM, Williams RM, Chu AM, Davis RW. Parallel phenotypic analysis of sporulation and postgermination growth in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 2002;99:15530 – 15535.