Connection between genes and pathways

I am reading a paper about inferring pathway information in cancer cells. The authors refer to ERBB2 as both a gene and a pathway. I don't have a solid biology background. What exactly does it mean to refer to ERBB2 as a pathway? Does it refer to the activity of the protein encoded by ERBB2?

ErbB2 is a receptor tyrosine kinase. This is a class of receptors located in the cell membrane that pass signals from outside the cell to the inside, so that the cell can, for example, react to changing conditions.

ErbB2 (which is also called HER2, short for "human epidermal growth factor receptor 2") belongs to a family of receptors that normally react when they bind a ligand such as epidermal growth factor (EGF). ErbB2 itself has no known direct ligand; it is activated by pairing (dimerizing) with other, ligand-bound members of the family. Upon activation, the receptor changes its conformation and phosphorylates itself (meaning it adds phosphate groups to specific amino acids of the protein), which leads to phosphorylation (and therefore activation) of other proteins. This finally results in changed expression of genes that are involved in cellular proliferation, differentiation, and survival (see this reference: "Review of epidermal growth factor receptor biology.").

This leads to the second part of your question, what "pathway" means in this context: the sequence of molecules that starts at the receptor and finally induces changes in gene expression in the cell's nucleus is called a pathway. For members of the ErbB family of receptors there are different possibilities, but the picture below shows how this works in principle:

The image is from the Wikipedia page on EGFR. You can see that the ligand binds to the extracellular part of the receptor, which is then phosphorylated and passes the signal downstream, for example to Ras-Raf-Mek-Erk and then into the nucleus (in fact, activated Erk enters the nucleus).

This cascade also shows why these receptors are often mutated in cancers: they are involved in regulating genes important for survival, proliferation and differentiation, which is very important for cancer cells. ErbB2, for example, is mutated in about 30% of breast cancers. These mutations often lead to a constitutively active receptor that is permanently phosphorylated independent of ligand binding. This permanently activates the downstream pathways, which is obviously not a good thing, since these genes are normally under very tight regulation.

ERBB2 appears to be a (tyrosine kinase) receptor; that is, it lives in the cell membrane, with one part that sticks outside of the cell and another part that sits inside. When the right molecule attaches to the outside part, the inside part changes shape, which alters its function, causing it to start phosphorylating other proteins, which in turn alters their functions.

So the ERBB2 pathway is the set of genes/proteins whose behavior is affected by ERBB2.
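If it helps to think of it computationally, a pathway can be pictured as a directed graph, and "the ERBB2 pathway" as everything reachable downstream of the receptor. The sketch below is only a toy: the graph contains just the Ras-Raf-Mek-Erk branch mentioned above, and the names `SIGNALING` and `downstream` are made up for illustration. Real ErbB signaling has many more branches.

```python
from collections import deque

# Toy signaling graph: each key activates the molecules in its list.
# Only the Ras-Raf-Mek-Erk branch named in the answer is included.
SIGNALING = {
    "ligand": ["receptor"],
    "receptor": ["Ras"],
    "Ras": ["Raf"],
    "Raf": ["Mek"],
    "Mek": ["Erk"],
    "Erk": ["gene expression"],
}

def downstream(graph, start):
    """Return everything reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order

# The "pathway" of the receptor is its downstream set.
print(downstream(SIGNALING, "receptor"))
```

In this picture, a constitutively active receptor corresponds to the start node being switched on permanently, so every node in the downstream set stays switched on too.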

Google "ERBB2 pathway" and "tyrosine kinase receptor" for more info.

Biology and sexual orientation

The relationship between biology and sexual orientation is a subject of research. While scientists do not know the exact cause of sexual orientation, they theorize that it is caused by a complex interplay of genetic, hormonal, and environmental influences. [1] [2] [3] Hypotheses for the impact of the post-natal social environment on sexual orientation, however, are weak, especially for males. [4]

Biological theories for explaining the causes of sexual orientation are favored by scientists. [1] These factors, which may be related to the development of a sexual orientation, include genes, the early uterine environment (such as prenatal hormones), and brain structure.

Considering interactions between genes, environments, biology, and social context

Kristen Jacobson received her Ph.D. in Human Development and Family Studies from the Pennsylvania State University in 1999. She spent a year as a postdoctoral scholar in psychiatric genetics under the direction of Dr. Kenneth Kendler at the Virginia Institute for Psychiatric and Behavioral Genetics, where she later served as faculty from 2000 to 2005. Dr. Jacobson is currently an Assistant Professor of Psychiatry at the University of Chicago, and serves as the Associate Director for Twin Projects and the Associate Director of the Clinical Neuroscience and Psychopharmacology Research Unit. Dr. Jacobson is a collaborator on a number of twin studies of children, adolescents, and adults, and is currently conducting a multidisciplinary, multi-level study of adolescent development, From Neighborhoods to Neurons and Beyond, funded by an NIH New Innovator Award. She is editor of a special issue of Behavior Genetics entitled Pathways between Genes, Brain, and Behavior (expected publication January 2010). New areas of research involve pilot studies of epigenetics in both mice and humans.

Bronfenbrenner’s bioecological model (Bronfenbrenner & Ceci, 1994) highlights the need to consider interactions between individual, family, peer, school, and community characteristics in understanding individual differences in human development. In order to obtain a complete understanding of the processes involved in individual differences, multidisciplinary studies that measure risk and protective factors at multiple levels of analysis are required. With recent advances in human molecular genetics, the need to integrate environmental measures into genomic studies is of even greater importance. The mapping of the human genome and the corresponding availability of genome-wide association analysis (GWAS) techniques have led to a flurry of research activity trying to discover “genes for” particular disorders and traits. However, a significant body of research, both historic and quite recent, cautions that efforts to uncover specific genetic variants may be futile if they ignore the effects of social and contextual environments on individual differences in human behavior and traits. This essay briefly reviews some of the most interesting work regarding the interplay of genes and environments on individual differences in human development.

Nature versus Nurture

For years, behavioral genetic studies using twin or adoptive samples have been considered the gold standard for assessing the joint effects of nature and nurture in accounting for individual differences in human behaviors and traits. Decades of behavioral genetic research have demonstrated the importance of genetically-influenced characteristics on individual differences in child, adolescent, and adult behaviors and traits. At the same time, behavioral genetic studies have revealed that generally over half of the variation in individual behaviors and traits is due to environmental factors, typically environmental factors that are unique across people within the same family or that have different effects on behavior (i.e., nonshared environmental influence).

Genetic influence has been found on “environmental” measures, suggesting the presence of gene → environment correlations. Gene → environment correlations arise because exposure to certain risk and protective environments is not random, but rather is influenced by inherited characteristics of the individual, and also because children “inherit” both genes and environments from their parents. The role of genes and environments in mediating pathways between risk and behavior is complex, however. For example, recent quasi-longitudinal work using twins to understand the relationship between peer group deviance and adolescent problem behavior found that while genetic factors accounted for most of the relationship between earlier problem behavior and later peer group deviance (consistent with genetic characteristics of an individual relating to peer selection), the relationship between prior peer group deviance and later problem behavior was largely environmentally mediated (consistent with peer influence effects; Kendler, Jacobson, Myers, & Eaves, 2008).

Nature and Nurture

While the nature versus nurture debate may have attenuated in recent years with consensus from many fields regarding the importance of both genes and environments, other areas of research have further identified interactions between nature and nurture as important components of individual differences. A host of adoption studies in the 1980s and 1990s have shown that genetic liability to antisocial behavior (as indexed through biological parent psychopathology and substance abuse) is only associated with the development of adult criminality and aggression under adverse adoptive environmental conditions, indicating that neither nature nor nurture was sufficient in and of itself to cause pathology (Cadoret, Yates, Troughton, Woodworth, & Stewart, 1995; Cloninger & Gottesman, 1987).

Alternatively, gene X environment (gXe) interactions may be implicated when the relative importance of genetic influence on behaviors and traits as measured through standard twin designs varies across social and ecological context. For example, a study by Rowe, Almeida, and Jacobson (1999) integrated genetically-informative regression models within a hierarchical linear modeling design to show that levels of parental warmth, measured at the aggregate school level, moderated the heritability (i.e., proportion of individual differences due to genetic factors) of adolescent aggression. Heritabilities of delinquent behavior are increased among adolescents living in families with high rates of dysfunction (Button, Scourfield, Martin, Purcell, & McGuffin, 2005), while the heritability of adolescent smoking decreases with higher levels of parental monitoring (Dick et al., 2007). Family and personal religiosity has been shown to decrease the importance of genetic variance on adolescent substance use behaviors (Koopmans, Slutske, Heath, Neale, & Boomsma, 1999; Timberlake et al., 2006), and urban-rural differences in the heritability of adolescent alcohol use were found to be mediated by contextual factors such as alcohol sales and neighborhood migration (Dick, Rose, Viken, Kaprio, & Koskenvuo, 2001). These latter areas of research may be of particular importance in generalizing results from prior twin studies to minority individuals or individuals in socially and economically disadvantaged environments, as most large-scale twin registries are based on primarily middle-class, Caucasian or Asian samples.

More recently, attention has turned to using measured genotypes and measured environments to investigate “classic” gXe interactions for a number of important behaviors. Caspi et al. (2002) have elucidated an important and highly replicated (Kim-Cohen et al., 2006) gXe interaction using measured genotype (MAO-A gene) and environmental risk (child abuse) variables, demonstrating that the relationship between child maltreatment and various indices of aggressive and antisocial behavior is attenuated among individuals with the high MAO-A activity genotype.

Another highly replicated interaction has been found between a serotonin transporter gene polymorphism (5-HTTLPR) and stressful life events in predicting depression (Canli & Lesch, 2007). Further studies have found interactions between the 5-HTTLPR genotype and socioeconomic status (SES) for aggression in preadolescents (Nobile et al., 2007), between the 5-HTTLPR genotype and lab-induced stress for lab measures of aggression in adult males (Verona, Joiner, Johnson, & Bender, 2006) and between life stress and the 5-HTTLPR genotype for individual differences in amygdala activation (Canli et al., 2006). There is also emerging evidence for environmental modification of dopaminergic genes related to impulsivity and aggression, with studies finding significant interactions between the DRD4-7 repeat polymorphism and caregiver quality in predicting higher levels of aggression and impulsive traits in infants and preschoolers (Bakermans-Kranenburg & van Ijzendoorn, 2006; Sheese, Voelker, Rothbart, & Posner, 2007), and interactions between SES and the DRD4 gene for aggression in pre-adolescents (Nobile et al., 2007). Thus, genes implicated in multiple neurotransmitter pathways work in conjunction with a host of social and environmental experiences to alter individual differences across multiple behaviors and traits.

Additional Gene-Environment Interplay

While the above section concerns statistical interactions between genes and environments which may represent genetic sensitivity to environmental stressors, or, alternatively, environmental exacerbation of genetic effects, another potentially important avenue for research concerns the dynamic interplay between genes and environments, that is, genetic influence on environments and environmental influences on genes. By now, it is fairly common knowledge that when measures of family environment are treated as ‘phenotypes’ in traditional behavioral genetic models, significant genetic influences on these measures are often detected (Plomin & Bergeman, 1991). Decades of behavioral genetic studies have provided considerable evidence for significant genetic influence for measures such as various dimensions of parenting, indices of SES such as income and educational level, social support, and stressful life events (see Kendler & Baker [2007] for a recent review). What has been slower to develop, however, is the notion that environmental influences and experiences can have profound effects on genetic influence. While the underlying DNA structure and sequence individuals are born with does not change over time, a newer area of research in epigenetics is beginning to identify factors that may alter gene expression and function across the lifespan.

Epigenetics, defined formally as changes in gene expression caused by mechanisms other than changes in the underlying DNA sequence, offers an exciting new frontier in the study of human psychiatric and medical diseases, and psychological behaviors and traits. Epigenetic mechanisms include DNA methylation and chromatin remodeling, the latter via post-translational modifications (e.g. methylation, acetylation, phosphorylation and ubiquitylation) to histone proteins which form the scaffold for the DNA helix. Although some epigenetic processes are essential to organism function (e.g., differentiation of cells in the developing embryo during morphogenesis), other epigenetic processes can have major adverse effects on health and behavioral outcomes. While some epigenetic changes only occur within the course of one individual organism's lifetime, animal models suggest that other epigenetic changes can be inherited from one generation to the next (see Champagne [2008] for a review), contributing, in part, to the heritability of behavioral traits and psychiatric disease.

However, a growing field of research suggests that environmental experiences, particularly those related to stress, have the capacity to alter biological and genetic mechanisms associated with increased risk of problem behavior. Again, the notion that environmental experience can change biological processes has important historical precedent. Harlow’s seminal deprivation studies of non-human primates have shown that disruptions in early rearing environments have the capacity to disrupt psychobiological regulatory functions, leading to behavioral changes. Other important animal research has begun to identify the precise mechanisms by which social environmental factors can alter epigenetic programming. Relatively recent research using animal models offers an elegant demonstration of how early environmental stressors can alter neurobiological responsivity to future stressful conditioning (Meaney, 2001). Meaney’s model highlights how individual differences in maternal behaviors can cause regulatory changes in the corticotropin releasing hormone (CRH) system at the level of the central nucleus of the amygdala, and how these changes relate further to changes in adrenocortical and autonomic effects of later stressful events. Importantly, his work suggests that these effects can be altered through intervention (Weaver et al., 2005). Differences in early maternal care have also been associated with differences in methylation of the glucocorticoid receptor gene promoter in the hippocampus (Meaney & Szyf, 2005). Most critically, a recent comparison of post-mortem brain tissue from a sample of patients who had a history of child abuse and/or neglect and who died by suicide indicated DNA hypermethylation of the rRNA promoter region in the hippocampus relative to controls who experienced sudden, accidental death (McGowan et al., 2008), supporting the hypothesis that epigenetic changes due to social and environmental experiences are related to behavioral traits.

Other studies of monozygotic twins have identified variations in DNA methylation levels in certain target gene promoter regions. Because identical twins share identical genomes and experience many of the same family environmental factors, this indicates that environmental experiences that are not shared among children in the same family have an important causal role in gene expression, and may further be related to behavioral differences among identical twin pairs. Importantly, within-pair differences in DNA methylation and histone acetylation patterns were increased in older twin pairs, especially those who had different lifestyles and had spent fewer years of their lives together, strongly supporting epigenetic processes as a part of nonshared environmental influence on individual differences (Fraga et al., 2005). This suggests that epigenetic processes represent a fundamental gene-environment interface in the development and ongoing plasticity of the human brain.


While there is no doubt that genetic studies of individual behaviors and traits will increase our understanding of both normal human variation and pathological disorders, there is increasing recognition that the interplay between genes and environments is remarkably complex. Not only are both genes and environments important for both normal and abnormal human development, but genes and environments operate interactively to produce both risk and resilience to specific behavioral and psychiatric disorders. More importantly, emerging lines of research from epigenetics suggest that not only can nature alter nurture, but nurture, in turn, has the power to modify nature. Thus, genomic studies that incorporate a range of social and environmental influences will further our understanding of the complex dance between nature and nurture in human development.

Bakermans-Kranenburg, M. J., & van Ijzendoorn, M. H. (2006). Gene-environment interaction of the dopamine D4 receptor (DRD4) and observed maternal insensitivity predicting externalizing behavior in preschoolers. Dev Psychobiol, 48(5), 406-409.

Bronfenbrenner, U., & Ceci, S. J. (1994). Nature-nurture reconceptualized in developmental perspective: A bioecological model. Psychol Rev, 101(4), 568-586.

Button, T. M., Scourfield, J., Martin, N., Purcell, S., & McGuffin, P. (2005). Family dysfunction interacts with genes in the causation of antisocial symptoms. Behav Genet, 35(2), 115-120.

Cadoret, R. J., Yates, W. R., Troughton, E., Woodworth, G., & Stewart, M. A. (1995). Genetic-environmental interaction in the genesis of aggressivity and conduct disorders. Arch Gen Psychiatry, 52(11), 916-924.

Canli, T., & Lesch, K.-P. (2007). Long story short: The serotonin transporter in emotion regulation and social cognition. Nat Neurosci, 10(9), 1103.

Canli, T., Q. M., Omura, K., Congdon, E., Haas, B.W., Amin, Z., Herrmann, M.J., et al. (2006). Neural correlates of epigenesis. Proc Natl Acad Sci, 103, 16033-16038.

Caspi, A., McClay, J., Moffitt, T. E., Mill, J., Martin, J., Craig, I. W., et al. (2002). Role of genotype in the cycle of violence in maltreated children. Science, 297(5582), 851-854.

Champagne, F. A. (2008). Epigenetic mechanisms and the transgenerational effects of maternal care. Front Neuroendocrinol, 29(3), 386-397.

Cloninger, C. R., & Gottesman, I. (1987). Genetic and environmental factors in antisocial behavior disorder. In S. A. Mednick, T. E. Moffitt & S. A. Stack (Eds.), The causes of crime: New biological approaches (pp. 99-102). Cambridge: Cambridge University Press.

Dick, D. M., Rose, R. J., Viken, R. J., Kaprio, J., & Koskenvuo, M. (2001). Exploring gene-environment interactions: Socioregional moderation of alcohol use. J Abnorm Psychol, 110(4), 625-632.

Dick, D. M., Viken, R., Purcell, S., Kaprio, J., Pulkkinen, L., & Rose, R. J. (2007). Parental monitoring moderates the importance of genetic and environmental influences on adolescent smoking. J Abnorm Psychol, 116(1), 213-218.

Fraga, M. F., Ballestar, E., Paz, M. F., Ropero, S., Setien, F., Ballestar, M. L., et al. (2005). Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A, 102(30), 10604-10609.

Kendler, K. S., & Baker, J. H. (2007). Genetic influences on measures of the environment: A systematic review. Psychol Med, 37(5), 615-626.

Kendler, K. S., Jacobson, K., Myers, J. M., & Eaves, L. J. (2008). A genetically informative developmental study of the relationship between conduct disorder and peer deviance in males. Psychol Med, 38(7), 1001-1011.

Kim-Cohen, J., Caspi, A., Taylor, A., Williams, B., Newcombe, R., Craig, I. W., et al. (2006). MAOA, maltreatment, and gene-environment interaction predicting children's mental health: New evidence and a meta-analysis. Mol Psychiatry, 11(10), 903-913.

Koopmans, J. R., Slutske, W. S., Heath, A. C., Neale, M. C., & Boomsma, D. I. (1999). The genetics of smoking initiation and quantity smoked in Dutch adolescent and young adult twins. Behav Genet, 29(6), 383-393.

McGowan, P. O., Sasaki, A., Huang, T. C., Unterberger, A., Suderman, M., Ernst, C., et al. (2008). Promoter-wide hypermethylation of the ribosomal RNA gene promoter in the suicide brain. PLoS ONE, 3(5), e2085.

Meaney, M. J. (2001). Maternal care, gene expression, and the transmission of individual differences in stress reactivity across generations. Annu Rev Neurosci, 24, 1161-1192.

Meaney, M. J., & Szyf, M. (2005). Maternal care as a model for experience-dependent chromatin plasticity? Trends Neurosci, 28(9), 456-463.

Nobile, M., Giorda, R., Marino, C., Carlet, O., Pastore, V., Vanzin, L., et al. (2007). Socioeconomic status mediates the genetic contribution of the dopamine receptor D4 and serotonin transporter linked promoter region polymorphisms to externalization in preadolescence. Development and Psychopathology, 19(4), 1147-1160.

Plomin, R., & Bergeman, C. S. (1991). The nature of nurture: Genetic influence on "environmental" measures. Behavioral & Brain Sciences, 14, 373-427.

Rowe, D. C., Almeida, D. M., & Jacobson, K. C. (1999). School context and genetic influences on aggression in adolescence. Psychological Science, 10, 277-280.

Sheese, B., Voelker, P., Rothbart, M., & Posner, M. (2007). Parenting quality interacts with genetic variation in dopamine receptor D4 to influence temperament in early childhood. Development and Psychopathology, 19, 1039-1046.

Timberlake, D. S., Rhee, S. H., Haberstick, B. C., Hopfer, C., Ehringer, M., Lessem, J. M., et al. (2006). The moderating effects of religiosity on the genetic and environmental determinants of smoking initiation. Nicotine Tob Res, 8(1), 123-133.

Verona, E., Joiner, T. E., Johnson, F., & Bender, T. W. (2006). Gender specific gene-environment interactions on laboratory-assessed aggression. Biol Psychol, 71(1), 33-41.

Weaver, I. C., Champagne, F. A., Brown, S. E., Dymov, S., Sharma, S., Meaney, M. J., et al. (2005). Reversal of maternal programming of stress responses in adult offspring through methyl supplementation: Altering epigenetic marking later in life. J Neurosci, 25(47), 11045-11054.


Of the 159 million high-quality reads obtained, 117 million mapped to annotated exons. An average of 1.46 million exon-mapped reads were obtained for each library (sample replicate), corresponding to an average of 2.9 million exon-mapped reads for each individual (Figure S1 in Additional file 1). There was at least one mapped read for each library at 13,156 genes, including but not limited to 11,301 protein-coding genes, 801 pseudogenes, 893 long non-coding RNAs (lncRNAs), and 40 small RNAs, which include 21 pre-miRNAs. Expression levels were normalized (variance stabilized) using protocols described in the DESeq2 package [30]. Pearson’s correlation coefficient for each pair of sample replicates was 0.98 ± 0.005, yielding an r-squared value of 0.96 ± 0.01. Data quality was further evaluated by validating the expression profiles of three genes by rt-qPCR; a mean Pearson’s r of 0.74 ± 0.07 was observed between the expression values measured by RNA-sequencing and by rt-qPCR (Figure S2 in Additional file 1). Thus, based on both sample replicates and an independent method of measuring expression abundance, the data we obtained provide an accurate measurement of RNA transcript abundance.

Total gene expression structure

To determine if inter-individual gene expression variation was larger than intra-individual variation, and if individuals cluster by ancestry, a sample-by-sample correlation matrix was calculated and a hierarchical clustering dendrogram of all libraries was produced (Figure 1A). We observed that 74 of the 80 dissection replicates clustered together, consistent with the correlation results and indicating that intra-individual variation tends to be smaller than inter-individual variation. The three individuals whose dissection replicates did not pair were subsequently removed from all further analyses under the assumption that their lack of pairing was the product of dissection and/or processing error.

Overview of total gene expression variation at 13,156 expressed genes. (A) A cluster dendrogram of libraries based on the following expression distance between each pair of libraries: 1-abs(r), where r is Pearson’s correlation coefficient for expression levels across all genes. Individual libraries and branches are colored to designate their group affiliation; asterisks indicate three pairs of replicate libraries that do not cluster together. (B) Scatter plots of the first three principal components (PCs) using data from individuals and all genes. The explained proportion of variation is annotated on each axis. (C) Scatter plots between the first few PCs and correlated explanatory variables.
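The expression distance used for the dendrogram, 1-abs(r), can be sketched in a few lines. The example below is an illustration only, not the authors' pipeline: the library names and expression values are invented, the helper names `pearson_r`, `expression_distance`, and `distance_matrix` are hypothetical, and the clustering step itself is omitted.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def expression_distance(lib_a, lib_b):
    """Distance between two libraries: 1 - abs(r) across all genes."""
    return 1.0 - abs(pearson_r(lib_a, lib_b))

def distance_matrix(libraries):
    """Pairwise distances keyed by (library name, library name)."""
    names = list(libraries)
    return {(a, b): expression_distance(libraries[a], libraries[b])
            for a in names for b in names}

# Invented normalized expression values for four genes in three libraries.
libraries = {
    "ind1_rep1": [5.0, 7.0, 9.0, 4.0],
    "ind1_rep2": [5.2, 6.9, 9.1, 4.1],
    "ind2_rep1": [8.0, 3.0, 6.0, 7.0],
}
distances = distance_matrix(libraries)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

In this toy input the two replicates of the same individual are much closer to each other than to the other individual, which is the pattern the dendrogram is meant to reveal. A matrix of this kind could then be passed to any hierarchical clustering routine.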

An additional observation from the sample correlation dendrogram is the lack of clustering of individuals with the same ancestry. To evaluate this further, we performed a principal component (PC) analysis, which reveals that, in contrast to what is commonly observed for genetic data [31-33], there is no evident structure in this cellular phenotype that corresponds to groups (Figure 1B). However, when the PC loadings for each individual are tested for correlations with other aspects of the data (Figure 1C), PC2 is correlated with fetal length at birth (r = -0.54, Bonferroni P value = 0.007), PC3 correlates with the sum of mapped reads (r = -0.62, Bonferroni P value = 0.0005), and PC4 correlates with normal maternal weight (r = 0.46, Bonferroni P value = 0.045). Additionally, analysis of genes that correlate with the loadings from the first three PCs [34,35] reveals enrichment in hundreds of gene ontology categories, particularly molecular function (GO:0003674), biological process (GO:0008150), binding (GO:0005488) and their sub-categories (Additional file 2: Table A), as well as numerous KEGG pathways (Additional file 2: Table B), of which the most enriched is 01100:Metabolic Pathways (adjusted P value = 2.9e-05). Overall, it appears that total transcriptome variation is largely influenced by factors other than group affiliation (i.e. population), and hence that transcript variation does not parallel expected patterns of genetic structure for these groups [32,36].

An apportionment of gene expression variation

Total variance in expression at each gene was apportioned among groups (Mst and Nst), among individuals within groups (Mit and Nit), and among dissection replicates (or within individuals, Met and Net). An analysis of variance (ANOVA) at each gene was performed to apportion the variation and two components of the data were used to derive the apportionment estimates - the additive components of variances and the sums of squares estimates (see the Methods section for details on these models). Under this framework we are able to model all groups simultaneously as well as model populations in pairs. Assuming a model with four populations, the variance (Mst, Mit, Met) and variation parameters (Nst, Nit, Net) are highly correlated across genes (Mst:Nst, r = 0.97; Mit:Nit, r = 0.95; Met:Net, r = 0.99; P = 2.2e-16; Figure S3 in Additional file 1), even though their distributions and mean estimates are quite different (Figure 2A to C). The uniqueness of the variance parameters (M*t) reflects the specific manner in which these values are derived - that is, by the additive component of variance from the expected mean squares in this type I hierarchical ANOVA (see Tables S1 and S2 in Additional file 1). Given the correlation among parameter estimates and the lack of zero values in the sums of squares approach (Figure S3 in Additional file 1), we focus on the variation or variability parameters Nst, Nit, and Net. On average we find that 33.2% of the variability in gene expression is found among populations of cells within a single tissue (Net, permutation of reads among replicates, P = 0.22), 58.9% of the variability is among individuals within groups (Nit, permutation of libraries among individuals within groups, P = 0.048) and 7.9% of the variability is among groups (Nst, permutation of individuals among groups, P = 0.24) (Figure 2B and C).
These estimates indicate that even though inter-individual variation is, on average, the largest component of expression variation, intra-individual variation cannot be ignored in measuring cellular phenotypes. Similarly, while among group expression variation does not, on average, reach the levels of structure seen at the genetic level, the group component does detectably influence expression variation, particularly at a subset of genes, which we explore below.
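To make the sums-of-squares apportionment concrete, the sketch below partitions the total sum of squares for a single gene into among-group, among-individual-within-group, and within-individual (replicate) parts, and reports each as a fraction of the total. It assumes that Nst, Nit, and Net are simple ratios of these sums of squares to the total; the exact definitions are in the Methods, which are not reproduced here. The data and the function name `apportion` are invented for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def apportion(groups):
    """Nested sums-of-squares partition for one gene.

    `groups` maps group -> individual -> list of replicate values.
    Returns (Nst, Nit, Net) as fractions of the total sum of squares,
    assuming each parameter is SS_component / SS_total.
    """
    all_values = [v for inds in groups.values()
                  for reps in inds.values() for v in reps]
    grand_mean = mean(all_values)

    ss_group = ss_individual = ss_replicate = 0.0
    for individuals in groups.values():
        group_values = [v for reps in individuals.values() for v in reps]
        group_mean = mean(group_values)
        # Among groups: group mean versus grand mean, weighted by group size.
        ss_group += len(group_values) * (group_mean - grand_mean) ** 2
        for replicates in individuals.values():
            individual_mean = mean(replicates)
            # Among individuals within groups.
            ss_individual += len(replicates) * (individual_mean - group_mean) ** 2
            # Among replicates within individuals.
            ss_replicate += sum((v - individual_mean) ** 2 for v in replicates)

    ss_total = ss_group + ss_individual + ss_replicate
    return ss_group / ss_total, ss_individual / ss_total, ss_replicate / ss_total

# Invented expression values: two groups, two individuals each, two replicates.
example = {
    "A": {"a1": [10.0, 12.0], "a2": [14.0, 16.0]},
    "B": {"b1": [20.0, 22.0], "b2": [24.0, 26.0]},
}
nst, nit, net = apportion(example)
print(f"Nst = {nst:.3f}, Nit = {nit:.3f}, Net = {net:.3f}")
```

Because the three sums of squares add up to the total, the three fractions always sum to one, which matches the way the reported averages (7.9%, 58.9%, and 33.2%) sum to 100%.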

Apportionment Summaries. (A) The distribution of variance apportionments derived from the additive component of variance estimates. (B) The distribution of variation apportionments derived from the sum of squares. (C) Mean estimates for each apportionment parameter using both the variance and variation. (D) A dendrogram of weighted mean population distances derived from the Mst parameter. (E) A dendrogram of weighted mean population distances derived from the Nst parameter.

When modeling expression variation in a pairwise manner, mean estimates are similar to those observed in the four-population analysis (Table 1). However, among group variation (Nst) is in the range of 0.045 (for AF:EU) to 0.062 (for EA:SA). A dendrogram was constructed using mean pairwise Nst distances (Figure 2D and E). We find that the data are congruent with expectations from genetic data [36], with the exception that SA tend to be the most distant group.

Mean expression and apportionment estimates

The mean expression of each gene is significantly correlated with the residual (or intra-individual) sum of squares estimate (Pearson’s r = 0.60, P <0.001). This illustrates that as mean expression increases, variation in mRNA abundance among our sample replicates also increases. As such, we estimate that mean expression explains 36% of the variation in our error sum of squares. However, the among group (r = 0.018, P = 0.034) and among individuals within groups (r = -0.029, P = 0.001) sums of squares are more weakly correlated with mean expression. Consequently, the apportionment parameters are correlated with mean expression with coefficients of -0.446, 0.388, and 0.166 for Net, Nit, and Nst (P <0.001), respectively. The proportion of variation explained by mean expression for each apportionment parameter is thus 20%, 15%, and 2.7% for Net, Nit, and Nst, respectively. This suggests that mean expression is having a modest influence on parameter estimates, and the acquisition of more reads will not greatly influence the apportionment estimates.
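The "proportion of variation explained" figures in this paragraph follow from squaring the reported correlation coefficients. The short check below does only that arithmetic; the dictionary labels are shorthand introduced here, and small differences from the reported percentages would be expected because the published coefficients are themselves rounded.

```python
# Correlations with mean expression, as reported in the paragraph above.
correlations = {
    "error SS": 0.60,
    "Net": -0.446,
    "Nit": 0.388,
    "Nst": 0.166,
}

# Proportion of variation explained is the squared correlation.
explained = {name: r ** 2 for name, r in correlations.items()}

for name, r2 in explained.items():
    print(f"{name}: r = {correlations[name]:+.3f}, r^2 = {r2:.4f}")
```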

Differential gene expression among individuals

The proportion of genes whose expression levels vary significantly among individuals was analyzed via an F-ratio test between inter-individual and intra-individual variance. We observed that 5,880 genes, or 44.5% of all genes (at an FDR of 5%), exhibited significant among-individual, within-group variation. Additionally, fitting two linear models to the data (a null model and a second model that includes individuals as an explanatory variable), followed by a Chi-squared test of model fit, results in 5,491 genes (41.7% of all genes) with significant inter-individual variance (at an FDR of 5%). There is an 84% overlap between the significant genes in both analyses. We estimated the proportion of within-group variation explained by inter-individual variation with the parameter Nis (SSb/(SSb + SSe); see Methods). On average 64% of the within-group variation is attributed to individuals, indicating substantial inter-individual variation. Those genes that are significantly differentially expressed (DE) among individuals, as determined by the F-ratio test, have a minimum Nis value of 0.65. To determine if there may be significant variation attributed to intra-individual variation at some loci, we inverted the F-ratio test by placing the intra-individual mean squares in the numerator and inter-individual mean squares in the denominator, but observed no significant loci after Benjamini-Hochberg correction. Overall, this illustrates that there is substantial inter-individual variation in gene expression.
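The quantities used in this section can be sketched in a few lines. The example below assumes that Nis is SSb/(SSb + SSe), that the F ratio is the ratio of the corresponding mean squares, and that per-gene p-values have already been obtained from the F distribution. The function names are illustrative and are not taken from the authors' code.

```python
def nis(ss_between, ss_error):
    """Share of within-group variation attributed to individuals."""
    return ss_between / (ss_between + ss_error)


def f_ratio(ss_between, df_between, ss_error, df_error):
    """Inter-individual mean square divided by intra-individual mean square."""
    return (ss_between / df_between) / (ss_error / df_error)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values; genes with adjusted p <= 0.05 pass a 5% FDR.
raw_p = [0.01, 0.04, 0.03, 0.005]
significant = [q <= 0.05 for q in benjamini_hochberg(raw_p)]
```

Inverting the test, as described above, amounts to swapping the numerator and denominator arguments of `f_ratio`.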

Differential gene expression among groups

Three different methods were used to identify and quantify genes that may be differentially expressed among human groups: two published methods (DESeq [30] and tweeDESeq [37]), and a permutation of the hierarchical ANOVA. The two published methods can only compare two groups at a time, while permutations of the hierarchical ANOVA permit the analysis of two or more groups simultaneously.

While there is marked variation in the number of DE genes that each method identified, there are consistent trends (Table 2). For example, the relative proportion of DE genes for each pair of populations were correlated between methods (Pearson’s r = 0.927, P <0.008) and comparisons that included South Asians tended to have the most DE genes for any one group. Further, 99% and 92% of the genes identified as DE by the DESeq and tweeDESeq methods respectively were also identified as DE by the permutation method. In the permutation analysis, the cutoff Nst value for DE genes differs slightly depending on the groups being compared but averages out to an Nst estimate of at least 0.326. The reduced number of DE genes identified with the DESeq and tweeDESeq methods is because both methods are model-based analyses with specific tests and false discovery correction of differential expression. The permutation method presented here simply identifies extremes in the observed data that are difficult to explain by random chance.
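The permutation approach can be illustrated with a small sketch. It assumes that Nst is the among-group sum of squares divided by the total sum of squares, and that the null distribution is generated by shuffling whole individuals (with their replicates) among groups. Both are assumptions made for illustration, and the Methods may define the procedure differently.

```python
import random


def sums_of_squares(data):
    """data: {group: {individual: [replicate values]}} -> (SSa, SSb, SSe)."""
    all_values = [v for inds in data.values() for reps in inds.values() for v in reps]
    grand_mean = sum(all_values) / len(all_values)
    ss_a = ss_b = ss_e = 0.0
    for inds in data.values():
        group_values = [v for reps in inds.values() for v in reps]
        group_mean = sum(group_values) / len(group_values)
        # Among-group component.
        ss_a += len(group_values) * (group_mean - grand_mean) ** 2
        for reps in inds.values():
            ind_mean = sum(reps) / len(reps)
            # Among individuals within groups, then within individuals.
            ss_b += len(reps) * (ind_mean - group_mean) ** 2
            ss_e += sum((v - ind_mean) ** 2 for v in reps)
    return ss_a, ss_b, ss_e


def nst(data):
    """Assumed definition: among-group share of the total sum of squares."""
    ss_a, ss_b, ss_e = sums_of_squares(data)
    return ss_a / (ss_a + ss_b + ss_e)


def permutation_p_value(data, n_perm=1000, seed=0):
    """Empirical p-value for Nst; individual names must be unique across groups."""
    rng = random.Random(seed)
    observed = nst(data)
    group_sizes = [(g, len(inds)) for g, inds in data.items()]
    individuals = [(name, reps) for inds in data.values() for name, reps in inds.items()]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(individuals)
        permuted, start = {}, 0
        for g, size in group_sizes:
            permuted[g] = dict(individuals[start:start + size])
            start += size
        if nst(permuted) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Because the groups are passed as a dictionary, the same functions apply to two groups or to all four at once, which is the property that distinguishes the permutation approach from the pairwise methods.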

To determine the potential biological relevance of the genes identified as DE, we tested for enrichment in GO and KEGG pathways. When testing the union of all pairwise permutation DE genes (1,784 DE genes), we observed enrichment in 15 KEGG pathways and 371 GO categories at a moderate-confidence FDR of 20% (5 KEGG and 201 GO at a high-confidence FDR of 5%) (Table 3, Additional file 3: Table A). In general, KEGG and GO enrichments indicate that genes involved in cellular signaling, immune response, tissue and organ development, and metabolism pathways are DE among groups.

Non-neutral gene expression profiles

Although it is difficult to determine if expression at a particular gene is evolving according to neutrality or under selection, we are able to identify expression profiles that conform to four specific patterns of selection: directional, balancing, stabilizing, and diversifying. Importantly, these analyses do not test for deviations from neutrality, but rather identify genes that exhibit expression profiles consistent with selection on quantitative traits [38,39]. Traits under directional selection are expected to exhibit shifts in mean expression among groups exemplified by greater among group variation relative to within group variation, and would hence be consistent with previously identified DE genes. Balancing selection is exemplified by high diversity or variation among individuals within a population but low variation among populations. Stabilizing selection results in low levels of expression variance among individuals while diversifying selection is reflected in high levels of expression variance among individuals. We identified genes that typify each selection profile using apportionment of variation estimates, estimates of total expression variance, and a series of permutations, as described in Methods.

Using data from the model fitting all four groups simultaneously, we observe that the among groups variation (log(SSa)) correlates positively with the among individuals variation (log(SSb); Pearson’s r = 0.579, P <2.2e-16), in agreement with expectations under neutrality [40]. Additionally, the variation within individuals (log(SSe)) also correlates positively with the among individuals variation (Pearson’s r = 0.46, P <2.2e-16) and the among groups variation (Pearson’s r = 0.25, P <2.2e-16) (Figure 3A). To estimate the proportion of the human placental transcriptome that may be consistent with neutral vs. non-neutral expectations, we performed a series of permutations (see Methods). We estimate that 64.8% of all genes are consistent with a neutral-drift model for a quantitative trait [38]. The most prevalent non-neutral profile of gene expression variation is stabilizing selection, which influences an estimated 26% of all genes, followed by directional (646 genes, 4.9%), diversifying (635 genes, 4.8%), and balancing (173 genes, 1.3%) selection (Figure 3B; see Additional file 4 for a list of all genes).

Evaluating neutral vs. non-neutral evolution of the human placental transcriptome. (A) A scatter plot of among group and among individual variation as measured by the log of the corresponding sum of squares. Genes that were identified as having patterns of variation consistent with neutrality or with directional, diversifying, stabilizing, or balancing selection are color-coded. (B) A pie chart illustrating the proportion of genes consistent with a particular mode of evolution.

When each of these modes of selection is mapped onto the distribution of within-group and among-group variation (Figure 3A) we can identify near discrete sections of the distribution that reflect these observations. Interestingly, there are areas of the distribution where these modes of selection overlap (Figure 3B). For example, there is a small set of genes for which expression variation is large both among individuals (diversifying) and among groups (directional) (Figure 4A and B). Conversely, some genes have more constraint in total variance, consistent with stabilizing selection, and yet also have significant shifts in mean expression among groups, consistent with directional selection (Figure 4A and C). Finally, constrained inter-individual expression (stabilizing selection) can also occur with reduced among group variation (balancing selection) (Figure 4D).

Boxplots of non-neutral expression variation. The y-axis of all plots illustrates the same range of expression. Each population is color-coded and the estimated Nst value for each gene is in the bottom left corner of each plot. (A) A gene consistent with directional selection. (B) A gene consistent with both directional and diversifying selection. (C) A gene consistent with both stabilizing and directional selection, with a dotted grey horizontal line to help view the shift in mean expression, while also presenting constrained among group, within individual variation. (D) A gene consistent with both stabilizing and balancing selection.

To determine if genes differentially expressed among groups, that is, those with a pattern consistent with directional selection, could effectively recapitulate group ancestry, we used expression variation across all 646 directional genes (those identified when modeling all four populations at once) to generate a UPGMA tree and perform a principal component analysis. We observe that individuals form monophyletic clades consistent with population ancestry (Figure 5A). Additionally, increased levels of population structure were observed in the principal component analysis but are only fully discernible when viewing the first three PCs together (Figure 5B). PC1 tends to distinguish individuals of African ancestry from those of non-African ancestry, while PC2 tends to distinguish SA from EA and PC3 distinguishes Europeans from non-Europeans (Figure S4 in Additional file 1).

Population structure revealed by genes consistent with directional selection. (A) A UPGMA tree of expression distances among all libraries and individuals at genes consistent with directional selection. (B) A 3D scatter plot of the first three PCs based on variation in the 646 genes consistent with directional selection. The proportion of explained variation is annotated on each axis and each individual’s group affiliation is color-coded to match the annotation in (A).

Expression variance, genetic diversity, and network connectivity

The prevalence of genes that deviate from neutral-drift expectations, particularly those consistent with stabilizing selection, prompted us to hypothesize that inter-individual variance in gene expression must have a genetic component. Specifically, we hypothesized that genes with greater expression constraint would have greater genetic constraint. Additionally, genes exhibiting large inter-individual expression variances may allow, through relaxed constraint or by necessity, a relative excess of variation. To evaluate this hypothesis, we tested for a correlation between expression variance and pairwise genetic diversity. Pairwise genetic diversity (π) was calculated for each gene, controlling for gene length [41], for three populations from the 1000 Genomes data: CEU = Northern Europeans, ASW = African Americans from the southwest USA, and CHS = Han Chinese from Southern China. We chose these three populations as they are the best available proxies for our sampled individuals. When diversity from each population is compared to expression variance, we observe a significant positive correlation (ASW: r = 0.213; CEU: r = 0.189; CHS: r = 0.177; P < 2.2e-16; Figure S5 in Additional file 1). In addition, expression variance also correlates with Tajima’s D values (ASW: r = 0.179; CEU: r = 0.129; CHS: r = 0.132; P < 2.2e-16). These observations indicate that total expression variance has a small (r-squared = 0.04) albeit significant genetic and thus heritable component.
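For readers unfamiliar with π, it is the mean number of differences between all pairs of sequences in a sample. The sketch below assumes that "controlling for gene length" means expressing π per site; that reading, along with the toy sequences and the function name, is an assumption rather than the authors' documented procedure.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Mean pairwise differences per site among aligned, equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    # Count mismatched sites for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    # Divide by the number of pairs and by length to obtain a per-site value.
    return total_diffs / (len(pairs) * length)


# Hypothetical aligned haplotypes for one gene.
pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
```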

Another factor that may influence expression variance is the number of interacting partners a gene has. Previous work on gene-networks has illustrated that the degree of connectivity (number of interactions) influences the rate of molecular evolution [42]. Here, using data from BioGrid we tested if the number of interacting genes also influences the expression variance of a gene (Figure S6 in Additional file 1). Indeed, we observe a weak tendency for the expression variance to increase as the number of interacting genes decreases (Pearson’s r = -0.28, P < 2.2e-16).

To evaluate how both genetic diversity and connectivity may together influence gene expression variance we built an ANOVA model setting the coefficient of variation in gene expression as the response variable, and setting gene diversity and connectivity as explanatory variables with interaction. Each component of the model significantly influenced expression variance (diversity: P < 2.2e-16; connectivity: P < 2.2e-16; interaction: P = 0.029), explaining an estimated 4.3%, 2.3%, and 0.07% of the total variance in expression variance, respectively.

Gene co-expression modules and functionality of selection categories

To determine if the sets of genes corresponding to the four non-neutral modes of evolution have a coherent biological effect, we tested for evidence of co-expression networks and enrichment in GO gene ontology terms and KEGG functional pathways. No enrichment was observed for genes consistent with a pattern of balancing selection. The results from the three other non-neutral modes are presented below.

Overall, genes consistent with directional selection (646 genes) were enriched in 145 GO categories and six KEGG pathways at an FDR of 20% (70 and 0, respectively, at an FDR of 5%). They are associated with extracellular and membrane regions, response to stress, infectious disease, signaling, binding, and metabolism pathways and categories (Additional file 3: Table B). Six co-expression modules were identified that form compact co-expression networks, but also interact with each other through a reduced number of loci (Figure 6A and B). The only individual module that is enriched for a particular set of functions is module 6 (red Module in Figure 6A). This is the smallest module, containing just 54 genes, but at an FDR of 20% this module is enriched for 110 GO categories (52 at FDR 5%, Additional file 3: Table C), and 15 KEGG pathways (7 at FDR 5%, Additional file 3: Table D). These genes are principally involved in defense and immune response but are also associated with vitamin absorption and digestion, and arachidonic acid metabolism, a key fatty acid.

Co-expression heatmaps and networks. Heatmaps of gene × gene expression correlations for genes under directional selection (A) and diversifying selection (D), respectively. Each row and column is the same set of genes, annotated by the same cluster dendrogram of gene expression distance. Additionally, each row and column is color-coded to its associated gene co-expression module. In the heatmap plot itself, the color red indicates more similar co-expression and blue indicates greater dissimilarity. Gene co-expression networks for genes under directional selection (B) and diversifying selection (C) are also presented. Nodes of interaction were only created for genes which present significant co-expression at an FDR of 1%. Black nodes are genes with at least 32 significant interactions. Red nodes are genes with at least seven significant interactions. Blue nodes are genes with at least two significant interactions. Green dots are genes with no significant interactions at an FDR of 1%.

To evaluate if the enrichment observed here is the product of unique expression in a particular population or variation across all groups, we partitioned all directional genes by their expression profiles using k-means clustering. When partitioning the expression profile data into two groups (k = 2), we observe two opposing profiles where expression is lowest in Africans, highest in South and East Asians, and intermediate in Europeans (cluster 1) or highest in Africans, lowest in South and East Asians, and intermediate in Europeans (cluster 2) (Figure 7, row K2). Enrichment tests for these two clusters reveal that only cluster 1 exhibits any enrichment, with ontology and pathway enrichment consistent with those observed above. This observation would be consistent with a hypothesis of adaptive responses in non-African populations during migrations out of Africa. However, when the data are partitioned into more clusters (k = 6), there is no ontology or pathway enrichment for those clusters that accentuate the expression differences between Africans and non-Africans (Figure 7, row K6, clusters 4 and 5). Note that we chose a K of 6 for this particular analysis because it is the first K that uniquely separates African from non-African populations in both an upregulated (cluster 4) and downregulated (cluster 5) manner. Results for K2 through K8 can be found in Additional file 1: Figure S7. Interestingly, it is rather cluster 1 (Figure 7, row K6), with elevated expression in South Asians relative to the other groups, that harbors the entire enrichment signal. These 111 genes are enriched at an FDR of 20% in 19 KEGG pathways (8 at FDR 5%) and 320 GO categories (136 at FDR 5%). Again, they are mostly involved in immune response and metabolism, consistent with the observations above (Additional file 3: Table E).
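The partitioning step can be illustrated with a compact k-means sketch over per-group mean expression profiles. To keep the example deterministic it uses farthest-first initialization, which is a simplification and not necessarily what was used in the analysis. The profile values, the group order (African, European, South Asian, East Asian), and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(profiles, k):
    """Pick the first profile, then repeatedly the profile farthest from all centers."""
    centers = [list(profiles[0])]
    while len(centers) < k:
        nxt = max(profiles, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(profiles, k, max_iter=100):
    """Return (labels, centers) after Lloyd iterations from a deterministic start."""
    centers = farthest_first_centers(profiles, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical normalized mean expression per gene, ordered AF, EU, SA, EA.
gene_profiles = [
    [0.0, 1.0, 2.0, 2.0],  # lowest in Africans, highest in Asians
    [0.1, 1.0, 2.0, 2.1],
    [2.0, 1.0, 0.0, 0.0],  # highest in Africans, lowest in Asians
    [2.1, 1.0, 0.0, 0.1],
]
cluster_labels, cluster_centers = kmeans(gene_profiles, k=2)
```

Raising k splits these two opposing profiles into finer patterns, which is how a cluster driven by a single group, such as the South Asian cluster described above, can emerge.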

Expression levels for genes consistent with directional selection. Each dot represents an individual spaced across the x-axis and with mean normalized gene expression on the y-axis. The results of the cluster analysis are illustrated for two clusters (K2) and for six clusters (K6). Individuals are color-coded with respect to their associated group.

With diversifying genes, three co-expression modules (Figure 6D) were identified and two highly integrated networks along with two smaller networks (Figure 6C), consistent with the co-expression modules, were observed. Each module was enriched in numerous GO ontology terms (Additional file 3: Table F) and KEGG pathways (Additional file 3: Table G) with both unique and overlapping functions. Module 1 (Figure 6D, cyan) is enriched in 546 GO ontology terms and 22 KEGG pathways at an FDR of 20% (222 GO and 8 KEGG at FDR 5%) and involved in numerous areas of biology including growth, development, signaling, metabolism, and disease. Module 2 (Figure 6D, blue) is enriched in 131 GO ontology terms and three KEGG pathways at an FDR of 20% (35 GO and 2 KEGG at FDR 5%) and involved with binding and receptor interaction, specifically cytokine-cytokine receptor interaction and neuroactive ligand-receptor interaction. Module 3 (Figure 6D, dark red) is enriched in 378 GO ontology terms and 12 KEGG pathways at an FDR of 20% (132 GO and 9 KEGG at FDR 5%) and associated with disease and signaling pathways. The union of all diversifying genes reveals ontological and functional enrichment consistent with the above data (Table 4, Additional file 3: Table H).

Stabilizing genes formed four co-expression modules that, as a unit (Additional file 3: Table I), are associated with 1,245 GO ontology terms and 51 KEGG pathways at an FDR of 20% (898 GO and 39 KEGG at an FDR of 5%) and are involved with basic, largely intracellular, processes (Table 4). These include association with the spliceosome, ribosomes, RNA transport, and protein processing. They are also associated with neurological diseases such as Huntington’s, Parkinson’s, and Alzheimer’s disease. Finally, there are also associations with bacterial infection, hepatitis C, T-cell signaling, and cancer pathways. Individually, each module has a unique functional composition, but there is overlap at varying degrees for a few key pathways that include basic intracellular functions and associations with neurological diseases (Additional file 3: Table J and K).

The influence of biological traits on gene expression

Along with population ancestry, several anthropometric and dietary traits were also collected from each individual, to evaluate their association with expression variation. Starting with the model of gene expression used previously, which included technical (number of mapped reads and RNA quality) and population factors (group and individual), eight additional traits were added: sex of the child, weight of the child, length of the child, birthing manner (Cesarean or vaginal), maternal age, maternal body mass index, whether or not the mother drinks alcohol (outside of the pregnancy), and whether or not the mother is a vegetarian (see Methods for model details). Note that each new trait being modeled is a measure of inter-individual variation. The significance for each factor was determined by an F-test (FDR of 5%) using the mean square estimates of each factor over the residual (intra-individual variation).

On average each factor explained roughly 2% of the variation in the data, with intra-individual (32%) and inter-individual (41%) variation accounting for most of the variance; among group variation explained 6.3% (Figure 8). As expected, the vast majority of variation explained by each of the new explanatory variables was previously explained by variation among individuals, hence the reduction in the Nit estimate from 0.59 (Nit, Figure 2C) to 0.41 (Figure 8). All factors were enriched in no fewer than 59 GO ontology terms (Additional file 5: Table A) at an FDR of 5% and all but three factors (RIN, sex, and length) were enriched in at least one KEGG pathway at an FDR of 5% (Additional file 5: Table B). Importantly, the significance for all factors was dependent on the within group-among individual variation (Nit) and the mean expression of genes (Figure S7 in Additional file 1). As such, if a gene previously exhibited no significant variation among individuals in our simple model of gene expression then it did not exhibit any significant variation among any of the eight additional factors in our full model. Thus, all of the GO ontology terms and KEGG pathways observed for each of the new factors are simply a subset of those previously associated with variation among individuals, which was enriched in 104 KEGG pathways and 2,720 GO ontology terms at an FDR of 20% (65 KEGG, 1,729 GO at an FDR of 5%). On the technical side, genes that correlated with the number of mapped reads were overwhelmingly those that are highly expressed and associated with pathways such as Ribosome (KEGG 03010; adjusted P = 4.75e-23). Such technical artifacts are known to be an issue with this technology and are precisely why the number of mapped reads and RNA quality (RIN) values were included as leading explanatory variables in all models of gene expression [43]. See Additional file 5 for all GO and KEGG enrichment data for each trait.

Apportionment bar plot. Each gene was fit to a single model accounting for 13 explanatory variables and the proportion of variation explained by each variable was estimated using the sum of squares approach.

One striking observation from the trait model fitting was that newborn weight was associated with three cancer pathways and the hematopoietic cell lineage pathway. This observation is consistent with reports of newborn birth weight being associated with increased risks of childhood leukemia [44,45]. Are the genes associated with this effect being downregulated as birth weight increases, or are they being upregulated? To evaluate this specific example and all other associated trait enrichments we partitioned the correlations between gene expression and the trait by the direction of their effect and then re-evaluated pathway associations (Figure 9, Additional file 5: Table C). The results indicate large coordinated changes in expression for each factor. For example, as newborn birth weight increases there is a decrease of expression in genes associated with the hematopoietic cell lineage, cancer pathways, bile secretion, dilated cardiomyopathy, and vascular smooth muscle contraction, whereas expression of genes associated with protein processing in the endoplasmic reticulum increases. Further, individuals who normally consume alcohol have decreased expression in pathways such as glycolysis and fat digestion. Placentas from female children have increased expression in protein digestion, ECM-receptor interaction, amoebiasis, and focal adhesion. Placentas from Cesarean births exhibit decreased expression in glycolysis, protein processing in the endoplasmic reticulum, and antigen processing. As a final example, as maternal body mass index increases there is a correlated increase in expression for genes involved in Staphylococcus aureus infection, complement and coagulation cascades, and systemic lupus erythematosus pathways. These data, as presented in Figure 9, illustrate the correlated effect that gene expression changes may have on specific functional pathways and by inference on the physiology of an organ or individual.

Enrichment heatmap. A heatmap of Benjamini-Hochberg adjusted p-values for the association between each explanatory variable (x-axis) and KEGG pathway categories (y-axis). To be included in the heatmap a KEGG pathway had to be associated with at least one explanatory variable at an FDR of 1%. Additionally, each explanatory variable was partitioned by the direction of its association with gene expression. For example, the variable ‘All Veg. Genes’ annotates all genes that demonstrated a significant vegetarian diet effect, while the variable ‘Increased Exp. in Veg.’ annotates those vegetarian diet associated genes whose expression profile increased relative to non-vegetarians. Similarly ‘Pos. Age Genes’ annotates all genes that significantly correlated with maternal age in a positive manner.


The genes involved in respiratory and energy production are related to MS

The DEGs related to energy production and conversion clustered together and were expressed at higher levels in the maintainer line (Additional file 12: Figure S7). GO term analysis showed that a large number of DEGs were enriched in “oxidative-reduction” of BP and “oxidoreductase activity, oxidizing metal ions” of MF (Fig. 4). The drastically down-regulated genes that function in energy production and conversion included NADH-related dehydrogenase (TRINITY_DN82756_c2_g5), ADP/ATP carrier protein (TRINITY_DN101445_c0_g1) and oxidoreductase (TRINITY_DN71824_c0_g2). These proteins are components of the mitochondrial respiratory chain, which suggests that enzymes related to mitochondrial respiration play a vital role in the eggplant MS line. This result is consistent with a previous study [41], in which CMS-related genes were explored in an MS line of Welsh onion. MS in many plants is known to be associated with mitochondria, especially CMS. Several studies have reported that the genes encoding mitochondrial respiratory chain enzymes and enzyme complexes are important in CMS lines of other plants [42, 43]. Therefore, our results are consistent with previous studies indicating that a reduction in mitochondrial respiration results in MS in eggplant.

The genes involved in carbohydrate metabolic pathways are related to MS

The carbohydrate metabolism pathway is one of the basic metabolic pathways during plant development. It supplies energy and carbohydrates for plant growth and development [44]. Glycosyltransferase (TRINITY_DN105743_c0_g1) and glycosyl hydrolase (TRINITY_DN56811_c0_g2), two types of enzymes in the carbohydrate metabolism pathway, were down-regulated in the MS line in our study. The genes encoding these two enzyme families have been reported to be associated with cell-wall synthesis and degradation [45,46,47]. SPG2, a GT43 glycosyltransferase, and UPEX1, a GT31 glycosyltransferase, are involved in the formation of pollen wall primexine [48, 49]. These results suggest that glycosyltransferases and glycosyl hydrolases play a specific role in pollen development. In this study, the most enriched terms in the KEGG analysis were carbon, starch and sucrose metabolism, and the genes related to carbohydrate transport were clustered together in the hierarchical clustering analysis, consistent with the conclusions of other studies. Interestingly, few microspores were observed in our MS line. Therefore, we speculate that the decrease in carbohydrate metabolism influences pollen formation, leading to MS in eggplant.

The genes involved in amino acid transport and metabolic pathway are related to MS

It has previously been reported that glutamine plays a central role in plant amino acid metabolism during pollen development, and that pollen does not mature under glutamine starvation [50,51,52]. Fang [53] found that the expression of glutamine synthetase in the amino acid synthesis pathway decreased in pepper CMS. In accord with previous studies, one of the dramatically down-regulated genes in the MS line in this study encoded glutamine synthetase (GS) (TRINITY_DN8287_c0_g1), indicating that MS may result from a lack of glutamine synthetase. Another group of proteins involved in amino acid metabolism in this study was the NRT1/PTR (nitrate transporter/peptide transporter) family (TRINITY_DN134037_c0_g2, TRINITY_DN66882_c0_g2 and TRINITY_DN57038_c1_g1). Similarly, the abundance of NRT/PTR proteins was observed to decrease in Weichert’s [54] study. Mutation and overexpression experiments showed that AtPTR5 mediates peptide transport into pollen [55]. Hence, we infer that MS in eggplant may arise from suppressed expression of NRT/PTR or other proteins, which reduces the transport of peptides and thereby affects pollen development.

Transcription factors regulated the MS related genes

A large number of studies have established that TFs regulate their targets by binding to the cis-elements in the promoters through an interaction with protein partners during plant growth and development and in response to environmental stimuli. In the MS line, the expression of some TFs was also altered. The 898 differentially expressed TFs could be classified into 63 families including AP2/ERF, MYB, NAC, WRKY, bHLH and MADS (Fig. 7). Transcriptional regulation has been demonstrated to be important for male fertility. A bHLH transcription factor, DYT1, can activate the expression of two TFs, MYB35 and MS1, which influence tapetum function and pollen development [56]. Further investigation found that DYT1 directly regulates the expression of TDF1 (DEFECTIVE in TAPETAL DEVELOPMENT and FUNCTION1, a putative R2R3 MYB transcription factor) [57], which in turn promotes the expression of AMS, a regulator of pollen wall formation [58]. AMS acts upstream of MS188, which affects the expression of MS1 [58]. These results show that these transcription factors form a genetic pathway in pollen development. Our study found more TFs than previous studies, suggesting that a more complicated transcriptional regulation network operates in pollen development in eggplant. Some TFs belonging to the same family were up-regulated and others were down-regulated, in line with a recent study of bud dormancy in grapevine [59] (Additional file 1: Table S1). This result indicates that members of the same TF family may play different roles in pollen development, which may constitute a more complex transcriptional regulatory network.

Materials and methods

Cell culture and RNA isolation


The human myeloma cell line INA-6 was maintained in RPMI 1640 medium, supplemented with 50 μM 2-mercaptoethanol, 10% FCS and 100 U/ml of penicillin and streptomycin (all from Invitrogen GmbH, Karlsruhe, Germany). RNA was isolated from cells either withdrawn from IL-6 for 13 h with or without restimulation with IL-6 for 1 h or permanently maintained in the presence of 1 ng/ml IL-6 (permanent IL-6). Recombinant human IL-6 was a gift from S Rose-John (Kiel, Germany).

D53wt cells were grown in 10% FCS in McCoys 5A modified medium containing 400 μg/ml geneticin (Gibco®, Thermo Fisher Scientific, Waltham, MA, USA) and 250 μg/ml hygromycin (Roche, Mannheim, Germany). D53wt are a derivative of the colorectal carcinoma cell line DLD-1, kindly provided by Bert Vogelstein [80]. These cells harbor an inactive 241F p53 mutant and are stably transfected with a tetracycline-responsive p53 expression system (tet-off). Induction of p53wt was performed by replacing the medium with tetracycline-free cell culture medium [81]. Previous studies have shown that p53 is efficiently upregulated after 6 h [82, 83]. As we were interested in identifying the direct effects of p53 transcriptional regulation, we chose to induce for 6 h for the tiling array experiment [83]. Induction of p53 mRNA and upregulation of the known p53 target gene p21 CIP1/WAF1 was controlled by qRT-PCR (Additional file 1: Figure S2, [83]).

Cell cycle

Human foreskin fibroblasts obtained from ATCC (American Type Culture Collection, LGC Standards, Teddington, Middlesex, UK) were cultured in DMEM (Gibco®, Thermo Fisher Scientific, Waltham, MA, USA) supplemented with 10% FCS (Lonza, Basel, Switzerland). Then 10^6 cells were subcultured in T300 flasks. After 24 h, the fibroblasts were synchronized in G0 by serum deprivation for 48 h [84]. Restimulation was carried out by adding a medium containing 20% FCS. Cells were harvested at different time points to obtain cell populations mainly at G1, S or G2/M phases of the cell cycle. Synchronized cells were analyzed by flow cytometry as described previously [82–84]. Total RNA was extracted using TRIzol® (Invitrogen GmbH, Karlsruhe, Germany). The RNA integrity for each sample was controlled with the total RNA Nano Assay and the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) [84]. All samples included in the experiments had RIN >8.

Whole genome tiling arrays

The Affymetrix Human Whole Genome Tiling Array 1.0 Set consisting of 14 arrays was used according to the manufacturer’s instructions, except that separate labeling reactions were used for each array starting from 10 μg total RNA.

Tiling array data analysis

We used the TileShuffle algorithm described in [41] to determine expressed and differentially expressed genomic intervals in an unbiased way. Briefly, TileShuffle differentiates expression signals from background noise taking into account common tiling array biases. Windowing was used to reduce cross-hybridization effects. The significance of windows was assessed using empirical q-values that were estimated by repeatedly permuting probes on the array. Probes were binned with respect to the GC content of their sequences, and probes belonging to different bins may not be interchanged during permutation. The analysis of differential expression was implemented in a similar manner. Here, log-fold-changes between tiling arrays in both cellular states were used as measures of differential expression. Since sequence-specific effects were canceled out, affinity binning was obsolete in this context. We avoided considering signal intensity variation at the detection limit as differential expression by requiring that differentially expressed intervals must also be significantly expressed relative to the background distribution in at least one of the investigated conditions. Such intervals are called DE-TARs. This is analogous to the common non-specific filtering in conventional microarray data analysis.
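The permutation idea described above can be sketched in a few lines. This is a minimal illustration and not the TileShuffle implementation: the function names are hypothetical, the window is counted in probes rather than nucleotides, and the sketch returns an empirical p-value for a single window instead of the q-values the tool reports.

```python
import random
from collections import defaultdict


def trimmed_mean(values):
    """Arithmetic mean after dropping one maximal and one minimal value."""
    if len(values) <= 2:
        return sum(values) / len(values)
    inner = sorted(values)[1:-1]
    return sum(inner) / len(inner)


def shuffle_within_bins(intensities, gc_bins, rng):
    """Permute intensities, exchanging only probes that share a GC bin."""
    by_bin = defaultdict(list)
    for idx, gc_bin in enumerate(gc_bins):
        by_bin[gc_bin].append(idx)
    shuffled = list(intensities)
    for idxs in by_bin.values():
        vals = [intensities[i] for i in idxs]
        rng.shuffle(vals)
        for i, v in zip(idxs, vals):
            shuffled[i] = v
    return shuffled


def empirical_p(intensities, gc_bins, start, size, n_perm=1000, seed=0):
    """Fraction of permutations whose window score reaches the observed one."""
    rng = random.Random(seed)
    observed = trimmed_mean(intensities[start:start + size])
    hits = 0
    for _ in range(n_perm):
        perm = shuffle_within_bins(intensities, gc_bins, rng)
        if trimmed_mean(perm[start:start + size]) >= observed:
            hits += 1
    # Add-one correction (an assumption of this sketch) avoids p = 0.
    return (hits + 1) / (n_perm + 1)
```

Restricting the shuffle to probes of the same GC bin keeps the background distribution comparable in probe affinity, which is the reason the text gives for binning.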

Affymetrix Human Whole Genome Tiling Array 1.0 Set raw signal intensities were mapped to human genome version NCBI36 using Affymetrix BPMAP files [85]. Expressed segments were detected with the following TileShuffle parameter settings: window size = 200; window score defined as the arithmetic mean, trimmed by the maximal and minimal values, over the signal intensities of all probes in a window; number of permutations = 10,000; and number of GC classes = 4. All windows with an adjusted P < 0.05 according to Benjamini and Hochberg [86] were defined to be significantly expressed. DE-TARs are differentially expressed TileShuffle intervals with adjusted P < 0.005 (window size = 200; window score defined as the log-fold-change, discarding all probes with converse behavior as observed for the relevant significantly expressed windows; number of permutations = 100,000; and number of GC classes = 1). Finally, the genome coordinates of all significantly expressed and all significantly differentially expressed segments were lifted over to GRCh37 (hg19) using [87].
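For readers unfamiliar with the Benjamini and Hochberg adjustment mentioned above, the following is a small self-contained sketch of the standard step-up procedure. The function name is hypothetical, and it is not taken from the TileShuffle code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Windows whose adjusted value falls below the chosen cutoff are retained.
window_p = [0.01, 0.04, 0.03, 0.005]
significant = [q < 0.05 for q in benjamini_hochberg(window_p)]
```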

Defining a set of bona fide non-coding segments

Apart from overlaps with protein-coding annotation, we mainly relied on RNAcode [88] for predicting likely protein-coding segments within the TARs and DE-TARs. RNAcode considers synonymous amino acid substitutions, reading frame conservation and the occurrence of premature stop codons. It was applied to genome-wide Multiz alignments [89] for 46 vertebrate genomes downloaded from [90]. All segments with an RNAcode P < 0.05 were considered de novo protein-coding regions. We refrained from adjusting P values for multiple testing, as we were not interested in a set of highly reliable protein-coding segments (i.e. reducing the number of false positives), but in reducing the number of regions falsely interpreted as non-coding (i.e. reducing the number of false negatives). An RNAcode P < 0.05 resulted in 84.8% sensitivity (according to known protein-coding exons annotated in Gencode v12) and 97.2% specificity (according to 10,000 sampled intergenic intervals preserving the length distribution and repeat content of protein-coding exons). RNAcode requires an input alignment of at least three evolutionarily related sequences, which leaves stretches of genomic DNA without an RNAcode score wherever the DNA sequence is not sufficiently conserved.

Bona fide non-coding intervals in intergenic and intronic regions were constructed from the significantly expressed and differentially expressed segments by: (i) removing all nucleotides overlapping exons of known protein-coding transcript isoforms (Gencode v12 [46], UCSC genes [91], RefSeq [92] or Ensembl [93] gene annotation) or known pseudogenes (Gencode v12), (ii) removing all nucleotides overlapping predicted protein-coding segments (RNAcode [88]), (iii) removing all segments not classified by RNAcode that have a sequence similarity to known human amino acid sequences (RefSeq database from 7 March 2012, tblastn with -word-size 3 [94] and e < 0.05) and (iv) discarding the remaining intervals shorter than 17 bp, because the smallest ncRNA species so far described in humans (tinyRNAs and splice-site RNAs) are between 17 and 18 bp in length.
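Steps (i), (ii) and (iv) amount to interval subtraction followed by a length filter. A minimal sketch is given below, assuming half-open (start, end) coordinates on a single chromosome; the function names are hypothetical, and the similarity search of step (iii) is not modeled.

```python
def subtract_intervals(segment, masks):
    """Remove masked nucleotides from one half-open segment (start, end)."""
    start, end = segment
    pieces = []
    cursor = start
    for m_start, m_end in sorted(masks):
        if m_end <= cursor or m_start >= end:
            continue  # mask does not touch the remaining part of the segment
        if m_start > cursor:
            pieces.append((cursor, m_start))
        cursor = max(cursor, m_end)
        if cursor >= end:
            break
    if cursor < end:
        pieces.append((cursor, end))
    return pieces


def bona_fide_noncoding(segments, masks, min_len=17):
    """Subtract coding annotation, then drop pieces shorter than min_len bp."""
    kept = []
    for seg in segments:
        for s, e in subtract_intervals(seg, masks):
            if e - s >= min_len:
                kept.append((s, e))
    return kept


# Exons, pseudogenes and predicted coding segments all act as masks here.
example = bona_fide_noncoding([(100, 200)], [(120, 130), (90, 105), (180, 250)])
```

In the example, the 15 bp piece left between the first two masks is discarded by the length filter, while the 50 bp piece is kept.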

Detection of macroRNAs

The statistical tiling array data analysis outlined above reports segments deemed as highly or differentially expressed. Individual segments are typically short but often appear strongly enriched in large genomic intervals. Gaps within such accumulations of segments may be caused by variations in signal intensity, a drop of signal within intronic regions or by repeat regions that are not covered by the tiling array. Regions in which significant segments are highly enriched may thus reflect large biologically relevant entities. Merging segments with a maximum distance only reproduces the same picture at lower resolution but is inadequate for identifying local accumulations. Instead, stairFinder is based on estimating segment density using biweight kernels (see e.g. [95]) and a given bandwidth where segments are represented by their center position and weighted by their length. The bandwidth of the kernel is a smoothing parameter that significantly influences the resulting estimate, because a larger bandwidth tends to aggregate more segments into one single density peak. A bandwidth of 100,000 gave the best results, relative to known annotation. Each estimated peak including its flanking density minima was then processed to identify the accumulation boundaries using a flooding procedure to exclude single short outlying segments. More precisely, the boundaries were defined as the leftmost and rightmost positions between the two flanking minima where the density estimate remains above the local flooding level, which is set to the current peak multiplied by a given level parameter (0≤level≤1). We used a local flooding level of 50%. Setting the flooding level to 0 thus identifies the flanking minima as the boundary of the accumulation region. In the final step, accumulation regions that overlap with each other were combined. The stairFinder software reports these combined regions together with information on their segment coverage and silhouette as a clustering measure.
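The density estimate and the flooding step can be illustrated as follows. This is a simplified sketch, not the stairFinder software: the function names and the evaluation grid are assumptions, and the merging of overlapping regions, the coverage statistics and the silhouette measure are omitted.

```python
def biweight(u):
    """Biweight (quartic) kernel, supported on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def density(grid, segments, bandwidth):
    """Length-weighted kernel density of segment centers on a grid."""
    centers = [(s + e) / 2.0 for s, e in segments]
    weights = [e - s for s, e in segments]
    total = float(sum(weights))
    return [
        sum(w * biweight((x - c) / bandwidth) for c, w in zip(centers, weights))
        / (total * bandwidth)
        for x in grid
    ]


def flood_region(grid, dens, peak_idx, level=0.5):
    """Walk downhill from a peak until a flanking minimum or the flood level."""
    threshold = level * dens[peak_idx]
    left = peak_idx
    while left > 0 and threshold <= dens[left - 1] < dens[left]:
        left -= 1
    right = peak_idx
    while right < len(dens) - 1 and threshold <= dens[right + 1] < dens[right]:
        right += 1
    return grid[left], grid[right]


grid = list(range(0, 2001, 10))
segs = [(900, 950), (980, 1040), (1050, 1100)]
dens = density(grid, segs, bandwidth=200)
peak = max(range(len(dens)), key=dens.__getitem__)
region_half = flood_region(grid, dens, peak, level=0.5)
region_full = flood_region(grid, dens, peak, level=0.0)
```

With `level=0.0` the threshold never stops the walk, so the region extends to the flanking minima, matching the behavior described in the text. Any positive level can only shrink the region.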

Annotation categories

A detailed listing of annotation sets used is given in Additional file 1: Section 7.2 and Table S28.

Statistical analysis of annotation overlaps

The overlap with annotation sets was calculated using R version 2.14.2 [96] and the Bioconductor library genomeIntervals [97]. We further used the R library Snow to enable parallel processing [98]. For each of the three experimental settings (STAT3, p53 and cell cycle), the overlap with a particular annotation set was computed in terms of: (i) the absolute number of nucleotides in the DE-TAR overlapping with a particular annotation and (ii) the odds ratio of the observed relative overlap versus a mean relative overlap of N = 100 randomized background lists. Each background list consists of randomly generated genomic intervals of the same length distribution as observed for the corresponding DE-TARs excluding assembly gaps and repeat regions as annotated by RepeatMasker version Open-3.0 [99]. Each list contains as many intervals as in DE-TAR. The sampling space for bona fide non-coding intergenic DE-TARs and bona fide non-coding intronic DE-TARs was reduced to intergenic regions (the complement of protein-coding gene annotation derived from Gencode release v12, UCSC genes, RefSeq and Ensembl genes) or intronic regions (all nucleotides not overlapping with any exon annotated as a protein-coding exon in Gencode release v12, UCSC genes, RefSeq or Ensembl genes). Observed odds, randomized odds and odds ratios are defined as follows:

Future Questions

As described above, there are both basic model and human examples that show the importance of stress resistance and stress-induced damage in regulating aging and disease. The potential for leveraging this relationship to enhance human health is uncertain, but highly attractive, because stress is modifiable both in the environment and in individuals’ responses. Several key questions need to be addressed to fill in the gaps described above to help clarify the hypothesized stress-longevity nexus, and lead us to the most feasible and promising interventions.

Under What Conditions Is Stress Linked to Disease Processes?

What kinds of stress are linked to aging biology or specific disease processes? What cellular and neuroendocrine changes mediate these links? Can studies of stress biology in small organisms contribute meaningfully to our understanding of the relationship between social-psychological stress and human disease risk? Clearly the type and duration of the stressor is an important determinant of hormesis versus damage. For example, in rodents, acute psychological stress can enhance cell-mediated immune responses, whereas chronic exposure can dampen them (19). In animal models, it may be helpful to identify what drives the switch between enhancing and damaging effects. Further, how do the effects differ between the developmental period, early life, and old age? Answering these questions could move the field toward a finer grained analysis of “stress” so as to better understand good versus bad stress, at which developmental periods stress can have the most salutary versus damaging effects, and the inflection point at which a stressor becomes toxic and overwhelming. Identification of the molecular players involved in this switch is a major gap in the field.

In humans, this question becomes more complex, given the vast range of stressor exposures and types of stress responses. We rely on naturalistic exposures to chronic stressors, and responses range from depression and disease to psychological resiliency. Social factors, such as secure attachment, higher educational attainment, and social support, are already known to have stress-buffering and salutary effects and cannot be ignored in longevity research.

The study of biological predispositions to respond in particular ways under stress is a new frontier. The range of human response to common stressors in part depends on genotype, as shown by recent studies focusing on cell aging mechanisms (20). Although these are relatively small effects, they will contribute to our ability to focus treatments and tailor them, as part of the new movement toward Precision Medicine. Next steps include understanding the range of genotypes, and gene–gene interactions, and how these protect an individual from an adverse environment, as well as Gene × Environment interactions. Subsequently, identifying those most vulnerable and shaping the environment to be most conducive to health and resiliency will need to be major foci for antiaging interventions.

How Do Stress Exposure and the Cellular Response to Stress Regulate Aging? What Experimental Models May Be Useful in Defining the Impact of Stress at a Molecular and Cellular Level?

Caloric restriction and GH/IGF-1 deficiency lead to longevity in lower species through certain pathways that change energy metabolism and removal of toxins and can improve response to physiological stressors. None of these studies have examined psychological stressors. There are still fundamental questions in basic research, such as how to understand the relationship between stress resistance and longevity mechanistically and what aspects of the relationship might be most open to intervention (21). Of the nine hallmarks of aging recently suggested by López-Otín and coworkers (see Figure 1) (22), some have already been linked to the stress response (3). The pathways in animal studies that have been examined so far include loss of proteostasis and epigenetic changes in specific loci in the brain. In humans, relations have been examined between psychological stress and genomic instability (DNA damage), telomere attrition, and certain patterns of gene expression.

Stress resistance may be central to the Hallmarks of aging: Does this extend to humans? (adapted with permission from López-Otín et al. (22)).

Genetic Redundancy

Genetic Interactions Provide an Experimental Framework for Genetic Redundancy

Genetic interactions (also referred to as epistatic interactions or epistasis) provide a framework for studying genetic redundancy experimentally. Two genes are said to have a genetic interaction if simultaneous perturbation of both genes results in a phenotype that would not have been expected from the phenotypes of the two single perturbations. In particular, a pair of perturbations that produces a surprisingly large decrease in fitness is often cited as evidence of genetic redundancy between the two genes. For example, if deleting genes A and B individually results in no discernible phenotype, but deleting them simultaneously results in a nonviable organism, one expects that A and B are to some extent genetically redundant. Measuring fitness or another phenotype quantitatively enables more precise detection of such redundancies. If the fitness of a genetic perturbation to gene ‘x’ is fx relative to wild type, then, assuming the effects of perturbing genes A and B are independent, we expect the fitness of the combined perturbation, fAB, to equal the product of the single-perturbation fitnesses: fAB = fA × fB. If, on the contrary, fAB is significantly less than fA × fB, then A and B are said to interact negatively (a synthetic or compensatory interaction), which is indicative of genetic redundancy.
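The multiplicative expectation can be written down directly. The sketch below is illustrative only: the function names are hypothetical, the fixed tolerance stands in for the statistical test that "significantly less" would require in practice, and the "positive" label for a fitter-than-expected double mutant is added for completeness.

```python
def interaction_score(f_a, f_b, f_ab):
    """Deviation of double-mutant fitness from the product fA * fB."""
    return f_ab - f_a * f_b


def classify(f_a, f_b, f_ab, tol=0.05):
    """Label a gene pair using a fixed tolerance (an assumed cutoff)."""
    eps = interaction_score(f_a, f_b, f_ab)
    if eps < -tol:
        return "negative"  # sicker than expected: suggests redundancy
    if eps > tol:
        return "positive"  # fitter than expected
    return "none"


# Two deletions with no single-mutant phenotype that are lethal together.
synthetic_lethal = classify(1.0, 1.0, 0.0)
# Independent effects: 0.8 * 0.5 matches the observed 0.4.
independent = classify(0.8, 0.5, 0.4)
```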

Genetic interactions can be used to experimentally assess the model of genetic redundancy described above. Duplicate pairs are often compensatory: recent large-scale experiments on double mutants in baker’s yeast, for example, demonstrated that ∼30% of duplicate pairs exhibit a negative genetic interaction in comparison to a background rate of only 1–2% among random pairs of genes. While gene duplicates are relatively common, interactions between duplicates account for only a small portion of the observed pairwise genetic redundancy. The same large-scale study in yeast found that duplicate pairs accounted for only hundreds of the tens of thousands of negative interactions, suggesting that gene duplication explains only a small fraction of the observed genetic redundancy. Most of the other cases of genetic redundancy appear to occur between redundant pathways that exhibit little or no direct sequence similarity. These large-scale studies in yeast have also demonstrated that synthetic genetic interactions are highly modular: if two pathways perform compensatory functions, all of the genes in one pathway show genetic interactions with all of the genes in the second pathway. It has been shown that the vast majority of cases of pairwise genetic redundancy observed in yeast are examples of the so-called ‘between-module’ interactions. However, it is still unclear if these trends extend to more complex organisms.


Long-standing barriers impeding the construction of large-scale kinetic models of metabolism are being overcome with the help of developments in high-throughput technologies and computational analyses. Modelers are now faced with the challenge of integrating the increasingly available building blocks to create coherent mathematical representations of biological systems. Here, we presented our efforts to develop a modeling framework for constructing large-scale kinetic models that mechanistically link transcriptional regulation and metabolism. This allowed us to gain understanding of complex physiological relations from fluxome, metabolome, and gene expression data. We demonstrated the ability of our method to capture these relations, its flexibility to simulate different experiments, and its robustness with respect to modeling approximations and data uncertainty by analyzing the response of S. cerevisiae under different stress conditions. Importantly, our approach can be applied to other organisms of medical and industrial relevance (or cell types in multi-cellular organisms) for which a metabolic network reconstruction, metabolic flux measurements, and gene expression data are available for the conditions of interest.

The method provides efficient solutions to large-scale modeling challenges

One of the major challenges in constructing large-scale kinetic models is the definition of appropriate reaction rate expressions. Instead of defining mechanistic reaction rate expressions on a case-by-case basis, some approaches streamline this process by relying on generic expressions to translate a metabolic network into a kinetic model in an automated or semi-automated fashion. Different general forms have been proposed, such as log-linear kinetics [32], Michaelis-Menten-type kinetics [33], “convenience” kinetics [19], or GMA kinetics [23]. GMA kinetics are used, for example, in ensemble modeling [20] and mass action stoichiometric simulation (MASS) models [21]. In ensemble modeling and MASS models, the enzymatic reactions are decomposed into their elementary steps, and each step is then modeled using mass action kinetics. The decomposition increases the resolution of the model, preserves enzyme saturation behavior, and simplifies the parameter estimation problem, but at the price of considerably increasing the size of the model (i.e., the number of dynamic variables and model parameters) and the amount of data required to estimate parameter values. In contrast, we used a special case of GMA kinetics that requires a minimal number of parameters, which can be obtained directly from available experimental data (see Methods and Additional File 1). Moreover, enzymatic reactions were not decomposed into elementary steps to avoid increasing the size of the model.
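As a point of reference for the rate laws discussed above, the generic GMA (generalized mass action) form is a rate constant multiplied by a product of concentrations raised to kinetic orders. The sketch below shows only this generic form. It is not the special case used by the authors, which is specified in their Methods and is not reproduced here, and the function and variable names are hypothetical.

```python
def gma_rate(k, concentrations, orders):
    """Generic GMA rate: k times the product of concentration ** kinetic order."""
    rate = k
    for metabolite, order in orders.items():
        rate *= concentrations[metabolite] ** order
    return rate


# Elementary mass action is the case where each order equals the
# stoichiometric coefficient of the corresponding reactant.
conc = {"A": 3.0, "B": 4.0}
v_mass_action = gma_rate(2.0, conc, {"A": 1, "B": 1})
v_fractional = gma_rate(2.0, conc, {"A": 1, "B": 0.5})
```

Decomposing an enzymatic reaction into elementary steps means one such expression per step, which is why the text notes that the number of variables and parameters grows considerably.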

Another challenge is the determination of model parameter values. The difficulty in solving this problem is linked to the form of the kinetic expressions and to the availability of experimental data. If experimental data are not available, approaches such as log-linear kinetics and “convenience” kinetics require mining the literature for parameter values, which (aside from the skepticism about the validity of combining parameter values from different conditions to simulate a specific experiment) could be impractical for large-scale models. Approaches using GMA kinetics partially avoid literature mining. In these approaches, such as MASS modeling [21], thermodynamic information collected from the literature (e.g., equilibrium constants, Gibbs free energies, etc.) is combined with experimentally determined metabolite and/or enzyme concentrations and flux distributions to estimate the remaining model parameters (i.e., the rate constants). For the common case of incomplete data, the missing information is approximated by “typical” values or is randomly generated to create an ensemble of models that are screened for models that agree with experimental observations [20]. Based on the “sloppiness” property, we would expect that models parameterized using “typical” values will perform reasonably well. However, the typical values generally fall within relatively wide ranges, making the selection of parameter values to simulate a particular condition a non-trivial task. In contrast, the rate expressions we used enabled us to readily obtain the bulk of the model parameters (221 out of 227) directly from available experimental data (i.e., flux distributions and gene expression; see Methods and Additional File 1). Moreover, we circumvented mining the literature or using randomly generated values for thermodynamic parameters by assuming a single parameter (β) for relating the forward and backward reaction rates to the overall rate for all reversible reactions.
This crude approximation, inspired in part by the “sloppiness” property of biological systems, worked surprisingly well for the examples studied here. Our method performed well even if the uptake and production rates of extracellular metabolites were the only metabolic data available, as demonstrated in the analysis of S. cerevisiae tolerance to WOAs (see Additional File 1 for simulations using only uptake and production rates of extracellular metabolites under histidine starvation).

An additional attribute of our method is the use of gene expression data to parameterize the model to simulate different conditions, an element that has been used in constraint-based approaches to create context-specific models [7–10], but which has not been fully exploited in other kinetic modeling approaches. An exception is the work by Bruck et al. [34], in which gene expression was integrated with a kinetic model of S. cerevisiae glycolysis based on a mechanistic model developed by Teusink et al. [35]. However, Bruck et al. [34] estimated a subset of 31 parameters to fit the model to data from all conditions they simulated and did not present simulations without the gene expression data, preventing an assessment of the contribution of gene expression changes. In contrast, our models were able to simulate metabolic responses with a smaller subset of fitting parameters and our analysis showed the important role of gene expression on model predictions. Note that requiring gene expression data in order to simulate other conditions could also be considered a weakness, but no other model includes the prediction of protein/gene expression changes for the systems of the size of the network we analyzed.

Constructed models generated biological insights

We demonstrated that the constructed models were able to integrate transcriptional and metabolic responses to produce insights that would have been difficult to grasp from the analysis of the individual responses. For example, in their analysis of S. cerevisiae response to WOAs, Abbott et al. [24] identified differentially expressed genes as those with an expression change larger than two-fold and a false discovery rate lower than 0.5%. With these criteria, they found hundreds of differentially expressed genes under each treatment condition, but only 14 genes that were upregulated under all treatment conditions. Therefore, they concluded that the generic (i.e., common to all treatments) transcriptional response to WOAs was minimal and suggested that more relevance should be given to the specific responses to the specific treatment conditions. We agree that attention should be paid to the specific responses, but our analysis also suggests that the generic response, despite involving a few genes, is a major factor contributing to WOA tolerance. Based on our simulation results, we hypothesize that S. cerevisiae tightly regulates the expression levels of two reactions (glucose uptake-phosphorylation and decarboxylation of pyruvate to acetaldehyde) to increase the tolerance under all treatment conditions. Firstly, this generic response was not identified in the Abbott et al. [24] analysis because the gene expression changes for these reactions did not meet their criteria for differentially expressed genes (see Additional File 1). Secondly, we estimated that regulating these two reactions accounted for most of the increase in tolerance to WOAs (Figure 8 and Table 5). If correct, this hypothesis implies that S. cerevisiae has a generic response to WOAs that is critical for the adaptation to these stressors.
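The thresholding argument in this paragraph can be made concrete with a small filter. This is a sketch under stated assumptions: the function name is hypothetical, the two-fold criterion is taken to apply in both directions, and the example values are invented for illustration rather than taken from Abbott et al. [24].

```python
def is_differentially_expressed(fold_change, fdr, min_fold=2.0, max_fdr=0.005):
    """Apply a fold-change cutoff (up or down) together with an FDR cutoff."""
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude > min_fold and fdr < max_fdr


# A consistent but moderate change fails the fold-change cutoff even when
# it is statistically well supported (illustrative numbers only).
moderate_change = is_differentially_expressed(1.8, 0.001)
strong_change = is_differentially_expressed(2.5, 0.001)
```

A gene whose expression shifts moderately under every treatment would be filtered out by such criteria, which is consistent with the point that the generic response was not detected by the threshold-based analysis.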

Identification of important reactions in a metabolic network has been one of the major goals of several model-based approaches. For example, Kummel et al. [36] developed a thermodynamics-based method to identify regulated reactions, assuming that reactions far from equilibrium are more likely to be regulated. In contrast with our method, their approach does not use any kinetic information but requires thermodynamic and metabolome data. In another example, Smallbone et al. [22] combined log-linear kinetics with metabolic control analysis [37, 38] to identify reactions exerting the most control over biomass production in a genome-scale metabolic network of S. cerevisiae. Similar to these efforts, our method was able to identify important regulated reactions under specific conditions. However, our method also provided mechanistic insights into how the cell regulates such reactions through transcriptional regulation and how this response is reflected in its phenotype.

In another effort to link the regulatory and metabolic responses, Moxley et al. [25] proposed a hybrid approach to predict changes in metabolic fluxes using gene expression changes. Their approach was based on the assumption that gene expression changes and fluxes are more correlated in pathways with fewer metabolite-enzyme interactions (metabolite-enzyme interactions exist between an enzyme and metabolites that regulate its activity). Thus, their approach combined a metabolic network model with a metabolite-enzyme interaction network. Using this approach, they predicted flux changes that had a relatively high correlation (ρ = 0.80) with the experimentally estimated flux changes for a subset of reactions. For the same subset, our model predictions showed a considerably higher correlation (ρ = 0.96). Moreover, our method required less information because knowledge of the metabolite-enzyme interaction network is not needed. Interestingly, their predictions, using only the metabolic network model (without considering metabolite-enzyme interactions), had a similar ρ of approximately 0.75, reflecting the major contribution of the network structure to its function. In terms of biological insights, they observed a redistribution of the glycine synthesis fluxes. They proposed that the increase in glycine production from threonine is mediated by the increased expression of the associated genes, but they do not fully explain why the flux from serine to glycine decreased. Our analysis led to the plausible explanation that the decrease in the flux from serine to glycine could have been caused by the decrease of tetrahydrofolate, which, in turn, could have been caused by off-target inhibitions of 3-AT. In addition, and in contrast with their approach, our method also predicted concentration changes. 
In fact, we are unaware of other modeling efforts with similar scope that produce similar levels of accuracy, using condition-specific data directly as model parameters and using only five fitting parameters.

An additional conjecture about the use of gene expression changes to parameterize protein activity changes can be derived from our simulation results. We omitted post-translational and other regulatory mechanisms and yet the model predictions were consistent with experimental data. This suggests that, for the metabolic network and the experiments considered here, transcriptional regulation was the main mechanism that regulated the response at the system level. Moreover, the accuracy of the model predictions suggests that gene expression changes were a good approximation for protein level changes, in agreement with experimental observations [27, 39].

Further developments

The proposed method does not need knowledge of the absolute values of metabolite concentrations for steady-state simulations, but these are required for analysis of transient behavior. Developments in analytical techniques have increased the accuracy and scope of metabolite concentration measurements. However, such data are still generally incomplete and, thus, missing data must be estimated or assumed. Note that the requirement of metabolite concentrations to describe dynamic behavior is common to similar modeling approaches. Thus, it remains to be investigated how the proposed modeling framework performs in describing dynamic and transient properties associated with metabolic processes.

The models constructed with the proposed method present some limitations. For example, the generic rate expressions may be poor approximations for some reactions or may miss important allosteric regulations (e.g., feedback loops) and other factors that have an effect on protein activity and abundance (e.g., post-translational modifications). Lumping sequential reactions reduced the size of the model. However, in our approach, the rate expressions for lumped reactions are only an approximation to the sequence of individual reactions. In the experiments we analyzed, the final results were not sensitive to our somewhat arbitrary choice of the parameters m_i and β. This may not always be the case, and estimating more accurate parameter values may be necessary. As for any method, identifying and correcting modeling errors is a painstaking task. This could be especially true for automated model generation. Procedures to address this problem in a systematic way need to be developed. Furthermore, our method needs to be tested to determine whether it can be applied to genome-scale metabolic networks. Such application could be problematic because of the higher uncertainty of lowly expressed genes and small metabolic fluxes, the buildup of approximation errors, and numerical challenges in solving the model. Regarding its scope, the proposed method is limited to gene expression and metabolism. Although it enables a deeper, mechanistic analysis of these processes, further developments to include other cellular processes (e.g., signal transduction, cell division, etc.) would greatly enhance the modeling framework.



Wang, T.L. et al. Prevalence of somatic alterations in the colorectal cancer cell genome. Proc. Natl. Acad. Sci. USA 99, 3076–3080 (2002).

Lengauer, C., Kinzler, K.W. & Vogelstein, B. Genetic instabilities in human cancers. Nature 396, 643–649 (1998).

Duesberg, P. & Li, R. Multistep carcinogenesis: a chain reaction of aneuploidizations. Cell Cycle 2, 202–210 (2003).

Albertson, D.G. & Pinkel, D. Genomic microarrays in human genetic disease and cancer. Hum. Mol. Genet. 12 (spec. no. 2), R145–R152 (2003).

Shiloh, Y. & Kastan, M.B. ATM: genome stability, neuronal development, and cancer cross paths. Adv. Cancer Res. 83, 209–254 (2001).

Scully, R. & Livingston, D.M. In search of the tumour-suppressor functions of BRCA1 and BRCA2. Nature 408, 429–432 (2000).

Maser, R.S. & DePinho, R.A. Connecting chromosomes, crisis, and cancer. Science 297, 565–569 (2002).

Pihan, G. & Doxsey, S.J. Mutations and aneuploidy: co-conspirators in cancer? Cancer Cell 4, 89–94 (2003).

Rajagopalan, H. et al. Inactivation of hCDC4 can cause chromosomal instability. Nature 428, 77–81 (2004).

Shay, J.W. & Roninson, I.B. Hallmarks of senescence in carcinogenesis and cancer therapy. Oncogene 23, 2919–2933 (2004).

Chambers, A.F., Groom, A.C. & MacDonald, I.C. Dissemination and growth of cancer cells in metastatic sites. Nat. Rev. Cancer 2, 563–572 (2002).

Fidler, I.J. Critical determinants of metastasis. Semin. Cancer Biol. 12, 89–96 (2002).

Hunter, K.W. Host genetics and tumour metastasis. Br. J. Cancer 90, 752–755 (2004).

Hruban, R.H., Goggins, M., Parsons, J. & Kern, S.E. Progression model for pancreatic cancer. Clin. Cancer Res. 6, 2969–2972 (2000).

Aguirre, A.J. et al. Activated Kras and Ink4a/Arf deficiency cooperate to produce metastatic pancreatic ductal adenocarcinoma. Genes Dev. 17, 3112–3126 (2003).

Jen, J. et al. Molecular determinants of dysplasia in colorectal lesions. Cancer Res. 54, 5523–5526 (1994).

Pretlow, T.P. Aberrant crypt foci and K-ras mutations: earliest recognized players or innocent bystanders in colon carcinogenesis? Gastroenterology 108, 600–603 (1995).

Sieben, N.L. et al. In ovarian neoplasms, BRAF, but not KRAS, mutations are restricted to low-grade serous tumours. J. Pathol. 202, 336–340 (2004).

Kinzler, K.W. & Vogelstein, B. Colorectal Tumors. in The Genetic Basis of Human Cancer (eds. Vogelstein, B. & Kinzler, K.W.) 565–587 (McGraw-Hill, New York, 1998).

Barbacid, M. ras genes. Annu. Rev. Biochem. 56, 779–827 (1987).

Bos, J.L. ras oncogenes in human cancer: a review. Cancer Res. 49, 4682–4689 (1989).

Zhang, Z. et al. Wildtype Kras2 can inhibit lung carcinogenesis in mice. Nat. Genet. 29, 25–33 (2001).

Diaz, R. et al. The N-ras proto-oncogene can suppress the malignant phenotype in the presence or absence of its oncogene. Cancer Res. 62, 4514–4518 (2002).

Bronner-Fraser, M. Development. Making sense of the sensory lineage. Science 303, 966–968 (2004).

Jiricny, J. Eukaryotic mismatch repair: an update. Mutat. Res. 409, 107–121 (1998).

Fishel, R. & Wilson, T. MutS homologs in mammalian cells. Curr. Opin. Genet. Dev. 7, 105–113 (1997).

Lynch, H.T. & de la Chapelle, A. Hereditary colorectal cancer. N. Engl. J. Med. 348, 919–932 (2003).

Yamamoto, H., Imai, K. & Perucho, M. Gastrointestinal cancer of the microsatellite mutator phenotype pathway. J. Gastroenterol. 37, 153–163 (2002).

Honchel, R., Halling, K.C. & Thibodeau, S.N. Genomic instability in neoplasia. Semin. Cell Biol. 6, 45–52 (1995).

Brown, P.O. & Botstein, D. Exploring the new world of the genome with DNA microarrays. Nat. Genet. 21, 33–37 (1999).

Polyak, K. & Riggins, G.J. Gene discovery using the serial analysis of gene expression technique: implications for cancer research. J. Clin. Oncol. 19, 2948–2958 (2001).

Jones, P.A. & Baylin, S.B. The fundamental role of epigenetic events in cancer. Nat. Rev. Genet. 3, 415–428 (2002).

Feinberg, A.P. & Tycko, B. The history of cancer epigenetics. Nat. Rev. Cancer 4, 143–153 (2004).

Collins, F.S., Green, E.D., Guttmacher, A.E. & Guyer, M.S. A vision for the future of genomics research. Nature 422, 835–847 (2003).

Schadt, E.E., Monks, S.A. & Friend, S.H. A new paradigm for drug discovery: integrating clinical, genetic, genomic and molecular phenotype data to identify drug targets. Biochem. Soc. Trans. 31, 437–443 (2003).

Paddison, P.J. et al. A resource for large-scale RNA-interference-based screens in mammals. Nature 428, 427–431 (2004).

Berns, K. et al. A large-scale RNAi screen in human cells identifies new components of the p53 pathway. Nature 428, 431–437 (2004).

Rosenblatt, K.P. et al. Serum proteomics in cancer diagnosis and management. Annu. Rev. Med. 55, 97–112 (2004).

Luo, J., Isaacs, W.B., Trent, J.M. & Duggan, D.J. Looking beyond morphology: cancer gene expression profiling using DNA microarrays. Cancer Invest. 21, 937–949 (2003).

Ma, X.J. et al. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell 5, 607–616 (2004).

Futreal, P.A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).

Masayesva, B.G. et al. Gene expression alterations over large chromosomal regions in cancers include multiple genes unrelated to malignant progression. Proc. Natl. Acad. Sci. USA 101, 8715–8720 (2004).

Stewart, S.A. & Weinberg, R.A. Senescence: does it all happen at the ends? Oncogene 21, 627–630 (2002).

Feldser, D.M., Hackett, J.A. & Greider, C.W. Telomere dysfunction and the initiation of genome instability. Nat. Rev. Cancer 3, 623–627 (2003).

Chan, S.R. & Blackburn, E.H. Telomeres and telomerase. Phil. Trans. R. Soc. Lond. B 359, 109–121 (2004).

Cech, T.R. Beginning to understand the end of the chromosome. Cell 116, 273–279 (2004).

Miklos, G.L. & Maleszka, R. Microarray reality checks in the context of a complex disease. Nat. Biotechnol. 22, 615–621 (2004).

Hope, K.J., Jin, L. & Dick, J.E. Human acute myeloid leukemia stem cells. Arch. Med. Res. 34, 507–514 (2003).

Berking, C. & Herlyn, M. Human skin reconstruct models: a new application for studies of melanocyte and melanoma biology. Histol. Histopathol. 16, 669–674 (2001).

Kuperwasser, C. et al. Reconstruction of functionally normal and malignant human breast tissues in mice. Proc. Natl Acad. Sci. USA 101, 4966–4971 (2004).

Frei, E.I. & Eder, J.P. Principles of dose, schedule, and combination Therapy. in Cancer Medicine (eds. Kufe, D.W. et al.) 669–677 (B.C. Decker, Inc., Hamilton, Ontario, 2003).

Pegram, M.D., Konecny, G. & Slamon, D.J. The molecular and cellular biology of HER2/neu gene amplification/overexpression and the clinical development of herceptin (trastuzumab) therapy for breast cancer. Cancer Treat. Res. 103, 57–75 (2000).

Druker, B.J. et al. Chronic myelogenous leukemia. in Hematology 2001 (American Society of Hematology Education Program) 87–112 (American Society of Hematology, 2001).

Mechtersheimer, G. et al. Gastrointestinal stromal tumours and their response to treatment with the tyrosine kinase inhibitor imatinib. Virchows Arch. 444, 108–118 (2004).

Langer, C.J. Emerging role of epidermal growth factor receptor inhibition in therapy for advanced malignancy: focus on NSCLC. Int. J. Radiat. Oncol. Biol. Phys. 58, 991–1002 (2004).

Duensing, A., Heinrich, M.C., Fletcher, C.D. & Fletcher, J.A. Biology of gastrointestinal stromal tumors: KIT mutations and beyond. Cancer Invest. 22, 106–116 (2004).

Paez, J.G. et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science 304, 1497–1500 (2004).

Lynch, T.J. et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N. Engl. J. Med. 350, 2129–2139 (2004).

Schmitt, C.A. & Lowe, S.W. Apoptosis and therapy. J. Pathol. 187, 127–137 (1999).

Danial, N.N. & Korsmeyer, S.J. Cell death: critical control points. Cell 116, 205–219 (2004).

Brown, J.M. & Wouters, B.G. Apoptosis: mediator or mode of cell killing by anticancer agents? Drug Resist. Updat. 4, 135–136 (2001).

Weinstein, I.B. et al. Disorders in cell circuitry associated with multistage carcinogenesis: exploitable targets for cancer prevention and therapy. Clin. Cancer Res. 3, 2696–2702 (1997).

Nygren, P. & Larsson, R. Overview of the clinical efficacy of investigational anticancer drugs. J. Intern. Med. 253, 46–75 (2003).

Shih, L.Y. et al. Heterogeneous patterns of FLT3 Asp(835) mutations in relapsed de novo acute myeloid leukemia: a comparative analysis of 120 paired diagnostic and relapse bone marrow samples. Clin. Cancer Res. 10, 1326–1332 (2004).

Kinzler, K.W. & Vogelstein, B. Lessons from hereditary colon cancer. Cell 87, 159–170 (1996).

Weissleder, R. & Ntziachristos, V. Shedding light onto live molecular targets. Nat. Med. 9, 123–128 (2003).

Sidransky, D. Emerging molecular markers of cancer. Nat. Rev. Cancer 2, 210–219 (2002).

Gschwind, A., Fischer, O.M. & Ullrich, A. The discovery of receptor tyrosine kinases: targets for cancer therapy. Nat. Rev. Cancer 4, 361–370 (2004).

Downward, J. Targeting RAS signalling pathways in cancer therapy. Nat. Rev. Cancer 3, 11–22 (2003).

Malumbres, M. & Barbacid, M. To cycle or not to cycle: a critical decision in cancer. Nat. Rev. Cancer 1, 222–231 (2001).

Giles, R.H., van Es, J.H. & Clevers, H. Caught up in a Wnt storm: Wnt signaling in cancer. Biochim. Biophys. Acta 1653, 1–24 (2003).

Cantley, L.C. The phosphoinositide 3-kinase pathway. Science 296, 1655–1657 (2002).

Shi, Y. & Massague, J. Mechanisms of TGF-β signaling from cell membrane to the nucleus. Cell 113, 685–700 (2003).

Ruiz i Altaba. A., Stecca, B. & Sanchez, P. Hedgehog–Gli signaling in brain tumors: stem cells and paradevelopmental programs in cancer. Cancer Lett. 204, 145–157 (2004).

Adams, J.M. Ways of dying: multiple pathways to apoptosis. Genes Dev. 17, 2481–2495 (2003).

Blagosklonny, M.V. & Pardee, A.B. The restriction point of the cell cycle. Cell Cycle 1, 103–110 (2002).

Plas, D.R. & Thompson, C.B. Cell metabolism in the regulation of programmed cell death. Trends Endocrinol. Metab. 13, 75–78 (2002).

Green, D.R. & Evan, G.I. A matter of life and death. Cancer Cell 1, 19–30 (2002).

Eng, C., Kiuru, M., Fernandez, M.J. & Aaltonen, L.A. A role for mitochondrial enzymes in inherited neoplasia and beyond. Nat. Rev. Cancer 3, 193–202 (2003).

Lum, L. & Beachy, P.A. The Hedgehog response network: sensors, switches, and routers. Science 304, 1755–1759 (2004).

Brivanlou, A.H. & Darnell, J.E. Jr. Signal transduction and the control of gene expression. Science 295, 813–818 (2002).

Vogelstein, B. & Kinzler, K.W. The Genetic Basis of Human Cancer (McGraw-Hill, Toronto, 2002).

Cameron, E.R. & Neil, J.C. The Runx genes: lineage-specific oncogenes and tumor suppressors. Oncogene 23, 4308–4314 (2004).
