5.7: Whole Genome Duplication - Biology

As you trace species further back in evolutionary time, you can ask different sets of questions. In class, the example used was K. waltii, which dates to about 95 million years earlier than S. cerevisiae and 80 million years earlier than S. bayanus.

In the dot plot of S. cerevisiae chromosomes against K. waltii scaffolds, a departure from the diagonal was noted in the middle of the plot, whereas most pairs of conserved regions produce a clear, straight diagonal. Viewing this segment at higher magnification (Figure 5.25) shows that the S. cerevisiae sister fragments all map to corresponding K. waltii scaffolds.
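A dot plot like this can be sketched in a few lines of Python. The sketch is illustrative only: it assumes each region is given as an ordered list of gene identifiers and that a hypothetical `homologs` dictionary maps S. cerevisiae genes to K. waltii genes. Each homologous pair becomes one (x, y) point, and points sharing a constant y - x offset fall on a single straight diagonal.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]

def dot_plot_points(genome_x: List[str], genome_y: List[str], homologs: Dict[str, str]) -> List[Point]:
    """Return (x, y) index pairs for each gene in genome_x whose homolog occurs in genome_y."""
    y_index = {gene: i for i, gene in enumerate(genome_y)}
    points: List[Point] = []
    for x, gene in enumerate(genome_x):
        partner = homologs.get(gene)  # None if the gene has no recorded homolog
        if partner in y_index:
            points.append((x, y_index[partner]))
    return points

def is_straight_diagonal(points: List[Point]) -> bool:
    """True if every point has the same y - x offset (one unbroken slope-1 diagonal)."""
    return len({y - x for x, y in points}) == 1

# Hypothetical toy regions: a K. waltii scaffold and two S. cerevisiae regions.
kw = ["k1", "k2", "k3", "k4"]
collinear = ["g1", "g2", "g3", "g4"]
sister = ["g1", "g3"]  # a sister fragment that kept only every other gene
hom = {"g1": "k1", "g2": "k2", "g3": "k3", "g4": "k4"}

print(dot_plot_points(collinear, kw, hom))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(dot_plot_points(sister, kw, hom))     # [(0, 0), (1, 2)]
```

The collinear region gives a clean diagonal, while the sister fragment gives points with differing offsets, because a sister region retains only part of the ancestral gene set.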

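Schematically (Figure 5.26), sister regions show gene interleaving: genes retained on one sister region alternate with genes retained on the other when both are aligned to the single corresponding K. waltii region. In duplicate mapping of centromeres, sister regions can be recognized based on gene order. This gene interleaving provides evidence of a complete genome duplication.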
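The interleaving argument can be made concrete with a short sketch. It assumes, for simplicity, that genes in both S. cerevisiae sister regions are already labelled with their K. waltii ortholog, so all three lists share identifiers; the region contents are hypothetical. Each ancestral gene is tagged by where it survives (A, B, 2 for both, or - for neither), and each sister region is checked for preserving the ancestral order.

```python
from typing import Dict, List

def sister_tracks(ancestral_order: List[str], sister_a: List[str], sister_b: List[str]) -> str:
    """Label each ancestral gene 'A', 'B', '2' (kept in both), or '-' (lost from both)."""
    in_a, in_b = set(sister_a), set(sister_b)
    labels = []
    for gene in ancestral_order:
        if gene in in_a and gene in in_b:
            labels.append("2")
        elif gene in in_a:
            labels.append("A")
        elif gene in in_b:
            labels.append("B")
        else:
            labels.append("-")
    return "".join(labels)

def preserves_order(ancestral_order: List[str], sister: List[str]) -> bool:
    """True if a sister region's genes appear in the same relative order as in the ancestor."""
    rank: Dict[str, int] = {gene: i for i, gene in enumerate(ancestral_order)}
    ranks = [rank[gene] for gene in sister if gene in rank]
    return ranks == sorted(ranks)

# Hypothetical K. waltii region and two S. cerevisiae sister regions.
anc = ["k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8"]
sa = ["k1", "k3", "k4", "k6"]
sb = ["k2", "k4", "k5", "k7", "k8"]

print(sister_tracks(anc, sa, sb))                          # ABA2BABB
print(preserves_order(anc, sa), preserves_order(anc, sb))  # True True
```

An alternating track, with each sister individually collinear with the single K. waltii region, is the pattern expected if one ancestral region was duplicated and each copy then lost different genes.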

Origin of the Yeast Whole-Genome Duplication

Whole-genome duplications (WGDs) are rare evolutionary events with profound consequences. They double an organism’s genetic content, immediately creating a reproductive barrier between it and its ancestors and providing raw material for the divergence of gene functions between paralogs. Almost all eukaryotic genome sequences bear evidence of ancient WGDs, but the causes of these events and the timing of intermediate steps have been difficult to discern. One of the best-characterized WGDs occurred in the lineage leading to the baker’s yeast Saccharomyces cerevisiae. Marcet-Houben and Gabaldón now show that, rather than simply doubling the DNA of a single ancestor, the yeast WGD likely involved mating between two different ancestral species followed by a doubling of the genome to restore fertility.

Citation: Wolfe KH (2015) Origin of the Yeast Whole-Genome Duplication. PLoS Biol 13(8): e1002221.

Published: August 7, 2015

Copyright: © 2015 Kenneth H. Wolfe. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Funding: The author received no specific funding for this work.

Competing interests: The author has declared that no competing interests exist.

The unicellular baker’s yeast Saccharomyces cerevisiae was the first eukaryote to have its genome sequenced, using the first generation of automated sequencing machines and before the advent of the whole-genome shotgun approach. The sequencing was done during the period between 1990 and 1996 by an international consortium that included many small European laboratories, one of which was mine. Each laboratory was given a “tranche” of about 30 kb to sequence, and when you had completed that chunk, you could apply for another one. We were paid €2 per base pair. Progress meetings, chaired energetically by André Goffeau [1], were held every six months to ensure that the project remained on track. At these meetings, each group would make a 5-minute presentation about the genes they had found in their current chunk. The presentations were often tedious, enlivened only by the occasional exigency for André to reassign pieces of DNA from the sequencing tortoises to the hares. But as the project progressed, a pattern began to emerge: many of the chunks were similar to other chunks. The first clone that I sequenced happened to contain the centromere of chromosome II, and I noticed that a gene beside it had a paralog beside the centromere of chromosome IV [2]. My second chunk, from chromosome XV, contained four genes that had four paralogs, in the same order, on chromosome I [3].

When the complete genome was released in April 1996, we were able to identify 55 large duplicated blocks of this type, ranging in size from three to 18 duplicated genes (Fig 1) [4]. Two observations indicated that the duplications were quite old: the average amino acid sequence identity between the gene pairs was only 63%, and within each block only about 25% of the genes were actually duplicated, the others being single copy. This pattern suggested that the whole block was initially duplicated, and then many individual genes were deleted. Two other observations suggested that the blocks were remnants of duplicated chromosomes that had become rearranged during evolution: there were almost no overlaps between the blocks, and the orientation of each pair of blocks was conserved relative to the centromeres and telomeres. This layout of blocks was consistent with duplication of the whole genome followed by both extensive deletion of single genes and genome rearrangement solely by the process of reciprocal translocation between chromosomes [4]. Under this hypothesis, there had been an ancient whole-genome duplication (WGD), and the 55 blocks that we could identify were simply the most duplicate-dense regions that still survived without evolutionary rearrangement.
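The block-finding step can be approximated by chaining paralog pairs that stay in the same relative order on two chromosomes. The sketch below is a simplification, not the procedure used in the original study: it handles only same-orientation blocks, uses a greedy chain with a hypothetical `max_gap`, and keeps chains of at least three pairs, matching the smallest block size quoted above. Positions are gene indices along each chromosome, and the pairs are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (gene index on chromosome 1, gene index on chromosome 2)

def collinear_blocks(pairs: List[Pair], max_gap: int = 10, min_pairs: int = 3) -> List[List[Pair]]:
    """Greedily chain paralog pairs whose positions increase on both chromosomes."""
    blocks: List[List[Pair]] = []
    current: List[Pair] = []
    for pair in sorted(pairs):
        if current:
            dx = pair[0] - current[-1][0]
            dy = pair[1] - current[-1][1]
            if 0 < dx <= max_gap and 0 < dy <= max_gap:
                current.append(pair)
                continue
            if len(current) >= min_pairs:
                blocks.append(current)
        current = [pair]  # start a new chain at this pair
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks

def duplicated_fraction(block: List[Pair]) -> float:
    """Share of genes in the block's chromosome-1 span that still have a paralog."""
    span = block[-1][0] - block[0][0] + 1
    return len(block) / span

pairs = [(0, 100), (3, 102), (5, 107), (40, 10), (60, 5)]
blocks = collinear_blocks(pairs)
print(blocks)                          # [[(0, 100), (3, 102), (5, 107)]]
print(duplicated_fraction(blocks[0]))  # 0.5
```

A low duplicated fraction inside a collinear block is what the text reads as duplication of the whole block followed by deletion of many individual genes.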

Fig 1. The upper panel shows how duplicated blocks were initially identified using only genes that remain in duplicate in S. cerevisiae [4]. The lower panel shows how additional data from non-WGD yeasts such as Lachancea waltii [5] allowed the parts of the genome that were not initially allocated to blocks to be placed into pairs, providing a duplication map that covered the whole S. cerevisiae genome. Letters A–W represent genes, and dots represent centromeres. Only two chromosomes (yellow and brown) are shown.

The hypothesis of a WGD in S. cerevisiae was confirmed in 2004 when three groups sequenced the genomes of species that had branched off from this lineage before the WGD occurred [5–7]. These non-WGD genomes had a “double conserved synteny” relationship with the S. cerevisiae genome: instead of each pair of duplicated regions, they had a single region containing all the genes in a merged order (Fig 1). This discovery allowed the entire genome of S. cerevisiae to be mapped into pairs of regions via their double conserved synteny with the non-WGD species, even if the pairs retain no duplicated genes, thus filling the gaps between the initial map of 55 duplicated blocks. These analyses proved that the WGD encompassed the entire genome of S. cerevisiae and showed that its 16 centromeres fall into eight ancestral pairs that are syntenic with centromeres of the non-WGD species. It therefore appeared that the WGD turned an eight-chromosome ancestor into a 16-chromosome descendant. From this complete map, we now know that among the 5,774 protein-coding genes of S. cerevisiae, there are 551 pairs of duplicated genes (ohnologs) that were formed by the WGD and that about 144 chromosomal rearrangements scrambled the genome after the WGD [8,9]. We also know that the WGD is not confined to Saccharomyces but occurred in the common ancestor of six genera, some of which diverged from each other at an early stage when more than 4,000 genes were still duplicated, leading to later losses of different gene copies in different lineages [10].
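Two of the numbers in this paragraph lend themselves to a quick check. The sketch below groups S. cerevisiae centromeres by the non-WGD centromere they are syntenic with (the mapping is a hypothetical placeholder, not the real assignments) and tests that every ancestral centromere has exactly two descendants, as expected for an eight-to-16 doubling. It also computes the share of protein-coding genes that belong to the 551 ohnolog pairs.

```python
from collections import defaultdict
from typing import Dict, List

def ancestral_pairs(cen_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Group descendant centromeres by the ancestral (non-WGD) centromere they are syntenic with."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for descendant, ancestor in sorted(cen_map.items()):
        groups[ancestor].append(descendant)
    return dict(groups)

def consistent_with_wgd(groups: Dict[str, List[str]]) -> bool:
    """A single WGD predicts exactly two descendant centromeres per ancestral centromere."""
    return all(len(members) == 2 for members in groups.values())

# Hypothetical placeholder mapping (not the real S. cerevisiae assignments).
cen_map = {"sc_cen1": "anc_cen1", "sc_cen2": "anc_cen1", "sc_cen3": "anc_cen2", "sc_cen4": "anc_cen2"}
groups = ancestral_pairs(cen_map)
print(groups)                       # {'anc_cen1': ['sc_cen1', 'sc_cen2'], 'anc_cen2': ['sc_cen3', 'sc_cen4']}
print(consistent_with_wgd(groups))  # True

# Share of the 5,774 protein-coding genes that belong to the 551 ohnolog pairs.
fraction = 2 * 551 / 5774
print(f"{fraction:.1%}")            # 19.1%
```

So roughly one gene in five in S. cerevisiae is still a member of a WGD-derived pair.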

What were the molecular events that caused the WGD? It is relatively easy to draw a diagram summarizing the history of each chromosomal region (Fig 2), but it is much more difficult to specify the provenance of the intermediate molecules and the timescales involved. Two alternative scenarios can describe the steps in Fig 2. In both scenarios, event 1 is a DNA replication, and cells W and Z are each capable of mating (they are respectively a non-WGD haploid and a post-WGD haploid). The key question is whether the DNA molecules labeled X and Y existed in (1) two different cells of the same species or (2) two cells of two different species. Scenario 1 is called autopolyploidization, in which case event 1 corresponds to a simple cell division and event 2 is a mating between gametes from the same species or some other form of cell fusion. Scenario 2 is called allopolyploidization or hybridization, in which case event 1 is a speciation and event 2 is an interspecies mating or cell fusion. If event 2 was a mating, then an additional step such as deletion of one allele at the MAT locus is necessary to convert cell Z from a nonmating zygote to a mating gamete—but it is not essential that this additional step occurred immediately after event 2. In fact, a long delay in which cell Z replicated mitotically for many generations could be useful because it could allow reproductive isolation from cells of type W to build up. Eventually (event 3), mating between two post-WGD haploid cells of type Z can produce a post-WGD diploid like cell ZZ, which is the state in which S. cerevisiae is normally found in nature.
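The sequence of events in Fig 2 can be traced with a toy model. This is an illustrative sketch, not a published model: a hypothetical `Cell` is represented only by its haploid genome copies and its MAT alleles, a cell with a single MAT allele type is treated as able to mate, and loss of a MAT allele is modeled by a hypothetical `delete_mat` step. The model behaves identically whether X and Y come from one species or two, which is exactly why event order alone cannot separate the two scenarios.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class Cell:
    genomes: Tuple[str, ...]  # one entry per haploid genome copy, e.g. ("X",)
    mat: FrozenSet[str]       # MAT alleles present: "a", "alpha", or both

    def can_mate(self) -> bool:
        # One MAT allele type: behaves as a gamete. Both (a/alpha): nonmating.
        return len(self.mat) == 1

def mate(c1: Cell, c2: Cell) -> Cell:
    """Fuse two maters of opposite mating type, pooling their genomes and MAT alleles."""
    if not (c1.can_mate() and c2.can_mate()) or c1.mat == c2.mat:
        raise ValueError("cells must be maters of opposite mating type")
    return Cell(c1.genomes + c2.genomes, c1.mat | c2.mat)

def delete_mat(cell: Cell, allele: str) -> Cell:
    """Hypothetical MAT deletion that can turn a nonmating a/alpha cell into a mater."""
    return Cell(cell.genomes, cell.mat - {allele})

# Event 2: a cell carrying molecule X mates with a cell carrying molecule Y.
z = mate(Cell(("X",), frozenset({"a"})), Cell(("Y",), frozenset({"alpha"})))
print(z.can_mate())  # False: cell Z is a nonmating zygote

# Two Z lineages lose different MAT alleles (an assumption of this toy), then mate (event 3).
z_a, z_alpha = delete_mat(z, "alpha"), delete_mat(z, "a")
zz = mate(z_a, z_alpha)
print(zz.genomes)    # ('X', 'Y', 'X', 'Y')
```

Any number of mitotic generations can separate `mate` and `delete_mat` in this model, mirroring the delay the text suggests could build up reproductive isolation from type-W cells.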

Fig 2. See text for details. In an allopolyploidization, the red and blue chromosomes are called homeologs.

The major difference between these two scenarios is the amount of time (T) that elapsed between events 1 and 2: was it a few generations or millions of years? In scenario 1, molecules X and Y must be identical, whereas in scenario 2 they could have any level of sequence divergence from minimal to extensive, and they could also differ by chromosomal rearrangements. It has been difficult to design tests that could differentiate between these scenarios, but an analysis of the inferred order of genes along molecules X and Y did not find any rearrangements and so did not rule out scenario 1 [9]. However, it has been frustrating that we could not pin down the details of this crucial phase of yeast evolution, which gave birth to many pairs of genes with substantially divergent functions [11–15].
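One way to compare the inferred orders of molecules X and Y is to count breakpoints, adjacencies present in one order but not the other. The sketch below is a simple, orientation-free version that ignores gene strand and assumes each gene appears once per order; the gene orders are hypothetical. Scenario 1 predicts zero breakpoints, but zero breakpoints is also compatible with scenario 2 if the parental species had not yet diverged in gene order, which is why such a test could not rule out scenario 1.

```python
from typing import List

def breakpoints(order_x: List[str], order_y: List[str]) -> int:
    """Count adjacent gene pairs in order_x that are not adjacent in order_y (strand ignored)."""
    adjacencies_y = {frozenset(pair) for pair in zip(order_y, order_y[1:])}
    return sum(frozenset(pair) not in adjacencies_y for pair in zip(order_x, order_x[1:]))

x = ["g1", "g2", "g3", "g4", "g5"]
print(breakpoints(x, ["g1", "g2", "g3", "g4", "g5"]))  # 0: collinear, as scenario 1 requires
print(breakpoints(x, ["g1", "g3", "g2", "g4", "g5"]))  # 2: g2 and g3 inverted
```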

In this issue of PLOS Biology, Marcet-Houben and Gabaldón now report strong evidence in support of interspecies hybridization (scenario 2) as the source of the two subgenomes in post-WGD species [16]. By phylogenetic analysis using state-of-the-art methods, they show that molecules X and Y have phylogenetic affinities to two different non-WGD lineages that they call the KLE and ZT clades. The KLE clade (Kluyveromyces, Lachancea, and Eremothecium) is the group of non-WGD species that was sequenced in 2004 [5–7]. The ZT clade (Zygosaccharomyces and Torulaspora) is a separate, more recently studied non-WGD lineage [17,18]. Previous phylogenetic studies using supertrees or concatenated data suggested that the ZT clade is sister to the post-WGD clade, with the KLE clade being an out-group to them both [17,19,20]. The new analysis [16] made trees for each gene individually and found that, although the majority of genes in post-WGD species do cluster phylogenetically with the ZT clade as expected, a significant minority (about 30%) instead either cluster with the KLE clade or form an outgroup to a KLE + ZT clade. This phylogenetic heterogeneity was not noticed before because the KLE signal is only present in a minority of genes, and it is swamped by the ZT signal in methods that try to place the post-WGD clade at a single point on the tree.
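The swamping effect can be illustrated by tallying per-gene tree results. In the sketch below, each gene tree is already summarized as a label for where the post-WGD clade falls; the counts are hypothetical, chosen only to mirror the roughly 70:30 split described above, and the majority vote is a crude stand-in for methods that place the clade at a single point, not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, List

def topology_fractions(per_gene: List[str]) -> Dict[str, float]:
    """Share of gene trees supporting each placement of the post-WGD clade."""
    counts = Counter(per_gene)
    return {label: n / len(per_gene) for label, n in counts.items()}

def single_placement(per_gene: List[str]) -> str:
    """Majority vote: a crude stand-in for methods that force one placement on the whole genome."""
    return Counter(per_gene).most_common(1)[0][0]

# Hypothetical per-gene results: about 70% ZT-like, 30% KLE-like or outgroup to KLE + ZT.
gene_trees = ["ZT"] * 70 + ["KLE"] * 18 + ["outgroup_to_KLE_ZT"] * 12
print(topology_fractions(gene_trees))  # {'ZT': 0.7, 'KLE': 0.18, 'outgroup_to_KLE_ZT': 0.12}
print(single_placement(gene_trees))    # ZT
```

The per-gene tally keeps the 30% minority visible, while the single placement reports only ZT.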

Marcet-Houben and Gabaldón interpret this phylogenetic heterogeneity as evidence that the two post-WGD subgenomes have separate origins, one from the ZT clade and the other from an unidentified lineage that is an outgroup to KLE + ZT. Under the simplest hypothesis of hybridization, we might then expect that phylogenetic trees constructed from ohnolog pairs should show one S. cerevisiae gene grouping with the ZT clade and the other grouping with the KLE clade, but in fact, most ohnolog pairs group with each other, with the ZT clade as their closest relative [16]. The authors’ explanation for these two results—an excess of ZT-like ohnolog pairs, and an excess of ZT-like genes in the whole genome (which is mostly singletons)—is that the post-WGD genomes have been affected by biased gene conversion that preferentially replaced some KLE-derived sequences with copies of the homeologous ZT-derived sequences, homogenizing these regions and obliterating their signal of KLE ancestry.

The hybridization proposed by Marcet-Houben and Gabaldón makes a lot of sense in terms of what we know about the biology of yeast interspecies hybrids. Many yeast strains, most notably those used in commercial settings where stress tolerance is important, have turned out to be interspecies hybrids. For instance, the yeast used to brew lager (S. pastorianus) is a hybrid between S. cerevisiae and S. eubayanus [21,22], and many other combinations of genomes from different species of Saccharomyces have been found in nature [23]. These interspecies hybrids are usually infertile (unable to sporulate) because the two copies (homeologs) of each chromosome that they contain are too dissimilar to pair properly during meiosis [24–26]. One simple way to restore fertility is to double the genome, allowing each chromosome to pair with an identical partner instead of trying to pair with the homeolog. In this model, cell Z changes from being a nonmater (effectively diploid) to a mater (effectively haploid—perhaps by deletion of a MAT allele), then two cells of type Z mate to produce cell ZZ (diploid), and cell ZZ is able to go through meiosis and make spores with twice the DNA content of cell W. Thus, one hypothesis that Marcet-Houben and Gabaldón propose is that event 2 was an interspecies mating and event 3 was a restoration of fertility by genome doubling, with a possible interval of many mitotic generations between these two events. Alternatively, they hypothesize that event 2 may have been an interspecies fusion of diploid cells, obviating the need for a separate event 3.
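The fertility argument rests on a simple pairing rule, sketched below as a toy model. It assumes, hypothetically, that a chromosome pairs only with an identical copy and never with its homeolog; real meiotic pairing depends on the degree of sequence divergence, so this is a simplification. Chromosome names are placeholders for two parental species, P1 and P2.

```python
from collections import Counter
from typing import List

def all_chromosomes_can_pair(karyotype: List[str]) -> bool:
    """Toy rule: meiosis succeeds only if every chromosome has an identical partner."""
    return all(count % 2 == 0 for count in Counter(karyotype).values())

# Hypothetical hybrid: one chromosome set from each parental species, P1 and P2.
hybrid = ["P1_chr1", "P1_chr2", "P2_chr1", "P2_chr2"]  # P1_chrN and P2_chrN are homeologs
doubled = hybrid + hybrid                              # genome doubling

print(all_chromosomes_can_pair(hybrid))   # False
print(all_chromosomes_can_pair(doubled))  # True
```

Doubling gives each chromosome an identical partner, so the doubled cell passes the rule while the original hybrid does not.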

The obscuring of the phylogenetic signal of hybridization by subsequent gene conversions [16] is consistent with the known genome structures of some interspecies hybrids. The yeasts Pichia sorbitophila [27] and Candida orthopsilosis [28] are both interspecies hybrids, but in each case extensive homogenization of parts of the genome has occurred. This process of homogenization has been called overwriting, loss of heterozygosity, or gene conversion by different groups. It leaves the number of chromosomes unchanged (equal to the sum of the numbers of chromosomes in the two incoming subgenomes) but involves the replacement of sequences in one subgenome by sequences copied from the other subgenome (cell H in Fig 2). Homogenization could occur on scales as small as a few hundred base pairs (gene conversion) or as large as whole chromosome arms (break-induced replication). In the latter case, even differences in gene order such as inversions between the parental species could be ironed out.
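Homogenization can be pictured as copying a stretch of one subgenome over its homeolog. The toy sketch below assumes the two homeologs are collinear and represents each as a list of per-gene ancestry labels (A or B, hypothetical); a short tract mimics gene conversion, and a tract running to the chromosome end mimics break-induced replication of an arm. The number of chromosomes never changes, only their ancestry.

```python
from typing import List

def convert_tract(recipient: List[str], donor: List[str], start: int, end: int) -> List[str]:
    """Return a copy of recipient with gene positions [start, end) overwritten by the donor."""
    if len(recipient) != len(donor):
        raise ValueError("this toy model requires collinear homeologs of equal length")
    return recipient[:start] + donor[start:end] + recipient[end:]

def ancestry_fraction(chromosome: List[str], label: str) -> float:
    """Fraction of gene positions on a chromosome carrying the given ancestry label."""
    return chromosome.count(label) / len(chromosome)

chrom_a = ["A"] * 6  # homeolog inherited from parent A
chrom_b = ["B"] * 6  # homeolog inherited from parent B

gene_conversion = convert_tract(chrom_a, chrom_b, 1, 3)             # short internal tract
break_induced = convert_tract(chrom_a, chrom_b, 3, len(chrom_a))    # whole distal arm

print(gene_conversion)                        # ['A', 'B', 'B', 'A', 'A', 'A']
print(break_induced)                          # ['A', 'A', 'A', 'B', 'B', 'B']
print(ancestry_fraction(break_induced, "A"))  # 0.5
```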

The discovery that the yeast WGD was an allopolyploidization adds complexity to what initially seemed to be a simple story of duplication. If an interspecies hybrid such as P. sorbitophila with a partly homogenized genome developed two mating types and these could mate to form a diploid that could sporulate efficiently, the result would be a species with a genome resembling the inferred progenitor of the post-WGD clade (cell HH in Fig 2). Allopolyploidy answers some old questions about why genes were retained in duplicate if their sequences were identical (answer: they weren’t identical), what the immediate selective advantage of the post-WGD cell was (answer: hybrid vigor), and how the post-WGD lineage became reproductively isolated from the pre-WGD lineage (answer: delay between events 2 and 3). But it also raises new questions about homogenization (how much of the genome? how often? why is it biased?) and about the mechanism of restoration of fertility (why is event 3 so rare, apparently happening only once in the budding yeast family even though event 2 happened quite often?).

Ancient WGDs have been detected right across the eukaryotic tree of life, including in animals, ciliates, fungi, and, most prominently, plants [29–32]. If extensive gene conversion can obscure the traces of allopolyploidization in yeast genomes, one might wonder how many of these other ancient WGDs also began as interspecies hybridizations. In fact, there is evidence from plants that gene conversion acts continually to homogenize ohnolog pairs [32,33] and that hybrid plants can show preferential retention of DNA from one parent over the other [34] similar to the situation in P. sorbitophila [27]. Detecting the yeast hybridization in the presence of these obscuring factors required both good luck and good timing: good luck that a reference species closer to one parent than to the other had been sequenced and good timing that the hybrid was sampled before all traces of its hybrid origin had faded away. These fortunate circumstances may not hold for ancient hybridizations in other eukaryotes, but as a famous golfer once said, “The harder I practice, the luckier I get.” Detecting that they are hybridizations may become possible with exhaustive sampling of possible parental lineages and the use of sensitive phylogenomic methods of the type introduced by the authors [16].

Evolution of gene expression after whole-genome duplication: New insights from the spotted gar genome

Whole-genome duplications (WGDs) are important evolutionary events. Our understanding of underlying mechanisms, including the evolution of duplicated genes after WGD, however, remains incomplete. Teleost fish experienced a common WGD (teleost-specific genome duplication, or TGD) followed by a dramatic adaptive radiation leading to more than half of all vertebrate species. The analysis of gene expression patterns following TGD at the genome level has been limited by the lack of suitable genomic resources. The recent concomitant release of the genome sequence of spotted gar (a representative of holosteans, the closest-related lineage of teleosts that lacks the TGD) and the tissue-specific gene expression repertoires of over 20 holostean and teleostean fish species, including spotted gar, zebrafish, and medaka (the PhyloFish project), offers a unique opportunity to study the evolution of gene expression following TGD in teleosts. We show that most TGD duplicates gained their current status (loss of one duplicate gene or retention of both duplicates) relatively rapidly after TGD (i.e., prior to the divergence of medaka and zebrafish lineages). The loss of one duplicate is the most common fate after TGD, with a probability of approximately 80%. In addition, the fates of duplicate genes after TGD, including subfunctionalization, neofunctionalization, or retention of two "similar" copies, occurred not only before but also after the divergence of the species tested, consistent with a role of the TGD in speciation and/or evolution of gene function. Finally, we report novel cases of TGD ohnolog subfunctionalization and neofunctionalization that further illustrate the importance of these processes.

Keywords: PhyloFish medaka teleost transcriptome zebrafish.

© 2017 Wiley Periodicals, Inc.

Conflict of interest statement

The authors have nothing to declare.


The diagram (A) shows the partitioning of Gar genes and…

Figure 2. Examples of conserved neofunctionalized and subfunctionalized ohnologs in zebrafish and medaka, based on…

Figure 3. Conservation of expression after the TGD for genes that have been retained in…

Figure 4. Conservation of expression after the TGD for genes conserved as duplicates in one…

Figure 5. Distribution of expression pattern correlations between zebrafish and medaka TGD ohnologs and orthologs

Figure 6. Expression of conserved TGD ohnologs in zebrafish and medaka reveals four classes of…

The house spider genome reveals an ancient whole-genome duplication during arachnid evolution

Background: The duplication of genes can occur through various mechanisms and is thought to make a major contribution to the evolutionary diversification of organisms. There is increasing evidence for a large-scale duplication of genes in some chelicerate lineages including two rounds of whole genome duplication (WGD) in horseshoe crabs. To investigate this further, we sequenced and analyzed the genome of the common house spider Parasteatoda tepidariorum.

Results: We found pervasive duplication of both coding and non-coding genes in this spider, including two clusters of Hox genes. Analysis of synteny conservation across the P. tepidariorum genome suggests that there has been an ancient WGD in spiders. Comparison with the genomes of other chelicerates, including that of the newly sequenced bark scorpion Centruroides sculpturatus, suggests that this event occurred in the common ancestor of spiders and scorpions, and is probably independent of the WGDs in horseshoe crabs. Furthermore, characterization of the sequence and expression of the Hox paralogs in P. tepidariorum suggests that many have been subject to neo-functionalization and/or sub-functionalization since their duplication.

Conclusions: Our results reveal that spiders and scorpions are likely the descendants of a polyploid ancestor that lived more than 450 MYA. Given the extensive morphological diversity and ecological adaptations found among these animals, rivaling those of vertebrates, our study of the ancient WGD event in Arachnopulmonata provides a new comparative platform to explore common and divergent evolutionary outcomes of polyploidization events across eukaryotes.

Keywords: Centruroides sculpturatus Evolution Gene duplication Genome Hox genes Parasteatoda tepidariorum.


No detectable signal of WGD exists in the analysis of gene family membership. There is no peak at four genes per family for any of the vertebrates (Figure S1) as might result from 2R. Presumably this results from a great number of subsequent gene losses that have erased this signal. Likewise, the phylogenetic timing of the duplication events is also inconclusive, because duplications are common on every branch (see Figure 4). Although there is a somewhat greater number assigned to the base of vertebrates, there is no reliable way to evaluate the significance of this. In fact, even if this larger number could be found to be statistically significant, it may simply indicate that this was a period with an accelerated duplication of individual genes or multigene segments or a reduction in the rate of gene loss, rather than indicating WGD.
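The family-size test amounts to looking for an excess of four-member families. In the sketch below the family assignments are hypothetical placeholders; it builds a size histogram and checks whether size 4 is a local peak. With steady gene loss, as in the toy data, no peak appears, which matches the negative result described above.

```python
from collections import Counter
from typing import Dict, List

def family_size_histogram(families: Dict[str, List[str]]) -> Dict[int, int]:
    """Map family size to the number of families of that size."""
    return dict(sorted(Counter(len(members) for members in families.values()).items()))

def has_local_peak(hist: Dict[int, int], size: int = 4) -> bool:
    """True if families of `size` members outnumber both neighbouring size classes."""
    n = hist.get(size, 0)
    return n > hist.get(size - 1, 0) and n > hist.get(size + 1, 0)

# Hypothetical families whose sizes decline steadily, as after extensive gene loss.
sizes = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
families = {f"fam{i}": [f"g{i}_{j}" for j in range(s)] for i, s in enumerate(sizes)}
hist = family_size_histogram(families)
print(hist)                  # {1: 4, 2: 3, 3: 2, 4: 1, 5: 1}
print(has_local_peak(hist))  # False
```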

Conclusive evidence for 2R is seen only when data from gene families, phylogenetic trees, and genomic map position are all taken together, as has been advocated by others [21,32,43]. When examining the genomic map position of only those genes in the human genome that trace their ancestry back to a duplication event at the base of vertebrates, a clear pattern of tetra-paralogons emerges, indicating that 2R occurred at the base of vertebrates. This signal remains most clearly in 25% of the human genome that forms the largest category in the analysis shown in Figures 5 and 6, but we also find that 72% of all human genes are included in the total extent of all of the paralogons that overlap with these regions, providing the least constrained estimate of the portion of the human genome still retaining structure from the 2R. This is the outside estimate, because some portion could just as well have been the result of segmental duplications of regions earlier established by WGD. This is in contrast to the pattern seen for the many other gene duplications, which generated paralogs that are predominantly arranged in tandem.
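The window-based idea can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes a hypothetical `paralogs` map already restricted to duplicates dated to the base of vertebrates, counts distinct chromosomes rather than distinct paralogous segments, and uses an arbitrary window of three genes. A window whose genes and paralogs together touch four chromosomes is a tetra-paralogon candidate.

```python
from typing import Dict, List, Set

def window_fold(window: List[str], chrom_of: Dict[str, str], paralogs: Dict[str, Set[str]]) -> int:
    """Distinct chromosomes touched by the genes in a window plus their paralogs."""
    chroms = {chrom_of[g] for g in window}
    for g in window:
        chroms.update(chrom_of[p] for p in paralogs.get(g, set()))
    return len(chroms)

def sliding_folds(genes: List[str], chrom_of: Dict[str, str], paralogs: Dict[str, Set[str]], size: int = 3) -> List[int]:
    """Fold category for each window of `size` consecutive genes along one chromosome."""
    return [window_fold(genes[i:i + size], chrom_of, paralogs) for i in range(len(genes) - size + 1)]

# Hypothetical toy data: four genes on chromosome c1 with paralogs on c2, c3, and c4.
chrom_of = {"a1": "c1", "a2": "c1", "a3": "c1", "a4": "c1", "b1": "c2", "x1": "c3", "d1": "c4"}
paralogs = {"a1": {"b1", "d1"}, "a2": {"x1"}}
print(sliding_folds(["a1", "a2", "a3", "a4"], chrom_of, paralogs))  # [4, 2]
```

In this simplified count, a deletion removing `d1` would drop the first window from the 4-fold to the 3-fold category, the kind of erosion discussed in the next paragraph.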

This is particularly compelling considering that this signal has survived more than 450 MY of genome rearrangements and the loss of many genes. We can imagine the effect that duplications, translocations, inversions, and deletions (and combinations thereof) would have had on this analysis: (1) duplications would cause an increase beyond the 4-fold category; (2) translocations would decrease the 4-fold category if they were pervasive enough to clear large regions of paralogs; (3) inversions could either decrease the number of chromosomes hit, by moving paralogous genes beyond the reach of the sliding-window analysis, or increase it, by spreading some paralogous genes across the boundaries into adjacent segments, and both effects can be exacerbated by gene translocations that blur the edges of the corresponding regions; and (4) deletions would generally increase the 3-fold chromosome category at the expense of the 4-fold category, while a deletion that occurred between the two WGDs would increase the 2-fold chromosome category. Additionally, in some cases, a few individual gene deletions or translocations may have eliminated the links between pairs of duplicated genes. Through these events and their combinations, the original 4-fold co-linearity established by 2R (or something less than the perfect 4-fold pattern, if these duplications were long separated) has been eroded.

These tetra-paralogons are spread across nearly all human chromosomes (Table 2). Notably, chromosome Y does not have any tetra-paralogons, perhaps due to its relatively recent origin and small number of genes, or perhaps this indicates a more rapid rate of gene movement. Chromosome 21 also has no tetra-paralogons, and Chromosome 18 has only one that is small. These chromosomes, and other regions without tetra-paralogons, could be of recent origin or they could have undergone multiple rearrangement events that would have destroyed the signal.

Although our study does not specifically address the effect that 2R has had on vertebrate evolution, we note two interesting observations. First, the vast majority of duplicated genes were subsequently deleted, indicating that relatively few genes may have been responsible for the increased complexity seen in vertebrates. Second, it is possible that many genes were freed from constraint after the genome duplications and experienced an accelerated rate of sequence change before returning to single copy, and it is possible that this has played some role in the evolution of vertebrate complexity [44].

The mechanism of these genome duplication events, whether two separate rounds of either auto- or allo-tetraploidy or a single octoploidy, remains uncertain. We speculate that the most likely scenario is two rounds of closely spaced auto-tetraploidization events, based on the following observations. For most sets of tetra-paralogs, some pairs within the set extend over a longer region than others, indicating two distinct duplication events. If, alternatively, there had been a single octoploidy, then we would have to hypothesize multiple occasions in which two of the four descendant genomic segments lost the same sets of genes independently, which seems unlikely. The phylogenetic trees for the gene families are not consistently nested, as would be expected in the case of allo-tetraploidy or two widely spaced auto-tetraploidy events. Finally, tree topologies of genes within paralogy blocks are not always congruent, indicating that the process of gene loss and rediploidization spanned the duplication events [17].

It remains unclear to what extent such large-scale genomic events have driven macroevolutionary change versus the regular accumulation of small mutations, as is the central tenet of the classical model of evolution. We imagine that rapid and extensive evolutionary change could possibly be an emergent property of having all genes duplicated at the same time, allowing this expanded gene repertoire to evolve together, and so reach a greater level of interaction and complexity than could evolve from cumulative single gene duplications. WGDs have occurred in many lineages, including frogs [45,46], fish [41,42,47], yeast [27–30], Arabidopsis [27–30], and corn and several other crop species [48], all of which are being studied by modern genomics techniques. We view the broad and pervasive distribution of these tetra-paralogons in the human genome, despite the remarkably small number of genes remaining in duplicate, as robust evidence that 2R occurred at the base of Vertebrata, and anticipate that future studies will soon illuminate the roles this has played in the evolutionary success of the vertebrate lineage.

Exploring whole-genome duplicate gene retention with complex genetic interaction analysis

Whole-genome duplication has played a central role in the genome evolution of many organisms, including the human genome. Most duplicated genes are eliminated, and factors that influence the retention of persisting duplicates remain poorly understood. We describe a systematic complex genetic interaction analysis with yeast paralogs derived from the whole-genome duplication event. Mapping of digenic interactions for a deletion mutant of each paralog, and of trigenic interactions for the double mutant, provides insight into their roles and a quantitative measure of their functional redundancy. Trigenic interaction analysis distinguishes two classes of paralogs: a more functionally divergent subset and another that retained more functional overlap. Gene feature analysis and modeling suggest that evolutionary trajectories of duplicated genes are dictated by combined functional and structural entanglement factors.

Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
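Digenic and trigenic scores in triple-mutant SGA studies are, to our understanding, based on a multiplicative fitness model: a digenic interaction is epsilon_ij = f_ij - f_i*f_j, and a trigenic interaction is what remains of the triple-mutant fitness after removing the expected single-mutant and digenic contributions, tau_ijk = f_ijk - f_i*f_j*f_k - epsilon_ij*f_k - epsilon_ik*f_j - epsilon_jk*f_i. The sketch below implements that formula; the fitness values are hypothetical, and the paper's trigenic interaction fraction, which aggregates many such scores, is not reproduced here.

```python
def digenic(f_i: float, f_j: float, f_ij: float) -> float:
    """Digenic interaction: observed double-mutant fitness minus the multiplicative expectation."""
    return f_ij - f_i * f_j

def trigenic(f_i: float, f_j: float, f_k: float,
             f_ij: float, f_ik: float, f_jk: float, f_ijk: float) -> float:
    """Trigenic interaction: triple-mutant fitness minus single-mutant and digenic contributions."""
    e_ij = digenic(f_i, f_j, f_ij)
    e_ik = digenic(f_i, f_k, f_ik)
    e_jk = digenic(f_j, f_k, f_jk)
    return f_ijk - f_i * f_j * f_k - e_ij * f_k - e_ik * f_j - e_jk * f_i

# Hypothetical fitness values: i and j are paralogs, k is a query gene.
f_i, f_j, f_k = 1.0, 1.0, 0.9  # deleting either paralog alone causes no defect
f_ij, f_ik, f_jk = 0.8, 0.9, 0.9
f_ijk = 0.3

print(round(digenic(f_i, f_j, f_ij), 2))                           # -0.2
print(round(trigenic(f_i, f_j, f_k, f_ij, f_ik, f_jk, f_ijk), 2))  # -0.42
```

A negative trigenic score means the triple mutant is sicker than expected from the single and double mutants, the kind of signal that reveals functional overlap between paralogs only when both are removed.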


Fig. 1. Triple-mutant synthetic genetic array (SGA) analysis for paralogs.

Fig. 2. Distribution of different types of trigenic interactions for paralogs.

Fig. 3. Mapping functional relationship of paralogs through their digenic and trigenic interactions.

Fig. 4. Trigenic interaction fraction correlates with fundamental physiological and evolutionary properties.

Fig. 5. Trigenic interaction fraction reveals the functional divergence of duplicated genes and illuminates gene…

Fig. 6. The evolution of retained overlap due to evolutionary constraints acting on duplicated gene…

Published by Fi Gennu

Fi Gennu is a pen-name used for tracking certain posts on the blog. Often they're posts produced with the aid of Hemingway. It's almost certain that Alun Salt either wrote or edited this post.

About Us

Botany One is a blog run by the Annals of Botany Company, a non-profit educational charity.

In addition to Botany One, the company currently publishes three journals, the Annals of Botany, AoB PLANTS, and in silico Plants.

Gene fate after single whole-genome duplication in angiosperm

CG, CHG and CHH methylation patterns of the lotus genes with different duplication status (fate after a WGD). Credit: SHI Tao

Multiple whole-genome duplications (WGDs) are found in most sequenced angiosperms. WGDs help plants to survive in extreme environments and contribute to phenotypic innovations. Duplicated genes following WGD often have different fates: they can quickly disappear again, be retained for long(er) periods, or subsequently undergo small-scale duplications. But why do different genes have different fates following a WGD? How are differences in expression, epigenetic regulation, and functional constraints associated with these different gene fates? Answering these questions requires a model plant with a single WGD in its evolutionary past.

Researchers from the Wuhan Botanical Garden of the Chinese Academy of Sciences (CAS), Ghent University, the University of Maryland and Sun Yat-sen University have investigated the lotus, an angiosperm that underwent a single WGD around the Cretaceous–Paleogene (K-Pg) boundary.

Using a genome assembly based on high-throughput chromosome conformation capture (Hi-C), which improved the identification of intraspecific synteny, together with transcriptome and bisulfite sequencing, the researchers explored not only the fundamental differences in genomic features, expression and methylation patterns among genes with different fates after the WGD, but also the factors that shaped post-WGD expression divergence and expression bias between duplicates.
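The methylation contexts named in the figure caption (CG, CHG and CHH, where H is A, C or T) can be illustrated with a short sketch. It assumes a forward-strand DNA string and per-site bisulfite read counts; the function names are hypothetical, and this is not the pipeline used in the study. In bisulfite sequencing, unmethylated cytosines are converted and read as T while methylated cytosines are still read as C, so a site's methylation level is commonly estimated as C reads divided by C plus T reads.

```python
from collections import Counter
from typing import Optional

H_BASES = set("ACT")  # IUPAC "H": any base except G


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the forward-strand C at position i.

    Returns None if position i is not a C, or if the context cannot be
    resolved (end of sequence, or an ambiguous base such as N).
    """
    s = seq.upper()
    if s[i] != "C":
        return None
    if i + 1 < len(s) and s[i + 1] == "G":
        return "CG"
    if i + 2 < len(s) and s[i + 1] in H_BASES:
        if s[i + 2] == "G":
            return "CHG"
        if s[i + 2] in H_BASES:
            return "CHH"
    return None


def count_contexts(seq: str) -> Counter:
    """Count forward-strand cytosines in each methylation context."""
    contexts = (cytosine_context(seq, i) for i in range(len(seq)))
    return Counter(c for c in contexts if c is not None)


def methylation_level(c_reads: int, t_reads: int) -> float:
    """Bisulfite estimate at one site: C reads / (C reads + T reads)."""
    total = c_reads + t_reads
    return c_reads / total if total else float("nan")
```

For example, the three cytosines in CGACAGCTT fall into one CG, one CHG and one CHH context. Only the forward strand is scanned here; cytosines on the reverse strand (Gs on the forward strand) would require scanning the reverse complement as well.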

They also found biases in expression levels between subgenomes, reflecting subgenome dominance, and these biases were associated with biased fractionation (unequal gene loss) between the subgenomes. Based on this subgenome pattern, they suggest that the lotus might be an ancient allopolyploid.

This study of genome duplication in lotus highlights the impact of functional constraints on gene fate and on the divergence of post-WGD duplicates in plants.

The article, titled "Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plant," has been published in Molecular Biology and Evolution.


Ancient whole-genome duplications (WGDs) are major evolutionary events that have impacted several eukaryotic lineages, including plants, animals, and fungi [1]. Among plants, ancestral WGDs have been identified in monocots and core eudicots [2], and more recent events are apparent in many lineages such as Arabidopsis, maize, and soybean [3–5]. In vertebrates, two ancestral WGDs have been proposed, as well as more recent ones in teleost fishes and frogs [2]. Earlier work has focused on establishing when these events occurred [6,7] and on assessing the functional and evolutionary aftermath of doubling the entire genetic complement [8]. However, we still do not fully understand what initially triggered these events. Perhaps the best-studied WGD is the one affecting an ancestor of the baker's yeast Saccharomyces cerevisiae, an event supported by the finding of numerous blocks of paralogs with conserved synteny [7,9]. It is now established that this event occurred just before the separation of Vanderwaltozyma polyspora from the S. cerevisiae lineage, giving rise to a clade of post-WGD species (Fig 1A) [10]. In addition, it has been shown that the genome doubling was followed by extensive genome rearrangements and rampant gene loss that have since shaped these species' genomes, so that only a minor fraction of the WGD-derived paralogs (ohnologs) has been retained [11,12]. Based on the high level of synteny between reconstructed ancestrally duplicated gene blocks, it has been proposed that the yeast WGD originated in an autopolyploidization event [11]. This proposition has important implications for the initial selective advantages that may have played a role after the polyploidization event. Polyploidy has been considered to promote evolutionary innovation because it facilitates neo- and subfunctionalization and buffers deleterious mutations.
However, these mechanisms provide an advantage only after some time has passed and a number of mutations have accumulated. Conversely, a simple increase in ploidy has been considered a barrier to fast adaptation, as it masks beneficial recessive mutations and prevents the rapid purging of deleterious mutations. Furthermore, most experimental work comparing populations of different ploidy supports the superiority of a species' normal ploidy over increased ploidy [13]. Thus, the nature of the initial evolutionary advantage of the yeast WGD remains an open question.
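The masking argument above can be made concrete with a simple textbook calculation. Assuming random mating, Hardy–Weinberg genotype proportions, complete dominance, and an autopolyploid with random chromosome segregation (no double reduction), a recessive allele at frequency q is expressed only in individuals that carry it in all k copies, a fraction q^k. The function below is a hypothetical illustration and is not a model taken from the cited work.

```python
def recessive_expression_fraction(q: float, ploidy: int) -> float:
    """Fraction of individuals whose every allele copy is the recessive one.

    Assumes random mating, Hardy-Weinberg genotype proportions, complete
    dominance and random chromosome segregation (no double reduction).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be an allele frequency in [0, 1]")
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return q ** ploidy


if __name__ == "__main__":
    q = 0.1  # hypothetical frequency of a new beneficial recessive allele
    for k in (1, 2, 4):  # haploid, diploid, autotetraploid
        print(k, f"{recessive_expression_fraction(q, k):.4g}")
```

Run as a script, this prints 0.1, 0.01 and 0.0001 for ploidies 1, 2 and 4: selection can "see" the recessive allele in a tenth of haploid individuals but in only one in ten thousand autotetraploids.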

(A) Evolutionary relationships of the analysed species. The tree was built using a maximum likelihood approach on a concatenated alignment of 516 widespread orthologs. All branches had maximal bootstrap support (100%). The WGD and the pre-KLE (Kluyveromyces, Lachancea, and Eremothecium) branches are marked with coloured circles. Branches in the lineage leading from S. cerevisiae to the root are numbered from most ancestral (n1) to most recent (n8). (B) Duplication densities (duplications per gene per branch) calculated for each annotated branch, using either the entire set of gene trees (green dots) or only the ohnologs (yellow dots). (C) Sequence divergence between yeast sequences belonging to two populations: duplications mapped to the WGD branch (blue) and duplications mapped to the pre-KLE branch (red). The graphs show, respectively, the frequencies of normalized BLAST scores, Kimura distances, and estimated divergence ages. The normalized BLAST score is the score obtained when aligning the seed yeast protein to the ohnolog pair divided by the score obtained from aligning the seed yeast protein to itself. The Kimura distance between the two sequences was calculated using protdist, as implemented in the PHYLIP package, after aligning the two sequences. PL-R8s [14] was used to estimate divergence times in individual trees that contained two ohnologous genes. Data on which this figure is based are provided in S1 Data.
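The two divergence measures in panel C can be sketched as follows. The normalized BLAST score is a simple ratio of two scores, taken here as inputs (running BLAST is out of scope). For the Kimura distance, the sketch uses Kimura's (1983) approximation for proteins, d = -ln(1 - p - 0.2p^2), where p is the proportion of differing aligned sites; this is the formula behind protdist's "Kimura formula" option, but the gap handling here (gapped columns are simply skipped) is an assumption and may not match PHYLIP exactly. Function names are hypothetical.

```python
import math


def normalized_blast_score(score_vs_ohnolog: float, self_score: float) -> float:
    """Seed-vs-ohnolog BLAST score divided by the seed's self-alignment score."""
    if self_score <= 0:
        raise ValueError("self-alignment score must be positive")
    return score_vs_ohnolog / self_score


def kimura_protein_distance(seq_a: str, seq_b: str) -> float:
    """Kimura (1983) protein distance d = -ln(1 - p - 0.2 p^2).

    seq_a and seq_b must already be aligned (equal length, '-' for gaps).
    Columns with a gap in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no gap-free aligned columns")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p == 0.0:
        return 0.0  # identical sequences
    x = 1.0 - p - 0.2 * p * p
    if x <= 0.0:
        return math.inf  # saturation: formula undefined for p above about 0.854
    return -math.log(x)
```

For example, two aligned four-residue sequences differing at one site (p = 0.25) give d = -ln(0.7375), roughly 0.30. Returning infinity for saturated pairs keeps the function total; a real analysis would more likely flag or exclude such pairs.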

WGDs leave a footprint in the form of cohorts of homologous genes that duplicated in the same period. Phylogenetic analysis of gene families reveals the relative ages of duplications [15,16] and is hence a powerful tool for studying WGDs. When ancestral duplications are inferred from the genes encoded in a genome and their relative dates are mapped to a reference species tree, an ancient WGD is expected to produce an accumulation of duplications mapped to the lineage in which the event occurred. Earlier analyses have used such an approach to detect ancient duplications in vertebrates [17,18] and plants [19]. However, despite extensive phylogenetic work [20–22], no study has assessed the global phylogenetic congruence of gene duplications with the WGD that occurred in the lineage leading to S. cerevisiae. Here, we set out to investigate patterns of past duplications in S. cerevisiae by analysing genome-wide sets of gene phylogenies (i.e., phylomes).
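The mapping of duplications onto a species tree described here can be sketched in a simplified form. The sketch uses a species-overlap rule (an internal gene-tree node is called a duplication when its two child subtrees share at least one species) and maps each duplication to the smallest species-tree clade containing all species below it, i.e. to the branch leading to that clade. Trees are nested tuples, gene labels follow a hypothetical "species|gene" convention, and density is normalized by the number of gene trees as a stand-in for the "duplications per gene per branch" measure in the figure caption. It ignores gene-tree rooting, branch support and gene-tree/species-tree incongruence, all of which a real phylome pipeline must handle.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple, Union

# A rooted binary tree: a leaf label (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]


def leaves(tree: Tree) -> Clade:
    """Set of leaf labels under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def species_clades(species_tree: Tree) -> List[Clade]:
    """Every clade (set of species) of the species tree, one per node."""
    out = [leaves(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            out.extend(species_clades(child))
    return out


def species_of(gene: str) -> str:
    # Hypothetical label convention: "species|gene_id".
    return gene.split("|", 1)[0]


def duplication_clades(gene_tree: Tree, clades: List[Clade]) -> List[Clade]:
    """Species-overlap rule; each duplication is mapped to the smallest
    species-tree clade containing all species below the duplication node."""
    found: List[Clade] = []

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return frozenset([species_of(node)])
        left, right = walk(node[0]), walk(node[1])
        if left & right:  # children share a species => duplication node
            below = left | right
            found.append(min((c for c in clades if below <= c), key=len))
        return left | right

    walk(gene_tree)
    return found


def duplication_density(gene_trees: List[Tree], species_tree: Tree) -> Dict[Clade, float]:
    """Duplications mapped to each clade's branch, divided by the number of gene trees."""
    clades = species_clades(species_tree)
    counts: Counter = Counter()
    for tree in gene_trees:
        counts.update(duplication_clades(tree, clades))
    return {clade: n / len(gene_trees) for clade, n in counts.items()}


if __name__ == "__main__":
    # Toy data: two post-WGD species (S. cerevisiae, C. glabrata) and a
    # pre-WGD outgroup (K. lactis). The gene trees are hypothetical.
    species_tree = (("Scer", "Cgla"), "Klac")
    ohnolog_tree = ((("Scer|A1", "Cgla|A1"), ("Scer|A2", "Cgla|A2")), "Klac|A")
    single_copy_tree = (("Scer|B", "Cgla|B"), "Klac|B")
    print(duplication_density([ohnolog_tree, single_copy_tree], species_tree))
```

In this toy run, the duplication in the ohnolog tree maps to the {Scer, Cgla} clade, the post-WGD branch, giving it a density of 0.5 across the two trees; a genome-wide excess of such mappings on one branch is the signature the text describes.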

Whole-genome duplication as a key factor in crop domestication

Polyploidy is commonly thought to be associated with the domestication process because it co-occurs with agriculturally favourable traits and because it is widespread among the major plant crops [1–4]. Furthermore, the genetic consequences of polyploidy [5–7] might have increased the adaptive plasticity of those plants, enabling successful domestication [6–8]. Nevertheless, a detailed phylogenetic analysis of the association of polyploidy with the domestication process, and of the temporal order of these distinct events, has been lacking [3]. Here, we have gathered a comprehensive data set including dozens of genera, each containing one or more major crop species and for which sufficient sequence and chromosome number data exist. Using probabilistic inference of ploidy levels conducted within a phylogenetic framework, we have examined the incidence of polyploidization events within each genus. We found that domesticated plants have gone through more polyploidy events than their wild relatives, with monocots exhibiting the most profound difference: 54% of the crops are polyploids versus 40% of the wild species. We then examined whether the preponderance of polyploidy among crop species is explained by either of two non-mutually-exclusive hypotheses: (1) polyploidy followed by domestication, and (2) domestication followed by polyploidy. We found support for the first hypothesis: polyploid species were more likely to be domesticated than their wild relatives, suggesting that the genetic consequences of polyploidy provided many of these plants with genetic preconditions for successful domestication.

During the past 13,000 years of human history, hundreds of crop plants were independently domesticated in different regions across the globe [9]. Despite their independent origins, many domesticated plants share a similar set of morphological and physiological traits, termed the domestication syndrome [10], that collectively distinguish crop plants from their wild progenitors. Polyploidy is also considered an important trait in the domestication process [11–13], and it has been hypothesized that the genetic consequences of polyploidy, including increased allelic diversity, heterozygosity and enhanced meiotic recombination, have increased the adaptive plasticity of polyploid plants under cultivation conditions [5–7]. This, in turn, would have provided a broader range of phenotypes on which natural and artificial selection could act, enabling successful domestication. Indeed, some of our most important crop species, including wheat, potato, cotton and sugar cane, have experienced complex histories of repeated polyploidization events. However, previous surveys [1,14] did not find statistical support for the hypothesis that polyploidy is more frequent in cultivated plants than in wild species.

