Defining paper(s) in epigenetics

For someone who is interested in learning about the discovery of epigenetics, which are the foundational defining papers in the area?

I understand that Robin Holliday was the first to discuss the possible role of DNA methylation in the control of gene expression. In his paper "The inheritance of epigenetic defects" he presents one of the first modern formulations of what we now regard as epigenetics. The term "epigenetics" itself was coined by Conrad Waddington, although this predated our modern understanding of heredity.

Holliday, R., The inheritance of epigenetic defects, 1987, Science, 238, 4824

Few single papers really drive the field by themselves; epigenetics has been a long, slow progression (although more recently the definition has been muddied and has started to include non-epigenetic modes of gene regulation). A pretty good review of much of the research, including a more circumspect discussion of what epigenetics is, can be found in Youngson and Whitelaw's Annual Review (

I found two nice review papers focused on epigenetics and human disease, one from 2004 and one from 2006. But perhaps you are looking more for something like this?

I also found this article in the BBC news: The Ghost in Your Genes

They have shown that a famine at critical times in the lives of the grandparents can affect the life expectancy of the grandchildren. This is the first evidence that an environmental effect can be inherited in humans.


  • Jiang Y-H, Bressler J, Beaudet AL. 2004. Epigenetics and human disease. Annual review of genomics and human genetics 5: 479-510.

  • Rodenhiser D, Mann M. 2006. Epigenetics and human disease: translating basic biology into clinical applications. CMAJ 174: 341-8.

  • Haig D. 2004. The (dual) origin of epigenetics. Cold Spring Harbor symposia on quantitative biology 69: 67-70.

Conrad Waddington and the origin of epigenetics

Classics is an occasional column, featuring historic publications from the literature. These articles, written by modern experts in the field, discuss each classic paper's impact on the field of biology and their own work.

Denis Noble. Conrad Waddington and the origin of epigenetics. J Exp Biol, 15 March 2015, 218 (6): 816–818. doi:

Denis Noble discusses Conrad Waddington's classic paper, ‘The genetic assimilation of the bithorax phenotype’, published in Evolution in 1956.


In 1956, the British developmental biologist, Conrad Waddington, published a paper in the journal Evolution (Waddington, 1956) in which he succeeded in demonstrating the inheritance of a characteristic acquired in a population in response to an environmental stimulus. Much earlier, in 1890, August Weismann had tried and failed to achieve this. He amputated the tails of five successive generations of mice and showed absolutely no evidence for an effect on subsequent generations. Weismann's discovery that the effects of an environmental stimulus (tail amputation) cannot be transmitted to subsequent generations, together with his assumption that genetic change is random, formed the foundations of the Modern Synthesis (Neo-Darwinism) of our understanding of genetic inheritance.

Waddington's approach, however, was much more subtle and more likely to be successful because he realised that the way to test for the inheritance of acquired characteristics is first to discover what forms of developmental plasticity already exist in a population, or that the population could be persuaded to demonstrate with a little nudging from the environment. By exploiting plasticity that already existed he was much more likely to mimic a path that evolution itself could have taken.

He used the word ‘canalised’ for this kind of persuasion since he represented the developmental process as a series of ‘decisions’ that could be represented as ‘valleys’ and ‘forks’ in a developmental landscape (Fig. 1). He knew from his developmental studies that embryo fruit flies could be persuaded to show different thorax and wing structures, simply by changing the environmental temperature or by a chemical stimulus. In his landscape diagram, this could be represented as a small manipulation in slope that would lead to one channel in the landscape being favoured over another, so that the adult could show a different phenotype starting from the same genotype.

Waddington's developmental landscape diagram. The landscape itself and the ball at the top are from his original diagram. The subsequent positions of the ball have been added to illustrate his point that development can be canalised to follow different routes (A and B). The plasticity to enable this to happen already exists in the wild population of organisms (modified diagram by K. Mitchell).
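The landscape picture can be made concrete with a toy model. This is not something Waddington wrote down: the potential V(x) = x^4/4 - a*x^2/2 - tilt*x, the parameter names and all numbers below are assumptions chosen purely for illustration. A ball slides downhill while the single valley splits into two (a rises from -1 to 1), and a small constant tilt, standing in for the environmental nudge, decides which channel it ends up in.

```python
def settle(tilt, ramp_time=20.0, total_time=30.0, dt=0.01):
    """Follow an overdamped ball in a forking one-dimensional landscape.

    Hypothetical potential: V(x) = x**4/4 - a*x**2/2 - tilt*x.
    While a < 0 there is one valley; once a > 0 it forks into two.
    """
    x = 0.0  # the ball starts at the top, on the midline
    steps = int(total_time / dt)
    for i in range(steps):
        t = i * dt
        # The fork develops gradually, then the landscape stops changing.
        a = min(1.0, -1.0 + 2.0 * t / ramp_time)
        slope = x**3 - a * x - tilt  # dV/dx
        x -= slope * dt              # move downhill (Euler step)
    return x


if __name__ == "__main__":
    for tilt in (0.05, -0.05):
        print(f"tilt {tilt:+.2f} -> final position {settle(tilt):+.3f}")
```

The same starting point and the same landscape give two different end states; only the sign of a small tilt differs, which is the sense in which the same genotype can be canalised towards route A or route B.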


The next step in his experiment was to select for and breed from the animals that displayed the new characteristic. Exposed to the same environmental stimulus, these gave rise to progeny with an even higher proportion of adults displaying the new character. After a relatively small number of generations, he found that he could then breed from the animals and obtain robust inheritance of the new character even without applying the environmental stimulus. The characteristic had therefore become locked into the genetics of the animal. He called this process genetic assimilation. What he had succeeded in showing was that an acquired characteristic could first be inherited as what we would now call ‘soft’ inheritance, and that it could then be assimilated into becoming standard ‘hard’ genetic inheritance. Today, we call ‘soft’ inheritance epigenetic inheritance, and of course, we know many more mechanisms by which the same genome can be controlled to produce different epigenetic effects.

What was happening at the gene level in Waddington's experiments? A standard Neo-Darwinist explanation might be that some mutations occurred. That is possible, but extremely unlikely on the time scale of the experiment, which was only a few generations. Moreover, random mutations would occur in individuals, not in a whole group. Single small mutations would have taken very many generations to spread through whole populations, and many such mutations would have been required.

But I think there is a much simpler explanation. Recall that the experiment exploited plasticity that is already present in the population. That strongly suggests that all the alleles (gene variants) necessary for the inheritance of the characteristic were already present in the population, but not initially in any particular individuals in the correct combination. The experiment simply brings them together. This is a modification of the pattern of the genome in response to the environmental change, but not in a way that requires any new mutations. I came to this conclusion before reading Waddington's (1957) book, The Strategy of the Genes. But it is in fact one of Waddington's own ideas! He writes ‘There is no … reason which would prevent us from imagining that all the genes which eventually make up the assimilated genotype were already present in the population before the selection began, and only required bringing together’ (p. 176). Not only does he clearly see this possibility, he also tests it. He continues (p. 178) ‘Attempts to carry out genetic assimilation starting from inbred lines have remained quite unsuccessful. This provides further evidence that the process depends on the utilisation of genetic variability in the foundation stock with which the experiment begins’. His text could not be clearer.
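The argument that selection merely brings existing alleles together can be sketched as a small simulation. Everything in it is an assumption made for illustration, not Waddington's protocol or data: a threshold trait controlled by 20 hypothetical loci, a stimulus that adds a fixed amount of liability, and a breeder who uses stimulated responders until at least 20 animals show the trait unaided. The model contains no mutation step at all.

```python
import random

N_LOCI = 20       # hypothetical number of contributing loci
THRESHOLD = 10    # liability needed to show the trait
STIMULUS = 4      # liability added by the environmental stimulus
MIN_PARENTS = 20  # smallest unaided breeding group the breeder accepts


def outbred_founders(size, allele_freq, rng):
    """Founders with independent standing variation at every locus."""
    return [[1 if rng.random() < allele_freq else 0 for _ in range(N_LOCI)]
            for _ in range(size)]


def inbred_founders(size, n_plus):
    """Founders that all share one genotype (no standing variation)."""
    genotype = [1] * n_plus + [0] * (N_LOCI - n_plus)
    return [list(genotype) for _ in range(size)]


def shows_trait(genotype, stimulated):
    liability = sum(genotype) + (STIMULUS if stimulated else 0)
    return liability >= THRESHOLD


def breed(parents, size, rng):
    """Each offspring takes every locus from one of two random parents."""
    offspring = []
    for _ in range(size):
        mother, father = rng.choice(parents), rng.choice(parents)
        offspring.append([rng.choice(pair) for pair in zip(mother, father)])
    return offspring


def simulate(founders, generations, seed=0):
    """Fraction showing the trait WITHOUT stimulus, one value per generation."""
    rng = random.Random(seed)
    population = founders
    history = []
    for _ in range(generations):
        unaided = [g for g in population if shows_trait(g, False)]
        history.append(len(unaided) / len(population))
        if len(unaided) >= MIN_PARENTS:
            parents = unaided  # breed from animals that need no stimulus
        else:
            responders = [g for g in population if shows_trait(g, True)]
            parents = responders if len(responders) >= 2 else population
        population = breed(parents, len(population), rng)
    history.append(sum(shows_trait(g, False) for g in population)
                   / len(population))
    return history


if __name__ == "__main__":
    rng = random.Random(1)
    wild = simulate(outbred_founders(200, 0.2, rng), generations=25)
    inbred = simulate(inbred_founders(200, 6), generations=25)
    print("outbred, unaided fraction:", wild[0], "->", wild[-1])
    print("inbred,  unaided fraction:", inbred[0], "->", inbred[-1])
```

In this sketch the outbred stock comes to show the trait without the stimulus because recombination and selection concentrate alleles that were present from the start, while the inbred stock, which has nothing to recombine, never does.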

Orthodox Neo-Darwinists dismissed Waddington's findings as merely an example of the evolution of phenotype plasticity. That is what you will find in many of the biology textbooks even today (e.g. Arthur, 2010). I think that Waddington showed more than that. Of course, plasticity can evolve, and that itself could be by a Neo-Darwinist or any other mechanism. But Waddington was not simply showing the evolution of plasticity in general; he was showing how it could be exploited to enable a particular acquired characteristic in response to an environmental change to be inherited and be assimilated into the genome. Moreover, he departed from the strict Neo-Darwinist view by showing that this could happen even if no new mutations occur (Fig. 2).

Waddington's diagram to show how the developmental landscape relates to individual genes (bottom pegs) through networks of interactions in the organism. Since he also showed the influence of the external environment on canalisation of development, I have extended the diagram by adding the top part to represent the environmental influences. It is the combination of these influences that can lead to an evolutionary change without mutations (modified from Waddington, 1957).


Epigenetics means ‘above genetics’ and it was originally conceived by Waddington himself to describe the existence of mechanisms of inheritance in addition to (over and above) standard genetics (Bard, 2008). Waddington regarded himself as a Darwinist since Darwin also, in The Origin of Species, included the inheritance of acquired characteristics. But significantly, Waddington was not a Neo-Darwinist since Neo-Darwinism, following Weismann, specifically excludes such inheritance. Waddington was a profound thinker about biology, and much else too. The Strategy of the Genes is a masterly account of the many reasons why he dissented from Neo-Darwinism, and it has stood the test of time. It was reprinted over half a century later, in 2014. He did not describe himself as a Lamarckian, but by revealing mechanisms of inheritance of acquired characteristics, I think he should be regarded as such. The reason he did not do so is that Lamarck could not have conceived of the processes that Waddington revealed. Incidentally, it is also true to say that Lamarck did not invent the idea of the inheritance of acquired characteristics. But, whether historically correct or not, we are stuck today with the term ‘Lamarckian’ for inheritance of a characteristic acquired through an environmental influence.

Waddington's concepts of plasticity and epigenetics have been very influential in my own thinking about experiments on cardiac rhythm. We found that the heart's pacemaker is very robust, so much so that protein mechanisms normally responsible for a large part of the rhythm could be completely blocked or deleted (Noble et al., 1992). Only very small changes in rhythm occur, because other mechanisms come into play to ensure that pacemaker activity continues. The relation between individual genes and the phenotype is therefore mediated through networks of interactions that can buffer individual gene variation, just as Waddington envisaged in his diagrams of epigenetic effects and canalisation. This is one of the reasons why I became interested in evolutionary biology many years ago, and why I have also explored ways in which evolutionary theory can be integrated with recent discoveries in molecular and physiological biology (Noble et al., 2014).

Waddington's concepts are also highly relevant to biologists interested in the ways in which organisms adapt to their environment, and to comparative biologists interested in how this varies between species. Many of the ways in which modern epigenetics plays an essential role in these fields have been described in a special issue of this journal (see overview by Knight, 2015). The discovery of epigenetic marking of DNA and its associated chromatin proteins has opened up new vistas for experimental biology.

I conclude this article with a warning: if you are inspired to try to repeat Waddington's 1956 experiment, do remember that you will fail if you try to do it on a cloned laboratory population. The mechanism depends on using a wild population with natural genetic diversity. In this respect it resembles a phenomenon first noted by James Baldwin (1896). This is that individuals in a population with the ‘correct’ allele combinations could choose a new environment and so permanently change the evolutionary development in that environment. It resembles Waddington's idea, as he himself recognised, because it does not require new mutations. More recently, Karl Popper, the great logician of science, also noted the possible importance of genetic assimilation without mutations in evolutionary theory (Niemann, 2014; Noble, 2014). Popper and Waddington had both taken part in discussions on evolutionary biology during the 1930s and 1940s when the field of molecular biology was still developing (Niemann, 2014).

While celebrating the recent rapid rise in epigenetics research (see Hoppeler, 2015; Knight, 2015), let's also celebrate the father of epigenetics, Conrad Waddington, who opened our eyes to the rich opportunities of adaptation through epigenetic regulation.

Epigenetic inheritance versus plasticity

An appreciation of the role of chromatin as a carrier of epigenetic information that can propagate active and silent activity states during cell division came from the study of different biological processes and model organisms. These include, to name but a few, heterochromatin inheritance in yeast; X-chromosome inactivation (the process by which one of the copies of the female X chromosome is silenced) or genomic imprinting (the parent-of-origin-specific repression of certain genes) in mammals; vernalization (the induction of flowering by exposure to prolonged cold during winter) in plants; and position effect variegation (the silencing of a gene in some cells through its abnormal juxtaposition to heterochromatin) in Drosophila. These studies demonstrated that differentially expressed states can be transmitted across cell divisions, once they are established and in the absence of the original signal. Studies of cellular reprogramming in the germline and early embryogenesis 19,20,21,22 , during induced pluripotency (iPS) 23,24 , or upon somatic nuclear transfer 25,26 have shown that chromatin and DNA methylation act as important ‘epigenetic barriers’ (Fig. 1) that prevent changes in gene expression and cell identity.

Epigenetic systems (Box 1) include heterochromatin (HP1 and H3K9me3 (trimethylation of histone 3 lysine 9)), Polycomb (PRC1 and PRC2) and Trithorax (COMPASS (complex proteins associated with SET1)) complexes. These complexes are thought to perpetuate functional responses by modifying histone proteins in chromatin and by binding their own histone marks in order to convey stable inheritance. Indeed, nucleosomes are subject to constant remodelling, histones are exchanged and all DNA and histone marks discovered so far are reversible, although the rates of exchange and the stability of the marks vary in different genomic domains 27 . Therefore, most regulatory signals would be rapidly lost in the absence of tight self-reinforcing loops that maintain the memory of the chromatin state 28 . Furthermore, the inheritance of epigenetic marks through cell division requires that they survive DNA replication and mitosis (Fig. 2). This is particularly relevant for histone modifications, because nucleosomes do not have a DNA template-based duplication system. Deposition of parental H3 and H4 histones occurs within a few hundred base pairs of their pre-replication position and, upon replication, they are roughly equally distributed to the leading and the lagging strand daughter DNA molecules, through the action of dedicated molecular complexes 29,30 . Chromatin maturation factors, including DNMT1–UHRF1, EZH2 and HP1, use the proliferating cell nuclear antigen (PCNA; a DNA clamp that is essential for replication) or origin recognition complex (ORC) proteins as tethering components 31,32,33,34 (Fig. 2a). In addition, Polycomb components utilize their DNA-anchoring factors to propagate mitotic memory. Loss of the target DNA sequence elements results in loss of PcG proteins and of gene silencing within a few cell divisions in Drosophila 35,36 , although sequence-independent propagation of silencing can be maintained in mammalian cell culture 37 .
Mitotic retention of regulatory components (Fig. 2c), including transcription factors and some of the epigenetic machineries described above 38,39 , has been well-documented in recent years 40,41 . Inheritance through meiosis is also possible at least to some extent, as shown by the ability of maternally deposited H3K27me3 to control DNA methylation-independent imprinting 42,43 . An additional possibility is that only a fraction of the marks can be meiotically transmitted, but this might be sufficient to reconstruct chromatin organization in the subsequent generation 44 .

a, DNA replication during the S phase of the cell cycle is a challenge to the maintenance of nucleosome marks. Epigenetic components, such as HMTs and UHRF1, interact with components of the DNA replication machinery, such as the PCNA clamp, in order to reconstitute chromatin domains after the passage of the fork. The case of DNA methylation is depicted schematically. Newly replicated DNA is unmethylated (empty lollipops; the methylated template DNA strand is not shown here for simplicity). The UHRF1/DNMT1 complex associated with PCNA facilitates remethylation of hemimethylated DNA after DNA replication. b, Both constitutive (involving H3K9 methylases and HP1) and facultative (involving PRC1 and PRC2) heterochromatin, as well as euchromatic features (involving an interplay between PRC1, PRC2, Trithorax/COMPASS and ATP-dependent chromatin remodelling complexes), are stably maintained during interphase in order to prevent genes from inappropriately switching their functional states. SWI/SNF is a nucleosome remodelling complex. c, During mitosis, most chromosome-associated factors are evicted during chromosome condensation, but ‘mitotic bookmarking’ of genes is achieved by the maintenance of key components (such as certain transcription factors or RNA polymerase III) bound to their target loci.

Owing to the lack of a precise ‘replication’ process for parental nucleosomes and to the loss of many DNA-binding factors and chromatin-associated components during mitosis and meiosis, the inheritance of single nucleosome marks poses specific challenges 28 . Mathematical modelling and biological evidence suggest that chromatin heritability requires the establishment of domains of several or even hundreds of kilobases in size 45,46,47 . Indeed, the genome is now known to be hierarchically organized in a series of 3D structures, starting from nucleosome clutches, to chromatin loops, to chromosomal domains called topologically associating domains (TADs), and finally to active or repressive compartments and chromosome territories 15,46,48,49,50 . TADs and compartments might stabilize functional states and drive their own inheritance. Furthermore, multiple epigenetic machineries often act together to stabilize heritable states. For example, PRC2 collaborates with PRC1 complexes and DNA methylation is sustained by heterochromatin proteins and/or small RNA pathways 51 . In summary, epigenetic inheritance can involve multiple layers and usually entails the cooperation of partially overlapping signals, initially dependent on DNA sequence (elicited by transcription factor binding or RNA-mediated mechanisms). Each of these layers adds a degree of stability, but each of them is also reversible, allowing plasticity in the presence of regulatory cues 47,52 . The inheritance of chromatin states in the absence of chromatin domains, or without self-reinforcing mechanisms, is more challenging 28 . This might require retention of transcription factors, histone variants and histone modifiers during DNA replication and mitotic bookmarking 53 .
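Why a domain of many nucleosomes is easier to inherit than a single mark can be illustrated with a deliberately crude simulation. It is a sketch under stated assumptions, not one of the published models cited above: each parental nucleosome reaches a given daughter with probability 0.5, new histones arrive unmarked, and a hypothetical read-write loop re-marks nucleosomes at a rate proportional to the marked fraction of the domain, while marks are also lost at a small rate. The parameter names and values (`gain`, `loss`, `restore_steps`) are invented for the example.

```python
import random


def divide_and_restore(domain, rng, restore_steps=8, gain=1.0, loss=0.02):
    """One cell cycle for one daughter: dilute marks, then try to restore."""
    # Replication: each parental nucleosome is kept with probability 0.5;
    # otherwise a new, unmarked nucleosome takes its place.
    domain = [mark if rng.random() < 0.5 else 0 for mark in domain]
    for _ in range(restore_steps):
        marked_fraction = sum(domain) / len(domain)
        # Read-write feedback: existing marks recruit the writer.
        write_prob = min(1.0, gain * marked_fraction)
        updated = []
        for mark in domain:
            if mark:
                updated.append(0 if rng.random() < loss else 1)  # turnover
            else:
                updated.append(1 if rng.random() < write_prob else 0)
        domain = updated
    return domain


def memory_retained(n_nucleosomes, divisions, trials, seed=0):
    """Fraction of lineages still mostly marked after repeated divisions."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(trials):
        domain = [1] * n_nucleosomes  # start from a fully marked domain
        for _ in range(divisions):
            domain = divide_and_restore(domain, rng)
        if sum(domain) / n_nucleosomes > 0.5:
            kept += 1
    return kept / trials


if __name__ == "__main__":
    for size in (1, 3, 10, 60):
        print(size, "nucleosomes:", memory_retained(size, 10, 200))
```

In this toy model a lone marked nucleosome has nothing to template its restoration once it is lost at replication, whereas a large domain almost always keeps enough marked neighbours for the feedback loop to rebuild the state before the next division.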

Epigenetics: The Science of Change

For nearly a century after the term “epigenetics” first surfaced on the printed page, researchers, physicians, and others poked around in the dark crevices of the gene, trying to untangle the clues that suggested gene function could be altered by more than just changes in sequence. Today, a wide variety of illnesses, behaviors, and other health indicators already have some level of evidence linking them with epigenetic mechanisms, including cancers of almost all types, cognitive dysfunction, and respiratory, cardiovascular, reproductive, autoimmune, and neurobehavioral illnesses. Known or suspected drivers behind epigenetic processes include heavy metals, pesticides, diesel exhaust, tobacco smoke, polycyclic aromatic hydrocarbons, hormones, radioactivity, viruses, bacteria, and basic nutrients.

In the past five years, and especially in the past year or two, several groundbreaking studies have focused fresh attention on epigenetics. Interest has been enhanced as it has become clear that understanding epigenetics and epigenomics—the genomewide distribution of epigenetic changes—will be essential in work related to many other topics requiring a thorough understanding of all aspects of genetics, such as stem cells, cloning, aging, synthetic biology, species conservation, evolution, and agriculture.

Multiple Mechanisms

The word “epigenetic” literally means “in addition to changes in genetic sequence.” The term has evolved to include any process that alters gene activity without changing the DNA sequence, and leads to modifications that can be transmitted to daughter cells (although experiments show that some epigenetic changes can be reversed). There likely will continue to be debate over exactly what the term means and what it covers.

Many types of epigenetic processes have been identified—they include methylation, acetylation, phosphorylation, ubiquitylation, and sumoylation. Other epigenetic mechanisms and considerations are likely to surface as work proceeds. Epigenetic processes are natural and essential to many organism functions, but if they occur improperly, there can be major adverse health and behavioral effects.

Perhaps the best known epigenetic process, in part because it has been easiest to study with existing technology, is DNA methylation. This is the addition or removal of a methyl group (CH3), predominantly at cytosine bases that are immediately followed by a guanine (CpG sites). DNA methylation was first confirmed to occur in human cancer in 1983, and has since been observed in many other illnesses and health conditions.

Another significant epigenetic process is chromatin modification. Chromatin is the complex of proteins (histones) and DNA that is tightly bundled to fit into the nucleus. The complex can be modified by substances such as acetyl groups (the process called acetylation), enzymes, and some forms of RNA such as microRNAs and small interfering RNAs. This modification alters chromatin structure to influence gene expression. In general, tightly folded chromatin tends to be shut down, or not expressed, while more open chromatin is functional, or expressed.

One effect of such processes is imprinting. In genetics, imprinting describes the condition where one of the two alleles of a typical gene pair is silenced by an epigenetic process such as methylation or acetylation. This becomes a problem if the expressed allele is damaged or contains a variant that increases the organism’s vulnerability to microbes, toxic agents, or other harmful substances. Imprinting was first identified in 1910 in corn, and first confirmed in mammals in 1991.

Researchers have identified about 80 human genes that can be imprinted, although that number is subject to debate since the strength of the evidence varies. That approximate number isn’t likely to rise much in years to come, writes a team including Ian Morison, a senior research fellow in the Cancer Genetics Laboratory at New Zealand’s University of Otago, in the August 2005 Trends in Genetics. Others in the field disagree. Randy Jirtle, a professor of radiation oncology at Duke University Medical Center, and his colleagues estimated in the June 2005 issue of Genome Research that there could be about 600 imprinted genes in mice; in an October 2005 interview Jirtle said he’s anticipating a similar tally for humans, even though the known imprintable genes of mice and people have an overlap of only about 35%.

Links to Disease

Among all the epigenetics research conducted so far, the most extensively studied disease is cancer, and the evidence linking epigenetic processes with cancer is becoming “extremely compelling,” says Peter Jones, director of the University of Southern California’s Norris Comprehensive Cancer Center. Halfway around the world, Toshikazu Ushijima is of the same mind. The chief of the Carcinogenesis Division of Japan’s National Cancer Center Research Institute says epigenetic mechanisms are one of the five most important considerations in the cancer field, and they account for one-third to one-half of known genetic alterations.

Many other health issues have drawn attention. Epigenetic immune system effects occur, and can be reversed, according to research published in the November–December 2005 issue of the Journal of Proteome Research by Nilamadhab Mishra, an assistant professor of rheumatology at the Wake Forest University School of Medicine, and his colleagues. The team says it’s the first to establish a specific link between aberrant histone modification and mechanisms underlying lupus-like symptoms in mice, and they confirmed that a drug in the research stage, trichostatin A, could reverse the modifications. The drug appears to reset the aberrant histone modification by correcting hypoacetylation at two histone sites.

Lupus has also been a focus of Bruce Richardson, chief of the Rheumatology Section at the Ann Arbor Veterans Affairs Medical Center and a professor at the University of Michigan Medical School. In studies published in the May–August 2004 issue of International Reviews of Immunology and the October 2003 issue of Clinical Immunology, he noted that pharmaceuticals such as the heart drug procainamide and the antihypertensive agent hydralazine cause lupus in some people, and demonstrated that lupus-like disease in mice exposed to these drugs is linked with DNA methylation alterations and interruption of signaling pathways similar to those in people.

Substantial Changes

Most epigenetic modification, by whatever mechanism, is believed to be erased with each new generation, during gametogenesis and after fertilization. However, one of the more startling reports published in 2005 challenges this belief and suggests that epigenetic changes may endure in at least four subsequent generations of organisms.

Michael Skinner, a professor of molecular biosciences and director of the Center for Reproductive Biology at Washington State University, and his team described in the 3 June 2005 issue of Science how they briefly exposed pregnant rats to relatively high levels of either the insecticide methoxychlor or the fungicide vinclozolin, and documented effects such as decreased sperm production and increased male infertility in the male pups. Digging for more information, they found altered DNA methylation of two genes. As they continued the experiment, they discovered the adverse effects lasted in about 90% of the males in all four subsequent generations they followed, with no additional pesticide exposures.

The findings are not known to have been reproduced. If they are reproducible, however, it could “provide a new paradigm for disease etiology and basic mechanisms in toxicology and evolution not previously appreciated,” says Skinner. He and his colleagues are conducting follow-up studies, assessing many other genes and looking at other effects such as breast and skin tumors, kidney degeneration, and blood defects.

Other studies have found that epigenetic effects occur not just in the womb, but over the full course of a human life span. Manel Esteller, director of the Cancer Epigenetics Laboratory at the Spanish National Cancer Center in Madrid, and his colleagues evaluated 40 pairs of identical twins, ranging in age from 3 to 74, and found a striking trend, described in the 26 July 2005 issue of Proceedings of the National Academy of Sciences. Younger twin pairs and those who shared similar lifestyles and spent more years together had very similar DNA methylation and histone acetylation patterns. But older twins, especially those who had different lifestyles and had spent fewer years of their lives together, had much different patterns in many different tissues, such as lymphocytes, epithelial mouth cells, intra-abdominal fat, and selected muscles.

As one example, the researchers found four times as many differentially expressed genes between a pair of 50-year-old twins compared to 3-year-old twins, and the 50-year-old twin with more DNA hypomethylation and histone hyperacetylation (the epigenetic changes usually associated with transcriptional activity) had the higher number of overexpressed genes. The degree of epigenetic change therefore was directly linked with the degree of change in genetic function.

Sometimes the effects of epigenetic mechanisms show up in living color. Changes in the pigmentation of mouse pup fur, ranging from yellow to brown, were directly tied to supplementation of the pregnant mother’s diet with vitamin B12, folic acid, choline, and betaine, according to studies by Jirtle and Robert Waterland published in August 2003 (issue 15) in Molecular and Cellular Biology. The color changes were directly linked to alterations in DNA methylation. In a study forthcoming in the April 2006 issue of EHP, Jirtle and his colleagues also induced these alterations through maternal ingestion of genistein, the major phytoestrogen in soy, at doses comparable to those a human might receive from a high-soy diet. The methylation changes furthermore appeared to protect the mouse offspring against obesity in adulthood, although there are hints that genistein may also cause health problems, via additive or synergistic effects on DNA methylation, when it interacts with other substances such as folic acid.

Other Drivers of Change

Substances aren’t the only sources of epigenetic changes. The licking, grooming, and nursing methods that mother rats use with their pups can affect the long-term behavior of their offspring, and those results can be tied to changes in DNA methylation and histone acetylation at a glucocorticoid receptor gene promoter in the pup’s hippocampus. This finding was published in the August 2004 issue of Nature Neuroscience by Moshe Szyf, a professor in McGill University’s Department of Pharmacology and Therapeutics, and his colleagues. In the same study, the researchers found that the effects weren’t written in stone; giving the drug trichostatin A to older pups could help reverse the effects of poor maternal care received when they were younger. In the 6 June 2003 Journal of Biological Chemistry and the 23 November 2005 Journal of Neuroscience, Szyf and many of the same colleagues also demonstrated that giving the amino acid L-methionine to older pups could negate the benefits of high-quality maternal care received when they were younger.

Along with behavior, mental health may be affected by epigenetic changes, says Arturas Petronis, head of the Krembil Family Epigenetics Laboratory at the Centre for Addiction and Mental Health in Toronto. His lab is among the first in the world, and still one of only a few, to study links between epigenetics and psychiatry. He and his colleagues are conducting large-scale studies investigating links between schizophrenia and aberrant methylation, and he says understanding epigenetic mechanisms is one of the highest priorities in human disease biology research. “We really need some radical revision of key principles of the traditional genetic research program,” he says. “Epigenetics brings a new perspective on the old problem and new analytical tools that will help to test the epigenetic theory.” He suggests that more emphasis is needed on studying non-Mendelian processes in diseases such as schizophrenia, asthma, multiple sclerosis, and diabetes.

The past decade has also been productive in developing strong links between aberrant DNA methylation and aging, says Jean-Pierre Issa, a professor of medicine at The University of Texas M.D. Anderson Cancer Center. He presented information on aging and epigenetic effects at a November 2005 conference titled “Environmental Epigenomics, Imprinting, and Disease Susceptibility,” held in Durham, North Carolina, and sponsored in part by the NIEHS. Some of the strongest, decade-old evidence shows progressive increases in DNA methylation in aging colon tissues, and more recent evidence links hypermethylation with atherosclerosis. Altered, age-related methylation has also been found in tissues in the stomach, esophagus, liver, kidney, and bladder, as well as the tissue types studied by Esteller. Much of Issa’s current work focuses on the links between epigenetic processes, aging, the environment, and cancer, and possible ways to therapeutically reverse methylation linked with cancer.

Current and Future Quandaries

The accumulated evidence indicates that many genes, diseases, and environmental substances are part of the epigenetics picture. However, the evidence is still far too thin to form a basis for any overarching theories about which substances and which target genes are most likely to mediate adverse effects of the environment on diseases, says Melanie Ehrlich, a biochemistry professor at the Tulane University School of Medicine and Tulane Cancer Center who has been conducting research on the topic for more than two decades.

That sense of uncertainty generally leaves epigenetics out of the regulatory picture. “It’s [too early] to actually use it at the moment,” says Julian Preston, acting associate director for health at the EPA’s National Health and Environmental Effects Research Laboratory. But Preston says the agency already relies more on its improving understanding of mechanistic processes, including epigenetics, and there is a clear effort within the EPA to expand genomics efforts both within the agency and with others with whom the agency works.

At the FDA, scientists are investigating many drugs that function through epigenetic mechanisms (although as spokeswoman Christine Parker notes, the agency bases its approvals on results of clinical trials, not consideration of the mechanism by which a drug works). One such drug, azacitidine, has been approved for use in the United States to treat myelodysplastic syndrome, a blood disease that can progress to leukemia. The drug turns on genes that had been shut off by methylation. The drug’s epigenetic function doesn’t make it a “miracle drug,” however. Trials indicate it benefits only 15% of those who take it, and a high percentage of people suffer serious side effects, including nausea (71%), anemia (70%), vomiting (54%), and fever (52%).

Ehrlich points out that azacitidine also has effects at the molecular level—such as inhibiting DNA replication and apoptosis—that may be part of its therapeutic benefits. The drug’s mixed results might also be explained in part by a study published in the October 2004 issue of Cancer Cell by Andrew Feinberg, director of the Johns Hopkins University Center for Epigenetics in Common Human Disease, and his colleagues. They found that each of two tested drugs, trichostatin A and 5-aza-2′-deoxycytidine (which is related to azacitidine), can turn on hundreds of genes while also turning off hundreds of others. If that finding holds in other studies, it suggests one key reason why it is so difficult to create a drug that doesn’t cause unintended side effects.

Public and Private

Despite the potentially huge role that epigenetics may play in human disease, investment in this area of study remains tiny compared to that devoted to traditional genetics work. Several efforts to change that are under way.

In Europe, the Human Epigenome Project was officially launched in 2003 by the Wellcome Trust Sanger Institute, Epigenomics AG, and the Centre National de Génotypage. The group’s focus is on DNA methylation research tied to chromosomes 6, 13, 20, and 22. They may be joined soon by organizations in Germany and India, where scientists plan to work on chromosomes 21 and X, respectively, says Sanger senior investigator Stephan Beck.

But comprehensively studying all the epigenetic and epigenomic factors related to a multitude of diseases and health conditions will take much more work. “A [comprehensive] Human Epigenome Project is a lot more complicated than a Human Genome Project,” Jones says. “There’s only one genome, [but] an epigenome varies in each and every tissue.” The Human Genome Project was a worldwide effort that took more than a decade and billions of dollars to complete.

Jones and Robert Martienssen addressed some of the complexities of a comprehensive, worldwide Human Epigenome Project in the 15 December 2005 issue of Cancer Research. Reporting on a June 2005 workshop convened by the American Association for Cancer Research, they concluded that, despite all the looming difficulties, such a project is essential, and the technology is sufficiently advanced to begin.

“I think it’s going to happen a lot sooner than I thought just a year or so ago,” Jirtle says. A group of researchers has already started the footwork to launch a U.S. complement to the European Human Epigenome Project effort [see box, p. A165].

Other efforts are gaining ground. Another European group, the Epigenome Network of Excellence, took off in June 2004. This information exchange network includes members in the public and private sectors spread throughout ten Western European countries. Their objectives are to coordinate research, provide mentors, and encourage dialogue via their website. And in Asia, a conference held 7–10 November 2005 in Tokyo, “Genome-Wide Epigenetics 2005,” was dedicated in large part to facilitating a coordinated epigenomics research effort in Japan and possibly all of Asia, says Ushijima, one of the conference’s organizers.

In the United States, the National Cancer Institute and the National Human Genome Research Institute formally kicked off a major effort 13 December 2005 that will include epigenomic work. The pilot project of The Cancer Genome Atlas, funded by $50 million each from the two institutes, is designed to lay the groundwork for comprehensive study of genomic factors related to human cancer. The initial three-year effort is expected to focus on just two or three of the more than 200 cancers known to exist, but if it’s successful in developing methods and technologies, the number of cancers evaluated could then expand. If a high number of cancer genes are eventually scrutinized, the effort would be the equivalent of thousands of Human Genome Projects.

To help push the boundaries further, the NIEHS and the National Cancer Institute are in the midst of awarding grants totaling $3.75 million to study a wide range of epigenetic topics, such as identification of high-risk populations, dietary influences on cancer, and detailed study of numerous specific mechanisms linking environmental agents with epigenetic mechanisms and resulting disease. The dozen or so recipients are expected to launch their projects by fall 2006.

The NIEHS has also begun to integrate epigenomics projects into its research portfolio over the past five to six years. “It’s an emerging area that’s very important,” says Frederick Tyson, a program administrator in the NIEHS Division of Extramural Research and Training. And epigenetics is likely to be one of the half dozen or so most important considerations as NIEHS proceeds with its Environmental Genome Project, according to institute director David Schwartz.

The DNA Methylation Society, a professional group, has been growing slowly but steadily over the past decade, says founder and current vice president Ehrlich. As part of its efforts, the society launched a journal, Epigenetics, in January 2006 with the goal of covering a full spectrum of epigenetic considerations (medical, nutritional, psychological, behavioral) in any organism. Such groups are a valuable rallying point for this field, Jirtle says. He himself slowly worked his way into epigenetics from an initial cancer focus, and his segue is typical of many. “If you study epigenetics, you don’t have a home; we come from all different fields,” he says.

Interest in the private sector is also picking up. For instance, Epigenomics AG, with offices in Berlin and Seattle, is working on early detection and diagnosis of cancer and endometriosis (for which there is limited evidence of an epigenetic component), as well as development of products to predict effectiveness of drugs to treat these diseases. Founded in 1998, and now with about 150 employees, the company is focusing on DNA methylation mechanisms, and is working with companies such as Abbott Laboratories, Johnson & Johnson, Philip Morris, Roche Diagnostics, Pfizer, and AstraZeneca. CEO Oliver Schacht says the surging interest in this field is typified by the difference between the 2004 American Association for Cancer Research conference, which had half a dozen or so talks or posters on epigenetics, and the 2005 event, which had about 200.

Tool Time

If epigenetic work is to continue breaking new ground, many observers say technology will need to continue advancing. Jones and Martienssen note in their paper that there must be additional improvements in high-throughput technologies, analytical techniques, computational capability, mechanistic studies, and bioinformatic strategies. They also say there is a need for basics such as standardized reagents and a consistent supply of antibodies for testing.

Preston agrees with many of these ideas, and says there is also a need to develop a comprehensive tally of all proteins in the cell and to get better protein modification information. He says universities are recognizing the demand for the talents needed to solve epigenomics problems, and are increasing their efforts to cover these topics in various ways, especially at the graduate school level.

Other groups are doing their part by creating tools to further the field. All the imprinted genes identified so far are tracked in complementary efforts by Morison’s and Jirtle’s groups and the Mammalian Genetics Unit of the U.K. Medical Research Council. The European managers of the DNA Methylation Database have assembled a compendium of known DNA methylations that, although not comprehensive, still provides a useful tool for researchers investigating the roughly 22,000 human genes.

Kunio Shiota, a professor of cellular biochemistry at the University of Tokyo and one of the co-organizers of the November 2005 Tokyo conference, says epigenetic advances will rely in part on a range of processes that are slowly becoming familiar to more researchers—massively parallel signature sequencing (MPSS), chromatin immunoprecipitation microarray analysis (ChIP-chip), DNA adenine methyltransferase identification (Dam-ID), protein binding microarrays (PBM), DNA immunoprecipitation microarray analysis (DIP-chip), and more. Someday, he says, these terms could become fully as familiar as MRI and EKG.

The rapidly growing acceptance of epigenetics, a century after it first surfaced, is a huge step forward, in Jirtle’s opinion. “We’ve done virtually nothing so far,” he says. “I’m biased, but the tip of the iceberg is genomics and single-nucleotide polymorphisms. The bottom of the iceberg is epigenetics.”


A. Genetic Susceptibility

1. Genome-wide Association Studies

The past decade has seen an explosion of studies seeking to understand how genetic variability contributes to disease. These population-based studies, known as genome-wide association studies, seek to detect genetic variants—most commonly single nucleotide polymorphisms (SNPs)—that are associated with complex traits in populations (e.g., susceptibility to cancer). 30 Recent studies of humans and dogs with osteosarcoma revealed multiple SNPs associated with risk for the development of osteosarcoma. 31,32

Numerous studies associating common genetic variants with osteosarcoma risk have been published in the past 15 years. 33� While in these studies risk SNPs have been linked to biological pathways with known relevance to osteosarcomagenesis, their statistical power has been limited by a small sample size because of the rarity of this cancer type. A recently published study sought to overcome such limitations through an international collaborative effort comparing genotypes of 941 human osteosarcoma cases with those of 3291 controls. Data from this study demonstrated a significant association of 3 SNPs with osteosarcoma risk. The first (rs1906953; P = 8.1 × 10^-9) is located within intron 7 of the glutamate receptor metabotropic 4 (GRM4) gene at 6p21.3. 31 GRM4 plays a role in cyclic AMP signaling, which has been linked to osteosarcoma in a number of studies, 40,41 indicating its plausible ability to confer osteosarcoma risk. The locus maps to a DNase I hypersensitivity region in the Encyclopedia of DNA Elements data set, suggesting that it may contain active regulatory elements. The second and third SNPs (rs7591996 and rs10208273; P = 1.0 × 10^-8 and 2.9 × 10^-7, respectively) are located in the gene desert at 2p25.2. While neither of these lead SNPs was associated with regulatory elements or transcription factor binding sites in the Encyclopedia of DNA Elements data set, several surrogate SNPs occurred within transcription factor binding sites or altered known regulatory motifs. 31
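
As a rough illustration of the kind of case-control comparison that underlies such an association, the sketch below runs a simple allelic chi-square test on a 2 × 2 table of allele counts. The group sizes (941 cases, 3291 controls) are taken from the study described above; the allele counts, the function name allelic_association, and the choice of an uncorrected Pearson test are assumptions made purely for illustration and are not the study's data or its actual statistical model.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, chi-square, p-value) for a 2x2 allele-count table.

    Uses the Pearson chi-square statistic with 1 degree of freedom and no
    continuity correction (an illustrative choice, not the study's method).
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    chi2 = numerator / denominator
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi2, p_value


# Two alleles per person; group sizes from the text.
CASE_ALLELES = 2 * 941
CTRL_ALLELES = 2 * 3291

# HYPOTHETICAL risk-allele counts, invented for this sketch only.
case_alt = 320
ctrl_alt = 850

odds_ratio, chi2, p_value = allelic_association(
    case_alt, CASE_ALLELES - case_alt, ctrl_alt, CTRL_ALLELES - ctrl_alt
)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi2:.1f}, p = {p_value:.2e}")
```

Real genome-wide studies typically adjust for covariates and population structure and apply a stringent multiple-testing threshold, so a table-based test like this is only a first approximation.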

Pet dogs develop osteosarcoma that shares many features with the human disease, including tumor histology, gene expression, response to chemotherapy, and risk for pulmonary metastasis. 42 Accordingly, the dog with osteosarcoma provides a valuable model for the study of cancer-associated genes, drug development, and prognostic markers. A recently published genome-wide association study sought to identify risk loci for osteosarcoma in 3 dog breeds at high risk for osteosarcoma. The study included 286 greyhounds, 135 Rottweilers, and 141 Irish wolfhounds, with relatively equal numbers of cases and controls for each breed. The study identified 33 inherited risk loci accounting for 55�% of phenotype variance within a breed. The SNP with the strongest association with osteosarcoma development in greyhounds was located 150 kilobases upstream of the CDKN2A/B genes, which are known to play a key role in osteosarcoma development and progression (see section III, B, 3, a). The top SNP in Rottweilers and Irish wolfhounds alters an evolutionarily constrained enhancer element that was active in human osteosarcoma cells. Loci among all breeds were enriched for genes with key functions in bone differentiation and development. 32

2. Genetic Syndromes Associated with Osteosarcoma

Increased risk of osteosarcoma is associated with a number of well-defined genetic syndromes: hereditary retinoblastoma (germline mutation of the Rb gene), Li-Fraumeni syndrome (germline mutation of the p53 gene), Bloom syndrome (germline mutation of the RECQL2 gene), Werner syndrome (germline mutation of the RECQL3 gene), and Rothmund-Thomson syndrome (germline mutation of the RECQL4 gene). 43 Many of these genes and pathways are commonly altered by somatic mutations in osteosarcoma tumors, although the mechanism of mutation is often distinct from these germline mutations.

B. Somatic Genetic Alterations in Osteosarcoma

As stated above, the rarity of osteosarcoma within the population makes comprehensive genetic and genomic analyses difficult. Numerous studies, often investigating only a subset of common genetic mutations in relatively small patient or cell line cohorts, have been reported. Not surprisingly, such studies make it difficult to confidently assess the frequency and functional consequences of the investigated genetic abnormalities in osteosarcoma. Accordingly, mutation frequencies often are reported as a range identified from multiple publications. Of note, the Therapeutically Applicable Research To Generate Effective Treatments Osteosarcoma Project is currently in progress. This is a large-scale, multi-institutional collaborative effort to comprehensively identify genetic and epigenetic aberrations in osteosarcoma using a combination of genomic approaches. The results of this study are expected to be published within the next year and should shed new light on the genetic and epigenetic drivers of osteosarcoma. This effort will move osteosarcoma into the postgenomic era, allowing for subsequent interrogations of genetic contributions to osteosarcoma to be completed in silico, a luxury available in the study of other more common cancers but not yet possible for osteosarcoma.

1. Genetic Heterogeneity

As is described in detail below, the mutational landscape of osteosarcoma is highly complex and varies significantly between tumors. 5 This high degree of intertumor heterogeneity confounds our understanding of the molecular pathogenesis of osteosarcoma and may explain some of the difficulty in identifying therapeutic agents that are likely to improve outcomes for the spectrum of patients with osteosarcoma.

2. Chromosomal Abnormalities

A hallmark of osteosarcoma is chromosomal instability (CIN), 44,45 a form of genome-wide alteration characterized by a high degree of losses and gains of full chromosomes or chromosomal segments. 46,47 CIN has been shown to result from a loss of function in cell cycle checkpoint and DNA damage response pathways. 48,49 As described below, these pathways can be dysregulated in osteosarcoma via both genetic and epigenetic mechanisms. Aberrant maintenance of telomeres through a mechanism known as alternative lengthening of telomeres also has been shown to result in CIN in osteosarcoma. 50,51

Unlike many other sarcomas, osteosarcoma lacks a canonical translocation or genetic mutation. 4,5,52� Rather, osteosarcoma is a cancer typified by widespread and heterogeneous abnormalities in chromosomal number and substructure. Osteosarcoma ploidy can range from haploidy to hexaploidy. 52,58 While myriad chromosomal losses/gains have been identified, chromosome 1 is most often gained and chromosomes 9, 10, 13, and 17 are most often lost. 52,58 The most common copy number alterations are deletions of portions of chromosomes 3, 6, 9, 10, 13, 17, and 18 and amplifications of portions of chromosomes 1, 6, 8, and 17. These regions encode a number of tumor suppressors and oncogenes, respectively. 5

3. Tumor Suppressors

a. Rb Pathway

Rb is a critical regulator of the G1-to-S cell cycle transition. In the absence of mitogenic stimuli, Rb remains dephosphorylated and binds to E2F family transcription factors, preventing their activation of cell cycle progression. During normal mitosis, this is reversed via Rb phosphorylation by CDK4. Loss-of-function Rb mutations remove this cell cycle checkpoint. 59 The CDKN2A locus (also known as INK4A) encodes 2 functionally and structurally distinct genes via alternative splicing. The first, p16 INK4a , is a negative regulator of CDK4. The second, p14 ARF , is a key regulator of p53 (see below). Loss of p16 INK4a function alleviates negative regulation of CDK4, resulting in Rb inactivation. 60 Thus, mutations in the CDKN2A gene can phenocopy loss-of-function Rb mutations.
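
The regulatory logic in the paragraph above can be sketched as a tiny Boolean model. This is a deliberately simplified illustration: the function name e2f_is_free, the three yes/no inputs, and the rule that CDK4 is active whenever mitogens are present or p16 INK4a is lost are assumptions made for this sketch, not a quantitative description of the pathway.

```python
def e2f_is_free(rb_functional, p16_functional, mitogen_present):
    """Return True if E2F is free to drive the G1-to-S transition.

    Simplifying assumptions (illustrative only):
    - CDK4 is active when mitogens are present or its inhibitor p16INK4a is lost.
    - Rb restrains E2F only when Rb is intact and not phosphorylated by CDK4.
    """
    cdk4_active = mitogen_present or not p16_functional
    rb_restrains_e2f = rb_functional and not cdk4_active
    return not rb_restrains_e2f


# Compare three genotypes in the absence of mitogenic stimuli.
scenarios = {
    "normal cell": (True, True),
    "Rb loss of function": (False, True),
    "p16INK4a (CDKN2A) loss": (True, False),
}
for label, (rb_ok, p16_ok) in scenarios.items():
    free = e2f_is_free(rb_ok, p16_ok, mitogen_present=False)
    print(f"{label}: E2F free without mitogens = {free}")
```

In this toy model, losing either Rb or p16 INK4a frees E2F even without mitogenic stimuli, which is the sense in which CDKN2A mutations can phenocopy loss-of-function Rb mutations.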

Loss-of-function Rb mutations occur in up to 70% of osteosarcoma cases; 61-64 the most common is loss of heterozygosity. 62,65,66 Other types of Rb mutations include structural rearrangements and point mutations. 61�,67� In one study, 70% of patients possessed deletions or rearrangements in the CDKN2A gene with the potential to reduce the expression or function of p16 INK4a . 57,70�

b. p53 Pathway

p53 is a transcription factor that regulates critical genes in DNA damage response, cell cycle progression, and apoptosis pathways. 74 p53 acts as a tumor suppressor in essentially all tumor types, and its function can be affected by mutations to the gene itself or by mutations to up- or downstream mediators of its activity. 75 p14 ARF normally acts to sequester the E3 ubiquitin ligase MDM2 in the nucleolus, preventing it from promoting p53 degradation. 74 p14 ARF is expressed from the same CDKN2A locus that encodes p16 INK4a (see above). 60 Similar to p16 INK4a in the Rb pathway, loss-of-function mutations in the p14 ARF gene can phenocopy mutations to TP53. 74

Loss-of-function TP53 mutations occur in as many as three-fourths of osteosarcoma cases. 5 These mutations include allelic loss (75�%), rearrangements (10�%), and point mutations (20–30%). 76� A recent study demonstrated that 9.5% of young patients (<30 years of age) with sporadic osteosarcoma carried either a rare germline TP53 exonic variant or the canonical Li-Fraumeni mutation, but that these variants are absent from patients who develop osteosarcoma later in life. 84 As stated above, as many as 70% of osteosarcoma tumors harbor mutations with the potential to affect p14 ARF expression or function and, therefore, alter p53 function. 57,70�

c. Other Tumor Suppressors

Other tumor suppressors associated with deletions or loss of heterozygosity in osteosarcoma include APC, BUB3, FGFR2, LSAMP, RECQL4, and WWOX. 65,85�

4. Oncogenes

a. Rb Pathway

E2F3 and CDK4, both of which counteract Rb control of cell cycle progression, have been estimated to possess gain-of-function mutations in 60% and 10% of tumors, respectively. 88,98,99

b. p53 Pathway

MDM2 is an E3 ubiquitin ligase that acts as a negative regulator of p53 (see above). The MDM2 gene is amplified in 3�% of osteosarcoma tumors. 61,99� COPS3 also promotes proteasomal degradation of p53 and is estimated to possess gain-of-function mutations in 20�% of osteosarcomas. 92,94,103�

c. c-Myc

c-Myc is a key transcription factor that acts as a general amplifier of gene expression, enhancing the transcription of essentially all genes with active promoters in a given cell, 107,108 and is a well-described oncogene with gained function in most tumor types. 109 c-Myc is amplified in 7�% of osteosarcoma tumors 88,94,103,110� and overexpressed in at least 34% of tumors. 115,116

d. Other Oncogenes

Other oncogenes associated with amplifications in osteosarcoma include CDC5L, MAPK7, MET, PIM1, PMP22, PRIM1, RUNX2, and VEGFA. 85,88,94,98,104,114,117� Collectively, the near-ubiquitous alteration of Rb and p53 pathway function in osteosarcoma, through both gain- and loss-of-function mutations, indicates that loss of cell cycle control and inappropriate DNA damage response are key drivers of osteosarcoma development. The role that these genetic alterations play in tumor progression and metastasis, however, remains less clear.

C. Mechanisms of Genetic Aberration in Osteosarcoma

As described above, osteosarcoma tumors display a compendium of genetic abnormalities with a high degree of intertumor heterogeneity. While certain genes are commonly altered across tumors, the most common genetic characteristic of osteosarcoma tumors is the remarkable breadth of genetic changes relative to normal tissue. Like the mutations themselves, the mechanisms by which these genetic alterations are acquired are likely to represent a broad spectrum both within and across tumors. Classically defined modes of genetic mutation are known to occur in osteosarcoma. For example, point mutations are likely the result of errors in DNA replication and subsequent proofreading, whereas aneuploidy is the result of errors in chromosomal segregation during cell division. 126� In addition to these well-defined modes of genetic mutation, a novel mechanism of mutation acquisition known as chromothripsis has recently been identified. This term describes a phenomenon by which tens to hundreds of genomic rearrangements occur during cancer development in a one-off cellular crisis. This occurs through reciprocal exchange of genetic material within or between chromosomes. In contrast to the gradual mode of accumulated genetic aberrations in cancer cells acquired through singular mutational events and subsequent Darwinian clonal selection, this model posits “punctuated equilibrium” as the primary mode of tumor evolution. In their landmark paper, Stephens et al. 6 demonstrated that chromothripsis occurs in at least 2–3% of all cancers and approximately 33% of osteosarcoma tumors.

Epigenetic inheritance

It is clear that at least some epigenetic modifications are heritable, passed from parents to offspring in a phenomenon that is generally referred to as epigenetic inheritance, or passed down through multiple generations via transgenerational epigenetic inheritance. The mechanism by which epigenetic information is inherited is unclear; however, it is known that this information, because it is not captured in the DNA sequence, is not passed on by the same mechanism as that used for typical genetic information. Typical genetic information is encoded in the sequences of nucleotides that make up the DNA; this information is therefore passed from generation to generation as faithfully as the DNA replication process is accurate. Many epigenetic modifications, in fact, are spontaneously “erased” or “reset” when cells reproduce (whether by meiosis or mitosis), thereby precluding their inheritance.

Epigenetics: Definition & Examples

Epigenetics literally means "above" or "on top of" genetics. It refers to external modifications to DNA that turn genes "on" or "off." These modifications do not change the DNA sequence, but instead, they affect how cells "read" genes.

Examples of epigenetics

Epigenetic changes alter the physical structure of DNA. One example of an epigenetic change is DNA methylation — the addition of a methyl group, or a "chemical cap," to part of the DNA molecule, which prevents certain genes from being expressed.

Another example is histone modification. Histones are proteins that DNA wraps around. (Without histones, DNA would be too long to fit inside cells.) If histones squeeze DNA tightly, the DNA cannot be "read" by the cell. Modifications that relax the histones can make the DNA accessible to proteins that "read" genes.

Epigenetics is the reason why a skin cell looks different from a brain cell or a muscle cell. All three cells contain the same DNA, but their genes are expressed differently (turned "on" or "off"), which creates the different cell types.

Epigenetic inheritance

It may be possible to pass down epigenetic changes to future generations if the changes occur in sperm or egg cells. Most epigenetic changes that occur in sperm and egg cells get erased when the two combine to form a fertilized egg, in a process called "reprogramming." This reprogramming allows the cells of the fetus to "start from scratch" and make their own epigenetic changes. But scientists think some of the epigenetic changes in parents' sperm and egg cells may avoid the reprogramming process, and make it through to the next generation. If this is true, things like the food a person eats before they conceive could affect their future child. However, this has not been proven in people.

Epigenetics and cancer

Scientists now think epigenetics can play a role in the development of some cancers. For instance, an epigenetic change that silences a tumor suppressor gene — such as a gene that keeps the growth of the cell in check — could lead to uncontrolled cellular growth. Another example might be an epigenetic change that "turns off" genes that help repair damaged DNA, leading to an increase in DNA damage, which in turn, increases cancer risk.


C.D.A. and T.J. are grateful to all of our laboratory members, past and present, and to our scientific colleagues in the field for helping us 'write' the history that we review here. Their hard work, insights and passion for the field of epigenetics have made the last 20 years such an enjoyable ride. We thank P. Jones (Grand Rapids, Michigan, USA) and A. Tarakhovsky (New York, USA) for giving us feedback on this manuscript and M. Onishi-Seebacher (Freiburg, Germany) for help with the figure preparations and reference listings. Our objective in this article is to be more reflective than comprehensive, and admittedly, we have brought forward our personal views. We ask for understanding from those colleagues whose important contributions could not be explicitly mentioned.

Researchers in epigenetics, genetics, systems biology, and computational biology with a background in molecular biology, genetics

Section II: Where Am I? Genomic Features and DNA Sequence Principles Defining Sites of Epigenetic Regulation: Machine Learning

Chapter 1. Computational Identification of Polycomb/Trithorax Response Elements

  • Abstract
  • Introduction to Polycomb/Trithorax Response Elements
  • 2003 Ad Hoc Approach to PRE Prediction, Together with its Particular Motivations
  • Evaluating Classification Performance
  • Results of 2003 PRE Prediction
  • New Motifs Discovered
  • Recasting PRE Prediction as a Machine Learning Problem
  • Misclassification Costs and the Trade-off Dimension
  • Evolutionary Analysis and Search-Space Reduction
  • Today: Genome-Wide Profiling Data
  • How Good is Our Method When Evaluated Under These Data?
  • Sensitivity and Specificity of Genome-Wide Profiling
  • Conclusion
  • References
  • Glossary
  • List of Acronyms and Abbreviations

Chapter 2. Modeling Chromatin States

  • Abstract
  • The Purpose of Modeling Chromatin States
  • The Common Approach
  • What Have We Learned From those Models?
  • Static Versus Dynamical Models
  • Chromatin States are Attractors
  • References
  • Glossary
  • List of Acronyms and Abbreviations

Chapter 3. Crossing Borders: Modeling Approaches to Understand Chromatin Domains and Their Boundaries

  • Abstract
  • Introduction
  • The Expanding Universe of Structural Domains
  • Experimental Techniques
  • Modeling Higher Order Chromatin Structure
  • Formalizing Compartment Calling and Boundary Prediction
  • Outlook
  • References
  • List of Acronyms and Abbreviations

Chapter 4. Inferring Chromatin Signaling From Genome-Wide ChIP-seq Data

  • Abstract
  • Introduction
  • Experimental Techniques
  • Modeling
  • Perturbing the System
  • Outlook
  • References
  • Glossary
  • List of Acronyms and Abbreviations

Section III: Everything’s Moving: In Vivo Dynamics of Epigenetic Regulators: Kinetic Models Based on Ordinary Differential Equations

Chapter 5. “In Vivo Biochemistry”: Absolute Quantification and Kinetic Modeling Applied to Polycomb and Trithorax Regulation

  • Abstract
  • Introduction
  • Absolute Quantification In Vivo: A Technical Challenge
  • Everything’s Moving: Methods for Measuring Kinetic Parameters In Vivo
  • In Vivo Biochemistry of an Epigenetic System: What Did We Learn?
  • Mathematical Modeling of the System: Defining What We Don’t Know
  • Back to the Bench: Testing the Model by Perturbing the System
  • Outlook
  • Acknowledgments
  • References
  • Glossary
  • List of Acronyms and Abbreviations

Chapter 6. Modeling Distributive Histone Modification by Dot1 Methyltransferases: From Mechanism to Biological Insights

  • Abstract
  • Introduction
  • Modeling Histone Modification by Dot1
  • Acknowledgments
  • References
  • Glossary
  • List of Acronyms and Abbreviations
  • Appendix: Parameter Estimation and Formal Mathematical Description of the Models

Section IV: Reconciling Randomness and Precision: Bistable Epigenetic Memory and Switching: Stochastic Models

Chapter 7. Modeling Bistable Chromatin States

  • Abstract
  • Cell Memory by Chromatin-Based Epigenetics
  • Modeling Approach and Philosophy
  • Nucleosome-Mediated Epigenetics
  • DNA Methylation-Mediated Epigenetics
  • Outlook
  • References
  • Glossary
  • List of Acronyms and Abbreviations

Chapter 8. Quantitative Environmentally Triggered Switching Between Stable Epigenetic States

  • Abstract
  • Introduction
  • Memory of the Cold is Digital and is Stored Locally in the Chromatin
  • Cold Registration is Digital
  • Model Validation
  • Outlook
  • References
  • Glossary
  • List of Acronyms and Abbreviations

Section V: The Third and Fourth Dimensions: Chromosomal Long Range Interactions: Polymer Models

Chapter 9. On the Nature of Chromatin 3D Organization: Lessons From Modeling

  • Abstract
  • Introduction
  • Polymer Models of Chromatin
  • Models to Reconstruct 3D-Conformation From Contact Data
  • Conclusions
  • References
  • Glossary

Chapter 10. From Chromosome Conformation Capture to Polymer Physics and Back: Investigating the Three-Dimensional Structure of Chromatin Within Topological Associating Domains

  • Abstract
  • Introduction
  • From 5C/Hi-C Maps to Three-Dimensional Structures
  • Description of the Data-Driven Polymer Model
  • What can be Learned From Data-Driven, Polymer-Based Reconstructions of Chromosome Structure?
  • What are the Molecular Mechanisms Behind Interaction Energies?
  • Conclusions and Perspectives
  • References
  • Glossary
  • List of Acronyms and Abbreviations

Chapter 11. A Combination Approach Based on Live-Cell Imaging and Computational Modeling to Further Our Understanding of Chromatin, Epigenetics, and the Genome

  • Abstract
  • Introduction: Genomic DNA, Nucleosomes, and Chromatin
  • Experimental Techniques: Single-Nucleosome Imaging in Living Cells Using Super-Resolution Microscopy
  • Modeling: Monte Carlo Simulation of Chromatin Dynamics
  • Another Result From the Model: Physical Size of Transcription Complexes
  • Perspective
  • References
  • Glossary
  • List of Acronyms and Abbreviations

Chapter 12. Capturing Chromosome Structural Properties From Their Spatial and Temporal Fluctuations


In eukaryotic cells, DNA is packed as chromatin whose functional units are nucleosomes. Each nucleosome is composed of an octamer of four core histones (H3, H4, H2A, and H2B), around which is wrapped 147 base pairs of DNA [1]. The globular regions of the histones form the core of the nucleosome, while the N-terminal tails protrude from the nucleosomes and are enriched with a variety of posttranslational modifications (PTMs). PTMs can also occur on the lateral surface of the nucleosome core regions of histones that are in contact with the DNA [2], with both tail and core modifications influencing the chromatin structure by altering the net charge of histones, by altering inter-nucleosomal interactions, and by facilitating the recruitment of specific proteins such as bromo-, chromo-, Tudor, PWWP, MBT, and PHD domain-containing proteins [1].

Histone modifications, and the enzymes implementing them, can contribute to chromatin compaction, nucleosome dynamics, and transcription. These modifications can be implemented in response to intrinsic and external stimuli. Dysregulation of these processes can shift the balance of gene expression and is therefore frequently observed in human cancers, arising through gain or loss of function, overexpression, suppression by promoter hypermethylation, chromosomal translocation, or mutations of the histone-modifying enzymes/complexes or even of the modification site on the histone [2,3,4]. Indeed, chromatin-bound proteins are among the most frequently mutated targets in cancer [5]. The dysregulation of certain chromatin-associated proteins may act as a driver in certain types of cancer [6, 7]. Consequently, abnormal cellular proliferation, invasion, metastasis, and chemoresistance may occur during disease progression [8]. However, much remains to be learned in order to define the roles of histone modifications and their enzymatic machinery in development and disease.

This review focuses on recent progress in our understanding of histone modifications in mammals, highlighting the mechanisms of PTMs in cancer, the new assays, techniques, and inhibitors now available for fine-mapping the modifications genome-wide, and their potential use in the treatment of cancers. We will define which marks are epigenetic, and why and how the balance between different modifications is maintained for proper regulation of gene expression. We will also address histone modifications as biomarkers of cancer progression and/or prognosis.

Histone modifications, modifiers, and their functions in development and cancers

Transcriptional activation and repression are controlled by an array of histone modifiers and chromatin-bound proteins. A balance between specific modifications and modifiers is maintained at the steady state of the cell to preserve the chromatin structure, execute the proper gene expression program, and control the biological outcome (Fig. 1). Once the balance is disrupted, cell phenotypes may be altered and primed for disease onset and progression [9,10,11]. Therefore, understanding the functions of the key regulators of histone modifications will help us to develop chemical probes that maintain homeostasis and restore the balanced state of the cell (Fig. 2).

Fig. 1 Balanced states of transcription maintained by the versatile chromatin proteins and histone modifications. The balanced states of transcription are maintained by the chromatin modifiers and histone modifications. The histone-modifying enzymes are depicted as apples (activation) and oranges (repression) in the two weighing pans, respectively. The chromatin states are maintained and balanced by a number of activation marks and repression marks. Histone marks highlighted in bold are considered to be hallmarks of euchromatin (H4K16ac) and heterochromatin (H3K9me3 and H3K27me3), respectively.

Fig. 2 Pharmacological restoration of the epigenetic balance of gene expression in human cancers. (a) MLL translocation and SEC promote leukemogenesis in MLL-rearranged leukemia. Enhancing wild-type MLL1 recruitment to chromatin by hijacking the IL1/IRAK4 and CKII/taspase1 pathways displaces the MLL chimera and SEC and inhibits leukemogenesis. (b) MLL3 mutation in the PHD leads to the loss of function of MLL3/COMPASS and decreased enhancer H3K4 methylation. EZH2 inhibition by small molecules (e.g., GSK-126) inhibits EZH2 enzymatic activity and decreases H3K27 methylation to restore tumor suppressor gene expression. (c) H3K27M mutation leads to a global increase of H3K27 acetylation and aberrant gene expression. Inhibition of BRD4 by small molecules (e.g., JQ-1) displaces the protein from chromatin, restores normal-like gene expression, and inhibits DIPG progression.

We primarily focus this review on the methylation, acetylation, and ubiquitination PTMs associated with development and cancer. Other types of modifications, including those that are newly identified, will also be briefly discussed towards the end of the review. The major types of histone modifications, on tails or within the nucleosome core, that are discussed in this review are summarized in Table 1.


Histone methylation is a dynamic process with key roles in development and differentiation [30, 31]. For instance, H3K4 methyltransferases play crucial roles in Hox gene regulation during development [32, 33]. Aberrant levels of histone methylation are likely to play a causal role in tumorigenesis. The outcomes of methylation on histones are highly context dependent and can be associated with different gene expression states. Histone methylation is intimately associated with transcriptional regulation by influencing chromatin architecture, recruiting transcription factors, interacting with initiation and elongation factors, and affecting RNA processing [34].

Histone methylation takes place on the side-chain nitrogen atoms of both lysine and arginine residues, most heavily on histone H3 followed by H4 [35]. Multiple methylation states exist for both lysine and arginine methylation, and these can elicit different outcomes for transcriptional regulation. Lysine can be mono-, di-, or trimethylated by six major classes of histone lysine methyltransferase complexes (KMT1-6) [36]. The KMT1 family contains at least four members in mammals, including SUV39H1/2, G9a, GLP, and SETDB1, with H3K9 as the substrate for methylation [37, 38]. The KMT2 family enzymes are found within the macromolecular complex called complex of proteins associated with Set1 (COMPASS) and deposit mono-, di-, or trimethyl marks on H3K4 [16,17,18]. The KMT3 family contains NSD1, NSD2 (WHSC1), and NSD3 (WHSC1L1) and primarily methylates H3K36 [39]. The KMT4 family has DOT1L as the sole member, which implements H3K79 methylation [24, 40]. The KMT5 family comprises PR-Set7 and SUV4-20H1/2, which implement H4K20 monomethylation and di-/trimethylation, respectively [41]. The KMT6 family includes the functionally redundant enzymes EZH1 and EZH2, which carry out H3K27 mono-, di-, and trimethylation [23].

Lysine methylation has been known to be a reversible process since the discovery of the lysine demethylase LSD1 [42]. There are at least six families of histone lysine demethylases with both unique and overlapping functions. The KDM1 family includes LSD1 (KDM1A) and LSD2 (KDM1B), both of which can demethylate H3K4me2/me1 but not H3K4me3 [42, 43]. Moreover, LSD1 can also demethylate H3K9 by switching from its repressive complex, in which it interacts with CoREST, to an activating complex, in which it interacts with the androgen receptor (AR) [19, 44,45,46]. All other families of lysine demethylases harbor the Jumonji (JmjC) domain, which, owing to the different chemistry involved, has the potential to remove the trimethyl mark, unlike the LSD family. JHDM1A (KDM2A) and JHDM1B (KDM2B) belong to the KDM2 family, with activities towards H3K36me2/me1 and H3K4me3 [19]. JHDM1A was the first JmjC domain-containing demethylase identified [47]. The KDM3 family comprises KDM3A, KDM3B, and JMJD1C, with demethylase activities for H3K9me2/me1 [19]. The KDM4 family includes KDM4A, KDM4B, KDM4C, and KDM4D, with diverse demethylase activities towards H3K9me3/me2 and H3K36me3/me2. The KDM5 family contains KDM5A, KDM5B, KDM5C, and KDM5D, all of which can demethylate H3K4me3/me2. The KDM6 family includes UTX (KDM6A), JMJD3 (KDM6B), and UTY. UTX and JMJD3 are specific for H3K27me3/me2, while the Y-linked paralog, UTY, has little catalytic activity. Several of the KDMs have been considered contributing factors in the development of multiple cancers and have thus been postulated to be potential drug targets. KDM inhibitors could be valuable both as tools for elucidating the cellular functions of KDMs and as potential therapeutics [48,49,50].
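
The writer and eraser families described in the two paragraphs above can be collected into a small lookup structure. The following Python sketch is illustrative only: the dictionary names and the helper `families_for` are hypothetical, the entries are restricted to the residue-level assignments stated in the text, and methylation states (me1/me2/me3) are deliberately ignored. An empty result means only that no family is named for that residue in this passage.

```python
# Residue-level summary of the lysine methyltransferase (KMT) and
# demethylase (KDM) families as described in the text above.
# Methylation states (me1/me2/me3) are ignored in this simplified view.
KMT_SUBSTRATES = {
    "KMT1": {"H3K9"},
    "KMT2": {"H3K4"},
    "KMT3": {"H3K36"},
    "KMT4": {"H3K79"},
    "KMT5": {"H4K20"},
    "KMT6": {"H3K27"},
}

KDM_SUBSTRATES = {
    "KDM1": {"H3K4", "H3K9"},  # H3K9 activity is described for LSD1 with AR
    "KDM2": {"H3K36", "H3K4"},
    "KDM3": {"H3K9"},
    "KDM4": {"H3K9", "H3K36"},
    "KDM5": {"H3K4"},
    "KDM6": {"H3K27"},
}


def families_for(residue, table):
    """Return the sorted family names in `table` that act on `residue`."""
    return sorted(name for name, subs in table.items() if residue in subs)


if __name__ == "__main__":
    for residue in ("H3K4", "H3K9", "H3K27", "H3K36", "H3K79", "H4K20"):
        print(residue,
              "writers:", families_for(residue, KMT_SUBSTRATES),
              "erasers:", families_for(residue, KDM_SUBSTRATES))
```

Keeping writers and erasers in separate tables makes it easy to ask which residues have a named writer but no named eraser in this summary.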

The best-characterized methylation marks on lysine residues associated with transcriptional activation include H3K4 [51], H3K36 [39], and H3K79 [24], whereas methylations associated with transcriptional repression occur on H3K9 [37], H4K20 [41], and H3K27 [52] (Fig. 1). Notably, the co-occurrence of large regions of H3K27 methylation harboring smaller regions of H3K4 methylation constitutes the “bivalent domains,” which are thought to be important for maintaining pluripotency by silencing developmental genes in embryonic stem cells (ESCs) while keeping them poised for activation during development [53,54,55]. Altering the balance of these histone modifications may perturb gene expression and contribute to the pathogenesis of cancers [9, 10].
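
As a rough illustration of the grouping above, the following Python sketch labels a genomic region from the set of methylated lysine residues detected in it. It is a deliberately coarse, rule-based toy: the function name `classify_region` and the labels are hypothetical, residues are treated without regard to methylation state, and the context dependence of marks such as H4K20 is ignored.

```python
# Activation- and repression-associated methylated residues, as grouped
# in the text above (residue level only; a simplification).
ACTIVE = {"H3K4", "H3K36", "H3K79"}
REPRESSIVE = {"H3K9", "H4K20", "H3K27"}


def classify_region(methylated_residues):
    """Label a region from the methylated residues observed in it."""
    marks = set(methylated_residues)
    unknown = marks - ACTIVE - REPRESSIVE
    if unknown:
        raise ValueError(f"unrecognized residues: {sorted(unknown)}")
    if not marks:
        return "unmarked"
    # Co-occurrence of H3K27 and H3K4 methylation defines a bivalent domain.
    if {"H3K4", "H3K27"} <= marks:
        return "bivalent"
    if marks <= ACTIVE:
        return "active"
    if marks <= REPRESSIVE:
        return "repressive"
    return "mixed"


if __name__ == "__main__":
    print(classify_region(["H3K4", "H3K27"]))
    print(classify_region(["H3K4", "H3K36"]))
```

The bivalent check runs before the active and repressive checks so that a region carrying both H3K4 and H3K27 methylation is never reported as simply "mixed".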

Histone H3K4 methylation is implemented by methyltransferases in the COMPASS family, including SET1A, SET1B, and MLL1-4, at enhancers and promoters [16, 54, 56,57,58,59,60]. Different subunits of COMPASS have also been shown to regulate H3K4 di- and/or trimethylation, including WDR5, Ash2L, RbBP5, and Dpy30, which are subunits shared by all COMPASS family members [61, 62]. SET1A and SET1B primarily trimethylate histone H3K4 at promoters [16, 63], although the majority of SET1B is localized in the cytoplasm [57]. Interestingly, the oncogenic function of SET1A has been implicated in breast cancer metastasis, lung cancer, and colorectal cancer tumorigenesis through methylation of both histones and the non-histone substrate YAP [64, 65]. MLL1 and MLL2 implement di- and trimethylation at promoters and/or Polycomb response elements (PRE), and MLL2 can also methylate H3K4 at both promoters of bivalent genes and enhancers [17, 54, 59]. Interestingly, reconstitution of the MLL1 SET domain with the WRAD complex allows it to mono-, di-, and trimethylate H3K4 in vitro [61, 66], although the monomethylation activity of MLL1 in vivo has not been demonstrated so far. MLL3 and MLL4 are capable of monomethylating H3K4 at enhancers [67]. The methylation kinetics of the MLL1 core complex in in vitro reconstitution assays suggest that di- or trimethylation by SET1A, SET1B, MLL1, and MLL2 may not require prior monomethylation by MLL3 and MLL4. This is also supported by the distinct genomic localization of the different COMPASS methyltransferases observed in ChIP-seq of these factors [58, 59, 67, 68].

Although structures of the COMPASS family of H3K4 methyltransferases have been resolved recently [61, 69, 70], small molecule inhibitors that directly inhibit their enzymatic activities are still unavailable. Such inhibitors would not only serve as molecular tools to dissect detailed functions but also contribute to the clinical treatment of various cancers with aberrant activities or expression of COMPASS methyltransferases. In addition to the well-studied methyltransferase activity of the COMPASS family, recent efforts have been devoted to investigating the catalytic-independent roles of COMPASS methyltransferases (and the same approach could be applied to other types of histone modifiers) [58, 71,72,73,74]. For instance, the requirement of SET1A in ESC proliferation and self-renewal is unaffected by removal of the catalytic SET domain, while the SET domain is required for proper differentiation [58]. Likewise, SET1B, independent of its SET domain, is essential for suppressing ADIPOR1 signaling in the cytoplasm, thereby eliciting a tumorigenic effect [57]. Given the importance of SET1B-ADIPOR1 signaling in triple negative breast cancer (TNBC), the ADIPOR1 agonist AdipoRon has been proposed as a novel therapeutic strategy for the clinical treatment of TNBC [57].

MLL1 is frequently mutated through translocation with other oncogenic partners in acute myeloid and lymphoid leukemia (AML and ALL), accounting for 80% of childhood leukemia and 5–10% of adult leukemia [75]. The chimeric proteins lack the catalytic SET domain of MLL1 and drive leukemogenesis. Recently, we identified strategies for the treatment of MLL-rearranged leukemia via stabilizing the wild-type copy of MLL to attenuate the aberrant transcription mediated by MLL fusion proteins and their oncogenic co-factor, the super elongation complex (SEC) [76, 77] (Fig. 2a). These studies also indicate that not only the catalytic activities but also the protein levels and protein turnover determine the outcome of their activities. Nonetheless, completely eliminating the oncogenic fusion proteins remains a difficult challenge in MLL-rearranged leukemia.

MLL3 and MLL4 are both found to be highly mutated in cancer [4, 10, 18, 78]. MLL3 has a mutation hot spot at the plant homeodomain (PHD) cluster, whereas MLL4 mutations are more evenly distributed throughout the protein [10, 18]. Our recent study documented that mutations within the MLL3 PHD cluster disrupt its interaction with the BAP1 tumor suppressor and correlate with poor patient survival [10]. Since MLL3 and MLL4 catalytic activity is dispensable for development and enhancer RNA synthesis [72, 73], it will be important to investigate both the catalytic and the non-catalytic tumor suppressor roles of these proteins.

The histone H3K4me3 mark can help recruit CHD1 [79] and BPTF [80], chromatin remodelers that can help open chromatin. In addition, our laboratory discovered that BRWD2/PHIP may recognize H3K4 methylation marks through a CryptoTudor domain adjacent to a bromodomain, suggesting synergy between acetylation and methylation in transcription regulation by this protein [81]. Pharmacologically targeting the catalytic activity of COMPASS methyltransferases, the protein-protein interactions (PPI) between key COMPASS subunits, or the binding of proteins to methylated H3K4 can each be harnessed to further the understanding of downstream events and open new therapeutic approaches for cancer treatment. PPI disruptors of the Menin-MLL interaction, namely MI-463, MI-503, and M-525 [82, 83], and OICR-9429 for the WDR5-MLL interaction [84], have been developed with the hope of treating MLL-rearranged and CEBPA mutant leukemia. A complete list of compounds discussed in this review is provided in Table 2.

Histone H3K36me3 is detected in the body of actively transcribed genes due to the association of the enzyme SET2 with the phosphorylated form of the CTD of RNA Pol II [39]. The function of H3K36me2, implemented by ASH1L and the NSD1-3 family, is less well understood. Recently, a potential crosstalk between H3K4me3 and H3K36me2 was shown to occur with LEDGF as its hub [98]. LEDGF directly interacts with Menin and MLL1 through its integrase-binding domain and is required for MLL1-dependent transcription and leukemic transformation [99, 100]. Meanwhile, LEDGF binds to dimethylated H3K36 through its PWWP domain [98, 101]. LEDGF has drawn increasing attention since studies have shown that it is essential in MLL-rearranged leukemia, but not in hematopoiesis, which raises the possibility of targeting LEDGF effectively without general side effects in the hematopoietic system [102, 103]. Because of the multifaceted roles of LEDGF and its interactions with a plethora of proteins with divergent functions [99, 104, 105], whether its role during leukemogenesis depends on MLL1 remains to be determined. Limited success in treating MLL-rearranged leukemia has been achieved by targeting LEDGF with CP65, a cyclic peptide used for the inhibition of HIV viral replication, since the same domain on LEDGF binds to both the HIV integrase and MLL1 [92]. Degrading LEDGF using proteolysis targeting chimera (PROTAC) technology [106], which will be discussed in later sections, may be a new direction. H3K36me3 can also prevent methylation by PRC2 of the nearby H3K27 residue on the same histone tail [107].

The histone H3K79 methylation mark, which lies on the globular domain of the histone and correlates with active gene expression, is implemented by DOT1L, the only enzyme responsible for its deposition [2, 40, 108]. DOT1L is also the only enzyme catalyzing lysine methylation whose methyltransferase domain is distinct from a SET domain, and a demethylase for H3K79 has not been identified to date. DOT1L is found in a complex named DotCom with the MLL translocation partners AF9, or its paralog ENL, and AF10 [109]. DOT1L activity also promotes breast cancer cell proliferation and metastasis [110]. The aberrant upregulation of H3K79 methylation in leukemia [111] led to the development of the DOT1L inhibitor EPZ-5676 for the treatment of MLL-rearranged leukemia [85], which is currently under clinical investigation [86].

Histone H3K9 and H3K27 methylations are required for the formation of distinct forms of heterochromatin [112]. Histone H3K9me3 and H3K27me3 have been proposed to be the only true “epigenetic marks” since they have defined mechanisms for being inherited after DNA replication [112]. The deposition machineries of H3K9me3 and H3K27me3 share a distinct “write-and-read” mechanism, in which the enzymatic activity and the ability to bind and recognize the modification reside within the same enzyme or enzyme complex, thus allowing for a positive feedback loop. For H3K9me3, SUV39H1 contains both the read and write modules (chromodomain and SET domain, respectively) [113], and methyl-lysine recognition further promotes methylation activity [114]. The HP1 proteins HP1α (CBX5), HP1β (CBX1), and HP1γ (CBX3) contain the methyl-lysine-binding chromodomain [115] and perform important roles in heterochromatin formation. Methylation of histone H3 lysine 9 by SUV39H1 creates a binding site for HP1 proteins, which in turn recruit more SUV39H1, and this mechanism contributes to the propagation of heterochromatin [116]. In the case of H3K27me3, EZH2 implements H3K27 methylation within the PRC2 complex, while the EED subunit recognizes this methylation and allosterically further activates the SET domain of EZH2 [23, 117]. Similar to the distinct distributions of H3K4me1/2/3 deposited by the COMPASS family, the distributions of H3K27me1/2/3 throughout the genome are mutually exclusive, with H3K27me3 mainly at promoters (especially at bivalent genes), H3K27me2 at intergenic regions, and H3K27me1 in the gene bodies of actively transcribed genes [9]. Because the EZH2 and SUZ12 subunits of PRC2 are required for HP1α stability, H3K27me3 and H3K9 methylation may cooperate to maintain HP1α at chromatin, highlighting the crucial crosstalk between the H3K9me2/3 and H3K27me3 pathways of gene silencing [118].

EZH2 inhibitors are frequently used to prevent unwanted histone methylation of tumor suppressor genes when EZH2 is aberrantly expressed or mutated in cancer cells (gain of function, Y641 in the SET domain) [9, 77, 119]. Our recent study demonstrated that cancer cells harboring MLL3 PHD mutations are more sensitive to the depletion of EZH2, SUZ12, and EED in the PRC2 complex [10]. Harnessing the synthetic lethality and dependency of the MLL3-UTX-PRC2 regulatory axis is a promising stratification strategy for the therapeutic use of EZH2 inhibitors (Fig. 2b).
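
The positive feedback implied by the “write-and-read” mechanism described above can be illustrated with a toy simulation. The Python sketch below is not a model taken from the cited studies: it assumes a one-dimensional array of nucleosomes, a single nucleation site, and a hypothetical per-step probability `p_spread` that a marked nucleosome causes each unmarked neighbor to be marked. Mark removal, histone turnover, and dilution at replication are all ignored.

```python
import random


def spread_mark(n_nucleosomes, nucleation_site, steps,
                p_spread=0.5, read_write=True, seed=0):
    """Toy 1D model of a repressive mark spreading by read-write feedback.

    A marked nucleosome is 'read' by the writer, which then marks each
    unmarked direct neighbor with probability `p_spread` per step.
    With `read_write=False` the writer acts only at the nucleation site,
    so the mark never spreads. All parameters are illustrative.
    """
    rng = random.Random(seed)
    marked = [False] * n_nucleosomes
    marked[nucleation_site] = True
    for _ in range(steps):
        nxt = marked[:]
        if read_write:
            for i, is_marked in enumerate(marked):
                if not is_marked:
                    continue
                for j in (i - 1, i + 1):
                    in_range = 0 <= j < n_nucleosomes
                    if in_range and not marked[j] and rng.random() < p_spread:
                        nxt[j] = True
        marked = nxt
    return marked


if __name__ == "__main__":
    with_feedback = spread_mark(21, 10, steps=5, p_spread=0.5)
    without = spread_mark(21, 10, steps=5, read_write=False)
    print("with feedback:   ", "".join("M" if m else "." for m in with_feedback))
    print("without feedback:", "".join("M" if m else "." for m in without))
```

With `p_spread=1.0` the domain grows by exactly one nucleosome per step in each direction, which makes the feedback easy to verify by hand; smaller values give stochastic domain sizes that depend on the seed.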

In addition to the frequent mutations of a broad spectrum of histone modifiers, several mutations on histone tails (H3K27M, H3K36M, and H3G34V/R) have been found to be associated with tumorigenesis in different types of cancers [120]. A common feature of the mutated “oncohistones” is that they all impede the deposition of the proper histone modification at the mutation site, or at surrounding residues in the case of the H3G34 mutation, leading to transcriptional reprogramming and tumorigenesis [120]. A recurrent single-nucleotide substitution resulting in H3.3K27M has been discovered in diffuse intrinsic pontine gliomas (DIPGs), accompanied by a global loss of H3K27me3 and reduced PRC2 catalytic activity but higher levels of H3K27 acetylation, making these tumors promising candidates for BRD4 inhibition therapy [9, 87] (Fig. 2c). The histone H3K36M mutation is found in chondroblastoma, head and neck squamous cell carcinoma, and colorectal cancer, while H3G34V/R mutations have been found in both glioma and bone cancers [120]. Given the limited progress in understanding the roles of these mutated histones in cancer development, there is an unmet need for comprehensive synthetic lethality studies to investigate whether tumors bearing certain mutations are more dependent on certain signaling pathways, so that potential therapeutic strategies can be explored and treatment regimens tailored more effectively to patients.

Methylation of H4K20 is associated with both transcriptional activation and repression, depending on the methylation state. H4K20me1, catalyzed by PR-Set7, is associated with activation and with marking origins of DNA replication [121, 122]. On the other hand, H4K20me2/3, catalyzed by SUV4-20H1/2, is associated with repression of transcription by maintaining pericentric and telomeric heterochromatin [121]. H4K20me2/3 can enhance chromatin condensation in vitro [25]. Loss of H4K20me3 has been described as a hallmark of cancer [26]. Dynamic regulation of H4K20 methylation was recently reported in C. elegans, where DPY-21, a member of a new subfamily of Jumonji C (JmjC) histone demethylases, was found to convert H4K20me2 to H4K20me1 to control the higher-order structure of the two X chromosomes of hermaphrodites, promote chromosome compaction, and repress gene expression [27]. Whether the human counterpart, RSBN1, has a role in the reduction of H4K20me3 in human cancer remains to be investigated.

In addition to the versatile states of lysine methylation, arginine residues can also be modified via monomethylation and symmetric and asymmetric dimethylation (MMA, SDMA, and ADMA) by a subset of protein arginine methyltransferases (PRMTs) including PRMT1, CARM1, PRMT5, and PRMT6 [123, 124]. The removal of arginine methylation can occur through deimination to citrulline by PADI4 [21] (please refer to section “Other types of histone modifications” for further discussion). PRMTs methylate not only histone tails but also a large number of non-histone substrates [123]. This should be taken into consideration when interpreting studies using PRMT inhibitors, since the outcomes may arise from effects on numerous signaling pathways regulated by the substrates of a particular PRMT member. Nevertheless, success has been achieved in developing specific inhibitors of CARM1/PRMT4 for the treatment of multiple myeloma [125]; CARM1 deposits the H3R17me2a and H3R26me2a marks involved in transcriptional activation [123].

Despite the tremendous progress made in discovering the families of histone methyltransferases and demethylases and the mutations of histones in cancer, there is still much to be learned about the biological roles of these proteins and their interplay in different developmental stages and disease settings.


Acetylation is a reversible modification on the ε-amino group of lysine residues that is controlled by two groups of enzymes: histone acetyltransferases (HATs) [126] and histone deacetylases (HDACs) [91]. There are three major families of HATs in humans that are well-studied including GNAT (HAT1, GCN5, PCAF), MYST (Tip60, MOF, MOZ, MORF, HBO1), and p300/CBP [127]. Notably, HATs can also catalyze the acetylation of a broad range of non-histone proteins including tumor suppressors and oncogenes, namely p53, Rb, and Myc to regulate protein stability, DNA binding, protein-protein interaction, enzymatic activity, or protein localization [89]. Acetylation of the histone tails neutralizes the positively charged lysines, which has been suggested to disrupt the interaction between the tail and the negatively charged nucleosomal DNA to facilitate opening of chromatin to promote active transcription. Acetylated lysines on chromatin can also promote open chromatin by being bound by a variety of bromodomain-containing transcription factors, including those in chromatin remodeling complexes such as the BAF complex [128, 129].

The well-conserved mark H4K16ac reduces chromatin compaction in vitro [130] and is associated with more open chromatin in vivo [131]. Genetic studies in Drosophila have shown that when H4K16 has been changed to arginine, female flies are viable and only males die due to the special role of H4K16ac in promoting X chromosomal dosage compensation. Reduced H4K16ac is associated with a variety of cancers [26, 126] and may in some cases have prognostic value [28].

Acetylation on H3K27 is prominent at active promoters and, together with p300 and H3K4me1, marks active enhancers [29, 132]. Histone H3K27ac is deposited by CBP/p300 and serves in part to counteract Polycomb silencing, since acetylation precludes methylation by PRC2 at this site [133]. Acetylation not only affects the charge and promotes structural changes of chromatin, but the acetyl group also functions as a signal recognized by bromodomain (BRD)-containing proteins (acetyl-lysine binding proteins) such as the bromodomain and extraterminal domain (BET) proteins BRD2, BRD3, and BRD4 [128]. Mutations, aberrant expression, and gene fusions involving these proteins have been found, implicating them in cancer development and progression [22, 128, 134].

Deacetylation of histones by the HDACs diminishes the accessibility of chromatin to transcription factors by forming a closed chromatin conformation [135]. There are 18 HDACs in mammals, divided into four major families: Class I (HDACs 1, 2, 3, and 8), which are ubiquitously expressed in human cell lines and tissues and reside in the nucleus; Class II (HDACs 4, 5, 6, 7, 9, and 10), which exhibit tissue-specific expression and can shuttle between the nucleus and cytoplasm; Class III, the sirtuins (SIRT1-7), which are NAD+-dependent and have a catalytic mechanism for deacetylation very distinct from that of the other classes; and Class IV, which has only one recently identified member, HDAC11 [136]. HDAC11 is capable of deacetylating divergent histone sites, making the substrate specificity low and functionally redundant in certain scenarios [3]. Similar to HATs, HDACs also have a number of non-histone substrates such as p53, Hsp90, TCF, and β-catenin [89].
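
The four-class grouping above can be written down as a small lookup table, which also makes it easy to confirm that the listed members add up to the 18 enzymes stated. This Python sketch is illustrative only; the names `HDAC_CLASSES`, `hdac_class`, and `total_hdacs` are hypothetical.

```python
# HDAC classes and members as listed in the paragraph above.
HDAC_CLASSES = {
    "I": ["HDAC1", "HDAC2", "HDAC3", "HDAC8"],
    "II": ["HDAC4", "HDAC5", "HDAC6", "HDAC7", "HDAC9", "HDAC10"],
    "III": [f"SIRT{i}" for i in range(1, 8)],  # sirtuins SIRT1-7
    "IV": ["HDAC11"],
}


def hdac_class(enzyme):
    """Return the class label ('I'-'IV') for a given enzyme name."""
    for label, members in HDAC_CLASSES.items():
        if enzyme in members:
            return label
    raise KeyError(f"{enzyme} is not in the table")


def total_hdacs():
    """Count all enzymes across the four classes."""
    return sum(len(members) for members in HDAC_CLASSES.values())


if __name__ == "__main__":
    print("HDAC6 belongs to class", hdac_class("HDAC6"))
    print("total enzymes:", total_hdacs())
```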

Due to the dynamic nature of histone acetylation, inhibitors targeting HDACs, HATs, and bromodomain proteins have been developed and are at different preclinical and clinical stages for cancer therapy. Overexpression of HDACs has been found in a variety of cancers, correlates with a significant decrease in both disease-free and overall survival, and predicts poor patient prognosis [136,137,138]. HDAC activity is a key mediator of survival and tumorigenic capacity, making it a compelling target in a range of different cancers, and indeed, HDAC inhibitors are the most mature epigenetic drugs developed to date. Vorinostat and romidepsin are FDA-approved HDAC inhibitors for the treatment of refractory cutaneous T cell lymphoma (CTCL), and there are many others currently at different stages of clinical assessment, most of them focused on hematological malignancies [138]. It should be noted that some HDAC inhibitors also exhibit inhibitory activity towards PI3K (CUDC-907), EGFR (CUDC-101), and other targets. This may be desirable from the clinical perspective, since dually targeting two oncogenic pathways can limit dosage and toxicity. However, it is a caveat when using these compounds to study the molecular function of HDACs, since they may exert their effects through signaling pathways other than histone deacetylation. Despite the huge success in targeting HDACs in the clinic, targeting HATs has lagged behind. C646 [139] and A-485 [90] are the only relatively potent and selective synthetic inhibitors of p300/CBP, identified on the basis of virtual screening using a p300 HAT/Lys-CoA crystal structure. Their efficacy in preclinical models needs to be rigorously established in future studies.

The BET-bromodomain proteins are extensively studied, benefiting greatly from the availability of selective inhibitors [88, 140]. The strong phenotypic changes caused by BET protein inhibition justify the discovery and development of BET inhibitors to diminish their functions in hematological and solid tumors. The BET-bromodomain-specific inhibitors JQ1 [141], I-BET [142], and I-BET151 [93] represent the initial successes of BET inhibitor development. The initial success of BET-bromodomain degraders has led to a series of studies that increase the potency of the compounds by using the PROTAC technology [106] to link the BET inhibitor moiety to a ligand that recruits an E3 ligase for degradation; these have been explored for the treatment of both hematologic disorders and solid tumors such as castration-resistant prostate cancer and triple negative breast cancer (TNBC) [94, 95, 143]. The BET degraders dBET1 and dBET6 potently and specifically target the BET bromodomain proteins for the treatment of AML and T-ALL [144, 145]. In this case, a phthalimide moiety is appended to JQ1, a competitive antagonist of BET bromodomains, and the target protein undergoes cereblon (CRBN) E3 ubiquitin ligase-dependent degradation [144]. Interestingly, studies have shown that thalidomide-targeted degradation can also be applied to selectively target the “undruggable” zinc finger (ZF) transcription factors using derivatized thalidomide analogs [96].


Monoubiquitination of histones most commonly occurs on H2A and H2B [97]. H2AK119 ubiquitination is implemented by RING1A/B in the PRC1 complex [146] and is removed by the BAP1 deubiquitinase complex [147]. H2AK119ub1 is linked with chromatin compaction and transcriptional silencing [146]. H2BK120ub1 is deposited at actively transcribed genes by the UBE2A/B (RAD6) E2 ubiquitin-conjugating enzyme and the RNF20/40 E3 ligase [12]. The presence of H2BK120ub1 is coupled with high levels of methylation on H3K4 and H3K79 [13, 15, 148]. Similar to other modifications, histone ubiquitination is linked with both transcriptional activation and silencing: it affects higher-order chromatin structure [97] and acts as a signal for subsequent histone modifications by recruiting other machineries [149].

Crosstalk among epigenetic factors occurs at two major levels. First, numerous studies have demonstrated crosstalk between different histone modifications [3, 60]. For instance, histone H2B monoubiquitination is a prerequisite for H3K4 methylation by COMPASS and for H3K79 methylation by DOT1L [60]. Conversely, H3K4 methylation by MLL/COMPASS inhibits the deposition of H3R2me2a by PRMT6, and vice versa, making the two marks mutually exclusive [3]. Second, the crosstalk between histone modifiers themselves can control normal and malignant states of cell proliferation. For instance, the BAP1 H2A deubiquitinase recruits the H3K4 monomethylase MLL3 to monomethylate gene enhancers, while disruption of the interaction between BAP1 and MLL3 contributes to the pathogenesis of multiple cancers [10]. The H3K27 demethylase UTX is also a key component of MLL3/COMPASS, and its recruitment and activity are also dependent on BAP1 to execute proper functions at enhancers [10]. Other histone demethylases are also found in large histone-modifying complexes, such as those containing KMTs and HDACs. For example, LSD1 is found in the CoREST-HDAC complex in association with HDACs, CoREST, and BHC80, and the interaction with these factors regulates its stability and activity [46].

Other types of histone modifications

Phosphorylation adds a negative charge to histone tails, thus changing the conformation of chromatin and its interactions with transcription factors. Histone H3S10 phosphorylation is a well-characterized modification associated with chromosome condensation during mitosis and is implemented by the Aurora kinases, while H3S10p implemented by the MSK/Jil1 family is involved in positive regulation of transcription [150]. Dephosphorylation of this site is mediated by PP2A and is related to repression of gene expression [150]. Phosphorylation on Serine 139 of H2AX (γH2AX) is induced by DNA damage and is an early response in DNA double-strand break signaling. Multiple kinases can mediate the phosphorylation on this particular site, including ATM, ATR, and DNA-PK [151]. Although these two modifications have been intensively used as markers of cell cycle progression and the DNA damage response, the consequences of the modifications and their downstream events remain largely unknown [20]. Serine 31 phosphorylation is unique to histone H3.3 and was originally found to be localized adjacent to centromeres in metaphase chromosomes [14]. It is also a mitosis-specific marker that differs from H3 S10P and S28P in timing and localization [14]. Banaszynski’s group recently found that H3.3 S31P promotes p300 activity and enhancer acetylation in mESCs [152].

Although most studies have focused on methylation, acetylation, ubiquitination, and phosphorylation of histones, a plethora of other histone modifications have also been reported, including lysine crotonylation, butyrylation, propionylation, tyrosine hydroxylation, biotinylation, neddylation, sumoylation, O-GlcNAc, ADP ribosylation, N-formylation, proline isomerization, and citrullination [31, 153,154,155,156].

With the use of an integrated, mass spectrometry-based proteomics approach, lysine crotonylation has been designated as a specific mark of active sex chromosome-linked genes in post-meiotic male germ cells via associating with active chromatin, including promoters and active enhancers [82]. Intriguingly, the YEATS domain proteins display high binding affinity for crotonyl-lysine, linking this modification to active transcription [157]. Two recent studies highlight the possibility of targeting YEATS domains in MLL-rearranged leukemia, potentially synergizing with BET and DOT1L inhibition [158, 159]. Besides crotonylation, butyrylation and propionylation are two other non-acetyl-lysine acylation modifications actively occupying gene promoters and exerting their functions in a similar fashion as histone acetylation [160,161,162].

Neddylation is the covalent conjugation of NEDD8, a ubiquitin-like protein, and is carried out on histone H2A by the E3 ligase RNF168. The neddylation of H2A on K119 prevents ubiquitination at this site and results in a decreased response to DNA damage, suggesting a role for the neddylation pathway in DNA damage repair [163]. In addition to histone ubiquitination and neddylation, histone H4 can also be modified with SUMO (small ubiquitin-related modifier) family proteins to mediate transcriptional repression through the recruitment of histone deacetylases and heterochromatin protein 1 (HP1) [164].

Biotinylation of lysines on histones has also been described as a rare modification [165], but it has not been widely studied and its biological significance is not well-established. Serine/threonine O-GlcNAcylation of epigenetic factors such as HCF1 and TET2 has been well-established [166, 167]. However, whether histones are modified by O-GlcNAc in vivo in mammalian cells remains debated [168, 169]. Histone ADP ribosylation occurs on all core histones and on histone H1. Despite its universal presence, its biological consequences diverge depending on the residue modified, ranging from DNA repair to replication and transcription [170]. N-formylation of lysines of histones represents a noncanonical secondary modification that arises from oxidative DNA damage [171]. Since the modification also occurs on lysine residues, it may interfere with methylation or acetylation of the same residue and contribute to the pathophysiology of oxidative and nitrosative stress. Likewise, noncovalent proline isomerization of histone H3 influences H3K36 methylation to enhance transcription [172]. Finally, deimination or citrullination of arginine residues by PADI4 antagonizes arginine methylation by converting arginine or methylarginine to the nonconventional amino acid citrulline [173, 174]. Hypercitrullination can promote chromatin decondensation [175]. Intriguingly, citrullinated histone H3 may also function as a novel prognostic marker associated with an exacerbated inflammatory response in patients with advanced cancer [176].

Overall, some of these more recently identified histone modifications could affect the conventional modifications such as methylation, acetylation, ubiquitination, and phosphorylation, either by competing for the same sites on histones or through crosstalk mediated by a conformational change, thus altering downstream signaling and gene expression regulation. The rarity of these modifications in the genome may indicate that they fine-tune the conventional modifications in response to circumstances such as DNA damage and oxidative stress. The crosstalk and biological consequences of these rare modifications need to be further characterized in future studies.

Techniques for mapping and characterizing the modifications and their genome distribution

Identification of histone modifications has been greatly aided by the development of mass spectrometric techniques [154, 177,178,179,180]. Bottom-up, middle-down, and top-down strategies each have their own advantages and challenges [181, 182]. Bottom-up mass spectrometry typically analyzes small peptides generated by trypsinization, which can provide the highest accuracy for identifying modifications. Top-down mass spectrometry attempts to identify the entire complement of modifications starting from an intact protein. Middle-down mass spectrometry analyzes larger peptides generated by enzymes that cut histones less frequently, such as Glu-C. The middle-down approach allows relatively high sensitivity compared to top-down, while still allowing identification of the complement of modifications on an entire histone tail, where the majority of histone modifications are located. By identifying which modifications occur on the same histone, potential synergistic or antagonistic effects of different modifications can be revealed [182].
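The difference in peptide size between the bottom-up and middle-down strategies can be illustrated with a small in silico digest. The following is a minimal sketch, not a description of any published pipeline: the `digest` helper and its parameters are hypothetical, the cleavage rules are simplified (trypsin cuts after K or R unless the next residue is P; Glu-C is assumed to cut only after E), missed cleavages and chemical derivatization are ignored, and the input is taken to be the first 60 residues of human histone H3.1 with the initiator methionine removed.

```python
# Illustrative input: first 60 residues of human histone H3.1
# (initiator Met removed). Treated as an assumption of this sketch.
H3_N_TERM = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTEL"


def digest(sequence, cleave_after, blocked_by=""):
    """Cut `sequence` C-terminal to every residue in `cleave_after`,
    unless the following residue is in `blocked_by` (hypothetical helper)."""
    peptides = []
    start = 0
    for i in range(len(sequence) - 1):  # never cut after the final residue
        if sequence[i] in cleave_after and sequence[i + 1] not in blocked_by:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # trailing peptide
    return peptides


# Bottom-up style digest: simplified trypsin rule (after K/R, not before P).
trypsin_peptides = digest(H3_N_TERM, cleave_after="KR", blocked_by="P")

# Middle-down style digest: simplified Glu-C rule (after E only).
gluc_peptides = digest(H3_N_TERM, cleave_after="E")

print("trypsin:", [len(p) for p in trypsin_peptides])
print("Glu-C:  ", [len(p) for p in gluc_peptides])
```

Under these simplified rules, the first Glu-C peptide spans residues 1 to 50 of the tail in one piece, whereas no tryptic peptide is longer than nine residues. This is why a middle-down digest can report which marks co-occur on the same tail, while a tryptic digest separates sites such as K9 and K27 onto different peptides.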

The successes in identifying numerous histone modifications leave the challenge of identifying the function of these modifications. In genetically tractable organisms such as yeast and fruit flies, strains have been generated in which all of the histone gene copies carry a mutation that converts a modification site to an unmodifiable residue [183,184,185]. shRNA-mediated knockdown or CRISPR/Cas9 knockout can be used to assess the function of a histone modifier. Knocking in mutations of the catalytic site of the enzyme can be used to determine whether the effects observed upon loss of the modifier are due to the loss of the histone modification or to disruption of the macromolecular, multifunctional complexes in which some of these enzymes are found.

The functional consequences of histone modifications under different conditions or perturbations can be evaluated with techniques such as RNA-seq for quantification of mature transcripts, and precision nuclear run-on sequencing (PRO-seq) [186] or native elongating transcript sequencing (NET-seq) [187] for quantification of nascent transcripts. Methylated DNA immunoprecipitation sequencing (MeDIP-seq) [188], MethylC-seq [189], and reduced representation bisulfite sequencing (RRBS-seq) [190] can be used to measure changes in DNA methylation. Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq) [191], DNase-seq [192], and Formaldehyde-Assisted Isolation of Regulatory Element sequencing (FAIRE-seq) [193] can be used to assess changes in chromatin accessibility. Advances in single-molecule detection of posttranslational modifications on nucleosomes allow the detection of combinatorial modification states and genomic positions of nucleosomes [194].

The development of enzymatic inhibitors can be challenging for a variety of histone modifiers: first, proteins within an enzyme family can share sequence and structural similarities, which can hinder the ability to obtain specific small molecule inhibitors; second, a large number of chromatin-related proteins lack druggable pockets. The aforementioned ligand-dependent protein degradation approaches, PROTAC [106], HaloPROTAC [195, 196], small molecule-assisted shutoff (SMASh) degraders [197], and dTAG [198], have all been used to route target proteins for proteasome-dependent degradation (Table 3), thus bypassing the need for an enzymatic therapeutic target [199]. The use of these technologies to degrade chromatin-related proteins will significantly advance our understanding of the roles of histone modifications and chromatin in normal biological processes, as well as aid the rational design of efficient and potent small molecules with therapeutic value.

From 2D to 4D: capturing nucleosome dynamics

Due to the dynamic nature of nucleosomes and chromatin structure, various approaches are required to explore this space. The “2D” represents the broad spectrum of histone modifications discussed in this review, acting either alone or in combination with other modifications, with synergistic, additive, or antagonistic effects, to finely regulate gene expression in a timely manner. The “3D” lies in how histone modifications affect chromatin organization, higher-order structures of chromatin, and interactions of distal regulatory elements. The 3D structure can be captured by Hi-C, a comprehensive way to measure chromatin interactions across the human genome [200]. Although histone modifications and chromatin architecture are profiled in separate assays, researchers are actively predicting and modeling features of chromatin organization, such as chromatin interaction hubs and topologically associated domain (TAD) boundaries, using cell type-specific histone marks [201, 202]. The integration of ChIP-seq and Hi-C datasets reveals important information about how chromatin organization could affect gene regulation and how chromatin architecture can be predicted from ChIP-seq data [203]. Nonetheless, an experimental method combining histone mark ChIP-seq and Hi-C would be useful to directly address these questions. Super-resolution imaging using a three-dimensional stochastic optical reconstruction microscope (3D-STORM) is another approach to capture the 3D organization of chromatin in different epigenetic states and reveal structural details of chromatin [204]. The “4D” relies on the real-time monitoring of modification dynamics. This could be achieved by using acute degradation strategies such as HaloPROTAC or auxin-inducible degron (AID) tagging of histone modifiers [205, 206], real-time visualization of chromatin modifications with confocal and structured illumination microscopy [207], and fluorescent ligand labeling for direct visualization of chromatin factors using Halo-tag [208, 209] or SNAP-tag [210]. The acute degradation strategies have clear advantages over the commonly used shRNA-mediated knockdown or CRISPR/Cas9-mediated knockout of histone modifiers, which take several days to months, so that the observed phenotypes may be due to secondary effects. The acute degradation strategies are much more specific, with fewer off-target effects, and capture early effects on chromatin when coupled with conventional ChIP-seq or ATAC-seq. For example, 60 min of auxin treatment in cells with AID tagging of PAF1 resulted in a major depletion of endogenous PAF1 protein, confirming that the release of Pol II from promoter-proximal pausing was a direct consequence of PAF1 loss [206].

Future directions

Great effort has been devoted to understanding the role of histone modifications, and of the enzymatic machinery that implements them, during development and in disease, especially cancer. Precise techniques are being developed for mapping the localization and function of histone modifications in the genome, moving from populations of cells to, hopefully, a few or even single cells. The proteins that specifically bind histone modifications translate this information into the regulation of gene expression by recruiting or removing other transcription factors. It is crucial to characterize the various functions of PTMs and their modifiers in human cancer. Nevertheless, capturing modification dynamics remains a major challenge for studying the function of histone modifications in vivo.

Development of assays and specific small molecule inhibitors (enzymatic/non-enzymatic) for targeting disease-related PTMs requires extensive knowledge based on X-ray crystal and cryo-EM structures of modifiers and modification-binding factors. The tool molecules or chemical probes will further elucidate the in vivo biological function of the key players on chromatin. The “quality” (specificity and potency) of the chemical probes and the thoughtful design of the experimental assays largely determine the outcome and interpretation of results. In addition, the identification of non-histone substrates is critical for defining the roles of the histone modifiers in order to develop more specific inhibitors targeting the desired pathway [76, 89].

Interestingly, histone modifiers often reside within large multi-protein complexes that are required for their proper function, such as the MLL/COMPASS, PRC2, and HDAC complexes. Understanding how the key enzymes function with other subunits within the same complex (e.g., through regulation of activity and stability) will inform the design of small molecule disruptors of these complexes. The MLL-menin and MLL-WDR5 inhibitors fall within this class, and other interfaces between catalytic domains and scaffolding proteins may become desirable targets for small molecule development as structural knowledge grows. Together with other approaches for targeting histone modifications, such as inhibition of enzymatic activity and PROTAC-based small molecule degraders, chromatin-related proteins and modifications are considered favorable drug targets, and a number of agents have been designed and used in different stages of clinical trials in combination with currently available chemotherapies [6].

Reducing the toxic side effects of epigenetic drugs is a challenging issue when testing these agents in clinical trials [211]. The synthetic lethal approach is currently being explored to reduce toxic off-target side effects and to combat therapy resistance by targeting multiple genes with a combination of drug treatments. Moreover, histone modifications may potentially serve as biomarkers for cancer diagnosis and as prognostic predictors [26, 212]. The ultimate goal is to translate epigenetic therapy into the clinic for the treatment of cancers and to tailor efficient strategies based on cancer types and epigenome alterations.