7.19E: Regulation of Sigma Factor Translation - Biology

Sigma factors are proteins that regulate gene expression. Their own levels are controlled at several stages, including translation.

Learning Objectives

  • Explain the regulation of sigma factor translation

Key Points

  • Sigma factor expression is often associated with environmental changes that cause changes in gene expression.
  • Translational control of sigma factors is critical to their role in regulating transcription.
  • Sigma factor translation is controlled by small noncoding RNAs that can either activate or inhibit translation.

Key Terms

  • oxidative stress: Damage caused to cells or tissue by reactive oxygen species.
  • sigma factor: A sigma factor (σ factor) is a protein needed only for initiation of RNA synthesis.
  • RpoS protein: RpoS is a central regulator of the general stress response and operates in both a retroactive and a proactive manner: not only does it allow the cell to survive environmental challenges, but it also prepares the cell for subsequent stresses (cross-protection).

Sigma factors are a family of proteins that direct RNA polymerase to promoters and thereby regulate transcription. They function in housekeeping, metabolic, and growth-regulation processes in bacteria. Sigma factor expression is often associated with environmental changes that cause changes in gene expression. Sigma factor expression is regulated at the transcriptional, translational, and post-translational levels, as dictated by the cellular environment and the presence or absence of numerous cofactors.

Bacteria encode several types of sigma factor. One of the most studied alternative sigma factors is RpoS (σS), the product of the rpoS gene. In E. coli, RpoS regulates growth phase-dependent genes, specifically those expressed in stationary phase. RpoS is central to the general stress response: it promotes survival during environmental stress and also prepares the cell for subsequent stresses. Translation is a major level at which RpoS abundance is controlled.

Translational control of sigma factors involves small noncoding RNAs (sRNAs). Taking RpoS as the example, its expression is regulated at the translational level. Several sRNAs are produced in response to environmental changes and stresses and raise RpoS protein levels by increasing the amount of rpoS mRNA that is translated.

The resulting increase in RpoS protein depends on the cellular environment and its demands. Several sRNAs regulate RpoS, including DsrA, RprA, and OxyS, which respond to low temperature (DsrA), cell surface stress (RprA), and oxidative stress (OxyS). DsrA and RprA activate rpoS translation. OxyS, by contrast, represses rpoS translation. Protein regulators also feed into this network: LeuO, a transcription factor rather than an sRNA, lowers rpoS translation indirectly by repressing dsrA. Regulation of rpoS translation is therefore complex and involves cross-signaling among numerous proteins and regulatory sRNAs.


Phosphorylation of RssB occurs at the binding interface and thus weakens its interaction with σS.

IraD binds RssB with higher affinity regardless of the phosphorylation state of RssB.

A new two-tier mechanism for controlled proteolysis of σS is proposed. RssB is fully active during logarithmic growth, promoting σS degradation. Phosphorylation on D58 partially switches degradation off, because phosphorylation reduces the affinity of RssB for σS. Binding of the anti-adaptor protein IraD to RssB prevents any RssB-σS interaction and shuts off σS degradation completely.
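The two-tier mechanism can be read as a small decision rule. The Python sketch below is only a qualitative illustration, not a kinetic model. The function name, its boolean inputs, and the three output labels are assumptions introduced here; the logic simply restates the three states named in the text.

```python
def sigma_s_degradation(rssb_phosphorylated: bool, irad_bound: bool) -> str:
    """Qualitative sigma-S degradation state under the proposed two-tier model.

    Hypothetical helper: the labels "full", "partial" and "off" are
    illustrative names, not terms defined by the source.
    """
    if irad_bound:
        # Tier 2: IraD binding prevents any RssB-sigma-S interaction.
        return "off"
    if rssb_phosphorylated:
        # Tier 1: phosphorylation on D58 lowers RssB affinity for sigma-S.
        return "partial"
    # Unphosphorylated, IraD-free RssB promotes sigma-S degradation.
    return "full"


for phos in (False, True):
    for irad in (False, True):
        print(phos, irad, sigma_s_degradation(phos, irad))
```

IraD is checked first because the text states that it binds RssB regardless of phosphorylation, so it overrides the phosphorylation tier.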


Proteomic and Unbiased Post-Translational Modification Profiling of Amyloid Plaques and Surrounding Tissue in a Transgenic Mouse Model of Alzheimer’s Disease

Correspondence to: Allan Stensballe, PhD, Department of Health Science and Technology, Aalborg University, Fredrik Bajersvej 7E, 9220 Aalborg, Denmark. Tel.: +45 6160 8786; Fax: +45 9815 4008.

Note: [1] Shared senior authorship.

Abstract: Amyloid plaques are one of the hallmarks of Alzheimer’s disease (AD). The main constituent of amyloid plaques is amyloid-β peptides, but a complex interplay of other infiltrating proteins also co-localizes. We hypothesized that proteomic analysis could reveal differences between amyloid plaques and adjacent control tissue in the transgenic mouse model of AD (APPPS1-21) and in similar regions from non-transgenic littermates. Our microproteomic strategy included isolation of regions of interest by laser capture microdissection and analysis by liquid chromatography mass spectrometry-based label-free relative quantification. We consistently identified 183, 224, and 307 proteins from amyloid plaques, adjacent control and non-tg samples, respectively. Pathway analysis revealed 27 proteins that were significantly regulated when comparing amyloid plaques and corresponding adjacent control regions. We further elucidated that co-localized proteins were subjected to post-translational modifications and are the first to report 193 and 117 unique modifications associated to amyloid plaques and adjacent control extracts, respectively. The three most common modifications detected in proteins from the amyloid plaques were oxidation, deamidation, and pyroglutamylation. Together, our data provide novel information about the biological processes occurring within and around amyloid plaques in the APPPS1-21 mouse model of AD.

Keywords: Alzheimer’s disease, amyloid plaque, mass spectrometry, microdissection, pyroglutamate

Journal: Journal of Alzheimer's Disease, vol. 73, no. 1, pp. 393-411, 2020


SUMMARY OF THE INVENTION

Among the various aspects of the present disclosure is the provision of an isolated RNA-guided endonuclease, wherein the endonuclease comprises at least one nuclear localization signal, at least one nuclease domain, and at least one domain that interacts with a guide RNA to target the endonuclease to a specific nucleotide sequence for cleavage. In one embodiment, the endonuclease can be derived from a Cas9 protein. In another embodiment, the endonuclease can be modified to lack at least one functional nuclease domain. In other embodiments, the endonuclease can further comprise a cell-penetrating domain, a marker domain, or both. In a further embodiment, the endonuclease can be part of a protein-RNA complex comprising the guide RNA. In some instances, the guide RNA can be a single molecule comprising a 5′ region that is complementary to a target site. Also provided is an isolated nucleic acid encoding any of the RNA-guided endonucleases disclosed herein. In some embodiments, the nucleic acid can be codon optimized for translation in mammalian cells, such as, for example, human cells. In other embodiments, the nucleic acid sequence encoding the RNA-guided endonuclease can be operably linked to a promoter control sequence, and optionally, can be part of a vector. In other embodiments, a vector comprising sequence encoding the RNA-guided endonuclease, which can be operably linked to a promoter control sequence, can also comprise sequence encoding a guide RNA, which can be operably linked to a promoter control sequence.

Another aspect of the present invention encompasses a method for modifying a chromosomal sequence in a eukaryotic cell or embryo. The method comprises introducing into a eukaryotic cell or embryo (i) at least one RNA-guided endonuclease comprising at least one nuclear localization signal or nucleic acid encoding at least one RNA-guided endonuclease as defined herein, (ii) at least one guide RNA or DNA encoding at least one guide RNA, and, optionally, (iii) at least one donor polynucleotide comprising a donor sequence. The method further comprises culturing the cell or embryo such that each guide RNA directs an RNA-guided endonuclease to a targeted site in the chromosomal sequence where the RNA-guided endonuclease introduces a double-stranded break in the targeted site, and the double-stranded break is repaired by a DNA repair process such that the chromosomal sequence is modified. In one embodiment, the RNA-guided endonuclease can be derived from a Cas9 protein. In another embodiment, the nucleic acid encoding the RNA-guided endonuclease introduced into the cell or embryo can be mRNA. In a further embodiment, the nucleic acid encoding the RNA-guided endonuclease introduced into the cell or embryo can be DNA. In a further embodiment, the DNA encoding the RNA-guided endonuclease can be part of a vector that further comprises a sequence encoding the guide RNA. In certain embodiments, the eukaryotic cell can be a human cell, a non-human mammalian cell, a stem cell, a non-mammalian vertebrate cell, an invertebrate cell, a plant cell, or a single cell eukaryotic organism. In certain other embodiments, the embryo is a non-human one cell animal embryo.

A further aspect of the disclosure provides a fusion protein comprising a CRISPR/Cas-like protein or fragment thereof and an effector domain. In general, the fusion protein comprises at least one nuclear localization signal. The effector domain of the fusion protein can be a cleavage domain, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. In one embodiment, the CRISPR/Cas-like protein of the fusion protein can be derived from a Cas9 protein. In one iteration, the Cas9 protein can be modified to lack at least one functional nuclease domain. In an alternate iteration, the Cas9 protein can be modified to lack all nuclease activity. In one embodiment, the effector domain can be a cleavage domain, such as, for example, a FokI endonuclease domain or a modified FokI endonuclease domain. In another embodiment, one fusion protein can form a dimer with another fusion protein. The dimer can be a homodimer or a heterodimer. In another embodiment, the fusion protein can form a heterodimer with a zinc finger nuclease, wherein the cleavage domain of both the fusion protein and the zinc finger nucleases is a FokI endonuclease domain or a modified FokI endonuclease domain. In still another embodiment, the fusion protein comprises a CRISPR/Cas-like protein derived from a Cas9 protein modified to lack all nuclease activity, and the effector domain is a FokI endonuclease domain or a modified FokI endonuclease domain. In still another embodiment, the fusion protein comprises a CRISPR/Cas-like protein derived from a Cas9 protein modified to lack all nuclease activity, and the effector domain can be an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. In additional embodiments, any of the fusion proteins disclosed herein can comprise at least one additional domain chosen from a nuclear localization signal, a cell-penetrating domain, and a marker domain. 
Also provided are isolated nucleic acids encoding any of the fusion proteins provided herein.

Still another aspect of the disclosure encompasses a method for modifying a chromosomal sequence or regulating expression of a chromosomal sequence in a cell or embryo. The method comprises introducing into the cell or embryo (a) at least one fusion protein or nucleic acid encoding at least one fusion protein, wherein the fusion protein comprises a CRISPR/Cas-like protein or a fragment thereof and an effector domain, and (b) at least one guide RNA or DNA encoding at least one guide RNA, wherein the guide RNA guides the CRISPR/Cas-like protein of the fusion protein to a targeted site in the chromosomal sequence and the effector domain of the fusion protein modifies the chromosomal sequence or regulates expression of the chromosomal sequence. In one embodiment, the CRISPR/Cas-like protein of the fusion protein can be derived from a Cas9 protein. In another embodiment, the CRISPR/Cas-like protein of the fusion protein can be modified to lack at least one functional nuclease domain. In still another embodiment, the CRISPR/Cas-like protein of the fusion protein can be modified to lack all nuclease activity. In one embodiment in which the fusion protein comprises a Cas9 protein modified to lack all nuclease activity and a FokI cleavage domain or a modified FokI cleavage domain, the method can comprise introducing into the cell or embryo one fusion protein or nucleic acid encoding one fusion protein and two guide RNAs or DNA encoding two guide RNAs, and wherein one double-stranded break is introduced in the chromosomal sequence. In another embodiment in which the fusion protein comprises a Cas9 protein modified to lack all nuclease activity and a FokI cleavage domain or a modified FokI cleavage domain, the method can comprise introducing into the cell or embryo two fusion proteins or nucleic acid encoding two fusion proteins and two guide RNAs or DNA encoding two guide RNAs, and wherein two double-stranded breaks are introduced in the chromosomal sequence. 
In still another embodiment in which the fusion protein comprises a Cas9 protein modified to lack all nuclease activity and a FokI cleavage domain or a modified FokI cleavage domain, the method can comprise introducing into the cell or embryo one fusion protein or nucleic acid encoding one fusion protein, one guide RNA or nucleic acid encoding one guide RNA, and one zinc finger nuclease or nucleic acid encoding one zinc finger nuclease, wherein the zinc finger nuclease comprises a FokI cleavage domain or a modified FokI cleavage domain, and wherein one double-stranded break is introduced into the chromosomal sequence. In certain embodiments in which the fusion protein comprises a cleavage domain, the method can further comprise introducing into the cell or embryo at least one donor polynucleotide. In embodiments in which the fusion protein comprises an effector domain chosen from an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain, the fusion protein can comprise a Cas9 protein modified to lack all nuclease activity, and the method can comprise introducing into the cell or embryo one fusion protein or nucleic acid encoding one fusion protein, and one guide RNA or nucleic acid encoding one guide RNA, and wherein the structure or expression of the targeted chromosomal sequence is modified. In certain embodiments, the eukaryotic cell can be a human cell, a non-human mammalian cell, a stem cell, a non-mammalian vertebrate cell, an invertebrate cell, a plant cell, or a single cell eukaryotic organism. In certain other embodiments, the embryo is a non-human one cell animal embryo.

Other aspects and iterations of the disclosure are detailed below.


Discussion

The primary findings of this study are that 1) female hearts are more susceptible than male hearts to methamphetamine-induced changes in gene transcription; 2) methamphetamine does not induce long-lasting changes in myocardial gene transcription that persist following 1 month of subsequent abstinence from the drug; and 3) methamphetamine induces sex-dependent changes in the transcription of genes that regulate the circadian clock in the heart.

This work was prompted by our previous finding that methamphetamine treatment for 10 days causes female rats (but not their male siblings) to develop myocardial hypersensitivity to ischemic injury [14]. Importantly, this methamphetamine-induced effect persisted in female hearts following 1 month of subsequent abstinence from the drug, suggesting that it might result from long-term changes in cardiac gene expression that are not rapidly reversed when methamphetamine exposure is discontinued. We anticipated that the identification of methamphetamine-induced changes in myocardial gene expression that are both sex dependent (occurring only in females) and that persist following a period of subsequent abstinence from the drug may provide a mechanistic basis for our observations regarding the impact of methamphetamine on the ischemic heart. Contrary to this hypothesis, our findings indicate that methamphetamine does not induce changes in myocardial gene transcription that persist long term after the drug has been discontinued.

Interactions between the period genes, Bmal1, Npas2, Dbp, Clock, cryptochromes, and other genes that regulate circadian function are well characterized and have been the subject of recent reviews [27, 28]. The ability of methamphetamine to alter the expression of genes that regulate circadian rhythm in the hippocampus, striatum, and other regions of the brain is well established [29-33]. However, this is the first study that we are aware of to demonstrate that methamphetamine alters the transcription of clock-related genes (Per2, Per3, Dbp, Clock, Bmal1, Cry2, Npas2) in the heart and that this occurs in a sex-dependent manner. The circadian clock plays an important role in regulating diurnal changes in cardiac metabolism, heart rate, and blood pressure [28, 34], and there is evidence from both animal models [35, 36] and human studies [37-41] that disruption of the circadian clock adversely impacts the development of cardiovascular disease [27, 42, 43] and susceptibility to myocardial infarction [36, 44]. Thus, the observation that 10 days of methamphetamine treatment alters the transcription of circadian clock genes and also causes female hearts to become hypersensitive to ischemic injury [14] is consistent with the work of other investigators. However, our findings do not provide an explanation for the observation that these animals remain hypersensitive to ischemia after a 1-month period of subsequent abstinence when transcription of these genes is no longer altered by methamphetamine. Our data do not rule out the possibility that methamphetamine induces epigenetic changes that serve as a “memory” of methamphetamine exposure and subsequently influence transcriptional changes induced by an ischemic insult. Further work is needed to determine whether methamphetamine induces epigenetic changes that alter transcriptional responses triggered by ischemia or other forms of cardiac stress.

Female hearts were significantly more sensitive than male hearts to methamphetamine-induced changes in gene expression, both in terms of the number of genes affected and the magnitude of the methamphetamine-induced changes. This might result from the fact that the rate of clearance of methamphetamine is lower in female rats than in male rats, resulting in females having a greater exposure (in terms of area under the concentration-time curve) than males given an equal dose of the drug [45]. Methamphetamine has been reported to disrupt the hypothalamic-pituitary-ovarian axis in females [46]. Thus, this sex difference could alternatively be secondary to changes in function of the hypothalamic-pituitary-ovarian axis, disruption of the cardioprotective effects of estrogen, or sex differences in the brain’s response to methamphetamine rather than a direct effect of methamphetamine on the heart. It should be noted that not all methamphetamine-induced cardiac effects occur exclusively in female hearts. Some investigators have reported that males are more susceptible than females to methamphetamine-induced cardiomyopathy [47, 48]. Additional work is needed to understand the mechanism by which methamphetamine induces sex-dependent effects in the myocardium.

The finding that the female heart is more sensitive than the male heart to methamphetamine-induced changes in gene expression is consistent with previously reported cardiac sex differences. Sexual dimorphism in rodent models of cardiovascular health and disease was recently the topic of an extensive review [49]. Baseline sex differences in the activity of ion channels [50], cardiac mitochondrial metabolism [51], cardiac expression of calcium handling proteins [52-54], and the concentration of norepinephrine in myocardial tissue [55, 56] have been reported in healthy rodents. Furthermore, the cardioprotective benefit of estrogen is well established [57]. Male hearts are more sensitive than female hearts to myocardial ischemic injury [15, 58-60]. Male rodents are also reported to have more maladaptive cardiac remodeling, poorer recovery of ventricular function, and lower survival rates than females following a myocardial infarction [61, 62]. Sex differences in the cardiac response to pressure overload, volume overload, and isoproterenol-induced hypertrophy have also been reported [49]. Thus, sex-dependent differences in both cardiac physiology and pathophysiology are well established. Our finding that the female cardiac transcriptome is more sensitive than the male transcriptome to the effects of methamphetamine extends our knowledge of cardiac sex differences.

RNA sequencing identified methamphetamine-induced changes in the number of cardiac transcripts for several circadian rhythm-related genes in the female heart (Fig. 3). Most of these findings were replicated by qPCR (Fig. 3). The qPCR data for Per3 and Clock demonstrated a trend in the same direction as the RNA sequencing data, but the methamphetamine-induced effect did not reach statistical significance for Per3 and Clock when measured by qPCR. RNA sequencing identified a methamphetamine-induced increase in the number of Dbp transcripts in female hearts (Fig. 3e) but no change in male hearts. In contrast, qPCR found a significant increase in Dbp transcripts in both sexes (Fig. 3f). It is unclear why there is a disparity between these two methods of measuring Dbp transcripts in male hearts.

Genes encoding Per2, BMAL1, and DBP are regulated by negative feedback mechanisms in which expression of the protein suppresses transcription of the gene [63]. Expression is also regulated by ubiquitin-dependent mechanisms that regulate rates of protein degradation [64-67]. These mechanisms result in a cyclic pattern of expression over a 24-h time period. Because expression of these proteins is tightly controlled by both transcriptional and proteolytic mechanisms, it is not surprising that data from western blotting experiments (Fig. 4) did not precisely mirror the changes observed at the transcript level (Fig. 3). The assessment of circadian clock genes (both at the transcript and protein levels) at only a single time point is a limitation of this study.

The vast majority of changes in transcripts for both male and female hearts were observed immediately following 10 days of methamphetamine treatment (Fig. 1a). However, changes in the transcripts of 6 additional genes (3 in male hearts and 3 in female hearts) were identified following a 30-day period of subsequent abstinence from methamphetamine. Most (5 out of 6) of these changes were less than 2-fold in magnitude (Fig. 1b; Table 4). It is noteworthy that all 3 changes observed in male hearts following 30 days of abstinence involved genes that regulate the circadian rhythm and that no circadian-related genes were altered in female hearts following 30 days of abstinence (Table 4). Previous studies have documented prolonged periods of disrupted sleep patterns in humans who formerly used methamphetamine [68, 69]. Thus, we speculate that sex differences in the expression of these circadian genes might reflect sex-dependent alterations in sleep patterns associated with the discontinuation of methamphetamine. Further work is needed to understand the mechanism and physiological impact of these changes.
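The phrase "less than 2-fold in magnitude" covers both increases and decreases, which is easy to misread. The Python sketch below shows one common way to express a direction-independent fold change. The function name is hypothetical, the example counts are invented for illustration and are not data from the study, and the study's own pipeline may define magnitude differently.

```python
def fold_change_magnitude(treated: float, control: float) -> float:
    """Direction-independent fold change: always >= 1 for positive inputs.

    Hypothetical helper for illustration only.
    """
    if treated <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    ratio = treated / control
    # A 2-fold decrease (ratio 0.5) and a 2-fold increase (ratio 2.0)
    # have the same magnitude.
    return ratio if ratio >= 1 else 1 / ratio


# Invented example values, not measurements from the study.
print(fold_change_magnitude(150.0, 100.0) < 2)  # a 1.5-fold increase
print(fold_change_magnitude(50.0, 100.0) < 2)   # a 2-fold decrease
```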


Over the past decades, Helicoverpa armigera nucleopolyhedrovirus (HearNPV) has been widely used for biocontrol of cotton bollworm, which is one of the most destructive pest insects in agriculture worldwide. However, the molecular mechanism underlying the interaction between HearNPV and host insects remains poorly understood. In this study, high-throughput RNA-sequencing was integrated with label-free quantitative proteomics analysis to examine the dynamics of gene expression in the fat body of H. armigera larvae in response to challenge with HearNPV. RNA sequencing-based transcriptomic analysis indicated that host gene expression was substantially altered, yielding 3,850 differentially expressed genes (DEGs), whereas no global transcriptional shut-off effects were observed in the fat body. Among the DEGs, 60 immunity-related genes were down-regulated after baculovirus infection, a finding that was consistent with the results of quantitative real-time RT-PCR. Gene ontology and functional classification demonstrated that the majority of down-regulated genes were enriched in gene cohorts involved in energy, carbohydrate, and amino acid metabolic pathways. Proteomics analysis identified differentially expressed proteins in the fat body, among which 76 were up-regulated, whereas 373 were significantly down-regulated upon infection. The down-regulated proteins are involved in metabolic pathways such as energy metabolism, carbohydrate metabolism (CM), and amino acid metabolism, in agreement with the RNA-sequence data. Furthermore, correlation analysis suggested a strong association between the mRNA level and protein abundance in the H. armigera fat body. More importantly, the predicted gene interaction network indicated that a large subset of metabolic networks was significantly negatively regulated by viral infection, including CM-related enzymes such as aldolase, enolase, malate dehydrogenase, and triose-phosphate isomerase. 
Taken together, transcriptomic data combined with proteomic data elucidated that baculovirus established systemic infection of host larvae and manipulated the host mainly by suppressing the host immune response and down-regulating metabolism to allow viral self-replication and proliferation. Therefore, this study provided important insights into the mechanism of host-baculovirus interaction.

Author contributions: Z.Z. and Z.H. designed the research; L.X., C.Y., M.W., Z.L., B.S., Z.H. and Z.Z. performed the experiment; L.X. and C.Y. analyzed the data; L.X., Z.H. and Z.Z. wrote the paper.

This work was supported by Strategic Priority Research Program of the CAS Grant XDB11030600, National Key Plan for Scientific Research and Development of China Grant 2016YFD0500300, National Natural Science Foundation of China Grants 31472008 and 31672291, and Open Research Fund Program of State Key Laboratory of Integrated Management of Pest Insects and Rodents Chinese IPM 1515.


RESULTS

Characterization of rpoSTy19, a natural rpoS mutant allele of S. Typhi.

During a search for rpoS mutants in clinical isolates of Salmonella (36; data not shown), our attention was drawn to S. Typhi strain Ty19, a strain isolated from human blood. S. Typhi Ty19 produced a low amount of σS (Fig. 1, lane 2) compared to wild-type isolates of S. Typhi (Fig. 1, lane 1) (36). Surprisingly, Ty19 was resistant to hydrogen peroxide (H2O2) in stationary phase (Fig. 2A, columns 1 and 3), a phenotype dependent on σS (Fig. 2A, columns 1 and 5). The sequence of the promoter and leader regions of the rpoS gene in Ty19 (rpoSTy19) was wild type, but the open reading frame contained a G-to-T mutation at position 845, which resulted in a glycine-to-valine amino acid substitution at residue 282 (G282V) in the σS Ty19 protein.
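The link between nucleotide 845 and residue 282 is simple codon arithmetic. The Python sketch below assumes that position 845 is counted from the first nucleotide of the start codon (1-based) and that the mutation is a G-to-T substitution. The helper names are hypothetical, and the source does not state which glycine codon is present, so all four are checked.

```python
# Assumption: 1-based numbering from the first nucleotide of the ORF.

def codon_position(nt_pos):
    """Map a 1-based ORF nucleotide position to (codon number, position in codon)."""
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


# Standard genetic code entries for the two amino acids involved.
GLY_CODONS = ("GGT", "GGC", "GGA", "GGG")
VAL_CODONS = {"GTT", "GTC", "GTA", "GTG"}


def substitute(codon, offset, base):
    """Return the codon with the base at the 1-based offset replaced."""
    return codon[:offset - 1] + base + codon[offset:]


codon_number, offset = codon_position(845)
print(codon_number, offset)  # 282 2

# Every glycine codon becomes a valine codon after G-to-T at this position.
mutants = [substitute(c, offset, "T") for c in GLY_CODONS]
print(all(m in VAL_CODONS for m in mutants))  # True
```

Position 845 is the second base of codon 282, and a G-to-T change there turns any GGN glycine codon into a GTN valine codon, which matches the reported G282V substitution.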

Detection of σS and σS Ty19 in S. Typhi (STY) and S. Typhimurium (STM). Overnight LB cultures at 37°C were analyzed by Western blotting with anti-σS antibodies. A 5-μg portion of total protein was loaded into each slot. Lanes: 1, 5959; 2, Ty19; 3, Ty19K; 4, 2922K; 5, 2922Kcrl; 6, 2922KrpoS; 7, 2922KrpoSTy19; 8, 2922KrpoSTy19crl.

In vivo characterization of the rpoSTy19 mutant allele. (A to C) Resistance to hydrogen peroxide of Salmonella strains carrying wild-type or rpoSTy19 mutant alleles. (A) S. Typhi strains. Columns: 1, 5959; 2, 5959crl; 3, Ty19; 4, Ty19crl; 5, Ty19K. (B) S. Typhi strains 5959 (WT), Ty19, Ty19crl, and Ty19K. (C) S. Typhimurium strains 2922K (WT), 2922KΔcrl, 2922KrpoSTy19, 2922KrpoSTy19Δcrl, and 2922KrpoS. Cells were grown to stationary phase in LB medium at 37°C, washed, and resuspended in PBS to an OD600 of 1 (A) or 0.1 (B and C), and 15 mM H2O2 was added. The results of representative experiments are shown in panels B and C. Similar results were obtained in repeat experiments. (D) Expression of a katE-lacZ gene fusion in S. Typhimurium carrying the wild-type rpoS and mutant rpoSTy19 alleles. Columns: 1, 2922KkatE-lacZ; 2, 2922KrpoS katE-lacZ; 3, 2922Kcrl katE-lacZ; 4, 2922KrpoSTy19 katE-lacZ; 5, 2922KrpoSTy19crl katE-lacZ. The β-galactosidase activity was measured in overnight LB cultures at 37°C according to the method of Miller (27).
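For readers unfamiliar with the Miller method, β-galactosidase activity is conventionally reported in Miller units. The Python sketch below uses the commonly cited formula, 1000 × (OD420 − 1.75 × OD550) / (t × v × OD600), with t in minutes and v in mL. Whether the authors used exactly this variant is an assumption, the function name is hypothetical, and the example readings are invented.

```python
def miller_units(od420, od550, od600, minutes, volume_ml):
    """Beta-galactosidase activity in Miller units (commonly cited formula).

    od420: absorbance of the reaction (o-nitrophenol plus light scatter)
    od550: absorbance at 550 nm, used to correct for cell-debris scatter
    od600: culture density at the time of sampling
    minutes: reaction time; volume_ml: culture volume assayed
    """
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("minutes, volume_ml and od600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


# Invented readings for illustration, not values from the study.
print(miller_units(od420=0.5, od550=0.0, od600=0.5, minutes=10, volume_ml=0.1))
```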

The chaperone-like protein Crl increases the performance of σS, but its impact on the H2O2 resistance level of S. Typhimurium in stationary phase is hardly detectable in standard growth conditions (37, 40). Most interestingly, a crl mutation affected the ability of S. Typhi Ty19 to resist H2O2 (Fig. 2A, columns 3 to 4), whereas, as expected, no significant effect of the crl mutation was detected on the H2O2 resistance level of strain 5959 (Fig. 2A, columns 1 to 2), an S. Typhi strain wild-type for rpoS (36). At low cell density, the effect of the crl mutation on the H2O2 resistance level of Ty19 was drastic and not very different from that of the rpoS deletion (Fig. 2B). These results suggested that the σS activity in Ty19 was more dependent on Crl activation than in strain 5959.

The activity of σS Ty19 is highly dependent on Crl.

To characterize the rpoSTy19 allele in an otherwise isogenic background, the rpoS gene from S. Typhimurium ATCC 14028 was replaced by the rpoSTy19 allele, yielding strain 2922KrpoSTy19. As previously observed in S. Typhi, σS Ty19 was detected in lower amounts than the wild-type σS protein (Fig. 1, lanes 4 and 7). Interestingly, levels of H2O2 resistance were more dependent on Crl in the rpoSTy19 mutant than in the wild-type strain (Fig. 2C). Consistent with this finding, the expression level of a lacZ gene fusion in katE, a σS-dependent gene encoding a catalase required for the H2O2 resistance of Salmonella in stationary phase (37), was affected by the crl mutation in the rpoSTy19 mutant but not in the wild-type strain (Fig. 2D, columns 4 to 5 and columns 1 to 3). σS Ty19 production levels were not lowered in the absence of Crl (Fig. 1, lanes 7 to 8). These results suggested that the activity of σS Ty19 was highly dependent on Crl activation. At low cell density, the rpoSTy19 mutant was slightly less resistant to H2O2 than the wild-type strain (Fig. 2C) and expressed the katE-lacZ fusion to slightly lower levels (Fig. 2D).

In agreement with the in vivo data, in vitro transcription experiments using three different σS-dependent promoters (katE, katN, and poxB) demonstrated that the activity of the σS Ty19 protein was lower, and much more dependent on Crl activation, than that of the wild-type σS protein (Fig. 3A). In these assays, the addition of Crl rescued σS Ty19 activity to a level similar to that obtained using the wild-type σS protein in the absence of Crl (Fig. 3A). In conclusion, the G282V substitution in σS Ty19 both decreased the activity of σS and increased its dependence on Crl activation.

In vitro characterization of the rpoSTy19 mutant allele. (A) Single-round runoff transcripts of katE (lanes 1 to 4), katN (lanes 5 to 8), and poxB (lanes 9 to 12) promoters. RNA polymerases reconstituted with His6-σ S Ty19 (lanes 1 and 2, 5 and 6, and 9 and 10) or His6-σ S WT (lanes 3 and 4, 7 and 8, and 11 and 12) in the absence (lanes 1, 3, 5, 7, 9, and 11) or presence of Crl (lanes 2, 4, 6, 8, 10, and 12) were incubated with plasmid templates for 12 min at 30°C before the addition of a mixture of heparin and XTPs as described in Materials and Methods. The 32P-labeled transcripts were analyzed on a 7% sequencing gel, and the band intensities quantified as indicated below each lane. (B) Real-time SPR experiments showing the effect of Crl (5 μM) on the binding of His6-σ S Ty19 (left; 625 nM) or His6-σ S WT (right; 625 nM) to the immobilized RNAP core. The sensorgrams shown in blue and cyan correspond to the binding of σ S in the absence of Crl, and those in red and magenta correspond to that in the presence of Crl. No E-binding of His6-σ S Ty19 can be detected in the absence of Crl.

σ S Ty19 depends on Crl for binding to RNAP core.

In vitro transcription assays showed that the activity of σ S Ty19 is impaired and that this defect is rescued by Crl (Fig. 3A). Because Crl favors σ S binding to the RNAP core enzyme E, one possibility to explain this finding is that Crl compensates for a low E-binding propensity of σ S Ty19.

To study the interaction between σ S Ty19 and E, we used a surface plasmon resonance (SPR) assay that we had previously set up (12). A monoclonal antibody specific for the C terminus of the RNAP α subunit (α-CTD, which plays no role in the association of σ with E) was covalently immobilized to the dextran surface of a sensor chip and used to capture E noncovalently.

We observed that, in the absence of Crl, no binding of σ S Ty19 (up to 2.5 μM) to E could be detected (Fig. 3B). On the contrary, in the presence of Crl, the ability of σ S Ty19 to bind to E was restored to a level similar to that observed with the wild-type σ S protein in the absence of Crl (Fig. 3B), in agreement with the in vitro transcription data (Fig. 3A). Altogether, these results showed that σ S Ty19 is impaired for E binding and that this defect is alleviated by Crl.

G282 is located in a flexible loop in region 4 of σ S .

Sequence alignments of σ 70 family members, to which σ S belongs, have revealed that they are constituted of four conserved regions (regions 1 to 4, Fig. 4B) (26, 29, 33). Among these, region 2 (subregions 2.1 to 2.4) and region 4 (subregions 4.1 and 4.2) contain DNA-binding domains that mediate recognition of the conserved −10 and −35 elements of σ 70 -dependent promoters, respectively. The linear division of σ 70 factors into functionally distinct regions has been largely confirmed by structural data, which revealed that primary sigma factors have four flexibly linked domains (σ1.1, σ2, σ3, and σ4) containing regions 1.1, 1.2-2.4, 3.0-3.1, and 4.1-4.2, respectively (6, 7, 26, 29, 33) (Fig. 4). The σ1.1 region is unstructured in all available crystal structures. In addition, the linker between σ3 and σ4 corresponds to region 3.2 (Fig. 4). Regions 2 and 4 are not only involved in DNA binding but also contain critical determinants for binding the β′ and β subunits of the RNAP, respectively (1, 29) (Fig. 5A). The G282V substitution is located in region 4 of σ S Ty19 (Fig. 4B).

Crl binds to σ S domain 2. (A) Model of the σ S structure. The structural domains σ2 (residues 53 to 163), σ3 (residues 164 to 216), linker (residues 217 to 244), and σ4 (residues 245 to 314) are represented in light green, yellow, orange, and blue, respectively. The G282 is represented in red. (B) BACTH analysis of Crl interactions with σ S , σ S Ty19, and truncated σ S proteins. A schematic representation of the four regions of the σ S protein showing highly conserved amino acid sequence with the σ 70 family members (7, 26) is shown at the top of the panel. The efficiencies of functional complementation between the indicated hybrid proteins were quantified by measuring β-galactosidase activities in E. coli BTH101 cells harboring the corresponding plasmids as described in Materials and Methods. The β-galactosidase activity was measured according to the method of Miller (27).

Position of the G282V substitution that affects the interaction of σ S Ty19 with the RNAP core enzyme. (A) Structure of the modeled σ S positioned in the T. thermophilus enzyme core. The αΙ, αII, β, β′, and ω subunits are represented in pale yellow, beige, brown, pink, and pale brown, respectively. The region corresponding to the β-flap is colored in orange, and the region corresponding to the β′ zinc finger in cyan. (B) Close-up view of the interaction between the core enzyme and σ4. A valine is shown at position 282 of the σ S .

In the Eσ 70 holoenzyme, the σ subunit stretches across the upstream face of the enzyme, making extensive contacts with subunits β and β′ of the RNAP and with its DNA recognition elements positioned to contact the promoter (29). A flexible flap domain of subunit β of the RNAP (β-flap) interacts with region 4 of σ 70, involving mainly residues belonging to the flexible loop between helix H2 and helix H3 of σ4, positioning region 4.2 to interact with the −35 promoter element, the α C-terminal domain, and activators (11) (Fig. 5). This interaction depends on a hydrophobic patch on one face of the short helix stretch located at the tip of the flap domain, called the β-flap-tip helix (15). The contact between the β-flap-tip helix hydrophobic patch and the σ hydrophobic region is essential for the stable interaction of the β-flap-tip helix with the H2-H3 loop. The σ4 domain (region 4.1-4.2) is C-shaped, with a concave pocket coated with hydrophobic residues of region 4.1. In the holoenzyme, the β-flap-tip helix fits into this concave pocket (Fig. 5).

In the structure of the modeled σ S, based on the structure of T. thermophilus σ 70 (47), the G282 residue is located at the top of the H2-H3 flexible loop (residues L280-E289, Fig. 4A). When docked onto E, this loop lies in a cleft formed by the β-flap on one side and the β′ zinc finger on the other side (Fig. 5). Interestingly, both the β-flap and the β′ zinc finger region sequences are highly conserved between Thermus thermophilus and Salmonella enterica. In contrast, the residues corresponding to G282 in σ 70 are either an aspartate or a methionine, whose side chains are easily accommodated in the E cleft, as observed in the T. thermophilus Eσ 70 holoenzyme (47). Substitution G282V could directly disrupt the interaction between σ S and E through two different mechanisms. First, G282V could modify the conformation of the H2-H3 loop by increasing its rigidity and thus reducing its capacity to interact with the β-flap. Second, the interaction of the loop with the β-flap could be sterically destabilized because of the presence of the valine side chain.

Crl binds to domain 2 of σ S .

The σ S domain involved in Crl binding had thus far remained unknown. To understand better the effect of the G282V substitution on the interaction between Crl and σ S, the bacterial two-hybrid system (BACTH system [19]) was used. In this system, the T25-σ S and Crl-T18 hybrid proteins were shown to interact, yielding levels of β-galactosidase activity higher than those detected in negative controls (Fig. 4B) (28). We previously showed that the first 71 residues of σ S were not required for Crl binding since the T25-σ S 72-330 and Crl-T18 chimeras interacted efficiently (28 and Fig. 4B). The level of β-galactosidase activity detected with T25-σ S 72-330 was actually higher than with T25-σ S, which might be due to the higher expression of T25-σ S 72-330 compared to T25-σ S, as detected by immunoblotting with a polyclonal σ S antibody (28; data not shown) (Fig. 4B). For the T25-σ S Ty19 chimera, the levels of β-galactosidase activity were also higher than for the T25-σ S chimera (Fig. 4B), but in this case the amount of T25-σ S Ty19 detected by the σ S antibody was lower than that of T25-σ S (data not shown).

To determine whether the C-terminal domain of σ S interacts with Crl, truncated variants of the T25-σ S chimera were assessed in the BACTH. The T25-σ S 90-330, T25-σ S 169-330, and T25-σ S 238-330 chimeras did not yield significant β-galactosidase activity (Fig. 4B), although the protein amounts were similar to that of the T25-σ S 72-330 chimera, as assessed by immunodetection with the σ S antibody (data not shown). In contrast, the five chimeras T25-σ S 1-254, T25-σ S 72-254, T25-σ S 1-167, T25-σ S 56-167, and T25-σ S 72-167 yielded levels of β-galactosidase activity in the BACTH that were higher than that detected with the T25-σ S chimera, showing that they were able to interact with the Crl-T18 protein (Fig. 4B). These chimeras were barely or not detectable by the σ S polyclonal antibody (data not shown), suggesting either that their amounts were low or that these chimeras did not react efficiently with the σ S antibody. Altogether, these results demonstrated that amino acids 72 to 167 in σ S are sufficient for interaction with Crl. Interestingly, the G282V substitution in T25-σ S 90-330 and T25-σ S 238-330 did not allow them to interact with Crl-T18 (Fig. 4B). Thus, this substitution likely favors the interaction of T25-σ S Ty19 with Crl-T18 indirectly, through conformational changes of T25-σ S Ty19.

RpoSTy19 confers a competitive fitness to Salmonella, conditional to the crl status of the bacterial population.

One likely hypothesis to explain the appearance of the rpoSTy19 allele in natural isolates of S. Typhi is that this mutant allele confers a competitive fitness. We previously set up a survival assay with mixed populations of Salmonella in which the Δcrl mutation increased the competitive fitness of Salmonella in stationary phase (40). We used this assay to assess the competitive fitness of strains carrying the rpoSTy19 allele.

Two strains of Salmonella were mixed in equal cell numbers in LB liquid medium, and the numbers of each were monitored for several days (Fig. 6B). The rpoSTy19 mutant showed a competitive advantage during stationary phase over wild-type strain ATCC 14028 (Fig. 6Bc). Two days after inoculation of the medium, rpoSTy19 mutant cells represented more than 80% of the total population. However, the gain of fitness afforded by the rpoSTy19 allele was lost in a population carrying a deletion of the crl gene. Indeed, in a Δcrl context, strains carrying the rpoSTy19 allele were outcompeted by strains carrying a wild-type rpoS allele (Fig. 6Bd). In a similar way, the ΔrpoS mutant was outcompeted by the wild-type strain (40) (Fig. 6B, panel b), suggesting that the gain of fitness afforded by the rpoSTy19 mutation is conditional upon the presence of crl. Vice versa, the Δcrl mutant also showed a competitive advantage over the wild-type strain (Fig. 6Be), but this gain of fitness was lost in populations carrying the rpoSTy19 allele (Fig. 6Bf and h). In control experiments, wild-type strain ATCC 14028 showed a fitness similar to that of wild-type strain 2922K (Fig. 6Ba), and the Km or Cm resistance cartridges, harbored by some of the strains, had no effect (Fig. 6Ba and g) (40). Finally, we observed no significant differences when we compared the ability of wild-type and mutant strains to survive in monocultures under the same conditions (Fig. 6A).

Survival and competitive fitness of Salmonella strains during stationary phase. (A) Survival in stationary-phase cultures in LB medium at 37°C. Cells from overnight LB cultures of S. Typhimurium (a) and S. Typhi (b) strains were washed, resuspended in PBS to an OD600 of 1.0, diluted in fresh LB medium, and incubated at 37°C with shaking. Aliquots of bacteria were removed at timed intervals and numbers of viable cells were determined on LB plates. One-hundred percent survival corresponds to the number of cells in cultures grown overnight (day 1). The results of representative experiments are shown. Similar results were obtained in repeat experiments. (B) Competition assays between S. Typhimurium (a to g) and S. Typhi (h) strains. Overnight LB cultures were washed and resuspended in PBS to an OD600 of 1.0. In each of the eight experiments, the two strains indicated were mixed in equal cell numbers in fresh LB medium to give a total of about 3,000 cells per ml (time zero), and the mixtures were incubated at 37°C with shaking. Aliquots of bacteria were removed at timed intervals, and the numbers of viable cells of each strain were determined. The numbers of cells of each strain are reported as a percentage of the total number of viable cells in the culture.
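
The way competition results are reported (each strain as a percentage of the total viable count) can be sketched in a few lines of Python. This is only an illustration: the function name and the colony counts are hypothetical and are not data from the study.

```python
def strain_percentages(cfu_a, cfu_b):
    """Return each strain's share of the total viable count, in percent."""
    total = cfu_a + cfu_b
    if total == 0:
        raise ValueError("no viable cells counted")
    return 100.0 * cfu_a / total, 100.0 * cfu_b / total


# Hypothetical viable counts (cells per ml) for a two-strain mixture;
# these numbers are invented for illustration only.
time_course = {0: (1500, 1500), 2: (8.2e8, 1.8e8)}

for day, (mutant, wild_type) in time_course.items():
    pct_mutant, pct_wt = strain_percentages(mutant, wild_type)
    print(f"day {day}: mutant {pct_mutant:.1f}%, wild type {pct_wt:.1f}%")
```

With equal inoculation both strains start at 50%; a strain with a competitive advantage then rises above that share over the following days.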



A single human gene can potentially yield a diverse array of alternative mRNA isoforms, thereby expanding both the repertoire of gene products and subsequently the number of alternative proteins produced. mRNAs with different exon combinations are transcribed from most (up to 90%) human genes, and can generate variants that differ in regulatory untranslated regions, or encode proteins with different sub-cellular localisations and functions 1 – 5 . Altered splicing patterns have been suggested as a new hallmark of cancer cells 6 – 8 , and in prostate cancer there is emerging evidence that expression of specific mRNA isoforms derived from cancer-relevant genes may contribute to disease progression 9 – 11 .

Androgen steroid hormones and the androgen receptor (AR) play a key role in the development and progression of prostate cancer, with alternative splicing enabling cancer cells to produce constitutively active ARs 11 – 13 . The AR belongs to the nuclear receptor superfamily of transcription factors, and is essential for prostate cancer cell survival, proliferation and invasion 14 – 16 . Classically, androgen binding promotes AR dimerization and its translocation to the nucleus, where it acts as either a transcriptional activator or a transcriptional repressor to dictate prostate specific gene expression patterns 17 – 23 . The major focus for prostate cancer therapeutics has been to reduce androgen levels through androgen deprivation therapy (ADT), either with inhibitors of androgen synthesis (for example, abiraterone) or with antagonists that prevent androgen binding to the AR (such as bicalutamide or enzalutamide) 24 . Although ADT is usually initially effective, most patients ultimately develop lethal castrate resistant disease for which there are limited treatment options 11 , 12 .

Androgens and other steroid hormones have also been associated with alternative splicing. Recent RNA-sequencing-based analysis of the androgen response of prostate cancer cells grown in vitro and within patients following ADT identified a set of 700 genes whose transcription is regulated by the AR in prostate cancer cells 25 . However, in addition to regulating transcriptional levels, steroid hormone receptors can control exon content of mRNA 10 , 26 – 29 . In prostate cancer androgens can modulate the expression of mRNA isoforms via pre-mRNA processing and promoter selection 9 , 10 , 18 , 30 . The AR can recruit the RNA binding proteins Sam68 and p68 as cofactors to influence alternative splicing of specific genes, and studies using minigenes driven from steroid responsive promoters indicate that the AR can affect both the transcriptional activity and alternative splicing of a subset of target genes 11 , 31 , 32 . Other steroid hormones also coordinate both transcription and splicing decisions 29 . The thyroid hormone receptor (TR) is known to play a role in coordinating the regulation of transcription and alternative splicing 27 , and the oestrogen receptor (ER) can both regulate alternative promoter selection and induce alternative splicing of specific gene sets that can influence breast cancer cell behaviour 28 , 33 – 35 .

In previous work we used exon level microarray analysis to identify 7 androgen dependent changes in mRNA isoform expression 10 . However, to what extent androgen-regulated mRNA isoforms are expressed in clinical prostate cancer is unclear. To address this, here we have used RNA-Sequencing data to globally profile alternative isoform expression in prostate cancer cells exposed to androgens, and correlated the results with transcriptomic data from clinical tissue. Our findings increase the number of known AR regulated mRNA isoforms by 10 fold and imply that pre-mRNA processing is an important mechanism through which androgens regulate gene expression in prostate cancer.

Cell culture was as described previously 25 , 36 . All cells were grown at 37°C in 5% CO2. LNCaP cells (CRL-1740, ATCC) were maintained in RPMI-1640 with L-Glutamine (PAA Laboratories, R15-802) supplemented with 10% Fetal Bovine Serum (FBS) (PAA Laboratories, A15-101). For androgen treatment of cells, medium was supplemented with 10% dextran charcoal stripped FBS (PAA Laboratories, A15-119) to produce a steroid-deplete medium. Following culture for 72 hours, 10 nM synthetic androgen analogue methyltrienolone (R1881) (Perkin-Elmer, NLP005005MG) was either added (Androgen +) or absent (Steroid deplete) for the times indicated.

RNA-seq transcript expression analysis of previously generated data 25 was performed according to the Tuxedo protocol 37 . All reads were first mapped to the human transcriptome/genome (build hg19) with TopHat 38 /Bowtie 39 , followed by per-sample transcript assembly with Cufflinks 40 . The mapped data were processed with Cuffmerge, Cuffdiff and Cuffcompare, followed by extraction of significantly differentially expressed genes and isoforms. Expression changes between cells grown with androgens and cells grown without androgens were then assessed. Reference files for the human genome (UCSC build hg19) were downloaded from the Cufflinks pages (the UCSC-hg19 package from June 2012 was used). The software versions used for the analysis were: TopHat v1.4.1, SAMtools version 0.1.18 (r982:295), Bowtie version 0.12.8 (64-bit) and Cufflinks v1.3.0 (linked against Boost version 104000). The Tuxedo protocol 37 was carried out as follows: for steps 1 to 5, no parameters (except for paths to input/output files) were altered. In step 15, the additional switches -s, -R, and -C were used when running cuffcompare. The steps starting at step 16 (extraction of significant results) were performed on the command line.

RNA extraction, RT–PCR and real-time PCR

Cells were harvested and total RNA extracted using TRIzol (Invitrogen, 15596-026) according to the manufacturer's instructions. RNA was treated with DNase I (Ambion, AM2222) and cDNA was generated by reverse transcription of 500 ng of total RNA using the Superscript VILO cDNA synthesis kit (Invitrogen, 11754-050). Alternative events were analysed by either reverse transcriptase PCR or real-time PCR. Exon profiles were monitored and quantified using the Qiaxcel capillary electrophoresis system (Qiagen) and percentage inclusion was calculated as described previously 10 . Real-time PCR was performed in triplicate on cDNA using SYBR® Green PCR Master Mix (Invitrogen, 4309155) and the QuantStudio 7 Flex Real-Time PCR System (Thermo Fisher Scientific). Samples were normalised using the average of three reference genes, GAPDH, β-tubulin and actin. Ct values for each sample were calculated using SDS 2.4 software (Applied Biosystems) and relative mRNA expression was calculated using the 2^-ΔΔCt method. All primer sequences are listed in Supplementary Table 1 . Raw Ct values are given in Dataset 1 41 .
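
As a rough illustration of the relative quantification described above, the following Python sketch computes a fold change with the standard ΔΔCt formula. It assumes that "the average of three reference genes" means the arithmetic mean of their Ct values; the function names and the Ct values are hypothetical and are not taken from Dataset 1.

```python
from statistics import mean


def delta_ct(ct_target, ct_references):
    """Ct of the target minus the mean Ct of the reference genes (assumed arithmetic mean)."""
    return ct_target - mean(ct_references)


def relative_expression(ct_target_treated, ct_refs_treated,
                        ct_target_control, ct_refs_control):
    """Fold change of treated versus control, computed as 2 ** (-ddCt)."""
    ddct = (delta_ct(ct_target_treated, ct_refs_treated)
            - delta_ct(ct_target_control, ct_refs_control))
    return 2.0 ** (-ddct)


# Hypothetical Ct values; reference genes listed as GAPDH, beta-tubulin, actin.
fold = relative_expression(
    ct_target_treated=24.0, ct_refs_treated=[18.0, 20.0, 19.0],
    ct_target_control=26.0, ct_refs_control=[18.0, 20.0, 19.0],
)
print(f"fold change with androgen: {fold:.1f}")
```

In this invented example the target Ct drops by two cycles while the references are unchanged, which corresponds to a four-fold increase in relative expression.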

The following commercial antibodies were used in the study: anti-RLN2 rabbit monoclonal (Abcam, ab183505; 1:1000 dilution), anti-TACC2 rabbit polyclonal antibody (11407-1-AP, Proteintech; 1:500 dilution), anti-NDUFV3 rabbit polyclonal antibody (13430-1-AP, Proteintech; 1:500 dilution), anti-actin rabbit polyclonal (A2668, Sigma; 1:2000 dilution), anti-α-Tubulin mouse monoclonal (Sigma, T5168; 1:2000 dilution), normal rabbit IgG (711-035-152, Jackson labs; 1:2000 dilution) and normal mouse IgG (715-036-150, Jackson labs; 1:2000 dilution).

Gene ontology (GO) analysis of RNA-Seq data was carried out as described previously 42 . Enrichment of GO terms (with b500 annotations) was calculated using the goseq R package (version 1.18.0). Genes were considered significant at a p-value threshold of 0.05 after adjustment using the Benjamini-Hochberg false discovery rate.
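
The Benjamini-Hochberg adjustment mentioned above can be sketched as follows. This is a minimal standalone illustration of the standard procedure, not the goseq implementation; the function name and the example p-values are hypothetical, and the strict comparison against 0.05 is an assumption.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

Note that a raw p-value below 0.05 (such as 0.04 or 0.03 here) can stop being significant once the adjustment accounts for the number of terms tested.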

Bioinformatic analysis of patient transcriptome data

Available clinical and processed RNA-Seq data from The Cancer Genome Atlas (TCGA) prostate adenocarcinoma (PRAD) cohort, comprising 497 tumour samples from as many patients with different stages / Gleason grades and 52 matched samples taken from normal prostate tissue, were downloaded from the Broad Institute TCGA Genome Analysis Center (Firehose 16/01/28 run; https://doi.org/10.7908/C11G0KM9) 43 . Transcriptome data from the TCGA PRAD cohort were analysed for alternative isoform expression, with transcript models relying on TCGA GAF2.1, corresponding to the University of California, Santa Cruz (UCSC) genome annotation from June 2011 ( hg19 assembly ). This annotation encompassed 42 of the 73 androgen-regulated alternative mRNA isoform pairs identified. These were studied using two types of analysis: 1) differential transcript expression between tumour and normal prostate tissue and 2) correlation between isoform expression in tumour samples and Gleason score or tumour stage.

Differential isoform and gene expression analysis was performed on estimated read counts using the limma software R package (version 3.7) following its RNA-Seq analysis workflow 44 . This workflow was also used for differential isoform ratio analysis, relying on logit-transformed ratio (see below). An FDR-adjusted p-value of 0.05 for the moderated t-statistics was used as threshold for significance of differential expression. Individual isoform expression was estimated in TPM (transcripts per million mapped reads). The expression ratio, henceforth called PSI (percent spliced-in), of each annotated androgen-regulated isoform pair in each TCGA sample was calculated as the ratio between the expression of isoform 1 and the total expression of isoforms 1 and 2 combined, i.e. the sum of their expressions. For each isoform pair, ΔPSI is the difference of median PSI between the tumour and the normal groups of samples.
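
The PSI and ΔPSI definitions above translate directly into code. The sketch below is illustrative only: the function names and TPM values are hypothetical, and skipping samples in which neither isoform is expressed is an assumption, since the text does not say how such samples were handled.

```python
from statistics import median


def psi(isoform1_tpm, isoform2_tpm):
    """Isoform 1 expression divided by the summed expression of isoforms 1 and 2."""
    total = isoform1_tpm + isoform2_tpm
    if total == 0:
        return None  # assumption: PSI is undefined when neither isoform is expressed
    return isoform1_tpm / total


def delta_psi(tumour_pairs, normal_pairs):
    """Median PSI in tumour samples minus median PSI in normal samples."""
    tumour = [p for p in (psi(a, b) for a, b in tumour_pairs) if p is not None]
    normal = [p for p in (psi(a, b) for a, b in normal_pairs) if p is not None]
    return median(tumour) - median(normal)


# Hypothetical (isoform 1 TPM, isoform 2 TPM) values per sample.
tumour_samples = [(30.0, 10.0), (20.0, 20.0), (45.0, 15.0)]
normal_samples = [(10.0, 30.0), (20.0, 20.0), (5.0, 15.0)]
print(delta_psi(tumour_samples, normal_samples))
```

A positive ΔPSI means isoform 1 makes up a larger share of the pair in tumour than in normal tissue; a negative value means the opposite.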

Two-tailed Spearman’s rank correlation tests were used to study the association between isoform expression and both Gleason score and tumour stage (these were used herein as numeric variables). An FDR-adjusted p-value of 0.05 was used as threshold for significance. Isoform expression differences between tumour and normal samples were considered equivalent to those detected in LNCaP cells under androgen stimulation when there was a statistically significant consistent change in the levels of the expected induced or repressed isoform (1 or 2), concomitant with no contradictory change in the PSI. Isoform “switches” were considered equivalent when there was a minimum (ΔPSI > 2.5%) and statistically significant consistent change in the PSI. Equivalent criteria were used to evaluate the equivalence between androgen-dependence and the associations with Gleason score and tumour stage.
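
For readers unfamiliar with rank correlation, the following sketch computes Spearman's rho with average ranks for ties, treating Gleason score as a numeric variable as described above. It computes only the coefficient, not the two-tailed p-value or the FDR adjustment, and the sample values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical Gleason scores and isoform expression (TPM) in five tumours.
gleason = [6, 7, 7, 8, 9]
isoform_tpm = [1.0, 2.5, 2.0, 4.0, 8.0]
print(round(spearman_rho(gleason, isoform_tpm), 3))
```

A rho close to +1 indicates that isoform expression tends to rise with Gleason score, and a rho close to -1 that it tends to fall.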

Statistical analyses were conducted using the GraphPad Prism software (version 5.04/d). PCR quantification of mRNA isoforms was assessed using the unpaired Student’s t-test.

Data is presented as the mean of three independent samples ± standard error of the mean (SEM). Statistical significance is denoted as * p ≤ 0.05, ** p ≤ 0.01, *** p ≤ 0.001 and **** p ≤ 0.0001.
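
The summary statistics and the star notation above can be reproduced with a short sketch. The function names and the three sample values are hypothetical; returning an empty string for p > 0.05 is an assumption, since the text defines no label for non-significant results.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(samples):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    return mean(samples), stdev(samples) / sqrt(len(samples))


def significance_stars(p):
    """Map a p-value to the star notation used in the figures."""
    for threshold, stars in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p <= threshold:
            return stars
    return ""  # assumption: no label for p > 0.05


# Three hypothetical independent measurements of relative expression.
m, sem = mean_and_sem([1.0, 2.0, 3.0])
print(f"{m:.2f} ± {sem:.2f} {significance_stars(0.03)}")
```

The thresholds are checked from the strictest to the most lenient so that each p-value receives the largest number of stars it qualifies for.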

Results

Global identification of androgen-dependent mRNA isoform production in prostate cancer cells predicts a major role for alternative promoter utilisation

We analysed previously published RNAseq data from LNCaP cells 25 to globally profile how frequently androgens drive production of alternative mRNA isoforms in prostate cancer cells. This analysis identified a group of 73 androgen regulated alternative mRNA isoforms, which could be validated by visualisation on the UCSC Genome Browser 45 ( Table 1 ). 64 AR regulated mRNA isoforms were novel to this study. Experimental validation in an independent RNA sample set using RT-PCR confirmed 17/17 of these alternative events at the mRNA level ( Supplementary Figure 1 ). 73% of genes (53/73) with identified alternative androgen regulated mRNA isoforms also changed their overall expression levels in response to androgens ( Table 2 ). Some of the androgen regulated alternative events are in genes already implicated in either prostate cancer or other cancer types (summarised in Table 3 ). However, Gene Ontology analysis of these 73 genes did not identify any significantly enriched biological processes.

The 73 identified mRNA isoforms were generated via androgen-regulated utilisation of 56 alternative promoters, 4 alternative 3′ ends and 13 alternative splicing events ( Figure 1A ). Of the 56 androgen regulated alternative promoters that were identified, 23 alternative promoters were induced by androgens (including LIG4 , Figure 1B ), 26 promoters were repressed by androgens, and for 7 genes there was a switch in usage from one promoter to another ( Table 1 ). The alternative splicing events that were under androgen control included 12 alternative exons and one androgen-regulated intron retention ( Table 1 ). 10 of these are novel to this study, including exclusion of an alternative exon in ZNF678 ( Figure 1C ). Of the alternative exons, six genes contained switches in previously unannotated protein-coding exons in response to androgen-exposure. We also identified four androgen regulated alternative mRNA 3' end isoform switches, including a switch in the 3’ end of the mRNA transcript for the MAT2A gene ( Figure 1D ).

Figure 1. Global identification of androgen-dependent mRNA isoform production in prostate cancer cells predicts a major role for alternative promoter utilisation.

(A) Analysis of RNAseq data from LNCaP cells grown with (A+) or without androgens (R1881) (steroid deplete, SD) for 24 hours identified 73 androgen regulated alternative mRNA isoforms. The 73 alternative events were generated via androgen-regulated utilisation of 56 alternative promoters, 4 alternative 3' ends and 13 alternative splicing events. (B) Androgens drive a promoter switch in the LIG4 gene, which produces an mRNA isoform with an alternative 5’UTR. Visualisation of our LNCaP cell RNA-seq reads for the LIG4 gene on the UCSC genome browser identified a switch from promoter 1 to alternative promoter 2 in cells grown in the presence of androgens. Promoter 2 is predicted to produce a different 5’UTR without influencing the protein sequence (left panel). Quantitative PCR using primers specific to each promoter indicate that in response to androgens there is repression of promoter 1 and induction of promoter 2 (right panel). (C) Androgens drive alternative splicing of the ZNF678 gene. Visualisation of our LNCaP cell RNA-seq reads for the ZNF678 gene on the UCSC genome browser identified a switch to inclusion of a cassette exon in the presence of androgens. Inclusion of the alternative cassette exon in the ZNF678 gene is predicted to induce a switch to an alternative non-coding mRNA isoform (left panel). Quantitative PCR using primers in flanking exons confirmed increased inclusion of the alternative exon in LNCaP cells exposed to androgens (right panel). (D) Androgens promote selection of an alternative 3’ end for the MAT2A gene. Visualisation of our LNCaP cell RNA-seq reads for the MAT2A gene on the UCSC genome browser indicates a switch to reduced usage of an alternative 3’ end in the presence of androgens (left panel). Quantitative PCR using primers specific to each isoform confirmed down-regulation of an alternative 3’ end (p<0.01). 
Alternative 3’ ends for the MAT2A gene are predicted to produce proteins with different amino acid sequences and to influence a known Pfam domain (right panel).

Androgen regulated events control the production of alternative protein isoforms, non-coding RNAs and alternative 5' UTRs

48/73 (66%) of the androgen regulated alternative events detected in response to androgen stimulation are predicted to change the amino acid sequence of the resulting protein ( Table 1 ). Some of these are already known to have a well characterised role in prostate cancer progression, including an alternative promoter in the oncogene TPD52 that produces a protein isoform called PrLZ ( Figure 2A ) 46 – 49 . Others are not so well characterised. Using western blotting we could detect a novel shorter protein isoform corresponding to androgen-driven selection of an alternative promoter in the TACC2 gene ( Figure 2B ) and exclusion of a cassette exon in the NDUFV3 gene, which we show also produces a novel shorter protein isoform ( Figure 2C ). We also detected a switch in the 3' end of the mRNA transcript for the MAT2A gene, which is predicted to produce a protein isoform with a shorter C-terminal domain ( Figure 1D ) and induction of an alternative 3' isoform of CNNM2, which is predicted to be missing a conserved CBS domain ( Table 1 and Supplementary Figure 1 ).

Figure 2. Androgen regulated mRNA isoform switches control alternative protein isoforms and non-coding RNAs.

( A ) Androgens induce an alternative promoter in the oncogene TPD52 that produces an isoform called PrLZ. Visualisation of our LNCaP cell RNA-seq reads for the TPD52 gene on the UCSC genome browser identified a switch from promoter 1 to alternative promoter 2 in cells grown in the presence of androgens. Promoter 2 is known to produce an alternative protein isoform of TPD52 known as PrLZ (left panel). Quantitative PCR using primers specific to each promoter indicate an induction of the PrLZ isoform in response to androgens (middle panel). PrLZ has an alternative N-terminal amino acid sequence which results in an alternative protein isoform and disrupts a known Pfam domain (right panel). ( B ) Androgens induce an alternative promoter in the TACC2 gene that produces a novel alternative protein isoform. Visualisation of our LNCaP cell RNA-seq reads for the TACC2 gene on the UCSC genome browser identified a switch from promoter 1 to alternative promoter 2 in cells grown in the presence of androgens. Promoter 2 is predicted to produce an alternative shorter protein isoform of TACC2 (isoform 2) (left panel). Quantitative PCR using primers specific to each promoter indicate a switch from isoform 1 to isoform 2 in response to androgens (middle panel). Detection of TACC2 protein in LNCaP by western blotting (cells were grown with or without androgens for 24 or 48 hours). Tubulin was used as a loading control. Exposure to androgens for 48 hours induces expression of the alternative TACC2 protein isoform (right panel). ( C ) Androgens drive alternative splicing of the NDUFV3 gene. Visualisation of our LNCaP cell RNA-seq reads for the NDUFV3 gene on the UCSC genome browser identified a switch to exclusion of a cassette exon in the presence of androgens (left panel). Quantitative PCR using primers in flanking exons confirmed less inclusion of the alternative exon in LNCaP cells exposed to androgens (middle panel). 
Exclusion of the alternative cassette exon is predicted to produce an alternative protein isoform. Detection of NDUFV3 protein in LNCaP cells using western blotting (right panel). ( D ) Androgens suppress an alternative promoter in the RLN2 gene, which produces a shorter non-coding mRNA isoform. Visualisation of our LNCaP cell RNA-seq reads for the RLN2 gene on the UCSC genome browser identified a switch from promoter 1 to alternative promoter 2 in cells grown in the presence of androgens. Promoter 2 is predicted to produce an untranslated non-coding mRNA isoform (left panel). Quantitative PCR using primers specific to each promoter indicated a significant switch in promoter usage in response to androgens (middle panel). Detection of RLN2 protein in LNCaP by western blotting (cells were grown with or without androgens for 48 hours). Actin was used as a loading control. As seen previously 55 , androgens suppress RLN2 protein levels.

11 of the remaining identified androgen-regulated alternative events change the expression of mRNAs from coding to non-coding or untranslated (not predicted to produce a protein) ( Table 1 ). These included promoter switches for the RLN1 and RLN2 genes which encode peptide hormones that may be important in prostate cancer 5 , 50 – 55 . Androgens drive a promoter switch in both RLN1 and RLN2 to produce predicted non-coding or untranslated mRNA isoforms, reducing expression of protein-coding RLN1 and RLN2 mRNA isoforms. To test whether prostate cancer cells turn off gene expression by switching between utilisation of promoters that generate coding and noncoding mRNAs, we analysed RLN2 protein levels. Consistent with our hypothesis and a previous study 55 , RLN2 protein production was negatively regulated by androgens in parallel to the switch to the non-coding mRNA isoform ( Figure 2D ).

14 of the identified androgen-dependent mRNA isoforms result in coding mRNAs with altered 5′ untranslated regions (5′ UTRs) with no impact on the coding sequence. These include a promoter switch in the LIG4 gene ( Figure 1B ).

Differential expression of androgen-dependent mRNA isoforms in prostate adenocarcinoma versus normal tissue

To investigate potential links between androgen-dependent mRNA isoforms and tumourigenesis, we analysed the expression of 41 androgen-regulated mRNA isoform pairs in clinical prostate adenocarcinoma and normal prostate tissues. This analysis utilised transcriptomic data from 497 tumour samples and 52 normal samples in the PRAD TCGA cohort 104 . The remaining isoform pairs identified within our dataset have not been previously annotated by UCSC, therefore it was not possible to include them in our comparison. A description of the cohort used is summarised in Table 4 .

Table 4. Description of the TCGA PRAD cohort.

Features                                              Total cases
Cohort                                                497 patients
Tumour                                                497
Normal                                                52 (with tumour-matched sample available)
Gleason grade
  6                                                   50
  7                                                   287
  8                                                   67
  9                                                   140
  10                                                  4
Tumour stage
  T2a                                                 14
  T2b                                                 10
  T2c                                                 192
  T3a                                                 173
  T3b                                                 140
  T4                                                  12
Gleason grade (alternative Gleason grade grouping)
  1 (primary + secondary score ≤ 6)                   50
  2 (3 + 4)                                           171
  3 (4 + 3)                                           123
  4 (4 + 4)                                           93
  5 (primary + secondary score ≥ 9)                   111

All tumours were hormone naive (not subject to ADT) at the time of sample collection

33 of the 42 mRNA isoform pairs exhibited significant differences between tumour and normal tissues in the expression of at least one of the isoforms, or in the isoform expression ratio ( Table 5 ). 13 of those tumour-specific alterations mimicked the effect of androgen stimulation in LNCaP cells: the changes were in the form of alternative promoters for TACC2, TPD52, NUP93, PIK3R1, RDH13, ZFAND6, CDIP1, YIF1B, LIMK2 and FDFT1, an alternative 3′ end in CNNM2, and alternative exons in NDUFV3 and SS18 ( Figure 3 , Table 5 & Supplementary Figure 2 ). Two of the alternative promoters (ZFAND6 and CDIP1) are predicted to introduce a change in the 5′ UTR, whereas all the others are predicted to alter the resulting protein isoform. A number of mRNA isoforms that were androgen responsive in LNCaP cells showed tumour-specific alterations opposite to the effect of androgen stimulation. These were LIG4, MAPRE2, OSBPL1A, SEPT5, NR4A1 and RCAN1 (all predicted to alter the resulting protein isoform except LIG4). For the remaining 14 mRNA isoform pairs, the data were inconclusive according to the consistency conditions listed in the methods section ( Table 5 ).

Violin-boxplots of expression in transcripts per million mapped reads (TPM) of Isoforms 1 (left panel) and 2 (central panel), and of their expression ratio in PSI (right panel) in normal and tumour samples. The mean log2 fold-change (logFC) in expression between tumour and normal samples and the associated FDR-adjusted p-value for the moderated t-statistic of differential expression are shown for both isoforms (left and central panels). The mean difference in PSI (deltaPSI) between tumour and normal samples and the associated FDR-adjusted p-value for the Mann-Whitney U test of differential splicing are shown (right panel).
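The expression ratio and deltaPSI described in this legend can be illustrated with a minimal Python sketch. It assumes that PSI for an isoform pair is the share of isoform 1 in the pair's combined TPM; the exact definition and the choice of numerator are set in the paper's methods, which are not part of this section, so both are assumptions here. The TPM values are hypothetical, and the moderated t-statistic, Mann-Whitney U test and FDR adjustment are not reproduced.

```python
from statistics import mean


def psi(tpm_iso1, tpm_iso2):
    """Share of isoform 1 in the pair's combined expression (assumed definition)."""
    total = tpm_iso1 + tpm_iso2
    if total == 0:
        return None  # ratio is undefined when neither isoform is expressed
    return tpm_iso1 / total


def delta_psi(tumour_pairs, normal_pairs):
    """Mean PSI in tumour minus mean PSI in normal, skipping undefined samples."""
    def mean_psi(pairs):
        values = [p for p in (psi(a, b) for a, b in pairs) if p is not None]
        return mean(values)
    return mean_psi(tumour_pairs) - mean_psi(normal_pairs)


# Hypothetical (isoform 1 TPM, isoform 2 TPM) per sample, for illustration only
tumour = [(30.0, 10.0), (20.0, 20.0)]
normal = [(10.0, 30.0), (5.0, 15.0)]
print(delta_psi(tumour, normal))
```

A positive value under this convention means isoform 1 makes up a larger share of the pair in tumour than in normal samples.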

Changes in androgen-dependent mRNA isoform expression during tumour progression

We next investigated whether the identified androgen-dependent mRNA isoforms are differentially expressed during prostate cancer progression by correlating the expression levels of each isoform with Gleason scores and prostate tumour grades within the PRAD TCGA cohort ( Figure 4 & Figure 5 , Table 6 & Table 7 and Supplementary Figure 3 & Supplementary Figure 4 ). For 6 of the alternative mRNA isoforms responsive to androgens (made from alternative promoters in LIG4, OSBPL1A, CLK3, TSC22D3 & ZNF32 and utilising an alternative exon in ZNF121), the expression changed significantly with Gleason score and showed specific alterations consistent with the effect of androgen stimulation. Conversely, 9 alternative isoforms (which were androgen responsive in LNCaP cells) showed tumour-specific alterations opposite to the effect of androgen stimulation (including an alternative promoter in NUP93 and the alternative 3′ end of MAT2A). 3 androgen-regulated mRNA isoforms (OSBPL1A, CLK3 and TSC22D3) change significantly with both Gleason grade and tumour stage.

Figure 4. Differential alternative mRNA isoform expression in the TCGA PRAD cohort across different Gleason grades for OSBPL1A, CLK3, TSC22D3 and ZNF121.

Violin-boxplots of expression in transcripts per million mapped reads (TPM) of Isoforms 1 (left panel) and 2 (central panel), and of their expression ratio (right panel) by Gleason grade. Their respective Spearman’s correlation coefficient (Rho) with grade and associated FDR-adjusted p-value are shown.
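Spearman's correlation coefficient, as reported in these panels, is the Pearson correlation of the ranks, with tied values (common for Gleason grades) sharing their average rank. The following is a minimal Python sketch of that calculation, not the authors' pipeline; the grades and TPM values are hypothetical, and the FDR-adjusted p-value is not computed.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical Gleason grades and isoform TPM values, for illustration only
grades = [6, 7, 7, 8, 9]
tpm = [1.0, 2.5, 3.0, 4.2, 9.9]
print(spearman_rho(grades, tpm))
```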

Figure 5. Differential alternative mRNA isoform expression in the TCGA PRAD cohort across different tumour stages for OSBPL1A, CLK3 and TSC22D3.

Violin-boxplots of expression in transcripts per million mapped reads (TPM) of Isoforms 1 (left panel) and 2 (central panel), and of their expression ratio (right panel) by tumour stage. Their respective Spearman’s correlation coefficient (Rho) with stage and associated FDR-adjusted p-value are shown.

The main function of the androgen receptor (AR) is as a DNA binding transcription factor that regulates gene expression. Here we show the AR can couple hormone induced gene transcription to alternative mRNA isoform expression in prostate cancer. In response to androgens, the AR can induce the use of alternative promoters, induce the expression of alternatively spliced mRNA isoforms, regulate the expression of non-coding mRNA transcripts, and promote the transcription of mRNA isoforms encoding different protein isoforms. Importantly, we also find that some of these alternative mRNA isoforms are differentially regulated in prostate cancer versus normal tissue and also significantly change expression during tumour progression. Our data suggest that most androgen regulated alternative mRNA isoforms are generated through alternative promoter selection rather than through internal alternative exon splicing mechanisms. This suggests expression of alternative isoforms of specific genes can be a consequence of RNA polymerase being recruited to different promoters in response to androgen stimulation. Alternative promoter usage has been observed for many genes and is believed to play a significant role in the control of gene expression 4 , 105 , 106 . Alternative promoter use can also generate mRNA isoforms with distinct functional activities from the same gene, sometimes having opposing functions 11 .

Androgen exposure further drives a smaller number of alternative splicing events suggesting that the AR could contribute to altered patterns of splicing in prostate cancer cells. Tumour progression is believed to be associated with a coordinated change in splicing patterns which is regulated by several factors including signalling molecules 7 . We also identified 4 AR regulated alternative mRNA 3′ end isoform switches. This is the first time that regulation of 3′ mRNA end processing has been shown to be controlled by androgens. The selection of alternative 3′ ends can produce mRNA isoforms differing in the length of their 3′ UTRs (which can lead to the inclusion or exclusion of regulatory elements and influence gene expression), or in their C-terminal coding region (which can contribute to proteome diversity) 107 – 114 . Defective 3′ mRNA processing of numerous genes has been linked to an oncogenic phenotype 115 – 119 , and the 3′ mRNA end profiles of samples from multiple cancer types significantly differ from those of healthy tissue samples 115 , 119 – 121 .

Based on the findings presented in this study, we propose that activated AR has the ability to coordinate both transcriptional activity and mRNA isoform decisions through the recruitment of co-regulators to specific promoters. The genomic action of the AR is influenced by a large number of collaborating transcription factors 122 – 124 . Specifically, Sam68 and p68 have been shown to modulate AR dependent alternative splicing of specific genes and are significantly overexpressed in prostate cancer 31 , 32 . In future work it will be important to define the role of specific AR co-regulators in AR mediated isoform selection.

Some of the androgen dependent mRNA isoforms identified here are predicted to yield protein isoforms that may be clinically important, or to switch off protein production via generation of noncoding mRNA isoforms. Although the functional significance of the alternative mRNA isoforms identified in this study is yet largely unexplored, as is their role in the cellular response to androgens, the presented results emphasize the importance of analysing gene regulation and function at the mRNA isoform level.

The data referenced by this article are under copyright with the following copyright statement: Copyright: © 2018 Munkley J et al.

Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

The RNASeq data from LNCaP cells has been published previously https://doi.org/10.1016/j.ebiom.2016.04.018 25

The RNAseq custom tracks are available in Supplementary File 1 . To view these files, load them onto the UCSC website using the ‘My data’ tab and ‘Custom tracks’, then ‘Paste URLs or data’. The data are aligned to Feb 2009 (GRCh37/hg19).

Prostate adenocarcinoma cohort RNA-Seq data was downloaded from the Broad Institute TCGA Genome Analysis Center: Firehose 16/01/28 run https://doi.org/10.7908/C11G0KM9 43

Dataset 1: Real-time PCR raw Ct values 10.5256/f1000research.15604.d212873 41

Dataset 2: Raw unedited western blot images 10.5256/f1000research.15604.d212874 125


(III) Nucleic Acids Encoding RNA-Guided Endonucleases or Fusion Proteins

Another aspect of the present disclosure provides nucleic acids encoding any of the RNA-guided endonucleases or fusion proteins described above in sections (I) and (II), respectively. The nucleic acid can be RNA or DNA. In one embodiment, the nucleic acid encoding the RNA-guided endonuclease or fusion protein is mRNA. The mRNA can be 5′ capped and/or 3′ polyadenylated. In another embodiment, the nucleic acid encoding the RNA-guided endonuclease or fusion protein is DNA. The DNA can be present in a vector (see below).

The nucleic acid encoding the RNA-guided endonuclease or fusion protein can be codon optimized for efficient translation into protein in the eukaryotic cell or animal of interest. For example, codons can be optimized for expression in humans, mice, rats, hamsters, cows, pigs, cats, dogs, fish, amphibians, plants, yeast, insects, and so forth. Programs for codon optimization are available as freeware. Commercial codon optimization programs are also available.

In some embodiments, DNA encoding the RNA-guided endonuclease or fusion protein can be operably linked to at least one promoter control sequence. In some iterations, the DNA coding sequence can be operably linked to a promoter control sequence for expression in the eukaryotic cell or animal of interest. The promoter control sequence can be constitutive, regulated, or tissue-specific. Suitable constitutive promoter control sequences include, but are not limited to, cytomegalovirus immediate early promoter (CMV), simian virus (SV40) promoter, adenovirus major late promoter, Rous sarcoma virus (RSV) promoter, mouse mammary tumor virus (MMTV) promoter, phosphoglycerate kinase (PGK) promoter, elongation factor 1-alpha (EF1α) promoter, ubiquitin promoters, actin promoters, tubulin promoters, immunoglobulin promoters, fragments thereof, or combinations of any of the foregoing. Examples of suitable regulated promoter control sequences include without limit those regulated by heat shock, metals, steroids, antibiotics, or alcohol. Non-limiting examples of tissue-specific promoters include B29 promoter, CD14 promoter, CD43 promoter, CD45 promoter, CD68 promoter, desmin promoter, elastase-1 promoter, endoglin promoter, fibronectin promoter, Flt-1 promoter, GFAP promoter, GPIIb promoter, ICAM-2 promoter, IFN-β promoter, Mb promoter, Nphs1 promoter, OG-2 promoter, SP-B promoter, SYN1 promoter, and WASP promoter. The promoter sequence can be wild type or it can be modified for more efficient or efficacious expression. In one exemplary embodiment, the encoding DNA can be operably linked to a CMV promoter for constitutive expression in mammalian cells.

In certain embodiments, the sequence encoding the RNA-guided endonuclease or fusion protein can be operably linked to a promoter sequence that is recognized by a phage RNA polymerase for in vitro mRNA synthesis. In such embodiments, the in vitro-transcribed RNA can be purified for use in the methods detailed below in sections (IV) and (V). For example, the promoter sequence can be a T7, T3, or SP6 promoter sequence or a variation of a T7, T3, or SP6 promoter sequence. In an exemplary embodiment, the DNA encoding the fusion protein is operably linked to a T7 promoter for in vitro mRNA synthesis using T7 RNA polymerase.

In alternate embodiments, the sequence encoding the RNA-guided endonuclease or fusion protein can be operably linked to a promoter sequence for in vitro expression of the RNA-guided endonuclease or fusion protein in bacterial or eukaryotic cells. In such embodiments, the expressed protein can be purified for use in the methods detailed below in sections (IV) and (V). Suitable bacterial promoters include, without limit, T7 promoters, lac operon promoters, trp promoters, variations thereof, and combinations thereof. An exemplary bacterial promoter is tac which is a hybrid of trp and lac promoters. Non-limiting examples of suitable eukaryotic promoters are listed above.

In additional aspects, the DNA encoding the RNA-guided endonuclease or fusion protein also can be linked to a polyadenylation signal (e.g., SV40 polyA signal, bovine growth hormone (BGH) polyA signal, etc.) and/or at least one transcriptional termination sequence. Additionally, the sequence encoding the RNA-guided endonuclease or fusion protein also can be linked to sequence encoding at least one nuclear localization signal, at least one cell-penetrating domain, and/or at least one marker domain, which are detailed above in section (I).

In various embodiments, the DNA encoding the RNA-guided endonuclease or fusion protein can be present in a vector. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/mini-chromosomes, transposons, and viral vectors (e.g., lentiviral vectors, adeno-associated viral vectors, etc.). In one embodiment, the DNA encoding the RNA-guided endonuclease or fusion protein is present in a plasmid vector. Non-limiting examples of suitable plasmid vectors include pUC, pBR322, pET, pBluescript, and variants thereof. The vector can comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcriptional termination sequences, etc.), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like. Additional information can be found in “Current Protocols in Molecular Biology”, Ausubel et al., John Wiley & Sons, New York, 2003, or “Molecular Cloning: A Laboratory Manual”, Sambrook & Russell, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001.

In some embodiments, the expression vector comprising the sequence encoding the RNA-guided endonuclease or fusion protein can further comprise sequence encoding a guide RNA. The sequence encoding the guide RNA generally is operably linked to at least one transcriptional control sequence for expression of the guide RNA in the cell or embryo of interest. For example, DNA encoding the guide RNA can be operably linked to a promoter sequence that is recognized by RNA polymerase III (Pol III). Examples of suitable Pol III promoters include, but are not limited to, mammalian U6, U3, H1, and 7SL RNA promoters.


Methods

Experimental animals

All experiments utilized C57BL/6J female mice, between 6 and 12 weeks of age, from The Jackson Laboratory. The Jackson Laboratory’s animal care and use committee (ACUC) approved all experiments.

MA9 Retroviral Supernatant Production

Retroviral supernatant was generated as described 20 . Briefly, transient transfection of 10 cm plates of HEK293T cells with 10 μg pMSCV-IRES-MLL-AF9-GFP 20 and 10 μg π ecotropic packaging vector was performed using a Calcium Phosphate Transfection Kit (Invitrogen). After 24 h, media was replaced with fresh growth media. Retroviral supernatant was harvested 24 h later, filtered through a 0.45-μm polyvinylidene difluoride syringe filter, and utilized directly for transduction of primary cells.

Primary cell isolation and retroviral infection

Single-cell suspensions of bone marrow were prepared by filtering crushed, pooled femurs, tibiae and iliac crests from each mouse. Bone marrow mononuclear cells were isolated by Ficoll-Paque (GE Healthcare) density centrifugation and lineage depletion was performed using the Biotin Mouse Lineage Panel (BD Biosciences), including antibodies against CD3ɛ, CD11b, B220, Gr-1 and Ter-119, with streptavidin M-280 dynabeads (Invitrogen). Lineage-depleted bone marrow cells were stained with a combination of fluorochrome-conjugated antibody clones from eBioscience or BD Biosciences: c-Kit (clone 2B8), Sca-1 (clone E13-161.7), CD150 (clone mShad150), CD34 (clone RAM34), FcγR (clone 2.462), mature lineage (Lin) marker mix (B220, CD11b, CD4, CD8, Ter-119, Gr-1) and the viability stain propidium iodide (PI). Stained cells were sorted using a FACSAria with Diva software (BD) based on the following surface marker profiles: LT-HSC (Lin− Sca-1+ c-Kit+ CD150+ CD34−/lo), ST-HSC (Lin− Sca-1+ c-Kit+ CD150+ CD34+), MPP (Lin− Sca-1+ c-Kit+ CD150− CD34+), CMP (Lin− Sca-1− c-Kit+ CD150− CD34+ FcγRlo) and GMP (Lin− Sca-1− c-Kit+ CD150− CD34+ FcγR+). Purity of sorted cells was found to be ≥95%. For transduction, cells were suspended in fresh retroviral supernatant supplemented with 10% fetal calf serum (Sigma), 100 ng ml⁻¹ murine stem cell factor (mSCF), 10 ng ml⁻¹ mIL-3, 10 ng ml⁻¹ mIL-6 (Peprotech) and 4 μg ml⁻¹ polybrene, spun at 2,500 r.p.m. for 1 h at 32 °C, then transferred to an incubator at 37 °C and 5% CO2. Cells were collected after 24 h and resuspended in fresh retroviral supernatant for a second spinfection, as above. At 24 h after the second spinfection, GFP+ cells were sorted by FACSAria and plated into in vitro culture or directly transplanted in vivo.
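The five sorting profiles listed above can be written down as a small lookup structure, which makes it easy to see which single marker separates each pair of populations. The Python sketch below is illustrative only: it uses coarse labels ("neg", "lo", "pos") as a stand-in for real gates, which are drawn on fluorescence intensities, and the function name and label scheme are assumptions rather than part of the sorting software.

```python
# Allowed level(s) per marker for each population, transcribed from the text.
# "neg", "lo" and "pos" are coarse hypothetical labels, not measured intensities.
PROFILES = {
    "LT-HSC": {"Lin": {"neg"}, "Sca-1": {"pos"}, "c-Kit": {"pos"},
               "CD150": {"pos"}, "CD34": {"neg", "lo"}},
    "ST-HSC": {"Lin": {"neg"}, "Sca-1": {"pos"}, "c-Kit": {"pos"},
               "CD150": {"pos"}, "CD34": {"pos"}},
    "MPP": {"Lin": {"neg"}, "Sca-1": {"pos"}, "c-Kit": {"pos"},
            "CD150": {"neg"}, "CD34": {"pos"}},
    "CMP": {"Lin": {"neg"}, "Sca-1": {"neg"}, "c-Kit": {"pos"},
            "CD150": {"neg"}, "CD34": {"pos"}, "FcγR": {"lo"}},
    "GMP": {"Lin": {"neg"}, "Sca-1": {"neg"}, "c-Kit": {"pos"},
            "CD150": {"neg"}, "CD34": {"pos"}, "FcγR": {"pos"}},
}


def classify(cell):
    """Return the population whose every marker constraint the cell meets, else None."""
    for name, profile in PROFILES.items():
        if all(cell.get(marker) in allowed for marker, allowed in profile.items()):
            return name
    return None


# Hypothetical cell, for illustration only
cell = {"Lin": "neg", "Sca-1": "neg", "c-Kit": "pos",
        "CD150": "neg", "CD34": "pos", "FcγR": "pos"}
print(classify(cell))
```

Written this way, LT-HSC and ST-HSC differ only in CD34, ST-HSC and MPP only in CD150, and CMP and GMP only in FcγR.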

In vitro culture of MA9-Transformed cells

GFP+ cells were plated into MethoCult GF M3434 (Stem Cell Technologies). After 7 days of culture, colonies were counted and pooled, and then 10⁴ cells were replated in the same medium. For the third round of culture, 5 × 10³ cells were plated. At the end of the third round, single colonies were plucked from methylcellulose and transferred to liquid culture containing Iscove’s Modified Dulbecco’s Media (IMDM) (Invitrogen) with 10% fetal calf serum (Sigma), 50 ng ml⁻¹ mSCF, 10 ng ml⁻¹ mIL-3 and 10 ng ml⁻¹ mIL-6 (Peprotech). Cultures were maintained at a density ≤1 × 10⁶ cells per ml at 37 °C and 5% CO2.

In vivo transplantation

Recipient mice were sublethally irradiated (600 rads, ¹³⁷Cs) and intravenously injected with 10³, 10⁴ or 10⁵ cells from primary leukaemia cell lines, or 500 GFP+ transduced primary cells.

Analysis of leukaemic mice

Transplanted recipient mice were monitored every 4 weeks by peripheral blood sampling and differential blood count using a KX21-N Automated Hematology Analyser (Sysmex). Mice demonstrating elevated white blood cell counts and declining health status were sacrificed, and peripheral blood, spleen and bone marrow were harvested. Single-cell suspensions of peripheral blood, spleen and bone marrow were analysed by flow cytometry for GFP expression (BD FACS Calibur), and bone marrow was stained with fluorochrome-conjugated antibodies against c-Kit (clone 2B8), Sca-1 (clone E13-161.7), CD150 (clone mShad150), CD34 (clone RAM34), FcγR (clone 2.462), and mature lineage (Lin) marker mix (B220, CD11b, CD4, CD8, Ter-119 and Gr-1) to assess frequency of LSCs (L-GMP GFP+ Lin− Sca-1− c-Kit+ CD150− CD34+ FcγR+) using a LSRII (BD). Flow cytometry data were analysed using FlowJo software (TreeStar).

RNA-seq library construction and analysis

50,000 bulk leukaemia (GFP+) cells from the spleen of recipient mice were sorted directly into 350 μl of RLT buffer (Qiagen) and flash-frozen. In addition, primary mouse ST-HSC (3,000–10,000), MPP (10,000–100,000), CMP (8,000–50,000) and GMP (10,000–50,000) were isolated as described above, sorted directly into 350 μl of RLT buffer and flash-frozen. Three to six independent biological replicates of bulk leukaemia cells and normal cells were sampled. Total RNA was isolated according to manufacturer’s protocols (Qiagen) including DNase treatment, and quality was assessed using an Agilent 2100 Bioanalyzer and RNA 6000 Nano kit. RNA samples were processed using the NuGen Ovation RNA-Seq V2 kit. Amplified complementary DNA was sheared to approximately 300 bp using a Covaris E220 Focused Ultrasonicator. RNA-seq library preparation used the TruSeq DNA sample prep kit v2 (Illumina). Sheared DNA was end repaired to generate blunt-ended fragments. Magnetic bead purification was used to enrich end-repaired DNA that was >100 bp. Purified fragments were ‘A’ tailed and ligated to Illumina Y-adaptors containing a ‘T’ overhang and unique indices for multiplexing. Ligated fragments were purified on magnetic beads followed by PCR amplification to provide Illumina flow cell compatible sequences on the ends. PCR products were purified using magnetic beads followed by QC size distribution analysis using the Agilent 2100 Bioanalyzer and the DNA 1000 chip assay. Average library size was 350 bp with insert sizes being ∼230 bp. Quantitative PCR using the Library Quantitation kit (Kapa Biosystems) was used to estimate library concentrations. Libraries were sequenced on the Illumina HiSeq 2000 platform at a sequencing depth of >35 million reads per sample. Transcript abundances were estimated for each RNA-seq sample using RSEM 44 . A PCA plot to visualize similarity in transcriptome profiles was computed after log-transforming trimmed mean of M-values (TMM) normalized count data 45 . Unrooted RNA-seq dendrograms were generated from Pearson correlation coefficients derived from normalized read counts (log2 transformed) and visualized with APE as.phylo(). Resultant edge lengths were normalized for legibility. Read counts estimated for each gene by RSEM were given as input to the R package edgeR for differential expression analysis 46 . Differentially expressed genes were clustered using the partitioning k-medoids algorithm available in the R package cluster 47 . The value of k was chosen based on the number of comparisons that resulted in clustering differentially expressed genes at a false discovery rate <5%. Clustered gene expression profiles were visualized using the heatmap.2 function in the R package gplots. In populations with more than 2 replicates, we chose to display the pair with the greatest correlation. Single-nucleotide variants and small insertions and deletions were identified using FreeBayes, a Bayesian genetic variant detector 22 . Variants detected using this method were filtered based on read depth and mapping quality to reduce the false positive rate 23 . Germline SNPs identified in C57BL/6 mice using the Sanger Institute SNP resource 24 and SNPs identified in any of our matched normal cell RNA-seq data were removed. To test the clinical significance of our ST-HSC-derived versus progenitor-derived gene expression signature in human AML patient data, we used TCGA gene expression signatures of 200 adult de novo AML patients 21 . Prognostic significance of each gene was computed by classifying TCGA samples into two groups based on the expression profile of the gene. A sample with expression level higher than a threshold (median + 0.5 × mad) was considered high and expression below the threshold (median − 0.5 × mad) was considered low. A Cox proportional hazard model was used to test the association between the expression level of the gene and clinical outcome.
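The high/low grouping rule at the end of this paragraph can be sketched in a few lines of Python. Two points are assumptions here, since the text does not specify them: "mad" is taken to be the unscaled median absolute deviation (R's mad() multiplies by 1.4826 by default), and samples that fall between the two thresholds are left unclassified. The expression values are hypothetical.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation (an assumed reading of 'mad')."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def classify_expression(values, k=0.5):
    """Label each sample 'high', 'low' or None using median +/- k * mad thresholds."""
    centre, spread = median(values), mad(values)
    upper, lower = centre + k * spread, centre - k * spread
    labels = []
    for v in values:
        if v > upper:
            labels.append("high")
        elif v < lower:
            labels.append("low")
        else:
            labels.append(None)  # between thresholds: unclassified in this sketch
    return labels


# Hypothetical expression values for one gene across five samples
expression = [1.0, 2.0, 3.0, 4.0, 10.0]
print(classify_expression(expression))
```

The two resulting groups would then be the input to a survival model; the Cox proportional hazard fit itself is not shown.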

Real-time PCR

Semi-quantitative real-time PCR was performed using RT2 SYBR Green mix (Qiagen) on an Applied Biosystems 7500 and Ct values were normalized to GAPDH levels. The primers used were: Gapdh: 5′-CGTCCCGTAGACAAAATGGT-3′ and 5′-CTCCTGAAAGATGGTGATGG-3′; MA9: 5′-TGTGAAGCAGAAATGTGTGG-3′ and 5′-TGCCTTGTCACATTCACCAT-3′.
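One common way to normalize Ct values to a reference gene is the ΔCt calculation, sketched below in Python. The text does not state which formula was used, so the 2^(−ΔCt) form and the assumption of roughly 100% amplification efficiency (one doubling per cycle) are illustrative choices, and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Expression of the target relative to the reference gene, as 2^(-delta Ct).

    Assumes the amplicon doubles every cycle for both primer pairs.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for MA9 and Gapdh in one sample
print(relative_expression(ct_target=24.0, ct_reference=18.0))
```

A target that crosses the threshold six cycles after the reference is therefore estimated at 1/64 of the reference level.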

ATAC-seq library construction and analysis

Cryopreserved leukaemic spleen and/or bone marrow samples were thawed, and 50 K bulk leukaemia (GFP+) cells or LSCs (L-GMP GFP+ Lin- Sca-1- c-Kit+ CD34+ FcγR+) were sorted on a FACSAria (BD). Samples were placed at 37 °C and 5% CO2 in IMDM plus 10% fetal calf serum for 30 min, then harvested, washed with 1 × PBS, and ATAC-seq libraries were prepared as previously described 27 . Quantitative PCR using the Library Quantitation kit (Kapa Biosystems) was used to estimate library concentrations. Libraries were sequenced on the Illumina HiSeq 2000 platform generating 2 × 150 bp paired-end reads at a sequencing depth of ∼30–50 million reads per sample. Reads were aligned to the mouse mm9 assembly using bowtie2. Only reads that uniquely mapped to the genome were used in subsequent analysis. Duplicate reads were eliminated to avoid potential PCR amplification artifacts and to eliminate mitochondrial DNA duplicates. Post-alignment filtering resulted in ∼20–30 million uniquely aligned singleton reads per library. Samples with <10 million uniquely aligned reads were considered noisy and were excluded from the analysis. ATAC-seq peaks in each sample were identified using MACS2 48 with the following settings: MACS2-2.1.0.20140616/bin/macs2 callpeak -t <input tag file> -f BED -n <output peak file> -g 'mm' --nomodel --shift -100 --extsize 200 -B --broad. Peak annotation was performed via HOMER v4.6 annotatePeaks.pl for the mm9 assembly. Previously published ATAC-seq libraries from LSK, CMP and GMP populations were downloaded from GEO (GEO: GSE59992) 25 and processed as described above. Heatmap presentation of cell type correlation was computed via a Pearson correlation matrix of HOMER peak scores, following bedtools intersect replicate merging, and visualized with gplots heatmap.2(). ATAC-seq data were visualized on the UCSC genome browser 49 (http://genome.ucsc.edu/) with the mouse mm9 assembly, along with bed files from ChIP-seq analysis of H3K4me1 and H3K27ac in mouse C57BL/6 bone marrow (ENCODE) 28 and H3K27ac in a MA9/Nras G12D murine AML cell line (RN2) (GEO GSM1262348) 29 . The RNA-seq data subset on promoter ATAC-seq peak score utilized log2(FPKM+1) and was generated with the ecdf() function.
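For scripted runs, the peak-calling settings quoted above can be assembled as an argument list, which avoids shell-quoting mistakes and keeps the negative --shift value as its own argument. The Python sketch below only builds and prints the command; it does not execute MACS2. It assumes a macs2 executable on the PATH rather than the versioned install path in the text, reads the final flag as --broad, and uses hypothetical file names.

```python
import shlex


def macs2_callpeak_args(tag_file, output_name):
    """Argument list mirroring the callpeak settings quoted in the text."""
    return [
        "macs2", "callpeak",       # assumes macs2 is on the PATH
        "-t", tag_file,            # input tag file in BED format
        "-f", "BED",
        "-n", output_name,         # prefix for the output peak files
        "-g", "mm",                # mouse effective genome size preset
        "--nomodel",
        "--shift", "-100",
        "--extsize", "200",
        "-B",
        "--broad",                 # assumed reading of the final flag
    ]


# Hypothetical file names, for illustration only
args = macs2_callpeak_args("sample1.tags.bed", "sample1")
print(shlex.join(args))
```

The list can be handed to subprocess.run(args, check=True) on a machine where MACS2 is installed.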

Analysis of human AML enhancers and DNA methylation

Lift-over of cell of origin–specific open chromatin peaks from the mouse (mm9) to the human (hg19) genome was performed using UCSC liftOver tool. These loci were assessed for enhancer potential by comparison with the FANTOM5 enhancer database 34,35 and H3K27ac ChIP-seq data of the human MA9 cell line MOLM14 (GEO GSM1587893) 36 using Bedtools. To assess prognostic significance of DNA methylation status surrounding these loci, regions of ±10 kb from the centre of each open chromatin peak were examined. All CpG probes on the Infinium 450 K Human Methylation BeadChip array that fell within these regions were tested individually for prognostic significance in human AML TCGA data 21 . Methylation status of each sample, for each probe, was classified as high (β>0.8), partial (0.2<β<0.8) or low (β<0.2). Cox proportional hazard model was used to test association of methylation status and clinical outcome.
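The three-way methylation classification can be sketched as a small Python function. The text defines partial methylation with strict inequalities (0.2<β<0.8), which leaves β values of exactly 0.2 or 0.8 unassigned; this sketch places them in the partial group, which is an assumption.

```python
def methylation_status(beta):
    """Classify a CpG probe beta value as 'high', 'partial' or 'low'."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta values must lie between 0 and 1")
    if beta > 0.8:
        return "high"
    if beta < 0.2:
        return "low"
    return "partial"  # includes exactly 0.2 and 0.8 (assumed boundary handling)


# Hypothetical beta values for one probe across four samples
betas = [0.05, 0.35, 0.8, 0.93]
print([methylation_status(b) for b in betas])
```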

Statistical analyses

For overall survival, log-rank (Mantel-Cox) test was performed on Kaplan–Meier survival curves. A minimum of five mice per sample type and cell dose was chosen to detect a hazard ratio of >3 with 80% power at an error rate of 5%. On the basis of pre-established criteria, animals were excluded from the analysis if they did not demonstrate >1% detectable GFP+ cells in the peripheral blood by 1 month post-transplant. Recipient mice were randomized for which cell line or primary AML cell type they were transplanted with. Analysis of mice was not blinded during the experiment. Limiting dilution analysis was performed with ELDA software 50 . Statistical analysis of non-survival data was performed by non-parametric one-way analysis of variance (Kruskal–Wallis test) followed by Dunn’s multiple comparisons test, or two-way analysis of variance followed by Tukey’s multiple comparisons test. For these experiments, a minimum of 3 samples per condition was chosen to detect statistical differences with >80% power at an error rate of 5%. Variance was similar between the groups that were statistically compared. Pearson’s correlation coefficient was used to calculate overall similarity between RNA-seq transcriptomes or ATAC-seq global open chromatin signatures. Pearson’s χ² tests were used for determining significance of the enrichment of prognostic data from published TCGA human de novo AML data.
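The pairwise similarity calculation mentioned above can be illustrated with a minimal Python sketch that builds a Pearson correlation matrix across samples. The sample names and values are hypothetical, and any upstream normalization or transformation of the real data is not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Nested dict of pairwise Pearson correlations between named profiles."""
    names = list(profiles)
    return {a: {b: pearson_r(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical per-feature signal for three samples, for illustration only
profiles = {
    "GMP_1": [5.0, 1.0, 3.0, 8.0],
    "GMP_2": [10.0, 2.0, 6.0, 16.0],
    "LSK_1": [8.0, 3.0, 1.0, 5.0],
}
matrix = correlation_matrix(profiles)
print(matrix["GMP_1"]["GMP_2"], matrix["GMP_1"]["LSK_1"])
```

Because the coefficient is scale-invariant, two samples whose signals differ only by a constant factor, such as sequencing depth, still correlate perfectly.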

Data availability

RNA-seq and ATAC-seq data can be found in GEO under accession number GSE74691. All other relevant code is available from the authors on request.