Are there any viruses that integrate their DNA into organellar DNA?


It is known that many viruses (e.g. retroviridae) integrate in the nuclear genome of their host as part of their cycle. However, I'd like to know if integration can happen in organellar DNA (cpDNA and mtDNA) as well.

I've found a paper proposing that some chloroplast genes have a viral origin; however, the virus would have been present in the ancestral alpha-proteobacterium that later became the chloroplast, so no direct chloroplast infection is involved.

There is also this almost identical question on ResearchGate, but frankly the answers there are not satisfactory.

I'm only aware of one paper that claims to show regular viral integration into mitochondrial DNA, and even there they suggest it might be a host defense rather than a viral life cycle thing:

Interestingly, 69.4% (50/72) of HBV integration sites were observed in human mitochondrial DNA (mtDNA) (Supplementary Table 2)… However, 32% of HBV integration sites in mtDNA (16/50) were in the displacement loop (D-loop) region, which is known as the major control site for mtDNA expression as it contains the origin of replication for the heavy DNA strand and the major promoters of transcription (Figure 1B). Of note, we repeatedly detected mtDNA integration events through the same micro-homology site in independent mice, indicating the hotspot for HBV integration in mtDNA, which is probably related to replication or transcription…

We found that HBV integrations were mediated through MMEJ, which is known as an alternative DNA double-strand break (DSB) repair system and reported as a principal mediator during mitochondrial DNA lesions. This suggests that early-phase HBV integration occurred in mtDNA. Damage in mtDNA is known to induce autophagy and was removed in C. elegans, therefore it is possible that integration of HBV DNA into mtDNA might be a defensive mechanism to protect the nucleus from HBV integration.

--Characterization of HBV integration patterns and timing in liver cancer and HBV-infected livers

I would like to see this replicated and extended before I spend too much time wondering about functions and mechanisms.

Virus Weaves Itself Into The DNA Transferred From Parents To Babies

Parents expect to pass on their eye or hair color, their knobby knees or their big feet to their children through their genes. But they don't expect to pass on viruses through those same genes.

New research from the University of Rochester Medical Center shows that some parents pass on the human herpes virus 6 (HHV-6) to their children because it is integrated into their chromosomes. This is the first time a virus has been shown to become part of the human DNA and then get passed to subsequent generations. This unique mode of congenital infection may be occurring in as many as 1 of every 116 newborns, and the long-term consequences for a child's development and immune system are unknown.

"At this point, we know very little about the implications of this type of infection, but the section of the chromosome into which the virus appears to integrate is important to the maintenance of normal immune function," said Caroline Breese Hall, M.D., professor of Pediatrics and Medicine at the University of Rochester Medical Center, and author of the study which publishes in Pediatrics this month. "With further study, we hope to discern whether this type of infection affects children differently than children infected after birth."

HHV-6 causes roseola, an infection that is nearly universal by 3 years of age. The typical roseola syndrome produces a high fever lasting from several days up to a week and may be accompanied by other, variable symptoms, including mild respiratory and gastrointestinal symptoms. With roseola, just as the fever breaks, the child may briefly develop a rash. A congenital HHV-6 infection, that is, one present at birth, produces high levels of virus in the body, but scientists and doctors do not know whether it causes any developmental or immune system problems.

Some congenital infections can cause serious problems in fetuses. If a mother contracts cytomegalovirus (CMV) while pregnant, her fetus is at risk of hearing or vision loss, developmental disabilities and problems with the lungs, liver and spleen. Some of those health problems don't show up until months or years after birth. HHV-6 is closely related to CMV, and the congenital infection rate of CMV is similar to that of congenital HHV-6: about 1 percent. However, this research shows that a congenital HHV-6 infection differs greatly from a congenital CMV infection in that it is often integrated into the chromosomes of the baby rather than passed through the placenta.

"This is the first time a herpes virus has been recognized to integrate into the human genome. To think that it's actually a part of us – that's really fascinating," said Mary Caserta, M.D., associate professor of Pediatrics at the University of Rochester Medical Center and one of the paper's authors. "This opens up a whole new realm of exploration."

Of 254 children enrolled in this study between July 2003 and April 2007, 43 had congenital HHV-6 infections based on cord blood samples. Of 211 children without congenital infection, 42 were children who acquired an HHV-6 infection during the study. Of the infants who had congenital infections, 86 percent of them (37) had the virus integrated into their chromosomes. Only six of the congenitally infected babies were infected by the mother through the placenta.

Children who had integrated HHV-6 had higher levels of virus in the body than those who were infected through the placenta. HHV-6 DNA was found in the hair of one parent of all children with integrated virus with available parental samples (18 mothers and 11 fathers), which means the children acquired the integrated infections through their mother's egg or father's sperm at conception. The virus's DNA was not found in hair samples of parents of children who were infected after birth.

This study is part of a series of ongoing studies on children with HHV-6 infections at the University's Golisano Children's Hospital at Strong.

This study was funded by grants from the National Institute of Child Health and Development and, in part, by grants from the General Clinical Research Center, the National Center for Research Resources, the National Institutes of Health and the HHV-6 Foundation.

Story Source:

Materials provided by University of Rochester Medical Center. Note: Content may be edited for style and length.

Our complicated relationship with viruses

When viruses infect us, they can embed small chunks of their genetic material in our DNA. Although infrequent, the incorporation of this material into the human genome has been occurring for millions of years. As a result of this ongoing process, viral genetic material comprises nearly 10 percent of the modern human genome. Over time, the vast majority of viral invaders populating our genome have mutated to the point that they no longer lead to active infections. But, as scientists funded by the National Institutes of Health have demonstrated, they are not entirely dormant.

Sometimes, these stowaway sequences of viral genes, called "endogenous retroviruses" (ERVs), can contribute to the onset of diseases such as cancer. They can also make their hosts susceptible to infections from other viruses. However, scientists have identified numerous cases of viral hitchhikers bestowing crucial benefits to their human hosts -- from protection against disease to shaping important aspects of human evolution, such as the ability to digest starch.

Protecting Against Disease

Geneticists Cedric Feschotte, Edward Chuong and Nels Elde at the University of Utah have discovered that ERVs lodged in the human genome can jump start the immune system.

For a virus to successfully make copies of itself inside a host cell, it needs molecular tools similar to the ones its host normally uses to translate genes into proteins. As a result, viruses have tools meticulously shaped by evolution to commandeer the protein-producing machinery of human cells.

Feschotte and his team recognized that because viruses tend to attack the immune system, they may be particularly adept at manipulating immune system genes. Ancient human genomes may have evolved in response. Feschotte believes it is possible that the genomes of humans (or our ancient ancestors) repurposed viral DNA for their own defense, using it to spur the immune system into action against viruses and other foreign invaders.

"We hypothesized that these ERVs were likely to be primary players in regulating immune activity because viruses themselves evolved to hijack the machinery to control immune cells," says Feschotte.

To investigate their hypothesis, Feschotte and his team used a gene-editing technique called CRISPR to systematically eliminate individual ERV sequences in human cells. After removing one of the sequences, the researchers observed a notable weakening of immune function when the cells were challenged by viral infection. The removal of three other ERV sequences also compromised the immune response.

These findings suggest that each of these ERV elements can activate different gene components of the immune system. The team believes there are thousands more ERV sequences with similar regulatory activities, and it hopes to explore them systematically in future studies.

"We think we've only scratched the surface here on the regulatory potential of ERVs," says Feschotte.

Underscoring the complicated relationship humans have with viruses, strong evidence also exists that in some cases ERVs cause cancer but in other cases they protect against cancer. For example, an ERV called ERV9 can detect cancer-related damage in the DNA of cells in the testis. ERV9 then prompts a neighboring gene to induce the damaged cells to commit suicide. This protective mechanism ensures that the cancer cells will not spread.

Shaping Human Evolution

Scientists have also discovered that viral intruders have driven the evolution of human physiological functions ranging from early development to digestion.

Nearly 20 years ago, scientists identified an ERV-derived gene called syncytin that appears to play a key role in the development of the human placenta. Syncytin originated from a retroviral gene encoding a protein that is embedded in the outer surface of a virus. This protein mediates the fusion of the virions with the host cell membrane, thereby facilitating viral infection. In a remarkable turn of events, the human body has repurposed the viral protein's cell-fusing activities to promote the formation of the layer of cells that merge the placenta and the uterus.

Scientists have also found that viral invaders are critical to humans' ability to digest starch. The insertion of an ERV near the human pancreatic gene for making amylase -- a protein that helps humans digest carbohydrates -- led to the expression of amylase in saliva. The consequent ability to digest starch in the mouth has had profound effects on the human diet, notably a shift toward eating foods like rice and wheat. By helping to kick start digestion in the mouth, amylase relieves some of the burden of breaking down food faced by the small intestine. If this critical enzyme were not secreted in saliva, the small intestine would have more difficulty breaking down sugars and starches.

More recently, in 2016, a team of U.S. and Israeli researchers reported that a common strategy that host organisms use for nullifying viruses -- bombarding them with mutations -- has helped shape human evolution.

The researchers, led by computational biologist Alon Keinan of Cornell University, in collaboration with Erez Levanon from Bar-Ilan University, study a virus-fighting family of human enzymes called APOBECs. During periods when DNA unzips into two single strands -- when it has been damaged, is in the process of being copied, or is being transcribed into RNA -- the APOBEC enzymes seek out bits of viral DNA. They then systematically strafe the viral DNA -- typically swapping many instances of one DNA base for another -- in order to neutralize pathogens lurking within the host genome.

It's likely that this APOBEC mechanism has also mutated non-viral portions of the human genome. Keinan says the majority of these genetic changes would have done enough damage to cause disease. For the most part, such mutations have been weeded out of the population because they were harmful to survival and reproduction. However, researchers have increasingly linked APOBECs to various cancers.

Keinan's team has shown that these mutations are also occurring in cells that develop into sperm and eggs and so they are inherited by future generations. And not all of the mutations have been detrimental. The genetic changes that survived through evolutionary time -- the ones that did not lead to disease -- are more likely to be beneficial. This insight suggests that the APOBEC anti-viral mechanism has helped shape primate evolution through a variety of yet-to-be-identified beneficial mutations. Keinan's team has reported tens of thousands of such mutations in hominid genomes and is now searching for specific examples that led to changes in function that have contributed to human evolution.

While the search for additional examples of beneficial ERVs and antiviral mechanisms continues, scientists are learning more about viral trespassers with the help of large databases of genomic information from numerous species. They're trying to figure out how viral DNA integrates into host genomes, how ERVs can jump from one host species to another and how to protect people in the case of these rare, but occasionally deadly, events.


Viral vector vaccines benefit from improved understanding of viral biology. For more detailed information on the current state of viral vector vaccine design, see the recent review by Draper and Heeney.

Other scientists are divided about the importance of the new work and its relevance to human health, and some are harshly critical. “There are open questions that we’ll have to address,” says molecular biologist Rudolf Jaenisch of the Massachusetts Institute of Technology (MIT), who led the work.

Yet a few veteran retrovirologists are fascinated. “This is a very interesting molecular analysis and speculation with supportive data provided,” says Robert Gallo, who heads the Institute of Human Virology and looked at the newly posted preprint at Science’s request. “I do not think it is a complete story to be certain … but as is, I like it and my guess is it will be right.”

All viruses insert their genetic material into the cells they infect, but it generally remains separate from the cell’s own DNA. Jaenisch’s team, intrigued by reports of people testing positive for SARS-CoV-2 after recovering, wondered whether these puzzling results reflected something of an artifact from the polymerase chain reaction (PCR) assay, which detects specific virus sequences in biological samples such as nasal swabs, even if they are fragmented and can’t produce new viruses. “Why do we have this positivity, which is now seen all over the place, long after the active infection has disappeared?” says Jaenisch, who collaborated with the lab of MIT’s Richard Young.

To test whether SARS-CoV-2’s RNA genome could integrate into the DNA of our chromosomes, the researchers added the gene for reverse transcriptase (RT), an enzyme that converts RNA into DNA, to human cells and cultured the engineered cells with SARS-CoV-2. In one experiment, the researchers used an RT gene from HIV. They also provided RT using human DNA sequences known as LINE-1 elements, which are remnants of ancient retroviral infections and make up about 17% of the human genome. Cells making either form of the enzyme led to some chunks of SARS-CoV-2 RNA being converted to DNA and integrated into human chromosomes, the team reports in their preprint, posted on bioRxiv on 13 December.

If the LINE-1 sequences naturally make RT in human cells, SARS-CoV-2 integration might happen in people who have COVID-19. This could occur in people coinfected with SARS-CoV-2 and HIV, too. Either situation may explain PCR detecting lingering traces of coronavirus genetic material in people who no longer have a true infection. And it could confuse studies of COVID-19 treatments that rely on PCR tests to indirectly measure changes in the amount of infectious SARS-CoV-2 in the body.

David Baltimore, a virologist at the California Institute of Technology who won the Nobel Prize for his role in discovering RT, describes the new work as “impressive” and the findings as “unexpected” but he notes that Jaenisch and colleagues only show that fragments of SARS-CoV-2’s genome integrate. “Because it is all pieces of the coronaviral genome, it can’t lead to infectious RNA or DNA and therefore it is probably biologically a dead end,” Baltimore says. “It is also not clear if, in people, the cells that harbor the reverse transcripts stay around for a long time or they die. The work raises a lot of interesting questions.”

Virologist Melanie Ott, who studies HIV at the Gladstone Institute of Virology and Immunology, says the findings are “pretty provocative” but need thorough follow-up and confirmation. “I have no doubt that reverse transcription can happen in vitro with optimized conditions,” Ott says. But she notes that SARS-CoV-2 RNA replication takes place in specialized compartments in the cytoplasm. “Whether it happens in infected cells and … leads to significant integration in the cell nucleus is another question.”

Retrovirologist John Coffin of Tufts University calls the new work “believable,” noting that solid evidence shows that LINE-1 RT can allow viral material to integrate in people, but he’s not yet convinced. The evidence of SARS-CoV-2 sequences in people, Coffin says, “should be more solid,” and the in vitro experiments conducted by Jaenisch’s team lack controls he would have liked to have seen. “All in all, I doubt that the phenomenon has much biological relevance, despite the authors’ speculation,” Coffin says.

Zandrea Ambrose, a retrovirologist at the University of Pittsburgh, adds that this kind of integration would be “extremely rare” if it does indeed happen. She notes that LINE-1 elements in the human genome rarely are active. “It is not clear what the activity would be in different primary cell types that are infected by SARS-CoV-2,” she says.

One particularly harsh Twitter critic, a postdoctoral researcher in a lab that specializes in retroviruses, went so far as to call the preprint’s conclusions “a strong, dangerous, and largely unsupported claim.” Jaenisch emphasizes that the paper clearly states the integration the authors think happens could not lead to the production of infectious SARS-CoV-2. “Let’s assume that we can really resolve these criticisms fully, which I’m trying to do,” Jaenisch says. “This might be something not to worry about.”

Why the Study Claiming SARS-CoV-2’s RNA Is Fused Into Human DNA Is Flawed

In September 1957, Francis Crick proposed the ‘central dogma of molecular biology’. He suggested that information always flows in living beings from DNA – a stable, inheritable molecule – through a relatively unstable intermediate, the messenger RNA, and then onto proteins, which are the workhorses of all life functions. And everywhere scientists looked, they realised all organisms followed this dogma – until 1970.

That year, Howard Temin and David Baltimore found something odd in one group of viruses.

Viruses, like other living beings, come in all shapes and sizes, and are classified into different families. However, viruses are not classified the same way as other life forms. This is because they can be both alive and not alive – a feature that demands that taxonomists also consider other attributes that make viruses different.

Another such feature is their genetic material.

Viruses are the only known life-forms that can use RNA as their genetic material. There are different kinds of RNA-containing viruses. To propagate itself, each virus makes a copy of the information in its genetic material to pass onto its ‘daughter’ viruses. Some viruses contain the machinery to make copies of their RNA, and they don’t have a DNA component in their life cycle whatsoever. The influenza, hepatitis C and SARS-CoV-2 viruses are in this category. These viruses also deviate from the central dogma only slightly: there is no DNA, but the information flows only from the RNA to proteins.

But what Temin and Baltimore discovered in 1970 was a proper exception to the central dogma. They found viruses that could make a DNA copy of their RNA using an enzyme called reverse transcriptase, in a process called reverse transcription. Such a virus then integrates this DNA into the DNA of its host, thus becoming a permanent part of the host genome. These viruses – called retroviruses – violate the central dogma because information first flows from RNA to DNA, and then from the DNA to the RNA to proteins.

Viruses like HIV and Rous sarcoma belong to this family.

In all, there are seven families, or groups, of viruses, and each group specifies special adaptations, refined over years of evolution, often through several hosts. It’s also unusual – maybe even impossible – to have members of one class of viruses show fundamental properties associated with another.

This is why a preprint paper uploaded to the bioRxiv preprint server on December 13 caught the scientific community by surprise. The paper claimed, outlandishly, that parts of the SARS-CoV-2 viral RNA could be reverse transcribed into DNA and integrated into the human genome.

According to the paper’s authors, they were attempting to explain why some COVID-19 patients showed signs of the virus in RT-PCR tests even weeks after recovering from the disease. Their explanation is based on a group of genetic entities called long interspersed nuclear elements (LINE). The human genome has multiple LINEs – effectively, parts of our DNA responsible for reverse-transcribing human RNA into DNA, and integrating it into the human DNA at a different part. The paper claims these LINEs do the same thing with parts of the novel coronavirus’s RNA as well.

This process differs from what retroviruses like HIV do routinely: they use their own proteins to convert and mix the DNA.

The authors’ claims are based largely on one primary observation and one experiment. The observation banks on a powerful tool called RNA-seq, which provides the sequences of all the RNA molecules produced by a cell. So the output of RNA-seq is a sort of measure of all the genes that are active in the target cell. The authors reported that in cells infected with SARS-CoV-2, there were some viral RNA sequences interspersed between RNA sequences of human genes.

This data may seem convincing at first glance, but the devil is in the details. The authors appear to have overlooked the fact that in the process of preparing a sample for RNA-seq, the scientist must herself artificially reverse transcribe RNA into DNA – because only DNA can be sequenced (for further study). So the chimeric viral and human RNA could just be an artefact of the RNA-seq process, since reverse transcriptases are known to mix and match target sequences.

To prove their claims in an experimental setup, the authors genetically altered cells to make proteins that can perform reverse transcription. Then they infected these cells with the SARS-CoV-2 virus, and reported that the SARS-CoV-2 viral RNA is converted into DNA.

They performed the experiment by forcing cells to make unnatural quantities of two proteins: LINEs and HIV reverse transcriptase (RT). The problem with the former is that LINEs are rarely produced naturally in the same quantities as those in the experiment, raising doubts about whether the results reflect what is realistically possible. And the problem with the latter is that there is no chance HIV RT is naturally present in a cell infected with SARS-CoV-2 because the two viruses do not infect the same cell types. So the experimental evidence has some big loopholes that don’t in any way justify what the authors claim.

Instead, the authors could have provided data from an older technique: the Southern blot. In 1975, the English molecular biologist Edwin Southern reported a very simple way to check if a particular fragment of DNA is present in a given sample. A DNA molecule has two strands (the ‘double helix’), and the string of nucleobases on one strand can only pair with a specific string of nucleobases on the other. So Southern figured that by studying one strand, researchers could know what the other strand looked like.

The way to do this – for example – is to synthesise one strand of the SARS-CoV-2 DNA and mix it with copies of human DNA, and check for signs of binding.
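The pairing rule behind this check can be sketched in a few lines of Python. This is only a toy illustration of strand complementarity, not a model of a real blot: the sequences are invented, the function names are hypothetical, and real hybridisation tolerates mismatches and depends on experimental conditions that are ignored here.

```python
# Minimal sketch of probe/target base pairing; not a model of a real blot.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with `seq`, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def probe_binds(probe: str, sample_strand: str) -> bool:
    """True if the single-stranded probe can pair perfectly somewhere on the sample strand."""
    return reverse_complement(probe) in sample_strand.upper()

# Hypothetical toy sequences, invented for illustration only.
sample = "TTGACCGTAAGCTTAGGC"
probe_present = reverse_complement("CGTAAGCT")  # designed to pair with a stretch of the sample
probe_absent = "AAAAAAAAAA"                     # has no partner in the sample

print(probe_binds(probe_present, sample))
print(probe_binds(probe_absent, sample))
```

A probe built from a viral sequence would bind only if the matching stretch is physically present in the sampled DNA, which is why the technique is a direct test for integration.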

The preprint paper’s lack of convincing evidence has opened it up to criticism from scientists for its erroneous assertions and unproven claims. At the same time, David Baltimore, who won a Nobel Prize for helping discover the reverse transcriptase enzyme, told the prominent Science magazine the study was “impressive”, and other news outlets have amplified his comments.

Such words have given the study a higher profile than it deserves, in the middle of a pandemic scarred by misinformation and pseudoscience. The manuscript’s bioRxiv page itself includes numerous comments from researchers around the world demanding that it be taken down.

To be clear, what the preprint’s authors have claimed is still within the realm of possibility, but their experiments and interpretations aren’t convincing. The claim is extraordinary: the first report of reverse transcription by a non-retrovirus. It would mean there’s a chance that your body keeps a record of all RNA viruses that ever infected it, and open up a whole new angle to immune memory. But extraordinary claims require extraordinary evidence – which the preprint paper doesn’t have. So for now, we wait for proof.

Arun Panchapakesan is a molecular biologist working in the HIV-AIDS laboratory at the Jawaharlal Nehru Centre for Advanced Scientific Research, Bengaluru.


Simulation study

We first compare the performance of VirTect with three other methods, ViralFusionSeq, VirusFinder2 and Virus-Clip [25], using simulation. We randomly select 160 viral sequences (sizes ranging from 500 bp to 1000 bp) from genotype C [26] of the HBV genome and insert them into chromosomes 1, 2, 3 and 4 of the human reference genome (hg19, GRCh37). The GenBank ID of HBV genotype C is AB014381.1. In this way, we generate five related genomes with virus integrations. Each genome has 40 virus integration sites, 25 of which are common to all five genomes. In the simulation, we also randomly place SNVs and indels near each integration site (50 bp neighborhood). Given the five simulated genomes, we use ART [27] to simulate Illumina paired-end reads with a read length of 100 bp and an insert size of 300 bp (standard deviation 50 bp). For each genome, we simulate six datasets at coverage 3X, 5X, 10X, 20X, 30X and 40X. For VirTect, we test its performance starting from fastq files (VirTect:fastq) and from bam files (VirTect:bam). For VirTect:fastq, we use BWA to map all paired-end reads to the human reference genome (hg19) and the HBV genomes (genotypes A-H) simultaneously. For VirTect:bam, the short reads are first mapped to the human reference genome. VirTect then uses BWA to realign the partially unaligned reads to the HBV genomes (genotypes A-H). For the other algorithms, we use the default parameter settings. All algorithms are tested on a Linux server (32-core Intel Xeon 2.40 GHz CPU and 256 GB memory).
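The genome-construction step of such a simulation might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the reference and viral sequences are short random strings standing in for hg19 and HBV, the function names are hypothetical, a shared site is assumed to carry the same viral fragment in every genome (the text does not say), and the SNV/indel perturbation and ART read simulation are omitted.

```python
import random

def random_dna(length, rng):
    """Toy stand-in for a real reference or viral sequence."""
    return "".join(rng.choices("ACGT", k=length))

def simulate_related_genomes(reference, viral, n_genomes=5, n_sites=40,
                             n_shared=25, min_len=500, max_len=1000, seed=0):
    """Return a list of (sorted_sites, genome_sequence), one entry per genome."""
    rng = random.Random(seed)
    n_private = n_sites - n_shared
    # Distinct reference coordinates: shared ones first, then private ones.
    positions = rng.sample(range(1, len(reference)), n_shared + n_genomes * n_private)
    shared, private = positions[:n_shared], positions[n_shared:]

    # One viral fragment per site (assumption: a shared site carries the same
    # fragment in every genome; the source does not specify this).
    fragment = {}
    for pos in positions:
        frag_len = rng.randint(min_len, max_len)
        start = rng.randrange(len(viral) - frag_len + 1)
        fragment[pos] = viral[start:start + frag_len]

    genomes = []
    for g in range(n_genomes):
        sites = sorted(shared + private[g * n_private:(g + 1) * n_private])
        pieces, prev = [], 0
        for pos in sites:
            pieces.append(reference[prev:pos])  # host sequence up to the site
            pieces.append(fragment[pos])        # inserted viral fragment
            prev = pos
        pieces.append(reference[prev:])         # remaining host sequence
        genomes.append((sites, "".join(pieces)))
    return genomes

rng = random.Random(1)
toy_reference = random_dna(200_000, rng)  # hypothetical toy "chromosome"
toy_virus = random_dna(3_200, rng)        # hypothetical toy viral genome
genomes = simulate_related_genomes(toy_reference, toy_virus)
common = set.intersection(*(set(sites) for sites, _ in genomes))
print(len(genomes), len(common))
```

Site coordinates are kept in reference (pre-insertion) space, which is also the coordinate system in which predicted integration sites are reported after mapping to the human reference.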

We first apply the other three algorithms to each data set individually and compare their performance with that of VirTect. Figure 2 shows the sensitivities and false discovery rates (FDRs) of these algorithms on each genome separately. We define an integration prediction as a true positive if the distance between the predicted integration site and the real integration site is less than 350 bp. We find that VirTect achieves the highest sensitivities and the lowest FDRs across all five genomes. In particular, at low coverage depth (3X, 5X and 10X), the sensitivities of VirTect are much higher than those of the other three algorithms and its FDRs are 0. VirTect:fastq is a little more sensitive than VirTect:bam at 3X and 5X coverage, but overall their performances are very similar. The other three algorithms have a higher FDR at low coverage because a number of their predicted integration sites are far from the true integration sites.
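The evaluation rule can be made concrete with a short sketch. The 350 bp threshold comes from the text; everything else is an assumption made for illustration: the function name, the `(chrom, pos)` representation, and the choice to count a true site as detected if at least one prediction falls within the threshold. FDR is computed in the usual way, as the fraction of predictions that match no true site.

```python
def evaluate(predicted, truth, max_dist=350):
    """Return (sensitivity, fdr) for lists of (chrom, pos) integration sites."""
    def near(a, b):
        # Same chromosome and strictly less than max_dist apart.
        return a[0] == b[0] and abs(a[1] - b[1]) < max_dist

    # Predictions that land near at least one true site.
    matched_predictions = [p for p in predicted if any(near(p, t) for t in truth)]
    # True sites that have at least one nearby prediction.
    detected_truth = [t for t in truth if any(near(p, t) for p in predicted)]

    sensitivity = len(detected_truth) / len(truth) if truth else 0.0
    fdr = 1 - len(matched_predictions) / len(predicted) if predicted else 0.0
    return sensitivity, fdr

# Hypothetical coordinates, invented for illustration only.
truth = [("chr1", 1000), ("chr1", 50000), ("chr2", 7000), ("chr3", 900)]
predicted = [("chr1", 1100), ("chr2", 7349), ("chr2", 90000)]
sensitivity, fdr = evaluate(predicted, truth)
print(sensitivity, round(fdr, 3))
```

In this toy case two of four true sites are recovered and one of three predictions is far from any true site, which is the situation the text describes for the other algorithms at low coverage.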

Fig. 2 The sensitivity (a-e) and FDR (f-j) of the four algorithms on the simulation data at different sequencing coverages

The above comparison is somewhat unfair to the other algorithms because they do not use all the data to detect common integration sites. We therefore merge the sequencing data of the five genomes into one data set, apply the other two algorithms to the merged data and compare their performance. Note that VirTect does not need the data to be physically merged, which makes it more convenient for analyzing multiple related samples. Figure 3a-b shows that VirTect also has the highest sensitivity and the lowest FDR across all coverages among the three algorithms. Figure 3c and d show the distance between the detected integration sites and the true integration sites at 25X and 100X coverage. Compared with the other algorithms, the integration sites predicted by VirTect are closest to the true integration sites, and in most cases they are only a few bp away. We also compare the computational time of the different algorithms. Figure 4 shows the running time using eight cores on the simulation dataset of Genome 1 and on the merged dataset, respectively. We see that VirTect takes only around one fifth of the computational time of ViralFusionSeq and VirusFinder2 and is a little faster than Virus-Clip.

Fig. 3 The sensitivity (a) and FDR (b) on the merged data. (c, d) Boxplots of breakpoint estimation accuracy on merged data at 25X and 100X coverage

Fig. 4 a The computational time (in hours) on the simulated Genome 1 data at different coverages. b The computational time (in hours) on merged data at different coverages

Real data analysis

In this section, we compare the performance of VirTect with the other two algorithms on real data sets. We consider two real data sets in this study. One is a multi-regional whole exome sequencing (WES) data set from an HBV-related hepatocellular carcinoma (HCC) patient [28]. The patient’s ID is 213, and tumors from five regions were sequenced on an Illumina platform with a read length of 75 bp. The mean insert size is 200 bp with a standard deviation of 50 bp. The other data set consists of nine whole genome sequencing (WGS) samples from HBV-related HCC patients [29]. The read length of these data is 90 bp and the coverage is around 30X.

For the multi-region WES data, VirTect is able to detect one HBV integration site, at chromosome 5:1295527 (Fig. 5). The integration site is located in the promoter region of the telomerase reverse transcriptase (TERT) gene. Previous research showed that TERT is the gene most frequently targeted by HBV integration in HCC [30]. Moreover, all tumor regions have this integration event, implying that it might be an early event in carcinogenesis. When we apply the other two algorithms to the data of each region, they fail to detect any integration site. When we merge the multi-regional data, they also fail to detect any event.

Figure 5. VirTect identifies an HBV integration site at the TERT promoter region in patient 213. All tumors from different regions have this integration event. The discordant and sandwich-mapped reads to the HBV genome (a) and human genome (b) are shown

For the WGS data, we downloaded the hepatocellular carcinoma samples 101 T, 105 T, 106 T, 108 T, 113 T, 114 T, 115 T, 116 T and 117 T reported by Sung et al. 2012 [29]. Here, we only report the results for VirTect and VirusFinder2 because ViralFusionSeq failed due to insufficient memory and Virus-Clip did not finish its computation after a week. The running times of VirTect and VirusFinder2 are shown in Fig. 6. VirTect and VirusFinder2 detected all integration sites reported by Sung et al. 2012 [29]. Some of these integration sites interrupt important cancer genes such as CCNE1 (sample 106 T, chr19:30304177) and NTRK3 (sample 108 T, chr15:88688212). Details about these integration sites are in Additional file 1: Table S1. Figure 7a shows one integration site at chr1:151503388 in the gene CGN. VirusFinder2 takes a long time (> 3 days) and a large amount of memory (about 70 GB) to finish the computation. In comparison, VirTect takes 1.5 days and no more than 30 GB of memory. In addition to the reported integration sites, VirTect detects a new integration site at chromosome X:14603545 (Fig. 7b) overlapping with the gene GLRA3.

Figure 6. a The mean coverage of the 9 WGS data sets. b The computational time (in days) of VirTect and VirusFinder2 on these 9 WGS data sets

Figure 7. a A known HBV integration site detected by both VirTect and VirusFinder2 in sample 101 T. The mappings of the supporting reads to the human genome (left panel) and to the virus genome (right panel) are shown. The split position of each read is marked by a scissor icon. b A new integration site detected by VirTect

Integration preference versus oncogenic selection

We see two uses for profiling the insertion site preferences for integrating vectors. First, in functional genomics screens, insertion profiles that emerge can be compared with expected profiles that are only structure based rather than genetics based. A striking example of this is evident in the oncogene screens conducted with the SB transposon [58, 59], which is illustrated in Figure 6 with respect to the Braf gene. Integration sites that emerged from the screen are shown across the entire locus (Figure 6b) and in a selected region comprising exons 10-13/introns 10-12 (Figure 6d), where most of the integrations were selected because of induced expression of a truncated gain-of-function kinase polypeptide. Panels a and c show insertion site preference scores across the region obtained using an automated script (ProTIS) that counts and scores preferred TA dinucleotide insertion sites based on V step values [115]. The results shown in Figure 6 make two strong points. The first is that the frequency of oncogenic insertions in a select region corresponds to that predicted on the basis of preference profiling (Figure 6c,d); specifically, microscale structure can be a good predictor of integration site preference. The second is that many predicted hotspots (Figure 6a,b) were not sites that lead to oncogenesis. The combination of these two observations enhances the biologic importance of the integrations into introns 11 and 12.

Figure 6. SB insertions across the mouse Braf gene. Thirty Sleeping Beauty (SB) insertions deposited in the Retroviral-Tagged Cancer Gene Database were mapped across the entire Braf transcript and 10 kilobases upstream (NCBI build 36; note that Braf is transcribed right-to-left). Most oncogenic insertions occurred in introns 11 and 12 (formerly annotated as intron 9). (a) ProTIS profiling across the entire gene reveals predicted hotspots for SB integration, but (b) most actual integrations were found in a relatively low scoring region corresponding to introns 11 and 12. A blowup of this local 4.9 kilobase region demonstrates that (c) ProTIS scores closely match (d) patterns of actual transposon integration. bp, base pairs

The second application of predicting profiles of vector insertions may be as part of a risk assessment program. Although current understanding of integration site preferences for most vectors is still inadequate to allow prediction of the probability of integration into specific genes, genome-wide integration datasets may suggest the likelihood that a vector will integrate within the general vicinity of a specific gene. Similarly, analysis of DNA structural characteristics may be used to assess the likelihood that each vector will integrate within specific regions of genes. For example, although Braf can act as a potent oncogene, the pattern of SB integrations into Braf suggests that integrations into a relatively small region of the gene (introns 11 and 12) are the most highly selected for oncogenesis, in spite of the presence of hotspots across the entire gene. Thus, the range of possible insertions that are capable of generating an oncogenic transcript, combined with the relative 'attractiveness' of the sequence across these regions, will dictate the chances of insertional activation.

An analysis of several structural characteristics is presented for the mouse c-myc gene (Figure 5), the human ortholog of which is activated in many cancers [141]. The figure highlights the 3 kilobase region encompassing the promoter that harbors the bulk of oncogenic retroviral integrations at this locus that have been deposited in the Retroviral-Tagged Cancer Gene Database (RTCGD [142]). The sequence was divided into 50 base pair (bp) bins, and the total values for V step, A-philicity, jaggedness, and bendability were summed across each bin. Measured in 50 bp bins, these structural parameters are highly variable across the sequence and vary independently of each other. Actual oncogenic retroviral insertions observed in insertional mutagenesis screens and deposited into the RTCGD are shown for comparison in Figure 5a. The profiles indicate two features of transposons under consideration for gene therapy. First, the most likely sites for SB transposons to integrate (Figure 5g) are shifted away from the most commonly found activation sites, as revealed by retroviral integrations (Figure 5a). Second, the profile of TTAA sites, required by the piggyBac transposon (Figure 5f), is similar to the preferred SB sites, and further shows that some regions harboring retroviral integrations contain no TTAA sequences, making piggyBac insertions into these sites impossible. Thus, at first approximation, it would appear that the transposons are less likely to insert close to the c-myc promoter than are retroviral vectors. In support of this, c-myc is infrequently hit in SB-based insertional mutagenesis screens; to date, only one c-myc integration has been deposited into the RTCGD. In contrast, many retroviral insertions into c-myc have been mapped, although the number of deposited retroviral insertions is much higher than the number of transposons.
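The 50 bp binning described above can be sketched for the simplest of the profiles, the count of TTAA sites per bin. This is only an illustration under stated assumptions: the function names `motif_counts_per_bin` and `bins_without_motif` are hypothetical, the toy sequence is made up, and the structural parameters (V step, A-philicity, jaggedness, bendability) are not computed because their lookup tables are not given here.

```python
def motif_counts_per_bin(sequence, motif="TTAA", bin_size=50):
    """Count occurrences of `motif` in consecutive bins of `bin_size` bp.

    Hypothetical helper: each occurrence is assigned to the bin containing
    its first base; overlapping occurrences are counted separately.
    """
    seq = sequence.upper()
    n_bins = (len(seq) + bin_size - 1) // bin_size  # last bin may be partial
    counts = [0] * n_bins
    start = seq.find(motif)
    while start != -1:
        counts[start // bin_size] += 1
        start = seq.find(motif, start + 1)  # step by 1 to allow overlaps
    return counts


def bins_without_motif(counts):
    """Return indices of bins that contain no motif occurrence at all."""
    return [i for i, c in enumerate(counts) if c == 0]


# Toy 110 bp sequence (made up): one TTAA in bin 0, none in bin 1, two in bin 2.
toy = "TTAA" + "G" * 46 + "C" * 50 + "ATTAATTAAG"
ttaa_profile = motif_counts_per_bin(toy)
print(ttaa_profile)
print(bins_without_motif(ttaa_profile))
```

Bins with a count of zero correspond to stretches where a transposon that requires the motif could not insert, which is the point made about piggyBac above. Passing `motif="TA"` gives the analogous raw count of TA dinucleotides, without any structure-based scoring.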

The relative lack of SB insertions into c-myc may be due to either a paucity of favorable SB insertion sites in regions of the gene competent for oncogenic activation, or an overall lack of oncogenic selection for insertions into this gene. In support of the former, transposon-free amplification of c-myc was one of the few genomic aberrations observed in tumors harboring mobile transposons (Largaespada DA, Collier LC, Hackett CS, unpublished observations), suggesting that activation of c-myc plays a role in the biology of these tumors (there was probably oncogenic selection for the genomic amplicon). Similar ProTIS analysis of the LMO2 locus revealed the most preferential integration sites for SB transposons that were considerably farther away from the LMO2 promoter than mapped integrations by activating retroviruses [115]. That said, it is evident that prediction of vector integration is not precise and even rare integrations into unfavorable sites have a potential to promote oncogenic expansion, as indicated in Figure 6.

Virus Questions! - How they replicate exactly.

Been seeing quite a few DNA questions in general on MCAT stuff, and I get these correct, but now that I think about it my virus knowledge is not very exact.

Question: Can a DNA virus integrate itself into the genome, or is it only RNA viruses? When a DNA virus integrates itself, what enzyme is it using? Is the integration complementary and antiparallel or exactly the same?

EDIT: I also had the following questions before but they have been answered by this kid video:

DNA viruses - how do they replicate? Do they always undergo a lysogenic cycle (if so, when they integrate into a chromosome, are they making a complementary antiparallel strand? If so, what enzyme is being used to do this?)? Do they create RNA and proteins using the host cell as well as replicate their DNA?

RNA viruses (positive sense) - can some of these not integrate into the genome and directly go make proteins using host cell machinery? How is their RNA strand replicated into many copies (we don't have a way to replicate RNA)? Pretty confused here.

Pseudotyping of Viral Vectors

Retroviruses and adeno-associated viruses have a single protein coating their membrane, while adenoviruses are coated with both an envelope protein and fibers that extend away from the surface of the virus. The envelope proteins on each of these viruses bind to cell-surface molecules such as heparan sulfate, which localizes them on the surface of the potential host, as well as to the specific protein receptor that either induces entry-promoting structural changes in the viral protein or localizes the virus in endosomes, wherein acidification of the lumen induces this refolding of the viral coat. In either case, entry into potential host cells requires a favorable interaction between a protein on the surface of the virus and a protein on the surface of the cell.

For the purposes of gene therapy, one might want either to limit or to expand the range of cells susceptible to transduction by a gene therapy vector. To this end, many vectors have been developed in which the endogenous viral envelope proteins have been replaced by either envelope proteins from other viruses or chimeric proteins. Such chimeras would consist of those parts of the viral protein necessary for incorporation into the virion as well as sequences meant to interact with specific host cell proteins. Viruses in which the envelope proteins have been replaced as described are referred to as pseudotyped viruses.

For example, the most popular retroviral vector for use in gene therapy trials has been the lentivirus simian immunodeficiency virus coated with the envelope G-protein from vesicular stomatitis virus (VSV). This vector is referred to as VSV G-pseudotyped lentivirus, and infects an almost universal set of cells. This tropism is characteristic of the VSV G-protein with which this vector is coated.

Many attempts have been made to limit the tropism of viral vectors to one or a few host cell populations. This advance would allow for the systemic administration of a relatively small amount of vector. The potential for off-target cell modification would be limited, and many concerns from the medical community would be alleviated. Most attempts to limit tropism have used chimeric envelope proteins bearing antibody fragments. These vectors show great promise for the development of "magic bullet" gene therapies.