Nature Scientific Reports vs. BMC Genomics

I am not sure whether this question is suitable here. I am posting it because I think biologists can help me with it. If it is not suitable, I apologize.

I have written a paper that I spent about three years working on; it has nice findings and original sequence data (deposited in NCBI). I submitted the paper to "Genome Biology" and received an editorial rejection, but the editors offered to transfer the paper to "BMC Genomics". However, my supervisor refused this and suggested that we submit to "Nature Scientific Reports" instead. I tried to convince him to go with BMC Genomics because it is a more focused journal and would be better for my CV; however, he said Scientific Reports is better because it has a higher impact factor and it is Nature! (Impact factor: 5 for Scientific Reports vs. 4.04 for BMC Genomics.)

What do you think? I am really confused about which one to choose.


This is too long for a comment, so I am posting it here. These are my personal views; someone else might handle this differently. I would submit the paper to BMC Genomics, for the following reasons:

  • The editor offered to transfer the paper, so you get past the editorial review. This is not guaranteed for Nature Scientific Reports (NSR). You still have to get through peer review, but that is no different between the two journals.
  • Honestly: forget about the difference in impact factor. Both are mid-range impact factor journals that sit close together. My most cited paper was published in such a medium-impact journal. Today people find papers through a Medline search, so the journal no longer matters much (apart from prestige). Additionally, a lot of citations will raise the journal's impact factor in the future. I would understand this argument if you had to choose between Nature and BMC Genomics; there the difference is considerable.
  • If your work fits better into BMC Genomics, the chance of acceptance is higher, so I would go there. This is of course always a point to consider. The same is true for your CV.

If you have the chance to make the decision yourself without angering your supervisor too much, I would go for it. Even better would be to convince him.


I disagree somewhat with the other comments. Scientific Reports has a broader audience and a higher impact factor, and it is associated with the Nature family of journals.

The BMC family of journals does not have the same reputation, and PLOS ONE certainly does not; it has a reputation among some for publishing almost anything. Anyone outside your specific field will consider publication in SR the better publication. This does not mean that SR is a better journal in a scientific sense, but if you are looking for prestige, SR is the better journal for sure.


Editor Profiles

Clare joined the BMC series as a manuscript editor in 2019 before becoming editor of BMC Genomics, BMC Genetics and BMC Medical Genomics in September 2020. She started out studying Human Genetics at Trinity College Dublin and following that, she obtained a Masters in Cardiovascular Science from the University of Edinburgh, and subsequently, a PhD researching the role of a gene in the control of inflammation and oxidative stress in adipose tissue. She worked for a year for a small academic publishing company before joining the BMC. She has an interest in research integrity and open access, to ensure that reliable, reproducible research is available to all.


Valuable data often go unpublished when they could be helping to progress science. Hence, the BMC Series introduced Data notes, a short article type allowing you to describe your data and publish them to make your data easier to find, cite and share.

You can publish your data in BMC Genomic Data (genomic, transcriptomic and high-throughput genotype data) or in BMC Research Notes (data from across all natural and clinical sciences).

More information about our unique article type can be found on the BMC Genomic Data and BMC Research Notes journal websites.


Case presentation

The case below describes multiple paragangliomas diagnosed in a Russian woman, presenting as two CPGLs at both sides of the neck and one VPGL. The aim of the study was to investigate the molecular mechanisms underlying the development of multiple paragangliomas by examining clinical and pathological characteristics along with the genetic variations of the three tumors.

A 50-year-old female was diagnosed with extravascular compression of the carotid arteries and CPGLs on both sides of the neck. Clinical symptoms included arterial hypertension and painless rounded masses. Computed tomography (CT) and ultrasound (US) studies revealed tumors in the areas of the carotid bifurcation: a solid neoplasm of 32 × 25 mm on the left side of the neck and a two-nodal tumor of 46 × 24 mm on the right side. These neck masses were heterogeneous in structure and predominantly hypoechoic and hypervascular. The CT study with contrast also revealed the presence of hypointense, highly vascularized masses at the right and left carotid bifurcations (Fig. 1).

Computed tomography of the patient's head and neck before surgery. CT scan (left) 3D reconstruction (right)

The patient was subjected to surgery for the left tumor resection. At the time of surgery, the lower pole of the hypervascularized tumor was located below the outer carotid artery (OCA) bifurcation, spreading along the carotid arteries in the proximal direction and wrapping around the posterior, anterior, and lateral surfaces. Bifurcation of carotid arteries was involved in the tumor mass. The upper pole of the tumor was associated with the vagus nerve. The tumor (25 × 2 × 17 mm) was completely removed and subjected to pathological evaluation. The patient was discharged with a planned re-hospitalization to remove the tumor on the right. Histological examination of the resected tumor confirmed carotid paraganglioma (Fig. 2). Hematoxylin-eosin (H&E) staining showed a Zellballen structure that is typical for paragangliomas. Chief tumor cells exhibited positive staining for chromogranin A, synaptophysin, and CD56 antibodies indicating a neuroendocrine tumor. S100 protein was expressed in sustentacular cells. Tumor cells were negative for cytokeratin AE1/AE3.

Hematoxylin-eosin (H&E) staining of carotid and vagal paragangliomas. a left CPGL, b right CPGL, and c right VPGL. Specific “Zellballen” growth pattern of paragangliomas (small nests of chief cells with pale eosinophilic staining, surrounded by supporting sustentacular cells) can be seen

Surgery on the right side of the neck was performed a year later. According to the US study reports, a VPGL and an enlarged lymph node were primarily detected. During the surgery, a lymph node (15 × 5 mm) above the carotid artery bifurcation was removed and subjected to histological examination for metastases. In addition, a tumor-like mass (35 × 20 mm) laterally suppressing the internal carotid artery (ICA) was observed directly at the carotid artery bifurcation and resected. After this resection, one more tumor-like formation (60 × 20 mm), originating from the vagus nerve, was visualized beneath it and removed. The vagus nerve was resected and ligated. Both tumors were examined histologically and proved to be paragangliomas, with no lymph node metastases (Fig. 2).

For all the tumor samples from the patient (left and right CPGLs, VPGL), immunohistochemistry (IHC) analysis of succinate dehydrogenase (SDH) subunit expression was performed (Additional file 1). SDH complex consists of four subunits (SDHA, SDHB, SDHC, and SDHD) encoded by the corresponding genes [5, 6]. Germline and somatic mutations in the SDHx genes are commonly associated with paragangliomas/pheochromocytomas [7, 8]. Immunohistochemistry for SDH subunits is a valuable additional tool in the histopathological study of paragangliomas that is used in the clinic for assessment of SDH loss, which can be associated with the pathogenic mutations in any SDHx genes.

Immunoreactions for SDH subunits were carried out using primary antibodies from Abcam (USA) for each SDH subunit: SDHA, monoclonal, clone 2E3GC12FB2AE2; SDHB, monoclonal, clone 21A11AE7; SDHC, monoclonal, clone EPR11035(B); SDHD, polyclonal. We found weak diffuse SDHB staining in the VPGL and in both the left and right CPGLs. According to the literature, weak diffuse staining of the SDHB subunit can reflect pathogenic mutations in any of the SDHx genes. The weak diffuse SDHB staining in all studied tumors therefore indicates the presence of a germline pathogenic mutation in one of the SDHx genes in the patient.

Additionally, we carried out the exome-sequencing of three tumors, lymph node, and blood from the patient. The DNA from tumors and lymph node was extracted with a High Pure FFPET DNA Isolation Kit (Roche, Switzerland). The DNA was isolated from blood cells using a MagNA Pure Compact Nucleic Acid Isolation Kit I (Roche) on a MagNA Pure Compact Instrument (Roche). Exome libraries were prepared with the Rapid Capture Exome Kit (left CPGL) and TruSeq Exome Library Prep Kit (right CPGL and VPGL) from Illumina (USA). High-throughput exome sequencing was performed on a NextSeq 500 System (Illumina) under a paired-end mode of 76 × 2 bp for tumors and lymph node, and 156 × 2 bp for blood with 300x minimum coverage. The exome sequencing data of paragangliomas are available in the NCBI SRA under the accession numbers PRJNA411769 (left CPGL, Pat01), PRJNA476932 (right CPGL, Pat104), and PRJNA561073 (VPGL, Pat6). Bioinformatic analysis is described in our previous study [9]. Missense variants were considered as likely pathogenic if they were predicted by at least three prediction tools and characterized by conservation score ≥ 0.5.
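The missense filter stated at the end of the paragraph above (at least three prediction tools, conservation score ≥ 0.5) can be sketched as follows. This is only an illustration and not the authors' pipeline, which is described in their reference [9]. The record type, the field names `predicted_damaging_by` and `conservation`, and the placeholder tool names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MissenseVariant:
    """Hypothetical record for one missense variant."""
    name: str
    # Prediction tools that called the variant damaging (hypothetical field).
    predicted_damaging_by: set = field(default_factory=set)
    # Conservation score on a 0..1 scale (hypothetical field).
    conservation: float = 0.0


def is_likely_pathogenic(variant, min_tools=3, min_conservation=0.5):
    """Rule from the text: >= 3 supporting tools and conservation >= 0.5."""
    enough_tools = len(variant.predicted_damaging_by) >= min_tools
    conserved = variant.conservation >= min_conservation
    return enough_tools and conserved


# Placeholder variants and tool names, for illustration only.
variants = [
    MissenseVariant("var_a", {"tool1", "tool2", "tool3"}, 0.8),
    MissenseVariant("var_b", {"tool1", "tool2"}, 0.9),
    MissenseVariant("var_c", {"tool1", "tool2", "tool3", "tool4"}, 0.4),
]
kept = [v.name for v in variants if is_likely_pathogenic(v)]
```

Only `var_a` passes both conditions here; `var_b` has too few supporting tools and `var_c` falls below the conservation cut-off.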

Exome analysis revealed a likely pathogenic germline missense variant in the SDHD gene, NM_003002.3: c.305A > G, p.H102R (chr11: 111959726, rs104894302). Pathogenic/likely pathogenic germline variants in other genes, for which the association with paragangliomas/pheochromocytomas has been shown, were not found.

Identified likely pathogenic somatic variants were different for each tumor (Additional file 2). In left CPGL, we found missense likely pathogenic somatic variants in two genes, TENM3 [NM_001080477: c.C5082A, p.N1694K (chr4: 183696084)] and EPHA5 [NM_004439: c.G682A, p.V228I (chr4: 66467587)].

In right CPGL, a variety of likely pathogenic variants (stop-gain, frameshift, and missense) were detected. Stop-gain variants were found in NRXN3 [NM_004796: c.C1387T, p.Q463X (chr14: 79432478)] and RELN [NM_005045: c.C9052T, p.R3018X (chr7: 103137114)], missense variants were revealed in TRIP12, JAG1, ASXL1, LMBRD1, DHX9, AASS, and TP53. For the TP53 gene, we found two mutations: a pathogenic/likely pathogenic variant, NM_001126115: c.A446T, p.D149V (chr17: 7577096, rs587781525), that has been reported in ClinVar, and a previously undescribed likely pathogenic variant, NM_000546: c.A170G, p.D57G (chr17: 7579517).

In the case of VPGL, we found a pathogenic variant in mtDNA (MT: 3243, rs199474657) and likely pathogenic missense, frameshift and stop-gain variants in a number of genes (LRP1, SPEN, PPP4R1, XPO6, FBN1, C1QB, and others) (Additional file 2).


Background

Massive forest decline as a result of negative anthropogenic and climatic effects, often aggravated by pests, fungi, and other phytopathogens, has been observed almost everywhere. Environmental changes, such as increased average annual temperatures, decreased precipitation, more frequent droughts, can weaken trees and make fungi much more destructive. Forest conservation has become a serious issue since the scale of plant death caused by phytopathogenic fungi is enormous. For instance, tree diseases have caused the loss of approximately 100 million elm trees in the United Kingdom and the United States, and the list can be continued. Among all phytopathogens, fungi cause 64% of infection-related species extinction and regional extirpation events [1].

The basidiomycete genus Armillaria plays a very important role in forest ecosystems worldwide and currently includes more than 40 officially described species [2, 3]. Armillaria species differ significantly in virulence: some, such as A. ostoyae, are the main cause of tree death, while others colonize plants already damaged by various factors (drought, pests, etc.) [4, 5]. Differences in pathogenicity have also been observed within A. ostoyae; however, virulence variation in A. borealis has not yet been studied [6].

Armillaria borealis (Marxm. & Korhonen) is a fungus from the Physalacriaceae family (Basidiomycota) widely distributed in Eurasia, including Siberia and the Far East [1]. Species from this genus cause white root rot disease, which weakens and often kills woody plants [7]. Several phylogenetic and genomic studies on A. ostoyae have been carried out because of its high pathogenic potential and common occurrence [4, 8, 9], while little is known about the ecological behavior of A. borealis. According to field research data, A. borealis is less pathogenic than A. ostoyae, and its aggressive behavior is rare. A. borealis mainly behaves as a secondary pathogen, killing trees already weakened by biotic and abiotic factors [10,11,12,13]. However, a changing environment might have unpredictable effects on fungal behavior.

Armillaria spp. impact on forest populations has both economic and ecological significance. They attack hundreds of different tree species (e.g., Abies, Picea, Pinus, Betula, Sorbus, Juglans, Malus, etc.) in both hemispheres under different climatic conditions, and are among the most destructive forest pathogens [2, 14, 15].

Identification of species and pathogenicity levels of Armillaria is crucial for forest conservation. Genomic data are needed to study the pathogenicity of these species and to better understand their impact on trees and the host-pathogen interactions. In addition, comparative genomics can help to resolve the complex phylogeny of Armillaria species. It is worth noting that fungal genomic data are also important for industrial applications. For example, white rot Armillaria fungi are capable of lignin and cellulose decomposition, and they can be used to process waste from wood and paper production [16].

A. borealis is very important for the vast boreal forest ecosystems. However, despite the enormous influence of Armillaria species on forestry, horticulture, and agriculture, fungi of this genus and their pathogenicity are still not well-studied in this large region, which makes the presented genomic study very much needed.

There are already published genomic and proteomic data for A. mellea, A. solidipes, and A. ostoyae revealing the presence of plant cell wall degradation enzymes (PCWDE) and some secreted proteins [17,18,19]. Genomic analysis of other pathogenic basidiomycetes, such as Moniliophthora [20, 21], Heterobasidion [22], and Rhizoctonia [23], also revealed genes encoding PCWDE, as well as secreted enzymes and secondary metabolism effector proteins as putative pathogenicity factors. However, the life cycle and the distribution strategy of Armillaria members indicate that they may have evolved other additional mechanisms for pathogenicity, which along with other potential genomic mechanisms are not yet studied [24]. It is worth noting that the role and functional significance of mobile and highly repetitive elements (REs) are still not completely clear. Gradually accumulated data suggest that REs can play an important role in the evolutionary development of organisms, replication, and formation of nucleoprotein complexes, as well as affect gene expression [17]. Genomes of fungi are densely packed containing effector genes and transposable elements (TEs) [25,26,27]. It was reported that different fungal pathogens, such as Fusarium [28] and Verticillium [29], have similar genome architecture. So, it is expected that TEs may play important roles in host switching and adaptation to new ecological niches [30]. It was found in Magnaporthe oryzae that genes involved in host specialization were associated with TEs [31].


Plant materials and growth conditions

Dr. Thomas W. Okita from Washington State University provided the Kitaake seeds, which were originally obtained from Dr. Hiroyuki Ito, Akita National College of Technology, Japan. Dr. Jan E. Leach at Colorado State University provided seeds for Zhenshan 97, Minghui 63, IR64 and 93–11. Seeds of Kasalath were provided by the USDA Dale Bumpers National Rice Research Center, Stuttgart, Arkansas. Seeds were germinated on 1/2x MS (Murashige and Skoog) medium. Seedlings were transferred to a greenhouse and planted 3 plants/pot during the springtime (Mar. 2, 2017) in Davis, California. The light intensity was set at approximately 250 μmol m⁻² s⁻¹. The day/night period was set to 14/10 h, and the temperature was set between 28 and 30 °C [29]. Rice plants were grown in sandy soil supplemented with nutrient water. The day when the first panicle of the plant emerged was recorded as the heading date for that plant. Kasalath seeds were received later, and the heading date was recorded in the same way. The experiment was repeated in winter.

Construction of a phylogenetic tree

We obtained 178,496 evenly distributed SNPs by dividing the genome into 3.8 kb bins and selecting one or two SNPs per bin randomly according to the SNP density of the bin. Genotypes of all the rice accessions, including 3010 accessions of the 3 K Rice Genomes Project and additional noted accessions, were fetched from the SNP database RiceVarMap v2.0 [30] and related genomic data [31] and used to calculate an IBS distance matrix which was then applied to construct a phylogenetic tree by the unweighted neighbor-joining method, implemented in the R package APE [32]. Branches of the phylogenetic tree were colored according to the classification of the 3010 rice accessions [2].
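The two computational steps in this paragraph, thinning SNPs by 3.8 kb bins and computing an IBS distance matrix, can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code. The text does not give the exact density rule for choosing one versus two SNPs, so the `dense_threshold` parameter is an assumption; genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), with `None` for missing calls. The resulting matrix would then be passed to a neighbor-joining implementation (the authors used the R package APE), which is not reproduced here.

```python
import random
from collections import defaultdict

BIN_SIZE = 3800  # 3.8 kb bins, as stated in the text


def thin_snps(positions, bin_size=BIN_SIZE, dense_threshold=10, seed=0):
    """Randomly keep one SNP per bin, or two when the bin is dense.

    positions: SNP coordinates on a single chromosome.
    dense_threshold: assumed cut-off; the text does not state the rule.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for pos in positions:
        bins[pos // bin_size].append(pos)
    chosen = []
    for idx in sorted(bins):
        members = bins[idx]
        n_pick = 2 if len(members) >= dense_threshold else 1
        n_pick = min(n_pick, len(members))
        chosen.extend(sorted(rng.sample(members, n_pick)))
    return chosen


def ibs_distance(g1, g2):
    """1 minus the mean proportion of alleles identical by state."""
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either accession
        shared += 2 - abs(a - b)  # 0, 1 or 2 alleles shared at this SNP
        compared += 2
    if compared == 0:
        raise ValueError("no SNPs genotyped in both accessions")
    return 1.0 - shared / compared


def ibs_matrix(genotypes):
    """Symmetric pairwise IBS distance matrix for a list of genotype vectors."""
    n = len(genotypes)
    return [[ibs_distance(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]
```

For example, `ibs_distance([0, 1], [0, 2])` gives 0.25, because three of the four alleles compared are identical by state.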

Genome sequencing and assembly

High molecular weight DNA from young leaves of KitaakeX was isolated and used in sequencing. See Additional file 1 for further details.

Annotation of protein-coding genes

To obtain high-quality annotations, we performed high throughput RNA-seq analysis of libraries from diverse rice tissues (leaf, stem, panicle, and root). Approximately 683 million pairs of 2 × 151 paired-end RNA-seq reads were obtained and assembled using a comprehensive pipeline PERTRAN (unpublished). Gene models were predicted by combining ab initio gene prediction, protein-based homology searches, experimentally cloned cDNAs/expressed-sequence tags (ESTs) and assembled transcripts from the RNA-seq data. Gene functions were further annotated according to the best-matched proteins from the SwissProt and TrEMBL databases [33] using BLASTP (E value < 10⁻⁵) (Additional file 11). Genes without hits in these databases were annotated as “hypothetical proteins”. Gene Ontology (GO) [34] term assignments and protein domains and motifs were extracted with InterPro [35]. Pathway analysis was derived from the best-match eukaryotic protein in the Kyoto encyclopedia of genes and genomes (KEGG) database [36] using BLASTP (E value < 1.0e-10).

Genome Synteny

We used SynMap (CoGe, www.genomevolution.org) to identify collinearity blocks using homologous CDS pairs with parameters according to Daccord et al. [37] and visualized collinearity blocks using Circos [38].

Repeat annotation

The fraction of transposable elements and repeated sequences in the assembly was obtained by merging the output of RepeatMasker (http://www.repeatmasker.org/, v. 3.3.0) and Blaster (a component of the REPET package) [39]. The two programs were run using nucleotide libraries (PReDa and RepeatExplorer) from RiTE-db [40] and an in-house curated collection of transposable element (TE) proteins, respectively. Reconciliation of masked repeats was carried out using custom Perl scripts, and the results were formatted as gff3 files. Infernal [41] was used to identify non-coding RNAs (ncRNAs) with the Rfam library Rfam.cm.12.2 [42]. Results with scores lower than the family-specific gathering threshold were removed; when loci on both strands were predicted, only the hit with the highest score was kept. Transfer RNAs were also predicted using tRNAscan-SE [43] at default parameters. Repeat density was calculated from the file that contains the reconciled annotation (Additional file 10).
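The ncRNA filtering rule described above (drop hits below the family-specific gathering threshold, then keep only the best-scoring hit where both strands are predicted) could look roughly like this. It is a sketch, not the authors' script: the `Hit` record, the `gathering_threshold` dictionary, and the reading of "loci on both strands" as overlapping coordinates on opposite strands are all assumptions made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one ncRNA hit."""
    seqid: str
    start: int   # 1-based, start <= end
    end: int
    strand: str  # "+" or "-"
    family: str
    score: float


def overlaps(a, b):
    """True when two hits share at least one base on the same sequence."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def filter_hits(hits, gathering_threshold):
    """gathering_threshold maps family name -> minimum bit score (assumed input)."""
    passing = [h for h in hits if h.score >= gathering_threshold[h.family]]
    kept = []
    # Visit the best-scoring hits first, so that a weaker hit overlapping
    # an already kept hit on the opposite strand is discarded.
    for hit in sorted(passing, key=lambda h: h.score, reverse=True):
        clash = any(overlaps(hit, k) and hit.strand != k.strand for k in kept)
        if not clash:
            kept.append(hit)
    return sorted(kept, key=lambda h: (h.seqid, h.start))
```

Sorting by score before the overlap check is a simple greedy choice; it guarantees that the highest-scoring hit at a contested locus is the one retained.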

Analysis of genomic variations

Analysis of SNPs and InDels: We used MUMmer (version 3.23) [26] to align the Nipponbare and Zhenshan97 genomes to the KitaakeX genome using parameters -maxmatch -c 90 -l 40. To filter the alignment results, we used delta-filter with the -1 parameter (the one-to-one alignment block option). To identify SNPs and InDels, we used show-snps with the parameters -ClrTH. We used snpEff [44] to annotate the effects of SNPs and InDels. Distribution of SNPs and InDels along the KitaakeX genome was visualized using Circos [38].

Analysis of PAVs and Inversions: We used the show-coords option of MUMmer (version 3.23) with parameters -TrHcl to identify gap regions and PAVs above 86 bp in size from the alignment blocks. We used the inverted alignment blocks with ≥98% identity from the show-coords output file to identify inversions.
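The inversion step can be illustrated with a small parser for the tab-delimited show-coords output. This is a hedged sketch rather than the authors' pipeline. It assumes the column order S1, E1, S2, E2, LEN1, LEN2, %IDY, LENR, LENQ, COVR, COVQ, reference tag, query tag, which is my understanding of what -TrHcl produces, and it treats a block as inverted when the reference and query coordinates run in opposite directions. The function and key names are hypothetical.

```python
def parse_inverted_blocks(lines, min_identity=98.0):
    """Return inverted alignment blocks with identity >= min_identity.

    lines: iterable of tab-delimited show-coords rows (headerless).
    The column layout is an assumption; check it against your own output.
    """
    blocks = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 13:
            continue  # skip blank or malformed rows
        s1, e1, s2, e2 = (int(x) for x in fields[:4])
        identity = float(fields[6])
        # Opposite coordinate directions mean a reverse-strand match.
        inverted = (e1 - s1) * (e2 - s2) < 0
        if inverted and identity >= min_identity:
            blocks.append({
                "ref": fields[11],
                "ref_start": min(s1, e1),
                "ref_end": max(s1, e1),
                "qry": fields[12],
                "qry_start": min(s2, e2),
                "qry_end": max(s2, e2),
                "identity": identity,
            })
    return blocks
```

In practice the rows would be read from the show-coords output file, for example with `parse_inverted_blocks(open(path))`, where `path` is whatever file name was used for that output.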

To identify genomic variations between Kitaake and KitaakeX we sequenced and compared the sequences using the established pipeline [15].

BAC library construction

Arrayed BAC libraries were constructed using established protocols [45]. Please see Additional file 1 for further details.

Genome size estimation

We used the following methodology to estimate KitaakeX genome size:

(1) Using the Illumina fragment library, we created a histogram of 24mer frequencies. This was done by first counting the frequency of all 24mers; the number of kmers at each frequency was then tallied, and a histogram was created.

(2) The kmer histogram generally shows a peak at a particular frequency, which corresponds to the average coverage of 24mers on the genome.

(3) We then took the peak value as the coverage of the genome and computed the total bases in the Illumina library. Dividing the total bases by the coverage provided an estimate of the genome size. This value is generally accurate to ±10%.
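The three steps above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the tool the authors ran: a real library would need a dedicated k-mer counter, and the function names and the `min_depth` cut-off used to ignore low-frequency (likely erroneous) k-mers are hypothetical.

```python
from collections import Counter


def kmer_histogram(reads, k=24):
    """Map each k-mer frequency to the number of distinct k-mers seen that often."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())


def histogram_peak(hist, min_depth=2):
    """Frequency with the most distinct k-mers, ignoring depths below min_depth.

    min_depth is an assumed cut-off to skip the error peak at low frequency.
    """
    candidates = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not candidates:
        raise ValueError("no histogram entries at or above min_depth")
    return max(candidates, key=candidates.get)


def estimate_genome_size(total_bases, peak_coverage):
    """Total sequenced bases divided by the peak coverage, as in step (3)."""
    return total_bases / peak_coverage


# Toy data with k=3 so the example stays readable.
reads = ["ACGT", "ACGT", "ACGT"]
hist = kmer_histogram(reads, k=3)
peak = histogram_peak(hist)
size = estimate_genome_size(sum(len(r) for r in reads), peak)
```

The sketch follows the text in dividing total bases by the peak k-mer frequency directly. K-mer coverage is slightly lower than base coverage for reads longer than k, so estimators that correct for read length would give a somewhat different figure.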



Lupiáñez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, et al.Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015 161:1012–25.

Yang Y, Zhang Y, Ren B, Dixon JR, Ma J. Comparing 3D genome organization in multiple species using Phylo-HMRF. Cell Syst. 2019. https://doi.org/10.1101/552505.

Fishman V, Battulin N, Nuriddinov M, Maslova A, Zlotina A, Strunov A, et al.3D organization of chicken genome demonstrates evolutionary conservation of topologically associated domains and highlights unique architecture of erythrocytes’ chromatin. Nucleic Acids Res. 2018 47:648–65.

Harmston N, Ing-Simmons E, Tan G, Perry M, Merkenschlager M, Lenhard B. Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. Nat Commun. 2017 8:441.

Dixon JR, Jung I, Selvaraj S, Shen Y, Antosiewicz-Bourget JE, Lee AY, et al.Chromatin architecture reorganization during stem cell differentiation. Nature. 2015 518:331.

Doynova MD, Markworth JF, Cameron-Smith D, Vickers MH, O’Sullivan JM. Linkages between changes in the 3D organization of the genome and transcription during myotube differentiation in vitro. Skelet Muscle. 2017 7:5.

Schmitt AD, Hu M, Jung I, Xu Z, Qiu Y, Tan CL, et al.A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep. 2016 17:2042–59.

Djebali S, Wucher V, Foissac S, Hitte C, Corre E, Derrien T. Bioinformatics pipeline for transcriptome sequencing analysis. In: U Ørom, Enhancer RNAs, volume 1468. New York: Humana Press: 2017. p. 201–219.

Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, Van Baren MJ, et al.Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010 28:511–5.

Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 27:2325–9.

Rohart F, Gautier B, Singh A, Lê Cao KA. mixOmics: an R package for omics feature selection and multiple data integration. PLoS Comput Biol. 2017 13:005752.

Kinsella RJ, Kähäri A, Haider S, Zamora J, Proctor G, Spudich G, et al.Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database. 2011 2011:bar030.

Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010 11:R25.

McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 2012 40:4288–97.

Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodol) 57:289–300.

Falcon S, Gentleman R. Using GOstats to test gene lists for GO term association. Bioinformatics. 2006 23:257–8.

Breiman L. Random forests. Mach Learn. 2001 45:5–32.

Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012 9:357–9.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.The sequence alignment/map format and SAMtools. Bioinformatics. 2009 25:2078–9.

Feng J, Liu T, Qin B, Zhang Y, Liu XS. Identifying ChIP-seq enrichment using MACS. Nat Protocol. 2012 7:1728–40.

Quinlan AR. BEDTools: the Swiss-army tool for genome feature analysis. Curr Protocol Bioinform. 2014 47:11–2.

Ballman KV, Grill DE, Oberg AL, Therneau TM. Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics. 2004 20:2778–86.

Lun AT, Smyth GK. csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows. Nucleic Acids Res. 2015 44:e45.

Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011 27:1017–8.

Mathelier A, Fornes O, Arenillas DJ, Chen Cy, Denay G, Lee J, et al.JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2015 44:D110–5.

Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, et al.HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 2015 16:259.

Imakaev M, Fudenberg G, McCord RP, Naumova N, Goloborodko A, Lajoie BR, et al.Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat Methods. 2012 9:999–1003.

Durand NC, Robinson JT, Shamim MS, Machol I, Mesirov JP, Lander ES, et al.Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 2016 3:99–101.

Barrett T, Clark K, Gevorgyan R, Gorelenkov V, Gribov E, Karsch-Mizrachi I, et al.BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Res. 2012 40(Database issue):D57–63. https://doi.org/10.1093/nar/gkr1163.

EMBL-EBI. BioSamples. https://www.ebi.ac.uk/biosamples. Accessed 13 Nov 2019.


Methods

Three types of data were required for this research: species abundance within angiosperm (and conifer) taxa at various levels; total annual worldwide value of plant products, by species; and a list of species whose genome sequence has been published. Our initial data on plants that have been sequenced was collected from the National Center for Biotechnology Information (NCBI). This list was not comprehensive, since plants whose genomes had been sequenced shortly before the data collection (spring of 2015), such as Ananas comosus (pineapple) [8], Coffea canephora (Robusta coffee) [9], Musa balbisiana (wild banana) [10], and Utricularia gibba (humped bladderwort) [11], were not present in the NCBI list. We added as many of these as we could find to our list and included them in the analysis. We continued updating the list until May 2016.

In all, we found 202 distinct species whose genome had been sequenced; however, only 172 were useful to the present study. All algae and mosses were dropped, due to the lack of any economic data. The remaining species, confined to the flowering plants (angiosperms) and the conifer order of gymnosperms, were classified by taxonomic class or subclass, order, family and genus, based on the APG III system of flowering plant classification [12]. The APG system was chosen rather than the Cronquist or another system [13], since it is continually updated to reflect recent plant DNA evidence and other data.

This dataset is available at: http://216.48.92.133/Softwares/PlantGenomes/index.htm . As more plant genomes are sequenced, more families and orders will be included. A fragment of this dataset is depicted in Table 1.

Next, economic data relating to agricultural and forestry products was collected. For agricultural products, this data was compiled from the Food and Agriculture Organization of the United Nations [14]. For agricultural production, the most recent data on economic value is dated from 2013. This data is presented in current US dollars.

Data on forestry products was compiled from a United Nations Economic Commission for Europe Timber Division report on the forestry industry published in 2006 [15]. The data included information on roundwood and sawnwood, for both conifers and non-conifer trees. The conifer section included data on pine, fir, and spruce, and information on birch, beech, poplar, and oak was found in the non-conifer section. Unfortunately, the data dated back to 2004 and only included select countries, notably European and North American. More recent world data for total roundwood and sawnwood production did not provide a breakdown by tree type. The UNECE/FAO Timber Division report provided exports for each country and from this data we aggregated across all countries the total value by each type of tree. This was done for both the sawnwood and roundwood data, and then summed for a grand total for each tree type. This number was then used as the economic value for each type of tree.
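The aggregation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the row layout `(country, product, tree_type, export_value)`, the function name `value_by_tree_type`, and all figures are assumptions made for the example.

```python
from collections import defaultdict

def value_by_tree_type(rows):
    """Aggregate export values by tree type.

    rows: iterable of (country, product, tree_type, export_value),
    where product is "roundwood" or "sawnwood" (hypothetical layout).
    Returns (per-product totals, grand totals across both products).
    """
    # Step 1: within each product category, sum over all countries.
    per_product = defaultdict(lambda: defaultdict(float))
    for _country, product, tree_type, value in rows:
        per_product[product][tree_type] += value

    # Step 2: add the roundwood and sawnwood totals for each tree type.
    grand_total = defaultdict(float)
    for totals in per_product.values():
        for tree_type, value in totals.items():
            grand_total[tree_type] += value

    return {p: dict(t) for p, t in per_product.items()}, dict(grand_total)

# Made-up rows for illustration only; these are not values from the report.
rows = [
    ("Country A", "roundwood", "pine", 10.0),
    ("Country B", "roundwood", "pine", 5.0),
    ("Country A", "sawnwood", "pine", 20.0),
    ("Country B", "sawnwood", "oak", 7.5),
]
per_product, grand_total = value_by_tree_type(rows)
```

Keeping the per-product totals separate from the grand total mirrors the two stages in the text and makes it possible to check each stage against the report's tables.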

After having collected the economic value for all agricultural (including horticultural and other uses) and forestry products, we classified all sequenced species taxonomically according to APG III. For analytical purposes, we retained only order and family, as class/subclass was not of high enough resolution for meaningful analysis, while genus was too high a resolution, since for almost all the species we studied no economic data distinguished between species in the same genus. Once all products were classified, we calculated an aggregate value for each family and order. Note that some species of economic value belong to a family, and even to an order, that contained no genome-sequenced species when these data were collected.
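As a rough sketch of this step, the aggregate value per family and per order can be computed by mapping each product to its taxonomic placement and summing. The lookup table `TAXONOMY`, the function name `aggregate_value`, and the dollar values are hypothetical; only the order and family assignments of the four example crops follow APG III.

```python
from collections import defaultdict

# Hypothetical lookup: product name -> (order, family).
TAXONOMY = {
    "wheat": ("Poales", "Poaceae"),
    "maize": ("Poales", "Poaceae"),
    "pineapple": ("Poales", "Bromeliaceae"),
    "apple": ("Rosales", "Rosaceae"),
}

def aggregate_value(product_values, taxonomy):
    """Sum product values at the family level and at the order level."""
    by_family = defaultdict(float)
    by_order = defaultdict(float)
    for product, value in product_values.items():
        order, family = taxonomy[product]
        by_family[family] += value
        by_order[order] += value
    return dict(by_family), dict(by_order)

# Made-up values for illustration only.
values = {"wheat": 100.0, "maize": 80.0, "pineapple": 5.0, "apple": 20.0}
family_totals, order_totals = aggregate_value(values, TAXONOMY)
```

Because every product contributes to exactly one family and one order, the order totals equal the sum of their constituent family totals, which is a convenient consistency check.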

The data on the total number of species in all of the families and orders was collected from The Plant List [16] and the Encyclopaedia Britannica [17], respectively.

From these data, we constructed Table 2, reflecting all the families found from both the agricultural and forestry products data, as well as from the list of plants sequenced. Only families containing species that have been sequenced, or have economic value, are included. Similarly, Table 3 was constructed for taxonomic orders. Almost half of all angiosperm and gymnosperm plant orders, but less than a sixth of all families, are present in these tables.

An overall summary of the data is presented in Table 4. Of note is the order Poales with 32 genomes sequenced, 31 in the family Poaceae (grasses) plus pineapple. For 15 families with species of economic value, we found that no genome sequences have yet been published; most of these families fall in the six orders that contain species of economic value but no published sequences.
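The inclusion rule for these tables amounts to a union of two sets of taxa, which might be sketched as below. The function name `build_family_table`, the column names, and the placeholder families and numbers are all assumptions made for illustration; they do not reproduce Table 2.

```python
def build_family_table(sequenced_counts, economic_values, species_counts):
    """Return one row per family that has a sequenced species or economic value.

    sequenced_counts: family -> number of sequenced genomes
    economic_values:  family -> aggregate economic value
    species_counts:   family -> total number of species in the family
    """
    # A family is included if it appears in either source (set union).
    families = sorted(set(sequenced_counts) | set(economic_values))
    return [
        {
            "family": family,
            "genomes_sequenced": sequenced_counts.get(family, 0),
            "economic_value": economic_values.get(family, 0.0),
            "total_species": species_counts.get(family),
        }
        for family in families
    ]

# Placeholder inputs for illustration only.
sequenced = {"FamilyA": 3, "FamilyB": 1}
economic = {"FamilyB": 50.0, "FamilyC": 12.0}
species = {"FamilyA": 400, "FamilyB": 1200, "FamilyC": 90, "FamilyD": 30}
table = build_family_table(sequenced, economic, species)
```

Here the placeholder FamilyD is left out because it has neither a sequenced species nor economic value, while FamilyC is kept with zero sequenced genomes. The same function would apply to orders by keying the three inputs on order instead of family.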

