Equation for accurate prediction of PCR yield

It is a cliché of freshman biology labs to point out that "every cycle of PCR doubles the DNA, so the yield will be $2^{cycles}$ times the template amount". However, if this were true, 1 ng of template would generate about 34 billion ng after 35 cycles ($2^{35} \approx 3.4 \times 10^{10}$), or roughly 34 grams of DNA. This is clearly absurd.

Of course, the power-of-2 claim is a gross oversimplification (if anything, it is an upper bound - but even so, a very uninformative one), and in practice, yields will fall far short of it because:

  • Not every DNA duplex denatures at each cycle
  • Primers do not bind to every molecule of DNA at each cycle
  • Not every DNA strand gets bound by a polymerase at every cycle
  • Not every polymerase that binds manages to complete the entire product within the extension time of each cycle
  • The reaction limits itself by depleting dNTPs
  • Repeated heating gradually inactivates the enzyme

In fact, even a cursory examination of qPCR output shows that amplification often follows saturation kinetics.

Mathematical methods for modeling qPCR are obviously well developed.

My question is about ordinary PCR: Is it possible to get a reasonable expectation of nanogram yield for an ordinary PCR done in a tabletop cycler, with typical PCR reagents?

For instance, when amplifying from a plasmid, I would like to calculate how many cycles to do, how much template to use, and how much product to load on the agarose gel to ensure that I will be able to clearly distinguish exponential amplification (both primers anneal), linear amplification (only one primer anneals), and no amplification (neither primer can anneal or the reaction did not work).

An expected efficiency for a typical PCR is 80%, meaning each cycle multiplies the copy number of the targeted DNA sequence 1.58 times.

Firstly, it makes more sense to refer to the amount of DNA in a polymerase chain reaction in terms of copy number or in terms of moles; the number of DNA molecules of interest is what the reaction is operating on, and the mass of product generated is a function of the length of the product (and, to a lesser degree, on the composition of the product).

The following discussion is sourced from this URL:

According to Perkin-Elmer, copy-number amplification of 100,000-fold of the targeted sequence of DNA can be expected from a PCR with 0.1 ng of Lambda phage DNA (a well-characterized and standard DNA isolate) in a 100 µL reaction with > 25 cycles of denaturation, annealing, and extension.

In the above 100,000-fold amplification example, if the targeted amplicon were 500 bp in length, the estimated molecular weight of the 500 bp duplex DNA would be 325,000 g/mol (based on an average base pair having a molecular mass of 650 g/mol).

The Lambda phage genome is 48,502 base pairs in length. 48,502 bp × 650 grams/mol/bp = 3.153×10^7 grams/mol.

0.1 ng Lambda DNA -> 0.1×10^-9 grams. 0.1×10^-9 g ÷ 3.153×10^7 g/mol = 3.172 × 10^-18 moles. 3.172 × 10^-18 moles × NA (Avogadro's number) = 1.910×10^6 copies, or 1,910,000 copies.

1.910×10^6 copies × 100,000 = 1.910×10^11 copies. 1.910×10^11 copies ÷ NA × 325,000 g/mol = 1.031×10^-7 grams of sequence. 1.031×10^-7 grams is equal to 0.103 µg or 103 ng.
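The mass-to-copies arithmetic above can be wrapped in a few small functions so that other template sizes, amplicon lengths, and fold amplifications are easy to try. This is a minimal sketch: the function names are hypothetical, the 650 g/mol per base pair average is the one used above, and the example call assumes a Lambda genome length of 48,502 bp, which is passed in as a parameter rather than fixed.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 650.0       # average g/mol per base pair of duplex DNA


def copies_from_mass(mass_g, length_bp):
    """Number of duplex molecules in mass_g grams of DNA of the given length."""
    return mass_g / (length_bp * BP_MASS) * AVOGADRO


def mass_from_copies(copies, length_bp):
    """Mass in grams of `copies` duplex molecules of the given length."""
    return copies / AVOGADRO * length_bp * BP_MASS


def expected_yield_ng(template_ng, template_bp, amplicon_bp, fold):
    """Product mass in ng after a given copy-number fold amplification."""
    start_copies = copies_from_mass(template_ng * 1e-9, template_bp)
    return mass_from_copies(start_copies * fold, amplicon_bp) * 1e9


# Example: 0.1 ng Lambda DNA (assumed 48,502 bp), 500 bp amplicon, 100,000-fold
yield_ng = expected_yield_ng(0.1, 48_502, 500, 1e5)
print(f"expected product: {yield_ng:.0f} ng")
```

Because Avogadro's number cancels, the result reduces to template mass × fold × (amplicon length ÷ template length), which is a quick sanity check on any value the functions return.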

An amplification yield of 100,000x after 25 cycles would mean at each cycle 1 template would yield 1.58 templates for the next round of synthesis.

How was this calculated? If c is the number of copies made per round of synthesis, then:

$c^{25} = 100{,}000 = 10^5$, so $c^5 = 10$, and therefore $5 \log_{10} c = \log_{10} 10 = 1$. This gives $\log_{10} c = 0.2$ and $c = 10^{0.2} \approx 1.58$. (Or you could calculate the 25th root of 100,000 on a calculator, if you prefer.)

If we obtain 1.58 copies instead of the theoretical maximum of 2 copies, then the efficiency of the reaction could be said to be 79% (because 1.58/2.00 = 0.79).

One reason this calculation is important is that a slight loss of efficiency is magnified through the amplification. A reaction may appear to have not worked if the efficiency drops (in each cycle) by just a few percent. Optimization is critically important in the polymerase chain reaction.
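How strongly a small per-cycle loss compounds can be seen numerically. The sketch below (function names are hypothetical) recovers the per-cycle factor from a total fold amplification and then compares the fold amplification after 25 cycles for a few constant per-cycle factors. It assumes the factor stays constant across cycles, which is a simplification.

```python
def per_cycle_factor(fold, cycles):
    """Per-cycle multiplication factor c such that c**cycles == fold."""
    return fold ** (1.0 / cycles)


def fold_amplification(factor, cycles):
    """Total fold amplification after `cycles` cycles at a constant factor."""
    return factor ** cycles


# The 25th root of 100,000 is the per-cycle factor discussed above.
print(f"per-cycle factor: {per_cycle_factor(1e5, 25):.2f}")

# Compare a few constant per-cycle factors over 25 cycles.
for factor in (2.00, 1.58, 1.50, 1.40):
    total = fold_amplification(factor, 25)
    print(f"factor {factor:.2f} -> {total:.3g}-fold after 25 cycles")
```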

The equation is correct, but the product concentration also approaches an asymptotic maximum that depends on the starting concentrations of dNTPs, template, and primer pairs in solution.
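One simple way to picture such a ceiling is a logistic-style recursion in which the per-cycle gain shrinks as the product approaches a maximum copy number. This is only an illustrative sketch, not a model taken from the discussion above: `simulate_pcr`, `gain`, and `max_copies` are hypothetical names, and the example values for the starting copies and the ceiling are assumptions chosen for demonstration.

```python
def simulate_pcr(start_copies, gain, max_copies, cycles):
    """Return copy numbers for cycles 0..cycles under a saturating model.

    gain is the number of extra copies made per template per cycle while
    reagents are plentiful (0.58 corresponds to a per-cycle factor of 1.58).
    max_copies is the assumed ceiling set by primers, dNTPs, and enzyme.
    """
    copies = [float(start_copies)]
    for _ in range(cycles):
        n = copies[-1]
        remaining = max(0.0, 1.0 - n / max_copies)  # fraction of capacity left
        copies.append(n + gain * n * remaining)
    return copies


# Assumed example values: 2e6 starting copies, ceiling of 3e13 copies.
history = simulate_pcr(start_copies=2e6, gain=0.58, max_copies=3e13, cycles=40)
for cycle in (0, 10, 20, 30, 40):
    print(cycle, f"{history[cycle]:.3g}")
```

Early cycles behave like the constant-factor equation, while later cycles flatten out, which is the qualitative shape of the saturation kinetics mentioned in the question.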

Equation of Cellular Respiration

The equation of cellular respiration helps in calculating the release of energy by breaking down glucose in the presence of oxygen in a cell. If you are searching for information on the formula of cellular respiration equation, the following BiologyWise article will prove to be useful.

Cellular respiration is a process carried out by many organisms to release energy in a usable form. Cells convert glucose and oxygen to carbon dioxide and water, and the energy released is captured as ATP. ATP stands for adenosine triphosphate, an organic molecule with high-energy phosphate bonds that serves as the cell's energy carrier. When a phosphate group is transferred from ATP to another molecule, that molecule gains energy. A reaction in which a molecule gains energy is known as an endergonic reaction. The molecule from which the phosphate is removed loses energy and gives off heat; such a reaction is known as an exergonic reaction.

Cellular respiration is different from photosynthesis and is usually an aerobic process, meaning that it occurs in the presence of oxygen. The overall process is divided into four distinct stages. Let us look at the four steps briefly before moving on to the details of the cellular respiration equation.

Steps Involved

  • The first step is glycolysis. Glycolysis takes place in the cell's cytoplasm and is an anaerobic process, meaning that it does not require oxygen. Glucose is broken down into two molecules of pyruvate in a 10-step process yielding 2 ATPs.
  • The next step is the entry of pyruvate into the mitochondria, which leads to the production of two molecules of acetyl-coenzyme A and 2 molecules of CO2.
  • The third step is the Citric Acid Cycle (CAC). This is a 9-step reaction that takes place within the mitochondria. The reactions yield 2 ATPs and 4 CO2 molecules.
  • The last step is the Electron Transport System, or cytochrome system, which is carried out by enzymes located in the inner mitochondrial membrane. This step yields the largest number of ATPs, 32 ATP molecules, which brings the total to 36 ATPs.

These complex reactions lead to the production of 36 ATPs by utilizing one glucose molecule and six oxygen molecules.


The balanced cellular respiration equation is:

C6H12O6 (glucose) + 6 O2 → 6 CO2 + 6 H2O + energy (36 or 38 ATP)

The yield of 36 or 38 ATP molecules depends on how the extramitochondrial NADH reducing equivalents produced in glycolysis are carried into the mitochondria: the glycerol 3-phosphate shuttle gives 36 ATP molecules, and the malate-aspartate shuttle gives 38 ATP molecules.

Cellular respiration helps cells break down sugar, which in turn produces energy. It is an oxidation-reduction (redox) process. Glucose (C6H12O6) is oxidized to CO2 as electrons are removed from it, and oxygen is reduced to water as it accepts those electrons. NAD+ (nicotinamide adenine dinucleotide) is a coenzyme that is reduced to NADH when it picks up two electrons and one hydrogen ion, making it an energy carrier molecule. Flavin adenine dinucleotide (FAD) is reduced to FADH2, making it another coenzyme that acts as an electron carrier.

The cellular respiration equation summarizes part of a metabolic pathway that breaks down carbohydrates. It is an exergonic reaction in which high-energy glucose molecules are broken down into carbon dioxide and water. It is also known as a catabolic reaction because a large molecule such as a carbohydrate is broken down into smaller molecules.

Template DNA denaturation assessment

The initial denaturation step is carried out at the beginning of PCR to separate the double-stranded template DNA into single strands so that the primers can bind to the target region and initiate extension. Complete denaturation of the input DNA helps ensure efficient amplification of the target sequence during the first amplification cycle. Furthermore, the high temperature at this step helps inactivate heat-labile proteases or nucleases that may be present in the sample, with minimal impact on thermostable DNA polymerases. When using a hot-start DNA polymerase, this step also serves to activate the enzyme, although a separate activation step may be recommended by the enzyme supplier.

The initial denaturation step is commonly performed at 94–98°C for 1–3 minutes. The time and temperature of this step can vary depending on the nature of the template DNA and salt concentrations of buffer. For example, mammalian genomic DNA may require longer incubation periods than plasmids and PCR products, based on DNA complexity and size. Similarly, DNA with high GC content (e.g., >65%) often calls for longer incubation or higher temperature for denaturation (Figure 2). Buffers with high salts (as required by some DNA polymerases) generally need higher denaturation temperatures (e.g., 98°C) to separate double-stranded DNA (Figure 3).

Figure 2. Increasing the initial denaturation time improves the PCR yield of a GC-rich, 0.7 kb fragment amplified from a human gDNA sample. The initial denaturation steps were set to 0, 0.5, 1, 3, and 5 minutes respectively.

Some DNA polymerases, such as Taq DNA polymerase, are easily denatured by prolonged incubation above 95°C. To compensate for decreased activity in this scenario, more enzyme may be added after the initial denaturation step, or a higher-than-recommended amount of DNA polymerase can be added at the beginning. Highly thermostable enzymes such as those derived from Archaea are able to withstand prolonged high temperatures and remain active throughout PCR.


Analysis of SMs in A. nidulans on Complex Solid Medium Identifies 42 Compounds.

Initially, we evaluated the production of secondary metabolites (SMs) on four different solid media [oatmeal agar (OTA), yeast extract sucrose (YES), Czapek yeast autolysate (CYA), and CYA with 50 g/L NaCl sucrose (CYAS); see Materials and Methods] at 4, 8, and 10 d. The objective was to identify a selection of media that (i) gave as many produced SMs as possible, (ii) showed one or more SMs unique to each medium, and (iii) had SMs that were only produced on two of the selected media.

These characteristics should allow us to have as many active gene clusters as possible, as well as ensuring unique production profiles for as many SM gene clusters as possible.

From this initial analysis, we selected the YES, CYA, and CYAS media for transcriptional profiling. On these media, we were able to separate and detect 59 unique SMs, of which we could name 42 by comparison with our extensive in-house library of microbial metabolites (31) and the AntiBase 2010 natural products database. The production profile of the compounds satisfied the three criteria listed above (Fig. 1, Fig. S1, and Dataset S1).

Venn diagram of SMs found on three different solid media. The number of different metabolites is sorted according to which media the metabolites have been identified on. The number of metabolites unable to be confidently identified are noted in parentheses. Details can be found in Dataset S1, and the chemical structures are illustrated in Fig. S1.

Generation of a Diverse Gene Expression Compendium for A. nidulans.

Samples were taken for transcriptional profiling from plates cultivated in parallel to those of the SM profiling above. RNA was purified, prepared for labeling, and hybridized to custom-designed Agilent Technologies arrays based on version 5 of the A. nidulans annotation (32).

The produced data were combined with previously published microarray data from A. nidulans bioreactor cultivations (33, 34) to form a microarray compendium spanning a diverse set of conditions, comprising 44 samples in total. The set includes four strains of A. nidulans. Four different growth media are included: three complex media (see above) and one minimal medium. Medium variations include five different defined carbon sources (ethanol, glycerol, xylose, glucose, and sucrose), as well as yeast extract. The combined compendium of expression data is available in Dataset S2.

Correlation-Based Identification of Gene Clusters.

To identify gene clusters efficiently around SM synthases, we developed a gene clustering score (CS) based on the Pearson product-moment correlation coefficient. Our CS gives a numerical value for correlation of the expression profile of a given gene with the expression profiles of the three immediate neighbor genes on either side. Only positive correlation is considered. Values for the CS are available in Dataset S2.

Statistical simulation of the distribution of CS on the given dataset showed that CS values ≥2.13 corresponded to a false-positive rate of 0.05 (Fig. S2). Therefore, CS ≥ 2.13 was used as a guideline for identifying the extent of gene clusters.
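The exact formula for the CS is not given in this text, so the following is only one plausible reading: for each gene, sum the positive Pearson correlations between its expression profile and those of up to three neighbors on either side. The function name, the array layout, and the choice of a plain sum are assumptions of this sketch, and chromosome boundaries are ignored for simplicity.

```python
import numpy as np


def clustering_score(expr, window=3):
    """Sum of positive Pearson correlations with neighboring genes.

    expr is a (genes x samples) array with rows in chromosomal order.
    Genes near either end of the array simply have fewer neighbors.
    """
    n_genes = expr.shape[0]
    corr = np.corrcoef(expr)  # rows are treated as variables (genes)
    scores = np.zeros(n_genes)
    for i in range(n_genes):
        lo, hi = max(0, i - window), min(n_genes, i + window + 1)
        for j in range(lo, hi):
            # Negative (and undefined) correlations contribute nothing.
            if j != i and corr[i, j] > 0:
                scores[i] += corr[i, j]
    return scores


# Toy data: 8 genes x 10 samples, all genes perfectly co-expressed.
base = np.arange(10.0)
expr = np.array([base * (k + 1) for k in range(8)])
scores = clustering_score(expr)
print(scores >= 2.13)  # compare against the guideline threshold from the text
```

Under this reading the score for a gene with six neighbors ranges from 0 to 6, so a guideline value of 2.13 would correspond to moderate positive correlation spread over several neighbors or strong correlation with a few.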

Prediction of the Extent of 51 Gene Clusters.

Evaluation of the size of the clusters around SM genes was performed using a precomputed list of 66 putative PKSs, NRPSs, and DMATSs from the secondary metabolite unique regions finder (SMURF) algorithm (3) based on the A. nidulans FGSC A4 gene set (35). In addition to these 66 genes, we added one prenyltransferase gene found in the primary literature (30) and three diterpene synthase (DTS) genes predicted by Bromann et al. (25), resulting in 70 putative biosynthetic genes. All 25 experimentally verified PKSs, NRPSs, DTSs, and prenyltransferases were found to be included in this list (Tables 1–3).

To construct and validate a predicting genotype signature for pathologic complete response (pCR) in locally advanced rectal cancer (PGS-LARC) after neoadjuvant chemoradiation.

Methods and Materials

Whole exome sequencing was performed in 15 LARC tissues. Mutation sites were selected according to the whole exome sequencing data and literature. Target sequencing was performed in a training cohort (n = 202) to build the PGS-LARC model using regression analysis, and internal (n = 76) and external validation cohorts (n = 69) were used for validating the results. Predictive performance of the PGS-LARC model was compared with clinical factors and between subgroups. The PGS-LARC model comprised 15 genes.


The area under the curve (AUC) of the PGS model in the training, internal, and external validation cohorts was 0.776 (0.697-0.849), 0.760 (0.644-0.867), and 0.812 (0.690-0.915), respectively, and the model demonstrated higher AUC, accuracy, sensitivity, and specificity than cT stage, cN stage, carcinoembryonic antigen level, and CA19-9 level for pCR prediction. The predictive performance of the model was superior to clinical factors in all subgroups. For patients with clinical complete response (cCR), the positive predictive value was 94.7%.


The PGS-LARC is a reliable predictive tool for pCR in patients with LARC and might be helpful to enable nonoperative management strategy in those patients who refuse surgery. It has the potential to guide treatment decisions for patients with different probability of tumor regression after neoadjuvant therapy, especially when combining cCR criteria and PGS-LARC.

Wei-Wei Xiao, Min Li, Zhi-Wei Guo, Rong Zhang, Shao-Yan Xi, Xiang-Guo Zhang, and Yong Li contributed equally to this work.

Ming Li and Yuan-Hong Gao are senior authors who contributed equally to this work.

This research was supported by the National Natural Science Foundation of China (No. 81071891, 81672987) Natural Science Foundation of Guangdong Province (No. 2014A030312015) Science and Technology Program of Guangdong (No. 2015B020232008), Science and Technology Program of Guangzhou (No. 201508020250, 201604020003).

Disclosures: The authors declare that they have no conflicts of interest.

Size and other limitations

PCR works readily with a DNA template of up to two to three thousand base pairs in length. However, above this size, product yields often decrease, as with increasing length stochastic effects such as premature termination by the polymerase begin to affect the efficiency of the PCR. It is possible to amplify larger pieces of up to 50,000 base pairs with a slower heating cycle and special polymerases. These are polymerases fused to a processivity-enhancing DNA-binding protein, enhancing adherence of the polymerase to the DNA. [5] [6]

Other valuable properties of the chimeric polymerases TopoTaq and PfuC2 include enhanced thermostability, specificity and resistance to contaminants and inhibitors. [7] [8] They were engineered using the unique helix-hairpin-helix (HhH) DNA binding domains of topoisomerase V [9] from hyperthermophile Methanopyrus kandleri. Chimeric polymerases overcome many limitations of native enzymes and are used in direct PCR amplification from cell cultures and even food samples, thus by-passing laborious DNA isolation steps. A robust strand-displacement activity of the hybrid TopoTaq polymerase helps solve PCR problems that can be caused by hairpins and G-loaded double helices. Helices with a high G-C content possess a higher melting temperature, often impairing PCR, depending on the conditions. [10]


Neoadjuvant chemotherapy (NCT) is being used more and more frequently for treating breast cancer patients. This is due to its advantages in reducing tumor size, improving surgical options, and significantly increasing survival in responders. However, broad clinical application remains questionable because of a low response rate and the potential for significant side effects. The most extreme case is triple-negative breast cancer (TNBC), which is the most aggressive subtype of breast cancer and has the worst prognostic outcome. Due to its heterogeneity, patients with TNBC respond differently to NCT. Numerous efforts have been put into developing predictive signatures for TNBC, but, currently, there is no clinically applied predictive signature. Therefore, there is an urgent need for developing robust predictive biomarkers for TNBC patients. Although many studies have focused on the difference in chemotherapy regulatory programs between pathologic complete response (pCR) and residual disease (RD), 61-63 the mechanisms underlying the survival of resistant tumor cells remain poorly understood.

In this study, we developed a novel framework for identifying predictive gene signatures in breast cancer patients. We validated the efficacy of this framework by showing that the RPS predicted NCT response in breast cancer patients, particularly in ER-positive patients (Figure 2A-C and Table S6). In addition, compared to the commercialized signatures, the RPS had a comparable prediction ability across each individual dataset (Figure 2D-E and Table S6). We then applied the framework to TNBC patients and calculated the TNBC-RPS. The TNBC-RPS was predictive of the response in TNBC patients (Figure 3A-C and Table S7). Compared to the previously-developed ER-negative-specific and nonspecific prediction signatures, the TNBC-RPS outperformed 143 predictive gene signatures and presented robust prediction accuracy (Figure 3D-F and Table S7). Of importance, the TNBC-RPS leads to a higher AUC of up to 0.80 in TNBC patients (Figure 5A-B) and exceeded the performance of the 143 predictive gene signatures when combined with clinical predictors (Figure 5C). We, therefore, provide a new framework for identifying predictive markers of NCT response. In addition, to facilitate the clinical utility of RPS and TNBC-RPS signatures, we also provided a revised version of those two gene signatures with fewer genes (Table S15).

Previous studies calculated the scores of different gene signatures using only a single method. This strategy does not take into account the variation in the methods used to calculate the scores from the gene signatures. Instead of using this one-method-fits-all strategy, we validated the previously published signatures by applying the same algorithms that were used to calculate the scores of each signature to the same datasets and reproduced the published prediction performances. Then, we applied the gene signatures to the validation metadata for prediction. This made the prediction accuracy comparison more objective since we took the impact of different methods into consideration (Figures 2 and 3). In Table S4, we present the validation results of the previous signatures. We acquired consistent results by repeating previous published gene signatures in their validation datasets. Despite the subtle differences between the P-value reported previously and our calculated AUC (likely caused by the update or different normalization methods on the raw microarray data), we showed that our model significantly outperformed most of the reported signatures.

The drug-response mechanisms in breast cancer have been studied for many years but are still poorly understood. We investigated the association between the RPS and characteristics of the ER-positive tumor microenvironment, as well as between the TNBC-RPS and characteristics of the TNBC tumor microenvironment (Figure 6). Of note, the RPS identified changes in the tumor cell proliferation rate and immune cell infiltration in ER-positive patients, which was supported by previous studies showing that the cell-cycle-related 16, 64-66 and immune-infiltration-related gene signatures 67-69 were associated with responsiveness. This observation could be further validated through the prediction performance of the 143 predictive gene signatures. For example, Oncotype DX, a signature composed of cell-cycle-related genes, and the Immune Signature Gene Module score were both predictive of the response in ER-positive patients (Table S6). 16, 28 The TNBC-RPS primarily captured the relative abundance of the stromal cells in the tumor microenvironment. Farmer et al reported a similar finding in TNBC patients as well. 29 Meanwhile, we also used the stromal cell abundance for prediction in TNBC patients and got an AUC = 0.55, indicating a predictive role of stromal cells in TNBC patients' NCT response (Figure S3E-F). 69-72 Therefore, our findings provide an understanding of cancer biology in breast cancer by showing which aspect(s) of the tumor microenvironment might influence the response to NCT.

Although we have demonstrated the efficacy of the RPS and the TNBC-RPS in predicting the response to NCT, the prediction power and the applicable range of the model could be further improved. In addition to the gene signatures, other IHC-staining signatures or MRI imaging-based prediction models were used to predict the response to NCT. 73-75 However, we lack the data to compare the performance of our signatures to these methods or to integrate them into the model for better prediction. Moreover, our signatures were applicable to the prediction of the combination of antimetabolite-, anthracycline-, alkylating agent-, and taxane-based chemotherapy-treated patients and have not been extended to investigate its predictive power with other chemotherapy agents or targeted therapy agents. With the release of more gene expression data, it may be possible to extend the applicable range of our signatures or to develop drug-specific-predictive gene signatures.

In summary, we developed a framework for identifying a predictive gene signature in breast cancer and defined two gene signatures that could be used to predict NCT response in ER-positive and TNBC patients respectively. We have demonstrated that the RPS performed at a comparable level to the current commercialized signatures, while the TNBC-RPS outperformed 143 gene signatures for TNBC patients in prediction. More importantly, integrating the RPS or TNBC-RPS with current established clinical predictors enhanced the predictive power, compared to using the clinical predictors only. In addition, the RPS and TNBC-RPS captured different aspects of the tumor microenvironment, leading to tantalizing insights as to the potential biological mechanisms driving differences in the chemotherapeutic response. This computational framework can also be readily extended to define predictive biomarkers in other cancer types.

Cellular Respiration Equation

Aerobic Respiration Equation

The equation for aerobic respiration shows glucose being combined with oxygen and ADP to produce carbon dioxide, water, and ATP:

C6H12O6 (glucose) + 6 O2 + 36 ADP (depleted ATP) + 36 Pi (phosphate groups) → 6 CO2 + 6 H2O + 36 ATP

Lactic Acid Fermentation Equation

In lactic acid fermentation, one molecule of glucose is broken down into two molecules of lactic acid. The chemical energy that was stored in the broken glucose bonds is moved into bonds between ADP and a phosphate group.

C6H12O6 (glucose) + 2 ADP (depleted ATP) + 2 Pi (phosphate groups) → 2 CH3CHOHCOOH (lactic acid) + 2 ATP

Alcoholic Fermentation Equation

Alcohol fermentation is similar to lactic acid fermentation in that oxygen is not the final electron acceptor. Here, instead of oxygen, the cell uses a converted form of pyruvate to accept the final electrons. This creates ethyl alcohol, which is what is found in alcoholic beverages. Brewers and distillers use yeast cells, which are very good at this form of fermentation, to create this alcohol.

C6H12O6 (glucose) + 2 ADP (depleted ATP) + 2 Pi (phosphate groups) → 2 C2H5OH (ethyl alcohol) + 2 CO2 + 2 ATP
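A quick way to check that equations like these are balanced is to count atoms on each side. The sketch below does this for the carbon-containing species, oxygen, and water only; ADP, Pi, and ATP are deliberately left out, which is a simplifying assumption of this example. The parser handles only simple formulas without parentheses, and all function names are hypothetical.

```python
import re
from collections import Counter


def atoms(formula):
    """Count atoms in a simple formula such as 'C6H12O6' or 'CH3CHOHCOOH'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(side):
    """Total atom counts for a list of (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atoms(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products):
    return side_total(reactants) == side_total(products)


aerobic = ([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")])
lactic = ([(1, "C6H12O6")], [(2, "CH3CHOHCOOH")])
alcoholic = ([(1, "C6H12O6")], [(2, "C2H5OH"), (2, "CO2")])

for name, (lhs, rhs) in [("aerobic", aerobic), ("lactic", lactic), ("alcoholic", alcoholic)]:
    print(name, is_balanced(lhs, rhs))
```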


For prediction of hybrid performance, the median of the correlations $r(y,\hat{y})$ between observed and predicted values in cross validation with type 2 hybrids was between 0.74 and 0.75 for grain yield and between 0.88 and 0.99 for grain dry matter content (Fig. 2). The differences in the median of the correlation between prediction with AFLPs, with all 10k mRNAs (mRNA10k), and with random samples of 1k out of the 10k mRNAs (mRNAr1k) were negligible. Prediction with mRNAs had a slightly smaller variation around the median than prediction with AFLPs. The average absolute prediction errors $|y-\hat{y}|$ had about the same sizes for prediction with AFLPs, all 10k mRNAs and random samples of 1k out of the 10k mRNAs.

Prediction accuracy for hybrid performance of type 2 hybrids (left, in light gray) and type 0 hybrids (right, in dark gray). Correlations $r(y,\hat{y})$ between observed and predicted grain yield and grain dry matter content, and average absolute prediction error $|y-\hat{y}|$ for the predictor sets AFLP (970 AFLP markers), mRNAr1k (1000 random mRNA transcripts), mRNA10k (10,810 mRNA transcripts). The boxplots show the distributions for 1000 cross validation runs, μ are the arithmetic means and Z the medians

For type 0 hybrids, the correlations between observed and predicted hybrid performance for both traits were lower than for type 2 hybrids. The median of the correlations in cross validation was between 0.54 and 0.56 for grain yield and between 0.29 and 0.41 for grain dry matter content. Differences in the median between the predictor sets AFLP, mRNA10k, and mRNAr1k were small. The ranges of the correlations were very large, and in some cross validation runs, even large negative correlations were observed. The average absolute prediction errors were greater than for type 2 hybrids and showed similar values for AFLPs and mRNA.

For prediction of mid-parent heterosis, the median of $r(y,\hat{y})$ with type 2 hybrids was between 0.81 and 0.82 for grain yield and between 0.90 and 0.91 for grain dry matter content (Fig. 3). The differences between the predictor sets AFLP, mRNA10k, mRNAr1k were negligible. The average absolute prediction error $|y-\hat{y}|$ had about the same sizes for the three predictor sets.

Prediction accuracy for mid-parent heterosis of type 2 hybrids (left, in light gray) and type 0 hybrids (right, in dark gray). Correlations $r(y,\hat{y})$ between observed and predicted grain yield and grain dry matter content, and average absolute prediction error $|y-\hat{y}|$ for the predictor sets AFLP (970 AFLP markers), mRNAr1k (1000 random mRNA transcripts), mRNA10k (10,810 mRNA transcripts). The boxplots show the distributions for 1000 cross validation runs, μ are the arithmetic means and Z the medians

For type 0 hybrids, the correlations between observed and predicted mid-parent heterosis were between 0.26 and 0.4 for grain yield. For grain dry matter content no correlation between observed and predicted values in cross validation was observed.

In additional analyses we investigated the effect of further reducing the number of predictor variables below 1000. A decline of the prediction accuracy was observed for both traits (results not shown), which is in line with the results of [6].

We further investigated a ridge regression model in which we included 1000 random mRNAs and in addition the AFLP markers as predictors. We found no situation where combining the predictor sets resulted in a greater prediction accuracy than using them individually (results not shown).
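For readers unfamiliar with the approach, the general shape of such an analysis can be sketched with a closed-form ridge regression and a single train/test split. This is not the authors' implementation: the data are synthetic stand-ins, the penalty value is arbitrary, and the function names are hypothetical. Combining predictor sets is shown simply as column-wise stacking of the two matrices.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data; returns (beta, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_predictors = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_predictors), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta


def ridge_predict(X, beta, intercept):
    return X @ beta + intercept


def cv_correlation(X, y, lam, n_test, rng):
    """Correlation between observed and predicted values on a held-out split."""
    idx = rng.permutation(len(y))
    test, train = idx[:n_test], idx[n_test:]
    beta, intercept = ridge_fit(X[train], y[train], lam)
    y_hat = ridge_predict(X[test], beta, intercept)
    return np.corrcoef(y[test], y_hat)[0, 1]


# Synthetic stand-ins for two predictor sets (not real marker or transcript data).
rng = np.random.default_rng(0)
mrna = rng.normal(size=(60, 20))
aflp = rng.integers(0, 2, size=(60, 10)).astype(float)
X = np.hstack([mrna, aflp])  # combined predictor set
y = X @ rng.normal(size=30) + rng.normal(scale=0.5, size=60)

r = cv_correlation(X, y, lam=1.0, n_test=15, rng=rng)
print(f"r(y, y_hat) on held-out hybrids: {r:.2f}")
```

Repeating `cv_correlation` over many random splits would give a distribution of correlations comparable in spirit to the boxplots described above.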


Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH, 43210, USA

Sarah E. Biehn & Steffen Lindert


S.E.B. and S.L. conceptualized and designed the research approach, modeled and analyzed the data, implemented the scoring term into Rosetta, and wrote the paper.
