Do the BLAST scores have any relation between them?

Is there any relationship among the BLAST scores (E-value, similarity, identity, gaps, bit score)? Is the E-value for an alignment proportional to the other scores, such as the similarity score (i.e. the lower the E-value, the higher the similarity)?

Thank you!

Yes, there is a relationship between them, but you may not be able to observe a correlation between some of them.

The number of matches and the score are definitely proportional; however, higher similarity translates to a higher score only if the lengths of the scoring pairs are the same. Gaps have a negative effect on the score, but how large that effect is depends on your scores/penalties (see the BLAST help and documentation).

The E-value is the number of hits with at least a given score that you would expect to find by chance when searching a database of a given size. So whether or not it is low for a hit depends on the score distribution of the database. A lower E-value does not by itself mean higher similarity; it means that a hit with this score is unlikely to be found by random chance. The score distribution is generally skewed towards low scores, so high scores usually have low E-values, but what counts as a "high" score depends on your queries and database, so this statement cannot be generalized.

From the BLAST docs:

$E = K m n e^{-\lambda S}$

This formula makes eminently intuitive sense. Doubling the length of either sequence should double the number of HSPs attaining a given score. Also, for an HSP to attain the score 2x it must attain the score x twice in a row, so one expects E to decrease exponentially with score. The parameters K and lambda can be thought of simply as natural scales for the search space size and the scoring system respectively.
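
The relationship described above, E = K·m·n·e^(−λS), can be sketched in a few lines of Python. This is only an illustration: the values of `LAM` and `K` below are assumed for the example (the real ones depend on the scoring matrix and gap penalties), and the function names are not part of any BLAST API.

```python
import math


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance HSPs scoring >= raw_score: E = K*m*n*exp(-lam*S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lam*S - ln K) / ln 2, so that E = m*n*2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Assumed, illustrative parameters (not taken from the original post).
LAM, K = 0.267, 0.041
m, n = 300, 5_000_000  # query length and database length, also made up

e_small_db = e_value(100, m, n, LAM, K)
e_big_db = e_value(100, m, 2 * n, LAM, K)      # doubling n doubles E
e_high_score = e_value(200, m, n, LAM, K)      # E drops exponentially with S
e_from_bits = m * n * 2 ** (-bit_score(100, LAM, K))

print(e_small_db, e_big_db, e_high_score, e_from_bits)
```

Note that identity and similarity percentages do not appear in the formula at all: two hits with the same percent identity but different lengths get different raw scores, and therefore different E-values.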

The term homoeology has been used inconsistently in historical and modern contexts.

Homoeologs are pairs of genes that originated by speciation and were brought back together in the same genome by allopolyploidization.

Homoeologs are not necessarily one-to-one or positionally conserved.

Evolution-based computational methods have emerged to infer homoeologs from sequencing data.

The evolutionary history of nearly all flowering plants includes a polyploidization event. Homologous genes resulting from allopolyploidy are commonly referred to as ‘homoeologs’, although this term has not always been used precisely or consistently in the literature. With several allopolyploid genome sequencing projects under way, there is a pressing need for computational methods for homoeology inference. Here we review the definition of homoeology in historical and modern contexts and propose a precise and testable definition highlighting the connection between homoeologs and orthologs. In the second part, we survey experimental and computational methods of homoeolog inference, considering the strengths and limitations of each approach. Establishing a precise and evolutionarily meaningful definition of homoeology is essential for understanding the evolutionary consequences of polyploidization.

Correlation Coefficients: Determining Correlation Strength

Instead of drawing a scattergram, a correlation can be expressed numerically as a coefficient ranging from -1 to +1. When working with continuous variables, the correlation coefficient to use is Pearson’s r.

The correlation coefficient (r) indicates the extent to which the pairs of numbers for these two variables lie on a straight line. Values over zero indicate a positive correlation, while values under zero indicate a negative correlation.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down. A correlation of +1 indicates a perfect positive correlation, meaning that as one variable goes up, the other goes up.
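
As a minimal sketch of how Pearson's r is computed from paired values, the following uses only the standard library. The function name `pearson_r` and the sample numbers are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # points on a rising line
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # points on a falling line
print(r_up, r_down)
```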

There is no rule for determining what size of correlation is considered strong, moderate or weak. The interpretation of the coefficient depends on the topic of study.

When studying things that are difficult to measure, we should expect the correlation coefficients to be lower (e.g. above 0.4 to be relatively strong). When we are studying things that are easier to measure, such as socioeconomic status, we expect higher correlations (e.g. above 0.75 to be relatively strong).

In studies of hard-to-measure things, we rarely see correlations above 0.6. For this kind of data, we generally consider correlations above 0.4 to be relatively strong; correlations between 0.2 and 0.4 are moderate, and those below 0.2 are considered weak.

When we are studying things that are more easily countable, we expect higher correlations. For example, with demographic data, we generally consider correlations above 0.75 to be relatively strong; correlations between 0.45 and 0.75 are moderate, and those below 0.45 are considered weak.
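
The two sets of cut-offs above can be written as a small lookup, which makes it clear that the same coefficient is read differently depending on the kind of data. This is a sketch: the names are hypothetical, and two choices are assumptions not stated in the text, namely that the sign of r is ignored and that a value exactly on a cut-off falls into the middle band.

```python
# Cut-offs taken from the text; dictionary and function names are made up.
HARD_TO_MEASURE = {"strong": 0.4, "moderate": 0.2}
EASILY_COUNTABLE = {"strong": 0.75, "moderate": 0.45}


def correlation_strength(r, cutoffs):
    """Label a correlation coefficient as strong, moderate or weak."""
    size = abs(r)  # assumption: only the magnitude matters for strength
    if size > cutoffs["strong"]:
        return "strong"
    if size >= cutoffs["moderate"]:
        return "moderate"
    return "weak"


print(correlation_strength(0.5, HARD_TO_MEASURE))   # strong
print(correlation_strength(0.5, EASILY_COUNTABLE))  # moderate
```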



Syntenator combines conservation of gene order and local sequence similarity to deduce gene orthology. Partial order alignments are represented by partial order graphs (POG). We present an implementation that operates on one POG and one simple chain graph, which is a representation of a linearly ordered gene set (e.g. a genome). An extension of the concept to the alignment of two arbitrary POGs will be discussed in detail.

Modifications to the recurrence relation

We need to modify the recurrence relation of the traditional Smith-Waterman approach to work on POGs. To compute a maximal alignment score for a particular pairing of vertices (n, m) by dynamic programming, we need to consider all gene vertices that are linked to n and m by outgoing edges. The corresponding recurrence relation of the score function for gapped local alignments is given in Eqn 1. [14]

Each cell S(n, m) of the dynamic programming matrix is maximized over four possibilities: match, insertion, deletion and starting a new alignment. The main difference from traditional pairwise local alignment is the use of P and Q, the sets of predecessor nodes of n and m in the corresponding POGs. For complex POGs, we have to consider |P| × |Q| alternative candidates in case of a match. The simplest case is |P| = |Q| = 1, as when aligning two genomes. Our implementation operates on one POG and one simple chain. Consequently, we have either |P| = 1 or |Q| = 1. The expressions s(n, m) and Δ denote the match score for two nodes and the gap penalty, respectively.
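
Eqn 1 itself is not reproduced in this text, so the following Python sketch is built only from the verbal description above: each cell takes the maximum of a match (best predecessor pair plus s(n, m)), a gap in either graph (best predecessor minus Δ), and 0 for starting a new alignment. The function name, the data layout and the toy scoring are assumptions made for illustration, not the Syntenator implementation.

```python
def align_pog_to_chain(pog_order, pog_preds, chain, match_score, gap_penalty):
    """Local alignment of a partial order graph against a simple chain.

    pog_order: POG vertices in topological order.
    pog_preds: dict mapping a vertex to the list of its predecessors (the set P).
    chain: linearly ordered list of genes; position j has predecessor j - 1 (the set Q).
    """
    S = {}
    best = 0.0
    for n in pog_order:
        P = pog_preds.get(n, [])
        for j, m in enumerate(chain):
            Q = [j - 1] if j > 0 else []
            # Match: best score over all |P| x |Q| predecessor pairs.
            diag = max((S[(p, q)] for p in P for q in Q), default=0.0)
            match = diag + match_score(n, m)
            # Gaps: step along only one of the two graphs.
            gap_in_chain = max((S[(p, j)] for p in P), default=0.0) - gap_penalty
            gap_in_pog = max((S[(n, q)] for q in Q), default=0.0) - gap_penalty
            S[(n, j)] = max(0.0, match, gap_in_chain, gap_in_pog)
            best = max(best, S[(n, j)])
    return best, S


# Toy POG with a branch: a -> b -> d and a -> c -> d; the chain follows a, c, d.
order = ["a", "b", "c", "d"]
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
score = lambda x, y: 1 if x == y else -1
best, table = align_pog_to_chain(order, preds, ["a", "c", "d"], score, gap_penalty=1)
print(best)
```

In the toy example the best local alignment follows the a, c, d branch of the POG, because vertex d may extend from either of its two predecessors.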

Gene order alignment

Initially, all pairwise alignments between two POGs (e.g. G1 and G2) are computed in forward and reverse direction. An alignment in the reverse direction requires the reversal of all edges in one of the two POGs. For each comparison (in both directions), we consider all local (sub)optimal alignments above a certain threshold Θ. All alignments are ranked by their scores in descending order. Based on these alignments, we decide which vertices match and should be fused into a common vertex. We greedily assign vertex matches by traversing the ordered list top-down.
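
The ranking and greedy assignment described above might look like the following sketch. The data layout (a list of `(score, pairs)` tuples) and the function name are hypothetical, and one detail is an assumption: when a pair conflicts with an earlier assignment, only that pair is skipped rather than the whole alignment.

```python
def greedy_vertex_matches(alignments, threshold):
    """Assign 1:1 vertex matches from alignments ranked by descending score.

    alignments: iterable of (score, pairs), pairs being a list of (v, w) tuples.
    Only alignments scoring above the threshold are considered.
    """
    ranked = sorted(
        (a for a in alignments if a[0] > threshold),
        key=lambda a: a[0],
        reverse=True,
    )
    used_v, used_w = set(), set()
    mapping = {}
    for _score, pairs in ranked:
        for v, w in pairs:
            # Assumption: skip only the conflicting pair, keep the rest.
            if v in used_v or w in used_w:
                continue
            mapping[v] = w
            used_v.add(v)
            used_w.add(w)
    return mapping


alignments = [
    (2.0, [("a1", "b1")]),
    (5.0, [("a1", "b2"), ("a2", "b3")]),
    (0.5, [("a3", "b4")]),
]
matches = greedy_vertex_matches(alignments, threshold=1.0)
print(matches)
```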

Algorithm 1 (see Appendix) shows the adaptations of the algorithm of Lee et al. [13] to produce a set of all suboptimal alignment paths P. Such a path consists of a tuple (s, L, r), where s denotes the score, L is a list of aligned node pairs and r indicates whether a gene order was aligned in its original or reversed orientation. The score is adjusted by subtracting the initial score s_init, which is defined as the last minimal score encountered during traceback before the score exceeds the final alignment score, or 0 if no such minimum exists. This adjustment is necessary to prevent alignments from inheriting scores from previous higher-scoring alignments.

Merging genome graphs

In POA, two graphs, G1 and G2, are merged after each round of pairwise alignments. We have already discussed how to identify pairs of vertices (e.g. (v, w) with v ∈ G1 and w ∈ G2) that should be merged between both graphs. We denote this as 1:1 mapping M.

In the merging step, we iterate over all vertices w ∈ G2 and add a copy of w to G1 if w ∉ M. If (v, w) ∈ M, we fuse v and w by copying the genes stored at w to v. If a G1-equivalent of the predecessor node of w exists, we connect this G1-equivalent predecessor node of w to v. All connections between nodes that were not fused, but simply added to the graph, are retained in the merged graph.
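
A rough sketch of this merging step, under several assumptions: graphs are plain dictionaries, vertex names in G1 and G2 do not collide, the mapping is stored as a dictionary from G2 vertices to G1 vertices, and no cycle handling is attempted. None of these names come from the paper.

```python
def merge_graphs(g1, g2, mapping):
    """Merge g2 into a copy of g1.

    g1, g2: dict vertex -> {"genes": set of gene ids, "preds": set of vertices}.
    mapping: dict w -> v for every fused pair (v in g1, w in g2), i.e. the set M.
    """
    merged = {
        v: {"genes": set(d["genes"]), "preds": set(d["preds"])}
        for v, d in g1.items()
    }
    # G1-equivalent of each G2 vertex: its fused partner, or its own copy.
    equivalent = {w: mapping.get(w, w) for w in g2}

    for w, data in g2.items():
        if w in mapping:
            merged[mapping[w]]["genes"] |= data["genes"]  # fuse: copy genes to v
        else:
            merged[w] = {"genes": set(data["genes"]), "preds": set()}  # add a copy

    # Re-create the G2 edges between the G1-equivalents of their endpoints.
    for w, data in g2.items():
        for pred in data["preds"]:
            merged[equivalent[w]]["preds"].add(equivalent[pred])
    return merged


g1 = {
    "A": {"genes": {"a1"}, "preds": set()},
    "B": {"genes": {"b1"}, "preds": {"A"}},
}
g2 = {
    "X": {"genes": {"a2"}, "preds": set()},
    "Y": {"genes": {"y2"}, "preds": {"X"}},
    "Z": {"genes": {"b2"}, "preds": {"Y"}},
}
merged = merge_graphs(g1, g2, {"X": "A", "Z": "B"})
print(sorted(merged))
```

Here Y has no partner in G1, so it is added as a new vertex between A and B, while X and Z are fused into A and B.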

The merging of two POGs may introduce cycles into the resulting POG for two reasons: 1) Local alignments are not collinear in the respective input POGs (Figure 2A). 2) Local alignments are produced in both orientations (forward and reverse, Figure 2B).

Removing cycles after merging POGs. Panel A depicts the situation where two local gene order alignments "cross". Matches between nodes are shown as dashed connections between G1 and G2. G3 shows the situation after the merging step where a loop has introduced a cycle. This cycle is detected by the program and removed by reversing all edges (see text). The final POG looks like G4. Panel B depicts the scenario where two local alignments exist in different orientations (A-B in G1, G2 and C-D in G1, G2r). G3 shows the final POG after merging and cycle removal. Solid edges stem from the reference graph G1. The two dashed edges have been introduced to represent order relations that are unique to G2. The edge from D to C in G2 would introduce a cycle and had to be removed. The "kinked" edge represents the alignment of C→D in G1 to D→C in G2.

These particular problems did not arise in the original implementation for protein or EST sequence alignment (e.g. [14]) where DAGs are aligned in one defined orientation (e.g. N to C terminus for proteins, 5' to 3' end for ESTs) and just one optimal alignment is reported.

To resolve newly introduced cycles in scenario 1 (Figure 2A), we use a topological ordering of G1 and check at all branching points, whether a loop path consisting of new nodes from G2 induces a cycle in the merged graph G3. We have to test if the loop path returns to a node in G1 at an index which is less or greater in terms of the topological order than the index of the branching point from which we started off. If the path is a forward path and the index of the returning point is smaller than the index of the branching point, all edges within the path have to be reversed to keep the graph acyclic. This procedure leads to G4 in Figure 2A. The case for the backward path works analogously. If the newly added loop is part of a greater loop in G1, we have to search in both directions for the endpoints of the old loop to define an order relation on the newly added loop.
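
The procedure above depends on a topological ordering, which exists only while the graph is acyclic. As a generic check (Kahn's algorithm, not the authors' branch-point procedure), the sketch below returns a topological order or `None` if the merged graph contains a cycle. The function name and the example graphs are made up.

```python
from collections import deque


def topological_order(successors):
    """Return the vertices in topological order, or None if there is a cycle.

    successors: dict vertex -> iterable of vertices it points to.
    """
    indegree = {}
    for v, outs in successors.items():
        indegree.setdefault(v, 0)
        for u in outs:
            indegree[u] = indegree.get(u, 0) + 1

    queue = deque(sorted(v for v, d in indegree.items() if d == 0))
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in successors.get(v, ()):
            indegree[u] -= 1
            if indegree[u] == 0:
                queue.append(u)
    # Vertices left unprocessed sit on a cycle.
    return order if len(order) == len(indegree) else None


acyclic = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
# Same graph plus an edge D -> C, as when one region is aligned in reverse.
with_cycle = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["C"]}
print(topological_order(acyclic), topological_order(with_cycle))
```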

The second case (Figure 2B) emerges if local alignments of opposite orientations exist. In the given example, a cycle would be formed between nodes C and D as they are aligned in opposite orientation to A and B. This is circumvented by keeping the edge orientation of one graph (G1) for the reverse alignment. The "dashed" edges are added to preserve the original order relations of G2.

Repetitive regions that may result from duplication events do not introduce cycles into the merged POG since we greedily enforce a 1:1 mapping of gene nodes. Only the best matching repeat copies would be merged.

Score function

Our algorithm relies on BLASTP hits as general similarity measure. From the set of all-against-all BLASTP hits, we save a bitscore for each gene pair in a lookup table. In case of alternative transcripts the highest score between any two protein products is saved.

We chose a scoring function that allows us to order alignments according to the number of aligned pairs or to the sum of pairwise similarities in case of equal numbers of pairs.

For each pair of genes (A, B) a symmetric score function is given by Eqn. 2. The individual contributions are shown in Eqn. 3.

$S_{\text{match}}(A, B) = s(A, B) + s(B, A)$

We require $s_{\text{bitscore}} \geq 50$. The match score is always < 2: $\lim_{s_{\text{bitscore}} \to \infty} S_{\text{match}}(A, B) = 2$.

This can be interpreted as summing up over the entries of a non-symmetric weighted adjacency matrix of all pairwise homology relationships. A mismatch score is assigned if the two genes under comparison either have no BLAST hit or if they are located on different strands.

In order to score a match of vertices which contain multiple genes, we use a normalized sum-of-pairs score (Eqn. 4).

$n_{v,w}$ denotes the number of genes of nodes v and w, and $n_{(G_v, G_w)}$ denotes the number of species in the graphs of v and w. The term $C_{v,w}$ in the denominator of Eqn. 4 is a scaling factor whose definition depends on the current alignment score. $C_{v,w}$ is equal to the number of comparisons between either all species in nodes v and w or the number of all species in the graphs of v and w (Eqn. 5). This correction scheme was implemented because weak BLAST hits tend to appear in the set of genes of both vertices more often if the number of compared genes increases. As a consequence, pairwise scores tend to be higher than the averaged scores of multiple comparisons. In order to equalize this effect, we replace $n_{v,w}$ by $n_{(G_v, G_w)}$ as soon as the alignment score σ exceeds the threshold Θ. This triggers a switch towards a more specific search for alignments containing genes from multiple species.

Difference Between Causation and Correlation

There is much confusion about the meaning and correct usage of causation and correlation. The two terms are often used interchangeably, especially in health and scientific studies.

Every time we see a link between one event or action and another, what comes to mind is that one has caused the other. This is not always so: linking one thing with another does not prove that one caused the other.

Causation is an action or occurrence that can cause another. The result of an action is always predictable, providing a clear relation between them which can be established with certainty.

Causation involves correlation which means that if an action causes another then they are correlated. The causation of these two correlated events or actions can be hard to establish but it is certain.

Establishing causality between two correlated things has perplexed those that are involved in the health and pharmaceutical industries. The fact that an event or action causes another must be obvious and should be done with a controlled study between two groups of people.

They must be from the same backgrounds and given two different experiences. The results are then compared and a conclusion can then be drawn from the outcome of the study. The process of observation plays a significant role in these studies as the subjects must be observed over a certain period of time.

Correlation is an action or occurrence that can be linked to another. The action does not always result in another action or occurrence, but you can see that there is a relationship between them. Although the action does not make the other thing happen, the two tend to occur together.

Correlation can be easily established through statistical tools. The correlated events or actions can be because of a common cause. Establishing correlation can be made certain if there are no explanations that will prove causality.

Saying that exposing kids to too much violence on television and in films causes them to become violent adults can be untrue. Although violence on television and in films can influence behavior, adults who are violent might have acquired the habit due to other factors such as poverty, mental illness, or physical, mental, and emotional abuse as children.

It is therefore wrong to assume that violent behavior is due to television and films because there are several different aspects to consider. It is safer to say that there is a correlation between watching violent television shows and films and violent behavior than to say that violence in television and films causes violent behavior.

1. Causation is an occurrence or action that can cause another while correlation is an action or occurrence that has a direct link to another.
2. In causation, the results are predictable and certain while in correlation, the results are not visible or certain but there is a possibility that something will happen.
3. Establishing causality is harder while there are many statistical tools available to establish correlation between events or actions.


The TF mapping result set TFf comprises the best results when we only consider how many TFs are mapped, but it does not produce the best regulatory links set when integrated with the mapped TG set TGblbs. This indicates that the set TFsf containing the mapped TFs based on sequence similarity and subfamily classification contains the most efficiently and correctly mapped TFs for the purpose of mapping regulatory links.

The TGbs set, TGpr set and TGgalf set contain the best results of mapped TGs from the previous subsection, but they do not work well when used in the integration of regulatory elements to predict regulatory links. The additional predicted TGs in these sets lead to too many false regulatory links in the sets TFsf-TGbs, TFf-TGpr, TFsf-TGpr, and TFf-TGgalf. These false links indicate that many of the true TGs identified for the target genome in sets TGbs, TGpr and TGgalf are nevertheless not correctly linked to the right TF in their corresponding regulatory link result sets. All the predicted TGs do contain the TFBS motifs for some TF, but these TGs need to correspond to the correct TF, identifying the right regulatory link in the target genome. These results suggest that, in order to be used to map regulatory links, the TGs identified in the target genome using TFBS motifs also need to be similar in sequence to the source genome TGs, as identified in set TGblbs. Therefore, the results of the other possible sets, TFf-TGbs and TFsf-TGgalf, are not included in this paper, as their sets of regulatory elements are already determined to lead to many false positives when integrated to produce a predicted regulatory link.

Don't blame the genes

The left has an impressive ability to lie to itself. When faced with facts that conflict with ideology, the easy way out is to deny the facts. This was long the case in economics but although there are no doubt a few who still regard the New Economic Plan as a betrayal, in general realism has prevailed. There are not many Guardian readers left who believe that Japan's problems arise because it did not follow the path pursued by Albania.

In science, though, self-delusion is still in charge. For genetics in particular, truths must be disowned because they are embarrassing. Racists are evil people who believe in immutable differences between groups; ergo, no such difference can exist. Crime, as the whole world knows, results from inequality. Any suggestion that biology is involved must be, by definition, wrong. Because Hitler wished to improve the human race by selective breeding, genetics is a Nazi science whose every move is part of a eugenic plot.

Recently I was rash enough to write a book on human genetics. It was greeted by barking from both ends of the political spectrum. To some, neglecting to wring one's hands the requisite number of times whenever the word 'gene' is heard is a heinous crime; to others it is an equal affront to suggest that biology might limit free will. The worst outrage of all is to hint that there is no conspiracy, that research is done to help the afflicted or for curiosity.

Public ignorance of what is really going on resides in two complementary facts: the idleness of reporters and the arrogance of scientists. Scientists are notoriously bad at disclosing the truth, but, in its futile quest for a hidden agenda that is not there, the press is missing much of the point.

Take the question of genetic differences between groups. Much though it might exasperate the Gene Pool Relations Board, such differences exist and I see nothing wrong in using the word 'race' to describe them. What is more, the ability to do well in IQ tests runs in families, and American blacks have an average score 15 points lower than do whites.

The Edinburgh Buffoon, Chris Brand, recently revived the ancient smear that this must be due to genes and is hence unalterable. Substitute 'blood pressure' for 'intelligence' and his error is obvious. High or low blood pressure runs in families (indeed, one of the genes involved was isolated this month). In America, middle-aged black men score about 15 points higher than whites. Although the figures are similar to those for IQ, the response is oddly different: racial divergence, most say, is due to the environment - to poor diet or to smoking. This can be (and has been, with much success) changed.

For blood pressure it seems obvious that inheritance within groups is irrelevant to divergence between them; for IQ there is a curious readiness to accept that such differences are due to genes. The evidence on its own supports neither idea (although at least the environmentalists have some experiments to try).

All this is more interesting than a sterile debate about who is a racist. Rather than concentrate on Brand's elementary mistake, though, the fuss was about whether his book should be published.

Take, too, the 'gene for crime'. Half the 60,000 genes that make a human being are switched on in the brain. More and more mutations are found that influence behaviour. In one - and only one - Dutch family a single change interferes with nerve transmission. Almost everyone who has it has been in trouble. Schizophrenia, too, often leads to skirmishes with the law. It is now clear that some cases are due to damage to genes. It is only a matter of time before a genetic test is used in court.

Most geneticists have no problem with the research - which, in spite of endless argument about crime as a social construct, is no more perplexing than studying other characters (such as blood pressure or IQ) that involve both nature and nurture.

The interesting question, though, is not in the science but in how it is interpreted. It seems natural that an inborn disposition to crime (or, for that matter, to heart disease) should lead to forgiveness. That, though, is not the only possible response. In the 1930s a German geneticist claimed to have found a gene shared by many male homosexuals. The response of the Nazis was simple: sterilise them. That of the German Socialist Medical Association was equally straightforward: homosexuality is not under the control of free will and should no longer be illegal.

Whatever their ethical merits, both views make logical sense. In the United States, too, genes are appealed to both in mitigation and in blame. One murderer in Georgia is trying to escape the chair on the grounds that he has an inherited predisposition to crime. In Texas, though, the law has changed to ensure that those who might pose 'an enduring threat to society' (that is, those with bad genes) are executed.

Again and again the story is the same. It is not science that is contentious but how it is used. Why should DNA be the only chemical immune from patent protection? It is unfair that genes from cancer patients be taken by vast corporations without donors getting a penny. But the best protection for those with interesting DNA is to get a good lawyer before someone else does: it's not whether the gene should be patented, but who owns the patent. And - in spite of the hype about genetic engineering - the best way to design your baby is still to send him to Eton.

The new anti-genetics has an odd resonance from an earlier age when, inflamed by the true faith, Stalin denied the right of the subject itself to exist. So firm was his belief in the primacy of opinion that Lysenko - the acme of Soviet political correctness - was hired to ensure that DNA be abolished. The purge against reality was announced in 1948. What caused hunger was not a collapse of collective farms but science: 'It is high time to realise that today our Morganist-Mendelists are in effect making common cause with the international reactionary force of the bourgeois apologists, not only of the immutability of genes but also of the immutability of the capitalist system. . . Geneticists have done us tremendous harm. We must now finally and irrevocably take this reactionary and unscientific theory down from its pedestal. I am fully convinced that if we guide ourselves by the only correct theory, the theory of Marx, Engels, Lenin and Stalin, and take advantage of the tremendous care and attention which the genius of Stalin bestows upon men of Science, we shall undoubtedly be able to cope with this task.'

Well aware of the fate of dissidents, several geneticists read out a letter of apology: 'Glory to the Great Stalin, the leader of the people and the coryphaeus of progressive science!' Their retraction caused 'stormy, prolonged and mounting applause and cheers. All rise'.

Nowadays Stalin himself is denounced. Biology, though, is still in the firing line. Any report that has the temerity to consider science just as science is immediately reported to the great Party Congress of public opinion. Faced with the new Lysenkoism it is worth remembering that the moral issues lie not in genetics, but in the agenda of those who use it and that liberation lies not in denying science, but in understanding what it is trying to do.

Steve Jones is professor of genetics at University College London. His book, In The Blood, is published by HarperCollins (£20).

Think about or draw out your family tree, adding aunts, uncles, and cousins. (If you don't have siblings or cousins, just draw a big family tree from your imagination.) Based on your family tree, you can see that you are more closely related to your sister (or brother) than you are to your cousin; that is, there are fewer "branches" separating you from your sister than there are separating you and your cousin.

Now imagine that a biologist arrived at a big family reunion and had no idea who were sisters, cousins, aunts, uncles, etc. but tried to sort it out by how all of you look. Just based on how you look, would s/he be able to guess which of the two kids standing next to you is your sister and which is your cousin? In many families, the biologist may be able to make a pretty good guess based on your visible features (called your morphology), like number of arms/legs/eyes, hair color, nose shape, etc. (Notice that some of these morphological features are shared by all humans but that other features can be used to distinguish you from one another.) But this is not a fail-safe approach to determining familial relationships, as some people look more like their cousin than their sister, right? You could just use morphology to make a good guess.

So what is the best way to determine how related you are to one another (besides just asking -- but stick with me here)? The biologist would have to look at your DNA! You get half of your DNA from your mother and half from your father. Both of those "halves" are very similar to one another, with one difference about every 1000 base pairs (but out of three billion total letters, that's three million differences!). And your mother and father got their DNA from their parents and so on up the family tree. Your DNA should be MUCH more similar to your sister's than your cousin's because you and your sister both got your DNA from the same parents, whereas there are many more branches in the tree (and thus many more matings and DNA base pair differences entering the tree) between you and your cousin. That is, you are much more similar genetically to your sister because you have more recent common ancestors than you and your cousin.

Family Trees In Biology

So how does all of this apply to biology? For centuries, scientists have been trying to draw the family tree that reflects the history and evolution of all animals on the earth. This tree would show which species are more closely related to one another, like the case where you are "closer" to your sister on your family tree than you are to your cousin. For example, humans are more closely related to chimpanzees than to dolphins, so chimps and humans would have fewer branches between them on the "animal family tree."

How do scientists make this family tree? For many years, scientists relied on comparisons of morphological characteristics (like hair, teeth, limbs, fins, hearts, livers, eyes, etc.) to try to figure out who was more closely related to whom. These kinds of comparisons are often accurate, but as you saw in the example of a human family, these physical characteristics can sometimes be misleading. Evidence of this concept is that different scientists would come up with different trees/relationships by using different sets of morphological information! So which tree is "right?"

To think about how to identify the "right" tree, we have to think about how these animals became different from one another throughout evolution. All heritable morphological changes (those changes that can be passed down to the next generation) are a result of changes (mutations) in an organism's DNA. A mutation can lead to a change in a protein sequence or a change in when, where or how much of the protein gets made. That's it! One or a couple of these changes can lead to a big difference in morphology and/or the way a single cell in the organism can function. So over billions of years of evolution, a slow accumulation of DNA sequence (and thus some protein sequence) changes has led to the existence of all of the earth's different species -- with some more closely related to one another than others. This whole process is called molecular evolution.

So, as we saw with the family reunion example, the best way to see how related two organisms are is to compare their DNA or protein sequences. (Remember that a protein's sequence is encoded in its gene's DNA - so the only way to get a protein sequence change is to get a change in the DNA that codes for it.) Those organisms with the most similar DNA/protein sequence are almost surely more closely related than those with less similar DNA/protein sequences.
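
One simple way to put a number on "most similar" is percent identity: the share of positions at which two already-aligned sequences carry the same letter. The sketch below uses three short, made-up DNA strings (they are not real gene sequences), and the function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions with identical letters in two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Invented toy sequences standing in for the same gene in three relatives.
you = "ATGGCCATTG"
sister = "ATGGCCATTA"   # one difference
cousin = "ATGACCGTTA"   # three differences

print(percent_identity(you, sister))  # 90.0
print(percent_identity(you, cousin))  # 70.0
```

Real comparisons first have to align the sequences (which is what tools such as BLAST do) and use far longer stretches of DNA or protein than these ten letters.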

Why didn't scientists use DNA sequences to build the trees 100 years ago? First, it has only been about 50 years since the discovery that DNA is actually the genetic material that gets passed on through generations. Second, DNA and protein sequencing technologies have only recently gotten efficient enough that DNA/protein sequence data is available from many different kinds of animals. With all of this new information, scientists are working hard to build the "true" animal family tree. And there have been cases where the tree built using DNA sequence data differs from those built using morphological data! (Can you explain for your project why DNA sequence is the "gold standard" for determining relatedness between animals?)

Note: Even though sequence comparison is the gold standard, it is not perfect. Sometimes comparisons of different proteins will yield different trees. Which one is right? Why might this happen?

B-Cells vs T-Cells (Similarities and Differences between B-Lymphocytes and T-Lymphocytes)

Lymphocytes are the key cells of the immune system and they are responsible for the adaptive immune response of an organism. They are also responsible for many of the immunological characteristics such as specificity, diversity, memory and self/non-self recognition. Lymphocytes constitute about 20 – 40% of the body’s white blood cells and 99% of the cells of the lymph. Lymphocytes are broadly classified into three populations based on their function, lifespan, cell surface components and, most importantly, their place of maturation. They are B-Lymphocytes (B-Cells), T-Lymphocytes (T-Cells) and Natural Killer Cells (NK Cells).

B-Lymphocytes (B-Cells):

They mature in the bone marrow or bursa (in birds). B-cells possess membrane-bound immunoglobulins which act as the receptors for antigens. They are involved in the humoral (antibody-mediated) immune responses.

T-Lymphocytes (T-Cells):

They mature in the thymus, hence the name. T-cells possess receptors for antigens on their surface, but these receptors are structurally different from immunoglobulins. They are involved in cell-mediated immune responses.

The present post discusses the Similarities and Differences between B-Lymphocytes (B-Cells) and T-Lymphocytes (T-Cells) with a Comparison Table.

Similarities between B-Cells and T-Cells

1. Both B-cells and T-cells are lymphocytes.

2. Both are the descendants of lymphoid progenitor cells.

3. Both are produced in the bone marrow.

4. Both are nucleated cells with a large nucleus.

5. Both are nonphagocytic cells.

6. Both are found in peripheral blood and all lymphoid tissues.

7. Both are involved in the adaptive immune response of an organism.

8. Both B-cells and T-cells are morphologically similar (cannot be distinguished morphologically under the light microscope).

Correlational research

Naturalistic observation is a method of observation, commonly used by psychologists, behavioral scientists and social scientists, that involves observing subjects in their natural habitats. Researchers take great care to avoid interfering with the behaviour they are observing by using unobtrusive methods. The aim is to study events objectively as they occur naturally, without intervention (Manoli, Frank, Don Juan Gabriel 2007). Researchers may, for example, observe animals in their natural habitat, recording mating, living conditions, and many other qualities. Observations can be overt (the participants are aware they are being observed) or covert (the participants do not know they are being observed). There are more ethical guidelines to take into consideration when a covert observation is being carried out.

One popular method is called naturalistic observation, which requires a researcher to observe and record the natural environment without interference. An advantage of naturalistic observation is that the researcher is observing variables in a natural state. Some disadvantages are that it can be difficult to control the variables or prevent outside influences from affecting the results.
Another type of correlational research is called the survey method. Surveys are inexpensive and quick, and can be used to gather information from very large groups of people. However, poorly written survey questions can skew results. Another downside is that survey results are also dependent on survey respondents, who are not always reliable.
A third method for correlational research is archival research, which analyzes historical records. An advantage to this method is that it’s a viable way to analyze large amounts of data without spending a lot of money. A fault of this particular research method is that the researcher has no way of knowing if the original data collection methods were sound. Correlational studies are a helpful tool for performing psychological research. However, it’s important to remember that no study method is flawless. Researchers must take into consideration the limitations of both their chosen research method and correlational studies in general.

Survey research is a research method involving the use of questionnaires and/or statistical surveys to gather data about people and their thoughts and behaviours. This method was pioneered in the 1930s and 1940s by sociologist Paul Lazarsfeld. The initial use of the method was to examine the effects of the radio on political opinion formation in the United States. One of its early successes was the development of the theory of two-step flow of communication. The method was foundational for the inception of the quantitative research tradition in sociology. The two-step flow of communication model hypothesizes that ideas flow from mass media to opinion leaders, and from them to a wider population.

An archive is a way of sorting and organizing older documents, whether digitally (photographs online, e-mails, etc.) or manually (putting them in folders, photo albums, etc.). Archiving is one part of the curating process, which is typically carried out by a curator. The art of searching for archives consists of four main steps:
1. Think about questions to find the archive in mind. Ask oneself:
* Do I need specific information or am I just curious about a broad topic?
* What is my topic of interest?
* Should I be using an archive or a library?
2. Get the basic facts about the topic of interest.
3. Use websites associated with the particular archive building to search for the archive.
4. Decide if one should visit the archive building for further assistance.
Many archives have been around for hundreds of years. For instance, the Vatican Secret Archives were started in the 17th century AD and contain state papers, papal account books, and papal correspondence dating back to the 8th century. Most archives that are still in existence do not claim collections that date back quite as far as the Vatican Archives.

The reason for highlighting the breadth and depth of historical archives is to give some idea of the difficulties facing archival researchers in the pre-digital age. Some of these archives were dauntingly vast in the amount of records they held. For example, the Vatican Secret Archives had upwards of 52 miles of archival shelving. In an age where you could not simply enter your query into a search bar complete with Boolean operators, the task of finding material that pertained to your topic would have been difficult at the least. The finding aid made the work of sifting through these vast archives much more manageable.[4] A finding aid is a document that is put together by an archivist or librarian and that contains information about the individual documents in a specific collection in an archive. These documents can be used to determine if the collection is relevant to a designated topic. Finding aids made it so a researcher did not have to blindly search through collection after collection hoping to find pertinent information. However, in the pre-digital age a researcher still had to travel to the physical location of the archive and search through a card catalog of finding aids.

Organizing, collecting, and archiving information using physical documents without the use of electronics is a daunting task. Magnetic storage devices provided the first means of storing electronic data. As technology has progressed over the years, so too has the ability to archive data using electronics. Long before the internet, means of using technology to help archive information were in the works. The early forms of magnetic storage devices that would later be used to archive information were invented as early as the late 1800s, but were not used for organizing information until 1951 with the invention of the UNIVAC I.
UNIVAC I, which stands for Universal Automatic Computer 1, used magnetic tape to store data and was also the first commercial computer produced in the United States. Early computers such as UNIVAC I were enormous and sometimes took up entire rooms, and they are completely obsolete in today's technological society. But the central idea of using magnetic tape to store information is a concept that is still in use today.
While most magnetic storage devices have been replaced by optical storage devices such as CDs and DVDs, or by USB flash drives, some are still in use today.[5] In fact, the floppy drive is one example of a magnetic storage device that became extremely popular in the 1970s through the 1990s. Older 5.25" floppy discs have not been used for quite some time but the smaller 3.5" floppy discs aren't obsolete yet. The 3.5" discs hold approximately 1.44 MB of data and for years have been used by millions of people to back up the information on their hard drives.
Magnetic tape has proven to be a very effective means of archiving data, as large amounts of data that don’t need to be quickly accessed can be kept on magnetic tape. That is especially true of aging data that may not need to be accessed again at all, but for different reasons still needs to be stored “just in case”.
With the explosion of the internet over the past couple of decades, archiving has begun to make its way online. The days of using electronic devices such as magnetic tape are coming to an end as people start to use the internet to archive their information.

In statistics and data analysis, a raw score is an original datum that has not been transformed. This may include, for example, the original result obtained by a student on a test (i.e., the number of correctly answered items) as opposed to that score after transformation to a standard score or percentile rank or the like.
Often the conversion must be made to a standard score before the data can be used. For example, an open-ended survey question will yield raw data that cannot be used for statistical purposes as it is. A multiple-choice question, however, will yield raw data that is either easy to convert to a standard score or can even be used as it is.
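One common standard score is the z-score, which expresses each raw score as a number of standard deviations above or below the mean. The sketch below is a minimal illustration, assuming the population standard deviation is the appropriate scale; the function name and the raw scores are invented for the example.

```python
from statistics import mean, pstdev


def standard_scores(raw_scores):
    """Convert raw scores to z-scores: (score - mean) / standard deviation."""
    mu = mean(raw_scores)
    sigma = pstdev(raw_scores)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(score - mu) / sigma for score in raw_scores]


if __name__ == "__main__":
    # Hypothetical raw test scores (number of correctly answered items).
    raw = [2, 4, 4, 4, 5, 5, 7, 9]
    for score, z in zip(raw, standard_scores(raw)):
        print(score, round(z, 2))
```

A z-score of 0 marks a raw score equal to the mean, and the sign shows whether the raw score lies above or below it.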

Dichotomous data are data from outcomes that can be divided into two categories (e.g. dead or alive, pregnant or not pregnant), where each participant must be in one or other category, and cannot be in both.
A standardized test is a test that is administered and scored in a consistent, or "standard", manner. Standardized tests are designed in such a way that the questions, conditions for administering, scoring procedures, and interpretations are consistent[1] and are administered and scored in a predetermined, standard manner.[2]
Any test in which the same test is given in the same manner to all test takers is a standardized test. Standardized tests need not be high-stakes tests, time-limited tests, or multiple-choice tests. The opposite of a standardized test is a non-standardized test. Non-standardized testing gives significantly different tests to different test takers, or gives the same test under significantly different conditions (e.g., one group is permitted far less time to complete the test than the next group).
Standardized tests are perceived as being more fair than non-standardized tests. The consistency also permits more reliable comparison of outcomes across all test takers.

Examples of negative correlations include those between exercise and heart failure, between successful test performance and feelings of incompetence, and between absence from school and school achievement.
When will a correlation be positive?
Suppose that an X value was above average, and that the associated Y value was also above average. Then the product $(X - \bar{X})(Y - \bar{Y})$ of the two deviations from the averages would be the product of two positive numbers, which would be positive. If the X value and the Y value were both below average, then the product above would be of two negative numbers, which would also be positive.
Therefore, a positive correlation is evidence of a general tendency that large values of X are associated with large values of Y and small values of X are associated with small values of Y.
When will a correlation be negative?
Suppose that an X value was above average, and that the associated Y value was instead below average. Then the product $(X - \bar{X})(Y - \bar{Y})$ of the two deviations from the averages would be the product of a positive and a negative number, which would make the product negative. If the X value was below average and the Y value was above average, then the product above would also be negative.
Therefore, a negative correlation is evidence of a general tendency that large values of X are associated with small values of Y and small values of X are associated with large values of Y.
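The sign argument above can be checked numerically. The sketch below assumes the correlation in question is Pearson's r, computed from the products of deviations from the averages; the function names and data are invented for illustration.

```python
from math import sqrt


def deviation_products(xs, ys):
    """Products (x - mean_x) * (y - mean_y), one per (x, y) pair."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


def pearson_r(xs, ys):
    """Pearson correlation: sum of deviation products, scaled into [-1, 1]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    sum_xy = sum(deviation_products(xs, ys))
    sum_xx = sum(deviation_products(xs, xs))
    sum_yy = sum(deviation_products(ys, ys))
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sum_xy / sqrt(sum_xx * sum_yy)


if __name__ == "__main__":
    # Hypothetical data: Y rises with X in the first case, falls in the second.
    xs = [1, 2, 3, 4, 5]
    rising = [2, 4, 6, 8, 10]
    falling = [10, 8, 6, 4, 2]
    print(pearson_r(xs, rising) > 0)   # mostly positive products
    print(pearson_r(xs, falling) < 0)  # mostly negative products
```

When most pairs lie on the same side of their averages, the products are mostly positive and so is r; when they lie on opposite sides, the products and r are negative.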