Exercise 4: Interpreting the results

Open the results file that was produced, and you will see a number of tables. The first table, Summary, lists how many of your query sequences were classified into each species. For details of what each query was classified as, look at the Classifications table. The tables displayed on the right show the overlap identities between query and database sequences for each gene. When multiple loci are used, the first table shows the overall overlap identities when the alignments from each gene are concatenated, and the tables underneath show the results for each gene.

Have a look at the Classifications table, and you will see that the "Unknown3" sequence was not classified. Click on this sequence in the Classifications table to bring up the identity tables for that sequence. You will see that all the entries in this table are red, meaning they fall below the specified minimum overlap identity for classification at genus level (i.e. the overlap identities are less than 90%). Thus we can be confident that this sample does not come from a kiwi.
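The "all entries are red" check can be written as a short sketch. This is only an illustration of the rule described above, not the tool's own code: the function name and the example identities are made up, and the 90% value is the genus-level minimum quoted in this exercise.

```python
# Genus-level minimum overlap identity quoted in the text (percent).
GENUS_MIN_OVERLAP_IDENTITY = 90.0


def below_genus_threshold(overlap_identities, minimum=GENUS_MIN_OVERLAP_IDENTITY):
    """Return True if every hit falls below the genus-level minimum.

    This corresponds to a table in which all entries are shown in red.
    An empty list of hits also returns True (nothing to classify against).
    """
    return all(identity < minimum for identity in overlap_identities)


# Made-up overlap identities, for illustration only.
print(below_genus_threshold([84.2, 86.9, 88.5]))  # every hit under 90%
print(below_genus_threshold([84.2, 96.1]))        # one hit above 90%
```

The first call corresponds to a sample like Unknown3, where no database sequence reaches the genus-level minimum.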

Now look at the results for Unknown1 by clicking on this sample in the Classifications table. You will see that all of the top matches for this sample are Apteryx haastii (Great Spotted kiwi), with overlap identities >95%, so this sample is classified as Apteryx haastii. The results table shows the name of the sequence with the top match, so we can look at this to see which known specimen or haplotype matched our unknown sample most closely.


We will now look in more detail at the identity tables. First, look at the second and third tables. These show the identities between the query and database sequences for each gene. Remember that the Overlap Identity is the identity in the region of overlap between the query and database sequence. The Query Identity is the identity over the entire length of the query sequence, where regions of the query that do not align to the database sequence (or where the database sequence is missing) are counted as mismatches. Thus, a query identity lower than the overlap identity indicates that the query sequence extends beyond the database sequence.
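To make the difference between the two measures concrete, here is a minimal sketch that computes both from a pairwise alignment. It assumes the two sequences are given as equal-length aligned strings with "-" marking positions where one sequence has no residue; the function name and the toy sequences are hypothetical, and the tool itself may handle internal gaps differently.

```python
def identities(query_aln, db_aln, gap="-"):
    """Return (overlap_identity, query_identity) as percentages.

    Both arguments are aligned strings of equal length.
    Overlap identity: matches / query positions covered by the database sequence.
    Query identity:   matches / all query positions (uncovered ones count as mismatches).
    """
    if len(query_aln) != len(db_aln):
        raise ValueError("aligned sequences must have the same length")

    matches = 0    # positions where query and database residues agree
    overlap = 0    # query residues that are aligned to a database residue
    query_len = 0  # all query residues

    for q, d in zip(query_aln, db_aln):
        if q == gap:
            continue  # no query residue at this column
        query_len += 1
        if d == gap:
            continue  # query extends beyond (or is not covered by) the database sequence
        overlap += 1
        if q == d:
            matches += 1

    overlap_identity = 100.0 * matches / overlap if overlap else 0.0
    query_identity = 100.0 * matches / query_len if query_len else 0.0
    return overlap_identity, query_identity


# Toy example: the query is two bases longer than the database sequence.
print(identities("ACGTACGTAC", "ACGTACGT--"))  # (100.0, 80.0)
```

In the toy example the eight overlapping bases all match, so the overlap identity is 100%, but the two unmatched query bases pull the query identity down to 80%.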

Now look at the top table. This shows the overall results when alignments for each gene are concatenated:



In this table, Overlap Identity is a weighted average over both cytochrome b and the control region, weighted according to the overlap length from each contributing locus. We can see that there are two samples in the database that match the query sequence with 100% identity in the region of overlap. The first of these samples, Apteryx haastii S.25729, only has a match for the control region sequence. This sample does not have a cytochrome b sequence in the database, and this is reflected in the Loci matching column, which has "1 of 2/1". The "2/1" refers to 2 query sequences that could have matches (as there are both control region and cytochrome b sequences for the query), and 1 database sequence that could have matched (as there is no cytochrome b sequence for that database sample). The Query Identity is low for this match because of the missing cytochrome b sequence. For the other top hits there are both control region and cytochrome b sequences, so the Loci matching column has "2 of 2".
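The length-weighted average described above can be sketched as follows. The function name, the overlap lengths and the per-locus identities are invented for illustration and are not taken from the results file.

```python
def combined_overlap_identity(loci):
    """Length-weighted average of per-locus overlap identities.

    loci: list of (overlap_identity_percent, overlap_length) tuples,
          one per locus that has a match.
    """
    total_length = sum(length for _, length in loci)
    if total_length == 0:
        return 0.0
    weighted_sum = sum(identity * length for identity, length in loci)
    return weighted_sum / total_length


# Hypothetical hit with both loci: cytochrome b 100% over 600 bp,
# control region 98% over 400 bp.
both_loci = [(100.0, 600), (98.0, 400)]
print(round(combined_overlap_identity(both_loci), 2))  # 99.2

# Hypothetical hit with a control region match only ("1 of 2/1"):
# the missing locus contributes no overlap, so it does not lower this value.
one_locus = [(100.0, 400)]
print(combined_overlap_identity(one_locus))  # 100.0
```

Because only overlapping regions contribute, a database sample with a single locus can still show 100% overlap identity; the missing locus shows up in the Query Identity and in the Loci matching column instead.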

Now click on the Unknown2 result in the Classifications table. In the results tables you can see that this sample has 100% identity with a number of Apteryx mantelli (North Island brown kiwi) database sequences, so it is classified as this species. However, all the top hits only have cytochrome b sequences in the database.


Interpreting the results continued... alignments and trees