Multiple Sequence Alignment

Before sequences can be compared, they must be aligned. This is a trivial exercise if two sequences are from exactly the same genome region, are of the same length and neither has had any nucleotide insertions or deletions. In this case, simply lining up the two sequences from their first nucleotide automatically places each nucleotide site opposite its counterpart in the other sequence.
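As a minimal sketch of this trivial case, the Python snippet below compares two equal-length, ungapped sequences site by site. The function name and the two short sequences are made up for illustration; they are not taken from the exercise data.

```python
def count_identical_sites(seq_a: str, seq_b: str) -> int:
    """Count the sites at which two equal-length, ungapped sequences agree."""
    if len(seq_a) != len(seq_b):
        # The trivial case only applies when no gaps are needed.
        raise ValueError("sequences must be the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


# Hypothetical example sequences differing by one nucleotide substitution.
seq_1 = "ATGCCGTA"
seq_2 = "ATGACGTA"

identical = count_identical_sites(seq_1, seq_2)
print(f"{identical} of {len(seq_1)} sites are identical")
```

Because the sequences already line up, no alignment step is needed before counting differences.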

Unfortunately, the simple situation described above is rarely found, as you have observed when comparing your query sequence to database sequences in the results of your BLAST search. Misalignment between sequences may occur for several reasons. The sequences may be from slightly different, but overlapping, genome regions. Or, more commonly, the sequences are from the same genome region, but, as a result of mutations, one or both sequences have experienced one or more nucleotide insertions or deletions.
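To illustrate how a single indel causes misalignment, the sketch below deletes one nucleotide from a made-up sequence and then compares the two sequences position by position. The sequences, the deletion position and the function name are assumptions chosen for illustration only.

```python
def naive_matches(seq_a: str, seq_b: str) -> int:
    """Count matches when sequences are simply lined up from their first site.

    zip() stops at the end of the shorter sequence, so any overhang is ignored.
    """
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


# Hypothetical sequence and a mutant that has lost the nucleotide at index 2.
original = "ATGCCGTAGC"
mutant = original[:2] + original[3:]   # "ATCCGTAGC"

# Lined up naively, every site after the deletion is shifted by one position.
shifted = naive_matches(original, mutant)

# Placing a single gap ("-") at the deleted site restores the correspondence.
gapped_mutant = mutant[:2] + "-" + mutant[2:]   # "AT-CCGTAGC"
restored = naive_matches(original, gapped_mutant)

print(f"naive line-up: {shifted} matches; with one gap: {restored} matches")
```

The two sequences differ by only one event, yet the naive line-up makes most downstream sites look different. That is the problem alignment is meant to solve.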

The objective of sequence alignment is to align homologous residues among two or more sequences. This is accomplished by placing gaps in one or both sequences so as to increase their similarity at each site, and to do so using the fewest gaps possible. A gap at a nucleotide site in one sequence represents either a deletion in that sequence or an insertion in the other sequence. Since it is usually unknown which event occurred, events resulting in gaps are referred to as indels (insertions or deletions). Because indels are rare relative to nucleotide changes, gaps are inserted into sequences sparingly. This is accomplished by using a method that assigns "costs" to initiate and extend a gap and then charges these costs against the increased similarity of two sequences.
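The gap-cost idea can be sketched as a small scoring function. This is an illustrative sketch, not the scoring scheme of any particular alignment program. The match, mismatch, gap-opening and gap-extension values are arbitrary assumptions, as is the convention that the first position of a gap is charged the opening cost and each further position the extension cost.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1,
                    gap_open: int = -3, gap_extend: int = -1) -> int:
    """Score an already-gapped pairwise alignment ('-' marks a gap)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    prev_gap_row = None  # which row held a gap in the previous column, if any
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            gap_row = "a" if a == "-" else "b"
            # Continuing a gap in the same row is cheaper than opening a new one.
            score += gap_extend if gap_row == prev_gap_row else gap_open
            prev_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


# Case 1: two gaps pay for themselves by restoring many matching sites.
ungapped_1 = score_alignment("ATGCCGTAGC", "ATCCGTAGCA")
gapped_1 = score_alignment("ATGCCGTAGC-", "AT-CCGTAGCA")

# Case 2: two gaps cost more than the single mismatch they would remove.
ungapped_2 = score_alignment("ATGC", "ATCC")
gapped_2 = score_alignment("ATGC-", "AT-CC")

print(ungapped_1, gapped_1, ungapped_2, gapped_2)
```

In the first case the gapped alignment scores higher, so the gaps are justified. In the second case the ungapped alignment scores higher, so the gaps are not worth their cost. An alignment program searches for the arrangement of gaps that gives the best such score.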



Exercise 3: Molecular Phylogenetics