Exercise 3: Converting a Mauve alignment into a standard alignment

A Mauve alignment document is not a standard alignment, so downstream analyses such as phylogenetic tree building cannot be run on it directly. It is therefore sometimes necessary to convert a Mauve alignment into a standard Geneious alignment. This involves extracting the alignment of each LCB (locally collinear block) and then concatenating these alignments into a single alignment document.

To do this, select the alignment of NC_015758, NC_012943 and NC_009565 that you created in Exercise 1, then go to Tools → Extract Mauve Regions. In this example we only want to extract the regions aligned across all 3 sequences, and not the small regions of unaligned sequence, so set both the Minimum number of sequences and the Maximum number of sequences to 3.
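Conceptually, the Minimum and Maximum settings act as a filter on how many sequences each region contains. The Python sketch below illustrates that filter on hypothetical data, assuming both limits are inclusive (as setting both to 3 implies). The `Region` class, the `select_regions` helper and the genome names are made up for illustration and are not part of Geneious.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Hypothetical stand-in for one region of a Mauve alignment."""
    name: str
    sequences: List[str]  # genomes that take part in this region


def select_regions(regions, min_seqs, max_seqs):
    """Keep regions present in between min_seqs and max_seqs genomes (inclusive)."""
    return [r for r in regions if min_seqs <= len(r.sequences) <= max_seqs]


regions = [
    Region("LCB1", ["genome_A", "genome_B", "genome_C"]),
    Region("LCB2", ["genome_A", "genome_B", "genome_C"]),
    Region("unaligned", ["genome_B"]),  # small region found in one genome only
    Region("LCB3", ["genome_A", "genome_B", "genome_C"]),
]

kept = select_regions(regions, min_seqs=3, max_seqs=3)
print([r.name for r in kept])  # ['LCB1', 'LCB2', 'LCB3']
```

With both limits set to 3, only the blocks shared by all three genomes survive, which matches the three LCB alignments the tool produces in this exercise.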

You should now see 3 new alignment documents, one for each LCB, in your document table. In order to concatenate these documents into one alignment, select all three and go to Tools → Concatenate Sequences or Alignments.

When concatenating alignments, you can either match the sequences within each alignment by name, or use their position in the alignment ("index") to determine which sequences to concatenate. In the 3 alignments you have extracted from Mauve, you will see that the 3 sequences are in the same order in each alignment, but their names are slightly different (i.e. they have the base numbering appended to the original name). Thus, we need to use "concatenate by index in alignment" in order to concatenate the alignments correctly.
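The difference between the two matching modes can be sketched in a few lines. In the Python example below, each alignment is assumed to be a list of `(name, sequence)` rows of equal length; `concatenate_by_index` is a hypothetical helper, not the Geneious implementation. Row *i* of every alignment is joined into row *i* of the result and the names are ignored for matching, which is why it works even though the names differ between alignments.

```python
def concatenate_by_index(alignments):
    """Join alignments row by row: row i of each alignment goes into row i of the result.

    Each alignment is a list of (name, sequence) tuples; only the row
    position (index) is used to decide which rows belong together.
    """
    n_rows = len(alignments[0])
    if any(len(a) != n_rows for a in alignments):
        raise ValueError("all alignments must have the same number of rows")
    result = []
    for i in range(n_rows):
        names = [a[i][0] for a in alignments]
        sequence = "".join(a[i][1] for a in alignments)
        # The " - " separator is chosen to resemble the long names seen in this exercise.
        result.append((" - ".join(names), sequence))
    return result


lcb1 = [("genome_A (block 1)", "ACGT"), ("genome_B (block 1)", "AC-T"), ("genome_C (block 1)", "ACGA")]
lcb2 = [("genome_A (block 2)", "GG-C"), ("genome_B (block 2)", "GGTC"), ("genome_C (block 2)", "GATC")]

# Matching by name would find no pairs, because no name appears in both alignments.
print({n for n, _ in lcb1} & {n for n, _ in lcb2})  # set()

merged = concatenate_by_index([lcb1, lcb2])
print(merged[0])  # ('genome_A (block 1) - genome_A (block 2)', 'ACGTGG-C')
```

This only gives the right answer because the rows are in the same order in every alignment; if they were not, concatenating by index would silently join the wrong sequences.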



Use the options as shown in the screenshot above and click OK. A new alignment document composed of the 3 original LCB alignments will be created. In this alignment, you can see the boundaries of the original LCB alignments by turning on the Source annotations.
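The boundaries marked by the Source annotations follow from the lengths of the alignments being joined: each block starts one column after the previous block ends. The Python sketch below shows that arithmetic for hypothetical block lengths using 1-based, inclusive column numbers; `block_boundaries` is an illustrative name, and this is not a description of how Geneious stores its annotations.

```python
from itertools import accumulate


def block_boundaries(block_lengths):
    """Return 1-based inclusive (start, end) columns for each concatenated block."""
    ends = list(accumulate(block_lengths))        # running total = last column of each block
    starts = [1] + [e + 1 for e in ends[:-1]]     # each block starts right after the previous end
    return list(zip(starts, ends))


print(block_boundaries([4, 4, 6]))  # [(1, 4), (5, 8), (9, 14)]
```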



You may now wish to use Batch Rename to shorten the sequence names in the alignment, as they will be very long. For example, the names will be something like "NC_009565 (bases 1 to 936018) - NC_009565 (bases 3492134 to 936021) - NC_009565 (bases 3492476 to 4424434)". This can be shortened to NC_009565 by going to Edit → Batch Rename and removing characters from the end of the name: 97 for the example name above, although the exact number depends on the length of your own sequence names.
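Rather than counting characters by hand, the number to remove can be computed as the full length of the name minus the length of the part you want to keep. The Python sketch below does this for the example name from this exercise, assuming the short name is everything before the first " (". It is only a calculation aid; the renaming itself is still done in Batch Rename, and the count will differ if your names differ.

```python
# Example long name produced by concatenating the three LCB alignments.
name = ("NC_009565 (bases 1 to 936018) - NC_009565 (bases 3492134 to 936021) - "
        "NC_009565 (bases 3492476 to 4424434)")

# Assumption: the short name is the accession that precedes the first " (".
short = name.split(" (")[0]

# Number of characters to remove from the end of the name to leave only `short`.
n_remove = len(name) - len(short)

print(short)     # NC_009565
print(n_remove)  # 97
```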

Note that if one of your genomes is rearranged relative to the others in the alignment, the order in which the LCBs are concatenated will not reflect that genome's actual arrangement, and information about the rearrangement will be lost. Keep this in mind when using the alignment for downstream applications such as tree building.
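One way to spot this before relying on the concatenated alignment is to check, for each genome, whether its LCBs occur in increasing coordinate order along that genome when listed in concatenation order. The Python sketch below does this on hypothetical start coordinates for hypothetical genomes; `rearranged_genomes` is an illustrative helper, not a Geneious or Mauve feature. It is deliberately simplified: it ignores block orientation (inversions) and would also misjudge genomes whose coordinates wrap around the origin of a circular chromosome.

```python
def rearranged_genomes(lcb_starts):
    """Flag genomes whose blocks are not in increasing order along their own sequence.

    lcb_starts maps genome name -> start coordinates, one per LCB, listed in
    the order the LCBs are concatenated. Orientation is not checked.
    """
    flagged = []
    for genome, starts in lcb_starts.items():
        # Any block that starts at or before the previous one breaks collinearity.
        if any(later <= earlier for earlier, later in zip(starts, starts[1:])):
            flagged.append(genome)
    return flagged


# Hypothetical coordinates: in genome_B, the second and third blocks are swapped.
example = {
    "genome_A": [1, 900000, 3400000],
    "genome_B": [1, 2500000, 950000],
    "genome_C": [1, 930000, 3490000],
}

print(rearranged_genomes(example))  # ['genome_B']
```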

This concludes the Mauve tutorial.