In this tutorial you will learn how to process and de novo assemble next-generation sequencing data. The tutorial includes an overview of pairing, trimming and filtering steps that should normally be undertaken prior to assembly, and some general advice for de novo assembly.
In this exercise we will use the Geneious de novo assembler. Geneious also includes a number of third-party de novo assemblers, including SPAdes, Tadpole, Velvet and MIRA. See "Which de novo assembler is best for my data" for more information on the other assemblers.
The example data provided with this tutorial are Illumina reads extracted from Sequence Read Archive (SRA) entry PRJEB10693. These data comprise paired Illumina MiSeq reads with a raw read length of 149 bp and an expected average insert size of 350 bp. The sequences are derived from the genome of Escherichia coli str. K-12 substr. MG1655 (full genome available at NC_000913).
To keep this tutorial a practical size, a subset of 7800 paired reads is provided. These reads map to a 10,343 bp portion of the reference genome at coordinates 4,190,266-4,200,608. This region contains 12 complete coding sequences (CDS).
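As a quick sanity check on the figures above, the region length and a rough upper bound on read depth can be worked out in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes that "7800 paired reads" means 7800 individual reads of 149 bp each. If the figure counts pairs instead, the depth doubles. Trimming and filtering will reduce the real depth.

```python
# Figures quoted in the tutorial text.
REGION_START = 4_190_266   # 1-based, inclusive
REGION_END = 4_200_608     # 1-based, inclusive
N_READS = 7800             # assumption: individual reads, not pairs
READ_LENGTH = 149          # raw read length in bp, before trimming


def region_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


def mean_depth(n_reads: int, read_length: int, region_len: int) -> float:
    """Total sequenced bases divided by region length (an upper bound)."""
    return n_reads * read_length / region_len


length = region_length(REGION_START, REGION_END)
depth = mean_depth(N_READS, READ_LENGTH, length)

print(f"Region length: {length} bp")
print(f"Approximate raw depth: {depth:.0f}x")
```

The computed region length agrees with the 10,343 bp stated above, and under the stated assumption the raw depth is on the order of 100x.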
This tutorial requires the BBDuk Trimmer and the MAFFT multiple aligner, which are both available as plugins. Go to Tools → Plugins to install the BBDuk and MAFFT plugins.
Use the links below to move to each section of this tutorial.
Overview: NGS preprocessing best practice. An overview of NGS preprocessing steps available in Geneious.
Exercise 1: NGS read preprocessing. In this exercise we show you how to pair and trim NGS reads.
Exercise 2: De novo assembly of paired-end data. In this exercise we will assemble the trimmed paired data using the Geneious de novo assembler.
Summary: Other preprocessing tools and general advice for de novo assembly