Overview: Best Practice for preprocessing of NGS reads

Proper preprocessing of your NGS reads will improve mapping accuracy and will usually also significantly reduce the likelihood of false positive SNP calls.

Your first step should always be to Set Paired reads, followed by trimming and then, if required, other preprocessing steps.

Importing/Pairing your NGS data

An NGS sequence service provider will normally provide Illumina paired read data as two separate forward and reverse read lists in fastq format.  In most cases the fastq lists will be compressed by gzip (.gz).  Geneious can import compressed or uncompressed fastq files.

If you import forward and reverse read files together via menu File → From Multiple files, Geneious will offer to pair the files and create a single paired read list. Similarly, if you drag and drop pairs of read lists into the Geneious window, you will be given the option to pair the reads during the import process.

Geneious will determine the likely read technology, so you only need to set the expected insert size (the expected average insert size excluding adapters) and hit OK. 


The output from the Pairing operation will be a single list of interlaced forward and reverse reads.
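To illustrate what "interlaced" means, here is a minimal Python sketch that alternates records from a forward and a reverse read list. It is not how Geneious performs pairing. It assumes simple four-line FASTQ records and that both lists hold the mates in the same order; the function names read_fastq and interleave are hypothetical.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes plain four-line records (header, sequence, '+', quality).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            # End of input (or a truncated final record): stop.
            return
        header, seq, _plus, qual = record
        yield header, seq, qual


def interleave(forward_lines, reverse_lines):
    """Return records alternately: forward 1, reverse 1, forward 2, reverse 2, ..."""
    paired = []
    for fwd, rev in zip(read_fastq(forward_lines), read_fastq(reverse_lines)):
        paired.append(fwd)
        paired.append(rev)
    return paired
```

For gzip-compressed files, the line iterables could come from gzip.open(path, "rt") in the Python standard library.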


Manually pairing read lists

If you have already imported your reads as separate lists, you can pair them after importing by selecting the lists and going to menu Sequence → Set paired reads.


NGS Trimming

Accessed via menu Annotate & Predict → Trim using BBDuk.

It is important to trim reads prior to assembly.  Low-quality calls at sequence ends can prevent proper assembly and increase the computation and time required to perform assembly.

Geneious provides the BBDuk trimmer as a plugin which can be installed via menu Tools → Plugins.  BBDuk (Decontamination Using Kmers) is a fast and accurate tool for trimming and filtering NGS sequencing data. The plugin allows you to trim adapters using presets for Illumina adapters, trim ends by quality, trim adapters based on paired read overhangs, and finally discard short reads (and their pair mate) that are trimmed to below a minimum length.
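The two ideas of trimming ends by quality and discarding a short read together with its pair mate can be sketched in Python as follows. This is a deliberately naive illustration, not BBDuk's own trimming algorithm, which is more sophisticated. The function names and the default min_len of 20 are assumptions made for the example; quality values are given as a list of integer Phred scores.

```python
def trim_ends(seq, quals, min_q=20):
    """Naively strip bases from both ends while their Phred score is below min_q."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def trim_pair(fwd, rev, min_q=20, min_len=20):
    """Trim both mates of a pair, each given as a (sequence, qualities) tuple.

    Returns None when either mate falls below min_len, so that a short read
    is discarded together with its pair mate.
    """
    fwd_trimmed = trim_ends(*fwd, min_q=min_q)
    rev_trimmed = trim_ends(*rev, min_q=min_q)
    if len(fwd_trimmed[0]) < min_len or len(rev_trimmed[0]) < min_len:
        return None
    return fwd_trimmed, rev_trimmed
```

Dropping both mates keeps the forward and reverse reads in step, which matters because a paired read list relies on every read still having its partner.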

The Quality (Q) value is a Phred score, a logarithmic measure of the probability that a base call is wrong. The following table shows examples of how Q correlates to the % likelihood that a call is correct.  Choosing an appropriate Q value will depend on the overall quality of your data.  Generally, trimming harder (by setting a higher Q value) will improve subsequent assembly, provided it does not remove a significant proportion of your data.  For Illumina data we recommend a Q value of 20.


Q value    % Likelihood call will be correct
6          75
10         90
13         95
20         99
30         99.9
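
The values in the table follow from the standard Phred definition, in which the probability of an incorrect call is 10^(-Q/10). A small Python sketch of that conversion is shown below; the function name phred_to_accuracy is hypothetical.

```python
def phred_to_accuracy(q):
    """Percent likelihood that a base call with Phred score q is correct."""
    error_probability = 10 ** (-q / 10)
    return 100 * (1 - error_probability)


# Print the likelihood for the Q values listed in the table.
for q in (6, 10, 13, 20, 30):
    print(f"Q{q}: {phred_to_accuracy(q):.2f}% correct")
```

The table entries are rounded: Q6 works out to roughly 74.9% and Q13 to roughly 95.0%.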

Click the following link to jump to Exercise 1: NGS read Preprocessing