Skip to content

Adapter trimming and virtual library creation routine for Illumina Nextera Mate Pair libraries.

License

Notifications You must be signed in to change notification settings

CarolineOhrman/NxTrim

 
 

Repository files navigation

nxtrim: Software to remove Nextera Mate Pair adapters and categorise reads according to the orientation implied by the adapter location. This software is not commercially supported.

Copyright (c) 2014, Illumina, Inc. All rights reserved.

This software is provided under the terms and conditions of the BSD 2-Clause License

You should have received a copy of the BSD 2-Clause License along with this program. If not, see https://github.com/sequencing/licenses/.

####Dependencies

BOOST - we use Boost 1.55.0 but most recent versions are probably fine

####Installation

git clone [email protected]:sequencing/NxTrim.git
cd NxTrim
make
./nxtrim

You will also need to point the BOOST_ROOT environment variable at your boost installation eg.

export BOOST_ROOT=/your/boost/installation

if boost is installed globally then

export BOOST_ROOT=/usr/lib/

should work

####Usage Trim the data:

nxtrim -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -O sample 

Assemble with Velvet:

velveth output_dir 55 -short -fastq.gz sample.se.fastq.gz -shortPaired2 -fastq.gz sample.pe.fastq.gz -shortPaired3 -fastq.gz sample.mp.fastq.gz -shortPaired4 -fastq.gz sample.unknown.fastq.gz
velvetg output_dir -exp_cov auto -cov_cutoff auto -shortMatePaired4 yes

Assemble with SPAdes:

cat sample.mp.fastq.gz sample.unknown.fastq.gz > sample.allmp.fastq.gz
spades.py -k 21,33,55,77 -t 4  --pe1-s sample.se.fastq.gz --pe2-12 sample.pe.fastq.gz --hqmp3-12 sample.allmp.fastq.gz --hqmp3-fr  -o output_dir

Note we concatenate the unknown/mp libraries for SPAdes. This command is suitable for 2x151bp data, if you have 2x251bp then use -k 21,33,55,77,127.

####Details:

The default behaviour expects raw fastq files from a Nextera Mate-Pair library kit in Reverse-Forward orientation. Based on the location of the Nextera adapter sequence (if detected), nxtrim produces four different "virtual libraries":

  • mp: read pairs that are large insert-size mate-pairs
  • pe: read pairs that are short insert-sze paired-end reads
  • se: single reads
  • unknown: a library of read-pairs that are mostly large-insert mate-pair, but possibly contain a small proportion of paired end contaminants

The trimmer will reverse-complement the reads such that the resulting libraries will be in Forward-Reverse orientation, this reverse-complementing can be disabled via the --norc flag.

####Example data:

https://basespace.illumina.com/s/TXv32Ve6wTl9

Free registration required.

####References:

http://res.illumina.com/documents/products/technotes/technote_nextera_matepair_data_processing.pdf

http://res.illumina.com/documents/products/appnotes/appnote-nextera-mate-pair-bacteria.pdf

About

Adapter trimming and virtual library creation routine for Illumina Nextera Mate Pair libraries.

Resources

License

Stars

Watchers

Forks

Packages

No packages published