CRAMTools is a set of Java tools and APIs for efficient compression of sequence read data. Although this is intended as a stable version, the code is released as early access. Parts of CRAMTools are experimental and may not be supported in the future.
http://www.ebi.ac.uk/ena/about/cram_toolkit
Version 2.0
Input files:
Reference sequence in fasta format <fasta file>
Reference sequence index file <fasta file>.fai created using samtools (samtools faidx <fasta file>)
Input BAM file <BAM file> sorted by reference coordinates
BAM index file <BAM file>.bai created using samtools (samtools index <BAM file>)
Download and run the program:
Download the prebuilt runnable jar file from: https://github.com/vadimzalunin/crammer/blob/master/cramtools-1.0.jar?raw=true
Execute the command line program: java -jar cramtools.jar
Usage is printed if no arguments are given.
To convert a BAM file to CRAM:
java -jar cramtools.jar cram --input-bam-file <bam file> --reference-fasta-file <reference fasta file> [--output-cram-file <output cram file>]
To convert a CRAM file to BAM:
java -jar cramtools.jar bam --input-cram-file <input cram file> --reference-fasta-file <reference fasta file> --output-bam-file <output bam file>
To build the program from source:
To check out the source code from GitHub you will need a git client: http://git-scm.com/
Make sure you have Java 1.6 or higher: http://openjdk.java.net/ or http://www.oracle.com/us/technologies/java/index.html
Make sure you have Ant version 1.7 or higher: http://ant.apache.org/
git clone git://github.com/vadimzalunin/crammer.git
ant -f build/build.xml runnable
java -jar cramtools.jar
To run unit tests:
ant -f build/build.xml test
Picard integration
Some tools that use the Picard API should be able to read/write CRAM archives. For example:
java -cp cramtools.jar net.sf.picard.sam.ValidateSamFile INPUT=data.cram
However, the following will not work:
java -cp cramtools.jar -jar ValidateSamFile.jar INPUT=data.cram
Reference sequence discovery
For tools that use the Picard API, the following rules describe how the reference sequence file is discovered:
1. Given an input file '<some name>.cram' search for a '<some name>.fa' file in the same directory.
2. Given an input file '<some name>.cram' search for a '<some name>.fa' file in the same directory, which should contain a full path to the reference file.
3. Use the Java system property 'reference=<path to ref file>', usage: java -Dreference=<path to ref file> -cp cramtools.jar ...
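
The lookup order above can be illustrated with a minimal sketch. This is not the CRAMTools implementation: the class and method names are hypothetical, only the first rule (a sibling '<some name>.fa' file) and the third rule (the 'reference' Java property) are covered, and the second rule (a sibling file holding the path to the reference) is left out.

```java
import java.io.File;

// Illustrative sketch only; names are hypothetical and not part of the CRAMTools API.
final class ReferenceDiscoverySketch {

    // Returns the reference file for the given CRAM file, or null if none is found.
    static File findReference(File cramFile) {
        String name = cramFile.getName();
        if (name.endsWith(".cram")) {
            // Rule 1: '<some name>.fa' next to '<some name>.cram'.
            String base = name.substring(0, name.length() - ".cram".length());
            File dir = cramFile.getAbsoluteFile().getParentFile();
            File sibling = new File(dir, base + ".fa");
            if (sibling.isFile()) {
                return sibling;
            }
        }
        // Rule 3: the value passed as -Dreference=<path to ref file>.
        String property = System.getProperty("reference");
        if (property != null && !property.isEmpty()) {
            return new File(property);
        }
        return null;
    }

    public static void main(String[] args) {
        File cram = new File(args.length > 0 ? args[0] : "data.cram");
        File ref = findReference(cram);
        System.out.println(ref == null ? "no reference found" : ref.getPath());
    }
}
```

In this sketch a sibling file takes precedence over the property, following the order in which the rules are listed; whether the real tools resolve conflicts the same way is an assumption.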
The following tools have been included in this release:
Bam2Cram
Cram2Bam
ValidateCramFile (this works similarly to the ValidateSamFile tool from Picard)
Lossy model
Bam2Cram lets you specify a lossy model via a string composed of one or more words separated by '-'.
Each word is an instruction about quality score treatment, which can be binning (Illumina 8 bins) or full scale (40 values).
Here are some examples:
N40-D8 preserve quality scores for non-matching bases with full precision, and bin quality scores for positions flanking deletions.
m5 preserve quality scores for reads with mapping quality score lower than 5
R40X10-N40 preserve non-matching quality scores and those matching with coverage lower than 10
Definitions:
R reference base
N non-reference (mis-matched) base
U unplaced read base
P pileup: capture all bases at a given position on the reference if there are at least 3 mismatches
D read positions flanking a deletion
M reads with mapping quality score higher than 40
m reads with mapping quality score lower than 40
By default no quality scores will be preserved.
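
To make the string format more concrete, here is a minimal sketch that splits a lossy model string into words and then into letter/number pairs. The exact grammar is an assumption inferred from the examples above (each word is one or more pairs of a single letter followed by digits, as in R40X10); the class and method names are hypothetical and this is not the parser used by Bam2Cram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; the grammar is inferred from the README examples.
final class LossyModelSketch {

    // Assumed token shape: one letter followed by one or more digits, e.g. "N40".
    private static final Pattern TOKEN = Pattern.compile("([A-Za-z])(\\d+)");

    // Splits "R40X10-N40" into words, and each word into "letter:number" tokens.
    static List<List<String>> parse(String spec) {
        List<List<String>> words = new ArrayList<List<String>>();
        for (String word : spec.split("-")) {
            List<String> tokens = new ArrayList<String>();
            Matcher m = TOKEN.matcher(word);
            int end = 0;
            while (m.find()) {
                if (m.start() != end) {
                    throw new IllegalArgumentException("Unexpected text in word: " + word);
                }
                tokens.add(m.group(1) + ":" + m.group(2));
                end = m.end();
            }
            if (tokens.isEmpty() || end != word.length()) {
                throw new IllegalArgumentException("Cannot parse word: " + word);
            }
            words.add(tokens);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(parse("R40X10-N40")); // prints [[R:40, X:10], [N:40]]
        System.out.println(parse("m5"));         // prints [[m:5]]
    }
}
```

The sketch only tokenizes the string. How each number is interpreted (a number of quality values, a mapping quality threshold, or a coverage threshold) depends on the letter and is not modelled here.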
Illumina 8-binning scheme:
0, 1, 6, 6, 6, 6, 6, 6, 6, 6, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 22, 22, 22, 22, 22, 27, 27, 27, 27, 27, 33, 33, 33, 33, 33, 37,
37, 37, 37, 37, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40
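
The table above reads as a lookup from an original quality score (the position in the list, starting at 0) to its binned value. As a minimal sketch, the same mapping can be written as a chain of range checks. The class and method names are hypothetical, and returning 40 for scores beyond the end of the listed table is an assumption.

```java
// Illustrative sketch only; not the CRAMTools implementation.
final class Illumina8BinSketch {

    // Maps a quality score to its binned value, following the table above.
    static int bin(int quality) {
        if (quality < 0) {
            throw new IllegalArgumentException("Negative quality score: " + quality);
        }
        if (quality < 2) return quality; // 0 and 1 are kept as they are
        if (quality < 10) return 6;      // 2..9
        if (quality < 20) return 15;     // 10..19
        if (quality < 25) return 22;     // 20..24
        if (quality < 30) return 27;     // 25..29
        if (quality < 35) return 33;     // 30..34
        if (quality < 40) return 37;     // 35..39
        return 40;                       // 40 and above
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 9, 10, 19, 20, 35, 40};
        for (int q : samples) {
            System.out.println(q + " -> " + bin(q));
        }
    }
}
```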
For more information, see our web site:
http://www.ebi.ac.uk/ena/about/cram_toolkit