forked from VertebrateResequencing/vr-pipe
[You should read IMPORTANT_NOTES, which describes version-specific issues]
VRPipe is a generic pipeline system designed for use by the Vertebrate
Resequencing team at the Sanger Institute, but also intended for broad use by
anyone.
For an overview of why it was created, read our vision document:
https://github.com/VertebrateResequencing/vr-pipe/wiki/Vision
It comprises mostly self-documented Perl code. There are both scripts and
modules within subfolders.
Help on each module is currently of limited availability, but eventually all
will be self-documented with POD, which can be read with eg. perldoc.
Each script has a --help option, eg:
vrpipe-status --help
INSTALLATION
------------
You will need the source version of samtools compiled with -fPIC and -m64 in the
CFLAGS, and the environment variable SAMTOOLS pointing to that source directory
(which should now contain bam.h and libbam.a). Samtools can be downloaded from
here: http://sourceforge.net/projects/samtools/files/samtools/
Furthermore, it is required that a samtools executable be in your PATH. (Note
that tests use the samtools in your PATH, not samtools in your SAMTOOLS dir;
production pipelines will use the samtools at the absolute path you configure
them with during setup - the default will be the first one in your PATH.)
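The samtools requirements above can be checked before running Build.PL. The
following is a minimal sketch, not part of VRPipe: check_samtools_env is a
hypothetical helper name, and it only checks that the files and the executable
exist, not how samtools was compiled.

```shell
# Sketch only: check_samtools_env is a hypothetical helper, not part of VRPipe.
# It returns 0 when $SAMTOOLS contains bam.h and libbam.a and a samtools
# executable is found in PATH, and 1 otherwise.
check_samtools_env() {
    if [ -z "${SAMTOOLS:-}" ]; then
        echo "SAMTOOLS is not set"
        return 1
    fi
    rc=0
    for f in bam.h libbam.a; do
        if [ ! -f "$SAMTOOLS/$f" ]; then
            echo "missing: $SAMTOOLS/$f"
            rc=1
        fi
    done
    # The tests use whichever samtools comes first in PATH.
    if command -v samtools >/dev/null 2>&1; then
        echo "samtools in PATH: $(command -v samtools)"
    else
        echo "no samtools executable in PATH"
        rc=1
    fi
    return "$rc"
}
```

After exporting SAMTOOLS, run check_samtools_env; any "missing" line names a
file that the compiled source directory should contain.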
It is also recommended that you set PERL_INLINE_DIRECTORY to ~/.Inline and
create that directory.
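For sh-compatible shells, the recommendation above might be carried out as
follows. This is a sketch: setup_inline_dir is a hypothetical helper name, and
the optional argument exists only so that a different directory can be tried.

```shell
# Sketch only: setup_inline_dir is a hypothetical helper, not part of VRPipe.
# It creates the Inline directory (default ~/.Inline) and exports
# PERL_INLINE_DIRECTORY so that it points there.
setup_inline_dir() {
    PERL_INLINE_DIRECTORY="${1:-$HOME/.Inline}"
    mkdir -p "$PERL_INLINE_DIRECTORY" || return 1
    export PERL_INLINE_DIRECTORY
}
```

Calling setup_inline_dir with no argument uses ~/.Inline. To keep the setting
across sessions, the export would also need to go in your shell start-up file.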
You require Module::Build in order for the Build.PL script to work. It is
recommended you 'install Bundle::CPAN' using 'cpan' prior to attempting setup.
First, do the initial setup and dependency installation:
$ perl Build.PL
If this says you have "ERRORS/WARNINGS FOUND IN PREREQUISITES" try:
$ ./Build installdeps
to install missing prerequisites from CPAN. You may find that you have to
unset your LANG environment variable temporarily to install all CPAN
prerequisites. You will likely find that Inline::Filters fails its preprocess.t
test; this is ok and you can manually use 'cpan' to:
'force install Inline::Filters'. You may also find that
MooseX::Types::Parameterizable fails its tests. In that case you can manually
use 'cpan' to install an earlier version that should work:
'install JJNAPIORK/MooseX-Types-Parameterizable-0.07.tar.gz'.
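One way to unset LANG only for the duration of a single command, without
changing it in your login shell, is to run the command in a subshell. This is
a sketch: without_lang is a hypothetical helper name, not something VRPipe
provides.

```shell
# Sketch only: without_lang is a hypothetical helper, not part of VRPipe.
# It runs the given command in a subshell with LANG unset, so the
# calling shell's own LANG is left untouched.
without_lang() {
    (
        unset LANG
        "$@"
    )
}
```

For example, "without_lang ./Build installdeps" runs the dependency
installation with LANG unset and leaves LANG as it was afterwards.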
Once the prerequisites are installed you will be asked a number of setup
questions such as the details of your production and testing databases. It is
recommended to use a stand-alone relational database such as MySQL. The pipeline
functionality requires such a database, but if you're only using code such as a
parser or the bas function you can use sqlite, supplying a file path for the
database name.
Once setup is complete modules/VRPipe/SiteConfig.pm will have been created. It
is this file that holds your site-wide configuration of VRPipe.
If you choose to use the local scheduler, you will need to add the scripts
directory to your PATH prior to running tests. Note that the local scheduler
currently doesn't work well with an sqlite database, so some tests are likely
to fail with that combination.
To test the code prior to using it:
$ ./Build test
A number of environment variables affect which tests run and how. In most cases
it is fine to not worry about these extra variables. They are described here for
the sake of completeness.
VRPIPE_TEST_PIPELINES, when true, will fully test all the pipelines and steps
instead of just the core functionality of the system.
These tests take a very long time - potentially hours -
so this is off by default. If you're interested in
running a particular pipeline, you can test just that
pipeline by setting this variable to true and then
running:
$ ./Build test --test_files t/VRPipe/Pipelines/[name].t --verbose
GATK, pointing to a directory containing GATK jar files, will enable tests that
require GATK
VRPIPE_VRTRACK_TESTDB will enable testing of the VRTrack DataSource, using the
value supplied as the database name (other database
details will come from the standard VRTrack environment
variables)
Additionally, tests that require certain executables will not run unless you
have those executables in your PATH.
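Before a test run it can help to see which of the variables described above
are set in the current shell. The following sketch prints one line per
variable; report_test_env is a hypothetical helper name, and it reports only
whether a variable is set, not how VRPipe will interpret its value.

```shell
# Sketch only: report_test_env is a hypothetical helper, not part of VRPipe.
# It prints whether each test-related variable is set in the environment.
report_test_env() {
    for var in VRPIPE_TEST_PIPELINES GATK VRPIPE_VRTRACK_TESTDB; do
        # Indirect lookup: copy the value of the variable named in $var.
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "$var=$val"
        else
            echo "$var is not set"
        fi
    done
}
```

A variable can also be set for a single command by prefixing it, as in
"VRPIPE_TEST_PIPELINES=1 ./Build test", which leaves the rest of the session
unaffected.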
To install:
$ ./Build install
(or just include the modules subdirectory in your PERL5LIB, and include the
scripts subdirectory in your PATH)
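The alternative in parentheses might look like this in an sh-compatible shell.
It is a sketch: use_vrpipe_checkout is a hypothetical helper name, and the
directory passed to it is wherever you unpacked the VRPipe source.

```shell
# Sketch only: use_vrpipe_checkout is a hypothetical helper, not part of VRPipe.
# It prepends the modules and scripts subdirectories of a source checkout
# to PERL5LIB and PATH for the current shell session.
use_vrpipe_checkout() {
    src=$1
    # Only add a ':' separator if PERL5LIB already has a value.
    PERL5LIB="$src/modules${PERL5LIB:+:$PERL5LIB}"
    PATH="$src/scripts:$PATH"
    export PERL5LIB PATH
}
```

For example, "use_vrpipe_checkout /path/to/vr-pipe" makes the scripts and
modules of that checkout available without running ./Build install.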
To create your production database (testing database is created automatically):
$ vrpipe-db_deploy
If in the future the VRPipe code is updated and there is a change to the schema,
you will need to stop VRPipe, update your code, then run:
$ vrpipe-db_upgrade
To use:
The daemons to run the system have not yet been written. In the meantime you
can run the scripts vrpipe-trigger_pipelines and vrpipe-dispatch_pipelines.
Set up a pipeline using vrpipe-setup and then it will execute if you have the
trigger and dispatch scripts running in the background.
As a general point, any perl script you write that wants to use some VRPipe
code should typically "use VRPipe::Persistent::Schema;". After that most things
should work without further "use" statements.
EXTERNAL SOFTWARE
-----------------
Some software needs environment variables to be set (using setenv in csh or export in
bash). The following list shows the name of the software, the environment
variable you need to set, and the value you should set it to, separated by
commas.
samtools,SAMTOOLS,/path/to/samtools/source_directory
GATK,GATK,/path/to/gatk_jar_files
picard,PICARD,/path/to/picard_jar_files
eg. to have GATK work properly in the pipelines you might do:
setenv GATK /path/to/gatk_jar_files
or
export GATK=/path/to/gatk_jar_files
depending on what shell you are using.
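For bash users, the comma-separated list above can be turned into export lines
mechanically. This is a sketch: print_exports is a hypothetical helper name,
and the paths remain the placeholders from the list, to be replaced with your
real locations.

```shell
# Sketch only: print_exports is a hypothetical helper, not part of VRPipe.
# It reads "software,VARIABLE,value" lines on standard input and prints
# one bash export line per entry. The software name is read but not used.
print_exports() {
    while IFS=, read -r name var value; do
        # Skip blank or malformed lines that have no variable field.
        [ -n "$var" ] || continue
        printf "export %s='%s'\n" "$var" "$value"
    done
}
```

Feeding it the line "GATK,GATK,/path/to/gatk_jar_files" prints an export line
for GATK with that path. The output can be reviewed and then pasted into a
shell start-up file.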
COPYRIGHT & LICENSE
-------------------
Copyright (c) 2011-2012 Genome Research Limited.
This file is part of VRPipe.
VRPipe is free software: you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program. If not, see <http://www.gnu.org/licenses/>.
The usage of a range of years within a copyright statement contained within this
distribution should be interpreted as being equivalent to a list of years
including the first and last year specified and all consecutive years between
them. For example, a copyright statement that reads 'Copyright (c) 2005, 2007-
2009, 2011-2012' should be interpreted as being identical to a statement that
reads 'Copyright (c) 2005, 2007, 2008, 2009, 2011, 2012' and a copyright
statement that reads 'Copyright (c) 2005-2012' should be interpreted as being
identical to a statement that reads 'Copyright (c) 2005, 2006, 2007, 2008, 2009,
2010, 2011, 2012'.