GitHub repository for binning the CAMI2b challenge dataset using MetaBinner
We recommend using conda to run MetaBinner.
After installing Anaconda (or Miniconda), first obtain MetaBinner:
git clone https://github.com/ziyewang/metabinner_cami2b
Then create and activate the metabinner_cami2b_env environment:
cd metabinner_cami2b
conda env create -f metabinner_cami2b_env.yaml
conda activate metabinner_cami2b_env
Next, install CheckM from the included CheckM-1.0.18 directory (please make sure OpenSSL is installed first):
cd CheckM-1.0.18
python setup.py install
Install the CheckM database:
CheckM relies on a number of precalculated data files, which can be downloaded from https://data.ace.uq.edu.au/public/CheckM_databases/ (more details are available at https://github.com/Ecogenomics/CheckM/wiki/Installation#how-to-install-checkm):
mkdir <checkm_data_dir>
cd <checkm_data_dir>
wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
tar xzf checkm_data_2015_01_16.tar.gz
checkm data setRoot .
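Because the database archive is a large download, it can be worth confirming that it arrived intact before extracting it. The following is a minimal sketch, not part of MetaBinner or CheckM: the helper name archive_ok is hypothetical, and it only checks that the file is non-empty and that tar can list its contents.

```shell
# Hypothetical helper (not part of CheckM): succeed only if the file
# is non-empty and tar can read it as a gzip-compressed archive.
archive_ok() {
    [ -s "$1" ] && tar tzf "$1" >/dev/null 2>&1
}

# Archive name taken from the wget command above.
archive=checkm_data_2015_01_16.tar.gz
if archive_ok "$archive"; then
    echo "$archive looks complete; it is safe to extract."
else
    echo "$archive is missing or unreadable; download it again."
fi
```

A truncated download typically fails the tar listing, so this catches the most common problem without unpacking anything.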
CheckM requires the following programs to be on your system path:
HMMER (>=3.1b1)
prodigal (2.60 or >=2.6.1); the executable must be named prodigal and not prodigal.linux
pplacer (>=1.1); guppy, which is part of the pplacer package, must also be on your system path. pplacer binaries can be found on the pplacer GitHub page.
Alternatively, you can install the dependencies with conda:
conda install -c bioconda prodigal
conda install -c bioconda hmmer
conda install -c bioconda pplacer
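Whichever way the dependencies are installed, a quick check that they are visible on the system path can save a failed CheckM run later. This is a minimal sketch: the function name missing_tools is hypothetical, and the executable names are assumptions (hmmsearch is used here as a representative HMMER program; prodigal, pplacer and guppy are the names listed above).

```shell
# Hypothetical helper: print the name of every program given as an
# argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed executable names for the CheckM dependencies.
missing=$(missing_tools hmmsearch prodigal pplacer guppy)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All CheckM dependencies were found on PATH."
fi
```

Run it inside the activated conda environment so that the same path is checked that the pipeline will use.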
An example:
#Filter short contigs and generate kmer profiles:
python scripts/filter_tooshort_for_contig_file.py test_data/final_contigs.fa 999
python scripts/gen_kmer.py test_data/final_contigs.fa 999 4
#path to the input files for metabinner and the output dir:
contig_file=test_data/final_contigs_999.fa
kmer_files=test_data/kmer_4_f999.csv
coverage_profiles=test_data/Coverage_f1k.tsv
output_dir=test_data/output
mkdir -p ${output_dir}/metabinner_res
#metabinner_path must be set beforehand to the path of the cloned metabinner_cami2b directory
bash ${metabinner_path}/code_for_cami2b/metabinner_cami2b_pipeline_v1.2.sh ${contig_file} ${output_dir} ${coverage_profiles} ${kmer_files} ${metabinner_path}
#The final output is the file "final_result_combo_greedy_combo2_mypipeline.tsv" in "${output_dir}/metabinner_res".
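Before launching the pipeline, it can help to confirm that the three input files from the example exist and are not empty, since the filtering and k-mer steps must have produced them first. The sketch below is an illustration only: require_files is a hypothetical helper, and the paths are the ones used in the example above.

```shell
# Hypothetical helper: report the first path that is missing or empty
# and return a non-zero status; return zero if every file is present.
require_files() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    return 0
}

# Input paths from the example above.
if require_files test_data/final_contigs_999.fa \
                 test_data/kmer_4_f999.csv \
                 test_data/Coverage_f1k.tsv; then
    echo "Inputs look ready for the MetaBinner pipeline."
else
    echo "Run the filtering and k-mer steps first."
fi
```

The same helper can be pointed at the final result file after the run to confirm that the pipeline produced output.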
Please send bug reports or questions (such as the appropriate modes for your datasets) to Ziye Wang: [email protected] and Dr. Shanfeng Zhu: [email protected]
[1] Lu YY, et al. 2017. "COCACOLA: binning metagenomic contigs using sequence COmposition, read CoverAge, CO-alignment and paired-end read LinkAge." Bioinformatics, 33(6): 791–798.
[2] UniteM: https://github.com/dparks1134/UniteM
[3] Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. 2015. "CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes." Genome Research, 25: 1043–1055.
[4] Graham ED, Heidelberg JF, Tully BJ. 2017. "BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation." PeerJ, 5: e3035.