- It is very situational whether mmlong2 is a good fit for your samples or project, but generally, the workflow is intended for highly complex metagenomes (e.g. soil, sewage sludge, human gut) and is not optimal for samples with very low microbial diversity (e.g. pure cultures, Zymo Mock DNA Standard).
- Please keep in mind that mmlong2 is a long-reads-only workflow, designed to work with Nanopore (about 1 % read error rate) or with PacBio HiFi (about 0.1 % read error rate) datasets. Short-read datasets can be used for mapping to improve genome recovery via differential coverage binning, but the workflow is not designed for short-read metagenomic assembly.
- The recommended input for mmlong2 is at least 1 GB of sequenced data containing multiple prokaryotic organisms.
- In general, mmlong2 is designed to be used on HPC clusters with ≥100 threads and ≥300 GB of RAM allocated per workflow run.
- The metagenomic binning part of the workflow is compute-intensive (it is optimized for MAG yield) and might take several days to weeks to complete.
- The mmlong2 workflow has been developed and tested on HPC nodes (Slurm cluster and bare metal) running on Ubuntu 22.04.
- It is highly recommended to perform read quality filtering before running mmlong2 (e.g. remove reads below Phred Q10 for Nanopore, and below Phred Q20 for PacBio HiFi and short-read data).
- Trimming off read adaptor and barcode sequences, as well as filtering out very short reads (e.g. below 200 bp for Nanopore or PacBio data), might also improve genome recovery.
- If you are only interested in getting the genomes, check out mmlong2-lite, which is a lightweight version of the pipeline with an identical prokaryotic genome recovery procedure and does not require large database installation.
- During a workflow run, temporary files might be generated and not deleted by Snakemake when the run finishes.
- By default, the current working directory is used to store these temporary files. Hence, it is recommended to have a directory dedicated to temporary files and provide it to mmlong2 through the `--temporary_dir` option.
- If the workflow crashes, inspect the stdout output and the Snakemake logs for troubleshooting.
- The workflow can usually be resumed by re-running the same commands.
- If you want to resume the workflow from a new installation of mmlong2, it is highly recommended to first run the workflow with the `--touch` option to mark the generated files against deletion.
- If the workflow still keeps crashing after several retries, feel free to post the error logs in the GitHub Issues section.
- Although it is possible to run the genome analysis section with a custom set of genomes by mimicking the workflow directory structure, this is quite technical to achieve and might lead to compatibility issues.
- A more streamlined method for providing custom genomes to the workflow will be part of a future release.
- At the moment, mmlong2 does not feature genome recovery of viruses or eukaryotes.
- Expansion of the binning features, however, is planned for future releases.
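
The read filtering recommended above (minimum Phred quality and minimum read length) can be illustrated with a minimal Python sketch. This is not part of mmlong2 and is not how the workflow itself processes reads: the function names (`mean_phred`, `parse_fastq`, `filter_reads`) are hypothetical, the input is assumed to be well-formed 4-line FASTQ with Phred+33 encoding, and the mean quality is assumed to be computed by averaging per-base error probabilities. The default thresholds are the Nanopore examples from the notes above (Q10, 200 bp). For real datasets, a dedicated read-filtering tool is likely a better choice than this pure-Python loop.

```python
import math
from typing import Iterable, Iterator, Tuple

# Hypothetical record type: (header line, sequence, quality string)
Read = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaged over per-base error probabilities.

    Assumes Phred+33 encoding unless another offset is given.
    """
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield records from well-formed 4-line FASTQ text (no wrapped lines)."""
    it = iter(lines)
    # zip over the same iterator consumes the input four lines at a time
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def filter_reads(
    reads: Iterable[Read], min_q: float = 10.0, min_len: int = 200
) -> Iterator[Read]:
    """Keep reads that are long enough and have sufficient mean quality."""
    for header, seq, qual in reads:
        if len(seq) >= min_len and mean_phred(qual) >= min_q:
            yield header, seq, qual
```

For PacBio HiFi data, the same sketch could be called with `min_q=20.0`, for example `filter_reads(parse_fastq(handle), min_q=20.0)`, where `handle` is an open FASTQ file.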