Snakemake - Observed Antibody Space

The Observed Antibody Space (OAS) is a collection of raw outputs from 58 Ig-seq experiments, covering over half a billion sequences and containing data from different species, disease states, age groups, and B cell types [Kovaltsuk et al., 2018].

The files from this study were downloaded from http://antibodymap.org/ on 2019.09.06.

This GitHub repository contains a Snakemake pipeline that goes from the original FASTA files to a final table containing the concatenated CDR1, CDR2 and CDR3 regions for each study. The metadata of each sample is extracted directly from the available JSON files into a separate table. The metadata table and the final data table can be joined on a unique identifier.
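As a rough illustration of what the two tables look like and how they might be joined, here is a minimal Python sketch using only the standard library. The column names (sample_id, cdr1, cdr2, cdr3, species) and the example rows are made up for illustration; the pipeline's actual column names and identifier may differ.

```python
# Hypothetical column names and toy rows; not taken from the pipeline output.

def concat_cdrs(row):
    """Concatenate the CDR1, CDR2 and CDR3 sequences of one row."""
    return row["cdr1"] + row["cdr2"] + row["cdr3"]


def join_on_id(data_rows, metadata_rows, key="sample_id"):
    """Attach each sample's metadata to every data row sharing its identifier."""
    meta_by_id = {meta[key]: meta for meta in metadata_rows}
    joined = []
    for row in data_rows:
        # Rows without matching metadata are kept unchanged.
        meta = meta_by_id.get(row[key], {})
        joined.append({**meta, **row})
    return joined


data = [
    {"sample_id": "S1", "cdr1": "GFTFSSYA", "cdr2": "ISGSGGST", "cdr3": "AKDRGY"},
    {"sample_id": "S2", "cdr1": "GYTFTSYY", "cdr2": "INPSGGST", "cdr3": "ARDY"},
]
metadata = [
    {"sample_id": "S1", "species": "human"},
    {"sample_id": "S2", "species": "mouse"},
]

for row in data:
    row["cdr_concat"] = concat_cdrs(row)

joined = join_on_id(data, metadata)
```

With tables of realistic size, the same join would more likely be done with a dataframe library; the dictionary lookup here only shows the idea.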

How to run the pipeline

The following files should be edited based on your needs:

config.yaml: the config file contains all information about the studies to be fused together in the final table, input and output directories, and the project name.
cluster.json: adjust this file depending on your resource usage.
Snakefile: contains all the steps in the pipeline. The MiXCR alignment command is defined here and can be changed to suit your needs.
run_smk.sh: can be used to fine-tune the settings for running the pipeline, e.g. how many times the script should re-run if an error occurs. Additional information about the parameters for running this Snakemake pipeline can be found in the file 'info_run_snakemake'.

The following command can be used to run the pipeline: ./run_smk.sh

Snakemake rules

untar_fasta: takes the FASTA files, untars them, and saves them in the original directory
mixcr_analyze: aligns the sequences with MiXCR
fuse_studies: takes all aligned files within a study and fuses them into a larger dataframe
fuse_chains: fuses the three chains into a single table; the chain is recorded in a column
fuse_all_tables: fuses the tables from the different studies and outputs the final table of this Snakemake pipeline
extract_and_save_metadata: takes the gzipped JSON files containing the metadata and outputs a table with the metadata for every sample
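To make two of these rules more concrete, the sketch below shows one way the logic of extract_and_save_metadata and fuse_chains could look in plain Python. It is not the code from the scripts directory. It assumes that each gzipped JSON file stores its metadata as a JSON object on the first line, and the function names, the "file" column, and the chain labels are hypothetical.

```python
import gzip
import json


def read_metadata(path):
    """Return the metadata object assumed to sit on the first line of the file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        first_line = handle.readline()
    return json.loads(first_line)


def metadata_table(paths):
    """Build one metadata row per sample, keeping the file path as an identifier."""
    rows = []
    for path in paths:
        row = {"file": str(path)}
        row.update(read_metadata(path))
        rows.append(row)
    return rows


def fuse_chains(tables_by_chain):
    """Stack per-chain tables into one table with the chain stored in a column."""
    fused = []
    for chain, rows in tables_by_chain.items():
        for row in rows:
            fused.append({**row, "chain": chain})
    return fused
```

A call such as metadata_table(["sample.json.gz"]) would return a list with one dictionary per file, which can then be written out as a table.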

Contact

For any remaining questions or inquiries, send an email to: [email protected]
