Deep mutational scanning and machine learning uncover antimicrobial peptide features driving membrane selectivity
Abstract:
Many antimicrobial peptides directly disrupt bacterial membranes but frequently also damage mammalian membranes. Deciphering the rules governing membrane selectivity is critical to understanding their function and enabling therapeutic use. Past attempts to decipher these rules often fail because they cannot interrogate adequate peptide sequence variation. To overcome this problem, we develop deep mutational surface localized antimicrobial display (dmSLAY), which reveals more comprehensive positional residue importance and flexibility across an antimicrobial peptide sequence. We apply dmSLAY to Protegrin-1, a potent yet toxic antimicrobial peptide, and identify thousands of sequence variants that positively or negatively influence its antibacterial activity. Further analysis reveals that avoiding aromatic residues and eliminating disulfide-bonded pairs while maintaining membrane-bound secondary structure greatly improves Protegrin-1 bacterial specificity. Moreover, our biochemical datasets enabled machine learning models to accurately predict membrane-specific activities for over 5.7 million Protegrin-1 variants, leading to the identification of one with greatly reduced toxicity and retention of activity in a murine intraperitoneal infection model. Our results describe an innovative approach for elucidating antimicrobial peptide sequence-structure-function relationships that can inform synthetic peptide-based drug design.
Here we describe all the steps required to reproduce the analysis in the paper "Deep mutational scanning and machine learning uncover antimicrobial peptide features driving membrane selectivity".
This workflow is divided into two parts: deep mutational scanning and machine learning.
One can run the script getCounts.sh to obtain the read count matrix for each sample.
To run this script you will need seqkit and flexbar installed in a Unix environment.
Note: Because the raw FASTQ files exceed the file size limit allowed on GitHub, they are not available yet, so this script will not run as is.
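Since the raw FASTQ files are not yet available, the trimming and counting done by getCounts.sh (with flexbar and seqkit) cannot be reproduced here. As a rough illustration of what one column of the read count matrix holds, the Python sketch below (not the repository's script) parses FASTQ records, keeps the region between two flanking sequences, and counts reads per distinct insert. The flank sequences and all function names are hypothetical placeholders; the real library design determines the actual flanks, and flexbar's adapter trimming is more tolerant of sequencing errors than the exact string matching used here.

```python
from collections import Counter
from io import StringIO
from typing import Iterable, Iterator, Optional, TextIO


def read_fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line (ignored)
        handle.readline()  # quality line (ignored in this sketch)
        yield seq


def extract_insert(seq: str, left_flank: str, right_flank: str) -> Optional[str]:
    """Return the region between two flanking sequences, or None if a flank is absent."""
    start = seq.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = seq.find(right_flank, start)
    if end == -1:
        return None
    return seq[start:end]


def count_inserts(sequences: Iterable[str], left_flank: str, right_flank: str) -> Counter:
    """Count how many reads carry each distinct insert; reads missing a flank are dropped."""
    counts: Counter = Counter()
    for seq in sequences:
        insert = extract_insert(seq, left_flank, right_flank)
        if insert is not None:
            counts[insert] += 1
    return counts


# Toy example with hypothetical flanks "AAA" and "TTT"; read r4 has no flank and is dropped.
fastq = StringIO(
    "@r1\nAAAGGGTTTCCC\n+\nIIIIIIIIIIII\n"
    "@r2\nAAAGGGTTTCCC\n+\nIIIIIIIIIIII\n"
    "@r3\nAAACCCTTTCCC\n+\nIIIIIIIIIIII\n"
    "@r4\nGGGGGGGGGGGG\n+\nIIIIIIIIIIII\n"
)
counts = count_inserts(read_fastq_sequences(fastq), "AAA", "TTT")
print(counts)
```

Running the same counting on every sample and joining the resulting counters by insert sequence would give a peptides-by-samples matrix of the kind the differential analysis consumes.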
We translated the peptide-coding sequences with the Biopython translate function and computed the differences between each resulting peptide and the reference Protegrin-1 protein using a custom Python script, described in translate_and_compute_changes_in_peptides.ipynb.
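The comparison step can be sketched in Python as below, assuming the reads have already been translated (for example with Biopython's `Seq(dna).translate()`) and that each variant has the same length as the reference, so residues can be compared position by position. The function name `list_substitutions` and the "C6A" notation (reference residue, 1-based position, variant residue) are illustrative choices, not necessarily those of the notebook, and the hard-coded Protegrin-1 sequence should be checked against the reference the notebook actually uses.

```python
from typing import List

# Assumption: mature Protegrin-1 sequence (18 residues). Verify against the reference
# used in translate_and_compute_changes_in_peptides.ipynb before relying on it.
PROTEGRIN1 = "RGGRLCYCRRRFCVCVGR"


def list_substitutions(variant: str, reference: str = PROTEGRIN1) -> List[str]:
    """Return substitutions as 'C6A'-style strings (ref residue, 1-based position, new residue).

    Only equal-length sequences are compared; insertions or deletions would need
    a proper sequence alignment instead.
    """
    if len(variant) != len(reference):
        raise ValueError("variant and reference must have the same length")
    return [
        f"{ref_aa}{pos}{var_aa}"
        for pos, (ref_aa, var_aa) in enumerate(zip(reference, variant), start=1)
        if ref_aa != var_aa
    ]


print(list_substitutions("RGGRLAYCRRRFCVCVGR"))  # cysteine 6 replaced by alanine
```

Positional comparison keeps the sketch simple and matches a substitution-only scanning library; if the library also contained length variants, an alignment-based diff would be the safer choice.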
The differential analysis was performed in R with DESeq2, as described in the notebook deseq2_analysis.rmd.
This step requires the read count matrix obtained with the script getCounts.sh, stored at "/results/counts_matrix_stacked.csv". The notebook reads this file and runs the differential analysis, which should generate log2 fold changes and p-values for all 7104 peptides.
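DESeq2 corrects for differences in sequencing depth with median-of-ratios size factors before fitting its negative binomial model. The Python sketch below reproduces only that normalization plus a naive log2 fold change on a small in-memory matrix; it does not estimate dispersions, p-values, or shrunken fold changes, so it is not a substitute for deseq2_analysis.rmd. The layout of counts_matrix_stacked.csv is not described here, so the sketch assumes a plain peptides-by-samples matrix, and the sample group indices are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors, one per sample (column).

    counts[i][j] is the read count of peptide i in sample j. Peptides with a zero
    count in any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    usable_rows = [row for row in counts if all(c > 0 for c in row)]
    if not usable_rows:
        raise ValueError("every peptide has at least one zero count")
    # Log of each peptide's geometric mean across samples (the pseudo-reference sample).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable_rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable_rows, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def naive_log2_fold_change(
    counts: Sequence[Sequence[int]],
    factors: Sequence[float],
    group_a: Sequence[int],
    group_b: Sequence[int],
    pseudocount: float = 0.5,
) -> List[float]:
    """Per-peptide log2 of mean normalized count in group_b over group_a.

    Unlike DESeq2, no negative binomial fit, dispersion estimate, or shrinkage
    is applied; this only illustrates the effect of normalization.
    """
    result = []
    for row in counts:
        norm = [c / f for c, f in zip(row, factors)]
        mean_a = sum(norm[j] for j in group_a) / len(group_a)
        mean_b = sum(norm[j] for j in group_b) / len(group_b)
        result.append(math.log2((mean_b + pseudocount) / (mean_a + pseudocount)))
    return result


# Toy matrix: 3 peptides x 2 samples, where sample 1 was sequenced twice as deeply.
counts = [[10, 20], [20, 40], [30, 60]]
factors = size_factors(counts)
lfc = naive_log2_fold_change(counts, factors, group_a=[0], group_b=[1])
print(factors, lfc)
```

In the toy matrix every peptide doubles only because sample 1 has twice the depth, so after normalization the fold changes come out at essentially zero; this is the depth effect that DESeq2's size factors remove before testing.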
All three models work in consensus to make the final predictions, as described in the notebook AMP Predictions.
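This README does not state how the three models' outputs are combined. Purely as an illustration, the sketch below assumes that each model emits one binary call per peptide and that a consensus call requires a minimum number of agreeing models; the model names, the `consensus_calls` function, and the `min_votes` rule are all hypothetical, and the AMP Predictions notebook remains the authority on the actual rule.

```python
from typing import Dict, List, Sequence


def consensus_calls(predictions: Dict[str, Sequence[bool]], min_votes: int) -> List[bool]:
    """Combine per-model binary predictions into one call per peptide.

    predictions maps a model name to its calls, in the same peptide order for every
    model. A peptide is called positive when at least min_votes models call it positive;
    with three models, min_votes=3 is a unanimous rule and min_votes=2 a majority rule.
    """
    lengths = {len(calls) for calls in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same peptides")
    if not 1 <= min_votes <= len(predictions):
        raise ValueError("min_votes must be between 1 and the number of models")
    n_peptides = lengths.pop()
    return [
        sum(bool(calls[i]) for calls in predictions.values()) >= min_votes
        for i in range(n_peptides)
    ]


# Hypothetical calls from three models for three peptides.
preds = {
    "model_a": [True, True, False],
    "model_b": [True, False, False],
    "model_c": [True, True, True],
}
print(consensus_calls(preds, min_votes=3))  # unanimous rule
print(consensus_calls(preds, min_votes=2))  # majority rule
```

A unanimous rule trades recall for precision, which may suit shortlisting a few variants for synthesis out of millions of candidates; a majority rule keeps more candidates at the cost of more false positives.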
conda env create -f environment.yml -n DMS_ML_AMP
All of the code is commented, so feel free to change the parameters to suit your data and needs.