Guidance on bulk BCR-seq analysis to quantify clonal sharing across samples #1738

laumonstre · 2024-08-06T20:25:04Z

Hi,

I am currently using MiXCR to re-analyze a large bulk BCR-seq dataset, in the hope of quantifying clonal sharing across tumour sites for each patient.

The dataset:

- Bulk BCR libraries prepared from RNA extracted from 1-7 tumour sites per patient (n=30) using a 5'RACE protocol with a 3'-end primer specific for IGHG1. No UMIs were included.
- Illumina sequencing configuration: paired-end sequencing, 2 x 250 bp.

MiXCR 4.6.0-50-develop workflow used to date:

1. mixcr analyze using the generic-amplicon preset with --species hsa --rna --rigid-left-alignment-boundary --floating-right-alignment-boundary C --remove-step exportClones --assemble-clonotypes-by VDJRegion. At the end of this command, I obtain a list of clonotypes for each site within each patient (.clns).
2. mixcr findAlleles followed by mixcr findShmTrees (see this tutorial, starting at the allele inference step: https://mixcr.com/mixcr/guides/ig-trees-reconstruction/). All _reassigned.clns files from a given patient are submitted as input to findShmTrees, so lineage trees are reconstructed across sites for each patient.
3. A custom R script (based on this tutorial: https://mixcr.com/mixcr/guides/b-cell-lineages-webinar/) to represent lineage trees and compute, for each patient, the percentage of trees built with clonotypes coming from different sites.
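
The last step above (percentage of trees containing clonotypes from more than one site) can be sketched as follows. This is a minimal illustration in Python, not the R script used in the post. It assumes the tree output has already been flattened into (tree id, site) pairs, one per clonotype node; that input layout and the names `percent_multi_site_trees`, `t1`, `siteA` and so on are hypothetical.

```python
from collections import defaultdict


def percent_multi_site_trees(rows):
    """Percentage of trees whose clonotypes come from at least two sites.

    rows: iterable of (tree_id, site) pairs, one per clonotype node.
    This layout is an assumption, not a MiXCR export format.
    """
    sites_per_tree = defaultdict(set)
    for tree_id, site in rows:
        sites_per_tree[tree_id].add(site)
    if not sites_per_tree:
        return 0.0
    shared = sum(1 for sites in sites_per_tree.values() if len(sites) >= 2)
    return 100.0 * shared / len(sites_per_tree)


# Hypothetical rows for one patient: t1 and t4 span two sites, t2 and t3 do not.
rows = [
    ("t1", "siteA"), ("t1", "siteB"),
    ("t2", "siteA"),
    ("t3", "siteC"), ("t3", "siteC"),
    ("t4", "siteB"), ("t4", "siteC"),
]
print(percent_multi_site_trees(rows))
```

Running this once per patient gives the per-patient sharing percentage described in the workflow.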

Here are my questions:

1. For a fair amount of samples, mixcr qc reports 45-70 % of successfully aligned reads with 12-50% of off target reads. I believe I am using the correct preset so I’m thinking that the low immune infiltration of some tumours led to the amplification of spurious sequences. Does that sound plausible? Any advice on what to do with these samples?
2. For most samples, I noticed that the % of reads used in clonotypes is <10%. This percentage increases if I set --assemble-clonotypes-by to CDR3 rather than VDJRegion. I’m thinking that this is caused by a low sequencing quality as the % of overlapping read-pairs (overlappedPercents) sits around 40-60% for most samples. Would you recommend focusing on CDR3 rather than VDJRegion or can I still trust the VDJRegion reconstructed here?
3. What are your thoughts around reconstructing lineage trees from libraries that do not include UMIs? Is it reliable enough to quantify clonal sharing across tumour sites for each patient or not? Would it be best to compute pairwise distance metrics between samples based on the CDR3 sequences (https://mixcr.com/mixcr/reference/mixcr-postanalysis/#overlap-postanalysis)?
4. I observed some public clonotypes, i.e. clonotypes present in a lot of samples. Is there a way to filter them out before reconstructing lineage trees or performing the overlap postanalysis?

Thanks so much in advance for your help!

Answered by mizraelson

mizraelson · 2024-08-14T02:26:47Z

Collaborator

"For a fair amount of samples, mixcr qc reports 45-70 % of successfully aligned reads with 12-50% of off target reads. I believe I am using the correct preset so I’m thinking that the low immune infiltration of some tumours led to the amplification of spurious sequences. Does that sound plausible? Any advice on what to do with these samples?"

I suggest exporting the non-aligned reads by adding the following parameter to your analyze command: --output-not-used-reads1. Then you can manually inspect these reads to identify their origin.
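
For the manual inspection, one possible starting point is to tally the most frequent read prefixes in the exported FASTQ and search the top hits against a reference (for example with BLAST). The sketch below is only an illustration: `top_sequences` is a hypothetical helper, and the 50-nt prefix length is an arbitrary choice.

```python
from collections import Counter


def top_sequences(fastq_lines, n=10, prefix_len=50):
    """Return the n most common read prefixes from FASTQ-formatted lines.

    fastq_lines: any iterable of lines, e.g. an open file handle.
    Assumes standard 4-line FASTQ records (no wrapped sequences).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            counts[line.strip()[:prefix_len]] += 1
    return counts.most_common(n)


# Tiny in-memory example with three hypothetical reads.
example = [
    "@r1", "ACGTACGT", "+", "IIIIIIII",
    "@r2", "ACGTACGT", "+", "IIIIIIII",
    "@r3", "TTTTGGGG", "+", "IIIIIIII",
]
print(top_sequences(example, n=2, prefix_len=4))
```

For a real file, pass an open handle instead of the list, e.g. `open(path)` or `gzip.open(path, "rt")` for compressed output.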

"For most samples, I noticed that the % of reads used in clonotypes is <10%. This percentage increases if I set --assemble-clonotypes-by to CDR3 rather than VDJRegion. I’m thinking that this is caused by a low sequencing quality as the % of overlapping read-pairs (overlappedPercents) sits around 40-60% for most samples. Would you recommend focusing on CDR3 rather than VDJRegion or can I still trust the VDJRegion reconstructed here?"

Most likely, 250+250 sequencing is not long enough to cover the full VDJRegion, given that this is a 5'RACE protocol. You would usually want 300+300. You can proceed with assembling clones by CDR3 unless you are specifically interested in hypermutations.
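
The read-length argument comes down to simple arithmetic: a read pair covers at most twice the read length, so any amplicon longer than that leaves an unsequenced gap in the middle. The sketch below illustrates this. The amplicon lengths are illustrative assumptions only, not values measured from this dataset, and `paired_end_coverage` is a hypothetical helper.

```python
def paired_end_coverage(amplicon_len, read_len):
    """Return (overlap, gap) in bp for a read pair sequenced from both ends.

    overlap > 0 means the mates overlap and could in principle be merged;
    gap > 0 means the middle of the amplicon is not sequenced at all.
    """
    covered = 2 * read_len
    overlap = max(0, covered - amplicon_len)
    gap = max(0, amplicon_len - covered)
    return overlap, gap


# Illustrative amplicon lengths (assumed, not measured).
for amplicon_len in (450, 550, 650):
    for read_len in (250, 300):
        overlap, gap = paired_end_coverage(amplicon_len, read_len)
        print(f"amplicon={amplicon_len} reads=2x{read_len} overlap={overlap} gap={gap}")
```

In practice, read merging also needs a minimum overlap and read quality drops towards the 3' end, so the usable coverage is somewhat shorter than this upper bound.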

"What are your thoughts around reconstructing lineage trees from libraries that do not include UMIs? Is it reliable enough to quantify clonal sharing across tumour sites for each patient or not? Would it be best to compute pairwise distance metrics between samples based on the CDR3 sequences (https://mixcr.com/mixcr/reference/mixcr-postanalysis/#overlap-postanalysis)?"

You would definitely need to assemble clones by a longer feature than CDR3 to reconstruct lineage trees. With this data, you can try --assemble-clonotypes-by {FR1Begin:CDR2Begin}+{FR3Begin:FR4End}. The correct approach is to inspect your reads to identify the position of the gap and then specify a feature that excludes it. UMIs would definitely give more certainty, but you can still give it a try without them. You can use the nucleotide CDR3 sequence for pairwise distance.
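
As an illustration of a pairwise comparison on nucleotide CDR3 sequences, the sketch below computes a Jaccard index between every pair of samples. Jaccard is used here only as one simple example of an overlap metric; it is not necessarily what MiXCR's overlap postanalysis computes. The sample names and sequences are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections of CDR3 nucleotide sequences."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(cdr3_by_sample):
    """Map each unordered sample pair to its Jaccard index."""
    return {
        (s1, s2): jaccard(cdr3_by_sample[s1], cdr3_by_sample[s2])
        for s1, s2 in combinations(sorted(cdr3_by_sample), 2)
    }


# Hypothetical CDR3 sets for three tumour sites of one patient.
cdr3_by_sample = {
    "siteA": {"TGTGCGAGA", "TGTGCGAAA", "TGTACC"},
    "siteB": {"TGTGCGAGA", "TGTGGG"},
    "siteC": {"TGTCCC"},
}
print(pairwise_jaccard(cdr3_by_sample))
```

This version ignores clone abundance. Without UMIs, read counts are affected by amplification bias, so a presence/absence metric may be the more conservative choice.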

"I observed some public clonotypes, i.e. clonotypes present in a lot of samples. Is there a way to filter them out before reconstructing lineage trees or performing the overlap postanalysis?"

Currently, we do not have any filter for public clonotypes in MiXCR. You can do it manually by filtering the .clns files.
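
One way such a manual filter could look, working on exported clone tables rather than on the binary .clns files directly, is sketched below. It assumes that "public" means a CDR3 nucleotide sequence seen in at least `min_patients` different patients (counting per patient, so that sharing between sites of the same patient is not removed), and that each clone record carries its CDR3 under an `nSeqCDR3` key. The helper names and the threshold are hypothetical.

```python
from collections import Counter


def find_public(cdr3_by_patient, min_patients=2):
    """Return CDR3 sequences observed in at least min_patients patients.

    cdr3_by_patient: dict mapping patient id to the CDR3 sequences
    pooled across all of that patient's sites.
    """
    counts = Counter()
    for cdr3s in cdr3_by_patient.values():
        counts.update(set(cdr3s))  # count each patient at most once
    return {cdr3 for cdr3, n in counts.items() if n >= min_patients}


def drop_public(clones, public, key="nSeqCDR3"):
    """Keep only clone records whose CDR3 is not in the public set."""
    return [clone for clone in clones if clone[key] not in public]


# Hypothetical data for three patients.
cdr3_by_patient = {
    "P1": {"AAA", "CCC"},
    "P2": {"AAA", "GGG"},
    "P3": {"AAA", "CCC", "TTT"},
}
public = find_public(cdr3_by_patient, min_patients=3)
print(public)
print(drop_public([{"nSeqCDR3": "AAA"}, {"nSeqCDR3": "GGG"}], public))
```

A stricter definition could also require matching V and J genes before calling a clonotype public.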
