Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Approach to estimate enrichment with playback #388

Open
DanteV19 opened this issue Nov 13, 2024 · 1 comment
Open

Approach to estimate enrichment with playback #388

DanteV19 opened this issue Nov 13, 2024 · 1 comment

Comments

@DanteV19
Copy link

Dear dev team,

I realize the recorded current in the bulk file is immutable during playback when performing adaptive sampling on it with Readfish. Although no actual enrichment can be achieved, I am wondering if the segmented "unblocked" reads can be used to still estimate enrichment with playback.

Since enrichment could be defined as the time you save by not sequencing things you don't want, I came up with a method to estimate enrichment by calculating the remaining bases of unblocked reads from adaptive sampling. Although I am not certain if this method is the correct way to estimate enrichment, so I would like your thoughts about it.

I will explain how I intend to calculate the remaining bases.

When performing readfish stats you can demultiplex the reads into *_proceed.fastq.gz, *_stop_receiving.fastq.gz and *_unblock.fastq.gz in my case for control and hum_test (with adaptive sampling).

Since there was no documentation I could find about how exactly these files were made, I assume hum_test_unblock.fastq.gz contains all complete reads that were supposedly "unblocked" (since I used playback).

Additionally, during execution of readfish the output of all individual reads should be generated in live_reads.fq.

# Fastq output for individual reads
debug_log = "live_reads.fq"

For complete toml see below:

# Basecaller configuration
[caller_settings.dorado]
config = "dna_r10.4.1_e8.2_400bps_5khz_hac"
address = "ipc:///tmp/.guppy/5555"
debug_log = "live_reads.fq"

# Aligner Configuration
[mapper_settings.mappy_rs]
fn_idx_in = "/data/GCF_000001405.40/human_ref.mmi"
debug_log = "live_alignments.paf"
n_threads = 4

[[regions]]
name = "hum_test"
min_chunks = 1 
max_chunks = 4 
targets = ["chr10"] 
single_on = "stop_receiving" 
multi_on = "stop_receiving"   
single_off = "unblock"        
multi_off = "unblock"         
no_seq = "proceed"            
no_map = "proceed"           
above_max_chunks = "unblock"
below_min_chunks = "proceed" 

[[regions]]
name = "control"
control = true
min_chunks = 0
max_chunks = 2
targets = []
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "stop_receiving"
multi_off = "stop_receiving"
no_seq = "stop_receiving"
no_map = "stop_receiving"
above_max_chunks = "stop_receiving"
below_min_chunks = "stop_receiving"

I noticed that in live_reads.fq there were multiple fragments with the same read IDs as the complete reads in hum_test_unblock.fastq.gz. So, I assume that complete "unblocked" reads were segmented into unblocked chunks in live_reads.fq, with the end of the first segment being the location of when the rejection signal was sent. If my assumptions are correct then I should be able to estimate enrichment by calculating the remaining bases of "unblocked" reads as:
length of complete "unblocked" reads (from _hum_test_unblock.fastq.gz_) - length of the first segment of corresponding read (from live_reads.fq) = remaining bases of the "unblocked" read

Subsequently, the average remaining bases of the "unblocked" reads can be calculated by summing up the above for all unblocked reads and dividing it by the number of "unblocked" reads.

Since the average of remaining bases could indicate the saved time during adaptive sampling can it also be used as an estimation of enrichment?

If you notice that any of my assumptions are wrong, please let me know and if possible advise me on changes I should make in my approach to calculate a valid measure for estimated enrichment with playback.

Thanks in advance

Copy link

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant