Approach to estimate enrichment with playback #388

DanteV19 · 2024-11-13T17:05:16Z

Dear dev team,

I realize the recorded current in the bulk file is immutable during playback when performing adaptive sampling on it with Readfish. Although no actual enrichment can be achieved, I am wondering if the segmented "unblocked" reads can be used to still estimate enrichment with playback.

Since enrichment could be defined as the time you save by not sequencing things you don't want, I came up with a method to estimate enrichment by calculating the remaining bases of unblocked reads from adaptive sampling. Although I am not certain if this method is the correct way to estimate enrichment, so I would like your thoughts about it.

I will explain how I intend to calculate the remaining bases.

When performing readfish stats you can demultiplex the reads into *_proceed.fastq.gz, *_stop_receiving.fastq.gz and *_unblock.fastq.gz in my case for control and hum_test (with adaptive sampling).

Since there was no documentation I could find about how exactly these files were made, I assume hum_test_unblock.fastq.gz contains all complete reads that were supposedly "unblocked" (since I used playback).

Additionally, during execution of readfish the output of all individual reads should be generated in live_reads.fq.

# Fastq output for individual reads
debug_log = "live_reads.fq"

For complete toml see below:

# Basecaller configuration
[caller_settings.dorado]
config = "dna_r10.4.1_e8.2_400bps_5khz_hac"
address = "ipc:///tmp/.guppy/5555"
debug_log = "live_reads.fq"

# Aligner Configuration
[mapper_settings.mappy_rs]
fn_idx_in = "/data/GCF_000001405.40/human_ref.mmi"
debug_log = "live_alignments.paf"
n_threads = 4

[[regions]]
name = "hum_test"
min_chunks = 1 
max_chunks = 4 
targets = ["chr10"] 
single_on = "stop_receiving" 
multi_on = "stop_receiving"   
single_off = "unblock"        
multi_off = "unblock"         
no_seq = "proceed"            
no_map = "proceed"           
above_max_chunks = "unblock"
below_min_chunks = "proceed" 

[[regions]]
name = "control"
control = true
min_chunks = 0
max_chunks = 2
targets = []
single_on = "stop_receiving"
multi_on = "stop_receiving"
single_off = "stop_receiving"
multi_off = "stop_receiving"
no_seq = "stop_receiving"
no_map = "stop_receiving"
above_max_chunks = "stop_receiving"
below_min_chunks = "stop_receiving"

I noticed that in live_reads.fq there were multiple fragments with the same read IDs as the complete reads in hum_test_unblock.fastq.gz. So, I assume that complete "unblocked" reads were segmented into unblocked chunks in live_reads.fq, with the end of the first segment being the location of when the rejection signal was sent. If my assumptions are correct then I should be able to estimate enrichment by calculating the remaining bases of "unblocked" reads as:
length of complete "unblocked" reads (from _hum_test_unblock.fastq.gz_) - length of the first segment of corresponding read (from live_reads.fq) = remaining bases of the "unblocked" read

Subsequently, the average remaining bases of the "unblocked" reads can be calculated by summing up the above for all unblocked reads and dividing it by the number of "unblocked" reads.

Since the average of remaining bases could indicate the saved time during adaptive sampling can it also be used as an estimation of enrichment?

If you notice that any of my assumptions are wrong, please let me know and if possible advise me on changes I should make in my approach to calculate a valid measure for estimated enrichment with playback.

Thanks in advance

The text was updated successfully, but these errors were encountered:

github-actions · 2024-11-13T17:05:36Z

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Approach to estimate enrichment with playback #388

Approach to estimate enrichment with playback #388

DanteV19 commented Nov 13, 2024

github-actions bot commented Nov 13, 2024

Approach to estimate enrichment with playback #388

Approach to estimate enrichment with playback #388

Comments

DanteV19 commented Nov 13, 2024

github-actions bot commented Nov 13, 2024