Uneven reading time within a SRA file #29

Open
jgans opened this issue Jun 18, 2020 · 1 comment

jgans commented Jun 18, 2020

I am downloading SRA files with prefetch and then reading them in C++ by iterating over ngs::ReadCollection. All calculations are running in the cloud on AWS. I have found a small number of seemingly "pathological" SRA runs that take longer to read as the iteration progresses through the file.

For example, the graph below shows the time required to read sequential 0.1% chunks of ERR3212419 (the x-axis is the cumulative number of reads read, as a percentage of the total number of reads in the SRA run). As shown in the graph, the first 16% of reads can be read from disk relatively quickly (approximately 2 seconds per 0.1% chunk). However, the time to read the same number of reads then jumps to approximately 12 seconds, and then jumps again to over 100 seconds. (I stopped after loading 21% of the reads.)
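A per-chunk timing measurement like the one described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to produce the graph: `ChunkTiming` and `time_chunks` are hypothetical names, and the reader is passed in as a callback so the sketch does not depend on the NGS library. In a real program the callback would presumably iterate over something like `run.getReadRange(first, count)` on an `ngs::ReadCollection`.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <vector>

// One timed chunk: 1-based index of its first read, its read count,
// and the wall-clock seconds spent reading it.
struct ChunkTiming {
    std::uint64_t first;
    std::uint64_t count;
    double seconds;
};

// Hypothetical helper (not part of the NGS API): splits reads 1..total into
// num_chunks nearly equal chunks and times the caller-supplied reader on each.
// Assumption: read_chunk(first, count) reads `count` reads starting at `first`.
std::vector<ChunkTiming> time_chunks(
    std::uint64_t total, std::uint64_t num_chunks,
    const std::function<void(std::uint64_t, std::uint64_t)>& read_chunk)
{
    std::vector<ChunkTiming> out;
    if (total == 0 || num_chunks == 0) return out;
    if (num_chunks > total) num_chunks = total;

    const std::uint64_t base = total / num_chunks;
    const std::uint64_t extra = total % num_chunks;  // first `extra` chunks get one more read

    std::uint64_t first = 1;  // read indices are treated as 1-based here
    for (std::uint64_t i = 0; i < num_chunks; ++i) {
        const std::uint64_t count = base + (i < extra ? 1 : 0);
        const auto t0 = std::chrono::steady_clock::now();
        read_chunk(first, count);
        const auto t1 = std::chrono::steady_clock::now();
        out.push_back({first, count,
                       std::chrono::duration<double>(t1 - t0).count()});
        first += count;
    }
    return out;
}
```

Calling `time_chunks(read_count, 1000, reader)` would give 0.1% chunks, and the `seconds` field of each entry corresponds to one point on a graph of this kind.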

Is there a way to read this SRA record (and records like it), so that the time required to read different parts of the file is even? This is important because I would like to read SRA records (from disk) in parallel, and the uneven time-to-read makes for significant load imbalances. In this example, parallel workers reading near the beginning of the file finish much faster than the parallel worker reading near the end of the file.
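Independently of any fix on the SRA side, one generic way to reduce this kind of imbalance is to have workers claim many small chunks from a shared counter instead of each owning one large contiguous slice, so a worker that lands on a slow region simply claims fewer chunks. The sketch below is a minimal illustration of that idea, not something the NGS library provides: `process_dynamic` is a hypothetical name, the reader is again a callback, and it is an assumption that each worker can read its range independently (for example through its own `ngs::ReadCollection` handle).

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: `workers` threads repeatedly claim the next `chunk`
// reads from a shared atomic cursor until all `total` reads are handed out.
// Assumption: read_chunk(first, count) is safe to call concurrently.
void process_dynamic(
    std::uint64_t total, std::uint64_t chunk, unsigned workers,
    const std::function<void(std::uint64_t, std::uint64_t)>& read_chunk)
{
    if (total == 0 || chunk == 0 || workers == 0) return;

    std::atomic<std::uint64_t> next{0};  // number of reads claimed so far

    auto worker = [&]() {
        for (;;) {
            // Atomically claim the next chunk; each claim is unique.
            const std::uint64_t start = next.fetch_add(chunk);
            if (start >= total) break;
            const std::uint64_t count = std::min(chunk, total - start);
            read_chunk(start + 1, count);  // read indices treated as 1-based
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

Smaller chunks balance the load better but add per-chunk overhead, so the chunk size would need tuning. This only evens out the work across workers; it does not make the slow regions of the file any faster to read.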

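[Image: graph of the time to read each 0.1% chunk of ERR3212419 versus the cumulative percentage of reads read]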

@kwrodarmer
Contributor

Thank you for the detailed report!

Let us examine it before responding. This will not be instantaneous, but we will get back to you as quickly as we can.

Again, really sincere thanks for such great information.
