Reduce maximum memory usage while reading SRA data? #97
Comments
Possible? Of course. Feasible? Not so much. Most likely the memory is being used for caching the reference sequences. If you are using an access pattern like The problem is thus: If you don't care about pairing mates (immediately), you can employ a strategy of extract-and-sort, like
Thank you for the advice! From the description you provided, it appears that the maximum amount of RAM used by VDB should be proportional to the total size of the reference sequences, as opposed to the total size of the reads in an SRA record. Since the amount of RAM I'm using to process an SRA record is already proportional to the total amount of read sequence, the RAM used by VDB for caching reference sequences should become a progressively smaller fraction of the total memory usage as I use cloud compute instances with progressively larger amounts of RAM.
Using the C VDB API (and following the `fasterq-dump` utility's strategy for accessing SRA records) to read SRA data can consume a significant amount of RAM. This can be an issue when attempting to minimize the cloud computing resources (i.e. instance RAM) needed to process a large number of SRA records.

The maximum amount of RAM used while reading (as measured with `/usr/bin/time -v`) depends on the record:

While periodically calling `VCursorRelease()` and `VCursorOpen()` to force the VDB interface to deallocate RAM offers a minor reduction in the maximum amount of RAM used (about 25%), this strategy significantly slows down the rate at which an SRA record is read.

Is it possible/feasible to limit memory consumption using the VDB C API to sub-gigabyte levels, independent of the number of reads? The goal is to read through an SRA record once, as quickly as possible and using as little RAM as possible.
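
For comparing runs from inside the reading program rather than with `/usr/bin/time -v`, a minimal sketch along these lines could report the peak resident set size via POSIX `getrusage`. The helper names `peak_rss_kib` and `report_peak_rss` are hypothetical and not part of VDB, and the sketch assumes Linux, where `ru_maxrss` is reported in kilobytes (other platforms may use different units).

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of the VDB API): returns the peak resident
 * set size of the calling process, or -1 if getrusage() fails.
 * Assumption: Linux, where ru_maxrss is expressed in kilobytes. */
long peak_rss_kib(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Hypothetical helper: print the current peak RSS to stderr with a label,
 * e.g. before and after a batch of reads has been processed. */
void report_peak_rss(const char *label)
{
    long kib = peak_rss_kib();
    if (kib < 0)
        fprintf(stderr, "%s: peak RSS unavailable\n", label);
    else
        fprintf(stderr, "%s: peak RSS %ld KiB\n", label, kib);
}
```

Calling `report_peak_rss("after batch")` at intervals in the read loop would show whether the peak keeps growing with the number of reads or levels off once the reference sequences are cached. Note that the value is a high-water mark, so it never decreases after memory is released.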