Add samtools-depth-single process #1311
bb59892 to 77221e4
@@ -18,6 +21,13 @@
)


def prune_zero_depth(stdout):
    """Prune zero depth entries from the samtools depth output."""
    df = pd.read_csv(StringIO(stdout), sep="\t", names=["chrom", "pos", "depth"])
To avoid loading the entire output into memory at once, you can read it in chunks and keep only the non-zero rows, perhaps something along the lines of (untested):
from io import StringIO
import pandas as pd
chunkz = pd.read_csv(StringIO(stdout), sep="\t", names=["chrom", "pos", "depth"], chunksize=5000)
df = pd.concat((x.query("depth > 0") for x in chunkz))
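This works, though:
from io import StringIO
import pandas as pd
file_name = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df = pd.read_csv(file_name)
# Write the frame to an in-memory buffer, then rewind it before reading it back.
flow = StringIO()
df.to_csv(flow)
flow.seek(0)
# Read 5 rows at a time and keep only the matching rows from each chunk.
chunkz = pd.read_csv(flow, chunksize=5)
pd.concat((x.query("species == 'setosa'") for x in chunkz))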
   Unnamed: 0  sepal_length  sepal_width  petal_length  petal_width species
0           0           5.1          3.5           1.4          0.2  setosa
1           1           4.9          3.0           1.4          0.2  setosa
2           2           4.7          3.2           1.3          0.2  setosa
3           3           4.6          3.1           1.5          0.2  setosa
4           4           5.0          3.6           1.4          0.2  setosa
5           5           5.4          3.9           1.7          0.4  setosa
6           6           4.6          3.4           1.4          0.3  setosa
...
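Applied to the function in this diff, the chunked approach might look roughly like the sketch below. It assumes `stdout` is the tab-separated text produced by samtools depth, with the three columns named as in the diff. The `chunksize` argument and the empty-input guard are assumptions added for illustration and are not part of the PR.

```python
from io import StringIO

import pandas as pd

COLUMNS = ["chrom", "pos", "depth"]


def prune_zero_depth(stdout, chunksize=5000):
    """Return only the rows with non-zero depth, reading the text in chunks."""
    if not stdout.strip():
        # Assumption: empty output should give an empty frame rather than an error.
        return pd.DataFrame(columns=COLUMNS)
    chunks = pd.read_csv(
        StringIO(stdout), sep="\t", names=COLUMNS, chunksize=chunksize
    )
    # Filter each chunk before concatenating, so zero-depth rows are dropped early.
    return pd.concat(chunk.query("depth > 0") for chunk in chunks)


# Tiny made-up input, only to show the call.
example = "chr1\t1\t0\nchr1\t2\t3\nchr1\t3\t0\nchr2\t1\t7\n"
pruned = prune_zero_depth(example, chunksize=2)
print(pruned)
```

Note that `stdout` is already a string held in memory here, so the saving comes from never building the full unfiltered DataFrame, not from skipping the read itself.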
LGTM. In the next iteration, consider adding additional outputs:
- HDF5 file format
- coverage data in BED file format + associated index for genome browsers (example, should be reimplemented for Python)
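For the BED-style coverage output, one possible approach is to merge runs of adjacent positions that share the same depth into intervals. The sketch below assumes the `chrom`, `pos` and `depth` columns from this PR and converts the 1-based positions reported by samtools depth into 0-based, half-open intervals in a bedGraph-like four-column layout. The function name `depth_to_bedgraph` is hypothetical, and building the index for genome browsers is not covered.

```python
import pandas as pd


def depth_to_bedgraph(df):
    """Merge runs of adjacent positions with equal depth into intervals."""
    df = df.sort_values(["chrom", "pos"]).reset_index(drop=True)
    # A new run starts when the chromosome or depth changes,
    # or when the position is not directly after the previous one.
    new_run = (
        (df["chrom"] != df["chrom"].shift())
        | (df["depth"] != df["depth"].shift())
        | (df["pos"] != df["pos"].shift() + 1)
    )
    run_id = new_run.cumsum()
    runs = df.groupby(run_id).agg(
        chrom=("chrom", "first"),
        start=("pos", "min"),
        end=("pos", "max"),
        depth=("depth", "first"),
    )
    # samtools depth positions are 1-based; BED intervals are 0-based, half-open.
    runs["start"] -= 1
    return runs.reset_index(drop=True)


# Made-up depth values, only to show the shape of the result.
depth = pd.DataFrame(
    {
        "chrom": ["chr1", "chr1", "chr1", "chr1", "chr2"],
        "pos": [10, 11, 12, 20, 5],
        "depth": [3, 3, 4, 4, 1],
    }
)
bed = depth_to_bedgraph(depth)
print(bed.to_csv(sep="\t", header=False, index=False))
```

Merging equal-depth runs keeps the file much smaller than one line per position, at the cost of a sort and a group-by over the pruned table.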
REID-2234
before non-breaking changes.
it might be sufficient to modify the existing CHANGELOG entry from previous
commit(s).
that break the api/interface). Examples: renaming the input/output, adding
mandatory input, removing input/output...
backwards-compatible manner. Examples: add output field, add non-mandatory
input parameter, use a different tool that produces same results...
the api/interface. Examples: typo fix, change/add warning messages...