[Feature request] option to ignore soft-clipped parts in get_aligned_pairs() #1292

Open
LaraFuhrmann opened this issue Jun 6, 2024 · 6 comments

Comments

@LaraFuhrmann

Dear developers,

thanks a lot for this great package, we are using it extensively for our scripts.

We are wondering if it would be possible to add the option to ignore or "clip-out" the soft-clipped parts of the reads in the function get_aligned_pairs?

Or is there another easy way to do that without checking the CIGAR strings?

Thanks a lot,
Lara

@DrYak

DrYak commented Jun 12, 2024

I've added PR #1293 to cover this need, and expanded the tests to cover this new code path.

@jmarshall
Member

I don't believe there is a good way to do this at the moment, other than by looking at the CIGAR strings by hand as you suggest.
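For readers who want the manual route mentioned here, the following is a minimal sketch and not part of pysam. It assumes the pair list comes from `read.get_aligned_pairs()` and the CIGAR from `read.cigartuples`; the helper name `strip_soft_clips` is hypothetical. It relies on soft clips appearing only at the ends of a read and on each soft-clipped base contributing exactly one pair.

```python
# Hypothetical helper, not a pysam function.
BAM_CSOFT_CLIP = 4  # SAM op code for 'S'
BAM_CHARD_CLIP = 5  # SAM op code for 'H'


def strip_soft_clips(pairs, cigartuples):
    """Return the aligned pairs without those from soft-clipped read ends.

    pairs       -- list of (query_pos, ref_pos), e.g. read.get_aligned_pairs()
    cigartuples -- list of (op, length), e.g. read.cigartuples
    """
    # Hard clips consume no query bases and produce no pairs, so skip them
    # when looking for the outermost soft clips.
    ops = [(op, n) for op, n in (cigartuples or []) if op != BAM_CHARD_CLIP]
    lead = ops[0][1] if ops and ops[0][0] == BAM_CSOFT_CLIP else 0
    trail = ops[-1][1] if len(ops) > 1 and ops[-1][0] == BAM_CSOFT_CLIP else 0
    return pairs[lead:len(pairs) - trail]


# Example data for a read with CIGAR 2S3M1S aligned at reference position 100.
example_pairs = [(0, None), (1, None), (2, 100), (3, 101), (4, 102), (5, None)]
example_cigar = [(4, 2), (0, 3), (4, 1)]
trimmed = strip_soft_clips(example_pairs, example_cigar)
```

Insertions and deletions inside the aligned region are left untouched, since only the leading and trailing soft-clip lengths are sliced off.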

Thank you for your PR @DrYak but I would like to consider a more general approach instead: an option (e.g. with_cigar) for get_aligned_pairs() that would add another entry to the returned tuples that would tell you the CIGAR operator that each particular position tuple corresponds to. Probably giving just the enum CIGAR_OPS value, not the length as well.

So @LaraFuhrmann would ignore the tuples that say CSOFT_CLIP. And the facility may be useful for other use cases.
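To illustrate the proposal, here is a pure-Python sketch of what such three-element tuples might look like and how a caller could filter them. This is an assumption about the shape of the output, not the pysam implementation: `pairs_with_cigar` and the local `CigarOp` enum are hypothetical stand-ins, and only the numeric op codes are taken from the SAM specification.

```python
from enum import IntEnum


class CigarOp(IntEnum):
    """Local stand-in for the CIGAR operator codes (values per the SAM spec)."""
    MATCH = 0
    INS = 1
    DEL = 2
    REF_SKIP = 3
    SOFT_CLIP = 4
    HARD_CLIP = 5
    PAD = 6
    EQUAL = 7
    DIFF = 8


def pairs_with_cigar(cigartuples, reference_start):
    """Emulate (query_pos, ref_pos, op) tuples from a list of (op, length)."""
    qpos, rpos = 0, reference_start
    out = []
    for code, length in cigartuples:
        op = CigarOp(code)
        for _ in range(length):
            if op in (CigarOp.MATCH, CigarOp.EQUAL, CigarOp.DIFF):
                out.append((qpos, rpos, op))  # consumes query and reference
                qpos += 1
                rpos += 1
            elif op in (CigarOp.INS, CigarOp.SOFT_CLIP):
                out.append((qpos, None, op))  # consumes query only
                qpos += 1
            elif op in (CigarOp.DEL, CigarOp.REF_SKIP):
                out.append((None, rpos, op))  # consumes reference only
                rpos += 1
            # HARD_CLIP and PAD consume neither, so they yield no pairs.
    return out


# CIGAR 2S2M1I1D1M1S with the alignment starting at reference position 100.
tuples = pairs_with_cigar([(4, 2), (0, 2), (1, 1), (2, 1), (0, 1), (4, 1)], 100)

# Ignore the tuples that say soft clip, as suggested above.
unclipped = [(q, r) for q, r, op in tuples if op != CigarOp.SOFT_CLIP]
```

Because the operator travels with each tuple, the same output could also be filtered for insertions or deletions without any extra CIGAR parsing.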

@DrYak

DrYak commented Jun 18, 2024

@jmarshall that's indeed a more elegant solution covering more use cases.

I had at some point thought about such an approach but went for the above PR to keep the changeset small.

I'll look into your suggestion, but I fear more tests will need to be written 😅

@DrYak

DrYak commented Jun 18, 2024

I had a bit of time this evening after work, and wrote another quick PR.

> consider a more general approach instead: an option (e.g. with_cigar) for get_aligned_pairs() that would add another entry to the returned tuples that would tell you the CIGAR operator that each particular position tuple corresponds to.

Now available in PR #1294
Tests updated (but much more work than the previous PR 😅 )

@DrYak

DrYak commented Jul 1, 2024

Hi, @jmarshall! Did you have time to check whether my proposal of with_cigar (as you mentioned above) looks good to you?
(And are you happy with the tests?)
Thanks!

@DrYak

DrYak commented Jul 31, 2024

Hi @jmarshall!
Just a gentle reminder to ask if you think you could find some time to have a look at PR #1294?
Even if you end up not merging it and develop the feature on your own instead, it would greatly help Lara and me to know whether this is the direction PySam might go in the future, so we could plan our software accordingly.
Thank you very much for your patience.

3 participants