Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closest improved #185

Merged
merged 6 commits into from
Feb 8, 2024
Merged

Closest improved #185

merged 6 commits into from
Feb 8, 2024

Conversation

agalitsyna
Copy link
Member

Problem: If the segment does not have closest (e.g., in ignore_upstream=True setting), it will simply drop out of output table.

Solution: Added checking the indexes of df1 that are absent form the final returned hits

Why it was not observed before: No detiled testing for options ignore_upstream/ignore_downstream/ignore_overlap and their combinations.
If df1 has a segment from a chromosome X and df2 has a segment from it, but they are in an arrangements that won't produce a hit, this segment will be dropped out of df1 output.

Dragged modifications: overlap returned float indexes with failing the tests (numpy v1.22.4, pandas v1.5.2). Solution added for robustness of code across different software versions.

…osest hits and are not separate chromosomes.

Problem: If the segment does not have closest (e.g., in ignore_upstream=True setting), it will simply drop out of output table.

Solution: Added checking the indexes of df1 that are absent form the final returned hits

Why it was not observed before: No detiled testing for options ignore_upstream/ignore_downstream/ignore_overlap and their combinations.
If df1 has a segment from a chromosome X and df2 has a segment from it, but they are in an arrangements that won't produce a hit, this segment will be dropped out of df1 output.
@@ -511,8 +510,8 @@ def overlap(
index_col = return_index if isinstance(return_index, str) else "index"
index_col_1 = index_col + suffixes[0]
index_col_2 = index_col + suffixes[1]
df_index_1 = pd.DataFrame({index_col_1: df1.index[events1]})
df_index_2 = pd.DataFrame({index_col_2: df2.index[events2]})
df_index_1 = pd.DataFrame({index_col_1: df1.index[events1]}, dtype=pd.Int64Dtype())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe @nvictus introduced the _to_nullable_dtype() in the last update... maybe there is a way we can remove boilerplate & also use same strategy for closest ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but I don't see how it will help in this case, frankly. It's storage of the index that is always integer and not absent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I agree that we do not seem to need a nullable type here.

…tream/downstream when there is no region to match.
Copy link
Member

@golobor golobor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual change is small and nicely contained, seems good to me!

@@ -511,8 +510,8 @@ def overlap(
index_col = return_index if isinstance(return_index, str) else "index"
index_col_1 = index_col + suffixes[0]
index_col_2 = index_col + suffixes[1]
df_index_1 = pd.DataFrame({index_col_1: df1.index[events1]})
df_index_2 = pd.DataFrame({index_col_2: df2.index[events2]})
df_index_1 = pd.DataFrame({index_col_1: df1.index[events1]}, dtype=pd.Int64Dtype())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I agree that we do not seem to need a nullable type here.

@golobor golobor merged commit a058d90 into main Feb 8, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants