Closest improved #185

agalitsyna · 2024-02-06T20:39:15Z

Problem: If a segment in df1 has no closest hit (e.g., with ignore_upstream=True), it silently drops out of the output table.

Solution: Added a check for the indexes of df1 that are absent from the final returned hits.

Why it was not observed before: There was no detailed testing of the ignore_upstream/ignore_downstream/ignore_overlap options and their combinations.
If df1 has a segment on chromosome X and df2 also has a segment on it, but they are arranged so that no hit can be produced, the df1 segment is dropped from the output.
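
To illustrate the idea in the description, here is a minimal pandas-only sketch of finding df1 indexes that are absent from a hits table and adding them back with a missing partner. It is not the code from this PR: the `hits` table, its column names `index_1`/`index_2`, and the sample coordinates are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical input: df1 has three segments on the same chromosome.
df1 = pd.DataFrame(
    {"chrom": ["chrX", "chrX", "chrX"], "start": [0, 10, 20], "end": [5, 15, 25]}
)

# Hypothetical hits table: only df1 rows 0 and 2 found a closest partner in df2.
hits = pd.DataFrame({"index_1": [0, 2], "index_2": [1, 0]}).astype("Int64")

# Indexes of df1 that are absent from the returned hits.
missing = df1.index.difference(pd.Index(hits["index_1"]))

# Rows for the segments without a hit: the df2 index is left as <NA>.
no_hits = pd.DataFrame(
    {
        "index_1": pd.array(missing, dtype="Int64"),
        "index_2": pd.array([pd.NA] * len(missing), dtype="Int64"),
    }
)

# Every df1 row now appears exactly once in the result.
result = pd.concat([hits, no_hits], ignore_index=True).sort_values(
    "index_1", ignore_index=True
)
print(result)
```

With this approach the output has one row per df1 segment, and segments without a partner are marked by a missing value instead of disappearing.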

Related modification: overlap returned float indexes, which made the tests fail (numpy v1.22.4, pandas v1.5.2). A fix was added to make the code robust across software versions.
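
The description does not say where the float indexes came from. As a general illustration only, the sketch below shows one common way an integer column becomes float64 in pandas (a missing value introduced by a left merge) and how the nullable `Int64` dtype keeps it integer-valued. The frames and column names here are hypothetical and are not taken from bioframe.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: key 1 has no partner on the right side.
left = pd.DataFrame({"key": [0, 1, 2]})
right = pd.DataFrame({"key": [0, 2], "index_2": [5, 7]})

# The unmatched row gets NaN, so the integer column is upcast to float64.
merged = left.merge(right, on="key", how="left")
print(merged["index_2"].dtype)

# Casting to the nullable integer dtype keeps integers and marks the gap as <NA>.
nullable = merged["index_2"].astype("Int64")
print(nullable)
```

Whether a nullable dtype is needed depends on whether the column can actually contain missing values; an index column that is always present can stay a plain integer.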

gfudenberg · 2024-02-07T17:27:05Z

bioframe/ops.py

@@ -511,8 +510,8 @@ def overlap(
    index_col = return_index if isinstance(return_index, str) else "index"
    index_col_1 = index_col + suffixes[0]
    index_col_2 = index_col + suffixes[1]
-    df_index_1 = pd.DataFrame({index_col_1: df1.index[events1]})
-    df_index_2 = pd.DataFrame({index_col_2: df2.index[events2]})
+    df_index_1 = pd.DataFrame({index_col_1: df1.index[events1]}, dtype=pd.Int64Dtype())


I believe @nvictus introduced the _to_nullable_dtype() in the last update... maybe there is a way we can remove boilerplate & also use same strategy for closest ?

We can, but frankly I don't see how it would help in this case. This stores the index, which is always an integer and never missing.

Yup, I agree that we do not seem to need a nullable type here.

golobor

The actual change is small and nicely contained, seems good to me!

golobor · 2024-02-08T20:06:06Z

Yup, I agree that we do not seem to need a nullable type here.

agalitsyna added 2 commits February 6, 2024 15:02

Fix problem with overlap indexes data types (were returned as float)

e7d4c1c

agalitsyna requested review from nvictus and gfudenberg February 6, 2024 20:39

agalitsyna added 3 commits February 6, 2024 15:50

lint line fix

d93df6d

lint line fix

6a75081

black applied

c449aa0

gfudenberg reviewed Feb 7, 2024

Adding tests for cases for the edge cases for overlap with ignore_upstream/downstream when there is no region to match.

f42af4e

golobor approved these changes Feb 8, 2024

golobor merged commit a058d90 into main Feb 8, 2024
6 checks passed
