Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up bcf genotype count #1281

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

astaric
Copy link

@astaric astaric commented Apr 22, 2024

Replace bcf_get_genotypes in bcf_genotype_count with implementation that does not require reading GT data for all samples. Code for computing the ploidy is based on the bcf_format_get_allele_indices function for getting the values in GT field but it just counts instead of reading the values.

This is an attempt at addressing #1280, but feel free to decline the PR if there is a better way to address this issue. Also not sure if I am allowed to add the benchmark tests since others are mostly comparisons with other implementations.

Before:

---------------------------------------------------------------------------------------------------------------- benchmark: 5 tests ----------------------------------------------------------------------------------------------------------------
Name (time in us)                                    Min                        Max                       Mean                StdDev                     Median                   IQR            Outliers          OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_access_pl_values_10_samples                 38.2920 (1.0)             185.3330 (1.0)              41.2211 (1.0)          8.9669 (inf)              39.9170 (1.0)          2.3340 (inf)         15;33  24,259.4229 (1.0)         733           1
test_access_pl_values_100_samples               136.3750 (3.56)            546.6249 (2.95)            142.9587 (3.47)        12.5355 (inf)             139.7920 (3.50)         3.6876 (inf)       119;289   6,995.0286 (0.29)       3296           1
test_access_pl_values_1000_samples            3,246.3749 (84.78)         8,651.0000 (46.68)         3,340.0859 (81.03)      425.3084 (inf)           3,273.8131 (82.02)       87.7501 (inf)           2;8     299.3935 (0.01)        274           1
test_access_pl_values_10000_samples         257,407.6669 (>1000.0)     270,355.5422 (>1000.0)     261,211.9690 (>1000.0)  6,149.1067 (inf)         258,542.3335 (>1000.0)  7,216.5631 (inf)           1;0       3.8283 (0.00)          4           1
test_access_pl_values_100000_samples     25,141,500.8749 (>1000.0)  25,141,500.8749 (>1000.0)  25,141,500.8749 (>1000.0)      0.0000 (1.0)      25,141,500.8749 (>1000.0)      0.0000 (1.0)           0;0       0.0398 (0.00)          1           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

After:

-------------------------------------------------------------------------------------------------------- benchmark: 5 tests --------------------------------------------------------------------------------------------------------
Name (time in us)                                Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers          OPS            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_access_pl_values_10_samples             37.2080 (1.0)         579.0000 (1.59)         41.2996 (1.0)         20.2604 (2.46)         38.2080 (1.0)          2.3339 (1.06)        28;85  24,213.2972 (1.0)        1115           1
test_access_pl_values_100_samples            97.6250 (2.62)        364.2500 (1.0)         101.1441 (2.45)         8.2463 (1.0)          99.4590 (2.60)         2.2089 (1.0)       121;508   9,886.8864 (0.41)       5262           1
test_access_pl_values_1000_samples          684.9580 (18.41)     6,552.7500 (17.99)       728.7534 (17.65)      365.4035 (44.31)       693.0421 (18.14)        8.1980 (3.71)        7;107   1,372.2062 (0.06)        905           1
test_access_pl_values_10000_samples       6,489.2501 (174.40)   11,990.0419 (32.92)     7,187.3636 (174.03)   1,384.8602 (167.94)    6,695.7921 (175.25)     236.6672 (107.14)      15;20     139.1331 (0.01)        138           1
test_access_pl_values_100000_samples     77,245.3330 (>1000.0)  83,110.0838 (228.17)   78,858.3527 (>1000.0)  1,470.8695 (178.37)   78,792.9581 (>1000.0)  1,061.1972 (480.43)        2;1      12.6810 (0.00)         13           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant