Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for NaN early when doing comparison #142

Closed
wants to merge 1 commit into from

Conversation

stefanvodita
Copy link

I was looking into the float comparison code for an Apache Lucene issue (apache/lucene#13720) and thought there was opportunity for a micro-optimisation. We could move the NaN checks earlier, rather than do the required computations only to see later that the arguments were NaN.

@aherbert
Copy link
Contributor

aherbert commented Sep 5, 2024

TLDR; I don't think this will make a difference. The code as is favours the majority case of comparing finite values with an optimisation for double comparisons to skip a NaN check.


In checking for NaN early you "slow down" the case where two non-NaN values are not equal. We should not really require ULP precision when comparing NaN. It is not expected to be calling this method with NaNs. Note "slow down" may not be measurable if branch prediction skips over the NaN check.

Given that the major case is input of non-NaN numbers, then the current code is favouring that. If the values are within precision, then the NaN check is a check for the edge case usage. If they are not within precision then we can skip the NaN check.

Note that the double version is slightly different from the float version. It does not require a NaN check when the raw bits have opposite signs. I changed this in commit: 286932d for NUMBERS-184. Unfortunately the assumption for NaNs in doubles does not transfer across to the floats as in that case we would require the ulp argument to be a short (and any NaN value will have an integer representation with the sign bit removed above any possible short).

So the current code favours finite arguments that are not equal; your modification favours one or both arguments as NaN. In practice branch prediction should make the correct choice to ignore a NaN possibility and any changes will be very difficult to detect in execution time.

@aherbert
Copy link
Contributor

aherbert commented Sep 5, 2024

Note: In your Lucene code you are using the float method with an int ulp (Lucene PR 13723). Given you are not tied to the legacy API that Numbers inherited from Commons Math I would recommend you change the ulp argument to a short and adapt the double equality method. It is much cleaner.

If you consider the distance apart that 2 float values are at different ulp you will note that it is not a suitable measure of closeness when beyond the value of a short:

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 10))
$10 ==> 1.0001221

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 12))
$11 ==> 1.0004883

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 14))
$12 ==> 1.0019531

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 15))
$13 ==> 1.0039062

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 16))
$14 ==> 1.0078125

When you have two floats at ~32000 ulp apart then you can switch to using a relative or absolute delta.

@stefanvodita
Copy link
Author

Thank you @aherbert, that was really insightful and I ended up learning a couple things about floating point representation. I've updated the Lucene PR, basically taking the method for doubles and applying it to floats. Please correct me if I misunderstood your recommendation. I'm glad I opened this PR and got your feedback, but I'll close since I totally get your point about how checking for NaNs early isn't necessarily better.

@aherbert
Copy link
Contributor

aherbert commented Sep 6, 2024

Hi @stefanvodita, thanks for the feedback. I've added a note to your float code to indicate that you missed a return statement within the conditional, otherwise you fall-through to a check that is subject to overflow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants