Check for NaN early when doing comparison #142

stefanvodita · 2024-09-05T17:03:48Z

I was looking into the float comparison code for an Apache Lucene issue (apache/lucene#13720) and thought there was opportunity for a micro-optimisation. We could move the NaN checks earlier, rather than do the required computations only to see later that the arguments were NaN.

aherbert · 2024-09-05T19:23:58Z

TLDR; I don't think this will make a difference. The code as is favours the majority case of comparing finite values with an optimisation for double comparisons to skip a NaN check.

In checking for NaN early you "slow down" the case where two non-NaN values are not equal. We should not really require ULP precision when comparing NaN. It is not expected to be calling this method with NaNs. Note "slow down" may not be measurable if branch prediction skips over the NaN check.

Given that the major case is input of non-NaN numbers, then the current code is favouring that. If the values are within precision, then the NaN check is a check for the edge case usage. If they are not within precision then we can skip the NaN check.

Note that the double version is slightly different from the float version. It does not require a NaN check when the raw bits have opposite signs. I changed this in commit: 286932d for NUMBERS-184. Unfortunately the assumption for NaNs in doubles does not transfer across to the floats as in that case we would require the ulp argument to be a short (and any NaN value will have an integer representation with the sign bit removed above any possible short).

So the current code favours finite arguments that are not equal; your modification favours one or both arguments as NaN. In practice branch prediction should make the correct choice to ignore a NaN possibility and any changes will be very difficult to detect in execution time.

aherbert · 2024-09-05T19:37:19Z

Note: In your Lucene code you are using the float method with an int ulp (Lucene PR 13723). Given you are not tied to the legacy API that Numbers inherited from Commons Math I would recommend you change the ulp argument to a short and adapt the double equality method. It is much cleaner.

If you consider the distance apart that 2 float values are at different ulp you will note that it is not a suitable measure of closeness when beyond the value of a short:

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 10))
$10 ==> 1.0001221

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 12))
$11 ==> 1.0004883

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 14))
$12 ==> 1.0019531

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 15))
$13 ==> 1.0039062

jshell> Float.intBitsToFloat(Float.floatToRawIntBits(1.0f) + (1 << 16))
$14 ==> 1.0078125

When you have two floats at ~32000 ulp apart then you can switch to using a relative or absolute delta.

stefanvodita · 2024-09-05T23:03:58Z

Thank you @aherbert, that was really insightful and I ended up learning a couple things about floating point representation. I've updated the Lucene PR, basically taking the method for doubles and applying it to floats. Please correct me if I misunderstood your recommendation. I'm glad I opened this PR and got your feedback, but I'll close since I totally get your point about how checking for NaNs early isn't necessarily better.

aherbert · 2024-09-06T07:16:20Z

Hi @stefanvodita, thanks for the feedback. I've added a note to your float code to indicate that you missed a return statement within the conditional, otherwise you fall-through to a check that is subject to overflow.

Check for NaN early when doing comparison

e6be091

stefanvodita closed this Sep 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check for NaN early when doing comparison #142

Check for NaN early when doing comparison #142

stefanvodita commented Sep 5, 2024

aherbert commented Sep 5, 2024

aherbert commented Sep 5, 2024

stefanvodita commented Sep 5, 2024

aherbert commented Sep 6, 2024

Check for NaN early when doing comparison #142

Check for NaN early when doing comparison #142

Conversation

stefanvodita commented Sep 5, 2024

aherbert commented Sep 5, 2024

aherbert commented Sep 5, 2024

stefanvodita commented Sep 5, 2024

aherbert commented Sep 6, 2024