
nearbyint: Fix for ffast-math #550

Open
wants to merge 1 commit into master
Conversation

RafaGago

See the commit message for details.

This fixes #548.

Note that I wasn't able to verify it on Visual Studio, only on g++ and Clang.

The fix in 45cad81 wasn't working on Clang.

Under ffast-math the compiler is free to assume that "x + v - v == x".
45cad81 worked around this by storing "x + v" in a volatile variable.

For Clang this wasn't enough to stop the optimization: it correctly
detected that the variable has local scope, so nothing else can take a
reference to it.

This commit reworks the fix by defining a function that performs the
operation and disabling optimizations on that function for all supported
compilers (and those using the same frontend).

For unsupported compilers an #error is emitted, as the previous workaround
wasn't safe enough and could even break between compiler versions. This
avoids potentially weird behaviour in the future.
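
For illustration, a minimal sketch of the shape of that fix follows; the XSIMD_NO_FAST_MATH macro and exact_add names here are hypothetical, not necessarily the ones used in the commit:

    // Compile a single function without the ffast-math assumptions,
    // emitting #error for front-ends we don't know how to handle.
    #if defined(__clang__)
    #    define XSIMD_NO_FAST_MATH __attribute__((optnone))
    #elif defined(__GNUC__)
    #    define XSIMD_NO_FAST_MATH __attribute__((optimize("-fno-fast-math")))
    #else
    #    error "Unsupported compiler: cannot disable ffast-math for a single function."
    #endif

    template <class T>
    XSIMD_NO_FAST_MATH T exact_add(T x, T y)
    {
        return x + y; // performed without fast-math folding
    }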
@@ -1707,6 +1707,40 @@ namespace xsimd {
}


#if !defined(__FAST_MATH__)
Contributor
Interesting approach! I wonder if the following works as expected and is faster or not:

namespace details {
    template <class T>
#if defined(__FAST_MATH__) && (defined(__clang__) || defined(__GNUC__))
    __attribute__((noinline))
#endif
    T force_add(T x, T y)
    {
        return x + y;
    }
}

then call it appropriately
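
For instance, the call site could look roughly like this (a sketch reusing the batch_type, v and t2n names from the surrounding xsimd code):

    batch_type d = details::force_add(v, t2n) - t2n;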

Author

I'm at $DAILY_JOB right now.

Without trying it, I'd say that a competent compiler will still optimize it away.

Author

But notice that the optimization we want to break is ffast-math related. Inlining should make no difference.

@serge-sans-paille
Contributor

Can you try the following patch instead? If it works, it's much less intrusive:

diff --git a/include/xsimd/arch/generic/xsimd_generic_math.hpp b/include/xsimd/arch/generic/xsimd_generic_math.hpp
index 56e4d98..2ef41de 100644
--- a/include/xsimd/arch/generic/xsimd_generic_math.hpp
+++ b/include/xsimd/arch/generic/xsimd_generic_math.hpp
@@ -1722,8 +1722,8 @@ namespace xsimd {
         // to v. That's not what we want, so prevent compiler optimization here.
         // FIXME: it may be better to emit a memory barrier here (?).
 #ifdef __FAST_MATH__
-        volatile batch_type d0 = v + t2n;
-        batch_type d = *(batch_type*)(void*)(&d0) - t2n;
+        volatile auto d0 = (v + t2n).data;
+        batch_type d = batch_type(d0) - t2n;
 #else
         batch_type d0 = v + t2n;
         batch_type d = d0 - t2n;
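
For context, the code this patch touches relies on the classic add-and-subtract rounding trick; a rough scalar sketch (not the actual xsimd implementation, which works on batches) looks like:

    double t2n = 0x1p52;           // 2^52: above this, all doubles are integers
    double d = (x + t2n) - t2n;    // rounds x to an integer in the current rounding mode (for 0 <= x < 2^52)

Under ffast-math the compiler may fold the whole expression back to x, which is exactly the behaviour both workarounds try to prevent.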

@RafaGago
Author

RafaGago commented Sep 1, 2021

I'm at $DAILY_JOB now and don't have the codebase at hand, but I'd say that local-scope volatile variables are not enough to stop Clang. It seems to see through the intrinsic types; otherwise it wouldn't have applied the optimization correctly in the first place.

The problem with the volatile approach is that even if it works now, we don't know whether a compiler update would silently break it.

Also note that to strip the volatile qualifier, instead of doing "*(batch_type*)(void*)(&d0)" you can do "const_cast<batch_type const&>(d0)".
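
A small sketch of that variant, reusing the names from the patch above (whether Clang under ffast-math actually respects it is exactly what is in question):

    volatile batch_type d0 = v + t2n;
    batch_type d = const_cast<batch_type const&>(d0) - t2n;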

@serge-sans-paille
Contributor

#551 achieves the same result on my laptop, while being less intrusive on the codebase. Can you confirm it works for you too?
