Replace 8*PEXTRW with 1*MOVDQU in f32_to_s16 #123

Open. WolfWings wants to merge 1 commit into master.


WolfWings

The existing code stores its result with a series of 8 sequential, unrolled PEXTRW extractions (one per 16-bit lane), which compilers generally cannot recognize and combine into a single MOVDQU instruction.

Replacing them by hand with the unaligned store intrinsic (`_mm_storeu_si128`) is therefore a large performance win for the SSE path, with identical output.
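To illustrate the difference, here is a minimal sketch (not the project's actual `f32_to_s16` code) of the two ways to write eight packed 16-bit results held in an `__m128i`. The helper names `store8_pextrw` and `store8_storeu` are hypothetical; the first mirrors the unrolled extract-and-store pattern described above, the second the single unaligned store. It assumes SSE2 and a GCC/Clang-style compiler.

```c
#include <emmintrin.h> /* SSE2: _mm_extract_epi16, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical "before" pattern: eight separate lane extractions.
   Each _mm_extract_epi16 maps to PEXTRW (lane -> general-purpose
   register), followed by a 16-bit store to memory. */
static void store8_pextrw(int16_t *dst, __m128i v)
{
    dst[0] = (int16_t)_mm_extract_epi16(v, 0);
    dst[1] = (int16_t)_mm_extract_epi16(v, 1);
    dst[2] = (int16_t)_mm_extract_epi16(v, 2);
    dst[3] = (int16_t)_mm_extract_epi16(v, 3);
    dst[4] = (int16_t)_mm_extract_epi16(v, 4);
    dst[5] = (int16_t)_mm_extract_epi16(v, 5);
    dst[6] = (int16_t)_mm_extract_epi16(v, 6);
    dst[7] = (int16_t)_mm_extract_epi16(v, 7);
}

/* Hypothetical "after" pattern: one 128-bit unaligned store (MOVDQU).
   dst does not need to be 16-byte aligned. */
static void store8_storeu(int16_t *dst, __m128i v)
{
    _mm_storeu_si128((__m128i *)dst, v);
}
```

Both helpers write the same eight values in the same order; the second simply does it with one instruction instead of sixteen.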

WolfWings (Author)

Some recent versions of Clang can recognize this pattern, but older ones cannot, and none of the GCC versions I was able to test performed this optimization.

I chose storeu because loadu is already used elsewhere in place of handling alignment explicitly. Accepting the occasional extra cycle of latency keeps this consistent with the existing code design and keeps the change minimal.
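As a sketch of that design choice, here is a whole-buffer conversion loop that uses unaligned loads (`_mm_loadu_ps`) and an unaligned store (`_mm_storeu_si128`) throughout, so neither `src` nor `dst` needs 16-byte alignment. The function names, the 32767 scale factor, and the round-then-saturate behavior are assumptions for illustration, not taken from the project's code.

```c
#include <emmintrin.h> /* SSE2 (also pulls in SSE: _mm_cvtss_si32, _mm_set_ss) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar tail: same scale and rounding as the SIMD path.
   CVTSS2SI rounds with the current MXCSR mode (nearest-even by default)
   and returns INT32_MIN on overflow, which clamps to -32768 just as
   PACKSSDW saturates it in the vector lanes. */
static int16_t f32_to_s16_one(float x)
{
    int32_t r = _mm_cvtss_si32(_mm_set_ss(x * 32767.0f));
    if (r > 32767) r = 32767;
    if (r < -32768) r = -32768;
    return (int16_t)r;
}

/* Hypothetical buffer conversion: 8 samples per iteration, no alignment
   requirements on src or dst, scalar loop for the remaining n % 8. */
static void f32_to_s16_sketch(int16_t *dst, const float *src, size_t n)
{
    const __m128 scale = _mm_set1_ps(32767.0f);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        /* Unaligned loads of two groups of 4 floats, scaled and rounded to int32. */
        __m128i lo = _mm_cvtps_epi32(_mm_mul_ps(_mm_loadu_ps(src + i), scale));
        __m128i hi = _mm_cvtps_epi32(_mm_mul_ps(_mm_loadu_ps(src + i + 4), scale));
        /* Saturating pack to 8 x int16, then one unaligned store (MOVDQU). */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_packs_epi32(lo, hi));
    }
    for (; i < n; i++)
        dst[i] = f32_to_s16_one(src[i]);
}
```

The aligned alternative (`_mm_store_si128`, MOVDQA) would fault on a misaligned `dst`, so using it would require either an alignment guarantee from every caller or a peeling loop; the unaligned store avoids both at a small latency cost.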
