
simd: min / max helpers #379

Merged
merged 2 commits into from
Jan 10, 2024

Conversation

recp (Owner) commented Jan 9, 2024

New SIMD helpers:

  • glmm_min(a, b)
  • glmm_max(a, b)

SIMD helpers to match x86, NEON, and WASM behaviors (without compromising performance). I reversed a and b for WASM as @gottfriedleibniz suggested in #361:

/* x86 / SSE */
static inline __m128 glmm_min(__m128 a, __m128 b) { return _mm_min_ps(a, b); }
static inline __m128 glmm_max(__m128 a, __m128 b) { return _mm_max_ps(a, b); }

/* ARM / NEON */
static inline float32x4_t glmm_min(float32x4_t a, float32x4_t b) { return vminq_f32(a, b); }
static inline float32x4_t glmm_max(float32x4_t a, float32x4_t b) { return vmaxq_f32(a, b); }

/* WASM SIMD128 (operands reversed, see #361) */
static inline glmm_128 glmm_min(glmm_128 a, glmm_128 b) { return wasm_f32x4_pmin(b, a); }
static inline glmm_128 glmm_max(glmm_128 a, glmm_128 b) { return wasm_f32x4_pmax(b, a); }

@gottfriedleibniz, @myfreeer a review would be awesome.

In NEON there are also vminnmq_f32()/vmaxnmq_f32(), which are similar to vminq_f32()/vmaxq_f32(); I'm not sure which is best for mimicking _mm_min_ps()/_mm_max_ps().

Since we don't expect NaNs, I think 'maximum performance' is preferred. Any ideas?

kzhsw commented Jan 10, 2024

For WASM, is it worth giving users an option to use the instructions from the relaxed-simd proposal for maximum performance? It is currently in Phase 4, but only supported in limited environments (currently Chrome 114 and wasmtime 15).

myfreeer (Contributor)

Agreed with maximum performance. Maybe a continuous benchmark would help measure the performance impact of each PR or commit.

gottfriedleibniz commented Jan 10, 2024

With this PR, functions like glm_vec4_maxv can also be simplified.

In NEON there are also vminnmq_f32()/vmaxnmq_f32()

Only for ARMv8+, and they follow IEEE behaviour for NaN handling.

match x86, neon and wasm behaviors

Trying to match the behaviour here is a bit annoying. For NEON to emulate _mm_max_ps and glm_vec4_maxv without intrinsics, it would require something along the lines of vbslq_f32(vcgtq_f32(a, b), a, b) (verification required)... which is a bit overkill.

recp (Owner, Author) commented Jan 10, 2024

@kzhsw thanks for sharing that; in the future some options can be provided for max performance 👍 I'm not a WASM expert, so I can't say much for now :)

maybe a continuous benchmark would help measure the performance impact of each PR or commit.

@myfreeer I like the benchmark idea; adding it to the TODOs 🤗

With this PR, functions like glm_vec4_maxv can also be simplified.

@gottfriedleibniz thanks for your feedback; this commit should do that: 6d8dd42

Trying to match the behaviour here is a bit annoying. For NEON to emulate _mm_max_ps and glm_vec4_maxv without intrinsics, it would require something along the lines of vbslq_f32(vcgtq_f32(a, b), a, b) (verification required)... which is a bit overkill.

Actually I just tried to match SIMD behaviors where possible without adding extra overhead, but let's leave it as is 👍

Going to merge this; hopefully we will make it better over time.

@recp recp merged commit 2bd97f6 into master Jan 10, 2024
50 checks passed