
simd: min / max helpers #379

Merged
merged 2 commits into from
Jan 10, 2024

Conversation

recp (Owner) commented Jan 9, 2024

New SIMD helpers:

  • glmm_min(a, b)
  • glmm_max(a, b)

SIMD helpers to match x86, NEON, and WASM behaviors (without compromising performance). I reversed a and b for WASM as @gottfriedleibniz suggested in #361:

/* x86 / SSE */
static inline __m128 glmm_min(__m128 a, __m128 b) { return _mm_min_ps(a, b); }
static inline __m128 glmm_max(__m128 a, __m128 b) { return _mm_max_ps(a, b); }

/* ARM / NEON */
static inline float32x4_t glmm_min(float32x4_t a, float32x4_t b) { return vminq_f32(a, b); }
static inline float32x4_t glmm_max(float32x4_t a, float32x4_t b) { return vmaxq_f32(a, b); }

/* WASM SIMD128 (operands reversed, see #361) */
static inline glmm_128 glmm_min(glmm_128 a, glmm_128 b) { return wasm_f32x4_pmin(b, a); }
static inline glmm_128 glmm_max(glmm_128 a, glmm_128 b) { return wasm_f32x4_pmax(b, a); }

@gottfriedleibniz, @myfreeer a review would be awesome.

In NEON there are also vminnmq_f32()/vmaxnmq_f32(), which are similar to vminq_f32()/vmaxq_f32(); I'm not sure which is best for mimicking _mm_min_ps()/_mm_max_ps().

Since we don't expect NaNs, I think 'maximum performance' is preferred. Any ideas?

kzhsw commented Jan 10, 2024

For WASM, is it worth giving users an option to use the instructions from the relaxed-simd proposal for maximum performance? It is currently in Phase 4, but only supported in limited environments (currently Chrome 114 and wasmtime 15).

myfreeer (Contributor)

Agreed with maximum performance. Maybe a continuous benchmark would help measure the performance impact of each PR or commit.

gottfriedleibniz commented Jan 10, 2024

With this PR, functions like glm_vec4_maxv can also be simplified.

In NEON there are also vminnmq_f32()/vmaxnmq_f32()

Only for ARMv8+, and they follow IEEE behaviour for NaN handling.

match x86, neon and wasm behaviors

Trying to match the behaviour here is a bit annoying. For NEON to emulate _mm_max_ps and glm_vec4_maxv without intrinsics, it would require something along the lines of vbslq_f32(vcgtq_f32(a, b), a, b) (verification required)... which is a bit overkill.

recp (Owner, Author) commented Jan 10, 2024

@kzhsw thanks for sharing that; in the future some options can be provided for max performance 👍 I'm not a WASM expert, so I can't say much for now :)

maybe a continuous benchmark would help measure the performance impact of each PR or commit.

@myfreeer I like the benchmark idea; adding it to the TODOs 🤗

With this PR, functions like glm_vec4_maxv can also be simplified.

@gottfriedleibniz thanks for your feedback; this commit should do that: 6d8dd42

Trying to match the behaviour here is a bit annoying. For NEON to emulate _mm_max_ps and glm_vec4_maxv without intrinsics, it would require something along the lines of vbslq_f32(vcgtq_f32(a, b), a, b) (verification required)... which is a bit overkill.

Actually I just tried to match SIMD behaviors where possible without adding extra overhead, but let's leave it as is 👍

Going to merge this; hopefully we will make it better over time.

@recp recp merged commit 2bd97f6 into master Jan 10, 2024
50 checks passed