Add volk_64f_x2_dot_prod_64f #627

BatchDrake · 2023-08-04T08:31:12Z

This is basically the 64-bit version of volk_32f_x2_dot_prod_32f. Since this is my first PR to Volk and I will probably be writing a few more kernels for batched 64-bit 3D plane/rect intersections at some point, all stylistic/performance feedback is more than welcome.

This is the result of the test:

test 121
    Start 121: qa_volk_64f_x2_dot_prod_64f

121: Test command: /usr/bin/sh "/home/waldo/Documents/Development/volk/build/lib/volk_64f_x2_dot_prod_64f_test.sh" "/home/waldo/Documents/Development/volk/build/lib"
121: Test timeout computed to be: 10000000
121: RUN_VOLK_TESTS: volk_64f_x2_dot_prod_64f(131071,1)
121: generic completed in 0.210324 ms
121: u_sse completed in 0.212984 ms
121: u_sse3 completed in 0.218968 ms
121: u_sse4_1 completed in 0.210118 ms
121: u_avx completed in 0.141696 ms
121: u_avx2_fma completed in 0.178188 ms
121: a_generic completed in 0.375334 ms
121: a_sse completed in 0.290898 ms
121: a_sse3 completed in 0.38351 ms
121: a_sse4_1 completed in 0.508197 ms
121: a_avx completed in 0.493607 ms
121: a_avx2_fma completed in 0.529337 ms
121: Best aligned arch: u_avx
121: Best unaligned arch: u_avx
1/1 Test #121: qa_volk_64f_x2_dot_prod_64f ......   Passed    0.06 sec

The following tests passed:
        qa_volk_64f_x2_dot_prod_64f

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.07 sec

PS: While I expected some improvement (a 33% increase), I see that most of the kernels perform worse than the generic one. I don't know whether it makes sense to keep the worst ones.

Signed-off-by: Gonzalo J. Carracedo Carballal <[email protected]>

jdemel · 2023-08-11T00:09:26Z

@BatchDrake thanks for this PR! I'll look into it.

jdemel

Again, thanks for your PR. I hope these hints are useful.

jdemel · 2023-08-11T00:20:35Z

kernels/volk/volk_64f_x2_dot_prod_64f.h

+#ifdef LV_HAVE_GENERIC
+
+
+static inline void volk_64f_x2_dot_prod_64f_a_generic(double* result,


I suggest removing this function. Some old kernels have an aligned generic version, but the generic kernel should not rely on any alignment. Also, this kernel yields wildly differing results compared to the "unaligned" one.
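As a rough illustration of a single generic kernel that does not depend on alignment, here is a minimal sketch. The function name and signature are assumptions modeled on volk_32f_x2_dot_prod_32f, not the code in this PR, and `demo_offset_buffers` is a hypothetical helper that only exists to exercise the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a single generic kernel. The name and signature
 * are assumptions modeled on volk_32f_x2_dot_prod_32f, not the PR's code. */
static inline void dot_prod_64f_generic_sketch(double* result,
                                               const double* input,
                                               const double* taps,
                                               unsigned int num_points)
{
    assert(result != NULL);
    double acc = 0.0;
    /* Plain element-wise access: no SIMD loads, so nothing here needs
     * 16- or 32-byte alignment. */
    for (unsigned int i = 0; i < num_points; i++) {
        acc += input[i] * taps[i];
    }
    *result = acc;
}

/* Hypothetical helper: calls the sketch on pointers offset by one element
 * (8 bytes), so a SIMD-aligned array start would yield misaligned inputs. */
static double demo_offset_buffers(void)
{
    static const double in_buf[6] = { 0.0, 1.0, 2.0, 3.0, 4.0, 5.0 };
    static const double tap_buf[6] = { 9.0, 0.5, 0.5, 0.5, 0.5, 0.5 };
    double out = 0.0;
    dot_prod_64f_generic_sketch(&out, in_buf + 1, tap_buf + 1, 5);
    return out; /* 0.5 * (1 + 2 + 3 + 4 + 5) */
}
```

Because the loop only dereferences `double` elements, the same function can serve both the aligned and the unaligned dispatch paths, which is why a separate `a_generic` variant adds nothing.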

jdemel · 2023-08-11T00:25:05Z

kernels/volk/volk_64f_x2_dot_prod_64f.h

+    for (; number < eighthPoints; number++) {
+
+        a0Val = _mm_load_pd(aPtr);
+        a1Val = _mm_load_pd(aPtr + 2);
+        a2Val = _mm_load_pd(aPtr + 4);
+        a3Val = _mm_load_pd(aPtr + 6);


This might actually be a source of the slow results. Compilers are incredibly smart nowadays, and this kind of manual "loop unrolling" might actually block some compiler optimizations.
You might want to start with godbolt.com and inspect the output for the generic kernel when you compile for a specific SIMD extension. I'm aware that it might not be trivial to find the optimized assembly code in the output. Still, it is a possible starting point.
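One thing to keep in mind when reading the assembly: GCC and Clang typically keep the summation order of a floating-point reduction unless they are allowed to reassociate (for example with `-ffast-math`), so a plain dot-product loop may stay scalar even with SIMD flags enabled. The sketch below is only an illustration of that effect. All function names are hypothetical: `dot_seq` is the kind of loop one could paste into Compiler Explorer, and `dot_two_lanes` sums the same products in the order a 2-lane vector reduction would use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Straight scalar reduction in index order. */
double dot_seq(const double* restrict a, const double* restrict b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Hypothetical name. Same products, summed as a 2-lane vector reduction
 * would: even indices in one lane, odd indices in the other. */
double dot_two_lanes(const double* restrict a, const double* restrict b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    double lane0 = 0.0;
    double lane1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        lane0 += a[i] * b[i];
        lane1 += a[i + 1] * b[i + 1];
    }
    if (i < n) {
        lane0 += a[i] * b[i]; /* leftover element when n is odd */
    }
    return lane0 + lane1;
}

/* Hypothetical helper: returns 1 if the two orders give different sums
 * for an input with heavy cancellation. */
int orders_disagree(void)
{
    const double a[4] = { 1e16, 1.0, -1e16, 1.0 };
    const double b[4] = { 1.0, 1.0, 1.0, 1.0 };
    return dot_seq(a, b, 4) != dot_two_lanes(a, b, 4);
}
```

With IEEE 754 doubles, the sequential order loses the first `1.0` when it is added to `1e16` and ends at `1.0`, while the two-lane order cancels the large terms first and ends at `2.0`. This is also a reason QA comparisons between generic and SIMD kernels need a tolerance rather than exact equality.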

Add volk_64f_x2_dot_prod_64f

3c4e7e1

Signed-off-by: Gonzalo J. Carracedo Carballal <[email protected]>

BatchDrake force-pushed the feature/dotprod64f branch from cc36044 to 3c4e7e1 Compare August 4, 2023 08:32

jdemel reviewed Aug 11, 2023
