Commit
Make use of posix_memalign for hfile buffer.
On an AMD EPYC 7713, aligning the buffer to cache line boundaries makes a very significant difference to fp->backend->read performance in the kernel. A modern Intel CPU did not demonstrate this difference.

x86 CPUs often have a cache line size of 64 bytes, and Apple Arm chips 128 bytes. I haven't tested whether Arm benefits from alignment during read calls, but we can check the size with sysconf(_SC_LEVEL1_DCACHE_LINESIZE). However, to avoid additional autoconfery I just picked 256, as it gives us headroom and is simple.

Speed-ups on the AMD EPYC:

    time bash -c 'for i in `seq 1 30`;do cat < ~/lustre/enwik9| ./bgzip -l5 -@32 > /dev/null;done'

    Unaligned
    real    0m45.012s
    user    10m7.661s
    sys     0m58.770s

    Aligned
    real    0m30.717s
    user    11m14.004s
    sys     0m32.921s

It is likely this could improve other bits of code too.
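For illustration, here is a minimal sketch of allocating a 256-byte aligned buffer with posix_memalign. The names alloc_aligned_buffer, is_aligned and BUF_ALIGN are hypothetical and are not taken from the hfile code; only the 256-byte alignment comes from the message above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 256 mirrors the alignment picked in the commit message. */
#define BUF_ALIGN 256

/* Hypothetical helper: allocate `capacity` bytes aligned to BUF_ALIGN.
 * Returns NULL on failure. The result is released with free(). */
static char *alloc_aligned_buffer(size_t capacity)
{
    void *p = NULL;

    /* posix_memalign requires a power-of-two alignment that is also
     * a multiple of sizeof(void *). */
    assert((BUF_ALIGN & (BUF_ALIGN - 1)) == 0);
    assert(BUF_ALIGN % sizeof(void *) == 0);

    /* posix_memalign returns 0 on success or an error number on
     * failure; it does not set errno. */
    if (posix_memalign(&p, BUF_ALIGN, capacity) != 0)
        return NULL;

    return p;
}

/* Hypothetical helper: check that a pointer sits on an `alignment` boundary. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t) p % alignment) == 0;
}
```

A caller would use alloc_aligned_buffer(capacity) where it previously called malloc(capacity), and free the buffer with free() as before. A fixed 256 covers both the 64-byte and 128-byte line sizes mentioned above without a runtime or configure-time query.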