Make use of posix_memalign for hfile buffer.
On an AMD EPYC 7713, aligning the buffer to cache line boundaries makes
a very significant difference to the performance of fp->backend->read
in the kernel.  A modern Intel CPU did not demonstrate this difference.

x86 CPUs often have a cache line size of 64 bytes, and Apple Arm chips
128 bytes.  I haven't tested whether Arm benefits from alignment during
read calls, but we can check the size with
sysconf(_SC_LEVEL1_DCACHE_LINESIZE).  However, to avoid additional
autoconfery I just picked 256, as it gives us headroom and is simple.
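
For illustration only, the runtime query mentioned above could look roughly like the sketch below. The commit does not do this; it uses the fixed value 256. The function name buffer_alignment() and the FALLBACK_ALIGNMENT macro are hypothetical, and _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension that is not available on every platform, hence the #ifdef.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical fallback, mirroring the fixed value picked in this commit. */
#define FALLBACK_ALIGNMENT 256

/* Illustrative helper, not an htslib function.  Returns the L1 data cache
 * line size when the platform reports a usable one, else the fallback. */
size_t buffer_alignment(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    /* sysconf may return -1 or 0 when the value is unknown.  Also reject
     * anything that is not a power of two of at least sizeof(void *),
     * because posix_memalign would refuse such an alignment. */
    if (line >= (long) sizeof(void *) && (line & (line - 1)) == 0)
        return (size_t) line;
#endif
    return FALLBACK_ALIGNMENT;
}
```

A fixed 256 avoids both the configure-time check and this runtime branch, at the cost of over-aligning on CPUs with smaller cache lines.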

Speed ups on the AMD EPYC:

time bash -c 'for i in `seq 1 30`;do cat < ~/lustre/enwik9| ./bgzip -l5 -@32 > /dev/null;done'

Unaligned
real    0m45.012s
user    10m7.661s
sys     0m58.770s

Aligned
real    0m30.717s
user    11m14.004s
sys     0m32.921s

It is likely this could improve other bits of code too.
jkbonfield committed Nov 14, 2024
1 parent 186d21b commit 63e1a18
Showing 2 changed files with 6 additions and 1 deletion.
2 changes: 1 addition & 1 deletion configure.ac
@@ -326,7 +326,7 @@ HTS_HIDE_DYNAMIC_SYMBOLS

 dnl FIXME This pulls in dozens of standard header checks
 AC_FUNC_MMAP
-AC_CHECK_FUNCS([gmtime_r fsync drand48 srand48_deterministic getauxval elf_aux_info])
+AC_CHECK_FUNCS([gmtime_r fsync drand48 srand48_deterministic getauxval elf_aux_info posix_memalign])

 # Darwin has a dubious fdatasync() symbol, but no declaration in <unistd.h>
 AC_CHECK_DECL([fdatasync(int)], [AC_CHECK_FUNCS(fdatasync)])
5 changes: 5 additions & 0 deletions hfile.c
@@ -113,8 +113,13 @@ hFILE *hfile_init(size_t struct_size, const char *mode, size_t capacity)
     // FIXME For now, clamp input buffer sizes so mpileup doesn't eat memory
     if (strchr(mode, 'r') && capacity > maxcap) capacity = maxcap;

+#ifdef HAVE_POSIX_MEMALIGN
+    if (posix_memalign((void **)&fp->buffer, 256, capacity) < 0)
+        goto error;
+#else
     fp->buffer = (char *) malloc(capacity);
     if (fp->buffer == NULL) goto error;
+#endif

     fp->begin = fp->end = fp->buffer;
     fp->limit = &fp->buffer[capacity];
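
As a standalone sketch of the same allocation pattern (not htslib code), the helper below wraps posix_memalign and checks the result. One detail worth noting: posix_memalign reports failure by returning a positive error number (EINVAL or ENOMEM) rather than a negative value or errno, so comparing the return value against zero with != 0 is the form that catches failures. The names alloc_buffer() and is_aligned() are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* request the posix_memalign declaration */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `capacity` bytes aligned to `alignment`,
 * which must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure.  The result is released with free(). */
char *alloc_buffer(size_t alignment, size_t capacity)
{
    void *buf = NULL;
    /* Success is exactly 0; any non-zero return is an error number. */
    if (posix_memalign(&buf, alignment, capacity) != 0)
        return NULL;
    return buf;
}

/* Hypothetical helper: non-zero if `p` sits on an `alignment` boundary. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t) p % alignment) == 0;
}
```

With these, alloc_buffer(256, capacity) plays the role of the HAVE_POSIX_MEMALIGN branch, and is_aligned(ptr, 256) can confirm the boundary in a test or debug build.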