scrooge: Optimize serialization performance
Problem

TReusableMemoryTransport uses a ByteArrayInputStream whose methods are synchronized. The transport is only ever used from a single thread, so the synchronization is unnecessary, yet it adds overhead, especially on newer JVMs without biased locking. The stream also introduces an additional level of indirection that the JIT needs to optimize away. If we controlled the underlying buffer, we could use methods from Unsafe to write primitives directly to it, without an intermediate byte array and without System.arraycopy, which can be expensive for small arrays.

Solution

Introduce a single-threaded implementation of MemoryTransport that directly manages the underlying byte array, and use direct memory writes for primitive types.

JDK21
```
Benchmark                                               (size)   Mode  Cnt         Score        Error  Units
TransportBenchmark.encodeAirlineBaseline                   N/A  thrpt    5      8975.741 ±     73.402  ops/s
TransportBenchmark.encodeAirlineByteArrayTransport         N/A  thrpt    5     14632.672 ±     18.125  ops/s
TransportBenchmark.encodeAirlineUnsafeTransport            N/A  thrpt    5     15742.416 ±    203.313  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                 5  thrpt    5   3165165.229 ±    865.324  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                10  thrpt    5   1996723.513 ±    813.442  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                50  thrpt    5    517236.667 ±   9625.233  ops/s
TransportBenchmark.encodeDoubleArrayBaseline               500  thrpt    5     55744.128 ±     31.492  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport       5  thrpt    5  13034582.982 ± 113618.553  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport      10  thrpt    5   7965754.309 ±  11327.671  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport      50  thrpt    5   1976067.803 ±   8051.731  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport     500  thrpt    5    212641.665 ±    279.586  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport          5  thrpt    5  34390166.003 ±  42283.346  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport         10  thrpt    5  26869677.811 ± 761342.592  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport         50  thrpt    5   8028375.541 ±  75541.740  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport        500  thrpt    5   1107122.950 ±   7175.220  ops/s
```

TwitterJDK 11
```
Benchmark                                               (size)   Mode  Cnt         Score        Error  Units
TransportBenchmark.encodeAirlineBaseline                   N/A  thrpt    5     11765.483 ±    192.296  ops/s
TransportBenchmark.encodeAirlineByteArrayTransport         N/A  thrpt    5     14030.813 ±     49.984  ops/s
TransportBenchmark.encodeAirlineUnsafeTransport            N/A  thrpt    5     15385.809 ±    428.500  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                 5  thrpt    5   7024298.511 ±  82939.013  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                10  thrpt    5   4436471.652 ±  76548.770  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                50  thrpt    5   1136314.914 ±   6571.789  ops/s
TransportBenchmark.encodeDoubleArrayBaseline               500  thrpt    5    131403.572 ±   1452.554  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport       5  thrpt    5  10450745.663 ±  23594.889  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport      10  thrpt    5   6542692.999 ±  63437.149  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport      50  thrpt    5   1710071.757 ±  10593.509  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport     500  thrpt    5    187700.152 ±    969.060  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport          5  thrpt    5  25352502.715 ±  65770.557  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport         10  thrpt    5  19030775.867 ±  76415.965  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport         50  thrpt    5   6069935.386 ±  19759.800  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport        500  thrpt    5    698704.098 ±   1804.536  ops/s
```

TwitterJDK with -XX:-UseBiasedLocking
```
Benchmark                                               (size)   Mode  Cnt         Score        Error  Units
TransportBenchmark.encodeAirlineBaseline                   N/A  thrpt    5      8908.851 ±     94.576  ops/s
TransportBenchmark.encodeAirlineByteArrayTransport         N/A  thrpt    5     13334.376 ±    208.466  ops/s
TransportBenchmark.encodeAirlineUnsafeTransport            N/A  thrpt    5     14206.590 ±    120.613  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                 5  thrpt    5   3305481.887 ±   9922.319  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                10  thrpt    5   2152912.133 ±   3121.711  ops/s
TransportBenchmark.encodeDoubleArrayBaseline                50  thrpt    5    605487.993 ±   4952.450  ops/s
TransportBenchmark.encodeDoubleArrayBaseline               500  thrpt    5     67344.671 ±     51.999  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport       5  thrpt    5   9273549.548 ±   5494.655  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport      10  thrpt    5   6435802.855 ±   6062.881  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport      50  thrpt    5   1712034.742 ±   2193.393  ops/s
TransportBenchmark.encodeDoubleArrayByteArrayTransport     500  thrpt    5    187845.588 ±    703.777  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport          5  thrpt    5  25114440.253 ± 272593.238  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport         10  thrpt    5  19643037.550 ± 207643.475  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport         50  thrpt    5   6659417.813 ±  51079.641  ops/s
TransportBenchmark.encodeDoubleArrayUnsafeTransport        500  thrpt    5    698239.188 ±   3356.311  ops/s
```

Differential Revision: https://phabricator.twitter.biz/D1176860
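
To illustrate the idea behind the ByteArrayTransport variant, here is a minimal sketch of an unsynchronized, growable byte buffer that writes primitives straight into its own array. It is not the code from this commit: the class name `SimpleByteArrayTransport` and all of its methods are hypothetical, and it uses plain shifts and array stores rather than Unsafe. Values are written big-endian, as the Thrift binary protocol does.

```java
import java.util.Arrays;

/**
 * Hypothetical single-threaded buffer (illustration only, not the commit's class).
 * No method is synchronized, and primitives are stored directly into the
 * backing array without a temporary byte[] or System.arraycopy.
 */
final class SimpleByteArrayTransport {
    private byte[] buf;
    private int pos;

    SimpleByteArrayTransport(int initialCapacity) {
        // Keep at least 8 bytes so a single long or double always fits.
        buf = new byte[Math.max(initialCapacity, 8)];
    }

    /** Grows the backing array (at least doubling it) when `extra` bytes do not fit. */
    private void ensureCapacity(int extra) {
        int needed = pos + extra;
        if (needed > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, needed));
        }
    }

    /** Writes a 64-bit value in big-endian order, one array store per byte. */
    void writeI64(long v) {
        ensureCapacity(8);
        buf[pos]     = (byte) (v >>> 56);
        buf[pos + 1] = (byte) (v >>> 48);
        buf[pos + 2] = (byte) (v >>> 40);
        buf[pos + 3] = (byte) (v >>> 32);
        buf[pos + 4] = (byte) (v >>> 24);
        buf[pos + 5] = (byte) (v >>> 16);
        buf[pos + 6] = (byte) (v >>> 8);
        buf[pos + 7] = (byte) v;
        pos += 8;
    }

    /** Writes a double as its IEEE 754 bit pattern. */
    void writeDouble(double d) {
        writeI64(Double.doubleToLongBits(d));
    }

    /** Bulk writes still use System.arraycopy, which pays off for larger inputs. */
    void write(byte[] src, int off, int len) {
        ensureCapacity(len);
        System.arraycopy(src, off, buf, pos, len);
        pos += len;
    }

    /** Rewinds the write position so the same array can be reused. */
    void reset() {
        pos = 0;
    }

    int length() {
        return pos;
    }

    /** Returns a copy of the bytes written so far. */
    byte[] toByteArray() {
        return Arrays.copyOf(buf, pos);
    }
}
```

A caller would create one instance per thread, call `writeDouble` for each element, read the result with `toByteArray()`, and call `reset()` before encoding the next message. An Unsafe-based variant would replace the eight byte stores in `writeI64` with a single wide store, which is the difference the UnsafeTransport rows above measure.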