[perf] hashing function improvements #3242
base: main
Conversation
Have you confirmed that this performs better in all modern engines?
Here is a comparison. The benchmark code is available here and requires bun (for JSC) and gjs (for SpiderMonkey). |
It looks good, cc @emmatown |
```ts
if (input.length > bufferLength) {
  // bufferLength must be a multiple of 4 to satisfy Int32Array constraints
  bufferLength = input.length + (4 - input.length % 4)
```
This isn't allocating enough space for strings where utf8 length != utf16 length and will just ignore some input at the end?
Right, and there's no way to know the byte length until it has been iterated/encoded. I'll allocate 2x the space to ensure there's enough space for the worst case.
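One way to size the scratch buffer conservatively, sketched below with a hypothetical helper name (not code from this PR): UTF-8 never needs more than 3 bytes per UTF-16 code unit (a surrogate pair is 2 units for 4 bytes), so `3 * input.length` rounded up to a multiple of 4 is always enough for the `Int32Array` view.

```typescript
// Hypothetical helper: return a byte count that is always large enough for
// the UTF-8 encoding of `input` and is a multiple of 4, so an Int32Array
// can view the whole buffer.
function requiredBufferLength(input: string): number {
  // Worst case per UTF-16 code unit:
  // - units < 0x80  -> 1 byte
  // - units < 0x800 -> 2 bytes
  // - other BMP units -> 3 bytes
  // - surrogate pairs: 2 units -> 4 bytes (still <= 3 bytes per unit)
  const worstCase = input.length * 3;
  // Round up to the next multiple of 4.
  return (worstCase + 3) & ~3;
}
```
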
```ts
if (hasTextEncoder === false) {
  const bytes = []
  for (let i = 0; i < input.length; i++) {
    const codePoint = input.charCodeAt(i)
    if (codePoint > 0xff) {
      bytes.push(codePoint >>> 8)
      bytes.push(codePoint & 0xff)
    } else {
      bytes.push(codePoint)
    }
  }
  uint8View = new Uint8Array(bytes)
  int32View = new Int32Array(uint8View.buffer, 0, Math.floor(bytes.length / 4))

  return bytes.length
}
```
This needs to encode like TextEncoder does, since the hash needs to return the same results when, e.g., TextEncoder exists on a server-rendered page but not on the client.
Self notes:
- utf16 to unicode
- unicode to utf8
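The two steps above (UTF-16 code units → Unicode code points → UTF-8 bytes) can be sketched as a TextEncoder-compatible fallback. This is a sketch with a hypothetical name (`utf8Encode`), not the PR's actual code:

```typescript
// Sketch of a TextEncoder-compatible fallback: decode UTF-16 code units to
// Unicode code points, then emit UTF-8 bytes, so the hash input is identical
// whether or not TextEncoder is available.
function utf8Encode(input: string): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let codePoint = input.charCodeAt(i);
    // Combine a surrogate pair into a single code point (utf16 -> unicode).
    if (codePoint >= 0xd800 && codePoint <= 0xdbff && i + 1 < input.length) {
      const low = input.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        codePoint = 0x10000 + ((codePoint - 0xd800) << 10) + (low - 0xdc00);
        i++;
      }
    }
    // Emit UTF-8 bytes (unicode -> utf8).
    if (codePoint < 0x80) {
      bytes.push(codePoint);
    } else if (codePoint < 0x800) {
      bytes.push(0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f));
    } else if (codePoint < 0x10000) {
      bytes.push(
        0xe0 | (codePoint >> 12),
        0x80 | ((codePoint >> 6) & 0x3f),
        0x80 | (codePoint & 0x3f)
      );
    } else {
      bytes.push(
        0xf0 | (codePoint >> 18),
        0x80 | ((codePoint >> 12) & 0x3f),
        0x80 | ((codePoint >> 6) & 0x3f),
        0x80 | (codePoint & 0x3f)
      );
    }
  }
  return bytes;
}
```
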
```ts
  return encoder!.encodeInto(input, uint8View).written as number;
}
```

```ts
export default function murmur2(input: string): string {
```
While I was exploring the data a bit more I realized that the hashing function is re-hashing the same strings a lot, and that we could speed it up by caching the results in a Map.
One note before the results, I was testing with a dataset made of a bunch of MUI components copy/pasted. For this benchmark I switched to a new dataset that is directly extracted from the MUI dashboard template, which I think is more realistic and reflects production setups even better.
Here are the results, which indicate a substantial improvement for that dataset when caching the murmur2 result. The implementation is very naive, and I found that trying to mutate/update the map with more complex rules actually decreased the gains by a lot, probably because adding bookkeeping on each hash iteration is too much work. For comparison I also added a case for the cached version, but with a dataset with absolutely no duplicates, which means the caching code is pure overhead.
Last note: for this particular dataset, unlike the last one, SpiderMonkey shows a small/moderate performance decrease from `original` to `improved`, not sure why. This new dataset seems to have somewhat shorter style strings, which might explain it. I still think `original` to `improved` is a gain, considering that Firefox accounts for less than 3% of the traffic right now.
So to sum this up, caching the hash result is a gain if we assume that we're going to see a lot of duplicates following each other, which seems to be the case for MUI usage. I would love to have your opinion on this change.
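The naive Map cache described above could look like the sketch below (hypothetical names, not the PR's actual code); deliberately no eviction or bookkeeping, since per-lookup maintenance was found to cost more than it saved:

```typescript
// Naive memoization sketch: cache hash results keyed by the input string.
// No eviction policy -- extra bookkeeping on each lookup erased the gains.
const hashCache = new Map<string, string>();

function cachedHash(input: string, hash: (s: string) => string): string {
  let result = hashCache.get(input);
  if (result === undefined) {
    result = hash(input);
    hashCache.set(input, result);
  }
  return result;
}
```
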
While I was working on #3241 I saw a few issues with your existing JavaScript implementation of murmur2; here are a few fixes. I've included a comparison with the WASM alternatives just for context, but the `murmur2_original` and `murmur2_improved` entries of this summary are the important ones. As you can see, we can more than double the performance with the changes in this PR.
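For context, the pure-JavaScript murmur2 variant widely used by CSS-in-JS libraries has roughly the shape below; this is a sketch of that general pattern, not the exact code being benchmarked here. It hashes the low byte of each UTF-16 code unit and returns a base-36 string:

```typescript
// Sketch of the classic JavaScript murmur2 (32-bit) string hash.
function murmur2(str: string): string {
  let h = 0;
  let k: number;
  let i = 0;
  let len = str.length;
  for (; len >= 4; ++i, len -= 4) {
    // Pack 4 bytes into one 32-bit word.
    k =
      (str.charCodeAt(i) & 0xff) |
      ((str.charCodeAt(++i) & 0xff) << 8) |
      ((str.charCodeAt(++i) & 0xff) << 16) |
      ((str.charCodeAt(++i) & 0xff) << 24);
    // Multiply by the magic constant 0x5bd1e995 in 16-bit halves so the
    // math stays within 32-bit integer precision.
    k = (k & 0xffff) * 0x5bd1e995 + (((k >>> 16) * 0xe995) << 16);
    k ^= k >>> 24;
    h =
      ((k & 0xffff) * 0x5bd1e995 + (((k >>> 16) * 0xe995) << 16)) ^
      ((h & 0xffff) * 0x5bd1e995 + (((h >>> 16) * 0xe995) << 16));
  }
  // Mix in the 1-3 trailing bytes.
  switch (len) {
    case 3:
      h ^= (str.charCodeAt(i + 2) & 0xff) << 16;
    // falls through
    case 2:
      h ^= (str.charCodeAt(i + 1) & 0xff) << 8;
    // falls through
    case 1:
      h ^= str.charCodeAt(i) & 0xff;
      h = (h & 0xffff) * 0x5bd1e995 + (((h >>> 16) * 0xe995) << 16);
  }
  // Final avalanche.
  h ^= h >>> 13;
  h = (h & 0xffff) * 0x5bd1e995 + (((h >>> 16) * 0xe995) << 16);
  return ((h ^ (h >>> 15)) >>> 0).toString(36);
}
```
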