4x Faster LUT via StringZilla #36
Conversation
StringZilla brings hardware-accelerated Look-Up Table transformations that can leverage AVX-512 VBMI instructions on recent Intel Ice Lake CPUs (installed in most DGX servers), as well as older Intel Haswell CPUs and newer Arm CPUs, such as AWS Graviton 4. Preliminary benchmarks on new x86 CPUs suggest up to 4x performance improvements over the OpenCV baselines.
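For readers unfamiliar with the operation being accelerated: a byte-level LUT transform maps every byte of an image through a 256-entry table. Python's built-in bytes.translate performs the same transform that StringZilla's sz.translate accelerates with SIMD, so this stdlib-only sketch illustrates the semantics (not the speed):

```python
# A 256-entry look-up table maps each possible byte value to a new one.
# Build a LUT that adds 10 to every pixel value, saturating at 255.
lut = bytes(min(i + 10, 255) for i in range(256))

pixels = bytes([0, 100, 250, 255])
shifted = pixels.translate(lut)  # same semantics as sz.translate(pixels, lut)
print(list(shifted))  # → [10, 110, 255, 255]
```

StringZilla's advantage is that the same table lookup can be vectorized (e.g. with AVX-512 VBMI permutes), processing 64 bytes per instruction instead of one.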
Reviewer's Guide by Sourcery
This pull request introduces StringZilla, a hardware-accelerated Look-Up Table (LUT) transformation library, to improve the performance of LUT operations. No diagrams were generated, as the changes look simple and do not need a visual representation.
File-Level Changes
Hey @ashvardanian - I've reviewed your changes - here's some feedback:
Overall Comments:
- Please provide benchmarks to validate the 4x performance improvement claim across different CPU architectures.
- Consider implementing a fallback mechanism for systems without the required hardware support to maintain broad compatibility.
- Add comments explaining the StringZilla implementation to improve code readability and maintainability.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
def serialize_lookup_recover(img: np.ndarray, lut: np.ndarray) -> np.ndarray:
    # Encode the image into bytes, apply the look-up table, then decode back into a NumPy array
    img_bytes = img.tobytes()
    lut_bytes = lut.tobytes()
    # Python bytes are immutable, so keep the translated copy that sz.translate returns
    translated = sz.translate(img_bytes, lut_bytes)
    return np.frombuffer(translated, dtype=img.dtype).reshape(img.shape)
question (performance): Can you provide context for replacing cv2.LUT with serialize_lookup_recover?
This change seems significant. Could you share any performance benchmarks or explain the rationale behind this new approach? It would be helpful to understand the benefits over the previous cv2.LUT method.
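A micro-benchmark along the lines the reviewer requests could look like the sketch below. Since neither OpenCV nor StringZilla is assumed installed here, cv2.LUT is stood in for by NumPy fancy indexing and sz.translate by bytes.translate; both stand-ins compute the identical result, so the comparison shape carries over:

```python
import timeit

import numpy as np

# Hypothetical stand-ins: lut[img] approximates cv2.LUT, and
# bytes.translate approximates sz.translate (same byte-level semantics).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(1024, 1024), dtype=np.uint8)
lut = np.arange(255, -1, -1, dtype=np.uint8)  # inversion table

# Sanity check: both paths must produce identical pixels.
via_indexing = lut[img]
via_translate = np.frombuffer(
    img.tobytes().translate(lut.tobytes()), dtype=np.uint8
).reshape(img.shape)
assert np.array_equal(via_indexing, via_translate)

t_index = timeit.timeit(lambda: lut[img], number=20)
t_bytes = timeit.timeit(lambda: img.tobytes().translate(lut.tobytes()), number=20)
print(f"fancy indexing: {t_index:.4f}s  bytes.translate: {t_bytes:.4f}s")
```

Absolute timings will vary by CPU; the point of such a harness is to make the claimed speedup reproducible across the architectures the reviewer lists.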
@@ -42,16 +45,27 @@ def create_lut_array(
         raise ValueError(f"Unsupported operation: {operation}")


-def apply_lut(img: np.ndarray, value: float | np.ndarray, operation: Literal["add", "multiply", "power"]) -> np.ndarray:
+def apply_lut(
issue (complexity): Consider refactoring the implementation to improve code organization and clarity.
While the new implementation using stringzilla may offer performance benefits, it does increase code complexity. Consider the following suggestions to balance performance and readability:
- Move serialize_lookup_recover outside of apply_lut:
def serialize_lookup_recover(img: np.ndarray, lut: np.ndarray) -> np.ndarray:
    img_bytes = img.tobytes()
    lut_bytes = lut.tobytes()
    # Python bytes are immutable, so use the translated copy that sz.translate returns
    translated = sz.translate(img_bytes, lut_bytes)
    return np.frombuffer(translated, dtype=img.dtype).reshape(img.shape)


def apply_lut(
    img: np.ndarray,
    value: float | np.ndarray,
    operation: Literal["add", "multiply", "power"],
) -> np.ndarray:
    dtype = img.dtype

    if isinstance(value, (int, float)):
        lut = create_lut_array(dtype, value, operation)
        return serialize_lookup_recover(img, clip(lut, dtype))

    num_channels = img.shape[-1]
    luts = create_lut_array(dtype, value, operation)
    return cv2.merge(
        [serialize_lookup_recover(img[:, :, i], clip(luts[i], dtype)) for i in range(num_channels)]
    )
- Add comments explaining the performance benefits:
def serialize_lookup_recover(img: np.ndarray, lut: np.ndarray) -> np.ndarray:
    # This function uses stringzilla for efficient byte-level LUT application,
    # which can be faster than cv2.LUT for large images or frequent calls.
    img_bytes = img.tobytes()
    lut_bytes = lut.tobytes()
    # bytes are immutable, so capture the translated copy returned by sz.translate
    translated = sz.translate(img_bytes, lut_bytes)
    return np.frombuffer(translated, dtype=img.dtype).reshape(img.shape)
- Consider adding a benchmark comparison between this method and cv2.LUT to justify the added complexity. If the performance gain is minimal, you might want to revert to the simpler cv2.LUT implementation.
- If you keep this implementation, add a note in the function docstring explaining why this approach was chosen over cv2.LUT.
These changes will help maintain the potential performance benefits while improving code readability and maintainability.
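To show that the suggested refactor hangs together end to end, here is a self-contained, runnable sketch of it. bytes.translate stands in for sz.translate, and create_lut_array/clip are simplified hypothetical versions covering only uint8 and the "add" operation (the real helpers in the repository support more dtypes and operations):

```python
import numpy as np


def create_lut_array(dtype, value, operation):
    # Simplified hypothetical helper: builds an unclipped 256-entry table.
    base = np.arange(256, dtype=np.float64)
    if operation == "add":
        return base + value
    raise ValueError(f"Unsupported operation: {operation}")


def clip(lut, dtype):
    # Simplified hypothetical helper: saturate to the uint8 range.
    return np.clip(lut, 0, 255).astype(dtype)


def serialize_lookup_recover(img, lut):
    # bytes.translate applies the 256-entry LUT at byte level and returns
    # a new bytes object (Python bytes are immutable).
    translated = img.tobytes().translate(lut.tobytes())
    return np.frombuffer(translated, dtype=img.dtype).reshape(img.shape)


def apply_lut(img, value, operation):
    lut = create_lut_array(img.dtype, value, operation)
    return serialize_lookup_recover(img, clip(lut, img.dtype))


img = np.array([[0, 100], [200, 255]], dtype=np.uint8)
print(apply_lut(img, 60, "add"))  # → [[ 60 160] [255 255]]
```

The factored-out serialize_lookup_recover is independently testable against a reference implementation such as lut[img], which is one way to back the benchmark and correctness requests above.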
Already added.
The results will differ depending on the CPU model. I generally recommend using r7iz and r8g AWS instances for profiling.
Summary by Sourcery
Implement a faster Look-Up Table transformation using StringZilla, which utilizes hardware acceleration on supported CPUs, replacing the existing OpenCV-based approach for significant performance gains.
New Features:
Enhancements: