Enable Embree build with SYCL support and OneAPI 2024.1.0 #6808

Merged
36 commits merged into isl-org:main on Nov 20, 2024

Conversation

lumurillo
Contributor

@lumurillo lumurillo commented May 31, 2024

Enable SYCL support for the Embree package.

Type

  • Bug fix (non-breaking change which fixes an issue): Fixes #
  • New feature (non-breaking change which adds functionality). Resolves #
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) Resolves #

Motivation and Context

This change enables a SYCL GPU implementation of the RaycastingScene class.

Checklist:

  • I have run python util/check_style.py --apply to apply Open3D code style
    to my code.
  • This PR changes Open3D behavior or adds new functionality.
    • Both C++ (Doxygen) and Python (Sphinx / Google style) documentation is
      updated accordingly.
    • I have added or updated C++ and / or Python unit tests OR included test
      results
      (e.g. screenshots or numbers) here.
  • I will follow up and update the code if CI fails.
  • For fork PRs, I have selected Allow edits from maintainers.

Description

  • Updated the OneAPI version for the Docker files.
  • Updated the CMake version.
  • Disabled the build of the ISPC module.

update-docs bot commented May 31, 2024

Thanks for submitting this pull request! The maintainers of this repository would appreciate it if you could update the CHANGELOG.md based on your changes.

@ssheorey
Member

ssheorey commented Jun 3, 2024

Hi @lumurillo can you update the embree Python examples to use SYCL devices, if available?

examples/python/geometry/ray_casting_closest_geometry.py
examples/python/geometry/ray_casting_sdf.py
examples/python/geometry/ray_casting_to_image.py

@ssheorey ssheorey marked this pull request as draft June 14, 2024 19:21
@ssheorey ssheorey self-requested a review June 14, 2024 19:21
@lumurillo lumurillo force-pushed the lumurillo/use-oneapi-2024.1.0 branch from 94a7db2 to 048008c Compare October 4, 2024 17:35
cpp/open3d/t/geometry/RaycastingScene.h (outdated)
struct CPUImpl;
#ifdef BUILD_SYCL_MODULE
struct SYCLImpl;
#endif
Member

Same issue here. To keep the ABI the same for SYCL and C++ builds, use a single pointer to DevImpl (device implementation) instead of keeping two structs. In the constructor, we can initialize it to CPUImpl* or SYCLImpl* as appropriate.

Member

I think it would be good to keep both, e.g. if the user explicitly wants to choose CPU, despite having a SYCL device. In this case, let's keep both all the time.

I would recommend changing each XImpl to XImpl* (for all), in line with the PIMPL pattern (pointer to implementation). This ensures the ABI doesn't change later, even if we add data to the XImpl structs.
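
A minimal sketch of the pointer-to-implementation layout suggested above. The class name RaycastingSceneSketch, the members cpu_impl_ and sycl_impl_, the use_sycl flag, and the contents of the two structs are illustrative assumptions, not the code merged in this PR. Both pointers are always declared, so the public class layout does not depend on whether the SYCL module is built or on what the implementation structs contain.

```cpp
#include <cassert>
#include <memory>

// Hypothetical sketch, not Open3D's actual class. The "header" part only
// forward-declares the implementation structs and stores pointers to them,
// so the layout of RaycastingSceneSketch is independent of their contents.
class RaycastingSceneSketch {
public:
    explicit RaycastingSceneSketch(bool use_sycl);
    ~RaycastingSceneSketch();  // Defined below, where the impl types are complete.

    bool UsesSYCL() const { return sycl_impl_ != nullptr; }

private:
    struct CPUImpl;
    struct SYCLImpl;
    std::unique_ptr<CPUImpl> cpu_impl_;
    std::unique_ptr<SYCLImpl> sycl_impl_;
};

// The "source file" part. Fields can be added here later without changing
// the size or layout of the public class. Both structs are placeholders.
struct RaycastingSceneSketch::CPUImpl {
    int placeholder_state = 0;
};
struct RaycastingSceneSketch::SYCLImpl {
    int placeholder_state = 0;
    double field_added_later = 0.0;
};

RaycastingSceneSketch::RaycastingSceneSketch(bool use_sycl) {
    // The caller can pick the CPU path even when a SYCL device exists.
    if (use_sycl) {
        sycl_impl_ = std::make_unique<SYCLImpl>();
    } else {
        cpu_impl_ = std::make_unique<CPUImpl>();
    }
    assert(cpu_impl_ != nullptr || sycl_impl_ != nullptr);
}

RaycastingSceneSketch::~RaycastingSceneSketch() = default;
```

The destructor is defined after the struct definitions because std::unique_ptr needs a complete type at the point where it deletes the object.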

typedef Eigen::Matrix<float, 2, 1, Eigen::DontAlign> Vec2f;
typedef Eigen::Vector3f Vec3f;

void enablePersistentJITCache()
Member

SYCL only

Contributor Author

cpp/open3d/t/geometry/RaycastingScene.cpp
Use sycl_target_sources for files with sycl code in cmake.

style fix
#else
setenv("SYCL_CACHE_PERSISTENT","1",1);
setenv("SYCL_CACHE_DIR","cache",1);
#endif
Member

Move this to SYCLUtils.cpp and let the user enable or disable caching. Also, we should not overwrite user preferences: if these environment variables are already set, do not change them. SYCL_CACHE_DIR should be somewhere in the home folder. For reference, here are the default CUDA PTX cache folders:
https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching/

Perhaps $HOME/.sycl/ComputeCache/?

Member

Defaults for sycl:
https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md.
I think we should not change the default cache location. Enabling cache by default makes sense.
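
One way the two suggestions in this thread could be combined is sketched below: enable the persistent cache by default, never overwrite a value the user has already set, and leave SYCL_CACHE_DIR alone so the runtime's default location applies. The helper names SetEnvIfUnset and EnablePersistentJITCacheByDefault are hypothetical and are not part of Open3D or SYCL; only getenv, setenv, and _putenv_s are real library calls.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not an Open3D or SYCL API): sets `name` to `value`
// only when the variable is not already present in the environment.
// Returns true if the variable was written by this call.
inline bool SetEnvIfUnset(const char* name, const char* value) {
    assert(name != nullptr && value != nullptr);
    if (std::getenv(name) != nullptr) {
        return false;  // Respect the user's existing setting.
    }
#ifdef _WIN32
    return _putenv_s(name, value) == 0;
#else
    return setenv(name, value, /*overwrite=*/0) == 0;
#endif
}

// Hypothetical entry point: turns the persistent JIT cache on by default.
// SYCL_CACHE_DIR is deliberately not touched.
inline bool EnablePersistentJITCacheByDefault() {
    return SetEnvIfUnset("SYCL_CACHE_PERSISTENT", "1");
}
```

With this shape, a user who exports SYCL_CACHE_PERSISTENT=0 before starting the program keeps caching disabled, which covers the enable/disable request without a separate switch.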

Contributor Author

@ssheorey ssheorey force-pushed the lumurillo/use-oneapi-2024.1.0 branch 2 times, most recently from cb71f22 to aa8f03e Compare October 10, 2024 17:38
@lumurillo lumurillo force-pushed the lumurillo/use-oneapi-2024.1.0 branch from aa8f03e to d3e68ea Compare October 12, 2024 17:50
@ssheorey ssheorey added this to the v0.19 milestone Nov 6, 2024
@lumurillo lumurillo force-pushed the lumurillo/use-oneapi-2024.1.0 branch from f4c3a26 to 6284e3d Compare November 9, 2024 14:07
@lumurillo lumurillo marked this pull request as ready for review November 9, 2024 17:00
Member

@ssheorey ssheorey left a comment

Hi Luis, looks good! See some review comments below.

[Performance] About queue.wait_and_throw(): I would recommend not calling it in each function. Instead, provide a wait_and_throw() at the SYCL utils level. This lets users launch a raycast task on the GPU, do something else on the CPU (for example, read more data), and then call queue.wait_and_throw() to get the results.

The current alternative would be multithreading (check if GIL is released for raycast ops in Python), but this needs quite a bit more effort from the user.

@@ -12,10 +12,14 @@
#include "open3d/t/geometry/RaycastingScene.h"

// This header is in the embree src dir (embree/src/ext_embree/..).
Member

move comment

Member

[Optional] Does it make sense to move the SYCL code to a new file RaycastingSceneSYCL.cpp?

queue_.wait_and_throw();

// Free the allocated memory
sycl::free(previous_geom_prim_ID_tfar, queue_);
Member

Is this a memory leak if the queue above throws an exception? To prevent that, set previous_geom_prim_ID_tfar = nullptr; after the free, and free it again in the SYCLImpl destructor if the pointer is not null.
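
The pattern proposed in this comment can be sketched without a SYCL compiler. In the sketch below, std::malloc and std::free stand in for the SYCL USM allocation and sycl::free, and a `fail` flag stands in for queue_.wait_and_throw() raising an exception. The struct name ScratchOwnerSketch and its members are hypothetical, not the PR's SYCLImpl.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdexcept>

// Hypothetical stand-in for SYCLImpl. The scratch buffer is a member, so
// the destructor can release it if an exception skipped the normal free.
struct ScratchOwnerSketch {
    float* scratch = nullptr;  // Plays the role of previous_geom_prim_ID_tfar.

    // `fail` simulates an exception thrown by queue_.wait_and_throw().
    void Run(std::size_t n, bool fail) {
        assert(n > 0);
        Release();  // Drop any buffer left over from an earlier failed call.
        scratch = static_cast<float*>(std::malloc(n * sizeof(float)));
        if (scratch == nullptr) throw std::bad_alloc();
        if (fail) throw std::runtime_error("simulated asynchronous error");
        Release();  // Normal path: free and reset the pointer.
    }

    // Frees the buffer if one is held and resets the pointer to null, so a
    // second call (for example from the destructor) does nothing.
    void Release() {
        if (scratch != nullptr) {
            std::free(scratch);
            scratch = nullptr;
        }
    }

    ~ScratchOwnerSketch() { Release(); }
};
```

An alternative with the same effect is to hold the buffer in a std::unique_ptr with a custom deleter, which frees on every exit path without a hand-written destructor.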

Contributor Author

@@ -689,16 +1169,50 @@ struct RaycastingScene::Impl {
LoopFn);
}
}

Member

Can we use STL here for ArraySum and ArrayPartialSum, and memcpy for CopyArray?
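
A sketch of what the STL-based versions might look like. The signatures below are assumptions, since the PR's helpers are not shown in this thread, and ArrayPartialSum is assumed to be an inclusive prefix sum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <numeric>
#include <type_traits>

// Hypothetical signatures; the helpers in the PR may differ.

// Sum of data[0..n), computed with std::accumulate.
template <typename T>
T ArraySum(const T* data, std::size_t n) {
    return std::accumulate(data, data + n, T{});
}

// Inclusive prefix sum: out[i] = in[0] + ... + in[i].
template <typename T>
void ArrayPartialSum(const T* in, std::size_t n, T* out) {
    std::partial_sum(in, in + n, out);
}

// Byte-wise copy of n elements. memcpy is only valid for trivially
// copyable types and for ranges that do not overlap.
template <typename T>
void CopyArray(const T* src, std::size_t n, T* dst) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy requires a trivially copyable type");
    if (n == 0) return;
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst, src, n * sizeof(T));
}
```

These only apply to host-side arrays. Data that lives in device memory would still need a device-side copy or reduction.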

Contributor Author

@@ -122,14 +122,10 @@ def test_test_lots_of_occlusions(device):
def test_add_triangle_mesh(device):
cube = o3d.t.geometry.TriangleMesh.from_legacy(
Member

Can remove from_legacy for mesh creation functions now.

@ssheorey
Member

Some updates needed to pass the CUDA CI tests. I made a branch copy and ran them here:
https://github.com/isl-org/Open3D/actions/runs/11887514382

@lumurillo I'll take care of this.

@ssheorey
Member

Looks good. We can visit the wait_and_throw() optimization later. Thanks @lumurillo !

Contributor

@benjaminum benjaminum left a comment

lgtm

@ssheorey ssheorey merged commit 5f4985b into isl-org:main Nov 20, 2024
29 of 33 checks passed
3 participants