-
Notifications
You must be signed in to change notification settings - Fork 741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] How OpenCL device mode is being reused for SYCL #59
Comments
Just to list a couple of points that should allow us to get going: 1. Since SYCL device model is the same as OpenCL, how much diagnostics from OpenCL do you plan to reuse? I think it would be important for Clang users if SYCL allows similar functionality to be compiled/rejected, so that vendors that supported OpenCL in Clang (not necessarily just upstream!) can have a safe path to support SYCL. Also just a general note on your prototype implementation, that has very little diagnostics functionality. It is perfectly reasonable that the prototype doesn't provide full functionality however it is important to select the directions that wouldn't prevent implementing important functionality. Clang has a reputation of being high quality compiler frontend and one of its main strengths has always been helpful diagnostics. It is important therefore that new implementations adhere to the same philosophy. I would therefore advise to look at this topic even for the host compilation mode - especially in the space of interfacing between host and device or things like restrictions to 2. OpenCL has a number of special opaque types - sampler, image, events, etc... that I believe appear in SYCL too. Does SYCL implementation plan to reuse those? If yes how? 3. OpenCL provides support for explicit address spaces. SYCL doesn't have explicit address spaces but the memory hierarchy is identical to OpenCL and the behavior as described indicates that compiler support is required. What will be your strategy for implementing address space support for SYCL? 4. How do you implement vector functionality - in particular swizzles or conversion ranking? Is any OpenCL functionality required? 5. Do you plan to reuse functionality of OpenCL kernel? If yes how? Not directly related to OpenCL functionality but how do you plan to solve outlining of data structures? The proposed patch https://reviews.llvm.org/D60455 suggest solution for the device functions but don't you need to mark data structures that are to be compiled for the device too i.e. classes that can't use virtual functions on the device? Also should template function also be outlined? 6. Do you plan to reuse OpenCL extensions, OpenCL command line options or builtin functions functionality? 7. Do you plan to provide implementation of SYCL_EXTERNAL? 8. Since the prototype implementation uses 2 separate compilation phases for host and device with different target information - how do you plan to ensure the data structures are the same on the host and device? 9. The prototype implementation makes use of various AST visitors. Is there any information why they are required? 10. There is some functionality specific to Intel FPGAs - do you plan to add this to Clang too? |
@AnastasiaStulova, thanks for posting these questions. I'll try to address them ASAP. Meanwhile, I've made a small experiment with re-using OpenCL sampler type in recently added SYCL class. Here is few early observations:
Diff is here: bader#1 I would really like to know your feedback on that. |
Ok, it looks very reasonable.
I think it's very reasonable to pick a name prefixed with
That's perfectly understandable but considering that SYCL language is designed as a library it should hopefully not be counter intuitive to the developers.
Would lambda still be removed in non optimized mode? |
I picked that have for a few reasons:
Do you want me to update the LLVM IR opaque type name to align it with
If I understand the idea behind SYCL API design correctly, SYCL types restrictions are supposed to be enforced by the C++ wrapper class design and do not require non-standard C++ diagnostics. sampler s;
sampler t = s; // error Assignment can be disallowed by assignment operator overloading. This approach should minimize the changes in the compiler.
No. It would look like we allocate a structure on the stack, copy kernel arguments into this structure and pass it to other function. I don't think this breaks anything as it's quite similar to how "device enqueue" feature in implemented in OpenCL. |
I think
Yep, makes sense to use API logic wherever applicable. I am guessing it's mainly restrictions on the declaration scopes or the uses inside structs you might not be able to represent in APIs? |
It's hard to say "how much". Reviewing the OpenCL diagnostics I noted that there are just a few restrictions for standard C code and the rest of the rules are applied to new "C extensions" like new data types (like images, events, etc.), function attributes and address spaces. This doesn't mean that using OpenCL diagnostics can't be used. I think once we complete SYCL 1.2.1 functionality implementation (i.e. add support for images, streams and hierarchical parallelism), we will go over OpenCL diagnostics more thoroughly and see if it can be applied in SYCL mode. Anyway we will have to use deferred diagnostics engine to avoid applying diagnostics to the host part. NOTE: there are a few diagnostics implemented for CUDA/OpenMP, which use this infrastructure and can be re-used in SYCL mode (e.g. restricting exceptions in "device part").
#167 and #171 demonstrate how OpenCL types can be re-used in SYCL.
"Raw" C++ pointers are supposed to be mapped to "generic OpenCL pointers". SYCL pointer classes are using corresponding OpenCL address space qualifiers to enable correct "inference". For inference, current plan is to re-use InferAddressSpaces llvm pass - https://llvm.org/doxygen/InferAddressSpaces_8cpp_source.html.
vector functionality is implemented as C++ template classes. Conversion operators/swizzles follow standard C++ semantics. One interesting note is that some expressions with swizzles can be evaluated at compile time thanks to expression templates feature.
Yes. SYCL device compiler generates OpenCL kernel function, which calls "lambda body" method.
Yes. It will be added later. We are trying to make patches smaller, so it's easier to review. Current state of the implementation is available on the GitHub.
Yes. AFAIK, outlining of template functions is supported. https://reviews.llvm.org/D60455 has a test for that case: clang/test/CodeGenSYCL/device-functions.cpp.
I think re-using some OpenCL command line options might be useful for SYCL mode (e.g. -cl-fast-relaxed-math even mentioned in the SYCL spec).
Yes, but it's not done yet. I'll make sure we review the implementation design as soon as it's ready.
I think the answer is "the same as OpenCL", but single source provides additional guarantee that both compilers see the same data types definition (if user do not screw it up with pre-processor).
Somewhat related article from the Codeplay's blog: https://www.codeplay.com/portal/01-30-18-getting-sycl-to-interact-with-opencl-code
Are you referring to the SemaSYCL.cpp code?
Yes. |
A little bit more details about it you can find here - https://reviews.llvm.org/D60455#1472705 . |
Generic address space is not supported in CL1.2 devices. InferAS pass doesn't deduce address spaces but only eliminates address space conversions.
Hmm, I can see from your link that you are using Clang vector extension that is the same as OpenCL vectors!
Ok. Let me know once it's done! |
@asavonic, please, correct me if I wrong, but I think you are working on improving InferAS pass to do the "inference" of address spaces. In cases when CL1.2 environment requirements are enforced the plan is to run the inference pass and if it failed (i.e. there "generic" pointers in resulting IR), we fail compilation. For OpenCL 2.0+ environment, SPIR-V module is allowed to have "generic" pointers. We can enforce OpenCL 1.2 restrictions with the compiler option.
Right. If compiler supports "vector extension", Anyway, they point is there are no compiler changes required to support SYCL vector class.
Here is the first version: #181. |
"Inference" is a bit vague term. I suspect that you and Anastasia are talking about the same thing. One thing to note here: we're not relying on IAS pass - this is just an performance optimization that may not work in some cases (e.g. when loads/stores to memory which could not be removed by Mem2Reg are involved).
Right. |
Perhaps I am missing something but I don't think it demonstrates reusing diagnostics from OpenCL. Can you provide some more information please. |
Sorry, I was meant to say "the concept of deferring the diagnostics". I'll leave some ideas on re-using existing diagnostics in PR comment. |
CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGOpenMPRuntime.cpp
CONFLICT (content): Merge conflict in clang/docs/LanguageExtensions.rst
CONFLICT (content): Merge conflict in README.md
…… (#67069) We noticed some performance issue while in lldb-vscode for grabing the name of the SBValue. Profiling shows SBValue::GetName() can cause synthetic children provider of shared/unique_ptr to deference underlying object and complete it type. This patch lazily moves the dereference from synthetic child provider's Update() method to GetChildAtIndex() so that SBValue::GetName() won't trigger the slow code path. Here is the culprit slow code path: ``` ... frame #59: 0x00007ff4102e0660 liblldb.so.15`SymbolFileDWARF::CompleteType(this=<unavailable>, compiler_type=0x00007ffdd9829450) at SymbolFileDWARF.cpp:1567:25 [opt] ... frame #67: 0x00007ff40fdf9bd4 liblldb.so.15`lldb_private::ValueObject::Dereference(this=0x0000022bb5dfe980, error=0x00007ffdd9829970) at ValueObject.cpp:2672:41 [opt] frame #68: 0x00007ff41011bb0a liblldb.so.15`(anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::Update(this=0x000002298fb94380) at LibStdcpp.cpp:403:40 [opt] frame #69: 0x00007ff41011af9a liblldb.so.15`lldb_private::formatters::LibStdcppSharedPtrSyntheticFrontEndCreator(lldb_private::CXXSyntheticChildren*, std::shared_ptr<lldb_private::ValueObject>) [inlined] (anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::LibStdcppSharedPtrSyntheticFrontEnd(this=0x000002298fb94380, valobj_sp=<unavailable>) at LibStdcpp.cpp:371:5 [opt] ... frame #78: 0x00007ff40fdf6e42 liblldb.so.15`lldb_private::ValueObject::CalculateSyntheticValue(this=0x000002296c66a500) at ValueObject.cpp:1836:27 [opt] frame #79: 0x00007ff40fdf1939 liblldb.so.15`lldb_private::ValueObject::GetSyntheticValue(this=<unavailable>) at ValueObject.cpp:1867:3 [opt] frame #80: 0x00007ff40fc89008 liblldb.so.15`ValueImpl::GetSP(this=0x0000022c71b90de0, stop_locker=0x00007ffdd9829d00, lock=0x00007ffdd9829d08, error=0x00007ffdd9829d18) at SBValue.cpp:141:46 [opt] frame #81: 0x00007ff40fc7d82a liblldb.so.15`lldb::SBValue::GetSP(ValueLocker&) const [inlined] ValueLocker::GetLockedSP(this=0x00007ffdd9829d00, in_value=<unavailable>) at SBValue.cpp:208:21 [opt] frame #82: 0x00007ff40fc7d817 liblldb.so.15`lldb::SBValue::GetSP(this=0x00007ffdd9829d90, locker=0x00007ffdd9829d00) const at SBValue.cpp:1047:17 [opt] frame #83: 0x00007ff40fc7da6f liblldb.so.15`lldb::SBValue::GetName(this=0x00007ffdd9829d90) at SBValue.cpp:294:32 [opt] ... ``` Differential Revision: https://reviews.llvm.org/D159542
This is to track the discussion about the device compilation flow in the Clang frontend.
There are multiple ways OpenCL features can be reused:
The objective is to avoid code duplication as much as possible and to keep code base simpler/easier to maintain.
The text was updated successfully, but these errors were encountered: