-
-
Notifications
You must be signed in to change notification settings - Fork 21.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Fragment density map support. #99551
base: master
Are you sure you want to change the base?
Conversation
a5c4d73
to
6fda940
Compare
Just on a small side note. While the KHR extension we are using is called the Fragment Rate Shading extension, just talking about FRS is misleading. The KHR is Khronos' fully ratified VRS extension that includes support for what amounts to a fragment density map. I don't know why vendors introduced an EXT that just implements a fragment density map, instead of just using the KHR with tier 1 and 2 disabled (which is allowed in the KHR). It's confusing as heck because KHRs stand higher in standing than EXTs. It must be a mobile thing because the KHR seems pretty uniformly supported on desktop GPUs from Vulkan 1.0 onwards. Though I guess it has to do with the KHR allowing things that make certain performance improvements difficult (like allowing the density map to only effect a single subpass making the tile optimisation harder). Anyway, I only attempted to implement the EXT because Quest 1 and 2 didn't have support for the KHR extension, than Quest 3 introduced support for it and it took the pressure off. Especially seeing there just wasn't enough information or help available at the time to figure out what I was doing wrong :) Anyway, to make a long story short. I do think we need to be careful with our terminology to make sure people understand the differences between the two implementations. Finally, we should get @RandomShaper and @stuartcarnie to weigh in as well, if we're tinkering on this part of the implementation we should take into account that we may further need changes for support on DirectX (nice to have) and Metal (very important on AVP and future Apple XR hardware). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking excellent and I think we're just down to final choices.
Most remarks I have probably hark back to early choices made in my own PR that I don't feel are right anymore.
My main concern is that we are exposing too much of the choice between the two approaches to the rendering engine implementation. I think we can internalise a lot inside of the rendering driver, that by simply choosing upfront which extension we're using, the rendering driver can do the right thing so the code in renderer_rd
doesn't need to be bothered by it.
That should also future proof us to an eventual Metal and DirectX implementation.
return vrs_capabilities.max_fragment_size.x; | ||
case LIMIT_VRS_MAX_FRAGMENT_HEIGHT: | ||
return vrs_capabilities.max_fragment_size.y; | ||
case LIMIT_FRAGMENT_SHADING_RATE_TEXEL_WIDTH: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to split these into two groups? I guess its needed if we allow both extensions to be enabled, but I think it can be a valid restriction that we enable only one.
int32_t texel_height = RD::get_singleton()->limit_get(RD::LIMIT_VRS_TEXEL_HEIGHT); | ||
int32_t texel_width = 0; | ||
int32_t texel_height = 0; | ||
if (RD::get_singleton()->has_feature(RD::SUPPORTS_FRAGMENT_DENSITY_MAP)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this kind of bolts down to a design choice that I do know the right way forward for and ties into remarks I made in earlier comments.
Especially once Metal and DirectX come into play, we're starting to make a lot of GPU specific choices outside of our driver that will make code like in these spots more and more complex. If we can we should internalise these choices.
Also we're making the assumption here that if the density map extension is available, thats the choice we're going with. That may be true for the Quest (where this enables performance improvements), but there is no telling if this is universally true. It is very possible that on other hardware we should prioritise the KHR over the EXT if both are available.
I think we should have a way to indirect which one is prefered, where at initialisation either one or the other is enabled, and on calling RD::get_singleton()->limit_get(RD::LIMIT_VRS_TEXEL_WIDTH);
we return the correct value for the extension we use. And it will be future proof for when we have Metal and/or DirectX support.
This also ensures that we don't accidentally mix and match logic, we get a consistent choice made by the rendering driver.
We can't get away with some querying and specific logic (like whether our density map has rates or weights and whether X and Y are split into channels) but also that requirement can be queried from the rendering driver which then returns the correct requirements based on the chosen extension.
@@ -172,6 +172,7 @@ RID RenderForwardMobile::RenderBufferDataForwardMobile::get_color_fbs(Framebuffe | |||
|
|||
uint32_t view_count = render_buffers->get_view_count(); | |||
|
|||
bool vrs_uses_fdm = RD::get_singleton()->has_feature(RD::SUPPORTS_FRAGMENT_DENSITY_MAP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So just to hark back on my earlier remark up above. If we go for an API that always specifies a VRS attachment on the pass, and we just apply it to all subpasses, instead of adding an attachment to each subpass here, we can simplify this logic and not have to make choices here which may also end up being different on Metal and/or DirectX.
If we do keep the current approach of adding attachments to subpasses here, I think the constant passed here should be more generic, like RD::SUPPORT_VRS_PER_SUBPASS
that switches whether we can specify a density map per subpass instead of supplying one for the whole pass.
@@ -59,7 +59,7 @@ class FramebufferCacheRD : public Object { | |||
|
|||
static _FORCE_INLINE_ uint32_t _hash_pass(const RD::FramebufferPass &p, uint32_t h) { | |||
h = hash_murmur3_one_32(p.depth_attachment, h); | |||
h = hash_murmur3_one_32(p.vrs_attachment, h); | |||
h = hash_murmur3_one_32(p.fragment_shading_rate_attachment, h); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should remain being named vrs_attachment
, we're painting ourselves into a corner if Metal or DirectX or some other future extension also allows the density map to be specified per subpass, when we start naming variables in the renderer_rd
based off of differences in Vulkan extensions.
@@ -1424,7 +1424,7 @@ void RendererSceneRenderRD::sdfgi_set_debug_probe_select(const Vector3 &p_positi | |||
RendererSceneRenderRD *RendererSceneRenderRD::singleton = nullptr; | |||
|
|||
bool RendererSceneRenderRD::is_vrs_supported() const { | |||
return RD::get_singleton()->has_feature(RD::SUPPORTS_ATTACHMENT_VRS); | |||
return RD::get_singleton()->has_feature(RD::SUPPORTS_ATTACHMENT_FRAGMENT_SHADING_RATE) || RD::get_singleton()->has_feature(RD::SUPPORTS_FRAGMENT_DENSITY_MAP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I don't like this direction. The has_feature
check inside of the driver is perfectly capable of checking if one of these extensions is enabled.
Splitting that outside of the driver means that this list of checks will grow as we add more and more GPU drivers that have some form of VRS support.
(and I realise that probably came from my original PR, but that wasn't smart of me and we should do better :P )
@@ -725,9 +725,17 @@ uint32_t RenderSceneBuffersRD::get_velocity_usage_bits(bool p_resolve, bool p_ms | |||
} | |||
|
|||
RD::DataFormat RenderSceneBuffersRD::get_vrs_format() { | |||
return RD::DATA_FORMAT_R8_UINT; | |||
if (RD::get_singleton()->has_feature(RD::SUPPORTS_FRAGMENT_DENSITY_MAP)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I think it would be better to have a getter in the rendering device that returns the expected datatype used by the enabled VRS extension.
That harks back to my annoyance that limit_get
is way to constrained as a getter of values, we already abuse it to obtain various values that aren't strictly limits. Bit late now but we should have done that differently.
Instead of answering every single comment by @BastiaanOlij since it seems a lot of them share a common problem that we identified, I'll propose a new solution and see if you think we should go with it. There's one rule we need to adhere to here, which is that RenderingDeviceDriver should not hide away the details of the implementation that it chooses for VRS as we need the API to match as closely to a common ground for Vulkan and other APIs as possible. Any logic that makes it into the backend needs to be replicated into every other backend, which is why I'm against choosing the method for VRS inside of it. However, I think the idea to make this logic hidden inside of RenderingDevice would be really good instead. There's only one implementation and we usually already do all our logic that we don't want to see users to see between mapping the API they're used to and what the driver needs to do. Therefore I propose this design:
I think this can work out just fine. I'm really insistent on keeping the driver as explicit as possible here, and it's not an API we currently expose to users so we're allowed to change and refactor as needed, as nothing outside of RenderingDevice currently uses it. |
@DarioSamo I think we're mostly on the same page, we'll take a few things onto rocket chat so I understand where you're planning to put certain bits of logic. |
Just to recap as I talked to Dario earlier today. The thing that wasn't clear was that Dario wanted to move some logic into Rendering Device that I thought he was keeping in the rendering implementation. |
0854d00
to
b807c02
Compare
Requesting @BastiaanOlij to take another look, I believe the new version should be much closer to what we agreed to. I've not however ported VRS conversion to being done inside RenderingDevice yet, but perhaps you think the approach I went for so far isn't too bad and might make it unnecessary to do so. |
I think this for the most part looks great.
Yes totally agree, that should just be a follow up. The only thing that still bothers me, but which may be too much of a nitpick, is that I don't think using FSR and FDM as the main choice is valid. These are implementation details on the current extensions in Vulkan and our choice between the two is only valid specifically for the design choices on Qualcomm chipsets (at the moment). Basically there are these ingredients:
It is totally feasible on another GPU driver, or for a future Vulkan extension, that we get an FDM style density map that can be used per subpass, or an FSR style density map that can only be applied to the whole pass. So there is a disjoint in how we're exposing that. Point 4 is the most difficult though, because we have no way of determining which will be faster other than testing on different devices that support both. While today the only device where this matters is the Quest, and the same rules are likely for many other stand alone headsets as they all use the same XR2 chipset and Qualcomm drivers, there is no guarantee this holds up for other devices. So I feel |
For context for the rest as we sorted out most of it over Rocket Chat, the conclusions we arrived to are:
|
d6518a6
to
cb6f2a5
Compare
Co-Authored-By: Bastiaan Olij <[email protected]>
cb6f2a5
to
206e8d6
Compare
I feel the PR is in a pretty good state now. |
This is an attempt to revive #85824, as it's deemed as necessary to properly support foveated rendering on mobile VR devices. I've talked over it with @BastiaanOlij and he told me to go ahead and finish the PR myself.
For other reasons, this is also based on top of #98670 at the moment.
There's a few core changes introduced compared to the previous PR:
Contributed by W4 Games. 🍀