
Implement Fragment density map support. #99551

Draft
wants to merge 1 commit into base: master

Conversation

@DarioSamo (Contributor) commented Nov 22, 2024

This is an attempt to revive #85824, as it's deemed necessary to properly support foveated rendering on mobile VR devices. I've talked it over with @BastiaanOlij and he told me to go ahead and finish the PR myself.

For other reasons, this is also based on top of #98670 at the moment.

There are a few core changes introduced compared to the previous PR:

  • The rendering device driver is much more explicit about whether it uses fragment shading rate or fragment density maps. This falls more in line with the design we want to follow, where decisions such as which technique to use are not made at the driver level but at a higher level instead.
  • VRS is still kept as a term for higher level rendering code and user-facing settings. We choose the best supported method based on what the current driver reports instead.
  • The implementation actually works as intended now, as the creation of the fragment density map affects the entire render pass and not just a subpass.
  • Density map generation was tweaked and one particular error was fixed: the vertical focus of the eye position seemed to be incorrect. Fixing this gave me a nice quality improvement on the Quest 3. I'm not sure whether the previous behavior failed to follow the OpenXR convention properly, so we might want to make sure that is correct.
    • I've also experimented with changing the formula of the density map generation slightly to something more standard, but I feel I'm missing context here as to how the original values were computed. TODO: @BastiaanOlij I've placed a lot of notes about this which we should discuss through the review process if possible, or through RocketChat if you're available.

Contributed by W4 Games. 🍀

@BastiaanOlij (Contributor)

Just on a small side note.

While the KHR extension we are using is called the Fragment Shading Rate extension, just talking about FSR is misleading. The KHR is Khronos' fully ratified VRS extension, and it includes support for what amounts to a fragment density map.

I don't know why vendors introduced an EXT that just implements a fragment density map, instead of just using the KHR with tier 1 and 2 disabled (which the KHR allows). It's confusing as heck because KHRs rank higher than EXTs. It must be a mobile thing, because the KHR seems pretty uniformly supported on desktop GPUs from Vulkan 1.0 onwards. Though I guess it has to do with the KHR allowing things that make certain performance improvements difficult (like allowing the density map to affect only a single subpass, which makes the tile optimisation harder).

Anyway, I only attempted to implement the EXT because Quest 1 and 2 didn't have support for the KHR extension; then Quest 3 introduced support for it and that took the pressure off. Especially seeing there just wasn't enough information or help available at the time to figure out what I was doing wrong :)
Little did I know that only the EXT has the optimisations enabled that make this really shine on the Quest, or I would have pushed harder.

Anyway, to make a long story short: I do think we need to be careful with our terminology to make sure people understand the differences between the two implementations.
On paper the KHR is superior to the EXT, BUT with the EXT having specific performance optimisations on the Qualcomm XR2 chipset, it is the right choice there.
That choice may vary on other headsets depending on GPU specifics.

Finally, we should get @RandomShaper and @stuartcarnie to weigh in as well. If we're tinkering with this part of the implementation, we should take into account that we may need further changes to support DirectX (nice to have) and Metal (very important on AVP and future Apple XR hardware).
I don't know much about what Metal supports here, but DirectX is going to be an interesting challenge, as I've heard that DirectX does not allow separate density maps per view when multiview is used.

@BastiaanOlij (Contributor) left a comment

This is looking excellent and I think we're just down to final choices.
Most remarks I have probably hark back to early choices made in my own PR that I don't feel are right anymore.

My main concern is that we are exposing too much of the choice between the two approaches to the rendering engine implementation. I think we can internalise a lot inside the rendering driver: by simply choosing upfront which extension we're using, the rendering driver can do the right thing, so the code in renderer_rd doesn't need to be bothered by it.
That should also future proof us to an eventual Metal and DirectX implementation.

return vrs_capabilities.max_fragment_size.x;
case LIMIT_VRS_MAX_FRAGMENT_HEIGHT:
return vrs_capabilities.max_fragment_size.y;
+case LIMIT_FRAGMENT_SHADING_RATE_TEXEL_WIDTH:

Does it make sense to split these into two groups? I guess it's needed if we allow both extensions to be enabled, but I think it can be a valid restriction that we enable only one.

-int32_t texel_height = RD::get_singleton()->limit_get(RD::LIMIT_VRS_TEXEL_HEIGHT);
+int32_t texel_width = 0;
+int32_t texel_height = 0;
+if (RD::get_singleton()->has_feature(RD::SUPPORTS_FRAGMENT_DENSITY_MAP)) {

So this kind of boils down to a design choice that I do think I know the right way forward for, one that ties into remarks I made in earlier comments.

Especially once Metal and DirectX come into play, we're starting to make a lot of GPU-specific choices outside of our driver that will make code like in these spots more and more complex. If we can, we should internalise these choices.

Also, we're making the assumption here that if the density map extension is available, that's the choice we're going with. That may be true for the Quest (where this enables performance improvements), but there is no telling whether this is universally true. It is very possible that on other hardware we should prioritise the KHR over the EXT if both are available.

I think we should have a way to indicate which one is preferred, where at initialisation either one or the other is enabled, and on calling RD::get_singleton()->limit_get(RD::LIMIT_VRS_TEXEL_WIDTH); we return the correct value for the extension we use. That will also be future proof for when we have Metal and/or DirectX support.

This also ensures that we don't accidentally mix and match logic, we get a consistent choice made by the rendering driver.

We can't get away without some querying and specific logic (like whether our density map has rates or weights, and whether X and Y are split into channels), but that requirement too can be queried from the rendering driver, which then returns the correct requirements based on the chosen extension.
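The indirection described here could look roughly like this. Everything below is a hypothetical sketch — none of these type, field, or function names are existing Godot API; the method preference mirrors the Quest/XR2 reasoning from the discussion and could be flipped on other hardware:

```cpp
#include <cstdint>

// Hypothetical: the driver picks one VRS method at initialisation, and every
// later query answers only for that method.
enum class VRSMethod {
    NONE,
    FRAGMENT_SHADING_RATE, // VK_KHR_fragment_shading_rate
    FRAGMENT_DENSITY_MAP,  // VK_EXT_fragment_density_map
};

struct VRSCapabilities {
    bool fsr_supported = false;
    bool fdm_supported = false;
    uint32_t fsr_texel_width = 16; // illustrative values
    uint32_t fdm_texel_width = 32;
};

// Prefer the EXT density map when available (the right call on the Qualcomm
// XR2 today, per the discussion); other drivers could flip this preference.
VRSMethod pick_vrs_method(const VRSCapabilities &p_caps) {
    if (p_caps.fdm_supported) {
        return VRSMethod::FRAGMENT_DENSITY_MAP;
    }
    if (p_caps.fsr_supported) {
        return VRSMethod::FRAGMENT_SHADING_RATE;
    }
    return VRSMethod::NONE;
}

// One generic "limit" that answers for whichever method was chosen, so
// callers never branch on the extension themselves.
uint32_t vrs_texel_width(const VRSCapabilities &p_caps, VRSMethod p_method) {
    switch (p_method) {
        case VRSMethod::FRAGMENT_SHADING_RATE:
            return p_caps.fsr_texel_width;
        case VRSMethod::FRAGMENT_DENSITY_MAP:
            return p_caps.fdm_texel_width;
        default:
            return 0;
    }
}
```

With this shape, renderer_rd code only ever calls the generic getter and stays identical across Vulkan, Metal, and DirectX backends.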

@@ -172,6 +172,7 @@ RID RenderForwardMobile::RenderBufferDataForwardMobile::get_color_fbs(Framebuffe

uint32_t view_count = render_buffers->get_view_count();

+bool vrs_uses_fdm = RD::get_singleton()->has_feature(RD::SUPPORTS_FRAGMENT_DENSITY_MAP);

So just to hark back to my earlier remark above: if we go for an API that always specifies a VRS attachment on the pass, and we just apply it to all subpasses instead of adding an attachment to each subpass here, we can simplify this logic and avoid making choices here which may also end up being different on Metal and/or DirectX.

If we do keep the current approach of adding attachments to subpasses here, I think the constant passed here should be more generic, like RD::SUPPORT_VRS_PER_SUBPASS, which switches whether we can specify a density map per subpass instead of supplying one for the whole pass.

@@ -59,7 +59,7 @@ class FramebufferCacheRD : public Object {

static _FORCE_INLINE_ uint32_t _hash_pass(const RD::FramebufferPass &p, uint32_t h) {
h = hash_murmur3_one_32(p.depth_attachment, h);
-h = hash_murmur3_one_32(p.vrs_attachment, h);
+h = hash_murmur3_one_32(p.fragment_shading_rate_attachment, h);

I think this should remain named vrs_attachment; we're painting ourselves into a corner if Metal or DirectX or some other future extension also allows the density map to be specified per subpass, once we start naming variables in renderer_rd based off of differences in Vulkan extensions.

@@ -1424,7 +1424,7 @@ void RendererSceneRenderRD::sdfgi_set_debug_probe_select(const Vector3 &p_positi
RendererSceneRenderRD *RendererSceneRenderRD::singleton = nullptr;

bool RendererSceneRenderRD::is_vrs_supported() const {
-return RD::get_singleton()->has_feature(RD::SUPPORTS_ATTACHMENT_VRS);
+return RD::get_singleton()->has_feature(RD::SUPPORTS_ATTACHMENT_FRAGMENT_SHADING_RATE) || RD::get_singleton()->has_feature(RD::SUPPORTS_FRAGMENT_DENSITY_MAP);

Same here, I don't like this direction. The has_feature check inside the driver is perfectly capable of checking whether one of these extensions is enabled.
Splitting that outside of the driver means this list of checks will grow as we add more and more GPU drivers that have some form of VRS support.

(and I realise that probably came from my original PR, but that wasn't smart of me and we should do better :P )

@@ -725,9 +725,17 @@ uint32_t RenderSceneBuffersRD::get_velocity_usage_bits(bool p_resolve, bool p_ms
}

RD::DataFormat RenderSceneBuffersRD::get_vrs_format() {
-return RD::DATA_FORMAT_R8_UINT;
+if (RD::get_singleton()->has_feature(RD::SUPPORTS_FRAGMENT_DENSITY_MAP)) {

Same here, I think it would be better to have a getter in the rendering device that returns the expected data type used by the enabled VRS extension.

That harks back to my annoyance that limit_get is way too constrained as a getter of values; we already abuse it to obtain various values that aren't strictly limits. Bit late now, but we should have done that differently.

servers/xr/xr_vrs.cpp (outdated comment, resolved)
@DarioSamo (Contributor, Author) commented Nov 25, 2024

Instead of answering every single comment by @BastiaanOlij, since it seems a lot of them share a common problem that we identified, I'll propose a new solution and see if you think we should go with it.

There's one rule we need to adhere to here: RenderingDeviceDriver should not hide away the details of the implementation it chooses for VRS, as we need the API to match a common ground for Vulkan and other APIs as closely as possible. Any logic that makes it into the backend needs to be replicated in every other backend, which is why I'm against choosing the VRS method inside of it.

However, I think it would be really good to hide this logic inside RenderingDevice instead. There's only one implementation, and it's where we usually already do all the logic we don't want users to see, mapping between the API they're used to and what the driver needs to do.

Therefore I propose this design:

  • Drivers explicitly report support for both fragment shading rate (FSR) and fragment density map (FDM) and keep their associated explicit APIs as they currently are. This saves us a lot of trouble in understanding what the driver is doing from higher level code and guarantees their implementations remain consistent.
  • RenderingDevice instead maintains the current VRS terminology. RenderingDevice will have chosen during initialization what method to use, and its methods will return the appropriate sizes and map the attachments correctly.
  • Any code that uses RenderingDevice will be completely unaware of what method was chosen. The user-facing API will remain the same for both. Formats can be requested from RenderingDevice with specific methods to get the preferred data format for the VRS map and the chosen tile size.
    • We might need to make a concession here on how to define the FSR/FDM map as part of the render pass, probably by making it apply to the entire render pass and specifying the subpasses it should be used on. FDM does not support doing this per subpass, so we need to figure out how best to approach it here.
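As a sketch of the format getter described in the bullets above: R8_UINT for the KHR rate attachment and R8G8_UNORM for the EXT density map are the formats the two Vulkan extensions expect, but the enum and function names here are illustrative assumptions, not the actual Godot API:

```cpp
#include <string>

// Hypothetical: RenderingDevice picks the method at init and exposes one
// format getter, so callers never see which extension was chosen.
enum class VRSMethod { NONE, FRAGMENT_SHADING_RATE, FRAGMENT_DENSITY_MAP };

std::string get_vrs_format_name(VRSMethod p_method) {
    switch (p_method) {
        case VRSMethod::FRAGMENT_SHADING_RATE:
            return "DATA_FORMAT_R8_UINT"; // packed shading rates, one channel
        case VRSMethod::FRAGMENT_DENSITY_MAP:
            return "DATA_FORMAT_R8G8_UNORM"; // X/Y densities in two channels
        default:
            return "DATA_FORMAT_MAX"; // VRS unsupported
    }
}
```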

I think this can work out just fine. I'm really insistent on keeping the driver as explicit as possible here, and since it's not an API we currently expose to users, we're allowed to change and refactor it as needed, as nothing outside of RenderingDevice currently uses it.

@BastiaanOlij (Contributor)

@DarioSamo I think we're mostly on the same page, we'll take a few things onto rocket chat so I understand where you're planning to put certain bits of logic.

@BastiaanOlij (Contributor)

Just to recap, as I talked to Dario earlier today: the thing that wasn't clear was that Dario wanted to move some logic into RenderingDevice that I thought he was keeping in the rendering implementation.
With that cleared up, I totally agree with Dario's approach sketched above.

@DarioSamo (Contributor, Author)

Requesting @BastiaanOlij to take another look, I believe the new version should be much closer to what we agreed to. I've not however ported VRS conversion to being done inside RenderingDevice yet, but perhaps you think the approach I went for so far isn't too bad and might make it unnecessary to do so.

@BastiaanOlij (Contributor)

I think this for the most part looks great.

I've not however ported VRS conversion to being done inside RenderingDevice yet, but perhaps you think the approach I went for so far isn't too bad and might make it unnecessary to do so.

Yes totally agree, that should just be a follow up.

The only thing that still bothers me, though it may be too much of a nitpick, is that I don't think using FSR and FDM as the main choice is valid. These are implementation details of the current extensions in Vulkan, and our choice between the two is only valid for the specific design choices on Qualcomm chipsets (at the moment).

Basically there are these ingredients:

  1. What is the format of the density map? (A single channel with combined densities using fragment size, as used in FSR, or split channels with separate densities for X and Y represented as a percentage of max, as used in FDM.)
  2. Can the density map be applied at the subpass level, or only per pass?
  3. Can the density map be applied per VRS layer (multi-layer texture), or is it a single map applied to all layers (single-layer VRS texture)? I've heard this is going to be important to complete the DirectX implementation, but I don't know it for sure.
  4. Which one is faster?
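The format difference in point 1 can be sketched roughly as follows, based on my reading of the two Vulkan extension specs (worth double-checking against the spec text): the KHR attachment stores one R8_UINT value per texel, (log2(width) << 2) | log2(height), while the EXT map stores two R8G8_UNORM densities, each roughly 255 * (1 / fragment size). The helper names are illustrative only:

```cpp
#include <cstdint>

// KHR FSR texel: fragment sizes of 1, 2 or 4 pixels, packed as
// (log2(width) << 2) | log2(height).
uint8_t encode_fsr_rate(uint32_t p_width, uint32_t p_height) {
    auto log2_size = [](uint32_t v) -> uint32_t {
        return v >= 4 ? 2 : (v >= 2 ? 1 : 0);
    };
    return uint8_t((log2_size(p_width) << 2) | log2_size(p_height));
}

// EXT FDM texel: separate X/Y densities as unorm fractions of max density;
// full density (1 px fragment) stores 255, half density (2 px) stores ~127.
struct FDMTexel {
    uint8_t density_x;
    uint8_t density_y;
};

FDMTexel encode_fdm_density(uint32_t p_width, uint32_t p_height) {
    return FDMTexel{ uint8_t(255 / p_width), uint8_t(255 / p_height) };
}
```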

It is totally feasible that on another GPU driver, or with a future Vulkan extension, we get an FDM-style density map that can be used per subpass, or an FSR-style density map that can only be applied to the whole pass. So there is a disjoint in how we're exposing that.

Point 4 is the most difficult though, because we have no way of determining which will be faster other than testing on devices that support both. While today the only device where this matters is the Quest, and the same rules likely apply to many other standalone headsets as they all use the same XR2 chipset and Qualcomm drivers, there is no guarantee this holds up for other devices.
Magic Leap for instance runs AMD hardware, though as I don't have a Magic Leap device I could not tell you which of these extensions it supports and which performs better.

So I feel RD::vrs_method should be a toggle between "PER SUBPASS" and "PER PASS", with the rendering driver having a query that returns which one (or both) is supported, and maybe a preferred-method query that for now returns "PER PASS" if available. Then the rest follows from that choice.
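A minimal sketch of that toggle, with every name hypothetical rather than existing Godot API:

```cpp
// Hypothetical granularity toggle replacing an FSR/FDM choice.
enum class VRSGranularity {
    NONE,
    PER_PASS,    // one density map applied to the whole render pass
    PER_SUBPASS, // an attachment can be specified per subpass
};

struct DriverVRSSupport {
    bool per_pass = false;
    bool per_subpass = false;
};

// Preferred method: "PER PASS" if available, as proposed above, with the
// per-subpass path as the fallback.
VRSGranularity preferred_vrs_granularity(const DriverVRSSupport &p_support) {
    if (p_support.per_pass) {
        return VRSGranularity::PER_PASS;
    }
    if (p_support.per_subpass) {
        return VRSGranularity::PER_SUBPASS;
    }
    return VRSGranularity::NONE;
}
```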

@DarioSamo (Contributor, Author) commented Nov 29, 2024

For context for the rest, as we sorted most of it out over Rocket Chat, the conclusions we arrived at are:

  • We'll improve the VRS method detection to have a nicely structured template of how to pick the method according to the hardware it works best on.
  • The fragment size path won't be exposed and we'll fall back to 4x4 only for now; we might reintroduce this after VRS is ported over to RenderingDevice in a future iteration. This is very low impact, as no hardware is known to report above a 4x4 fragment size at the moment.
  • We'll handle any future refactorings for differences between APIs or future methods by refactoring the driver and fixing the implementation on RenderingDevice when the time comes, but the impact should be fairly low as it's the only class that uses RenderingDeviceDriver at the moment.

@DarioSamo DarioSamo force-pushed the fragment-density-map branch 2 times, most recently from d6518a6 to cb6f2a5 Compare November 29, 2024 17:01
@DarioSamo (Contributor, Author)

I feel the PR is in a pretty good state now.
