Improve SSR raymarching performance #99693
Open
+152
−240
This PR brings a major rewrite of the Screen Space Reflection raymarching code, targeting performance optimization:
Implements a DDA algorithm that marches the ray simultaneously in NDC and homogeneous view space, as described in "Efficient GPU Screen-Space Ray Tracing" (Morgan McGuire et al.).
Produces a linear depth buffer during the scale pre-pass (this was actually already the case for the single-eye setup, but not for VR). In conjunction with homogeneous view space marching, this removes the need for any reprojection in the ray marching loop.
Removes the normal-roughness buffer fetches during marching, which were used to perform backface culling. This is now done by comparing the current and previous samples' depth and rejecting hits where the ray exits the volume.
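To illustrate the three changes above, here is a minimal Python sketch of a DDA-style screen-space march in the spirit of McGuire & Mara. This is not the PR's actual GLSL: the function name trace_ssr, the toy conventions (camera looking down -z, a depth buffer storing linear view-space z, no y flip) and the parameter values are all assumptions for illustration. The key idea it demonstrates is that 1/w and z·(1/w) interpolate linearly in screen space, so view depth is recovered per step without reprojection, and the prev_z comparison rejects hits where the ray exits the volume instead of sampling a normal buffer.

```python
import numpy as np

def trace_ssr(p0_view, p1_view, proj, depth_buf, thickness=0.1, max_steps=512):
    """Toy screen-space ray march (DDA scheme after McGuire & Mara).

    p0_view/p1_view: ray start/end in view space (camera looks down -z).
    proj: 4x4 projection matrix; depth_buf: linear view-space z per pixel.
    """
    h, w = depth_buf.shape
    h0 = proj @ np.append(p0_view, 1.0)          # clip-space endpoints
    h1 = proj @ np.append(p1_view, 1.0)
    k0, k1 = 1.0 / h0[3], 1.0 / h1[3]            # 1/w, linear in screen space
    # Pixel coordinates of the endpoints (y flip ignored for simplicity).
    s0 = (h0[:2] * k0 * 0.5 + 0.5) * np.array([w, h])
    s1 = (h1[:2] * k1 * 0.5 + 0.5) * np.array([w, h])
    # Perspective-correct attribute: view z times 1/w is linear in screen
    # space, so view depth is recovered as q/k at every step -- no per-step
    # reprojection through the projection matrix is needed.
    q0, q1 = p0_view[2] * k0, p1_view[2] * k1

    steps = min(max_steps, max(1, int(np.max(np.abs(s1 - s0)))))
    ds, dq, dk = (s1 - s0) / steps, (q1 - q0) / steps, (k1 - k0) / steps

    s, q, k = s0.copy(), q0, k0
    prev_z = q0 / k0
    for _ in range(steps):
        s += ds; q += dq; k += dk
        x, y = int(s[0]), int(s[1])
        if not (0 <= x < w and 0 <= y < h):
            break
        ray_z = q / k                            # view-space depth here
        scene_z = depth_buf[y, x]
        # Accept only front-to-back crossings: the ray, moving away from
        # the camera, passes behind the surface within `thickness`. Hits
        # where the ray exits the volume fail the prev_z comparison,
        # replacing the normal-buffer backface test.
        if ray_z <= scene_z <= prev_z and scene_z - ray_z <= thickness:
            return (x, y)
        prev_z = ray_z
    return None
```

The sketch marches one sample per pixel of screen travel; a production version would also clip the ray to the frustum and use a hierarchical or strided step.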
Solves 2 issues caused by relying on Projection::get_z_far() or Projection::is_orthogonal(). These can break under certain circumstances: typically, when the zfar/znear ratio is very large, the projection matrix becomes infinite and it is no longer possible to extract zfar from it.
Hopefully improves code readability and establishes a good foundation for further improvements. A few ideas I leave to further PRs:
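To make the failure mode concrete, here is a small Python sketch of the underlying math (a stand-in for Projection::get_z_far(), not Godot's actual implementation; make_perspective and extract_z_far are hypothetical names). For a standard OpenGL-style perspective matrix, zfar can be recovered as m23 / (m22 + 1), but in the infinite-far limit m22 collapses to exactly -1, so the extraction divides by zero.

```python
import numpy as np

def make_perspective(znear, zfar):
    """Standard OpenGL-style perspective matrix (fov 90 deg, aspect 1)."""
    return np.array([
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -(zfar + znear) / (zfar - znear),
                   -2.0 * zfar * znear / (zfar - znear)],
        [0.0, 0.0, -1.0, 0.0],
    ])

def extract_z_far(proj):
    # zfar = m23 / (m22 + 1); only valid for a finite perspective matrix.
    # With an infinite far plane, m22 = -1 and the denominator is zero.
    return proj[2, 3] / (proj[2, 2] + 1.0)
```

With a finite matrix this round-trips (extract_z_far(make_perspective(0.05, 4000.0)) gives 4000), but the infinite-far limit has m22 = -1 and m23 = -2·znear, so extraction yields infinity. Marching in homogeneous view space sidesteps the problem because zfar is never needed.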
Visual differences
Cube roughness is 0.2, floor roughness is 0.0.
Depth threshold is 0.1.
Raymarching uses 512 steps.
This is the single-eye case. Any help to test it in VR is welcome.
Also, any test with more complex scenes would be appreciated.
Performance improvements
These should be material for both single-eye and VR setups. Although I couldn't measure it statistically (I haven't yet sorted out the render graph messing up the debug markers, despite active support from @clayjohn, @Ansraer and @DarioSamo), the GPU traces below suggest a ~20% reduction in compute time.
In this context, please take the numbers below with a grain of salt, as they reflect my own interpretation (also, please ignore the markers):
Any help on making this analysis stronger is welcome.
Top chart: before
Bottom chart: with this PR