Fast Post-Processing on the Oculus Quest and Unity

One of the earliest effects I implemented for The Last Clockwinder was HDR Tonemapping and Color Grading. We had decided from an early stage that the game was going to be lit entirely with static lightmapping (the entire environment is lit with a single, massive bounced area light!), and post-processing goes hand-in-hand with that approach.

If you’re a VR developer you might find that surprising!

Generally, post processing isn’t viable on platforms like the Oculus Quest. Rendering a second pass requires you to first resolve your intermediate framebuffer to main memory and then read it back again. This can eat up 20-30% of your frame budget just waiting on memory!

Additionally, at least in Unity, MSAA and Foveated Rendering are only supported on the backbuffer, and not on intermediate buffers. Losing either of them is a non-starter for efficient rendering.

A simple trick: move Post-Processing to the Forward Pass

Luckily, we can apply a trick here. Any post-processing that is independent per pixel (e.g. color grading, tonemapping, vignette, and film grain, but not blur or bloom) can be moved to run directly in the forward geometry pass, right at the end of the pixel shader. The pixel shaders calculate with 16-bit HDR values, and the result is tone-mapped down to 8-bit RGB just before outputting the final pixel color.
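As a rough illustration of what "independent per pixel" means, here is a minimal Python sketch on a one-dimensional "image". The `shade` function and the Reinhard curve are hypothetical stand-ins (not the game's lighting or Unity's grading function); the point is only that a per-pixel effect gives the same result whether it runs as a second pass or at the end of shading, while a blur needs neighbouring pixels that a pixel shader cannot see.

```python
def shade(x):
    """Hypothetical lighting result for one pixel, in linear HDR."""
    return x * 4.0


def tonemap(c):
    """Reinhard curve, used here only as a simple stand-in tonemapper."""
    return c / (1.0 + c)


def box_blur(buffer, i):
    """3-tap blur: needs the finished values of neighbouring pixels."""
    lo, hi = max(i - 1, 0), min(i + 1, len(buffer) - 1)
    window = buffer[lo:hi + 1]
    return sum(window) / len(window)


pixels = [0.0, 0.1, 0.5, 2.0]

# Two-pass pipeline: write HDR to an intermediate buffer, then post-process it.
hdr_buffer = [shade(p) for p in pixels]
two_pass = [tonemap(c) for c in hdr_buffer]

# Forward-pass trick: tonemap at the end of each pixel's own shading.
single_pass = [tonemap(shade(p)) for p in pixels]

print(two_pass == single_pass)                # the per-pixel effect is unchanged
print(box_blur(hdr_buffer, 1), hdr_buffer[1])  # blur depends on other pixels
```

This equivalence only holds for opaque pixels that are written once; blending is a separate matter.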

In our case, I modified the Unity URP forward renderer and Shader Graph packages to enable this tweak with a checkbox in the render pipeline’s settings asset. If you’re interested in replicating this, you can take a look at the PR here in our public branch of the Unity Graphics repo.

// Used in Standard (Physically Based) shader
half4 LitPassFragment(Varyings input) : SV_Target
{
    UNITY_SETUP_INSTANCE_ID(input);
    UNITY_SETUP_STEREO_EYE_INDEX_POST_VERTEX(input);
  
    // ... setup of surfaceData and inputData snipped for space
    half4 color = UniversalFragmentPBR(inputData, surfaceData.albedo, surfaceData.metallic, surfaceData.specular, surfaceData.smoothness, surfaceData.occlusion, surfaceData.emission, surfaceData.alpha);
    // ... snipped for space
    color.rgb = MixFog(color.rgb, inputData.fogCoord);

    // (ASG) Add tonemapping and color grading in forward pass.
    // This uses the same color grading function as the post processing shader.
#ifdef _COLOR_TRANSFORM_IN_FORWARD
    color.rgb = ApplyColorGrading(color.rgb, _Lut_Params.w, TEXTURE2D_ARGS(_InternalLut, sampler_LinearClamp), _Lut_Params.xyz, TEXTURE2D_ARGS(_UserLut, sampler_LinearClamp), _UserLut_Params.xyz, _UserLut_Params.w);
#endif

    // Return linear color. Conversion to sRGB happens automatically through the target texture format.
    // (ASG) Note: sRGB conversion *must* be done in hardware, so that filtering / msaa
    // averaging is done properly in linear space, rather than in sRGB space.
    return color;
}
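The note about hardware sRGB conversion in the shader comments can be checked with a little arithmetic. The sketch below is only an illustration in Python, using the standard sRGB encoding function and an assumed edge pixel whose samples are half black and half white. It compares averaging the samples in linear space and then encoding (what a hardware sRGB target does) with encoding each sample first and then averaging.

```python
def linear_to_srgb(c):
    """Standard sRGB encoding for a linear value in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


# Assumed edge pixel: half of its samples are black, half are white.
black, white = 0.0, 1.0

# sRGB target: samples are averaged in linear space, then encoded.
resolved_in_linear = linear_to_srgb((black + white) / 2.0)

# Manual encode in the shader: samples are encoded, then averaged.
resolved_in_srgb = (linear_to_srgb(black) + linear_to_srgb(white)) / 2.0

print(resolved_in_linear, resolved_in_srgb)
```

The two results differ noticeably (roughly 0.74 versus 0.5 as encoded values), which is why the conversion is left to the target texture format rather than done by hand in the shader.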

I’ve heard of a few other developers applying this technique, so I suspect that most of the AAA-looking games on the Quest store are doing something similar. I’d be curious to hear of any others!

Downsides

Of course, there are some downsides, which are good reasons not to have this on by default for all Unity projects.

Incorrect Transparent Blending

In a standard two-pass HDR pipeline, blending transparent objects with the pixels behind them occurs on the HDR framebuffer. This means it happens in a linear RGB color space. (If you’re not familiar with linear/non-linear RGB, you might check out my Strangeloop talk on color science: https://www.youtube.com/watch?v=AS1OHMW873s) Moving tone-mapping to the forward pass means that blending happens after tone-mapping.

But tone-mapping is a non-linear function. And so applying a linear blend operation after tone-mapping is not equivalent to applying the blend before tone-mapping.

The ACES filmic tone-mapping curve. It’s a non-linear transformation! (image: Chris Brejon)

Pragmatically, this means that your transparent objects will look a little bit different than they would in a traditional HDR pipeline, especially if you’re additively layering effects such that the final brightness starts to drift into either shoulder of the ACES S-Curve. (Theoretically, there should also be some incorrect blending around the anti-aliased edges of geometry. However, I’ve never noticed this in practice.)
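To put rough numbers on this, here is a small Python sketch comparing the two orders of operations for a 50% alpha blend. It assumes Krzysztof Narkowicz's well-known curve fit as a stand-in for the ACES tonemapper (not Unity's exact implementation), and the pixel values are made up for illustration.

```python
def aces_approx(x):
    """Narkowicz's curve fit of the ACES filmic tonemapper (a stand-in)."""
    a, b, c, d, e = 2.51, 0.03, 2.43, 0.59, 0.14
    y = (x * (a * x + b)) / (x * (c * x + d) + e)
    return min(max(y, 0.0), 1.0)


def alpha_blend(src, dst, alpha):
    """Standard 'over' blend of a transparent source onto a destination."""
    return src * alpha + dst * (1.0 - alpha)


def compare(src, dst, alpha=0.5):
    """Return (blend then tonemap, tonemap then blend) for one pixel."""
    two_pass = aces_approx(alpha_blend(src, dst, alpha))
    forward = alpha_blend(aces_approx(src), aces_approx(dst), alpha)
    return two_pass, forward


# Hypothetical values: a dim surface over a very bright HDR background...
bright_case = compare(src=0.5, dst=4.0)
# ...and the same blend with both values in the mid-tones.
midtone_case = compare(src=0.2, dst=0.3)

print(bright_case)
print(midtone_case)
```

With these made-up inputs, the bright case comes out at about 0.93 when blending before tone-mapping and about 0.79 when blending after it, while the mid-tone case differs by well under 0.01. That matches the intuition above: the mismatch grows as values drift toward the ends of the curve.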

In our case, this effect was entirely unnoticeable. We don’t use many transparent materials to begin with, and none of them were very bright or dark. And for the vast majority of projects, you could adjust for these effects in your materials.

More Expensive Pixel Shaders

We’ve added a handful of math operations (Tonemapping) and a texture LUT read (Color Grading), so naturally our pixel shaders are a bit more expensive. And it’s more expensive to do this work in the pixel shader than in a separate pass:

  • If you’re using MSAA 4x, any pixel on the edge of a triangle is shaded once for each triangle that covers one of its samples, so the pixel shader can run up to 4 times for that final pixel. A standard post-process pass only executes once per pixel.

  • The GPU shades pixels in 2x2 quads, and any pixels in a quad that fall outside the triangle are discarded after shading. So you will be running post-processing for these discarded pixels as well. This is called Quad Overdraw.

  • Inefficient draw ordering. If your objects aren’t perfectly sorted, pixel shaders may execute for occluded pixels, before being rendered over by later geometry. This is highly hardware specific. For example, the Qualcomm Snapdragon XR2 in the Quest 2 implements a Low Resolution Z-pass (aka. “Order independent depth testing”). However, it’s disabled for alpha cutout shaders, and only applies conservatively because of the low resolution.
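The quad overdraw point can be sketched with a toy rasterizer. The Python below is only an illustration under simplifying assumptions: pixels are sampled at their centres, edges are treated as inclusive (real GPUs use a stricter fill rule), and every touched 2x2 quad is assumed to cost four shader invocations. The triangles and function names are made up for this example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area of (a, b, p); the sign says which side of a->b p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered(tri, x, y):
    """True when the centre of pixel (x, y) is inside or on the triangle."""
    (ax, ay), (bx, by), (cx, cy) = tri
    px, py = x + 0.5, y + 0.5
    w0 = edge(ax, ay, bx, by, px, py)
    w1 = edge(bx, by, cx, cy, px, py)
    w2 = edge(cx, cy, ax, ay, px, py)
    all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
    all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
    return all_non_negative or all_non_positive


def shading_cost(tri, width, height):
    """Return (visible pixels, shader invocations with 2x2 quad shading)."""
    visible = {(x, y) for y in range(height) for x in range(width)
               if covered(tri, x, y)}
    quads = {(x // 2, y // 2) for (x, y) in visible}
    return len(visible), 4 * len(quads)


big = ((0.0, 0.0), (8.0, 0.0), (0.0, 8.0))     # large right triangle
sliver = ((0.0, 0.0), (8.0, 0.0), (0.0, 1.0))  # thin triangle, one pixel row

print(shading_cost(big, 8, 8))
print(shading_cost(sliver, 8, 8))
```

In this toy model the large triangle wastes only a few invocations along its diagonal, while the thin one spends half of its invocations on discarded pixels. Any work added to the end of the pixel shader is paid for on those discarded pixels too, which is why small or thin triangles make the forward-pass approach relatively more expensive.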

That said, the cost of these additional operations is still quite low, especially if the LUT fits in cache.

Conclusion

The nice thing about our implementation is that we can continue to use all of the standard Post Processing Volume components and configuration on the Unity side. Nothing needed to change except the renderer.

Looking forward to the future, Vulkan Subpasses should make this kind of work much easier. But until then, rolling it into the forward pass worked well for us, and I’d recommend it for anyone on a tile-based GPU platform.