Occlusion Culling

1. Introduction

Castle Game Engine can perform occlusion culling to avoid rendering things that are completely obscured by other things.

A typical scenario is moving through a city with tall buildings, with the camera at ground level, e.g. a game where you walk in a city or ride a car/tank/whatever in a city. In a typical view in such a game, your viewing frustum includes many 3D things, but most of them are actually obscured by the building in front of you.

[Image: Occlusion Query effect]

To use it, just set TCastleViewport.OcclusionCulling to true. You can do this at runtime (and change it at any moment), or you can set it in the CGE editor.

Be sure to measure how effective this is:

  • Display FPS (see manual about FPS)

  • Display MyViewport.Statistics.ToString to observe how many shapes and scenes were effectively rendered.

2. Example

An example using this feature (it works on both OpenGL and OpenGLES) is in examples/viewport_and_scenes/occlusion_culling.

[Images: Occlusion Culling Example; Occlusion Culling Example in Editor; Occlusion Culling Example on Android]

3. How it works

For every shape we check whether the shape was detected as "visible" by occlusion query in the previous frame.

  1. If the shape was visible in the previous frame, then we render it as usual in the current frame.

  2. If the shape was not visible in the previous frame, we render merely the bounding box of the shape (but this box is not actually visible in the color buffer — we render it only for the "occlusion query").

In both cases (rendering the actual shape, rendering the bounding box) we surround the render call with special commands so that the visibility flag is updated based on whether the shape (or its bounding box) is visible, i.e. whether some pixels pass the depth test.
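
The per-shape logic described above can be sketched as a small simulation. This is only an illustrative model in Python, not the engine's code: the Shape class, the render_frame function and the two dictionaries standing in for GPU depth-test results are hypothetical names, and we assume every shape starts out flagged as visible.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    name: str
    # Result of the most recent occlusion query for this shape.
    # Assumption of this sketch: shapes start as "visible".
    visible_last_frame: bool = True


def render_frame(shapes, shape_passes_depth, box_passes_depth):
    """Process one frame and return names of shapes drawn to the color buffer.

    shape_passes_depth / box_passes_depth map a shape name to whether any
    pixel of the shape / of its bounding box passes the depth test in this
    frame. On a real GPU this answer comes from the occlusion query.
    """
    drawn = []
    for shape in shapes:
        if shape.visible_last_frame:
            # Case 1: render the real shape, wrapped in a query.
            drawn.append(shape.name)
            shape.visible_last_frame = shape_passes_depth[shape.name]
        else:
            # Case 2: render only the bounding box (no color writes),
            # wrapped in a query.
            shape.visible_last_frame = box_passes_depth[shape.name]
    return drawn


shapes = [Shape("wall"), Shape("house_behind_wall")]
depth = {"wall": True, "house_behind_wall": False}
frame1 = render_frame(shapes, depth, depth)
frame2 = render_frame(shapes, depth, depth)
print(frame1)  # ['wall', 'house_behind_wall']
print(frame2)  # ['wall']
```

In the first frame both shapes are drawn, because nothing is known yet. From the second frame on, the obscured house costs only a bounding box test.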

Note that this algorithm happens after the frustum culling has eliminated shapes that are definitely not visible because they are not within the viewing frustum. That is, frustum culling (that we do both per-scene and per-shape, by default) works as it did, and it remains a very useful algorithm — eliminating things earlier, with less hassle. The primary job of occlusion query is to eliminate shapes that are within the frustum, but are obscured by something else.

4. Why (and when) does this make the rendering faster

  • The only overhead we add is making the "occlusion query" (updating the visibility flag), which is a feature in modern OpenGL and OpenGLES.

  • On the plus side, we can often render merely the bounding box (instead of the actual shape). This is a big gain if you have lots of "heavy" shapes that are typically obscured.

This often results in a significant performance gain.
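
To get a rough feeling for the gain, here is a back-of-the-envelope sketch. All numbers are hypothetical (300 shapes of 5000 triangles each, 30 of them visible), the bounding box is assumed to be drawn as 12 triangles, and the cost of the query itself is ignored. Triangle count is only a crude proxy for rendering cost, so treat the result as an illustration, not a measurement.

```python
BOX_TRIANGLES = 12  # a box has 6 faces, each made of 2 triangles


def triangles_submitted(shapes, use_occlusion_culling):
    """Count triangles sent to the GPU in one frame.

    shapes is a list of (triangle_count, visible_last_frame) pairs.
    """
    total = 0
    for triangle_count, visible_last_frame in shapes:
        if use_occlusion_culling and not visible_last_frame:
            total += BOX_TRIANGLES  # only the bounding box is rendered
        else:
            total += triangle_count  # the real shape is rendered
    return total


# Hypothetical city view: 300 shapes, the first 30 flagged as visible.
shapes = [(5000, i < 30) for i in range(300)]
without_culling = triangles_submitted(shapes, use_occlusion_culling=False)
with_culling = triangles_submitted(shapes, use_occlusion_culling=True)
print(without_culling, with_culling)  # 1500000 153240
```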

5. How to make occlusion culling most efficient

5.1. Also use occlusion sorting

The detection of "what is obscured" takes into account scenes rendered earlier. As the detection is done by the GPU, everything that is already in the Z-buffer is taken into account.

So for best results, sort your opaque shapes (that write to depth buffer) in the order "from front to back". That’s because you want to render the "things that may obscure other things" earlier.

Do this by setting TCastleViewport.OcclusionSort to sort2D or sort3D. See blending for a description of the possible sorting algorithms.
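
A minimal sketch of what "from front to back" means: order the shapes by distance from the camera, nearest first. The function name and the use of a single center point per shape are assumptions made for illustration. The exact criteria used by sort2D and sort3D are not shown here.

```python
import math


def sort_front_to_back(shapes, camera_position):
    """Return shapes ordered nearest-first, so likely occluders are drawn first.

    Each shape is a (name, center) pair, where center is an (x, y, z) tuple.
    """
    return sorted(shapes, key=lambda shape: math.dist(shape[1], camera_position))


camera = (0.0, 0.0, 0.0)
shapes = [
    ("far_tower", (0.0, 0.0, -50.0)),
    ("near_wall", (0.0, 0.0, -2.0)),
    ("mid_house", (0.0, 0.0, -10.0)),
]
order = [name for name, _ in sort_front_to_back(shapes, camera)]
print(order)  # ['near_wall', 'mid_house', 'far_tower']
```

With this order the near wall fills the depth buffer first, so the queries for the shapes behind it have a chance to fail.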

For a demo that it works, test our examples/viewport_and_scenes/occlusion_culling in the CGE editor.

  • Use menu item "Edit → Show Statistics" (F8).

  • Observe that:

    • OcclusionCulling = false and OcclusionSort = sortNone results in most shapes being rendered (around 300).

    • OcclusionCulling = true and OcclusionSort = sortNone is better: when looking at a building wall that obscures most of the city, you can easily have only 30 shapes rendered.

    • OcclusionCulling = true and OcclusionSort = sort3D is even better: looking at a building wall that obscures most of the city, you can easily have only a few shapes rendered.

The default TCastleViewport.OcclusionSort value, sortAuto, is for now equivalent to sortNone. It may change in the future to perform sorting, especially when TCastleViewport.OcclusionCulling is true. Occlusion culling and sorting are a natural pair: sorting makes occlusion culling even more effective.

5.2. Other tips

  • Visibility of each shape is considered separately. So the "granularity" of how you split your scene into shapes matters.

  • As the visibility information is associated with shapes, this algorithm wouldn’t work correctly when you instantiate the scene many times. For this reason, the occlusion culling is not used for shapes that are within multiple instances of TCastleScene or TCastleTransform in one viewport.

  • The algorithm correctly handles shapes that are partially transparent (using blending). Such shapes do not obscure other things, but they can be obscured by other (opaque) shapes.

6. Possible issues to be aware of

The biggest problem:

  • Sometimes you may see a lag of 1 frame when the object is not rendered, but it should be. This happens when the shape (or even its bounding box) was not visible in the previous frame, but now it is visible. If your game runs at 60 FPS, it means that 3D shapes may appear with a delay of 1/60 of a second.

    This is usually not noticeable. If need be, we have ideas to address this problem:

    • OcclusionQueryEnlargeBox: Single (make bbox larger, so that shape is considered visible sooner than it strictly has to be)

    • OcclusionQueryFramesToHide: Integer (do not hide the shape as soon as it is invisible for 1 frame)

    Neither of the proposed properties above is implemented for now. You’re welcome to report if you see this problem (1 frame lag) in a practical use-case. We can implement them then.
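
The one-frame lag follows directly from using the previous frame's query result. The sketch below traces a single shape that is hidden in frame 0 and revealed from frame 1 on. The simulate function is a hypothetical model written for this illustration, not engine code.

```python
def simulate(frames, visible_last_frame=False):
    """Trace one shape over several frames.

    frames is a list of (shape_unoccluded, box_unoccluded) booleans, one pair
    per frame. Returns a list of booleans: was the real shape drawn in each
    frame?
    """
    drawn = []
    for shape_unoccluded, box_unoccluded in frames:
        if visible_last_frame:
            drawn.append(True)  # real shape rendered, query tests the shape
            visible_last_frame = shape_unoccluded
        else:
            drawn.append(False)  # only the box is tested, nothing is shown
            visible_last_frame = box_unoccluded
    return drawn


# Frame 0: the shape is hidden. Frames 1 to 3: the shape is in plain view.
frames = [(False, False), (True, True), (True, True), (True, True)]
drawn = simulate(frames)
print(drawn)  # [False, False, True, True]
```

Frame 1 is the lag frame: the shape is already in view, but only its bounding box is tested, so the shape itself first appears in frame 2.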

Minor problems:

  • Sometimes the object state flips between "visible" and "not visible", causing uneven frame render times. This happens when the shape itself is obscured, but its bounding box is not obscured. The issue can generally be ignored: the user doesn’t see any "flipping".

    If the issue becomes more important, we could in the future add OcclusionQueryAlwaysCheckBox: Boolean (to always check bbox visibility, not the actual shape, even when the shape is visible). The above-mentioned OcclusionQueryFramesToHide would also help with this.

  • The algorithm is incompatible with DynamicBatching, so we just disable the dynamic batching optimization on scenes where the occlusion query is active. Why?

    • Dynamic batching merges many shapes into one. This means that, at the very least, it would make the occlusion query less effective — only the "large" (merged) shapes are tested.

    • The implementation would need to be extended to propagate visibility test results from the merged shape to the original shapes. This would complicate the code for a small gain.
