Occlusion Query

1. Introduction

Castle Game Engine can utilize GPU occlusion query to avoid rendering things that are completely obscured by other things.

A typical scenario is moving through a city with tall buildings while the camera is at ground level, e.g. a game where you walk through a city, or drive a car, a tank, or some other vehicle. In a typical view in such a game, your viewing frustum includes many 3D things, but most of them are actually obscured by the building in front of you.

Occlusion Query effect

Using it is simple: on your TCastleScene instance, set MyScene.RenderOptions.OcclusionQuery to true. You can do this at runtime (and change it at any moment), or you can set it in the CGE editor.

Be sure to measure FPS (see https://castle-engine.io/manual_optimization.php ) to see how effective it is.
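
To compare FPS with and without the optimization, it can help to flip the setting at runtime, for example from a key press handler. Below is a minimal sketch that only touches the RenderOptions.OcclusionQuery property named above. The unit name OcclusionQueryToggle and the function ToggleOcclusionQuery are hypothetical, not part of the engine, and the scene is assumed to be already created and added to a viewport.

```pascal
unit OcclusionQueryToggle; // hypothetical unit name, not part of the engine

{$ifdef FPC}{$mode objfpc}{$H+}{$endif}

interface

uses CastleScene;

{ Flip the occlusion query setting on Scene and return the new state.
  Hypothetical helper, shown only to illustrate changing the property at runtime. }
function ToggleOcclusionQuery(const Scene: TCastleScene): Boolean;

implementation

function ToggleOcclusionQuery(const Scene: TCastleScene): Boolean;
begin
  Scene.RenderOptions.OcclusionQuery := not Scene.RenderOptions.OcclusionQuery;
  Result := Scene.RenderOptions.OcclusionQuery;
end;

end.
```

Call it once, note the FPS, call it again, and compare the two readings from the same camera position.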

2. How it works

For every shape within TCastleScene (where you set MyScene.RenderOptions.OcclusionQuery = true) we look at whether the shape was detected as "visible" by occlusion query in the previous frame (if it was not tested, e.g. because this is the 1st render, we assume it is visible).

  1. If the shape was visible in the previous frame, then we render it as usual in the current frame.

  2. If the shape was not visible in the previous frame, we render merely the bounding box of the shape (but this box is not actually visible in the color buffer — we render it only for the "occlusion query").

In both cases (rendering the actual shape or rendering its bounding box) we surround the render call with special OpenGL(ES) calls, so that the visibility flag is updated based on whether the shape (or its bounding box) is visible.
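
The per-shape decision described above can be sketched as a small simulation in plain Free Pascal, with no GPU and no engine code involved. All names here (TSimShape, TDrawn, RenderFrame) are hypothetical, and the query result is supplied by hand instead of coming from the GPU. As a simplification, the sketch assumes the query gives the same answer for the shape and for its bounding box.

```pascal
program OcclusionQuerySketch;

{$mode objfpc}{$H+}

type
  { Hypothetical stand-in for a shape; not an engine type. }
  TSimShape = record
    WasVisible: Boolean; // query result remembered from the previous frame
  end;

  TDrawn = (drShape, drBoundingBox);

{ Simulate one frame for one shape and return what was drawn.
  QueryPassed stands for the GPU answer: did anything we drew
  pass the depth test? }
function RenderFrame(var Shape: TSimShape; const QueryPassed: Boolean): TDrawn;
begin
  // Choose what to draw using only last frame's visibility flag.
  if Shape.WasVisible then
    Result := drShape
  else
    Result := drBoundingBox;
  // In both cases the draw is wrapped in a query that updates the flag.
  Shape.WasVisible := QueryPassed;
end;

const
  DrawnNames: array [TDrawn] of String = ('shape', 'bounding box');
  // Frames 1..2: hidden behind a building; frames 3..4: in plain view.
  Unobscured: array [1..4] of Boolean = (false, false, true, true);
var
  Shape: TSimShape;
  Frame: Integer;
begin
  Shape.WasVisible := true; // never tested yet, so assume visible
  for Frame := 1 to 4 do
    Writeln('Frame ', Frame, ': draw ',
      DrawnNames[RenderFrame(Shape, Unobscured[Frame])]);
end.
```

Under these assumptions the program draws the shape in frame 1, the bounding box in frames 2 and 3, and the shape again in frame 4. Frame 3 draws only the box even though the shape has just come into view, because the decision always uses the previous frame's result.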

Note that this algorithm happens after the frustum culling has eliminated shapes that are definitely not visible because they are not within the viewing frustum. That is, frustum culling (that we do both per-scene and per-shape, by default) works as it did, and it remains a very useful algorithm — eliminating things earlier, with less hassle. The occlusion query optimization only helps to eliminate shapes that are within the frustum, but are obscured by something else.

3. Why (and when) does this make the rendering faster

  • The only overhead we add is making the "occlusion query" (updating the visibility flag), which is done using hardware-accelerated ARB_occlusion_query.

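  • On the plus side, we can often render merely the bounding box (instead of the actual shape). This is a big gain if you have lots of "heavy" shapes that are typically obscured.

The net result is a performance gain whenever the rendering work saved outweighs the cost of the queries, which is often the case in scenes with many heavy, frequently obscured shapes.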

4. What does it mean for your scenes and shapes?

  • Visibility of each shape is considered separately. So the "granularity" of how you split your scene into shapes matters.

  • The detection "what is obscured" takes into account other (rendered earlier) scenes. As the detection is done by GPU, everything that was in the Z-buffer is taken into account.

    For best results, you may want to sort your scenes (maybe not every frame, but once in a while) in "front to back" order, which is the inverse of the order produced by TCastleTransform.SortBackToFront. That’s because you want to render the "things that may obscure other things" earlier. An example of how to do this:

var
  { Has to be global, since TObjectList.Sort
    requires normal function (not "of object"). }
  SortCameraPosition: TVector3;

function CompareFrontToBack3D(A, B: Pointer): Integer;
begin
  // Use TBox3D.CompareBackToFront3D with the operands swapped (B before A),
  // which turns the back-to-front comparison into a front-to-back one.
  Result := TBox3D.CompareBackToFront3D(
    TCastleTransform(B).BoundingBox,
    TCastleTransform(A).BoundingBox,
    SortCameraPosition);
end;

procedure SortTransformChildrenFrontToBack(const Items: TCastleTransform);
begin
  // Viewport is assumed to be your viewport instance, accessible in this scope.
  SortCameraPosition := Viewport.Camera.Position;
  Items.Sort(@CompareFrontToBack3D);
end;

procedure TCastleTransform.SortBackToFront2D;
begin
  SortBackToFront(bs2D, TVector3.Zero);
end;

  • As the visibility information is associated with shapes, this algorithm doesn’t work correctly when you instantiate the scene many times. In general we allow using the same instance of TCastleScene or TCastleTransform multiple times within one viewport, but the occlusion query has to be turned off on such a scene.

  • The algorithm should correctly handle shapes that are partially transparent (using blending). Such shapes do not obscure other things, but they can be obscured by other (opaque) shapes, and the algorithm accounts for this.

5. Problems

The biggest problem:

  • Sometimes you may see a lag of 1 frame, when an object is not rendered although it should be. This happens when the shape (or even its bounding box) was not visible in the previous frame, but now it is visible. If your game runs at 60 FPS, it means that 3D shapes may appear with a delay of 1/60 of a second.

    This is usually not noticeable. If need be, we have ideas for per-scene settings that would mitigate this problem:

    • RenderOptions.OcclusionQueryAlwaysCheckBox: Boolean

    • RenderOptions.OcclusionQueryEnlargeBox: Single

    • maybe even RenderOptions.OcclusionQueryFramesToHide: Integer

    None of the properties proposed above is implemented for now. If you see this problem (1 frame lag) in a practical use-case, you’re welcome to report it, and we can implement them then.

Minor problems:

  • Sometimes the object state flips between "visible" and "not visible", making frame render times uneven. This happens when the shape itself is obscured, but its bounding box is not. The issue can in general be ignored, since the user doesn’t see any "flipping".

  • The algorithm, in its current form, is incompatible with DynamicBatching, so we just disable the dynamic batching optimization on scenes where the occlusion query is active. Why?

    • Dynamic batching merges many shapes into one. This means that, at the very least, it would make the occlusion query less effective — only the "large" (merged) shapes are tested.

    • The implementation would need to be extended to propagate visibility test results from the merged shape back to the original shapes. This would complicate the code for a small gain.
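
The "flipping" mentioned in the first minor problem above can be reproduced with a tiny model in plain Free Pascal. This is a hypothetical sketch, not engine code: NextVisible and its parameters are invented names, and the two occlusion answers are fixed by hand to represent a shape that is fully hidden while a corner of its bounding box is not.

```pascal
program OcclusionFlipSketch;

{$mode objfpc}{$H+}

{ Hypothetical model, not engine code: returns the visibility flag after
  one frame, given the flag remembered from the previous frame. }
function NextVisible(const WasVisible, ShapeUnobscured,
  BoxUnobscured: Boolean): Boolean;
begin
  if WasVisible then
    Result := ShapeUnobscured // the real shape was drawn and tested
  else
    Result := BoxUnobscured;  // only the bounding box was drawn and tested
end;

var
  Visible: Boolean;
  Frame: Integer;
begin
  Visible := true; // first frame: not tested yet, so assume visible
  for Frame := 1 to 6 do
  begin
    if Visible then
      Writeln('Frame ', Frame, ': draw shape (expensive)')
    else
      Writeln('Frame ', Frame, ': draw bounding box (cheap)');
    // The shape is fully hidden, but part of its box is not.
    Visible := NextVisible(Visible, false, true);
  end;
end.
```

In this model the output alternates between the expensive and the cheap draw on every frame, which is the source of the uneven frame times. Nothing appears or disappears on screen, because the shape is hidden the whole time.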

6. Alternative: Coherent Hierarchical Culling

The sections above describe the currently advised approach, enabled by MyScene.RenderOptions.OcclusionQuery. We also have an alternative approach implementing "Coherent Hierarchical Culling", activated by MyScene.RenderOptions.HierarchicalOcclusionQuery. Sadly, at least for now, we cannot advise this alternative. While "Coherent Hierarchical Culling" avoids the 1-frame lag when an object is shown, the algorithm is slower (and considerably more complicated), making it impractical to use, at least in the current implementation, based on our tests.

