How to "profile" (analyze time and memory usage) your Castle Game Engine applications

1. Introduction

This document describes how to use Valgrind to profile your Castle Game Engine applications.

Note that this document is just a summary. For the full description, read the documentation of FPC, Valgrind and Callgrind manual:

2. Compiling

If you use our build tool, just recompile your project like this:

castle-engine clean
castle-engine compile --mode=valgrind

Otherwise (if you don’t use our build tool), you have to do it more manually. Configure your compilation options to

  • Make sure to use these options:

    • You MUST use -gv option, this adds stuff necessary for valgrind.

    • You SHOULD use -gl (line info) to get line number information.

    • You SHOULD NOT use -Xs (strip debug info), it would strip useful function info from your exe.

  • With the exception of the options mentioned above, everything else should be configured like for a release build. Otherwise you may find serious "time eaters" in code related to range or overflow checking, and they will skew your results. You want to profile the application version that you release to users, which should have range/overflow checks turned off (for maximum speed). See here for a description what are range and overflow checks.

  • If you compile on the command-line using direct fpc ... command and @castle-fpc.cfg file, then you can apply the options indicated above inside the castle-fpc.cfg. Search castle-fpc.cfg for Valgrind options and uncomment them. Be sure to also comment out -Xs.

  • Make sure to recompile all the units. Call make clean or castle-engine clean or whatever other command you use to force recompiling all the code. Otherwise, you will not get profiling info inside some routines.

3. Analyze speed

Use valgrind’s callgrind tool.

Note that running program through callgrind adds an enormous slowdown, especially with instrumentation (this is when actual measurements take place). So it’s advised to start without instrumentation, and only turn it on for the interested code part.

valgrind --tool=callgrind --instr-atstart=no ./my-program

# from other shell:
callgrind_control -i on
callgrind_control -i off

# investigate the report:

There’s lots of useful information shown by kcachegrind. Personally I found it easiest to look at the "Call Graph" tab. "Drill down" by moving in this graph (and clicking on routines) to find the bottleneck that you can fix.

4. Analyze memory usage

Use valgrind’s massif tool.

Run like

valgrind --tool=massif --run-libc-freeres=no ./my-program

There are some more useful Valgrind options, we have them in massif_fpc script in . So just get , place it in your $PATH, and then execute

massif_fpc ./my-program

Afterwards investigate the resulting file, by

ms_print > massif_output.txt

Open the massif_output.txt in any text editor. It may look scary, but remain calm :)

  • You usually want to find the "peak" snapshot (moment when your application was using the most memory). You can find it looking at the Detailed snapshots line:

      Number of snapshots: 58
       Detailed snapshots: [1, 2, ..., 42 (peak), 46, 52]

    The "peak" is at 42nd snapshot in the example above. A graph (above the Number of snapshots line) should confirm that this is the highest moment in time.

  • Then find the analysis of this "peak" in massif_output.txt file, e.g. searching regexp ^ 42.

    Browse it, and the main "memory eater" should be visible.

  • Note that a memory may be allocated in some other library, e.g. inside OpenGL.

    This often happens because you use a lot of texture memory. Use TextureMemoryProfiler to analyze your texture memory usage. Use various optimization hints related to textures to decrease texture memory usage.

5. Alternative profiling methods, without Valgrind

Valgrind is really powerful, and I advice getting familiar with it. But if it seems too difficult (or not available on your platform), there are other ways to profile speed and memory usage of your programs.

For example:

  • You can measure the speed of operations using TCastleProfiler. It’s automatically used for various CGE loading operations (all you need to do is enable it, and show somewhere the report). It’s trivial to use it also for your own routines. The gathered times are grouped in a tree structure, so you can see what contributed to what.

  • You can measure the speed of your routines using ProcessTimer or Timer.

  • You can measure the memory usage of your textures using TextureMemoryProfiler. It measures the memory usage on GPU, so it’s actually something very different than what massif measures, and it makes sense independently if you use massif or not.

  • On Windows: There is a Very Sleepy profiler (can also save profiles in the same format as Valgrind).

  • On Nintendo Switch: There is a special profiler (see Nintendo Switch closed docs for details).

  • On Linux: Aside from Valgrind, there’s also gprof. But its text output is much harder to follow than Valgrind output with kcachegrind visualization.

  • See the manual about optimization for more ideas.

As a general rule, avoid judging the speed "by a hunch". Our intuitions about "what is fast / what is slow" are often wrong, it’s always better to actually measure the thing you want to optimize. And optimized code is usually harder to read/maintain, so you will do wisely by optimizing only what is really necessary.

To improve this documentation just edit this page and create a pull request to cge-www repository.