This document describes how to use Valgrind to profile your Castle Game Engine applications.
Note that this document is just a summary. For the full description, read the FPC documentation, the Valgrind manual and the Callgrind manual.
If you use our build tool, just recompile your project like this:
castle-engine clean
castle-engine compile --mode=valgrind
Otherwise (if you don’t use our build tool), you have to do it more manually, by adjusting your compilation options yourself. Make sure to use these options:
You MUST use the -gv option, it adds the information necessary for Valgrind.
You SHOULD use -gl to get line number information.
You SHOULD NOT use -Xs (strip debug info), as it would strip useful function info from your executable.
With the exception of the options mentioned above, everything else should be configured like for a release build. Otherwise you may find serious "time eaters" in code related to range or overflow checking, and they will skew your results. You want to profile the application version that you release to users, which should have range/overflow checks turned off (for maximum speed). See here for a description of range and overflow checks.
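For example, a minimal sketch of a direct fpc invocation with these options (my_program.lpr is a hypothetical main program file; keep your usual release options like -O2 and your unit paths, and do not pass -Xs):

fpc -gv -gl -O2 my_program.lpr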
If you compile on the command-line using a direct fpc ... command and the @castle-fpc.cfg file, then you can apply the options indicated above inside castle-fpc.cfg. Search castle-fpc.cfg for the Valgrind options and uncomment them. Be sure to also comment out -Xs.
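For illustration, the relevant fragment of castle-fpc.cfg looks roughly like this (a sketch only, the exact lines and comments in your engine version may differ; # starts a comment in FPC config files):

# Uncomment these to profile with Valgrind:
#-gv
#-gl

# Comment this out when profiling, it strips the debug info:
-Xs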
Make sure to recompile all the units. Call make clean or castle-engine clean or whatever other command you use to force recompiling all the code. Otherwise, you will not get profiling info inside some routines.
Use Valgrind’s callgrind tool.
Note that running the program through callgrind adds an enormous slowdown, especially with instrumentation (this is when the actual measurements take place). So it’s advised to start without instrumentation, and only turn it on for the part of the code you are interested in.
valgrind --tool=callgrind --instr-atstart=no ./my-program

# from other shell:
callgrind_control -i on
callgrind_control -i off

# investigate the report:
kcachegrind
There’s lots of useful information shown by kcachegrind. Personally, I found it easiest to look at the "Call Graph" tab. "Drill down" by moving in this graph (and clicking on routines) to find the bottleneck that you can fix.
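As a side note, if you work on a machine without a GUI, the callgrind_annotate tool distributed with Valgrind prints a plain-text summary of the same data. For example (the output file name contains the process id, so adjust it to the file callgrind actually produced):

callgrind_annotate callgrind.out.12345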
Use Valgrind’s massif tool.
Run it like this:
valgrind --tool=massif --run-libc-freeres=no ./my-program
There are some more useful Valgrind options; we have them in the massif_fpc script in https://github.com/castle-engine/cge-scripts/blob/master/massif_fpc . So just get https://raw.githubusercontent.com/castle-engine/cge-scripts/master/massif_fpc , place it in your $PATH, and then execute:
massif_fpc ./my-program
Afterwards, investigate the resulting massif.out.xxx file by running:
ms_print massif.out.xxx > massif_output.txt
Open the massif_output.txt in any text editor. It may look scary, but remain calm :)
You usually want to find the "peak" snapshot (the moment when your application was using the most memory). You can find it by looking at the Detailed snapshots line:
Number of snapshots: 58
 Detailed snapshots: [1, 2, ..., 42 (peak), 46, 52]
The "peak" is at 42nd snapshot in the example above. A graph (above the Number of snapshots
line) should confirm that this is the highest moment in time.
Then find the analysis of this "peak" in massif_output.txt
file, e.g. searching regexp ^ 42
.
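If you prefer the command line to a text editor, the same search can be done with grep (42 is just the peak snapshot number from the example above, and the pattern mirrors the regexp mentioned; the exact indentation in the output may vary):

grep -n '^ 42' massif_output.txt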
Browse it, and the main "memory eater" should be visible.
Note that memory may be allocated in some other library, e.g. inside OpenGL.
This often happens because you use a lot of texture memory. Use TextureMemoryProfiler to analyze your texture memory usage, and follow the various texture optimization hints to decrease it.
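A minimal sketch of using it in Pascal (assuming the TextureMemoryProfiler singleton from the CastleGLImages unit, with the Enabled and Summary members as described in the CGE API reference; verify the exact names there):

uses CastleGLImages, CastleLog;

// Enable gathering texture memory statistics, early, before loading textures.
TextureMemoryProfiler.Enabled := true;

// Later, once your scenes and textures are loaded, log the report
// (assuming you called InitializeLog earlier, so WritelnLog output is visible).
WritelnLog('Texture memory', TextureMemoryProfiler.Summary);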
Valgrind is really powerful, and I advise getting familiar with it. But if it seems too difficult (or is not available on your platform), there are other ways to profile the speed and memory usage of your programs.
For example:
You can measure the speed of operations using TCastleProfiler. It’s automatically used for various CGE loading operations (all you need to do is enable it, and show the report somewhere). It’s trivial to use it also for your own routines (see the sketch after this list). The gathered times are grouped in a tree structure, so you can see what contributed to what.
You can measure the speed of your routines using ProcessTimer or Timer.
You can measure the memory usage of your textures using TextureMemoryProfiler. It measures the memory usage on the GPU, so it’s actually something very different from what massif measures, and it is useful regardless of whether you use massif or not.
On Windows: There is the Very Sleepy profiler (it can also save profiles in the same format as Valgrind).
On Nintendo Switch: There is a special profiler (see Nintendo Switch closed docs for details).
On Linux: Aside from Valgrind, there’s also gprof. But its text output is much harder to follow than Valgrind output with kcachegrind visualization.
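For the first two items above, here is a minimal sketch in Pascal (assuming the Profiler singleton, ProcessTimer and ProcessTimerSeconds from the CastleTimeUtils unit, as described in the CGE API reference; verify the exact names and types there):

uses SysUtils, CastleTimeUtils, CastleLog;

procedure DoSomethingMeasuredByProfiler;
var
  T: TCastleProfilerTime;
begin
  { TCastleProfiler: record this routine's time in the global report.
    Remember to set Profiler.Enabled := true at startup,
    and display Profiler.Summary when you want to see the gathered times. }
  T := Profiler.Start('DoSomethingMeasuredByProfiler');
  try
    // ... the actual work to measure ...
  finally
    Profiler.Stop(T);
  end;
end;

procedure MeasureOneBlock;
var
  TimeStart: TProcessTimerResult;
begin
  // ProcessTimer: measure a single block of code and log the elapsed seconds.
  TimeStart := ProcessTimer;
  // ... the actual work to measure ...
  WritelnLog('Timing', Format('Took %f seconds', [
    ProcessTimerSeconds(ProcessTimer, TimeStart)]));
end;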
See the manual about optimization for more ideas.
As a general rule, avoid judging the speed "by a hunch". Our intuitions about "what is fast / what is slow" are often wrong; it’s always better to actually measure the thing you want to optimize. And optimized code is usually harder to read and maintain, so you will do wisely by optimizing only what is really necessary.