Last Update:
Simple Performance Timer
Table of Contents
When you’re doing a code profiling session it’s great to have advanced and easy to use tools. But what if we want to do some simple test/benchmark? Maybe a custom code would do the job?
Let’s have a look at simple performance timer for C++ apps.
Intro
A task might sound simple: detect what part of the code in the ABC module takes most of the time to execute. Or another case: compare execution time between the Xyz algorithm and Zyx.
Sometimes, instead of using and setting up advanced profiling tools I just use my custom profiling code. Most of the time I only need a good timer and a method to print something on the screen/output. That’s all. Usually it can be enough for most of cases… or at least a good start for more deep and advanced profiling session.
Little Spec
What do we want?
- I’d like to measure execution time of any function in my code and even part of a routine.
- The profiling code that needs to be added to routines must be very simple, ideally just one line of additional code.
- There should be a flag that will disable/enable profiling globally
Timer
Good timer is the core of our mechanism.
Here is a brief summary of available options:
- RDTSC instruction - it returns number of CPU cycles since the reset, 64 bit variable. Using this instruction is very low-level, but probably this is not what we need. Cpu cycles aren’t steady time events: power saving, context switching… See an interesting read from RandomAscii: rdtsc in the Age of Sandybridge.
- High performance timer on Windows - see Acquiring high-resolution time stamps. It gives highest possible level of precision (<1us).
GetTickCount
- 10 to 16 milliseconds of resolutiontimeGetTime
- uses system clock (so the same resolution as GetTickCount), but resultion can be increased up to even 1ms (via timeBeginPeriod). See full comparision between GetTickCount vs timeGetTime at RandomASCII blogstd::chrono
- finally, there are timers from STL library!system_clock
- system timesteady_clock
- monotonic clock, see the diff between system_clock at this SO questionhigh_resolution_clock
- highest possible resolution, multiplatform! Warning: it might be alias for system or steady clock… depending on the system capabilities.
Obviously we should generally use std::high_resolution_clock
,
unfortunately it’s not working as expected in VS2013 (where I developed
the original solution).
This is fixed in VS 2015: see this blog post from vs
team.
In general, if you’re using the latest compilers/libraries then
std::chrono
will work as expected. If you have some older tools, then
it’s better to double-check.
Output
Where do we want to get the results? In simple scenarios we might just use printf/cout. Other option is to log directly to some log file or use Debug View.
Performance cost
Measuring some effect can alter the results. How much elapsed time is affected by our profiling code? If it takes proportionally quite a long time (against the code that we measure) we might need to defer the process somehow.
For example if I want to measure execution time for just a small method that runs in a few microseconds, writing output to a file (each time the method is called) might be longer then the whole function!
So we can measure just the elapsed time (assuming that it’s very fast) and defer the process of writing the data to output.
Solution
Simple as it is:
void longFunction()
{
SIMPLEPERF_FUNCSTART;
SIMPLEPERF_START("loop ");
for (int i = 0; i < 10; ++i)
{
SIMPLEPERF_SCOPED("inside loop ");
//::Sleep(10);
internalCall();
}
SIMPLEPERF_END;
}
which shows at the end of the program:
main : 14837.797000
longFunction : 0.120000
loop : 0.109000
inside loop : 0.018000
internalCall : 0.008000
inside loop : 0.011000
internalCall : 0.009000
...
inside loop : 0.005000
internalCall : 0.002000
shortMethod : 15.226000
loop : 15.222000
We have 3 basic macros that can be used:
* SIMPLEPERF_FUNCSTART
- just put it at the beginning of the
function/method. It will show the name of the function and print how
long it took to execute
* SIMPLEPERF_SCOPED(str)
- place it at the beginning of a scope
* SIMPLEPERF_START(str)
- place it inside a function, as a custom
marker, where you don’t have a scope opened.
* SIMPLEPERF_END
- need to close SIMPLEPERF_START
* Plus:
* add #include "SimplePerfTimer.h
* enable it by setting #define ENABLE_SIMPLEPERF
(also in
SimplePerfTimer.h
for simplicity)
Additionally the code supports two modes:
- Immediate: will print just after the elapsed time is obtained. Printing might affect some performance.
- Retained: will collect the data so that it can be shown at the end of the program.
In retained mode we can call:
SIMPLEPERF_REPORTALL
- show the current dataSIMPLEPERF_REPORTALL_ATEXIT
- will show the data but aftermain()
is done. Can be called any time in the program actually.
The flag #define SIMPLEPERF_SHOWIMMEDIATE true
need to be set to
true
to use retained mode.
Problems
The whole timer might not work in multicore, multithreaded code since it does not use any critical sections to protected shared data, or doesn’t care about thread that the code is running. If you need more advanced timer then you will be interested in article at Preshing on Programming: A C++ Profiling Module for Multithreaded APIs.
Implementation details
github repo: github.com/fenbf/SimplePerfTimer
The core idea for the timer is to use destructor to gather the data. This way when some timer object goes out of the scope we’ll get the data. This is handy especially for whole functions/explicit scopes.
{ // scope start
my_perf_timer t;
}
In a basic immediate form the timer just saves time (using
QueryPerformanceCounter
) in the constructor and then in the destructor
measures end time and prints it to the output.
In the retained mode we also need to store that data for the future use. I simply create a static vector that adds a new entry in the constructor and then fills the final time in the destructor. I also take care about indents, so that the output looks nicely.
In the repo there is also a header only version (a bit simplified, using only immediate mode): see SimplePerfTimerHeaderOnly.h
Here is a picture showing timer results in Debug view:
Todo
- Add file/line information when printing the data?
- Use std::chrono for VS2015/GCC version
Summary
This post described a handy performance timer. If you just need to check
execution time of some code/system just include a header (+and add
related .cpp file) and use SIMPLEPERF_FUNCSTART
or
SIMPLEPERF_START(str)/END
in analysed places. The final output should
help you find hotspots… all without using advanced tools/machinery.
Once again the repo: github.com/fenbf/SimplePerfTimer
Resources
- MSDN: Acquiring high-resolution time stamps
- MSDN: Game Timing and Multicore Processors
- Preshing on Programming: A C++ Profiling Module for Multithreaded APIs
- codeproject: Timers Tutorial
- StackOverflow: resolution of std::chrono::high_resolution_clock doesn’t correspond to measurements
I've prepared a valuable bonus for you!
Learn all major features of recent C++ Standards on my Reference Cards!
Check it out here: