The image structures allow different access methods, with various levels of comfort and efficiency.
I compared them on a convolution routine, with a 5x5 kernel used on a 2048x2048 image, using the gprof profiler.
without compiler optimization
All the source was compiled with simple default options, including
-O0 to disable any compiler optimization.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  s/call   s/call  name
44.95     40.53     40.53       10    4.05     4.05  convol_macro
30.39     67.93     27.40       10    2.74     2.74  convol_convert
23.22     88.87     20.94       10    2.09     2.09  convol_getset
 1.44     90.17      1.30                            main
So the generic XX_XY(img, x, y) macro performs worst here, and
converting all the data to flt32 (convol_convert) before using
simple macros is slower than casting the type on each call (convol_getset).
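To make the three access styles concrete, here is a minimal sketch in C. The image structure, the pix_type enum and the names PIX_XY, pix_get, FLT32_XY and to_flt32 are hypothetical stand-ins invented for this illustration; they are not the library's real definitions, and only two pixel types are shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical image structure, NOT the library's real one. */
typedef enum { PIX_U8, PIX_FLT32 } pix_type;

typedef struct {
    int w, h;        /* width and height in pixels */
    pix_type type;   /* storage type of the data buffer */
    void *data;      /* row-major pixel buffer */
} image;

/* 1. Generic macro: tests the pixel type on every single access. */
#define PIX_XY(img, x, y) \
    ((img)->type == PIX_U8 \
        ? (float)((uint8_t *)(img)->data)[(y) * (img)->w + (x)] \
        : ((float *)(img)->data)[(y) * (img)->w + (x)])

/* 2. Get function: one function call and one cast per access. */
float pix_get(const image *img, int x, int y)
{
    assert(img != NULL && x >= 0 && x < img->w && y >= 0 && y < img->h);
    size_t i = (size_t)y * (size_t)img->w + (size_t)x;
    switch (img->type) {
    case PIX_U8:    return (float)((const uint8_t *)img->data)[i];
    case PIX_FLT32: return ((const float *)img->data)[i];
    }
    return 0.0f;
}

/* 3. Convert everything to float once, then use a type-specific macro. */
#define FLT32_XY(buf, w, x, y) ((buf)[(y) * (w) + (x)])

/* Returns a newly allocated float copy of the image, or NULL on failure.
   The caller must free() the result. */
float *to_flt32(const image *img)
{
    size_t w = (size_t)img->w;
    size_t n = w * (size_t)img->h;
    float *out = malloc(n * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        out[i] = pix_get(img, (int)(i % w), (int)(i / w));
    return out;
}
```

In this sketch the generic macro pays for a type test on every pixel read, the get function pays for a call, and the conversion pays once up front in time and memory.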
with compiler optimization
All the source was compiled with -O2 compiler optimization.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  s/call   s/call  name
43.95     27.67     27.67       10    2.77     2.77  convol_macro
36.63     50.73     23.06       10    2.31     2.31  convol_getset
18.71     62.51     11.78       10    1.18     1.18  convol_convert
 0.71     62.96      0.45                            main
First, the time per call drops by 32% for the generic macro (4.05 s to 2.77 s) and by 57% for the conversion + specific macro version (2.74 s to 1.18 s), but strangely rises by about 10% for the get/set functions (2.09 s to 2.31 s).
Overall, converting all the data to a known type and using predictable array access macros is the best solution, probably because the conversion is very efficient and the macros translate to a very clear syntax, easy to optimize.
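As an illustration of why this variant is easy to optimize, here is a sketch of a 5x5 convolution over data already converted to float. It is not the benchmarked convol_convert source; the names convol_flt32, AT, KSIZE and KRAD are assumptions made for this example, and border pixels are simply skipped.

```c
#include <assert.h>
#include <stddef.h>

#define KSIZE 5             /* kernel side, as in the benchmark */
#define KRAD (KSIZE / 2)    /* kernel radius */

/* Type-specific access macro: one product and one sum per access. */
#define AT(buf, w, x, y) ((buf)[(size_t)(y) * (size_t)(w) + (size_t)(x)])

/* Convolve the interior of a w x h float image with a KSIZE x KSIZE kernel.
   Pixels closer than KRAD to a border are left untouched in dst. */
void convol_flt32(const float *src, float *dst, int w, int h,
                  const float kernel[KSIZE][KSIZE])
{
    assert(src != NULL && dst != NULL && src != dst);
    for (int y = KRAD; y < h - KRAD; y++) {
        for (int x = KRAD; x < w - KRAD; x++) {
            float acc = 0.0f;
            /* The kernel is flipped, which makes this a true convolution. */
            for (int j = 0; j < KSIZE; j++)
                for (int i = 0; i < KSIZE; i++)
                    acc += kernel[j][i] * AT(src, w, x + KRAD - i, y + KRAD - j);
            AT(dst, w, x, y) = acc;
        }
    }
}
```

Every access has the same fixed form, with no type test and no function call inside the loops, which is the kind of code the paragraph above describes as easy for the compiler to optimize.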
and pointer arithmetic?
The macros still need one sum and one product for each data access; this could be simplified with pointer arithmetic, but so far the expected improvements have been offset by:
- CPU limits: the x86 platform has very few registers, so a complex loop that needs fewer operations but has to keep track of more intermediate values may end up less efficient, because it would require many memory movements in the CPU.
- compiler optimization: simple, clear code with few function calls is more likely to be efficiently optimized by the compiler.
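For comparison, here is a sketch of what a pointer-arithmetic version of a 5x5 convolution inner loop could look like. It is an assumption-laden illustration, not code from the library: convol_ptr, KSIZE and KRAD are hypothetical names, the data is assumed to be already converted to float, and border pixels are skipped.

```c
#include <assert.h>
#include <stddef.h>

#define KSIZE 5             /* kernel side */
#define KRAD (KSIZE / 2)    /* kernel radius */

/* Same result as an index-based convolution, but the row offset is
   computed once per kernel row and the inner loop only moves pointers. */
void convol_ptr(const float *src, float *dst, int w, int h,
                const float kernel[KSIZE][KSIZE])
{
    assert(src != NULL && dst != NULL && src != dst);
    for (int y = KRAD; y < h - KRAD; y++) {
        /* First interior output pixel of row y. */
        float *out = dst + (size_t)y * (size_t)w + KRAD;
        for (int x = KRAD; x < w - KRAD; x++) {
            float acc = 0.0f;
            for (int j = 0; j < KSIZE; j++) {
                /* Leftmost source pixel covered by kernel row j. */
                const float *p = src + (size_t)(y + KRAD - j) * (size_t)w
                                     + (size_t)(x - KRAD);
                const float *end = p + KSIZE;
                /* Walk the kernel row backwards (flipped kernel). */
                const float *k = kernel[j] + KSIZE;
                while (p < end)
                    acc += *--k * *p++;
            }
            *out++ = acc;
        }
    }
}
```

The loop keeps several live pointers (out, p, end, k) in addition to the counters and the accumulator, which illustrates the register pressure mentioned in the first point above. Whether it beats the macro version depends on the compiler and the CPU, so it should be profiled rather than assumed faster.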