
The image structures can be accessed in several ways, which trade convenience against efficiency.

I compared them on a convolution routine, with a 5x5 kernel used on a 2048x2048 image, using the gprof profiler.
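For orientation, here is a minimal sketch of the kind of routine being timed, assuming a row-major flt32 buffer. The names convol_sketch and FLT32_XY, the flat kernel layout and the border handling are illustrative assumptions, not the benchmarked code.

```c
#include <assert.h>
#include <stddef.h>

#define KSIZE 5            /* kernel width and height */
#define KHALF (KSIZE / 2)  /* kernel radius */

/* Row-major access into a float buffer of width w: one product, one sum. */
#define FLT32_XY(buf, w, x, y) ((buf)[(size_t)(y) * (w) + (size_t)(x)])

/* Apply a KSIZE x KSIZE kernel (flat array, row-major) to src and write
 * the result to dst. Both buffers hold w * h floats and must not overlap.
 * The kernel is applied without flipping, which gives the same result as
 * a true convolution whenever the kernel is symmetric.
 * Border pixels, where the kernel would leave the image, are copied. */
void convol_sketch(const float *src, float *dst, size_t w, size_t h,
                   const float *kernel)
{
    assert(src != NULL && dst != NULL && kernel != NULL && src != dst);

    for (size_t y = 0; y < h; y++) {
        for (size_t x = 0; x < w; x++) {
            if (x < KHALF || y < KHALF || x + KHALF >= w || y + KHALF >= h) {
                FLT32_XY(dst, w, x, y) = FLT32_XY(src, w, x, y);
                continue;
            }
            float acc = 0.0f;
            for (size_t j = 0; j < KSIZE; j++)
                for (size_t i = 0; i < KSIZE; i++)
                    acc += kernel[j * KSIZE + i]
                         * FLT32_XY(src, w, x + i - KHALF, y + j - KHALF);
            FLT32_XY(dst, w, x, y) = acc;
        }
    }
}
```

Each output pixel costs 25 reads, so on a 2048x2048 image the cost of a single pixel access dominates the run time, which is why the access method matters so much in the profiles.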

without compiler optimization

All the source was compiled with simple default options, including -O0 to disable any compiler optimization.

 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
44.95     40.53    40.53       10     4.05     4.05  convol_macro
30.39     67.93    27.40       10     2.74     2.74  convol_convert
23.22     88.87    20.94       10     2.09     2.09  convol_getset
 1.44     90.17     1.30                             main

So the generic XX_XY(img, x, y) macro performs worst here, and converting all the data to flt32 before using simple macros (convol_convert) is slower than type casting on each call (convol_getset).
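To make the three variants concrete, here is a rough sketch of what each access style might look like. Only the XX_XY(img, x, y) call form and the flt32 type name come from the text above; the image struct, the pix_type tag, img_get, img_to_flt32 and the macro bodies are hypothetical and may differ from the real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum { PIX_UINT8, PIX_FLT32 } pix_type;  /* hypothetical type tag */

typedef struct {
    size_t   w, h;
    pix_type type;
    void    *data;  /* row-major pixels, w * h elements of `type` */
} image;

/* Offset of pixel (x, y): one product and one sum per access. */
#define IMG_OFFSET(img, x, y) ((size_t)(y) * (img)->w + (size_t)(x))

/* 1. Generic macro (assumed body): tests the type tag on every read. */
#define XX_XY(img, x, y) \
    ((img)->type == PIX_UINT8 \
        ? (float)((const uint8_t *)(img)->data)[IMG_OFFSET(img, x, y)] \
        : ((const float *)(img)->data)[IMG_OFFSET(img, x, y)])

/* 2. Get function: the same type cast, hidden behind a function call. */
float img_get(const image *img, size_t x, size_t y)
{
    assert(img != NULL && x < img->w && y < img->h);
    switch (img->type) {
    case PIX_UINT8:
        return (float)((const uint8_t *)img->data)[IMG_OFFSET(img, x, y)];
    case PIX_FLT32:
        return ((const float *)img->data)[IMG_OFFSET(img, x, y)];
    }
    return 0.0f;  /* unknown type tag */
}

/* 3. Type-specific macro: no test at all, valid only on flt32 images. */
#define FLT32_XY(img, x, y) (((float *)(img)->data)[IMG_OFFSET(img, x, y)])

/* Convert once to flt32 so that FLT32_XY can be used afterwards.
 * Returns NULL on allocation failure; the caller frees data and struct. */
image *img_to_flt32(const image *img)
{
    assert(img != NULL);
    image *out = malloc(sizeof *out);
    if (out == NULL)
        return NULL;
    out->w = img->w;
    out->h = img->h;
    out->type = PIX_FLT32;
    out->data = malloc(img->w * img->h * sizeof(float));
    if (out->data == NULL) {
        free(out);
        return NULL;
    }
    for (size_t y = 0; y < img->h; y++)
        for (size_t x = 0; x < img->w; x++)
            FLT32_XY(out, x, y) = img_get(img, x, y);
    return out;
}
```

Under these assumptions, variants 1 and 2 pay for a type test on every pixel read, while variant 3 pays once for the conversion pass and the extra buffer and then reads pixels with a plain indexed load.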

with compiler optimization

All the source was compiled with -O2 compiler optimization.

 %   cumulative   self              self     total           
time   seconds   seconds    calls   s/call   s/call  name    
43.95     27.67    27.67       10     2.77     2.77  convol_macro
36.63     50.73    23.06       10     2.31     2.31  convol_getset
18.71     62.51    11.78       10     1.18     1.18  convol_convert
 0.71     62.96     0.45                             main

First, optimization cuts the run time by 32% for the generic macro (4.05 s to 2.77 s per call) and by 57% for the conversion + specific macro version (2.74 s to 1.18 s), but strangely the get/set functions become about 10% slower (2.09 s to 2.31 s).

Overall, converting all the data to a known type and using predictable array access macros is the best solution once optimization is enabled, probably because the conversion itself is cheap and the macros expand to simple expressions that the compiler can optimize easily.

and pointer arithmetic?

The macros still need one sum and one product for each data access; this could be simplified with pointer arithmetic, but for the moment these expected improvements have been mitigated by