--- Nicolas Pitre nico@cam.org wrote:
On Wed, 25 Oct 2000, Andre wrote:
All results are in clock cycles on a Pentium III:
gcc version egcs-2.91.66 (ie stock RH 6.2):
gcc -O1 : 1329 clocks (asm mad_f_mul), 1529 clocks (C mad_f_mul)
gcc -O2 : 1607 clocks (asm mad_f_mul), 2176 clocks (C mad_f_mul)
gcc -O3 : 1608 clocks (asm mad_f_mul), 2186 clocks (C mad_f_mul)
gcc version 2.95.2 19991024:
gcc -O0 : 2946 clocks (asm mad_f_mul), 2616 clocks (C mad_f_mul)
gcc -O1 : 1735 clocks (asm mad_f_mul), 1201 clocks (C mad_f_mul)
gcc -O2 : 1380 clocks (asm mad_f_mul), 1567 clocks (C mad_f_mul)
gcc -O3 : 1380 clocks (asm mad_f_mul), 1567 clocks (C mad_f_mul)
This is strange.
The gcc-2.95.2 results are somewhat different on a StrongARM. They are more in line with the egcs-2.91.66 results above...
In terms of clock cycles, I would expect a (single issue) StrongARM to require a lot more than a (super-scalar) Pentium III for a given C function.
Unfortunately though, I don't have a suitable ARM platform to do any gcc testing with, so I haven't been able to test this. I have a copy of ARM's 'ARMulator' simulator, which gives very good profiling information but only runs code compiled with ARM's C compiler. Unfortunately, ARM's C compiler makes such a bad job of 64-bit data types that I don't think any MAD benchmarks generated this way are very useful as a guide to how fast the code would run when compiled with gcc.
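For anyone wondering why the 64-bit handling matters so much here: a portable C fixed-point multiply of this kind typically widens to 64 bits, multiplies, and shifts back down. The sketch below is only an illustration of that pattern, not the actual mad_f_mul source. The names (sketch_fixed_t, sketch_mul, SKETCH_FRACBITS) and the choice of 28 fractional bits are assumptions made for the example.

```c
#include <stdint.h>

/* Assumption for illustration: 28 fractional bits in a signed 32-bit word. */
#define SKETCH_FRACBITS 28

typedef int32_t sketch_fixed_t;   /* hypothetical fixed-point type */
typedef int64_t sketch_fixed64_t; /* hypothetical 64-bit intermediate */

/*
 * Multiply two fixed-point values.
 * The operands are widened to 64 bits so the full 32x32 product is kept,
 * then the result is shifted back to the fixed-point scale.
 * Note: right-shifting a negative signed value is implementation-defined
 * in C; this sketch assumes the usual arithmetic shift.
 */
static sketch_fixed_t sketch_mul(sketch_fixed_t x, sketch_fixed_t y)
{
    sketch_fixed64_t product = (sketch_fixed64_t)x * (sketch_fixed64_t)y;
    return (sketch_fixed_t)(product >> SKETCH_FRACBITS);
}
```

With this scaling, 0.5 is 0x08000000 and 0.25 is 0x04000000, and sketch_mul() of the two gives 0x02000000, i.e. 0.125. A compiler that turns the widening multiply into a single long-multiply instruction will do far better on code like this than one that calls out to a generic 64-bit multiply routine.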
I do have an ARM Integrator board (ARM720T CPU), which I guess should be able to run Linux, but I haven't yet investigated the effort required to get it to do so.
Out of interest, the imdct36() part of my ARM assembler imdct36 + windowing function requires just under 1100 clock cycles on the simulator. So with a bit of assembler tweaking, a humble StrongARM can be faster than a Pentium III :-)
Andre