Have you compared your optimised version of 0.10.0b to what's now in 0.11.0b? I'm curious what your benchmarking will reveal. When I get a chance I'm definitely going to take a closer look at it.
I just compared the 0.11.0b version of imdct36() with mine:
gcc -O1 : 1329 clocks (asm mul_f), 1529 (C mul_f)
gcc -O2 : 1607 clocks (asm mul_f), 2176 (C mul_f)
gcc -O3 : 1608 clocks (asm mul_f), 2186 (C mul_f)
All of which beat my best of 2215 clocks. It could be closer on an embedded processor with a smaller cache though.
The raw output isn't identical to the older version, but I guess this is just different rounding errors (?).
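To illustrate how two multiply routines can disagree in the last bit, here is a minimal sketch of two fixed-point multiplies. It is not MAD's actual mul_f: the names fixed_t, mul_f_wide and mul_f_narrow are hypothetical, and the 28 fractional bits are an assumption made for the example. The first variant widens to 64 bits (roughly what an assembler routine built on a 32x32->64 multiply can do), the second pre-shifts both operands so the product fits in 32 bits and loses low-order bits in the process.

```c
#include <stdint.h>

/* Assumption for this sketch: signed fixed point with 28 fractional bits. */
#define FRACBITS 28

typedef int32_t fixed_t;

/* Wide variant: form the full 64-bit product, then shift back down.
 * No low-order bits are discarded before the multiply.
 * (Right-shifting a negative value is implementation-defined in C;
 * gcc performs an arithmetic shift.) */
fixed_t mul_f_wide(fixed_t x, fixed_t y)
{
    return (fixed_t)(((int64_t)x * (int64_t)y) >> FRACBITS);
}

/* Narrow variant: round and pre-shift each operand (12 + 16 = 28 bits)
 * so the product fits in 32 bits. Cheaper on a CPU without a fast
 * 64-bit multiply, but the discarded bits show up as rounding error.
 * Operands are assumed to be small enough that nothing overflows. */
fixed_t mul_f_narrow(fixed_t x, fixed_t y)
{
    return ((x + (1 << 11)) >> 12) * ((y + (1 << 15)) >> 16);
}
```

For 0.5 * 0.5 both variants agree, but for inputs one unit above 0.5 the wide version returns a result one unit larger than the narrow one. Differences of that size would be enough to make the raw output of two otherwise equivalent decoders differ.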
My x86 assembler knowledge isn't too good, so I haven't really looked at why gcc seems to be so much worse with optimisations above -O1. As well as being slower, code size almost doubles, e.g. 10448 bytes (-O3) against 5391 (-O1) for the latest imdct36() using a C mul_f, so maybe it's a problem with optimisation in my version of gcc (the default egcs-2.91.66 installed with RH6.2).
Does arm-elf-gcc behave in the same way ??
On a different subject, does anyone have access to an ARM platform for testing ?? I would be interested to know how the MAD code (with and without my changes) compares to ARM's own mp3 decode library (which claims to use only 29 MHz of cpu bandwidth for real time decode on an ARM7 core).
I have access to a StrongARM 1100 and a 110, but not to anything else. I don't think I have access to ARM's MP3 decoding library, so I can't do any comparisons against it. However, I can evaluate your improvements to MAD alone on the StrongARM.
ARM's MP3 decode library (see http://www.arm.com/SoftSys/as022.html ) isn't available for free, so it might be difficult to get hold of a copy, but if the claims are true (i.e. 20 MHz of cpu bandwidth for realtime decode on a StrongARM using 27k of code) it looks like quite a good benchmark to compare MAD against... Is MAD anywhere near that yet ??
If not (to answer lwong's questions...) it's probably due to:
1) ARM's library being written with all critical sections in assembler by ARM engineers who know the architecture inside out.
2) ARM's library may use more approximate calculations in some places.
3) Any code compiled from C will have used ARM's own C compiler, which from what I've heard seems to be at least 20 percent better than gcc.
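For anyone wanting to relate a per-call cycle count to a "MHz of cpu bandwidth" figure like the ones quoted above, here is a rough sketch of the arithmetic. The function names are hypothetical. It assumes MPEG-1 Layer III framing (1152 samples per frame, 2 granules per frame, 32 subbands per granule per channel), that every block is a long block, and that no subbands are skipped, so it gives an upper bound for imdct36() alone rather than a figure for the whole decoder.

```c
/* Upper bound on imdct36() calls per second, assuming MPEG-1 Layer III
 * framing: 1152 samples per frame, 2 granules per frame, 32 subbands
 * per granule per channel, all long blocks, no subbands skipped. */
double imdct36_calls_per_second(double sample_rate_hz, int channels)
{
    double frames_per_second = sample_rate_hz / 1152.0;
    return frames_per_second * 2.0 * 32.0 * (double)channels;
}

/* CPU bandwidth in MHz consumed by a routine that costs
 * clocks_per_call cycles and runs calls_per_second times a second. */
double mhz_needed(double clocks_per_call, double calls_per_second)
{
    return clocks_per_call * calls_per_second / 1e6;
}
```

For 44.1 kHz stereo this gives 4900 calls per second, so a 1329-clock imdct36() would account for about 6.5 MHz under these assumptions. Cycle counts measured on x86 won't carry over directly to an ARM core, so this is only useful for comparing like with like.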