--- Dan Malek <dan@mvista.com> wrote:
So, my idea was to modify imdct36 so that the mad_f_mul() + mad_f_mul() + ... sequences were replaced by 64-bit multiply/accumulates; then you just round/scale once at the end (as I would on a DSP). This improved performance by about 6%, and I ended up with great compiler-generated code. For example:
t6 = mad_f_mul(X[4], 0x0ec835e8L) + mad_f_mul(X[13], 0x061f78aaL);
becomes:
macreg = 0;
mad_f_mac(macreg, X[4], 0x0ec835e8L);
mad_f_mac(macreg, X[13], 0x061f78aaL);
t6 = mad_f_macscale(macreg);
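For anyone without libmad at hand, here is a minimal, self-contained sketch of the transformation above. It assumes libmad-style fixed point with 28 fractional bits in a 32-bit word (the constants above are in that format, e.g. 0x0ec835e8 is roughly 0.9239) and a 64-bit accumulator. The names f_mul, f_mac, f_macscale, fixed_t, fixed64_t and FRACBITS are hypothetical stand-ins, not libmad's actual macros, and libmad's real mad_f_mul rounding depends on which FPM_* option it was built with, so the per-term rounding here is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-ins for libmad's types: 32-bit values with 28
 * fractional bits, and a 64-bit accumulator for full-width products. */
typedef int32_t fixed_t;
typedef int64_t fixed64_t;

#define FRACBITS   28
#define ROUND_HALF ((fixed64_t)1 << (FRACBITS - 1))

/* Per-term multiply: every product is rounded back to 28 fraction bits.
 * (Right-shifting a negative int64_t is implementation-defined in C, but
 * it is an arithmetic shift on mainstream compilers.) */
static fixed_t f_mul(fixed_t x, fixed_t y)
{
    return (fixed_t)(((fixed64_t)x * y + ROUND_HALF) >> FRACBITS);
}

/* Multiply/accumulate: keep the full 56-fraction-bit product, no rounding.
 * Written as a macro so it takes the accumulator by name, like the
 * mad_f_mac(macreg, ...) calls in the example. */
#define f_mac(acc, x, y) ((acc) += (fixed64_t)(x) * (fixed64_t)(y))

/* Round and scale the accumulated sum once, at the end. */
static fixed_t f_macscale(fixed64_t acc)
{
    return (fixed_t)((acc + ROUND_HALF) >> FRACBITS);
}

/* t6 from the example, computed the original way (round every term). */
static fixed_t t6_per_term(const fixed_t X[18])
{
    return f_mul(X[4], 0x0ec835e8L) + f_mul(X[13], 0x061f78aaL);
}

/* t6 computed with a 64-bit accumulator and a single final scale. */
static fixed_t t6_mac(const fixed_t X[18])
{
    fixed64_t macreg = 0;
    f_mac(macreg, X[4], 0x0ec835e8L);
    f_mac(macreg, X[13], 0x061f78aaL);
    return f_macscale(macreg);
}
```

With X[4] = 1.0 (1 << 28) and X[13] = 0, both versions return 0x0ec835e8. They differ only in the low bits: two products that are each exactly half an LSB round to 1 + 1 = 2 when rounded per term, but to 1 (the exact answer) when summed first and rounded once. Besides saving the intermediate shifts, the single rounding step is therefore at least as accurate.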
Cool. There are actually macros to support this already - I did basically the same thing for imdct_s() in the patch I posted a link to previously. It's also in the ARM assembler imdct_l() function, but for some reason it hasn't made it back into the C version yet (although I'm sure it was on the TODO list at one stage?).
I guess I need to run some official bit streams, but it sounds OK. This could certainly be the PowerPC optimization. I'll send a patch to someone if they would like to see it.
Straight to the list is good - I guess the patch won't be huge. The final say on if/when/in what form it gets rolled into the main code is down to Rob, though.
Andre --