--- Dan Malek dan@mvista.com wrote:
So, my idea was to modify imdct36 so that the mad_f_mul + mad_f_mul ... sequences were replaced by 64-bit multiply/accumulates, and then you just round/scale once at the end (like I would do on a DSP). This improved the performance by about 6%, and I ended up with great compiler-generated code. For example:
t6 = mad_f_mul(X[4], 0x0ec835e8L) + mad_f_mul(X[13], 0x061f78aaL);
becomes:
macreg = 0;
mad_f_mac(macreg, X[4], 0x0ec835e8L);
mad_f_mac(macreg, X[13], 0x061f78aaL);
t6 = mad_f_macscale(macreg);
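For readers following along, here is a rough, self-contained sketch of the two approaches being compared. The definitions of f_mul, f_mac and f_macscale below are assumptions written for illustration (the patch itself is not shown in this thread), as are the 28 fractional bits and the plain truncating shift; the real macros may round differently.

```c
#include <stdint.h>

typedef int32_t fixed_t;    /* stand-in for a 32-bit fixed-point sample */
typedef int64_t fixed64_t;  /* 64-bit accumulator type */

#define FRACBITS 28         /* assumed number of fractional bits */

/* Per-product scaling: each product is shifted back to 32 bits at once.
 * Note: right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
static fixed_t f_mul(fixed_t x, fixed_t y)
{
    return (fixed_t)(((fixed64_t)x * y) >> FRACBITS);
}

/* Hypothetical multiply/accumulate helpers: add the full 64-bit product
 * to the accumulator, then scale once at the end. */
#define f_mac(acc, x, y)  ((acc) += (fixed64_t)(x) * (y))
#define f_macscale(acc)   ((fixed_t)((acc) >> FRACBITS))

/* Original form: two multiplies, each scaled separately, then added. */
static fixed_t t6_separate(fixed_t x4, fixed_t x13)
{
    return f_mul(x4, 0x0ec835e8L) + f_mul(x13, 0x061f78aaL);
}

/* Accumulated form: one 64-bit running sum, one final shift. */
static fixed_t t6_accumulated(fixed_t x4, fixed_t x13)
{
    fixed64_t macreg = 0;

    f_mac(macreg, x4, 0x0ec835e8L);
    f_mac(macreg, x13, 0x061f78aaL);
    return f_macscale(macreg);
}
```

With truncating shifts the two results can differ by at most one least-significant bit per pair of products, because the accumulated form discards the low bits once instead of twice.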
Cool. There are actually macros to support this already - I did basically the same thing for imdct_s() in the patch I posted a link to previously. It's also in the ARM assembler imdct_l() function, but for some reason hasn't made it back into the C version yet (although I'm sure it was on the TODO list at one stage ??).
I guess I need to run some official bit streams, but it sounds OK. This could certainly be the PowerPC optimization. I'll send a patch to someone if they would like to see it.
Straight to the list is good - I guess the patch won't be huge. Final say on if/when/inwhatform it gets rolled into the main code is down to Rob though.
Andre --
Andre wrote:
There are actually macros to support this already
Yeah, after I did this, Nicolas Pitre pointed out the MLA macros. For PowerPC, I can make the MLA macros map into the "macreg" stuff I did in the layer3.c file. I was wondering about changing the custom hack I just did in layer3.c into the MLA macros. The imdct36 modifications gave me almost the same performance gain as using the MLA macros in synth.c did (which was quite a bit :-). I offered to change imdct36 over to the MLA macros if people can try it on other architectures.
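As an illustration of what mapping an MLA-style interface onto a single 64-bit accumulator could look like, here is a minimal sketch. The SK_ML0/SK_MLA/SK_MLZ names, the (hi, lo) argument shape, and the 28-bit scale are hypothetical stand-ins chosen for this example; they are not the actual macros from the library or from the patch.

```c
#include <stdint.h>

typedef int32_t fixed_t;    /* stand-in for a 32-bit fixed-point value */
typedef int64_t fixed64_t;  /* 64-bit accumulator */

#define FRACBITS 28         /* assumed number of fractional bits */

/* Hypothetical MLA-style macros. The "hi" argument is accepted but unused:
 * on a target with a native 64-bit type, "lo" alone holds the running sum. */
#define SK_ML0(hi, lo, x, y)  ((lo)  = (fixed64_t)(x) * (y))   /* start sum  */
#define SK_MLA(hi, lo, x, y)  ((lo) += (fixed64_t)(x) * (y))   /* accumulate */
#define SK_MLZ(hi, lo)        ((void)(hi), (fixed_t)((lo) >> FRACBITS))

/* Three-term dot product: accumulate at full precision, scale once.
 * Assumes arithmetic right shift for negative sums. */
static fixed_t dot3(const fixed_t x[3], const fixed_t c[3])
{
    fixed_t hi = 0;   /* placeholder for the unused high word */
    fixed64_t lo;

    SK_ML0(hi, lo, x[0], c[0]);
    SK_MLA(hi, lo, x[1], c[1]);
    SK_MLA(hi, lo, x[2], c[2]);
    return SK_MLZ(hi, lo);
}
```

Keeping the (hi, lo) pair in the macro signature means the calling code stays the same on architectures that really do split the accumulator across two registers.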
Straight to the list is good
Oh, OK. I didn't know if you guys liked that or not. I sent them to Rob last night, about 400 lines or so (configure files, too). I'm not a Sourceforge user, so tell me how you use that to make this easier if that is preferred.
-- Dan