FWIW, here's a patch to use the multiply/accumulate macros in the Layer III IMDCT.
Cool! I would like to request one change. The lowest-level MLA inline functions take pointers to memory locations, which isn't the most efficient form for the PowerPC implementation. I can shave roughly another 5 percent off the decoding time if I let the compiler treat the accumulators as registers rather than as memory locations.
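To make the request concrete, here is a rough sketch of the two styles. The names (mla_ptr, MLA_REG, dot_ptr, dot_reg) and the single 64-bit accumulator are made up for illustration and are not the actual macros in the patch; the point is only that the second form never takes the accumulator's address.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical pointer-style MLA: the accumulator is passed by address,
 * so the compiler must first prove it can keep it in a register. */
static inline void mla_ptr(int64_t *acc, int32_t x, int32_t y)
{
    *acc += (int64_t)x * y;
}

/* Hypothetical register-style MLA: operates on the lvalue directly,
 * so no address of the accumulator is ever taken. */
#define MLA_REG(acc, x, y)  ((acc) += (int64_t)(x) * (y))

/* Dot product using the pointer-style helper. */
static int64_t dot_ptr(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        mla_ptr(&acc, a[i], b[i]);
    return acc;
}

/* Dot product using the register-style macro. */
static int64_t dot_reg(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        MLA_REG(acc, a[i], b[i]);
    return acc;
}
```

Both compute the same sum; whether the generated code differs depends on the compiler, the optimization level, and whether the helper actually gets inlined.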
Are you sure the compiler won't optimize these references into direct register accesses?
Cheers, -rob