At long last, I have rewritten the Layer III long block 36-point IMDCT routine to make better use of symmetry and to reuse common subexpressions. I'm not convinced this is the best possible rewrite, but it performs better than what was there before.
The change involves a slight time/space trade-off, but since MAD is already pretty good with memory I think the change is worth it.
Together with the changes from 0.10.1b, the CPU performance improvements are building up:
(StrongARM 1100 220MHz)
  decoder version          Layer I   Layer II   Layer III
  -------------------------------------------------------
  MAD 0.10.2b (SSO)          18%       17%        28%
  MAD 0.10.1b (SSO)          18%       17%        31%
  MAD 0.10.0b                23%       22%        37%
(Celeron 500A)
  decoder version          Layer I   Layer II   Layer III
  -------------------------------------------------------
  MAD 0.10.2b (SSO)           3%        3%         6%
  MAD 0.10.0b                 4%        4%         7%
I suspect the next thing to be done is to modify the way Layer III decoding is performed such that requantization happens at the same time as Huffman decoding, instead of in a separate pass. This should reduce memory usage again as well as help performance.
The latest release can be found here:
ftp://ftp.mars.org/pub/mpeg/
Cheers,
-rob