At long last, I have rewritten the Layer III long block 36-point IMDCT routine to make better use of symmetry and to reuse common subexpressions. I'm not convinced this is the best possible rewrite, but it performs better than the previous version.
The change involves a slight time/space trade-off, but since MAD is already fairly light on memory I think the change is worth it.
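For readers unfamiliar with the symmetry being exploited, here is a minimal Python sketch. It is not MAD's actual routine (which is fixed-point C and also shares subexpressions); the function names are made up for illustration. It only shows that a 36-point IMDCT of 18 frequency lines has 18 independent outputs, since x[17 - i] = -x[i] and x[35 - i] = x[18 + i] for i = 0..8.

```python
import math

N = 36          # output samples for a Layer III long block
HALF = N // 2   # 18 input frequency lines


def _sample(X, i):
    """One output sample, straight from the IMDCT definition."""
    return sum(
        X[k] * math.cos(math.pi / (2 * N) * (2 * i + 1 + HALF) * (2 * k + 1))
        for k in range(HALF)
    )


def imdct36_direct(X):
    """Reference version: evaluate all 36 outputs independently."""
    return [_sample(X, i) for i in range(N)]


def imdct36_symmetric(X):
    """Evaluate 18 outputs and mirror the other 18.

    The first half is odd-symmetric about its centre and the second
    half is even-symmetric about its centre.
    """
    x = [0.0] * N
    for i in range(9):
        a = _sample(X, i)        # first-half sample
        b = _sample(X, 18 + i)   # second-half sample
        x[i] = a
        x[17 - i] = -a
        x[18 + i] = b
        x[35 - i] = b
    return x
```

Both functions should agree to within floating-point rounding for any 18-element input; the symmetric one simply evaluates half as many cosine sums.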
Together with the changes from 0.10.1b, the CPU performance improvements are building up:
(StrongARM 1100 220MHz)
decoder version        Layer I   Layer II   Layer III
------------------------------------------------------
MAD 0.10.2b (SSO)        18%       17%        28%
MAD 0.10.1b (SSO)        18%       17%        31%
MAD 0.10.0b              23%       22%        37%
(Celeron 500A)
decoder version        Layer I   Layer II   Layer III
------------------------------------------------------
MAD 0.10.2b (SSO)         3%        3%         6%
MAD 0.10.0b               4%        4%         7%
I suspect the next thing to be done is to modify the way Layer III decoding is performed such that requantization happens at the same time as Huffman decoding, instead of in a separate pass. This should reduce memory usage again as well as help performance.
The latest release can be found here:
ftp://ftp.mars.org/pub/mpeg/
Cheers, -rob
Have you researched whether the SSO change is standards conformant? Does it follow the mp[123] spec and test?
I suppose it was time I did some real compliance testing.
Part 4 of ISO/IEC 11172 defines the compliance tests. Section 2.6.3 defines the computational accuracy tests for audio decoders. It states, "To be called an ISO/IEC 11172-3 audio decoder, the decoder shall provide an output such that the rms level of the difference signal between the output of the decoder under test and the supplied reference output is less than 2^(-15)/sqrt(12) for the supplied sine sweep (20Hz-10kHz) with an amplitude of -20dB relative to full scale. In addition, the difference signal shall have a maximum absolute value of at most 2^(-14) relative to full-scale."
Then it says, "To be called a limited accuracy ISO/IEC 11172-3 audio decoder, the decoder shall provide an output for a provided test sequence such that the rms level of the difference signal between the output of the decoder under test and the supplied reference output is less than 2^(-11)/sqrt(12) for the supplied sine sweep (20Hz-10kHz) with an amplitude of -20dB relative to full scale."
I wrote a Perl script to calculate the RMS and maximum difference values and used it to test the output from MAD against the Layer III compliance testing stream and reference output.
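The Perl script itself is not included here. As a rough illustration of the calculation it describes, the following Python sketch computes the RMS level and maximum absolute value of the difference signal and compares them with the limits quoted above. It assumes both signals are already lists of samples normalized to full scale (range -1.0 to 1.0); the function and constant names are hypothetical.

```python
import math

# Limits quoted from ISO/IEC 11172-4, relative to full scale.
FULL_RMS_LIMIT = 2.0 ** -15 / math.sqrt(12)      # about 8.81e-06
FULL_MAX_LIMIT = 2.0 ** -14                      # about 6.10e-05
LIMITED_RMS_LIMIT = 2.0 ** -11 / math.sqrt(12)   # about 1.41e-04


def difference_stats(decoded, reference):
    """Return (rms, max_abs) of the sample-by-sample difference signal."""
    if len(decoded) != len(reference) or not decoded:
        raise ValueError("signals must be non-empty and of equal length")
    diffs = [d - r for d, r in zip(decoded, reference)]
    rms = math.sqrt(sum(e * e for e in diffs) / len(diffs))
    max_abs = max(abs(e) for e in diffs)
    return rms, max_abs


def classify(rms, max_abs):
    """Map the two statistics onto the compliance levels."""
    if rms < FULL_RMS_LIMIT and max_abs <= FULL_MAX_LIMIT:
        return "full"
    if rms < LIMITED_RMS_LIMIT:
        # Limited accuracy places no requirement on the maximum difference.
        return "limited"
    return "not compliant"
```

For example, `classify(*difference_stats(decoded, reference))` yields one of the three labels for a pair of equal-length sample lists.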
The bad news is that MAD 0.10.2b with SSO fails both tests.
The good news is that the attached patch increases the output accuracy such that MAD with SSO is compliant with respect to limited accuracy.
More good news is that MAD without SSO is fully compliant.
While I was at it, I tested the output from some other decoders. It is interesting to note that the ARM (fixed-point) version of Xaudio is only limited-accuracy compliant, while the x86 (presumably floating-point) version is fully compliant. Meanwhile, the fixed-point version of mpg123 is not compliant at all, although the normal floating-point version is.
Here are the details. Keep in mind the diff accuracy is only required for full compliance:
MAD 0.10.2b without SSO
  RMS level = 9.00047710072236e-08
  RMS compliant (under maximum by 8.71966120173487e-06)
  Max diff = 9.5367431640625e-07
  Diff compliant (under maximum by 6.00814819335938e-05)

MAD 0.10.2b with SSO
  RMS level = 0.00019839190882482
  RMS not compliant (over limited accuracy by 5.74372532609463e-05)
  Max diff = 0.000553488731384277
  Diff not compliant (over by 0.000492453575134277)

MAD 0.10.2b with SSO and patch (soon to be 0.10.3b)
  RMS level = 4.94641323225028e-05
  RMS limited accuracy compliant (under maximum by 9.14905232413706e-05)
  Max diff = 0.000137686729431152
  Diff not compliant (over by 7.66515731811523e-05)

Xaudio 1.3.1, ARM
  RMS level = 2.06487581418067e-05
  RMS limited accuracy compliant (under maximum by 0.000120305897422067)
  Max diff = 6.58035278320312e-05
  Diff not compliant (over by 4.76837158203125e-06)

Xaudio 1.3.1, x86
  RMS level = 8.60182046119853e-06
  RMS compliant (under maximum by 2.07845511543564e-07)
  Max diff = 1.53779983520508e-05
  Diff compliant (under maximum by 4.56571578979492e-05)

mpg123-arm32 0.59r
  RMS level = 0.0705093214620266
  RMS not compliant (over limited accuracy by 0.0703683668064627)
  Max diff = 0.100134015083313
  Diff not compliant (over by 0.100072979927063)

mpg123 0.59r
  RMS level = 8.60182046119853e-06
  RMS compliant (under maximum by 2.07845511543564e-07)
  Max diff = 1.53779983520508e-05
  Diff compliant (under maximum by 4.56571578979492e-05)
MAD was tested with full 24-bit output. Since I only know how to get 16-bit output from Xaudio and mpg123, the remaining bits were set to zero for testing purposes, as instructed in the standard.
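To make the zero-padding concrete, here is a small Python sketch of how a signed 16-bit sample can be placed in the upper bits of a 24-bit word and then normalized to full scale for comparison. The helper names are hypothetical, and it assumes plain signed integer samples; it is not taken from the test script.

```python
def pad_16_to_24(sample16):
    """Widen a signed 16-bit sample to 24 bits, leaving the low 8 bits zero."""
    if not -32768 <= sample16 <= 32767:
        raise ValueError("not a signed 16-bit sample")
    return sample16 * 256   # same as a left shift by 8, sign preserved


def to_full_scale(sample24):
    """Normalize a signed 24-bit sample so that full scale is 1.0."""
    return sample24 / float(1 << 23)
```

With this scaling, the smallest 16-bit step corresponds to 2^(-15) of full scale, so a decoder limited to 16-bit output is compared on the same footing as one producing 24 bits.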
I also tried to test splay, but it failed to generate any output at all from the Layer III, 64 kbps, 48000 Hz, single channel test stream.
I am going to experiment to see if there is a way to keep the SSO while improving accuracy even further than this patch does so that MAD can always be fully compliant.
Btw, there is a bug in 0.10.2b which can cause --disable-sso to fail. The following patch also corrects this.
Cheers, -rob
MAD is still relatively new, so there is plenty of time for improvement. Thanks for checking the compliance.