MAD 0.10.1b available - mad-dev

23 Apr 2000


I made a new snapshot of MAD, version 0.10.1b:

  ftp://ftp.mars.org/pub/mpeg/

I managed to create three new optimizations for this version that
significantly improve performance across all layers while also using less
memory. On the StrongARM 1100, CPU usage for Layer III dropped from 37% to
31%, Layer II from 22% to 17%, and Layer I from 23% to 18%. This puts MAD
ahead of all known integer decoders for Layer I and Layer II, and behind only
Xaudio (at 24%) for Layer III. For a more complete summary, see the TIMINGS
file in the distribution.

The first optimization was to observe that the least significant 12 bits of
all the subband synthesis windowing coefficients are zero, so the
multiplication cycle count can be reduced on some machines (for example, those
whose multiplier terminates early on small operands) by pre-shifting these
bits away to create greater leading-zero or leading-one counts.

The second optimization was to reduce the windowing coefficient table almost
by half (saving memory) by exploiting symmetry, while localizing the symmetric
computation so that fewer memory references are needed overall.
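
As a rough illustration of the symmetry idea only (this is not MAD's actual
table, layout, or code), the sketch below assumes a hypothetical 8-tap window
that is even-symmetric, w[i] == w[7 - i]. The names half_window,
window_full, and window_folded and the coefficient values are invented
for the example. Only half the table is stored, and each stored coefficient is
loaded once and applied to both mirrored inputs.

```c
#include <stddef.h>
#include <stdint.h>

#define TAPS 8

/* Hypothetical even-symmetric window, w[i] == w[TAPS - 1 - i].
 * Only the first half is stored; the values are illustrative. */
const int32_t half_window[TAPS / 2] = { 3, 11, 25, 40 };

/* Reference version: behaves as if the full table existed, so every
 * coefficient is looked up once per input sample (TAPS lookups). */
int64_t window_full(const int32_t x[TAPS])
{
    int64_t acc = 0;
    for (size_t i = 0; i < TAPS; ++i) {
        int32_t w = (i < TAPS / 2) ? half_window[i]
                                   : half_window[TAPS - 1 - i];
        acc += (int64_t)w * x[i];
    }
    return acc;
}

/* Folded version: one coefficient load and one multiply per mirrored
 * pair of inputs (TAPS / 2 lookups). */
int64_t window_folded(const int32_t x[TAPS])
{
    int64_t acc = 0;
    for (size_t i = 0; i < TAPS / 2; ++i)
        acc += (int64_t)half_window[i] * ((int64_t)x[i] + x[TAPS - 1 - i]);
    return acc;
}
```

Both functions return the same sum for any input. If the real window were
antisymmetric instead, the addition in the folded loop would become a
subtraction.
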

The third optimization, which works in conjunction with the first, was to
change the way fixed-point multiplication is performed during synthesis
windowing. Since the pre-shifted coefficients have only 16 significant
fractional bits, multiplying them by a number with 12 fractional bits yields
exactly 28 fractional bits. Thus all the fixed-point shifts can be eliminated
during windowing if the input is pre-shifted by 16 (28-12) bits. Another
benefit is that the compiler can more easily choose a multiply-accumulate
instruction if one is available.
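
The bit arithmetic above can be sketched as follows. This is a minimal
illustration rather than the code in the distribution: the names FRACBITS,
mul_full, and mul_preshift are hypothetical, the format is assumed to have
28 fractional bits in a 32-bit word, and the shifts are done inside the
function for clarity, whereas in practice the coefficient table and the input
would be shifted ahead of time.

```c
#include <stdint.h>

/* Assumed fixed-point format: 28 fractional bits in a 32-bit word. */
#define FRACBITS 28

/* Conventional multiply: form the 64-bit product, then shift back down
 * to 28 fractional bits. */
int32_t mul_full(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> FRACBITS);
}

/* Pre-shifted multiply: the sample keeps 12 fractional bits (28 - 16)
 * and the coefficient keeps 16 (28 - 12), so the plain 32-bit product
 * already has 12 + 16 = 28 fractional bits and needs no shift.
 * Assumes the low 12 bits of coeff are zero, that the product fits in
 * 32 bits, and that >> on negative values is an arithmetic shift
 * (implementation-defined in C, but usual in practice). */
int32_t mul_preshift(int32_t sample, int32_t coeff)
{
    return (sample >> 16) * (coeff >> 12);
}
```

For example, with a coefficient of 0.5 (0x08000000) and a sample of 0.75
(0x0C000000), both functions return 0x06000000 (0.375). With a sample of
0x0C00FFFF the low 16 bits are discarded before the multiply, so
mul_preshift still returns 0x06000000 while mul_full returns 0x06007FFF.
This is the kind of precision loss the optimization trades for speed.
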

This last optimization loses precision in the output, but I think this should
not generally be audible. I haven't yet analyzed the loss with respect to
decoding compliance, although this is my intention. If anyone is interested in
looking into this, I'd appreciate the help. The optimization can optionally be
turned off to get more accurate output.

I have not yet rewritten the Layer III IMDCT, so Layer III performance will
get better still. How much? The Layer I and Layer II numbers give a lower
bound, since these are the simplest layers that use the same subband
synthesis.

This version also has some code cleanup, including fixed_t -> mad_fixed_t
renaming as suggested, new M/S and intensity stereo indicators for madplay's
verbose mode, and plenty of other miscellaneous changes. There is also new
sample code `minimad.c' which shows perhaps the simplest use of the libmad
API.

Cheers,
  -rob