subband synthesis and dct32() explained - mad-dev

8 Apr 2000


The output of the decoding process, from any layer, is a set of subband
samples. There are 32 subbands which cover the entire frequency spectrum.
In Layer I, each decoded frame contains 12 complete sets of 32 subband samples
which, after synthesis, will become 384 PCM samples. In both Layer II and
Layer III, each frame contains 36 sets of 32 subband samples which, after
synthesis, will become 1152 PCM samples. Each channel is independent at this
point, so there are separate sets of subband samples for each channel.
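
As a quick check of the arithmetic above, here is a minimal sketch. The
helper names are made up for illustration and are not taken from MAD, and
the counts are simply the ones quoted in this message:

```c
enum { SUBBANDS = 32 };

/* Number of 32-sample subband sets per frame, as quoted in the text.
   Hypothetical helper; returns 0 for an unknown layer. */
static unsigned int subband_sets_per_frame(int layer)
{
  switch (layer) {
  case 1:  return 12;
  case 2:
  case 3:  return 36;
  default: return 0;
  }
}

/* Each set of 32 subband samples becomes 32 PCM samples per channel. */
static unsigned int pcm_samples_per_frame(int layer)
{
  return subband_sets_per_frame(layer) * SUBBANDS;
}
```

Calling pcm_samples_per_frame(1) gives 12 * 32 = 384, and layers 2 and 3
give 36 * 32 = 1152, per channel.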

Subband synthesis is the process of transforming the subband samples, which
are in the frequency domain, into PCM samples, in the time domain.

The MPEG audio standard defines a process for doing this, although the
algorithm presented in the standard is extremely inefficient to implement as
it stands. The algorithm involves a 1024-value block made up of 16 individual
64-value vectors. A single 64-value vector is generated from a complete set of
32 subband samples via something resembling a Discrete Cosine Transform
(DCT). This transform is what the matrix multiply in synth.c is all
about.

It turns out that due to trigonometric symmetry, it is only necessary to
calculate half of the 32->64 transformation. The relationship between the two
halves is spelled out in this comment in the code:

    x[i]      =  x'[i + 16]  i = 0..15
    x[i + 17] = -x'[31 - i]  i = 0..15
    x[i + 32] = -x'[16 - i]  i = 0..15
    x[i + 48] = -x'[i]       i = 0..15
    x[16]     = 0

where x[] is the 64-value output vector recognized in the standard, and x'[]
is the output of a 32-point DCT of the 32-value input vector.

dct32() is merely an optimized implementation of a 32-point DCT based on Lee's
fast DCT algorithm (also sometimes credited to Hou). For the details of this,
I recommend reading:

  http://flux.fe.uni-lj.si/%7Earpi/stuff/fastdct.pdf
  http://developer.intel.com/drg/mmx/appnotes/ap533.htm

Now, rather than expand the output of the 32-point DCT into a vector of 64, it
is more efficient to leave the 32 values as they are and simply take into
account the necessary changes in sign (seen above) later on in the
calculations.

According to the standard, each computed 64-value vector is supposed to be
shifted into the 1024-value block, shifting out the oldest 64-value vector
from the block. A circular buffer is used for this in MAD rather than
shuffling memory around. Also, it's only half as big since we're storing 32
values instead of 64. The 32 values are also split in half because of the way
the PCM samples are constructed, into a lo half and a hi half. In the code,
the whole block is synth->filterout[channel][half][index] where the index
ranges 0..255 over the total set of 16 vectors for either half.
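
The circular buffer idea can be sketched like this, for a single channel.
The struct, the function names, the slot ordering and the choice of which
16 values go into which half are all assumptions made for illustration;
the real layout in synth.c follows the order the windowing needs, so treat
this only as a picture of the shift-free storage:

```c
enum { SLOTS = 16, HALF = 16 };

/* One channel's history: 16 slots of 16 values for each half, so the
   index within a half runs 0..255 as described above. */
struct synth_sketch {
  double filterout[2][SLOTS * HALF];  /* [half][index] */
  unsigned int slot;                  /* slot holding the newest vector */
};

/* Store a new 32-value DCT output without moving older vectors: step
   the slot index back by one and overwrite the oldest slot.
   ASSUMPTION: half 0 gets xp[0..15] and half 1 gets xp[16..31]. */
static void push_vector(struct synth_sketch *st, const double xp[32])
{
  int i;

  st->slot = (st->slot + SLOTS - 1) % SLOTS;

  for (i = 0; i < HALF; ++i) {
    st->filterout[0][st->slot * HALF + i] = xp[i];
    st->filterout[1][st->slot * HALF + i] = xp[HALF + i];
  }
}

/* Return the 16 values of one half of the vector pushed `age` calls
   ago (age 0 is the newest, age 15 the oldest still stored). */
static const double *vector_half(const struct synth_sketch *st,
                                 int half, unsigned int age)
{
  return &st->filterout[half][((st->slot + age) % SLOTS) * HALF];
}
```

Only the slot index changes on each push, so nothing is copied except the
32 new values.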

Finally, the block is windowed and PCM samples are constructed. The
standard defines 512 windowing coefficients, found in the file D.dat. Due to
the way values are shifted into the block, each windowing pass covers
alternating halves of each vector in the block. This is where the lo and hi
sets come from, denoted with evenp and oddp pointers during the windowing and
PCM reconstruction.
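
To show where the alternating halves come from, here is a sketch of the
windowing step in its unoptimized form, as I understand it from the
standard. It works on the full 1024-value block v[] with the newest
64-value vector at v[0..63]. The function name is hypothetical, and the
512 window coefficients d[] must be supplied by the caller; they are not
reproduced here:

```c
/* Reference-style windowing: build a 512-value vector u[] from the
   1024-value block, multiply by the window, and sum 16 terms for each
   of the 32 PCM outputs. */
static void window_reference(const double v[1024], const double d[512],
                             double pcm[32])
{
  double u[512];
  int i, j;

  for (i = 0; i < 8; ++i) {
    for (j = 0; j < 32; ++j) {
      /* first half of vector 2i, second half of vector 2i + 1 */
      u[i * 64 + j]      = v[i * 128 + j];
      u[i * 64 + 32 + j] = v[i * 128 + 96 + j];
    }
  }

  for (j = 0; j < 32; ++j) {
    double sum = 0.0;

    for (i = 0; i < 16; ++i)
      sum += u[j + 32 * i] * d[j + 32 * i];

    pcm[j] = sum;
  }
}
```

Each pass reads the first 32 values of the even-numbered vectors and the
last 32 values of the odd-numbered ones; on the next pass everything has
shifted by one vector, so the roles swap. MAD reaches the same result
from the stored 32-value DCT outputs with the sign changes folded in,
rather than building u[] like this.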

This obviously isn't a complete description, but it should probably help in
navigating synth.c. If anyone would like more details, please ask.

-rob