The output of the decoding process, from any layer, is a set of subband samples. There are 32 subbands which cover the entire frequency spectrum.
In Layer I, each decoded frame contains 12 complete sets of 32 subband samples which, after synthesis, will become 384 PCM samples. In both Layer II and Layer III, each frame contains 36 sets of 32 subband samples which, after synthesis, will become 1152 PCM samples. Each channel is independent at this point, so there are separate sets of subband samples for each channel.
Subband synthesis is the process of transforming the subband samples, which are in the frequency domain, into PCM samples, in the time domain.
The MPEG audio standard defines a process for doing this, although the algorithm presented in the standard is extremely inefficient to implement as it stands. The algorithm involves a 1024-value block made up of 16 individual 64-value vectors. A single 64-value vector is generated from a complete set of 32 subband samples via something resembling a Discrete Cosine Transform (DCT). This transform is what the matrix multiply in synth.c is all about.
It turns out that due to trigonometric symmetry, it is only necessary to calculate half of the 32->64 transformation. The relationship between the two halves is spelled out in this comment in the code:
  x[i]      =  x'[i + 16]   i = 0..15
  x[i + 17] = -x'[31 - i]   i = 0..15
  x[i + 32] = -x'[16 - i]   i = 0..15
  x[i + 48] = -x'[i]        i = 0..15
  x[16]     =  0
where x[] is the 64-value output vector recognized in the standard, and x'[] is the output of a 32-point DCT of the 32-value input vector.
dct32() is merely an optimized implementation of a 32-point DCT based on Lee's fast DCT algorithm (also sometimes credited to Hou). For the details of this, I recommend reading:
  http://flux.fe.uni-lj.si/%7Earpi/stuff/fastdct.pdf
  http://developer.intel.com/drg/mmx/appnotes/ap533.htm
Now, rather than expand the output of the 32-point DCT into a vector of 64, it is more efficient to leave the 32 values as they are and simply take into account the necessary changes in sign (seen above) later on in the calculations.
According to the standard, each computed 64-value vector is supposed to be shifted into the 1024-value block, shifting out the oldest 64-value vector from the block. A circular buffer is used for this in MAD rather than shuffling memory around. Also, it's only half as big since we're storing 32 values instead of 64. The 32 values are also split in half because of the way the PCM samples are constructed, into a lo half and a hi half. In the code, the whole block is synth->filterout[channel][half][index] where the index ranges 0..255 over the total set of 16 vectors for either half.
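As a rough illustration of the circular buffer idea only, here is a sketch with 16 slots of 16 values per half, giving an index range of 0..255 for each half. The struct, the function names, and the choice of which values land in lo versus hi are all assumptions for the example; the actual layout in synth.c may differ.

```c
#include <assert.h>
#include <string.h>

#define SLOTS    16   /* vectors kept in the block      */
#define HALF_LEN 16   /* values per vector in each half */

/* Hypothetical stand-in for one channel of the filter output block. */
struct synth_ring {
    double   half[2][SLOTS * HALF_LEN];   /* [lo/hi][index 0..255]    */
    unsigned newest;                      /* slot of the newest vector */
};

void ring_init(struct synth_ring *r)
{
    assert(r);
    memset(r, 0, sizeof *r);
}

/* "Shift in" a new vector by moving the start slot, not the data. */
void ring_push(struct synth_ring *r,
               const double lo[HALF_LEN], const double hi[HALF_LEN])
{
    assert(r && lo && hi);

    r->newest = (r->newest + SLOTS - 1) % SLOTS;   /* overwrites the oldest slot */
    memcpy(&r->half[0][r->newest * HALF_LEN], lo, sizeof(double) * HALF_LEN);
    memcpy(&r->half[1][r->newest * HALF_LEN], hi, sizeof(double) * HALF_LEN);
}

/* Vector of the given age (0 = newest, 15 = oldest) in one half. */
const double *ring_get(const struct synth_ring *r, int which, unsigned age)
{
    assert(r && (which == 0 || which == 1) && age < SLOTS);
    return &r->half[which][((r->newest + age) % SLOTS) * HALF_LEN];
}
```

Each push copies 32 values and updates one index, where the literal procedure in the standard would move 960 values to make room for 64 new ones.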
Finally, this value block is windowed and PCM samples are constructed. The standard defines 512 windowing coefficients, found in the file D.dat. Due to the way values are shifted into the block, each windowing pass covers alternating halves of each vector in the block. This is where the lo and hi sets come from, denoted with evenp and oddp pointers during the windowing and PCM reconstruction.
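For comparison, this is a sketch of the windowing step as the standard describes it, operating on the full 1024-value block rather than MAD's half-size one. The 512 window coefficients are passed in as a parameter rather than reproduced here, and the function name window_reference is hypothetical. Note how the 512-value vector u is built from the first 32 values of every even-numbered vector and the last 32 values of every odd-numbered one, which is the alternating-halves pattern.

```c
#include <assert.h>

/*
 * Reference windowing: build a 512-value vector from alternating
 * halves of the 16 vectors in v, multiply by the window d, and sum
 * 16 products for each of the 32 output PCM samples.
 * (Double precision for clarity; not the optimized code in synth.c.)
 */
void window_reference(const double v[1024], const double d[512],
                      double pcm[32])
{
    double u[512];

    assert(v && d && pcm);

    for (int i = 0; i < 8; ++i) {
        for (int j = 0; j < 32; ++j) {
            u[i * 64 + j]      = v[i * 128 + j];        /* first half of vector 2i     */
            u[i * 64 + 32 + j] = v[i * 128 + 96 + j];   /* second half of vector 2i+1  */
        }
    }

    for (int j = 0; j < 32; ++j) {
        double sum = 0.0;

        for (int i = 0; i < 16; ++i)
            sum += u[j + 32 * i] * d[j + 32 * i];

        pcm[j] = sum;
    }
}
```

Calling this once per set of 32 subband samples, after shifting the new 64-value vector into v, yields 32 PCM samples per call.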
This obviously isn't a complete description, but it should probably help in navigating synth.c. If anyone would like more details, please ask.
-rob