I have doubts that the speed will eventually reach that of the floating-point decoders, but I'd love to be proven wrong. My primary goal was to have it run well on machines without an FPU, and so far so good.
Why do the floating point libs have the advantage? Pardon me, but I am new to mp3 decoding and the math involved.
I don't have a good grasp of why this is yet. I have some ideas but not much more than empirical evidence at the moment.
Anyone else have an opinion?
I have had some thoughts about using a 16-bit integer representation for the fixed-point operations (rather than 32 bits), which risks losing some audio quality but might run a lot faster on many CPUs, particularly those with 2x16 or 4x16-bit SIMD instructions. It seems this should be possible without losing quality (for 16-bit PCM output, anyway), but the hard part is dealing with all the scaling needed for numbers outside the (-1.0, 1.0) range.
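To make the scaling problem concrete, here is a rough sketch of what 16-bit (Q15) arithmetic might look like. None of this is libmad code; the names q15_mul and q15_mul_scaled are made up for illustration, and it assumes the compiler does an arithmetic right shift on negative values. A Q15 number can only hold values in [-1.0, 1.0), so a coefficient such as 1.5 has to be carried as a mantissa (0.75) plus a separate shift count (1), and every result has to be saturated:

```c
#include <stdint.h>

/* Hypothetical Q15 multiply: both operands are in [-1.0, 1.0),
   stored as value * 32768 in a 16-bit integer. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;  /* Q30 intermediate */

    product += 1 << 14;   /* round to nearest */
    product >>= 15;       /* back to Q15 (assumes arithmetic shift) */

    if (product > INT16_MAX)   /* only happens for -1.0 * -1.0 */
        product = INT16_MAX;

    return (int16_t)product;
}

/* Hypothetical multiply by a coefficient outside (-1.0, 1.0): the
   coefficient is stored as a Q15 mantissa and a power-of-two shift,
   e.g. 1.5 == 0.75 * 2^1 -> mant = 24576, shift = 1. */
int16_t q15_mul_scaled(int16_t x, int16_t mant, int shift)
{
    /* multiply instead of << so negative values stay well defined */
    int32_t wide = (int32_t)q15_mul(x, mant) * (1 << shift);

    if (wide > INT16_MAX) wide = INT16_MAX;   /* saturate high */
    if (wide < INT16_MIN) wide = INT16_MIN;   /* saturate low  */

    return (int16_t)wide;
}
```

The extra shift and the saturation checks are exactly the bookkeeping that makes the 16-bit approach awkward: they have to be tracked for every stage where intermediate values can exceed 1.0.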
Could you give me a rough data flow from input file to sound output? I see it currently as:
  get next chunk from file
  decode chunk into useful data
  send data to sound processing
and I am sure I am missing the whole picture.
Sure. Use of the libmad API currently goes something like this:
  # include "libmad.h"

  struct mad_stream stream;
  struct mad_frame frame;
  struct mad_synth synth;

  /* open input file */

  mad_stream_init(&stream);
  mad_frame_init(&frame);
  mad_synth_init(&synth);

  /* load some portion of the file into a buffer, or map it into memory */

  mad_stream_buffer(&stream, buffer, buflen);

  do {
    mad_frame_decode(&frame, &stream);  // frame <- decode(stream)

    /* if -1 was returned and stream.error == MAD_ERR_BUFLEN, more input
       data is needed; if not EOF, refill the buffer from the start of
       the next frame (stream.next_frame) and call mad_stream_buffer()
       again, then restart decoding */

    /* if desired, perform filtering on the frame */

    mad_synth_frame(&synth, &frame);  // synth <- synthesize(frame)

    /* output the PCM samples now contained in synth.pcmout */
  } while (!done);

  /* close input file */

  mad_synth_finish(&synth);
  mad_frame_finish(&frame);
  mad_stream_finish(&stream);
This is what the low-level API looks like. The high-level API simplifies this somewhat to just a few calls by using hooks you provide to manage refilling the input buffer, filtering, disposing of the output, and handling errors.
MAD_NUMCHANNELS(&frame) will tell you how many channels of output you have; a more precise determination of mono/stereo/dual-channel can also be made from frame.mode. The sampling frequency is in frame.sfreq.
The value in synth.pcmlen says how many PCM samples you have in each channel. (For Layer I, it's 384; for Layer II and Layer III, 1152.) The samples themselves are in synth.pcmout[channel][sample].
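Since the samples are stored per channel, most audio devices will want them interleaved (left, right, left, right, ...) before output. A minimal sketch of that step follows; the helper name interleave and the use of plain int32_t row pointers are assumptions for illustration, not part of the library, and the values copied are still fixed-point:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy planar samples, indexed as
   planar[channel][sample], into interleaved order in out[].
   out must have room for nchannels * nsamples values. */
void interleave(const int32_t *const *planar, size_t nchannels,
                size_t nsamples, int32_t *out)
{
    for (size_t s = 0; s < nsamples; ++s)
        for (size_t ch = 0; ch < nchannels; ++ch)
            out[s * nchannels + ch] = planar[ch][s];
}
```

Here nchannels would come from MAD_NUMCHANNELS(&frame) and nsamples from synth.pcmlen.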
Note that the output PCM samples are still in fixed-point format to facilitate the widest possible usage. To get n-bit values, you will have to do some shifting, rounding, and possibly clipping (see audio_*.c for examples).
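As a rough illustration of the shifting, rounding, and clipping, here is one way to turn a fixed-point sample into a signed 16-bit value. The number of fractional bits (28) and the names FRAC_BITS, FIXED_ONE, and fixed_to_pcm16 are assumptions made for this sketch; check the library header for the real format before relying on them. It also assumes an arithmetic right shift on negative values:

```c
#include <stdint.h>

#define FRAC_BITS 28                         /* assumed fractional bits */
#define FIXED_ONE ((int32_t)1 << FRAC_BITS)  /* 1.0 in that format      */

/* Hypothetical conversion of one fixed-point sample to 16-bit PCM. */
int16_t fixed_to_pcm16(int32_t sample)
{
    /* round: add half of the least significant output bit */
    sample += (int32_t)1 << (FRAC_BITS - 16);

    /* clip to [-1.0, 1.0) */
    if (sample >= FIXED_ONE)
        sample = FIXED_ONE - 1;
    else if (sample < -FIXED_ONE)
        sample = -FIXED_ONE;

    /* quantize: keep the sign bit plus the top 15 fractional bits */
    return (int16_t)(sample >> (FRAC_BITS + 1 - 16));
}
```

For other output widths, replace 16 with the desired number of bits and widen the return type accordingly.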
HTH. -rob