Hi all, just grabbed the MAD libs today. So far, the sound is as good as mpg123's (at least as far as my ears and speakers are concerned). The CPU usage is about 2-3 times that of mpg123 on the same mp3 files. However, MAD seems to use a wee bit less memory.
All told, it seems like a nice little lib; I hope to dig into it more as time goes on. The web page mentions possible API changes. Any idea how soon? How different?
Hi Sean,
> Hi all, just grabbed the MAD libs today. So far, the sound is as good as mpg123's (at least as far as my ears and speakers are concerned). The CPU usage is about 2-3 times that of mpg123 on the same mp3 files. However, MAD seems to use a wee bit less memory.
The Layer III decoding should get a bit faster once the IMDCT is rewritten, and the memory footprint may even get smaller. :-)
I have doubts that the speed will eventually reach that of the floating-point decoders, but I'd love to be proven wrong. My primary goal was to have it run well on machines without an FPU, and so far so good.
> All told, it seems like a nice little lib; I hope to dig into it more as time goes on. The web page mentions possible API changes. Any idea how soon? How different?
Version 0.10.0b has a prototype API in place that I think is close to what I want. I'd very much like feedback from anyone who has looked at it. The API has two layers: a low-level interface in which each decoding step is carried out explicitly frame-by-frame, and a high-level interface which can be called either synchronously or asynchronously to decode and play an entire stream at once. The high-level interface merely makes calls to the low-level interface and uses callbacks to the application when it needs more input and to play the output.
The current madplay.c uses the high-level sync interface; the versions previous to 0.10.0b used the low-level interface. I am in the process of writing a small utility to calculate the exact playing time of an MPEG stream (even VBR streams) that will use the low-level API.
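Roughly, the accounting will go something like this (sketch only; frame.layer is a tentative field name, and the exact return conventions may still change):

/* sketch: total the playing time frame by frame; this works even
   for VBR streams, since each frame is measured individually */
unsigned long samples = 0;
unsigned int sfreq = 0;

while (mad_frame_decode(&frame, &stream) == 0) {
  sfreq = frame.sfreq;                          /* sampling frequency */
  samples += (frame.layer == 1) ? 384 : 1152;   /* tentative field name */
}

if (sfreq)
  printf("playing time: %.3f seconds\n", (double) samples / sfreq);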
The part that's most incomplete right now is the async API. It will currently decode a stream in the background, but there's no way to send it control messages. (Well, you can send it messages, but they'll be ignored.) I'm not certain yet what these messages should look like.
Once I get some feedback and become fairly confident in the API, I plan to write up some documentation. Until then, feel free to ask if anything is unclear.
MAD's API is a little different from others I've seen in that the synthesis step is decoupled from the decoding step. This means there is an opportunity to write some interesting filters on the decoded subband samples before they are synthesized into PCM samples. For example, this is how madplay implements stereo->mono conversion given the -m switch, although for Layer III there is also a more efficient way to do this.
This is also where an equalizer filter could be added. Yet another possibility would be to mix two MPEG streams by decoding separately and joining in the filter, performing synthesis only once. This could be used for cross-fades or to overlay speech over music, etc. with minimal overhead.
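As a rough illustration, a stereo-to-mono filter over the subband samples could look like the following. I'm writing the samples as frame.sbsample[channel][s][sb], with 36 samples and 32 subbands per channel per frame; treat the field name and bounds as tentative:

/* sketch: average the two channels' subband samples in place
   before synthesis, so only one channel needs to be synthesized */
unsigned int s, sb;

for (s = 0; s < 36; ++s) {
  for (sb = 0; sb < 32; ++sb) {
    frame.sbsample[0][s][sb] =
      (frame.sbsample[0][s][sb] + frame.sbsample[1][s][sb]) / 2;
  }
}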
Apart from the async API, the biggest change I can see coming would be for MPEG-2 multichannel support. I haven't looked into this very much yet, but my copy of the ISO/IEC 13818-3 standard arrived recently, so I can start working on it.
-rob
> The Layer III decoding should get a bit faster once the IMDCT is rewritten, and the memory footprint may even get smaller. :-)
Less memory is good (-:
> I have doubts that the speed will eventually reach that of the floating-point decoders, but I'd love to be proven wrong. My primary goal was to have it run well on machines without an FPU, and so far so good.
Why do the floating-point libs have the advantage? Pardon me, but I am new to mp3 decoding and the math involved.
> Once I get some feedback and become fairly confident in the API, I plan to write up some documentation. Until then, feel free to ask if anything is unclear.
> MAD's API is a little different from others I've seen in that the synthesis step is decoupled from the decoding step. This means there is an opportunity to write some interesting filters on the decoded subband samples before they are synthesized into PCM samples. For example, this is how madplay implements stereo->mono conversion given the -m switch, although for Layer III there is also a more efficient way to do this.
Could you give me a rough data flow from input file to sound output? I see it currently as:
get next chunk from file
decode chunk into useful data
send data to sound processing
and I am sure I am not seeing the whole picture.
> > I have doubts that the speed will eventually reach that of the floating-point decoders, but I'd love to be proven wrong. My primary goal was to have it run well on machines without an FPU, and so far so good.
> Why do the floating-point libs have the advantage? Pardon me, but I am new to mp3 decoding and the math involved.
I don't have a good grasp of why this is yet. I have some ideas but not much more than empirical evidence at the moment.
Anyone else have an opinion?
I have had some thoughts about using a 16-bit integer representation for the fixed-point operations (rather than 32-bit), which risks losing some audio quality but might run a lot faster on many CPUs, particularly those with 2x16- or 4x16-bit SIMD instructions. It seems this should be possible without losing quality (for 16-bit PCM output, anyway), but the hard part is dealing with all the scaling that goes on with numbers outside the (-1.0, 1.0) range.
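To make the trade-off concrete, a 16-bit multiply would go something like this (sketch only; the Q13 format is just an example, not what MAD actually uses):

/* sketch: a Q13 multiply keeps only 13 fraction bits per product,
   so rounding error accumulates over the filterbank's many
   multiply-accumulates, and any value outside (-1.0, 1.0) spends
   bits on its integer part that are lost to precision */
typedef signed short fixed16;   /* sign + 2 integer + 13 fraction bits */

fixed16 mul16(fixed16 x, fixed16 y)
{
  return (fixed16) (((signed long) x * y) >> 13);
}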
> Could you give me a rough data flow from input file to sound output? I see it currently as:
> get next chunk from file
> decode chunk into useful data
> send data to sound processing
> and I am sure I am not seeing the whole picture.
Sure. Use of the libmad API currently goes something like this:
#include "libmad.h"

struct mad_stream stream;
struct mad_frame frame;
struct mad_synth synth;

/* open input file */

mad_stream_init(&stream);
mad_frame_init(&frame);
mad_synth_init(&synth);

/* load some portion of the file into a buffer, or map it into memory */

mad_stream_buffer(&stream, buffer, buflen);

do {
  mad_frame_decode(&frame, &stream);  /* frame <- decode(stream) */

  /* if -1 was returned and stream.error == MAD_ERR_BUFLEN, more input
     data is needed; if not EOF, refill the buffer from the start of the
     next frame (stream.next_frame) and call mad_stream_buffer() again,
     then restart decoding */

  /* if desired, perform filtering on the frame */

  mad_synth_frame(&synth, &frame);  /* synth <- synthesize(frame) */

  /* output the PCM samples now contained in synth.pcmout */
} while (!done);

/* close input file */

mad_synth_finish(&synth);
mad_frame_finish(&frame);
mad_stream_finish(&stream);
This is what the low-level API looks like. The high-level API simplifies this somewhat to just a few calls by using hooks you provide to manage refilling the input buffer, filtering, disposing of the output, and handling errors.
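To give a flavor of it, the high-level sync usage reduces to something like this (the names and callback signatures here are placeholders and will likely change):

/* sketch only: placeholder names and signatures, not the final API */
struct mad_decoder decoder;

mad_decoder_init(&decoder, &mydata,
                 input_hook,    /* refills the input buffer        */
                 filter_hook,   /* optional subband-sample filter  */
                 output_hook,   /* disposes of each frame's PCM    */
                 error_hook);   /* handles decoding errors         */

mad_decoder_run(&decoder);      /* decode the whole stream         */

mad_decoder_finish(&decoder);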
MAD_NUMCHANNELS(&frame) will tell you how many channels of output you have; a more precise determination of mono/stereo/dual-channel can also be made from frame.mode. The sampling frequency is in frame.sfreq.
The value in synth.pcmlen says how many PCM samples you have in each channel. (For Layer I, it's 384; for Layer II and Layer III, 1152.) The samples themselves are in synth.pcmout[channel][sample].
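Putting those together, emitting a frame's worth of output is just a nested loop (write_sample() here stands for whatever your audio layer wants):

/* walk the synthesized PCM: pcmlen samples per channel */
unsigned int i, ch, nchannels = MAD_NUMCHANNELS(&frame);

for (i = 0; i < synth.pcmlen; ++i) {
  for (ch = 0; ch < nchannels; ++ch)
    write_sample(synth.pcmout[ch][i]);   /* hypothetical output routine */
}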
Note that the output PCM samples are still in fixed-point format to facilitate the widest possible usage. To get n-bit values, you will have to do some shifting, rounding, and possibly clipping (see audio_*.c for examples).
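For 16-bit output, the conversion goes roughly like this; I'm assuming the names MAD_F_FRACBITS (the number of fraction bits) and MAD_F_ONE (fixed-point 1.0) here, so check the headers for the exact names in the version you have:

/* sketch: convert one fixed-point sample to signed 16-bit PCM */
signed short scale(mad_fixed_t sample)
{
  /* round */
  sample += 1L << (MAD_F_FRACBITS - 16);

  /* clip */
  if (sample >= MAD_F_ONE)
    sample = MAD_F_ONE - 1;
  else if (sample < -MAD_F_ONE)
    sample = -MAD_F_ONE;

  /* quantize to 16 bits */
  return sample >> (MAD_F_FRACBITS + 1 - 16);
}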
HTH. -rob
On Mon, 3 Apr 2000, Rob Leslie wrote:
> I don't have a good grasp of why this is yet. I have some ideas but not much more than empirical evidence at the moment.
> Anyone else have an opinion?
If we suppose that each instruction takes one cycle, a fixed-point version needs a multiplication and a shift, whereas a floating-point version doesn't need the shift. Also, CPUs with floating-point support usually have highly optimized, fast FP instructions.
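In code, the difference is something like this (a minimal sketch; the Q28 format is only an example):

/* a fixed-point multiply needs a widening multiply plus a shift
   to restore the scaling; floating point is a single multiply */
signed long fixed_mul(signed long x, signed long y)
{
  return (signed long) (((long long) x * y) >> 28);   /* Q28 example */
}

double float_mul(double x, double y)
{
  return x * y;   /* one FPU instruction on most CPUs */
}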
> I have had some thoughts about using a 16-bit integer representation for the fixed-point operations (rather than 32-bit), which risks losing some audio quality but might run a lot faster on many CPUs, particularly those with 2x16- or 4x16-bit SIMD instructions. It seems this should be possible without losing quality (for 16-bit PCM output, anyway), but the hard part is dealing with all the scaling that goes on with numbers outside the (-1.0, 1.0) range.
I tried it with splay, but you lose quality anyway. It's awfully audible. The cumulative rounding error becomes too large.
Nicolas