There was one issue I ran into when implementing MP3 importing support for Audacity that I meant to ask about.
It seems that when you pass the decoder data from the input callback, it tries to synthesize everything you give it. If the end of the buffer you passed ends in the middle of a frame, you get seams. Passing all the data to the decoder at once solves the problem, but that's not practical.
As far as I can see, you can't get any metainformation (like the size of each frame) until you pass it data. Once you pass it data, you've probably misaligned the frame and produced garbage. What's the proper way to handle this situation?
Joshua
Joshua Haberman wrote:
> It seems that when you pass the decoder data from the input callback, it tries to synthesize everything you give it. If the end of the buffer you passed ends in the middle of a frame, you get seams. Passing all the data to the decoder at once solves the problem, but that's not practical.
> As far as I can see, you can't get any metainformation (like the size of each frame) until you pass it data. Once you pass it data, you've probably misaligned the frame and produced garbage. What's the proper way to handle this situation?
Each time you refill your buffer, you need to preserve the data in your existing buffer from stream.next_frame to the end.
This usually amounts to calling memmove() on this unconsumed portion of the buffer and appending new data after it, before calling mad_stream_buffer().
Cheers, -rob
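A minimal sketch of the refill step described above, not taken from the thread. It works on plain pointers so it compiles without mad.h; the names refill_buffer, read_fn, mem_source and mem_read are hypothetical helpers invented for illustration. In a real libmad input callback the two pointer arguments would be stream.next_frame and stream.bufend.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader callback: copies up to `max` bytes into `dst` and
 * returns the number of bytes actually copied (0 at end of input). */
typedef size_t (*read_fn)(void *ctx, unsigned char *dst, size_t max);

/* Hypothetical in-memory source, used only to exercise the sketch. */
struct mem_source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

size_t mem_read(void *ctx, unsigned char *dst, size_t max)
{
    struct mem_source *s = (struct mem_source *)ctx;
    size_t n = s->len - s->pos;
    if (n > max)
        n = max;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/*
 * Refill `buf` (capacity `cap`) while preserving the unconsumed tail
 * [next_frame, bufend). Pass next_frame == NULL on the very first call,
 * when nothing has been decoded yet. Returns the number of valid bytes
 * now in `buf`.
 */
size_t refill_buffer(unsigned char *buf, size_t cap,
                     const unsigned char *next_frame,
                     const unsigned char *bufend,
                     read_fn read_more, void *ctx)
{
    size_t keep = 0;
    size_t got;

    if (next_frame != NULL) {
        assert(next_frame >= buf && next_frame <= bufend);
        keep = (size_t)(bufend - next_frame);
        assert(keep <= cap);
        /* Source and destination may overlap, hence memmove(). */
        memmove(buf, next_frame, keep);
    }

    /* Append fresh data directly after the preserved partial frame. */
    got = read_more(ctx, buf + keep, cap - keep);
    return keep + got;
}
```

The caller would then hand the whole region to the decoder with mad_stream_buffer(&stream, buf, n), where n is the returned byte count. If n is 0, the input is exhausted.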
* Rob Leslie (rob@mars.org) wrote:
> Joshua Haberman wrote:
> > It seems that when you pass the decoder data from the input callback, it tries to synthesize everything you give it. If the end of the buffer you passed ends in the middle of a frame, you get seams. Passing all the data to the decoder at once solves the problem, but that's not practical.
> > As far as I can see, you can't get any metainformation (like the size of each frame) until you pass it data. Once you pass it data, you've probably misaligned the frame and produced garbage. What's the proper way to handle this situation?
> Each time you refill your buffer, you need to preserve the data in your existing buffer from stream.next_frame to the end.
> This usually amounts to calling memmove() on this unconsumed portion of the buffer and appending new data after it, before calling mad_stream_buffer().
Wonderful, that took care of the seaming problem! How can I determine the optimal buffer size to keep memmove()ing to a minimum? Should I hard-code a size that corresponds with a common MP3 frame size, or should I try to determine it at runtime?
It seems like the worst case would be having the buffer length just short of two frames -- then you'd copy almost all of the second frame in each run. Best case would be three or four frames at once...?
Joshua
Joshua Haberman wrote:
> > Each time you refill your buffer, you need to preserve the data in your existing buffer from stream.next_frame to the end.
> > This usually amounts to calling memmove() on this unconsumed portion of the buffer and appending new data after it, before calling mad_stream_buffer().
> Wonderful, that took care of the seaming problem! How can I determine the optimal buffer size to keep memmove()ing to a minimum? Should I hard-code a size that corresponds with a common MP3 frame size, or should I try to determine it at runtime?
The size of an MP3 frame varies with the bitrate, sampling frequency, and time (on account of occasional "padding" slots), so trying to guess the frame size is unlikely to be beneficial. Since each memmove() copies less than one full frame, I recommend a buffer large enough to hold several frames for best results.
Frames are relatively small. A 128 kbps 44100 Hz MP3 frame alternates between 417 and 418 bytes. Size can vary from 24 bytes (8 kbps 24 kHz) to 1440 bytes (320 kbps 32 kHz). The absolute minimum buffer size is a single frame, plus MAD_BUFFER_GUARD bytes.
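The figures above follow from the standard Layer III frame-length arithmetic, which can be sketched as below. The function name layer3_frame_bytes and its parameters are hypothetical, and the sketch covers Layer III only: 144 * bitrate / samplerate bytes for MPEG-1, 72 * bitrate / samplerate for the lower sampling frequencies of MPEG-2, plus one byte when the padding bit is set.

```c
#include <stddef.h>

/*
 * Layer III frame length in bytes.
 *   bitrate    - bits per second (e.g. 128000)
 *   samplerate - Hz (e.g. 44100)
 *   lsf        - nonzero for MPEG-2 lower sampling frequencies
 *   padding    - nonzero if the frame header's padding bit is set
 */
size_t layer3_frame_bytes(unsigned long bitrate, unsigned long samplerate,
                          int lsf, int padding)
{
    unsigned long factor = lsf ? 72UL : 144UL;

    /* Integer division truncates; the padding slot makes up the
     * difference over time, which is why sizes alternate. */
    return (size_t)(factor * bitrate / samplerate) + (padding ? 1u : 0u);
}
```

For 128 kbps at 44100 Hz this gives 417 bytes without padding and 418 with it. The two extremes quoted above come out as 24 and 1440 bytes.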
Another way to consider it is that a 40000 byte buffer holds enough MP3 data for 2.5 seconds at 128 kbps, or 1 second at 320 kbps. This is a reasonable buffer size and is one that I often use.
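The buffer-duration arithmetic can be checked with a one-line helper. The name buffer_seconds is hypothetical, and the sketch assumes a constant bitrate.

```c
#include <stddef.h>

/* Seconds of audio held by `bytes` of constant-bitrate MP3 data,
 * with `bitrate` given in bits per second. */
double buffer_seconds(size_t bytes, unsigned long bitrate)
{
    return (double)bytes * 8.0 / (double)bitrate;
}
```

With bytes = 40000, this yields 2.5 for a bitrate of 128000 and 1.0 for 320000, matching the figures above.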
> It seems like the worst case would be having the buffer length just short of two frames -- then you'd copy almost all of the second frame in each run. Best case would be three or four frames at once...?
The larger the buffer, the better...
Cheers, -rob