How would I implement fast, sample-resolution random seeking with MAD? In other words, I need to be able to reposition the stream to a specific sample and get resynced in constant time and without too much overhead. The MP3 also needs to be streamed from disk rather than loaded entirely in RAM. I imagine the main problem would be reconstructing the bit pool at the seek point.
From my understanding of the MP3 format, the furthest back bit pool data can be stored is 9 frames. So here is my idea:
1. On initial load, create an index of all the frame offsets in the mp3 and cache it in RAM
2. When seeking to sample N, calculate the containing frame number F as F=N/1152
3. Use the cached frame index to find the file offset of frame F-9
4. Seek to that offset in the file and begin decoding
5. Decode 9 frames silently, ignoring any MAD_ERROR_BADDATAPTR
6. Begin playing at sample N which is within the current frame
Will this work? Is there a faster way that avoids decoding the audio data for the extra 9 frames?
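The arithmetic in the numbered steps above could be sketched roughly as follows. This is only an illustration, not MAD code: `struct seek_plan` and `plan_seek` are made-up names, and the samples-per-frame and pre-roll values are passed in by the caller because they depend on the stream type.

```c
#include <assert.h>

/* Hypothetical result type: where to start decoding and where playback begins. */
struct seek_plan {
    unsigned long start_frame;   /* first frame to feed to the decoder */
    unsigned long target_frame;  /* frame that contains the requested sample */
    unsigned int  skip_samples;  /* samples to discard at the start of target_frame */
};

/* Hypothetical helper: map an absolute sample number onto the frame index.
 * samples_per_frame and preroll_frames are supplied by the caller
 * (1152 and 9 in the proposal above). */
struct seek_plan plan_seek(unsigned long sample,
                           unsigned int samples_per_frame,
                           unsigned int preroll_frames)
{
    struct seek_plan plan;

    assert(samples_per_frame != 0);

    plan.target_frame = sample / samples_per_frame;
    plan.skip_samples = (unsigned int)(sample % samples_per_frame);

    /* Clamp near the start of the file, where fewer than
     * preroll_frames frames exist before the target. */
    plan.start_frame = plan.target_frame > preroll_frames
                     ? plan.target_frame - preroll_frames
                     : 0;
    return plan;
}
```

The file offset to seek to would then be looked up in the cached frame index at `start_frame`, and decoded output would be discarded until `skip_samples` samples into `target_frame`.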
On Nov 27, 2003, at 8:57 AM, Jedediah Smith wrote:
How would I implement fast, sample-resolution random seeking with MAD? In other words, I need to be able to reposition the stream to a specific sample and get resynced in constant time and without too much overhead. The MP3 also needs to be streamed from disk rather than loaded entirely in RAM. I imagine the main problem would be reconstructing the bit pool at the seek point.
From my understanding of the MP3 format, the furthest back bit pool data can be stored is 9 frames. So here is my idea:
1. On initial load, create an index of all the frame offsets in the mp3 and cache it in RAM
2. When seeking to sample N, calculate the containing frame number F as F=N/1152
3. Use the cached frame index to find the file offset of frame F-9
4. Seek to that offset in the file and begin decoding
5. Decode 9 frames silently, ignoring any MAD_ERROR_BADDATAPTR
6. Begin playing at sample N which is within the current frame
Will this work? Is there a faster way that avoids decoding the audio data for the extra 9 frames?
This seems a reasonable approach. Keep in mind the number of samples per frame will only be 1152 for MPEG-1 Layer III or II. Likewise the 9 frame reservoir limit only applies to MPEG-1. For MPEG-2 the theoretical maximum is 29 frames.
You don't have to fully decode each frame leading to your target frame; you can skip the synthesis step for all but the frame immediately prior to the target.
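A small sketch of how the per-stream constants mentioned here might be chosen. The function names are hypothetical; the 9 and 29 frame limits are the figures quoted in this message, and the samples-per-frame values are the standard ones as I understand them (384 for Layer I, 576 for MPEG-2 Layer III, 1152 otherwise). Here `lsf` is nonzero for MPEG-2 style lower-sampling-frequency streams.

```c
#include <assert.h>

/* Hypothetical helper: PCM samples produced by one frame.
 * layer is 1, 2 or 3; lsf is nonzero for MPEG-2 (LSF) streams. */
unsigned int samples_per_frame(int layer, int lsf)
{
    assert(layer >= 1 && layer <= 3);

    if (layer == 1)
        return 384;
    if (layer == 3 && lsf)
        return 576;
    return 1152;
}

/* Hypothetical helper: worst-case number of earlier frames to decode
 * before the target, using the limits quoted in this thread. */
unsigned int max_preroll_frames(int layer, int lsf)
{
    assert(layer >= 1 && layer <= 3);

    if (layer != 3)
        return 0;           /* only Layer III uses a bit reservoir */
    return lsf ? 29 : 9;    /* MPEG-2 : MPEG-1 */
}
```

With MAD's low-level API, skipping synthesis would presumably mean calling mad_frame_decode() on each pre-roll frame and only calling mad_synth_frame() once the frame just before the target is reached.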
Super.. now is there a way I can get the MAD API to tell me how many frames back the bit pool goes from a given frame? I could store this along with the frame offsets to optimize.
On Fri, 28 Nov 2003 15:02:26 -0800, Rob Leslie rob@mars.org wrote:
On Nov 27, 2003, at 8:57 AM, Jedediah Smith wrote:
How would I implement fast, sample-resolution random seeking with MAD? In other words, I need to be able to reposition the stream to a specific sample and get resynced in constant time and without too much overhead. The MP3 also needs to be streamed from disk rather than loaded entirely in RAM. I imagine the main problem would be reconstructing the bit pool at the seek point.
From my understanding of the MP3 format, the furthest back bit pool data can be stored is 9 frames. So here is my idea:
1. On initial load, create an index of all the frame offsets in the mp3 and cache it in RAM
2. When seeking to sample N, calculate the containing frame number F as F=N/1152
3. Use the cached frame index to find the file offset of frame F-9
4. Seek to that offset in the file and begin decoding
5. Decode 9 frames silently, ignoring any MAD_ERROR_BADDATAPTR
6. Begin playing at sample N which is within the current frame
Will this work? Is there a faster way that avoids decoding the audio data for the extra 9 frames?
This seems a reasonable approach. Keep in mind the number of samples per frame will only be 1152 for MPEG-1 Layer III or II. Likewise the 9 frame reservoir limit only applies to MPEG-1. For MPEG-2 the theoretical maximum is 29 frames.
You don't have to fully decode each frame leading to your target frame; you can skip the synthesis step for all but the frame immediately prior to the target.
On Nov 28, 2003, at 4:47 PM, Jedediah Smith wrote:
Super.. now is there a way I can get the MAD API to tell me how many frames back the bit pool goes from a given frame? I could store this along with the frame offsets to optimize.
Not simply. You can poke around in the frame yourself, if you like:
unsigned int main_data_begin;

{
  struct mad_bitptr peek;
  unsigned long header;

  mad_bit_init(&peek, stream->this_frame);

  header = mad_bit_read(&peek, 32);
  if (!(header & 0x00010000L))  /* protection_bit */
    mad_bit_skip(&peek, 16);  /* crc_check */

  main_data_begin = mad_bit_read(&peek, (header & 0x00080000L) /* ID */ ? 9 : 8);

  mad_bit_finish(&peek);
}
This reads the current frame's main_data_begin which tells you how many bytes before the frame its main_data begins. Note however that this count does not include header or side info bytes of previous frames. Headers are 4 or 6 bytes depending whether protection_bit is clear (see above). Layer III side information has the following number of bytes:
lsf = frame->header.flags & MAD_FLAG_LSF_EXT
nch = MAD_NCHANNELS(&frame->header)

            lsf  !lsf
          +----------
nch == 1  |   9    17
nch != 1  |  17    32
The total byte size of the frame is stream->next_frame - stream->this_frame. This should be enough information to calculate how many frames before the current frame its main_data begins, and thus where to start decoding in order to fully decode the current frame.
Note that with the exception of total frame byte size, the above discussion is only applicable to Layer III. Layers I and II have different frame structures but do not use a bit reservoir so they can always begin decoding at any frame.
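One way the calculation described above might look, as a rough sketch only. None of these functions are part of MAD; they assume the caller has already recorded, for each Layer III frame, its total byte size, whether a CRC is present, and the lsf/nch values, and has read main_data_begin for the target frame as shown earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Layer III side information size in bytes, per the table above. */
unsigned int side_info_bytes(int lsf, int nch)
{
    if (lsf)
        return nch == 1 ? 9 : 17;
    return nch == 1 ? 17 : 32;
}

/* Bytes of a frame that can hold main_data: the total frame size minus
 * the header (4 bytes, or 6 with CRC) and the side information. */
unsigned int main_data_capacity(unsigned int frame_bytes, int has_crc,
                                int lsf, int nch)
{
    unsigned int overhead = (has_crc ? 6u : 4u) + side_info_bytes(lsf, nch);

    assert(frame_bytes >= overhead);
    return frame_bytes - overhead;
}

/* Walk backwards from frame `target`, adding up the main_data capacity of
 * each earlier frame until main_data_begin bytes are covered. Returns how
 * many frames before `target` decoding has to start; the result is clamped
 * to `target` if the start of the stream is reached first.
 * capacity[i] is main_data_capacity() for frame i. */
size_t frames_back(const unsigned int *capacity, size_t target,
                   unsigned int main_data_begin)
{
    size_t back = 0;
    unsigned long covered = 0;

    while (covered < main_data_begin && back < target) {
        ++back;
        covered += capacity[target - back];
    }
    return back;
}
```

The result could be stored next to each frame offset in the index. It only covers the target frame's own main_data, so for clean output it is probably still necessary to start at least one frame earlier, since the frame before the target has to be decoded as well.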
Thanks Rob, helpful stuff!
But doesn't the main data pointer of the current frame come after the header of the previous frame? That's how I read the ISO spec:
"main_data_end - The value of main_data_end is used to determine the location in the bitstream of the last bit of main_data for the frame. The main_data_end value specifies the location as a negative offset in bytes from the next frame's frame header location in the main_data portion of the bitstream."
So:
main_data_end for frame N-1 comes immediately after the header for frame N-1
main_data_begin for frame N is equivalent to main_data_end for frame N-1
thus main_data_begin for frame N comes immediately after the header for frame N-1
is that right?
On Sun, 30 Nov 2003 12:45:34 -0800, Rob Leslie rob@mars.org wrote:
On Nov 28, 2003, at 4:47 PM, Jedediah Smith wrote:
Super.. now is there a way I can get the MAD API to tell me how many frames back the bit pool goes from a given frame? I could store this along with the frame offsets to optimize.
Not simply. You can poke around in the frame yourself, if you like:
unsigned int main_data_begin;

{
  struct mad_bitptr peek;
  unsigned long header;

  mad_bit_init(&peek, stream->this_frame);

  header = mad_bit_read(&peek, 32);
  if (!(header & 0x00010000L))  /* protection_bit */
    mad_bit_skip(&peek, 16);  /* crc_check */

  main_data_begin = mad_bit_read(&peek, (header & 0x00080000L) /* ID */ ? 9 : 8);

  mad_bit_finish(&peek);
}
This reads the current frame's main_data_begin which tells you how many bytes before the frame its main_data begins. Note however that this count does not include header or side info bytes of previous frames. Headers are 4 or 6 bytes depending whether protection_bit is clear (see above). Layer III side information has the following number of bytes:
lsf = frame->header.flags & MAD_FLAG_LSF_EXT
nch = MAD_NCHANNELS(&frame->header)

            lsf  !lsf
          +----------
nch == 1  |   9    17
nch != 1  |  17    32
The total byte size of the frame is stream->next_frame - stream->this_frame. This should be enough information to calculate how many frames before the current frame its main_data begins, and thus where to start decoding in order to fully decode the current frame.
Note that with the exception of total frame byte size, the above discussion is only applicable to Layer III. Layers I and II have different frame structures but do not use a bit reservoir so they can always begin decoding at any frame.
On Dec 1, 2003, at 12:52 AM, Jedediah Smith wrote:
Thanks Rob, helpful stuff!
But doesn't the main data pointer of the current frame come after the header of the previous frame? That's how I read the ISO spec:
"main_data_end - The value of main_data_end is used to determine the location in the bitstream of the last bit of main_data for the frame. The main_data_end value specifies the location as a negative offset in bytes from the next frame's frame header location in the main_data portion of the bitstream."
This seems to be from an older or preliminary version of the spec. There is no such field in the current ISO/IEC 11172-3. Probably the field was changed to main_data_begin semantics before the spec was standardized.
So:
main_data_end for frame N-1 comes immediately after the header for frame N-1
main_data_begin for frame N is equivalent to main_data_end for frame N-1
thus main_data_begin for frame N comes immediately after the header for frame N-1
is that right?
No -- the main_data_begin for frame N comes immediately after the header for frame N.
Cheers,