frame decoding questions

Russell O'Connor

25 May 2002

3:01 a.m.

[To: mad-dev@lists.mars.org]

I'm trying to implement better seeking in my application.

First, an easy question: I believe that after calling mad_header_decode, the amount of (decoded) data in the next frame is:

64*MAD_NSBSAMPLES(&(sync.frame.header))*MAD_NCHANNELS(&(sync.frame.header))

Next question:

After calling mad_header_decode, am I allowed to call mad_frame_decode to actually decode the frame? The idea is to seek to where I want to be by calling mad_header_decode a bunch of times until I get there.

I think mad_frame_decode returns -1 after I call mad_header_decode. I'm not sure if this is the behaviour to expect, or if I'm doing something wrong.

-- Russell O'Connor roconnor@math.berkeley.edu
http://www.math.berkeley.edu/~roconnor/
``Later generations will regard set theory as a disease from which one has recovered.'' -- Poincare

Rob Leslie

25 May

4:56 a.m.

On Friday, May 24, 2002, at 08:01 PM, Russell O'Connor wrote:

> I'm trying to implement better seeking in my application.
>
> First, an easy question: I believe that after calling mad_header_decode, the amount of (decoded) data in the next frame is:
>
> 64*MAD_NSBSAMPLES(&(sync.frame.header))*MAD_NCHANNELS(&(sync.frame.header))

I'm not sure what units you're using, so to clarify: the number of samples per channel in a frame is generally

32 * MAD_NSBSAMPLES(&frame.header)

however after synthesis it is best to use the value in synth.pcm.length instead. (The above expression is wrong if MAD_OPTION_HALFSAMPLERATE is in effect.)

Sometimes it's better to think in terms of playing time duration. After decoding the frame header, the playing time of the frame is stored in frame.header.duration. Another way to calculate the number of samples per channel would be:

mad_timer_count(frame.header.duration, frame.header.samplerate)

The number of channels of course is MAD_NCHANNELS(&frame.header) or, after synthesis, synth.pcm.channels. After synthesis you should also use synth.pcm.samplerate rather than frame.header.samplerate.
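As a rough illustration of the arithmetic above, here is a minimal, self-contained sketch that does not include mad.h. The enum, the lsf_ext flag, and the function names are hypothetical stand-ins; the 12/18/36 values are assumed to mirror what MAD_NSBSAMPLES() yields for Layer I, for Layer III with the lower-sampling-frequency extension, and for everything else, so check them against your copy of mad.h. The byte count depends on the output sample size the application chooses, and the sketch does not account for MAD_OPTION_HALFSAMPLERATE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the layer field of a decoded frame header. */
enum layer { LAYER_I = 1, LAYER_II = 2, LAYER_III = 3 };

/* Subband samples per frame; assumed to mirror MAD_NSBSAMPLES() in mad.h. */
unsigned int nsbsamples(enum layer layer, int lsf_ext)
{
  if (layer == LAYER_I)
    return 12;
  return (layer == LAYER_III && lsf_ext) ? 18 : 36;
}

/* Samples per channel in one frame: 32 subbands times the subband sample count. */
unsigned int samples_per_channel(enum layer layer, int lsf_ext)
{
  return 32 * nsbsamples(layer, lsf_ext);
}

/* Bytes of PCM one frame yields for an application-chosen output sample size. */
size_t pcm_bytes_per_frame(enum layer layer, int lsf_ext,
                           unsigned int channels, size_t bytes_per_sample)
{
  assert(channels == 1 || channels == 2);
  return (size_t) samples_per_channel(layer, lsf_ext) * channels * bytes_per_sample;
}
```

For an MPEG-1 Layer III stereo frame written as 16-bit samples, pcm_bytes_per_frame(LAYER_III, 0, 2, 2) gives 1152 * 2 * 2 = 4608 bytes, which is the same figure as 64 * MAD_NSBSAMPLES * MAD_NCHANNELS.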

> Next question:
>
> After calling mad_header_decode, am I allowed to call mad_frame_decode to actually decode the frame? The idea is to seek to where I want to be by calling mad_header_decode a bunch of times until I get there.

Yes, it is explicitly permitted to call mad_frame_decode() after mad_header_decode() to continue decoding the same frame.

There are actually three ways to use these routines:

1. Quickly scan frame headers only:

     do {
       mad_header_decode(&header, &stream);
     } while (...);

2. Decode full frames:

     do {
       mad_frame_decode(&frame, &stream);
     } while (...);

3. Decode full frames, but with opportunity to inspect each frame header prior to decoding the rest of the frame:

     do {
       mad_header_decode(&frame.header, &stream);

       /* inspect header, decide whether to continue decoding the frame */

       mad_frame_decode(&frame, &stream);
     } while (...);

The high-level API uses method 3 if you specify a header callback, otherwise it uses method 2. You can effect method 1 with the high-level API by returning MAD_FLOW_IGNORE from your header callback routine.

> I think mad_frame_decode returns -1 after I call mad_header_decode. I'm not sure if this is the behaviour to expect, or if I'm doing something wrong.

Probably this is due to a real decoding error. If you're seeking around a Layer III stream, the bit reservoir may not have had an opportunity to be refilled, and it's normal to get errors on a few frames in this case.

Note that calling mad_header_decode() alone does not update the bit reservoir, so you will probably want to stop scanning headers and fully decode a few frames before your seek point. This will also update the Layer III overlap-add buffers for a smoother audio entry to the seek point. You will also want to call mad_synth_frame() on the frame immediately before your seek point and discard the resulting PCM in order to update the synthesis buffers, again for a smoother audio entry.

-- Rob Leslie rob@mars.org

Russell O'Connor

7:15 a.m.

[To: mad-dev@lists.mars.org]

On Fri, 24 May 2002, Rob Leslie wrote:

> I'm not sure what units you're using, so to clarify: the number of samples per channel in a frame is generally
>
>   32 * MAD_NSBSAMPLES(&frame.header)
>
> however after synthesis it is best to use the value in synth.pcm.length instead. (The above expression is wrong if MAD_OPTION_HALFSAMPLERATE is in effect.)

Hmmm, basically I'm looking for the length in bytes. Since there are 2 bytes per sample per channel, we get 64*NSBSAMPLES*NCHANNELS. Or so I was thinking.

Since I'm seeking through the file to get to the spot the user has requested, I don't want to be synthing data.

>> I think mad_frame_decode returns -1 after I call mad_header_decode. I'm not sure if this is the behaviour to expect, or if I'm doing something wrong.
>
> Probably this is due to a real decoding error. If you're seeking around a Layer III stream, the bit reservoir may not have had an opportunity to be refilled, and it's normal to get errors on a few frames in this case.

So since I haven't decoded the last couple of frames, getting an error on the first frame I decode in the middle of the stream is to be expected? Do you know how many frames back I would at most have to decode before I can correctly decode my target frame? Do I need to synth those frames too?

So basically, to do perfect seeking in MP3, I would have to skip through the headers, keeping a pointer n frames back, until I find where I want to be. Reinitialize mad_stream, etc. Seek the file to where my back pointer is indicating. Decode n frames silently, and then start playing at the appropriate place in the nth frame. Whew.

-- Russell O'Connor roconnor@alumni.uwaterloo.ca
http://www.math.berkeley.edu/~roconnor/
``Later generations will regard set theory as a disease from which one has recovered.'' -- Poincare

Rob Leslie

26 May

12:18 a.m.

On Saturday, May 25, 2002, at 12:15 AM, Russell O'Connor wrote:

> Hmmm, basically I'm looking for the length in bytes. Since there are 2 bytes per sample per channel, we get 64*NSBSAMPLES*NCHANNELS. Or so I was thinking.

I see. The number of bytes per sample is of course arbitrary, which is why I counted samples instead of bytes.

> So since I haven't decoded the last couple of frames, getting an error on the first frame I decode in the middle of the stream is to be expected?

Yes, this is not unusual.

> Do you know how many frames back I would at most have to decode before I can correctly decode my target frame? Do I need to synth those frames too?

I think the worst case would be the largest possible bit reservoir against the smallest possible frame size.

The largest possible bit reservoir is 511 bytes (MPEG-1 only), and in this case the smallest Layer III frame size is 96 bytes (32 kbps, 48000 Hz). The bit reservoir reaches back over previous frames, but does not include space used by frame headers or Layer III side information. In MPEG-1, the largest possible frame header + side info size is 38 bytes, leaving at least 58 bytes per frame for the bit reservoir. Rounding up, 511 / 58 is 9 frames.

The smallest possible frame size for Layer III at all is 24 bytes (8 kbps, 24000 Hz), and the largest bit reservoir in this case is 255 bytes (MPEG-2). The largest possible frame header + side info size for MPEG-2 is 23 bytes, which is probably impractical in this case, since it would leave only 1 byte per frame. More likely the stream would be single-channel, in which case the header + side info size will be at most 15 bytes, leaving 9 bytes per frame. Rounding up, 255 / 9 is 29 frames.
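The worst-case arithmetic above can be reproduced with a short, self-contained sketch. Both function names are made up for illustration, and the 144/72 constants come from the usual Layer III frame-length formula rather than from this thread; the reservoir sizes and header + side info sizes are the ones quoted above. Padding slots are ignored.

```c
#include <assert.h>

/* Layer III frame size in bytes, ignoring the padding slot.
   lsf is nonzero for the MPEG-2 lower sampling frequencies. */
unsigned int layer3_frame_bytes(int lsf, unsigned int bitrate,
                                unsigned int samplerate)
{
  assert(samplerate > 0);
  return (lsf ? 72u : 144u) * bitrate / samplerate;
}

/* Number of previous frames needed to cover `reservoir` bytes when each
   frame contributes (frame_bytes - overhead) bytes, rounded up. */
unsigned int worst_case_frames_back(unsigned int reservoir,
                                    unsigned int frame_bytes,
                                    unsigned int overhead)
{
  unsigned int usable;

  assert(frame_bytes > overhead);
  usable = frame_bytes - overhead;
  return (reservoir + usable - 1) / usable;
}
```

worst_case_frames_back(511, layer3_frame_bytes(0, 32000, 48000), 38) gives 9, and worst_case_frames_back(255, layer3_frame_bytes(1, 8000, 24000), 15) gives 29, matching the two figures above.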

These are extreme cases. I think it's actually rare for the bit reservoir to extend beyond one or two previous frames in the most common bitrate/samplerate combinations.

In any case, you only need to synth the frame immediately preceding the target frame.

> So basically, to do perfect seeking in MP3, I would have to skip through the headers, keeping a pointer n frames back, until I find where I want to be. Reinitialize mad_stream, etc. Seek the file to where my back pointer is indicating. Decode n frames silently, and then start playing at the appropriate place in the nth frame. Whew.

Since frames always have the same playing time for any given layer and sampling frequency, you don't have to read headers all the way to your seek point. You can calculate n * the duration of one frame, subtract this from the time of your seek point, and stop scanning when you reach the frame containing this time point.
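As a sketch of that calculation, assuming every frame in the stream has the same layer and sampling frequency and that frame 0 is the first audio frame, the following works in per-channel sample positions rather than times (the seek time multiplied by the sampling frequency). The struct and function names are hypothetical and not part of libmad.

```c
#include <assert.h>

/* Hypothetical result of planning a seek; frame indices count from 0. */
struct seek_plan {
  unsigned long first_decoded; /* stop header-only scanning and start full decoding here */
  unsigned long target;        /* frame containing the seek point */
  unsigned int skip_samples;   /* samples to discard at the start of the target frame */
};

/* Plan a seek to per-channel sample position `seek_sample`, fully decoding
   up to `preroll` frames before the target so the bit reservoir can refill. */
struct seek_plan plan_seek(unsigned long seek_sample,
                           unsigned int samples_per_frame,
                           unsigned long preroll)
{
  struct seek_plan plan;

  assert(samples_per_frame > 0);
  plan.target = seek_sample / samples_per_frame;
  plan.skip_samples = (unsigned int) (seek_sample % samples_per_frame);
  /* Clamp at the start of the stream when fewer than `preroll` frames precede the target. */
  plan.first_decoded = plan.target > preroll ? plan.target - preroll : 0;
  return plan;
}
```

For example, seeking to 10 seconds in a 44100 Hz MPEG-1 Layer III stream (1152 samples per frame) with a preroll of 9 frames means scanning headers up to frame 373, decoding frames 373 through 381 silently, and starting output 936 samples into frame 382.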

-- Rob Leslie rob@mars.org

Russell O'Connor

1:36 a.m.

[To: rob@mars.org, mad-dev@lists.mars.org]

On Sat, 25 May 2002, Rob Leslie wrote:

> I see. The number of bytes per sample is of course arbitrary, which is why I counted samples instead of bytes.

Why do you say the number of bytes per sample is arbitrary? (I guess I don't know much about MP3.) I always thought it was 16 bits per sample per channel. Is this not always the case?

> Since frames always have the same playing time for any given layer and sampling frequency, you don't have to read headers all the way to your seek point. You can calculate n * the duration of one frame, subtract this from the time of your seek point, and stop scanning when you reach the frame containing this time point.

Wouldn't variable bit rate files have different durations per frame, depending on the frame?

-- Russell O'Connor roconnor@alumni.uwaterloo.ca
http://www.math.berkeley.edu/~roconnor/
``Later generations will regard set theory as a disease from which one has recovered.'' -- Poincare

Rob Leslie

2:28 a.m.

On Saturday, May 25, 2002, at 06:36 PM, Russell O'Connor wrote:

>> I see. The number of bytes per sample is of course arbitrary, which is why I counted samples instead of bytes.
>
> Why do you say the number of bytes per sample is arbitrary? (I guess I don't know much about MP3.) I always thought it was 16 bits per sample per channel. Is this not always the case?

MP3s do not contain any sample size information; when decoding, samples are calculated with arbitrary precision. MAD decodes samples with 28+1 fractional bits and 3 bits of headroom, which is why you have to clip and scale these to your desired output sample size. 16 bits is common, but there's nothing to stop you from producing 24-bit or other sized output.
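To make the clip-and-scale step concrete, here is a minimal sketch in the style of the scaling routines commonly seen in libmad client code. It is self-contained rather than including mad.h: FRACBITS and FIXED_ONE are local stand-ins assumed to match MAD_F_FRACBITS and MAD_F_ONE, and scale_sample is a hypothetical name. It only rounds, clips, and shifts; it does no dithering.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins, assumed to match MAD_F_FRACBITS and MAD_F_ONE in mad.h. */
#define FRACBITS 28
#define FIXED_ONE ((int64_t) 1 << FRACBITS)

/* Round, clip, and quantize one fixed-point sample to `bits` output bits. */
int32_t scale_sample(int32_t sample, int bits)
{
  int64_t s;

  assert(bits >= 2 && bits <= FRACBITS);

  /* round: add half of one output step (64-bit so the sum cannot overflow) */
  s = (int64_t) sample + ((int64_t) 1 << (FRACBITS - bits));

  /* clip to the range [-1.0, 1.0) */
  if (s >= FIXED_ONE)
    s = FIXED_ONE - 1;
  else if (s < -FIXED_ONE)
    s = -FIXED_ONE;

  /* quantize; relies on arithmetic right shift of negative values,
     which is implementation-defined in C but near-universal */
  return (int32_t) (s >> (FRACBITS + 1 - bits));
}
```

Calling scale_sample(x, 16) produces values in the range -32768 to 32767, and scale_sample(x, 24) produces 24-bit output from the same decoded sample, which is the sense in which the output sample size is the application's choice.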

There is a common misunderstanding that the sample size of the source material from which MP3s are encoded has relevance to the sample size of the decoded output. If MP3 were a lossless codec, this might be true, but since MP3s reconstruct a signal from quantized subband values, the output signal does not necessarily fall neatly into the same precision samples as the original.

Additional precision is helpful when you want to further manipulate a signal before output, for example scaling up or down or mixing with another signal. Manipulating 16-bit samples directly would introduce too many rounding errors and degrade the signal unnecessarily.

>> Since frames always have the same playing time for any given layer and sampling frequency, you don't have to read headers all the way to your seek point. You can calculate n * the duration of one frame, subtract this from the time of your seek point, and stop scanning when you reach the frame containing this time point.
>
> Wouldn't variable bit rate files have different durations per frame, depending on the frame?

No. For a given layer and sampling frequency, the number of samples per frame (per channel) is fixed. See this chart:

http://board.mp3-tech.org/view.php3?bn=agora_mp3techorg&key=1019510889

The playing time duration is simply the number of samples divided by the sampling frequency.

-- Rob Leslie rob@mars.org

Russell O'Connor

11:19 p.m.

[To: mad-dev@lists.mars.org]

Suppose that after calling mad_header_decode I call mad_frame_decode. Is it possible that mad_frame_decode will give a MAD_ERROR_BUFLEN or MAD_ERROR_BUFPTR, or will a successful call to mad_header_decode guarantee there is enough data in the buffer to attempt a frame decode?

-- Russell O'Connor roconnor@alumni.uwaterloo.ca
http://www.math.berkeley.edu/~roconnor/
``Later generations will regard set theory as a disease from which one has recovered.'' -- Poincare

Rob Leslie

11:39 p.m.

On Sunday, May 26, 2002, at 04:19 PM, Russell O'Connor wrote:

> Suppose that after calling mad_header_decode I call mad_frame_decode. Is it possible that mad_frame_decode will give a MAD_ERROR_BUFLEN or MAD_ERROR_BUFPTR, or will a successful call to mad_header_decode guarantee there is enough data in the buffer to attempt a frame decode?

The buffer checks are performed while decoding the frame header, so if you have already successfully called mad_header_decode(), then mad_frame_decode() generally will not fail with MAD_ERROR_BUFLEN or MAD_ERROR_BUFPTR.

-- Rob Leslie rob@mars.org
