So, I see comments to the effect of "need some n-bit type here", "depend on <type> being n-bits", etc. Is there a reason you did not use int32_t and the other fixed-width types? They are part of the new C standard, so they will be portable.
As part of the library API change, it would be nice if library calls were prefixed by some library name, e.g. mad_audio_output(). This way calls to library routines are easily separated from other code. madplay.c has a function called audio_init(), yet audio_output() is from libmad. This is downright confusing. It seems some functions are defined with the mad_ prefix and others are not.
Trying to use 64-bit fpm on my PII box causes the CPU load to go up from the average 10% that top shows to around 15%, with no noticeable change in sound.
How difficult would it be to add support for external DSPs, like the ones on the Sound Blaster MP3 cards?
I used to own a NetWinder; how is the performance on the ARM chips?
So, I see comments to the effect of "need some n-bit type here", "depend on <type> being n-bits", etc. Is there a reason you did not use int32_t and the other fixed-width types? They are part of the new C standard, so they will be portable.
There are only a very few places where I make assumptions about type length, and this is largely due to haste more than anything. These places are marked so I (or you :-) can fix them.
A bigger problem in my view with respect to portability are the places where I rely on sign-extending right-shifts, and the GCC extension I used to initialize members of the Huffman table unions. In time I'll fix these too.
The inline assembly naturally is also non-portable, but this is conveniently isolated and there are a few C substitutes at hand.
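To make the sign-extension concern concrete: in C, right-shifting a negative signed value is implementation-defined, so code that expects an arithmetic shift is not strictly portable. The following is only a minimal sketch of one workaround combined with the fixed-width types from <stdint.h>; the names asr64, fixed_mul and FRACBITS (and the value 28) are hypothetical and are not taken from the MAD source.

```c
#include <assert.h>
#include <stdint.h>

#define FRACBITS 28   /* assumed number of fractional bits (hypothetical) */

/* Arithmetic (sign-extending) right shift that never applies >> to a
   negative value, so it does not depend on compiler-specific behaviour. */
static int64_t asr64(int64_t x, unsigned int n)
{
    assert(n < 64);
    /* For negative x, ~x is non-negative, so shifting it is well defined;
       complementing the result restores the sign-extended value. */
    return x >= 0 ? x >> n : ~(~x >> n);
}

/* Fixed-point multiply through a 64-bit intermediate product.
   The caller is assumed to keep the result within 32-bit range. */
static int32_t fixed_mul(int32_t a, int32_t b)
{
    int64_t product = (int64_t) a * b;
    return (int32_t) asr64(product, FRACBITS);
}
```

With these definitions, asr64(-7, 1) gives -4 (rounding toward negative infinity, as a sign-extending shift would), and multiplying 0.5 by -3.0 in this format gives -1.5.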
As part of the library API change, it would be nice if library calls were prefixed by some library name, e.g. mad_audio_output(). This way calls to library routines are easily separated from other code. madplay.c has a function called audio_init(), yet audio_output() is from libmad. This is downright confusing. It seems some functions are defined with the mad_ prefix and others are not.
Be careful not to confuse libmad with madplay; all (well, most) exported libmad symbols are indeed prefixed with mad_. The only exception I think is the fixed-point abstraction which uses f_ and the fixed_t type. (Perhaps these should also be prefixed with mad_?)
The other audio_* calls you see in madplay are not part of libmad; they're part of madplay's audio abstraction to support multiple output modules. A look through audio.h, audio.c, and audio_*.c should be instructive. Admittedly, madplay probably should not have defined its own audio_init() and audio_finish(). Sorry about that. :-)
It might also have helped if I had put the libmad source strictly into a separate subdirectory, and I may eventually do this. For now, only what you see in libmad.h should be considered part of libmad.
Trying to use 64-bit fpm on my PII box causes the CPU load to go up from the average 10% that top shows to around 15%, with no noticeable change in sound. How difficult would it be to add support for external DSPs, like the ones on the Sound Blaster MP3 cards?
I don't have very much experience programming DSPs so I don't know. Anyone?
I used to own a NetWinder; how is the performance on the ARM chips?
Funny you should ask. :-)
I was recently doing some performance tests to compare MAD against the other fixed-point decoders I'm aware of that also run under ARM. Here's what I found; this is the amount of CPU time required to decode a stereo MPEG stream in each of the audio layers, as a percentage of audio real-time:
CPU: StrongARM 1100 220MHz
OS:  Linux 2.2.14-rmk5-np17-empeg22

                                  Layer     Layer     Layer
    decoder         version         I         II       III
    -------------------------------------------------------------
[1] MAD             0.10.0b        23%       22%       37%
[2] Xaudio          1.3.1          21%       25%       24%
[3] splay-fixpoint  0.81           41%       36%       43%
[4] mpg123-arm32    0.59r          [5]       27%[6]    32%

[1] http://www.mars.org/home/rob/proj/mpeg/ (GPL)
[2] http://www.xaudio.com/ (commercial)
[3] ftp://ftp.netwinder.org/users/n/nico/ (GPL/LGPL)
[4] http://melanoma.cs.rmit.edu.au/werj/simonb/ (restricted)

[5] failed to decode bitstream
[6] outputs silence
This doesn't tell the full story, though, as there are reports of splay decoding Layer III using as little as 4% of the CPU under a 280MHz SA-110 Netwinder while Xaudio and MAD still require 15-23%. This is quite a dramatic improvement for splay, and I'd love to see if MAD could likewise be optimized. (Hi Nicolas!)
I think there is potential for MAD to reach at least Xaudio's level of performance on the ARM. MAD's Layer II decoding is already faster than it is with Xaudio, and all layers share the same CPU-intensive subband synthesis... The current challenge will be in writing a fast Layer III IMDCT, of which several examples exist.
-rob
On 04-Apr-2000 Rob Leslie wrote:
So, I see comments to the effect of "need some n-bit type here", "depend on <type> being n-bits", etc. Is there a reason you did not use int32_t and the other fixed-width types? They are part of the new C standard, so they will be portable.
There are only a very few places where I make assumptions about type length, and this is largely due to haste more than anything. These places are marked so I (or you :-) can fix them.
would a s/long long/int64_t/g patch be accepted then?
A bigger problem in my view with respect to portability are the places where I rely on sign-extending right-shifts, and the GCC extension I used to initialize members of the Huffman table unions. In time I'll fix these too.
well gcc is one of the most portable compilers around, depending on gcc is not that horrible. But yeah, it would probably be better to fix this as well.
Be careful not to confuse libmad with madplay; all (well, most) exported libmad symbols are indeed prefixed with mad_. The only exception I think is the fixed-point abstraction which uses f_ and the fixed_t type. (Perhaps these should also be prefixed with mad_?)
If it is exported in libmad.h (or seen in 3rd party code) it should have the mad_ prefix or some other library-specific prefix.
The other audio_* calls you see in madplay are not part of libmad; they're part of madplay's audio abstraction to support multiple output modules. A look through audio.h, audio.c, and audio_*.c should be instructive. Admittedly, madplay probably should not have defined its own audio_init() and audio_finish(). Sorry about that. :-)
perhaps the audio functions could get wrapped into their own lib as well? If I were to write a player based on this lib, I would likely use the audio routines as well. No sense recreating the wheel.
It might also have helped if I had put the libmad source strictly into a separate subdirectory, and I may eventually do this. For now, only what you see in libmad.h should be considered part of libmad.
Moving to subdirs would be a good idea.
/mad
    /decoder
    /audio
    /id3 (maybe)
    /player
    /docs
I don't have very much experience programming DSPs so I don't know. Anyone?
As I see it, if a dsp exists, it does all the math. So it should simply be possible to do: stream -> dsp -> synth / sound (depending on dsp). In the library it should simply choose not to do the hard work, bypassing most of the lib.
I was recently doing some performance tests to compare MAD against the other fixed-point decoders I'm aware of that also run under ARM. Here's what I found; this is the amount of CPU time required to decode a stereo MPEG stream in each of the audio layers, as a percentage of audio real-time:
What do you use to create these numbers? I have simply been using top.
This doesn't tell the full story, though, as there are reports of splay decoding Layer III using as little as 4% of the CPU under a 280MHz SA-110 Netwinder while Xaudio and MAD still require 15-23%. This is quite a dramatic improvement for splay, and I'd love to see if MAD could likewise be optimized. (Hi Nicolas!)
4%? Impressive. An SA-110 at 280MHz (I thought the NetWinder was 275) is roughly a Pentium 200.
How does mad compare in quality of output? My sound system here is rather weak, so I can not really hear the difference between say mpg123 and mad (if there is one).
Your docs mention the 'is' kluge. What files use is? None of my collection seems to sound different whether I have the kluge enabled or not.
There are only a very few places where I make assumptions about type length, and this is largely due to haste more than anything. These places are marked so I (or you :-) can fix them.
would a s/long long/int64_t/g patch be accepted then?
Sure.
A bigger problem in my view with respect to portability are the places where I rely on sign-extending right-shifts, and the GCC extension I used to initialize members of the Huffman table unions. In time I'll fix these too.
well gcc is one of the most portable compilers around, depending on gcc is not that horrible. But yeah, it would probably be better to fix this as well.
I'm hoping I can at least get the code to compile with (gasp) Microsoft's VC++ compiler. There are some things I'd like to do that apparently a lot of Windows users would appreciate...
Other than building a generic MP3 player, an idea I have is to write something that will modify the global_gain field of a Layer III stream, effectively changing the overall loudness when the audio is decoded. There has been some interest expressed in this idea, as it would permit a way to "normalize" an MP3 file without converting to WAV first, normalizing, then encoding back into MP3, losing quality in the process. Modifying the global_gain field is trivial as it is located at fixed positions within each frame, and it has a direct and predictable effect on the Layer III requantization and scaling.
I'm not a Windows programmer, so I've promised only to write the back-end stuff to do this, including some analysis to determine what might be an appropriate global_gain offset. A command-line tool would be easy for me, but someone else will have to write the Windows GUI for the masses. :-) I'm guessing whoever does this *may* want to use a compiler other than gcc...
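As a rough illustration of the global_gain arithmetic: in Layer III requantization each step of global_gain scales the decoded amplitude by 2^(1/4), which is about 1.505 dB, and the field is 8 bits wide. The sketch below only converts a desired loudness change into a step offset and applies it with clamping; it does not locate or rewrite the field in a bitstream, and the function names are hypothetical.

```c
#include <assert.h>

/* One global_gain step is a factor of 2^(1/4) in amplitude,
   i.e. 20*log10(2)/4 dB, roughly 1.505 dB. */
#define DB_PER_STEP 1.50515

/* Nearest whole number of global_gain steps for a change given in dB. */
static int gain_steps_for_db(double db)
{
    double steps = db / DB_PER_STEP;
    return (int) (steps >= 0.0 ? steps + 0.5 : steps - 0.5);
}

/* Apply an offset to an 8-bit global_gain value, clamping to 0..255. */
static unsigned int adjust_global_gain(unsigned int gain, int offset)
{
    int adjusted;

    assert(gain <= 255);
    adjusted = (int) gain + offset;
    if (adjusted < 0)
        adjusted = 0;
    if (adjusted > 255)
        adjusted = 255;
    return (unsigned int) adjusted;
}
```

For example, a reduction of about 6 dB comes out as an offset of -4 steps, which would then be applied to the global_gain of every granule and channel. Because the step size is fixed, the achievable loudness changes are limited to multiples of roughly 1.5 dB.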
Be careful not to confuse libmad with madplay; all (well, most) exported libmad symbols are indeed prefixed with mad_. The only exception I think is the fixed-point abstraction which uses f_ and the fixed_t type. (Perhaps these should also be prefixed with mad_?)
If it is exported in libmad.h (or seen in 3rd party code) it should have the mad_ prefix or some other library-specific prefix.
I'll see about fixing this.
The other audio_* calls you see in madplay are not part of libmad; they're part of madplay's audio abstraction to support multiple output modules. A look through audio.h, audio.c, and audio_*.c should be instructive. Admittedly, madplay probably should not have defined its own audio_init() and audio_finish(). Sorry about that. :-)
perhaps the audio functions could get wrapped into their own lib as well? If I were to write a player based on this lib, I would likely use the audio routines as well. No sense recreating the wheel.
Maybe not a bad idea. The ID3 stuff should also probably be in a library.
Moving to subdirs would be a good idea.
/mad
    /decoder
    /audio
    /id3 (maybe)
    /player
    /docs
I'll probably do this later on, particularly once I get into cleaning up and writing some documentation.
As I see it, if a dsp exists, it does all the math. So it should simply be possible to do: stream -> dsp -> synth / sound (depending on dsp). In the library it should simply choose not to do the hard work, bypassing most of the lib.
That takes the fun, and maybe the whole point, away from the lib. ;-)
I was recently doing some performance tests to compare MAD against the other fixed-point decoders I'm aware of that also run under ARM. Here's what I found; this is the amount of CPU time required to decode a stereo MPEG stream in each of the audio layers, as a percentage of audio real-time:
What do you use to create these numbers? I have simply been using top.
% time madplay -o raw:- $file >/dev/null
% time xaudio -output=raw:- $file >/dev/null
% time splay -d - $file >/dev/null
% time mpg123 -s $file >/dev/null
I then divided the CPU time of each by the actual playing time of the file.
(Some versions of "time" also show the CPU use percentage directly, if you want to play the actual output through /dev/audio.)
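The division described above can be written out as a tiny helper. This is only a sketch: it assumes that "CPU time" means the sum of the user and system times reported by time, and the function name is hypothetical.

```c
#include <assert.h>

/* CPU use as a percentage of audio real-time: the CPU seconds spent
   decoding divided by the playing time of the file in seconds. */
static double cpu_percent(double user_sec, double sys_sec, double playing_sec)
{
    assert(playing_sec > 0.0);
    return 100.0 * (user_sec + sys_sec) / playing_sec;
}
```

As a made-up example, a three-minute file (180 seconds) that takes 63.0 seconds of user time and 3.6 seconds of system time to decode works out to 37% of real-time.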
How does mad compare in quality of output? My sound system here is rather weak, so I can not really hear the difference between say mpg123 and mad (if there is one).
Although I've not delved seriously into it yet, there is a formal ISO/IEC compliance test suite described by part 4 of the 11172 standard.
I have compared the output of MAD to *some* of the supplied compliance data, and the largest error I found was an off-by-one in the least significant of 24 bits. In other words, 16 bit output should largely be error-free...
This assumes you're not using the FPM_APPROX option; with this there is a noticeable degradation of quality. Another caveat is the 'is' kluge...
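To see why an off-by-one in the least significant of 24 bits rarely survives into 16-bit output, consider rounding a 24-bit sample down to 16 bits. The helper below is a hypothetical sketch of one simple rounding scheme, not necessarily the conversion madplay performs.

```c
#include <assert.h>
#include <stdint.h>

/* Round a signed 24-bit sample to 16 bits by adding half of a 16-bit
   step (128) and dividing by 256, rounding toward negative infinity. */
static int16_t round_24_to_16(int32_t sample24)
{
    int32_t rounded;

    assert(sample24 >= -8388608 && sample24 <= 8388607);
    rounded = sample24 + 128;
    /* Floor division written without shifting a negative value. */
    rounded = rounded >= 0 ? rounded / 256 : -((-rounded + 255) / 256);
    if (rounded > 32767)
        rounded = 32767;   /* clip the one value that would overflow */
    return (int16_t) rounded;
}
```

Under this scheme, 256000 and 256001 both round to 1000; a difference of one at 24 bits changes the 16-bit result only when it crosses a rounding boundary, which happens for 1 in 256 input values.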
Your docs mention the 'is' kluge. What files use is? None of my collection seems to sound different whether I have the kluge enabled or not.
The meaning of 'is' is... intensity stereo. This is a joint stereo encoding option for all three layers, although it's not used very often in Layer III.
Briefly, Layer III can use any of the following stereo encodings:
regular stereo
    L and R channels are encoded independently
middle/side (M/S) joint stereo
    L+R and L-R are encoded in place of L and R
intensity joint stereo
    high frequencies are encoded mono with some stereo imaging info
M/S + intensity joint stereo
    a combination of M/S for low freqs and intensity for high freqs
Not many encoders support this last option, so it's not very common, but it's the one I've found that gives different results under different decoders.
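For the plain M/S case in the list above, the decoder's job is simple to sketch. As far as I understand the standard, the encoder stores M = (L+R)/sqrt(2) and S = (L-R)/sqrt(2), so the decoder recovers L and R with the same scaling. The code below is a floating-point illustration with hypothetical names; it is not MAD's fixed-point implementation and it does not attempt the M/S + intensity combination.

```c
#include <assert.h>
#include <stddef.h>

#define INV_SQRT2 0.70710678118654752440

/* Convert M/S samples to L/R in place.
   On entry left[] holds M and right[] holds S. */
static void ms_to_lr(double *left, double *right, size_t count)
{
    size_t i;

    assert(left != NULL && right != NULL);
    for (i = 0; i < count; ++i) {
        double m = left[i];
        double s = right[i];

        left[i]  = (m + s) * INV_SQRT2;   /* L = (M + S) / sqrt(2) */
        right[i] = (m - s) * INV_SQRT2;   /* R = (M - S) / sqrt(2) */
    }
}
```

The 1/sqrt(2) factor on both sides keeps the transform orthogonal, so encoding and then decoding returns the original L and R up to rounding.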
Without the kluge, MAD produces a weak signal in the right channel for all the bitstreams I was able to find that use M/S + intensity stereo. With the kluge, MAD sounds most like the output from Xaudio, but there are still annoying artifacts in the right channel. The best output I've found (with a clear signal in both channels) comes from mpg123.
I've already started discussing this in other places, but I'd like to get into more of the details here, perhaps in another thread...
-rob
Briefly, Layer III can use any of the following stereo encodings:
regular stereo
    L and R channels are encoded independently
middle/side (M/S) joint stereo
    L+R and L-R are encoded in place of L and R
intensity joint stereo
    high frequencies are encoded mono with some stereo imaging info
M/S + intensity joint stereo
    a combination of M/S for low freqs and intensity for high freqs
when the player says "joint stereo" which one is it?
One of the last three. :-)
I realize madplay doesn't tell you, mostly because libmad doesn't return this information in a way that's easily recognized.
For Layer III (only), when frame.mode == MAD_MODE_JOINT_STEREO, the frame.mode_ext bits have the following meaning:
0x1  intensity stereo
0x2  M/S stereo
either or both of these will be set in joint stereo mode.
I might be convinced to put this information in frame.flags in a way that's compatible with all layers; madplay could then also more easily show it.
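Until something like that exists, a player could interpret the bits itself. The sketch below uses only the bit values given above; the macro and function names are hypothetical and are not part of libmad, and it is meant to be called only for Layer III frames whose mode is MAD_MODE_JOINT_STEREO.

```c
#include <assert.h>

/* Bit meanings of mode_ext for Layer III joint stereo, as listed above.
   These macro names are made up for this example. */
#define JS_INTENSITY 0x1
#define JS_MS        0x2

/* Return a short description of the joint stereo variant in use. */
static const char *joint_stereo_name(unsigned int mode_ext)
{
    assert((mode_ext & ~0x3u) == 0);   /* mode_ext is a two-bit field */

    switch (mode_ext) {
    case JS_INTENSITY:
        return "intensity";
    case JS_MS:
        return "M/S";
    case JS_INTENSITY | JS_MS:
        return "M/S + intensity";
    default:
        return "none";   /* neither bit set */
    }
}
```

A caller would pass frame.mode_ext after checking frame.mode, and could print the returned string next to "joint stereo" in the player's output.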
-rob