Hi!
I'd like to discuss one topic a little: MAD performance.
1. The reason: MAD is used in embedded systems, where memory usage and CPU cycle consumption become critical issues. Rewriting 7-8 functions in asm can provide a good compromise between software complexity and decoder performance. Yes, it makes the code platform-specific, but I hope that is an acceptable approach for embedded systems.
2. Some current results. Using this approach I've ported MAD to two DSP platforms: TI tms320vc55xx and 3DSP sp3r5m.
Memory: a simple libmad + minimad application fits in less than 16 000 DSP instructions, and total data memory usage (constant tables + buffers + stack) is less than 64 kBytes.
Cycles: some functions have fixed "cycle requirements" that are independent of the input bitstream, some depend on it slightly, and some depend on it heavily. A good example of the first type is frame synthesis, mad_synth_frame(), and of the last type, III_huffdecode(). Currently, on these DSPs, synthesis takes ~100-120k cycles on average.
3. Interest. Finally, I want to ask just one thing: is it possible to improve the III_huffdecode() function in terms of cycle usage? Right now the average frame size of the input mp3 bitstream is 300-400 bytes (128 kbps / 44 frames), and to decode these ~350 bytes III_huffdecode() spends ~250k+ cycles! This is the real bottleneck in the decoding process, and at higher bitrates the cycle consumption only grows.
What approaches can you suggest to overcome this? For example, a scenario like:
- "unpack necessary tables": ~50k cycles
- "look up" decoding process: less than ... cycles
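To make the scenario above concrete, here is a rough sketch in C of the "unpack once, then look up" idea. It is only an illustration under stated assumptions: it uses a made-up 4-symbol prefix code, not the real MP3 Huffman tables, and the names toy_code, build_lut(), peek_bits() and decode() are invented for this example and are not part of libmad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy prefix code (NOT an MP3 Huffman table):
   symbol 0 = "0", 1 = "10", 2 = "110", 3 = "111". */
#define MAX_LEN  3                   /* longest code word, in bits */
#define LUT_SIZE (1u << MAX_LEN)     /* one entry per MAX_LEN-bit pattern */

struct code_word { uint8_t bits; uint8_t len; uint8_t symbol; };

static const struct code_word toy_code[] = {
    { 0x0, 1, 0 }, { 0x2, 2, 1 }, { 0x6, 3, 2 }, { 0x7, 3, 3 },
};

struct lut_entry { uint8_t symbol; uint8_t len; };

/* "Unpack" step, done once: every MAX_LEN-bit pattern that starts
   with a given code word maps to that word's symbol and length. */
static void build_lut(struct lut_entry lut[LUT_SIZE])
{
    size_t i;
    for (i = 0; i < sizeof toy_code / sizeof toy_code[0]; ++i) {
        const struct code_word *cw = &toy_code[i];
        unsigned pad, first, n;
        assert(cw->len >= 1 && cw->len <= MAX_LEN);
        pad = MAX_LEN - cw->len;           /* number of "don't care" bits */
        first = (unsigned)cw->bits << pad;
        for (n = 0; n < (1u << pad); ++n) {
            lut[first + n].symbol = cw->symbol;
            lut[first + n].len = cw->len;
        }
    }
}

/* Read MAX_LEN bits starting at bit position pos (MSB first) without
   consuming them; positions past the end of the stream read as 0. */
static unsigned peek_bits(const uint8_t *buf, size_t nbits, size_t pos)
{
    unsigned v = 0;
    int i;
    for (i = 0; i < MAX_LEN; ++i, ++pos) {
        unsigned bit = 0;
        if (pos < nbits)
            bit = (buf[pos >> 3] >> (7 - (pos & 7))) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}

/* "Look up" step: one table access per symbol instead of walking a
   tree bit by bit. Returns the number of symbols written to out. */
static size_t decode(const struct lut_entry lut[LUT_SIZE],
                     const uint8_t *buf, size_t nbits,
                     uint8_t *out, size_t max_out)
{
    size_t pos = 0, count = 0;
    while (pos < nbits && count < max_out) {
        struct lut_entry e = lut[peek_bits(buf, nbits, pos)];
        if (pos + e.len > nbits)
            break;                         /* truncated code word */
        out[count++] = e.symbol;
        pos += e.len;
    }
    return count;
}
```

Call build_lut() once, then decode() for each block of bits. With real tables the longest code words are much longer than 3 bits, so one flat table can get large; a common compromise is a small first-level table plus short second-level tables for the rare long codes. Whether that fits into the 64 kBytes data budget would have to be measured.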
Any proposals are highly welcome.