Rob + others,
Please find attached a version of the Layer III III_imdct_l() function that I've written in ARM assembler.
I've been messing around with it for a while, mainly as an exercise to learn ARM assembler, but hopefully the end result is worth sharing.
Performance-wise, it should be quite a bit faster than the current C version (and slightly more accurate as well, since the multiply-accumulate steps accumulate into 64 bits and round back to 32 bits only when finished).
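To illustrate the accuracy point, here is a rough C sketch of the idea. It is not the attached assembler and not the current C code. The Q28 fixed-point format, the type name fixed_t and both function names are assumptions made up for this example. It contrasts rounding every product back to 32 bits with summing the full 64-bit products and rounding once at the end.

```c
#include <assert.h>
#include <stdint.h>

#define FRACBITS 28            /* assumed Q28 fixed-point format (hypothetical) */

typedef int32_t fixed_t;       /* hypothetical 32-bit fixed-point sample type */

/* Half of one output unit, added before shifting so the result is rounded. */
#define ROUND_HALF ((int64_t)1 << (FRACBITS - 1))

/* Rounds each 64-bit product back to 32 bits before adding it, so every
 * term contributes its own rounding error to the sum.
 * Note: >> on a negative value is implementation-defined in C; this sketch
 * assumes an arithmetic shift, which is what gcc provides. */
static fixed_t dot_round_each(const fixed_t *x, const fixed_t *w, int n)
{
    assert(x && w && n >= 0);
    fixed_t sum = 0;
    for (int i = 0; i < n; i++) {
        int64_t p = (int64_t)x[i] * w[i];
        sum += (fixed_t)((p + ROUND_HALF) >> FRACBITS);
    }
    return sum;
}

/* Accumulates the full 64-bit products and rounds back to 32 bits only
 * once, when the sum is finished. */
static fixed_t dot_round_once(const fixed_t *x, const fixed_t *w, int n)
{
    assert(x && w && n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * w[i];
    return (fixed_t)((acc + ROUND_HALF) >> FRACBITS);
}
```

With four terms where x[i] = 1 << 14 and w[i] = 1 << 13, each product is exactly half of one output unit, so the exact sum is 2. The single-rounding version returns 2, while the per-product version rounds each half up and returns 4.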
Unfortunately, I don't actually have any ARM-based hardware that will play audio, so it's only been tested standalone with a small range of test cases on the 'armulator' ARM simulator. Any feedback (especially on overall performance) or bug reports from anyone actually able to test it for real would be appreciated.
It assembles for me (using gcc v2.95.2) with just:
arm-elf-gcc -c arm_III_imdct_l.S
(making sure that the extension is .S rather than .s, so that gcc runs it through the C pre-processor).
I'd appreciate some feedback, even if the performance increase isn't big enough to bother including it in future releases.
Andre
Andre,
Bravo!
I've only had a quick opportunity to test your code on a real ARM processor, but preliminary results seem to show about a 2 percentage point reduction in CPU utilization on my SA-1100, which is very good. It may be enough to bring Layer III performance on par with Xaudio's integer decoder under ARM.
I'll continue to do some tests, and may very well include the code in a future release.
Many thanks, -rob