Nicolas,
Well done for spotting the ARM 'Round while you shift' optimisation!
Originally in imdct_l_arm.S, I made a fairly arbitrary choice to round in some places and just shift in others, trying to balance code size and speed against accuracy. With your optimisation it's possible to round everywhere with no penalty, so I guess it makes sense to do so. I've attached a new version to this email which hopefully should be the most accurate version so far - it now rounds everywhere like FPM_ARM and FPM_64BIT, but has the advantage over them of using 64-bit accumulators for the imdct part.
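As a rough illustration of why rounding can cost nothing extra, here is a C sketch that emulates a two-instruction ARM sequence, `movs r, lo, lsr #S` followed by `adc r, r, hi, lsl #(32-S)`. The `movs` leaves the last bit shifted out of the low word in the carry flag, and `adc` adds it back, which is exactly the "add half an LSB, then shift" rounding step. I'm assuming this is the shape of the optimisation; the function names and `SCALEBITS = 28` (libmad's usual fractional-bit count) are illustrative choices, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 28 fractional bits, as with libmad's MAD_F_FRACBITS. */
enum { SCALEBITS = 28 };
static_assert(SCALEBITS > 0 && SCALEBITS < 32, "shift must fit in one word");

/* Plain shift of the 64-bit accumulator hi:lo: truncates toward minus infinity.
   Equivalent to "mov r, lo, lsr #S ; orr r, r, hi, lsl #(32-S)". */
static int32_t scale_shift(int32_t hi, uint32_t lo)
{
    return (int32_t)((lo >> SCALEBITS) | ((uint32_t)hi << (32 - SCALEBITS)));
}

/* Emulates "movs r, lo, lsr #S ; adc r, r, hi, lsl #(32-S)": the carry is
   bit S-1 of lo (the last bit shifted out), and adc adds it to the result. */
static int32_t scale_round_arm(int32_t hi, uint32_t lo)
{
    uint32_t carry = (lo >> (SCALEBITS - 1)) & 1u;
    uint32_t r = (lo >> SCALEBITS) + ((uint32_t)hi << (32 - SCALEBITS)) + carry;
    return (int32_t)r;
}

/* Reference: add half an output LSB to the full 64-bit value, then shift.
   Unsigned arithmetic avoids undefined behaviour; the low 32 bits of the
   result are the same as for an arithmetic shift. */
static int32_t scale_round_ref(int32_t hi, uint32_t lo)
{
    uint64_t x = ((uint64_t)(uint32_t)hi << 32) | lo;
    x += (uint64_t)1 << (SCALEBITS - 1);
    return (int32_t)(uint32_t)(x >> SCALEBITS);
}
```

The ARM-style version and the reference agree for any hi:lo pair, and both sequences are two instructions long, so rounding everywhere is no slower than truncating.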
Rob,
Just a small tweak: if ASO_IMDCT is defined, the window_l[] table in layer3.c doesn't need to be included (a saving of 144 bytes), as imdct_l_arm.S already contains its own copy.
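A minimal sketch of the guard, assuming window_l[] is 36 entries of a 32-bit mad_fixed_t (36 × 4 = 144 bytes, which matches the saving quoted above). The coefficient values are elided here and replaced by a zero placeholder; the real table in layer3.c is unchanged apart from the surrounding #if.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t mad_fixed_t;  /* assumption: 32-bit fixed-point samples */

#if !defined(ASO_IMDCT)
/* Only the C IMDCT needs this window; with ASO_IMDCT the ARM assembly
   (imdct_l_arm.S) carries its own copy, so the C table is compiled out. */
static mad_fixed_t const window_l[36] = { 0 /* coefficients elided */ };

static_assert(sizeof window_l == 36 * sizeof(mad_fixed_t), "36 window taps");
#endif
```

This assumes every C reference to window_l also sits in code that is excluded when ASO_IMDCT is defined; otherwise the build fails with an undeclared identifier.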
Andre --
Thanks, Andre!
> I've attached a new version to this email which hopefully should be the most accurate version so far - it now rounds everywhere like FPM_ARM and FPM_64BIT, but has the advantage over them of using 64-bit accumulators for the imdct part.
I found a small bug, but after fixing it the output is indeed the most accurate so far: 5.375e-08 rms.
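To see why rounding everywhere tends to pull the rms error down, here is a small sketch that enumerates every pattern of discarded bits for a hypothetical 8-bit shift (`DEMO_BITS`, chosen only so the loop is short) and measures the mean squared error, in output LSBs, of truncating versus rounding. It assumes the discarded bits are uniformly distributed; it is not how the 5.375e-08 figure above was measured.

```c
#include <assert.h>
#include <stdint.h>

enum { DEMO_BITS = 8 };  /* hypothetical shift, small enough to enumerate */

/* Mean squared error (in output LSBs squared) over all 2^DEMO_BITS residues.
   use_rounding == 0: plain shift; use_rounding == 1: add the last bit
   shifted out, as the ARM carry trick does. The rms error is its sqrt. */
static double mean_square_error(int use_rounding)
{
    assert(use_rounding == 0 || use_rounding == 1);
    const uint32_t one = 1u << DEMO_BITS;
    double sum = 0.0;
    for (uint32_t r = 0; r < one; ++r) {
        /* Exact value is r/one LSB; the shift itself keeps 0 of it. */
        uint32_t carry = (r >> (DEMO_BITS - 1)) & 1u;
        double kept = (use_rounding && carry) ? 1.0 : 0.0;
        double err = (double)r / one - kept;
        sum += err * err;
    }
    return sum / one;
}
```

Truncation comes out near 1/3 LSB² (it is always biased one way), while rounding comes out near 1/12 LSB², roughly a factor of two lower in rms terms.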
I've attached a patch for the bug; let me know if you'd prefer a better fix.
> Just a small tweak: if ASO_IMDCT is defined, the window_l[] table in layer3.c doesn't need to be included (a saving of 144 bytes), as imdct_l_arm.S already contains its own copy.
Thanks for catching this.
Cheers, -rob