Thanks, Andre!
I've attached a new version to this email, which should hopefully be the most accurate so far: it now rounds everywhere like FPM_ARM and FPM_64BIT, but has the advantage over them of using 64-bit accumulators for the IMDCT part.
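
To illustrate what 64-bit accumulation with rounding buys, here is a minimal sketch in C. It is not the attached code: the names fixed_t, FRACBITS and dot_round64 are made up for this example, and the 28 fractional bits are an assumption based on the usual mad_fixed_t layout. The products are summed at full width and rounded once at the end, rather than each product being truncated to 32 bits first.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit fixed point with 28 fractional bits. */
#define FRACBITS 28

typedef int32_t fixed_t;   /* hypothetical stand-in for the decoder's sample type */

/*
 * Sum of products kept in a 64-bit accumulator and rounded to nearest once,
 * after the last multiply-accumulate. Relies on >> of a negative value being
 * an arithmetic shift, which is implementation-defined in C but holds on the
 * usual compilers.
 */
static fixed_t dot_round64(const fixed_t *x, const fixed_t *w, int n)
{
    int64_t acc = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; ++i)
        acc += (int64_t)x[i] * w[i];       /* full-precision product */

    acc += (int64_t)1 << (FRACBITS - 1);   /* add half an LSB before the shift */
    return (fixed_t)(acc >> FRACBITS);
}
```

With a single rounding step the error per output sample stays within half an LSB of the exact sum, whereas truncating every product can accumulate up to one LSB of error per term.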
I found a small bug, but after fixing it the output is indeed the most accurate so far: 5.375e-08 rms.
I've attached a patch for the bug; let me know if you'd prefer a better fix.
Just a small tweak: if ASO_IMDCT is defined, the window_l[] table in layer3.c doesn't need to be included (saving 144 bytes), as imdct_l_arm.S already contains its own copy.
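
A rough sketch of how the table could be compiled out, assuming it is guarded directly on ASO_IMDCT. The typedef, the zeroed placeholder contents and the WINDOW_L_BYTES macro are illustrative only and are not taken from layer3.c; the 36-entry size is inferred from the 144-byte figure with 4-byte entries.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* hypothetical stand-in for the 32-bit fixed-point type */

#if !defined(ASO_IMDCT)
/* C fallback only: the assembly routine is said to carry its own copy. */
static const fixed_t window_l[36] = { 0 };   /* placeholder values, not the real window */
# define WINDOW_L_BYTES ((unsigned)sizeof(window_l))
#else
# define WINDOW_L_BYTES 0u                   /* table omitted entirely */
#endif
```

Building without ASO_IMDCT, WINDOW_L_BYTES comes out as 36 * 4 = 144, which matches the saving quoted above.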
Thanks for catching this.
Cheers, -rob