Hello Tony,
Friday, March 26, 2004, 7:06:37 PM, you wrote:

TF> Hello Gregory,
TF> In the link that you had provided, they are talking about the
TF> output pcm bit width and not about the fixed point data width. And
TF> some use floating point Math.
That link is about the accuracy of the decoded PCM compared with the reference decoder. Pure MAD is a full-compliance decoder, because its rms error satisfies the required accuracy of 10^-6. For the 16-bit tms55xx implementation I use the link as follows: I take the PCM output of my version of the decoder and the PCM output of, for example, pure MAD, calculate the rms of the difference, and see how big it is. MAD's rms is ~10^-8 and the rms of my decoder relative to MAD is 10^-4. So I have a limited-accuracy decoder, with a possible uncertainty of 10^-8.
Or what do you mean when you ask: "With 16 bit implementation on TI55x how much accuracy you got?"
3DSP. For calculations I use the following scheme: 4.28(result) = 4.28(data) * 1.31(constants). Why?
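A minimal sketch of that comparison, written as a Python model rather than DSP code. The function name `rms_difference` and the `bits` parameter are made up for illustration, and scaling the samples to [-1.0, 1.0) before taking the rms is an assumption about how the accuracy figures are normalised:

```python
import math

def rms_difference(pcm_a, pcm_b, bits=16):
    """RMS of the sample-wise difference between two PCM streams.

    Samples are signed integers of width `bits`; they are scaled to
    [-1.0, 1.0) first (an assumption of this sketch), so the result is
    comparable to figures such as 1e-4 or 1e-8.
    """
    if len(pcm_a) != len(pcm_b) or len(pcm_a) == 0:
        raise ValueError("streams must be non-empty and of equal length")
    scale = float(1 << (bits - 1))          # full scale, e.g. 32768 for 16 bits
    total = sum(((a - b) / scale) ** 2 for a, b in zip(pcm_a, pcm_b))
    return math.sqrt(total / len(pcm_a))
```

For example, `rms_difference(my_pcm, mad_pcm)` would be called with the two decoders' outputs for the same stream; if the comparison is done on higher-precision output, pass the matching `bits` value.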
1. I port MAD to the tms55xx and 3DSP hardware in a slightly different way. I don't provide operations like MAD_mul; instead I provide whole functions like IMDCT_36(), III_requantize(), frame_synthesis() and so on. These functions are implemented in asm in C-callable form. Why? Because this gives the best compromise between decoder complexity and achieved performance. I have to use as few MIPS as possible for the decoding process, and the same goes for memory usage.
For example: in III_requantize() I use a Taylor approximation instead of the big table for "is", and the result is very good :) - more accurate than what the current III_requantize() function provides - which is the reason I started this discussion. The differences between my implementation and MAD are +-1, and we have found why and how to improve it. This +-1 in the Xr[] data gives ~ +-25 at sbsamples[] and an rms of ~1.2e-8. So I started to worry about it and initiated this discussion. As I said - we have found what and why, and how to improve it.
2. 3DSP hardware. I work with the sp3r5m version of 3DSP. It has a 32x32 multiplier and four 48-bit accumulators. The 64-bit result of a multiplication is placed in two accumulators as follows: bits[0..47] in accum_0 and bits[31..64] in accum_1 - yes, they overlap by 17 bits. As a result of this hardware feature, if I use 4.28 = 4.28 x 4.28, the 4.28 result is split between the two accumulators, and extracting it needs some additional operations. With 1.31 constants the result lands in accum_1 and there is no problem extracting it. So you see, this is purely hardware specific - I am not trying to get additional accuracy from the 1.31 constants :). 24 bits of correct result fully satisfy me :).
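To make the accumulator argument concrete, here is a small Python model of the two formats. It is only a sketch under stated assumptions: accum_0 is taken to hold the low 48 bits of the product and accum_1 the product shifted right by 31 with its sign kept, which is my reading of the description above and not a documented 3DSP behaviour. All function names are made up, and overflow of the final 32-bit result is not modelled.

```python
def to_fixed(value, frac_bits):
    """Convert a float to a fixed-point integer with `frac_bits` fraction bits."""
    return int(round(value * (1 << frac_bits)))

def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return raw / (1 << frac_bits)

def split_product(a, b):
    """Model of the 32x32 multiply (assumed layout, see lead-in).

    accum_0: bits 0..47 of the product, as a raw bit field.
    accum_1: bits 31 and up, i.e. the product arithmetically shifted by 31.
    """
    product = a * b                       # Python ints are exact
    accum_0 = product & ((1 << 48) - 1)
    accum_1 = product >> 31
    return accum_0, accum_1

def mul_4_28_by_1_31(data, const):
    """4.28 * 1.31 has 59 fraction bits; dropping 31 leaves 4.28.
    That is exactly accum_1, so no extra extraction step is needed."""
    _, accum_1 = split_product(data, const)
    return accum_1

def mul_4_28_by_4_28(x, y):
    """4.28 * 4.28 has 56 fraction bits; dropping 28 needs bits 28..30
    from accum_0 and everything above from accum_1."""
    accum_0, accum_1 = split_product(x, y)
    low3 = (accum_0 >> 28) & 0x7          # bits 28..30 of the product
    return (accum_1 << 3) | low3
```

For example, `from_fixed(mul_4_28_by_1_31(to_fixed(1.5, 28), to_fixed(0.5, 31)), 28)` gives 0.75 straight from accum_1, while the 4.28 x 4.28 case has to stitch the result together from both accumulators.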
TF> One more thing accuracy test is to be done only for
TF> compl.bit in ISO test streams I believe.

Yes. I will use it - when I finish.
And one more thing - I have a Word document which describes the Xr[] calculation through Taylor approximation, with some pictures of fixed-point fractional multiplication. I can send it to anyone who is interested.
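For readers without the document, here is a rough illustration of the general idea of replacing the big |is|^(4/3) table with a short series. It is not the scheme from the Word document: the octave-centred expansion point, the 14-entry anchor table, the five-term series, floating-point arithmetic and the name `pow43_taylor` are all choices made for this sketch, and |is| is assumed to be at most 8206.

```python
# Taylor (binomial) coefficients of (1 + u)**(4/3) up to u**4.
COEFFS = (1.0, 4.0 / 3.0, 2.0 / 9.0, -4.0 / 81.0, 5.0 / 243.0)

# One anchor per octave: (1.5 * 2**k)**(4/3) for k = 0..13.
# This small table replaces one entry per possible input value.
ANCHORS = [(1.5 * (1 << k)) ** (4.0 / 3.0) for k in range(14)]

def pow43_taylor(x):
    """Approximate x**(4/3) for an integer 0 <= x < 2**14."""
    if x == 0:
        return 0.0
    k = x.bit_length() - 1                # 2**k <= x < 2**(k + 1)
    if x < 0 or k >= len(ANCHORS):
        raise ValueError("x out of range for this sketch")
    x0 = 1.5 * (1 << k)                   # centre of the octave
    u = (x - x0) / x0                     # -1/3 <= u < 1/3
    acc = 0.0
    for c in reversed(COEFFS):            # Horner evaluation of the series
        acc = acc * u + c
    return ANCHORS[k] * acc
```

With five terms the relative error of this sketch is on the order of 1e-4 at the edges of each octave; more terms, or more anchors per octave, reduce it further at the cost of cycles or memory.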