Dan Malek wrote:
Rob Leslie wrote:
It also includes (untested) native fixed-point math support for the PowerPC platform, contributed by David Blythe.
It's about as good as the all 'C' version I sent you before :-). The comment about 4xx is misleading. If it was really done for the IBM 4xx processors, it would be using the MAC that it contains, not just standard PowerPC assembler fixed point multiply.
Perhaps it depends on what compiler you are using (I'm using gcc 2.95.2). The FPM_64BIT option results in reasonable code, but gcc uses a 3-instruction sequence to do the reduction from 64 to 32 bits when it could be done with fewer. I didn't see any suitable 16-bit MAC instruction sequence to do what the MAD_F_MLA macro does on the 405GP, but if you suggest one I'll try it.

My goal was to make the madplay distribution use less CPU on our 405 while trying not to lose accuracy. It went from ~40% to 25% with the simple changes I made to the mad-0.11.4b distribution. Others' experience may differ; for example, the changes I made have a much smaller impact when the SSO optimizations are enabled. The comments about 4xx are there in case anyone was thinking that 64-bit or AltiVec instructions would have been more effective.
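To make the accuracy/speed trade-off concrete, here is a minimal sketch in portable C of the two styles of fixed-point multiply being compared. It is not the code from the mad distribution: the names fixed_t, FRACBITS, fixed_mul_64 and fixed_mul_approx are made up for illustration, the 28 fractional bits are an assumption, and it relies on right shifts of negative values being arithmetic (which is what gcc does).

```c
#include <stdint.h>

/* Assumption for this sketch: 32-bit values with 28 fractional bits. */
#define FRACBITS 28

typedef int32_t fixed_t;

/*
 * Full-precision style: widen to 64 bits, multiply, then reduce the
 * 64-bit product back to 32 bits with a single shift. This reduction
 * is the step the compiler has to turn into a short instruction
 * sequence on a 32-bit CPU.
 */
static fixed_t fixed_mul_64(fixed_t x, fixed_t y)
{
    return (fixed_t)(((int64_t)x * y) >> FRACBITS);
}

/*
 * Low-accuracy style: round away low-order bits of each operand first
 * (12 + 16 = 28 bits in total) so that a plain 32-bit multiply already
 * yields a correctly scaled result. Cheaper, but it discards
 * information before the multiply.
 */
static fixed_t fixed_mul_approx(fixed_t x, fixed_t y)
{
    return ((x + (1 << 11)) >> 12) * ((y + (1 << 15)) >> 16);
}
```

With ONE defined as 1 << 28, both functions agree on simple inputs such as 0.5 * 0.5, but fixed_mul_approx(ONE + 1, ONE) returns ONE while fixed_mul_64 keeps the low bit. That kind of loss is what "low accuracy" refers to.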
Measuring on a 200 MHz 405 with gcc 2.95.2 and -O3, I get the following CPU utilization:

FPM_DEFAULT            20% (low accuracy)
FPM_PPC                24%
FPM_64BIT              35%
FPM_64BIT + OPT_SSO    26%

regards,
david