I would like to have a look, please, Grigory. I have already started the port and have turned this:
    // MLO( x, y);
    ld    %r0, [x]
    ld    %r1, [y]
    mul64 %r0, %r1
    ld    %r2, [%mhi]
    // MLA( x1, y1);
    ld    %r0, [x1]
    ld    %r1, [y1]
    mul64 %r0, %r1
    add   %r2, [%mhi]
    // MLA( x2, y2);
    ld    %r0, [x2]
    ld    %r1, [y2]
    mul64 %r0, %r1
    add   %r2, [%mhi]
Into
    ld    %r0, [x]
    ld    %r1, [y]
    mul64 %r0, %r1
    ld    %r0, [x1]
    ld    %r1, [y1]
    ld    %r2, [%mhi]
    mul64 %r0, %r1
    ld    %r0, [x2]
    ld    %r1, [y2]
    add   %r2, [%mhi]
    mul64 %r0, %r1
This saves 2 cycles per multiply; only another 8 to go.
Best Regards
Julian Gardner
RSD Communications Ltd
-----Original Message-----
From: Grigory A. [mailto:Ryhor@tut.by]
Sent: Friday, July 02, 2004 2:34 AM
To: Joolz [RSD]
Cc: mad-dev@lists.mars.org
Subject: Re: [mad-dev] A little help needed on code optimisation to deal with delay slots
Hi Joolz!
You have a BIG problem :) There is a 10-cycle stall on every mul. The most multiplication-heavy functions are:
- IMDCT (actually it doesn't have that many)
- synthesis filter
The last one has a lot of multiplications; everything else is relatively free of multiply operations, though stereo processing may have a few.
If you are short of computing power, I recommend doing the same as I did: I separated out the calculation functions and implemented them in asm. I can send you the MAD source code with the improvements I made for the
- tms320vc55xx and
- sp3R5 3DSP cores.
--
Best regards,
 Grigory                            mailto:Ryhor@tut.by