New subject: Re[2]: A little help needed on code optimisation to deal with delay slots

2 Jul 2004

I would like to have a look please Gregory. I have already started the port
//	ML0( x, y);
    ld %r0, [x]
    ld %r1, [y]
    mul64 %r0, %r1
    ld %r2, [%mhi]
//	MLA( x1, y1);	
    ld %r0, [x1]
    ld %r1, [y1]
    mul64 %r0, %r1	
    add %r2, [%mhi]
//	MLA( x2, y2);
    ld %r0, [x2]
    ld %r1, [y2]
    mul64 %r0, %r1	
    add %r2, [%mhi]
Into
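    ld %r0, [x]
    ld %r1, [y]
    mul64 %r0, %r1
    // load the next operands while the multiplier is busy
    ld %r0, [x1]
    ld %r1, [y1]
    ld %r2, [%mhi]
    mul64 %r0, %r1
    ld %r0, [x2]
    ld %r1, [y2]
    add %r2, [%mhi]
    mul64 %r0, %r1
    // the last product still has to be accumulated, as in the original
    add %r2, [%mhi]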
This saves 2 cycles per multiply; only another 8 to go.
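For anyone following the reordering in C terms, here is a minimal sketch of the same idea. It assumes the value read from %mhi is the high 32 bits of a signed 32x32 multiply. The names mul_hi, mac3_naive and mac3_interleaved are made up for illustration and are not part of MAD. A C compiler is free to reschedule these statements, so this only shows that both orderings compute the same sum, not that the C source controls the stalls.

```c
#include <stdint.h>

/* High 32 bits of a signed 32x32 -> 64-bit product. Stands in for reading
   %mhi after mul64 (hypothetical helper, not part of MAD). Assumes >> on a
   negative value is an arithmetic shift, as on common compilers. */
static int32_t mul_hi(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Original order: each product is read immediately after the multiply. */
int32_t mac3_naive(const int32_t *x, const int32_t *y)
{
    int32_t acc = mul_hi(x[0], y[0]);   /* ML0(x, y)   */
    acc += mul_hi(x[1], y[1]);          /* MLA(x1, y1) */
    acc += mul_hi(x[2], y[2]);          /* MLA(x2, y2) */
    return acc;
}

/* Reordered: the operands of the next multiply are loaded before the
   previous product is read, mirroring the second listing above. */
int32_t mac3_interleaved(const int32_t *x, const int32_t *y)
{
    int32_t a = x[0], b = y[0];
    int64_t p = (int64_t)a * b;         /* mul64 */
    a = x[1]; b = y[1];                 /* loads for the next multiply */
    int32_t acc = (int32_t)(p >> 32);   /* ld  %r2, [%mhi] */
    p = (int64_t)a * b;                 /* mul64 */
    a = x[2]; b = y[2];
    acc += (int32_t)(p >> 32);          /* add %r2, [%mhi] */
    p = (int64_t)a * b;                 /* mul64 */
    acc += (int32_t)(p >> 32);          /* final add for the last product */
    return acc;
}
```

Calling both functions on the same three operand pairs should give identical results, which is a cheap check to run before committing a hand-scheduled assembly version.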
Best Regards
Julian Gardner
RSD Communications Ltd
-----Original Message-----
From: Grigory A. [mailto:Ryhor@tut.by] 
Sent: Friday, July 02, 2004 2:34 AM
To: Joolz [RSD]
Cc: mad-dev@lists.mars.org
Subject: Re: [mad-dev] A little help needed on code optimisation to deal with delay slots
Hi Joolz!
You have a BIG problem :) A 10-cycle stall on every mul.
The most multiplication-heavy functions are:

IMDCT (actually it does not use that many)
synthesis filter

The last one has a lot of multiplications; all the others are
relatively free of multiply operations. Maybe stereo
processing has a few.
If you are short of processing power, I recommend doing the
same as I did: I separated the calculation functions and
implemented them in asm.
I can send the source code of MAD with the improvements
that I used for the

tms320vc55xx and
sp3R5 3DSP cores.
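
As an illustration of separating out a calculation function, here is a rough sketch. It is not the code offered above. The function name dot_fixed and the USE_ASM_KERNELS switch are hypothetical, and the 28 fractional bits are assumed to match MAD's fixed-point format. The idea is to keep the multiply-heavy loop behind one small function so that a hand-written assembly version can replace the portable C one per target.

```c
#include <stdint.h>

#define FRACBITS 28   /* assumption: 28 fractional bits, as in MAD's fixed-point format */

#if defined(USE_ASM_KERNELS)
/* Hypothetical: hand-written assembly version, linked from a separate file. */
int32_t dot_fixed(const int32_t *x, const int32_t *y, int n);
#else
/* Portable C reference: accumulate the full 64-bit products and scale once. */
int32_t dot_fixed(const int32_t *x, const int32_t *y, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * y[i];
    return (int32_t)(acc >> FRACBITS);
}
#endif
```

With this split, the rest of the decoder only calls dot_fixed, and the C version doubles as a reference for checking the output of the assembly one.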

--
Best regards,
 Grigory                            mailto:Ryhor@tut.by
