I would like to have a look, please, Grigory. I have already started the
port, and have changed this:
// MLO( x, y);
ld %r0, [x]
ld %r1, [y]
mul64 %r0, %r1
ld %r2, [%mhi]
// MLA( x1, y1);
ld %r0, [x1]
ld %r1, [y1]
mul64 %r0, %r1
add %r2, [%mhi]
// MLA( x2, y2);
ld %r0, [x2]
ld %r1, [y2]
mul64 %r0, %r1
add %r2, [%mhi]
into:
ld %r0, [x]
ld %r1, [y]
mul64 %r0, %r1
ld %r0, [x1]
ld %r1, [y1]
ld %r2, [%mhi]
mul64 %r0, %r1
ld %r0, [x2]
ld %r1, [y2]
add %r2, [%mhi]
mul64 %r0, %r1
add %r2, [%mhi]
That saves 2 cycles per multiply, so only another 8 of the 10 stall cycles to go.
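As a rough check of that arithmetic, here is a toy cycle-count model in C. It is only a sketch: it assumes every instruction issues in one cycle, that a read of %mhi waits until 10 cycles after the multiply has issued, and that nothing else stalls. The names (`enum op`, `cycles`, `original`, `rescheduled`) are made up for illustration, and the rescheduled sequence has a final read of %mhi added so that both sequences do the same work.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: one issue cycle per instruction, and a read of %mhi
 * waits until MUL_LATENCY cycles after the multiply has issued.
 * These numbers are assumptions, not taken from a datasheet. */
enum op { LD, MUL, RD_MHI };

#define MUL_LATENCY 10

/* Original order: load, load, multiply, read %mhi straight away. */
const enum op original[] = {
    LD, LD, MUL, RD_MHI,
    LD, LD, MUL, RD_MHI,
    LD, LD, MUL, RD_MHI,
};

/* Rescheduled order: the next pair of loads sits between each multiply
 * and the read of its result. The final RD_MHI is added here so that
 * both sequences do the same work. */
const enum op rescheduled[] = {
    LD, LD, MUL,
    LD, LD, RD_MHI, MUL,
    LD, LD, RD_MHI, MUL,
    RD_MHI,
};

/* Return the cycle count for a sequence of n instructions. */
int cycles(const enum op *prog, size_t n)
{
    int now = 0;    /* cycle at which the next instruction issues */
    int ready = 0;  /* cycle at which %mhi becomes valid */

    assert(prog != NULL);
    for (size_t i = 0; i < n; i++) {
        if (prog[i] == RD_MHI && now < ready)
            now = ready;                    /* stall for the multiplier */
        if (prog[i] == MUL)
            ready = now + 1 + MUL_LATENCY;  /* result due after the latency */
        now += 1;                           /* one issue cycle each */
    }
    return now;
}
```

Under those assumptions cycles(original, 12) comes to 42 and cycles(rescheduled, 12) to 38. The two multiplies with loads behind them each stall for 8 cycles instead of 10, and the last one still pays the full 10.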
Best Regards
Julian Gardner
RSD Communications Ltd
> -----Original Message-----
> From: Grigory A. [mailto:Ryhor@tut.by]
> Sent: Friday, July 02, 2004 2:34 AM
> To: Joolz [RSD]
> Cc: mad-dev(a)lists.mars.org
> Subject: Re: [mad-dev] A little help needed on code
> optimisation to deal with delay slots
>
> Hi Joolz!
>
> You have a BIG problem :) A 10-cycle stall on every mul.
> The most "multiplication consuming" functions are
> - IMDCT - actually it isn't too much
> - synthesis filter
>
> The last one has a lot of multiplications;
> all the others are relatively free of mult operations. Maybe stereo
> processing has a few.
>
> If you are short of calculation power I can recommend that you
> do the same as I did. I separated out the calculation functions and
> implemented them in asm.
> I can send the source code of MAD with the improvements
> that I used for
> - tms320vc55xx and
> - sp3R5 3DSP cores.
>
>
> --
> Best regards,
> Grigory mailto:Ryhor@tut.by
>
>