Dan Malek dan@mvista.com wrote:
Well, I'm impressed........
I expected to do some serious profiling and assembler writing, but all I did was configure and make. It runs on my wimpy 50 MHz embedded MPC860 PowerPC, uses about 80% of the CPU, and has an NFS root file system. It's an Embedded Planet board with a Crystal CS4218 codec running Linux. I'm using mad-0.2.12b.
Pretty nice. Now I have to find another holiday week project :-).
If you are still looking for an assembler project, MAD could probably use a PowerPC assembler version of the imdct function to match the ARM version I wrote, or maybe an assembler version of the synthesis functions which I don't think anyone has had a go at yet.
There is also still some C optimisation to do if you are up to the challenge :-) see Rob's TODO list.
Andre --
Hi Dan,
In addition to what Andre says in his message (below), it wouldn't hurt to add a PPC assembly version of the FPM fixed-point multiply routines. Probably you're using the FPM_APPROX version now, so you could improve sound quality and possibly performance as well by writing a PPC version. See libmad/fixed.h for details.
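As a rough illustration of why an approximate multiply can cost sound quality, the sketch below compares a 32-bit-only approximation with a full 64-bit multiply. The names (`fx_t`, `fx_mul_approx`, `fx_mul_exact`), the 28 fractional bits, and the 12/16 shift split are assumptions made for this example; they are not taken from libmad/fixed.h, which should be consulted for the real definitions.

```c
#include <stdint.h>

/* Assumption for this sketch: 28 fractional bits. */
#define FX_FRACBITS 28

typedef int32_t fx_t;

/* Approximate multiply: discard low bits of each operand first so the
 * product fits in 32 bits. The 12 + 16 = 28 split is illustrative only.
 * Assumes the result fits in 32 bits and that >> on a negative value is
 * an arithmetic shift (true on mainstream compilers). */
static inline fx_t fx_mul_approx(fx_t x, fx_t y)
{
    return (x >> 12) * (y >> 16);
}

/* Full-precision multiply: keep all 64 product bits, then scale once. */
static inline fx_t fx_mul_exact(fx_t x, fx_t y)
{
    return (fx_t)(((int64_t)x * (int64_t)y) >> FX_FRACBITS);
}
```

For positive operands the approximate result is never larger than the exact one, because bits are thrown away before the multiply rather than after it.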
BTW Dan, I can't tell if you're subscribed to mad-dev (under another address perhaps.) Let me know either way -- I assumed you were, but if not you should probably subscribe. If you are, you might want to change your subscription to match the address you send from so I don't have to manually approve your messages.
Cheers, -rob
Rob Leslie wrote:
........ it wouldn't hurt to add a PPC assembly version of the FPM fixed-point multiply routines.
OK, I'll look into it.
BTW Dan, I can't tell if you're subscribed to mad-dev (under another address perhaps.)
Yeah, it's probably another address. I create some domain name every once in a while and point all of my e-mail to a common point :-). I'm definitely subscribed....have been for a long time. I just finally got around to doing something.
....... manually approve your messages.
Sorry about that. It should be OK now.
-- Dan
Rob Leslie wrote:
In addition to what Andre says in his message (below), it wouldn't hurt to add a PPC assembly version of the FPM fixed-point multiply routines.
Actually, the compiler does such a good job, I don't think we need anything special on PowerPC. In 32-bit mode, the PowerPC has a 32x32 -> 64 bit multiply capability. It uses two instructions (hi/low) and two destination registers. The C-code is optimized right into what I would write for an inline function. I have looked at the output from the IMDCTs and I don't think I can improve on that with any assembler either.
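A minimal sketch of the kind of C that a compiler can turn into a high/low multiply pair (mulhw and mullw on 32-bit PowerPC) follows. The type and function names and the 28 fractional bits are assumptions for illustration, not the definitions in libmad/fixed.h.

```c
#include <stdint.h>

/* Assumption for this sketch: 28 fractional bits. */
#define FX_FRACBITS 28

typedef int32_t fx_t;

/* Widen both operands to 64 bits, multiply, then shift back down.
 * On a 32-bit target the 64-bit product is typically produced by one
 * "high word" and one "low word" multiply into two registers.
 * Assumes >> on a negative value is an arithmetic shift. */
static inline fx_t fx_mul64(fx_t x, fx_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    return (fx_t)(product >> FX_FRACBITS);
}
```

With this format, 0x08000000 represents 0.5, so `fx_mul64(0x08000000, 0x08000000)` yields 0x04000000, which represents 0.25.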
.... Probably you're using the FPM_APPROX version now,
Yes, that is what I was using. Unfortunately, the FPM_64BIT blows my CPU budget on the baby processor, but it is close. I can crank up the PLL and probably get a little more with some profiling.
I am finding bugs in the compiler with the multi-dimensional arrays and in 'make', so I need to correct these as well.
I'll keep you posted on any improvements I make.
Thanks for the great starting point!
-- Dan
Dan Malek wrote:
....... Unfortunately, the FPM_64BIT blows my CPU budget on the baby processor, but it is close. I can crank up the PLL and probably get a little more with some profiling.
Ooops....pilot error. It added about 28% to the CPU requirement, but it still works OK (I should just try to play the damn files instead of just running profiling tools :-).
-- Dan
Dan Malek dan@mvista.com wrote:
Actually, the compiler does such a good job, I don't think we need anything special on PowerPC. In 32-bit mode, the PowerPC has a 32x32 -> 64 bit multiply capability. It uses two instructions (hi/low) and two destination registers.
I don't know the PowerPC architecture, but at the very least if you have a multiply-accumulate instruction (or can simulate one) you may still be able to improve performance and accuracy. It may be worth looking at.
In any event, I'm very glad to hear it works. :-)
Cheers, -rob
Rob Leslie wrote:
....... but at the very least if you have a multiply-accumulate instruction (or can simulate one) you may still be able to improve performance and accuracy. It may be worth looking at.
Some PowerPC cores have a MAC, but it is not defined as a general part of the ISA..........But, you gave me an idea (that still doesn't require assembly programming)...........
In any event, I'm very glad to hear it works. :-)
Way cool :-).....
-- Dan
Dan Malek wrote:
........But, you gave me an idea (that still doesn't require assembly programming)...........
So, my idea was to modify imdct36 so the mad_f_mul + mad_f_mul .... sequences were replaced by 64-bit multiply/accumulates, then you just round/scale once at the end (like I would do on a DSP). This improved the performance by about 6%, and I ended up with great compiler generated code. For example:
t6 = mad_f_mul(X[4], 0x0ec835e8L) + mad_f_mul(X[13], 0x061f78aaL);
becomes:
macreg = 0;
mad_f_mac(macreg, X[4], 0x0ec835e8L);
mad_f_mac(macreg, X[13], 0x061f78aaL);
t6 = mad_f_macscale(macreg);
Of course, on the longer multiply/accumulates this makes more sense, but you get the idea in a minimal amount of space here :-). The 'macreg' is a 64-bit signed long long, the 'mad_f_mac' macro is just the multiply part of mad_f_mul, and the 'mad_f_macscale' is just the rounding/scaling part of mad_f_mul, which is done just once at the end of the MAC sequence to get the mad_fixed_t result.
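One way the two macros described above might look is sketched here. This is a guess at their shape, not the actual patch: the names (`FX_MAC`, `FX_MACSCALE`, `fx_mul`), the 28 fractional bits, and the round-to-nearest step in the scaling macro are all assumptions made for this example.

```c
#include <stdint.h>

/* Assumptions for this sketch: 28 fractional bits, 64-bit accumulator. */
#define FX_FRACBITS 28

typedef int32_t fx_t;
typedef int64_t fx_acc_t;

/* Multiply part only: add the full 64-bit product to the accumulator. */
#define FX_MAC(acc, x, y)  ((acc) += (fx_acc_t)(x) * (fx_acc_t)(y))

/* Scaling part only: round to nearest, then shift back to fx_t.
 * Assumes >> on a negative value is an arithmetic shift. */
#define FX_MACSCALE(acc) \
    ((fx_t)(((acc) + ((fx_acc_t)1 << (FX_FRACBITS - 1))) >> FX_FRACBITS))

/* Plain multiply that scales (truncates) after every product. */
static inline fx_t fx_mul(fx_t x, fx_t y)
{
    return (fx_t)(((fx_acc_t)x * (fx_acc_t)y) >> FX_FRACBITS);
}

/* The t6 term written as two separately scaled multiplies. */
static inline fx_t t6_separate(const fx_t X[18])
{
    return fx_mul(X[4], 0x0ec835e8L) + fx_mul(X[13], 0x061f78aaL);
}

/* The same term as a multiply/accumulate sequence, scaled once. */
static inline fx_t t6_mac(const fx_t X[18])
{
    fx_acc_t macreg = 0;
    FX_MAC(macreg, X[4], 0x0ec835e8L);
    FX_MAC(macreg, X[13], 0x061f78aaL);
    return FX_MACSCALE(macreg);
}
```

Because the accumulated form rounds once instead of truncating each product, the two versions can differ by a couple of least-significant bits, which is why checking against reference bit streams is a sensible next step.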
I guess I need to run some official bit streams, but it sounds OK. This could certainly be the PowerPC optimization. I'll send a patch to someone if they would like to see it.
-- Dan
Andre wrote:
If you are still looking for an assembler project,
I don't usually go looking for assembler projects, but I will probably get around to this one :-).
There is also still some C optimisation to do if you are up to the challenge :-) see Rob's TODO list.
I have been hanging out here for a while, and now that I am using the software as I intended long ago, I'll probably be a little more active.
Thanks.
-- Dan