Antti,
(I have copied the mad-dev list as I think Rob might be able to help out with the inline assembler syntax if it really is needed... :-)
For fast decoding, the goal is that after every 32x32 hardware multiply instruction, only the upper 32 bits of the product are referenced when generating the mad_fixed_t result. (You will have to check the compiler output to be sure...)
Please give this patch a try. It _may_ achieve the desired effect without needing to resort to assembler.... ?
--- fixed.h.original Wed May 30 04:11:29 2001
+++ fixed.h Wed May 30 04:42:12 2001
@@ -398,10 +398,16 @@
 (((hi) << (32 - (MAD_F_SCALEBITS - 1))) | \
 ((lo) >> (MAD_F_SCALEBITS - 1)))) + 1) >> 1)
 # else
-# define mad_f_scale64(hi, lo) \
- ((mad_fixed_t) \
- (((hi) << (32 - MAD_F_SCALEBITS)) | \
- ((lo) >> MAD_F_SCALEBITS)))
+# if defined(OPT_SPEED)
+# define mad_f_scale64(hi, lo) \
+ ((mad_fixed_t) \
+ (((hi) << (32 - MAD_F_SCALEBITS))))
+# else
+# define mad_f_scale64(hi, lo) \
+ ((mad_fixed_t) \
+ (((hi) << (32 - MAD_F_SCALEBITS)) | \
+ ((lo) >> MAD_F_SCALEBITS)))
+# endif
 # endif
 # define MAD_F_SCALEBITS MAD_F_FRACBITS
 # endif
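To make the effect of the OPT_SPEED branch concrete, here is a rough portable C sketch comparing the two scaling variants. It is not code from libmad: the names fixed_t, SCALEBITS, scale64_full, scale64_fast, mul_full and mul_fast are made up for illustration, and it assumes 28 fractional bits. Under that assumption, the hi-only variant should differ from the full one only in the lowest 4 bits of the result.

```c
#include <stdint.h>

/* Assumption: 28 fractional bits (stand-in for MAD_F_SCALEBITS). */
#define SCALEBITS 28

typedef int32_t fixed_t;            /* hypothetical stand-in for mad_fixed_t */

/* Full scaling: combine the high and low words of the 64-bit product. */
static fixed_t scale64_full(uint32_t hi, uint32_t lo)
{
    return (fixed_t)((hi << (32 - SCALEBITS)) | (lo >> SCALEBITS));
}

/* Speed variant: use the high word only; the low result bits become zero. */
static fixed_t scale64_fast(uint32_t hi)
{
    return (fixed_t)(hi << (32 - SCALEBITS));
}

/* Fixed-point multiply that references both halves of the product. */
static fixed_t mul_full(fixed_t x, fixed_t y)
{
    uint64_t p = (uint64_t)((int64_t)x * (int64_t)y);
    return scale64_full((uint32_t)(p >> 32), (uint32_t)p);
}

/* Fixed-point multiply that references only the upper 32 bits. */
static fixed_t mul_fast(fixed_t x, fixed_t y)
{
    uint64_t p = (uint64_t)((int64_t)x * (int64_t)y);
    return scale64_fast((uint32_t)(p >> 32));
}
```

With these definitions, mul_fast(x, y) equals mul_full(x, y) with its low 4 bits cleared, so the cost of the shortcut is a small loss of precision per multiply. Whether the compiler then really avoids touching the low word still has to be checked in the generated assembler.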
Andre --
--- Antti Antinoja antti@neonzion.fi wrote:
I'll still try adding the asm multiply function, and then we'll see. The gap shrank greatly after I turned debugging off (really): it took ~72 s for a 60 s mp3.
I'm not sure there would be any gain if I implemented an asm multiply function. I tried that, but there are some things I can't figure out with CRIS asm in extended inline form. CRIS offers muls.m and mulu.m instructions (m is the operand size: b=byte, w=word, d=double word). I think I should use the signed one. But there seems to be a problem with the registers: the lo variable is read correctly from the registers, but hi contains only garbage. (I made a little test.c program to evaluate this asm thing easily.)
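On the signed versus unsigned question, a small portable C sketch (not CRIS-specific; the helper names hi_signed and hi_unsigned are made up here) shows why the choice matters for the upper word. The low 32 bits of a 32x32 product are the same either way, but the high 32 bits differ as soon as an operand is negative, and mad_fixed_t values are signed.

```c
#include <stdint.h>

/* Upper 32 bits of a signed 32x32 -> 64 multiply. */
static uint32_t hi_signed(int32_t x, int32_t y)
{
    uint64_t p = (uint64_t)((int64_t)x * (int64_t)y);
    return (uint32_t)(p >> 32);
}

/* Upper 32 bits of an unsigned 32x32 -> 64 multiply on the same bit patterns. */
static uint32_t hi_unsigned(int32_t x, int32_t y)
{
    uint64_t p = (uint64_t)(uint32_t)x * (uint64_t)(uint32_t)y;
    return (uint32_t)(p >> 32);
}
```

For example, with x = -1 and y = 2 the signed high word is 0xFFFFFFFF while the unsigned one is 1, so the signed multiply looks like the right one for this use.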
this is a non working attempt:
# define MAD_F_MLX(hi, lo, x, y) \
    __asm__ ("muls.d %2, %3" \
             : "=r" (lo), "=r" (hi) \
             : "%r" (x), "r" (y) \
             : "0")
The manual describes following :
Assembler syntax: MULS.m Rs,Rd

Size: The operands are byte, word, or dword. The result is 64 bits.

Operation: MOF = ((m)Rs * (m)Rd) >> 32;
           Rd = (dword)((m)Rs * (m)Rd);

Description: Both operands are sign extended from the size (m) to dword, and the extended operands are multiplied, generating a 64-bit result. The lower 32 bits of the result are written to Rd, and the upper 32 bits are written to the multiply overflow register (MOF). N and Z flags are set depending on the 64-bit result. The V flag is set if the result is more than 32 bits:

V-flag = ((Rd >= 0) && (MOF != 0)) || ((Rd < 0) && (MOF != -1))
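The manual's description of MULS.d can be modelled in plain C as below. This is only a sketch of the documented semantics, not tested on CRIS hardware; struct muls_result and muls_d are hypothetical names. It suggests one possible reading of the symptom: the upper word goes to MOF rather than to a general register, so an asm statement that expects hi in an ordinary "=r" output would never see it written.

```c
#include <stdint.h>

/* Hypothetical model of the state after MULS.d Rs,Rd, per the manual text. */
struct muls_result {
    int32_t  mof;   /* upper 32 bits of the product (multiply overflow register) */
    uint32_t rd;    /* lower 32 bits of the product (written back to Rd) */
};

static struct muls_result muls_d(int32_t rs, int32_t rd)
{
    /* Both operands are sign extended and multiplied to a 64-bit result. */
    uint64_t p = (uint64_t)((int64_t)rs * (int64_t)rd);
    struct muls_result r;

    r.rd  = (uint32_t)p;                   /* Rd  = (dword)(Rs * Rd) */
    r.mof = (int32_t)(uint32_t)(p >> 32);  /* MOF = (Rs * Rd) >> 32  */
    return r;
}
```

In MAD_F_MLX terms, lo corresponds to rd and hi corresponds to mof, so hi would have to be copied out of MOF explicitly after the multiply.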