Antti,
(I have copied the mad-dev list as I think Rob might be able to help out with the inline assembler syntax if it really is needed... :-)
For fast decoding, the goal is that after every 32x32 hardware multiply instruction, only the upper 32 bits of the product are referenced in order to generate the mad_fixed_t result. (You will have to check the compiler output to be sure...)
Please give this patch a try. It _may_ achieve the desired effect without needing to resort to assembler.... ?
--- fixed.h.original Wed May 30 04:11:29 2001
+++ fixed.h Wed May 30 04:42:12 2001
@@ -398,10 +398,16 @@
 (((hi) << (32 - (MAD_F_SCALEBITS - 1))) | \
 ((lo) >> (MAD_F_SCALEBITS - 1)))) + 1) >> 1)
 # else
-# define mad_f_scale64(hi, lo) \
- ((mad_fixed_t) \
- (((hi) << (32 - MAD_F_SCALEBITS)) | \
- ((lo) >> MAD_F_SCALEBITS)))
+# if defined(OPT_SPEED)
+# define mad_f_scale64(hi, lo) \
+ ((mad_fixed_t) \
+ (((hi) << (32 - MAD_F_SCALEBITS))))
+# else
+# define mad_f_scale64(hi, lo) \
+ ((mad_fixed_t) \
+ (((hi) << (32 - MAD_F_SCALEBITS)) | \
+ ((lo) >> MAD_F_SCALEBITS)))
+# endif
 # endif
 # define MAD_F_SCALEBITS MAD_F_FRACBITS
 # endif
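To get a feel for what the OPT_SPEED variant gives up, here is a rough sketch in plain C that compares the two scaling forms from the patch. It is only an illustration, not part of fixed.h: the function names are made up, SCALEBITS is assumed to be 28 (the usual MAD_F_FRACBITS), and unsigned casts are used so that the shifts are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 28 fractional bits, matching the usual MAD_F_FRACBITS. */
#define SCALEBITS 28

/* Full-precision form: combines bits from both halves of the product. */
static int32_t scale64_exact(int32_t hi, uint32_t lo)
{
    uint32_t r = ((uint32_t)hi << (32 - SCALEBITS)) | (lo >> SCALEBITS);
    return (int32_t)r;
}

/* OPT_SPEED-style form: the low word of the product is never read. */
static int32_t scale64_fast(int32_t hi)
{
    return (int32_t)((uint32_t)hi << (32 - SCALEBITS));
}

/* Multiply two fixed-point values and return (exact - fast). */
static int32_t scale64_error(int32_t x, int32_t y)
{
    uint64_t bits = (uint64_t)((int64_t)x * (int64_t)y);
    int32_t  hi   = (int32_t)(uint32_t)(bits >> 32);
    uint32_t lo   = (uint32_t)bits;
    int32_t  err  = scale64_exact(hi, lo) - scale64_fast(hi);

    /* The fast form only drops the bits that would have come from lo. */
    assert(err >= 0 && err < (1 << (32 - SCALEBITS)));
    return err;
}
```

Under these assumptions the fast form can only be low by the 32 - SCALEBITS bits that lo would have contributed, i.e. at most 15 units in the last place for 28 fractional bits. Whether that is audible is a separate question.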
Andre --
--- Antti Antinoja antti@neonzion.fi wrote:
I'll try adding the asm multiply function as well, and then we'll see. The gap narrowed greatly after I turned debugging off (really): it took about 72 s to decode a 60 s mp3.
I'm not sure there would be any gain from implementing an asm multiply function. I tried, but there are some things I can't figure out about CRIS asm in extended inline form. CRIS offers the muls.m and mulu.m instructions (m is the operand size: b = byte, w = word, d = double word). I think I should use the signed one. That part is fine, but something seems to be wrong with the registers: the lo variable is read correctly from the registers, but hi contains only garbage. (I wrote a little test.c program to evaluate this asm easily.)
This is a non-working attempt:
# define MAD_F_MLX(hi, lo, x, y) \
    __asm__ ("muls.d %2, %3" \
             : "=r" (lo), "=r" (hi) \
             : "%r" (x), "r" (y) \
             : "0")
The manual describes the following:
Assembler syntax: MULS.m Rs,Rd
Size: The operands are byte, word, or dword. The result is 64 bits.
Operation: MOF = ((m)Rs * (m)Rd) >> 32;
           Rd = (dword)((m)Rs * (m)Rd);
Description: Both operands are sign extended from the size (m) to dword, and the extended operands are multiplied, generating a 64-bit result. The lower 32 bits of the result are written to Rd, and the upper 32 bits are written to the multiply overflow register (MOF). N and Z flags are set depending on the 64-bit result. The V flag is set if the result is more than 32 bits:
    V-flag = ((Rd >= 0) && (MOF != 0)) || ((Rd < 0) && (MOF != -1))
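The manual text above can be modelled in portable C, which may help when checking what hi and lo should contain for a given pair of inputs. This is only a sketch of the dword case as described in the manual excerpt. The struct and function names are invented for illustration and do not come from the CRIS tools or from MAD.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for what MULS.d leaves behind. */
struct muls_result {
    int32_t rd;   /* lower 32 bits of the product, written back to Rd */
    int32_t mof;  /* upper 32 bits of the product, written to MOF     */
    int     v;    /* V flag: set when the result needs more than 32 bits */
};

/* Model of "MULS.d Rs,Rd" following the Operation and V-flag lines. */
static struct muls_result muls_d(int32_t rs, int32_t rd)
{
    uint64_t bits = (uint64_t)((int64_t)rs * (int64_t)rd);
    struct muls_result r;

    r.rd  = (int32_t)(uint32_t)bits;          /* Rd  = low dword  */
    r.mof = (int32_t)(uint32_t)(bits >> 32);  /* MOF = high dword */
    r.v   = (r.rd >= 0 && r.mof != 0) || (r.rd < 0 && r.mof != -1);

    /* Sanity check: the two halves reassemble into the 64-bit product. */
    assert((((uint64_t)(uint32_t)r.mof << 32) | (uint32_t)r.rd) == bits);
    return r;
}
```

In terms of MAD_F_MLX, lo corresponds to rd and hi to mof, so the high half has to be fetched from MOF separately after the multiply.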
Hi!
Got it working with asm multiply!
the working inline asm (for cris):
# define MAD_F_MLX(hi, lo, x, y) \
    asm ("muls.d %2, %3\n\t" \
         "move $mof, %1" \
         : "=r" (lo), "=r" (hi) \
         : "%r" (x), "0" (y) \
         : "1")
Some tests I ran through:
1. this will output to somewhere.. (no audio hw.. yet)
[root@nz /var/tmp/koe/axis/devboard_lx/apps/mad-0.13.0b]528# ./madplay m.mp3 -t100
MPEG Audio Decoder 0.13.0 (beta) - Copyright (C) 2000-2001 Robert Leslie
sec: 947085366 usec: 328802
          Title: Chasing Sheep Is Best Left To
         Artist: Michael Nyman
          Album: The Essential Michael Nyman Ba
          Genre: Soul
           Year: 1992
        Comment: Catalonia is an opresed nation
3829 frames decoded (0:01:40.0), -764.6 dB peak amplitude, 0 clipped samples
sec: 947085463 usec 330781
elapsed: 97.001979
2. /dev/null
[root@nz /var/tmp/koe/axis/devboard_lx/apps/mad-0.13.0b]528# ./madplay m.mp3 -t100 -o /dev/null
MPEG Audio Decoder 0.13.0 (beta) - Copyright (C) 2000-2001 Robert Leslie
sec: 947085621 usec: 802448
          Title: Chasing Sheep Is Best Left To
         Artist: Michael Nyman
          Album: The Essential Michael Nyman Ba
          Genre: Soul
           Year: 1992
        Comment: Catalonia is an opresed nation
3829 frames decoded (0:01:40.0), -764.6 dB peak amplitude, 0 clipped samples
sec: 947085716 usec 323177
elapsed: 94.520729
3. /dev/null (2nd run.. )
[root@nz /var/tmp/koe/axis/devboard_lx/apps/mad-0.13.0b]528# ./madplay m.mp3 -t100 -o /dev/null
MPEG Audio Decoder 0.13.0 (beta) - Copyright (C) 2000-2001 Robert Leslie
sec: 947085820 usec: 582448
          Title: Chasing Sheep Is Best Left To
         Artist: Michael Nyman
          Album: The Essential Michael Nyman Ba
          Genre: Soul
           Year: 1992
        Comment: Catalonia is an opresed nation
3829 frames decoded (0:01:40.0), -764.6 dB peak amplitude, 0 clipped samples
sec: 947085915 usec 102500
elapsed: 94.520052
4. -o k.wav (in and out files are on an nfs mount... slightly slower.. network delays and heavy driver (?))
[root@nz /var/tmp/koe/axis/devboard_lx/apps/mad-0.13.0b]528# ./madplay m.mp3 -t100 -o k.wav
MPEG Audio Decoder 0.13.0 (beta) - Copyright (C) 2000-2001 Robert Leslie
sec: 947085998 usec: 353854
          Title: Chasing Sheep Is Best Left To
         Artist: Michael Nyman
          Album: The Essential Michael Nyman Ba
          Genre: Soul
           Year: 1992
        Comment: Catalonia is an opresed nation
3829 frames decoded (0:01:40.0), +0.1 dB peak amplitude, 1 clipped samples
sec: 947086106 usec 609479
elapsed: 108.255625
5. same as previous, but with -v.
[root@nz /var/tmp/koe/axis/devboard_lx/apps/mad-0.13.0b]572# ./madplay m.mp3 -t100 -o k.wav -v
MPEG Audio Decoder 0.13.0 (beta) - Copyright (C) 2000-2001 Robert Leslie
sec: 947088534 usec: 87292
          Title: Chasing Sheep Is Best Left To
         Artist: Michael Nyman
          Album: The Essential Michael Nyman Ba
          Genre: Soul
           Year: 1992
        Comment: Catalonia is an opresed nation
00:01:40 Layer III, 192 kbps, 44100 Hz, joint stereo (MS), no CRC
3829 frames decoded (0:01:40.0), +0.1 dB peak amplitude, 1 clipped samples
sec: 947088648 usec 638646
elapsed: 114.551354
The results show that some of the time is spent on the nfs transfers. I'm not sure if the /dev/null realtime results are good enough that I should start implementing the D/A hardware...
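For comparing runs, the sec/usec pairs printed above can be turned into an elapsed time and a realtime ratio (decode time divided by audio length; below 1.0 means faster than realtime). A small sketch follows. The helper names are made up for illustration, and the 100 s audio length is taken from the -t100 option.

```c
#include <assert.h>

/* Elapsed wall-clock time in seconds between two sec/usec timestamps. */
static double elapsed_seconds(long sec0, long usec0, long sec1, long usec1)
{
    return (double)(sec1 - sec0) + (double)(usec1 - usec0) / 1e6;
}

/* Decode time relative to audio length; < 1.0 is faster than realtime. */
static double realtime_ratio(double elapsed, double audio_seconds)
{
    assert(audio_seconds > 0.0);
    return elapsed / audio_seconds;
}
```

With the numbers from run 2, elapsed_seconds(947085621, 802448, 947085716, 323177) gives 94.520729 s, so the ratio for 100 s of audio is about 0.945. The nfs runs (4 and 5) come out above 1.0.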
/a