In the process of searching for a good way to test my UDA1341 audio driver, I just cleaned up splay's fixed-point support a bit and created a ramdisk image with splay on it.
Sorry Rob, but I couldn't resist comparing splay against madplay on an SA1100 CPU because of recent concerns about splay benchmarks, threads, etc.
So here are the results. Since my ramdisk doesn't have the 'time' command, I grabbed a 'top' screen while each player was running in the background.
madplay:
12:27am up 27 min, 1 user, load average: 0.17, 0.04, 0.01
11 processes: 10 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 23.8% user, 0.5% system, 0.0% nice, 75.5% idle
Mem: 28140K av, 13328K used, 14812K free, 0K shrd, 6144K buff
Swap: 0K av, 0K used, 0K free 3752K cached
  PID USER PRI  NI  SIZE   RSS SHARE STAT LIB  %CPU  %MEM   TIME COMMAND
   84 root  16   0  1272  1272  1176 S      0  23.8   4.5   0:11 madplay
   85 root   1   0   872   872   704 R      0   0.5   3.0   0:00 top
    1 root   0   0   512   512   440 S      0   0.0   1.8   0:02 init
    2 root   0   0     0     0     0 SW     0   0.0   0.0   0:00 kswapd
    3 root   0   0     0     0     0 SW     0   0.0   0.0   0:00 kflushd
    4 root   0   0     0     0     0 SW     0   0.0   0.0   0:00 kupdate
   34 root   0   0   500   500   436 S      0   0.0   1.7   0:00 syslogd
   50 root   0   0   568   568   492 S      0   0.0   2.0   0:00 inetd
   54 root   0   0   472   472   400 S      0   0.0   1.6   0:00 getty
   55 root   0   0   472   472   400 S      0   0.0   1.6   0:00 getty
   56 root   0   0   904   904   748 S      0   0.0   3.2   0:00 bash
splay:
12:29am up 29 min, 1 user, load average: 0.12, 0.05, 0.01
13 processes: 11 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 17.6% user, 0.5% system, 0.0% nice, 81.7% idle
Mem: 28140K av, 13672K used, 14468K free, 0K shrd, 6144K buff
Swap: 0K av, 0K used, 0K free 3752K cached
  PID USER PRI  NI  SIZE   RSS SHARE STAT LIB  %CPU  %MEM   TIME COMMAND
   86 root  14   0   904   904   536 R      0  17.0   3.2   0:08 splay
   88 root   0   0   904   904   536 S      0   0.5   3.2   0:00 splay
   89 root   1   0   872   872   704 R      0   0.5   3.0   0:00 top
    1 root   0   0   512   512   440 S      0   0.0   1.8   0:02 init
    2 root   0   0     0     0     0 SW     0   0.0   0.0   0:00 kswapd
    3 root   0   0     0     0     0 SW     0   0.0   0.0   0:00 kflushd
    4 root   0   0     0     0     0 SW     0   0.0   0.0   0:00 kupdate
   34 root   0   0   500   500   436 S      0   0.0   1.7   0:00 syslogd
   50 root   0   0   568   568   492 S      0   0.0   2.0   0:00 inetd
   54 root   0   0   472   472   400 S      0   0.0   1.6   0:00 getty
   55 root   0   0   472   472   400 S      0   0.0   1.6   0:00 getty
   56 root   0   0   904   904   748 S      0   0.0   3.2   0:00 bash
   87 root   0   0   904   904   536 S      0   0.0   3.2   0:00 splay
Even though splay has 3 threads running, together they account for only 17.5% CPU, vs 23.8% CPU for madplay. Both screen snapshots were taken at the same playback time, so the "TIME" comparison should be accurate too.
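Because each splay thread shows up as its own row in `top`, the per-thread %CPU values have to be added up before comparing against madplay's single row. A minimal Python sketch of that sum, assuming whitespace-separated columns in the order of the `top` header (PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND); the row lists are copied from the two snapshots, keeping only the relevant rows, and the helper name is illustrative.

```python
# Column indices in the `top` process table:
# PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
CPU_COL = 9
COMMAND_COL = 12


def total_cpu(rows, command):
    """Sum the %CPU column over every row whose COMMAND equals `command`."""
    total = 0.0
    for row in rows:
        fields = row.split()
        if len(fields) > COMMAND_COL and fields[COMMAND_COL] == command:
            total += float(fields[CPU_COL])
    return total


# Rows copied from the two snapshots (one row per process/thread).
splay_rows = [
    "86 root 14 0 904 904 536 R 0 17.0 3.2 0:08 splay",
    "88 root 0 0 904 904 536 S 0 0.5 3.2 0:00 splay",
    "89 root 1 0 872 872 704 R 0 0.5 3.0 0:00 top",
    "87 root 0 0 904 904 536 S 0 0.0 3.2 0:00 splay",
]
madplay_rows = [
    "84 root 16 0 1272 1272 1176 S 0 23.8 4.5 0:11 madplay",
    "85 root 1 0 872 872 704 R 0 0.5 3.0 0:00 top",
]

print(f"splay:   {total_cpu(splay_rows, 'splay'):.1f}% CPU")
print(f"madplay: {total_cpu(madplay_rows, 'madplay'):.1f}% CPU")
```

Matching on the COMMAND column is what keeps `top`'s own 0.5% out of splay's total.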
The test file used is MPEG-1 Layer 3, joint stereo, 44100Hz, 128kbit/s.
This was tested on Linux 2.3.99-pre6-rmk1-np4. /proc/cpuinfo shows:
Processor : Intel StrongARM-1110 rev 5 (v4l)
BogoMIPS : 194.15
Hardware : Intel-Assabet
The splay version I used is the cleaned-up one. Its playback performance is the same as the old one's; however, it starts right away instead of crunching 100% CPU for a few seconds. This will produce proper results if measured with the 'time' command.
If you are interested, here is where you can find all the relevant files:
splay-0.8.2-fp1.tgz ftp://ftp.netwinder.org/users/n/nico/
ramdisk_img_splay.gz ftp://ftp.netwinder.org/users/n/nico/
mad-0.10.3b.tar.gz ftp://ftp.mars.org/pub/mpeg/
Nicolas
Hi Nicolas,
Your results are interesting.
I tried to reproduce them on my SA-1100 platform, but I continue to get results similar to my previous tests:
empeg:~# time ./splay-0.8.2-fp1 -d - WS010038.A08.mp3 >/dev/null
real 1m40.466s
user 1m37.240s
sys 0m2.450s
empeg:~# time ./madplay -o pcm:- WS010038.A08.mp3 >/dev/null
MPEG Audio Decoder version 0.10.4 (beta) - Copyright (C) 2000 Robert Leslie
WS010038.A08.mp3: 9550 frames decoded (4:09.4)
real 1m9.162s
user 1m6.000s
sys 0m0.980s
Since this file's audio playing time is 4:09, I calculate about 40% and 27% CPU respectively for splay and madplay. (This madplay is a developmental 0.10.4b not 0.10.3b but the results are still similar.)
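The percentages come from dividing the CPU time the decoder consumed (user + sys) by the audio's playing time. A minimal Python sketch of that arithmetic, assuming bash's `time` format (`1m37.240s`) and madplay's `M:SS.s` playing time (`4:09.4`); the helper names are illustrative only.

```python
import re


def duration_seconds(field):
    """Convert a bash `time` field such as '1m37.240s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field)
    if m is None:
        raise ValueError(f"unrecognized duration: {field!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def playing_seconds(mmss):
    """Convert madplay's 'M:SS.s' playing time, e.g. '4:09.4', to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_percent(user, sys_, playing):
    """CPU load needed for real-time playback: (user + sys) / audio duration."""
    busy = duration_seconds(user) + duration_seconds(sys_)
    return 100.0 * busy / playing_seconds(playing)


splay = cpu_percent("1m37.240s", "0m2.450s", "4:09.4")
madplay = cpu_percent("1m6.000s", "0m0.980s", "4:09.4")
print(f"splay {splay:.0f}%, madplay {madplay:.0f}%")
```

Wall-clock "real" time is deliberately left out: with output going to /dev/null, the decoders run flat out, so only the CPU they actually used says anything about real-time load.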
[I note that 40% is an improvement over my previous measurement of 43% for a previous version of splay on this platform :-]
Taking a running snapshot during live audio playback tends to confirm this:
empeg:~# ps u
USER   PID  %CPU  %MEM   VSZ   RSS TTY   STAT START  TIME COMMAND
#0      20   0.0  11.7  2188   856 ttyS1 S    Feb17  0:09 -bash
#0     592  40.6  12.6  1856   924 ttyS1 S    11:15  0:11 ./splay-0.8.2-fp
#0     593   0.0  12.6  1856   924 ttyS1 S    11:15  0:00 ./splay-0.8.2-fp
#0     594   0.6  12.6  1856   924 ttyS1 S    11:15  0:00 ./splay-0.8.2-fp
#0     606   0.0   9.5  2268   696 ttyS1 R    11:16  0:00 ps u
empeg:~# ps u
USER   PID  %CPU  %MEM   VSZ   RSS TTY   STAT START  TIME COMMAND
#0      20   0.0  11.8  2188   868 ttyS1 S    Feb17  0:09 -bash
#0     611  26.8  10.2  7040   748 ttyS1 S    11:17  0:02 ./madplay WS0100
#0     615   0.0   9.4  2268   692 ttyS1 R    11:17  0:00 ps u
For reference:
empeg:~# cat /proc/cpuinfo
Processor : Intel StrongARM-1100 rev 9 (v4l)
BogoMIPS : 208.08
Hardware : SA1100-based
Revision : 0000
Serial : 0000000000000000
empeg:~# uname -a Linux empeg 2.2.14-rmk5-np17-empeg22 #193 Sat Mar 25 18:04:26 GMT 2000 armv4l unknown
Incidentally, your CPU info was:
Processor : Intel StrongARM-1110 rev 5 (v4l)
BogoMIPS : 194.15
Hardware : Intel-Assabet
I'm not very familiar with the differences between the SA-1100 and the SA-1110, but my understanding is that the SA-1110 has at least a faster memory bus.
At any rate, the reported time is definitely affected on the SA-110, and possibly on the SA-1110 too, when splay uses threads, although, curiously, the reporting on my SA-1100 is not significantly affected.
To wit:
[without threads] empeg:~# time ./splay-0.8.2-fp1 -t 0 test.mp3
real 4m24.557s
user 1m41.140s
sys 0m2.650s
[with threads] empeg:~# time ./splay-0.8.2-fp1 test.mp3
real 4m24.727s
user 1m34.870s
sys 0m5.530s
So the total CPU time on the SA-1100 is similar: 104 vs 100 seconds.
Compare that with the SA-110:
[without threads] labrat:~$ time ./splay-0.8.2-fp1 -t 0 test.mp3
real 4m23.827s
user 0m46.620s
sys 0m1.850s
[with threads] labrat:~$ time ./splay-0.8.2-fp1 test.mp3
real 4m23.896s
user 0m0.650s
sys 0m1.540s
That's quite a difference: 48 vs 2 seconds total CPU? Can this be accurate?
labrat:~$ cat /proc/cpuinfo
Processor : Intel sa110 rev 3
BogoMips : 262.14
Hardware : Rebel-NetWinder
Serial # : 1517
Revision : 44ff
labrat:~$ uname -a Linux labrat 2.2.13 #27 Sat Apr 15 01:32:47 CDT 2000 armv4l unknown
If you believe the "with threads" numbers, splay is using less than 1% of the SA-110 CPU. (Incidentally, `top' seems to confirm this.) Without threads, it's more like 18%. For the record, madplay scores about 16% on the same machine with the same input file.
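The four runs can be put side by side by totalling user + sys and dividing by wall-clock time. A minimal Python sketch, assuming bash's `time` output format; the numbers are copied from the runs quoted in this message, and the dictionary layout and helper names are illustrative only.

```python
import re


def duration_seconds(field):
    """Convert a bash `time` field such as '4m24.557s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", field)
    if m is None:
        raise ValueError(f"unrecognized duration: {field!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# (real, user, sys) as printed by `time` for each run in this message.
RUNS = {
    ("SA-1100", "without threads"): ("4m24.557s", "1m41.140s", "0m2.650s"),
    ("SA-1100", "with threads"): ("4m24.727s", "1m34.870s", "0m5.530s"),
    ("SA-110", "without threads"): ("4m23.827s", "0m46.620s", "0m1.850s"),
    ("SA-110", "with threads"): ("4m23.896s", "0m0.650s", "0m1.540s"),
}


def summarize(real, user, sys_):
    """Return (total CPU seconds, CPU share of wall-clock time in percent)."""
    cpu = duration_seconds(user) + duration_seconds(sys_)
    return cpu, 100.0 * cpu / duration_seconds(real)


for (machine, mode), times in RUNS.items():
    cpu, share = summarize(*times)
    print(f"{machine:8} {mode:16} {cpu:6.1f} s CPU  {share:5.1f}% of real time")
```

Since "real" is nearly identical with and without threads on each machine (the file plays at the same speed either way), the whole discrepancy sits in the user + sys time that gets accounted to the process.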
I don't have access to an SA-1110 machine, so I can't compare directly, but I'd be happy to provide a `time' binary if you'd like to take measurements with something better than `top'.
Any idea why the numbers are so different for splay using threads vs splay without threads? Can you compare this on the SA-1110?
In any event, madplay seems to be significantly faster than splay right now *on the SA-1100* regardless of the threads issue. Perhaps someone else could compare on another SA-1100 host?
Cheers,
On Sat, 13 May 2000, Rob Leslie wrote:
> Hi Nicolas,
> Your results are interesting.
> I tried to reproduce them on my SA-1100 platform, but I continue to get results similar to my previous tests:
> empeg:~# time ./splay-0.8.2-fp1 -d - WS010038.A08.mp3 >/dev/null
> real 1m40.466s
> user 1m37.240s
> sys 0m2.450s
> empeg:~# time ./madplay -o pcm:- WS010038.A08.mp3 >/dev/null
> MPEG Audio Decoder version 0.10.4 (beta) - Copyright (C) 2000 Robert Leslie
> WS010038.A08.mp3: 9550 frames decoded (4:09.4)
> real 1m9.162s
> user 1m6.000s
> sys 0m0.980s
> Since this file's audio playing time is 4:09, I calculate about 40% and 27% CPU respectively for splay and madplay. (This madplay is a developmental 0.10.4b not 0.10.3b but the results are still similar.)
Did you read README.ARM in splay's archive? For best performance you need to add some compiler flags by hand. Also, you could use the splay binary included in the ramdisk image I mentioned in my last mail.
Or send me both of your binaries and I'll try them on my system.
Nicolas
> Did you read README.ARM in splay's archive? For best performance you need to add some compiler flags by hand. Also, you could use the splay binary included in the ramdisk image I mentioned in my last mail.
Yes, I followed the README.ARM instructions, and I also tried using the binary from your ramdisk image.
> Or send me both of your binaries and I'll try them on my system.
Your binaries are probably fine, but do you want me to send you a `time' program? (If so, please follow up privately.) You could try timing the pure decoding time like I did by dumping the PCM output to /dev/null.
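For a system without `time`, the same three numbers can be gathered from the kernel's per-child resource accounting. A minimal Python sketch, offered only as an illustration (the ramdisk discussed here would need a compiled equivalent); it relies on `getrusage(RUSAGE_CHILDREN)`, which only covers children that have terminated and been waited for.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (real, user, sys) in seconds.

    user/sys come from RUSAGE_CHILDREN, i.e. only from children (and their
    descendants) that have terminated and been waited for.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True)  # blocks until the child exits
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (real,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__" and len(sys.argv) > 1:
    real, user, system = time_command(sys.argv[1:])
    print(f"real {real:.3f}s user {user:.3f}s sys {system:.3f}s",
          file=sys.stderr)
```

Hypothetical usage for timing the pure decode, with the PCM output dumped to /dev/null through a shell (the script name `rtime.py` is made up): `python3 rtime.py sh -c 'splay -d - test.mp3 > /dev/null'`.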
I'm also curious how the threads issue is manifested on the SA-1110.
Cheers, -rob
On Sun, 14 May 2000, Rob Leslie wrote:
> Your binaries are probably fine, but do you want me to send you a `time' program? (If so, please follow up privately.) You could try timing the pure decoding time like I did by dumping the PCM output to /dev/null.
OK, so I just did that on SA1100 hardware, for which I don't currently have audio output support:
[root@thinclient /]# date; splay -d - test.mp3 | cat > /dev/null; date
Mon May 15 21:43:44 EDT 2000
Mon May 15 21:44:11 EDT 2000
[root@thinclient /]# date; madplay -o raw:/dev/null test.mp3; date
Mon May 15 21:50:42 EDT 2000
MPEG Audio Decoder version 0.10.3 (beta) - Copyright (C) 2000 Robert Leslie
test.mp3: 1959 frames decoded (0:51.1)
Mon May 15 21:51:02 EDT 2000
[root@thinclient /]# cat /proc/cpuinfo
Processor : Intel StrongARM-1100 rev 9 (v4l)
BogoMIPS : 124.52
Hardware : ADS ThinClient
[root@thinclient /dev]# uname -s -r Linux 2.3.99-pre8-rmk1-np1
So we got:
splay: 27 sec 52.8 %CPU if real-time
madplay: 20 sec 39.1 %CPU if real-time
The same tests with the same binaries on SA1110 hardware gave me:
[root@Linux /]$date; splay -d - test.mp3 | cat > /dev/null; date
Thu Jan 1 00:17:03 UTC 1970
Thu Jan 1 00:17:27 UTC 1970
[root@Linux /]$date; madplay -o raw:/dev/null test.mp3; date
Thu Jan 1 00:22:16 UTC 1970
MPEG Audio Decoder version 0.10.3 (beta) - Copyright (C) 2000 Robert Leslie
test.mp3: 1959 frames decoded (0:51.1)
Thu Jan 1 00:22:27 UTC 1970
[root@Linux /]$cat /proc/cpuinfo
Processor : Intel StrongARM-1110 rev 5 (v4l)
BogoMIPS : 194.15
Hardware : Intel-Assabet
[root@Linux /]$uname -s -r Linux 2.3.99-pre8-rmk1-np1
So we get:
splay: 24 sec 47.0 %CPU if real-time
madplay: 11 sec 21.5 %CPU if real-time
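The "%CPU if real-time" figures are the wall-clock decode time between the two `date` stamps divided by the 51.1 s of audio. A minimal Python sketch of that arithmetic, assuming both stamps fall on the same day and the decoder had the CPU to itself; since `date` only has one-second resolution, each elapsed figure is good to roughly one second. Helper names are illustrative only.

```python
import re


def seconds_of_day(date_line):
    """Extract HH:MM:SS from a `date` line such as 'Mon May 15 21:43:44 EDT 2000'."""
    m = re.search(r"\b(\d{2}):(\d{2}):(\d{2})\b", date_line)
    if m is None:
        raise ValueError(f"no HH:MM:SS in {date_line!r}")
    hours, minutes, seconds = map(int, m.groups())
    return hours * 3600 + minutes * 60 + seconds


def realtime_load(start_line, end_line, audio_seconds):
    """Return (elapsed wall-clock seconds, implied %CPU for real-time playback).

    Assumes both stamps are on the same day (no midnight wrap).
    """
    elapsed = seconds_of_day(end_line) - seconds_of_day(start_line)
    return elapsed, 100.0 * elapsed / audio_seconds


AUDIO = 51.1  # "test.mp3: 1959 frames decoded (0:51.1)"

for name, start, end in [
    ("splay (SA1100)", "Mon May 15 21:43:44 EDT 2000", "Mon May 15 21:44:11 EDT 2000"),
    ("madplay (SA1100)", "Mon May 15 21:50:42 EDT 2000", "Mon May 15 21:51:02 EDT 2000"),
    ("splay (SA1110)", "Thu Jan 1 00:17:03 UTC 1970", "Thu Jan 1 00:17:27 UTC 1970"),
    ("madplay (SA1110)", "Thu Jan 1 00:22:16 UTC 1970", "Thu Jan 1 00:22:27 UTC 1970"),
]:
    elapsed, load = realtime_load(start, end, AUDIO)
    print(f"{name:17} {elapsed:3d} sec {load:5.1f} %CPU if real-time")
```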
So, judging by this, splay is much worse than madplay... and even more so on the SA1110, which is pretty weird.
However, here is what 'time' produces when both players are actually playing:
[root@Linux /]$time splay test.mp3
8.81user 0.25system 0:50.44elapsed 17%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (150major+131minor)pagefaults 0swaps
[root@Linux /]$time /tmp/madplay test.mp3
MPEG Audio Decoder version 0.10.3 (beta) - Copyright (C) 2000 Robert Leslie
test.mp3: 1959 frames decoded (0:51.1)
11.37user 0.31system 0:50.39elapsed 23%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (311major+52minor)pagefaults 0swaps
Here splay is better than madplay, and these 'time' results are also consistent with what I get from 'top' statistics.
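The %CPU in these summaries (the same layout as GNU time's default output) is (user + system) / elapsed. A minimal Python sketch that recomputes it from the two lines, assuming an elapsed field of the form M:SS.ss (runs under an hour); the helper name is illustrative only.

```python
import re


def time_cpu_percent(summary):
    """Recompute %CPU from a 'NNuser NNsystem M:SS.SSelapsed' summary line.

    Assumes the elapsed field is M:SS.ss, i.e. the run took under an hour.
    """
    m = re.search(r"([\d.]+)user ([\d.]+)system (\d+):([\d.]+)elapsed", summary)
    if m is None:
        raise ValueError("unrecognized time summary")
    user, system = float(m.group(1)), float(m.group(2))
    elapsed = int(m.group(3)) * 60 + float(m.group(4))
    return 100.0 * (user + system) / elapsed


splay = time_cpu_percent("8.81user 0.25system 0:50.44elapsed 17%CPU")
madplay = time_cpu_percent("11.37user 0.31system 0:50.39elapsed 23%CPU")
print(f"splay {splay:.2f}%  madplay {madplay:.2f}%")
```

This gives about 17.96% and 23.18%; dropping the fractional part reproduces the 17% and 23% that were printed, so the summary lines are at least internally consistent.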
If I do a
cat /dev/zero > /dev/null &
I can actually see with 'top' that splay uses approximately 18% CPU and the 'cat' process takes the remaining 80%. When madplay uses its 23% CPU, cat gets 75%.
It's true that splay uses a different object when writing to a file instead of to the audio device, but looking at the code, it shouldn't make such a big difference...
I don't know what to think.
Nicolas