I ported my x86 speedup diff to libmad 0.15.0b. It is a trivial diff that touches only the x86 assembly, replacing shrd with two faster instructions. shrd is slow as hell on today's x86 implementations because of its rarely needed semantics regarding the CPU flags.
On my Athlon, Pentium 3 and VIA C3 it is about 30% faster.
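
For anyone wondering why shrd can be replaced at all: it shifts a 64-bit hi:lo register pair right and keeps the low 32 bits, and the same value can be built from two ordinary 32-bit shifts merged together. Below is a rough C sketch of that equivalence. It is not the diff itself (the diff changes inline assembly, and the exact instruction pair is not shown here). The function names are made up for illustration, and the shift count of 28 assumes libmad's default fixed-point format with 28 fractional bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 28 fractional bits, as in libmad's default fixed-point format. */
#define SCALEBITS 28

/* Reference: the low 32 bits of (hi:lo) >> SCALEBITS, which is the value
 * shrd leaves in its destination register. Hypothetical helper name. */
static uint32_t scale_shrd_like(uint32_t hi, uint32_t lo)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> SCALEBITS);
}

/* Same value from two independent 32-bit shifts merged with OR.
 * The two shifted halves never overlap, so OR and ADD give the same result. */
static uint32_t scale_split(uint32_t hi, uint32_t lo)
{
    return (lo >> SCALEBITS) | (hi << (32 - SCALEBITS));
}

/* Compare both versions over a few edge-case values for hi and lo. */
static void check_equivalence(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x0fffffffu, 0x10000000u, 0x89abcdefu, 0xffffffffu
    };
    size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(scale_split(samples[i], samples[j]) ==
                   scale_shrd_like(samples[i], samples[j]));
}
```

Calling check_equivalence() from a small main() confirms that both versions agree on the sample values. Whether the split form is actually faster depends on the CPU and on what the compiler or hand-written assembly does with it.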