On Thu, Apr 01, 2004 at 02:00:44PM +0100, Erik Jälevik wrote:
> What's a good heuristic for determining whether a file is a valid MP3 or not?
> My first thought was to grab a chunk at the beginning of the file and repeatedly call mad_header_decode until it comes back with a header and take that as an indication that the file is valid.
> However, on trying this out on the first 10k of a plain text file, mad_header_decode still returns a header found at 8k but with wrong data in it. Maybe this is due to it just looking for the sync bits and this file happened to have them within its first 10k?
> Is there a better way of doing this? Maybe start trying to decode and if more errors than a certain threshold occur, assume the file is not valid?

I read up to 25000 bytes of data (an arbitrary limit). Unrecoverable errors are treated as fatal; the only errors I handle are MAD_ERROR_LOSTSYNC, MAD_ERROR_BADCRC (which I don't have a test case for), and MAD_ERROR_BUFLEN/MAD_ERROR_BUFPTR.
More specifically, I count the number of bytes read in a given pass; if more than the threshold is read without getting a good packet, I bail. (I also explicitly subtract things like Xing and ID3 headers from this count; ID3v2 headers can easily be larger than that.)
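A rough, self-contained sketch of that counting idea in C, not the StepMania code itself. It does not call libmad: plausible_header() is a hypothetical stand-in for "got a good packet" (real code would want a successful mad_frame_decode there, since a header check alone is exactly what gave the false positive in the question). The names JUNK_THRESHOLD, id3v2_tag_size and looks_like_mp3 are made up for illustration, and only a leading ID3v2 tag is discounted; Xing headers are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed limit, mirroring the arbitrary 25000 bytes mentioned above. */
#define JUNK_THRESHOLD 25000

/* Size of a leading ID3v2 tag (10-byte header + syncsafe body size,
 * plus a 10-byte footer if the ID3v2.4 footer flag is set), or 0 if
 * the buffer does not start with one. */
static size_t id3v2_tag_size(const uint8_t *buf, size_t len)
{
    if (len < 10 || memcmp(buf, "ID3", 3) != 0)
        return 0;
    /* Syncsafe size bytes never have their top bit set. */
    if ((buf[6] | buf[7] | buf[8] | buf[9]) & 0x80)
        return 0;
    size_t body = ((size_t)buf[6] << 21) | ((size_t)buf[7] << 14) |
                  ((size_t)buf[8] << 7)  |  (size_t)buf[9];
    size_t footer = (buf[5] & 0x10) ? 10 : 0;
    return 10 + body + footer;
}

/* Stand-in for "good packet": 11 sync bits plus no reserved field values.
 * This is weaker than actually decoding the frame. */
static int plausible_header(const uint8_t *p)
{
    if (p[0] != 0xFF || (p[1] & 0xE0) != 0xE0) return 0; /* sync bits   */
    if (((p[1] >> 3) & 3) == 1) return 0;  /* reserved MPEG version     */
    if (((p[1] >> 1) & 3) == 0) return 0;  /* reserved layer            */
    if ((p[2] >> 4) == 0xF)     return 0;  /* invalid bitrate index     */
    if (((p[2] >> 2) & 3) == 3) return 0;  /* reserved sample rate      */
    return 1;
}

/* Returns 1 if a plausible frame header shows up before more than
 * JUNK_THRESHOLD bytes of junk have been skipped, else 0.
 * Bytes belonging to a leading ID3v2 tag are not counted as junk. */
int looks_like_mp3(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    size_t start = id3v2_tag_size(buf, len);
    if (start > len)
        return 0; /* tag claims to extend past the data we have */

    size_t junk = 0;
    for (size_t i = start; i + 4 <= len; i++, junk++) {
        if (junk > JUNK_THRESHOLD)
            return 0;
        if (plausible_header(buf + i))
            return 1;
    }
    return 0;
}
```

With this, a file that has a 30000-byte ID3v2 tag followed directly by audio still passes, while 26000 bytes of unknown data in front of the first header does not.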
The primary case where this matters is WAVs with MP3 data in them, and unknown headers. This handles those fine (though why people are putting MP3 data in WAVs is well beyond my comprehension).
My actual code is at
http://cvs.sourceforge.net/viewcvs.py/*checkout*/stepmania/stepmania/src/Rag...
See RageSoundReader_MP3::do_mad_frame_decode.
(Any comments on the way I'm doing this are appreciated. I haven't received any bug reports due to mis-detected files in quite a while.)