On Jun 5, 2006, at 2:06 AM, Stefan Huelswitt wrote:
I have a question regarding handling of ID3 tags which are non-Latin-1 encoded, e.g. ISO-8859-5.
I think I can get UCS-4 strings from libid3tag and convert them with iconv to whatever codeset was originally used.
How can I get information about the original codeset?
In ID3v2, the original text encoding is one of ISO-8859-1 (i.e. Latin-1), UTF-16, or UTF-8. You can find the encoding in the text encoding field of the tag frame containing the string; usually it is the first field.
ID3v1 has no text encoding specification, so it is impossible to know what encoding was used; libid3tag assumes it is Latin-1 and translates this to UCS-4.
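For the ID3v2 case, here is a minimal sketch in Python of how the encoding byte at the start of a text frame body selects the decoder. It does not use libid3tag; the function name and the frame bodies are made up for illustration, and the byte values (0 = ISO-8859-1, 1 = UTF-16 with BOM, 2 = UTF-16BE, 3 = UTF-8) are as I understand the ID3v2.4 frame layout, with the last two being v2.4 additions.

```python
# Hypothetical helper, not part of libid3tag: decode the body of an
# ID3v2 text frame whose first byte is the text encoding indicator.
ID3V2_ENCODINGS = {
    0x00: "latin-1",    # ISO-8859-1
    0x01: "utf-16",     # UTF-16 with byte order mark
    0x02: "utf-16-be",  # UTF-16 big-endian, no BOM (ID3v2.4)
    0x03: "utf-8",      # UTF-8 (ID3v2.4)
}


def decode_text_frame_body(body: bytes) -> str:
    """Decode a text frame body: one encoding byte, then the text."""
    if not body:
        raise ValueError("empty frame body")
    codec = ID3V2_ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError(f"unknown text encoding byte: {body[0]:#04x}")
    # Drop any trailing NUL terminator after decoding.
    return body[1:].decode(codec).rstrip("\x00")


if __name__ == "__main__":
    # Made-up frame body: encoding byte 0x00 followed by Latin-1 text.
    print(decode_text_frame_body(b"\x00caf\xe9"))
```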
Is there another designated way to handle these cases?
If you happen to know the encoding of an ID3v1 tag is something other than Latin-1, you can translate it from UCS-4 back to Latin-1, then re-interpret it however you like.
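To illustrate the round trip, here is a rough sketch in Python rather than against the libid3tag API. It assumes the tag text was really ISO-8859-5 but was read as Latin-1; the function name and the sample string are invented for the example. Because Latin-1 maps each byte to the code point of the same value, encoding the mis-decoded string back to Latin-1 recovers the original bytes exactly, and those can then be decoded with the codeset you know to be correct.

```python
def reinterpret_latin1(text: str, actual_encoding: str) -> str:
    """Undo a wrong Latin-1 decode and decode again with the real codeset.

    `text` stands in for the string obtained from the tag (code points
    0..255 only); `actual_encoding` is the codeset you know was used.
    """
    raw = text.encode("latin-1")        # back to the original tag bytes
    return raw.decode(actual_encoding)  # re-interpret those bytes


if __name__ == "__main__":
    # Simulate an ID3v1 title stored as ISO-8859-5 but decoded as Latin-1.
    stored_bytes = "Привет".encode("iso8859_5")
    misdecoded = stored_bytes.decode("latin-1")
    print(reinterpret_latin1(misdecoded, "iso8859_5"))
```

In C the same idea would be to convert the UCS-4 string back to Latin-1 bytes and hand those to iconv with the real codeset as the source encoding.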