On Jun 6, 2006, at 4:46 PM, Kurt Roeckx wrote:
On Mon, Jun 05, 2006 at 09:06:00AM +0000, Stefan Huelswitt wrote:
I have a question regarding handling of ID3 tags which are non-latin1 encoded, e.g. iso-8859-5.
I think I can get ucs4 strings from libid3tag and convert them with iconv to whatever codeset was originally used.
What you really should do is convert things to wchar_t, and then use functions that work with them (like wprintf()) to output things. This should be much more portable.
I find it unfortunate that libid3tag does everything with ucs4 instead of wchar_t's.
Unfortunately wchar_t is not a very portable way of supporting Unicode. Its size is compiler-dependent, and may not be large enough to represent all of the ISO/IEC 10646 code points. For example, ISO/IEC 9899:1999 (C99) allows wchar_t to be as small as 8 bits; even 16 bits is not large enough. The type is also locale-dependent.
I'm not sure what the best way to convert it from ucs4 to wchar_t is, though, and I can't think of a portable way; I'd love it if someone could point out how to do it.
(On Linux (glibc), casting it from a id3_ucs4_t to a wchar_t seems to work, but it's not portable.)
There's nothing wrong with this per se as long as wchar_t is at least 21 bits wide (for example, when __STDC_ISO_10646__ is defined) or you are sure the UCS-4 code point in question can otherwise be represented by a wchar_t -- and the relevant locale is compatible. Note that casting id3_ucs4_t to wchar_t is not the same as casting (id3_ucs4_t *) to (wchar_t *); the latter is definitely not portable though it may work if the underlying types happen to have the same size.
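The two conditions above can be probed from code. The following is only a minimal sketch: the function names are hypothetical and not part of libid3tag, and it tests the width of wchar_t and the presence of __STDC_ISO_10646__, not whether the current locale is compatible.

```c
#include <stdint.h>
#include <wchar.h>

/* Returns 1 if wchar_t can hold every code point up to U+10FFFF,
   which needs at least 21 value bits; returns 0 otherwise. */
int wchar_holds_all_code_points(void)
{
  return (uintmax_t) WCHAR_MAX >= 0x10FFFFUL;
}

/* Returns 1 if the implementation promises that wchar_t values are
   ISO/IEC 10646 code points; returns 0 if it makes no such promise. */
int wchar_is_iso_10646(void)
{
#ifdef __STDC_ISO_10646__
  return 1;
#else
  return 0;
#endif
}
```

If both functions return 1, casting an individual id3_ucs4_t value to wchar_t should be safe. If only the first does, the cast preserves the number but not necessarily its meaning as a character.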
A simple way to translate a UCS-4 string to wchar_t might be (untested):
wchar_t *ucs4_to_wchar(id3_ucs4_t const *ucs4)
{
  wchar_t *wchar;

  wchar = malloc(id3_ucs4_size(ucs4) * sizeof(wchar_t));
  if (wchar) {
    wchar_t *ptr = wchar;

    while ((*ptr++ = (wchar_t) *ucs4++))
      ;
  }

  return wchar;
}
This assumes __STDC_ISO_10646__ is defined, or wchar_t is otherwise compatible with ISO/IEC 10646 character codes.
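Where that assumption cannot be taken for granted, a more defensive variant could refuse any code point that does not fit in a wchar_t. This is a sketch only: ucs4_t is a hypothetical stand-in for id3_ucs4_t so that it compiles without libid3tag, the function name is made up, and the length is counted by hand rather than with id3_ucs4_size(). It checks the numeric range only and cannot tell whether the locale's wchar_t encoding is actually ISO/IEC 10646.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for id3_ucs4_t (assumed to be an unsigned
   integer type of at least 32 bits). */
typedef unsigned long ucs4_t;

/* Copies a zero-terminated UCS-4 string into newly allocated wchar_t
   storage. Returns NULL if allocation fails or if any code point is
   above U+10FFFF or too large for wchar_t. The caller frees the result. */
wchar_t *ucs4_to_wchar_checked(ucs4_t const *ucs4)
{
  size_t len = 0, i;
  wchar_t *wchar;

  /* Count the characters before the terminating zero. */
  while (ucs4[len] != 0)
    len++;

  wchar = malloc((len + 1) * sizeof *wchar);
  if (wchar == NULL)
    return NULL;

  for (i = 0; i < len; i++) {
    if (ucs4[i] > 0x10FFFFUL ||
        (uintmax_t) ucs4[i] > (uintmax_t) WCHAR_MAX) {
      free(wchar);
      return NULL;
    }
    /* Convert by value, one element at a time; no pointer cast. */
    wchar[i] = (wchar_t) ucs4[i];
  }
  wchar[len] = L'\0';

  return wchar;
}
```

Failing on the first unrepresentable character is one possible policy; substituting a replacement character would be another.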