On Friday, March 1, 2002, at 12:48 AM, Biswaroop Banerjee wrote:
Can anybody tell me which character set is understood in HFS volumes? For example, in DOS only A-Z, 0-9 and _ are valid characters. So what is it for HFS?
Names on HFS can be up to 31 bytes long (27 bytes for volume names) and can consist of any byte value except the ASCII colon (":"). Note that this means a zero byte *is* valid, which can make things difficult for implementations that use zero-terminated C-style strings.
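As a minimal sketch, assuming names are handled as raw Python `bytes` and that the only rules are the two stated above (a byte-length limit and no colon), a validity check might look like this. The function and constant names are hypothetical, and the sketch deliberately checks nothing else.

```python
HFS_MAX_NAME_BYTES = 31          # limit for file and folder names
HFS_MAX_VOLUME_NAME_BYTES = 27   # limit for volume names

def is_valid_hfs_name(name: bytes, is_volume: bool = False) -> bool:
    """Check only the byte-level rules: a length limit and no ASCII colon."""
    limit = HFS_MAX_VOLUME_NAME_BYTES if is_volume else HFS_MAX_NAME_BYTES
    if len(name) > limit:
        return False
    # Any byte value is allowed except ':' (0x3A); an embedded b"\x00" passes.
    return b":" not in name
```

Because the check works on `bytes` rather than C strings, a name such as `b"a\x00b"` is accepted, which is exactly the case that trips up code built on zero-terminated strings.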
Above I said bytes, not characters. To support localization into many languages, Mac OS supports a variety of character set encodings. Some of those encodings use two bytes to represent a single character, so a file name in such an encoding might hold only 15 characters, occupying 30 bytes.
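To make the bytes-versus-characters point concrete, here is a small sketch that measures the encoded length of a name instead of its character count. It uses Python's `mac_roman` codec for single-byte text and, as an assumption, the `shift_jis` codec as a stand-in for a double-byte Mac encoding (Apple's Japanese encoding is based on Shift_JIS but is not identical to it). The helper names are hypothetical.

```python
def encoded_length(name: str, encoding: str) -> int:
    """Number of bytes `name` occupies on disk in the given encoding."""
    return len(name.encode(encoding))

def fits_hfs_limit(name: str, encoding: str, limit: int = 31) -> bool:
    """HFS limits are in bytes, so measure the encoded form, not len(name)."""
    return encoded_length(name, encoding) <= limit

# 31 single-byte MacRoman characters fill the limit exactly.
print(fits_hfs_limit("a" * 31, "mac_roman"))    # True
# 15 double-byte characters use 30 bytes; 16 would need 32.
print(encoded_length("日" * 15, "shift_jis"))   # 30
print(fits_hfs_limit("日" * 16, "shift_jis"))   # False
```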
Offhand, I don't know if or where the various encodings are described. There may be documentation on Apple's developer web site.
Remember that HFS is case-insensitive. The definition of which characters are "upper case" or "lower case" is based on the MacRoman encoding, which is similar to ISO Latin 1. Take a look at the Darwin sources for code that does a case-insensitive string comparison using MacRoman (it is called as part of the key comparison function for the catalog B-tree).
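The following is only an approximation of the idea, not the table-driven comparison found in the Darwin sources: it decodes both names as MacRoman and compares their lowercase forms. The function name is hypothetical, and real HFS ordering (needed for B-tree keys, not just equality) is not modeled here.

```python
def hfs_names_equal(a: bytes, b: bytes) -> bool:
    """Approximate case-insensitive equality of two MacRoman-encoded names.

    Uses Python's str.lower() after decoding, which is an assumption standing
    in for the HFS case-folding table; it illustrates that case is defined
    over MacRoman characters rather than over raw ASCII bytes.
    """
    return a.decode("mac_roman").lower() == b.decode("mac_roman").lower()

print(hfs_names_equal(b"ReadMe", b"README"))        # True
# 0x80 is "Ä" and 0x8A is "ä" in MacRoman.
print(hfs_names_equal(b"\x80pfel", b"\x8apfel"))    # True
print(hfs_names_equal(b"a", b"b"))                  # False
```

Decoding first matters: a plain ASCII-only `bytes.lower()` would leave MacRoman's accented letters (bytes above 0x7F) unfolded.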
Also, when writing to an HFS volume to create a CD image, can we use Unicode?
I would advise against that. While you can store just about any byte sequence (as long as it doesn't contain an ASCII colon), storing Unicode (e.g., UTF-8 or UTF-16) would produce garbage-looking file names when viewed on a Macintosh.
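A short sketch of why this happens, using Python's `mac_roman` codec to play the part of a Mac that interprets name bytes as MacRoman: the same name encoded as MacRoman round-trips cleanly, while its UTF-8 bytes turn into unrelated characters.

```python
name = "café"

# Encoding the name as MacRoman gives bytes a Mac displays correctly.
mac_bytes = name.encode("mac_roman")
print(mac_bytes)                         # b'caf\x8e'

# Storing UTF-8 bytes instead, then viewing them as MacRoman (as a Mac
# would on HFS), turns the two-byte "é" into two unrelated characters.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.decode("mac_roman"))    # caf√©
```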
HFS volumes contain data in big-endian format. Can anybody tell me which fields have to be filled in big-endian format?
Everything is big-endian. That even includes file names: Macintosh encodings that use two bytes per character store those two bytes in big-endian order on HFS, and the two bytes of each UTF-16 code unit are stored in big-endian order on HFS Plus.
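In Python, the `>` prefix in a `struct` format string selects big-endian byte order, and the `utf-16-be` codec writes each UTF-16 code unit high byte first. The sketch below shows both. The name-packing helper is hypothetical and only modeled on HFS Plus's length-prefixed Unicode names (assumed here to be a 16-bit length in code units followed by the code units); it is not a complete on-disk record and ignores any normalization HFS Plus may require.

```python
import struct

# A 32-bit value such as a folder ID is written most significant byte first.
print(struct.pack(">I", 0x12345678).hex())   # 12345678
print(struct.pack("<I", 0x12345678).hex())   # 78563412 (little-endian, wrong for HFS)

def pack_hfsplus_name(name: str) -> bytes:
    """Length-prefixed UTF-16 name, big-endian throughout (hypothetical helper)."""
    units = name.encode("utf-16-be")          # two bytes per code unit, high byte first
    return struct.pack(">H", len(units) // 2) + units

print(pack_hfsplus_name("Mac").hex())         # 0003004d00610063
```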
-Mark