Hi everybody, wishing you all a very happy Holi (an Indian festival celebrated with colours).
Well, in the MDB structure for an HFS volume the field vol.drXTClpSiz /* clump size for extents overflow file */ is 4 bytes long, but in the Catalog Data Record structure the member filClpSize /* file clump size */ takes only 2 bytes.
Therefore, when I assign the value of the first field to the second I lose information. One way around this is to make sure the MDB's field only ever holds a value that fits in 2 bytes, so that the assignment loses nothing. But the value of the MDB's field is calculated as 1/128th of the total volume size, an empirical formula used in the hfsutils package. Any comments on this? Also, is there a simple formula for finding the extents overflow file size and the catalog file size for a volume when we know beforehand how many files it will contain? For example, if I know I have to write "X" files contained in "Y" directories, can I calculate what the volume's clump sizes for the extents overflow file and the catalog file should be?
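To make the truncation concrete, here is a minimal sketch (the 1 GB volume size and the rounding to 512-byte blocks are assumptions for illustration, and whether the 1/128 heuristic is applied to bytes or to allocation blocks should be checked against the hfsutils source):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical 1 GB volume; names follow the question above,
           not any particular header file. */
        uint32_t totalVolumeBytes = 1024UL * 1024 * 1024;

        /* hfsutils-style heuristic: clump size is roughly 1/128 of the
           volume size (rounded here to a 512-byte boundary). */
        uint32_t drXTClpSiz = (uint32_t)((totalVolumeBytes / 128) & ~511UL);

        /* Assigning the 32-bit MDB value to the 16-bit catalog field
           silently truncates anything above 0xFFFF. */
        uint16_t filClpSize = (uint16_t)drXTClpSiz;

        printf("drXTClpSiz = %lu bytes\n", (unsigned long)drXTClpSiz);
        printf("filClpSize = %u bytes (information lost: %s)\n",
               filClpSize, drXTClpSiz > 0xFFFF ? "yes" : "no");
        return 0;
    }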
Waiting for your explanations.
Bye Biswaroop
Each Day gives us an Opportunity to Ruin it, those who Fail, Succeed in Life. -- Bisban
Firstly, check out the code for the HFS file system on Linux. There are a lot of helpful little tricks for implementing this sort of thing in C. In particular, there is a directive which, when added to the end of a struct, makes it pack as you'd expect instead of doing strange things like those you're describing. Working out the assumptions is much easier when you can see how someone else has done it.
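For what it's worth, here is a minimal sketch of that packing trick, assuming GCC (the attribute goes after the closing brace of the struct; the fields shown are just a small fragment of the MDB chosen for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* Without the attribute the compiler may insert padding between
       members, so sizeof() would not match the on-disk layout. */
    struct mdb_fragment {
        uint16_t drSigWord;   /* volume signature */
        uint32_t drCrDate;    /* creation date */
        uint16_t drAtrb;      /* volume attributes */
    } __attribute__((packed));

    int main(void)
    {
        /* Packed: prints 8, the byte-for-byte on-disk size.  Without
           the attribute most compilers would print 12. */
        printf("sizeof(struct mdb_fragment) = %zu\n",
               sizeof(struct mdb_fragment));
        return 0;
    }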
As for working out the catalog file size from the number of files: you can't, exactly. The problem is that the eventual catalog size depends on the number of nodes in the file, and the number of nodes depends on how many records you can fit in each node. That in turn depends on the record size within the node, which varies for some record types with the length of the filename. There are 4 types of record in a catalog file; I believe only 2 vary in this way, but it may be all 4, I'm not sure. Extents files, however, don't vary like this, so you can work them out: just check how big the key structure is in an extents file, then work out the packing density of that structure in a node. Remember, though, that nodes are 512 bytes (I think) in HFS but 4K in HFS+ (this is because the maximum name length changed in HFS+, so records could be up to 4K big). For the catalog I suspect you're best off just creating all the keys, combining them together and seeing what you get. I'm sure you could work it out, but it'd be easier just to make a massive file and shrink it to the used size when you've finished adding stuff.
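A rough sketch of that packing-density calculation for the HFS extents B-tree, assuming the usual HFS sizes (8-byte key, 12-byte extent data record, 512-byte nodes, 14-byte node descriptor, 2-byte offset-table entry per record); treat all of the constants as assumptions to verify against the spec or the hfsutils headers:

    #include <stdio.h>

    int main(void)
    {
        /* Assumed HFS constants -- check them before relying on them. */
        const int nodeSize       = 512;  /* HFS node size (4096 on HFS+) */
        const int nodeDescriptor = 14;   /* ndFLink, ndBLink, ndType, ... */
        const int extKeySize     = 8;    /* keyLen, forkType, fileID, startBlock */
        const int extDataSize    = 12;   /* three (startBlock, blockCount) pairs */
        const int offsetEntry    = 2;    /* per-record slot in the offset table */

        int perRecord   = extKeySize + extDataSize + offsetEntry;
        int usable      = nodeSize - nodeDescriptor;
        int recsPerNode = usable / perRecord;

        /* With, say, 1000 overflow extent records you would need this
           many leaf nodes (plus the header node and any index nodes). */
        int records   = 1000;
        int leafNodes = (records + recsPerNode - 1) / recsPerNode;

        printf("%d extent records per leaf node, %d leaf nodes for %d records\n",
               recsPerNode, leafNodes, records);
        return 0;
    }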
The size of the catalog file and the extents file (allocated size and nodes used) is stored in the MDB, in case that was what you meant.
Simon
"Biswaroop(External)" wrote:
HI, Everybody Wishing u all a very Happy Holi.( An Indian festival played with colours) Well in the MDB structure for an HFS volume the field vol.drXTClpSiz /* clump size for extents overflow file */ is 4 bytes long. Again in the Catalog Data Record structure the member filClpSize; /* file clump size */ takes 2 bytes. Therefore when i assign the value of the first variable to the second I lose information. But then one way is to make the MDB's variable contain a value only for 2 bytes then assignment won't make a data loss. But the value for the MDB's variable was calculated as 1/128 th part of the Total volume size. An emphirical formula used in the hfsutils package. Any comments on this??? Please is there any simple formula to find out the extent file size and the catalog file size for a volume when we know before hand how many files have to be in that volume. For eg. if i know i have to write "X" files contained in "Y" number of directories. Then can i calculate what should be the volume's clump size for the extents overflow file and the catalog file. Waiting for ur explanations. Bye Biswaroop Each Day gives us an Opprutunity to Ruin it, those who Fail, Succeed in Life. -- Bisban
On Friday, March 29, 2002, at 03:40 AM, Biswaroop(External) wrote:
In the MDB structure for an HFS volume the field vol.drXTClpSiz /* clump size for extents overflow file */ is 4 bytes long, but in the Catalog Data Record structure the member filClpSize /* file clump size */ takes only 2 bytes. Therefore, when I assign the value of the first field to the second I lose information.
I'm not sure why you're copying from one to the other. The drXTClpSiz is the clump size for the extents B-tree only. Since the B-tree is used in a very different way from typical user files, I don't see a reason to try and set an ordinary file's clump size to be the same as one of the B-trees.
I believe Apple's code sets the clump size in a catalog record to zero; I think you can do the same. It turns out that having different clump sizes for different files wasn't very useful. If an application really wanted to make sure that a file was allocated in large contiguous pieces, it was generally better to try and pre-allocate it in one giant contiguous piece (or when allocating additional space, make the entire allocation contiguous). At runtime, Apple's code just uses a volume-wide default for ordinary files (i.e. ones with a catalog record).
Please, is there a simple formula for finding the extents file size and the catalog file size for a volume when we know beforehand how many files have to be in that volume? For example, if I know I have to write "X" files contained in "Y" directories, can I calculate what the volume's clump size for the extents overflow file and the catalog file should be?
Certainly no simple formula for the catalog B-tree. That is partly because the size of the catalog is determined by the lengths of the file and directory names (even more so on HFS Plus, where the keys in index nodes are variable length), and partly because, for volumes that are modified over time, the order of operations affects the size of the B-tree in complex ways. I'm sure you could come up with a statistical guess based on average name lengths, average density of nodes (i.e. how "full" they are), etc.
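As a very rough illustration of that kind of statistical guess, the sketch below assumes every object contributes one catalog record plus one thread record; the average record size, node fill factor and counts are made-up numbers to be tuned, not values from any spec:

    #include <stdio.h>

    int main(void)
    {
        /* X files in Y directories, with assumed averages. */
        long   files          = 10000, dirs = 500;
        double avgRecordBytes = 70.0;  /* guessed average leaf record size */
        double nodeFill       = 0.7;   /* assumed node "fullness" */
        int    nodeSize       = 512;   /* HFS; 4096 for HFS+ */

        /* Each object has a file/directory record plus a thread record. */
        long records = 2 * (files + dirs);

        double leafBytes = records * avgRecordBytes;
        double leafNodes = leafBytes / (nodeSize * nodeFill);

        /* Ignore index nodes and the header node for a first-order guess;
           they add a few percent on top. */
        printf("~%.0f leaf nodes, ~%.0f KB of catalog leaf data\n",
               leafNodes, leafNodes * nodeSize / 1024.0);
        return 0;
    }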
Your particular case of creating a CD is actually a much simpler problem, and you can compute an exact answer if you want. Since the files won't be modified over time, you can guarantee that they will not be fragmented. That means you can get by with a minimal extents B-tree containing no leaf records. That means a single allocation block (for the header node; the other nodes are unused and should be filled with zeroes).
Since you know the complete set of files and directories in advance, you can build an optimal tree by packing as many leaf records into a node as possible and then moving to the next node. All it requires is knowing the order in which you will assign directory IDs to directories, and being able to sort the file and directory names for the items in a single directory. That way you can predict the entire leaf sequence. Once you know the number of leaf nodes, you can calculate the number of index nodes that will be parents of the leaf nodes, and so on up the tree until you get to a level containing exactly one node (the root). This should be relatively easy for HFS because the records in index nodes are constant size, so the calculation for each level is just a simple divide and round up. For HFS Plus, you would have to keep track of the actual file or directory names, since the length of the keys in index nodes varies with the name lengths.
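That divide-and-round-up step might look like the following sketch for HFS; the leaf-node count and the number of records per index node are assumed figures for illustration, not values taken from the spec:

    #include <stdio.h>

    int main(void)
    {
        /* Suppose the leaf sequence has already been laid out. */
        long nodesAtLevel     = 1200;  /* leaf nodes, known exactly */
        int  recsPerIndexNode = 11;    /* assumed fixed size for HFS index records */
        long totalNodes       = nodesAtLevel;

        /* Each index record points at one child node, so each level is a
           ceiling division of the level below it, until one root remains. */
        while (nodesAtLevel > 1) {
            nodesAtLevel = (nodesAtLevel + recsPerIndexNode - 1) / recsPerIndexNode;
            totalNodes  += nodesAtLevel;
        }

        /* Add one for the B-tree header node. */
        printf("catalog B-tree needs %ld nodes in total\n", totalNodes + 1);
        return 0;
    }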
If that's too complicated, you could always fall back to assuming a constant size (maximum or average) for all of the records. Don't forget that for thread records, the key is of fixed size but the data is variable (since it contains a variable-length string).
-Mark
Certainly no simple formula for the catalog B-tree. That is partly because the size of the catalog is determined by the lengths of the file and directory names (even more so on HFS Plus, where the keys in index nodes are variable length), and partly because, for volumes that are modified over time, the order of operations affects the size of the B-tree in complex ways. I'm sure you could come up with a statistical guess based on average name lengths, average density of nodes (i.e. how "full" they are), etc. .... If that's too complicated, you could always fall back to assuming a constant size (maximum or average) for all of the records. Don't forget that for thread records, the key is of fixed size but the data is variable (since it contains a variable-length string).
That's what I did for the mkhybrid code - I didn't want the catalog "file" to grow, so by trial and error I set the initial size of the catalog file to twice the "default" used by the libhfs routines in hfsutils. This seems to work OK in virtually all cases (if it isn't big enough, the mkhybrid code uses a brute-force approach and 're-creates' the HFS side of the file system with a new default size twice as big again ...).
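In pseudocode terms, that retry logic amounts to something like the sketch below; the function names and the size threshold are hypothetical stand-ins, not the actual mkhybrid or libhfs API:

    #include <stdbool.h>
    #include <stdio.h>

    /* Stubbed-out stand-ins for the real work.  Here the "build"
       succeeds only once the catalog reaches 256 KB, just to show the
       doubling retry in action. */
    static long default_catalog_size(void) { return 64 * 1024; }
    static bool build_hfs_side(long catalog_size) { return catalog_size >= 256 * 1024; }

    int main(void)
    {
        /* Start at twice the libhfs default ... */
        long size = 2 * default_catalog_size();

        /* ... and if that isn't big enough, re-create the HFS side with
           a size twice as big again, repeating until it fits. */
        while (!build_hfs_side(size)) {
            printf("catalog of %ld bytes too small, doubling\n", size);
            size *= 2;
        }

        printf("built HFS side with a %ld-byte catalog\n", size);
        return 0;
    }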
James Pearson