Hello listers, The technical notes for HFS+ TN1150 refers to 'B-Trees' and the documentation for HFS in 'Inside Macintosh - Data Organisation on volumes' refers to 'B*- Trees'
Going by the definitions provided by NIST (the national institute of standards) a B-Tree is defined as Definition: A balanced search tree in which every node has between t-1 and 2t-1 children, where t is an arbitrary constant. This is a good structure if much of the tree is in slow memory (disk), since the height, and hence the number of accesses, can be kept small, say one or two, by picking a large t.
and a B*-Tree is defined as : Definition: A B-tree in which nodes are kept 2/3 full by redistributing keys to fill two child nodes, then splitting them into three nodes.
As far as I can gather from the definitions and the documentation, HFS uses a B*-Tree and HFS+ uses a B-Tree.... does anybody know why ? Does this mean that in case of HFS I would have to perform an additional redistribution of keys to keep the nodes just 2/3 full ( which I am not considering right now ) ? The HFS-Specs as such do not mention anything about keeping the nodes just 2/3 full ...or am I wrong there ?
Would be glad to hear your comments/feedback ...
Regards,
Nandini Hengen
Entwicklung, This is completely academic for what you're trying to do as you are only going to write to the tree once, so you don't have to maintain it in a balanced state. Just get your data, sort it and write it. Every node you make should be set exactly how you want it. I reccoment using the maximum 8 tree depths, then evenly filling everything under that with your data. You may find just filling each node is better, I'm not sure.
Remember the redistruction you talk about is purely a toss up between searchability and writability. That is the whole point in btrees
Simon
On Sun, 24 Mar 2002, Entwicklung wrote:
Hello listers, The technical notes for HFS+ TN1150 refers to 'B-Trees' and the documentation for HFS in 'Inside Macintosh - Data Organisation on volumes' refers to 'B*- Trees'
Going by the definitions provided by NIST (the national institute of standards) a B-Tree is defined as Definition: A balanced search tree in which every node has between t-1 and 2t-1 children, where t is an arbitrary constant. This is a good structure if much of the tree is in slow memory (disk), since the height, and hence the number of accesses, can be kept small, say one or two, by picking a large t.
and a B*-Tree is defined as : Definition: A B-tree in which nodes are kept 2/3 full by redistributing keys to fill two child nodes, then splitting them into three nodes.
As far as I can gather from the definitions and the documentation, HFS uses a B*-Tree and HFS+ uses a B-Tree.... does anybody know why ? Does this mean that in case of HFS I would have to perform an additional redistribution of keys to keep the nodes just 2/3 full ( which I am not considering right now ) ? The HFS-Specs as such do not mention anything about keeping the nodes just 2/3 full ...or am I wrong there ?
Would be glad to hear your comments/feedback ...
Regards,
Nandini Hengen
On Sunday, March 24, 2002, at 10:59 AM, Entwicklung wrote:
As far as I can gather from the definitions and the documentation, HFS uses a B*-Tree and HFS+ uses a B-Tree.... does anybody know why ? Does this mean that in case of HFS I would have to perform an additional redistribution of keys to keep the nodes just 2/3 full ( which I am not considering right now ) ? The HFS-Specs as such do not mention anything about keeping the nodes just 2/3 full ...or am I wrong there ?
It's just an imprecise use of terminology. Both HFS and HFS Plus can use the same implementation for B-trees (and they do in Mac OS 8, 9 and Mac OS X).
Actually, I think "B+-Tree" is more accurate. A B+-Tree is really just a B-tree where all of the data is kept in leaf nodes (and interior nodes contain only keys and pointers). The B-trees in HFS and HFS Plus do have that characteristic.
The difference between B-tree and B*-Tree is a difference in how full the nodes are kept in the face of inserts and deletes. Apple's code splits an overly full node into two nodes (not splitting two siblings into three like a B*-tree). However, Apple's code does have an extra optimization that tends to keep nodes fuller than a plain B-tree. When a node is too full to insert a new record, it first tries to "rotate" one or more records to a sibling. I know it tries rotating to the left; I can't remember whether it tries rotating to the right. If rotation is not possible, the node is split into two.
-Mark
Thanks a ton! - that answers my question...
Regards, N.Hengen
----- Original Message ----- From: "Mark Day" mday@apple.com To: "Entwicklung" entwicklung@whengenibk.de Cc: hfs-user@lists.mars.org; studentdev@lists.apple.com Sent: Monday, March 25, 2002 6:52 PM Subject: Re: [hfs-user] Clarification needed wrt B/B*-Trees
On Sunday, March 24, 2002, at 10:59 AM, Entwicklung wrote:
As far as I can gather from the definitions and the documentation, HFS uses a B*-Tree and HFS+ uses a B-Tree.... does anybody know why ? Does this mean that in case of HFS I would have to perform an additional redistribution of keys to keep the nodes just 2/3 full ( which I am not considering right now ) ? The HFS-Specs as such do not mention anything about keeping the nodes just 2/3 full ...or am I wrong there ?
It's just an imprecise use of terminology. Both HFS and HFS Plus can use the same implementation for B-trees (and they do in Mac OS 8, 9 and Mac OS X).
Actually, I think "B+-Tree" is more accurate. A B+-Tree is really just a B-tree where all of the data is kept in leaf nodes (and interior nodes contain only keys and pointers). The B-trees in HFS and HFS Plus do have that characteristic.
The difference between B-tree and B*-Tree is a difference in how full the nodes are kept in the face of inserts and deletes. Apple's code splits an overly full node into two nodes (not splitting two siblings into three like a B*-tree). However, Apple's code does have an extra optimization that tends to keep nodes fuller than a plain B-tree. When a node is too full to insert a new record, it first tries to "rotate" one or more records to a sibling. I know it tries rotating to the left; I can't remember whether it tries rotating to the right. If rotation is not possible, the node is split into two.
-Mark