Hi All,
I am writing an application on Mac to list the files deleted from the system (even from the trash). I have written code to read the catalog file node by node, but I have no idea how deleted files are represented or how I can access them. Any guesses?
Any input on this would be of great help.
Thanks in advance.
Best Regards Rashmi
Hi Rashmi, It's been a while since I've looked at the HFS specs (in case you don't already have it, I'd recommend getting "Inside Macintosh: Files" by Apple, Addison-Wesley (ISBN 0-201-63244-6)). The only bit that I could find relating to deletion or the Trash is page 2-37, "Deleting Files and File Forks", which gives the function prototypes to call to delete a file and explains that all the system does when you delete is "delete both forks of a file by removing the catalog entry for the file and adjusting the volume information block and volume bitmap accordingly". It also points out that "These functions do not actually erase the disk areas occupied", so you might be able to salvage the file. Anyway, answering your question: as each file (or directory) on disk is essentially a node ID with a pointer to some Catalog Directory Data Record telling you where it is, I'd guess that files that are in the trash simply point to some symbolic leaf file to indicate that status. I'd assume you could look at a specific Data Record for a file before and after you put it in the trash and see what changes. It could be that trash status is one of the Finder information flags, as otherwise I'm not sure how the original directory location would be preserved (see pages 2-71 through 2-74 if you have no idea what I'm talking about).
If the catalog record is actually deleted, I'm not sure what the system does. I'd guess the system doesn't bother to wipe a disused catalog record completely, and just wipes some of it. My first guess would be that it sets the cdrType value to 0 (cdrType is an enum in which only values 1-4 mean anything). If that is the case, you'd be able to find out which blocks of the disk actually contain the original file and track it. However, as is always the case with deleted files, you should check each block in the allocation table as you follow it. The deletion process says it zeroes the relevant bits in the table, so it's safe to say that if a bit isn't zero, the block has been re-used already. The catalog record contains the ExtDataRec for the data and resource forks, so if the above is true, you're in luck, and you have the first 3 extents of the data and resource forks. As for the Extents Overflow File, I'm not sure how it handles deletions. I can't see any obvious way for it to mark a record as deleted without actually wiping it, but the book does imply similarities with the type of node used in the main catalog, which does. Under FAT each block of an actual file contains a reference somewhere to the next block. I'm not sure whether such a thing exists anywhere in HFS, but failing that I can't see how to recreate a deleted file in the absence of its catalog and extents records.
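For illustration, here is a minimal Python sketch of decoding an HFS ExtDataRec, assuming you already have its 12 raw bytes in hand. It relies on the layout of three extent descriptors, each a pair of big-endian 16-bit values (first allocation block, number of allocation blocks). The function name `parse_ext_data_rec` is hypothetical, and treating a zero block count as "unused descriptor" is an assumption of this sketch.

```python
import struct

def parse_ext_data_rec(raw):
    """Decode a 12-byte HFS extent record into (start_block, block_count) pairs."""
    if len(raw) != 12:
        raise ValueError("an HFS ExtDataRec is expected to be 12 bytes")
    # Three descriptors, each two big-endian unsigned 16-bit integers.
    fields = struct.unpack(">6H", raw)
    extents = []
    for i in range(0, 6, 2):
        start, count = fields[i], fields[i + 1]
        if count == 0:
            # Assumption: a zero-length descriptor marks the end of the used extents.
            break
        extents.append((start, count))
    return extents
```

A record whose descriptors are (10, 4), (50, 2), (0, 0) would decode to two extents. Anything beyond the third extent lives in the Extents Overflow File and is not covered here.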
Hope this helps, Simon
P.S. There is some comparison done in the HFS+ revision document between that and HFS (in particular it gives some detailed pictures relating to the catalog and extent leaf formats) that might give you further insight, only I can't lay my hands on it now, so I can't tell you if it is useful.
On Feb 8, 2006, at 3:53 AM, Rashmi M wrote:
I am writing an application on Mac to list the files deleted from the system (even from the trash). I have written code to read the catalog file node by node, but I have no idea how deleted files are represented or how I can access them. Any guesses?
First of all, "the trash" is just a separate directory on the disk. Putting something in the trash merely moves the file or directory into the trash directory. The file or directory doesn't actually get deleted until the user "empties" the trash. The name and location of the trash directory have changed in various versions of Mac OS. In Mac OS 9, the directory is named "Trash" and is in the volume's root directory; by convention, it has its "invisible" bit set in the Finder Info. In Mac OS X, there is a directory named ".Trashes" in the volume's root directory; inside there are directories whose names are numeric: a user ID for each user who has a trash directory (they're created on demand).
When a file or directory is actually deleted, its record(s) are removed from the Catalog B-tree. And if it had overflow extents (more than 3 extents for HFS, or more than 8 extents for HFS Plus) then the overflow extent records are removed from the Extents B-tree. In Mac OS X 10.4.0 and later, a file or directory can have extended attributes stored in the Attributes B-tree; records for the deleted item would be removed from the Attributes B-tree as well. The space occupied by a file's forks is freed by clearing the corresponding bits in the allocation bitmap.
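As a rough illustration of that last point, the following Python sketch tests whether every allocation block in an extent is currently marked free. It assumes the bitmap has already been read into a `bytes` object and that, within each byte, the most significant bit describes the lowest-numbered block (the convention documented for HFS Plus). The function names are hypothetical.

```python
def block_is_allocated(bitmap, block):
    """Return True if the bit for this allocation block is set in the bitmap."""
    byte = bitmap[block // 8]
    mask = 0x80 >> (block % 8)  # assumption: MSB of each byte is the lowest block
    return bool(byte & mask)

def extent_still_free(bitmap, start, count):
    """Return True only if none of the blocks in [start, start + count) is allocated."""
    return not any(block_is_allocated(bitmap, b) for b in range(start, start + count))
```

A free bit only means the block is not in use now. It does not prove the block was never reallocated and freed again since the file of interest was deleted.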
Trying to recover deleted files is problematic. Many file systems, such as UFS, EXT, or FAT, will simply mark a directory entry as "deleted" by overwriting a small number of bytes; you may be able to restore those bytes to a non-deleted state and find some or all of the original file's information. It's generally not that easy with HFS or HFS Plus.
In the B-trees, there are typically several records in a single node. They're essentially an array of records. If you delete a record in the middle of the node, the records that follow it get shuffled up to overwrite the original record, usually leaving no remnants of the original record. If the record being deleted is the last one in the node, it can be deleted by merely decrementing the number of records, in which case it might be possible to recover the original record. But with Mac OS X, we found that some non-Apple disk repair utilities were too aggressive in trying to recover valid-looking records in the unused portion of the node, so we began explicitly overwriting the newly freed space with zeroes. So, if Mac OS X deleted a file, the original record will always be overwritten with other data (either other records, or zeroes).
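The effect can be seen in a toy model. The Python sketch below is not the real HFS node layout (it omits the node descriptor and the on-disk offset table). It assumes records are packed from the start of a `bytearray`, with a list of offsets whose last entry marks the start of free space. The name `delete_record` and the `zero_fill` flag are hypothetical, the flag standing in for the zeroing behaviour described above.

```python
def delete_record(node, offsets, index, zero_fill=True):
    """Remove record `index` from a toy node; return the new offsets list.

    offsets has one entry per record plus a final entry for the start of free space.
    """
    start, end = offsets[index], offsets[index + 1]
    size = end - start
    free_start = offsets[-1]
    # Shuffle the following records up over the deleted one.
    node[start:free_start - size] = node[end:free_start]
    if zero_fill:
        # Overwrite the newly freed tail so no stale record bytes remain.
        node[free_start - size:free_start] = bytes(size)
    return offsets[:index + 1] + [o - size for o in offsets[index + 2:]]

# Three 3-byte records followed by 3 bytes of free space.
node = bytearray(b"AAABBBCCC" + bytes(3))
offsets = delete_record(node, [0, 3, 6, 9], 1)
```

After deleting the middle record, the node holds only "AAA" and "CCC" followed by zeroes. With `zero_fill=False`, deleting the last record leaves its bytes intact in the free area, which is the one case where the original record might be recoverable.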
So perhaps your best bet at recovery is if you can recognize the content of a file. You could scan the volume's free allocation blocks looking for recognizable content. But beware that a file may not have been stored contiguously on the media. And the content may have been moved over time (especially with Mac OS X's adaptive hot file clustering), so you may see valid-looking content that is actually from an older version of the file.
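A very small Python sketch of such a scan follows. It assumes the volume contents and allocation bitmap are already in memory, that allocation block 0 starts at `first_block_offset` within the image, and that the bitmap is MSB-first within each byte. The function name and parameters are hypothetical, the three signatures are just well-known examples, and only the start of each free block is checked.

```python
# Leading bytes of a few common file formats.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
}

def find_candidates(image, block_size, bitmap, total_blocks, first_block_offset=0):
    """Return (block_number, kind) for free blocks that begin with a known signature."""
    hits = []
    for block in range(total_blocks):
        # Skip blocks the bitmap marks as in use (assumption: MSB-first bit order).
        if bitmap[block // 8] & (0x80 >> (block % 8)):
            continue
        offset = first_block_offset + block * block_size
        head = image[offset:offset + 8]
        for magic, kind in SIGNATURES.items():
            if head.startswith(magic):
                hits.append((block, kind))
    return hits
```

This only finds where a file may begin. Because of the fragmentation and relocation caveats above, the blocks that follow a hit are not guaranteed to belong to the same file or to its latest version.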
-Mark
Hello,
My question is not strictly about HFS/HFS+, but about the GPT (GUID Partition Table), the new partition table used by Apple for MacIntel computers. The table in the MBR (Master Boot Record) contains 4 entries. One of them should be the EFI partition (about 200 MB, with the EE code). If there is a Windows FAT32 or NTFS partition (as set up by the new boot manager, Boot Camp), one more entry is taken (with the 07 code).
That leaves two entries. With normal tools, there should be only one Macintosh partition (with the AF code). Do you think that my tool should be coded defensively, that is, should it anticipate that some tool, some day, will allow the user to have two different Macintosh partitions?
Thanks in advance for your ideas.
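One defensive approach, sketched in Python under the assumption that the first 512-byte sector has already been read, is to collect every entry of type AF instead of stopping at the first. The sketch uses the classic MBR layout: four 16-byte entries starting at offset 446, the type byte at offset 4 of each entry, little-endian start LBA and sector count at offsets 8 and 12, and the 55 AA signature at offset 510. The function names are hypothetical.

```python
import struct

def read_mbr_entries(sector):
    """Return the non-empty partition entries found in an MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for slot in range(4):
        base = 446 + 16 * slot
        ptype = sector[base + 4]
        start_lba, sectors = struct.unpack_from("<II", sector, base + 8)
        if ptype != 0:  # type 0 means the slot is unused
            entries.append({"slot": slot, "type": ptype,
                            "start_lba": start_lba, "sectors": sectors})
    return entries

def mac_partitions(entries):
    """Return every entry with type 0xAF, however many there are."""
    return [e for e in entries if e["type"] == 0xAF]
```

Handling a list of zero, one, or several results costs little and avoids baking in the one-partition assumption. Note that this reads only the MBR entries. On a GPT disk the full partition list is in the GPT itself, which this sketch does not parse.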