
Macintosh File Systems

 

Your Macintosh computer needs fast and easy access to a tremendous amount of information in order to function and perform its various tasks. In particular, it must be able to access the system files, application programs, and other data as you work. This data may be stored on a variety of physical devices, including hard drives, floppy drives, CD-ROMs, memory cards, etc. The Macintosh File Systems were developed to provide a consistent interface to these various physical devices. To fully understand and appreciate how your Macintosh works and some of the things that may go wrong with it, you need some understanding of the Macintosh File Systems. Although this is a complex and somewhat technical subject, a general overview of the file systems will make you a more informed Macintosh user. In particular, the importance of preventative maintenance and backups should become apparent. You will also gain a better understanding of what is going on with your computer and drives when problems do develop.

The two most common Macintosh File Systems are the original Hierarchical File System Standard (HFS Standard or HFS) and the newer Hierarchical File System Extended (HFS Extended, HFS Plus, or HFS+). The original HFS file system was developed in the days of the original 400K floppy disk. At that time a 20 MB hard drive was considered a huge storage device. The HFS Extended format was developed primarily to make more efficient use of storage space on large drives. It is now the format most commonly used on the Macintosh and the format that we will focus on here. However, before delving into HFS Extended we will need to introduce a few basic concepts.

To allow computers to work in a consistent way with a variety of physical devices, a number of abstractions have been developed. Perhaps the most basic is the bit. A bit is the smallest unit of information that can be accessed by a computer and can be represented as a 0 or a 1. It may be stored in different devices in different ways. For example, a bit on a hard disk drive is stored as a tiny magnetized region of the platter, whereas on a CD-ROM it is encoded in the pattern of pits pressed into the disc. Within the computer circuitry a bit may simply be a voltage level or a pulse of electricity. Larger chunks of information are the byte and the word: a byte is eight bits, and a word is two bytes. All information in the computer is encoded in the form of bits, bytes, and words.

The data storage device has been abstracted as a logical device called a volume. The computer recognizes a volume as one "device". It may actually be a floppy disk, one partition on a hard drive, a CD-ROM, etc. Note that one physical device, such as a single hard drive that has been partitioned, can be seen as multiple volumes by the file system. Data is stored on volumes in the form of files. A file is simply a named collection of bits. It may contain representations of user data, system data, programs, or even the structures used to keep track of where other files are stored. Macintosh files are currently split into two parts called forks: the data fork and the resource fork. Either fork may contain no data.

Both HFS and HFS Extended are specifications for how data, and the information necessary to retrieve that data, are stored on volumes. Volumes are divided into 512-byte logical blocks called sectors. The sector is an abstraction of the physical "sector" of a disk platter; a typical hard drive platter is divided into sectors of 512 bytes. Sectors are numbered starting at 0 and continuing to the last one on the volume. Space on a volume is allocated in units of consecutive sectors called allocation blocks. The size of the allocation block is set when the volume is initialized. The most common allocation block size is 4 KB (8 sectors). There can be at most 2^32 allocation blocks on a volume. The file system attempts to provide allocation blocks for a file in a fixed-size group called a clump. A larger clump size tends to decrease file fragmentation, but can leave wasted space at the end of the file. Finally, a run of contiguous allocation blocks that stores part of a file on a volume is called an extent of the file.
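The arithmetic in this paragraph can be made concrete with a small Python sketch. The function names are hypothetical, and the clump calculation uses a deliberately simplified model in which a file keeps every clump it is given.

```python
SECTOR_SIZE = 512  # bytes per logical sector


def sectors_per_block(block_size: int) -> int:
    """Number of 512-byte sectors in one allocation block."""
    if block_size <= 0 or block_size % SECTOR_SIZE:
        raise ValueError("allocation block size must be a positive multiple of 512")
    return block_size // SECTOR_SIZE


def block_of_sector(sector: int, block_size: int) -> int:
    """Allocation block that contains a given sector (both counted from 0)."""
    return sector // sectors_per_block(block_size)


def max_volume_bytes(block_size: int) -> int:
    """Upper bound on volume size when there can be at most 2**32 blocks."""
    return 2**32 * block_size


def reserved_bytes(file_size: int, block_size: int, clump_size: int) -> int:
    """Space held by a file that is grown one whole clump at a time.

    Simplified model (an assumption): the file keeps every clump it was given.
    """
    if clump_size % block_size:
        raise ValueError("clump size must be a whole number of allocation blocks")
    clumps = -(-file_size // clump_size)  # ceiling division
    return clumps * clump_size
```

With 4 KB allocation blocks, sector 17 falls in allocation block 2, the 2^32-block limit works out to 2^44 bytes (16 TiB), and a 10,000-byte file grown in 16 KB clumps holds 16,384 bytes instead of the 12,288 bytes of three allocation blocks.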

The first block on a physical disk contains the driver descriptor map, which holds information about the number and location of device drivers on the disk. The second block begins the disk's partition map. It specifies the start, length, and type of each partition (volume). The partition type may be HFS+, A/UX, MS-DOS, etc. The partition map is itself a partition and holds an entry for itself. The device driver (if present) is typically located after the partition map. Finally, the partitions themselves typically fill the remainder of the disk space.
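As an illustration, the sketch below walks an Apple Partition Map in a disk image held in memory. It assumes 512-byte device blocks and the classic on-disk layout: big-endian fields, an 'ER' signature at the start of block 0, and one 512-byte 'PM' entry per block starting at block 1, with the partition's first block at byte 8, its length at byte 12, and 32-byte name and type strings at bytes 16 and 48. On disk, HFS and HFS Extended partitions both use the type string Apple_HFS. The helper names and the demo image are hypothetical.

```python
import struct

BLOCK_SIZE = 512  # assumed device block size


def _cstr(raw: bytes) -> str:
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\0", 1)[0].decode("ascii")


def parse_partition_map(image: bytes) -> list[tuple[str, str, int, int]]:
    """Return (name, type, start_block, block_count) for every map entry."""
    (dd_sig,) = struct.unpack_from(">H", image, 0)
    if dd_sig != 0x4552:  # 'ER': driver descriptor map in block 0
        raise ValueError("block 0 does not hold a driver descriptor map")
    # Every entry repeats the total number of map entries.
    _, _, map_entries = struct.unpack_from(">HHI", image, BLOCK_SIZE)
    entries = []
    for i in range(map_entries):
        off = BLOCK_SIZE * (1 + i)  # the map starts in block 1
        sig, _, _, start, count = struct.unpack_from(">HHIII", image, off)
        if sig != 0x504D:  # 'PM'
            raise ValueError(f"block {1 + i} is not a partition map entry")
        name = _cstr(image[off + 16:off + 48])
        ptype = _cstr(image[off + 48:off + 80])
        entries.append((name, ptype, start, count))
    return entries


def make_demo_image() -> bytes:
    """Build a tiny hypothetical disk image with a two-entry partition map."""
    def entry(start: int, count: int, name: str, ptype: str) -> bytes:
        raw = struct.pack(">HHIII", 0x504D, 0, 2, start, count)
        raw += name.encode().ljust(32, b"\0") + ptype.encode().ljust(32, b"\0")
        return raw.ljust(BLOCK_SIZE, b"\0")

    block0 = struct.pack(">HHI", 0x4552, BLOCK_SIZE, 100).ljust(BLOCK_SIZE, b"\0")
    return (block0
            + entry(1, 63, "Apple", "Apple_partition_map")
            + entry(64, 36, "Macintosh HD", "Apple_HFS"))
```

Calling parse_partition_map(make_demo_image()) yields the map's own entry, ("Apple", "Apple_partition_map", 1, 63), followed by ("Macintosh HD", "Apple_HFS", 64, 36), showing that the partition map lists itself.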

A number of data structures work together to keep track of data on HFS Extended volumes. These include the following:

  • Volume Header
  • Catalog File
  • Extents File
  • Attributes File
  • Allocation File
  • Startup File

These structures will be described in more detail below. They each consist of one or more allocation blocks.

Volume Header

The HFS Extended Volume Header contains critical information about the volume as a whole. It corresponds to the Master Directory Block (MDB) of an HFS volume. A partial list of the information stored in the Volume Header includes:

  • location and size of the other volume structure components
  • total number of folders and files on the volume
  • size of the allocation blocks in bytes
  • total number of allocation blocks on the volume
  • next free allocation block
  • default clump size for data and resource forks
  • next unused catalog ID number
  • date/time of the volume's creation and last modification
  • text encodings used in file and folder names
  • whether the volume is write-protected
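To show where these items live, here is a sketch that decodes some of them from a 512-byte HFS Extended volume header. The field order and offsets follow Apple's Technical Note TN1150: an 'H+' signature ('HX' on case-sensitive HFSX volumes), dates counted in seconds from midnight, January 1, 1904, lock flags in bits 7 (hardware) and 15 (software) of the attributes word, and the Catalog file's fork data beginning at byte 272. The function and dictionary key names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

HFS_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)
# signature, version, 17 UInt32 fields, encodingsBitmap: the first 80 bytes.
_FIXED = struct.Struct(">2sH17IQ")
CATALOG_FORK_OFFSET = 272  # HFSPlusForkData for the Catalog file
LOCK_BITS = (1 << 7) | (1 << 15)  # hardware lock, software lock


def parse_volume_header(hdr: bytes) -> dict:
    """Decode selected fields of a 512-byte HFS Plus volume header."""
    f = _FIXED.unpack_from(hdr, 0)
    if f[0] not in (b"H+", b"HX"):
        raise ValueError(f"unexpected signature {f[0]!r}")
    # Catalog fork data: logicalSize, clumpSize, totalBlocks, then 8 extents
    # stored as (startBlock, blockCount) pairs.
    cat_bytes, _, _ = struct.unpack_from(">QII", hdr, CATALOG_FORK_OFFSET)
    raw = struct.unpack_from(">16I", hdr, CATALOG_FORK_OFFSET + 16)
    extents = [(raw[i], raw[i + 1]) for i in range(0, 16, 2) if raw[i + 1]]
    return {
        "signature": f[0].decode("ascii"),
        "locked": bool(f[2] & LOCK_BITS),
        "created_raw": f[5],  # seconds since 1904; TN1150 says local time
        "modified": HFS_EPOCH + timedelta(seconds=f[6]),
        "file_count": f[9],
        "folder_count": f[10],
        "block_size": f[11],
        "total_blocks": f[12],
        "free_blocks": f[13],
        "next_allocation": f[14],
        "rsrc_clump_size": f[15],
        "data_clump_size": f[16],
        "next_catalog_id": f[17],
        "encodings_bitmap": f[19],
        "catalog_bytes": cat_bytes,
        "catalog_extents": extents,
    }
```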

The Volume Header is always located 1024 bytes from the start of the volume, which is the volume's third sector (sector 2, counting from 0). Note that this may not be the actual third physical sector on a physical disk, because the volume may begin partway into the disk. Because the data in the Volume Header is so important, a copy of it is kept in the second-to-last sector of the volume. This is called the Alternate Volume Header. It is one of the few pieces of data on a volume that may not reside in an allocation block; this can happen when the second-to-last sector falls outside the last allocation block. Disk utilities such as TechTool Pro may use the Alternate Volume Header if the main Volume Header is damaged.
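The fallback a repair utility might perform can be sketched as follows, assuming the TN1150 layout in which the primary header begins 1024 bytes into the volume and the alternate header begins 1024 bytes before the volume's end. Checking only the two-byte signature is a deliberately minimal validity test; real repair tools examine far more. The function names are hypothetical.

```python
HEADER_SIZE = 512
VALID_SIGNATURES = (b"H+", b"HX")


def header_offsets(volume_size: int) -> tuple[int, int]:
    """Byte offsets of the primary and alternate volume headers."""
    # Primary: 1024 bytes from the start (sector 2). Alternate: 1024 bytes
    # before the end (the second-to-last 512-byte sector).
    return 1024, volume_size - 1024


def read_best_header(volume: bytes) -> tuple[str, bytes]:
    """Prefer the primary header; fall back to the alternate copy."""
    primary, alternate = header_offsets(len(volume))
    for label, off in (("primary", primary), ("alternate", alternate)):
        hdr = volume[off:off + HEADER_SIZE]
        if hdr[:2] in VALID_SIGNATURES:
            return label, hdr
    raise ValueError("neither volume header has a recognizable signature")
```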

The Volume Header may become corrupt if the computer quits unexpectedly and the Volume Header has not been properly updated. This could also happen if a bad block were to develop in the Volume Header. If both the Volume Header and the Alternate Volume Header are incorrect, this can pose a challenge for repair utilities. Such damage may not be repairable. If the corruption to the Volume Header is severe enough, it may not even be possible to access data on the drive using standard software.

B-Trees

The Catalog file, Extents file, and Attributes file all make use of a data structure called a B-tree (balanced tree) to store their information. A B-tree is a data structure designed specifically for fast retrieval of information. Using B-trees in the volume structures allows the file system to locate data in a reasonable length of time, even on a volume containing hundreds of thousands of files.

A B-tree file contains a series of nodes. Each node contains records. A record contains a key used to identify the record and also some data. The keys are unique and ordered so that the particular key for an individual record can be located via a search. The data may include pointers (links) to other nodes as well as other data associated with that particular key.

The nodes give the B-tree its structure and come in four types:

  • header node (the entry point into the tree)
  • map node (holds allocation data if the map record in the header gets full)
  • index node (holds pointer records)
  • leaf node (holds the data associated with a key)

A node has the following structure:

[Figure: structure of a B-tree node]

The node descriptor indicates the type of node, the number of records it contains, where it belongs in the tree, and contains links to previous or next nodes.
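A node descriptor can be decoded with a few lines of Python. This sketch assumes the HFS Extended layout described in TN1150: a 14-byte descriptor (forward link, backward link, a signed kind byte, height, record count, and two reserved bytes), kind values of -1 for leaf, 0 for index, 1 for header, and 2 for map nodes, and 16-bit record offsets stored backwards from the end of the node, with one extra offset marking the start of free space. The function name is hypothetical.

```python
import struct

NODE_KINDS = {-1: "leaf", 0: "index", 1: "header", 2: "map"}
# fLink, bLink, kind (signed), height, numRecords, reserved: 14 bytes.
_DESCRIPTOR = struct.Struct(">IIbBHH")


def parse_node(node: bytes) -> dict:
    """Split a B-tree node into its descriptor fields and raw records."""
    f_link, b_link, kind, height, num_records, _ = _DESCRIPTOR.unpack_from(node, 0)
    end = len(node)
    # Offsets are read backwards from the end of the node: the last two bytes
    # locate record 0. One extra offset marks where free space begins.
    offsets = [struct.unpack_from(">H", node, end - 2 * (i + 1))[0]
               for i in range(num_records + 1)]
    records = [node[offsets[i]:offsets[i + 1]] for i in range(num_records)]
    return {
        "kind": NODE_KINDS.get(kind, f"unknown ({kind})"),
        "height": height,
        "next_node": f_link,
        "prev_node": b_link,
        "records": records,
    }
```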

A simple B-tree is illustrated below:

[Figure: a simple B-tree]

In the example above, at most three nodes need to be searched to find the record associated with any key.
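The search described above can be sketched with an in-memory stand-in for the on-disk nodes. The Node class and the demo tree are hypothetical; the sketch assumes, as in HFS Extended, that each index record's key equals the first key in the child node it points to, so the search follows the last index record whose key is not greater than the search key.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # index node: one child per key
    values: list = field(default_factory=list)    # leaf node: one value per key

    @property
    def is_leaf(self) -> bool:
        return not self.children


def search(root: Node, key):
    """Return (value, nodes_visited); value is None if the key is absent."""
    node, visited = root, 1
    while not node.is_leaf:
        # Follow the last index record whose key is <= the search key.
        i = bisect_right(node.keys, key) - 1
        if i < 0:
            return None, visited  # smaller than every key in the tree
        node = node.children[i]
        visited += 1
    i = bisect_right(node.keys, key) - 1
    if i >= 0 and node.keys[i] == key:
        return node.values[i], visited
    return None, visited


def demo_tree() -> Node:
    """Hypothetical three-level tree: root -> 2 index nodes -> 4 leaves."""
    leaves = [Node([1, 3], values=["a", "c"]), Node([5, 7], values=["e", "g"]),
              Node([9, 11], values=["i", "k"]), Node([13, 15], values=["m", "o"])]
    left = Node([1, 5], children=leaves[:2])
    right = Node([9, 13], children=leaves[2:])
    return Node([1, 9], children=[left, right])
```

For the three-level demo tree, search(demo_tree(), 11) returns ("k", 3): the root, one index node, and one leaf are the only nodes read. A missing key such as 4 also costs only three node reads before the search gives up.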

Damage to a B-tree may occur in the key field, pointer field, or data field. If damage occurs in a key field, a record or a whole sub-tree may become impossible to find. If damage occurs in the data field of an index node (that is, in a pointer), the sub-tree it points to could be orphaned. Finally, if damage occurs in the data field of a leaf node, the actual data for that key would be invalid. The type of damage to the file system will depend on whether the B-tree is holding the Catalog, Extents, or Attributes data and also on which type of node is damaged.

Catalog File

One of the most important volume structures is the Catalog file. The Catalog file keeps track of the hierarchy of files and folders on a volume. The location of the Catalog file's first extent (in fact, of its first eight extents) is stored in the Volume Header. This means that the way to reach the Catalog header node, the entry point into the Catalog, is recorded in the Volume Header. If the Volume Header is damaged, the Catalog header node may not be found, and it may not even be possible to locate the Catalog file.

Each file and folder in the Catalog file is assigned a unique identifier called the Catalog Node ID or CNID. For a file this is called the File ID and for a folder the Folder ID. For each file or folder, the Parent ID is the CNID of the folder containing that item. Some important reserved CNIDs follow:

  • 1–parent ID of the root folder
  • 2–CNID of the root folder
  • 3–CNID of the Extents file
  • 4–CNID of the Catalog file itself
  • 5–CNID of the bad block file (a special file described below)
  • 6–CNID of the Allocation file
  • 7–CNID of the Startup file
  • 8–CNID of the Attributes file

Every B-tree record must contain a key in order for the file system to be able to traverse the tree and locate that record. In the Catalog B-tree there are two possibilities for the key:

  • for a file or folder record the key contains the CNID of the parent folder and the name of the file or folder
  • for a thread record (a link) the key contains the CNID of the file or folder itself and no name
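The sketch below encodes these two kinds of Catalog key. The layout (a 16-bit key length that excludes itself, a 32-bit parent ID, and a length-prefixed name stored as UTF-16 big-endian) follows TN1150; the function names are hypothetical, and str.casefold() only approximates the case-insensitive Unicode comparison that HFS Extended actually uses.

```python
import struct


def catalog_key(parent_id: int, name: str) -> bytes:
    """Encode a catalog key: keyLength, parentID, then a length-prefixed
    UTF-16 big-endian name (keyLength excludes its own two bytes)."""
    units = name.encode("utf-16-be")
    body = struct.pack(">IH", parent_id, len(units) // 2) + units
    return struct.pack(">H", len(body)) + body


def thread_key(cnid: int) -> bytes:
    """A thread record's key: the item's own CNID and an empty name."""
    return catalog_key(cnid, "")


def sort_order(parent_id: int, name: str) -> tuple[int, str]:
    """Approximate key order: parent ID first, then name.

    casefold() is a stand-in for HFS Plus's own case-insensitive
    comparison; it is not the same algorithm.
    """
    return parent_id, name.casefold()
```

Because keys sort by parent ID first, all the items in one folder sit next to each other in the leaf nodes, and a folder's own thread record, keyed by its CNID with an empty name, sorts immediately ahead of them.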

There are four types of Catalog leaf records:

  • folder record–contains information about a particular folder
  • file record–contains information about a particular file
  • folder thread record–links a folder to its parent folder
  • file thread record–links a file to its parent folder
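Thread records are what let the file system work upward from a CNID to the item's full path: look up the thread record for the CNID, read the parent ID and name it contains, and repeat until reaching the root folder (CNID 2). In the sketch below, a plain dictionary is a hypothetical stand-in for the Catalog B-tree, and the loop check shows one way damaged parent links could be detected.

```python
ROOT_FOLDER_ID = 2  # reserved CNID of the root folder


def path_of(cnid: int, threads: dict[int, tuple[int, str]]) -> str:
    """Rebuild an item's path by following thread records up to the root.

    `threads` maps CNID -> (parent CNID, name), a hypothetical in-memory
    stand-in for the thread records in the Catalog B-tree.
    """
    parts, seen = [], set()
    while cnid != ROOT_FOLDER_ID:
        if cnid in seen:
            raise ValueError(f"loop in parent chain at CNID {cnid}")
        seen.add(cnid)
        if cnid not in threads:
            raise KeyError(f"no thread record for CNID {cnid}")
        parent, name = threads[cnid]
        parts.append(name)
        cnid = parent
    return "/" + "/".join(reversed(parts))
```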

Some of the more important information stored in the Catalog folder record includes the CNID of the folder, the number of files and folders in the folder, the creation and modification dates, the backup date, and the folder's permissions.

Information stored in the Catalog file record includes the CNID of the file, the creation and modification dates, the backup date, whether the file is locked, the location of the first eight extents of each fork, and the file's permissions.

A file or folder's CNID (through its thread record), or its parent's CNID together with its name, allows the information for that item to be located easily in the Catalog B-tree. Corruption in the Catalog file can cause loss of the file or folder information contained in the Catalog records as well as incorrect placement of files and folders in the folder hierarchy. For example, if you were to suddenly find some of your files scattered about at the root level of the hard drive instead of in their correct folders, this could indicate damage to the Catalog file.

Extents File

When a file is saved, the file system assigns space on the volume to hold the file. This space consists of one or more allocation blocks. Each set of contiguous allocation blocks is called an extent. The file record of each file in the Catalog file holds the locations of the first eight extents of each fork in the file. The locations of any additional (or overflow) extents that make up a file's forks are maintained by the Extents file (also called the Extents overflow file).

The Extents file is stored as a B-tree. A record key in the Extents B-tree includes the CNID of the file, the type of fork (data or resource), and the offset, in allocation blocks, of the extent's start within the fork. Each extent location is represented as a pair of numbers: the first allocation block of the extent and the number of allocation blocks in the extent. This information is stored in the Extents file data record and allows a file fork's actual data to be located on the volume.
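Given a fork's extents in order (the first eight from the Catalog file record, then any overflow extents from the Extents file), locating a byte of the fork is a simple walk through the (start block, block count) pairs. A minimal sketch; the function name is hypothetical.

```python
def locate(offset: int, extents: list[tuple[int, int]],
           block_size: int) -> tuple[int, int]:
    """Map a byte offset within a fork to (allocation block, byte within block).

    `extents` lists (startBlock, blockCount) pairs in fork order.
    """
    fork_block, within = divmod(offset, block_size)
    for start, count in extents:
        if fork_block < count:
            return start + fork_block, within
        fork_block -= count  # skip past this extent
    raise ValueError("offset lies beyond the fork's allocated extents")
```

With extents [(100, 2), (500, 3)] and 4 KB blocks, byte 0 of the fork is in allocation block 100, while byte 8,202 (10 bytes into the fork's third block) is in allocation block 500.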

When the Extents B-tree is searched the information in the keys is compared in the following order: CNID, fork type, offset. Thus, the extents for each fork are grouped together and are located next to the extents for the other fork of the file.
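The sketch below shows that ordering. Per TN1150, an HFS Extended extent key is stored as a 16-bit key length (10), the fork type (0x00 for data, 0xFF for resource), a pad byte, the 32-bit file ID, and the 32-bit starting block within the fork. The comparison order (file ID, fork type, start block) differs from the byte layout, so the sketch compares tuples rather than raw bytes. The function names are hypothetical, and a caller would still need to check that the record found actually covers the requested block.

```python
import struct
from bisect import bisect_right

DATA_FORK, RESOURCE_FORK = 0x00, 0xFF


def extent_key(file_id: int, fork_type: int, start_block: int) -> bytes:
    """On-disk key bytes: keyLength (10), forkType, pad, fileID, startBlock."""
    return struct.pack(">HBBII", 10, fork_type, 0, file_id, start_block)


def find_record(keys: list[tuple[int, int, int]], file_id: int,
                fork_type: int, fork_block: int):
    """Index of the last overflow record at or before fork_block for this
    file and fork, or None. `keys` must be sorted as (fileID, fork, start)."""
    i = bisect_right(keys, (file_id, fork_type, fork_block)) - 1
    if i >= 0 and keys[i][:2] == (file_id, fork_type):
        return i
    return None
```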

Corruption in the Extents file could cause the file system to lose track of where portions of the data in one or both forks of a file are located. This could cause files to be truncated or cause garbage data to appear in a file. If the Extents file itself cannot be located, then any fork data beyond the first eight extents (which are recorded in the Catalog file) would be lost.

The Extents file holds information about a special file called the bad block file. If a sector is found to be bad, in other words it cannot hold data reliably, then the entire allocation block containing that sector is added to the bad block file. This ensures that the space occupied by the bad block will not be used to store data.

A bad block on a hard disk indicates an actual physical defect in the media surface at that location. Bad blocks are located during an initialization of the drive using the "zero all data" option. They may also be discovered by the drive itself as data is written to and read from the drive.

The bad block file differs from standard files. It does not have a record in the Catalog file and is not referenced in the Volume Header. The bad block file has a CNID of 5 for use as an identifier in the Extents file. Bad block extents are considered data forks. When a bad block is entered in the Extents file, its allocation block is marked as used in the Allocation file (see below). This prevents it from being used in the future. Keeping track of a bad block's location in the Extents file allows for consistency checks in the Allocation file: every location marked as used in the Allocation file should correspond to an extent of some file.
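Retiring a bad sector can be sketched as below. The dictionary is a hypothetical stand-in for the Extents B-tree, simplified to one extent per record (real records hold up to eight), and the bitmap uses one bit per allocation block with the most significant bit of byte 0 standing for block 0. Moving any file data out of the affected block first is not shown.

```python
BAD_BLOCK_FILE_ID = 5  # reserved CNID of the bad block file
DATA_FORK = 0x00       # bad block extents are recorded as a data fork
SECTOR_SIZE = 512


def retire_bad_sector(sector: int, block_size: int,
                      extents: dict, bitmap: bytearray) -> int:
    """Add the allocation block holding `sector` to the bad block file.

    `extents` maps (fileID, forkType, startBlockInFork) -> (startBlock,
    blockCount). `bitmap` has one bit per allocation block.
    """
    block = sector * SECTOR_SIZE // block_size
    bad = {k: v for k, v in extents.items() if k[0] == BAD_BLOCK_FILE_ID}
    if any(start <= block < start + count for start, count in bad.values()):
        return block  # already retired
    fork_offset = sum(count for _, count in bad.values())
    extents[(BAD_BLOCK_FILE_ID, DATA_FORK, fork_offset)] = (block, 1)
    bitmap[block // 8] |= 0x80 >> (block % 8)  # mark as in use
    return block
```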

An interesting aside is that when an HFS Extended volume is contained within an HFS wrapper (see HFS Wrapper below), all the extents of the HFS Extended volume are entered into the HFS volume's bad block file. This ensures that if the HFS wrapper volume mounts when using a version of the Mac OS that does not support HFS Extended, then the space occupied by the HFS Extended volume will not be written to.

Allocation File

The Allocation file keeps track of whether or not each allocation block in the volume is being used by the file system. It is a simple list with an entry for each allocation block indicating whether or not it is used. If an allocation block is marked as unused, then the file system may assign it to hold data for a new file. When a file is deleted, the allocation blocks occupied by that file are marked as free and they may be reused to hold other data at any time.
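The Allocation file's list can be pictured as a bitmap. TN1150 describes one bit per allocation block, with the most significant bit of the first byte standing for block 0; the sketch below follows that layout. The class name and its first-fit allocate method are hypothetical simplifications, since the real allocator's search strategy is not described here.

```python
class AllocationMap:
    """One bit per allocation block; the high bit of byte 0 is block 0."""

    def __init__(self, total_blocks: int) -> None:
        self.total = total_blocks
        self.bits = bytearray((total_blocks + 7) // 8)

    def is_used(self, block: int) -> bool:
        return bool(self.bits[block // 8] & (0x80 >> (block % 8)))

    def set_used(self, block: int, used: bool = True) -> None:
        mask = 0x80 >> (block % 8)
        if used:
            self.bits[block // 8] |= mask
        else:  # e.g. when the file owning the block is deleted
            self.bits[block // 8] &= ~mask & 0xFF

    def allocate(self, count: int) -> int:
        """First-fit search for `count` contiguous free blocks."""
        run_start, run_len = 0, 0
        for block in range(self.total):
            if self.is_used(block):
                run_start, run_len = block + 1, 0
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self.set_used(b)
                return run_start
        raise ValueError("no contiguous run of free blocks is long enough")

    def free_count(self) -> int:
        return sum(not self.is_used(b) for b in range(self.total))
```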

The allocation information for an HFS volume is stored in a special location on the volume called the Volume Bitmap, instead of being stored in an actual file.

Corruption in the Allocation file or the Volume Bitmap can cause the file system to think that areas actually storing data are available for use by another file. In that case the data in the original file may be overwritten and corrupted. If an unused area is marked as already allocated, then the file system will report that the volume has less free space available than it actually has.
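Both kinds of corruption can be detected by comparing the allocation map with the blocks that file extents actually claim, which is the consistency check mentioned in the bad block discussion. A minimal sketch, assuming the caller has already gathered every extent on the volume (including the volume structures and the bad block file); the function name is hypothetical.

```python
def check_allocation(bitmap: bytes, total_blocks: int,
                     extents: list[tuple[int, int]]) -> dict[str, list[int]]:
    """Compare the allocation bitmap with the blocks claimed by extents.

    `extents` lists (startBlock, blockCount) for every fork of every file.
    """
    claimed = set()
    for start, count in extents:
        claimed.update(range(start, start + count))
    marked = {b for b in range(total_blocks)
              if bitmap[b // 8] & (0x80 >> (b % 8))}
    return {
        # Holds file data but looks free: a new file could overwrite it.
        "used_but_marked_free": sorted(claimed - marked),
        # Marked used but owned by nothing: free space is under-reported.
        "marked_used_but_unowned": sorted(marked - claimed),
    }
```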

Attributes File

The Attributes file is new to the HFS Extended specification. Like the Catalog and Extents files, the Attributes file is defined to be a B-tree. The Attributes file stores three types of records: Inline Data Attribute records, Fork Data Attribute records, and Extension Attribute records. Inline Data Attribute records store small attributes that fit within the record itself. Fork Data Attribute records contain references to a maximum of eight extents that can hold larger attributes. Extension Attribute records extend a Fork Data Attribute record when its eight extents are already used. Extended attributes allow files to carry metadata that the file system itself does not interpret, whereas regular attributes have a purpose defined by the file system (such as permissions or records of creation and modification times). As an example, Apple's Safari browser uses extended attributes to add security to the Macintosh operating system. When an application is downloaded by Safari, this information is stored as an extended attribute. The first time an attempt is made to launch the application, a dialog appears warning the user that they are about to open a downloaded application for the first time and requesting permission to proceed.
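The division of labor among the three record types can be sketched as a small planning function. The inline_limit parameter is hypothetical; the sketch encodes only the rule that a Fork Data Attribute record holds eight extents, plus the TN1150 layout in which each Extension Attribute record supplies eight more.

```python
def plan_attribute_records(value_size: int, extent_count: int,
                           inline_limit: int) -> list[str]:
    """List the record types a simplified model would use for one attribute.

    inline_limit is hypothetical; extent_count is how many extents the value
    would occupy if stored outside the B-tree.
    """
    if value_size <= inline_limit:
        return ["inline data"]
    records = ["fork data"]                       # holds the first eight extents
    overflow = max(0, extent_count - 8)
    records += ["extension"] * -(-overflow // 8)  # eight more extents each
    return records
```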

Startup File

The Startup file is intended for use by systems that do not have built-in ROM support for booting from HFS Extended volumes. It is similar to the Boot Blocks of an HFS volume. The first eight extents of the Startup File are stored in the Volume Header. This makes them easy to locate and read into memory. This file contains information formerly used by the computer's ROM to determine what program will boot the computer. For newer Macs, this is handled by the HFS Wrapper.

HFS Wrapper

Most HFS Extended volumes are embedded inside a locked HFS volume called the HFS wrapper. However, newer Mac models are beginning to support "pure" HFS Extended (wrapperless HFS Extended) format.

Embedding HFS Extended volumes in an HFS wrapper makes it possible for a computer with HFS (but not HFS Extended) support in ROM to boot from an HFS Extended volume. Additionally, if an HFS Extended volume is attached to a computer with HFS (but not HFS Extended) support, the HFS wrapper can be mounted and provide a message indicating that the computer does not support HFS Extended volumes. This was especially important during the years immediately after the introduction of the HFS Extended format. At that time many people were still using Mac OS 8.0 or earlier, which did not support HFS Extended volumes. When using a wrappered HFS+ volume under Mac OS 8.1 or above, the HFS Extended volume itself will mount and the HFS wrapper will not be visible.

The HFS wrapper contains an invisible minimal System and Finder file. The root folder of the wrapper is set as a "blessed" folder so that it can be used for startup. When starting up from that volume, the computer will begin the startup sequence from the special System on the wrapper volume, recognize and mount the HFS Extended volume, and then continue starting up from the System on the HFS Extended volume.

The HFS wrapper is locked so that its contents cannot be altered. This protects it from inadvertent corruption. It typically contains a text file named "Where_have_all_my_files_gone?" If a wrappered HFS Extended volume is attached to a computer that does not have support for the HFS Extended format, the HFS wrapper will mount and that text file will show up on the volume. The contents of the text file explain why the HFS Extended volume is not appearing.

Damage to the HFS wrapper can make the HFS Extended volume inaccessible or prevent it from starting up the computer.

Journaling

Mac OS X 10.2.2 added a new feature to the HFS Extended file system called journaling. Journaling is part of a set of incremental enhancements to the HFS Extended file system and is backward compatible with earlier versions of that file system.

Journaling makes the file system more robust and helps protect against data loss. When journaling is enabled, the file system logs transactions as they occur. If your computer fails in the middle of an operation (which might occur due to a crash or power failure), disk reads and writes may be interrupted. This can cause discrepancies between the file system directory and the actual location and structure of stored files. In an unjournaled file system, volumes may be left in a corrupted state after an unexpected shutdown. If journaling is enabled, the file system can "replay" the information in its log and complete the interrupted operations when the computer restarts. Although there may be minor loss of data that was buffered at the time of the failure, the file system itself will be returned to a consistent state. This allows the computer to restart much faster, since the volume structures do not need to be repaired during startup.
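The replay idea can be illustrated with a toy write-ahead log. This is a generic sketch, not the on-disk format of the HFS Extended journal: changes are first logged as a transaction, the transaction is marked committed, and only later are the changes copied to their real locations. After a crash, committed transactions are redone and unfinished ones are discarded, which is what returns the volume structures to a consistent state. The class and method names are hypothetical.

```python
class ToyJournal:
    """Generic write-ahead log sketch (not the HFS Extended journal format)."""

    def __init__(self) -> None:
        self.blocks: dict[int, str] = {}  # block number -> contents on "disk"
        self.log: dict[int, tuple[dict[int, str], bool]] = {}
        self._next_id = 0

    def log_transaction(self, writes: dict[int, str]) -> int:
        """Record intended changes in the journal before touching the blocks."""
        txn = self._next_id
        self._next_id += 1
        self.log[txn] = (dict(writes), False)
        return txn

    def commit(self, txn: int) -> None:
        """Mark a logged transaction as complete."""
        writes, _ = self.log[txn]
        self.log[txn] = (writes, True)

    def checkpoint(self) -> None:
        """Copy committed changes to their real locations, oldest first."""
        for txn in sorted(self.log):
            writes, committed = self.log[txn]
            if committed:
                self.blocks.update(writes)
                del self.log[txn]

    def replay_after_crash(self) -> None:
        """At restart: redo committed work and discard unfinished work."""
        self.checkpoint()
        self.log.clear()
```

If a crash strikes after one transaction has been committed but before it was checkpointed, while a second transaction is still open, replay_after_crash() applies the first transaction's changes and drops the second one entirely.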

Journaling adds a small amount of extra overhead to operations that write to the disk. In most cases, the impact of journaling upon performance will not be noticed. However, for work requiring high transfer speeds, such as large video, graphics, or audio files, the reliability provided by journaling may not justify the performance cost.

 
