An Architecture for Multiple File System Types in Sun UNIX
Just a supplement to what Stephen Kitt already said: the entries in any directory in a classic Unix file system are hard links that map names to inodes, which are small fixed-size records in the file system. There were several different kinds of inode. A "regular file" inode contained information that the OS could use to find the pages of a file.
A "directory" inode is basically the same as a regular file, except that the OS treats the file contents as a table of hard links and does not allow user-mode programs to write the "file". When a program opens a "device special" file, the open file descriptor actually represents a channel connecting the program to the device driver.
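The name-to-inode mapping described above can be observed from user space. The following is a minimal sketch, assuming a POSIX system whose temporary directory supports hard links; the function name `link_demo` and the file names are made up for illustration.

```python
import os
import tempfile

def link_demo():
    """Create one file, give it a second name, and report what stat() sees."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")
        second = os.path.join(d, "second")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)            # second directory entry, same inode
        a, b = os.stat(first), os.stat(second)
        same_inode = a.st_ino == b.st_ino
        links_before = a.st_nlink         # link count while both names exist
        os.unlink(first)                  # remove one name; the inode survives
        with open(second) as f:
            data = f.read()
        links_after = os.stat(second).st_nlink
    return same_inode, links_before, links_after, data

if __name__ == "__main__":
    print(link_demo())
```

Both names resolve to the same inode number, and the data stays reachable through the remaining name after the first one is unlinked.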
Hackers used to put devices in private directories that they owned and that allowed access to raw drives, memory, or whatever. The mknod system call also happened to be the way directories were created before the mkdir system call existed. This would create a completely empty directory without the "." and ".." entries. The mkdir command would create the entries manually using the link system call.
RossRidge, I remember that too! And we could hide information from file system scanners by unlinking..
Append-only files can be opened in write mode, but data is always appended at the end of the file. Like immutable files, they cannot be deleted or renamed.
This is especially useful for log files which can only grow. A filesystem is made up of block groups. However, block groups are not tied to the physical layout of the blocks on the disk, since modern drives tend to be optimized for sequential access and hide their physical geometry from the operating system.
Each block group contains a redundant copy of crucial filesystem control information (the superblock and the filesystem descriptors) and also contains a part of the filesystem (a block bitmap, an inode bitmap, a piece of the inode table, and data blocks).
The structure of a block group is represented in this table:

Super Block | FS descriptors | Block Bitmap | Inode Bitmap | Inode Table | Data Blocks

Using block groups is a big win in terms of reliability: since the control structures are replicated in each block group, it is easy to recover from a filesystem where the superblock has been corrupted.

In Ext2fs, directories are managed as linked lists of variable length entries.
Each entry contains the inode number, the entry length, the file name and its length. By using variable length entries, it is possible to implement long file names without wasting disk space in directories. Ext2fs also performs readaheads: when a block has to be read, the kernel code requests the I/O on several contiguous blocks. This way, it tries to ensure that the next block to read will already be loaded into the buffer cache. Readaheads are normally performed during sequential reads on files, and Ext2fs extends them to directory reads, either explicit reads (readdir(2) calls) or implicit ones (namei kernel directory lookups).
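The variable-length directory entries described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the on-disk code: it assumes a little-endian header of a 32-bit inode number, a 16-bit entry length and a 16-bit name length, entries padded to a multiple of 4 bytes, and a last entry that stretches to the end of the block. The names `HEADER`, `pack_entries` and `parse_entries` are hypothetical.

```python
import struct

# Assumed header layout: inode (u32), entry length (u16), name length (u16).
HEADER = struct.Struct("<IHH")

def pack_entries(entries, block_size=1024):
    """Pack (inode, name) pairs into one directory block."""
    out = bytearray()
    for i, (ino, name) in enumerate(entries):
        raw = name.encode()
        rec_len = (HEADER.size + len(raw) + 3) & ~3   # round up to 4 bytes
        if i == len(entries) - 1:
            rec_len = block_size - len(out)           # last entry fills the block
        out += HEADER.pack(ino, rec_len, len(raw)) + raw
        out += b"\0" * (rec_len - HEADER.size - len(raw))
    return bytes(out)

def parse_entries(block):
    """Walk the block by following the entry length from entry to entry."""
    pos, found = 0, []
    while pos < len(block):
        ino, rec_len, name_len = HEADER.unpack_from(block, pos)
        if rec_len < HEADER.size:
            raise ValueError("corrupt entry length")
        if ino != 0:                                  # inode 0 marks an unused slot
            start = pos + HEADER.size
            found.append((ino, block[start:start + name_len].decode()))
        pos += rec_len
    return found

if __name__ == "__main__":
    block = pack_entries([(2, "."), (2, ".."), (12, "a_rather_long_file_name.txt")])
    print(parse_entries(block))
```

Because each entry records its own length, a short name such as "." costs only 12 bytes here, while a long name simply takes a longer entry.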
Ext2fs also contains many allocation optimizations. Block groups are used to cluster together related inodes and data: the kernel code always tries to allocate data blocks for a file in the same group as its inode.
This is intended to reduce the disk head seeks made when the kernel reads an inode and its data blocks. When writing data to a file, Ext2fs preallocates up to 8 adjacent blocks when allocating a new block. This preallocation achieves good write performance under heavy load. It also allows contiguous blocks to be allocated to files, which speeds up future sequential reads.
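The two allocation ideas described above, keeping a file's blocks in its inode's group and claiming up to 8 adjacent blocks at a time, can be sketched as follows. This is an illustrative model only: the geometry values, the `Geometry` class, the helper names, and the representation of free space as a Python set are all assumptions, and the real kernel allocator is considerably more elaborate.

```python
from dataclasses import dataclass

@dataclass
class Geometry:
    # Illustrative values; a real filesystem records these in its superblock.
    blocks_per_group: int = 8192
    inodes_per_group: int = 2048
    first_data_block: int = 1

def group_of_inode(geo, ino):
    return (ino - 1) // geo.inodes_per_group          # inode numbers start at 1

def group_of_block(geo, block):
    return (block - geo.first_data_block) // geo.blocks_per_group

def allocation_goal(geo, ino):
    """First block of the inode's group: where a free-block search could start."""
    return geo.first_data_block + group_of_inode(geo, ino) * geo.blocks_per_group

def preallocate(free, goal, want=8):
    """Claim up to `want` adjacent free blocks, starting at the first free block >= goal."""
    candidates = [b for b in free if b >= goal]
    if not candidates:
        return []
    start = min(candidates)
    run = []
    while len(run) < want and start + len(run) in free:
        run.append(start + len(run))
    free.difference_update(run)                       # the claimed blocks are no longer free
    return run

if __name__ == "__main__":
    geo = Geometry()
    free = set(range(8200, 8205)) | {8207}
    print(preallocate(free, allocation_goal(geo, 2049)))
```

Starting the search at the inode's own group is what keeps an inode and its data close together; the run of adjacent blocks is what makes later sequential reads cheap.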
These two allocation optimizations produce very good locality of related files (through block groups) and of related blocks (through the 8 bits clustering of block allocations).

The Ext2fs library

To allow user mode programs to manipulate the control structures of an Ext2 filesystem, the libext2fs library was developed. This library provides routines which can be used to examine and modify the data of an Ext2 filesystem, by accessing the filesystem directly through the physical device.
The Ext2fs library was designed to allow maximal code reuse through the use of software abstraction techniques. For example, several different iterators are provided. Another iterator function allows a user-provided function to be called for each file in a directory.
Many of the Ext2fs utilities (mke2fs, e2fsck, tune2fs, dumpe2fs, and debugfs) use the Ext2fs library. This greatly simplifies the maintenance of these utilities, since any changes to reflect new features in the Ext2 filesystem format need only be made in one place: the Ext2fs library.
This code reuse also results in smaller binaries, since the Ext2fs library can be built as a shared library image. Because the interfaces of the Ext2fs library are so abstract and general, new programs which require direct access to the Ext2fs filesystem can very easily be written.
For example, the Ext2fs library was used during the port of the 4. Very few changes were needed to adapt these tools to Linux: only a few filesystem dependent functions had to be replaced by calls to the Ext2fs library. The Ext2fs library provides access to several classes of operations. The first class consists of the filesystem-oriented operations. A program can open and close a filesystem, read and write the bitmaps, and create a new filesystem on the disk.
Functions are also available to manipulate the filesystem's bad blocks list. The second class of operations affects directories. A caller of the Ext2fs library can create and expand directories, as well as add and remove directory entries.
Functions are also provided both to resolve a pathname to an inode number and to determine a pathname of an inode given its inode number. The final class of operations is oriented around inodes. It is possible to scan the inode table, read and write inodes, and scan through all of the blocks in an inode. Allocation and deallocation routines are also available and allow user mode programs to allocate and free blocks and inodes.

The Ext2fs tools

Powerful management tools have been developed for Ext2fs.
These utilities are used to create, modify, and correct any inconsistencies in Ext2 filesystems. The mke2fs program is used to initialize a partition to contain an empty Ext2 filesystem. The tune2fs program can be used to modify the filesystem parameters. The most interesting tool is probably the filesystem checker. E2fsck is intended to repair filesystem inconsistencies after an unclean shutdown of the system.
The original version of e2fsck was based on Linus Torvalds' fsck program for the Minix filesystem. However, the current version of e2fsck was rewritten from scratch, using the Ext2fs library, and is much faster and can correct more filesystem inconsistencies than the original version. The e2fsck program is designed to run as quickly as possible. Since filesystem checkers tend to be disk bound, this was done by optimizing the algorithms used by e2fsck so that filesystem structures are not repeatedly accessed from the disk.
In addition, the order in which inodes and directories are checked is sorted by block number to reduce the amount of time spent in disk seeks. Many of these ideas were originally explored by [Bina and Emrath ] although they have since been further refined by the authors. In pass 1, e2fsck iterates over all of the inodes in the filesystem and performs checks over each inode as an unconnected object in the filesystem. That is, these checks do not require any cross-checks to other filesystem objects.
Examples of such checks include making sure the file mode is legal, and that all of the blocks in the inode are valid block numbers. During pass 1, bitmaps indicating which blocks and inodes are in use are compiled. If e2fsck notices data blocks which are claimed by more than one inode, it invokes passes 1B through 1D to resolve these conflicts, either by cloning the shared blocks so that each inode has its own copy of the shared block, or by deallocating one or more of the inodes.
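The pass 1 bookkeeping described above (validating block numbers, building an in-use map, and spotting blocks claimed more than once) can be sketched with plain Python containers. This is a toy model, not e2fsck's implementation: the function names, the dictionary of inode number to block list, and the choice to let the lowest-numbered inode keep a shared block are all assumptions, and copying the block's data is not modelled.

```python
def pass1(inodes, total_blocks):
    """inodes maps an inode number to the list of block numbers it claims.
    Returns (blocks in use, blocks claimed more than once, invalid blocks per inode)."""
    used, dup, bad = set(), set(), {}
    for ino in sorted(inodes):
        for blk in inodes[ino]:
            if not 0 < blk < total_blocks:
                bad.setdefault(ino, []).append(blk)   # not a valid block number
            elif blk in used:
                dup.add(blk)                          # already claimed by someone
            else:
                used.add(blk)
    return used, dup, bad

def clone_shared(inodes, used, dup, total_blocks):
    """Give every claimant after the first its own block in place of a shared one.
    Raises StopIteration if no free block is left."""
    free = (b for b in range(1, total_blocks) if b not in used)
    seen = set()
    for ino in sorted(inodes):
        for i, blk in enumerate(inodes[ino]):
            if blk not in dup:
                continue
            if blk in seen:
                new = next(free)                      # take a block nobody uses
                used.add(new)
                inodes[ino][i] = new
            else:
                seen.add(blk)                         # first claimant keeps the block

if __name__ == "__main__":
    table = {11: [20, 21], 12: [21, 22], 13: [99]}
    used, dup, bad = pass1(table, total_blocks=32)
    clone_shared(table, used, dup, total_blocks=32)
    print(table, sorted(dup), bad)
```

Cloning is only one of the two resolutions mentioned in the text; deallocating one of the conflicting inodes would instead remove its entry from the table.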
Pass 1 takes the longest time to execute, since all of the inodes have to be read into memory and checked. To reduce the I/O time needed in later passes, critical filesystem information is cached in memory. The most important example of this technique is the location on disk of all of the directory blocks on the filesystem. This obviates the need to re-read the directory inode structures during pass 2 to obtain this information. Pass 2 checks directories as unconnected objects.
Since directory entries do not span disk blocks, each directory block can be checked individually without reference to other directory blocks. This allows e2fsck to sort all of the directory blocks by block number, and check directory blocks in ascending order, thus decreasing disk seek time.
The directory blocks are checked to make sure that the directory entries are valid, and contain references to inode numbers which are in use as determined by pass 1. Pass 2 also caches information concerning the parent directory in which each directory is linked. If a directory is referenced by more than one directory, the second reference of the directory is treated as an illegal hard link, and it is removed. In pass 3, the directory connectivity is checked. E2fsck traces the path of each directory back to the root, using information that was cached during pass 2.
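The pass 3 connectivity check can be sketched as a walk over the parent information cached in pass 2. This is a simplified model under stated assumptions: the cache is represented as a dictionary from directory inode to parent inode, the function name `unreachable_dirs` is hypothetical, and the sketch only reports directories that cannot be traced back to the root rather than repairing them. The root inode number 2 is the Ext2 convention.

```python
ROOT = 2  # inode number of the root directory on Ext2

def unreachable_dirs(parent):
    """parent maps each directory inode to the inode of the directory that links it.
    Returns the directories whose parent chain never reaches the root."""
    reachable = {ROOT}
    lost = set()
    for d in parent:
        path = []
        cur = d
        # Follow parents until we hit something already classified, a loop,
        # or an inode that has no recorded parent.
        while (cur in parent and cur not in reachable
               and cur not in lost and cur not in path):
            path.append(cur)
            cur = parent[cur]
        if cur in reachable:
            reachable.update(path)    # the whole chain leads to the root
        else:
            lost.update(path)         # loop, dangling parent, or known-lost ancestor
    return sorted(lost)

if __name__ == "__main__":
    cached = {2: 2, 11: 2, 12: 11, 20: 21, 21: 20, 30: 99}
    print(unreachable_dirs(cached))
```

Remembering which directories are already known to be reachable means each directory is walked at most once, which fits the stated goal of avoiding repeated work.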
In pass 4, e2fsck checks the reference counts for all inodes, by iterating over all the inodes and comparing the link counts which were cached in pass 1 against internal counters computed during passes 2 and 3. Finally, in pass 5, e2fsck checks the validity of the filesystem summary information.
It compares the block and inode bitmaps which were constructed during the previous passes against the actual bitmaps on the filesystem, and corrects the on-disk copies if necessary. The filesystem debugger is another useful tool.
Debugfs is a powerful program which can be used to examine and change the state of a filesystem. Basically, it provides an interactive interface to the Ext2fs library: commands typed by the user are translated into calls to the library routines. Debugfs can be used to examine the internal structures of a filesystem, manually repair a corrupted filesystem, or create test cases for e2fsck.
Unfortunately, this program can be dangerous if it is used by people who do not know what they are doing; it is very easy to destroy a filesystem with this tool. For this reason, debugfs opens filesystems for read-only access by default.
Performance Measurements

Description of the benchmarks

We have run benchmarks to measure filesystem performance. The tests were run on Ext2 fs and Xia fs Linux 1. We have run two different benchmarks. It runs in five phases: it creates a directory hierarchy, makes a copy of the data, recursively examines the status of every file, examines every byte of every file, and compiles several of the files.