Linux Virtual File System

1 Linux Virtual File System
Peter J. Braam P.J.Braam/CMU -- 1

2 Aims
Present the data structures in Linux VFS
Provide information about flow of control
Describe methods and invariants needed to implement a new file system
Illustrate with some examples

3 History
BSD implemented VFS for NFS: the aim was to dispatch to different filesystems
VMS had an elaborate filesystem
NT/Win95 have VFS type interfaces
Newer systems integrate VM with buffer cache.
[Diagram labels: File access; VFS; nfs, ufs, Coda; disk, udp, Venus]

4 Linux Filesystems
Media based
  ext2 - Linux native
  ufs - BSD
  fat - DOS FS
  vfat - Win 95
  hpfs - OS/2
  minix - well….
  isofs - CDROM
  sysv - SysV Unix
  hfs - Macintosh
  affs - Amiga Fast FS
  NTFS - NT’s FS
  adfs - Acorn-strongarm
Network
  nfs
  Coda
  AFS - Andrew FS
  smbfs - LanManager
  ncpfs - Novell
Special ones
  procfs - /proc
  umsdos - Unix in DOS
  userfs - redirector to user

5 Linux Filesystems (ctd)
Forthcoming:
  devfs - device file system
  DFS - DCE distributed FS
Varia:
  cfs - crypt filesystem
  cfs - cache filesystem
  ftpfs - ftp filesystem
  mailfs - mail filesystem
  pgfs - Postgres versioning file system
Linux serves (unrelated to the VFS!)
  NFS - user & kernel
  Coda
  AppleShare - netatalk/CAP
  SMB - samba
  NCP - Novell

6 Usefulness
“Linux is Obsolete” - Andrew Tanenbaum

7 Linux VFS
Multiple interfaces build up VFS:
  files
  dentries
  inodes
  superblock
  quota
VFS can do all caching & provides utility functions to FS
FS provides methods to VFS; many are optional
[Diagram labels: File access; VFS; nfs, ext2fs, Coda FS; disk, udp, Venus]
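The idea that a filesystem hands the VFS a table of methods, many of them optional, can be sketched in user-space C. This is only an illustrative model, not kernel code: every `toy_`, `rofs_` and `blockfs_` name, the `TOY_MAY_WRITE` bit and the two example filesystems are assumptions made up for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAY_WRITE 2   /* hypothetical access mask bit */

struct toy_inode;

/* Method table a filesystem hands to the "VFS"; any pointer may be NULL. */
struct toy_inode_ops {
    int (*permission)(struct toy_inode *inode, int mask);
    int (*getattr)(struct toy_inode *inode, long *size_out);
};

struct toy_inode {
    long size;                          /* meaning is up to the filesystem */
    const struct toy_inode_ops *i_op;
};

/* Hypothetical read-only filesystem: supplies only a permission method. */
static int rofs_permission(struct toy_inode *inode, int mask)
{
    (void)inode;
    return (mask & TOY_MAY_WRITE) ? -EACCES : 0;
}
const struct toy_inode_ops rofs_ops = { .permission = rofs_permission };

/* Hypothetical filesystem that stores sizes in 512-byte blocks:
 * supplies only getattr, which translates to the common format (bytes). */
static int blockfs_getattr(struct toy_inode *inode, long *size_out)
{
    *size_out = inode->size * 512;
    return 0;
}
const struct toy_inode_ops blockfs_ops = { .getattr = blockfs_getattr };

/* "VFS" side: call the method if present, else use a generic default. */
int toy_vfs_permission(struct toy_inode *inode, int mask)
{
    assert(inode != NULL && inode->i_op != NULL);
    if (inode->i_op->permission)
        return inode->i_op->permission(inode, mask);
    return 0;                           /* default in this sketch: allow */
}

int toy_vfs_getattr(struct toy_inode *inode, long *size_out)
{
    assert(inode != NULL && inode->i_op != NULL);
    if (inode->i_op->getattr)
        return inode->i_op->getattr(inode, size_out);
    *size_out = inode->size;            /* default: size is already bytes */
    return 0;
}
```

Callers only ever use `toy_vfs_permission` and `toy_vfs_getattr`; which filesystem answers is decided by the `i_op` pointer stored in the inode.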

8 User level file access
Typical user level types and code:
  pathnames: "/myfile"
  file descriptors: fd = open("/myfile"…)
  attributes in struct stat: stat("/myfile", &mybuf), chmod, chown...
  offsets: write, read, lseek
  directory handles: DIR *dh = opendir("/mydir")
  directory entries: struct dirent *ent = readdir(dh)
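The user level calls listed on this slide can be put together in a few small POSIX helpers. This is a minimal sketch: the helper names `write_file`, `file_size` and `count_entries` are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) a file and write len bytes; 0 on success, -1 on error. */
int write_file(const char *path, const char *data, size_t len)
{
    assert(path != NULL && data != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* file descriptor */
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, data, len);     /* advances the file offset */
    int rc = close(fd);
    return (n == (ssize_t)len && rc == 0) ? 0 : -1;
}

/* Return the size in bytes reported by stat(), or -1 on error. */
long file_size(const char *path)
{
    struct stat sb;                       /* attributes live in struct stat */
    if (stat(path, &sb) != 0)
        return -1;
    return (long)sb.st_size;
}

/* Count directory entries other than "." and "..", or -1 on error. */
int count_entries(const char *dir)
{
    DIR *dh = opendir(dir);               /* directory handle */
    if (dh == NULL)
        return -1;
    int n = 0;
    struct dirent *ent;
    while ((ent = readdir(dh)) != NULL) { /* one directory entry per call */
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0)
            n++;
    }
    closedir(dh);
    return n;
}
```

For example, `write_file("/tmp/myfile", "hello", 5)` followed by `file_size("/tmp/myfile")` exercises open, write and stat on the same pathname; the `/tmp` location is just a convenient assumption.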

9 VFS
Manages kernel level file abstractions in one format for all file systems
Receives system call requests from user level (e.g. write, open, stat, link)
Interacts with a specific file system based on mount point traversal
Receives requests from other parts of the kernel, mostly from memory management

10 File system level
Individual File Systems
  responsible for managing file & directory data
  responsible for managing meta-data: timestamps, owners, protection etc
  translates data between
    particular FS data: e.g. disk data, NFS data, Coda/AFS data
    VFS data: attributes etc in standard format
  e.g. nfs_getattr(….) returns attributes in VFS format, acquires attributes in NFS format to do so.

11 Anatomy of stat system call
sys_stat(path, buf)
{
    dentry = namei(path);                      /* establish VFS data */
    if ( dentry == NULL )
        return -ENOENT;
    inode = dentry->d_inode;
    rc = inode->i_op->i_permission(inode);     /* call into inode layer of filesystem */
    if ( rc ) {
        dput(dentry);                          /* release the dentry on the error path too */
        return -EPERM;
    }
    rc = inode->i_op->i_getattr(inode, buf);   /* call into inode layer of filesystem */
    dput(dentry);
    return rc;
}

12 Anatomy of fstatfs system call
sys_fstatfs(fd, buf)      /* for things like "df" */
{
    file = fget(fd);                              /* translate fd to VFS data structure */
    if ( file == NULL )
        return -EBADF;
    superb = file->f_dentry->d_inode->i_super;
    rc = superb->sb_op->sb_statfs(superb, buf);   /* call into superblock layer of filesystem */
    fput(file);                                   /* release the reference taken by fget */
    return rc;
}

13 Data structures
VFS data structures for:
  VFS handle to the file: inode (BSD: vnode)
  User instantiated file handle: file (BSD: file)
  The whole filesystem: superblock (BSD: vfs)
  A name to inode translation: dentry

14 Shorthand method notation
super block methods: sss_methodname
inode methods: iii_methodname
dentry methods: ddd_methodname
file methods: fff_methodname
instead of: inode -> i_op -> lookup we write iii_lookup

15 namei
VFS                                              FS
struct dentry *namei(parent, name) {
    if (dentry = d_lookup(parent,name))          ddd_hash(parent, name)
        ...                                      ddd_revalidate(dentry)
    else
        ...                                      iii_lookup(parent, name)
}
struct inode *iget(ino, dev) {                   sss_read_inode(…)
    /* try cache else .. */
}

16 Superblocks
Handle metadata only (attributes etc)
Responsible for retrieving and storing metadata from the FS media or peers
Struct superblocks hold things like:
  device, blocksize, dirty flags, list of dirty inodes
  super operations
  wait queue
  pointer to the root inode of this FS

17 Super Operations (sss_)
Ops on Inodes:
  read_inode
  put_inode
  write_inode
  delete_inode
  clear_inode
  notify_change
Superblock manips:
  read_super (mount)
  put_super (unmount)
  write_super (unmount)
  statfs (attributes)

18 Inodes
Inodes are the VFS abstraction for the file
Inode has operations (iii_methods)
VFS maintains an inode cache, NOT the individual FS’s (compare NT, BSD etc)
Inodes contain an FS specific area where:
  ext2 stores disk block numbers etc
  AFS would store the FID
Extraordinary inode ops are good for dealing with stale NFS file handles etc.

19 What’s inside an inode - 1
caching:
  list_head i_hash
  list_head i_list
  list_head i_dentry
  int i_count
Identifies file:
  long i_ino
  int i_dev
Usual stuff:
  {m,a,c}time
  {u,g}id
  mode
  size
  n_link

20 What’s inside an inode - 2
Which FS:
  superblock i_sb
  inode_ops i_op
For mmap, networking, waiting:
  wait objects, semaphore
  lock
  vm_area_struct
  pipe/socket info
  page information
FS specific info (blockno’s, fids etc):
  union {
      ext2fs_inode_info i_ext2
      nfs_inode_info i_nfs
      coda_inode_info i_coda
      ..
  } u

21 Inode state
Inode can be on one or two lists:
  (hash & in_use) or (hash & dirty) or unused
  inode has a use count i_count
Transitions
  unused -> hash: iget calls sss_read_inode
  dirty -> in_use: sss_write_inode
  hash -> unused: call on sss_clear_inode, but if i_nlink = 0: iput calls sss_delete_inode when i_count falls to 0

22 Inode Cache
Players:
  1. iget: if i_count>0 ++
  2. iput: if i_count>1 --
  3. free_inodes
  4. syncing inodes
[Diagram labels: inode_hashtable; unused inodes, used inodes, dirty inodes, each next to FS storage; sss_read_inode (iget); sss_clear_inode (freeing inos) or sss_delete_inode (iput); sss_write_inode (sync one), media fs only; mark_inode_dirty]
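The use-count rules for iget and iput can be modelled in a few lines of user-space C. This is a deliberately small sketch, not the kernel's inode cache: the fixed-size table, the `toy_` and `demo_` names and the counters are assumptions for illustration, and dirty-inode handling and reclaiming of unused inodes are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NINODES 8

struct toy_inode {
    long ino;
    int count;    /* use count, like i_count */
    int nlink;    /* link count, like i_nlink */
    int hashed;   /* 1 while the inode is in the cache */
};

/* Stand-ins for the sss_read_inode and sss_delete_inode methods. */
struct toy_super_ops {
    void (*read_inode)(struct toy_inode *inode);
    void (*delete_inode)(struct toy_inode *inode);
};

static struct toy_inode table[TOY_NINODES];
int reads, deletes;   /* how often the "filesystem" was called */

static void demo_read_inode(struct toy_inode *inode) { reads++; inode->nlink = 1; }
static void demo_delete_inode(struct toy_inode *inode) { (void)inode; deletes++; }
const struct toy_super_ops demo_ops = { demo_read_inode, demo_delete_inode };

/* Try the cache first; only on a miss ask the filesystem to fill the inode. */
struct toy_inode *toy_iget(const struct toy_super_ops *ops, long ino)
{
    struct toy_inode *free_slot = NULL;
    for (int k = 0; k < TOY_NINODES; k++) {
        if (table[k].hashed && table[k].ino == ino) {
            table[k].count++;
            return &table[k];
        }
        if (!table[k].hashed && free_slot == NULL)
            free_slot = &table[k];
    }
    if (free_slot == NULL)
        return NULL;  /* a real cache would reclaim an unused inode here */
    free_slot->ino = ino;
    free_slot->count = 1;
    free_slot->hashed = 1;
    ops->read_inode(free_slot);
    return free_slot;
}

/* Drop one use; the inode stays cached unless it has no links left. */
void toy_iput(const struct toy_super_ops *ops, struct toy_inode *inode)
{
    assert(inode != NULL && inode->count > 0);
    if (--inode->count == 0 && inode->nlink == 0) {
        ops->delete_inode(inode);
        inode->hashed = 0;
    }
}
```

In this model an inode whose use count drops to zero stays in the table, so a later `toy_iget` for the same number does not call `read_inode` again; only an inode with `nlink == 0` is handed to `delete_inode` and removed.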

23 Sales
Red Hat Software sold 240,000 copies of Red Hat Linux in 1997 and expects to reach 400,000 in [year missing]
Estimates of installed servers (InfoWorld):
  - Linux: 7 million
  - OS/2: 5 million
  - Macintosh: 1 million

24 Inode operations (iii_)
symbolic links
  readlink
  follow link
pages
  readpage, writepage, updatepage - read or write page. Generic for mediafs.
  bmap - return disk block number of logical block
special operations
  revalidate - see dentry sect
  truncate
  permission
lookup: return inode
  calls iget
creation/removal
  create
  link
  unlink
  symlink
  mkdir
  rmdir
  mknod
  rename

25 Dentry world
Dentry is a name to inode translation structure
Cached aggressively by VFS
Eliminates lookups by FS & private caches
  timing on Coda FS: ls -lR 1000 files after priming cache
    linux : 7.2secs
    linux : 0.6secs
  disk fs: less benefit, NFS even more
Negative entries!
Namei is dramatically simplified

26 Inside dentry’s
name
pointer to inode
pointer to parent dentry
list head of children
chains for lots of lists
use count

27 Dentry associated lists
[Diagram: inodes and dentries, showing the dentry-inode relationship and the dentry tree relationship]
Legend:
  inode i_dentry list head
  d_inode pointer
  d_parent pointer
d_alias chains
  place: d_instantiate
  remove: dentry_iput
d_child chains
  place: d_alloc
  remove: d_prune, d_invalidate, d_put

28 Dcache
dentry_hashtable (d_hash chains)
namei tries cache: d_lookup
  ddd_compare
Success: ddd_revalidate
  d_invalidate if fails
  proceed if success
Failure: iii_lookup
  find inode
  iget
  sss_read_inode
  finish: d_add
  can give negative entry in dcache
[Diagram labels: dhash(parent, name) -> list head; namei, iii_lookup, d_add; prune, d_invalidate, d_drop; unused dentries (d_lru chains)]
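The lookup flow on this slide (try the cache, fall back to the filesystem, remember the answer even when it is "no such name") can be sketched as a toy user-space dcache. Everything here is an assumption for illustration: the `toy_` names, the fixed-size pool, the hash function and the stand-in filesystem that knows a single file. Revalidation, pruning and LRU handling are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUCKETS 16
#define TOY_MAX_ENTRIES 32
#define TOY_NAME_MAX 32

struct toy_dentry {
    int parent;                 /* id of the parent directory */
    char name[TOY_NAME_MAX];
    long ino;                   /* 0 marks a negative entry */
    struct toy_dentry *next;    /* hash chain */
};

static struct toy_dentry pool[TOY_MAX_ENTRIES];
static struct toy_dentry *hashtable[TOY_BUCKETS];
static int pool_used;
int fs_lookups;                 /* calls that reached the "filesystem" */

/* Hash of (parent, name), playing the role of dhash(parent, name). */
static unsigned toy_hash(int parent, const char *name)
{
    unsigned h = (unsigned)parent;
    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % TOY_BUCKETS;
}

static struct toy_dentry *toy_d_lookup(int parent, const char *name)
{
    struct toy_dentry *d = hashtable[toy_hash(parent, name)];
    for (; d != NULL; d = d->next)
        if (d->parent == parent && strcmp(d->name, name) == 0)
            return d;
    return NULL;
}

/* Insert an entry; silently skips caching if the pool is full. */
static void toy_d_add(int parent, const char *name, long ino)
{
    if (pool_used == TOY_MAX_ENTRIES || strlen(name) >= TOY_NAME_MAX)
        return;
    struct toy_dentry *d = &pool[pool_used++];
    unsigned b = toy_hash(parent, name);
    d->parent = parent;
    strcpy(d->name, name);
    d->ino = ino;
    d->next = hashtable[b];
    hashtable[b] = d;
}

/* Stand-in filesystem lookup: directory 1 holds only "myfile" (inode 42). */
static long toy_fs_lookup(int parent, const char *name)
{
    fs_lookups++;
    return (parent == 1 && strcmp(name, "myfile") == 0) ? 42 : 0;
}

/* Return the inode number for name in parent, or 0 if it does not exist. */
long toy_namei(int parent, const char *name)
{
    assert(name != NULL);
    struct toy_dentry *d = toy_d_lookup(parent, name);
    if (d != NULL)
        return d->ino;                     /* hit, possibly negative */
    long ino = toy_fs_lookup(parent, name);
    toy_d_add(parent, name, ino);          /* caches misses as well */
    return ino;
}
```

Looking up a missing name twice reaches `toy_fs_lookup` only once, which is the point of negative entries.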

29 Dentry methods
ddd_revalidate: can force new lookup
ddd_hash: compute hash value of name
ddd_compare: are names equal?
ddd_delete, ddd_put, ddd_iput: FS cleanup opportunity

30 Dentry particulars
ddd_hash and ddd_compare have to deal with extraordinary cases for msdos/vfat:
  case insensitive
  long and short filename pleasantries
ddd_revalidate -- can force new lookup if inode not in use:
  used for NFS/SMBfs aging
  used for Coda/AFS callbacks
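To see why a case insensitive filesystem needs its own hash and compare methods, consider this small sketch. The names `ci_hash` and `ci_compare` and the hash function are assumptions for illustration, it only handles ASCII case folding, and it ignores the long/short filename issue. The invariant that matters is that two names which compare equal must also hash to the same value, otherwise the cache would look in the wrong chain.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hash a name after folding each byte to lower case. */
unsigned long ci_hash(const char *name, size_t len)
{
    assert(name != NULL);
    unsigned long h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * 31ul + (unsigned long)tolower((unsigned char)name[i]);
    return h;
}

/* Return 0 if the two names are equal ignoring case, nonzero otherwise. */
int ci_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    if (alen != blen)
        return 1;
    for (size_t i = 0; i < alen; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 1;
    return 0;
}
```

With these helpers, "README.TXT" and "readme.txt" compare equal and land in the same hash chain, while a plain byte-wise hash would separate them.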

31 Style
“Dijkstra probably hates me” - Linus Torvalds

32 Memory mapping
vm_area structure has
  vm_operations
  inode, addresses etc.
vm_operations
  map, unmap
  swapin, swapout
  nopage -- read when page isn’t in VM
mmap
  calls on iii_readpage
  keeps a use count on the inode until unmap
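From user level, the behaviour described here is visible through the POSIX mmap call: pages are read in when first touched, and the mapping keeps the file usable until it is unmapped, even after the descriptor is closed. The helper name `count_byte_mmap` is made up for this sketch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how often byte c occurs in the file at path, reading it through
 * a private read-only mapping. Returns -1 on error. */
long count_byte_mmap(const char *path, char c)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat sb;
    if (fstat(fd, &sb) != 0) {
        close(fd);
        return -1;
    }
    if (sb.st_size == 0) {          /* a zero-length mapping is not allowed */
        close(fd);
        return 0;
    }

    size_t len = (size_t)sb.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    const char *p = map;
    long n = 0;
    for (size_t i = 0; i < len; i++)   /* touching a page faults it in */
        if (p[i] == c)
            n++;

    munmap(map, len);               /* releases the mapping's hold on the file */
    return n;
}
```

Closing the descriptor before reading is deliberate: it shows that the mapping, not the descriptor, is what keeps the file in use until `munmap`.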
