File System Numbers 4/18/2002 Michael Ferguson


1 File System Numbers 4/18/2002 Michael Ferguson mpf7@cornell.edu

2 Why?
- Make trace studies of filesystems to
  - Inform development
  - See trends in file system usage
- Ask these questions
  - How do people actually use filesystems?
  - What do they store and how do they access their data?
  - What caching strategies are best?
- Filesystem statistics have wider implications
  - Network activity may depend on these filesystem statistics (think of a web server)

3 What data do we gather?
- User activity – e.g. number of users, amount of data transferred
- File access patterns – e.g. was the file read sequentially from start to finish?
- File lifetimes – e.g. what percentage of files exist for less than a second?
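As an illustration of the file-lifetime question, here is a minimal sketch of how such a percentage could be computed from a trace. The `LifetimeEvent` record and its fields are hypothetical and chosen for the example; none of the studies' actual trace formats are reproduced here.

```python
from dataclasses import dataclass


@dataclass
class LifetimeEvent:
    """Hypothetical trace record, not the format of any of the studies."""
    time: float    # seconds since the start of the trace
    op: str        # "create" or "delete"
    file_id: int   # identifier of the file the event refers to


def fraction_short_lived(events, threshold=1.0):
    """Return the fraction of deleted files that lived less than `threshold` seconds.

    Files that are never deleted within the trace are ignored, since their
    lifetime is unknown.
    """
    created = {}
    lifetimes = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.op == "create":
            created[ev.file_id] = ev.time
        elif ev.op == "delete" and ev.file_id in created:
            lifetimes.append(ev.time - created.pop(ev.file_id))
    if not lifetimes:
        return 0.0
    return sum(1 for t in lifetimes if t < threshold) / len(lifetimes)
```

Ignoring files that outlive the trace is one possible choice; it biases the result toward short lifetimes, so a real study would need to state how it handles them.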

4 File System Trace Studies
- BSD numbers from 1985 (Ousterhout & others)
- Sprite numbers from 1991 (Ousterhout & others)
- Windows NT numbers from 1999 (Vogels)

5 The BSD Study - 1985
- Local BSD 4.2 filesystem on three VAX-11/780s
  - Ucbarpa – used by graduate students for program development and document formatting – 4 MB of memory
  - Ucbernie – used by grad students and by administration – 8 MB of memory
  - Ucbcad – used to run CAD programs for EE – 16 MB of memory
- Average file access rate is only a few hundred bytes/sec/user
- 75% of files are open for less than ½ second
- Many files only exist for a few seconds
- File accesses tend to be sequential
- Most file accesses are to short files but most bytes transferred are from large ones
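The last point, that most accesses go to short files while most bytes come from large ones, is a matter of weighting the same data by count or by bytes. A minimal sketch with made-up numbers (the function name, the input format, and the cutoff are assumptions for illustration, not figures from the study):

```python
def share_from_small_files(accesses, size_cutoff):
    """Return (share of accesses, share of bytes) attributable to small files.

    `accesses` is a list of (file_size_bytes, bytes_transferred) pairs.
    A file counts as small when its size is at most `size_cutoff` bytes.
    """
    if not accesses:
        raise ValueError("need at least one access")
    total_bytes = sum(transferred for _, transferred in accesses)
    if total_bytes == 0:
        raise ValueError("no bytes were transferred")
    small = [(size, transferred) for size, transferred in accesses
             if size <= size_cutoff]
    access_share = len(small) / len(accesses)
    byte_share = sum(transferred for _, transferred in small) / total_bytes
    return access_share, byte_share


# Invented example: nine whole-file reads of 1 KB files, one of a 1 MB file.
example = [(1_000, 1_000)] * 9 + [(1_000_000, 1_000_000)]
access_share, byte_share = share_from_small_files(example, size_cutoff=10_000)
```

In this invented example the small files account for 90% of the accesses but under 1% of the bytes, which is the shape of the result the slide describes.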

6 Sprite Overview
- Network-oriented OS
- File system servers and diskless workstations
- Supports process migration

7 Sprite Study - Environment
- 40 10-MIPS workstations running Sprite
- 4 are file servers
- Memory averages 24 MB per workstation
- Pmake commonly used to migrate processes and make use of idle workstations

8 Sprite Users
- ~ ¼ OS researchers
- ~ ¼ Architecture researchers designing and simulating I/O subsystems
- ~ ¼ Researchers studying VLSI design and parallel processing
- ~ ¼ Administrators, graphics researchers, and other people

9 Sprite – Measurement Approach
- Instrumented kernels on file servers
  - Kernel records a trace of activity (open, close, delete, lseek, etc., but not read or write)
  - Kernel gives the log to a user process, which records it in a file
  - Can deduce the exact range of bytes accessed
  - lseek was modified to call the file server
  - Removed trace-file records and tape backup records
- Total statistics are gathered in-kernel
- I’ll talk about results in comparison with Windows
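The byte ranges can be deduced without read or write records because reads and writes only move the file offset forward: the bytes between the offset after one traced event and the offset before the next must have been transferred. A minimal sketch of that reasoning, assuming a hypothetical record layout in which open and close carry the current offset and lseek carries the offset before and after the seek (not the actual Sprite trace format):

```python
def accessed_ranges(records):
    """Deduce the byte ranges touched during one open/close session.

    `records` is a time-ordered list of tuples in a hypothetical layout:
        ("open", offset)
        ("lseek", offset_before, offset_after)
        ("close", offset)
    Returns a list of (start, end) ranges with `end` exclusive.
    """
    ranges = []
    run_start = None  # offset where the current sequential run began
    for rec in records:
        kind = rec[0]
        if kind == "open":
            run_start = rec[1]
            continue
        if run_start is None:
            raise ValueError(f"{kind} record seen before open")
        if kind == "lseek":
            _, before, after = rec
            if before > run_start:
                ranges.append((run_start, before))  # run ended by the seek
            run_start = after
        elif kind == "close":
            if rec[1] > run_start:
                ranges.append((run_start, rec[1]))  # run ended by the close
            run_start = None
        else:
            raise ValueError(f"unknown record kind: {kind}")
    return ranges
```

For example, `accessed_ranges([("open", 0), ("lseek", 4096, 8192), ("close", 10000)])` yields two sequential runs, `(0, 4096)` and `(8192, 10000)`. Whether a run was a read or a write would have to come from other information, such as the mode given at open.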

10 Windows NT Measurements
- 1998 – used 45 Windows NT 4 systems
- Systems are used by one person at a time
- Statistics are gathered with
  - File system snapshots
  - A transparent filter device driver for tracing

11 Windows trace summary

12 User Activity Comparison

13 File Access Pattern Comparison

14 File Lifetimes
- Windows NT
- Sprite

15 Sequential Runs - Comparison
- Windows NT
- Sprite

16 File Size Distribution - Comparison
- Windows NT
- Sprite

17 File Open Times - Comparison
- Windows NT
- Sprite

18 Windows NT interesting notes
- Time between sequential reads and writes differs – 90 microseconds for reads, 30 microseconds for writes
- 74% of sessions opened files for control – not for reading or writing
  - A common operation checks whether or not the volume is mounted

19 Statistical Gotcha!
- Request arrivals in the Windows NT trace do not follow a Poisson process – they are better modeled by a Pareto distribution

20 Open requests vs. Poisson Process

21 What does it mean?
- There is extreme variance at all time scales
- Mean and variance of the request distribution do not stabilize over time!
- Other components have heavy-tailed distributions as well:
  - Process lifetime
  - Number of DLLs accessed
  - Number of files open per process
  - Spacing of file accesses
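To get a feel for the difference, the sketch below draws inter-arrival times from an exponential distribution (the spacing a Poisson process would produce) and from a Pareto distribution, then compares how far the largest sample sits above the mean. The shape parameter `alpha=1.2`, the sample count, and the seed are assumptions chosen for illustration; they are not fitted to the Windows NT trace.

```python
import random


def peak_to_mean(samples):
    """Ratio of the largest sample to the sample mean."""
    return max(samples) / (sum(samples) / len(samples))


def compare_tails(n=100_000, alpha=1.2, seed=0):
    """Return (exponential ratio, Pareto ratio) for n synthetic inter-arrival times.

    With alpha < 2 the Pareto distribution has infinite variance, so a few
    very large gaps dominate the sample no matter how long it runs.
    """
    rng = random.Random(seed)
    exponential = [rng.expovariate(1.0) for _ in range(n)]
    pareto = [rng.paretovariate(alpha) for _ in range(n)]
    return peak_to_mean(exponential), peak_to_mean(pareto)


exp_ratio, pareto_ratio = compare_tails()
print(f"exponential peak/mean: {exp_ratio:.1f}")
print(f"pareto peak/mean:      {pareto_ratio:.1f}")
```

For the exponential samples the largest gap stays within a modest multiple of the mean, while the Pareto samples contain gaps that are orders of magnitude larger, which is why averages computed over such a trace keep shifting as more data arrives.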

22 File Size Distribution
- File sizes are not normally distributed!

23 Bottom Line – WinNT traces
- Although all systems were interactive and used by a single person at a time
  - 92% of file system operations were from processes that have no direct user input
  - Even explorer.exe’s behavior does not come directly from the user
  - “It is the structure and content of the filesystem that determines explorer’s file system interactions, not the user requests.”

24 Summary
- We’ve followed several statistics through Sprite and Windows NT measurements
  - Network filesystems are still feasible but
  - Access is quite bursty
  - Most accesses are for controlling files
- But beware! Several statistical assumptions about filesystems seem to be just plain wrong

25 Summary

