EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors
Junfeng Yang, Can Sar, Dawson Engler, Stanford University

Why check storage systems?
- Storage system errors are among the worst: kernel panics, data loss, and corruption.
- The code is complicated and hard to get right: it must simultaneously worry about speed, failures, and crashes.
- It is hard to comprehensively test for failures and crashes.
Goal: comprehensively check many storage systems with little work.

EXPLODE summary
- Comprehensive: uses ideas from model checking.
- Fast, easy:
  - Checking a storage system takes about 200 lines of C++ code.
  - Main requirement: one device driver. Runs on Linux and FreeBSD.
- General, real: checks live systems. Can run, can check, even without source code.
- Effective:
  - Checked 10 Linux file systems, 3 version control systems, Berkeley DB, Linux RAID, NFS, and VMware GSX 3.2/Linux.
  - Bugs in all of them, 36 in total, mostly data loss.
- Subsumes our earlier work FiSC [OSDI 2004].

Checking complicated stacks
- All real: a stack of storage systems with a user-written checker on top, and recovery tools run after each EXPLODE-simulated crash.
- Example stack: a subversion checker drives subversion (an open-source version control system) over an NFS client, an NFS server backed by a loopback mount on JFS, and software RAID1 across two checking disks.
- After a simulated crash, the recovery sequence is: fsck.jfs; mdadm --assemble --run --force --update=resync; mdadm -a to re-add the crashed disk; svnadmin recover; then the checker asks: is the recovered state ok?

Outline  Core idea  Checking interface  Implementation  Results  Related work, conclusion and future work

Core idea: explore all choices
- Bugs are often triggered by corner cases.
- How to find them: drive execution down to these tricky corner cases.
- When execution reaches a point where the program can do one of N different actions, fork execution: in the first child do the first action, in the second do the second, and so on.
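The N-way fork can be sketched at user level with fork(); EXPLODE itself rebuilds states by replaying recorded choices rather than forking processes (see the checkpoint slides later), so this block is purely illustrative:

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    // Conceptual N-way fork: across all resulting executions, choose(n)
    // returns each value 0..n-1 exactly once.
    static int choose(int n) {
        for (int k = 0; k < n - 1; k++) {
            pid_t pid = fork();
            if (pid == 0)
                return k;          // K'th child explores the K'th action
            waitpid(pid, 0, 0);    // parent waits, then tries the next action
        }
        return n - 1;              // the last alternative runs in the parent
    }

Calling choose(3) then yields three executions in which it returns 0, 1, and 2; nested calls enumerate all combinations of choices.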

External choices
- Example: creat /root/a in a directory tree that also contains b and c; other external operations include link, unlink, mkdir, rmdir, and so on.
- Fork and do every possible operation, then explore the generated states as well.
- Speed hack: hash states and discard any state already seen.
- Users write the code that checks; EXPLODE "amplifies" those checks.

Internal choices
- The same creat of /root/a can also hit internal decision points: a buffer-cache miss, kmalloc returning 0.
- Fork and explore the internal choices too.

How to expose choices
- To explore an N-choice point, users instrument the code using choose(N).
- choose(N): an N-way fork that returns K in the K'th child.
- We instrumented 7 kernel functions in Linux, e.g.:

    void *kmalloc(size_t s) {
        if (choose(2) == 0)
            return NULL;    // the "allocation failed" child
        …                   // normal memory allocation
    }

Crashes
- Dirty buffer-cache blocks can be written to disk in any order, and a crash can happen at any point.
- So: write out every subset of the dirty blocks, run fsck on each resulting disk, then run the check on the recovered FS.
- Users write the code that checks the recovered FS.

Outline  Core idea: explore all choices  Checking interface  What E X PLODE provides  What users do to check their storage system  Implementation  Results  Related work, conclusion and future work

What EXPLODE provides
- choose(N): a conceptual N-way fork that returns K in the K'th child execution.
- check_crashes_now(): check all crashes that can happen at the current moment. The paper has more methods for checking crashes.
- Users embed non-crash checks in their code; EXPLODE amplifies them.
- err(): record a trace for deterministic replay.
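A sketch of how these calls might sit in a checker; the declarations are paraphrased from this slide, not the real EXPLODE headers:

    #include <fcntl.h>
    #include <unistd.h>

    extern int  choose(int n);            // N-way fork; returns K in K'th child
    extern void check_crashes_now(void);  // check all crashes possible right now
    extern void err(const char *msg);     // record a trace for deterministic replay

    // A fragment that amplifies one durability check: once fsync succeeds,
    // the file must survive any crash from this moment on.
    void write_and_check(void) {
        int fd = open("/mnt/fs/f", O_CREAT | O_WRONLY, 0600);
        if (fd < 0) { err("creat failed"); return; }
        write(fd, "x", 1);
        fsync(fd);                // durability point
        check_crashes_now();      // explore every crash possible right here
        close(fd);
    }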

What users do
- Example: ext3 on RAID.
- Checker: drive ext3 to do something (mutate()), then verify that what ext3 did was correct (check()).
- Storage component: set up, repair, and tear down ext3 and RAID. Written once per system.
- Assemble a checking stack.

 FS Checker  mutate  ext3 Component  Stack choose(4) mkdirrmdirrm filecreat file …/ syncfsync

 FS Checker  check  ext3 Component  Stack Check file exists Check file contents match Found JFS fsync bug, caused by re- using directory inode as file inode Checkers can be simple (50 lines) or very complex(5,000 lines) Whatever you can express in C++, you can check

 FS Checker  ext3 Component  Stack  storage component: initialize, repair, set up, and tear down your system  threads(): returns list of kernel thread IDs for deterministic error replay  Wrappers to existing utilities  Write once per system  Real code on next slide

 FS Checker  ext3 Component  Stack

 FS Checker  ext3 Component  Stack  assemble a checking stack  Let E X PLODE know how subsystems are connected together, so it can initialize, set up, tear down, repair the entire stack  Real code on next slide

 FS Checker  ext3 Component  Stack

Outline  Core idea: explore all choices  Checking interface: 200 lines of C++ to check a system  Implementation  Checkpoint and restore states  Deterministic replay  Checking process  Checking crashes  Checking “soft” application crashes  Results  Related work, conclusion and future work

Recall: core idea
- "Fork" at each decision point to explore all choices.
- A state is a snapshot of the checked system.

How to checkpoint a live system?
- It is hard to checkpoint live kernel memory, and VM cloning is heavyweight.
- Instead, checkpoint by recording all choose() returns since the initial state S0.
- Restore by unmounting, restoring S0, and re-running the code, making the K'th choose() call return the K'th recorded value.
- A state is thus S = S0 + redo(2, 3): replay the recorded choices (here 2, then 3) from the snapshot.
- This is key to the EXPLODE approach.
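A minimal sketch of this trace-based checkpoint/restore, with pick_unexplored() standing in for the search policy and all names illustrative:

    #include <cstddef>
    #include <vector>

    extern int pick_unexplored(int n);   // exploration policy (assumed)

    static std::vector<int> trace;       // choices recorded since snapshot S0
    static size_t replay_pos;            // entries to replay when restoring

    int choose(int n) {
        if (replay_pos < trace.size())
            return trace[replay_pos++];  // restoring: K'th call gets K'th value
        int k = pick_unexplored(n);      // exploring: take a fresh branch
        trace.push_back(k);              // record it so this state can be rebuilt
        return k;
    }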

Deterministic replay
- Needed to recreate states and diagnose bugs.
- Sources of non-determinism:
  - Kernel choose() can be called by other code. Fix: filter by thread ID; no choose() in interrupt context.
  - Kernel threads. Opportunistic hack: control scheduling by setting priorities, which worked well. Locks can't be used: deadlock, e.g. A holds a lock, then yields to B.
- Other requirements are in the paper.
- Worst case: a non-repeatable error, which is automatically detected and ignored.

EXPLODE: putting it all together
[Architecture diagram: user code linked against EXPLODE at user level, talking to EKM, the EXPLODE device driver, in the kernel.]

Outline  Core idea: explore all choices  Checking interface: 200 lines of C++ to check a system  Implementation  Results  Lines of code  Errors found  Related work, conclusion and future work

EXPLODE core lines of code
Three kernels were patched: two versions of Linux and FreeBSD 6.0. The FreeBSD patch doesn't have all the functionality yet.

    Kernel patch, Linux:    1,915 lines (+ 2,194 generated)
    Kernel patch, FreeBSD:  1,210 lines
    User-level code:        6,323 lines

Checker lines of code and errors found
(Component and Checker columns are lines of code; "FS" means the file system checker was reused.)

    Storage system checked    Component   Checker   Bugs
    10 file systems           744         5,477     18
    Storage applications
      CVS                     27          68        1
      Subversion              (not ported yet)      1
      "EXPENSIVE"             …           …         …
      Berkeley DB             …           …         …
    Transparent subsystems
      RAID                    144         FS        …
      NFS                     34          FS        4
      VMware GSX/Linux        54          FS        1
    Total                     1,115       6,008     36

Outline  Core idea: explore all choices  Checking interface: 200 lines of C++ to check a system  Implementation  Results  Lines of code  Errors found  Related work, conclusion and future work

FS sync checking results
[Table: sync-checking results across the checked file systems; a mark indicates a failed check.]
- Applications rely on the sync operations, yet those operations are broken.

ext2 fsync bug
- Events that trigger the bug: truncate file A, creat file B, write B, fsync B, then crash, then fsck.ext2.
- In memory, truncating A frees its indirect block, which B's write then reuses; fsync B flushes B but not A's truncation, so after the crash the on-disk A and B share the block, and fsck.ext2's repair corrupts B.
- The bug is fundamental: it is due to ext2's asynchrony.
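The triggering sequence, written out as the system calls a checker's mutate() might issue; paths and sizes are illustrative, and the crash itself is simulated by EXPLODE:

    #include <fcntl.h>
    #include <unistd.h>

    void trigger(void) {
        static char buf[64 * 1024];       // large enough to need an indirect block
        // A is a pre-existing file large enough to own an indirect block.
        truncate("A", 0);                 // frees A's indirect block, in memory only
        int fd = open("B", O_CREAT | O_WRONLY, 0600);
        write(fd, buf, sizeof buf);       // B reuses A's just-freed indirect block
        fsync(fd);                        // flushes B, but not A's truncation
        close(fd);
        // simulated crash here; on disk A and B now share the block, and
        // fsck.ext2's repair leaves B corrupted
    }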

Classic app mistake: "atomic" rename
- Goal: atomically update file A to avoid corruption.
- Problem: rename guarantees nothing about the data.

    fd = creat(A_tmp, …);
    write(fd, …);
    close(fd);
    rename(A_tmp, A);

- The fix: add fsync(fd) before close(fd), so the new data is on disk before the rename publishes it.
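For completeness, the corrected pattern as a self-contained sketch (error handling omitted):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    void atomic_update(const char *data, size_t len) {
        int fd = open("A_tmp", O_CREAT | O_TRUNC | O_WRONLY, 0644);
        write(fd, data, len);
        fsync(fd);                 // force the data down before publishing
        close(fd);
        rename("A_tmp", "A");      // now the switch to the new version is safe
    }

A fully robust version would also fsync the directory containing A after the rename, a detail beyond this slide's point.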

Outline  Core idea: explore all choices  Checking interface: 200 lines of C++ to check a system  Implementation  Results: checked many systems, found many bugs  Related work, conclusion and future work

Related work
- FS testing
- IRON
- Static analysis
- Traditional software model checking
- Theorem proving
- Other techniques

Conclusion and future work
- EXPLODE is:
  - Easy: needs only one device driver and a simple user interface.
  - General: can run and check systems even without source code.
  - Effective: checked many systems and found 36 bugs.
- Future work:
  - Work closely with storage system implementers to check more systems and more properties.
  - Smarter search, automatic diagnosis, and automatically inferring "choice points".
- The approach is general, applicable to distributed systems, secure systems, and more.