April 6, 2016ASPLOS 2016Atlanta, Georgia. Yaron Weinsberg IBM Research Idit Keidar Technion Hagar Porat Technion Eran Harpaz Technion Noam Shalev Technion.

Slides:

Advertisements

Similar presentations

Chapter 4 Memory Management Page Replacement 补充：什么叫页面抖动？

Advertisements

Two phase commit. Failures in a distributed system Consistency requires agreement among multiple servers –Is transaction X committed? –Have all servers.

Chapter 6: Process Synchronization

5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.

WHAT IS AN OPERATING SYSTEM? An interface between users and hardware - an environment "architecture ” Allows convenient usage; hides the tedious stuff.

G Robert Grimm New York University Virtual Memory.

Review: Chapters 1 – Chapter 1: OS is a layer between user and hardware to make life easier for user and use hardware efficiently Control program.

PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.

1 MetaTM/TxLinux: Transactional Memory For An Operating System Hany E. Ramadan, Christopher J. Rossbach, Donald E. Porter and Owen S. Hofmann Presenter:

Contiki A Lightweight and Flexible Operating System for Tiny Networked Sensors Presented by: Jeremy Schiff.

Two phase commit. What we’ve learnt so far Sequential consistency –All nodes agree on a total order of ops on a single object Crash recovery –An operation.

G Robert Grimm New York University Disco.

Scheduler Activations Effective Kernel Support for the User-Level Management of Parallelism.

OPERATING SYSTEM OVERVIEW

Remus: High Availability via Asynchronous Virtual Machine Replication.

Backup and Recovery Part 1.

Scheduler Activations Jeff Chase. Threads in a Process Threads are useful at user-level – Parallelism, hide I/O latency, interactivity Option A (early.

Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.

Highly Available ACID Memory Vijayshankar Raman. Introduction §Why ACID memory? l non-database apps: want updates to critical data to be atomic and persistent.

Virtual Memory.

CPU Scheduling - Multicore. Reading Silberschatz et al: Chapter 5.5.

Introduction to Embedded Systems

Zen and the Art of Virtualization Paul Barham, et al. University of Cambridge, Microsoft Research Cambridge Published by ACM SOSP’03 Presented by Tina.

Microkernels, virtualization, exokernels Tutorial 1 – CSC469.

Protection and the Kernel: Mode, Space, and Context.

1 Fault Tolerance in the Nonstop Cyclone System By Scott Chan Robert Jardine Presented by Phuc Nguyen.

Operating System 4 THREADS, SMP AND MICROKERNELS

CSE 451: Operating Systems Section 10 Project 3 wrap-up, final exam review.

Cosc 4740 Chapter 6, Part 3 Process Synchronization.

Chapter 41 Processes Chapter 4. 2 Processes  Multiprogramming operating systems are built around the concept of process (also called task).  A process.

Scheduler Activations: Effective Kernel Support for the User- Level Management of Parallelism. Thomas E. Anderson, Brian N. Bershad, Edward D. Lazowska,

Lecture 3 Process Concepts. What is a Process? A process is the dynamic execution context of an executing program. Several processes may run concurrently,

Operating Systems CSE 411 Multi-processor Operating Systems Multi-processor Operating Systems Dec Lecture 30 Instructor: Bhuvan Urgaonkar.

Fall 2013 SILICON VALLEY UNIVERSITY CONFIDENTIAL 1 Introduction to Embedded Systems Dr. Jerry Shiao, Silicon Valley University.

Journaled Component Files John Scholes and Richard Smith 13 October, 2008 Or – How to never see FILE DAMAGED again!

4300 Lines Added 1800 Lines Removed 1500 Lines Modified PER DAY DURING SUSE Lab.

System Components ● There are three main protected modules of the System  The Hardware Abstraction Layer ● A virtual machine to configure all devices.

Storage Systems CSE 598d, Spring 2007 Rethink the Sync April 3, 2007 Mark Johnson.

Lecture 26 Virtual Machine Monitors. Virtual Machines Goal: run an guest OS over an host OS Who has done this? Why might it be useful? Examples: Vmware,

Amanda Johnson Hannah Young Josh Taylor Rich Carroll Troy Gladhill Saunders Roesser.

Kernel Synchronization in Linux Uni-processor and Multi-processor Environment By Kathryn Bean and Wafa’ Jaffal (Group A3)

CGS 3763 Operating Systems Concepts Spring 2013 Dan C. Marinescu Office: HEC 304 Office hours: M-Wd 11: :30 AM.

Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.

Architectural Features of Transactional Memory Designs for an Operating System Chris Rossbach, Hany Ramadan, Don Porter Advanced Computer Architecture.

Presented by: Sagnik Bhattacharya Kingshuk Govil, Dan Teodosiu, Yongjang Huang, Mendel Rosenblum.

Interrupts and Interrupt Handling David Ferry, Chris Gill CSE 522S - Advanced Operating Systems Washington University in St. Louis St. Louis, MO

Running Commodity Operating Systems on Scalable Multiprocessors Edouard Bugnion, Scott Devine and Mendel Rosenblum Presentation by Mark Smith.

Mutual Exclusion -- Addendum. Mutual Exclusion in Critical Sections.

1.1 Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 1: Introduction What Operating Systems Do √ Computer-System Organization.

CSCE451/851 Introduction to Operating Systems

REAL-TIME OPERATING SYSTEMS

Memory Management Paging (continued) Segmentation

Protection and OS Structure

Chapter 9: Virtual Memory

Chapter 1: Introduction

Modeling Page Replacement Algorithms

Operating System Structure

CS490 Windows Internals Quiz 2 09/27/2013.

Intro to Processes CSSE 332 Operating Systems

KERNEL ARCHITECTURE.

Module: Handling Exceptions

Introduction to Operating Systems

Chapter 3: Windows7 Part 2.

Chapter 9: Virtual-Memory Management

Memory Management Paging (continued) Segmentation

Chapter 3: Windows7 Part 2.

Modeling Page Replacement Algorithms

Memory Management Paging (continued) Segmentation

CSE 153 Design of Operating Systems Winter 2019

CSE 542: Operating Systems

Presentation transcript:

April 6, 2016ASPLOS 2016Atlanta, Georgia. Yaron Weinsberg IBM Research Idit Keidar Technion Hagar Porat Technion Eran Harpaz Technion Noam Shalev Technion

 Technology scaling ◦ Many-core architecture is here ◦ Machines with a thousand cores are subject to research [ TERAFLUX, 2014 ] 2

 Technology scaling  Nano scale phenomena  Hardware reliability decreases [Radetzki et al., 2013]  Faults more likely 3 Technology Node (nm) Relative Failure Rate

 Core failures can no longer be ruled out [Srinivasan et al. 2004] 4 Less Reliability More Cores

 What happens today? 5

 A strategy for overcoming Core Surprise Removal (CSR) ◦ Objective – keep the system alive following a core fault ◦ Easily integrate into existing operating systems 6

 A strategy for overcoming Core Surprise Removal (CSR) ◦ Objective – keep the system alive following a core fault ◦ Easily integrate into existing operating systems  Implementation in the Linux kernel.  Use Hardware Transactional Memory to cope with failures in critical kernel code  Provide a proof of concept on a real system. 7

 Chip Multi-Processor System ◦ Reliable shared memory  Fault-prone cores  Reliable Failure Detection Unit (FDU) [Weis et al.,2012] ◦ Halts execution of the faulty core ◦ Flush L1 upon failure detection ◦ Reports to OS. 8

 Fail-Stop Model ◦ Faulty core stops executing from some point onward ◦ Registers and buffers are unavailable ◦ L1 Cache data is flushed upon failure [Giorgi et al., 2014]. 9 Core L1 L2 Cache L3 Cache Reliable Shared Memory Core L1 L2 Cache Core L1 L2 Cache Core L1 L2 Cache

 Flag as faulty ◦ Treat it as offline, and never Hot-plug it again  Reset interrupt affinities ◦ Handle lost interrupts, migrate IPI queue  Migrate tasklets, work-queues  Update kernel services ◦ RCU subsystem, performance events, etc.  Terminate the running process ◦ Free its resources  Migrate processes. 10 OS dependent

 Flag as faulty  Reset interrupt affinities  Migrate tasklets, work-queues  Update kernel services  Terminate the running process  Migrate processes. 11 What about cascading failures?

Close Task Reset Interrupts Migrate Tasklets Mark Faulty Migrate Workqueues Migrate Processes Update Services

Close Task Reset Interrupts Migrate Tasklets Mark Faulty Migrate Workqueues Migrate Processes Update Services

Recovery WorkqueueTasklet Queue Close Task Reset Interrupts Migrate Tasklets Mark Faulty Migrate Workqueues Migrate Processes Update Services Queue Work

15 Recovery WorkqueueTasklet Queue Close Task Reset Interrupts Migrate Tasklets Mark Faulty Migrate Workqueues Migrate Processes Update Services Queue Work Recovery Ops Verify Visibility Inform FDU Queue Tasklets Resume FDU Triggered Ack

16 Migrate Tasklets  Use tasklets and work-queues to execute the recovery process  In a cascading failure case: ◦ FDU chooses a new core ◦ The third tasklet migrates the remaining operations. Mark as faulty Reset Interrupts Queue Work Tasklets Queue Close Task Migrate Workqueues Update Kernel Services Migrate Tasks Recovery Workqueue Executing core Any Core Queue Tasklets Verify Visibility Inform FDU Execute Tasklets Executing core

17  Designed to integrate into commodity operating systems  No overhead when the system is correct  Tolerates cascading failures  Scalable Recovery guarantees?

But… How? 18

19  Modified QEMU ◦ Crashes a random core at random time ◦ Distinguish between idle, user and kernel mode  Run different workloads ◦ Postmark, Metis and SPEC CPU2006 benchmarks  Recovery validation ◦ By creating a file and flushing it to the disk using sync

20 100%  Idle mode success rate: 100% 100%  User mode success rate: 100%  Meaning that the system is protected ALL the time, except for….  Kernel mode  Well… It’s complicated.

21 Core#0Core#1 Core#2Core#3  Fault during critical kernel section execution ◦ Deadlock ◦ Cannot kill kernel space ◦ Reclaim lock by keeping ownership?  No – inconsistent data.

22

23 System crashes always happen due to a held lock

24  Solution: Use Hardware Transactional Memory to execute kernel critical sections ◦ TxLinux [Rossbach et al. SOSP 07’]  For reliability purposes ◦ Does not use locks  Prevent deadlocks ◦ Execute atomically  Prevent inconsistent data

 A strategy for overcoming Core Surprise Removal (CSR) ◦ Objective – keep the system alive following a core fault ◦ Easily integrate into existing operating systems  Implementation in the Linux kernel  Use Hardware Transactional Memory to cope with failures in critical kernel code  Provide a proof of concept on a real system. 25

26  Replace scheduler locks with lock elision code  TSX is a best effort HTM ◦ Transactions are not guaranteed to commit  Retry ◦ Not all instructions can commit transactionally  Resort to regular locking ◦ Too large sections  Split Energy Saving Performance Gain Commit RateWorkload 4%-100%Idle 1%0%99.9%16-threads 3% 99.9%32-threads 2%4%99.8%64-threads

27 But again… How?

interrupts_disable(); //unresponsive If (fault_injection()==smp_processor_id()) while(TRUE); //”stops” executing 28  Crash simulation on a real system ◦ Executed in kernel mode

29 64-core server, only 0-15 are presented. 10 tasks are affined to each core. Failure is detected Core #13 has no tasks Tasks migrated to core #0 Load is balanced

30 Initial correct state cloud settingAfter a crash, original kernel Crash! Real Time: 7:58 Real Time: 8:00

31 Initial correct state cloud settingAfter a crash, CSR on host Crash! Alive Real Time: 8:00 Real Time: 7:58

32

33