Presentation is loading. Please wait.

Presentation is loading. Please wait.

Eitan Frachtenberg MIT, 20-Sep-2004 1 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Designing Parallel Operating Systems using.

Similar presentations


Presentation on theme: "Eitan Frachtenberg MIT, 20-Sep-2004 1 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Designing Parallel Operating Systems using."— Presentation transcript:

1 Eitan Frachtenberg MIT, 20-Sep-2004 1 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Designing Parallel Operating Systems using Modern Interconnects Eitan Frachtenberg (eitanf@lanl.gov) With Fabrizio Petrini, Juan Fernandez, Dror Feitelson, Jose-Carlos Sancho, Kei Davis Computer and Computational Sciences Division Los Alamos National Laboratory Ideas that change the world

2 Eitan Frachtenberg MIT, 20-Sep-2004 2 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Cluster Supercomputers n Growing in prevalence and performance, 7 out of 10 top supercomputers n Running parallel applications n Advanced, high-end interconnects

3 Eitan Frachtenberg MIT, 20-Sep-2004 3 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Distributed vs. Parallel Distributed and parallel applications (including operating systems) may be distinguished by their use of global and collective operations n Distributed—local information, relatively small number of point-to-point messages n Parallel—global synchronization: barriers, reductions, exchanges

4 Eitan Frachtenberg MIT, 20-Sep-2004 4 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 System Software Components Job Scheduling Fault Tolerance Parallel I/O Communication Library Resource Management System Software System Software

5 Eitan Frachtenberg MIT, 20-Sep-2004 5 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Problems with System Software Independent single-node OS (e.g. Linux) connected by distributed dæmons: u Redundant components u Performance hits u Scalability issues u Load balancing issues

6 Eitan Frachtenberg MIT, 20-Sep-2004 6 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 OS’s Collective Operations Many OS tasks are inherently global or collective operations: n Job launching, data dissemination n Context switching n Job termination (normal and forced) n Load balancing

7 Eitan Frachtenberg MIT, 20-Sep-2004 7 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Local Operating System Resource Management Parallel I/O Fault Tolerance Job Scheduling User-Level Communication Local Operating System Resource Management Parallel I/O Fault Tolerance Job Scheduling User-Level Communication Node 1 Node 2 Global Parallel Operating System Job SchedulingFault ToleranceCommunicationParallel I/OResource Mgmt

8 Eitan Frachtenberg MIT, 20-Sep-2004 8 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 The Vision n Modern interconnects are very powerful u collective operations u programmable NICs u on-board RAM n Use a small set of network mechanisms as parallel OS infrastructure n Build upon this infrastructure to create unified system software n System software Inherits scalability and performance from network features

9 Eitan Frachtenberg MIT, 20-Sep-2004 9 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Example: ASCI Q Barrier [HotI’03]

10 Eitan Frachtenberg MIT, 20-Sep-2004 10 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Parallel OS Primitives n System software built atop three primitives u Xfer-And-Signal F Transfer block of data to a set of nodes F Optionally signal local/remote event upon completion u Compare-And-Write F Compare global variable on a set of nodes F Optionally write global variable on the same set of nodes u Test-Event F Poll local event

11 Eitan Frachtenberg MIT, 20-Sep-2004 11 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Core Primitives on QsNet n System software built atop three primitives u Xfer-And-Signal (QsNet): F Node S transfers block of data to nodes D 1, D 2, D 3 and D 4 F Events triggered at source and destinations SD1D1 D2D2 D4D4 D3D3 Source Event Destination Events

12 Eitan Frachtenberg MIT, 20-Sep-2004 12 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Core Primitives (cont.) n System software built atop three primitives u Compare-And-Write (QsNet): F Node S compares variable V on nodes D 1, D 2, D 3 and D 4 S D1D1 D2D2 D4D4 D3D3 Is V { ,  , >} to Value?

13 Eitan Frachtenberg MIT, 20-Sep-2004 13 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Core Primitives (cont.) n System software built atop three primitives u Compare-And-Write (QsNet): F Node S compares variable V on nodes D 1, D 2, D 3 and D 4 F Partial results are combined in the switches S D1D1 D2D2 D4D4 D3D3

14 Eitan Frachtenberg MIT, 20-Sep-2004 14 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 System Software Components Job Scheduling Fault Tolerance Parallel I/O Communication Library ResourceManagement System Software System Software

15 Eitan Frachtenberg MIT, 20-Sep-2004 15 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 - Inherits scalability from network primitives: - Data dissemination and coordination - Interactive job launching speeds - Context-switching at milliseconds level - Described in [SC’02] Scalable Tool for Resource Management

16 Eitan Frachtenberg MIT, 20-Sep-2004 16 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 State of the Art in Resource Management Resource managers (e.g. PBS, LSF, RMS, LoadLeveler, Maui) are typically implemented using u TCP/IP—favors portability over performance, u Poorly-scaling algorithms for the distribution/collection of data and control messages u Favoring development time over performance Scalable performance not important for small clusters but crucial for large ones. There exists a need for fast and scalable resource management.

17 Eitan Frachtenberg MIT, 20-Sep-2004 17 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Experimental Setup n 64 nodes/256 processors ES40 Alphaserver cluster n 2 independent network rails of Quadrics Elan3 n Files are placed in ramdisk in order to avoid I/O bottlenecks and expose the performance of the resource management algorithms

18 Eitan Frachtenberg MIT, 20-Sep-2004 18 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Launch Times (Unloaded System) The launch time is constant when we increase the number of processors. STORM is highly scalable

19 Eitan Frachtenberg MIT, 20-Sep-2004 19 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Launch Times (Loaded System, 12 MB) Worst case: 1.5  seconds to launch a 12 MB file on 256 processors

20 Eitan Frachtenberg MIT, 20-Sep-2004 20 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Measured and Estimated Launch Times The model shows that in an ES40-based Alphaserver a 12MB binary can be launched in 135ms on 16,384 nodes

21 Eitan Frachtenberg MIT, 20-Sep-2004 21 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Comparative Evaluation (Measured & Modeled)

22 Eitan Frachtenberg MIT, 20-Sep-2004 22 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 System Software ComponentsJobScheduling Fault Tolerance Parallel I/O Communication Library Resource Management System Software System Software

23 Eitan Frachtenberg MIT, 20-Sep-2004 23 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Job Scheduling n Controls the allocation of space and time resources to jobs n HPC apps have special requirements u Multiple processing and network resources u Synchronization ( < 1ms granularity) u Potentially memory hogs with little locality n Has significant effect on throughput, responsiveness, and utilization

24 Eitan Frachtenberg MIT, 20-Sep-2004 24 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 First-Come-First-Serve (FCFS)

25 Eitan Frachtenberg MIT, 20-Sep-2004 25 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Gang Scheduling (GS)

26 Eitan Frachtenberg MIT, 20-Sep-2004 26 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Implicit CoScheduling

27 Eitan Frachtenberg MIT, 20-Sep-2004 27 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Hybrid Methods n Combine global synchronization & local information n Rely on scalable primitives for global coordination and information exchange n First implementation of two novel algorithms: u Flexible CoScheduling (FCS) u Buffered CoScheduling (BCS)

28 Eitan Frachtenberg MIT, 20-Sep-2004 28 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Flexible CoScheduling (FCS) n Measure communication characteristics, such as granularity and wait times n Classify processes based on synchronization requirements n Schedule processes based on class n Described in [IPDPS’03]

29 Eitan Frachtenberg MIT, 20-Sep-2004 29 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 FCS Classification Granularity Block times Fine Coarse Short Long CS Always gang-scheduled F Preferably gang-scheduled DC Locally scheduled

30 Eitan Frachtenberg MIT, 20-Sep-2004 30 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Methodology n Synthetic, controllable MPI programs n Workload u Static: all jobs start together u Dynamic: different sizes, arrival and run times n Various schedulers implemented: u FCFS, GS, FCS, SB (ICS), BCS n Emulation vs. simulation u Actual implementation takes into account all the overhead and factors of a real system

31 Eitan Frachtenberg MIT, 20-Sep-2004 31 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Hardware Environment n Environment ported to three architectures and clusters: u Crescendo: 32x2 Pentium III, 1GB u Accelerando: 32x2 Itanium II, 2GB u Wolverine: 64x4 Alpha ES40, 8GB

32 Eitan Frachtenberg MIT, 20-Sep-2004 32 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Synthetic Application n Bulk synchronous, 3ms basic granularity n Can control: granularity, variability and Communication pattern

33 Eitan Frachtenberg MIT, 20-Sep-2004 33 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Synthetic Scenarios Balanced Complementing Imbalanced Mixed

34 Eitan Frachtenberg MIT, 20-Sep-2004 34 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Turnaround Time

35 Eitan Frachtenberg MIT, 20-Sep-2004 35 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Dynamic Workloads [JSSPP’03] n Static workloads are simple and offer insights, but are not realistic n Most real-life workloads are more complex n Users submit jobs dynamically, of varying time and space requirements

36 Eitan Frachtenberg MIT, 20-Sep-2004 36 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Dynamic Workload Methodology n Emulation using a workload model [Lublin03] n 1000 jobs, approx. 12 days, shrunk to 2 hrs n Varying load by factoring arrival times n Using same synthetic application, with random: u Arrival time, run time, and size, based on model u Granularity (fine, medium, coarse) u communication pattern (ring, barrier, none) n Recent study with scientific apps (yet unpublished)

37 Eitan Frachtenberg MIT, 20-Sep-2004 37 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Load – Response Time

38 Eitan Frachtenberg MIT, 20-Sep-2004 38 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Load – Bounded Slowdown

39 Eitan Frachtenberg MIT, 20-Sep-2004 39 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Timeslice – Response Time

40 Eitan Frachtenberg MIT, 20-Sep-2004 40 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 System Software Components Job Scheduling Fault Tolerance Parallel I/O CommunicationLibrary Resource Management System Software System Software

41 Eitan Frachtenberg MIT, 20-Sep-2004 41 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Buffered CoScheduling (BCS) n Buffer all communications n Exchange information about pending communication every time slice n Schedule and execute communication n Implemented mostly on the NIC n Requires fine-grained heartbeats n Described in [SC’03]

42 Eitan Frachtenberg MIT, 20-Sep-2004 42 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Design and Implementation n Global synchronization u Strobe sent at regular intervals (time slices) F Compare-And-Write + Xfer-And-Signal (Master) F Test-Event (Slaves) u All system activities are tightly coupled n Global Scheduling u Exchange of communication requirements F Xfer-And-Signal + Test-Event u Communication scheduling u Real transmission F Xfer-And-Signal + Test-Event

43 Eitan Frachtenberg MIT, 20-Sep-2004 43 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Design and Implementation n Implementation in the NIC u Application processes interact with NIC threads F MPI primitive  Descriptor posted to the NIC F Communications are buffered u Cooperative threads running in the NIC F Synchronize F Partial exchange of control information F Schedule communications F Perform real transmissions and reduce computations u Comp/comm completely overlapped

44 Eitan Frachtenberg MIT, 20-Sep-2004 44 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Design and Implementation n Non-blocking primitives: MPI_Isend/Irecv

45 Eitan Frachtenberg MIT, 20-Sep-2004 45 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Design and Implementation n Blocking primitives: MPI_Send/Recv

46 Eitan Frachtenberg MIT, 20-Sep-2004 46 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Performance Evaluation n BCS MPI vs. Quadrics MPI u Experimental Setup F Benchmarks and Applications NPB (IS,EP,MG,CG,LU) - Class C SWEEP3D - 50x50x50 SAGE - timing.input F Scheduling parameters 500μs communication scheduling time slice (1 rail) 250μs communication scheduling time slice (2 rails)

47 Eitan Frachtenberg MIT, 20-Sep-2004 47 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Performance Evaluation n Benchmarks and Applications (C) ApplicationSlowdown IS (32PEs)10.40% EP (49PEs) 5.35% MG (32PEs) 4.37% CG (32PEs)10.83% LU (32PEs)15.04% SWEEP3D(49PEs) -2.23% SAGE (62PEs) -0.42%

48 Eitan Frachtenberg MIT, 20-Sep-2004 48 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Performance Evaluation n SAGE - timing.input (IA32) 0.5% SPEEDUP

49 Eitan Frachtenberg MIT, 20-Sep-2004 49 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Blocking Communication Blocking vs. Non-blocking SWEEP3D (IA32) MPI_Send/Recv  MPI_Isend/Irecv + MPI_Waitall

50 Eitan Frachtenberg MIT, 20-Sep-2004 50 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 System Software Components Job Scheduling FaultTolerance Parallel I/O Communication Library Resource Management System Software System Software

51 Eitan Frachtenberg MIT, 20-Sep-2004 51 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Fault Tolerance Today Fault tolerance is commonly achieved, if at all, by n Checkpointing n Segmentation of the machine n Removal of fault-prone components Massive hardware redundancy is not considered economically feasible

52 Eitan Frachtenberg MIT, 20-Sep-2004 52 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Our Approach to Fault Tolerance n Recent work shows that scalable, system-level fault-tolerance is within reach with current technology, with low overhead, can be achieved through a global operating system n Two results provide the basis for this claim 1. Buffered CoScheduling that enforces frequent, global recovery lines and global control 2. Feasibility of incremental checkpoint

53 Eitan Frachtenberg MIT, 20-Sep-2004 53 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Checkpointing and Recovery n Simplicity F Easy implementation n Cost-effective F No additional hardware support Critical aspect: Bandwidth requirements Saving process state

54 Eitan Frachtenberg MIT, 20-Sep-2004 54 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Reducing Bandwidth n Incremental checkpointing F Only the memory modified from the previous checkpoint is saved to stable storage Full Process state Incremental

55 Eitan Frachtenberg MIT, 20-Sep-2004 55 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Enabling Automatic Checkpointing Low User intervention Checkpoint data Low Hardware Operating system Run-time library Application High automatic

56 Eitan Frachtenberg MIT, 20-Sep-2004 56 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 The Bandwidth Challenge Does the current technology provide enough bandwidth? Frequent Automatic

57 Eitan Frachtenberg MIT, 20-Sep-2004 57 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Methodology n Quantifying the Bandwidth Requirements F Checkpoint intervals: 1s to 20s F Comparing with the current bandwidth available 900 MB/s 75 MB/s Sustained network bandwidth Quadrics QsNet II Single sustained disk bandwidth Ultra SCSI controller

58 Eitan Frachtenberg MIT, 20-Sep-2004 58 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Memory Footprint Sage-1000MB954.6MB Sage-500MB497.3MB Sage-100MB103.7MB Sage-50MB55MB Sweep3D105.5MB SP Class C40.1MB LU Class C16.6MB BT Class C76.5MB FT Class C118MB Increasing memory footprint 64 Itanium II processors

59 Eitan Frachtenberg MIT, 20-Sep-2004 59 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Bandwidth Requirements Bandwidth (MB/s) Timeslices (s) 78.8MB/ s 12.1MB/ s Decreases with the timeslices Sage-1000MB

60 Eitan Frachtenberg MIT, 20-Sep-2004 60 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Bandwidth Requirements for 1 second Increases with memory footprint Single SCSI disk performance Most demanding

61 Eitan Frachtenberg MIT, 20-Sep-2004 61 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Increasing Memory Footprint Size Average Bandwidth (MB/s) Timeslices (s) Increases sublinearly

62 Eitan Frachtenberg MIT, 20-Sep-2004 62 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Increasing Processor Count Average Bandwidth (MB/s) Timeslices (s) Decreases slightly with processor count Weak-scaling

63 Eitan Frachtenberg MIT, 20-Sep-2004 63 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Technological Trends Performance of applications bounded by memory improvements Increases at a faster pace Performance Improvement per year

64 Eitan Frachtenberg MIT, 20-Sep-2004 64 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Conclusions n As clusters grow, interconnection technology advances: u Better bandwidth and latency u On-board programmable processor, RAM u Hardware support for collective operations Allows the development of common system infrastructure that is a parallel program in itself

65 Eitan Frachtenberg MIT, 20-Sep-2004 65 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Conclusions (cont.) n On top of infrastructure we built: u Scalable resource management (STORM) u Novel job scheduling algorithms u Simplified system design and communication library u Possible basis for transparent fault tolerance

66 Eitan Frachtenberg MIT, 20-Sep-2004 66 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Conclusions (cont.) n Experimental performance evaluation demonstrates: u Scalable interactive job launching and context- switching u Multiprogramming parallel jobs is feasible u Adaptive scheduling algorithms adjust to different job requirements, improving response times and slowdown in various workloads u Transparent, frequent checkpoint within current reach

67 Eitan Frachtenberg MIT, 20-Sep-2004 67 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 References Eitan’s web page http://www.cs.huji.ac.il/~etcs/pubs/ Fabrizio’s web page http://www.c3.lanl.gov/~fabrizio/publications.html PAL team web page: http://www.c3.lanl.gov/par_arch/Publications.html

68 Eitan Frachtenberg MIT, 20-Sep-2004 68 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Resource Overlapping

69 Eitan Frachtenberg MIT, 20-Sep-2004 69 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Turnaround Time

70 Eitan Frachtenberg MIT, 20-Sep-2004 70 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Response Time

71 Eitan Frachtenberg MIT, 20-Sep-2004 71 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Timeslice – Bounded Slowdown

72 Eitan Frachtenberg MIT, 20-Sep-2004 72 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 FCFS vs. GS and MPL

73 Eitan Frachtenberg MIT, 20-Sep-2004 73 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 FCFS vs. GS and MPL (2)

74 Eitan Frachtenberg MIT, 20-Sep-2004 74 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Backfilling n Backfilling is a technique to move jobs forward in queue n Can be combined with time-sharing schedulers such as GS when all timeslots are full

75 Eitan Frachtenberg MIT, 20-Sep-2004 75 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Backfilling n Backfilling is a technique to move jobs forward in queue n Can be combined with time-sharing schedulers such as GS when all timeslots are full

76 Eitan Frachtenberg MIT, 20-Sep-2004 76 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Effect of Backfilling

77 Eitan Frachtenberg MIT, 20-Sep-2004 77 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Characterization Data initialization Regular processing bursts Sage-1000MB

78 Eitan Frachtenberg MIT, 20-Sep-2004 78 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Communication Interleaved Sage-1000MB Regular communication bursts


Download ppt "Eitan Frachtenberg MIT, 20-Sep-2004 1 PAL Designing Parallel Operating Systems using Modern Interconnects CCS-3 Designing Parallel Operating Systems using."

Similar presentations


Ads by Google