Modeling Billion-Node Torus Networks Using Massively Parallel Discrete-Event Simulation
Ning Liu, Christopher Carothers


1 Modeling Billion-Node Torus Networks Using Massively Parallel Discrete-Event Simulation
Ning Liu, Christopher Carothers
liun2@cs.rpi.edu

2 Outline
- Background
- Torus model, traffic model
- BG/L
- ROSS: Massively Parallel Simulator
- Experiment results
- Future work

3 Background
CODES: Enabling Co-Design of Multilayer Exascale Storage Architectures
CODES goal: develop a simulation framework for evaluating exascale storage design challenges.
- Hardware models
- Storage software models
- Storage system architecture
- Exascale I/O workload models
- Simulation framework: integrate models and storage software into the simulation framework

4 Torus Network
- The Blue Gene and Cray XT supercomputer families adopt a 3-D torus network.
- The upcoming Blue Gene/Q will have a 5-D torus network.
- Torus networks provide low latency and high bandwidth at a moderate construction cost.
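
To make the wraparound topology concrete, here is a minimal sketch of neighbor lookup and minimal hop count on an n-dimensional torus. The function names and the 8x8x8 example size are assumptions for illustration; they are not taken from the slides or from the authors' model.

```python
def torus_neighbors(coord, dims):
    """Return the neighbors of `coord` on a torus with side lengths `dims`.

    Each axis contributes a -1 and a +1 neighbor, with wraparound.
    (For an axis of length 2 both steps reach the same node.)
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(nxt))
    return neighbors


def torus_hops(src, dst, dims):
    """Minimal hop count: per axis, take the shorter way around the ring."""
    return sum(
        min(abs(s - d), size - abs(s - d))
        for s, d, size in zip(src, dst, dims)
    )


dims = (8, 8, 8)  # hypothetical 3-D torus size
corner_neighbors = torus_neighbors((0, 0, 0), dims)
far_distance = torus_hops((0, 0, 0), (7, 4, 1), dims)
```

The same two functions work unchanged for a 5-D torus by passing five-element tuples.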

5 Torus Traffic and Routing
Traffic: Markovian model
- Each node continuously generates packets
- Each packet selects a random destination
- Packet size is fixed
Dynamic routing vs. static routing
- Avoid deadlocks
- BG/L eager/rendezvous protocols
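
The traffic model above can be sketched as follows. This is an illustration only: it assumes "Markovian" means exponentially distributed inter-arrival times, that destinations are drawn uniformly from all other nodes, and a 256-byte packet size. The function name, parameters, and these choices are assumptions, not details given in the slides.

```python
import random


def generate_packets(node_id, num_nodes, rate, horizon, packet_bytes=256, seed=0):
    """Generate (time, src, dst, size) tuples for one node up to `horizon`.

    rate         -- mean packets per unit time (exponential inter-arrival, assumed)
    packet_bytes -- fixed packet size (256 is a hypothetical value)
    """
    rng = random.Random(seed)
    now = 0.0
    packets = []
    while True:
        now += rng.expovariate(rate)
        if now > horizon:
            break
        # Uniform destination over all nodes except the sender (assumed).
        dst = rng.randrange(num_nodes - 1)
        if dst >= node_id:
            dst += 1
        packets.append((now, node_id, dst, packet_bytes))
    return packets


trace = generate_packets(node_id=3, num_nodes=512, rate=10.0, horizon=5.0)
```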

6 Discrete Event Model
Logical process (LP): node
Events:
- Packet_generate_event
- Packet_send_event
- Packet_arrival_event
- Packet_process_event
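
One plausible way the four event types chain together is sketched below as a small sequential simulator on a ring (a 1-D torus). The slide only names the events, so the chaining (generate, then send, arrival, and process at every hop), the delay constants, and the shortest-direction static routing are all assumptions made for illustration.

```python
import heapq

HOP_DELAY = 1.0      # hypothetical link traversal time
PROCESS_DELAY = 0.5  # hypothetical per-node handling time


def next_hop(node, dst, ring_size):
    """Static shortest-direction routing on a ring (a 1-D torus)."""
    forward = (dst - node) % ring_size
    step = 1 if forward <= ring_size - forward else -1
    return (node + step) % ring_size


def simulate(ring_size, packets):
    """packets: iterable of (time, src, dst) with src != dst.

    Returns a list of (delivery_time, src, dst) in delivery order.
    """
    pending = []  # min-heap ordered by (timestamp, sequence number)
    seq = 0

    def schedule(time, kind, node, pkt):
        nonlocal seq
        heapq.heappush(pending, (time, seq, kind, node, pkt))
        seq += 1

    for time, src, dst in packets:
        schedule(time, "generate", src, (src, dst))

    delivered = []
    while pending:
        now, _, kind, node, pkt = heapq.heappop(pending)
        src, dst = pkt
        if kind == "generate":    # new packet is handed to the sender
            schedule(now, "send", node, pkt)
        elif kind == "send":      # packet leaves for the next hop
            schedule(now + HOP_DELAY, "arrival", next_hop(node, dst, ring_size), pkt)
        elif kind == "arrival":   # packet reaches a node and waits for handling
            schedule(now + PROCESS_DELAY, "process", node, pkt)
        elif kind == "process":   # deliver here, or forward toward dst
            if node == dst:
                delivered.append((now, src, dst))
            else:
                schedule(now, "send", node, pkt)
    return delivered


deliveries = simulate(8, [(0.0, 0, 3), (0.0, 0, 6)])
```

In a parallel simulator each node would be its own logical process and `schedule` would send a timestamped message to the LP that owns the target node; here a single heap plays that role.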

7 Simulation Testbed: BG/L
- 32-bit IBM PowerPC processors running at only 700 MHz
- 1 GB memory per node
- 1,024 dual-processor nodes per rack
- 16 racks, 32,768 processors, located at Rensselaer’s Computational Center for Nanotechnology Innovations (CCNI)
- Confusion? We are simulating a BG/L torus on top of BG/L

8 ROSS: Parallel Simulator
- Serial, conservative, and optimistic simulation
- Uses Jefferson’s Time Warp event scheduling mechanism
- Reverse computation
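
Reverse computation means that an optimistic simulator undoes a mis-speculated event by running an inverse handler instead of restoring a saved copy of the state. The sketch below illustrates the idea only; it is not the ROSS API, and the state fields, handler names, and event layout are hypothetical. Integer counters are used so that each reverse step undoes its forward step exactly.

```python
class NodeState:
    def __init__(self):
        self.packets_seen = 0
        self.total_latency_ns = 0


def forward(state, event):
    """Forward handler: apply the event to the LP state."""
    state.packets_seen += 1
    state.total_latency_ns += event["latency_ns"]


def reverse(state, event):
    """Reverse handler: undo exactly what forward() did for this event."""
    state.total_latency_ns -= event["latency_ns"]
    state.packets_seen -= 1


def rollback(state, processed, to_time):
    """Undo, newest first, every processed event with timestamp > to_time."""
    while processed and processed[-1]["time"] > to_time:
        reverse(state, processed.pop())


state = NodeState()
processed = []
for ev in (
    {"time": 1, "latency_ns": 40},
    {"time": 2, "latency_ns": 55},
    {"time": 5, "latency_ns": 70},
):
    forward(state, ev)
    processed.append(ev)

# A late ("straggler") event forces the LP back to time 1.
rollback(state, processed, to_time=1)
```

After the rollback the state is what it was after the first event alone, and the two undone events can be re-executed in the correct order.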

9 Validation Using Little’s Law
Little's Law: the average number of customers in the store, L, equals the effective arrival rate, λ, times the average time a customer spends in the store, W. That is, L = λW.
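
A minimal sketch of how such a check can be done on a packet trace: compute L as the time-averaged number of packets in the system, λ as packets per unit time, and W as the mean time in system, then compare L with λW. The function name and the sample timestamps are made up for illustration. Over a window that starts and ends with an empty system, the identity holds exactly.

```python
def littles_law_terms(arrivals, departures):
    """Return (L, lam, W) for paired arrival/departure timestamps.

    The observation window runs from the first arrival to the last
    departure, so the system is empty at both ends.
    """
    start, end = min(arrivals), max(departures)
    span = end - start

    # Sweep over all events, integrating the number in system over time.
    events = sorted([(t, +1) for t in arrivals] + [(t, -1) for t in departures])
    area, in_system, prev = 0.0, 0, start
    for t, delta in events:
        area += in_system * (t - prev)
        in_system += delta
        prev = t

    L = area / span                      # time-averaged number in system
    lam = len(arrivals) / span           # effective arrival rate
    W = sum(d - a for a, d in zip(arrivals, departures)) / len(arrivals)
    return L, lam, W


L, lam, W = littles_law_terms([0.0, 1.0, 2.0], [3.0, 4.0, 6.0])
```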

10 Validation Using Little’s Law

11 Latency Comparison: BG/L vs. Simulation
- Measured using MPI_Send()/MPI_Recv()
- Data collected from a 1,024-node torus in a 1x32x32 node configuration

12 Performance Metrics
The performance study examines the impact of processor/core count on four primary metrics: (i) committed event rate, (ii) percentage of remote events, (iii) efficiency, and (iv) secondary rollbacks.
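
The slide does not define the four metrics, so the sketch below uses commonly used Time Warp conventions as assumptions: committed events are processed events minus rolled-back events, efficiency is one minus the ratio of rolled-back to committed events, and the remote percentage is taken relative to committed events. The function name, counter names, and sample values are hypothetical.

```python
def time_warp_metrics(processed, rolled_back, remote, secondary_rollbacks, wall_seconds):
    """Derive the four metrics from raw run counters (definitions assumed)."""
    committed = processed - rolled_back
    return {
        # (i) committed events per second of wall-clock time
        "committed_event_rate": committed / wall_seconds,
        # (ii) share of events sent to a different processor
        "remote_event_pct": 100.0 * remote / committed,
        # (iii) 100% means no work was wasted on rollbacks
        "efficiency_pct": 100.0 * (1.0 - rolled_back / committed),
        # (iv) rollbacks triggered by cancelled (anti-)messages, reported as a count
        "secondary_rollbacks": secondary_rollbacks,
    }


metrics = time_warp_metrics(
    processed=1100, rolled_back=100, remote=250,
    secondary_rollbacks=7, wall_seconds=2.0,
)
```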

13 Million-Node Torus Scalability
- Packet injection rate: 10 pkt/ms
- Peak event rate: 4.78 G events/sec

14 Efficiency

15 Remote Event Rate
Random destination selection creates a difficult scenario for parallel event scheduling, because the destination of every packet is chosen at random.

16 Secondary Rollback Rate

17 Billion-Node Torus Scalability
- Consumes 2 TB of memory
- Total number of generated packets is O(10^11)
- Total number of events scheduled is O(10^13)
- Packet injection rates: 200 pkt/ns and 400 pkt/ns
- Higher rollback probability
- Larger event population leads to increased queuing overheads
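
A back-of-the-envelope reading of these figures, as a sketch: it assumes exactly 10^9 nodes, decimal terabytes, and treats the O(...) values as if they were exact counts, none of which the slide states. The results are order-of-magnitude estimates only.

```python
NODES = 10**9                 # "billion-node", taken as exactly 1e9 (assumed)
TOTAL_MEMORY_BYTES = 2 * 10**12   # 2 TB, decimal units assumed
PACKETS = 10**11              # O(10^11) generated packets, treated as exact
EVENTS = 10**13               # O(10^13) scheduled events, treated as exact

bytes_per_node = TOTAL_MEMORY_BYTES // NODES   # memory budget per simulated node
events_per_packet = EVENTS // PACKETS          # events scheduled per packet
packets_per_node = PACKETS // NODES            # packets generated per node

print(f"~{bytes_per_node} bytes of simulator memory per node")
print(f"~{events_per_packet} events per packet, ~{packets_per_node} packets per node")
```

Under these assumptions each simulated node has only about 2 KB of state and event memory available to it.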

18 Billion-Node Torus Scalability

19 Billion-Node Torus Scalability

20 Future work
- Application workload models: application I/O kernel models, I/O characterization models
- I/O aggregator node models
- I/O network models: network cards, switches, and topologies
- I/O storage node models: storage software
- I/O storage software: models and prototype system software
- I/O controller models: RAID and enterprise storage devices
- Disk models: HDDs and SSDs

21 Future work
Increase the fidelity of the torus network model
- Dynamic routing
- Virtual channels
- Different torus traffic models
Tree network model based on the Blue Gene families
- MPI_Alltoall(), MPI_Bcast(), MPI_Reduce()
- Complex I/O workload drivers, such as PHASTA

22 Related Work
- Heidelberger’s use of the YAWNS protocol to model the Blue Gene/L torus network on a per-cycle basis appears to be one of the most accurate models created to date.
- Min and Ould-Khaoua proposed a torus network model based on circuit switching.

23 Conclusions
- Near-linear speedup for our torus model
- Peak event rate on 32K cores is 4.78 G events/sec
- Demonstrated the ability to model million-node and billion-node torus networks on Blue Gene/L
- Conducted comparison tests between the actual Blue Gene torus network and our model using MPI_Send()/MPI_Recv()

24 Thank you for your attention! Questions?

