
Presentation transcript: "Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services"

1
Using Simulation to Explore Distributed Key-Value Stores for Extreme-Scale System Services

2
Extreme scale
Lack of decomposition for insight
Many services have centralized designs
Impacts of service architectures → an open question

3
Modular component design for composable services
Explore the design space for HPC services
Evaluate the impacts of different design choices

4
A taxonomy for classifying HPC system services
A simulation tool to explore distributed key-value store (KVS) design choices for large-scale system services
An evaluation of KVS design choices for extreme-scale systems using both synthetic and real workload traces

5
Introduction & Motivation | Key-Value Store Taxonomy | Key-Value Store Simulation | Evaluation | Conclusions & Future Work

6
Introduction & Motivation | Key-Value Store Taxonomy | Key-Value Store Simulation | Evaluation | Conclusions & Future Work

7
Job Launch, Resource Management Systems
System Monitoring
I/O Forwarding, File Systems
Function Call Shipping
Key-Value Stores

8
Scalability
Dynamicity
Fault Tolerance
Consistency

9
Large volume of data and state information
Distributed NoSQL data stores used as building blocks (a minimal sketch of this pattern follows below)
Examples:
- Resource management (job, node status info)
- Monitoring (system activity logs)
- File systems (metadata)
- SLURM++, MATRIX [1], FusionFS [2]
[1] K. Wang, I. Raicu. "Paving the Road to Exascale through Many Task Computing", Doctoral Showcase, IEEE/ACM Supercomputing 2012 (SC12)
[2] D. Zhao, I. Raicu. "Distributed File Systems for Exascale Computing", Doctoral Showcase, IEEE/ACM Supercomputing 2012 (SC12)
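A minimal sketch of that building-block pattern, with a plain Python dict standing in for a distributed KVS client; the key layout ("job/<id>", "node/<name>") and record fields are purely illustrative and not taken from SLURM++, MATRIX, or FusionFS:

```python
import json

kvs = {}  # a plain dict stands in for the distributed key-value store client

def put(key, value):
    kvs[key] = json.dumps(value)      # service state stored as serialized blobs

def get(key):
    return json.loads(kvs[key])

# Resource-management style state: job status and node status records.
put("job/1042", {"state": "RUNNING", "nodes": ["n17", "n18"]})
put("node/n17", {"state": "BUSY", "free_cores": 0})
print(get("job/1042")["state"])       # -> RUNNING
```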

10
Introduction & Motivation | Key-Value Store Taxonomy | Key-Value Store Simulation | Evaluation | Conclusions & Future Work

11
Decomposition
Categorization
Suggestion
Implication

12
Service model: functionality
Data model: distribution and management of data
Network model: dictates how the components are connected
Recovery model: how to deal with component failures
Consistency model: how rapidly data modifications propagate
(a sketch expressing these dimensions in code follows below)
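The five dimensions can be treated as independent axes along which a service design is classified. A minimal sketch of that idea; the enum values shown are only the ones named on these slides, and the example service name is hypothetical:

```python
from dataclasses import dataclass
from enum import Enum

class DataModel(Enum):
    CENTRALIZED = "centralized"
    DISTRIBUTED = "distributed with partition"

class NetworkModel(Enum):
    AGGREGATION_TREE = "aggregation tree"
    FULLY_CONNECTED = "fully connected"
    PARTIALLY_CONNECTED = "partially connected"

class RecoveryModel(Enum):
    FAIL_OVER = "fail-over"
    CONSECUTIVE_REPLICAS = "consecutive replicas"

class ConsistencyModel(Enum):
    STRONG = "strong"
    EVENTUAL = "eventual"

@dataclass
class ServiceDesign:
    service: str                 # service model: what the service does
    data: DataModel
    network: NetworkModel
    recovery: RecoveryModel
    consistency: ConsistencyModel

# The centralized design of slide 13, attached to a hypothetical service name.
centralized_example = ServiceDesign(
    service="job launch (hypothetical)",
    data=DataModel.CENTRALIZED,
    network=NetworkModel.AGGREGATION_TREE,
    recovery=RecoveryModel.FAIL_OVER,
    consistency=ConsistencyModel.STRONG,
)
```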

13
Data model: centralized
Network model: aggregation tree
Recovery model: fail-over
Consistency model: strong

14
Data model: distributed with partition
Network model: fully connected, partial knowledge
Recovery model: consecutive replicas
Consistency model: strong, eventual

             Voldemort          Pastry               ZHT
Data         distributed        distributed          distributed
Network      fully connected    partially connected  fully connected
Recovery     n-way replication  n-way replication    n-way replication
Consistency  eventual           strong               eventual

15
Introduction & Motivation | Key-Value Store Taxonomy | Key-Value Store Simulation | Evaluation | Conclusions & Future Work

16
Discrete event simulation
- PeerSim
- Evaluated others: OMNET++, OverSim, SimPy
Configurable number of servers and clients
Different architectures
Two parallel queues in a server (sketched below)
- Communication queue (send/receive requests)
- Processing queue (process requests locally)
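A minimal sketch of the two-queue server model, written with SimPy (one of the frameworks the authors evaluated; the actual simulator is built on PeerSim). The service times, arrival pattern, and request count are made up for illustration:

```python
import random
import simpy

COMM_TIME = 0.2   # time to receive or send one message (illustrative)
PROC_TIME = 1.0   # time to process one request locally (illustrative)

def handle_request(env, comm_q, proc_q, latencies):
    """One request: communication queue, processing queue, communication queue."""
    start = env.now
    with comm_q.request() as req:          # communication queue: receive the request
        yield req
        yield env.timeout(COMM_TIME)
    with proc_q.request() as req:          # processing queue: resolve the request locally
        yield req
        yield env.timeout(PROC_TIME)
    with comm_q.request() as req:          # communication queue: send the response back
        yield req
        yield env.timeout(COMM_TIME)
    latencies.append(env.now - start)

def workload(env, comm_q, proc_q, latencies, n_requests=100):
    for _ in range(n_requests):
        yield env.timeout(random.expovariate(1.0))   # Poisson-like arrivals (assumption)
        env.process(handle_request(env, comm_q, proc_q, latencies))

env = simpy.Environment()
comm_q = simpy.Resource(env, capacity=1)   # the server's communication queue
proc_q = simpy.Resource(env, capacity=1)   # the server's processing queue
latencies = []
env.process(workload(env, comm_q, proc_q, latencies))
env.run()
print(f"completed {len(latencies)} requests, mean latency {sum(latencies)/len(latencies):.2f}")
```

Because the two queues are separate resources, communication for one request can overlap with local processing of another, which is the point of modeling them as parallel queues.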

17
The time to resolve a query locally (t_LR) and the time to resolve a remote query (t_RR) are given by:
t_LR = CS + SR + LP + SS + CR
For a fully connected topology: t_RR = t_LR + 2 × (SS + SR)
For a partially connected topology: t_RR = t_LR + 2k × (SS + SR), where k is the number of hops to find the predecessor
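A small helper that evaluates these formulas. The reading of the abbreviations (CS = client send, SR = server receive, LP = local processing, SS = server send, CR = client receive) and the numeric values are assumptions for illustration, as the slide does not spell them out:

```python
def local_latency(cs, sr, lp, ss, cr):
    """t_LR = CS + SR + LP + SS + CR"""
    return cs + sr + lp + ss + cr

def remote_latency(t_lr, ss, sr, hops=1):
    """t_RR = t_LR + 2k * (SS + SR); k = 1 for a fully connected topology."""
    return t_lr + 2 * hops * (ss + sr)

# Example with made-up per-message costs (all in microseconds).
t_lr = local_latency(cs=5, sr=5, lp=20, ss=5, cr=5)     # 40
print(remote_latency(t_lr, ss=5, sr=5))                 # fully connected:         60
print(remote_latency(t_lr, ss=5, sr=5, hops=4))         # partially connected, k=4: 120
```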

18
Defines what to do when a node fails
Defines how a node's state is recovered when it rejoins after a failure (replica placement sketched below)
[Diagram: a ring of servers s0..s5 where each server's data is replicated on its next two successors (e.g., s1 holds r0,1 and s2 holds r0,2); when s0 fails, the event manager (EM) notifies the remaining servers, which re-replicate the data of the first and second replicas that went down; when s0 rejoins, it is notified, recovers its own data plus the s4 and s5 replicas it held, and the now-redundant copies are removed]
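A minimal sketch of the consecutive-replica placement shown in the diagram, assuming each server's data lives on its next live successors on the ring; the server and replica counts follow the diagram, the function name is made up:

```python
N_SERVERS = 6      # s0 .. s5, as in the diagram
N_REPLICAS = 2     # each server's data lives on its next two successors

def replica_holders(server, failed=frozenset()):
    """Return the servers holding replicas of `server`'s data, skipping failed nodes."""
    holders = []
    step = 1
    while len(holders) < N_REPLICAS and step < N_SERVERS:
        candidate = (server + step) % N_SERVERS
        if candidate not in failed:
            holders.append(candidate)
        step += 1
    return holders

print(replica_holders(0))               # [1, 2]: s1 holds r0,1 and s2 holds r0,2
# If s1 fails, s0's data must be re-replicated onto the next live successor:
print(replica_holders(0, failed={1}))   # [2, 3]
```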

19
Strong consistency
- Every replica observes every update in the same order
- Clients send requests to a dedicated server (the primary replica)
Eventual consistency
- Requests are sent to a randomly chosen replica (the coordinator)
- Three key parameters N, R, W, satisfying R + W > N (worked example below)
- Uses Dynamo-style version clocks [G. DeCandia, 2007] to track different versions of data and detect conflicts
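A worked example of the R + W > N rule: when it holds, every read quorum intersects every write quorum, so a read always contacts at least one replica that acknowledged the latest write. A brute-force check (parameter values are illustrative):

```python
from itertools import combinations

def quorums_overlap(n, r, w):
    """True if every read quorum of size r intersects every write quorum of size w."""
    replicas = range(n)
    return all(set(read) & set(write)
               for read in combinations(replicas, r)
               for write in combinations(replicas, w))

# Dynamo-style setting: N=3, R=2, W=2 satisfies R + W > N.
print(quorums_overlap(3, 2, 2))   # True  -> reads always see the newest acknowledged write
print(quorums_overlap(3, 1, 1))   # False -> R + W <= N, stale reads are possible
```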

20
Introduction & Motivation | Key-Value Store Taxonomy | Key-Value Store Simulation | Evaluation | Conclusions & Future Work

21
Evaluate the overheads
- Different architectures, with a focus on distributed ones
- Different models
Light-weight simulations
- Largest experiments: 25 GB RAM, 40 min walltime
Workloads
- Synthetic workload with a 64-bit key space (a generator sketch follows below)
- Real workload traces from 3 representative system services: job launch, system monitoring, and I/O forwarding
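A minimal sketch of a synthetic-workload generator over a 64-bit key space; the get/put mix, seed, and request count are assumptions, and the paper's actual generator may differ:

```python
import random

def synthetic_workload(n_requests, put_fraction=0.5, seed=0):
    """Yield (op, key) pairs with keys drawn uniformly from a 64-bit key space."""
    rng = random.Random(seed)
    for _ in range(n_requests):
        op = "put" if rng.random() < put_fraction else "get"
        key = rng.getrandbits(64)          # uniform 64-bit key
        yield op, key

for op, key in synthetic_workload(5):
    print(op, format(key, "016x"))
```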

22
Validation against ZHT [1] (left) and Voldemort (right)
ZHT: BG/P up to 8K nodes (32K cores)
Voldemort: PRObE Kodiak cluster up to 800 nodes
[1] T. Li, X. Zhou, K. Brandstatter, D. Zhao, K. Wang, A. Rajendran, Z. Zhang, I. Raicu. "ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table", IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2013

23
Partial connectivity → higher latency due to the additional routing
Fully connected topology → faster response (twice as fast at extreme scale)

24
Adding replicas always involves overhead
Replicas have a larger impact on fully connected topologies than on partially connected ones

25
Higher failure frequency introduces more overhead, but the dominant factor remains the client request-processing messages

26
Eventual consistency has more overhead than strong consistency

27
[Plots: fully connected and partially connected topologies]
For job launch and I/O forwarding, eventual consistency performs worse → both the request types and the keys are almost uniformly random (URD)
For monitoring, eventual consistency works better → all requests are "put"

28
ZHT (distributed key/value storage) → DKVS implementation
MATRIX (runtime system) → DKVS is used to keep task metadata
SLURM++ (job management system) → DKVS is used to store task & resource information
FusionFS (distributed file system) → DKVS is used to maintain file/directory metadata

29
Introduction & Motivation | Key-Value Store Taxonomy | Key-Value Store Simulation | Evaluation | Conclusions & Future Work

30
A taxonomy for classifying HPC system services
A simulation tool to explore KVS design choices for large-scale system services
An evaluation of KVS design choices for extreme-scale systems using both synthetic and real workload traces

31
Key-value stores are a building block
A service taxonomy is important
A simulation framework to study services
Distributed architectures are demanded
Replication adds overhead
Fully connected topology is good → as long as the request-processing messages dominate
Consistency tradeoffs
[Diagram: consistency spectrum (eventual, weak, strong) spanning write intensity/availability at one end and read intensity/performance at the other]

32
Extend the simulator to cover more of the taxonomy
Explore other recovery models
- log-based
- information dispersal algorithms
Explore other consistency models
Explore using DKVS in the development of:
- General building block library
- Distributed monitoring system service
- Distributed message queue system

33
DOE contract: DE-FC02-06ER25750
Part of NSF award: CNS-1042543 (PRObE)
Collaboration with the FusionFS project under NSF grant NSF-1054974
BG/P resources from ANL
Thanks to Tonglin Li, Dongfang Zhao, Hakan Akkan

34
More information: http://datasys.cs.iit.edu/~kewang/
Contact: kwang22@hawk.iit.edu
Questions?

35
Service simulation
- Peer-to-peer network simulation
- Telephony simulations
- Simulation of consistency
- Problem: these do not focus on HPC, or do not combine the distributed features
Taxonomy
- Investigations of distributed hash tables, and an algorithm taxonomy
- Grid computing workflow taxonomies
- Problem: none of them drive features in a simulation

