ZHT A Fast, Reliable and Scalable Zero-hop Distributed Hash Table


ZHT A Fast, Reliable and Scalable Zero-hop Distributed Hash Table Tonglin Li, Xiaobing Zhou, Kevin Brandstatter, Dongfang Zhao, Ke Wang, Zhao Zhang, Ioan Raicu Illinois Institute of Technology, Chicago, U.S.A

"A supercomputer is a device for turning compute-bound problems into I/O-bound problems." (Ken Batcher) This quote highlights the fact that the bottleneck of all large-scale computing systems is I/O.

Big problem: file system scalability
- Parallel file systems (GPFS, PVFS, Lustre): compute resources separated from storage; centralized metadata management
- Distributed file systems (GFS, HDFS): special-purpose designs (MapReduce, etc.)

The bottleneck of file systems
- Metadata
- Concurrent file creates

Proposed work
- A distributed hash table (DHT) for high-end computing (HEC)
- A building block for high-performance distributed systems
- Goals: performance (latency and throughput), scalability, reliability

Related work: distributed hash tables
Many DHTs exist: Chord, Kademlia, Pastry, Cassandra, C-MPI, Memcached, Dynamo, ... Why another?

Name       Impl.   Routing time   Persistence   Dynamic membership   Append operation
Cassandra  Java    log(N)         Yes           Yes                  No
C-MPI      C       log(N)         No
Dynamo             0 to log(N)    Yes           Yes                  No
Memcached  C       0              No            No
ZHT        C++     0 to 2         Yes           Yes                  Yes

Zero-hop hash mapping

2-layer hashing

Architecture and terms
- Name space: 2^64
- Physical node
- Manager
- ZHT instance
- Partitions: a fixed number n, where n = max(k), the largest number of nodes the system is expected to scale to
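The fixed partition universe is what makes zero-hop, two-layer addressing possible. Below is a minimal sketch of that mapping in C++, assuming made-up NodeAddr/Membership types and using std::hash as a stand-in for ZHT's hash function; it is illustrative, not the actual ZHT implementation.

```cpp
// Zero-hop, two-layer addressing sketch: a key is hashed into a fixed universe
// of n partitions; a locally cached membership table maps each partition to the
// physical node that currently hosts it, so a request needs no routing hops.
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct NodeAddr { std::string host; int port; };

struct Membership {
    uint64_t num_partitions;               // fixed at n = max(k), never changes
    std::vector<NodeAddr> partition_owner; // index: partition id -> owning node

    uint64_t partition_of(const std::string& key) const {
        // layer 1: key -> partition id
        return std::hash<std::string>{}(key) % num_partitions;
    }
    const NodeAddr& node_of(const std::string& key) const {
        // layer 2: partition id -> physical node, a single local table lookup
        return partition_owner[partition_of(key)];
    }
};
```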

How many partitions per node can we run?

Membership management
- Static: Memcached, ZHT
- Dynamic:
  - Logarithmic routing: most DHTs
  - Constant routing: ZHT

Membership management: handling changes
- Updating the membership table: incremental broadcasting
- Remapping key-value pairs:
  - Traditional DHTs: rehash all affected pairs
  - ZHT: move whole partitions (HEC has a fast local network!)
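A rough sketch of the partition-level remapping idea, using made-up Node/Partition structures rather than ZHT's real data structures: because the key-to-partition hash never changes, a membership change only moves whole partitions and rewrites their owner-table entries.

```cpp
// Membership-change sketch: a node join/leave moves whole partitions; keys are
// never rehashed, and only the affected partition->node entries are updated
// (ZHT would then broadcast each such change incrementally).
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Partition = std::map<std::string, std::string>; // one key-value partition

struct Node {
    int id;
    std::map<uint64_t, Partition> partitions; // partitions hosted locally
};

// Move partition `pid` from `from` to `to`, then record the new owner.
void migrate_partition(uint64_t pid, Node& from, Node& to,
                       std::vector<int>& partition_owner) {
    to.partitions[pid] = std::move(from.partitions[pid]); // bulk data transfer
    from.partitions.erase(pid);
    partition_owner[pid] = to.id; // the only table entry that changes
}
```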

Consistency
- Updating membership tables:
  - Planned node joins and leaves: strong consistency
  - Node failures: eventual consistency
- Updating replicas (configurable):
  - Strong consistency: consistent and reliable
  - Eventual consistency: fast and highly available
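As a rough illustration of the configurable replica-update modes (send_to_replica is a hypothetical stub, not the ZHT API): strong consistency waits for every replica to acknowledge, while eventual consistency returns once the primary copy is written.

```cpp
// Configurable replica-update consistency sketch (illustrative only).
#include <cstddef>
#include <string>
#include <vector>

enum class Consistency { Strong, Eventual };

// Stub standing in for the real RPC to one replica; returns true on ack.
bool send_to_replica(int /*replica*/, const std::string& /*key*/,
                     const std::string& /*value*/) { return true; }

bool put(const std::vector<int>& replicas, const std::string& key,
         const std::string& value, Consistency mode) {
    bool ok = send_to_replica(replicas.front(), key, value); // primary copy first
    if (mode == Consistency::Eventual)
        return ok; // fast path: remaining replicas would catch up asynchronously (omitted)
    for (std::size_t i = 1; i < replicas.size(); ++i)
        ok = send_to_replica(replicas[i], key, value) && ok;  // wait for every ack
    return ok;
}
```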

Persistence: NoVoHT
- NoVoHT: a persistent in-memory hash map
- Append operation
- Live migration
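A minimal sketch of a NoVoHT-like persistent in-memory hash map with an append operation; this is illustrative only and not NoVoHT's actual design. Mutations are logged to local disk before the in-memory map is updated, and append extends an existing value instead of overwriting it.

```cpp
// Persistent in-memory map sketch: log the mutation, then apply it in memory.
#include <fstream>
#include <string>
#include <unordered_map>

class PersistentMap {
  std::unordered_map<std::string, std::string> map_;
  std::ofstream log_; // append-only log file on local disk
public:
  explicit PersistentMap(const std::string& path)
      : log_(path, std::ios::app) {}

  void put(const std::string& k, const std::string& v) {
    log_ << "PUT " << k << ' ' << v << '\n';   // persist first
    log_.flush();
    map_[k] = v;                               // then update memory
  }
  void append(const std::string& k, const std::string& v) {
    log_ << "APPEND " << k << ' ' << v << '\n';
    log_.flush();
    map_[k] += v;                              // extend the value in place
  }
  bool get(const std::string& k, std::string* out) const {
    auto it = map_.find(k);
    if (it == map_.end()) return false;
    *out = it->second;
    return true;
  }
};
```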

Failure handling
- Insert and append: send the request to the next replica, and mark that record as the primary copy
- Lookup: get the value from the next available replica
- Remove: mark the record on all replicas
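A sketch of the client-side failover pattern described above, with hypothetical try_insert/try_lookup stubs standing in for the real RPCs: if the node holding the primary copy does not respond, the request falls through to the next replica in the list.

```cpp
// Failover sketch: walk the replica list until one node answers.
#include <optional>
#include <string>
#include <vector>

struct Replica { std::string host; int port; };

// Stubs standing in for single-try RPCs; they report failure here so the
// sketch compiles standalone (a real client would perform network calls).
bool try_insert(const Replica&, const std::string&, const std::string&) { return false; }
std::optional<std::string> try_lookup(const Replica&, const std::string&) { return std::nullopt; }

bool insert_with_failover(const std::vector<Replica>& replicas,
                          const std::string& key, const std::string& value) {
    for (const Replica& r : replicas)  // primary first, then the backups
        if (try_insert(r, key, value))
            return true;               // the accepting replica now holds the primary copy
    return false;                      // all replicas unreachable
}

std::optional<std::string> lookup_with_failover(
        const std::vector<Replica>& replicas, const std::string& key) {
    for (const Replica& r : replicas)
        if (auto v = try_lookup(r, key))  // first available replica answers
            return v;
    return std::nullopt;
}
```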

Evaluation: test beds
- IBM Blue Gene/P supercomputer: up to 8192 nodes, 32768 ZHT instances deployed
- Commodity cluster: up to 64 nodes
- Amazon EC2: m1.medium and cc2.8xlarge instances, 96 VMs, 768 ZHT instances deployed

Latency on BG/P

Latency distribution

Scale (nodes)   75%    90%    95%    99%
64              713    853    961    1259
256             755    933    1097   1848
1024            820    1053   1289   3105

Throughput on BG/P

Aggregated throughput on BG/P

Latency on commodity cluster

ZHT on cloud: latency

ZHT on cloud: latency distribution
[Figure: latency distribution for ZHT, DynamoDB read, and DynamoDB write, 4 to 64 nodes]

ZHT on cc2.8xlarge instances, 8 server-client pairs per instance:
Scale   75%   90%   95%   99%    Avg   Throughput
8       186   199   214   260    172   46421
32      509   603   681   1114   426   75080
128     588   717   844   2071   542   236065
512     574   708   865   3568   608   841040

DynamoDB, 8 clients per instance:
Scale   75%     90%     95%     99%     Avg     Throughput
8       11942   13794   20491   35358   12169   83.39
32      10081   11324   12448   34173   9515    3363.11
128     10735   12128   16091   37009   11104   11527
512     9942    13664   30960   38077   28488   Error

ZHT on cloud: throughput

Amortized cost

Applications
- FusionFS: a distributed file system; metadata is stored in ZHT
- IStore: an information dispersal storage system
- MATRIX: a distributed many-task computing execution framework; ZHT is used to submit tasks and monitor task execution status

FusionFS result: Concurrent File Creates

IStore results

MATRIX results

Future work
- Larger scale
- Active failure detection and notification
- Spanning-tree communication
- Network topology-aware routing
- Fully synchronized replicas and membership: Paxos protocol
- Support for more protocols (UDT, MPI, …)
- Many optimizations

Conclusion: ZHT is a distributed key-value store that is
- Light-weight
- High performance
- Scalable
- Dynamic
- Fault tolerant
- Versatile: works on clusters, clouds, and supercomputers

Questions? Tonglin Li tli13@hawk.iit.edu http://datasys.cs.iit.edu/projects/ZHT/