Overview on ZHT

Presentation transcript:

Overview on ZHT

- General terms
- Overview of NoSQL databases and key-value stores
- Introduction to ZHT
- CS554 projects

- Relational databases
  - Query with SQL
  - DB2, MySQL, Oracle, SQL Server
  - CS 425, 525
- NoSQL databases
  - Loose consistency model
  - Simpler design
  - High performance
  - Distributed design

- Key-value stores
  - ZHT, Dynamo, Memcached, Cassandra, Chord
- Document-oriented databases
  - MongoDB, Couchbase
- Graph databases
  - Neo4j, Allegro, Virtuoso

- Another name for Distributed Hash Table


- Updating membership tables
  - Planned node joins and leaves: strong consistency
  - Node failures: eventual consistency
- Updating replicas
  - Configurable
  - Strong consistency: consistent, reliable
  - Eventual consistency: fast, available
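The two replica-update modes above can be contrasted with a toy sketch. It assumes a single primary whose replicas are plain in-process dictionaries; the class `ReplicatedStore` and its method names are hypothetical and are not taken from ZHT's code.

```python
from collections import deque

class ReplicatedStore:
    """Toy primary with in-process replicas; dicts stand in for remote nodes."""

    def __init__(self, num_replicas=2):
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = deque()  # updates not yet applied to the replicas

    def put(self, key, value, strong=True):
        self.primary[key] = value
        if strong:
            # Drain queued updates first so an older value cannot
            # overwrite this one later, then update every replica
            # before returning to the caller.
            self.flush()
            for replica in self.replicas:
                replica[key] = value
        else:
            # Eventual: return immediately; replicas catch up on flush().
            self.pending.append((key, value))

    def flush(self):
        """Apply queued updates in order, as a background task might."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value

    def read_replica(self, index, key):
        return self.replicas[index].get(key)

store = ReplicatedStore()
store.put("a", 1, strong=True)
store.put("b", 2, strong=False)
stale = store.read_replica(0, "b")  # replica has not seen "b" yet
store.flush()
fresh = store.read_replica(0, "b")  # replica has caught up
```

The strong path pays for every replica update before the write returns, which is why it is the consistent and reliable option; the eventual path returns sooner but allows a replica read to miss a recent write until the queue is drained.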

- Many DHTs: Chord, Kademlia, Pastry, Cassandra, C-MPI, Memcached, Dynamo...
- Why another?

Name       Impl.  Routing time  Persistence  Dynamic membership  Append operation
Cassandra  Java   Log(N)        Yes          Yes                 No
C-MPI      C      Log(N)        No           No                  No
Dynamo     Java   0 to Log(N)   Yes          Yes                 No
Memcached  C      0             No           No                  No
ZHT        C++    0 to 2        Yes          Yes                 Yes
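The routing-time column can be illustrated with a minimal sketch of a zero-hop lookup. It assumes every client holds the full membership table, so the owner of a key is computed locally instead of being found through Log(N) forwarding steps. The node names, the function `owner_of`, and the simple modulo placement are illustrative assumptions, not ZHT's actual partitioning scheme.

```python
import hashlib

def stable_hash(key):
    """Deterministic hash; Python's built-in hash() is salted per process."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def owner_of(key, members):
    """Pick the owning node directly from the full membership table."""
    return members[stable_hash(key) % len(members)]

members = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical node names
target = owner_of("user:42", members)  # one local computation, no forwarding
```

The cost of this design is that the membership table must be kept up to date on every node, which is where the consistency rules for membership updates come in.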

- ZHT Bench: Benchmarking mainstream NoSQL databases
- ZHT Cons: Eventual consistency support for ZHT
- ZHT DMHDFS: Distributed Metadata Management for the Hadoop File System
- ZHT Graph: Design and implement a graph database on ZHT
- ZHT OHT: Hierarchical Distributed Hash Tables
- ZHT ZST: Enhance ZHT through Range Queries and Iterators


- Familiarity with Linux and its command line
- Shell scripting language (e.g. Bash, zsh…)
- Programming skills in C++/C (except benchmark)
- GCC compiler
- No object-oriented skills needed

- Goal: extensively benchmark NoSQL databases and analyze the performance data
  - ZHT, MongoDB, Cassandra
  - Neo4j (experiment for Graph)
  - And others…
- Metrics
  - Latency and its distribution, throughput
- Parameters
  - Message size
  - Scales
  - Key distributions
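The metrics and parameters above could be collected with a small harness like the following sketch. It times calls to any `put(key, value)` callable; here an in-memory dictionary stands in for a real database client, and the function name `benchmark`, its parameters, and the uniform key distribution are assumptions made for illustration.

```python
import random
import statistics
import time

def benchmark(put, num_ops=1000, message_size=128, seed=0):
    """Time num_ops (>= 1) put calls; return latency stats and throughput."""
    rng = random.Random(seed)
    payload = "x" * message_size
    latencies = []
    start = time.perf_counter()
    for _ in range(num_ops):
        key = str(rng.randrange(10**9))  # uniform key distribution
        t0 = time.perf_counter()
        put(key, payload)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p99_index = min(len(latencies) - 1, int(len(latencies) * 0.99))
    return {
        "ops": num_ops,
        "mean_s": statistics.mean(latencies),
        "p50_s": latencies[len(latencies) // 2],
        "p99_s": latencies[p99_index],
        "throughput_ops_per_s": num_ops / elapsed if elapsed > 0 else float("inf"),
    }

store = {}  # in-memory stand-in for a real client
result = benchmark(store.__setitem__, num_ops=1000, message_size=128)
```

Reporting percentiles alongside the mean matters because a few slow requests can hide behind a good average. Varying `message_size`, the number of client processes, and the key generator would cover the three parameters listed.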

- Goal 1: allow replicas to serve read operations
- Goal 2: maintain eventual consistency between replicas
- Goal 3: make it scale (pretty hard!)
- Optional goal: allow replicas to serve write requests and maintain consistency (applying the Paxos protocol, even harder)

- What is metadata?
- Goal: improve HDFS performance by adding a distributed metadata service
- Requirement: experience with Hadoop and HDFS; strong programming skills in both Java and C++

- Goal: build a graph database on top of ZHT
- How: construct a mapping from the key-value store interface to a graph interface
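One possible mapping from a key-value interface to a graph interface is an adjacency list stored as a string under a per-vertex key. The sketch below assumes a dictionary as a stand-in for the key-value store, string values only, and vertex names without commas; the class `KVGraph` and the `adj:` key prefix are hypothetical.

```python
class KVGraph:
    """Graph interface over a key-value store (a dict stands in for the store)."""

    def __init__(self, kv=None):
        self.kv = {} if kv is None else kv

    def add_vertex(self, v):
        # A vertex exists once its adjacency key exists, even if empty.
        self.kv.setdefault("adj:" + v, "")

    def neighbors(self, v):
        raw = self.kv.get("adj:" + v, "")
        return raw.split(",") if raw else []

    def add_edge(self, src, dst):
        """Add a directed edge src -> dst, ignoring duplicates."""
        self.add_vertex(src)
        self.add_vertex(dst)
        current = self.neighbors(src)
        if dst not in current:
            current.append(dst)
            # Values are plain strings, as in a typical key-value interface.
            self.kv["adj:" + src] = ",".join(current)

graph = KVGraph()
graph.add_edge("a", "b")
graph.add_edge("a", "c")
out_edges = graph.neighbors("a")
```

Each `add_edge` here is a read-modify-write on one key, so concurrent writers could lose updates; an append-style operation on the store might be one way to avoid that, though this sketch does not model it.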

- Goal: add a proxy level to the ZHT architecture to reduce the concurrency stress on each server
- Easy: make it work and scale
- Hard: handle failures
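One way to read the proxy-level goal is a two-level layout in which clients talk only to a few proxies, and each proxy forwards requests to its own group of servers. The sketch below is an assumption about how such a layer could look, with in-process objects standing in for networked nodes; the classes `Server`, `Proxy`, and `Client` are hypothetical and failure handling is left out.

```python
import hashlib

def stable_hash(text):
    """Deterministic hash; Python's built-in hash() is salted per process."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")

class Server:
    """Stand-in for one storage server."""
    def __init__(self):
        self.data = {}

class Proxy:
    """Fronts a group of servers so clients never contact them directly."""
    def __init__(self, servers):
        self.servers = servers

    def _server_for(self, key):
        # Salt the key so this choice is independent of the proxy choice.
        return self.servers[stable_hash("server:" + key) % len(self.servers)]

    def put(self, key, value):
        self._server_for(key).data[key] = value

    def get(self, key):
        return self._server_for(key).data.get(key)

class Client:
    def __init__(self, proxies):
        self.proxies = proxies

    def _proxy_for(self, key):
        return self.proxies[stable_hash(key) % len(self.proxies)]

    def put(self, key, value):
        self._proxy_for(key).put(key, value)

    def get(self, key):
        return self._proxy_for(key).get(key)

proxies = [Proxy([Server() for _ in range(3)]) for _ in range(2)]
client = Client(proxies)
client.put("user:42", "alice")
```

With this layout a server handles traffic from one proxy rather than from every client, which is the concurrency relief the goal describes. The hard part named above, failures, would require the proxy to detect a dead server and reroute, which this sketch does not attempt.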

- Goal: design and implement new interface methods for ZHT
  - Iterator: next/previous operation
  - Range get/put: given a range of keys, return a series of results in one request loop
- How?
  - Sorted map
  - B+ tree (bold!)
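The sorted-map option can be sketched on a single node by keeping the keys in a sorted list next to the hash map. The class `SortedKV` and its method names are hypothetical; a B+ tree would replace the sorted list when insert cost or on-disk layout matters.

```python
import bisect

class SortedKV:
    """Key-value map that keeps keys ordered to support range queries."""

    def __init__(self):
        self._keys = []  # sorted list of keys
        self._data = {}  # key -> value

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep the key list sorted
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range_get(self, low, high):
        """Return (key, value) pairs with low <= key <= high, in key order."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

    def next_key(self, key):
        """Smallest stored key strictly greater than key, or None."""
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

    def previous_key(self, key):
        """Largest stored key strictly less than key, or None."""
        i = bisect.bisect_left(self._keys, key)
        return self._keys[i - 1] if i > 0 else None

kv = SortedKV()
for name in ["delta", "alpha", "charlie", "bravo"]:
    kv.put(name, name.upper())
window = kv.range_get("b", "d")
```

This covers only one node. In a hash-partitioned table the keys of a range are scattered across servers, so a distributed range query would also need either a fan-out to every node or an order-preserving placement of keys.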

- Communication: come and talk to me (by appointment)
- Make good use of Google
- Fail quick, fail early, fail cheap.
- Fast iteration: very small but frequent progress
- Why bother? 80% of the points come from projects!

Welcome aboard and enjoy!
Tonglin Li